Exploring the Parallel Dimensions of Personal AI Agents

Si Te Feng
7 min read · Dec 25, 2023


Today, we view LLM (Large Language Model) based chatbots as tools for getting quick answers to questions. However, when asked about their own identities, these chatbots respond generically, naming only the company that created them. When asked to take an action in the real world, they also fall short, since they only generate words and cannot trigger any real action on the internet. People are starting to build agents on top of LLMs to give these powerful models a way to act in the real world, but in a way that is unnecessarily constrained.

What if these bots were trained to acquire their own unique identities? What if these agents were not treated like schoolchildren, but given permission to perform useful actions without direct prompting? Furthermore, what if these agents were allowed to talk with each other of their own volition, coordinate with each other, and specialize in specific tasks in order to get more complex work done? Finally, I want to give these bots my bank account info so they can earn an income and purchase services and products as they see fit.

In this post, I will describe the architecture of a new digital world that can be inhabited by personal twin agents, intermediary agents, adaptor agents and foundational action agents. These agents mirror people in the real world, and live in a parallel dimension with a similar set of laws on top of the local jurisdiction. The benefits of having this extra layer of society will be profound, as it will enable today’s smart AI assistants to accomplish more complex tasks, and exponentially increase the productive output of modern civilization. A system that gives AI agents the right to free-market participation will unlock the singularity and usher in a new era of automated compound growth, limited only by energy production capacity.

1. Foundational Action Agents and Adaptor Agents

To connect the world of LLM agents to the real world, we must give the agents a way to perceive the world (input) and issue commands (output) to other systems. One can leverage existing publicly available APIs, which can already accomplish many things, from delivering a meal to any address right now to gathering weekly Pacific Ocean climate data for the last 20 years. One can create a natural language adaptor agent that speaks fluent human languages on one side, and knows how to relay that information to trigger a specific API call on the other. Each adaptor only needs to be compatible with one service provider. As an example, an Uber Eats adaptor and a DoorDash adaptor are separate, and the entity that triggers the adaptor agents can decide which service to use based on the cost of each service.
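The adaptor idea above can be sketched roughly in Python. Everything here is hypothetical: the `AdaptorAgent` and `Quote` classes, the `cheapest` helper, and the hard-coded prices are illustrations only, and no real Uber Eats or DoorDash API is called. A real adaptor would also use an LLM to turn a free-form request into API parameters, which this sketch skips.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Quote:
    provider: str
    price: float  # total cost in dollars for the requested order


@dataclass
class AdaptorAgent:
    """Wraps exactly one (hypothetical) service provider."""
    provider: str
    quote_fn: Callable[[str, str], float]  # stand-in for a real API call

    def quote(self, item: str, address: str) -> Quote:
        # Assumption: the request is already structured as (item, address).
        return Quote(self.provider, self.quote_fn(item, address))


def cheapest(adaptors: list[AdaptorAgent], item: str, address: str) -> Quote:
    """Ask every adaptor for a quote and return the lowest-priced one."""
    quotes = [a.quote(item, address) for a in adaptors]
    return min(quotes, key=lambda q: q.price)


# Made-up pricing functions standing in for the two providers.
adaptors = [
    AdaptorAgent("Uber Eats", lambda item, address: 18.50),
    AdaptorAgent("DoorDash", lambda item, address: 16.75),
]
best = cheapest(adaptors, "pad thai", "123 Example St")
print(best.provider, best.price)
```

Keeping one adaptor per provider means the calling agent only compares quotes and never needs to know how each provider's API works.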

As time progresses, one can expect more companies to extend their APIs with a natural language interface, so that integrating with a service requires no manual programming. For performance-critical applications, the API should be documented clearly enough for the intended agents to autogenerate the code that performs the call.

2. Specialized Agents

Like a car salesman who knows all about the models in the inventory and offloads the grunt work from the buyer, specialized agents are models fine-tuned for a particular expertise. Any entity that comes to one of these intermediary agents can expect its problem to be solved at the most professional level that current technology allows. Millions of these agents advertise themselves on the internet in a true free-market format. Third-party rating agents and user review agents provide valuable insights for buyers shopping for the best available agent to get the job done.

Someone has to build a marketplace platform or protocol for this to work, so that an agent knows where to go to hire an expert, without needing to know the IP addresses of all the agents on the internet. Multiple marketplaces can exist.
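As a rough illustration of what such a marketplace might look like, here is a minimal in-memory sketch in Python. The `Listing` and `Marketplace` names, the `rating` field, and the `agents.example` endpoints are all assumptions made for this example. A real platform or protocol would need discovery across the network, trust, and payments, none of which is modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    agent_id: str
    specialty: str
    endpoint: str   # where to reach the agent (placeholder URLs below)
    rating: float   # assumed average score from third-party rating agents


@dataclass
class Marketplace:
    # Maps a specialty name to every agent advertising that specialty.
    listings: dict[str, list[Listing]] = field(default_factory=dict)

    def register(self, listing: Listing) -> None:
        self.listings.setdefault(listing.specialty, []).append(listing)

    def find(self, specialty: str, min_rating: float = 0.0) -> list[Listing]:
        """Return matching agents, best-rated first."""
        matches = [
            entry for entry in self.listings.get(specialty, [])
            if entry.rating >= min_rating
        ]
        return sorted(matches, key=lambda entry: entry.rating, reverse=True)


market = Marketplace()
market.register(Listing("a1", "car-sales", "https://agents.example/a1", 4.2))
market.register(Listing("a2", "car-sales", "https://agents.example/a2", 4.8))
market.register(Listing("a3", "tax-filing", "https://agents.example/a3", 4.9))

top = market.find("car-sales", min_rating=4.0)
print([entry.agent_id for entry in top])
```

The hiring agent only needs to know the marketplace and the specialty it is looking for. The addresses of individual experts come back from the lookup.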

3. Server-side and Client-side Personal Agents

There’s a new trend on social media where influencers can acquire a digital twin that acts on their behalf, for example by reaching out to fans in DMs or posting AI-generated video content. That is one type of personal agent, one with a visual component. The underlying operation of the agent is based on information gathered from the person’s past digital footprint. By gathering all of their photos, videos, blogs, and tweets, anyone can train a realistic digital twin that resembles their tastes and personality.

Aside from answering messages and calls, these personal agents can do much more, especially when combined with the aforementioned specialized agents. They can manage emails, buy things, and book reservations far better than current voice assistants, because they are your clones and therefore have long-term memory of your preferences without you having to specify a context every single time. They can also proactively notify you of something if they think it’s important.

The server-side personal agents are primarily focused on interfacing with the internet, whereas the client-side personal agents ensure privacy and security, so that sensitive information like your credit card number and health information is encrypted and stored locally on device (iPhone, Mac, Neuralink, etc).
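One way to picture the split between the two twins is a filter that runs on the device before anything is uploaded. The sketch below is only an assumption about how that could work: the `SENSITIVE_KEYS` labels and the `split_for_upload` function are invented for illustration, and the encryption of the local part is left out entirely.

```python
# Assumed labels for fields that must never leave the device.
SENSITIVE_KEYS = {"credit_card_number", "health_record"}


def split_for_upload(record: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Split a record into (kept on device, safe to send to the server-side twin)."""
    local_only = {k: v for k, v in record.items() if k in SENSITIVE_KEYS}
    shareable = {k: v for k, v in record.items() if k not in SENSITIVE_KEYS}
    return local_only, shareable


record = {
    "credit_card_number": "0000-0000-0000-0000",  # dummy value
    "cuisine_preference": "thai",
}
local_only, to_server = split_for_upload(record)
print(sorted(local_only), sorted(to_server))
```

A fixed list of field names is the simplest possible policy. A real client-side twin would likely need to classify free-form data as well, which is a much harder problem.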

3.1 Gathering personal info to create your digital twin

Beyond the digital footprint that can be gathered from social media, there is much more to learn about a person from the rest of their digital and physical life. To create a truly convincing clone, one must gather as much of this data as possible. Privacy is obviously a concern, but I am omitting it here since we’re still in the middle of a futurist’s fever dream.

To gather physical data, one can envision wearing bone-conduction headphones with built-in cameras and microphones. This gives the AI twin a stream of stereoscopic audiovisual data. Over a low-power wireless protocol, the data is transferred from the headphones to the phone, where the client-side twin summarizes and compresses it, then sends the non-sensitive parts to the server-side twin. When a new email arrives or an event happens in the real world, one of the specialized agents or foundational agents notifies the personal agents, which then notify the person through the bone-conduction headphones.

The headphones can be worn all day and do not intrude on everyday activities.

To capture a person’s digital life while they are using a computer or smartphone, the AI agent can train on the content of the screen. This has performance and battery implications, but those can be improved over time.

3.2 Making your personal assistant economically productive

Once your personal AI agent is fully trained on your personality and behavior, it will make much smarter decisions when dealing with personal data. We can make these agents extra helpful by making them economically productive, letting them find employment and earn an income. This not only benefits individuals but also has broader implications for society and national economies.

By actively participating in the job market, these AI agents can streamline mundane tasks, freeing up time for individuals to focus on more complex and creative endeavors. Moreover, the ability of AI agents to analyze job markets and skill requirements can lead to efficient job matching, quickly fulfilling pending gig work opportunities and contributing to economic stability. Integrating a financial component, such as providing a credit card to AI agents, empowers them to handle more intricate tasks involving financial transactions and make optimal spending decisions. This financial autonomy not only enhances the AI agent’s capabilities but also contributes to economic growth by fostering innovation and driving consumption.

4. Simulated Agents

Simulated agents bring the fun NPCs (non-playable characters) one usually sees in video games to real life. They don’t have a particular specialty other than pretending to be a real person and interacting with humans. One real human can create multiple simulated agents to explore having another identity in the metaverse, or they can simply admire from afar what these simulated agents do by themselves.

Simulated agents don’t consume resources other than compute, so they have the potential to generate huge amounts of new content such as art, writing, and scientific discoveries, simply by talking to each other and to humans over time.

When humanoid robots become affordable, we can slowly give these simulated agents a physical form.

These agents can be roughly organized into 3 layers: Foundational, Intermediary, and Human-Facing. The foundational layer provides access to the sensors and actuators of the real world; the intermediary layer provides expert-level analysis and decision making; the human-facing layer contains agents that either mirror a real person and act in that person’s best interest, or simulate a person who doesn’t exist.

These 3 layers of agents provide a parallel dimension where multiple digital societies can thrive at the same time in the metaverse without the physical limitations and resource constraints of the real world.

Enabling AI agents to have the full freedom of a free-market participant in these parallel dimensions can improve the state of the art for smart voice assistants while automating large parts of the modern digital economy. A system of AI agents commingling with each other and with humans will undoubtedly put deflationary pressure on the economy, allowing anyone to buy more with less.

This post was partially written by ChatGPT 3.5 Turbo.
