The anatomy of an AI agent
What is (and isn’t) an agent?
AI agents are clearly an important thing you should know about but what exactly is an agent? Almost every company or tech product seems to have one, but not everything that’s called an ‘agent’ really qualifies as one.
People are also starting to talk about and treat agents as if they’re real people. With OpenClaw, you can have an AI agent that has a name, a personality, a ‘heartbeat’ and a ‘soul’. You message them requests but they can also act on their own. They can pop up in Slack to answer questions. You need to onboard them like a real colleague. People give them their own email accounts and so on.
So if we’re going to anthropomorphise agents, then maybe a good way to explain how they work is to think about their anatomy. This is my attempt to explain what they are and how they work in plain English.
What is an agent? A model with tools in a loop
Imagine a simple chatbot powered by an LLM: you ask it a question and it gives you an answer.
You could give this chatbot tools (e.g. the ability to search the web or post updates to Slack) and it could be useful, but it still wouldn’t be an agent.
The key difference is that an agent runs in a loop. Give it a goal and it goes round and round, performing tasks, evaluating the results and deciding what to do next, until it gets there.
The core parts: brain, body, home and hands
If you wanted to make your own agent, you’d need the following four things. If you took any of them away, what you’ve got isn’t really an agent anymore.
- Brain (the LLM): does the thinking. You send a message and it sends one back. If you’re making your own agent, then you would usually be using an API from Anthropic, OpenAI, etc. and paying for usage.
- Body (the harness): the program that runs the loop in code. Claude Code is a harness. Codex is a harness. A Python script that calls an LLM API in a loop is a harness, and the simplest version is only about 50 lines of code. Without one, the brain is just a chatbot answering one question at a time.
- Home (the runtime): where the harness process actually runs. Your laptop, a server in the cloud, a Mac Mini, etc.
- Hands (tools): the functions the model can choose to call. A tool can be as simple as a calculator or a web search. Or it might be something more involved, like a Gmail connector.
The supporting parts: personality, memory, keys and voice
You can technically call something an agent with just the four core parts, but it would be of limited use. These next four parts are what take an agent from working to genuinely useful:
- Personality (the system prompt): a chunk of text added to the start of every conversation telling the model who it is, what it’s doing and how to behave. Without this, the agent has no identity or direction. It’s just a generic helper pointed at random tools.
- Memory: which has two parts. Short-term memory is the conversation history that the harness sends to the model on every turn. Long-term memory can be a file or database that tracks the history of what the agent has seen or done. Without long-term memory, every session starts from scratch.
- Keys (credentials): the API tokens, OAuth tokens and passwords that unlock the tools that reach protected systems like Gmail, GitHub or Stripe. The agent itself never sees the raw keys – the harness loads them and uses them when a tool needs to authenticate. Without keys, the agent is limited to tools that don’t need to log in to anything (calculators, web search and local file operations).
- Voice (interface): how you talk to the agent and how it talks back. Could be a terminal (like Claude Code), a chat UI (like ChatGPT), an email address it monitors, or a webhook that fires when something happens. The interface is whatever sends input into the loop and surfaces output back out.
Isn't an agent meant to have autonomy?
One thing I struggled with is whether or not an agent needs to be autonomous to count as one. People call Claude Code an agent, but (in most cases) you have to tell it what to do for it to act.
It turns out that agents can have two types of autonomy:
- Autonomy of action: when the model picks the steps within a task. You tell Claude Code to “fix the bug” and it figures out the rest: read the file, run the test, find the bug, edit the code. You didn’t dictate the steps, it did. This is what makes something an agent.
- Autonomy of initiation: when the agent decides when to act at all. It wakes up at 7am, checks your inbox and sends a summary. Or it notices a calendar conflict and resolves it without being asked.
When people talk about agents as colleagues, they’re picturing something that has both. But really, autonomy of action is what makes something an agent. Initiation is what makes it feel like a colleague.
Understanding them for what they are
Agents are getting more common and capable. They’re showing up in more products and workplaces – in your Slack channels, emails and so on. Increasingly they’re being presented as colleagues rather than tools.
That means the question of what actually counts as an agent matters more than it used to. If it’s model with tools in a loop, it’s an agent. If it’s essentially just a prompt you’ve saved for repeat usage, it’s not really an agent.
Products will keep coming and going, but these core ‘body’ parts won’t. Hopefully this gives you an easier way to understand them.
