What Is an AI Agent, Really? A Builder's Definition (Not the Hype)

What Is an AI Agent? The One-Sentence Answer

An AI agent is an LLM that autonomously decides its own next action, executes it via tools, observes the result, and repeats that loop until a stopping condition is met. Everything else (a chatbot, a pipeline, a prompt chain) is not an agent, no matter what the vendor calls it.

I am Mahmoud Zalt, an independent senior AI systems architect who has been building production software since 2010 (16+ years). I created Laradock (millions of Docker installs) and Apiato, and I founded Sista AI. Through my AI agent development work I have designed and shipped agent systems across customer support, research, code review, and internal operations. This article is the definition I use with every client before a single line of code is written. Read more about me or see my projects.

The Loop Is Everything

Strip away the marketing and an agent is three things: a reasoning engine (the LLM), a tool set (functions it can call), and a loop with a stopping condition. In pseudocode it looks like this:

state = initial_prompt
while not done(state):
    action = llm.decide(state)
    observation = tools.execute(action)
    state = update(state, observation)
return state.final_answer

The word 'decide' is doing all the work. The model looks at what it knows, chooses which tool to call next (or decides to stop), calls it, reads the result, and plans its next move. No human hardcodes the sequence. The sequence emerges from reasoning.

That is what distinguishes an agent from a workflow. In a workflow a human pre-defines every step and branch. In an agent the model figures out the steps at runtime. Both are useful. They are not the same thing.
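
The pseudocode above can be made runnable with a stand-in for the model. This is a minimal sketch, not a reference implementation: `scripted_llm_decide`, `Action`, `State`, and the canned `web_search` tool are all hypothetical names introduced for illustration, and a real system would replace the scripted function with an actual LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    tool: str       # name of the tool to call, or "finish" to stop
    argument: str   # a single string argument, to keep the sketch small


@dataclass
class State:
    goal: str
    observations: list[str] = field(default_factory=list)
    final_answer: Optional[str] = None


def scripted_llm_decide(state: State) -> Action:
    """Hypothetical stand-in for an LLM call: search once, then finish."""
    if not state.observations:
        return Action("web_search", state.goal)
    return Action("finish", f"Summary based on: {state.observations[-1]}")


# Tool registry: the model picks a name, the runtime executes the function.
TOOLS: dict[str, Callable[[str], str]] = {
    "web_search": lambda query: f"3 results for '{query}'",
}


def run_agent(goal: str, decide=scripted_llm_decide, max_steps: int = 10) -> State:
    state = State(goal=goal)
    for _ in range(max_steps):           # hard ceiling: a safety net only
        action = decide(state)
        if action.tool == "finish":      # the model chose to stop
            state.final_answer = action.argument
            return state
        observation = TOOLS[action.tool](action.argument)
        state.observations.append(observation)   # state accumulates
    return state   # final_answer is still None: the ceiling was hit
```

Calling `run_agent("acme corp")` runs one search and then finishes. If the ceiling is reached instead, `final_answer` stays `None`, so the caller can tell "stopped by the model" apart from "cut off by the counter".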

Chatbot vs. Workflow vs. Agent: A Concrete Comparison

Concept	Who decides the next step?	Can it call tools?	Loop?	When to use it
Chatbot	Human (each turn)	Sometimes	No	Q&A, support triage, guided conversations
Scripted workflow	Developer (hardcoded graph)	Yes	Conditional	Known, repeatable multi-step processes
AI agent	LLM (at runtime)	Yes (required)	Yes	Open-ended tasks where the path is unknown up front

A customer support bot that routes tickets by category is a chatbot. A pipeline that extracts PDF fields, validates them, and posts to an API is a workflow. A system that is given 'research this company and write a due-diligence summary' and figures out which searches to run, which pages to fetch, and when it has enough context to write is an agent.

The distinction matters because agents are harder to make reliable and more expensive per task. Reaching for one when a workflow would do is a common and costly mistake.

What Actually Makes Something a Real Agent

1. Dynamic tool selection

The model must choose from a menu of tools based on context. If the tool call sequence is fixed in code, you have a workflow. Real tool selection looks like: the agent decides to call web_search three times, then read_url twice, then write_draft, then critique_draft before returning. No developer scripted that order.

2. Stopping condition owned by the model

The agent must be able to decide 'I have enough information to answer' without a human counting its steps. A hard loop limit is a safety net, not the primary stopping mechanism. If your system only stops when a counter hits 5, it is a loop, not an agent.

3. State that accumulates across steps

Each tool result updates what the agent knows. The model reads the growing context window (or a structured scratchpad) to decide what to do next. Without this memory-within-a-run, the loop is blind.

4. Genuine ambiguity in the path

If you can fully describe every step before runtime, you do not need an agent. Agents earn their complexity only when the correct sequence of actions depends on information that is not available until the task starts.

What Teams Get Wrong (And It Costs Them)

I see four recurring mistakes when teams build or buy 'agents':

Wrapping a pipeline in agent framing. A sequence of five LLM calls where each prompt is hardcoded is a pipeline. Calling it an 'agentic workflow' does not change the architecture or its failure modes. The danger: you add agent-style complexity (memory, tool routing) to something that did not need it, and reliability drops with no benefit.
No stopping condition, just a step limit. If the only thing stopping your agent is max_iterations=10, it will confidently produce garbage when it hits that ceiling rather than saying 'I cannot complete this task.' Every agent needs an explicit 'I am done and here is why' path.
Tools that are too coarse. Giving an agent a single 'do everything in the CRM' tool is not tool-calling, it is chaos. Tools should be small, single-purpose, and have typed inputs and outputs. Think: search_contacts(query: str) -> list[Contact], not interact_with_crm(instruction: str) -> str.
No evals before production. Agent behavior is non-deterministic. Without a golden set of 30 to 50 test cases with expected outcomes, you cannot know whether a model upgrade or prompt change broke something. Ship evals before you ship the agent.
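
To make the last point concrete, here is a rough sketch of a golden-set harness. Everything in it is an assumption made for illustration: `GoldenCase`, `run_evals`, and `fake_agent` are hypothetical names, the two cases are invented, and the substring check stands in for whatever grading a real suite would use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    task: str
    must_contain: str   # simplest possible check; real graders are richer


def run_evals(agent: Callable[[str], str], cases: list[GoldenCase]) -> float:
    """Return the fraction of golden cases whose output passes its check."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = 0
    for case in cases:
        output = agent(case.task)
        if case.must_contain.lower() in output.lower():
            passed += 1
    return passed / len(cases)


def fake_agent(task: str) -> str:
    """Hypothetical stand-in for a full agent run."""
    if "refund" in task:
        return "Refund issued"
    return "I cannot complete this task"


GOLDEN_SET = [
    GoldenCase("process refund for order 17", "refund issued"),
    GoldenCase("cancel subscription", "cancelled"),
]

pass_rate = run_evals(fake_agent, GOLDEN_SET)   # one of two cases passes
```

Running the same set before and after a prompt or model change gives a single number to compare, which is the point of having the set at all.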

What a Production Agent Actually Needs

Demos are easy. Production is where the real architecture decisions live. Here is what I require before calling any agent system production-ready:

Observability

Every loop iteration must emit a structured trace: which tool was called, what the input was, what came back, how many tokens were consumed, and how long it took. Without this, debugging a failure is archaeology. Tools like LangSmith, Langfuse, or a custom OpenTelemetry span per tool call all work. Pick one and make it mandatory from day one.
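
As one possible shape for such a trace, the sketch below wraps a tool in a decorator that records its name, input, output, and duration. It is a plain-Python illustration, not the LangSmith, Langfuse, or OpenTelemetry API: `traced`, `TRACE`, and the canned `web_search` tool are hypothetical, and token counts are left out because they would come from the LLM API response rather than from the tool.

```python
import functools
import time
from typing import Any, Callable

TRACE: list[dict[str, Any]] = []   # stand-in for a real trace exporter


def traced(tool: Callable[..., Any]) -> Callable[..., Any]:
    """Record name, input, output, and duration of every successful call."""
    @functools.wraps(tool)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        started = time.perf_counter()
        result = tool(*args, **kwargs)
        TRACE.append({
            "tool": tool.__name__,
            "input": {"args": args, "kwargs": kwargs},
            "output": result,
            "duration_ms": (time.perf_counter() - started) * 1000,
        })
        return result
    return wrapper


@traced
def web_search(query: str) -> list[str]:
    """Hypothetical tool that returns canned results."""
    return [f"result for {query}"]


web_search("agent observability")   # appends one record to TRACE
```

This version only records calls that return. A production wrapper would presumably also capture exceptions, so that failed tool calls show up in the trace too.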

Guardrails

Input guardrails check whether the task is within scope before the loop starts. Output guardrails check whether the final answer is safe and coherent before it is returned. Both are non-negotiable for any agent that touches user-facing output or business data.
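
A minimal sketch of where the two checks sit relative to the agent run. The rules themselves are placeholders: `BLOCKED_TOPICS`, the `api_key=` check, and the names `check_input`, `check_output`, and `guarded_run` are assumptions made for illustration, not a recommended policy.

```python
from typing import Callable

BLOCKED_TOPICS = ("legal advice", "medical diagnosis")   # hypothetical scope rules


class GuardrailError(Exception):
    """Raised when a task or an answer fails a guardrail check."""


def check_input(task: str) -> None:
    lowered = task.lower()
    for topic in BLOCKED_TOPICS:
        if topic in lowered:
            raise GuardrailError(f"out of scope: {topic}")


def check_output(answer: str) -> None:
    if not answer.strip():
        raise GuardrailError("empty answer")
    if "api_key=" in answer.lower():
        raise GuardrailError("possible secret in output")


def guarded_run(task: str, agent: Callable[[str], str]) -> str:
    check_input(task)        # runs before the loop starts
    answer = agent(task)
    check_output(answer)     # runs before anything is returned
    return answer
```

The useful property is structural: the agent function never sees an out-of-scope task, and the caller never sees an answer that failed the output check.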

Human-in-the-loop checkpoints

For irreversible actions (sending an email, making a payment, deleting a record), the agent must pause and surface a confirmation before executing. The model deciding autonomously to send 10,000 customer emails is not a feature, it is a liability. Design the pause point into the tool interface itself: send_email returns a preview and requires a confirm=True flag on the second call.
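
The two-call pattern described above might look roughly like this. It is a sketch under assumptions: `EmailResult` and the in-memory `SENT` list are hypothetical, and no real mail provider is involved.

```python
from dataclasses import dataclass

SENT: list[str] = []   # stand-in for a real mail provider


@dataclass
class EmailResult:
    status: str    # "preview" or "sent"
    preview: str


def send_email(to: str, subject: str, body: str, confirm: bool = False) -> EmailResult:
    """First call returns a preview only; nothing is sent without confirm=True."""
    preview = f"To: {to}\nSubject: {subject}\n\n{body}"
    if not confirm:
        return EmailResult(status="preview", preview=preview)
    SENT.append(to)   # the irreversible step happens only on the confirmed call
    return EmailResult(status="sent", preview=preview)
```

A first call such as `send_email("a@example.com", "Hi", "Hello")` returns a preview and sends nothing. For the pause to mean anything, the runtime would need to ensure that `confirm=True` is only set after a human has approved the preview, rather than letting the model set the flag itself.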

Cost budget per run

Set a hard token and dollar ceiling per invocation. An agent that loops 40 times on a confused task can cost 100x what you budgeted. The ceiling forces the agent to escalate rather than spiral.
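
One way to enforce such a ceiling is a small budget object that every LLM call is charged against. The sketch below is illustrative only: `RunBudget` and `BudgetExceeded` are hypothetical names, and the token and dollar figures passed in would come from your provider's usage data.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a run crosses its token or dollar ceiling."""


@dataclass
class RunBudget:
    max_tokens: int
    max_usd: float
    tokens_used: int = 0
    usd_spent: float = 0.0

    def charge(self, tokens: int, usd: float) -> None:
        """Record the cost of one call and stop the run if a ceiling is crossed."""
        self.tokens_used += tokens
        self.usd_spent += usd
        if self.tokens_used > self.max_tokens or self.usd_spent > self.max_usd:
            raise BudgetExceeded(
                f"used {self.tokens_used} tokens and ${self.usd_spent:.4f}"
            )
```

The loop would call `budget.charge(...)` after each model response and treat `BudgetExceeded` as the signal to escalate to a human instead of continuing.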

Retrieval over context stuffing

Do not stuff 100 documents into the context window hoping the model finds the right one. Use RAG (retrieval-augmented generation) to give the agent a search_knowledge_base tool it calls when it needs specific facts. Smaller context, cheaper calls, more accurate results.
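
The shape of such a tool can be shown with a toy example. Note the swap: this sketch scores documents by simple word overlap, which is a stand-in for the embedding-based vector search a real RAG setup would typically use. The `DOCUMENTS` contents are invented for illustration.

```python
DOCUMENTS = {
    "refund-policy": "Refunds are available within 30 days of purchase.",
    "shipping": "Orders ship within two business days.",
    "security": "All data is encrypted at rest and in transit.",
}


def search_knowledge_base(query: str, top_k: int = 1) -> list[str]:
    """Return the top_k documents sharing the most words with the query."""
    query_words = set(query.lower().split())
    scored = []
    for doc_id, text in DOCUMENTS.items():
        doc_words = set(text.lower().replace(".", "").split())
        overlap = len(query_words & doc_words)
        if overlap:
            scored.append((overlap, doc_id))
    scored.sort(reverse=True)   # highest overlap first
    return [DOCUMENTS[doc_id] for _, doc_id in scored[:top_k]]
```

The agent only pays context tokens for the `top_k` passages it asked for, not for the whole document set.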

Tool-Calling and MCP: The Plumbing Underneath

Modern agents call tools via a function-calling interface built into the LLM API. The model outputs a structured JSON object naming the function and its arguments. Your runtime executes the function and feeds the result back as a new message. The model reads it and decides what to do next.
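
The runtime half of that exchange can be sketched as a small dispatcher. The JSON shape used here (`name` plus `arguments`) and the returned message format are simplified assumptions; each vendor's API has its own exact schema, and `handle_tool_call`, `TOOL_REGISTRY`, and the canned `web_search` are hypothetical names.

```python
import json
from typing import Any, Callable


def web_search(query: str) -> list[str]:
    """Hypothetical tool that returns canned results."""
    return [f"result for {query}"]


TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"web_search": web_search}


def handle_tool_call(raw: str) -> dict[str, Any]:
    """Parse the model's JSON tool call, run the tool, build the reply message."""
    call = json.loads(raw)
    name, arguments = call["name"], call["arguments"]
    if name not in TOOL_REGISTRY:
        # Report the problem back to the model instead of crashing the loop.
        return {"role": "tool", "name": name, "content": f"error: unknown tool {name}"}
    result = TOOL_REGISTRY[name](**arguments)
    return {"role": "tool", "name": name, "content": json.dumps(result)}
```

Given `'{"name": "web_search", "arguments": {"query": "mcp"}}'`, the function returns a message whose content is the JSON-encoded search result, ready to append to the conversation.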

The Model Context Protocol (MCP) is an emerging open standard for exactly this interface. An MCP server exposes a set of typed tools. An MCP client (your agent runtime) discovers and calls them. The benefit is portability: the same tool server works with any MCP-compatible agent framework, whether that is a custom loop, Claude's tool use API, or an orchestration library like LangGraph.

In practice, when I build agents I define tools as MCP servers for anything that will be reused across projects (web search, database access, internal APIs), and inline simple tools as local functions for task-specific logic. The rule: if a tool needs its own auth, rate limiting, or retry logic, it belongs in a server, not inlined.

A concrete example: a research agent I built for a client had four MCP tools: web_search (Bing API), fetch_url (headless browser), search_internal_docs (vector search over a Notion export), and write_to_draft (Google Docs API). The agent decided the call order. We reused all four tools across three other agents without touching a line of tool code.

When You Do Not Need an Agent

This is the most useful section in the article. If any of the following are true, build a workflow or a simple LLM call instead:

You can enumerate every step before runtime. If you can draw the full flowchart today, a workflow is more reliable, cheaper, and easier to test.
The task always finishes in one or two LLM calls. A chatbot or a single prompt with a structured output schema is enough. Adding a loop adds failure modes.
Latency is critical. Agent loops compound latency. Three tool calls at 800 ms each (2.4 s) plus three LLM calls at 1.5 s each (4.5 s) is 6.9 seconds minimum. If you need a sub-second response, you need a different architecture.
The budget is tight. A GPT-4o agent running 8 iterations on a complex task can cost 10x to 50x more than a single well-engineered prompt. Run the numbers before you commit.
Reliability requirements are very high. Agents have higher variance than deterministic workflows. If the task cannot tolerate occasional wrong answers or mid-run failures, the bar for evals and guardrails is high enough that a simpler architecture often wins.
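
The latency arithmetic in the list above is easy to rerun with your own numbers. A small sketch, assuming the calls run strictly one after another; `min_loop_latency_s` is a hypothetical helper name.

```python
def min_loop_latency_s(tool_calls: int, tool_s: float,
                       llm_calls: int, llm_s: float) -> float:
    """Lower bound on wall-clock time when every call runs sequentially."""
    return tool_calls * tool_s + llm_calls * llm_s


# The figures from the text: three 800 ms tool calls, three 1.5 s LLM calls.
estimate = min_loop_latency_s(tool_calls=3, tool_s=0.8, llm_calls=3, llm_s=1.5)
print(f"{estimate:.1f} s")
```

Plugging in measured latencies for your own tools and model gives a quick floor to compare against the response time you actually need.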

I tell clients: start with the simplest thing that can solve the problem. Reach for an agent only when the task genuinely requires runtime reasoning about which steps to take. That is rarer than the hype suggests.

Frequently Asked Questions

What is the difference between an AI agent and a chatbot?

A chatbot responds to a single user turn and waits for the next input. An AI agent runs a loop: it takes a goal, decides which tools to call, executes them, reads the results, and repeats until it decides the task is complete. A chatbot is reactive. An agent is goal-directed and autonomous within a run.

Is LangChain an AI agent?

LangChain is a framework for building agent systems, not an agent itself. It provides the scaffolding: memory abstractions, tool interfaces, chain and agent executor classes. You still have to define the tools, the prompt, and the stopping condition. The agent is what you build with the framework, not the framework itself.

What is the difference between an AI agent and an AI workflow?

In a workflow, a developer hardcodes every step and decision branch. In an agent, the LLM decides the next step at runtime based on what it has observed so far. Workflows are more predictable and cheaper. Agents handle open-ended tasks where the correct sequence of actions cannot be determined in advance. Most production systems benefit from both: workflows for the known paths, agents for the open-ended subtasks.

How do AI agents use tools?

The LLM outputs a structured function call (a JSON object with a tool name and typed arguments). The agent runtime executes the corresponding function (a web search, a database query, an API call) and returns the result as a new message in the conversation. The model reads the result and decides its next action. Modern tool-calling APIs from Anthropic, OpenAI, and Google all use this pattern. MCP standardizes the tool-server interface so tools can be shared across frameworks.

Are autonomous AI agents safe to deploy?

They can be, but safety has to be designed in. The requirements are: input and output guardrails, human-in-the-loop checkpoints for irreversible actions, a hard cost and iteration ceiling, full observability on every tool call, and a golden-set eval suite before any production deployment. An agent without these is not a product, it is a risk.

What does it cost to run an AI agent in production?

It depends heavily on the number of loop iterations and the model chosen. A simple 3-step agent on Claude Haiku might cost under $0.01 per run. A complex research agent running 15 iterations on Claude Sonnet can easily cost $0.50 to $2.00 per run. At scale, those numbers matter. I always benchmark cost-per-task during the eval phase and set per-run budgets before going live. Choosing a smaller model for lightweight tool calls (retrieval, formatting) and a larger model only for planning decisions is a common cost optimization.

Working With a Builder Who Has Shipped This in Production

Definitions matter before architecture decisions, and architecture decisions matter before code. If your team is trying to figure out whether you need an agent, a workflow, or something simpler, that clarity is faster to reach with someone who has built all three in production. My AI agent development work covers architecture scoping, agent design, evals, guardrails, and production observability. If you have a specific problem to solve, reach out directly. I work with a small number of clients at a time, which means you get real attention, not a templated engagement.

Talk to me about your AI agent architecture
