
The Contract Behind Every AI Agent

Every AI agent hides an implicit contract. If you’re building with agents, understanding that contract is what keeps your systems predictable and sane.

Code Cracking
30m read
#AI #agents #softwaredesign #architecture


We’re dissecting how crewAI defines an “agent” through its BaseAgent class, and how that contract quietly governs safety, scalability, and ergonomics across the framework. crewAI is an open‑source agent framework that wires LLMs, tools, knowledge, and security into collaborative AI workers. At the heart of that system is BaseAgent, the abstraction every concrete agent must satisfy.

I’m Mahmoud Zalt, an AI solutions architect helping teams turn AI into ROI, and we’ll walk this file like we’re pair‑programming through the backbone of the agent layer. By the end, you’ll see how to treat “what is an agent?” as an enforceable contract—not a loose pattern—and how to borrow these ideas in your own systems.

How BaseAgent Defines an Agent

BaseAgent sits in crewAI’s core agent layer, orchestrating tools, knowledge, security, and infrastructure wiring for all concrete agents.

crewAI project structure (simplified)

crewAI/
  lib/
    crewai/
      src/
        crewai/
          agents/
            agent_builder/
              base_agent.py   <-- BaseAgent (this file)
            cache/
              cache_handler.py
            tools_handler.py
          knowledge/
            knowledge.py
            knowledge_config.py
            source/
              base_knowledge_source.py
          mcp/
            config.py
          rag/
            embeddings/
              types.py
          security/
            security_config.py
          tools/
            base_tool.py
          utilities/
            config.py
            i18n.py
            logger.py
            rpm_controller.py
            string_utils.py
BaseAgent anchors the agent layer and connects tools, knowledge, security, and infra.

The core abstraction looks like this:

class BaseAgent(BaseModel, ABC, metaclass=AgentMeta):
    """Abstract Base Class for all third party agents compatible with CrewAI."""

    __hash__ = object.__hash__
    _logger: Logger = PrivateAttr(default_factory=lambda: Logger(verbose=False))
    _rpm_controller: RPMController | None = PrivateAttr(default=None)
    _request_within_rpm_limit: Any = PrivateAttr(default=None)
    _original_role: str | None = PrivateAttr(default=None)
    _original_goal: str | None = PrivateAttr(default=None)
    _original_backstory: str | None = PrivateAttr(default=None)
    _token_process: TokenProcess = PrivateAttr(default_factory=TokenProcess)

    id: UUID4 = Field(default_factory=uuid.uuid4, frozen=True)
    role: str = Field(description="Role of the agent")
    goal: str = Field(description="Objective of the agent")
    backstory: str = Field(description="Backstory of the agent")
    # ... many other configuration fields ...
Pydantic fields define the visible contract; private attributes hold runtime‑only wiring.

You can think of BaseAgent as the job description for an AI worker: it specifies identity, capabilities, and safety rules, while subclasses fill in the concrete behavior.

A few design choices shape this contract:

  • Configuration as data. Inheriting from pydantic.BaseModel makes fields typed, validated, and serializable. Identity, tools, apps, and knowledge are all explicit data, not ad‑hoc attributes.
  • Behavior as abstraction. As an ABC, BaseAgent defines abstract methods like execute_task, aexecute_task, and get_*_tools. The “what” is fixed; the “how” is delegated.
  • Runtime wiring kept private. Components like _logger, _rpm_controller, and _token_process are private attributes: they don’t leak into configuration or persistence.
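To make these three choices concrete, here is a minimal standard-library sketch of the same pattern. It uses dataclasses as a stand-in for Pydantic, and all names (MiniAgent, EchoAgent) are illustrative, not crewAI's API:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
import uuid

@dataclass
class MiniAgent(ABC):
    """Sketch of a BaseAgent-style contract: typed config plus abstract behavior."""

    role: str
    goal: str
    backstory: str
    # System-owned identity: init=False means callers cannot supply it.
    id: str = field(default_factory=lambda: str(uuid.uuid4()), init=False)

    def __post_init__(self) -> None:
        # Configuration as data: reject empty identity fields at construction,
        # the way Pydantic validators would.
        for name in ("role", "goal", "backstory"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required and cannot be empty")

    @abstractmethod
    def execute_task(self, task: str) -> str:
        """Behavior as abstraction: subclasses supply the 'how'."""

class EchoAgent(MiniAgent):
    def execute_task(self, task: str) -> str:
        return f"{self.role}: {task}"
```

The base class cannot be instantiated directly, invalid configs fail fast, and the identity field is generated rather than accepted, mirroring the contract described above.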

With the shape of an agent defined, the next question is how the system enforces that shape—so invalid or unsafe agents never make it past construction.

Validation as a Customs Checkpoint

BaseAgent doesn’t just describe fields; it acts like a strict customs checkpoint. Pydantic v2 validators normalize configuration, enforce invariants, and adapt external objects into crewAI’s internal types.

At a high level, the class uses:

  • a pre‑model validator to preprocess raw config,
  • field validators to enforce the shape of tools, apps, MCPs, and IDs,
  • post‑model validators to assert critical invariants.

Tools as an Adapter Gateway

Tools are a good example of a rich but controlled interface. The tools field accepts both native crewAI tools and “LangChain‑like” tools, but always normalizes them to BaseTool instances:

@field_validator("tools")
@classmethod
def validate_tools(cls, tools: list[Any]) -> list[BaseTool]:
    """Validate and process the tools provided to the agent."""
    if not tools:
        return []

    processed_tools = []
    required_attrs = ["name", "func", "description"]
    for tool in tools:
        if isinstance(tool, BaseTool):
            processed_tools.append(tool)
        elif all(hasattr(tool, attr) for attr in required_attrs):
            processed_tools.append(Tool.from_langchain(tool))
        else:
            raise ValueError(
                f"Invalid tool type: {type(tool)}. "
                "Tool must be an instance of BaseTool or "
                "an object with 'name', 'func', and 'description' attributes."
            )
    return processed_tools
The tools validator doubles as an adapter: external tools are wrapped into BaseTool when possible.

This is a clean instance of the Adapter pattern: the system internally expects BaseTool, but will accept any object with the right attributes and adapt it via Tool.from_langchain.

  • Runtime safety. Once an agent is constructed, tools is guaranteed to be a list of BaseTool. Execution code can skip repetitive type checks.
  • Smoother integration. Existing tools from other ecosystems can be reused with minimal shaping instead of full rewrites.
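The adapter idea is easy to replicate outside crewAI. Here's a hedged, self-contained sketch: the simplified BaseTool, adapt_tools, and LangChainish below are illustrative stand-ins, not crewAI's actual classes:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class BaseTool:
    """Stand-in for crewAI's internal tool type."""
    name: str
    description: str
    func: Callable[..., Any]

def adapt_tools(tools: list[Any]) -> list[BaseTool]:
    """Normalize native tools and duck-typed external tools to one internal type."""
    adapted: list[BaseTool] = []
    for tool in tools:
        if isinstance(tool, BaseTool):
            adapted.append(tool)
        elif all(hasattr(tool, a) for a in ("name", "func", "description")):
            # Adapter: wrap the foreign tool instead of requiring a rewrite.
            adapted.append(BaseTool(tool.name, tool.description, tool.func))
        else:
            raise ValueError(f"Invalid tool type: {type(tool)}")
    return adapted

class LangChainish:
    """Any object with the right attributes qualifies; no inheritance needed."""
    name = "search"
    description = "Web search"
    @staticmethod
    def func(q: str) -> str:
        return f"results for {q}"

tools = adapt_tools([LangChainish()])
```

Downstream code can now assume every element is a BaseTool, which is exactly the runtime-safety payoff described above.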

Apps and MCPs as Structured Capabilities

Enterprise apps and MCP servers are also constrained early so their surface area stays manageable.

@field_validator("apps")
@classmethod
def validate_apps(
    cls, apps: list[PlatformAppOrAction] | None
) -> list[PlatformAppOrAction] | None:
    if not apps:
        return apps

    validated_apps = []
    for app in apps:
        if app.count("/") > 1:
            raise ValueError(
                f"Invalid app format '{app}'. Apps can only have one '/' for app/action format"
            )
        validated_apps.append(app)

    return list(set(validated_apps))
apps must be plain app names or a single app/action pair; more nesting is rejected.

For MCP (Model Context Protocol) servers, a dedicated validator restricts string references to specific prefixes (like https:// or crewai-amp:) and otherwise requires an MCPServerConfig object. That keeps references to external servers explicit and easy to reason about.
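A minimal sketch of that prefix rule follows. The function name and the dict standing in for MCPServerConfig are assumptions for illustration, not crewAI's code:

```python
# Illustrative allowlist, per the prefixes mentioned above.
ALLOWED_PREFIXES = ("https://", "crewai-amp:")

def validate_mcp_ref(ref: object) -> object:
    """Accept string refs only with known prefixes; otherwise require a config object."""
    if isinstance(ref, str):
        # str.startswith accepts a tuple, checking all prefixes at once.
        if not ref.startswith(ALLOWED_PREFIXES):
            raise ValueError(f"Unsupported MCP reference: {ref!r}")
        return ref
    if isinstance(ref, dict):  # stand-in for a structured MCPServerConfig
        return ref
    raise TypeError(f"Expected str or MCPServerConfig-like object, got {type(ref)}")
```

The effect is that every external-server reference in an agent's config is either a recognizable URL scheme or an explicit, typed config object.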

Identity and Narrative as Non‑Negotiables

The contract also enforces identity:

  • id is system‑owned. A validator (_deny_user_set_id) raises a PydanticCustomError if a value is provided. Every agent gets a UUID4 generated by the system.
  • role, goal, backstory are mandatory. A post‑model validator (validate_and_set_attributes) checks these fields and raises if any are missing.

That post‑model validator embodies a simple rule: you can’t have an anonymous, purposeless agent. Every agent must have a defined role and goal, even if it’s never surfaced directly to users.
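The system-owned-id rule is easy to replicate in your own frameworks; here is a tiny sketch (build_agent_config is an illustrative name, not crewAI's API):

```python
import uuid

def build_agent_config(**fields) -> dict:
    """Sketch of the 'system-owned id' rule: user-supplied ids are rejected."""
    if "id" in fields:
        # Mirrors the intent of _deny_user_set_id: identity is system-assigned.
        raise ValueError("'id' cannot be set by the user; it is generated per agent")
    return {"id": uuid.uuid4(), **fields}
```

Rejecting the field outright, rather than silently overwriting it, tells the caller immediately that they hold a mistaken assumption about identity.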

With configuration guarded at the edge, the next concern is what happens when you start cloning agents to isolate work or scale out. That’s where copy semantics become part of the public contract.

Copy Semantics as Part of the Contract

Real systems rarely keep a single agent instance forever. You copy agents to isolate requests, run experiments, or spin up temporary workers. Copying the wrong things—like IDs, open connections, or heavy histories—can create subtle bugs and resource explosions.

BaseAgent defines its own copy method to make cloning explicit:

def copy(self) -> Self:  # type: ignore
    """Create a deep copy of the Agent."""
    exclude = {
        "id", "_logger", "_rpm_controller", "_request_within_rpm_limit",
        "_token_process", "agent_executor", "tools", "tools_handler",
        "cache_handler", "llm", "knowledge_sources", "knowledge_storage",
        "knowledge", "apps", "mcps", "actions",
    }

    existing_llm = shallow_copy(self.llm)
    copied_knowledge = shallow_copy(self.knowledge)
    copied_knowledge_storage = shallow_copy(self.knowledge_storage)

    existing_knowledge_sources = None
    if self.knowledge_sources:
        shared_storage = self.knowledge_sources[0].storage

        existing_knowledge_sources = []
        for source in self.knowledge_sources:
            copied_source = (
                source.model_copy()
                if hasattr(source, "model_copy")
                else shallow_copy(source)
            )
            copied_source.storage = shared_storage
            existing_knowledge_sources.append(copied_source)

    copied_data = self.model_dump(exclude=exclude)
    copied_data = {k: v for k, v in copied_data.items() if v is not None}
    return type(self)(
        **copied_data,
        llm=existing_llm,
        tools=self.tools,
        knowledge_sources=existing_knowledge_sources,
        knowledge=copied_knowledge,
        knowledge_storage=copied_knowledge_storage,
    )
Copying creates a new identity, reuses heavy resources, and keeps shared storage intentional.

Fresh Identity, Shared Heavy Resources

This method makes several deliberate choices:

  • Fresh identity and runtime state. Fields like id and private attributes (_logger, _rpm_controller, _token_process, etc.) are excluded. The new instance runs through normal validation and gets a brand‑new UUID and runtime wiring.
  • Shallow‑copied infra clients. llm, knowledge, and knowledge_storage are shallow‑copied. That’s a signal that these objects are either light handles (client objects) or intentionally shared.
  • Shared knowledge storage, copied sources. Each knowledge source is copied, but their .storage is set to a shared instance, so data lives in one place even if sources differ.

A practical analogy: you’re creating a new developer workstation with its own user account, but pointing it at the same shared file server. Each workstation is isolated in behavior and identity, while heavy storage is centralized.

The upside is clear: you avoid duplicating expensive resources like vector stores or LLM clients when cloning agents. The trade‑off is that shared mutable state becomes part of the contract; modifying that shared storage affects all agents that reference it.
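A stripped-down sketch of these copy semantics makes the trade-off tangible. SharedStore and the simplified Agent below are illustrative, not crewAI's classes:

```python
import uuid

class SharedStore:
    """Stand-in for heavy shared infrastructure, e.g. a vector store."""
    def __init__(self) -> None:
        self.docs: list = []

class Agent:
    def __init__(self, role: str, storage: SharedStore) -> None:
        self.id = uuid.uuid4()          # identity: regenerated per instance
        self.role = role                # configuration: duplicated
        self.storage = storage          # heavy resource: shared on purpose
        self.tools_results: list = []   # ephemeral runtime state: starts fresh

    def copy(self) -> "Agent":
        # Fresh identity and runtime state; shared storage by design.
        return Agent(self.role, self.storage)

a = Agent("analyst", SharedStore())
b = a.copy()
```

Mutating b's storage is visible to a, exactly the shared-mutable-state contract described above, while ids and runtime state stay independent.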

Copy Behavior Is Part of the Public Contract

The important design lesson is that “what does it mean to copy this agent?” is not an implementation detail. In BaseAgent, copying means:

  • Configuration fields are duplicated.
  • Identity and ephemeral runtime state are regenerated.
  • Expensive external resources are shared on purpose.

With construction and cloning defined, we can look at runtime guardrails: how the base class shapes prompts, caching, and rate limiting without dictating exact agent behavior.

Runtime Guardrails: Prompts, Cache, and RPM

Once an agent starts doing work—calling LLMs, using tools, and querying knowledge—BaseAgent doesn’t implement the workflows, but it defines the hooks and controls that make those workflows safe and efficient.

Dynamic Prompts via Interpolation

Many systems need agent descriptions that adapt to the current request, like “{name}’s financial assistant.” BaseAgent handles this via interpolate_inputs:

def interpolate_inputs(self, inputs: dict[str, Any]) -> None:
    """Interpolate inputs into the agent description and backstory."""
    if self._original_role is None:
        self._original_role = self.role
    if self._original_goal is None:
        self._original_goal = self.goal
    if self._original_backstory is None:
        self._original_backstory = self.backstory

    if inputs:
        self.role = interpolate_only(
            input_string=self._original_role, inputs=inputs
        )
        self.goal = interpolate_only(
            input_string=self._original_goal, inputs=inputs
        )
        self.backstory = interpolate_only(
            input_string=self._original_backstory, inputs=inputs
        )
Original strings are cached once and treated as templates for request‑specific interpolation.
  • The first call caches the original role, goal, and backstory so interpolations don’t compound over time.
  • The agent’s key property uses these original values, not interpolated ones, for stable cache keys and identity.

The contract here separates identity (who the agent is) from presentation (how it describes itself in a given context), and it encodes that distinction in both data and caching behavior.
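A minimal sketch of the cache-then-interpolate pattern, using str.format as a stand-in for crewAI's interpolate_only (the simplified Agent is illustrative):

```python
class Agent:
    """Originals are cached once and treated as templates, never overwritten."""

    def __init__(self, role: str) -> None:
        self.role = role
        self._original_role = None  # cached template, set on first interpolation

    def interpolate_inputs(self, inputs: dict) -> None:
        if self._original_role is None:
            self._original_role = self.role  # cache the template exactly once
        if inputs:
            # Always render from the original template so repeated calls
            # never compound previous substitutions.
            self.role = self._original_role.format(**inputs)

a = Agent("{name}'s financial assistant")
a.interpolate_inputs({"name": "Ada"})
a.interpolate_inputs({"name": "Bob"})
```

Because each call renders from the cached original, the second interpolation cleanly replaces the first instead of trying to substitute into already-substituted text.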

Caching as an Injected Capability

Caching is modeled as a pluggable concern rather than a built‑in behavior. The agent exposes a narrow method to wire in a CacheHandler:

def set_cache_handler(self, cache_handler: CacheHandler) -> None:
    """Set the cache handler for the agent."""
    self.tools_handler = ToolsHandler()
    if self.cache:
        self.cache_handler = cache_handler
        self.tools_handler.cache = cache_handler
Caching is toggled by configuration and provided as a collaborator.
  • Dependency injection. BaseAgent depends on the CacheHandler interface, not a concrete cache implementation. The agent layer stays infra‑agnostic.
  • Config‑driven behavior. The cache boolean field turns caching on or off. When false, set_cache_handler attaches no handler.

One subtle issue worth flagging: set_cache_handler always resets tools_handler. Calling it late in the lifecycle could wipe prior tools_handler state. A small refactor (or explicit documentation) would make this contract clearer: either the handler is set only once at initialization, or resetting is an intentional, documented side effect.
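One possible shape for such a refactor, sketched with illustrative names (the reset_tools_handler flag and simplified classes are assumptions, not crewAI's API):

```python
class ToolsHandler:
    def __init__(self) -> None:
        self.cache = None
        self.events: list = []  # state we don't want silently wiped

class Agent:
    """Make the tools_handler reset explicit rather than a silent side effect."""

    def __init__(self, cache: bool = True) -> None:
        self.cache = cache
        self.cache_handler = None
        self.tools_handler = None

    def set_cache_handler(self, cache_handler, *, reset_tools_handler: bool = False) -> None:
        # Only create a fresh ToolsHandler if none exists, unless the caller
        # explicitly opts into a reset.
        if self.tools_handler is None or reset_tools_handler:
            self.tools_handler = ToolsHandler()
        if self.cache:
            self.cache_handler = cache_handler
            self.tools_handler.cache = cache_handler
```

With the keyword-only flag, a late call preserves accumulated handler state by default, and the destructive path requires a deliberate choice at the call site.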

Rate Limits as One‑Shot Configuration

Rate limiting is similarly handled via an RPMController (requests‑per‑minute controller):

def set_rpm_controller(self, rpm_controller: RPMController) -> None:
    """Set the rpm controller for the agent."""
    if not self._rpm_controller:
        self._rpm_controller = rpm_controller
The first attached rate limiter wins; later calls are ignored.

Post‑model validators also auto‑create an RPMController if max_rpm is configured and no controller exists. That gives you a simple rule: if you set max_rpm, this agent will be rate‑limited.

The one‑shot behavior (if not self._rpm_controller) is a safety guard: rate limits aren’t silently changed mid‑flight by later code, which would make production debugging much harder.

With these runtime hooks in place, the final piece is how this contract behaves under load: many agents, many copies, and long‑lived processes.

Scale, State, and Operational Guardrails

Even though BaseAgent doesn’t itself call external services, the way it structures configuration, copying, and state has direct performance and operational implications. Let’s look at where the hot paths and risks are when you scale.

Hot Paths You Should Measure

Several operations become noticeable at scale:

  • Agent construction and validation. Every BaseAgent creation runs process_config and all validators.
  • Tool validation. validate_tools walks the tool list and may adapt each tool.
  • Copying agents. copy iterates over fields and knowledge sources.
  • Prompt interpolation. interpolate_inputs is linear in the size of role/goal/backstory strings.

None of these are expensive compared to LLM calls, but they do add up in high‑churn or large‑config scenarios. It’s worth making them first‑class metrics, for example:

  • agent_initialization_duration_ms: detects slow configs and validators when creating many agents. Suggested SLO: P95 < 50 ms per agent.
  • agent_copy_duration_ms: tracks the cost of cloning for request isolation or experiments. Suggested SLO: P95 < 10 ms per copy.
  • agent_tools_count: large tool sets increase validation and selection overhead. Suggested thresholds: warn above 100, alert above 500.

Instrumenting these tells you when the agent contract is being stretched—for example, someone attaching hundreds of tools or building agents per request instead of reusing copies.
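Wiring up such a metric can be as simple as a timing decorator. The sketch below uses an in-memory dict as a stand-in for your real metrics backend, and build_agent is a hypothetical construction function:

```python
import functools
import time
from collections import defaultdict

# Stand-in for a real metrics client (StatsD, Prometheus, etc.).
metrics: dict = defaultdict(list)

def timed(metric_name: str):
    """Decorator that records wall-clock duration in milliseconds."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                metrics[metric_name].append((time.perf_counter() - start) * 1000)
        return inner
    return wrap

@timed("agent_initialization_duration_ms")
def build_agent(role: str) -> dict:
    # Hypothetical stand-in for agent construction and validation.
    return {"role": role}

build_agent("analyst")
```

Once construction and copying are wrapped this way, SLO breaches show up as data rather than as mysterious latency in production.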

Unbounded State and Long‑Lived Agents

One explicit stateful field is tools_results:

tools_results: list[dict[str, Any]] = Field(
    default=[], description="Results of the tools used by the agent."
)
A convenient, but potentially unbounded, in‑memory log of tool calls.

This is handy for debugging and analytics, but it’s also a growth risk for long‑lived agents. One way to tighten the contract:

  • adding a max_tools_results field, and
  • introducing an add_tool_result method that appends and prunes older entries when the cap is reached.

Operationally, you can pair that with a metric like agent_tools_results_entries and alert when it exceeds a threshold (for example, 1000 entries) to catch memory growth early.
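A sketch of that capped contract using a bounded deque. The add_tool_result and max_tools_results names mirror the suggestion above, but the implementation is illustrative, not crewAI's code:

```python
from collections import deque
from typing import Any

class Agent:
    """Append-and-prune tool results via a bounded deque."""

    def __init__(self, max_tools_results: int = 1000) -> None:
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._tools_results = deque(maxlen=max_tools_results)

    def add_tool_result(self, result: dict) -> None:
        self._tools_results.append(result)

    @property
    def tools_results(self) -> list:
        # Expose a plain list so callers keep the familiar interface.
        return list(self._tools_results)

a = Agent(max_tools_results=2)
for i in range(3):
    a.add_tool_result({"call": i})
```

The cap turns unbounded memory growth into a documented, tunable retention policy.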

Concurrency and Shared State

BaseAgent itself is not built as a concurrency‑safe abstraction. Mutable fields like tools_results, tools_handler, and knowledge_storage can be accessed concurrently if you reuse the same instance across threads or async tasks.

Combined with the shared‑storage copy semantics, the implied contract is:

  • Treat a single agent instance as single‑threaded unless you add your own synchronization.
  • Use copy() for per‑request isolation while intentionally sharing infra objects like vector stores.
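A sketch of that usage pattern: per-request copies with explicitly synchronized shared storage. All names here are illustrative:

```python
import threading
import uuid

class SharedStore:
    """Shared infra with its own lock, since agents won't synchronize for you."""
    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.docs: list = []

class Agent:
    def __init__(self, storage: SharedStore) -> None:
        self.id = uuid.uuid4()
        self.storage = storage
        self.tools_results: list = []   # per-instance, single-threaded state

    def copy(self) -> "Agent":
        return Agent(self.storage)      # fresh instance, shared storage

base = Agent(SharedStore())

def handle_request(n: int) -> None:
    worker = base.copy()                # per-request isolation
    worker.tools_results.append(n)      # safe: no instance sharing
    with worker.storage.lock:           # shared infra guarded explicitly
        worker.storage.docs.append(str(n))

threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each request gets its own mutable agent state, while the one genuinely shared object is protected by a lock the caller owns, matching the implied contract above.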

Hooks for Observability

Finally, the class structure makes it easy to bolt on observability without polluting business logic:

  • Per‑agent logger. _logger is initialized once with a verbose flag, giving you per‑agent logging control.
  • Natural trace spans. Agent initialization, copy, and subclass task execution are natural boundaries for spans tagged with agent id, key, tool counts, and knowledge counts.
  • Metric naming follows responsibilities. The metrics discussed (agent_initialization_duration_ms, agent_tools_count, etc.) line up directly with the base class’s responsibilities.

The result is a contract that doesn’t pick an observability stack for you, but makes clear what you should measure around agents.

Design Principles to Reuse

The core lesson from BaseAgent is that the power of an agent framework comes less from prompts and more from the contract that defines what an agent is. crewAI treats that contract as something enforced in code, not just described in docs.

  1. Make the agent contract explicit and enforced.
    Use a model layer (Pydantic or equivalent) to define identity, capabilities, and external references. Validate tools, apps, and MCPs aggressively. Reject bad configs at construction so runtime behavior can assume a consistent shape.
  2. Treat copy semantics as part of the public API.
    Decide upfront what a “copy” means: which fields are duplicated, which are regenerated (IDs, loggers, transient state), and which heavy resources are shared. Implement that explicitly in a copy or clone method and document it for framework users.
  3. Model infra concerns as collaborators.
    Caching, rate limiting, knowledge, and security are not hard‑wired; they’re injected as CacheHandler, RPMController, Knowledge, and SecurityConfig. This keeps the base agent portable, testable, and easier to evolve.
  4. Build guardrails for scale into the design.
    Identify hot paths (initialization, interpolation, copying) and unbounded state (like tools_results). Add limits and metrics so the contract holds when you go from a handful of agents to hundreds.
  5. Use validators as your customs checkpoint.
    Let validators normalize heterogeneous external inputs—tools from other ecosystems, app/action strings, MCP URLs—into a clean internal representation. That’s how you keep your agent core small while integrating with a messy outside world.

When you design your own agent framework—or any reusable base class—ask yourself:

  • What minimal identity must every instance have?
  • Which external resources must be injected rather than created inline?
  • How should copies behave, and how will we know when that design is under stress?

Answering those questions in code, the way BaseAgent does, is often the difference between an agent system that scales cleanly and one that devolves into one‑off exceptions. The contract behind every AI agent is where that difference starts.

Full Source Code

The full file is available directly from the upstream repository on GitHub:

heads/main/lib/crewai/src/crewai/agents/agent_builder/base_agent.py


Thanks for reading! I hope this was useful. If you have questions or thoughts, feel free to reach out.

Content Creation Process: This article was generated via a semi-automated workflow using AI tools. I prepared the strategic framework, including specific prompts and data sources. From there, the automation system conducted the research, analysis, and writing. The content passed through automated verification steps before being finalized and published without manual intervention.

Mahmoud Zalt

About the Author

I’m Zalt, a technologist with 16+ years of experience, passionate about designing and building AI systems that move us closer to a world where machines handle everything and humans reclaim wonder.

Let's connect if you're working on interesting AI projects, looking for technical advice, or want to discuss anything.

