Complex AI agents rarely fail because of a single prompt or a single tool. They fail in the space between those pieces: the loops, the decisions, and the orchestration that glues everything together. In crewAI, that glue lives inside CrewAgentExecutor, a surprisingly rich class that turns raw LLMs and tools into reliable agents. I'm Mahmoud Zalt, an AI solutions architect, and we’ll walk through how this executor behaves like a control tower for your agents — and what we can reuse from its design when building our own orchestration code.
Setting the scene
We’re examining how crewAI runs a single agent to completion. crewAI is an orchestration framework for LLM‑powered agents; it doesn’t try to be an LLM or a tool library itself. At the center of its agents layer is CrewAgentExecutor, a class whose job is to decide when to call the LLM, when to call tools, how to handle errors, and when to stop.
project-root/
lib/
crewai/
src/
crewai/
agents/
base_agent_executor.py # Base lifecycle and shared logic
crew_agent_executor.py # This file: orchestrates agent + tools + LLM
core/
providers/
human_input.py # Human feedback provider used here
events/
event_bus.py # crewai_event_bus observed by executor
types/
logging_events.py # AgentLogsStartedEvent, AgentLogsExecutionEvent
tool_usage_events.py # ToolUsage* events from tool execution
utilities/
agent_utils.py # LLM response helpers, context handling
file_store.py # get_all_files/aget_all_files for multimodal
training_handler.py # CrewTrainingHandler for TRAINING_DATA_FILE
tool_utils.py # execute_tool_and_check_finality, async variant
i18n.py # I18N_DEFAULT for prompts and tool names
CrewAgentExecutor sits in the middle of the agents layer, orchestrating many utilities.
At a high level, a run looks like this:
- invoke/ainvoke is called with a dict of inputs.
- Prompts are formatted, multimodal files attached, and an initial message history is built.
- A main loop runs: call the LLM, interpret the result as either an AgentAction (use a tool) or AgentFinish (we’re done).
- Tool calls are executed, results logged and appended to messages.
- Human feedback and training data are optionally captured.
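Stripped of crewAI's specifics, that lifecycle can be sketched as a short skeleton. All names below (run_agent, the call_llm callable, the tools dict) are invented for illustration, not the real API:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class AgentFinish:
    output: str

@dataclass
class AgentAction:
    tool: str
    args: dict[str, Any]

def run_agent(inputs, call_llm, tools, max_iter=10):
    # 1. Build the initial message history from the formatted inputs.
    messages = [{"role": "user", "content": str(inputs.get("input", ""))}]
    # 2. Main loop: ask the LLM, then either run a tool or finish.
    for _ in range(max_iter):
        answer = call_llm(messages)
        if isinstance(answer, AgentFinish):
            return answer  # the run "lands"
        # 3. AgentAction: execute the tool, append the result, loop again.
        result = tools[answer.tool](**answer.args)
        messages.append({"role": "tool", "name": answer.tool, "content": result})
    # 4. Never spin forever: fail loudly if the iteration budget runs out.
    raise RuntimeError("Agent loop exhausted max_iter without finishing.")
```

The real executor layers error handling, events, and training capture onto this shape, but the skeleton is the same: one loop, one history, one explicit exit.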
This is not a thin wrapper around an LLM. It’s the control tower for a single agent: it decides who talks when, tracks shared history, enforces limits, and tells everyone when the flight is over.
The agent loop as a control tower
Once the executor is wired up, the core question becomes: how does this control tower make sure a conversation actually lands? That logic lives in the agent loop.
The first decision in each run is whether to use native function calling or a ReAct‑style text protocol. The executor chooses a strategy up front:
def _invoke_loop(self) -> AgentFinish:
"""Execute agent loop until completion."""
use_native_tools = (
hasattr(self.llm, "supports_function_calling")
and callable(getattr(self.llm, "supports_function_calling", None))
and self.llm.supports_function_calling()
and self.original_tools
)
if use_native_tools:
return self._invoke_loop_native_tools()
return self._invoke_loop_react()
This is a straightforward Strategy pattern: the goal (“run the agent to completion”) is fixed, but the algorithm depends on LLM capabilities. The rest of the class is structured around this switch.
The ReAct path exposes the full machinery of the control tower:
def _invoke_loop_react(self) -> AgentFinish:
formatted_answer = None
while not isinstance(formatted_answer, AgentFinish):
try:
if has_reached_max_iterations(self.iterations, self.max_iter):
formatted_answer = handle_max_iterations_exceeded(
formatted_answer,
printer=PRINTER,
messages=self.messages,
llm=cast("BaseLLM", self.llm),
callbacks=self.callbacks,
verbose=self.agent.verbose,
)
break
enforce_rpm_limit(self.request_within_rpm_limit)
answer = get_llm_response(
llm=cast("BaseLLM", self.llm),
messages=self.messages,
callbacks=self.callbacks,
printer=PRINTER,
from_task=self.task,
from_agent=self.agent,
response_model=self.response_model,
executor_context=self,
verbose=self.agent.verbose,
)
# ... parse into AgentAction or AgentFinish ...
if isinstance(formatted_answer, AgentAction):
tool_result = execute_tool_and_check_finality(...)
formatted_answer = self._handle_agent_action(
formatted_answer, tool_result
)
self._invoke_step_callback(formatted_answer)
self._append_message(formatted_answer.text)
except OutputParserError:
formatted_answer = handle_output_parser_exception(...)
except Exception as e:
if e.__class__.__module__.startswith("litellm"):
raise e
if is_context_length_exceeded(e):
handle_context_length(...)
continue
handle_unknown_error(PRINTER, e, verbose=self.agent.verbose)
raise e
finally:
self.iterations += 1
if not isinstance(formatted_answer, AgentFinish):
raise RuntimeError("Agent execution ended without reaching a final answer.")
self._show_logs(formatted_answer)
return formatted_answer
A few orchestration choices stand out:
- Termination is explicit. has_reached_max_iterations and handle_max_iterations_exceeded guarantee the loop ends. You never silently spin as the LLM keeps requesting tools.
- Rate limiting is at the loop boundary. enforce_rpm_limit runs once per iteration, so request budgets are enforced where you can see them, not buried in a client wrapper.
- Context length is a handled failure mode. is_context_length_exceeded and handle_context_length are integrated into the loop. Instead of letting providers throw and crash the run, the executor trims or adjusts history and retries.
- Parser failures are treated as normal. OutputParserError is caught and normalized via handle_output_parser_exception, acknowledging that ReAct parsing is probabilistic and must be retried.
The result is simple but critical: the loop either finishes with a valid AgentFinish or fails loudly with a clear error. For production agents, that boring predictability is the difference between “works in a notebook” and “survives real users.”
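That "finish or fail loudly" contract is worth copying even without the rest of the machinery. A stripped-down version, with parser failures treated as retries, might look like this (the step callable and both names are invented stand-ins for one LLM-call-plus-parse iteration):

```python
class OutputParserError(Exception):
    """Raised when the LLM's free-text output can't be parsed into an action."""

def run_with_retries(step, max_iter=5):
    """Run `step` until it yields a final answer: either return that answer
    or raise -- the loop never ends silently."""
    for _ in range(max_iter):
        try:
            return step()
        except OutputParserError:
            # Parser failures are expected with ReAct-style text protocols;
            # treat them as a normal retry, not a crash.
            continue
    raise RuntimeError("Agent execution ended without reaching a final answer.")
```

The invariant is the point: every exit path is either a valid answer or an exception a caller can observe.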
Tool calls as a disciplined kitchen
Once the loop decides a tool should run, the executor shifts from control tower to restaurant kitchen. The LLM places orders (tool calls), the executor dispatches them to functions, and then plates the result back into the shared conversation.
Native tools are where this kitchen is most structured. The central worker is _execute_single_native_tool_call, which concentrates argument handling, limits, caching, hooks, and events in one place:
def _execute_single_native_tool_call(
self,
*,
call_id: str,
func_name: str,
func_args: str | dict[str, Any],
available_functions: dict[str, Callable[..., Any]],
original_tool: Any | None = None,
should_execute: bool = True,
) -> dict[str, Any]:
args_dict, parse_error = parse_tool_call_args(
func_args, func_name, call_id, original_tool
)
if parse_error is not None:
return parse_error
max_usage_reached = False
if not should_execute and original_tool:
max_usage_reached = True
elif (
should_execute
and original_tool
and (max_count := getattr(original_tool, "max_usage_count", None)) is not None
and getattr(original_tool, "current_usage_count", 0) >= max_count
):
max_usage_reached = True
from_cache = False
result: str = "Tool not found"
input_str = json.dumps(args_dict) if args_dict else ""
if self.tools_handler and self.tools_handler.cache:
cached_result = self.tools_handler.cache.read(tool=func_name, input=input_str)
if cached_result is not None:
result = str(cached_result) if not isinstance(cached_result, str) else cached_result
from_cache = True
# Emit start event, run hooks, execute or skip, emit finished/error events,
# and return a structured result dict.
This function encapsulates several cross‑cutting concerns:
- Argument parsing is centralized via parse_tool_call_args, so provider‑specific quirks don’t leak into the loop.
- Usage limits (max_usage_count) live next to the tool, not in the control flow.
- Caching is delegated to ToolsHandler.cache, but controlled here, with an optional cache_function policy on the tool.
- Hooks around execution use ToolCallHookContext, enabling policy or tracing without touching core logic.
- Events (ToolUsageStartedEvent, ToolUsageFinishedEvent, ToolUsageErrorEvent) are emitted predictably, baking observability into each call.
Conceptually, each tool call is a Command: an executable unit with metadata that can be logged, cached, and decorated. The executor is the command dispatcher.
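The Command idea can be reduced to a small sketch: a ToolCall value object plus one dispatcher method that owns caching, unknown-tool fallback, and logging. Names and structure here are invented for illustration; the real executor handles far more concerns at this choke point:

```python
import json
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToolCall:
    """One command: a call id, the function name, and parsed arguments."""
    call_id: str
    func_name: str
    args: dict[str, Any]

class ToolDispatcher:
    """Funnels every ToolCall through one choke point, so caching,
    unknown-tool handling, and logging all live in a single place."""

    def __init__(self, functions: dict[str, Callable[..., str]]):
        self.functions = functions
        self.cache: dict[tuple[str, str], str] = {}
        self.log: list[str] = []

    def execute(self, call: ToolCall) -> str:
        # Canonical cache key: tool name plus sorted JSON of its arguments.
        key = (call.func_name, json.dumps(call.args, sort_keys=True))
        if key in self.cache:
            self.log.append(f"cache-hit {call.func_name}")
            return self.cache[key]
        fn = self.functions.get(call.func_name)
        if fn is None:
            return "Tool not found"  # mirrors the executor's fallback string
        self.log.append(f"run {call.func_name}")
        result = fn(**call.args)
        self.cache[key] = result
        return result
```

Because every call passes through execute, policies like limits or audit trails bolt on in one place instead of being scattered through the loop.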
After execution, the result is stitched back into the conversation and may even terminate the run:
def _append_tool_result_and_check_finality(
self, execution_result: dict[str, Any]
) -> AgentFinish | None:
call_id = cast(str, execution_result["call_id"])
func_name = cast(str, execution_result["func_name"])
result = cast(str, execution_result["result"])
original_tool = execution_result["original_tool"]
tool_message: LLMMessage = {
"role": "tool",
"tool_call_id": call_id,
"name": func_name,
"content": result,
}
self.messages.append(tool_message)
if (
original_tool
and hasattr(original_tool, "result_as_answer")
and original_tool.result_as_answer
):
return AgentFinish(
thought="Tool result is the final answer",
output=result,
text=result,
)
return None
This ties into an important metaphor: the message history is a shared notebook. User, assistant, and tools all write into it. The executor keeps the notebook coherent and respects tools that declare, via result_as_answer, “this output is the final answer.”
ReAct vs native tools: one brain, two strategies
ReAct and native tools look different, but the executor treats them as two strategies for the same mental loop: repeatedly “think → maybe act → think again” until you reach AgentFinish.
With native tools, the loop leans on provider‑level structured calling. It converts internal tools into a provider schema, then interprets responses as either tool calls or final text:
openai_tools, available_functions, self._tool_name_mapping = (
convert_tools_to_openai_schema(self.original_tools)
)
while True:
# ... max_iter, rpm ...
answer = get_llm_response(
llm=cast("BaseLLM", self.llm),
messages=self.messages,
callbacks=self.callbacks,
printer=PRINTER,
tools=openai_tools,
available_functions=None,
...,
)
if isinstance(answer, list) and answer and self._is_tool_call_list(answer):
tool_finish = self._handle_native_tool_calls(answer, available_functions)
if tool_finish is not None:
return tool_finish
continue
if isinstance(answer, str):
formatted_answer = AgentFinish(thought="", output=answer, text=answer)
# ... log, append, return ...
Under the hood, helpers like _is_tool_call_list and _parse_native_tool_call recognize provider‑specific shapes (OpenAI, Anthropic, Bedrock, Gemini) and normalize them to simple tuples like (call_id, func_name, func_args). That’s a clean Adapter pattern: external protocol diversity, internal uniformity.
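As a sketch, such an adapter just pattern-matches on shape and emits one internal tuple. The two payload shapes below are simplified assumptions about OpenAI- and Anthropic-style tool calls, not the exact structures the executor handles:

```python
import json
from typing import Any

def parse_tool_call(raw: dict[str, Any]) -> tuple[str, str, dict[str, Any]]:
    """Adapt one provider-specific tool call to (call_id, func_name, func_args)."""
    if raw.get("type") == "tool_use":
        # Anthropic-style block: {"type": "tool_use", "id", "name", "input"}
        return raw["id"], raw["name"], raw["input"]
    # OpenAI-style call: {"id", "function": {"name", "arguments"}}, where
    # arguments arrive as a JSON string that still needs decoding.
    fn = raw["function"]
    args = fn["arguments"]
    if isinstance(args, str):
        args = json.loads(args)
    return raw["id"], fn["name"], args
```

Everything downstream of this function sees one uniform tuple, so adding a new provider means touching the adapter, never the loop.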
A subtle part of this design is how it treats multiple tool calls in one response. Should they run in parallel? The executor encodes the answer as a simple policy:
if len(parsed_calls) > 1:
has_result_as_answer_in_batch = any(
bool(
original_tools_by_name.get(func_name)
and getattr(original_tools_by_name.get(func_name), "result_as_answer", False)
)
for _, func_name, _ in parsed_calls
)
has_max_usage_count_in_batch = any(
bool(
original_tools_by_name.get(func_name)
and getattr(original_tools_by_name.get(func_name), "max_usage_count", None)
is not None
)
for _, func_name, _ in parsed_calls
)
# Preserve sequential behavior when semantics demand it.
if has_result_as_answer_in_batch or has_max_usage_count_in_batch:
logger.debug("Skipping parallel native execution...")
else:
# Build execution_plan and submit to ThreadPoolExecutor(...)
Parallel execution is gated on result_as_answer and usage limits. The trade‑offs are explicit:
- Correctness. Tools that cap their usage or directly answer the user should not run concurrently; casual threading would race on their shared counters or finality semantics.
- Performance. Clearly independent tools can be executed in parallel (up to a fixed worker limit) to cut tail latency.
- Simplicity. Instead of a general DAG, the executor uses simple booleans on tools to decide whether parallelism is even allowed.
This is a reusable pattern: encode constraints as properties on tools, and let the orchestrator decide if and how to parallelize. You keep orchestration logic generic while still respecting domain semantics.
Hard‑earned lessons you can reuse
Stepping back, CrewAgentExecutor is a large class. Sync and async loops are duplicated, and inputs depend on specific dict keys like "input", "tool_names", and "tools" without strong validation. You could extract helpers like a dedicated ToolCallExecutor or TrainingRecorder to slim it down.
But the more important story is what this file teaches about building agent executors in general: how to design the loop as a control tower rather than a ball of glue. Here are the core lessons worth carrying into your own systems.
1. Treat the executor as a control tower, not a Swiss army knife
The executor already coordinates many concerns: LLM orchestration, tools, hooks, training data capture, human feedback, and logging. It works, but you can see the pressure on class size and complexity.
In your own designs, keep the control‑tower role but give it collaborators from day zero: one object responsible for the loop and messaging; separate components for tool execution, training recording, and human‑in‑the‑loop prompts. The orchestrator should coordinate flights, not repair engines.
2. Make the agent loop boringly predictable
The main loops here are not fancy, but they are deliberate:
- Bounded iterations via max_iter and an explicit iteration counter.
- Dedicated handling of OutputParserError and context‑length errors, with clear retry behavior.
- A strong invariant: runs either end in AgentFinish or raise a RuntimeError rather than silently stopping.
For LLM systems, that kind of predictable loop is a feature. You want the non‑determinism in the model’s answers, not in your control flow.
3. Centralize tool semantics and policy
Tool semantics in this executor are funneled through a small set of functions and properties:
- Caching decisions through ToolsHandler.cache and optional cache_function hooks.
- Usage constraints via max_usage_count and current_usage_count.
- Answer semantics through result_as_answer.
- Hooks and events around every call for policy, tracing, and logging.
That centralization makes it possible to reason about performance, safety, and correctness in one place. If your tools have side effects, this is also the right layer to add idempotency guards or audit logging without touching the loop itself.
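For instance, an idempotency guard could wrap whatever function executes tools, without the loop ever knowing. The execute(name, args) signature below is invented for this sketch:

```python
import json

def with_idempotency(execute, seen=None):
    """Wrap a tool-execution function so identical calls run at most once;
    repeats return the recorded result instead of re-firing side effects."""
    seen = {} if seen is None else seen

    def guarded(name, args):
        # Key on tool name plus canonicalized arguments.
        key = (name, json.dumps(args, sort_keys=True))
        if key not in seen:
            seen[key] = execute(name, args)
        return seen[key]

    return guarded
```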
4. Hide provider quirks behind adapters
The native tools implementation has to deal with OpenAI’s function calls, Anthropic’s tool_use, Bedrock’s toolUseId, and Gemini’s function_call formats. The executor acknowledges these differences only in narrowly scoped helpers like _is_tool_call_list and _parse_native_tool_call, then moves on with a simple internal representation.
That’s textbook Adapter pattern. If you plan to support multiple providers, pick a small, clean internal schema for tool calls early, and treat every provider response as an input format to be adapted. Don’t let provider quirks leak into your main loop.
5. Design for observability from day one
Finally, CrewAgentExecutor shows what it looks like when observability is part of the orchestration contract:
- Every agent run emits start and execution events on crewai_event_bus (AgentLogsStartedEvent, AgentLogsExecutionEvent).
- Every tool emits start, finish, and error events, which can feed logs, metrics, or tracing systems.
- Callbacks and hooks are first‑class, so external systems can attach behavior without patching core code.
The same concerns you see in the code — iterations, LLM calls, tool execution, context truncation, and errors — are the ones you should expose as metrics and alerts in your own executor. That alignment between control flow and telemetry is what makes production debugging tractable.
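A sketch of that alignment: wrap each tool call so it emits start, finish, and error events to whatever sink you already use for metrics. ToolEvent and traced are invented names loosely mirroring the ToolUsage* events, not crewAI's API:

```python
import time
from dataclasses import dataclass

@dataclass
class ToolEvent:
    kind: str          # "started" | "finished" | "error"
    tool: str
    elapsed: float = 0.0

def traced(fn, tool_name, emit):
    """Emit lifecycle events around every call to `fn`."""
    def wrapper(*args, **kwargs):
        emit(ToolEvent("started", tool_name))
        t0 = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
        except Exception:
            # Failures get an event too, then propagate unchanged.
            emit(ToolEvent("error", tool_name, time.perf_counter() - t0))
            raise
        emit(ToolEvent("finished", tool_name, time.perf_counter() - t0))
        return result
    return wrapper
```

Because the events track the control flow one-to-one, a dashboard built on them shows exactly where a run stalled: in the model, in a tool, or in retries.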
CrewAgentExecutor may look like “just another big class”, but read as a story, it’s about how to turn a raw LLM and a pile of tools into a dependable agent: a single control loop, two tool strategies, and a disciplined approach to limits, errors, and observability. The primary lesson is to design your agent loop as a control tower — a focused orchestrator that keeps everyone talking in the right order until the plane lands safely.
If you’re designing your own executors, a few concrete takeaways:
- Give the loop clear termination rules and explicit error‑recovery paths, especially for parser and context‑length failures.
- Centralize tool execution behind a small API that owns semantics, limits, caching, hooks, and events.
- Hide provider quirks behind adapters and line up your telemetry with the control flow you actually care about.
As agents grow more complex, this control‑tower mindset becomes the difference between orchestrators that can be trusted in production and ones that remain fragile prototypes.
