
The Gateway Class Behind DSPy Modules

By Mahmoud Zalt
Code Cracking
20m read
Curious how DSPy routes every pipeline step before it touches an LLM? This piece breaks down the gateway class behind DSPy modules and why it matters.


We’re examining how DSPy manages everything that happens around an LLM call, not just inside it. DSPy is a framework for building optimized LLM pipelines, and at the center of those pipelines is dspy.primitives.module.Module—the gateway class every program passes through before it hits the language model. I’m Mahmoud Zalt, an AI solutions architect, and we’ll unpack how this small file centralizes initialization, context, observability, and batching into one opinionated entry point—and what that design gives us for free.

Module as a Gateway, Not Just a Base Class

Inside DSPy, Module is more than an abstract superclass. It is the gateway every pipeline step passes through on its way to the LLM. That gateway is where DSPy enforces invariants, wires callbacks, tracks usage, and exposes a uniform interface for both sync and async execution.

 dspy/
 ├─ dsp/
 │  └─ utils/
 │     └─ settings.py
 ├─ predict/
 │  ├─ predict.py        (Predict)
 │  └─ parallel.py       (Parallel)
 ├─ primitives/
 │  ├─ base_module.py    (BaseModule)
 │  ├─ example.py        (Example)
 │  ├─ prediction.py     (Prediction)
 │  └─ module.py         (Module, ProgramMeta)
 └─ utils/
    ├─ callback.py       (with_callbacks)
    ├─ inspect_history.py (pretty_print_history)
    ├─ magicattr.py      (magicattr.set)
    └─ usage_tracker.py  (track_usage)

Caller code
   │
   ▼
Module.__call__ / Module.acall
   │   (with_callbacks, settings.context, track_usage)
   ▼
Module.forward / Module.aforward  (subclasses)
   │
   ▼
Predict.lm  (LLM calls, network I/O)
Module orchestrates settings, callbacks, tracking, and predictors before handing off to the LM.

A key requirement is that every module instance is correctly initialized, even if the author of a subclass forgets to call super().__init__(). DSPy solves this with a metaclass, ProgramMeta, which intercepts instance creation and injects the base initialization.

class ProgramMeta(type):
    """Metaclass ensuring every ``dspy.Module`` instance is properly initialised."""

    def __call__(cls, *args, **kwargs):
        obj = cls.__new__(cls, *args, **kwargs)
        if isinstance(obj, cls):
            Module._base_init(obj)
            cls.__init__(obj, *args, **kwargs)

            if not hasattr(obj, "callbacks"):
                obj.callbacks = []
            if not hasattr(obj, "history"):
                obj.history = []
        return obj
ProgramMeta guarantees base attributes on every Module instance regardless of subclass __init__.

Instead of relying on documentation (“don’t forget to call super()”), the framework enforces the invariant at the type level. Every module instance has consistent core state like callbacks and history, which in turn keeps the gateway logic simple and predictable.
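The same guarantee can be reproduced outside DSPy in a few lines. Here is a minimal, self-contained sketch of the pattern; the names EnsuredMeta, Base, and Forgetful are hypothetical, not DSPy's:

```python
class EnsuredMeta(type):
    def __call__(cls, *args, **kwargs):
        obj = cls.__new__(cls)
        if isinstance(obj, cls):
            # Inject base state before the subclass __init__ runs,
            # so a missing super().__init__() cannot leave it out.
            obj.callbacks = []
            obj.history = []
            cls.__init__(obj, *args, **kwargs)
        return obj

class Base(metaclass=EnsuredMeta):
    pass

class Forgetful(Base):
    def __init__(self, name):
        # Deliberately does NOT call super().__init__().
        self.name = name

f = Forgetful("demo")
print(f.callbacks, f.history, f.name)  # [] [] demo
```

Because the metaclass owns instance creation, even a subclass that skips the super() call still ends up with the base attributes in place.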

Enforcing a Safe Call Path

With initialization handled, the next question is: what happens whenever a module is invoked? DSPy makes Module.__call__ the only supported entry point for doing work and layers all orchestration logic there.

@with_callbacks
def __call__(self, *args, **kwargs) -> Prediction:
    from dspy.dsp.utils.settings import thread_local_overrides

    caller_modules = settings.caller_modules or []
    caller_modules = list(caller_modules)
    caller_modules.append(self)

    with settings.context(caller_modules=caller_modules):
        if settings.track_usage and thread_local_overrides.get().get("usage_tracker") is None:
            with track_usage() as usage_tracker:
                output = self.forward(*args, **kwargs)
            tokens = usage_tracker.get_total_tokens()
            self._set_lm_usage(tokens, output)
            return output

        return self.forward(*args, **kwargs)
__call__ wraps forward with callbacks, context, and optional LM usage tracking.

Conceptually, a Module is a smart function. Subclasses implement forward as if it were a plain Python function, but callers always use output = my_module(...). Behind that simple call, the gateway:

  • Runs callbacks via @with_callbacks for logging, tracing, or metrics.
  • Updates the context with the caller stack (settings.caller_modules), so nested modules know who invoked them.
  • Optionally tracks token usage with track_usage() and routes the result into the output object.
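As a rough illustration of that layering, here is a standalone sketch with hypothetical names; contextvars stands in for DSPy's settings context, and the metadata dict stands in for real usage tracking:

```python
import contextvars

# Hypothetical gateway sketch: __call__ is the only public entry point
# and layers context plus tracking around a subclass-provided forward().
_caller_stack = contextvars.ContextVar("caller_stack", default=())

class Gateway:
    track_usage = True

    def __call__(self, *args, **kwargs):
        # Push self onto the caller stack so nested modules can see
        # who invoked them, mirroring settings.caller_modules.
        token = _caller_stack.set(_caller_stack.get() + (self,))
        try:
            output = self.forward(*args, **kwargs)
            if self.track_usage:
                # Metadata is attached by the gateway, not by forward().
                output = {"result": output, "callers": len(_caller_stack.get())}
            return output
        finally:
            _caller_stack.reset(token)

class Doubler(Gateway):
    def forward(self, x):
        return x * 2

out = Doubler()(21)
print(out)  # {'result': 42, 'callers': 1}
```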

There is a matching async gateway, acall, that wraps aforward with the same semantics. The implementation currently duplicates much of the sync path, which is a small refactoring opportunity, but the contract is clear: sync and async calls both go through the same policy layer.
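If you wanted to avoid that duplication in your own framework, one option is to put the policy layer in a single context manager that both entry points share. A sketch, not DSPy's actual code:

```python
import asyncio
import contextlib

class DualGateway:
    def __init__(self):
        self.calls = 0

    @contextlib.contextmanager
    def _policy(self):
        # One place for context, callbacks, and tracking,
        # shared by the sync and async entry points.
        self.calls += 1
        yield

    def __call__(self, x):
        with self._policy():
            return self.forward(x)

    async def acall(self, x):
        with self._policy():
            return await self.aforward(x)

class Echo(DualGateway):
    def forward(self, x):
        return x

    async def aforward(self, x):
        return x

e = Echo()
print(e("sync"), asyncio.run(e.acall("async")), e.calls)  # sync async 2
```

Both paths then evolve together: adding a new policy (say, rate limiting) means touching `_policy` once rather than two near-identical methods.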

Discouraging direct forward calls

To keep all of this logic centralized, Module gently steers developers away from calling forward directly by inspecting attribute access.

def __getattribute__(self, name):
    attr = super().__getattribute__(name)

    if name == "forward" and callable(attr):
        stack = inspect.stack()
        forward_called_directly = len(stack) <= 1 or stack[1].function != "__call__"

        if forward_called_directly:
            logger.warning(
                f"Calling module.forward(...) on {self.__class__.__name__} directly is discouraged. "
                f"Please use module(...) instead."
            )

    return attr
Direct forward calls still work, but emit a warning so usage converges on the gateway.

This uses inspect.stack() to see whether forward is being invoked via __call__ or from user code. Stack inspection has a cost, and a performance review of this file rightly calls it out as a potential hot-spot. Still, the pattern is useful: guide developers toward the safe path without breaking existing code.
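If that overhead matters in your own code, CPython's `sys._getframe` is a much cheaper single-frame lookup than `inspect.stack()`, which builds full frame records including source lines. A sketch under those assumptions (CPython-specific; the class here is hypothetical, not DSPy's):

```python
import sys

class Module:
    def __init__(self):
        self.warnings = []

    def __call__(self):
        return self.forward()

    def forward(self):
        # sys._getframe(1) fetches only the immediate caller's frame;
        # unlike inspect.stack() it does not read source files.
        caller = sys._getframe(1).f_code.co_name
        if caller != "__call__":
            self.warnings.append("direct forward() call is discouraged")
        return "ok"

m = Module()
m()            # via the gateway: no warning recorded
m.forward()    # direct call: one warning recorded
print(m.warnings)
```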

Predictors as Pluggable Engines

With a single gateway for calls, the next layer is the “engines” that talk to the model. In DSPy those are Predict objects. A module may contain one or several predictors, and Module provides a minimal facade to discover and reconfigure them.

def named_predictors(self):
    from dspy.predict.predict import Predict

    return [
        (name, param)
        for name, param in self.named_parameters()
        if isinstance(param, Predict)
    ]

def predictors(self):
    return [param for _, param in self.named_predictors()]

Under the hood, BaseModule.named_parameters() walks module attributes. Here, Module simply filters for Predict instances and uses that list to implement higher-level operations:

  • Set the LM everywhere: set_lm(self, lm) iterates over predictors and assigns param.lm = lm.
  • Read a shared LM: get_lm() checks that all predictors share the same lm instance and either returns it or raises a ValueError with a clear message.
  • Transform predictors in bulk: map_named_predictors(func) applies an arbitrary function to each predictor and writes the result back using magicattr.set, which handles nested attributes.

The division of responsibilities is sharp: the module decides when to run and in what context; predictors decide how to talk to the LLM. The small API around named_predictors gives higher-level tooling a stable surface to plug into, whether that’s LM swapping, adding pricing metadata, or benchmarking.
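The same facade shape works outside DSPy. A shallow sketch with hypothetical Engine and Pipeline names (DSPy's named_parameters also walks nested structures, which this version skips):

```python
# Discover nested "engines" by type and reconfigure them in bulk,
# analogous to Module filtering named_parameters for Predict instances.
class Engine:
    def __init__(self):
        self.lm = None

class Pipeline:
    def named_engines(self):
        # Shallow walk over instance attributes, filtered by type.
        return [(name, attr) for name, attr in vars(self).items()
                if isinstance(attr, Engine)]

    def set_lm(self, lm):
        for _, engine in self.named_engines():
            engine.lm = lm

    def get_lm(self):
        lms = {id(engine.lm) for _, engine in self.named_engines()}
        if len(lms) != 1:
            raise ValueError("Engines do not share a single LM.")
        return self.named_engines()[0][1].lm

class QA(Pipeline):
    def __init__(self):
        self.retrieve = Engine()
        self.answer = Engine()

qa = QA()
qa.set_lm("gpt-x")     # one call reconfigures every engine
print(qa.get_lm())     # gpt-x
```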

Batching and Concurrency

Most real workloads need to call modules on many inputs at once. DSPy addresses this with Module.batch, which prepares work items and delegates execution to a separate Parallel module.

def batch(
    self,
    examples: list[Example],
    num_threads: int | None = None,
    max_errors: int | None = None,
    return_failed_examples: bool = False,
    provide_traceback: bool | None = None,
    disable_progress_bar: bool = False,
):
    exec_pairs = [(self, example.inputs()) for example in examples]

    parallel_executor = Parallel(
        num_threads=num_threads,
        max_errors=max_errors,
        return_failed_examples=return_failed_examples,
        provide_traceback=provide_traceback,
        disable_progress_bar=disable_progress_bar,
    )

    if return_failed_examples:
        results, failed_examples, exceptions = parallel_executor.forward(exec_pairs)
        return results, failed_examples, exceptions
    else:
        return parallel_executor.forward(exec_pairs)
batch turns examples into execution pairs and lets Parallel handle threading.

The method is intentionally thin:

  1. Each Example becomes a set of keyword arguments via example.inputs().
  2. Those arguments are paired with the module instance: (self, inputs).
  3. Parallel.forward fans out over threads, calling the module behind the scenes.

The heavy lifting—thread management, error aggregation, progress reporting—lives in Parallel. However, there is an important concurrency trade-off that the code implicitly makes: it passes the same module instance into all workers. If forward mutates state on self (for example, appending to history), calls may interleave in ways that are hard to reason about.

Pattern                        Pros                                           Risk
-----------------------------  ---------------------------------------------  --------------------------------------------------
Shared Module across threads   Single configuration, one LM, fewer objects    Race conditions if forward mutates self
One Module per worker          Isolated state and history, easier debugging   More instances to manage, must share LM explicitly
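The "one Module per worker" option above can be sketched with copy.deepcopy, giving each thread its own clone of a template module. The Counter class is a hypothetical stand-in, not DSPy code:

```python
import copy
from concurrent.futures import ThreadPoolExecutor

class Counter:
    def __init__(self):
        self.history = []

    def __call__(self, x):
        self.history.append(x)   # state mutation that would race if shared
        return x + 1

template = Counter()
inputs = list(range(8))

def run_isolated(x):
    worker = copy.deepcopy(template)  # fresh, isolated state per call
    return worker(x), len(worker.history)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_isolated, inputs))

print(results)  # every clone saw exactly one item; template stays untouched
```

The cost is the extra copies; the payoff is that no worker can observe another worker's history.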

Usage Tracking as Part of the Contract

One of the most useful aspects of Module is how it treats observability—especially token usage—as part of the gateway contract rather than an afterthought bolted onto predictors.

Attaching LM usage to predictions

When settings.track_usage is enabled and no thread-local usage tracker is present, __call__ wraps forward with track_usage(). After execution, it passes the collected token counts to _set_lm_usage along with the output.

def _set_lm_usage(self, tokens: dict[str, Any], output: Any):
    prediction_in_output = None
    if isinstance(output, Prediction):
        prediction_in_output = output
    elif isinstance(output, tuple) and len(output) > 0 and isinstance(output[0], Prediction):
        prediction_in_output = output[0]

    if prediction_in_output:
        prediction_in_output.set_lm_usage(tokens)
    else:
        logger.warning(
            "Failed to set LM usage. Please return `dspy.Prediction` object from "
            "dspy.Module to enable usage tracking."
        )
Token usage is attached to a Prediction if one is present in the output.

This introduces an explicit contract: to participate in usage tracking, forward must return a Prediction, or a tuple whose first element is a Prediction. If it doesn’t, the framework logs a warning and drops the token data. For a production system, that’s a subtle but important edge: an accidental change in return type can quietly disable cost visibility for that module.

Hardening this pattern in your own frameworks

In your own code, you can make this safer by failing fast in non-production environments. For example, if tracking is enabled and no Prediction is found, raise in development and only log a warning in production. That keeps the convenience of the gateway while surfacing broken contracts early.
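Here is a sketch of that fail-fast variant; the ENV environment variable and the stand-in Prediction class are assumptions, not DSPy's API:

```python
import logging
import os

logger = logging.getLogger(__name__)

class Prediction:
    def set_lm_usage(self, tokens):
        self.usage = tokens

def set_lm_usage_strict(tokens, output):
    # Same unwrapping as _set_lm_usage: a Prediction, or a tuple
    # whose first element is a Prediction.
    target = output if isinstance(output, Prediction) else None
    if isinstance(output, tuple) and output and isinstance(output[0], Prediction):
        target = output[0]
    if target is not None:
        target.set_lm_usage(tokens)
        return
    message = "forward() must return a Prediction to enable usage tracking"
    if os.environ.get("ENV", "development") != "production":
        raise TypeError(message)   # fail fast while developing
    logger.warning(message)        # stay lenient in production

p = Prediction()
set_lm_usage_strict({"total_tokens": 12}, p)
print(p.usage)  # {'total_tokens': 12}
```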

Once usage is attached to Prediction, higher layers can emit metrics per module—invocation counts, latency, token totals—without every module author having to think about observability. The gateway does the wiring; the business logic stays focused on transforming inputs into outputs.

Design Lessons You Can Reuse

Stepping back, the interesting part of Module is not any single method, but how much cross-cutting behavior it centralizes behind one gateway. If you’re building your own LLM orchestration layer or service framework, there are several patterns worth reusing.

1. Treat your core abstraction as a gateway

Pick one method—__call__, run, execute—and make it the only supported way to do work. Behind that gateway, handle context, callbacks, error policies, and tracking. Subclasses then only have to implement a straightforward forward-style method.

2. Enforce invariants with metaclasses or factories

When missing super() calls or partially initialized objects lead to subtle bugs, move initialization into a metaclass or a factory. ProgramMeta shows how to guarantee base fields like callbacks and history without trusting every subclass author to remember the right incantation.

3. Wrap internal engines with a tiny facade

Expose methods like named_predictors, set_lm, and get_lm so external code can reconfigure engines in bulk. The same pattern works for anything nested: database handles, HTTP clients, caches, or feature-flag clients.

4. Make observability part of the type contract

If tracking depends on a specific return type, make that contract explicit and, where possible, enforce it. A single well-defined result object that carries both business data and metrics data is easier to reason about than ad-hoc logs scattered across the stack.
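One possible shape for such a result object, using a plain dataclass; all names here are hypothetical:

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Result:
    data: Any                                  # business payload
    usage: dict = field(default_factory=dict)  # token/metrics metadata
    latency_ms: float = 0.0

def gateway(fn: Callable, *args) -> Result:
    # The gateway measures and attaches metrics; fn stays oblivious.
    start = time.perf_counter()
    data = fn(*args)
    return Result(
        data=data,
        usage={"calls": 1},  # placeholder for real token counts
        latency_ms=(time.perf_counter() - start) * 1000,
    )

res = gateway(str.upper, "hello")
print(res.data, res.usage)  # HELLO {'calls': 1}
```

Because every call returns the same type, downstream metric emitters never need isinstance checks or fallback paths.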

5. Be explicit about concurrency semantics

Providing a convenient batch API is valuable, but it comes with expectations. Document whether your modules are safe to share across threads and structure batch helpers accordingly—either with shared instances plus locking, or with cloned instances that keep configuration but isolate state.

You don’t need to mirror DSPy’s implementation line by line to benefit from these ideas. The core lesson is to design a single, opinionated gateway class that quietly handles initialization, context, tracking, and concurrency so the rest of your system can stay simple, testable, and focused on domain logic.

If you’d like to see how this all comes together in real code, you can browse the implementation of Module and ProgramMeta in the DSPy repository: dspy/primitives/module.py. Read it with this lens: you’re looking at a gateway that coordinates almost everything that makes DSPy modules production-ready.

Full Source Code

Direct source from the upstream repository.

dspy/primitives/module.py

stanfordnlp/dspy • main


Thanks for reading! I hope this was useful. If you have questions or thoughts, feel free to reach out.

Content Creation Process: This article was generated via a semi-automated workflow using AI tools. I prepared the strategic framework, including specific prompts and data sources. From there, the automation system conducted the research, analysis, and writing. The content passed through automated verification steps before being finalized and published without manual intervention.

Mahmoud Zalt

About the Author

I’m Zalt, a technologist with 16+ years of experience, passionate about designing and building AI systems that move us closer to a world where machines handle everything and humans reclaim wonder.

Let's connect if you're working on interesting AI projects, looking for technical advice, or want to discuss anything.
