
The Gateway Class Behind DSPy Modules

By Mahmoud Zalt
Code Cracking
20m read
Curious how DSPy routes every pipeline step before it touches an LLM? This piece breaks down the gateway class behind DSPy modules and why it matters.


We’re examining how DSPy manages everything that happens around an LLM call, not just inside it. DSPy is a framework for building optimized LLM pipelines, and at the center of those pipelines is dspy.primitives.module.Module—the gateway class every program passes through before it hits the language model. I’m Mahmoud Zalt, an AI solutions architect, and we’ll unpack how this small file centralizes initialization, context, observability, and batching into one opinionated entry point—and what that design gives us for free.

Module as a Gateway, Not Just a Base Class

Inside DSPy, Module is more than an abstract superclass. It is the gateway every pipeline step passes through on its way to the LLM. That gateway is where DSPy enforces invariants, wires callbacks, tracks usage, and exposes a uniform interface for both sync and async execution.

 dspy/
 ├─ dsp/
 │  └─ utils/
 │     └─ settings.py
 ├─ predict/
 │  ├─ predict.py        (Predict)
 │  └─ parallel.py       (Parallel)
 ├─ primitives/
 │  ├─ base_module.py    (BaseModule)
 │  ├─ example.py        (Example)
 │  ├─ prediction.py     (Prediction)
 │  └─ module.py         (Module, ProgramMeta)
 └─ utils/
    ├─ callback.py       (with_callbacks)
    ├─ inspect_history.py (pretty_print_history)
    ├─ magicattr.py      (magicattr.set)
    └─ usage_tracker.py  (track_usage)

Caller code
   │
   ▼
Module.__call__ / Module.acall
   │   (with_callbacks, settings.context, track_usage)
   ▼
Module.forward / Module.aforward  (subclasses)
   │
   ▼
Predict.lm  (LLM calls, network I/O)
Module orchestrates settings, callbacks, tracking, and predictors before handing off to the LM.

A key requirement is that every module instance is correctly initialized, even if the author of a subclass forgets to call super().__init__(). DSPy solves this with a metaclass, ProgramMeta, which intercepts instance creation and injects the base initialization.

class ProgramMeta(type):
    """Metaclass ensuring every ``dspy.Module`` instance is properly initialised."""

    def __call__(cls, *args, **kwargs):
        obj = cls.__new__(cls, *args, **kwargs)
        if isinstance(obj, cls):
            Module._base_init(obj)
            cls.__init__(obj, *args, **kwargs)

            if not hasattr(obj, "callbacks"):
                obj.callbacks = []
            if not hasattr(obj, "history"):
                obj.history = []
        return obj
ProgramMeta guarantees base attributes on every Module instance regardless of subclass __init__.

Instead of relying on documentation (“don’t forget to call super()”), the framework enforces the invariant at the type level. Every module instance has consistent core state like callbacks and history, which in turn keeps the gateway logic simple and predictable.
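The same guarantee can be reproduced outside DSPy in a few lines. Here is a minimal, self-contained sketch of the pattern; the names EnsuredMeta, Base, and Forgetful are hypothetical, not DSPy's:

```python
class EnsuredMeta(type):
    def __call__(cls, *args, **kwargs):
        obj = cls.__new__(cls)
        if isinstance(obj, cls):
            # Inject base state before the subclass __init__ runs,
            # so a missing super().__init__() cannot leave it out.
            obj.callbacks = []
            obj.history = []
            cls.__init__(obj, *args, **kwargs)
        return obj

class Base(metaclass=EnsuredMeta):
    pass

class Forgetful(Base):
    def __init__(self, name):
        # Deliberately does NOT call super().__init__().
        self.name = name

f = Forgetful("demo")
print(f.callbacks, f.history, f.name)  # [] [] demo
```

Because the metaclass owns instance creation, even a subclass that skips the super() call still ends up with the base attributes in place.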

Enforcing a Safe Call Path

With initialization handled, the next question is: what happens whenever a module is invoked? DSPy makes Module.__call__ the only supported entry point for doing work and layers all orchestration logic there.

@with_callbacks
def __call__(self, *args, **kwargs) -> Prediction:
    from dspy.dsp.utils.settings import thread_local_overrides

    caller_modules = settings.caller_modules or []
    caller_modules = list(caller_modules)
    caller_modules.append(self)

    with settings.context(caller_modules=caller_modules):
        if settings.track_usage and thread_local_overrides.get().get("usage_tracker") is None:
            with track_usage() as usage_tracker:
                output = self.forward(*args, **kwargs)
            tokens = usage_tracker.get_total_tokens()
            self._set_lm_usage(tokens, output)
            return output

        return self.forward(*args, **kwargs)
__call__ wraps forward with callbacks, context, and optional LM usage tracking.

Conceptually, a Module is a smart function. Subclasses implement forward as if it were a plain Python function, but callers always use output = my_module(...). Behind that simple call, the gateway:

  • Runs callbacks via @with_callbacks for logging, tracing, or metrics.
  • Updates the context with the caller stack (settings.caller_modules), so nested modules know who invoked them.
  • Optionally tracks token usage with track_usage() and routes the result into the output object.
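As a rough illustration of that layering, here is a standalone sketch with hypothetical names; contextvars stands in for DSPy's settings context, and the metadata dict stands in for real usage tracking:

```python
import contextvars

# Hypothetical gateway sketch: __call__ is the only public entry point
# and layers context plus tracking around a subclass-provided forward().
_caller_stack = contextvars.ContextVar("caller_stack", default=())

class Gateway:
    track_usage = True

    def __call__(self, *args, **kwargs):
        # Push self onto the caller stack so nested modules can see
        # who invoked them, mirroring settings.caller_modules.
        token = _caller_stack.set(_caller_stack.get() + (self,))
        try:
            output = self.forward(*args, **kwargs)
            if self.track_usage:
                # Metadata is attached by the gateway, not by forward().
                output = {"result": output, "callers": len(_caller_stack.get())}
            return output
        finally:
            _caller_stack.reset(token)

class Doubler(Gateway):
    def forward(self, x):
        return x * 2

out = Doubler()(21)
print(out)  # {'result': 42, 'callers': 1}
```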

There is a matching async gateway, acall, that wraps aforward with the same semantics. The implementation currently duplicates much of the sync path, which is a small refactoring opportunity, but the contract is clear: sync and async calls both go through the same policy layer.
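If you wanted to avoid that duplication in your own framework, one option is to put the policy layer in a single context manager that both entry points share. A sketch, not DSPy's actual code:

```python
import asyncio
import contextlib

class DualGateway:
    def __init__(self):
        self.calls = 0

    @contextlib.contextmanager
    def _policy(self):
        # One place for context, callbacks, and tracking,
        # shared by the sync and async entry points.
        self.calls += 1
        yield

    def __call__(self, x):
        with self._policy():
            return self.forward(x)

    async def acall(self, x):
        with self._policy():
            return await self.aforward(x)

class Echo(DualGateway):
    def forward(self, x):
        return x

    async def aforward(self, x):
        return x

e = Echo()
print(e("sync"), asyncio.run(e.acall("async")), e.calls)  # sync async 2
```

Both paths then evolve together: adding a new policy (say, rate limiting) means touching `_policy` once rather than two near-identical methods.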

Discouraging direct forward calls

To keep all of this logic centralized, Module gently steers developers away from calling forward directly by inspecting attribute access.

def __getattribute__(self, name):
    attr = super().__getattribute__(name)

    if name == "forward" and callable(attr):
        stack = inspect.stack()
        forward_called_directly = len(stack) <= 1 or stack[1].function != "__call__"

        if forward_called_directly:
            logger.warning(
                f"Calling module.forward(...) on {self.__class__.__name__} directly is discouraged. "
                f"Please use module(...) instead."
            )

    return attr
Direct forward calls still work, but emit a warning so usage converges on the gateway.

This uses inspect.stack() to see whether forward is being invoked via __call__ or from user code. Stack inspection has a cost, and a performance review of this file rightly calls it out as a potential hot-spot. Still, the pattern is useful: guide developers toward the safe path without breaking existing code.
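If that overhead matters in your own code, CPython's `sys._getframe` is a much cheaper single-frame lookup than `inspect.stack()`, which builds full frame records including source lines. A sketch under those assumptions (CPython-specific; the class here is hypothetical, not DSPy's):

```python
import sys

class Module:
    def __init__(self):
        self.warnings = []

    def __call__(self):
        return self.forward()

    def forward(self):
        # sys._getframe(1) fetches only the immediate caller's frame;
        # unlike inspect.stack() it does not read source files.
        caller = sys._getframe(1).f_code.co_name
        if caller != "__call__":
            self.warnings.append("direct forward() call is discouraged")
        return "ok"

m = Module()
m()            # via the gateway: no warning recorded
m.forward()    # direct call: one warning recorded
print(m.warnings)
```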

Predictors as Pluggable Engines

With a single gateway for calls, the next layer is the “engines” that talk to the model. In DSPy those are Predict objects. A module may contain one or several predictors, and Module provides a minimal facade to discover and reconfigure them.

def named_predictors(self):
    from dspy.predict.predict import Predict

    return [
        (name, param)
        for name, param in self.named_parameters()
        if isinstance(param, Predict)
    ]

def predictors(self):
    return [param for _, param in self.named_predictors()]

Under the hood, BaseModule.named_parameters() walks module attributes. Here, Module simply filters for Predict instances and uses that list to implement higher-level operations:

  • Set the LM everywhere: set_lm(self, lm) iterates over predictors and assigns param.lm = lm.
  • Read a shared LM: get_lm() checks that all predictors share the same lm instance and either returns it or raises a ValueError with a clear message.
  • Transform predictors in bulk: map_named_predictors(func) applies an arbitrary function to each predictor and writes the result back using magicattr.set, which handles nested attributes.

The division of responsibilities is sharp: the module decides when to run and in what context; predictors decide how to talk to the LLM. The small API around named_predictors gives higher-level tooling a stable surface to plug into, whether that’s LM swapping, adding pricing metadata, or benchmarking.
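The same facade shape works outside DSPy. A shallow sketch with hypothetical Engine and Pipeline names (DSPy's named_parameters also walks nested structures, which this version skips):

```python
# Discover nested "engines" by type and reconfigure them in bulk,
# analogous to Module filtering named_parameters for Predict instances.
class Engine:
    def __init__(self):
        self.lm = None

class Pipeline:
    def named_engines(self):
        # Shallow walk over instance attributes, filtered by type.
        return [(name, attr) for name, attr in vars(self).items()
                if isinstance(attr, Engine)]

    def set_lm(self, lm):
        for _, engine in self.named_engines():
            engine.lm = lm

    def get_lm(self):
        lms = {id(engine.lm) for _, engine in self.named_engines()}
        if len(lms) != 1:
            raise ValueError("Engines do not share a single LM.")
        return self.named_engines()[0][1].lm

class QA(Pipeline):
    def __init__(self):
        self.retrieve = Engine()
        self.answer = Engine()

qa = QA()
qa.set_lm("gpt-x")     # one call reconfigures every engine
print(qa.get_lm())     # gpt-x
```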

Batching and Concurrency

Most real workloads need to call modules on many inputs at once. DSPy addresses this with Module.batch, which prepares work items and delegates execution to a separate Parallel module.

def batch(
    self,
    examples: list[Example],
    num_threads: int | None = None,
    max_errors: int | None = None,
    return_failed_examples: bool = False,
    provide_traceback: bool | None = None,
    disable_progress_bar: bool = False,
):
    exec_pairs = [(self, example.inputs()) for example in examples]

    parallel_executor = Parallel(
        num_threads=num_threads,
        max_errors=max_errors,
        return_failed_examples=return_failed_examples,
        provide_traceback=provide_traceback,
        disable_progress_bar=disable_progress_bar,
    )

    if return_failed_examples:
        results, failed_examples, exceptions = parallel_executor.forward(exec_pairs)
        return results, failed_examples, exceptions
    else:
        return parallel_executor.forward(exec_pairs)
batch turns examples into execution pairs and lets Parallel handle threading.

The method is intentionally thin:

  1. Each Example becomes a set of keyword arguments via example.inputs().
  2. Those arguments are paired with the module instance: (self, inputs).
  3. Parallel.forward fans out over threads, calling the module behind the scenes.

The heavy lifting—thread management, error aggregation, progress reporting—lives in Parallel. However, there is an important concurrency trade-off that the code implicitly makes: it passes the same module instance into all workers. If forward mutates state on self (for example, appending to history), calls may interleave in ways that are hard to reason about.

Pattern                        Pros                                           Risk
-----------------------------  ---------------------------------------------  --------------------------------------------------
Shared Module across threads   Single configuration, one LM, fewer objects    Race conditions if forward mutates self
One Module per worker          Isolated state and history, easier debugging   More instances to manage, must share LM explicitly
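The "one Module per worker" option above can be sketched with copy.deepcopy, giving each thread its own clone of a template module. The Counter class is a hypothetical stand-in, not DSPy code:

```python
import copy
from concurrent.futures import ThreadPoolExecutor

class Counter:
    def __init__(self):
        self.history = []

    def __call__(self, x):
        self.history.append(x)   # state mutation that would race if shared
        return x + 1

template = Counter()
inputs = list(range(8))

def run_isolated(x):
    worker = copy.deepcopy(template)  # fresh, isolated state per call
    return worker(x), len(worker.history)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_isolated, inputs))

print(results)  # every clone saw exactly one item; template stays untouched
```

The cost is the extra copies; the payoff is that no worker can observe another worker's history.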

Usage Tracking as Part of the Contract

One of the most useful aspects of Module is how it treats observability—especially token usage—as part of the gateway contract rather than an afterthought bolted onto predictors.

Attaching LM usage to predictions

When settings.track_usage is enabled and no thread-local usage tracker is present, __call__ wraps forward with track_usage(). After execution, it passes the collected token counts to _set_lm_usage along with the output.

def _set_lm_usage(self, tokens: dict[str, Any], output: Any):
    prediction_in_output = None
    if isinstance(output, Prediction):
        prediction_in_output = output
    elif isinstance(output, tuple) and len(output) > 0 and isinstance(output[0], Prediction):
        prediction_in_output = output[0]

    if prediction_in_output:
        prediction_in_output.set_lm_usage(tokens)
    else:
        logger.warning(
            "Failed to set LM usage. Please return `dspy.Prediction` object from "
            "dspy.Module to enable usage tracking."
        )
Token usage is attached to a Prediction if one is present in the output.

This introduces an explicit contract: to participate in usage tracking, forward must return a Prediction, or a tuple whose first element is a Prediction. If it doesn’t, the framework logs a warning and drops the token data. For a production system, that’s a subtle but important edge: an accidental change in return type can quietly disable cost visibility for that module.

Hardening this pattern in your own frameworks

In your own code, you can make this safer by failing fast in non-production environments. For example, if tracking is enabled and no Prediction is found, raise in development and only log a warning in production. That keeps the convenience of the gateway while surfacing broken contracts early.
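Here is a sketch of that fail-fast variant; the ENV environment variable and the stand-in Prediction class are assumptions, not DSPy's API:

```python
import logging
import os

logger = logging.getLogger(__name__)

class Prediction:
    def set_lm_usage(self, tokens):
        self.usage = tokens

def set_lm_usage_strict(tokens, output):
    # Same unwrapping as _set_lm_usage: a Prediction, or a tuple
    # whose first element is a Prediction.
    target = output if isinstance(output, Prediction) else None
    if isinstance(output, tuple) and output and isinstance(output[0], Prediction):
        target = output[0]
    if target is not None:
        target.set_lm_usage(tokens)
        return
    message = "forward() must return a Prediction to enable usage tracking"
    if os.environ.get("ENV", "development") != "production":
        raise TypeError(message)   # fail fast while developing
    logger.warning(message)        # stay lenient in production

p = Prediction()
set_lm_usage_strict({"total_tokens": 12}, p)
print(p.usage)  # {'total_tokens': 12}
```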

Once usage is attached to Prediction, higher layers can emit metrics per module—invocation counts, latency, token totals—without every module author having to think about observability. The gateway does the wiring; the business logic stays focused on transforming inputs into outputs.

Design Lessons You Can Reuse

Stepping back, the interesting part of Module is not any single method, but how much cross-cutting behavior it centralizes behind one gateway. If you’re building your own LLM orchestration layer or service framework, there are several patterns worth reusing.

1. Treat your core abstraction as a gateway

Pick one method—__call__, run, execute—and make it the only supported way to do work. Behind that gateway, handle context, callbacks, error policies, and tracking. Subclasses then only have to implement a straightforward forward-style method.

2. Enforce invariants with metaclasses or factories

When missing super() calls or partially initialized objects lead to subtle bugs, move initialization into a metaclass or a factory. ProgramMeta shows how to guarantee base fields like callbacks and history without trusting every subclass author to remember the right incantation.

3. Wrap internal engines with a tiny facade

Expose methods like named_predictors, set_lm, and get_lm so external code can reconfigure engines in bulk. The same pattern works for anything nested: database handles, HTTP clients, caches, or feature-flag clients.

4. Make observability part of the type contract

If tracking depends on a specific return type, make that contract explicit and, where possible, enforce it. A single well-defined result object that carries both business data and metrics data is easier to reason about than ad-hoc logs scattered across the stack.
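One possible shape for such a result object, using a plain dataclass; all names here are hypothetical:

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Result:
    data: Any                                  # business payload
    usage: dict = field(default_factory=dict)  # token/metrics metadata
    latency_ms: float = 0.0

def gateway(fn: Callable, *args) -> Result:
    # The gateway measures and attaches metrics; fn stays oblivious.
    start = time.perf_counter()
    data = fn(*args)
    return Result(
        data=data,
        usage={"calls": 1},  # placeholder for real token counts
        latency_ms=(time.perf_counter() - start) * 1000,
    )

res = gateway(str.upper, "hello")
print(res.data, res.usage)  # HELLO {'calls': 1}
```

Because every call returns the same type, downstream metric emitters never need isinstance checks or fallback paths.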

5. Be explicit about concurrency semantics

Providing a convenient batch API is valuable, but it comes with expectations. Document whether your modules are safe to share across threads and structure batch helpers accordingly—either with shared instances plus locking, or with cloned instances that keep configuration but isolate state.

You don’t need to mirror DSPy’s implementation line by line to benefit from these ideas. The core lesson is to design a single, opinionated gateway class that quietly handles initialization, context, tracking, and concurrency so the rest of your system can stay simple, testable, and focused on domain logic.

If you’d like to see how this all comes together in real code, you can browse the implementation of Module and ProgramMeta in the DSPy repository: dspy/primitives/module.py. Read it with this lens: you’re looking at a gateway that coordinates almost everything that makes DSPy modules production-ready.

Full Source Code

Direct source from the upstream repository.

dspy/primitives/module.py

stanfordnlp/dspy • main


Thanks for reading! I hope this was useful. If you have questions or thoughts, feel free to reach out.

Content Creation Process: This article was generated via a semi-automated workflow using AI tools. I prepared the strategic framework, including specific prompts and data sources. From there, the automation system conducted the research, analysis, and writing. The content passed through automated verification steps before being finalized and published without manual intervention.

Mahmoud Zalt

About the Author

I’m Zalt, a technologist with 16+ years of experience, passionate about designing and building AI systems that move us closer to a world where machines handle everything and humans reclaim wonder.

Let's connect if you're working on interesting AI projects, looking for technical advice, or want to discuss anything.
