We’re examining how DSPy manages everything that happens around an LLM call, not just inside it. DSPy is a framework for building optimized LLM pipelines, and at the center of those pipelines is dspy.primitives.module.Module—the gateway class every program passes through before it hits the language model. I’m Mahmoud Zalt, an AI solutions architect, and we’ll unpack how this small file centralizes initialization, context, observability, and batching into one opinionated entry point—and what that design gives us for free.
Module as a Gateway, Not Just a Base Class
Inside DSPy, Module is more than an abstract superclass. It is the gateway every pipeline step passes through on its way to the LLM. That gateway is where DSPy enforces invariants, wires callbacks, tracks usage, and exposes a uniform interface for both sync and async execution.
dspy/
├─ dsp/
│  └─ utils/
│     └─ settings.py
├─ predict/
│  ├─ predict.py          (Predict)
│  └─ parallel.py         (Parallel)
├─ primitives/
│  ├─ base_module.py      (BaseModule)
│  ├─ example.py          (Example)
│  ├─ prediction.py       (Prediction)
│  └─ module.py           (Module, ProgramMeta)
└─ utils/
   ├─ callback.py         (with_callbacks)
   ├─ inspect_history.py  (pretty_print_history)
   ├─ magicattr.py        (magicattr.set)
   └─ usage_tracker.py    (track_usage)
Caller code
│
▼
Module.__call__ / Module.acall
│ (with_callbacks, settings.context, track_usage)
▼
Module.forward / Module.aforward (subclasses)
│
▼
Predict.lm (LLM calls, network I/O)
Module orchestrates settings, callbacks, tracking, and predictors before handing off to the LM.
A key requirement is that every module instance is correctly initialized, even if the author of a subclass forgets to call super().__init__(). DSPy solves this with a metaclass, ProgramMeta, which intercepts instance creation and injects the base initialization.
class ProgramMeta(type):
    """Metaclass ensuring every ``dspy.Module`` instance is properly initialised."""

    def __call__(cls, *args, **kwargs):
        obj = cls.__new__(cls, *args, **kwargs)
        if isinstance(obj, cls):
            Module._base_init(obj)
            cls.__init__(obj, *args, **kwargs)
            if not hasattr(obj, "callbacks"):
                obj.callbacks = []
            if not hasattr(obj, "history"):
                obj.history = []
        return obj
ProgramMeta guarantees base attributes on every Module instance regardless of subclass __init__.
Instead of relying on documentation (“don’t forget to call super()”), the framework enforces the invariant at the type level. Every module instance has consistent core state like callbacks and history, which in turn keeps the gateway logic simple and predictable.
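To see the pattern in isolation, here is a minimal toy version of the same idea (my own sketch, not DSPy's actual code): the metaclass takes over instance creation, so base state exists even when a subclass skips super().__init__().

```python
class GatewayMeta(type):
    def __call__(cls, *args, **kwargs):
        obj = cls.__new__(cls, *args, **kwargs)
        if isinstance(obj, cls):
            cls.__init__(obj, *args, **kwargs)
            # Backfill base state the subclass may have skipped.
            if not hasattr(obj, "callbacks"):
                obj.callbacks = []
            if not hasattr(obj, "history"):
                obj.history = []
        return obj


class ToyModule(metaclass=GatewayMeta):
    pass


class Forgetful(ToyModule):
    def __init__(self):
        # Deliberately does NOT call super().__init__().
        self.name = "forgetful"


m = Forgetful()
print(m.callbacks, m.history)  # [] [] -- base attributes exist anyway
```

Because the metaclass's __call__ runs around every construction, there is no way for a subclass author to opt out by accident.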
Enforcing a Safe Call Path
With initialization handled, the next question is: what happens whenever a module is invoked? DSPy makes Module.__call__ the only supported entry point for doing work and layers all orchestration logic there.
@with_callbacks
def __call__(self, *args, **kwargs) -> Prediction:
    from dspy.dsp.utils.settings import thread_local_overrides

    caller_modules = settings.caller_modules or []
    caller_modules = list(caller_modules)
    caller_modules.append(self)

    with settings.context(caller_modules=caller_modules):
        if settings.track_usage and thread_local_overrides.get().get("usage_tracker") is None:
            with track_usage() as usage_tracker:
                output = self.forward(*args, **kwargs)
                tokens = usage_tracker.get_total_tokens()
            self._set_lm_usage(tokens, output)
            return output

        return self.forward(*args, **kwargs)
__call__ wraps forward with callbacks, context, and optional LM usage tracking.
Conceptually, a Module is a smart function. Subclasses implement forward as if it were a plain Python function, but callers always use output = my_module(...). Behind that simple call, the gateway:
- Runs callbacks via @with_callbacks for logging, tracing, or metrics.
- Updates the context with the caller stack (settings.caller_modules), so nested modules know who invoked them.
- Optionally tracks token usage with track_usage() and routes the result into the output object.
There is a matching async gateway, acall, that wraps aforward with the same semantics. The implementation currently duplicates much of the sync path, which is a small refactoring opportunity, but the contract is clear: sync and async calls both go through the same policy layer.
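The shared-policy-layer idea can be sketched in a few lines (a hypothetical refactor in the spirit of that observation, not DSPy's actual code): both gateways funnel through one helper, so orchestration is written once.

```python
import asyncio

class SmartFn:
    def _enter_policy(self):
        # Stand-in for callbacks, context updates, and usage tracking.
        self.calls = getattr(self, "calls", 0) + 1

    def forward(self, x):
        return x * 2

    async def aforward(self, x):
        await asyncio.sleep(0)  # pretend this awaits an LM call
        return x * 2

    def __call__(self, x):
        self._enter_policy()
        return self.forward(x)

    async def acall(self, x):
        self._enter_policy()  # same policy layer as the sync path
        return await self.aforward(x)


fn = SmartFn()
print(fn(3))                     # 6
print(asyncio.run(fn.acall(4)))  # 8
print(fn.calls)                  # 2 -- both paths passed through the policy
```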
Discouraging direct forward calls
To keep all of this logic centralized, Module gently steers developers away from calling forward directly by inspecting attribute access.
def __getattribute__(self, name):
    attr = super().__getattribute__(name)

    if name == "forward" and callable(attr):
        stack = inspect.stack()
        forward_called_directly = len(stack) <= 1 or stack[1].function != "__call__"

        if forward_called_directly:
            logger.warning(
                f"Calling module.forward(...) on {self.__class__.__name__} directly is discouraged. "
                f"Please use module(...) instead."
            )

    return attr
forward calls still work, but emit a warning so usage converges on the gateway.
This uses inspect.stack() to see whether forward is being invoked via __call__ or from user code. Stack inspection has a cost, and a performance review of this file rightly calls it out as a potential hot-spot. Still, the pattern is useful: guide developers toward the safe path without breaking existing code.
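If that cost matters in your own code, one common mitigation (my suggestion, not something module.py does) is sys._getframe, which reads only the caller's frame, whereas inspect.stack() builds a summary of the entire call stack, including source lines:

```python
import inspect
import sys

def caller_via_stack():
    # Walks and summarizes the whole stack: convenient but expensive.
    return inspect.stack()[1].function

def caller_via_frame():
    # CPython-specific, but far cheaper: just one frame lookup.
    return sys._getframe(1).f_code.co_name

def some_caller():
    return caller_via_stack(), caller_via_frame()

print(some_caller())  # ('some_caller', 'some_caller')
```

Both return the same answer here; the frame-based version trades a little portability for a much smaller per-call cost.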
Predictors as Pluggable Engines
With a single gateway for calls, the next layer is the “engines” that talk to the model. In DSPy those are Predict objects. A module may contain one or several predictors, and Module provides a minimal facade to discover and reconfigure them.
def named_predictors(self):
    from dspy.predict.predict import Predict

    return [
        (name, param)
        for name, param in self.named_parameters()
        if isinstance(param, Predict)
    ]

def predictors(self):
    return [param for _, param in self.named_predictors()]
Under the hood, BaseModule.named_parameters() walks module attributes. Here, Module simply filters for Predict instances and uses that list to implement higher-level operations:
- Set the LM everywhere: set_lm(self, lm) iterates over the predictors and assigns param.lm = lm.
- Read a shared LM: get_lm() checks that all predictors share the same lm instance and either returns it or raises a ValueError with a clear message.
- Transform predictors in bulk: map_named_predictors(func) applies an arbitrary function to each predictor and writes the result back using magicattr.set, which handles nested attributes.
The division of responsibilities is sharp: the module decides when to run and in what context; predictors decide how to talk to the LLM. The small API around named_predictors gives higher-level tooling a stable surface to plug into, whether that’s LM swapping, adding pricing metadata, or benchmarking.
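A toy version of that facade shows the shape (illustrative names, not DSPy's real classes): the container discovers its "engines" by type and reconfigures them in bulk.

```python
class FakePredict:
    def __init__(self):
        self.lm = None


class Pipeline:
    def __init__(self):
        self.draft = FakePredict()
        self.refine = FakePredict()
        self.note = "not a predictor"  # ignored by the facade

    def named_predictors(self):
        return [(name, value) for name, value in vars(self).items()
                if isinstance(value, FakePredict)]

    def set_lm(self, lm):
        for _, predictor in self.named_predictors():
            predictor.lm = lm

    def get_lm(self):
        lms = {predictor.lm for _, predictor in self.named_predictors()}
        if len(lms) != 1:
            raise ValueError("Predictors do not share a single LM.")
        return lms.pop()


pipe = Pipeline()
pipe.set_lm("some-lm")  # swap the LM everywhere at once
print(pipe.get_lm())    # some-lm
```

External tooling only ever touches set_lm / get_lm; it never needs to know which attributes hold predictors.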
Batching and Concurrency
Most real workloads need to call modules on many inputs at once. DSPy addresses this with Module.batch, which prepares work items and delegates execution to a separate Parallel module.
def batch(
    self,
    examples: list[Example],
    num_threads: int | None = None,
    max_errors: int | None = None,
    return_failed_examples: bool = False,
    provide_traceback: bool | None = None,
    disable_progress_bar: bool = False,
):
    exec_pairs = [(self, example.inputs()) for example in examples]

    parallel_executor = Parallel(
        num_threads=num_threads,
        max_errors=max_errors,
        return_failed_examples=return_failed_examples,
        provide_traceback=provide_traceback,
        disable_progress_bar=disable_progress_bar,
    )

    if return_failed_examples:
        results, failed_examples, exceptions = parallel_executor.forward(exec_pairs)
        return results, failed_examples, exceptions
    else:
        return parallel_executor.forward(exec_pairs)
batch turns examples into execution pairs and lets Parallel handle threading.
The method is intentionally thin:
- Each Example becomes a set of keyword arguments via example.inputs().
- Those arguments are paired with the module instance: (self, inputs).
- Parallel.forward fans out over threads, calling the module behind the scenes.
The heavy lifting—thread management, error aggregation, progress reporting—lives in Parallel. However, there is an important concurrency trade-off that the code implicitly makes: it passes the same module instance into all workers. If forward mutates state on self (for example, appending to history), calls may interleave in ways that are hard to reason about.
| Pattern | Pros | Risk |
|---|---|---|
| Shared Module across threads | Single configuration, one LM, fewer objects | Race conditions if forward mutates self |
| One Module per worker | Isolated state and history, easier debugging | More instances to manage, must share LM explicitly |
Usage Tracking as Part of the Contract
One of the most useful aspects of Module is how it treats observability—especially token usage—as part of the gateway contract rather than an afterthought bolted onto predictors.
Attaching LM usage to predictions
When settings.track_usage is enabled and no thread-local usage tracker is present, __call__ wraps forward with track_usage(). After execution, it passes the collected token counts to _set_lm_usage along with the output.
def _set_lm_usage(self, tokens: dict[str, Any], output: Any):
    prediction_in_output = None

    if isinstance(output, Prediction):
        prediction_in_output = output
    elif isinstance(output, tuple) and len(output) > 0 and isinstance(output[0], Prediction):
        prediction_in_output = output[0]

    if prediction_in_output:
        prediction_in_output.set_lm_usage(tokens)
    else:
        logger.warning(
            "Failed to set LM usage. Please return `dspy.Prediction` object from "
            "dspy.Module to enable usage tracking."
        )
_set_lm_usage attaches the collected token counts to a Prediction if one is present in the output.
This introduces an explicit contract: to participate in usage tracking, forward must return a Prediction, or a tuple whose first element is a Prediction. If it doesn’t, the framework logs a warning and drops the token data. For a production system, that’s a subtle but important edge: an accidental change in return type can quietly disable cost visibility for that module.
Hardening this pattern in your own frameworks
In your own code, you can make this safer by failing fast in non-production environments. For example, if tracking is enabled and no Prediction is found, raise in development and only log a warning in production. That keeps the convenience of the gateway while surfacing broken contracts early.
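Here is one way that hardening could look (a hypothetical helper of my own, not part of DSPy): in development a broken contract raises, while in production it degrades to the warn-and-drop behavior described above.

```python
import logging

logger = logging.getLogger(__name__)

class ToyPrediction:
    def __init__(self):
        self.lm_usage = None

    def set_lm_usage(self, tokens):
        self.lm_usage = tokens


def attach_usage(tokens, output, env="development"):
    prediction = None
    if isinstance(output, ToyPrediction):
        prediction = output
    elif isinstance(output, tuple) and output and isinstance(output[0], ToyPrediction):
        prediction = output[0]

    if prediction is not None:
        prediction.set_lm_usage(tokens)
        return output

    if env != "production":
        # Fail fast where a broken contract is cheap to discover.
        raise TypeError("forward() must return a Prediction for usage tracking")
    logger.warning("Dropping LM usage: output is not a Prediction.")
    return output


pred = attach_usage({"total_tokens": 42}, ToyPrediction())
print(pred.lm_usage)  # {'total_tokens': 42}
```

A return-type regression now fails a unit test instead of silently zeroing out a cost dashboard.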
Once usage is attached to Prediction, higher layers can emit metrics per module—invocation counts, latency, token totals—without every module author having to think about observability. The gateway does the wiring; the business logic stays focused on transforming inputs into outputs.
Design Lessons You Can Reuse
Stepping back, the interesting part of Module is not any single method, but how much cross-cutting behavior it centralizes behind one gateway. If you’re building your own LLM orchestration layer or service framework, there are several patterns worth reusing.
1. Treat your core abstraction as a gateway
Pick one method—__call__, run, execute—and make it the only supported way to do work. Behind that gateway, handle context, callbacks, error policies, and tracking. Subclasses then only have to implement a straightforward forward-style method.
2. Enforce invariants with metaclasses or factories
When missing super() calls or partially initialized objects lead to subtle bugs, move initialization into a metaclass or a factory. ProgramMeta shows how to guarantee base fields like callbacks and history without trusting every subclass author to remember the right incantation.
3. Wrap internal engines with a tiny facade
Expose methods like named_predictors, set_lm, and get_lm so external code can reconfigure engines in bulk. The same pattern works for anything nested: database handles, HTTP clients, caches, or feature-flag clients.
4. Make observability part of the type contract
If tracking depends on a specific return type, make that contract explicit and, where possible, enforce it. A single well-defined result object that carries both business data and metrics data is easier to reason about than ad-hoc logs scattered across the stack.
5. Be explicit about concurrency semantics
Providing a convenient batch API is valuable, but it comes with expectations. Document whether your modules are safe to share across threads and structure batch helpers accordingly—either with shared instances plus locking, or with cloned instances that keep configuration but isolate state.
You don’t need to mirror DSPy’s implementation line by line to benefit from these ideas. The core lesson is to design a single, opinionated gateway class that quietly handles initialization, context, tracking, and concurrency so the rest of your system can stay simple, testable, and focused on domain logic.