We’re examining how fastmcp manages everything a tool needs to do during
a request: logging, progress, state, LLM calls, and even human input. In
fastmcp, all of that flows through one class: Context. I'm Mahmoud
Zalt, an AI solutions architect, and we’ll treat this class as a case study
in how to design a single, ergonomic façade for a complex backend.
The core lesson is that a well‑designed context object can give tool authors
one simple control panel while hiding transports, background workers, and
storage behind clear, testable boundaries. We’ll see how Context pulls
this off, where it starts to look like a god object, and how you can apply
the same patterns in your own servers.
Context as the server’s control panel
Inside fastmcp, user‑defined tools and resources live on one side; MCP
sessions, transports, a state store, and background workers live on the
other. Context is the bridge between them.
fastmcp/
  src/fastmcp/
    server/
      server.py                 # FastMCP server, owns _state_store, _lifespan_result, ...
      context.py                # <--- Context facade for tools/resources
      sampling/run.py           # sample_impl, sample_step_impl
      transforms/visibility.py
      tasks/elicitation.py
      dependencies.py

Tools/resources (user code) --> Context --> FastMCP server & MCP session
                                        --> Clients / LLMs / State store
Context sits between user code and the MCP / FastMCP internals.
This is a textbook façade pattern: one object hides a set of subsystems and
exposes a small surface. Instead of making tool authors juggle
ServerSession, RequestContext, a key‑value store, Docket workers,
visibility rules, and logging levels, they work with a single parameter:
@server.tool
async def my_tool(x: int, ctx: Context) -> str:
    await ctx.info(f"Processing {x}")
    await ctx.report_progress(50, 100, "Processing")
    data = await ctx.read_resource("resource://data")
    await ctx.set_state("key", {"value": 1})
    result = await ctx.sample("Summarize this", result_type=str)
    return result.result
From the tool’s perspective, ctx is a control panel: log something, nudge
progress, call an LLM, persist a bit of state. Under the hood, each method
chooses the right transport, session, and backend.
Ambient context without globals
Once Context is the control panel, the next question is how the rest of
the server grabs the right instance per request, especially in async code.
fastmcp answers with ContextVar.
from contextvars import ContextVar, Token

_current_context: ContextVar[Context | None] = ContextVar("context", default=None)

TransportType = Literal["stdio", "sse", "streamable-http"]
_current_transport: ContextVar[TransportType | None] = ContextVar(
    "transport", default=None,
)

def set_transport(transport: TransportType) -> Token[TransportType | None]:
    """Set the current transport type. Returns token for reset."""
    return _current_transport.set(transport)
A ContextVar behaves like a thread‑local for async tasks: each concurrently
running task sees its own value. Context.__aenter__ installs the current
Context into _current_context and wires up the other dependency‑injection
context vars for the FastMCP server, Docket, and worker; __aexit__ resets
them using the saved tokens.
The result is “ambient” access to ctx, current transport, and server
instance without any shared global state. Internal helpers can safely call
“current context” without accidentally reading or mutating another request’s
data.
One operation, two worlds
With ambient context in place, Context can offer single methods that span
multiple execution environments. The clearest example is
report_progress, which works both for foreground MCP requests and
background Docket tasks.
async def report_progress(
    self,
    progress: float,
    total: float | None = None,
    message: str | None = None,
) -> None:
    """Report progress for the current operation."""
    progress_token = (
        self.request_context.meta.progressToken
        if self.request_context and self.request_context.meta
        else None
    )

    # Foreground: send MCP progress notification
    if progress_token is not None:
        await self.session.send_progress_notification(
            progress_token=progress_token,
            progress=progress,
            total=total,
            message=message,
            related_request_id=self.request_id,
        )
        return

    # Background: update Docket execution progress
    from fastmcp.server.dependencies import is_docket_available

    if not is_docket_available():
        return

    try:
        from docket.dependencies import current_execution

        execution = current_execution.get()
        if total is not None:
            await execution.progress.set_total(int(total))
        current = int(progress)
        last: int = getattr(execution, "_fastmcp_last_progress", 0)
        delta = current - last
        if delta > 0:
            await execution.progress.increment(delta)
            execution._fastmcp_last_progress = current
        if message is not None:
            await execution.progress.set_message(message)
    except LookupError:
        # Not running in Docket worker context
        pass
A single method covers both cases:
- Foreground requests, where the MCP client is connected and expects progress notifications.
- Background tasks running in Docket workers, where progress is stored and exposed through task APIs.
Tool authors never branch; they just call await ctx.report_progress(...)
and Context routes to the right mechanism. The report suggests isolating
the Docket branch into a helper such as _update_docket_progress() to keep
report_progress small and to decouple Docket‑specific behavior.
Session memory without leaks
Context also gives tools a way to “remember” things between calls,
without resorting to globals that leak across sessions. fastmcp models
this as a per‑session key‑value store backed by a pluggable
_state_store, plus a request‑local cache for ephemeral objects.
Deriving a stable session key
The first step is getting a durable session_id that works across
transports and deployments:
@property
def session_id(self) -> str:
    from uuid import uuid4

    request_ctx = self.request_context
    if request_ctx is not None:
        session = request_ctx.session
    elif self._session is not None:
        session = self._session
    else:
        raise RuntimeError(
            "session_id is not available because no session exists."
        )

    session_id = getattr(session, "_fastmcp_state_prefix", None)
    if session_id is not None:
        return session_id

    if request_ctx is not None:
        request = request_ctx.request
        if request:
            session_id = request.headers.get("mcp-session-id")

    if session_id is None:
        session_id = str(uuid4())

    session._fastmcp_state_prefix = session_id
    return session_id
Think of this as assigning each client a locker. session_id is the locker
number; the state store keys are the contents. HTTP clients can bring their
own locker number via a header so work can move between machines; long‑lived
transports just get a generated UUID.
Durable vs. request‑local state
With a session key in hand, Context offers a simple API that hides two
different storage tiers:
def _make_state_key(self, key: str) -> str:
    return f"{self.session_id}:{key}"

async def set_state(self, key: str, value: Any, *, serializable: bool = True) -> None:
    prefixed_key = self._make_state_key(key)
    if not serializable:
        self._request_state[prefixed_key] = value
        return

    self._request_state.pop(prefixed_key, None)
    try:
        await self.fastmcp._state_store.put(
            key=prefixed_key,
            value=StateValue(value=value),
            ttl=self._STATE_TTL_SECONDS,
        )
    except Exception as e:
        if "serialize" in str(e).lower():
            raise TypeError(
                f"Value for state key {key!r} is not serializable. "
                f"Use set_state({key!r}, value, serializable=False)..."
            ) from e
        raise

async def get_state(self, key: str) -> Any:
    prefixed_key = self._make_state_key(key)
    if prefixed_key in self._request_state:
        return self._request_state[prefixed_key]
    result = await self.fastmcp._state_store.get(key=prefixed_key)
    return result.value if result is not None else None
Under the covers there are two kinds of memory:
- Session‑scoped, serialized state (serializable=True), stored in _state_store with a TTL and shared across requests.
- Request‑local, non‑serializable state (serializable=False), stored only in _request_state for this Context instance.
To tool authors, it is just “store a value under a key”. The implementation
guards against cross‑session leakage and against trying to serialize things
like DB connections. The main rough edge the report flags is the broad
Exception catch with string‑matching for “serialize”; narrowing this to
specific error types would avoid hiding unrelated backend failures.
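The two-tier pattern itself is simple enough to model in a few lines. This toy MiniState class is illustrative only (not the fastmcp implementation): a plain dict stands in for _state_store, and a second dict plays the role of _request_state that dies with the request.

```python
import asyncio
from typing import Any

class MiniState:
    """Toy two-tier store mimicking the durable/request-local split."""

    def __init__(self, session_id: str, durable: dict[str, Any]) -> None:
        self.session_id = session_id
        self._durable = durable                   # survives across requests
        self._request_state: dict[str, Any] = {}  # dies with this "request"

    def _key(self, key: str) -> str:
        return f"{self.session_id}:{key}"  # session-prefixed locker key

    async def set_state(self, key: str, value: Any, *, serializable: bool = True) -> None:
        if not serializable:
            self._request_state[self._key(key)] = value
            return
        self._durable[self._key(key)] = value

    async def get_state(self, key: str) -> Any:
        k = self._key(key)
        if k in self._request_state:
            return self._request_state[k]
        return self._durable.get(k)

async def demo() -> tuple:
    store: dict[str, Any] = {}
    req1 = MiniState("sess-1", store)
    await req1.set_state("visits", 1)                          # durable
    await req1.set_state("conn", object(), serializable=False)  # request-local

    # A later request in the same session sees the durable value only;
    # the live object never leaks past its own request.
    req2 = MiniState("sess-1", store)
    return await req2.get_state("visits"), await req2.get_state("conn")

print(asyncio.run(demo()))  # (1, None)
```

The session prefix is what prevents two clients from ever reading each other's lockers, even though they share one backing store.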
Talking to humans as a first‑class flow
Context doesn’t just coordinate machines; it also treats “ask the user a
question” as a core operation through elicit. This is how tools trigger
UI forms and wait for structured human input.
Elicitation acts like a questionnaire service: a tool sends a message plus a form schema; the client renders UI, collects input, and sends back a typed result. The public API is surprisingly simple for what it does.
@overload
async def elicit(
    self,
    message: str,
    response_type: type[T],
    *,
    response_title: str | None = None,
    response_description: str | None = None,
) -> AcceptedElicitation[T] | DeclinedElicitation | CancelledElicitation: ...

...

async def elicit(
    self,
    message: str,
    response_type: type[T]
    | list[str]
    | dict[str, dict[str, str]]
    | list[list[str]]
    | list[dict[str, dict[str, str]]]
    | None = None,
    *,
    response_title: str | None = None,
    response_description: str | None = None,
) -> (
    AcceptedElicitation[T]
    | AcceptedElicitation[dict[str, Any]]
    | AcceptedElicitation[str]
    | AcceptedElicitation[list[str]]
    | DeclinedElicitation
    | CancelledElicitation
):
    if response_type is None and fastmcp.settings.deprecation_warnings:
        warnings.warn(... FastMCPDeprecationWarning ...)

    config = parse_elicit_response_type(
        response_type,
        response_title=response_title,
        response_description=response_description,
    )

    if self.is_background_task:
        result = await self._elicit_for_task(...)
    else:
        result = await self.session.elicit(...)

    if result.action == "accept":
        return handle_elicit_accept(config, result.content)
    elif result.action == "decline":
        return DeclinedElicitation()
    elif result.action == "cancel":
        return CancelledElicitation()
    else:
        raise ValueError(f"Unexpected elicitation action: {result.action}")
A few aspects illustrate the façade’s role:
- Overloads ensure that passing a model type yields AcceptedElicitation[T], while choice‑based shorthands return strings or string lists.
- A deprecation warning nudges callers away from response_type=None, explaining why empty schemas are problematic in some clients.
- For background tasks, _elicit_for_task switches the Docket execution into an “input required” state and waits for tasks/sendInput, all behind the same ctx.elicit call.
This is a complex interaction—worker queues, MCP, and UI—surfaced as a single, intuitive method, very much in line with the “one control panel” philosophy.
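From the caller's side, the whole dance collapses to one await plus a type check. The sketch below is hypothetical tool code: the result classes are toy stand-ins defined locally so it runs without fastmcp, and the .data field on the accepted result is an assumption about the real API, not a confirmed detail.

```python
import asyncio
from dataclasses import dataclass

# --- toy stand-ins so the sketch runs without fastmcp ---
@dataclass
class AcceptedElicitation:
    data: object

@dataclass
class DeclinedElicitation:
    pass

@dataclass
class Confirmation:
    approve: bool

class FakeContext:
    """Stub ctx whose elicit always 'accepts' with a canned answer."""
    async def elicit(self, message: str, response_type):
        return AcceptedElicitation(data=response_type(approve=True))

# --- the caller pattern a tool would use ---
async def risky_action(ctx) -> str:
    result = await ctx.elicit("Proceed with deletion?", Confirmation)
    if isinstance(result, AcceptedElicitation) and result.data.approve:
        return "deleted"
    return "aborted"  # declined, cancelled, or not approved

print(asyncio.run(risky_action(FakeContext())))  # deleted
```

The tool handles three user outcomes with ordinary control flow, never touching sessions, schemas, or worker queues.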
Taming the god object
By now the trade‑off is clear: Context does a lot. The report calls it a
deliberate “borderline god object”: a single class that accumulates many
responsibilities because it is the main façade of the framework.
Tool authors expect to find everything on ctx. That expectation is worth
preserving, even as the internals grow. The goal is not to split the façade
into many user‑visible pieces, but to split implementation behind it.
The report recommends a gentle refactor strategy:
- Keep the public methods stable (ctx.set_state, ctx.sample, ctx.enable_components, ctx.elicit, and so on).
- Move domain logic into internal helpers or sub‑facades such as _StateFacade, _VisibilityFacade, or an LLM helper, and delegate from Context.
- Tighten error handling in hot paths (for example, avoiding broad Exception catches in state management) to keep behavior predictable.
This keeps developer experience intact—one control panel—while making it easier for maintainers to reason about logging, state, visibility, sampling, and elicitation as separate concerns.
Practical takeaways
The fastmcp Context class is a concrete example of one big idea:
carefully designed context objects can give developers a single, ergonomic
interface to a complex, multi‑transport backend without sacrificing
isolation or observability.
From the tour above, a few patterns are worth reusing directly:
- Pick a single façade and invest in it. Most tool and app code should live on one well‑documented object. Treat that façade as your public API and design it intentionally.
- Expose ambient context safely. Use ContextVar (or equivalents) to offer “current request” state without resorting to globals, especially in async servers.
- Unify environments behind one API. Methods like report_progress and elicit hide foreground vs. background behavior. Callers should not need to know whether code is running inline or in a worker.
- Separate durable and ephemeral state. A simple flag and session‑prefixed keys are enough to give tools session memory while avoiding cross‑tenant leaks and serialization traps.
- Refactor behind the façade, not through it. As your context object grows, extract internal sub‑components instead of forcing users to learn new entry points.
If you are building an MCP server—or any system where tools need rich
per‑request and per‑session context—studying this Context implementation is
time well spent. Start by giving users a single control panel, then evolve
its internals as your transports, workers, and policies become more
sophisticated.