Zalt Blog

Deep Dives into Code & Architecture

AT SCALE

The Context Object That Runs Your MCP Server

By Mahmoud Zalt
Code Cracking
20m read
Running an MCP server and juggling logging, state, and requests across tools? See how a single context object can hold it all together without chaos.

We’re examining how fastmcp manages everything a tool needs to do during a request: logging, progress, state, LLM calls, and even human input. In fastmcp, all of that flows through one class: Context. I'm Mahmoud Zalt, an AI solutions architect, and we’ll treat this class as a case study in how to design a single, ergonomic façade for a complex backend.

The core lesson is that a well‑designed context object can give tool authors one simple control panel while hiding transports, background workers, and storage behind clear, testable boundaries. We’ll see how Context pulls this off, where it starts to look like a god object, and how you can apply the same patterns in your own servers.

Context as the server’s control panel

Inside fastmcp, user‑defined tools and resources live on one side; MCP sessions, transports, a state store, and background workers live on the other. Context is the bridge between them.

fastmcp/
  src/fastmcp/
    server/
      server.py           # FastMCP server, owns _state_store, _lifespan_result, ...
      context.py          # <--- Context facade for tools/resources
      sampling/run.py     # sample_impl, sample_step_impl
      transforms/visibility.py
      tasks/elicitation.py
      dependencies.py

Tools/resources (user code) --> Context --> FastMCP server & MCP session
                              --> Clients / LLMs / State store

Context sits between user code and the MCP / FastMCP internals.

This is a textbook façade pattern: one object hides a set of subsystems and exposes a small surface. Instead of making tool authors juggle ServerSession, RequestContext, a key‑value store, Docket workers, visibility rules, and logging levels, they work with a single parameter:

@server.tool
async def my_tool(x: int, ctx: Context) -> str:
    await ctx.info(f"Processing {x}")
    await ctx.report_progress(50, 100, "Processing")

    data = await ctx.read_resource("resource://data")
    await ctx.set_state("key", {"value": 1})

    result = await ctx.sample("Summarize this", result_type=str)
    return result.result

From the tool’s perspective, ctx is a control panel: log something, nudge progress, call an LLM, persist a bit of state. Under the hood, each method chooses the right transport, session, and backend.
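To make the façade idea concrete, here is a minimal sketch of the pattern, not fastmcp's actual implementation. The names FakeSession, FakeStore, and MiniContext are hypothetical stand-ins invented for this example; the point is only that tool code touches one object while the subsystems stay hidden behind it.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeSession:
    """Hypothetical stand-in for a transport session; records what it is sent."""

    sent: list[str] = field(default_factory=list)

    async def send_log(self, level: str, message: str) -> None:
        self.sent.append(f"{level}:{message}")


@dataclass
class FakeStore:
    """Hypothetical stand-in for a key-value backend."""

    data: dict[str, object] = field(default_factory=dict)

    async def put(self, key: str, value: object) -> None:
        self.data[key] = value


class MiniContext:
    """Facade: the only object tool code needs to know about."""

    def __init__(self, session: FakeSession, store: FakeStore, session_id: str) -> None:
        self._session = session
        self._store = store
        self._session_id = session_id

    async def info(self, message: str) -> None:
        # Logging is routed to the session; the tool never sees the session.
        await self._session.send_log("info", message)

    async def set_state(self, key: str, value: object) -> None:
        # State is routed to the store under a session-prefixed key.
        await self._store.put(f"{self._session_id}:{key}", value)


async def my_tool(x: int, ctx: MiniContext) -> int:
    await ctx.info(f"Processing {x}")
    await ctx.set_state("last_input", x)
    return x * 2
```

Swapping FakeSession or FakeStore for a different backend would not change my_tool at all, which is the property the façade is buying.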

Ambient context without globals

Once Context is the control panel, the next question is how the rest of the server grabs the right instance per request, especially in async code. fastmcp answers with ContextVar.

from contextvars import ContextVar, Token
from typing import Literal

_current_context: ContextVar[Context | None] = ContextVar("context", default=None)

TransportType = Literal["stdio", "sse", "streamable-http"]
_current_transport: ContextVar[TransportType | None] = ContextVar(
    "transport", default=None,
)


def set_transport(transport: TransportType) -> Token[TransportType | None]:
    """Set the current transport type. Returns token for reset."""
    return _current_transport.set(transport)

ContextVar is the async-aware counterpart of thread-local storage: asyncio runs each task in its own copy of the current context, so a value set inside one task is invisible to its siblings. Context.__aenter__ installs the current Context into _current_context and wires the other dependency-injection context vars for the FastMCP server, Docket, and worker; __aexit__ resets them.

The result is “ambient” access to ctx, the current transport, and the server instance without shared mutable globals. Internal helpers can ask for the “current context” without accidentally reading or mutating another request’s data.

One operation, two worlds

With ambient context in place, Context can offer single methods that span multiple execution environments. The clearest example is report_progress, which works both for foreground MCP requests and background Docket tasks.

async def report_progress(
    self,
    progress: float,
    total: float | None = None,
    message: str | None = None,
) -> None:
    """Report progress for the current operation."""

    progress_token = (
        self.request_context.meta.progressToken
        if self.request_context and self.request_context.meta
        else None
    )

    # Foreground: send MCP progress notification
    if progress_token is not None:
        await self.session.send_progress_notification(
            progress_token=progress_token,
            progress=progress,
            total=total,
            message=message,
            related_request_id=self.request_id,
        )
        return

    # Background: update Docket execution progress
    from fastmcp.server.dependencies import is_docket_available
    if not is_docket_available():
        return

    try:
        from docket.dependencies import current_execution

        execution = current_execution.get()
        if total is not None:
            await execution.progress.set_total(int(total))

        current = int(progress)
        last: int = getattr(execution, "_fastmcp_last_progress", 0)
        delta = current - last
        if delta > 0:
            await execution.progress.increment(delta)
        execution._fastmcp_last_progress = current

        if message is not None:
            await execution.progress.set_message(message)
    except LookupError:
        # Not running in Docket worker context
        pass

One API, two execution worlds: MCP notifications vs. Docket progress.

A single method covers both cases:

  • Foreground requests, where the MCP client is connected and expects progress notifications.
  • Background tasks running in Docket workers, where progress is stored and exposed through task APIs.

Tool authors never branch; they just call await ctx.report_progress(...) and Context routes to the right mechanism. One refinement worth considering is moving the Docket branch into a helper such as _update_docket_progress(), which would keep report_progress small and confine Docket-specific behavior to one place.
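The background branch has one subtle job: callers report absolute progress, while the backend in the excerpt is driven through increment calls. Below is a small sketch of that conversion, using hypothetical classes (IncrementOnlyProgress, AbsoluteProgressAdapter) rather than Docket's real API. One deliberate difference from the excerpt: this sketch only moves its last-seen mark forward, which is a design choice made here to keep the backend total from drifting if a caller ever reports a lower value, not a claim about how fastmcp behaves.

```python
class IncrementOnlyProgress:
    """Hypothetical backend that can only be advanced by positive increments."""

    def __init__(self) -> None:
        self.current = 0
        self.calls: list[int] = []

    def increment(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("increment must be positive")
        self.calls.append(amount)
        self.current += amount


class AbsoluteProgressAdapter:
    """Turns absolute progress reports into increments for the backend."""

    def __init__(self, backend: IncrementOnlyProgress) -> None:
        self._backend = backend
        self._last = 0  # highest absolute value forwarded so far

    def report(self, progress: float) -> None:
        current = int(progress)  # truncate, as the excerpt does
        delta = current - self._last
        if delta > 0:
            self._backend.increment(delta)
            self._last = current  # only advance the mark when we moved forward


backend = IncrementOnlyProgress()
adapter = AbsoluteProgressAdapter(backend)
for value in (10, 25.9, 20, 40):
    adapter.report(value)
```

The report of 20 after 25.9 is ignored, so the backend ends at the last forward value reported.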

Session memory without leaks

Context also gives tools a way to “remember” things between calls, without resorting to globals that leak across sessions. fastmcp models this as a per‑session key‑value store backed by a pluggable _state_store, plus a request‑local cache for ephemeral objects.

Deriving a stable session key

The first step is getting a durable session_id that works across transports and deployments:

@property
def session_id(self) -> str:
    from uuid import uuid4

    request_ctx = self.request_context
    if request_ctx is not None:
        session = request_ctx.session
    elif self._session is not None:
        session = self._session
    else:
        raise RuntimeError(
            "session_id is not available because no session exists."
        )

    session_id = getattr(session, "_fastmcp_state_prefix", None)
    if session_id is not None:
        return session_id

    if request_ctx is not None:
        request = request_ctx.request
        if request:
            session_id = request.headers.get("mcp-session-id")

    if session_id is None:
        session_id = str(uuid4())

    session._fastmcp_state_prefix = session_id
    return session_id

Think of this as assigning each client a locker. session_id is the locker number; the state store keys are the contents. HTTP clients can bring their own locker number via a header so work can move between machines; long‑lived transports just get a generated UUID.

Durable vs. request‑local state

With a session key in hand, Context offers a simple API that hides two different storage tiers:

def _make_state_key(self, key: str) -> str:
    return f"{self.session_id}:{key}"

async def set_state(self, key: str, value: Any, *, serializable: bool = True) -> None:
    prefixed_key = self._make_state_key(key)
    if not serializable:
        self._request_state[prefixed_key] = value
        return

    self._request_state.pop(prefixed_key, None)
    try:
        await self.fastmcp._state_store.put(
            key=prefixed_key,
            value=StateValue(value=value),
            ttl=self._STATE_TTL_SECONDS,
        )
    except Exception as e:
        if "serialize" in str(e).lower():
            raise TypeError(
                f"Value for state key {key!r} is not serializable. "
                f"Use set_state({key!r}, value, serializable=False)..."
            ) from e
        raise

async def get_state(self, key: str) -> Any:
    prefixed_key = self._make_state_key(key)
    if prefixed_key in self._request_state:
        return self._request_state[prefixed_key]
    result = await self.fastmcp._state_store.get(key=prefixed_key)
    return result.value if result is not None else None

Under the covers there are two kinds of memory:

  • Session‑scoped, serialized state (serializable=True) stored in _state_store with a TTL, shared across requests.
  • Request‑local, non‑serializable state (serializable=False) stored only in _request_state for this Context instance.

To tool authors, it is just “store a value under a key”. The implementation guards against cross-session leakage and against trying to serialize things like DB connections. The main rough edge is the broad except Exception combined with string matching on “serialize”: a backend whose error message is worded differently bypasses the friendly TypeError, and an unrelated failure that happens to mention serialization gets mislabeled. Catching the store’s specific serialization error type would be more robust.
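As an illustration of the two tiers with narrower error handling, here is a self-contained sketch. It is not fastmcp's store: TwoTierState is a hypothetical class, a shared dict stands in for the durable backend, and JSON stands in for whatever serialization the real store uses. The serialization step is done explicitly so only its own errors (TypeError or ValueError from json.dumps) are translated.

```python
import asyncio
import json
from typing import Any


class TwoTierState:
    """Hypothetical two-tier state: durable JSON store plus request-local cache."""

    def __init__(self, session_id: str, durable: dict[str, str]) -> None:
        self._session_id = session_id
        self._durable = durable  # shared across requests and sessions
        self._request_local: dict[str, Any] = {}  # lives only on this instance

    def _key(self, key: str) -> str:
        return f"{self._session_id}:{key}"

    async def set_state(self, key: str, value: Any, *, serializable: bool = True) -> None:
        prefixed = self._key(key)
        if not serializable:
            self._request_local[prefixed] = value
            return
        try:
            payload = json.dumps(value)
        except (TypeError, ValueError) as exc:
            # Only serialization failures are translated; anything else propagates.
            raise TypeError(
                f"Value for state key {key!r} is not serializable. "
                "Pass serializable=False to keep it request-local."
            ) from exc
        self._request_local.pop(prefixed, None)
        self._durable[prefixed] = payload

    async def get_state(self, key: str) -> Any:
        prefixed = self._key(key)
        if prefixed in self._request_local:
            return self._request_local[prefixed]
        payload = self._durable.get(prefixed)
        return json.loads(payload) if payload is not None else None
```

Two instances that share the same durable dict but have different session ids never see each other's keys, and an object stored with serializable=False never reaches the durable tier.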

Talking to humans as a first‑class flow

Context doesn’t just coordinate machines; it also treats “ask the user a question” as a core operation through elicit. This is how tools trigger UI forms and wait for structured human input.

Elicitation acts like a questionnaire service: a tool sends a message plus a form schema; the client renders UI, collects input, and sends back a typed result. The public API is surprisingly simple for what it does.

@overload
async def elicit(
    self,
    message: str,
    response_type: type[T],
    *,
    response_title: str | None = None,
    response_description: str | None = None,
) -> AcceptedElicitation[T] | DeclinedElicitation | CancelledElicitation: ...

...

async def elicit(
    self,
    message: str,
    response_type: type[T]
    | list[str]
    | dict[str, dict[str, str]]
    | list[list[str]]
    | list[dict[str, dict[str, str]]]
    | None = None,
    *,
    response_title: str | None = None,
    response_description: str | None = None,
) -> (
    AcceptedElicitation[T]
    | AcceptedElicitation[dict[str, Any]]
    | AcceptedElicitation[str]
    | AcceptedElicitation[list[str]]
    | DeclinedElicitation
    | CancelledElicitation
):
    if response_type is None and fastmcp.settings.deprecation_warnings:
        warnings.warn(... FastMCPDeprecationWarning ...)

    config = parse_elicit_response_type(
        response_type,
        response_title=response_title,
        response_description=response_description,
    )

    if self.is_background_task:
        result = await self._elicit_for_task(...)
    else:
        result = await self.session.elicit(...)

    if result.action == "accept":
        return handle_elicit_accept(config, result.content)
    elif result.action == "decline":
        return DeclinedElicitation()
    elif result.action == "cancel":
        return CancelledElicitation()
    else:
        raise ValueError(f"Unexpected elicitation action: {result.action}")

Elicitation: one method, foreground and background, with strong typing.

A few aspects illustrate the façade’s role:

  • Overloads ensure that passing a model type yields AcceptedElicitation[T], while choice‑based shorthands return strings or string lists.
  • A deprecation warning nudges callers away from response_type=None, explaining why empty schemas are problematic in some clients.
  • For background tasks, _elicit_for_task switches the Docket execution into an "input required" state and waits for tasks/sendInput, all behind the same ctx.elicit call.

This is a complex interaction—worker queues, MCP, and UI—surfaced as a single, intuitive method, very much in line with the “one control panel” philosophy.

Taming the god object

By now the trade-off is clear: Context does a lot. It is a deliberate, borderline god object: a single class that accumulates many responsibilities because it is the main façade of the framework.

Tool authors expect to find everything on ctx. That expectation is worth preserving, even as the internals grow. The goal is not to split the façade into many user‑visible pieces, but to split implementation behind it.

A gentle refactor strategy follows from that:

  • Keep the public methods stable (ctx.set_state, ctx.sample, ctx.enable_components, ctx.elicit, and so on).
  • Move domain logic into internal helpers or sub‑facades such as _StateFacade, _VisibilityFacade, or an LLM helper, and delegate from Context.
  • Tighten error handling in hot paths (for example, avoiding broad Exception catches in state management) to keep behavior predictable.

This keeps developer experience intact—one control panel—while making it easier for maintainers to reason about logging, state, visibility, sampling, and elicitation as separate concerns.
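Here is a rough sketch of what “split the implementation, keep the surface” could look like. Everything in it is hypothetical: _StateFacade borrows a name from the list above, _LogFacade and SketchContext are invented for the example, and the methods are synchronous to keep it short.

```python
from typing import Any


class _StateFacade:
    """Internal helper: owns key prefixing and storage access."""

    def __init__(self, session_id: str, store: dict[str, Any]) -> None:
        self._session_id = session_id
        self._store = store

    def set(self, key: str, value: Any) -> None:
        self._store[f"{self._session_id}:{key}"] = value

    def get(self, key: str) -> Any:
        return self._store.get(f"{self._session_id}:{key}")


class _LogFacade:
    """Internal helper: owns log record handling."""

    def __init__(self) -> None:
        self.records: list[tuple[str, str]] = []

    def emit(self, level: str, message: str) -> None:
        self.records.append((level, message))


class SketchContext:
    """Public surface stays flat; each method is a one-line delegation."""

    def __init__(self, session_id: str, store: dict[str, Any]) -> None:
        self._state = _StateFacade(session_id, store)
        self._log = _LogFacade()

    def set_state(self, key: str, value: Any) -> None:
        self._state.set(key, value)

    def get_state(self, key: str) -> Any:
        return self._state.get(key)

    def info(self, message: str) -> None:
        self._log.emit("info", message)
```

Tool code still calls ctx.set_state and ctx.info, while each helper can be unit-tested on its own without constructing the whole context.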

Practical takeaways

The fastmcp Context class is a concrete example of one big idea: carefully designed context objects can give developers a single, ergonomic interface to a complex, multi‑transport backend without sacrificing isolation or observability.

From the tour above, a few patterns are worth reusing directly:

  1. Pick a single façade and invest in it. Most tool and app code should go through one well-documented object. Treat that façade as your public API and design it intentionally.
  2. Expose ambient context safely. Use ContextVar (or equivalents) to offer “current request” state without resorting to globals, especially in async servers.
  3. Unify environments behind one API. Methods like report_progress and elicit hide foreground vs. background behavior. Callers should not need to know whether code is running inline or in a worker.
  4. Separate durable and ephemeral state. A simple flag and session-prefixed keys are enough to give tools session memory while avoiding cross-session leaks and serialization traps.
  5. Refactor behind the façade, not through it. As your context object grows, extract internal sub‑components instead of forcing users to learn new entry points.

If you are building an MCP server—or any system where tools need rich per‑request and per‑session context—studying this Context implementation is time well spent. Start by giving users a single control panel, then evolve its internals as your transports, workers, and policies become more sophisticated.

Full Source Code

Direct source from the upstream repository. Preview it inline or open it on GitHub.

heads/main/src/fastmcp/server/context.py

jlowin/fastmcp • refs

Thanks for reading! I hope this was useful. If you have questions or thoughts, feel free to reach out.

Content Creation Process: This article was generated via a semi-automated workflow using AI tools. I prepared the strategic framework, including specific prompts and data sources. From there, the automation system conducted the research, analysis, and writing. The content passed through automated verification steps before being finalized and published without manual intervention.

Mahmoud Zalt

About the Author

I’m Zalt, a technologist with 16+ years of experience, passionate about designing and building AI systems that move us closer to a world where machines handle everything and humans reclaim wonder.

Let's connect if you're working on interesting AI projects, looking for technical advice or want to discuss anything.
