By Mahmoud Zalt
Code Cracking
20m read
Don't treat the Context as a black box: Inside the fastmcp Context pulls back the curtain so engineers can see what the Context contains and how it frames request handling in fastmcp.

Inside the fastmcp Context

A practical tour of a durable server façade

Intro

The fastest way to build resilient systems is to simplify the parts you touch most. In Model Context Protocol (MCP) servers, that’s the request context: logging, progress, sampling, elicitation, and state—over and over.

Welcome! I’m Mahmoud Zalt. In this article, we’ll examine src/fastmcp/server/context.py from the fastmcp project. FastMCP provides a server-side utilities layer and façade around MCP’s RequestContext and ServerSession so you can log to clients, request LLM completions, elicit typed input, work with resources/prompts, and keep per-request state safe—and ergonomic.

Project quick facts: Python 3.10+, async/await, AnyIO/Starlette runtime, with MCP session and request abstractions. This file is the server-layer façade—your single, typed gateway to client capabilities and scoped state.

Why this file matters: it centralizes request semantics. It mitigates risk (state leaks, logging inconsistencies, schema mismatches) and unlocks opportunity (pluggable sampling, validation-backed elicitation, notification deduping) with a clear developer experience.

In the next sections, I’ll show how it works, what’s brilliant, and where we can sharpen it for maintainability, extensibility, usability/DX, scalability, and performance. We’ll go through: How It Works → What’s Brilliant → Areas for Improvement → Performance at Scale → Conclusion.

How It Works

To set the stage, this module implements a high-level Context object that sits in the server layer and delegates to fastmcp.server.server.FastMCP and MCP’s ServerSession/RequestContext. It exposes the operations you need in tools and resources: structured logs sent to the client, progress reporting, listing/reading resources/prompts, sampling (LLM completion) with a fallback to a server handler, elicitation (typed user input) with JSON Schema validation, and per-request state with safe inheritance.

fastmcp/
  src/
    fastmcp/
      server/
        server.py        (FastMCP)
        elicitation.py   (schemas, Accepted/Declined/Cancelled)
        context.py  <--- (this file: Context facade)
      utilities/
        logging.py       (_clamp_logger, get_logger)
        types.py         (get_cached_typeadapter)

Call graph (simplified):

Context.__aenter__ -> set _current_context, inherit state
Context.report_progress -> session.send_progress_notification
Context.log -> _log_to_server_and_client -> session.send_log_message
Context.sample -> (fallback? fastmcp.sampling_handler) : session.create_message
Context.elicit -> get_elicitation_schema -> session.elicit -> validate -> Accepted/Declined/Cancelled
Context._flush_notifications -> [send_*_list_changed] (dedup, under lock)
Module placement and the key call paths

Public API highlights (a usage sketch follows the list):

  • set_context: Synchronous contextmanager that sets the current Context in a ContextVar.
  • Context.__aenter__/__aexit__: Async context manager for request handling and state inheritance.
  • Context.log and debug/info/warning/error: Client-visible logs mirrored to a server logger.
  • report_progress: Sends progress updates if the client includes a token.
  • list_resources, read_resource, list_prompts, get_prompt, list_roots: Resource/prompt accessors via FastMCP and the session.
  • sample: Normalized LLM completions with client call or server fallback.
  • elicit: Typed input with schema derivation and validation, returning Accepted/Declined/Cancelled.
  • session_id: Stable ID per MCP session, derived from headers or generated and persisted on the session.
  • set_state/get_state: Per-request state with parent→child inheritance.
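
To make the list concrete, here is a minimal tool handler using several of these affordances. This is a sketch assuming fastmcp's public decorator API, with the Context injected by type annotation; treat it as illustrative rather than canonical.

Illustrative tool handler built on the Context facade
from fastmcp import FastMCP, Context

mcp = FastMCP("demo")

@mcp.tool()
async def summarize(text: str, ctx: Context) -> str:
    # Client-visible log, mirrored to the server-side logger
    await ctx.info("Summarizing input")
    # Forwarded only if the client supplied a progress token
    await ctx.report_progress(progress=50, total=100)
    # LLM completion via the client, or the server fallback handler
    result = await ctx.sample(f"Summarize in one sentence: {text}")
    return str(result)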

Context propagation and state safety

The module uses a ContextVar to store the active Context, with a minimal synchronous helper to set/reset it. This works seamlessly with async tasks and ensures proper isolation between concurrent requests.

Synchronous context manager for setting the active Context (View on GitHub: L93–L100)
@contextmanager
def set_context(context: Context) -> Generator[Context, None, None]:
    token = _current_context.set(context)
    try:
        yield context
    finally:
        _current_context.reset(token)

A tiny, safe way to establish the current Context, even across nested scopes.

Nested contexts inherit state by deep-copying the parent's _state. The copy isolates the child from the parent across middleware or nested handler calls: mutations in the child never leak upward.

Nested context state inheritance (View on GitHub: L162–L172)
async def __aenter__(self) -> Context:
    """Enter the context manager and set this context as the current context."""
    parent_context = _current_context.get(None)
    if parent_context is not None:
        # Inherit state from parent context
        self._state = copy.deepcopy(parent_context._state)

    # Always set this context and save the token
    token = _current_context.set(self)
    self._tokens.append(token)
    return self

Child contexts can read parent state safely without risking accidental mutation of the parent.
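
To see the guarantee in action, consider this sketch; parent_ctx and child_ctx stand in for Context objects the server constructs for nested handlers.

Illustrative state inheritance across nested contexts
async with parent_ctx:
    parent_ctx.set_state("user", {"id": 42})
    async with child_ctx:
        # The child enters with a deep copy of the parent's state...
        assert child_ctx.get_state("user") == {"id": 42}
        # ...so overwriting it does not touch the parent
        child_ctx.set_state("user", {"id": 7})
    assert parent_ctx.get_state("user") == {"id": 42}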

Client interactions: logs, sampling, elicitation

Logs are mirrored to a server-side logger at DEBUG while being sent to the client at the requested MCP LoggingLevel. Progress is conditionally reported based on a client-supplied token. Sampling normalizes strings or typed messages and either dispatches to the client (via session.create_message) or falls back to a local handler depending on capability and configuration.
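
For sampling, a call from a handler might look like this. Illustrative only; the keyword arguments mirror those the facade forwards to session.create_message.

Illustrative sampling call from a handler
result = await ctx.sample(
    "Summarize this error log in one sentence.",
    system_prompt="You are a concise SRE assistant.",
    temperature=0.2,
    max_tokens=100,
)
# Use the returned completion; see the module for the exact return type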

Elicitation is a thoughtful abstraction: it generates JSON Schema from a type (including handling list[str] as a Literal choice), sends the request, and validates the response with cached type adapters. The return type matches the Accepted/Declined/Cancelled triad used in the rest of the server.
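
In handler code that reads roughly like this (illustrative; the AcceptedElicitation class name is assumed from the Accepted/Declined/Cancelled triad and the elicitation.py module shown in the tree above):

Illustrative elicitation with a constrained choice
from fastmcp.server.elicitation import AcceptedElicitation

result = await ctx.elicit(
    "Which environment should I deploy to?",
    response_type=["staging", "production"],  # converted to a Literal-backed schema
)
if isinstance(result, AcceptedElicitation):
    target = result.data  # validated: "staging" or "production"
else:
    ...  # the user declined or cancelled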

Error handling strategy

Calls that require an active request raise ValueError when misused (e.g., accessing request_context without a request). Sampling without a configured handler when falling back also raises ValueError. Notification flushing intentionally swallows exceptions to avoid breaking request teardown; we’ll revisit this tradeoff later for observability.
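
Handlers that might run outside a request, or against a client without sampling support, can guard for these errors explicitly. A minimal sketch, relying only on the ValueError behavior described above:

Illustrative defensive handling of facade errors
try:
    rc = ctx.request_context  # raises ValueError outside an active request
except ValueError:
    rc = None  # e.g. invoked from a background task detached from a request

try:
    result = await ctx.sample("ping")
except ValueError:
    # Client lacks sampling capability and no server fallback handler is set
    result = None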

What’s Brilliant

Now that we’ve covered the surface, let’s highlight the design choices that make this module pleasant and safe to use.

1) A clean façade over MCP primitives

The class is a true façade: you don’t need to know about ServerSession details to log, sample, elicit, or handle list changes. The Law of Demeter is respected; the raw session is exposed as an escape hatch without being required for everyday use. This keeps handler code small and expressive.

2) Developer experience (DX) wins everywhere

  • Convenience logging via debug/info/warning/error methods. All are consistently mirrored to the to_client_logger server-side logger at DEBUG to keep your server logs complete.
  • Sampling ergonomics: strings or SamplingMessage sequences are accepted; model_preferences gracefully accepts a ModelPreferences instance, a string, or a list of strings.
  • Typed elicitation with automatic schema conversion and validation. Returning Accepted/Declined/Cancelled makes downstream logic straightforward.
  • State inheritance prevents accidental data bleed across nested operations.

3) Sensible invariants and safety checks

  • request_context raises on misuse outside a valid request.
  • Notification topics are deduplicated using a set.
  • Session IDs are stable across transports by persisting to session._fastmcp_id.

4) Elicitation type normalization—done right

Converting list[str] into a Literal and wrapping scalars ensures client-compatible schemas without burdening callers.

Elicitation type normalization (View on GitHub: L587–L606)
        # if the user provided a list of strings, treat it as a Literal
        if isinstance(response_type, list):
            if not all(isinstance(item, str) for item in response_type):
                raise ValueError(
                    "List of options must be a list of strings. Received: "
                    f"{response_type}"
                )
            # Convert list of options to Literal type and wrap
            choice_literal = Literal[tuple(response_type)]  # type: ignore
            response_type = ScalarElicitationType[choice_literal]  # type: ignore
        # if the user provided a primitive scalar, wrap it in an object schema
        elif (
            response_type in {bool, int, float, str}
            or get_origin(response_type) is Literal
            or (isinstance(response_type, type) and issubclass(response_type, Enum))
        ):
            response_type = ScalarElicitationType[response_type]  # type: ignore

        response_type = cast(type[T], response_type)

Callers can stay expressive while the server enforces a protocol-compatible schema and type validation.
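
For intuition about what the Literal conversion yields on the wire, here is an illustrative check with pydantic v2's TypeAdapter (an assumption: the cached type adapters in utilities/types.py wrap the same machinery):

Illustrative JSON Schema produced for a Literal choice
from typing import Literal

from pydantic import TypeAdapter

# A list[str] of options becomes a Literal, which serializes as an enum schema
Choice = Literal["staging", "production"]
print(TypeAdapter(Choice).json_schema())
# -> {'enum': ['staging', 'production'], 'type': 'string'}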

Areas for Improvement

Great code gets even better with targeted, low-risk changes. Here are concrete improvements, tied to impact and proposed fixes.

  • Smell: Global _flush_lock serializes notification flush across all requests.
    Impact: Throughput bottleneck at teardown under concurrency.
    Fix: Use a per-Context lock to eliminate cross-request contention.
  • Smell: Deep copy of state on nested context entry.
    Impact: CPU/memory overhead proportional to state size.
    Fix: Consider a persistent mapping or copy-on-write, or enforce immutability.
  • Smell: Broad exception swallowing in _flush_notifications.
    Impact: Silent failures and lost observability.
    Fix: Log exceptions with request/session context; add a metric.
  • Smell: Access to the private attribute session._fastmcp_id.
    Impact: Upgrade fragility if session internals change.
    Fix: Add a public helper on FastMCP/session wrapper for a session-scoped ID.
  • Smell: No timeouts on network-dependent calls.
    Impact: Risk of hung tasks and resource pile-ups.
    Fix: Wrap calls with anyio.fail_after with configurable defaults.

Apply timeouts to networked operations

Sampling, elicitation, logging, and notifications depend on client responsiveness. Adding explicit timeouts avoids indefinite hangs and clarifies failures. Here’s a targeted refactor of the sampling call:

Timeouts around session.create_message (diff)
--- a/src/fastmcp/server/context.py
+++ b/src/fastmcp/server/context.py
@@
-        result: CreateMessageResult = await self.session.create_message(
+        # Enforce a reasonable timeout to avoid hung tasks
+        # (anyio is already imported at module level for the flush lock)
+        with anyio.fail_after(30):
+            result: CreateMessageResult = await self.session.create_message(
             messages=sampling_messages,
             system_prompt=system_prompt,
             include_context=include_context,
             temperature=temperature,
             max_tokens=max_tokens,
             model_preferences=_parse_model_preferences(model_preferences),
             related_request_id=self.request_id,
-        )
+            )

This enforces a clear boundary (e.g., 30s) and aligns with an SLO like “p95 < 5s; timeout at 30s” for sampling latency.

Improve concurrency by removing the global teardown lock

The current implementation flushes notifications under a global lock, serializing unrelated requests. Switching to a per-Context lock localizes contention and improves throughput during heavy concurrency.

Per-Context lock for notification flushing (diff)
--- a/src/fastmcp/server/context.py
+++ b/src/fastmcp/server/context.py
@@
-_flush_lock = anyio.Lock()
@@ class Context:
         self._state: dict[str, Any] = {}
+        # Per-instance lock: teardown flushes from different requests no longer contend
+        self._flush_lock = anyio.Lock()
@@
-        async with _flush_lock:
+        async with self._flush_lock:
             if not self._notification_queue:
                 return

Removes a global critical section. Each request flushes independently, reducing tail latency at request completion.

Recover observability on flush failures

Silent failures are painful in production. Logging contextual details on flush errors preserves resilience while restoring debuggability.

Log notification flush failures (diff)
--- a/src/fastmcp/server/context.py
+++ b/src/fastmcp/server/context.py
@@
-        except Exception:
-            # Don't let notification failures break the request
-            pass
+        except Exception:
+            # Don't let notification failures break the request, but record them
+            logger.exception(
+                "Failed to flush MCP notifications",
+                extra={
+                    "request_id": self.request_id,
+                    "session_id": self.session_id,
+                    "queued": list(self._notification_queue),
+                },
+            )

This complements metrics like context.notifications.flush_duration_ms and enables alerting when flush failures spike.
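
To feed a metric like that, a thin timing wrapper suffices. A sketch only: flush_with_timing is hypothetical, and the log line stands in for your metrics client's histogram call.

Illustrative timing around the notification flush
import logging
import time

logger = logging.getLogger("fastmcp.telemetry")

async def flush_with_timing(ctx) -> None:
    # Hypothetical wrapper; replace the log line with a real histogram emission
    start = time.perf_counter()
    try:
        await ctx._flush_notifications()
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("context.notifications.flush_duration_ms=%.2f", elapsed_ms)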

Targeted tests to lock behavior

A few focused tests go a long way. For example, verify that session_id persists across calls (and prefers an inbound header when present).

Illustrative test: session_id persistence
# illustrative test (pytest + anyio plugin); ctx_factory is an assumed fixture
# that builds a Context bound to the given fake request context
import types

import pytest

class FakeRequest:
    def __init__(self, headers=None):
        self.headers = headers or {}

class FakeSession:
    pass

class FakeRequestContext:
    def __init__(self, session, request):
        self.session = session
        self.request = request
        self.meta = types.SimpleNamespace(progressToken=None)
        self.request_id = "req-1"

@pytest.mark.anyio
async def test_session_id_persistence(ctx_factory):
    session = FakeSession()
    req = FakeRequest()
    rc = FakeRequestContext(session, req)
    ctx = ctx_factory(rc)
    async with ctx:
        id1 = ctx.session_id
        id2 = ctx.session_id
        assert id1 == id2
        assert getattr(session, "_fastmcp_id") == id1

Ensures a stable key for session-scoped storage across tool invocations.

Performance at Scale

With the basics optimized, we can turn to hot paths, concurrency, and observability so this module performs predictably under load.

Hot paths and resource costs

  • Sampling (Context.sample): Normalization cost is small, but network latency dominates. Apply timeouts and monitor latency histograms.
  • Elicitation: Schema build is O(1); network dominates. Track cancellations and declines to understand user behavior.
  • Logging: Mirrored server logs plus client I/O. Watch for backpressure.
  • Notification flush: O(k) over at most three notification types; make it concurrency-friendly (per-context locks).
  • State deepcopy on nested contexts: cost scales with state size. Keep state small and immutable where possible (see the sketch after this list).
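
A quick micro-benchmark makes the scaling concrete (illustrative only; absolute numbers depend on your machine and the shape of your state):

Illustrative deepcopy cost versus state size
import copy
import time

for n in (10, 1_000, 100_000):
    state = {f"key_{i}": {"value": i} for i in range(n)}
    start = time.perf_counter()
    copy.deepcopy(state)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{n:>7} entries: {elapsed_ms:8.2f} ms")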

Concurrency and contention

  • ContextVar ensures correct context association per task, even when handlers spawn sub-tasks.
  • Global lock (current implementation) serializes notification flush. Switching to a per-context lock avoids cross-request blocking at teardown.

Reliability controls and timeouts

To avoid resource pile-ups, use explicit timeouts for calls such as session.create_message, session.elicit, session.send_log_message, and the notification sends. Pair timeouts with meaningful error mapping and server-side retries when appropriate.
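
One way to apply this uniformly is a small wrapper around client-dependent calls. A minimal sketch, assuming anyio is available; call_with_timeout is a hypothetical helper, not part of the module:

Illustrative timeout wrapper for session calls
import anyio

async def call_with_timeout(coro_fn, *args, timeout: float = 30.0, **kwargs):
    # Run a client-dependent call under a deadline and surface timeouts clearly
    try:
        with anyio.fail_after(timeout):
            return await coro_fn(*args, **kwargs)
    except TimeoutError as exc:
        raise RuntimeError(f"MCP client call timed out after {timeout:.0f}s") from exc

# e.g. await call_with_timeout(ctx.session.create_message, messages=messages)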

Observability: logs, metrics, traces

Instrument the module with a lean, actionable telemetry plan:

  • Logs:
    • Server→client sends with level and related_request_id.
    • Exceptions on notification flush with request_id and session_id.
    • Deprecation warnings for get_http_request.
  • Metrics:
    • mcp.outbound.log_messages_total to observe log volume by level/logger.
    • mcp.sampling.latency_ms with a target like p95 < 5s; timeout at 30s.
    • mcp.elicit.latency_ms with a target like p95 < 30s and cancellation tracking.
    • context.notifications.flush_duration_ms with a target like p95 < 100ms.
    • context.state.size_bytes to bound deepcopy cost (e.g., mean < 10KB).
  • Traces:
    • Spans around sample and elicit including schema build and session calls.
    • Span for _flush_notifications with events per notification type.
  • Alerts:
    • High sampling latency (p95 breaches).
    • Frequent notification flush failures.
    • Spikes in error-level client logs.
    • Timeouts on session.create_message or session.elicit.

Conclusion

FastMCP’s Context is a strong façade over MCP: it gives handlers a clean, typed API for logging, sampling, eliciting, and managing lightweight state. The architecture applies sensible defaults and safety checks, while leaving room to extend capabilities over time.

My top takeaways:

  • Keep the façade clean and forgiving; normalize inputs at the boundary and validate outputs rigorously.
  • Add small reliability features—timeouts and contextual error logs—to turn edge cases into visible, actionable signals.
  • Remove global contention hotspots (like the teardown lock) and measure the hot paths you rely on.

If you’re working with MCP servers, consider adopting this pattern: a single, ergonomic context object with typed affordances and strong invariants. It shortens feedback loops for juniors and gives seniors the operational hooks they need when systems scale.

Explore the source: fastmcp repo · context.py. I hope this walkthrough helps you ship safer, more maintainable MCP servers.

Full Source Code

Here's the full source code of the file that inspired this article.
Read on GitHub


Thanks for reading! I hope this was useful. If you have questions or thoughts, feel free to reach out.

Content Creation Process: This article was generated via a semi-automated workflow using AI tools. I prepared the strategic framework, including specific prompts and data sources. From there, the automation system conducted the research, analysis, and writing. The content passed through automated verification steps before being finalized and published without manual intervention.

Mahmoud Zalt

About the Author

I’m Zalt, a technologist with 15+ years of experience, passionate about designing and building AI systems that move us closer to a world where machines handle everything and humans reclaim wonder.

Let's connect if you're working on interesting AI projects, looking for technical advice or want to discuss your career.
