Inside the fastmcp Context
A practical tour of a durable server facade
Intro
The fastest way to build resilient systems is to simplify the parts you touch most. In Model Context Protocol (MCP) servers, that’s the request context: logging, progress, sampling, elicitation, and state—over and over.
Welcome! I’m Mahmoud Zalt. In this article, we’ll examine src/fastmcp/server/context.py from the fastmcp project. FastMCP provides a server-side utilities layer and façade around MCP’s RequestContext and ServerSession so you can log to clients, request LLM completions, elicit typed input, work with resources/prompts, and keep per-request state both safe and ergonomic.
Project quick facts: Python 3.10+, async/await, AnyIO/Starlette runtime, with MCP session and request abstractions. This file is the server-layer façade—your single, typed gateway to client capabilities and scoped state.
Why this file matters: it centralizes request semantics. It mitigates risk (state leaks, logging inconsistencies, schema mismatches) and unlocks opportunity (pluggable sampling, validation-backed elicitation, notification deduping) with a clear developer experience.
In the next sections, I’ll show how it works, what’s brilliant, and where we can sharpen it for maintainability, extensibility, usability/DX, scalability, and performance. We’ll go through: How It Works → What’s Brilliant → Areas for Improvement → Performance at Scale → Conclusion.
How It Works
To set the stage, this module implements a high-level Context object that sits in the server layer and delegates to fastmcp.server.server.FastMCP and MCP’s ServerSession/RequestContext. It exposes the operations you need in tools and resources: structured logs sent to the client, progress reporting, listing/reading resources/prompts, sampling (LLM completion) with a fallback to a server handler, elicitation (typed user input) with JSON Schema validation, and per-request state with safe inheritance.
```
fastmcp/
  src/
    fastmcp/
      server/
        server.py         (FastMCP)
        elicitation.py    (schemas, Accepted/Declined/Cancelled)
        context.py        <--- this file: the Context facade
      utilities/
        logging.py        (_clamp_logger, get_logger)
        types.py          (get_cached_typeadapter)
```
Call graph (simplified):
```
Context.__aenter__            -> set _current_context, inherit state
Context.report_progress       -> session.send_progress_notification
Context.log                   -> _log_to_server_and_client -> session.send_log_message
Context.sample                -> fallback? fastmcp.sampling_handler : session.create_message
Context.elicit                -> get_elicitation_schema -> session.elicit -> validate -> Accepted/Declined/Cancelled
Context._flush_notifications  -> [send_*_list_changed] (dedup, under lock)
```
Public API highlights:
- `set_context`: synchronous context manager that sets the current `Context` in a `ContextVar`.
- `Context.__aenter__`/`__aexit__`: async context manager for request handling and state inheritance.
- `Context.log` and `debug`/`info`/`warning`/`error`: client-visible logs mirrored to a server logger.
- `report_progress`: sends progress updates if the client includes a token.
- `list_resources`, `read_resource`, `list_prompts`, `get_prompt`, `list_roots`: resource/prompt accessors via FastMCP and the session.
- `sample`: normalized LLM completions with client call or server fallback.
- `elicit`: typed input with schema derivation and validation, returning Accepted/Declined/Cancelled.
- `session_id`: stable ID per MCP session, derived from headers or generated and persisted on the session.
- `set_state`/`get_state`: per-request state with parent→child inheritance.
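To ground the list above, here is a minimal sketch of a tool built on this façade. It assumes the standard FastMCP decorator API; the tool name, prompt, and state key are illustrative, and the sampling result is assumed to be text content.

```python
from fastmcp import Context, FastMCP

mcp = FastMCP("demo")

@mcp.tool()
async def summarize(text: str, ctx: Context) -> str:
    await ctx.info(f"Summarizing {len(text)} characters")  # client-visible log
    await ctx.report_progress(progress=0, total=1)         # no-op if the client sent no token
    completion = await ctx.sample(f"Summarize concisely:\n\n{text}")
    await ctx.report_progress(progress=1, total=1)
    ctx.set_state("last_summary_chars", len(text))         # per-request state
    return completion.text                                  # assuming a text result
```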
Context propagation and state safety
The module uses a ContextVar to store the active Context, with a minimal synchronous helper to set/reset it. This works seamlessly with async tasks and ensures proper isolation between concurrent requests.
```python
@contextmanager
def set_context(context: Context) -> Generator[Context, None, None]:
    token = _current_context.set(context)
    try:
        yield context
    finally:
        _current_context.reset(token)
```
A tiny, safe way to establish the current Context, even across nested scopes.
Nested contexts inherit state by deep-copying the parent’s _state, so a child can read everything the parent set while its own writes stay isolated from the parent across middleware or nested handler calls.
```python
async def __aenter__(self) -> Context:
    """Enter the context manager and set this context as the current context."""
    parent_context = _current_context.get(None)
    if parent_context is not None:
        # Inherit state from parent context
        self._state = copy.deepcopy(parent_context._state)
    # Always set this context and save the token
    token = _current_context.set(self)
    self._tokens.append(token)
    return self
```
Child contexts can read parent state safely without risking accidental mutation of the parent.
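The pattern is easy to reproduce outside fastmcp. The sketch below is a simplified, standalone version (not the library’s code) that shows why deep-copying on entry keeps a child’s writes from leaking back to the parent:

```python
import asyncio
import copy
from contextvars import ContextVar

_current: ContextVar = ContextVar("current_ctx", default=None)

class Ctx:
    def __init__(self) -> None:
        self._state: dict = {}

    async def __aenter__(self) -> "Ctx":
        parent = _current.get()
        if parent is not None:
            # A child starts from a snapshot of the parent's state
            self._state = copy.deepcopy(parent._state)
        self._token = _current.set(self)
        return self

    async def __aexit__(self, *exc_info) -> None:
        _current.reset(self._token)

async def demo() -> None:
    async with Ctx() as parent:
        parent._state["user"] = "alice"
        async with Ctx() as child:
            child._state["user"] = "bob"          # mutates only the child's copy
        assert parent._state["user"] == "alice"   # parent state is untouched

asyncio.run(demo())
```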
Client interactions: logs, sampling, elicitation
Logs are mirrored to a server-side logger at DEBUG while being sent to the client at the requested MCP LoggingLevel. Progress is conditionally reported based on a client-supplied token. Sampling normalizes strings or typed messages and either dispatches to the client (via session.create_message) or falls back to a local handler depending on capability and configuration.
Elicitation is a thoughtful abstraction: it generates JSON Schema from a type (including handling list[str] as a Literal choice), sends the request, and validates the response with cached type adapters. The return type matches the Accepted/Declined/Cancelled triad used in the rest of the server.
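For a sense of the call-site ergonomics, here is a hedged sketch of a tool eliciting a small dataclass. It assumes the Accepted/Declined/Cancelled results are importable from fastmcp.server.elicitation and that the accepted branch exposes the validated value as `.data`; the dataclass and messages are illustrative.

```python
from dataclasses import dataclass

from fastmcp import Context
from fastmcp.server.elicitation import AcceptedElicitation

@dataclass
class ShippingChoice:
    carrier: str
    express: bool

async def confirm_shipping(ctx: Context) -> str:
    result = await ctx.elicit(
        "How should we ship your order?", response_type=ShippingChoice
    )
    if isinstance(result, AcceptedElicitation):
        choice = result.data  # validated ShippingChoice instance
        return f"Shipping via {choice.carrier} (express={choice.express})"
    return "No shipping preference provided"  # declined or cancelled
```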
Error handling strategy
Calls that require an active request raise ValueError when misused (e.g., accessing request_context without a request). Sampling without a configured handler when falling back also raises ValueError. Notification flushing intentionally swallows exceptions to avoid breaking request teardown; we’ll revisit this tradeoff later for observability.
What’s Brilliant
Now that we’ve covered the surface, let’s highlight the design choices that make this module pleasant and safe to use.
1) A clean façade over MCP primitives
The class is a true façade: you don’t need to know about ServerSession details to log, sample, elicit, or handle list changes. The Law of Demeter is respected; the raw session is exposed as an escape hatch without being required for everyday use. This keeps handler code small and expressive.
2) Developer experience (DX) wins everywhere
- Convenience logging via `debug`/`info`/`warning`/`error` methods. All are consistently mirrored to `to_client_logger` at DEBUG to keep your server logs complete.
- Sampling ergonomics: strings or `SamplingMessage` sequences are accepted; `model_preferences` gracefully accepts a `ModelPreferences` instance, a string, or a list of strings.
- Typed elicitation with automatic schema conversion and validation. Returning Accepted/Declined/Cancelled makes downstream logic straightforward.
- State inheritance prevents accidental data bleed across nested operations.
3) Sensible invariants and safety checks
- `request_context` raises on misuse outside a valid request.
- Notification topics are deduplicated using a set.
- Session IDs are stable across transports by persisting to `session._fastmcp_id`.
4) Elicitation type normalization—done right
Converting list[str] into a Literal and wrapping scalars ensures client-compatible schemas without burdening callers.
```python
# if the user provided a list of strings, treat it as a Literal
if isinstance(response_type, list):
    if not all(isinstance(item, str) for item in response_type):
        raise ValueError(
            "List of options must be a list of strings. Received: "
            f"{response_type}"
        )
    # Convert list of options to Literal type and wrap
    choice_literal = Literal[tuple(response_type)]  # type: ignore
    response_type = ScalarElicitationType[choice_literal]  # type: ignore
# if the user provided a primitive scalar, wrap it in an object schema
elif (
    response_type in {bool, int, float, str}
    or get_origin(response_type) is Literal
    or (isinstance(response_type, type) and issubclass(response_type, Enum))
):
    response_type = ScalarElicitationType[response_type]  # type: ignore

response_type = cast(type[T], response_type)
```
Callers can stay expressive while the server enforces a protocol-compatible schema and type validation.
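The list-of-strings path shown above is just as light at the call site. A small sketch, where the option values are illustrative and the accepted `.data` is assumed to be the chosen string:

```python
from fastmcp.server.elicitation import AcceptedElicitation

async def pick_environment(ctx) -> str | None:
    # A plain list of options becomes a Literal-backed schema server-side,
    # so the client sees a constrained choice rather than free-form text.
    result = await ctx.elicit(
        "Pick a deployment environment", response_type=["dev", "staging", "prod"]
    )
    if isinstance(result, AcceptedElicitation):
        return result.data  # assumed to be the chosen option string
    return None
```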
Areas for Improvement
Great code gets even better with targeted, low-risk changes. Here are concrete improvements, tied to impact and proposed fixes.
| Smell | Impact | Fix |
|---|---|---|
| Global `_flush_lock` serializes notification flush across all requests | Throughput bottleneck at teardown under concurrency | Use a per-`Context` lock to eliminate cross-request contention |
| Deep copy of state on nested context entry | CPU/memory overhead proportional to state size | Consider a persistent mapping or copy-on-write (see the sketch after this table), or enforce immutability |
| Broad exception swallowing in `_flush_notifications` | Silent failures and lost observability | Log exceptions with request/session context; add a metric |
| Access to private attribute `session._fastmcp_id` | Upgrade fragility if session internals change | Add a public helper on FastMCP/session wrapper for a session-scoped ID |
| No timeouts on network-dependent calls | Risk of hung tasks and resource pile-ups | Wrap calls with `anyio.fail_after` and configurable defaults |
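On the deep-copy row, one low-risk direction is a layered mapping instead of a full copy on every nested entry. Below is an illustrative sketch using collections.ChainMap, not a drop-in patch for the Context class:

```python
from collections import ChainMap

class LayeredState:
    """Copy-on-write-ish state: reads fall through to the parent,
    writes land only in this instance's own layer."""

    def __init__(self, parent: "LayeredState | None" = None) -> None:
        parent_maps = parent._chain.maps if parent is not None else []
        self._chain = ChainMap({}, *parent_maps)

    def get(self, key: str, default=None):
        return self._chain.get(key, default)

    def set(self, key: str, value) -> None:
        self._chain[key] = value  # ChainMap writes go to the first (own) map

parent = LayeredState()
parent.set("user", "alice")
child = LayeredState(parent)
child.set("user", "bob")
assert parent.get("user") == "alice"   # parent unaffected
assert child.get("user") == "bob"
```

Note this only avoids the copy; mutable values stored by the parent are still shared, which is why the table also mentions enforcing immutability.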
Apply timeouts to networked operations
Sampling, elicitation, logging, and notifications depend on client responsiveness. Adding explicit timeouts avoids indefinite hangs and clarifies failures. Here’s a targeted refactor of the sampling call:
session.create_message (diff):

```diff
--- a/src/fastmcp/server/context.py
+++ b/src/fastmcp/server/context.py
@@
-        result: CreateMessageResult = await self.session.create_message(
-            messages=sampling_messages,
-            system_prompt=system_prompt,
-            include_context=include_context,
-            temperature=temperature,
-            max_tokens=max_tokens,
-            model_preferences=_parse_model_preferences(model_preferences),
-            related_request_id=self.request_id,
-        )
+        # Enforce a reasonable timeout to avoid hung tasks (anyio is already
+        # imported at module scope for the flush lock)
+        with anyio.fail_after(30):
+            result: CreateMessageResult = await self.session.create_message(
+                messages=sampling_messages,
+                system_prompt=system_prompt,
+                include_context=include_context,
+                temperature=temperature,
+                max_tokens=max_tokens,
+                model_preferences=_parse_model_preferences(model_preferences),
+                related_request_id=self.request_id,
+            )
```
This enforces a clear boundary (e.g., 30s) and aligns with an SLO like “p95 < 5s; timeout at 30s” for sampling latency.
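To keep that boundary adjustable per deployment, the timeout can come from configuration rather than a literal. A sketch, where the environment variable name is hypothetical rather than an existing fastmcp setting:

```python
import os

import anyio

# Hypothetical knob; fastmcp does not define this setting today.
SAMPLE_TIMEOUT_SECONDS = float(os.environ.get("FASTMCP_SAMPLE_TIMEOUT", "30"))

async def sample_with_timeout(ctx, prompt: str):
    # Bound the client round-trip with the configured timeout
    with anyio.fail_after(SAMPLE_TIMEOUT_SECONDS):
        return await ctx.sample(prompt)
```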
Improve concurrency by removing the global teardown lock
The current implementation flushes notifications under a global lock, serializing unrelated requests. Switching to a per-Context lock localizes contention and improves throughput during heavy concurrency.
```diff
--- a/src/fastmcp/server/context.py
+++ b/src/fastmcp/server/context.py
@@
-_flush_lock = anyio.Lock()
@@ class Context:
         self._state: dict[str, Any] = {}
+        self._flush_lock = anyio.Lock()
@@
-        async with _flush_lock:
+        async with self._flush_lock:
             if not self._notification_queue:
                 return
```
Recover observability on flush failures
Silent failures are painful in production. Logging contextual details on flush errors preserves resilience while restoring debuggability.
```diff
--- a/src/fastmcp/server/context.py
+++ b/src/fastmcp/server/context.py
@@
-            except Exception:
-                # Don't let notification failures break the request
-                pass
+            except Exception:
+                # Don't let notification failures break the request, but record them
+                logger.exception(
+                    "Failed to flush MCP notifications",
+                    extra={
+                        "request_id": self.request_id,
+                        "session_id": self.session_id,
+                        "queued": list(self._notification_queue),
+                    },
+                )
```
This complements metrics like context.notifications.flush_duration_ms and enables alerting when flush failures spike.
Targeted tests to lock behavior
A few focused tests go a long way. For example, verify that session_id persists across calls (and prefers an inbound header when present).
```python
# Illustrative test (pytest + anyio). Assumes a `ctx_factory` fixture that
# builds a Context around the fake request context below.
import types

import pytest


class FakeRequest:
    def __init__(self, headers=None):
        self.headers = headers or {}


class FakeSession:
    pass


class FakeRequestContext:
    def __init__(self, session, request):
        self.session = session
        self.request = request
        self.meta = types.SimpleNamespace(progressToken=None)
        self.request_id = "req-1"


@pytest.mark.anyio
async def test_session_id_persistence(ctx_factory):
    session = FakeSession()
    req = FakeRequest()
    rc = FakeRequestContext(session, req)
    ctx = ctx_factory(rc)
    async with ctx:
        id1 = ctx.session_id
        id2 = ctx.session_id
        assert id1 == id2
        assert getattr(session, "_fastmcp_id") == id1
```
Performance at Scale
With the basics optimized, we can turn to hot paths, concurrency, and observability so this module performs predictably under load.
Hot paths and resource costs
- Sampling (`Context.sample`): normalization cost is small, but network latency dominates. Apply timeouts and monitor latency histograms.
- Elicitation: schema build is O(1); network dominates. Track cancellations and declines to understand user behavior.
- Logging: mirrored server logs plus client I/O. Watch for backpressure.
- Notification flush: O(k) over at most three notification types; make it concurrency-friendly (per-context locks).
- State deepcopy on nested contexts: cost scales with state size. Keep state small and immutable where possible.
Concurrency and contention
- ContextVar ensures correct context association per task, even when handlers spawn sub-tasks (a standalone sketch follows this list).
- Global lock (current implementation) serializes notification flush. Switching to a per-context lock avoids cross-request blocking at teardown.
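A quick way to convince yourself of the ContextVar behavior: sibling tasks spawned by an anyio task group each get their own context copy, so values set in one request task never leak into another. A standalone, runnable sketch:

```python
import anyio
from contextvars import ContextVar

_request_id: ContextVar[str] = ContextVar("request_id")

async def handle(name: str) -> None:
    _request_id.set(name)               # set in this task's own context copy
    await anyio.sleep(0.01)             # let the sibling task run and set its value
    assert _request_id.get() == name    # still sees its own value, not the sibling's

async def main() -> None:
    async with anyio.create_task_group() as tg:
        tg.start_soon(handle, "req-a")
        tg.start_soon(handle, "req-b")

anyio.run(main)
```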
Reliability controls and timeouts
To avoid resource pile-ups, use explicit timeouts for calls such as session.create_message, session.elicit, session.send_log_message, and the notification sends. Pair timeouts with meaningful error mapping and server-side retries when appropriate.
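One way to apply that consistently is a small wrapper that bounds any client-facing await and maps a timeout into a descriptive error. This is a sketch under stated assumptions; the error type, helper name, and default are illustrative:

```python
from collections.abc import Awaitable, Callable
from typing import TypeVar

import anyio

T = TypeVar("T")

class ClientTimeoutError(RuntimeError):
    """Raised when the connected client does not answer in time (illustrative type)."""

async def call_client(
    op_name: str,
    fn: Callable[[], Awaitable[T]],
    timeout: float = 30.0,
) -> T:
    try:
        with anyio.fail_after(timeout):
            return await fn()
    except TimeoutError as exc:
        # Map the raw timeout into something the server layer can report cleanly
        raise ClientTimeoutError(f"{op_name} timed out after {timeout}s") from exc

# usage (inside Context methods):
#   result = await call_client("sampling", lambda: self.session.create_message(...))
```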
Observability: logs, metrics, traces
Instrument the module with a lean, actionable telemetry plan:
- Logs:
  - Server→client sends with level and `related_request_id`.
  - Exceptions on notification flush with `request_id` and `session_id`.
  - Deprecation warnings for `get_http_request`.
- Metrics (a recording sketch follows this list):
  - `mcp.outbound.log_messages_total` to observe log volume by level/logger.
  - `mcp.sampling.latency_ms` with a target like p95 < 5s; timeout at 30s.
  - `mcp.elicit.latency_ms` with a target like p95 < 30s and cancellation tracking.
  - `context.notifications.flush_duration_ms` with a target like p95 < 100ms.
  - `context.state.size_bytes` to bound deepcopy cost (e.g., mean < 10KB).
- Traces:
  - Spans around `sample` and `elicit`, including schema build and session calls.
  - A span for `_flush_notifications` with events per notification type.
- Alerts:
  - High sampling latency (p95 breaches).
  - Frequent notification flush failures.
  - Spikes in error-level client logs.
  - Timeouts on `session.create_message` or `session.elicit`.
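If OpenTelemetry is already in the stack, the sampling histogram above maps directly onto a meter. A sketch using the metric name from the plan; the instrumentation point and attributes are illustrative:

```python
import time

from opentelemetry import metrics

meter = metrics.get_meter("fastmcp.context")
sampling_latency = meter.create_histogram("mcp.sampling.latency_ms", unit="ms")

async def timed_sample(ctx, prompt: str):
    start = time.perf_counter()
    try:
        return await ctx.sample(prompt)
    finally:
        # Record latency even on failure or timeout
        elapsed_ms = (time.perf_counter() - start) * 1000
        sampling_latency.record(elapsed_ms, attributes={"request_id": ctx.request_id})
```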
Conclusion
FastMCP’s Context is a strong façade over MCP: it gives handlers a clean, typed API for logging, sampling, eliciting, and managing lightweight state. The architecture applies sensible defaults and safety checks, while leaving room to extend capabilities over time.
My top takeaways:
- Keep the façade clean and forgiving; normalize inputs at the boundary and validate outputs rigorously.
- Add small reliability features—timeouts and contextual error logs—to turn edge cases into visible, actionable signals.
- Remove global contention hotspots (like the teardown lock) and measure the hot paths you rely on.
If you’re working with MCP servers, consider adopting this pattern: a single, ergonomic context object with typed affordances and strong invariants. It shortens feedback loops for juniors and gives seniors the operational hooks they need when systems scale.
Explore the source: fastmcp repo · context.py. I hope this walkthrough helps you ship safer, more maintainable MCP servers.



