Inside the fastmcp Context
A practical tour of a durable server facade
Intro
The fastest way to build resilient systems is to simplify the parts you touch most. In Model Context Protocol (MCP) servers, that’s the request context: logging, progress, sampling, elicitation, and state—over and over.
Welcome! I’m Mahmoud Zalt. In this article, we’ll examine src/fastmcp/server/context.py from the fastmcp project. FastMCP provides a server-side utilities layer and façade around MCP’s RequestContext and ServerSession so you can log to clients, request LLM completions, elicit typed input, work with resources/prompts, and keep per-request state both safe and ergonomic.
Project quick facts: Python 3.10+, async/await, AnyIO/Starlette runtime, with MCP session and request abstractions. This file is the server-layer façade—your single, typed gateway to client capabilities and scoped state.
Why this file matters: it centralizes request semantics. It mitigates risk (state leaks, logging inconsistencies, schema mismatches) and unlocks opportunity (pluggable sampling, validation-backed elicitation, notification deduping) with a clear developer experience.
In the next sections, I’ll show how it works, what’s brilliant, and where we can sharpen it for maintainability, extensibility, usability/DX, scalability, and performance. We’ll go through: How It Works → What’s Brilliant → Areas for Improvement → Performance at Scale → Conclusion.
How It Works
To set the stage, this module implements a high-level Context object that sits in the server layer and delegates to fastmcp.server.server.FastMCP and MCP’s ServerSession/RequestContext. It exposes the operations you need in tools and resources: structured logs sent to the client, progress reporting, listing/reading resources/prompts, sampling (LLM completion) with a fallback to a server handler, elicitation (typed user input) with JSON Schema validation, and per-request state with safe inheritance.
```
fastmcp/
  src/
    fastmcp/
      server/
        server.py         (FastMCP)
        elicitation.py    (schemas, Accepted/Declined/Cancelled)
        context.py        <--- this file: the Context facade
      utilities/
        logging.py        (_clamp_logger, get_logger)
        types.py          (get_cached_typeadapter)
```
Call graph (simplified):
```
Context.__aenter__            -> set _current_context, inherit state
Context.report_progress       -> session.send_progress_notification
Context.log                   -> _log_to_server_and_client -> session.send_log_message
Context.sample                -> fallback? fastmcp.sampling_handler : session.create_message
Context.elicit                -> get_elicitation_schema -> session.elicit -> validate -> Accepted/Declined/Cancelled
Context._flush_notifications  -> [send_*_list_changed] (dedup, under lock)
```
Public API highlights:
- `set_context`: synchronous context manager that sets the current `Context` in a `ContextVar`.
- `Context.__aenter__`/`__aexit__`: async context manager for request handling and state inheritance.
- `Context.log` and `debug`/`info`/`warning`/`error`: client-visible logs mirrored to a server logger.
- `report_progress`: sends progress updates if the client includes a token.
- `list_resources`, `read_resource`, `list_prompts`, `get_prompt`, `list_roots`: resource/prompt accessors via FastMCP and the session.
- `sample`: normalized LLM completions with client call or server fallback.
- `elicit`: typed input with schema derivation and validation, returning Accepted/Declined/Cancelled.
- `session_id`: stable ID per MCP session, derived from headers or generated and persisted on the session.
- `set_state`/`get_state`: per-request state with parent→child inheritance.
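To ground the list above, here is a minimal sketch of a tool built on this façade. It assumes the standard FastMCP decorator API; the tool name, prompt, and state key are illustrative, and the sampling result is assumed to be text content.

```python
from fastmcp import Context, FastMCP

mcp = FastMCP("demo")

@mcp.tool()
async def summarize(text: str, ctx: Context) -> str:
    await ctx.info(f"Summarizing {len(text)} characters")  # client-visible log
    await ctx.report_progress(progress=0, total=1)         # no-op if the client sent no token
    completion = await ctx.sample(f"Summarize concisely:\n\n{text}")
    await ctx.report_progress(progress=1, total=1)
    ctx.set_state("last_summary_chars", len(text))         # per-request state
    return completion.text                                  # assuming a text result
```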
Context propagation and state safety
The module uses a ContextVar to store the active Context, with a minimal synchronous helper to set/reset it. This works seamlessly with async tasks and ensures proper isolation between concurrent requests.
```python
@contextmanager
def set_context(context: Context) -> Generator[Context, None, None]:
    token = _current_context.set(context)
    try:
        yield context
    finally:
        _current_context.reset(token)
```
A tiny, safe way to establish the current Context, even across nested scopes.
Nested contexts inherit state by deep-copying the parent’s _state, so a child can read everything the parent set while its own writes stay isolated from the parent across middleware or nested handler calls.
```python
async def __aenter__(self) -> Context:
    """Enter the context manager and set this context as the current context."""
    parent_context = _current_context.get(None)
    if parent_context is not None:
        # Inherit state from parent context
        self._state = copy.deepcopy(parent_context._state)
    # Always set this context and save the token
    token = _current_context.set(self)
    self._tokens.append(token)
    return self
```
Child contexts can read parent state safely without risking accidental mutation of the parent.
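The pattern is easy to reproduce outside fastmcp. The sketch below is a simplified, standalone version (not the library’s code) that shows why deep-copying on entry keeps a child’s writes from leaking back to the parent:

```python
import asyncio
import copy
from contextvars import ContextVar

_current: ContextVar = ContextVar("current_ctx", default=None)

class Ctx:
    def __init__(self) -> None:
        self._state: dict = {}

    async def __aenter__(self) -> "Ctx":
        parent = _current.get()
        if parent is not None:
            # A child starts from a snapshot of the parent's state
            self._state = copy.deepcopy(parent._state)
        self._token = _current.set(self)
        return self

    async def __aexit__(self, *exc_info) -> None:
        _current.reset(self._token)

async def demo() -> None:
    async with Ctx() as parent:
        parent._state["user"] = "alice"
        async with Ctx() as child:
            child._state["user"] = "bob"          # mutates only the child's copy
        assert parent._state["user"] == "alice"   # parent state is untouched

asyncio.run(demo())
```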
Client interactions: logs, sampling, elicitation
Logs are mirrored to a server-side logger at DEBUG while being sent to the client at the requested MCP LoggingLevel. Progress is conditionally reported based on a client-supplied token. Sampling normalizes strings or typed messages and either dispatches to the client (via session.create_message) or falls back to a local handler depending on capability and configuration.
Elicitation is a thoughtful abstraction: it generates JSON Schema from a type (including handling list[str] as a Literal choice), sends the request, and validates the response with cached type adapters. The return type matches the Accepted/Declined/Cancelled triad used in the rest of the server.
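For a sense of the call-site ergonomics, here is a hedged sketch of a tool eliciting a small dataclass. It assumes the Accepted/Declined/Cancelled results are importable from fastmcp.server.elicitation and that the accepted branch exposes the validated value as `.data`; the dataclass and messages are illustrative.

```python
from dataclasses import dataclass

from fastmcp import Context
from fastmcp.server.elicitation import AcceptedElicitation

@dataclass
class ShippingChoice:
    carrier: str
    express: bool

async def confirm_shipping(ctx: Context) -> str:
    result = await ctx.elicit(
        "How should we ship your order?", response_type=ShippingChoice
    )
    if isinstance(result, AcceptedElicitation):
        choice = result.data  # validated ShippingChoice instance
        return f"Shipping via {choice.carrier} (express={choice.express})"
    return "No shipping preference provided"  # declined or cancelled
```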
Error handling strategy
Calls that require an active request raise ValueError when misused (e.g., accessing request_context without a request). Sampling without a configured handler when falling back also raises ValueError. Notification flushing intentionally swallows exceptions to avoid breaking request teardown; we’ll revisit this tradeoff later for observability.
What’s Brilliant
Now that we’ve covered the surface, let’s highlight the design choices that make this module pleasant and safe to use.
1) A clean façade over MCP primitives
The class is a true façade: you don’t need to know about ServerSession details to log, sample, elicit, or handle list changes. The Law of Demeter is respected; the raw session is exposed as an escape hatch without being required for everyday use. This keeps handler code small and expressive.
2) Developer experience (DX) wins everywhere
- Convenience logging via `debug`/`info`/`warning`/`error` methods. All are consistently mirrored to `to_client_logger` at DEBUG to keep your server logs complete.
- Sampling ergonomics: strings or `SamplingMessage` sequences are accepted; `model_preferences` gracefully accepts a `ModelPreferences` instance, a string, or a list of strings.
- Typed elicitation with automatic schema conversion and validation. Returning Accepted/Declined/Cancelled makes downstream logic straightforward.
- State inheritance prevents accidental data bleed across nested operations.
3) Sensible invariants and safety checks
- `request_context` raises on misuse outside a valid request.
- Notification topics are deduplicated using a set.
- Session IDs are stable across transports by persisting to `session._fastmcp_id`.
4) Elicitation type normalization—done right
Converting list[str] into a Literal and wrapping scalars ensures client-compatible schemas without burdening callers.
```python
# if the user provided a list of strings, treat it as a Literal
if isinstance(response_type, list):
    if not all(isinstance(item, str) for item in response_type):
        raise ValueError(
            "List of options must be a list of strings. Received: "
            f"{response_type}"
        )
    # Convert list of options to Literal type and wrap
    choice_literal = Literal[tuple(response_type)]  # type: ignore
    response_type = ScalarElicitationType[choice_literal]  # type: ignore
# if the user provided a primitive scalar, wrap it in an object schema
elif (
    response_type in {bool, int, float, str}
    or get_origin(response_type) is Literal
    or (isinstance(response_type, type) and issubclass(response_type, Enum))
):
    response_type = ScalarElicitationType[response_type]  # type: ignore

response_type = cast(type[T], response_type)
```
Callers can stay expressive while the server enforces a protocol-compatible schema and type validation.
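The list-of-strings path shown above is just as light at the call site. A small sketch, where the option values are illustrative and the accepted `.data` is assumed to be the chosen string:

```python
from fastmcp.server.elicitation import AcceptedElicitation

async def pick_environment(ctx) -> str | None:
    # A plain list of options becomes a Literal-backed schema server-side,
    # so the client sees a constrained choice rather than free-form text.
    result = await ctx.elicit(
        "Pick a deployment environment", response_type=["dev", "staging", "prod"]
    )
    if isinstance(result, AcceptedElicitation):
        return result.data  # assumed to be the chosen option string
    return None
```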
Areas for Improvement
Great code gets even better with targeted, low-risk changes. Here are concrete improvements, tied to impact and proposed fixes.
| Smell | Impact | Fix |
|---|---|---|
| Global `_flush_lock` serializes notification flush across all requests | Throughput bottleneck at teardown under concurrency | Use a per-`Context` lock to eliminate cross-request contention |
| Deep copy of state on nested context entry | CPU/memory overhead proportional to state size | Consider a persistent mapping or copy-on-write (see the sketch after this table), or enforce immutability |
| Broad exception swallowing in `_flush_notifications` | Silent failures and lost observability | Log exceptions with request/session context; add a metric |
| Access to private attribute `session._fastmcp_id` | Upgrade fragility if session internals change | Add a public helper on FastMCP/session wrapper for a session-scoped ID |
| No timeouts on network-dependent calls | Risk of hung tasks and resource pile-ups | Wrap calls with `anyio.fail_after` and configurable defaults |
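On the deep-copy row, one low-risk direction is a layered mapping instead of a full copy on every nested entry. Below is an illustrative sketch using collections.ChainMap, not a drop-in patch for the Context class:

```python
from collections import ChainMap

class LayeredState:
    """Copy-on-write-ish state: reads fall through to the parent,
    writes land only in this instance's own layer."""

    def __init__(self, parent: "LayeredState | None" = None) -> None:
        parent_maps = parent._chain.maps if parent is not None else []
        self._chain = ChainMap({}, *parent_maps)

    def get(self, key: str, default=None):
        return self._chain.get(key, default)

    def set(self, key: str, value) -> None:
        self._chain[key] = value  # ChainMap writes go to the first (own) map

parent = LayeredState()
parent.set("user", "alice")
child = LayeredState(parent)
child.set("user", "bob")
assert parent.get("user") == "alice"   # parent unaffected
assert child.get("user") == "bob"
```

Note this only avoids the copy; mutable values stored by the parent are still shared, which is why the table also mentions enforcing immutability.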
Apply timeouts to networked operations
Sampling, elicitation, logging, and notifications depend on client responsiveness. Adding explicit timeouts avoids indefinite hangs and clarifies failures. Here’s a targeted refactor of the sampling call:
session.create_message (diff):

```diff
--- a/src/fastmcp/server/context.py
+++ b/src/fastmcp/server/context.py
@@
-        result: CreateMessageResult = await self.session.create_message(
-            messages=sampling_messages,
-            system_prompt=system_prompt,
-            include_context=include_context,
-            temperature=temperature,
-            max_tokens=max_tokens,
-            model_preferences=_parse_model_preferences(model_preferences),
-            related_request_id=self.request_id,
-        )
+        # Enforce a reasonable timeout to avoid hung tasks (anyio is already
+        # imported at module scope for the flush lock)
+        with anyio.fail_after(30):
+            result: CreateMessageResult = await self.session.create_message(
+                messages=sampling_messages,
+                system_prompt=system_prompt,
+                include_context=include_context,
+                temperature=temperature,
+                max_tokens=max_tokens,
+                model_preferences=_parse_model_preferences(model_preferences),
+                related_request_id=self.request_id,
+            )
```
This enforces a clear boundary (e.g., 30s) and aligns with an SLO like “p95 < 5s; timeout at 30s” for sampling latency.
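To keep that boundary adjustable per deployment, the timeout can come from configuration rather than a literal. A sketch, where the environment variable name is hypothetical rather than an existing fastmcp setting:

```python
import os

import anyio

# Hypothetical knob; fastmcp does not define this setting today.
SAMPLE_TIMEOUT_SECONDS = float(os.environ.get("FASTMCP_SAMPLE_TIMEOUT", "30"))

async def sample_with_timeout(ctx, prompt: str):
    # Bound the client round-trip with the configured timeout
    with anyio.fail_after(SAMPLE_TIMEOUT_SECONDS):
        return await ctx.sample(prompt)
```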
Improve concurrency by removing the global teardown lock
The current implementation flushes notifications under a global lock, serializing unrelated requests. Switching to a per-Context lock localizes contention and improves throughput during heavy concurrency.
```diff
--- a/src/fastmcp/server/context.py
+++ b/src/fastmcp/server/context.py
@@
-_flush_lock = anyio.Lock()
@@ class Context:
         self._state: dict[str, Any] = {}
+        self._flush_lock = anyio.Lock()
@@
-        async with _flush_lock:
+        async with self._flush_lock:
             if not self._notification_queue:
                 return
```
Recover observability on flush failures
Silent failures are painful in production. Logging contextual details on flush errors preserves resilience while restoring debuggability.
```diff
--- a/src/fastmcp/server/context.py
+++ b/src/fastmcp/server/context.py
@@
-            except Exception:
-                # Don't let notification failures break the request
-                pass
+            except Exception:
+                # Don't let notification failures break the request, but record them
+                logger.exception(
+                    "Failed to flush MCP notifications",
+                    extra={
+                        "request_id": self.request_id,
+                        "session_id": self.session_id,
+                        "queued": list(self._notification_queue),
+                    },
+                )
```
This complements metrics like context.notifications.flush_duration_ms and enables alerting when flush failures spike.
Targeted tests to lock behavior
A few focused tests go a long way. For example, verify that session_id persists across calls (and prefers an inbound header when present).
```python
# Illustrative test (pytest + anyio). Assumes a `ctx_factory` fixture that
# builds a Context around the fake request context below.
import types

import pytest


class FakeRequest:
    def __init__(self, headers=None):
        self.headers = headers or {}


class FakeSession:
    pass


class FakeRequestContext:
    def __init__(self, session, request):
        self.session = session
        self.request = request
        self.meta = types.SimpleNamespace(progressToken=None)
        self.request_id = "req-1"


@pytest.mark.anyio
async def test_session_id_persistence(ctx_factory):
    session = FakeSession()
    req = FakeRequest()
    rc = FakeRequestContext(session, req)
    ctx = ctx_factory(rc)
    async with ctx:
        id1 = ctx.session_id
        id2 = ctx.session_id
        assert id1 == id2
        assert getattr(session, "_fastmcp_id") == id1
```
Performance at Scale
With the basics optimized, we can turn to hot paths, concurrency, and observability so this module performs predictably under load.
Hot paths and resource costs
- Sampling (`Context.sample`): normalization cost is small, but network latency dominates. Apply timeouts and monitor latency histograms.
- Elicitation: schema build is O(1); network dominates. Track cancellations and declines to understand user behavior.
- Logging: mirrored server logs plus client I/O. Watch for backpressure.
- Notification flush: O(k) over at most three notification types; make it concurrency-friendly (per-context locks).
- State deepcopy on nested contexts: cost scales with state size. Keep state small and immutable where possible.
Concurrency and contention
- ContextVar ensures correct context association per task, even when handlers spawn sub-tasks (a standalone sketch follows this list).
- Global lock (current implementation) serializes notification flush. Switching to a per-context lock avoids cross-request blocking at teardown.
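A quick way to convince yourself of the ContextVar behavior: sibling tasks spawned by an anyio task group each get their own context copy, so values set in one request task never leak into another. A standalone, runnable sketch:

```python
import anyio
from contextvars import ContextVar

_request_id: ContextVar[str] = ContextVar("request_id")

async def handle(name: str) -> None:
    _request_id.set(name)               # set in this task's own context copy
    await anyio.sleep(0.01)             # let the sibling task run and set its value
    assert _request_id.get() == name    # still sees its own value, not the sibling's

async def main() -> None:
    async with anyio.create_task_group() as tg:
        tg.start_soon(handle, "req-a")
        tg.start_soon(handle, "req-b")

anyio.run(main)
```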
Reliability controls and timeouts
To avoid resource pile-ups, use explicit timeouts for calls such as session.create_message, session.elicit, session.send_log_message, and the notification sends. Pair timeouts with meaningful error mapping and server-side retries when appropriate.
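One way to apply that consistently is a small wrapper that bounds any client-facing await and maps a timeout into a descriptive error. This is a sketch under stated assumptions; the error type, helper name, and default are illustrative:

```python
from collections.abc import Awaitable, Callable
from typing import TypeVar

import anyio

T = TypeVar("T")

class ClientTimeoutError(RuntimeError):
    """Raised when the connected client does not answer in time (illustrative type)."""

async def call_client(
    op_name: str,
    fn: Callable[[], Awaitable[T]],
    timeout: float = 30.0,
) -> T:
    try:
        with anyio.fail_after(timeout):
            return await fn()
    except TimeoutError as exc:
        # Map the raw timeout into something the server layer can report cleanly
        raise ClientTimeoutError(f"{op_name} timed out after {timeout}s") from exc

# usage (inside Context methods):
#   result = await call_client("sampling", lambda: self.session.create_message(...))
```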
Observability: logs, metrics, traces
Instrument the module with a lean, actionable telemetry plan:
- Logs:
  - Server→client sends with level and `related_request_id`.
  - Exceptions on notification flush with `request_id` and `session_id`.
  - Deprecation warnings for `get_http_request`.
- Metrics (a recording sketch follows this list):
  - `mcp.outbound.log_messages_total` to observe log volume by level/logger.
  - `mcp.sampling.latency_ms` with a target like p95 < 5s; timeout at 30s.
  - `mcp.elicit.latency_ms` with a target like p95 < 30s and cancellation tracking.
  - `context.notifications.flush_duration_ms` with a target like p95 < 100ms.
  - `context.state.size_bytes` to bound deepcopy cost (e.g., mean < 10KB).
- Traces:
  - Spans around `sample` and `elicit`, including schema build and session calls.
  - A span for `_flush_notifications` with events per notification type.
- Alerts:
  - High sampling latency (p95 breaches).
  - Frequent notification flush failures.
  - Spikes in error-level client logs.
  - Timeouts on `session.create_message` or `session.elicit`.
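If OpenTelemetry is already in the stack, the sampling histogram above maps directly onto a meter. A sketch using the metric name from the plan; the instrumentation point and attributes are illustrative:

```python
import time

from opentelemetry import metrics

meter = metrics.get_meter("fastmcp.context")
sampling_latency = meter.create_histogram("mcp.sampling.latency_ms", unit="ms")

async def timed_sample(ctx, prompt: str):
    start = time.perf_counter()
    try:
        return await ctx.sample(prompt)
    finally:
        # Record latency even on failure or timeout
        elapsed_ms = (time.perf_counter() - start) * 1000
        sampling_latency.record(elapsed_ms, attributes={"request_id": ctx.request_id})
```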
Conclusion
FastMCP’s Context is a strong façade over MCP: it gives handlers a clean, typed API for logging, sampling, eliciting, and managing lightweight state. The architecture applies sensible defaults and safety checks, while leaving room to extend capabilities over time.
My top takeaways:
- Keep the façade clean and forgiving; normalize inputs at the boundary and validate outputs rigorously.
- Add small reliability features—timeouts and contextual error logs—to turn edge cases into visible, actionable signals.
- Remove global contention hotspots (like the teardown lock) and measure the hot paths you rely on.
If you’re working with MCP servers, consider adopting this pattern: a single, ergonomic context object with typed affordances and strong invariants. It shortens feedback loops for juniors and gives seniors the operational hooks they need when systems scale.
Explore the source: fastmcp repo · context.py. I hope this walkthrough helps you ship safer, more maintainable MCP servers.



