We’re examining how Langfuse calls multiple LLM providers through a single TypeScript function. Langfuse is an observability and analytics platform for LLM applications, and at its core it needs to talk to OpenAI, Anthropic, Bedrock, Vertex, and others without leaking that complexity into the rest of the system.
I’m Mahmoud Zalt, an AI software engineer, and
we’ll use Langfuse’s fetchLLMCompletion as a concrete example of
how to design a universal LLM dialer: one stable function that hides
provider quirks, message formats, credentials, streaming, and errors.
The core lesson is simple: treat LLM providers as infrastructure and put one well‑designed facade in front of them. Everything in this article shows how that decision pays off in message handling, adapters, error semantics, and operations.
The scene: one dialer, many networks
To see what problem this file solves, we need a quick look at where it lives in the codebase.
packages/
shared/
src/
server/
llm/
types.ts
errors.ts
utils.ts
getInternalTracingHandler.ts
fetchLLMCompletion.ts <--- unified LLM invocation facade
fetchLLMCompletion.ts sits in a shared server layer, between
Langfuse and external LLM providers.
Conceptually, this file exposes one public function:
fetchLLMCompletion. Callers pass messages, model configuration
and connection details; the function chooses the right LangChain client
(OpenAI, Azure, Anthropic, Bedrock, Vertex, Google AI Studio), wires
authentication, decides whether to stream, sets up tools or structured
output, and normalizes errors.
Think of it as a universal LLM dialer: callers just dial a model, this module handles the country codes, networks, and routing rules. The rest of the system never needs to know which provider actually served the request.
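In spirit, the facade's entry point looks something like the sketch below. All type and field names here are simplified stand-ins — the real fetchLLMCompletion accepts far richer parameters — so treat it as an illustration of the contract, not Langfuse's actual signature.

```typescript
// Hypothetical, simplified shape of the facade's entry point.
// Real parameter and type names in Langfuse differ; this only
// illustrates the "one dialer, many networks" contract.
type ChatMessage = { role: string; content: string };
type ModelParams = { adapter: string; model: string; temperature?: number };

interface FetchLLMCompletionParams {
  messages: ChatMessage[];
  modelParams: ModelParams;
  apiKey: string;
  baseURL?: string;
  streaming: boolean;
}

// Callers dial a model; the facade routes to the right provider client.
async function fetchLLMCompletion(
  params: FetchLLMCompletionParams,
): Promise<string> {
  switch (params.modelParams.adapter) {
    case "openai":
    case "anthropic":
    case "bedrock":
      // ...construct the matching LangChain client and invoke it...
      return `routed to ${params.modelParams.adapter}`;
    default:
      throw new Error(`Unknown adapter: ${params.modelParams.adapter}`);
  }
}
```

The point is the single switch on the adapter: callers never branch on providers themselves.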
Normalizing messages at the boundary
Every LLM SDK has its own idea of what a chat message looks like. If you let those schemas leak, switching providers becomes a minefield of subtle bugs. The first responsibility of the universal dialer is to own this translation layer.
Langfuse uses a project‑wide ChatMessage type. Inside
fetchLLMCompletion.ts, those are converted into LangChain’s
BaseMessage variants (HumanMessage,
SystemMessage, AIMessage, ToolMessage)
while enforcing provider‑specific rules.
Providers that demand a user message
Some providers reject a request that contains only a system or developer message. That’s not something you want every caller to remember, so the facade quietly fixes it for adapters that require at least one user message.
const PROVIDERS_WITH_REQUIRED_USER_MESSAGE = [
LLMAdapter.VertexAI,
LLMAdapter.GoogleAIStudio,
LLMAdapter.Anthropic,
LLMAdapter.Bedrock,
];
const transformSystemMessageToUserMessage = (
messages: ChatMessage[],
): BaseMessage[] => {
const safeContent =
typeof messages[0].content === "string"
? messages[0].content
: JSON.stringify(messages[0].content);
return [new HumanMessage(safeContent)];
};
If there is exactly one message and the adapter is in that list, the system
rewrites the system/developer message into a HumanMessage. The
call becomes valid for the provider, and the rest of the code doesn’t need
to know this quirk exists.
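Stripped of the LangChain classes, the rewrite behaves like this small sketch — HumanMessage here is a minimal stand-in and toRequiredUserMessage is an illustrative name, not Langfuse code:

```typescript
// Minimal stand-in for the LangChain message class used above.
class HumanMessage {
  constructor(public content: string) {}
}

type ChatMessage = { role: string; content: unknown };

// Mirrors transformSystemMessageToUserMessage: a lone system/developer
// message is rewritten into a user message the provider will accept.
function toRequiredUserMessage(messages: ChatMessage[]): HumanMessage[] {
  const first = messages[0];
  const safeContent =
    typeof first.content === "string"
      ? first.content
      : JSON.stringify(first.content);
  return [new HumanMessage(safeContent)];
}
```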
Role‑aware mapping and defensive content handling
The main mapping logic is where the “customs office” for messages really lives:
let finalMessages: BaseMessage[];
if (
messages.length === 1 &&
PROVIDERS_WITH_REQUIRED_USER_MESSAGE.includes(modelParams.adapter)
) {
finalMessages = transformSystemMessageToUserMessage(messages);
} else {
finalMessages = messages.map((message, idx) => {
const safeContent =
typeof message.content === "string"
? message.content
: safeStringify(message.content);
if (message.role === ChatMessageRole.User)
return new HumanMessage(safeContent);
if (
message.role === ChatMessageRole.System ||
message.role === ChatMessageRole.Developer
)
return idx === 0
? new SystemMessage(safeContent)
: new HumanMessage(safeContent);
if (message.type === ChatMessageType.ToolResult) {
return new ToolMessage({
content: safeContent,
tool_call_id: message.toolCallId,
});
}
return new AIMessage({
content: safeContent,
tool_calls:
message.type === ChatMessageType.AssistantToolCall
? (message.toolCalls as any)
: undefined,
});
});
}
finalMessages = finalMessages.filter(
(m) => m.content.length > 0 || "tool_calls" in m,
);
A few design choices here matter for correctness and resilience:
- Defensive serialization: non‑string content passes through safeStringify. If JSON serialization fails, it falls back to a placeholder instead of throwing, so malformed payloads don’t crash the whole call.
- Role rules: the first system/developer message becomes a SystemMessage; later ones are downgraded to HumanMessage. This aligns with how many providers treat “extra” system‑like messages.
- Tools and tool calls: tool results map to ToolMessage, assistant tool calls become tool_calls on an AIMessage, matching LangChain’s expectations.
- Empty message filtering: messages with empty content and no tool calls are dropped to avoid provider validation errors.
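The source doesn't show safeStringify itself, but its described contract implies something like this sketch (the exact placeholder string is my assumption):

```typescript
// Plausible sketch of safeStringify; the actual Langfuse helper may differ.
// Falls back to a placeholder instead of throwing, so malformed payloads
// (e.g. circular structures) can't take down the whole completion call.
function safeStringify(value: unknown): string {
  try {
    return JSON.stringify(value) ?? String(value);
  } catch {
    return "[Unserializable content]";
  }
}

// Circular references are the classic failure mode for JSON.stringify.
const circular: Record<string, unknown> = {};
circular.self = circular;
```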
Designing the universal LLM dialer
With messages normalized, the next step is choosing and configuring the right client for each provider. This is where the Adapter and Facade patterns show up in practice: adapters make individual SDKs look uniform, and the facade presents one simple interface to the rest of the system.
At the top level, fetchLLMCompletion is overloaded to expose a
single, type‑safe entry point:
- streaming: true → IterableReadableStream
- streaming: false → string
- streaming: false + structuredOutputSchema → parsed object
- streaming: false + tools → ToolCallResponse
Callers get strong TypeScript guarantees while the implementation hides all branching and provider selection.
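That kind of discriminated overloading can be sketched like this — complete and its return types are simplified stand-ins for the real signatures, with AsyncIterable standing in for LangChain's IterableReadableStream:

```typescript
// Simplified sketch of overload-based typing; the real signatures in
// fetchLLMCompletion.ts carry many more parameters and variants.
type ToolCallResponse = { toolCalls: unknown[] };

function complete(params: { streaming: true }): AsyncIterable<string>;
function complete(params: {
  streaming: false;
  tools: unknown[];
}): ToolCallResponse;
function complete(params: { streaming: false }): string;
function complete(params: {
  streaming: boolean;
  tools?: unknown[];
}): AsyncIterable<string> | ToolCallResponse | string {
  if (params.streaming) {
    // Streaming callers get chunks instead of one big string.
    return (async function* () {
      yield "chunk";
    })();
  }
  if (params.tools) return { toolCalls: [] };
  return "full completion";
}
```

The compiler picks the return type from the arguments, so callers never cast: `complete({ streaming: false })` is a `string`, while passing `tools` yields a `ToolCallResponse`.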
Provider‑specific adapters in one place
Internally, a provider switch decides which LangChain client to construct. The Anthropic branch illustrates the pattern and how provider quirks stay contained:
if (modelParams.adapter === LLMAdapter.Anthropic) {
const isClaude45Family =
modelParams.model?.includes("claude-sonnet-4-5") ||
modelParams.model?.includes("claude-opus-4-1") ||
modelParams.model?.includes("claude-opus-4-5") ||
modelParams.model?.includes("claude-haiku-4-5");
const chatOptions: Record<string, unknown> = {
anthropicApiKey: apiKey,
anthropicApiUrl: baseURL ?? undefined,
modelName: modelParams.model,
maxTokens: modelParams.max_tokens,
callbacks: finalCallbacks,
clientOptions: {
maxRetries,
timeout: timeoutMs,
...(proxyAgent && { httpAgent: proxyAgent }),
},
temperature: modelParams.temperature,
topP: modelParams.top_p,
invocationKwargs: modelParams.providerOptions,
};
chatModel = new ChatAnthropic(chatOptions);
if (isClaude45Family) {
if (chatModel.topP === -1) chatModel.topP = undefined;
// Claude 4.5 rejects requests when both topP and temperature are set.
if (
modelParams.temperature !== undefined &&
modelParams.top_p === undefined
) {
chatModel.topP = undefined;
}
if (
modelParams.top_p !== undefined &&
modelParams.temperature === undefined
) {
chatModel.temperature = undefined;
}
}
}
Here, the facade hides a provider‑specific constraint: some Claude 4.5
models fail if both topP and temperature are set.
LangChain may inject placeholder values, so the adapter actively clears the
conflicting parameter. From the caller’s perspective, they just set the
knobs they care about; the adapter makes sure the request is valid.
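The mutual-exclusion rule can be restated as a small pure function — a sketch of the idea, not Langfuse's actual code:

```typescript
// Sketch: resolve the topP/temperature conflict for models that reject
// requests carrying both. Returns the sampling params actually sent.
function resolveSamplingParams(params: {
  temperature?: number;
  top_p?: number;
}): { temperature?: number; topP?: number } {
  const { temperature, top_p } = params;
  // Caller set only temperature: clear any injected topP placeholder.
  if (temperature !== undefined && top_p === undefined)
    return { temperature, topP: undefined };
  // Caller set only top_p: clear temperature.
  if (top_p !== undefined && temperature === undefined)
    return { temperature: undefined, topP: top_p };
  // Both (or neither) explicitly set: pass through unchanged.
  return { temperature, topP: top_p };
}
```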
Other branches cover OpenAI, Azure OpenAI, Bedrock, Vertex, and Google AI
Studio. They all follow the same structure: take generalized
ModelParams and a connection description, then construct the
right client with appropriate URLs, headers, timeouts, and callbacks.
Security‑aware credential routing
The universal dialer doesn’t just choose a client; it also decides how the call is authenticated. This file supports both explicit API keys and cloud “default credential chains” (AWS IAM roles, GCP application‑default credentials), but only in trusted contexts.
In the Bedrock adapter, the default AWS credential chain is used only when either:
- the deployment is self‑hosted (not Langfuse Cloud), or
- an internal flag (such as
shouldUseLangfuseAPIKey) explicitly allows it.
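A minimal sketch of that gate might look like this; apart from shouldUseLangfuseAPIKey, which the text mentions, all names are illustrative:

```typescript
// Illustrative context for deciding how a Bedrock call authenticates.
interface CredentialContext {
  isLangfuseCloud: boolean;
  shouldUseLangfuseAPIKey: boolean;
  explicitAccessKeyId?: string;
}

// The default AWS credential chain is only permitted in trusted contexts:
// self-hosted deployments, or calls an internal flag explicitly allows.
function mayUseDefaultCredentialChain(ctx: CredentialContext): boolean {
  if (ctx.explicitAccessKeyId) return false; // explicit keys always win
  return !ctx.isLangfuseCloud || ctx.shouldUseLangfuseAPIKey;
}
```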
Vertex AI follows a similar idea: when using application‑default
credentials, the adapter intentionally ignores any user‑provided
projectId to avoid cross‑project privilege escalation.
The facade is not just a convenience layer; it’s an architectural boundary where you decide which credentials are allowed to serve which traffic. For a multi‑tenant AI system, that separation is as important as the request/response types.
On the performance side, the hot paths are predictable: message
transformation is O(n) in the number of messages, provider
instantiation runs per call, and the network round‑trip dominates latency.
For long responses, streaming mode pipes outputs through a
BytesOutputParser and returns an
IterableReadableStream to avoid building huge
strings in memory.
Errors, retries, and tracing
A good facade also owns failure semantics. Callers shouldn’t need to know that Anthropic and OpenAI emit different error shapes or which failures are worth retrying. This file standardizes all of that into a single domain error type.
Every failure is wrapped into LLMCompletionError with two fields
the rest of the system can reason about:
- responseStatusCode: an HTTP‑like status code
- isRetryable: whether higher‑level policies should attempt a retry
} catch (e) {
const responseStatusCode =
(e as any)?.response?.status ?? (e as any)?.status ?? 500;
const message = e instanceof Error ? e.message : String(e);
const nonRetryablePatterns = [
"Request timed out",
"is not valid JSON",
"Unterminated string in JSON at position",
"TypeError",
];
const hasNonRetryablePattern = nonRetryablePatterns.some((pattern) =>
message.includes(pattern),
);
let isRetryable = false;
if (
e instanceof Error &&
(e.name === "InsufficientQuotaError" || e.name === "ThrottlingException")
) {
isRetryable = true;
} else if (responseStatusCode >= 500) {
isRetryable = true;
} else if (responseStatusCode === 429) {
isRetryable = true;
}
if (hasNonRetryablePattern) {
isRetryable = false;
}
throw new LLMCompletionError({
message,
responseStatusCode,
isRetryable,
});
} finally {
await processTracedEvents();
}
The mental model is an air‑traffic control tower for errors:
- 5xx responses and 429 (rate limits) are considered transient “bad weather” and marked retryable.
- Explicit quota and throttling error types also become retryable, even if the numeric status code isn’t enough on its own.
- Obvious client bugs—invalid JSON, type errors, certain timeouts—override that logic and are forced to non‑retryable so the system doesn’t hammer providers with broken requests.
Tracing without feedback loops
The same catch/finally block also integrates with Langfuse’s tracing.
A tracing handler is added as a LangChain callback only when the
traceSinkParams.environment starts with "langfuse".
Otherwise, the function skips tracing for that call.
That guard prevents a nasty feedback loop: a user trace triggering an evaluation which triggers another trace, and so on. By constraining which environments are allowed to emit internal traces, the facade enforces observability safety rails at the same layer that standardizes errors.
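The guard itself is simple enough to sketch as a predicate; the function name and example environment values are illustrative, not Langfuse identifiers:

```typescript
// Sketch of the environment guard; the real check lives where callbacks
// are assembled in fetchLLMCompletion.ts.
function shouldAttachTracingHandler(environment: string): boolean {
  // Only internal "langfuse*" environments may emit internal traces,
  // preventing trace -> evaluation -> trace feedback loops.
  return environment.startsWith("langfuse");
}
```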
From an operations perspective, this universal dialer is also a natural observability choke point. It’s the place to track latency, error rates, and adapter usage across all providers, rather than sprinkling instrumentation throughout callers.
Practical takeaways
We’ve walked through a single TypeScript file, but the pattern scales to any system that talks to more than one LLM provider. The key is to treat this file as infrastructure, not just a helper around an SDK.
- Build a universal dialer early. Don’t let services talk directly to providers. Introduce a single facade that owns provider selection, credentials, proxies, tracing, streaming, and errors. The moment you add a second provider, that abstraction starts paying for itself.
- Normalize messages at the boundary. Centralize role‑mapping, content stringification, and provider quirks (like “requires a user message”) in one “customs office” layer. Everywhere else should just pass a project‑wide ChatMessage[].
- Make errors actionable. Wrap raw SDK failures into a domain error with responseStatusCode and isRetryable. That extra boolean is what lets you implement clean retry policies, better alerts, and simpler caller code.
- Be explicit about credential safety. If you support default cloud credentials, gate them behind clear environment checks and flags. Never let untrusted tenant traffic ride on shared infra creds without those guardrails.
- Use the facade as your observability hub. Attach metrics, logs, and traces at the universal dialer, not scattered across callers. That’s where you’ll first notice provider outages, latency regressions, or misclassified retry logic.
If you design your LLM integration as an evolving piece of infrastructure,
with one universal dialer at its center, you can swap providers, add new
capabilities, and scale traffic without rewriting half your application.
A function like fetchLLMCompletion turns provider churn into a
local refactor instead of a system‑wide migration.