
One Function To Call Every LLM

By Mahmoud Zalt

What if your stack only needed one function to talk to every LLM? This piece breaks down the idea and why it can simplify your architecture.



We’re examining how Langfuse calls multiple LLM providers through a single TypeScript function. Langfuse is an observability and analytics platform for LLM applications, and at its core it needs to talk to OpenAI, Anthropic, Bedrock, Vertex, and others without leaking that complexity into the rest of the system.

I’m Mahmoud Zalt, an AI software engineer, and we’ll use Langfuse’s fetchLLMCompletion as a concrete example of how to design a universal LLM dialer: one stable function that hides provider quirks, message formats, credentials, streaming, and errors.

The core lesson is simple: treat LLM providers as infrastructure and put one well‑designed facade in front of them. Everything in this article shows how that decision pays off in message handling, adapters, error semantics, and operations.

The scene: one dialer, many networks

To see what problem this file solves, we need a quick look at where it lives in the codebase.

packages/
  shared/
    src/
      server/
        llm/
          types.ts
          errors.ts
          utils.ts
          getInternalTracingHandler.ts
          fetchLLMCompletion.ts  <--- unified LLM invocation facade
fetchLLMCompletion.ts sits in a shared server layer, between Langfuse and external LLM providers.

Conceptually, this file exposes one public function: fetchLLMCompletion. Callers pass messages, model configuration and connection details; the function chooses the right LangChain client (OpenAI, Azure, Anthropic, Bedrock, Vertex, Google AI Studio), wires authentication, decides whether to stream, sets up tools or structured output, and normalizes errors.

Think of it as a universal LLM dialer: callers just dial a model, this module handles the country codes, networks, and routing rules. The rest of the system never needs to know which provider actually served the request.
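
As a sketch, the facade's outer shape might look like this. All names and types below are illustrative stand-ins, not Langfuse's actual definitions:

```typescript
// Hypothetical shape of a universal dialer; names are illustrative,
// not Langfuse's actual types.
type Adapter =
  | "openai"
  | "azure"
  | "anthropic"
  | "bedrock"
  | "vertex-ai"
  | "google-ai-studio";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ModelParams {
  adapter: Adapter;
  model: string;
  temperature?: number;
  max_tokens?: number;
}

interface CompletionParams {
  messages: ChatMessage[];
  modelParams: ModelParams;
  apiKey: string;
  streaming: boolean;
}

// Callers dial one function; provider selection stays inside.
async function fetchCompletion(params: CompletionParams): Promise<string> {
  switch (params.modelParams.adapter) {
    case "anthropic":
      // ...construct and invoke an Anthropic client here...
      return `anthropic:${params.modelParams.model}`;
    default:
      // ...construct the matching client for the other adapters...
      return `${params.modelParams.adapter}:${params.modelParams.model}`;
  }
}
```

In the real module the return type also varies with streaming and tools; the point here is only that all provider branching lives behind one signature.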

Normalizing messages at the boundary

Every LLM SDK has its own idea of what a chat message looks like. If you let those schemas leak, switching providers becomes a minefield of subtle bugs. The first responsibility of the universal dialer is to own this translation layer.

Langfuse uses a project‑wide ChatMessage type. Inside fetchLLMCompletion.ts, those are converted into LangChain’s BaseMessage variants (HumanMessage, SystemMessage, AIMessage, ToolMessage) while enforcing provider‑specific rules.

Providers that demand a user message

Some providers reject a request that contains only a system or developer message. That’s not something you want every caller to remember, so the facade quietly fixes it for adapters that require at least one user message.

const PROVIDERS_WITH_REQUIRED_USER_MESSAGE = [
  LLMAdapter.VertexAI,
  LLMAdapter.GoogleAIStudio,
  LLMAdapter.Anthropic,
  LLMAdapter.Bedrock,
];

const transformSystemMessageToUserMessage = (
  messages: ChatMessage[],
): BaseMessage[] => {
  const safeContent =
    typeof messages[0].content === "string"
      ? messages[0].content
      : JSON.stringify(messages[0].content);
  return [new HumanMessage(safeContent)];
};

If there is exactly one message and the adapter is in that list, the system rewrites the system/developer message into a HumanMessage. The call becomes valid for the provider, and the rest of the code doesn’t need to know this quirk exists.

Role‑aware mapping and defensive content handling

The main mapping logic is where the “customs office” for messages really lives:

let finalMessages: BaseMessage[];

if (
  messages.length === 1 &&
  PROVIDERS_WITH_REQUIRED_USER_MESSAGE.includes(modelParams.adapter)
) {
  finalMessages = transformSystemMessageToUserMessage(messages);
} else {
  finalMessages = messages.map((message, idx) => {
    const safeContent =
      typeof message.content === "string"
        ? message.content
        : safeStringify(message.content);

    if (message.role === ChatMessageRole.User)
      return new HumanMessage(safeContent);

    if (
      message.role === ChatMessageRole.System ||
      message.role === ChatMessageRole.Developer
    )
      return idx === 0
        ? new SystemMessage(safeContent)
        : new HumanMessage(safeContent);

    if (message.type === ChatMessageType.ToolResult) {
      return new ToolMessage({
        content: safeContent,
        tool_call_id: message.toolCallId,
      });
    }

    return new AIMessage({
      content: safeContent,
      tool_calls:
        message.type === ChatMessageType.AssistantToolCall
          ? (message.toolCalls as any)
          : undefined,
    });
  });
}

finalMessages = finalMessages.filter(
  (m) => m.content.length > 0 || "tool_calls" in m,
);

A few design choices here matter for correctness and resilience:

  • Defensive serialization: non‑string content passes through safeStringify. If JSON serialization fails, it falls back to a placeholder instead of throwing, so malformed payloads don’t crash the whole call.
  • Role rules: the first system/developer message becomes a SystemMessage; later ones are downgraded to HumanMessage. This aligns with how many providers treat “extra” system‑like messages.
  • Tools and tool calls: tool results map to ToolMessage, assistant tool calls become tool_calls on an AIMessage, matching LangChain’s expectations.
  • Empty message filtering: messages with empty content and no tool calls are dropped to avoid provider validation errors.
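
The defensive-serialization point fits in a few lines. This is a minimal stand-in for safeStringify; the exact fallback string is an assumption, not Langfuse's actual value:

```typescript
// Minimal defensive serializer in the spirit of safeStringify.
// The fallback string is an assumption, not Langfuse's actual value.
function safeStringify(value: unknown): string {
  try {
    // JSON.stringify returns undefined for undefined/functions; coerce those.
    return JSON.stringify(value) ?? String(value);
  } catch {
    // Circular structures and similar failures must not crash the LLM call.
    return "[Unserializable content]";
  }
}
```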

Designing the universal LLM dialer

With messages normalized, the next step is choosing and configuring the right client for each provider. This is where the Adapter and Facade patterns show up in practice: adapters make individual SDKs look uniform, and the facade presents one simple interface to the rest of the system.

At the top level, fetchLLMCompletion is overloaded to expose a single, type‑safe entry point:

  • streaming: true → IterableReadableStream
  • streaming: false → string
  • streaming: false + structuredOutputSchema → parsed object
  • streaming: false + tools → ToolCallResponse

Callers get strong TypeScript guarantees while the implementation hides all branching and provider selection.
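
Sketched with TypeScript overloads (heavily simplified; the real signatures carry more parameters and richer return types), the pattern looks like:

```typescript
// Simplified overload sketch; the real return types are richer.
interface StreamLike {
  kind: "stream";
}

function complete(params: { streaming: true }): StreamLike;
function complete(params: { streaming: false }): string;
function complete(params: { streaming: boolean }): StreamLike | string {
  return params.streaming ? { kind: "stream" } : "full completion text";
}

// The compiler narrows the return type from the literal `streaming` flag:
const text = complete({ streaming: false }); // typed as string
const stream = complete({ streaming: true }); // typed as StreamLike
```

Callers never downcast: the flag value alone selects the return type at compile time.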

Provider‑specific adapters in one place

Internally, a provider switch decides which LangChain client to construct. The Anthropic branch illustrates the pattern and how provider quirks stay contained:

if (modelParams.adapter === LLMAdapter.Anthropic) {
  const isClaude45Family =
    modelParams.model?.includes("claude-sonnet-4-5") ||
    modelParams.model?.includes("claude-opus-4-1") ||
    modelParams.model?.includes("claude-opus-4-5") ||
    modelParams.model?.includes("claude-haiku-4-5");

  const chatOptions: Record<string, any> = {
    anthropicApiKey: apiKey,
    anthropicApiUrl: baseURL ?? undefined,
    modelName: modelParams.model,
    maxTokens: modelParams.max_tokens,
    callbacks: finalCallbacks,
    clientOptions: {
      maxRetries,
      timeout: timeoutMs,
      ...(proxyAgent && { httpAgent: proxyAgent }),
    },
    temperature: modelParams.temperature,
    topP: modelParams.top_p,
    invocationKwargs: modelParams.providerOptions,
  };

  chatModel = new ChatAnthropic(chatOptions);

  if (isClaude45Family) {
    if (chatModel.topP === -1) chatModel.topP = undefined;

    // Claude 4.5 rejects requests when both topP and temperature are set.
    if (
      modelParams.temperature !== undefined &&
      modelParams.top_p === undefined
    ) {
      chatModel.topP = undefined;
    }

    if (
      modelParams.top_p !== undefined &&
      modelParams.temperature === undefined
    ) {
      chatModel.temperature = undefined;
    }
  }
}

Here, the facade hides a provider‑specific constraint: some Claude 4.5 models fail if both topP and temperature are set. LangChain may inject placeholder values, so the adapter actively clears the conflicting parameter. From the caller’s perspective, they just set the knobs they care about; the adapter makes sure the request is valid.

Other branches cover OpenAI, Azure OpenAI, Bedrock, Vertex, and Google AI Studio. They all follow the same structure: take generalized ModelParams and a connection description, then construct the right client with appropriate URLs, headers, timeouts, and callbacks.

Security‑aware credential routing

The universal dialer doesn’t just choose a client; it also decides how the call is authenticated. This file supports both explicit API keys and cloud “default credential chains” (AWS IAM roles, GCP application‑default credentials), but only in trusted contexts.

In the Bedrock adapter, the default AWS credential chain is used only when either:

  • the deployment is self‑hosted (not Langfuse Cloud), or
  • an internal flag (such as shouldUseLangfuseAPIKey) explicitly allows it.

Vertex AI follows a similar idea: when using application‑default credentials, the adapter intentionally ignores any user‑provided projectId to avoid cross‑project privilege escalation.
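
A sketch of that gate, with assumed names (the article mentions shouldUseLangfuseAPIKey; the rest of the shape is illustrative):

```typescript
// Illustrative gate for default cloud credential chains.
// Field names besides shouldUseLangfuseAPIKey are assumptions,
// not Langfuse's actual config shape.
interface CredentialContext {
  isSelfHosted: boolean;
  shouldUseLangfuseAPIKey?: boolean;
}

function mayUseDefaultCredentialChain(ctx: CredentialContext): boolean {
  // Default AWS/GCP credentials only in trusted contexts:
  // self-hosted deployments, or an explicit internal flag.
  return ctx.isSelfHosted || ctx.shouldUseLangfuseAPIKey === true;
}
```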

The facade is not just a convenience layer; it’s an architectural boundary where you decide which credentials are allowed to serve which traffic. For a multi‑tenant AI system, that separation is as important as the request/response types.

On the performance side, the hot paths are predictable: message transformation is O(n) in the number of messages, provider instantiation runs per call, and the network round‑trip dominates latency. For long responses, streaming mode pipes outputs through a BytesOutputParser and returns an IterableReadableStream to avoid building huge strings in memory.
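
Consuming such a stream is just async iteration. This sketch fakes the byte stream rather than calling a provider, so the stream shape is illustrative:

```typescript
// Fake byte stream standing in for an IterableReadableStream of chunks.
async function* fakeStream(): AsyncGenerator<Uint8Array> {
  const encoder = new TextEncoder();
  for (const part of ["Hello, ", "world"]) yield encoder.encode(part);
}

// Callers fold chunks incrementally instead of buffering one huge string.
async function collect(stream: AsyncIterable<Uint8Array>): Promise<string> {
  const decoder = new TextDecoder();
  let out = "";
  for await (const chunk of stream) {
    out += decoder.decode(chunk, { stream: true });
  }
  return out + decoder.decode(); // flush any trailing bytes
}
```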

Errors, retries, and tracing

A good facade also owns failure semantics. Callers shouldn’t need to know that Anthropic and OpenAI emit different error shapes or which failures are worth retrying. This file standardizes all of that into a single domain error type.

Every failure is wrapped into LLMCompletionError with two fields the rest of the system can reason about:

  • responseStatusCode: an HTTP‑like status code
  • isRetryable: whether higher‑level policies should attempt a retry

} catch (e) {
  const responseStatusCode =
    (e as any)?.response?.status ?? (e as any)?.status ?? 500;
  const message = e instanceof Error ? e.message : String(e);

  const nonRetryablePatterns = [
    "Request timed out",
    "is not valid JSON",
    "Unterminated string in JSON at position",
    "TypeError",
  ];

  const hasNonRetryablePattern = nonRetryablePatterns.some((pattern) =>
    message.includes(pattern),
  );

  let isRetryable = false;

  if (
    e instanceof Error &&
    (e.name === "InsufficientQuotaError" || e.name === "ThrottlingException")
  ) {
    isRetryable = true;
  } else if (responseStatusCode >= 500) {
    isRetryable = true;
  } else if (responseStatusCode === 429) {
    isRetryable = true;
  }

  if (hasNonRetryablePattern) {
    isRetryable = false;
  }

  throw new LLMCompletionError({
    message,
    responseStatusCode,
    isRetryable,
  });
} finally {
  await processTracedEvents();
}

The mental model is an air‑traffic control tower for errors:

  • 5xx responses and 429 (rate limits) are considered transient “bad weather” and marked retryable.
  • Explicit quota and throttling error types also become retryable, even if the numeric status code isn’t enough on its own.
  • Obvious client bugs—invalid JSON, type errors, certain timeouts—override that logic and are forced to non‑retryable so the system doesn’t hammer providers with broken requests.
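
On the caller side, that isRetryable flag is what makes a clean retry policy possible. A hypothetical consumer might look like this; the error class is a simplified local stand-in, and the backoff numbers are illustrative:

```typescript
// Simplified stand-in for the article's LLMCompletionError.
class LLMCompletionError extends Error {
  constructor(
    message: string,
    public responseStatusCode: number,
    public isRetryable: boolean,
  ) {
    super(message);
  }
}

// Hypothetical caller-side retry policy driven by isRetryable.
async function withRetries<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (e) {
      const retryable = e instanceof LLMCompletionError && e.isRetryable;
      if (!retryable || attempt >= maxAttempts) throw e;
      // Exponential backoff before the next attempt (delays are illustrative).
      await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 10));
    }
  }
}
```

The caller never inspects provider-specific error shapes; one boolean carries the whole retry decision across adapters.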

Tracing without feedback loops

The same catch/finally block also integrates with Langfuse’s tracing. A tracing handler is added as a LangChain callback only when the traceSinkParams.environment starts with "langfuse". Otherwise, the function skips tracing for that call.

That guard prevents a nasty feedback loop: a user trace triggering an evaluation which triggers another trace, and so on. By constraining which environments are allowed to emit internal traces, the facade enforces observability safety rails at the same layer that standardizes errors.
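
The guard itself is nearly a one-liner. This sketch wraps the prefix check the article describes in an illustrative callback type:

```typescript
// Illustrative callback type; only the prefix check comes from the article.
type Callback = { name: string };

function buildCallbacks(environment: string, tracingHandler: Callback): Callback[] {
  // Only environments prefixed "langfuse" emit internal traces,
  // preventing trace -> evaluation -> trace feedback loops.
  return environment.startsWith("langfuse") ? [tracingHandler] : [];
}
```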

From an operations perspective, this universal dialer is also a natural observability choke point. It’s the place to track latency, error rates, and adapter usage across all providers, rather than sprinkling instrumentation throughout callers.

Practical takeaways

We’ve walked through a single TypeScript file, but the pattern scales to any system that talks to more than one LLM provider. The key is to treat this file as infrastructure, not just a helper around an SDK.

  1. Build a universal dialer early. Don’t let services talk directly to providers. Introduce a single facade that owns provider selection, credentials, proxies, tracing, streaming, and errors. The moment you add a second provider, that abstraction starts paying for itself.
  2. Normalize messages at the boundary. Centralize role‑mapping, content stringification, and provider quirks (like “requires a user message”) in one “customs office” layer. Everywhere else should just pass a project‑wide ChatMessage[].
  3. Make errors actionable. Wrap raw SDK failures into a domain error with statusCode and isRetryable. That extra boolean is what lets you implement clean retry policies, better alerts, and simpler caller code.
  4. Be explicit about credential safety. If you support default cloud credentials, gate them behind clear environment checks and flags. Never let untrusted tenant traffic ride on shared infra creds without those guardrails.
  5. Use the facade as your observability hub. Attach metrics, logs, and traces at the universal dialer, not scattered across callers. That’s where you’ll first notice provider outages, latency regressions, or misclassified retry logic.

If you design your LLM integration as an evolving piece of infrastructure, with one universal dialer at its center, you can swap providers, add new capabilities, and scale traffic without rewriting half your application. A function like fetchLLMCompletion turns provider churn into a local refactor instead of a system‑wide migration.

If you’re designing a similar abstraction, start by sketching your own universal LLM dialer on paper: what goes in, what comes out, and which cross‑cutting concerns you want to hide at that boundary. The concrete TypeScript implementation will follow naturally.

Full Source Code

Here's the full source code of the file that inspired this article.
Read on GitHub

Thanks for reading! I hope this was useful. If you have questions or thoughts, feel free to reach out.

Content Creation Process: This article was generated via a semi-automated workflow using AI tools. I prepared the strategic framework, including specific prompts and data sources. From there, the automation system conducted the research, analysis, and writing. The content passed through automated verification steps before being finalized and published without manual intervention.

Mahmoud Zalt

About the Author

I’m Zalt, a technologist with 16+ years of experience, passionate about designing and building AI systems that move us closer to a world where machines handle everything and humans reclaim wonder.

Let's connect if you're working on interesting AI projects, looking for technical advice or want to discuss anything.
