Should You Build a Custom MCP Server for Your AI Agent?

Do You Need a Custom MCP Server for Your AI Agent?

Probably not yet. Most production AI agents I audit are over-engineered at the tool layer and under-engineered at the eval and retrieval layers. A custom MCP server makes sense only when you have a stable, multi-tool API surface that multiple agents or host applications need to share, and when defining that surface as a reusable, versioned JSON Schema contract pays back faster than shipping the feature inline.

I am Mahmoud Zalt, an independent senior AI systems architect with 16+ years building production software since 2010. I founded Sista AI, and over the past year I have wired a production fleet of autonomous agents to the exact kind of custom MCP servers this article walks through. I work with engineering teams as a solo architect, not an agency, on AI agent development. What follows is the real decision framework I use on client engagements, not a vendor pitch for MCP.

What MCP Actually Is (and Is Not)

Model Context Protocol is an open standard, published by Anthropic in late 2024, that defines how a language model host (Claude Desktop, Cursor, your custom agent runtime) discovers and calls tools exposed by a separate process called an MCP server. The wire format is JSON-RPC 2.0, carried over stdio or HTTP (HTTP with SSE in the original spec, Streamable HTTP in later revisions). The server advertises its tools, resources, and prompts; this article calls that set the manifest. The host lists them and injects the tool definitions into the model context.
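
To make the wire format concrete, here is a minimal sketch of the messages involved in discovering and calling one tool. The method names tools/list and tools/call and the inputSchema field come from the MCP spec; the search_contacts tool and its arguments are hypothetical.

```python
import json

# Request a host's MCP client sends to discover tools (method name per the MCP spec).
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Illustrative response; the search_contacts tool and its fields are hypothetical.
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "search_contacts",
                "description": "Search CRM contacts by name or email.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            }
        ]
    },
}

# Request the client sends once the model decides to call the tool.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "search_contacts", "arguments": {"query": "acme"}},
}


def encode(message: dict) -> str:
    """Serialise one message as a single line, as the stdio transport expects."""
    return json.dumps(message, separators=(",", ":"))


print(encode(list_request))
```

Everything the model ever sees about a tool is the name, description, and inputSchema in that response, which is why schema design gets so much attention later in this article.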

What MCP is not: it is not a magic performance layer, it is not a retrieval system, and it is not a replacement for good prompt engineering. It is a standardised plug interface. The analogy is a USB-C port: useful when you have a stable device ecosystem, overkill if you are charging one laptop.

The Three Primitives

Tools: callable functions with a JSON Schema input spec. The model decides when to call them.
Resources: URI-addressable data blobs (files, DB rows, API responses) the host can read into context.
Prompts: reusable prompt templates the host can surface to the user or inject programmatically.

Most teams only need the Tools primitive. Resources and Prompts are genuinely useful in multi-agent pipelines and IDE integrations, but they are rarely the first thing a product needs.

When a Custom MCP Server Is the Right Call

There are four concrete scenarios where I recommend building a custom MCP server rather than inlining tool definitions into an agent.

1. You Have Multiple Agents or Hosts Consuming the Same API Surface

If three agents (a Slack bot, a web dashboard assistant, and a nightly batch summariser) all need to call your internal CRM, defining the tool schema once in a shared MCP server and letting each host discover it eliminates drift. One schema update propagates everywhere. Without MCP you end up with three copies of the same JSON Schema diverging within two sprints.

2. Your Tools Have Complex Auth, Rate Limiting, or Side-Effect Guards

A custom MCP server is the right place to centralise OAuth token refresh, per-user rate limit enforcement, and guardrails like 'never delete a record older than 90 days without a human approval step.' These do not belong in the prompt and they do not belong scattered across three agent codebases. The server owns them once.
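
As a rough sketch of what "the server owns them once" can look like, the snippet below centralises a per-user rate limit and the 90-day delete guard in two small helpers. The class name, function name, and limits are assumptions for illustration; OAuth refresh is left out.

```python
import time
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone
from typing import Optional


class PerUserRateLimiter:
    """Sliding-window limiter: at most `limit` calls per `window_s` seconds per user."""

    def __init__(self, limit: int, window_s: float, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so tests do not have to sleep
        self._calls = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        calls = self._calls[user_id]
        # Drop timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window_s:
            calls.popleft()
        if len(calls) >= self.limit:
            return False
        calls.append(now)
        return True


def delete_needs_approval(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """Guardrail from the text: records older than 90 days need a human approval step."""
    now = now or datetime.now(timezone.utc)
    return now - created_at > timedelta(days=90)
```

Every tool handler calls the same two helpers, so a change to the limit or the guard happens in one place instead of in each agent codebase.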

3. You Are Building a Platform Others Will Integrate

If your product is infrastructure or a developer platform, publishing an MCP server is the modern equivalent of publishing a REST SDK. GitHub, Linear, and Stripe already do this. Your enterprise customers will expect it by 2026.

4. You Need Typed, Versioned, Testable Tool Contracts

MCP servers force you to write a machine-readable schema for every tool. That schema becomes a contract you can version, validate inputs against, and write unit tests for independently of the LLM. This matters on teams with more than two engineers touching the agent layer.
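
A contract test of this kind needs no model in the loop. The sketch below uses a deliberately tiny hand-rolled validator as a stand-in for a real JSON Schema library; the "version" key and the search_contacts schema are hypothetical.

```python
# Hypothetical versioned contract for one tool. The "version" key is this
# sketch's own convention, not an MCP field.
SEARCH_CONTACTS_V1 = {
    "name": "search_contacts",
    "version": "1.0.0",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer"},
        },
        "required": ["query"],
    },
}

_PY_TYPES = {"string": str, "integer": int, "boolean": bool, "object": dict, "array": list}


def validate(arguments, schema):
    """Return a list of violations. Covers only `required`, unknown keys, and flat `type`.

    Rejecting unknown keys is stricter than the JSON Schema default; it behaves
    like additionalProperties: false.
    """
    errors = []
    properties = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in arguments:
            errors.append(f"missing required field: {key}")
    for key, value in arguments.items():
        if key not in properties:
            errors.append(f"unknown field: {key}")
            continue
        expected = properties[key]["type"]
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        wrong_bool = expected == "integer" and isinstance(value, bool)
        if wrong_bool or not isinstance(value, _PY_TYPES[expected]):
            errors.append(f"{key}: expected {expected}")
    return errors


def test_search_contacts_contract():
    schema = SEARCH_CONTACTS_V1["inputSchema"]
    assert validate({"query": "acme", "limit": 5}, schema) == []
    assert validate({"limit": 5}, schema) == ["missing required field: query"]
    assert validate({"query": "acme", "limit": "5"}, schema) == ["limit: expected integer"]


test_search_contacts_contract()
```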

When MCP Is Premature Plumbing

This is the section most MCP tutorials skip. The majority of AI agent projects I review do not need a custom MCP server at the time they are building one. Here is how to recognise premature MCP investment.

You Have Fewer Than Five Stable Tools

If your agent calls three internal endpoints and they change every two weeks, inline tool definitions in your agent code are faster to iterate. MCP adds a process boundary, a manifest, a separate deploy artifact, and a version contract. That overhead is negative ROI until the surface stabilises.

You Have One Agent and One Host

MCP's value comes from network effects across multiple consumers. One agent means no network effect. Inline the tools and ship.

Your Real Problem Is Retrieval, Not Tool Discovery

I see this constantly: a team spends three weeks building an MCP server to expose a knowledge base, when what they actually needed was a vector search index with a single search_knowledge_base(query) tool call. The MCP layer added nothing. A well-chunked embedding pipeline with a simple tool definition would have solved it in two days.
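
For scale, this is roughly all the tool layer such a project needs. The sketch below is an assumption-heavy toy: the documents are invented, and a bag-of-words cosine score stands in for a real embedding model and vector index.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; a real one would be chunked and embedded.
DOCS = {
    "refunds": "Refund policy: customers can request a refund within 30 days.",
    "sso": "Single sign-on setup uses SAML with your identity provider.",
    "billing": "Invoices are issued monthly and billing is in USD.",
}


def _vec(text):
    """Bag-of-words term counts; a stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_knowledge_base(query, top_k=2):
    """Return the top_k documents with a non-zero similarity to the query."""
    q = _vec(query)
    scored = sorted(((_cosine(q, _vec(text)), doc_id) for doc_id, text in DOCS.items()), reverse=True)
    return [
        {"id": doc_id, "score": round(score, 3), "text": DOCS[doc_id]}
        for score, doc_id in scored[:top_k]
        if score > 0
    ]


# The single inline tool definition the agent needs; no separate server involved.
SEARCH_TOOL = {
    "name": "search_knowledge_base",
    "description": "Search the internal knowledge base and return the most relevant passages.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

print(search_knowledge_base("how do I get a refund"))
```

The effort belongs in chunking and retrieval quality behind search_knowledge_base, not in the plumbing in front of it.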

Your Real Problem Is Evals, Not Architecture

If your agent gives wrong answers, an MCP server will not fix that. Evals will. Spend the three weeks on a golden dataset, a judge LLM, and a regression suite before you invest in infrastructure.
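
A regression suite does not have to be elaborate to be useful. The following is a minimal sketch under stated assumptions: the golden cases are invented, stub_agent replaces your real agent, and a substring check stands in for the judge LLM.

```python
# Hypothetical golden dataset; real cases should come from production traffic.
GOLDEN_SET = [
    {"input": "What plan is Acme on?", "expected": "enterprise"},
    {"input": "Who owns the Globex deal?", "expected": "dana"},
    {"input": "When does the Initech contract renew?", "expected": "2025-09-01"},
]


def contains_judge(output, expected):
    """Stand-in for a judge LLM: passes if the expected string appears in the output."""
    return expected.lower() in output.lower()


def run_evals(agent, dataset, judge=contains_judge):
    """Run every golden case through the agent and return the pass rate (0.0 to 1.0)."""
    results = [judge(agent(case["input"]), case["expected"]) for case in dataset]
    return sum(results) / len(results)


def assert_no_regression(pass_rate, baseline):
    """Fail the build when the pass rate drops below the agreed baseline."""
    if pass_rate < baseline:
        raise AssertionError(f"pass rate {pass_rate:.0%} fell below baseline {baseline:.0%}")


def stub_agent(question):
    """Placeholder for the real agent call."""
    answers = {
        "What plan is Acme on?": "Acme is on the Enterprise plan.",
        "Who owns the Globex deal?": "I am not sure.",
    }
    return answers.get(question, "")


rate = run_evals(stub_agent, GOLDEN_SET)
print(f"pass rate: {rate:.0%}")
```

Wire assert_no_regression into CI and you know within minutes whether a prompt or tool change made the agent worse, which no amount of infrastructure will tell you.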

Worked Example: CRM Agent Before and After MCP

A B2B SaaS client came to me with a sales assistant agent that called their CRM via a Zapier webhook. The tool definition was 200 lines of inline JSON stuffed into the system prompt. It worked, but every schema change required redeploying the agent, and a new mobile agent they were building needed the same tools.

Before: Inline Tool Definitions

Single agent. Tools defined as raw JSON in a Python dict inside the agent module. Static auth tokens read from environment variables inside the agent code. No versioning. No tests for tool schemas. The Zapier webhook timeout caused silent failures the agent could not handle gracefully.

After: Thin MCP Server

We extracted the CRM tools into a lightweight FastAPI-based MCP server (about 400 lines including tests). The server handled OAuth refresh, enforced a 'no bulk delete without approval' guardrail, and exposed four tools: search_contacts, get_contact_detail, create_activity, update_deal_stage. Both the sales assistant and the new mobile agent consumed the same manifest. Tool schema tests ran in CI independently of the LLM. The agent code shrank by 60% because all the plumbing moved to the server.

The key constraint: we waited until both agents were confirmed necessary and the tool surface had been stable for four weeks. Building the MCP server on week one would have been waste.
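
This is not the client's code, but the core of a thin server like this can be sketched in a few dozen lines. The version below is an illustration under assumptions: the HTTP layer, OAuth, and guardrails are omitted, the CRM is an in-memory dict, and only two of the four tools are registered.

```python
import json

TOOLS = {}  # name -> {"description", "inputSchema", "handler"}


def tool(name, description, input_schema):
    """Register a handler under a tool name so listing and calling share one source."""
    def register(fn):
        TOOLS[name] = {"description": description, "inputSchema": input_schema, "handler": fn}
        return fn
    return register


# Hypothetical in-memory CRM standing in for the real API.
CONTACTS = {"c1": {"name": "Ada Lovelace", "email": "ada@example.com"}}


@tool("search_contacts", "Search contacts by name.",
      {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]})
def search_contacts(query):
    return [{"id": cid, **c} for cid, c in CONTACTS.items() if query.lower() in c["name"].lower()]


@tool("get_contact_detail", "Fetch one contact by id.",
      {"type": "object", "properties": {"contact_id": {"type": "string"}}, "required": ["contact_id"]})
def get_contact_detail(contact_id):
    return CONTACTS.get(contact_id)


def handle(message):
    """Dispatch one JSON-RPC request. create_activity and update_deal_stage would register the same way."""
    method, msg_id = message.get("method"), message.get("id")
    if method == "tools/list":
        tools = [
            {"name": n, "description": t["description"], "inputSchema": t["inputSchema"]}
            for n, t in TOOLS.items()
        ]
        return {"jsonrpc": "2.0", "id": msg_id, "result": {"tools": tools}}
    if method == "tools/call":
        params = message.get("params", {})
        entry = TOOLS.get(params.get("name"))
        if entry is None:
            return {"jsonrpc": "2.0", "id": msg_id, "error": {"code": -32602, "message": "Unknown tool"}}
        result = entry["handler"](**params.get("arguments", {}))
        content = [{"type": "text", "text": json.dumps(result)}]
        return {"jsonrpc": "2.0", "id": msg_id, "result": {"content": content, "isError": False}}
    return {"jsonrpc": "2.0", "id": msg_id, "error": {"code": -32601, "message": "Method not found"}}
```

Because the registry is the single source for both listing and calling, both agents see identical tool definitions, and the handlers can be unit-tested without a model.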

Production Considerations Nobody Mentions in MCP Tutorials

Observability

Every MCP tool call should emit a structured log: tool name, input hash, latency, success/failure, user or session ID. Without this you are flying blind when the agent misbehaves in production. I instrument MCP servers with OpenTelemetry spans so tool call traces show up in the same dashboard as the rest of the agent pipeline.
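
One way to get those fields on every call is a decorator around each handler. The sketch below uses the standard library logger rather than OpenTelemetry to stay self-contained; the decorator name, the record keys, and the 16-character hash prefix are assumptions.

```python
import functools
import hashlib
import json
import logging
import time

logger = logging.getLogger("mcp.tools")


def input_hash(arguments):
    """Stable short hash of the inputs, so raw arguments never land in the logs."""
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def _default_emit(record):
    logger.info(json.dumps(record))


def logged_tool(name, emit=_default_emit):
    """Wrap a tool handler so every call emits one structured record, even on failure."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(arguments, session_id):
            start = time.perf_counter()
            success = False
            try:
                result = fn(arguments)
                success = True
                return result
            finally:
                emit({
                    "tool": name,
                    "input_hash": input_hash(arguments),
                    "latency_ms": round((time.perf_counter() - start) * 1000, 2),
                    "success": success,
                    "session_id": session_id,
                })
        return wrapper
    return decorator


@logged_tool("search_contacts")
def search_contacts(arguments):
    # Hypothetical handler body.
    return [{"name": "Ada Lovelace"}] if arguments.get("query") else []
```

Swapping the emit callable for a span exporter is a small change once the record shape is settled.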

Input Validation and Guardrails

The MCP server is the last line of defence before your API. Declaring a JSON Schema in the manifest is not enough: validate every input against it again inside the handler. Reject malformed inputs with a structured error the LLM can parse and recover from. Add semantic guardrails: check that a delete_record call references a real record owned by the authenticated user before executing.
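
Here is a hedged sketch of a delete_record handler that applies both kinds of check and returns errors in a shape the model can read. The isError and content fields follow the MCP tool result format; the record store, the error codes, and the function names are hypothetical.

```python
import json

# Hypothetical record store keyed by id, with an owner per record.
RECORDS = {
    "rec_1": {"owner": "user_a"},
    "rec_2": {"owner": "user_b"},
}


def tool_error(code, message):
    """Tool result flagged as an error, with a machine-readable body for the model."""
    body = json.dumps({"code": code, "message": message})
    return {"isError": True, "content": [{"type": "text", "text": body}]}


def delete_record(arguments, user_id):
    """Validate shape first, then existence, then ownership, and only then act."""
    record_id = arguments.get("record_id")
    if not isinstance(record_id, str) or not record_id:
        return tool_error("invalid_input", "record_id must be a non-empty string")
    record = RECORDS.get(record_id)
    if record is None:
        return tool_error("not_found", f"no record with id {record_id}")
    if record["owner"] != user_id:
        return tool_error("forbidden", "record is not owned by the authenticated user")
    del RECORDS[record_id]
    body = json.dumps({"deleted": record_id})
    return {"isError": False, "content": [{"type": "text", "text": body}]}
```

Note that user_id comes from the authenticated session, never from the model's arguments.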

Cost and Latency

Each tool manifest injected into the context costs tokens. A bloated manifest with 30 tools and verbose descriptions can add 2,000 to 4,000 tokens per request. On high-volume agents this is meaningful spend. Keep tool descriptions tight (under 80 words each), use tool filtering to inject only relevant tools per turn, and measure manifest token cost explicitly.
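
Measuring and filtering can start very simply. In the sketch below, the four-characters-per-token figure is a rough heuristic rather than a real tokenizer, and the keyword map used for filtering is a hypothetical stand-in for whatever relevance signal you actually have.

```python
import json
import math
import re


def estimate_tokens(obj):
    """Crude estimate at roughly 4 characters per token; swap in your model's tokenizer."""
    return math.ceil(len(json.dumps(obj)) / 4)


def filter_tools(tools, user_message, keywords, always_include=()):
    """Keep pinned tools plus tools whose keywords overlap the words in the message."""
    words = set(re.findall(r"[a-z]+", user_message.lower()))
    return [
        t for t in tools
        if t["name"] in always_include or words & keywords.get(t["name"], set())
    ]


# Hypothetical tool definitions and keyword map.
TOOLS = [
    {"name": "search_contacts", "description": "Search CRM contacts by name or email."},
    {"name": "update_deal_stage", "description": "Move a deal to a new pipeline stage."},
    {"name": "create_activity", "description": "Log a call, meeting, or note on a contact."},
]
KEYWORDS = {
    "search_contacts": {"contact", "find"},
    "update_deal_stage": {"deal", "stage"},
    "create_activity": {"call", "meeting", "note"},
}

selected = filter_tools(TOOLS, "Move the Acme deal to negotiation", KEYWORDS,
                        always_include=("search_contacts",))
print(estimate_tokens(TOOLS), "->", estimate_tokens(selected))
```

Logging the estimate per request is enough to spot a manifest that has quietly doubled in size.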

Human-in-the-Loop Integration

For any tool with destructive or irreversible side effects, the MCP server should support a confirmation workflow. The tool returns a pending state with a confirmation token, the agent surfaces this to the user, and a second call with the token executes. This is not optional for production agents touching financial records, user data, or external communications.
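
The two-call flow can be sketched as a small store of pending actions. The class and field names are assumptions, and the in-process dict is for illustration only: with more than one server instance you would keep pending actions in a shared store.

```python
import secrets
import time


class ConfirmationStore:
    """Holds destructive actions until a second call presents the matching token."""

    def __init__(self, ttl_s=300.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._pending = {}

    def request(self, tool_name, arguments):
        """First call: record the intent and return a pending state for the agent to surface."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = (tool_name, arguments, self.clock() + self.ttl_s)
        return {
            "status": "pending",
            "confirmation_token": token,
            "summary": f"{tool_name} with {arguments}",
        }

    def confirm(self, token, handlers):
        """Second call: execute only if the token is known, unused, and not expired."""
        entry = self._pending.pop(token, None)  # pop makes the token single use
        if entry is None:
            return {"status": "rejected", "reason": "unknown or already used token"}
        tool_name, arguments, expires_at = entry
        if self.clock() > expires_at:
            return {"status": "rejected", "reason": "token expired"}
        return {"status": "done", "result": handlers[tool_name](**arguments)}
```

The arguments are stored server-side at request time, so the model cannot change what gets executed between the user's approval and the second call.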

Security

MCP servers expose your internal APIs to an LLM-controlled call path. Treat them like any external-facing API: authenticate every request (short-lived tokens, not static API keys), authorise at the resource level not just the tool level, log all calls with tamper-evident audit trails, and never let the model pass raw SQL or shell commands through a tool parameter.
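
As a small illustration of the last two points, the handler below takes structured parameters only, binds them as query parameters, allowlists the one identifier that cannot be bound, and scopes every query to the authenticated owner. The table layout and names are hypothetical, with SQLite standing in for your real database.

```python
import sqlite3

# Column names cannot be bound as parameters, so they are checked against an allowlist.
ALLOWED_SORT_COLUMNS = {"name", "id"}


def search_contacts(conn, owner_id, name_contains, sort_by="name"):
    """Search within one owner's contacts; model-supplied text is only ever bound, never interpolated."""
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by}")
    sql = (
        "SELECT id, name FROM contacts "
        "WHERE owner_id = ? AND instr(lower(name), lower(?)) > 0 "
        f"ORDER BY {sort_by}"
    )
    return conn.execute(sql, (owner_id, name_contains)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, owner_id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO contacts (owner_id, name) VALUES (?, ?)",
    [("u1", "Ada Lovelace"), ("u1", "Alan Turing"), ("u2", "Grace Hopper")],
)

# An injection attempt is treated as a literal search string and matches nothing.
print(search_contacts(conn, "u1", "' OR '1'='1"))
```

The owner_id filter is the resource-level authorisation: user u1 can never reach Grace Hopper's record, whatever the model passes in.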

A Decision Framework: Should You Build a Custom MCP Server?

| Signal | Recommendation |
|---|---|
| One agent, fewer than 5 stable tools, single host | Inline tool definitions. No MCP yet. |
| Two or more agents consuming the same API surface | Extract to a shared MCP server. |
| Auth, rate limiting, or guardrails needed per tool | MCP server is the right home for that logic. |
| Platform product with external integrators | Publish an MCP server as a first-class SDK. |
| Eval pass rate below 80% on the current agent | Fix evals before adding infrastructure. |
| Tool surface changes faster than weekly | Stabilise first, then extract to MCP. |
| Knowledge base access is the main use case | Invest in the retrieval pipeline first; MCP is secondary. |

The meta-principle: MCP is a coordination mechanism. It pays off when you have something to coordinate across. Build the thing first, then extract the interface.

MCP vs the Alternatives

Before committing to a custom MCP server, consider what else solves the same problem.

Inline Tool Definitions

Fastest to ship, easiest to iterate, collocated with agent logic. Right for single-agent, single-host, early-stage projects. Downside: does not scale to multi-agent or multi-host scenarios.

OpenAPI Spec with Auto-Generated Tool Definitions

If you already have an OpenAPI spec, tooling such as LangChain's OpenAPI toolkit can generate tool definitions from it, which you then pass to the model's native tool-use API. This gives you versioning and typed contracts without the process boundary of MCP. A reasonable middle ground before full MCP extraction.
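
The generation step itself is not complicated, which is part of the appeal. The sketch below is a simplified converter written for this article, not any framework's implementation: it handles only inline path and query parameters and ignores request bodies and $ref resolution. The spec fragment is hypothetical.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def tools_from_openapi(spec):
    """Turn each OpenAPI operation into a tool definition with a JSON Schema input."""
    tools = []
    for path, operations in spec.get("paths", {}).items():
        for method, op in operations.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters" or "summary"
            properties, required = {}, []
            for param in op.get("parameters", []):
                properties[param["name"]] = dict(param.get("schema", {"type": "string"}))
                if param.get("required"):
                    required.append(param["name"])
            tools.append({
                "name": op.get("operationId") or f"{method}_{path}",
                "description": op.get("summary", ""),
                "inputSchema": {"type": "object", "properties": properties, "required": required},
            })
    return tools


# Hypothetical spec fragment.
SPEC = {
    "paths": {
        "/contacts": {
            "get": {
                "operationId": "search_contacts",
                "summary": "Search contacts",
                "parameters": [
                    {"name": "query", "in": "query", "required": True, "schema": {"type": "string"}},
                    {"name": "limit", "in": "query", "schema": {"type": "integer"}},
                ],
            }
        }
    }
}

print(tools_from_openapi(SPEC))
```

Generated definitions are endpoint-shaped, so expect to curate names and descriptions by hand before a model uses them well.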

Existing Public MCP Servers

Check the MCP server registry before building. GitHub, Linear, Slack, Google Drive, Postgres, Brave Search, and dozens of other common integrations already have community or official MCP servers. Do not build what already exists and is maintained.

Semantic Kernel or LangChain Tool Abstractions

If your team is already deep in LangChain or Semantic Kernel, their native tool abstractions may be sufficient and more idiomatic than adding an MCP process boundary. MCP is most valuable when you need host-agnostic portability, not just tool reuse within one framework.

Frequently Asked Questions

Do I need MCP or can I just use function calling?

Function calling (tool use) is sufficient for most single-agent projects. MCP adds value when you need to share a tool surface across multiple agents or host applications, or when you want the tool logic to live outside the agent process for independent deployment and testing. If you have one agent, start with function calling and extract to MCP when the surface stabilises and a second consumer appears.

How long does it take to build a custom MCP server?

A well-scoped MCP server exposing 4 to 8 tools over an existing REST API takes an experienced engineer 3 to 5 days including tests, CI integration, and basic observability. If you are also designing the tool schema from scratch, add a day. If you are building auth flows, add another. The mistake is underestimating schema design time: poorly designed tool inputs cause LLM errors that are expensive to debug later.

Can I use MCP with OpenAI or Gemini models, not just Claude?

Yes. MCP is a transport and discovery protocol, not a Claude-specific feature. The host (your agent runtime) handles the MCP side; the model just receives tool definitions and results in its native format (OpenAI function calling, Gemini tool use, etc.). Several open-source MCP host libraries support non-Anthropic models. The protocol is genuinely model-agnostic at the server layer.
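
In practice the host does a small translation step per provider. The sketch below maps one MCP tool definition to the OpenAI Chat Completions tool format and to the Anthropic Messages tool format; the helper names and the sample tool are hypothetical.

```python
def to_openai_tool(mcp_tool):
    """MCP tool definition -> OpenAI Chat Completions `tools` entry."""
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool.get("description", ""),
            "parameters": mcp_tool["inputSchema"],
        },
    }


def to_anthropic_tool(mcp_tool):
    """MCP tool definition -> Anthropic Messages API `tools` entry."""
    return {
        "name": mcp_tool["name"],
        "description": mcp_tool.get("description", ""),
        "input_schema": mcp_tool["inputSchema"],
    }


# Hypothetical tool as returned by an MCP server's tools/list.
mcp_tool = {
    "name": "search_contacts",
    "description": "Search CRM contacts by name or email.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

print(to_openai_tool(mcp_tool)["function"]["name"])
print(to_anthropic_tool(mcp_tool)["input_schema"]["required"])
```

The JSON Schema passes through untouched in both cases, which is what makes the server side reusable across models.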

What is the biggest mistake teams make when building MCP servers?

Exposing too many tools. Teams map every API endpoint to an MCP tool and end up with 25 tools in the manifest. The LLM wastes tokens reading the manifest, makes ambiguous tool selections, and the context window fills with tool results before the agent accomplishes anything. Design tools at the task level, not the endpoint level. A single manage_contact(action, contact_id, fields) tool is usually better than four separate CRUD tools.
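
A task-level tool of that shape might look like the sketch below. The schema, the in-memory store, and the id scheme are assumptions; the point is the single entry with an enumerated action.

```python
import itertools

MANAGE_CONTACT_TOOL = {
    "name": "manage_contact",
    "description": "Create, read, update, or delete a CRM contact.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "action": {"type": "string", "enum": ["create", "read", "update", "delete"]},
            "contact_id": {"type": "string"},
            "fields": {"type": "object"},
        },
        "required": ["action"],
    },
}

_ACTIONS = set(MANAGE_CONTACT_TOOL["inputSchema"]["properties"]["action"]["enum"])
_ids = itertools.count(1)  # hypothetical id generator for the in-memory store


def manage_contact(store, action, contact_id=None, fields=None):
    """Dispatch one task-level call onto the underlying CRUD operations."""
    if action not in _ACTIONS:
        return {"error": f"unsupported action: {action}"}
    if action == "create":
        new_id = f"c{next(_ids)}"
        store[new_id] = dict(fields or {})
        return {"contact_id": new_id, **store[new_id]}
    if contact_id not in store:
        return {"error": "contact not found"}
    if action == "read":
        return {"contact_id": contact_id, **store[contact_id]}
    if action == "update":
        store[contact_id].update(fields or {})
        return {"contact_id": contact_id, **store[contact_id]}
    del store[contact_id]  # only "delete" remains
    return {"deleted": contact_id}
```

The model reads one description instead of four, and the enum keeps the action space explicit.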

Should my MCP server be stateful or stateless?

Stateless by default. Each tool call should be self-contained: receive inputs, perform action, return result. If you need session state (multi-turn confirmation flows, streaming progress), model it explicitly as a resource with a session ID, not as hidden server-side state. Stateful servers are harder to scale horizontally and harder to debug. The one exception is connection pooling to databases, which is fine to manage at the server process level.

Ready to Build the Right Agent Architecture?

The decision between inline tools, OpenAPI generation, and a custom MCP server is a concrete architectural call that depends on your team size, agent count, API stability, and timeline. Getting it wrong costs weeks of rework. Getting it right means your agent stays maintainable as the product grows.

I help engineering teams make these calls early and correctly, through hands-on AI agent development engagements. If you are designing an agent architecture, evaluating whether your current setup is over-engineered, or trying to get a production agent to actually work reliably, reach out and we can talk through your specific situation.

Work with me on your AI agent architecture
