**DRAFT — pending editorial expansion.** This article is a working draft published as scaffolding for the NINtec content programme. The current version covers the substantive perspective in compressed form; the published version will expand each section to the 2,000+ word depth the topic warrants. Editorial review is required before promotion.
Anthropic's Model Context Protocol (MCP) is the standard for letting Claude invoke tools and read resources from external systems. The basic SDKs make a working MCP server look easy — a few hours of code and you can demo Claude calling your API. The gap between that demo and a production MCP server is large, and the gap is where most first-version implementations fail.
What a demo MCP server skips
The minimum viable MCP server defines a few tools, exposes them over stdio or SSE, and works in Claude Desktop. It does not handle: authentication, multi-tenancy, audit logging, rate limiting, schema evolution, or operational observability. Production MCP servers must include all of these. The architecture phase decision is which to build first; the answer depends on which workloads will hit the server in the first quarter.
Authentication and tenancy
Per-tenant credentials enforced at the MCP server entry point are the foundation. Tenant context propagates across tool calls — a Claude client invoking a tool on behalf of tenant A must not be able to access tenant B's data, ever, by any prompt-engineering or tool-call sequence. This is enforced at the integration layer, not at the prompt-template layer. The MCP server is the trust boundary.
Three patterns for credentials: (1) per-Claude-client API keys with tenant binding, (2) OAuth flows with delegated tokens, (3) JWT-based session tokens with claims. The right choice depends on your existing IAM posture; we have shipped all three.
Audit logging and compliance
Every MCP tool invocation generates a structured log entry with caller identity, parameters, response, latency, and outcome. Logs export to your SIEM (Splunk, Datadog, ELK) and retain per the compliance regime — typically 7 years for fintech, 6 for HIPAA, longer for some pharma workloads. Without this, you cannot defend the system in audit, and audit will eventually happen.
Rate limiting and fairness
MCP servers serving multiple Claude clients need per-tenant and per-tool rate limits to prevent one tenant's workload from starving others. Queue-based fairness algorithms (token bucket per tenant, weighted fair queueing) are the production patterns. Back-pressure semantics surface to the Claude client cleanly — the client should know when it is being rate-limited and respond appropriately.
Schema evolution
Tool definitions evolve. New tools are added; old tools are deprecated; existing tools change their parameter schemas. Without versioning discipline, every change risks breaking the Claude clients depending on the server. Production MCP servers use semver — major.minor.patch — with deprecation windows for breaking changes and consumer-driven contract testing in CI to catch unintentional breaks.
Operational runbooks
Production MCP servers need on-call coverage. Incidents — failed tool calls, schema-validation errors, rate-limit saturation, auth failures — require runbook-driven response. The runbook is written before the incident, not after. NINtec's MCP engagements include the runbook as a contractual deliverable.
When to build versus buy
Off-the-shelf MCP servers exist for common SaaS platforms — Slack, GitHub, Linear, Notion. If your need is integration with one of those, install the existing server. Custom MCP servers are appropriate when Claude needs to talk to your internal systems — your CRM, your data warehouse, your billing platform, your proprietary APIs. NINtec's MCP development practice ships custom servers with all the production-engineering surface from day one.
How NINtec engages
Typical MCP server engagement runs 8–14 weeks: 1 week Discovery (system inventory, tool surface design, transport and auth model selection), 6–10 weeks Build (production code, tests, observability integration), 1–3 weeks Hardening (rate-limit drills, auth penetration testing, schema evolution rehearsal). Most clients without an MCP-fluent SRE team move into managed services for the first 6–12 months while their team ramps on operations.