API Design for Systems That Live Longer Than Your Team
How to design APIs that survive team turnover. Protobuf as schema source of truth, tRPC for TypeScript stacks, GraphQL in production, breaking change management, and error contracts.
The API That Outlives Its Creator
Every API starts with one team, one client, and one use case. Two years later, 5 teams depend on it, 3 external partners integrate with it, and the original developers have moved on. The API's design decisions are now permanent.
We've built APIs using three different strategies: Protobuf as schema source of truth for cross-language systems, tRPC for TypeScript-only stacks, and GraphQL for complex client needs. Each was the right choice for its context. This article covers when to use each and how to design APIs that survive team turnover.
For how these APIs fit into broader system design, see our system architecture guide and software engineering guide.
Protobuf: Schema Source of Truth for Cross-Language Systems
When your system spans multiple languages (TypeScript API, Go search service, Python ML pipeline), you need a schema format that generates types for all of them. Protobuf does this.
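```protobuf
// proto/product.proto
syntax = "proto3";

package commerce.v1;

import "google/protobuf/timestamp.proto";

option go_package = "example.com/commerce/gen/go/commerce/v1;commercev1"; // hypothetical module path

message Product {
  string id = 1;
  string name = 2;
  string description = 3;
  int32 price_cents = 4;
  string currency = 5;
  repeated string category_ids = 6;
  ProductStatus status = 7;
  google.protobuf.Timestamp created_at = 8;
}

enum ProductStatus {
  PRODUCT_STATUS_UNSPECIFIED = 0;
  PRODUCT_STATUS_DRAFT = 1;
  PRODUCT_STATUS_ACTIVE = 2;
  PRODUCT_STATUS_ARCHIVED = 3;
}

// Request/response messages (fields are illustrative).
message GetProductRequest { string id = 1; }
message ListProductsRequest { int32 page_size = 1; string page_token = 2; }
message ListProductsResponse { repeated Product products = 1; string next_page_token = 2; }
message CreateProductRequest { string name = 1; int32 price_cents = 2; string currency = 3; }

service ProductService {
  rpc GetProduct(GetProductRequest) returns (Product);
  rpc ListProducts(ListProductsRequest) returns (ListProductsResponse);
  rpc CreateProduct(CreateProductRequest) returns (Product);
}
```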
Why Tags Matter for Evolution
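Protobuf fields are identified on the wire by tag numbers (1, 2, 3...), not by name. This means you can rename a field without breaking existing clients that use the binary encoding (the JSON mapping does use field names, so a rename still breaks JSON consumers). You can add new fields with new tag numbers without breaking old clients, and you can mark fields as deprecated without removing them.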
```protobuf
// Safe evolution: adding a field
message Product {
  string id = 1;
  string name = 2;
  string description = 3;
  int32 price_cents = 4;
  string currency = 5;
  repeated string category_ids = 6;
  ProductStatus status = 7;
  google.protobuf.Timestamp created_at = 8;
  string sku = 9;     // NEW: added without breaking existing clients
  string brand = 10;  // NEW: old clients simply ignore unknown fields
}
```
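Why can an old client ignore tags 9 and 10? On the binary wire, every field is written as a key (tag number plus wire type) followed by its value, and the wire type tells a decoder how many bytes to skip even for a tag it has never seen. The TypeScript sketch below hand-rolls just enough of that encoding (varint and length-delimited fields only) to show it; `encode` and `decodeKnown` are hypothetical helpers, not a protobuf library API. Real proto3 runtimes typically preserve unknown fields for re-serialization rather than discarding them, but application code written against the old schema still never sees them.

```typescript
// Minimal sketch of the protobuf binary wire format (varint and length-delimited
// fields only). Illustrative; use a real protobuf library in production.
type WireValue = number | Uint8Array;
interface WireField { tag: number; value: WireValue }

function writeVarint(n: number, out: number[]): void {
  // Handles non-negative integers below 2^32, which is enough for this sketch.
  while (n > 0x7f) {
    out.push((n & 0x7f) | 0x80); // low 7 bits, continuation bit set
    n = n >>> 7;
  }
  out.push(n);
}

function encode(fields: WireField[]): Uint8Array {
  const out: number[] = [];
  for (const f of fields) {
    const wireType = typeof f.value === "number" ? 0 : 2; // 0 = varint, 2 = length-delimited
    writeVarint(f.tag * 8 + wireType, out); // key = (tag << 3) | wire type
    if (typeof f.value === "number") {
      writeVarint(f.value, out);
    } else {
      writeVarint(f.value.length, out);
      f.value.forEach((b) => out.push(b));
    }
  }
  return Uint8Array.from(out);
}

function readVarint(buf: Uint8Array, pos: number): [number, number] {
  let result = 0;
  let shift = 0;
  for (;;) {
    const byte = buf[pos++];
    result += (byte & 0x7f) * Math.pow(2, shift);
    if ((byte & 0x80) === 0) return [result, pos];
    shift += 7;
  }
}

// How an old client decodes: keep the tags it knows, skip over the rest.
function decodeKnown(buf: Uint8Array, knownTags: ReadonlySet<number>): Map<number, WireValue> {
  const known = new Map<number, WireValue>();
  let pos = 0;
  while (pos < buf.length) {
    const [key, afterKey] = readVarint(buf, pos);
    const tag = Math.floor(key / 8);
    const wireType = key % 8;
    let value: WireValue;
    if (wireType === 0) {
      [value, pos] = readVarint(buf, afterKey);
    } else if (wireType === 2) {
      const [len, start] = readVarint(buf, afterKey);
      value = buf.subarray(start, start + len);
      pos = start + len; // the length prefix lets us skip fields we don't know
    } else {
      throw new Error(`wire type ${wireType} is not handled in this sketch`);
    }
    if (knownTags.has(tag)) known.set(tag, value);
  }
  return known;
}

const ascii = (s: string) => Uint8Array.from(s.split("").map((c) => c.charCodeAt(0)));

// A newer server sends sku (tag 9), which the old client's schema doesn't have.
const payload = encode([
  { tag: 1, value: ascii("prod_123") }, // id
  { tag: 4, value: 1999 },              // price_cents
  { tag: 9, value: ascii("SKU-42") },   // sku: new field
]);
const oldClientView = decodeKnown(payload, new Set([1, 2, 3, 4, 5, 6, 7, 8]));
console.log(oldClientView.get(4), oldClientView.has(9)); // 1999 false
```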
Rules for safe Protobuf evolution:
- Never reuse a tag number (even after removing a field)
- Never change a field's type
- Add new fields with new tag numbers
- Mark fields you plan to retire with `[deprecated = true]`; when you finally delete one, list its tag number and name under `reserved` to prevent accidental reuse
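These rules are mechanical enough to check in CI. Tools such as Buf's `buf breaking` do this against real compiled descriptors; the sketch below only illustrates the idea, using a hypothetical, simplified `MessageDef` model of one message across two schema versions.

```typescript
// Hypothetical, simplified model of one message's fields across two schema versions.
interface FieldDef { tag: number; name: string; type: string }
interface MessageDef { fields: FieldDef[]; reserved: number[] }

function checkEvolution(before: MessageDef, after: MessageDef): string[] {
  const problems: string[] = [];
  const afterByTag = new Map<number, FieldDef>();
  after.fields.forEach((f) => afterByTag.set(f.tag, f));

  for (const old of before.fields) {
    const current = afterByTag.get(old.tag);
    if (current === undefined) {
      // Removed field: its tag must be reserved so it can never be reused.
      if (after.reserved.indexOf(old.tag) === -1) {
        problems.push(`tag ${old.tag} (${old.name}) removed but not reserved`);
      }
    } else if (current.type !== old.type) {
      problems.push(`tag ${old.tag} changed type: ${old.type} -> ${current.type}`);
    }
    // Same tag and type with a new name is a rename: safe on the binary wire.
  }

  // A tag reserved in the old schema must not come back as a live field.
  for (const f of after.fields) {
    if (before.reserved.indexOf(f.tag) !== -1) {
      problems.push(`tag ${f.tag} (${f.name}) reuses a reserved tag`);
    }
  }
  return problems;
}

const v1: MessageDef = {
  fields: [
    { tag: 1, name: "id", type: "string" },
    { tag: 3, name: "description", type: "string" },
    { tag: 4, name: "price_cents", type: "int32" },
  ],
  reserved: [],
};

// Unsafe: drops description without reserving tag 3 and retypes price_cents.
const v2: MessageDef = {
  fields: [
    { tag: 1, name: "id", type: "string" },
    { tag: 4, name: "price_cents", type: "string" },
  ],
  reserved: [],
};

// Safe: description removed with its tag reserved, sku added under a new tag.
const v3: MessageDef = {
  fields: [
    { tag: 1, name: "id", type: "string" },
    { tag: 4, name: "price_cents", type: "int32" },
    { tag: 9, name: "sku", type: "string" },
  ],
  reserved: [3],
};

console.log(checkEvolution(v1, v2)); // reports tag 3 (not reserved) and tag 4 (type change)
console.log(checkEvolution(v1, v3)); // no problems
```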
Code Generation
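```shell
# Generate TypeScript, Go, and Python from the same .proto files.
# --python_out is built in; --go_out and --ts_out need protoc-gen-go / protoc-gen-ts on PATH.
mkdir -p gen/ts gen/go gen/py
protoc --ts_out=./gen/ts --go_out=./gen/go --python_out=./gen/py proto/*.proto
```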
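One schema file generates type-safe message types in every language (gRPC client and server stubs come from additional plugins such as protoc-gen-go-grpc). Schema changes are a pull request. Code generation is a CI step. In statically typed languages such as TypeScript and Go, type mismatches become compile errors instead of runtime bugs.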
tRPC: When Your Whole Stack Is TypeScript
If your API server and all clients are TypeScript, tRPC eliminates the schema layer entirely. Types flow from the server to the client at compile time. No code generation, no schema files, no OpenAPI spec.
```typescript
// Server: define a router with typed procedures.
// './trpc' is assumed to export router and procedure builders created with initTRPC.
import { router, publicProcedure, protectedProcedure } from './trpc';
import { z } from 'zod';

export const productRouter = router({
  list: publicProcedure
    .input(z.object({
      cursor: z.string().optional(),
      limit: z.number().int().min(1).max(100).default(20),
      category: z.string().optional(),
    }))
    .query(async ({ input, ctx }) => {
      return ctx.productService.list(input);
    }),
  create: protectedProcedure
    .input(z.object({
      name: z.string().min(1).max(200),
      price: z.number().positive(),
      description: z.string().optional(),
    }))
    .mutation(async ({ input, ctx }) => {
      return ctx.productService.create(ctx.tenantId, input);
    }),
});

// Client: full type inference, no code generation.
// `trpc` is a vanilla tRPC client typed with the app router, which mounts productRouter as `product`.
const products = await trpc.product.list.query({
  limit: 10,
  category: 'electronics',
});
// products is fully typed: if productService.list returns
// { items: Product[], nextCursor?: string }, so does this call.
```
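The "no code generation" claim rests on TypeScript's type inference: the client's types are computed from the server router's type, which the client imports as a type only. The toy below reproduces that idea in plain TypeScript. It is not tRPC's API; `procedure`, `createClient`, and the in-process transport are hypothetical, handlers are synchronous to keep it short, and tRPC itself sends HTTP requests and returns promises.

```typescript
// Toy version of the idea behind tRPC; NOT tRPC's real API.
type Procedure<I, O> = { handler: (input: I) => O };
const procedure = <I, O>(handler: (input: I) => O): Procedure<I, O> => ({ handler });

interface ProductRow { id: string; name: string; category: string }
const catalog: ProductRow[] = [
  { id: "prod_1", name: "Laptop", category: "electronics" },
  { id: "prod_2", name: "Desk", category: "furniture" },
];

// Server: the router is an ordinary object whose type carries every input/output.
const appRouter = {
  listProducts: procedure((input: { limit: number; category?: string }) => ({
    items: catalog
      .filter((p) => input.category === undefined || p.category === input.category)
      .slice(0, input.limit),
  })),
};

// The client needs only this type (a type-only import in a real monorepo).
type AppRouter = typeof appRouter;

// Derive the client's call signatures from the router type; no generated files.
type Client<R> = {
  [K in keyof R]: R[K] extends Procedure<infer I, infer O> ? (input: I) => O : never;
};

function createClient<R>(transport: (path: string, input: unknown) => unknown): Client<R> {
  // Any property access becomes a function that forwards its path and input.
  return new Proxy({}, {
    get: (_target, path) => (input: unknown) => transport(String(path), input),
  }) as unknown as Client<R>;
}

// In-process transport for the sketch; a real client would make an HTTP request.
const procedures: Record<string, Procedure<any, unknown>> = appRouter;
const client = createClient<AppRouter>((path, input) => procedures[path].handler(input));

const page = client.listProducts({ limit: 10, category: "electronics" });
console.log(page.items.map((p) => p.name).join(", ")); // Laptop
// client.listProducts({ limit: "10" }) would be a compile-time error.
```

The design point is that `Client<R>` is pure type-level computation over `typeof appRouter`: change a handler's input or return shape on the server and every client call site re-checks at compile time.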
When tRPC Works Best
- All clients are TypeScript (web app, React Native, Node.js services)
- The API is internal (not exposed to third parties)
- Rapid iteration matters more than formal API contracts
- Team is small and co-located (changes to server and client happen together)
When tRPC Doesn't Work
- External partners need to integrate (they need OpenAPI/Swagger docs)
- Clients are in other languages (mobile native, Go, Python)
- You need API versioning for backward compatibility
- The API is a public product
We use tRPC with Hono for internal TypeScript APIs. See our TypeScript backends guide for the full stack comparison.
GraphQL: For Complex Client Needs
GraphQL shines when clients need flexible queries: different pages need different subsets of data, mobile needs less data than web, and the client team wants to iterate on queries without backend changes.
```graphql
# Client requests exactly what it needs
query ProductPage($slug: String!) {
  product(slug: $slug) {
    id
    name
    price
    images { url alt }
    reviews(first: 5, status: APPROVED) {
      items { rating body customerName }
      totalItems
    }
    relatedProducts(first: 4) {
      id name price images { url }
    }
  }
}
```
Production GraphQL Patterns
Persisted queries: Don't allow arbitrary queries in production. Clients send a query hash, the server looks up the query from a registry. This prevents query abuse and enables caching.
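A persisted-query registry can be sketched as a map from hash to query text: queries are registered at build time, and at request time the server executes only what it finds in the map. The class and method names here are hypothetical, and FNV-1a stands in for the SHA-256 hashes persisted-query setups commonly use, purely to keep the example dependency-free.

```typescript
// FNV-1a (32-bit) keeps the sketch dependency-free; real setups typically use SHA-256.
function queryHash(query: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < query.length; i++) {
    h ^= query.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("0000000" + h.toString(16)).slice(-8);
}

class PersistedQueryRegistry {
  private readonly queries = new Map<string, string>();

  // Build/CI step: register every query the client application ships with.
  register(query: string): string {
    const hash = queryHash(query);
    this.queries.set(hash, query);
    return hash;
  }

  // Request time: the client sends only the hash; anything unregistered is rejected.
  resolve(hash: string): string {
    const query = this.queries.get(hash);
    if (query === undefined) {
      throw new Error(`PERSISTED_QUERY_NOT_FOUND: ${hash}`);
    }
    return query;
  }
}

const registry = new PersistedQueryRegistry();
const productPageHash = registry.register(
  "query ProductPage($slug: String!) { product(slug: $slug) { id name price } }",
);

// The server looks up the text for the hash the client sent, then executes it.
const queryText = registry.resolve(productPageHash);
```

Because the server only runs queries it has seen before, an attacker can't submit a new, expensive query, and a GET request keyed by hash is easy to cache.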
Depth limiting: Without limits, a client can send a deeply nested query that joins every table in your database.
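```typescript
// vendure-config.ts (excerpt): depthLimit returns a GraphQL validation rule
import depthLimit from 'graphql-depth-limit';

apiOptions: {
  shopApiValidationRules: [depthLimit(10)],  // reject queries nested deeper than 10 levels
  adminApiValidationRules: [depthLimit(10)],
  shopApiPlayground: false,                  // disable the playground in production
},
```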
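What a depth limit measures can be shown on a toy selection tree. The `Selection` type below is a hypothetical stand-in for a parsed GraphQL document (a real validation rule such as `graphql-depth-limit` walks the actual AST), and libraries differ slightly in exactly how they count levels.

```typescript
// Hypothetical, minimal stand-in for a parsed GraphQL selection set.
interface Selection { name: string; children?: Selection[] }

// Depth = number of fields on the longest path from a root field down to a leaf.
function queryDepth(selections: Selection[]): number {
  let max = 0;
  for (const s of selections) {
    const below = s.children ? queryDepth(s.children) : 0;
    max = Math.max(max, 1 + below);
  }
  return max;
}

function assertDepthWithin(selections: Selection[], limit: number): void {
  const depth = queryDepth(selections);
  if (depth > limit) {
    throw new Error(`Query depth ${depth} exceeds the limit of ${limit}`);
  }
}

// product { reviews { items { rating } } } has depth 4.
const productPage: Selection[] = [
  { name: "product", children: [
    { name: "reviews", children: [{ name: "items", children: [{ name: "rating" }] }] },
  ] },
];

// An abusive query: relatedProducts nested 15 times, ending in id (depth 16).
let abusive: Selection = { name: "id" };
for (let i = 0; i < 15; i++) {
  abusive = { name: "relatedProducts", children: [abusive] };
}

assertDepthWithin(productPage, 10); // passes
console.log(queryDepth(productPage), queryDepth([abusive])); // 4 16
```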
N+1 prevention: Use DataLoader to batch database queries. Without it, a query that fetches 20 products with their categories makes one query for the products plus 20 separate category queries (the N+1 problem).
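```typescript
import DataLoader from 'dataloader';

// DataLoader batches N individual lookups into 1 query. Create one loader per
// request so cached results never leak between users; `categoryRepo` is your data layer.
const categoryLoader = new DataLoader(async (ids: readonly string[]) => {
  const categories = await categoryRepo.findByIds(ids);
  const byId = new Map(categories.map((c) => [c.id, c] as const));
  // Results must match the order and length of `ids`
  return ids.map((id) => byId.get(id) ?? new Error(`Category ${id} not found`));
});
```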
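DataLoader's core trick is to collect every `load(key)` call made in the same tick and hand them to the batch function at once. The `TinyLoader` below is a hypothetical, stripped-down sketch of that mechanism without the caching, de-duplication, or per-key errors the real library provides; the simulated category query just counts how often it runs.

```typescript
// Hypothetical, stripped-down DataLoader-style batcher.
type BatchFn<K, V> = (keys: readonly K[]) => Promise<V[]>;

interface Pending<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (error: unknown) => void;
}

class TinyLoader<K, V> {
  private queue: Pending<K, V>[] = [];
  private readonly batchFn: BatchFn<K, V>;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first load schedules one flush as a microtask; later loads made
      // before it runs simply join the queue.
      if (this.queue.length === 1) {
        Promise.resolve().then(() => this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((p) => p.key));
      batch.forEach((p, i) => p.resolve(values[i]));
    } catch (error) {
      batch.forEach((p) => p.reject(error));
    }
  }
}

let categoryQueries = 0;
const categoryLoader = new TinyLoader<string, string>(async (ids) => {
  categoryQueries++; // stands in for one "WHERE id IN (...)" query per batch
  return ids.map((id) => `Category ${id}`);
});

// Twenty product resolvers each ask for their category in the same tick.
const categoryIds = Array.from({ length: 20 }, (_, i) => `cat_${i % 3}`);
Promise.all(categoryIds.map((id) => categoryLoader.load(id))).then((names) => {
  console.log(names.length, categoryQueries); // 20 1
});
```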
Complexity analysis: Assign a cost to each field and reject queries that exceed a complexity budget.
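A complexity budget can be sketched with a simple cost model. The one assumed below (each field costs 1, and a paginated field's sub-selection is charged once per requested item) is illustrative rather than any library's default, and `FieldNode` is a hypothetical stand-in for the parsed query.

```typescript
// Hypothetical simplified selection tree; `first` marks a paginated list field.
interface FieldNode { name: string; first?: number; children?: FieldNode[] }

// Assumed cost model: every field costs 1, and a list field's sub-selection is
// charged once per requested item, so nested lists multiply.
function queryCost(fields: FieldNode[]): number {
  let total = 0;
  for (const f of fields) {
    const perItem = f.children ? queryCost(f.children) : 0;
    total += 1 + (f.first ?? 1) * perItem;
  }
  return total;
}

function enforceBudget(fields: FieldNode[], budget: number): number {
  const cost = queryCost(fields);
  if (cost > budget) throw new Error(`Query cost ${cost} exceeds budget ${budget}`);
  return cost;
}

// Roughly the product page query.
const productPage: FieldNode[] = [
  { name: "product", children: [
    { name: "id" },
    { name: "name" },
    { name: "reviews", first: 5, children: [{ name: "rating" }, { name: "body" }] },
    { name: "relatedProducts", first: 4, children: [{ name: "id" }, { name: "name" }, { name: "price" }] },
  ] },
];

// Shallow but expensive: two nested lists of 100 items each.
const fanOut: FieldNode[] = [
  { name: "relatedProducts", first: 100, children: [
    { name: "relatedProducts", first: 100, children: [{ name: "id" }] },
  ] },
];

console.log(enforceBudget(productPage, 1000)); // 27
console.log(queryCost(fanOut)); // 10101
```

The second query is only three levels deep, so a depth limit alone would let it through; the cost model catches it.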
For how we use GraphQL in Vendure commerce, see our Vendure production guide.
Breaking Changes: Tag-Based Evolution vs URL Versioning
| Strategy | How It Works | Best For |
|---|---|---|
| Tag-based (Protobuf) | Add fields with new tags, old clients ignore them | Cross-language, gRPC |
| URL versioning (/v1/, /v2/) | Separate endpoints per version | REST APIs with external consumers |
| Header versioning (Accept: application/vnd.api.v2+json) | Same URL, version in header | REST APIs that want clean URLs |
| GraphQL (no versioning) | Add fields, deprecate old ones with @deprecated | GraphQL APIs |
| tRPC (no versioning) | Types evolve with the codebase | Internal TypeScript APIs |
For most internal APIs, avoid URL versioning. It doubles your maintenance surface. Add new fields, deprecate old ones, remove after all clients have migrated.
For external APIs (third-party integrations, public APIs), URL versioning is safer because you can't control when clients update.
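For an internal JSON API, evolving in place usually means expand and contract: add the new field, keep serving the old one while clients migrate, then delete it. The sketch below shows a hypothetical rename of `title` to `name` on a product response, using TypeScript's `@deprecated` JSDoc tag so editors flag remaining uses of the old field.

```typescript
// Hypothetical response shape mid-migration: `title` is being renamed to `name`.
interface ProductResponse {
  id: string;
  /** Added field; clients that don't know about it never read it. */
  name: string;
  /** @deprecated Use `name`. Kept until every client has migrated. */
  title: string;
}

function toProductResponse(row: { id: string; name: string }): ProductResponse {
  // During the migration window the server fills both fields from one source.
  return { id: row.id, name: row.name, title: row.name };
}

// An old client still written against the original shape keeps working...
function oldClientHeading(body: { title: string }): string {
  return body.title.toUpperCase();
}

// ...while updated clients read the new field. Once every client has moved
// to `name` and the announced removal date has passed, delete `title`.
function newClientHeading(body: { name: string }): string {
  return body.name.toUpperCase();
}

const response = toProductResponse({ id: "prod_123", name: "Laptop" });
console.log(oldClientHeading(response), newClientHeading(response)); // LAPTOP LAPTOP
```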
Error Contracts
Errors are part of the API contract. Clients need structured errors they can act on, not string messages they parse with regex.
```typescript
// Structured error response
interface ApiError {
  code: string;                      // Machine-readable: "PRODUCT_NOT_FOUND", "INSUFFICIENT_STOCK"
  message: string;                   // Human-readable: "Product with ID xyz not found"
  details?: Record<string, unknown>; // Additional context for debugging
  requestId: string;                 // Correlation ID for support
}
```

Example responses:

404 Not Found:

```json
{
  "code": "PRODUCT_NOT_FOUND",
  "message": "Product with ID prod_123 not found",
  "requestId": "req_abc456"
}
```

409 Conflict:

```json
{
  "code": "INSUFFICIENT_STOCK",
  "message": "Only 2 units available, requested 5",
  "details": { "available": 2, "requested": 5 },
  "requestId": "req_def789"
}
```

422 Unprocessable Entity:

```json
{
  "code": "VALIDATION_ERROR",
  "message": "Invalid input",
  "details": {
    "fields": [
      { "field": "price", "error": "must be positive" },
      { "field": "name", "error": "must not be empty" }
    ]
  },
  "requestId": "req_ghi012"
}
```
The code field is the contract. Clients switch on it. The message is for humans. The details provide context. The requestId enables support to find the exact request in logs.
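On the client side, the contract pays off as a `switch` on `code`. The sketch below (with hypothetical `isApiError` and `userMessage` helpers) validates an unknown JSON body against the `ApiError` shape, maps known codes to user-facing text, and falls back to a message that carries the `requestId` for codes it doesn't recognize.

```typescript
interface ApiError {
  code: string;
  message: string;
  details?: Record<string, unknown>;
  requestId: string;
}

// Narrow an unknown JSON body to ApiError before trusting it.
function isApiError(body: unknown): body is ApiError {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return typeof b.code === "string" && typeof b.message === "string" && typeof b.requestId === "string";
}

// Hypothetical client-side mapping: branch on `code`, never parse `message`.
function userMessage(err: ApiError): string {
  switch (err.code) {
    case "PRODUCT_NOT_FOUND":
      return "This product is no longer available.";
    case "INSUFFICIENT_STOCK":
      return `Only ${String(err.details?.available ?? "a few")} left in stock.`;
    case "VALIDATION_ERROR":
      return "Please correct the highlighted fields.";
    default:
      // Codes added by a newer server still get a safe fallback, with the
      // requestId so support can find the exact request in the logs.
      return `Something went wrong (reference ${err.requestId}).`;
  }
}

const body: unknown = JSON.parse(
  '{"code":"INSUFFICIENT_STOCK","message":"Only 2 units available, requested 5",' +
    '"details":{"available":2,"requested":5},"requestId":"req_def789"}',
);
if (isApiError(body)) console.log(userMessage(body)); // Only 2 left in stock.
```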
API Governance
As the API grows, you need governance to prevent inconsistency:
| Rule | Why |
|---|---|
| All endpoints require authentication | No accidental public endpoints |
| All mutations require idempotency keys | Network retries don't create duplicates |
| All responses include requestId | Every request is traceable |
| All errors use the structured format | Clients can handle errors programmatically |
| All list endpoints support pagination | No unbounded queries |
| Breaking changes require review | One person can't break all consumers |
| Deprecations require migration timeline | "Deprecated" without a deadline means "never removed" |
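The idempotency-key rule can be sketched as a store keyed by the client-supplied key (for example, an `Idempotency-Key` request header): the first request runs the handler, and any retry or concurrent duplicate with the same key receives the same result. `IdempotencyStore` is hypothetical and in-memory; a production version needs a shared store with expiry and keys scoped per tenant or client.

```typescript
// Hypothetical in-memory idempotency layer.
interface StoredResponse { status: number; body: unknown }

class IdempotencyStore {
  private readonly results = new Map<string, Promise<StoredResponse>>();

  run(key: string, handler: () => Promise<StoredResponse>): Promise<StoredResponse> {
    const existing = this.results.get(key);
    // A retry, or a duplicate still in flight, gets the original result.
    if (existing !== undefined) return existing;
    const pending = handler();
    this.results.set(key, pending);
    // If the first attempt fails, forget the key so the client can retry.
    pending.catch(() => this.results.delete(key));
    return pending;
  }
}

let ordersCreated = 0;
const store = new IdempotencyStore();

// The same key sent twice, e.g. a client retrying after a network timeout.
const createOrder = () =>
  store.run("key_order_abc", async () => {
    ordersCreated++;
    return { status: 201, body: { orderId: `ord_${ordersCreated}` } };
  });

Promise.all([createOrder(), createOrder()]).then(([first, retry]) => {
  console.log(ordersCreated, first === retry); // 1 true
});
```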
Common Pitfalls
- Treating REST vs GraphQL as an identity. Choose based on client needs, not team preference: REST for simple CRUD with external consumers, GraphQL for flexible queries with frontend teams, tRPC for internal TypeScript, Protobuf for cross-language systems.
- No error contract. String error messages change with every release and break every client that tries to handle them.
- Unbounded list endpoints. An endpoint that returns all 50,000 products in one response will crash clients and overload your database. Pagination is mandatory.
- Versioning internal APIs. If you control all clients, evolve the API in place. Versioning doubles maintenance for no benefit.
- No deprecation timeline. Marking a field `@deprecated` without a removal date means it stays forever. Set a date, notify consumers, then remove it.
- GraphQL without depth limits. An unrestricted GraphQL API is a denial-of-service vector. Enforce depth limits and a query complexity budget.
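The fix for unbounded list endpoints is a clamped page size plus a cursor. The sketch below is a hypothetical in-memory version over rows already sorted by `id`, using the last `id` a client saw as the cursor (real APIs usually make cursors opaque, for example by encoding them) and capping `limit` at 100 to match the tRPC router's `max(100)`.

```typescript
// Hypothetical page shape, matching the { items, nextCursor? } used by the tRPC example.
interface Page<T> { items: T[]; nextCursor?: string }

const MAX_LIMIT = 100;

// Rows are assumed to be sorted by id, as an indexed database query would return them.
function paginate<T extends { id: string }>(rows: readonly T[], limit: number, cursor?: string): Page<T> {
  // Clamp the page size so no request can ask for everything at once.
  const size = Math.min(Math.max(1, Math.floor(limit)), MAX_LIMIT);
  let start = 0;
  if (cursor !== undefined) {
    // Resume after the last item the client saw.
    const lastSeen = cursor;
    start = rows.findIndex((row) => row.id > lastSeen);
    if (start === -1) return { items: [] };
  }
  const items = rows.slice(start, start + size);
  const hasMore = start + size < rows.length;
  return hasMore ? { items, nextCursor: items[items.length - 1].id } : { items };
}

// 250 products with zero-padded ids, so string order matches numeric order.
const products = Array.from({ length: 250 }, (_, i) => ({ id: `prod_${("00" + i).slice(-3)}` }));

const first = paginate(products, 1000); // limit is clamped to 100
console.log(first.items.length, first.nextCursor); // 100 prod_099
```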
Key Takeaways
- Protobuf for cross-language systems. Tag-based evolution, code generation, and an efficient wire format. The schema is the contract.
- tRPC for TypeScript-only stacks. Zero code generation, full type inference, and the fastest iteration. But it only works when all clients are TypeScript.
- GraphQL for flexible client needs. Clients query exactly what they need. In production, add depth limits, persisted queries, and complexity analysis.
- Error contracts are as important as success contracts. Machine-readable codes, human-readable messages, correlation IDs, and structured details.
- Evolve in place for internal APIs, version for external ones. Adding fields is safe; removing fields requires a migration timeline. Reserve URL versioning for external consumers whose upgrade schedule you can't control.
We design APIs as part of our web development and custom software practice. If you need help with API architecture, talk to our team or request a quote.