Technical Guide

API Design for Systems That Live Longer Than Your Team

How to design APIs that survive team turnover. Protobuf as schema source of truth, tRPC for TypeScript stacks, GraphQL in production, breaking change management, and error contracts.

April 5, 2026 · 14 min read · Oronts Engineering Team

The API That Outlives Its Creator

Every API starts with one team, one client, and one use case. Two years later, 5 teams depend on it, 3 external partners integrate with it, and the original developers have moved on. The API's design decisions are now permanent.

We've built APIs using three different strategies: Protobuf as schema source of truth for cross-language systems, tRPC for TypeScript-only stacks, and GraphQL for complex client needs. Each was the right choice for its context. This article covers when to use each and how to design APIs that survive team turnover.

For how these APIs fit into broader system design, see our system architecture guide and software engineering guide.

Protobuf: Schema Source of Truth for Cross-Language Systems

When your system spans multiple languages (TypeScript API, Go search service, Python ML pipeline), you need a schema format that generates types for all of them. Protobuf does this.

// proto/product.proto
syntax = "proto3";
package commerce.v1;

import "google/protobuf/timestamp.proto";  // required for google.protobuf.Timestamp

message Product {
    string id = 1;
    string name = 2;
    string description = 3;
    int32 price_cents = 4;
    string currency = 5;
    repeated string category_ids = 6;
    ProductStatus status = 7;
    google.protobuf.Timestamp created_at = 8;
}

enum ProductStatus {
    PRODUCT_STATUS_UNSPECIFIED = 0;
    PRODUCT_STATUS_DRAFT = 1;
    PRODUCT_STATUS_ACTIVE = 2;
    PRODUCT_STATUS_ARCHIVED = 3;
}

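// Request/response messages (GetProductRequest, ListProductsRequest,
// ListProductsResponse, CreateProductRequest) are omitted here for brevity.
service ProductService {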
    rpc GetProduct(GetProductRequest) returns (Product);
    rpc ListProducts(ListProductsRequest) returns (ListProductsResponse);
    rpc CreateProduct(CreateProductRequest) returns (Product);
}

Why Tags Matter for Evolution

Protobuf fields are identified on the wire by tag numbers (1, 2, 3...), not by name. This means you can rename a field without breaking clients that use the binary encoding (a rename still changes generated code and the JSON mapping, which uses field names). You can add new fields (new tag numbers) without breaking old clients. You can deprecate fields without removing them.

// Safe evolution: adding a field
message Product {
    string id = 1;
    string name = 2;
    string description = 3;
    int32 price_cents = 4;
    string currency = 5;
    repeated string category_ids = 6;
    ProductStatus status = 7;
    google.protobuf.Timestamp created_at = 8;
    string sku = 9;          // NEW: added without breaking existing clients
    string brand = 10;       // NEW: old clients simply ignore unknown fields
}

Rules for safe Protobuf evolution:

  • Never reuse a tag number (even after removing a field)
  • Never change a field's type
  • Add new fields with new tag numbers
  • Reserve the tag numbers (and names) of removed fields with reserved to prevent accidental reuse, and mark fields that are being phased out with [deprecated = true]
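To make the tag-number point concrete, here is a minimal sketch of the wire format, hand-rolled for string fields only. The helper names (encodeStringField, decodeStringFields) and the sample values are hypothetical; real systems would use generated code rather than anything like this.

```typescript
// Sketch of the protobuf wire format for string fields only (wire type 2).
// Illustrative helpers, not a real protobuf library.

function encodeVarint(value: number): number[] {
    const bytes: number[] = [];
    let rest = value;
    while (rest > 0x7f) {
        bytes.push((rest & 0x7f) | 0x80); // low 7 bits, continuation bit set
        rest >>>= 7;
    }
    bytes.push(rest);
    return bytes;
}

function encodeStringField(tag: number, text: string): number[] {
    const payload: number[] = [];
    for (let i = 0; i < text.length; i++) payload.push(text.charCodeAt(i)); // ASCII only
    // The field key carries the tag number and wire type, never the field name.
    return [...encodeVarint((tag << 3) | 2), ...encodeVarint(payload.length), ...payload];
}

function decodeStringFields(bytes: number[], known: Record<number, string>): Record<string, string> {
    const result: Record<string, string> = {};
    let pos = 0;
    const readVarint = (): number => {
        let value = 0;
        let shift = 0;
        for (;;) {
            const byte = bytes[pos++];
            value |= (byte & 0x7f) << shift;
            shift += 7;
            if ((byte & 0x80) === 0) return value;
        }
    };
    while (pos < bytes.length) {
        const tag = readVarint() >>> 3;
        const length = readVarint(); // every field in this sketch is length-delimited
        const text = String.fromCharCode(...bytes.slice(pos, pos + length));
        pos += length;
        const name = known[tag];
        if (name !== undefined) result[name] = text; // unknown tags are skipped
    }
    return result;
}

// A message written by a newer server that already knows about tag 9 (sku).
const wire = [
    ...encodeStringField(1, "prod_123"),
    ...encodeStringField(2, "Desk Lamp"),
    ...encodeStringField(9, "SKU-42"),
];

const oldClient = decodeStringFields(wire, { 1: "id", 2: "name" });
const newClient = decodeStringFields(wire, { 1: "id", 2: "name", 9: "sku" });
const renamedClient = decodeStringFields(wire, { 1: "id", 2: "title" });
```

The old client reads id and name and silently skips tag 9, while a client that renamed tag 2 to title still reads the same bytes. Reusing tag 9 for a different field later would make old data decode as the wrong thing, which is why removed tags are reserved.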

Code Generation

# Generate TypeScript, Go, and Python from the same .proto file
# (--go_out and --ts_out need the protoc-gen-go and protoc-gen-ts plugins on PATH)
protoc --ts_out=./gen/ts --go_out=./gen/go --python_out=./gen/py proto/*.proto

One schema file generates type-safe clients and servers in every language. Schema changes are a pull request. Code generation is a CI step. Type mismatches surface as compile errors (or type-checker errors in Python), not runtime bugs.

tRPC: When Your Whole Stack Is TypeScript

If your API server and all clients are TypeScript, tRPC eliminates the schema layer entirely. Types flow from the server to the client at compile time. No code generation, no schema files, no OpenAPI spec.

// Server: define router with typed procedures
import { router, publicProcedure, protectedProcedure } from './trpc';
import { z } from 'zod';

export const productRouter = router({
    list: publicProcedure
        .input(z.object({
            cursor: z.string().optional(),
            limit: z.number().min(1).max(100).default(20),
            category: z.string().optional(),
        }))
        .query(async ({ input, ctx }) => {
            return ctx.productService.list(input);
        }),

    create: protectedProcedure
        .input(z.object({
            name: z.string().min(1).max(200),
            price: z.number().positive(),
            description: z.string().optional(),
        }))
        .mutation(async ({ input, ctx }) => {
            return ctx.productService.create(ctx.tenantId, input);
        }),
});

// Client: full type inference, no code generation
const products = await trpc.product.list.query({
    limit: 10,
    category: 'electronics',
});
// products is fully typed: { items: Product[], nextCursor?: string }
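The inference shown above rests on an ordinary TypeScript feature: the client depends on the type of the router, not on a schema file. The dependency-free sketch below imitates that idea with hypothetical names (miniRouter, callList); it is not tRPC's actual API.

```typescript
// Stand-in for a server-side router. All names here are illustrative.
interface Product {
    id: string;
    name: string;
    priceCents: number;
}

const catalog: Product[] = [
    { id: "prod_1", name: "Desk Lamp", priceCents: 3900 },
    { id: "prod_2", name: "Monitor Arm", priceCents: 8900 },
];

const miniRouter = {
    product: {
        list: (input: { limit: number; category?: string }) => {
            const items = catalog.slice(0, input.limit);
            const hasMore = items.length > 0 && catalog.length > items.length;
            return { items, nextCursor: hasMore ? items[items.length - 1].id : undefined };
        },
    },
};

// The client only needs the router's type. Input and output types are derived
// from the server code, so changing the server changes these types too.
type MiniRouter = typeof miniRouter;
type ListInput = Parameters<MiniRouter["product"]["list"]>[0];
type ListOutput = ReturnType<MiniRouter["product"]["list"]>;

function callList(router: MiniRouter, input: ListInput): ListOutput {
    return router.product.list(input);
}

const firstPage = callList(miniRouter, { limit: 1 });
const firstName: string = firstPage.items[0].name; // typed without any generated code
```

Passing { limit: "1" } or reading firstPage.items[0].title would fail type checking, which is the property that replaces a separate schema layer in a TypeScript-only stack.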

When tRPC Works Best

  • All clients are TypeScript (web app, React Native, Node.js services)
  • The API is internal (not exposed to third parties)
  • Rapid iteration matters more than formal API contracts
  • Team is small and co-located (changes to server and client happen together)

When tRPC Doesn't Work

  • External partners need to integrate (they need OpenAPI/Swagger docs)
  • Clients are in other languages (mobile native, Go, Python)
  • You need API versioning for backward compatibility
  • The API is a public product

We use tRPC with Hono for internal TypeScript APIs. See our TypeScript backends guide for the full stack comparison.

GraphQL: For Complex Client Needs

GraphQL shines when clients need flexible queries: different pages need different subsets of data, mobile needs less data than web, and the client team wants to iterate on queries without backend changes.

# Client requests exactly what it needs
query ProductPage($slug: String!) {
    product(slug: $slug) {
        id
        name
        price
        images { url alt }
        reviews(first: 5, status: APPROVED) {
            items { rating body customerName }
            totalItems
        }
        relatedProducts(first: 4) {
            id name price images { url }
        }
    }
}

Production GraphQL Patterns

Persisted queries: Don't allow arbitrary queries in production. Clients send a query hash, the server looks up the query from a registry. This prevents query abuse and enables caching.
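A minimal sketch of such a registry follows. The class name is hypothetical, and the FNV-1a hash is only a dependency-free stand-in so the example runs anywhere; a real deployment would more likely use a cryptographic hash such as SHA-256 computed at build time.

```typescript
// Stand-in hash (FNV-1a, 32-bit). Assumption: production would use SHA-256 or similar.
function fnv1a(text: string): string {
    let hash = 0x811c9dc5;
    for (let i = 0; i < text.length; i++) {
        hash ^= text.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193) >>> 0;
    }
    return hash.toString(16).padStart(8, "0");
}

class PersistedQueryRegistry {
    private readonly queries = new Map<string, string>();

    // Called at build/deploy time for every query the clients ship with.
    register(query: string): string {
        const id = fnv1a(query);
        this.queries.set(id, query);
        return id;
    }

    // Called per request: the client sends only the id, never the query text.
    resolve(id: string): string {
        const query = this.queries.get(id);
        if (query === undefined) {
            throw new Error(`Unknown persisted query: ${id}`);
        }
        return query;
    }
}

const registry = new PersistedQueryRegistry();
const productPageId = registry.register("query ProductPage($slug: String!) { product(slug: $slug) { id name } }");
const resolvedQuery = registry.resolve(productPageId);
```

Because the id is stable for a given query text, it can also serve as a cache key for responses that do not depend on the caller.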

Depth limiting: Without limits, a client can send a deeply nested query that joins every table in your database.

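// Assumes a Vendure config and the graphql-depth-limit package.
// depthLimit() produces a GraphQL validation rule rather than HTTP middleware,
// so it is registered with the validation rules for the Shop API.
apiOptions: {
    shopApiValidationRules: [depthLimit(10)],
    shopApiPlayground: false,  // Disable in production
}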
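To show what a depth limit actually measures, here is a deliberately simplified sketch that counts brace nesting in the raw query text. The function names are hypothetical, and a production server would walk the parsed query AST instead; this version also counts braces in input object arguments as nesting.

```typescript
// Simplified depth check: counts `{ ... }` nesting, skipping string literals.
function braceDepth(query: string): number {
    let depth = 0;
    let max = 0;
    let inString = false;
    for (let i = 0; i < query.length; i++) {
        const ch = query[i];
        if (inString) {
            if (ch === "\\") i++;                  // skip the escaped character
            else if (ch === '"') inString = false;
            continue;
        }
        if (ch === '"') inString = true;
        else if (ch === "{") max = Math.max(max, ++depth);
        else if (ch === "}") depth--;
    }
    return max;
}

function isTooDeep(query: string, maxDepth: number): boolean {
    return braceDepth(query) > maxDepth;
}

const nestedQuery = "query { product { reviews { items { rating } } } }";
const nestedDepth = braceDepth(nestedQuery);
```

Here nestedDepth is 4, so the query passes a limit of 10 and would be rejected under a limit of 3.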

N+1 prevention: Use DataLoader to batch database queries. Without it, a query that fetches 20 products with their categories makes 20 separate category queries.

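// DataLoader batches N individual queries into 1
import DataLoader from 'dataloader';

const categoryLoader = new DataLoader(async (ids: readonly string[]) => {
    const categories = await categoryRepo.findByIds([...ids]);
    // Index once, then answer in the same order as the requested ids
    const byId = new Map(categories.map(c => [c.id, c] as const));
    // DataLoader expects exactly one value (or Error) per key
    return ids.map(id => byId.get(id) ?? new Error(`Category ${id} not found`));
});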

Complexity analysis: Assign cost to each field. Reject queries that exceed a complexity budget.
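One way to read "assign cost to each field" is sketched below. The cost model (a field's own cost plus its list size times the cost of its children) and all the numbers are assumptions for illustration, not a standard.

```typescript
// Hypothetical cost model: cost(field) = own cost + listSize * sum(children costs)
interface SelectedField {
    name: string;
    cost: number;
    listSize?: number;          // how many items the field may return (default 1)
    children?: SelectedField[];
}

function fieldComplexity(field: SelectedField): number {
    const childTotal = (field.children ?? []).reduce(
        (sum, child) => sum + fieldComplexity(child),
        0,
    );
    return field.cost + (field.listSize ?? 1) * childTotal;
}

function withinBudget(root: SelectedField, budget: number): boolean {
    return fieldComplexity(root) <= budget;
}

// Roughly the shape of a product page query: 5 reviews, 4 related products.
const productSelection: SelectedField = {
    name: "product",
    cost: 1,
    children: [
        { name: "name", cost: 1 },
        {
            name: "reviews", cost: 2, listSize: 5,
            children: [{ name: "rating", cost: 1 }, { name: "body", cost: 1 }],
        },
        {
            name: "relatedProducts", cost: 2, listSize: 4,
            children: [{ name: "name", cost: 1 }, { name: "price", cost: 1 }],
        },
    ],
};

const productCost = fieldComplexity(productSelection);
```

With these numbers, reviews costs 2 + 5 × 2 = 12, relatedProducts costs 2 + 4 × 2 = 10, and the whole selection costs 1 + 1 + 12 + 10 = 24. Raising first: 5 to first: 500 would multiply the reviews subtree accordingly, which is exactly the case a budget is meant to catch.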

For how we use GraphQL in Vendure commerce, see our Vendure production guide.

Breaking Changes: Tag-Based Evolution vs URL Versioning

Strategy | How It Works | Best For
Tag-based (Protobuf) | Add fields with new tags, old clients ignore them | Cross-language, gRPC
URL versioning (/v1/, /v2/) | Separate endpoints per version | REST APIs with external consumers
Header versioning (Accept: application/vnd.api.v2+json) | Same URL, version in header | REST APIs that want clean URLs
GraphQL (no versioning) | Add fields, deprecate old ones with @deprecated | GraphQL APIs
tRPC (no versioning) | Types evolve with the codebase | Internal TypeScript APIs

For most internal APIs, avoid URL versioning. Each parallel version roughly doubles the surface you have to maintain. Instead, add new fields, deprecate old ones, and remove them once all clients have migrated.

For external APIs (third-party integrations, public APIs), URL versioning is safer because you can't control when clients update.
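For the URL and header strategies in the table, the server has to extract a version before routing. A minimal sketch, assuming the /vN/ path prefix and the application/vnd.api.vN+json media type shown above; the function names and the default of version 1 are hypothetical.

```typescript
// Extract the version from a path such as "/v2/products".
function versionFromPath(path: string): number | undefined {
    const match = /^\/v(\d+)(\/|$)/.exec(path);
    return match ? Number(match[1]) : undefined;
}

// Extract the version from an Accept header such as "application/vnd.api.v2+json".
function versionFromAccept(accept: string | undefined): number | undefined {
    const match = /application\/vnd\.api\.v(\d+)\+json/.exec(accept ?? "");
    return match ? Number(match[1]) : undefined;
}

// Assumption: an explicit path version wins, then the header, then a default.
function resolveVersion(path: string, accept?: string, fallback: number = 1): number {
    return versionFromPath(path) ?? versionFromAccept(accept) ?? fallback;
}

const fromUrl = resolveVersion("/v2/products");
const fromHeader = resolveVersion("/products", "application/vnd.api.v2+json");
const defaulted = resolveVersion("/products", "application/json");
```

Defaulting unversioned requests to the oldest supported version keeps existing clients working when a new version ships, at the price of having to keep that version alive.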

Error Contracts

Errors are part of the API contract. Clients need structured errors they can act on, not string messages they parse with regex.

// Structured error response
interface ApiError {
    code: string;           // Machine-readable: "PRODUCT_NOT_FOUND", "INSUFFICIENT_STOCK"
    message: string;        // Human-readable: "Product with ID xyz not found"
    details?: object;       // Additional context for debugging
    requestId: string;      // Correlation ID for support
}

// Example responses
// 404
{
    "code": "PRODUCT_NOT_FOUND",
    "message": "Product with ID prod_123 not found",
    "requestId": "req_abc456"
}

// 409
{
    "code": "INSUFFICIENT_STOCK",
    "message": "Only 2 units available, requested 5",
    "details": { "available": 2, "requested": 5 },
    "requestId": "req_def789"
}

// 422
{
    "code": "VALIDATION_ERROR",
    "message": "Invalid input",
    "details": {
        "fields": [
            { "field": "price", "error": "must be positive" },
            { "field": "name", "error": "must not be empty" }
        ]
    },
    "requestId": "req_ghi012"
}

The code field is the contract. Clients switch on it. The message is for humans. The details provide context. The requestId enables support to find the exact request in logs.
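On the client side, "switching on the code" might look like the sketch below. The function names and user-facing strings are hypothetical; the error shape and codes are the ones from the examples above.

```typescript
interface ApiError {
    code: string;
    message: string;
    details?: object;
    requestId: string;
}

// Runtime check, since a response body is untyped until validated.
function isApiError(value: unknown): value is ApiError {
    if (typeof value !== "object" || value === null) return false;
    const v = value as Record<string, unknown>;
    return typeof v.code === "string"
        && typeof v.message === "string"
        && typeof v.requestId === "string";
}

// Branch on the machine-readable code, never on the message text.
function describeFailure(body: unknown): string {
    if (!isApiError(body)) return "Unexpected error. Please try again.";
    switch (body.code) {
        case "PRODUCT_NOT_FOUND":
            return "This product is no longer available.";
        case "INSUFFICIENT_STOCK": {
            const details = body.details as { available?: number } | undefined;
            return typeof details?.available === "number"
                ? `Only ${details.available} left in stock.`
                : "Not enough stock.";
        }
        case "VALIDATION_ERROR":
            return "Please check the highlighted fields.";
        default:
            // Unknown codes still give support something to search for.
            return `Something went wrong (ref ${body.requestId}).`;
    }
}

const stockMessage = describeFailure({
    code: "INSUFFICIENT_STOCK",
    message: "Only 2 units available, requested 5",
    details: { available: 2, requested: 5 },
    requestId: "req_def789",
});
```

The default branch matters for longevity: servers add new codes over time, and an older client should degrade to a generic message rather than crash.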

API Governance

As the API grows, you need governance to prevent inconsistency:

Rule | Why
All endpoints require authentication | No accidental public endpoints
All mutations require idempotency keys | Network retries don't create duplicates
All responses include requestId | Every request is traceable
All errors use the structured format | Clients can handle errors programmatically
All list endpoints support pagination | No unbounded queries
Breaking changes require review | One person can't break all consumers
Deprecations require migration timeline | "Deprecated" without a deadline means "never removed"
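The idempotency-key rule is the least self-explanatory entry in the table, so here is a minimal in-memory sketch of it. The service and its method names are hypothetical, and a real implementation would persist keys in a shared store with an expiry.

```typescript
interface Order {
    id: string;
    sku: string;
    quantity: number;
}

class IdempotentOrderService {
    // Maps each idempotency key to the result of the first successful call.
    private readonly completed = new Map<string, Order>();
    private created = 0;

    createOrder(idempotencyKey: string, sku: string, quantity: number): Order {
        const previous = this.completed.get(idempotencyKey);
        if (previous !== undefined) {
            return previous; // a retry: return the original result, create nothing
        }
        this.created += 1;
        const order: Order = { id: `order_${this.created}`, sku, quantity };
        this.completed.set(idempotencyKey, order);
        return order;
    }

    orderCount(): number {
        return this.created;
    }
}

const orders = new IdempotentOrderService();
const firstAttempt = orders.createOrder("key-123", "SKU-42", 2);
const retry = orders.createOrder("key-123", "SKU-42", 2); // e.g. after a network timeout
const otherOrder = orders.createOrder("key-456", "SKU-42", 2);
```

The client generates the key once per logical operation and reuses it on every retry, so the retry above returns the first order instead of creating a second one.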

Common Pitfalls

  1. REST vs GraphQL as an identity. Choose based on client needs, not team preference. REST for simple CRUD with external consumers. GraphQL for flexible queries with frontend teams. tRPC for internal TypeScript. Protobuf for cross-language.

  2. No error contract. String error messages that change with every release break every client that tries to handle them.

  3. Unbounded list endpoints. An endpoint that returns all 50,000 products in one response will crash clients and overload your database. Pagination is mandatory.

  4. Versioning internal APIs. If you control all clients, evolve the API in place. Versioning doubles maintenance for no benefit.

  5. No deprecation timeline. Marking a field @deprecated without a removal date means it stays forever. Set a date, notify consumers, remove it.

  6. GraphQL without depth limits. An unrestricted GraphQL API is a denial-of-service vector. Limit depth, complexity, and query cost.
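For pitfall 3, a cursor-based page function can be sketched in a few lines. This assumes rows arrive in a stable order and uses the id of the last returned item as the cursor; the names and the cap of 100 are illustrative, chosen to match the limit range in the tRPC example.

```typescript
interface Page<T> {
    items: T[];
    nextCursor?: string;
}

function paginate<T extends { id: string }>(rows: T[], limit: number, cursor?: string): Page<T> {
    const size = Math.min(Math.max(limit, 1), 100); // clamp so no request is unbounded
    // Start after the row the cursor points to; an unknown cursor restarts from the top.
    const start = cursor === undefined ? 0 : rows.findIndex((row) => row.id === cursor) + 1;
    const items = rows.slice(start, start + size);
    const hasMore = start + size < rows.length;
    return { items, nextCursor: hasMore ? items[items.length - 1].id : undefined };
}

const sampleRows = ["a", "b", "c", "d", "e"].map((id) => ({ id }));
const pageOne = paginate(sampleRows, 2);
const pageTwo = paginate(sampleRows, 2, pageOne.nextCursor);
const pageThree = paginate(sampleRows, 2, pageTwo.nextCursor);
```

In a database-backed endpoint the same idea becomes a WHERE id > cursor clause with a LIMIT, which stays fast on large tables where offset-based paging does not.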

Key Takeaways

  • Protobuf for cross-language systems. Tag-based evolution, code generation, wire format efficiency. The schema is the contract.

  • tRPC for TypeScript-only stacks. Zero code generation, full type inference, fastest iteration speed. But only works when all clients are TypeScript.

  • GraphQL for flexible client needs. Clients query exactly what they need. But add depth limits, persisted queries, and complexity analysis for production.

  • Error contracts are as important as success contracts. Machine-readable codes, human-readable messages, correlation IDs, and structured details.

  • Evolve in place for internal APIs, version for external. Adding fields is safe. Removing fields requires a migration timeline. Reserve URL versioning for external consumers whose upgrade schedule you don't control.

We design APIs as part of our web development and custom software practice. If you need help with API architecture, talk to our team or request a quote.

Topics covered

API design, API versioning, API stability, breaking changes, contract-first API, tRPC, GraphQL design, Protobuf, API governance
