Technical Guide

System Architecture & Scalability

A comprehensive guide to designing systems that last. Learn about architectural patterns, API design, authentication systems, real-time infrastructure, and building for scale without over-engineering.

January 15, 2026 · 11 min read · Oronts Engineering Team

Designing Systems That Last

The hardest part of software architecture isn't building systems that work. It's building systems that keep working—through growth, changing requirements, team turnover, and years of maintenance.

We've seen enough architectural disasters to know what doesn't work. The over-engineered microservices that should have been a monolith. The monolith that should have been split years ago. The "scalable" system that can't handle 100 concurrent users.

Good architecture is about making the right trade-offs for your actual situation—not following patterns blindly or preparing for scale you'll never reach.

The goal isn't the most sophisticated architecture. It's the simplest architecture that solves your problem and can evolve with your business.

Architectural Principles

Before diving into patterns and technologies, here are the principles that guide our decisions.

1. Start Simple, Scale When Needed

Don't build for 10 million users when you have 1,000. That's not planning ahead—it's wasting resources on problems you don't have.

| Stage | Architecture | When to Evolve |
| --- | --- | --- |
| MVP | Monolith, single DB | Validate the business |
| Growth | Monolith, read replicas, caching | Hitting performance limits |
| Scale | Service extraction, async processing | Clear bottlenecks identified |
| Enterprise | Event-driven, distributed | Org/domain boundaries clear |

2. Boring Technology Wins

We use proven, boring technology for critical systems. PostgreSQL over the latest NewSQL database. Node.js over experimental runtimes. Kubernetes over custom orchestration.

Boring but reliable:
├── PostgreSQL (not yet another NoSQL)
├── Redis (not experimental caches)
├── Node.js/TypeScript (not experimental languages)
├── React (not framework-of-the-week)
└── Kubernetes (not custom orchestration)

Innovation is for the edges where it matters. Core infrastructure should be battle-tested.

3. Design for Change

Requirements will change. The architecture should accommodate change without rewrites.

// Bad: Hardcoded assumptions
function processOrder(order) {
  charge(order.total);  // What about invoicing? Split payments?
  ship(order.items);    // What about digital goods? Subscriptions?
  email(order.customer); // What about SMS? Push notifications?
}

// Good: Extensible through events
async function processOrder(order) {
  const result = await chargeOrder(order);
  await eventBus.publish('order.paid', { order, result });
  // Listeners handle shipping, notifications, inventory, analytics...
}

4. Make It Observable

You can't fix what you can't see. Build observability in from the start.

// Every service call includes context
async function createOrder(data, context: RequestContext) {
  const span = tracer.startSpan('createOrder', { parent: context.span });

  try {
    const order = await orderService.create(data);

    metrics.increment('orders.created', { channel: data.channel });
    logger.info('Order created', {
      orderId: order.id,
      total: order.total,
      traceId: context.traceId
    });

    return order;
  } catch (error) {
    span.recordException(error);
    throw error;
  } finally {
    span.end();
  }
}

API Design

APIs are the contracts between systems. Once published, they're hard to change. Design them carefully.

REST Done Right

REST works well for most use cases. The key is consistency.

// Consistent patterns across all endpoints
GET    /api/v1/orders           // List (paginated)
GET    /api/v1/orders/:id       // Get one
POST   /api/v1/orders           // Create
PATCH  /api/v1/orders/:id       // Partial update
DELETE /api/v1/orders/:id       // Delete

// Query parameters for filtering/pagination
GET /api/v1/orders?status=pending&page=1&limit=20&sort=-createdAt

// Nested resources when relationship is strong
GET /api/v1/orders/:id/items
POST /api/v1/orders/:id/refunds

// Response format: consistent structure
{
  "data": { ... },
  "meta": {
    "page": 1,
    "limit": 20,
    "total": 156
  }
}
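
The query conventions above (page, limit, and a sort parameter where a leading minus means descending) can be parsed in one place so every endpoint behaves the same way. The following is a minimal sketch, not a definitive implementation: the helper names, the default page size, and the MAX_LIMIT cap are assumptions that do not come from the article.

```typescript
// Hypothetical helpers for list endpoints; names, defaults, and MAX_LIMIT are assumptions.
interface SortSpec {
  field: string;
  direction: 'asc' | 'desc';
}

interface ListQuery {
  page: number;
  limit: number;
  offset: number;
  sort: SortSpec[];
}

const DEFAULT_LIMIT = 20;
const MAX_LIMIT = 100;

// Parse an integer, falling back to a default and clamping to [min, max].
function toBoundedInt(raw: string | undefined, fallback: number, min: number, max: number): number {
  const parsed = Number.parseInt(raw ?? '', 10);
  if (Number.isNaN(parsed)) return fallback;
  return Math.min(Math.max(parsed, min), max);
}

// "-createdAt,total" becomes [createdAt desc, total asc]; empty fields are dropped.
function parseSort(raw: string | undefined): SortSpec[] {
  return (raw ?? '')
    .split(',')
    .map((part) => part.trim())
    .map((part): SortSpec =>
      part.startsWith('-')
        ? { field: part.slice(1), direction: 'desc' }
        : { field: part, direction: 'asc' })
    .filter((spec) => spec.field.length > 0);
}

function parseListQuery(params: Record<string, string | undefined>): ListQuery {
  const page = toBoundedInt(params.page, 1, 1, Number.MAX_SAFE_INTEGER);
  const limit = toBoundedInt(params.limit, DEFAULT_LIMIT, 1, MAX_LIMIT);
  return { page, limit, offset: (page - 1) * limit, sort: parseSort(params.sort) };
}

// Build the "meta" part of the response envelope.
function buildMeta(query: ListQuery, total: number): { page: number; limit: number; total: number } {
  return { page: query.page, limit: query.limit, total };
}
```

Clamping the limit keeps a single request from pulling an unbounded result set. Sort fields should still be checked against an allow-list of columns before they reach a SQL query.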

API Versioning

APIs evolve. Plan for it.

| Strategy | When to Use | Trade-offs |
| --- | --- | --- |
| URL versioning (/v1/) | Major breaking changes | Clear, but multiple codebases |
| Header versioning | More granular control | Less discoverable |
| No versioning | Internally only | Simple, but inflexible |

Our default: URL versioning for external APIs, additive changes without versioning when possible.

// Additive changes: don't require version bump
// Old response
{ "name": "John" }

// New response (backwards compatible)
{ "name": "John", "email": "john@example.com" }

// Breaking changes: require new version
// v1: { "name": "John Doe" }
// v2: { "firstName": "John", "lastName": "Doe" }
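
One way to soften the "multiple codebases" cost of URL versioning is to keep a single internal model and serialize it per version at the edge. This is a sketch under that assumption; the User shape and the function names are illustrative and not taken from the article.

```typescript
// Hypothetical internal model; only the serializers know about API versions.
interface User {
  firstName: string;
  lastName: string;
  email: string;
}

type UserV1 = { name: string; email: string };
type UserV2 = { firstName: string; lastName: string; email: string };

// v1 keeps the original combined "name" field.
function toV1(user: User): UserV1 {
  return { name: `${user.firstName} ${user.lastName}`.trim(), email: user.email };
}

// v2 exposes the split fields introduced by the breaking change.
function toV2(user: User): UserV2 {
  return { firstName: user.firstName, lastName: user.lastName, email: user.email };
}

function serializeUser(user: User, version: 'v1' | 'v2'): UserV1 | UserV2 {
  return version === 'v1' ? toV1(user) : toV2(user);
}
```

With this shape, business logic is written once and each version is a thin mapping that can be deleted when the old version is retired.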

GraphQL When Appropriate

GraphQL excels when clients have varying data needs.

# Client specifies exactly what it needs
query OrderSummary($id: ID!) {
  order(id: $id) {
    id
    status
    total
  }
}

query OrderDetails($id: ID!) {
  order(id: $id) {
    id
    status
    total
    items {
      product { name, image }
      quantity
      price
    }
    shipments {
      carrier
      trackingNumber
      status
    }
    history {
      timestamp
      event
      actor
    }
  }
}

API Security

Every API needs proper security. No exceptions.

| Layer | Implementation | Purpose |
| --- | --- | --- |
| Transport | TLS 1.3 | Encrypt in transit |
| Authentication | OAuth 2.0 / JWT | Verify identity |
| Authorization | RBAC / ABAC | Check permissions |
| Rate Limiting | Token bucket | Prevent abuse |
| Input Validation | Schema validation | Prevent injection |

// Complete API security stack
app.use(helmet());                    // Security headers
app.use(cors(corsOptions));           // CORS policy
app.use(rateLimiter);                 // Rate limiting
app.use(authenticate);                // JWT validation
app.use(validateInput(schema));       // Input validation
app.use(authorize(permissions));      // Permission check
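
The table lists a token bucket for rate limiting. A minimal in-memory sketch of that algorithm follows. The class name, its parameters, and the injectable clock are assumptions made for illustration; a multi-instance deployment would need shared state (for example in Redis) rather than a per-process object.

```typescript
// Token bucket: each request spends a token; tokens refill at a steady rate up to a cap.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  // `now` is injectable so the behavior can be tested without real waiting.
  constructor(capacity: number, refillPerSecond: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // Start full so short bursts are allowed immediately.
    this.lastRefillMs = now();
  }

  // Returns true and spends tokens if the request is allowed, false otherwise.
  tryConsume(cost: number = 1): boolean {
    this.refill();
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }

  // Add tokens for the time elapsed since the last call, never exceeding capacity.
  private refill(): void {
    const nowMs = this.now();
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;
  }
}

// One bucket per client key (API key, user id, or IP address).
const buckets = new Map<string, TokenBucket>();

function allowRequest(clientKey: string): boolean {
  let bucket = buckets.get(clientKey);
  if (!bucket) {
    bucket = new TokenBucket(20, 5); // Assumed policy: burst of 20, 5 requests/second sustained.
    buckets.set(clientKey, bucket);
  }
  return bucket.tryConsume();
}
```

Capacity controls how large a burst is tolerated, while the refill rate sets the sustained throughput. A middleware would typically respond with HTTP 429 when allowRequest returns false.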

Authentication & Authorization

Auth is critical infrastructure. Get it wrong, and everything is compromised.

Authentication: Who Are You?

| Method | Use Case | Security Level |
| --- | --- | --- |
| Session cookies | Web apps | High (with proper config) |
| JWT tokens | APIs, SPAs | High (short expiry + refresh) |
| API keys | Server-to-server | Medium (rotate regularly) |
| OAuth 2.0 | Third-party access | High (proper flow selection) |

// JWT with refresh token flow
interface TokenPair {
  accessToken: string;   // Short-lived: 15 min
  refreshToken: string;  // Longer-lived: 7 days, rotating
}

// Issue a fresh token pair for a user who is already authenticated
async function issueTokens(user): Promise<TokenPair> {
  const accessToken = jwt.sign(
    { sub: user.id, roles: user.roles },
    ACCESS_SECRET,
    { expiresIn: '15m' }
  );

  const refreshToken = await createRefreshToken(user.id);

  return { accessToken, refreshToken };
}

async function login(credentials): Promise<TokenPair> {
  const user = await validateCredentials(credentials);
  return issueTokens(user);
}

async function refresh(refreshToken): Promise<TokenPair> {
  const valid = await validateRefreshToken(refreshToken);
  if (!valid) throw new UnauthorizedError();

  // Rotate refresh token (one-time use)
  await revokeRefreshToken(refreshToken);

  // Issue tokens directly: there are no credentials to re-validate on refresh
  return issueTokens(valid.user);
}
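
The rotation step above depends on a store that treats each refresh token as single-use. The sketch below shows one possible in-memory version. The class and method names are hypothetical, and the reuse-detection rule (revoking every token descended from the same login when an already-used token shows up again) is a common hardening step that the article does not specify. A real store would persist hashed tokens in a database.

```typescript
import { randomUUID } from 'node:crypto';

interface RefreshRecord {
  userId: string;
  familyId: string; // All tokens descended from one login share a family.
  expiresAt: number;
  used: boolean;
}

class RefreshTokenStore {
  private readonly records = new Map<string, RefreshRecord>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // Default TTL of 7 days matches the lifetime mentioned in the article.
  constructor(ttlMs: number = 7 * 24 * 60 * 60 * 1000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Create a new token; a fresh family is started unless one is passed in.
  issue(userId: string, familyId: string = randomUUID()): string {
    const token = randomUUID();
    this.records.set(token, { userId, familyId, expiresAt: this.now() + this.ttlMs, used: false });
    return token;
  }

  // Exchange a token for its successor. Returns null when the token is unknown,
  // expired, or has already been used.
  rotate(token: string): { userId: string; refreshToken: string } | null {
    const record = this.records.get(token);
    if (!record || record.expiresAt <= this.now()) return null;

    if (record.used) {
      // A used token presented again suggests it was stolen: revoke the whole family.
      this.revokeFamily(record.familyId);
      return null;
    }

    record.used = true;
    return { userId: record.userId, refreshToken: this.issue(record.userId, record.familyId) };
  }

  private revokeFamily(familyId: string): void {
    this.records.forEach((record, token) => {
      if (record.familyId === familyId) this.records.delete(token);
    });
  }
}
```

Keeping used tokens around (marked rather than deleted) is what makes replay detectable. Expired records would need periodic cleanup, which is omitted here.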

Authorization: What Can You Do?

RBAC (Role-Based Access Control) works for most cases. ABAC (Attribute-Based Access Control) adds flexibility when decisions depend on attributes of the user, the resource, or the request.

// Role-based access control
const roles = {
  admin: ['*'],  // Everything
  manager: ['orders:*', 'products:read', 'customers:read'],
  support: ['orders:read', 'orders:update', 'customers:read'],
  viewer: ['orders:read', 'products:read']
};

function authorize(permission: string) {
  return (req, res, next) => {
    const userPermissions = expandRolePermissions(req.user.roles);

    if (!hasPermission(userPermissions, permission)) {
      return res.status(403).json({ error: 'Forbidden' });
    }

    next();
  };
}

// Usage
app.delete('/orders/:id', authorize('orders:delete'), deleteOrder);
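
The middleware above calls expandRolePermissions and hasPermission without showing them. A minimal sketch of how they might work with the wildcard syntax from the roles table follows. The matching rules here ('*' grants everything, 'resource:*' grants every action on that resource) are an assumption inferred from the example roles.

```typescript
type Permission = string;

// Same role table as in the article, repeated so the sketch is self-contained.
const rolePermissions: Record<string, Permission[]> = {
  admin: ['*'],
  manager: ['orders:*', 'products:read', 'customers:read'],
  support: ['orders:read', 'orders:update', 'customers:read'],
  viewer: ['orders:read', 'products:read'],
};

// Union of the permissions of every role the user holds; unknown roles grant nothing.
function expandRolePermissions(roles: string[]): Set<Permission> {
  const granted = new Set<Permission>();
  for (const role of roles) {
    for (const permission of rolePermissions[role] ?? []) {
      granted.add(permission);
    }
  }
  return granted;
}

// Check an exact permission such as "orders:delete" against the granted set.
function hasPermission(granted: Set<Permission>, required: Permission): boolean {
  if (granted.has('*') || granted.has(required)) return true;
  const [resource] = required.split(':');
  return granted.has(`${resource}:*`);
}
```

Defaulting to "deny" for unknown roles and unmatched permissions keeps a typo in a role name from silently granting access.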

Real-Time Systems

Many modern applications need real-time capabilities: live updates, collaborative features, and instant notifications.

Technology Selection

| Technology | Best For | Trade-offs |
| --- | --- | --- |
| WebSockets | Bidirectional, high-frequency | Connection management complexity |
| Server-Sent Events | Server→Client updates | Simpler, but one-directional |
| Long Polling | Fallback, simple use cases | Higher latency, more requests |
| WebRTC | Peer-to-peer, media | Complex, specific use cases |

WebSocket Architecture

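// Scalable WebSocket setup with Redis pub/sub
// The Redis adapter needs separate publish and subscribe connections
const subClient = redisClient.duplicate();
const io = new Server(server, {
  adapter: createAdapter(redisClient, subClient)  // Scale across multiple servers
});

// Room-based subscriptions
// Assumes an auth middleware has already attached socket.user
io.on('connection', (socket) => {
  // Join user's personal room
  socket.join(`user:${socket.user.id}`);

  // Join organization room if B2B
  if (socket.user.orgId) {
    socket.join(`org:${socket.user.orgId}`);
  }

  // Join one room per role so role-wide broadcasts (e.g. 'role:support') reach this socket
  for (const role of socket.user.roles ?? []) {
    socket.join(`role:${role}`);
  }
});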

// Publish updates from anywhere
async function notifyOrderUpdate(order) {
  // Notify the customer
  io.to(`user:${order.customerId}`).emit('order:updated', order);

  // Notify support team
  io.to('role:support').emit('order:updated', order);
}

Event-Driven Architecture

For complex systems, events decouple components and enable async processing.

// Event bus interface
interface EventBus {
  publish(event: string, payload: any): Promise<void>;
  subscribe(event: string, handler: EventHandler): void;
}

// Domain events
type DomainEvent =
  | { type: 'order.created', order: Order }
  | { type: 'order.paid', order: Order, payment: Payment }
  | { type: 'order.shipped', order: Order, shipment: Shipment }
  | { type: 'order.delivered', order: Order };

// Loosely coupled handlers
eventBus.subscribe('order.paid', async (event) => {
  await inventoryService.reserve(event.order.items);
});

eventBus.subscribe('order.paid', async (event) => {
  await notificationService.sendConfirmation(event.order);
});

eventBus.subscribe('order.paid', async (event) => {
  await analyticsService.trackPurchase(event.order);
});
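
To make the EventBus interface concrete, here is a minimal in-process implementation. It is a sketch for a single deployment, not a replacement for a message broker: events are lost if the process dies, and there is no retry. The class name and the onError callback are assumptions introduced for illustration.

```typescript
type EventHandler = (payload: unknown) => Promise<void> | void;
type ErrorCallback = (event: string, error: unknown) => void;

interface EventBus {
  publish(event: string, payload: unknown): Promise<void>;
  subscribe(event: string, handler: EventHandler): void;
}

class InMemoryEventBus implements EventBus {
  private readonly handlers = new Map<string, EventHandler[]>();
  private readonly onError: ErrorCallback;

  constructor(onError: ErrorCallback = (event, error) => console.error(`Handler for ${event} failed`, error)) {
    this.onError = onError;
  }

  subscribe(event: string, handler: EventHandler): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  // Run every handler for the event. A failing handler is reported through onError
  // and does not prevent the other handlers from running.
  async publish(event: string, payload: unknown): Promise<void> {
    const list = this.handlers.get(event) ?? [];
    await Promise.all(
      list.map(async (handler) => {
        try {
          await handler(payload);
        } catch (error) {
          this.onError(event, error);
        }
      }),
    );
  }
}
```

Isolating handler failures is the point of the pattern: a notification outage should not block inventory or analytics. Moving to a durable broker later only requires another class that satisfies the same interface.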

Scaling Strategies

Scaling is about handling growth without rewriting everything.

Database Scaling

| Strategy | When | Complexity |
| --- | --- | --- |
| Vertical scaling | First step, up to ~64 cores | Low |
| Read replicas | Read-heavy workloads | Low |
| Connection pooling | Many app instances | Low |
| Partitioning | Time-series, multi-tenant | Medium |
| Sharding | Extreme scale | High |

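// Read replica routing
class Database {
  private primary: Pool;
  private replicas: Pool[];

  async query(sql: string, params: any[], options?: QueryOptions) {
    const pool = options?.readOnly
      ? this.getRandomReplica()
      : this.primary;

    return pool.query(sql, params);
  }

  // Pick a replica at random; fall back to the primary if none are configured
  private getRandomReplica(): Pool {
    if (this.replicas.length === 0) return this.primary;
    return this.replicas[Math.floor(Math.random() * this.replicas.length)];
  }
}

// Usage
// Replicas can lag behind the primary, so only route reads that tolerate slightly stale data
const order = await db.query(
  'SELECT * FROM orders WHERE id = $1',
  [orderId],
  { readOnly: true }  // Can use replica
);

await db.query(
  'UPDATE orders SET status = $1 WHERE id = $2',
  ['shipped', orderId]
  // No readOnly: uses primary
);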

Application Scaling

| Pattern | Purpose | Implementation |
| --- | --- | --- |
| Horizontal scaling | Handle more requests | Kubernetes HPA |
| Caching | Reduce database load | Redis, CDN |
| Async processing | Offload heavy work | Job queues |
| Circuit breakers | Handle failures gracefully | Resilience patterns |

// Caching strategy
async function getProduct(id: string): Promise<Product> {
  // Check cache first
  const cached = await cache.get(`product:${id}`);
  if (cached) return cached;

  // Cache miss: fetch from DB (assumes a node-postgres style result with a rows array)
  const { rows } = await db.query('SELECT * FROM products WHERE id = $1', [id]);
  const product = rows[0];

  // Cache for future requests
  await cache.set(`product:${id}`, product, { ttl: 3600 });

  return product;
}

// Cache invalidation on update
async function updateProduct(id: string, data: Partial<Product>) {
  await db.query('UPDATE products SET ...', [data, id]);
  await cache.delete(`product:${id}`);
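  // Invalidate list caches. This assumes the cache client supports pattern deletes;
  // a plain Redis DEL does not expand wildcards.
  await cache.delete('product-list:*');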
}
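
The table also lists circuit breakers, which the caching example does not cover. Below is a minimal sketch of the pattern with the usual three states. The class name, thresholds, and injectable clock are assumptions, and the half-open state is simplified: it does not limit how many trial calls run concurrently.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number = 5, resetTimeoutMs: number = 30000, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  // An open breaker becomes half-open once the reset timeout has elapsed.
  getState(): BreakerState {
    if (this.state === 'open' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'half-open';
    }
    return this.state;
  }

  // Run the operation unless the breaker is open, and track the outcome.
  async execute<T>(operation: () => Promise<T>): Promise<T> {
    if (this.getState() === 'open') {
      throw new Error('Circuit open: call rejected without reaching the dependency');
    }
    try {
      const result = await operation();
      this.recordSuccess();
      return result;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }

  private recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  // Open after enough consecutive failures, or immediately if a half-open trial fails.
  private recordFailure(): void {
    this.failures += 1;
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }
}
```

Failing fast while the breaker is open gives a struggling dependency time to recover and keeps request threads from piling up behind timeouts. Callers usually pair it with a fallback, such as serving a cached value.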

Infrastructure Patterns

How you run your software matters as much as how you write it.

Container Orchestration

Kubernetes has become the standard. Here's a production-ready setup:

# Deployment with best practices
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
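spec:
  replicas: 3
  selector:
    matchLabels:
      app: api   # Required in apps/v1; must match the pod template labels (label name assumed)
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: api
    spec:
      containers: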
      - name: api
        image: api:v1.2.3
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5

Multi-Region Deployment

For global availability and performance:

                    ┌─────────────┐
                    │   Global    │
                    │  DNS/CDN    │
                    └──────┬──────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
   ┌────▼────┐        ┌────▼────┐        ┌────▼────┐
   │ EU-WEST │        │ US-EAST │        │ AP-SOUTH│
   │ Region  │        │ Region  │        │ Region  │
   └────┬────┘        └────┬────┘        └────┬────┘
        │                  │                  │
   ┌────▼────┐        ┌────▼────┐        ┌────▼────┐
   │   DB    │◄──────►│   DB    │◄──────►│   DB    │
   │(Primary)│        │(Replica)│        │(Replica)│
   └─────────┘        └─────────┘        └─────────┘

Monolith vs. Microservices

The eternal debate. Here's our take:

| Start with Monolith When | Consider Microservices When |
| --- | --- |
| New product, uncertain requirements | Clear domain boundaries exist |
| Small team (< 10 engineers) | Multiple teams need independence |
| Need to move fast | Different scaling requirements per service |
| Don't know your domains yet | Different technology needs per service |

The Modular Monolith

Best of both worlds: monolith deployment simplicity with service-like boundaries.

src/
├── modules/
│   ├── orders/           # Order domain
│   │   ├── api/          # HTTP handlers
│   │   ├── services/     # Business logic
│   │   ├── repository/   # Data access
│   │   └── events/       # Domain events
│   │
│   ├── inventory/        # Inventory domain
│   │   ├── api/
│   │   ├── services/
│   │   └── ...
│   │
│   └── customers/        # Customer domain
│       └── ...
│
├── shared/               # Cross-cutting concerns
│   ├── database/
│   ├── auth/
│   └── events/
│
└── main.ts              # Single deployment

Modules communicate through well-defined interfaces. When you need to extract a service, the boundaries are already there.
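
As an illustration of what a "well-defined interface" between modules can look like, the sketch below has the orders module depend only on a contract published by the inventory module. All names are hypothetical. The methods are asynchronous even though the call is in-process, on the assumption that the same contract may later be served over the network.

```typescript
interface OrderItem {
  sku: string;
  quantity: number;
}

// Public contract of the inventory module: the only thing other modules import.
interface InventoryApi {
  reserve(items: OrderItem[]): Promise<boolean>;
}

// In-process implementation used while everything ships as one deployment.
// Simplification: assumes each SKU appears at most once per order.
class LocalInventory implements InventoryApi {
  private readonly stock: Map<string, number>;

  constructor(initialStock: Record<string, number>) {
    this.stock = new Map(Object.entries(initialStock));
  }

  // All-or-nothing: reserve every item or none of them.
  async reserve(items: OrderItem[]): Promise<boolean> {
    const enough = items.every((item) => (this.stock.get(item.sku) ?? 0) >= item.quantity);
    if (!enough) return false;
    items.forEach((item) => {
      this.stock.set(item.sku, (this.stock.get(item.sku) ?? 0) - item.quantity);
    });
    return true;
  }
}

// The orders module knows the contract, not the implementation behind it.
class OrderService {
  private readonly inventory: InventoryApi;

  constructor(inventory: InventoryApi) {
    this.inventory = inventory;
  }

  async placeOrder(items: OrderItem[]): Promise<'confirmed' | 'rejected'> {
    return (await this.inventory.reserve(items)) ? 'confirmed' : 'rejected';
  }
}
```

Extracting inventory into its own service would then mean writing an HTTP-backed class that implements InventoryApi and swapping it in where OrderService is constructed, with no changes to order logic.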

Disaster Recovery

Systems fail. Plan for it.

| Level | RTO | RPO | Implementation |
| --- | --- | --- | --- |
| Basic | Hours | Hours | Daily backups, manual restore |
| Standard | 30 min | 15 min | Automated backups, standby DB |
| High Availability | Minutes | Minutes | Multi-AZ, automated failover |
| Mission Critical | Seconds | Near-zero | Multi-region, active-active |

# Backup configuration example
backups:
  database:
    frequency: "every 6 hours"
    retention: "30 days"
    type: "point-in-time recovery"
    destination: "s3://backups/postgres"

  application:
    frequency: "every deployment"
    retention: "90 days"
    type: "container images"
    destination: "ecr://app-images"

  documents:
    frequency: "real-time"
    retention: "indefinite"
    type: "object versioning"
    destination: "s3://documents"

Conclusion

Good architecture isn't about following trends or implementing every pattern you've read about. It's about understanding your actual needs and making deliberate trade-offs.

The best architectures are the simplest ones that solve the problem. Complexity is a cost, not a feature.

We've designed systems handling millions of requests, processing real-time data, and serving global users. The common thread isn't sophisticated technology—it's thoughtful design that matches the actual requirements.

If you're facing architectural decisions or struggling with systems that don't scale, we'd be happy to discuss your specific situation.

Topics covered

system architecture, scalability, API design, authentication, real-time systems, microservices, distributed systems, infrastructure, system design
