Technical Guide

Scaling Backend Systems Without Overengineering: The YAGNI Architecture Checklist

When to scale and when not to. Monolith + queue as the most underrated architecture, real scaling triggers, database bottlenecks, caching strategy, and the YAGNI checklist for architecture decisions.

February 11, 2026 · 14 min read · Oronts Engineering Team

The Overengineering Epidemic

We've seen it too many times: a team of 3 engineers building a microservices architecture for an application with 200 users. Kubernetes cluster, service mesh, event bus, API gateway, 5 separate databases, and a monitoring stack more complex than the application itself. Six months in, they're debugging distributed systems problems instead of building features.

The opposite also happens: a monolith serving 50,000 concurrent users, where every request takes 5 seconds because the database is the bottleneck and there's no caching layer. The team keeps adding application servers, but the database is the ceiling.

Both are architecture failures. This article covers when to scale, what to scale, and how to avoid scaling things that don't need it. For specific scaling patterns, see our system architecture guide and event-driven architecture guide.

The YAGNI Architecture Checklist

Before adding any architectural complexity (new service, new database, new message broker, new cache layer), ask:

Question | If YES | If NO
---------|--------|------
Do we have a measured performance problem right now? | Fix the specific problem | Don't add complexity for a hypothetical problem
Would this solve a problem we've already encountered? | Consider it, with evidence | Don't solve problems you haven't had yet
Can we solve it with a simpler approach first? | Use the simpler approach | Complex solution might be needed
Will 10x more traffic break the current architecture? | Plan ahead (but don't build yet) | Current architecture is fine
Is the team large enough to maintain this? | Proceed if at least 2 people can own it | Don't add things nobody can maintain
Can we revert if it doesn't work? | Acceptable risk | Too risky

The default answer to "should we add this?" is no. Add complexity when you have evidence it's needed, not when you imagine it might be needed.

Monolith + Queue: The Most Underrated Architecture

A monolithic application with a background job queue handles 90% of SaaS workloads. One codebase, one deployment, one database, and a queue for background work.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚            Monolith                      β”‚
β”‚                                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”‚
β”‚  β”‚  API      β”‚  β”‚  Admin   β”‚            β”‚
β”‚  β”‚  Routes   β”‚  β”‚  UI      β”‚            β”‚
β”‚  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜            β”‚
β”‚       β”‚              β”‚                   β”‚
β”‚  β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”            β”‚
β”‚  β”‚    Services              β”‚            β”‚
β”‚  β”‚    (business logic)      β”‚            β”‚
β”‚  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”‚
β”‚       β”‚                                  β”‚
β”‚  β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”          β”‚
β”‚  β”‚  Database (PostgreSQL)    β”‚          β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜          β”‚
β”‚                                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”          β”‚
β”‚  β”‚  Job Queue (BullMQ/Redis) β”‚          β”‚
β”‚  β”‚  - Email sending          β”‚          β”‚
β”‚  β”‚  - Search indexing        β”‚          β”‚
β”‚  β”‚  - Report generation      β”‚          β”‚
β”‚  β”‚  - Data import            β”‚          β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Why It Works

  • One codebase: Refactoring is safe. Find-and-replace works. Types flow through the entire application.
  • One deployment: Deploy once. No service coordination. No distributed rollbacks.
  • Shared database: Joins work. Transactions are ACID. No eventual consistency headaches.
  • Job queue: Long-running work (emails, imports, indexing) doesn't block request handling.
  • Simple debugging: Stack traces are complete. Logs are in one place. No distributed tracing needed.
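The request/queue split above can be sketched without any infrastructure. Below is a minimal in-memory stand-in for the job queue; the `JobQueue` class and job names are illustrative, not a real API. In production you would use BullMQ backed by Redis as in the diagram, where the shape is the same: `queue.add(...)` in the request handler, a `Worker` in a separate process.

```typescript
// Minimal in-memory stand-in for the job queue in the diagram.
// Request handlers enqueue work and return immediately; a worker
// loop drains the jobs outside the request path.
type Job = { name: string; payload: unknown };

class JobQueue {
  private jobs: Job[] = [];

  // Called from request handlers: O(1), never blocks on the work itself.
  enqueue(name: string, payload: unknown): void {
    this.jobs.push({ name, payload });
  }

  // Called by the worker: run every queued job through its handler.
  drain(handlers: Record<string, (payload: unknown) => void>): number {
    let processed = 0;
    while (this.jobs.length > 0) {
      const job = this.jobs.shift()!;
      const handler = handlers[job.name];
      if (handler) {
        handler(job.payload);
        processed++;
      }
    }
    return processed;
  }
}

// A request handler enqueues the email instead of sending it inline.
const queue = new JobQueue();
queue.enqueue("send-email", { to: "user@example.com" });
queue.enqueue("index-search", { productId: 42 });
```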

When Monolith + Queue Breaks

Symptom | Root Cause | Solution
--------|-----------|---------
API response time > 2s | Database queries too slow | Add indexes, query optimization, read replicas
Database CPU > 80% sustained | Query volume or complexity too high | Read replicas, caching layer, query optimization
Deploys are scary | Codebase too large, tests too slow | Better tests, feature flags, canary deploys (NOT microservices)
One feature breaks everything | No module boundaries | Better code organization (NOT microservices)
Team can't work in parallel | Code conflicts, merge hell | Module ownership, trunk-based development

Notice: none of these symptoms are solved by adding microservices. They're solved by better engineering within the monolith. Microservices solve a different problem.

When to Split: The Real Triggers

Split into separate services only when:

1. Team scale forces it. When you have 20+ engineers and team boundaries don't align with the codebase. Team A owns feature X and team B owns feature Y, and they need to deploy independently. This is the primary reason for microservices.

2. Different scaling requirements. The search service needs 10x the compute of the product catalog service. Running them in the same process wastes resources. Separate services can scale independently.

3. Different technology requirements. The ML inference service needs Python and GPUs. The API service needs Node.js. Running both in one process is impractical.

4. Fault isolation. A crash in the payment service should not crash the product catalog service. When failure isolation is critical for business continuity, separate processes help.

5. Compliance boundaries. PCI-compliant payment processing needs to be separated from the rest of the application for audit purposes. A separate service with its own security boundary simplifies compliance.

What's NOT a Reason to Split

  • "Microservices are best practice" (they're a trade-off, not a best practice)
  • "We might need to scale" (scale when you need to, not before)
  • "Clean architecture" (module boundaries within a monolith are cleaner than service boundaries)
  • "Resume-driven development" (Kubernetes on your resume doesn't help your users)

The Database Is Usually the Bottleneck

In 80% of performance problems, the database is the bottleneck. Not the application code. Not the network. The database.

Diagnosis

-- Find slow queries (PostgreSQL)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Find missing indexes
SELECT schemaname, tablename, seq_scan, idx_scan,
       CASE WHEN seq_scan > 0 THEN round(100.0 * idx_scan / (seq_scan + idx_scan), 1) ELSE 100 END AS idx_pct
FROM pg_stat_user_tables
WHERE seq_scan > 100
ORDER BY seq_scan DESC
LIMIT 20;

-- Find the largest tables (candidates for bloat, archiving, or partitioning)
SELECT tablename, pg_size_pretty(pg_total_relation_size(tablename::regclass)) AS total_size,
       pg_size_pretty(pg_relation_size(tablename::regclass)) AS data_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(tablename::regclass) DESC
LIMIT 20;

Fix Order

  1. Add missing indexes (minutes to implement, massive impact)
  2. Optimize slow queries (rewrite N+1 queries, add WHERE clauses, use EXPLAIN ANALYZE)
  3. Add caching (Redis for frequently read, rarely changed data)
  4. Read replicas (route read queries to replicas, writes to primary)
  5. Connection pooling (PgBouncer between application and database)
  6. Vertical scaling (bigger instance, more RAM, faster storage)
  7. Horizontal scaling (sharding, only when everything else is exhausted)

Most teams jump to step 7 before trying steps 1-3. Steps 1-3 are free or cheap and often solve the problem entirely.
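Step 2 in practice usually means killing N+1 patterns: one query per item in a loop instead of one batched query. The sketch below counts queries against an in-memory table to show the difference; the `findById`/`findByIds` names and the fake `table` are illustrative, and in PostgreSQL the batched version would be a single `WHERE id = ANY($1)` query.

```typescript
type Product = { id: number; name: string };

// Fake in-memory table standing in for the products table.
const table: Product[] = [
  { id: 1, name: "widget" },
  { id: 2, name: "gadget" },
  { id: 3, name: "gizmo" },
];

let queryCount = 0;

// N+1 version: one query per id.
function findById(id: number): Product | undefined {
  queryCount++;
  return table.find((p) => p.id === id);
}

// Batched version: ONE query for all ids
// (e.g. SELECT * FROM products WHERE id = ANY($1)).
function findByIds(ids: number[]): Product[] {
  queryCount++;
  return table.filter((p) => ids.includes(p.id));
}

// N+1: 3 queries to load 3 products.
queryCount = 0;
[1, 2, 3].forEach((id) => findById(id));
const n1Queries = queryCount;

// Batched: 1 query for the same result.
queryCount = 0;
const products = findByIds([1, 2, 3]);
const batchedQueries = queryCount;
```

At 3 rows the difference is noise; at 1,000 rows per page it is the gap between 1 round trip and 1,000.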

Caching Strategy

What to Cache

Data | Cache? | TTL | Invalidation
-----|--------|-----|-------------
User sessions | Yes | Session duration | On logout
Product catalog | Yes | 5-15 minutes | On product update event
Search results | Maybe | 1-5 minutes | On index update
User-specific data | Careful | Short (1-5 min) | On user action
Config / settings | Yes | Long (1 hour) | On admin change
Computed aggregates | Yes | 15-60 minutes | On underlying data change
Real-time data (stock) | No | N/A | Always fetch live
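A table like this can live in code as one policy map, so TTLs aren't scattered as magic numbers across call sites. The sketch below mirrors the rows above; the key names, TTL values within the stated ranges, and the map structure itself are assumptions for illustration, not a required pattern.

```typescript
type CachePolicy = { cache: boolean; ttlSeconds: number };

// One place that answers "do we cache this, and for how long?".
// Invalidation events are handled separately (see the pattern below).
const cachePolicies: Record<string, CachePolicy> = {
  userSession:    { cache: true,  ttlSeconds: 0 },       // 0 = session duration
  productCatalog: { cache: true,  ttlSeconds: 10 * 60 }, // 5-15 min range
  searchResults:  { cache: true,  ttlSeconds: 3 * 60 },
  userData:       { cache: true,  ttlSeconds: 2 * 60 },  // "careful": keep it short
  configSettings: { cache: true,  ttlSeconds: 60 * 60 },
  aggregates:     { cache: true,  ttlSeconds: 30 * 60 },
  stockLevels:    { cache: false, ttlSeconds: 0 },       // real-time: always fetch live
};

function shouldCache(kind: string): boolean {
  // Unknown data kinds default to "don't cache" -- the safe direction.
  return cachePolicies[kind]?.cache ?? false;
}
```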

What NOT to Cache

  • User-specific data with privacy implications: Cached cart contents served to the wrong user is a data breach.
  • Rapidly changing data: Stock levels, live prices, auction bids. Cache TTL can't keep up.
  • Write-heavy operations: If data changes more often than it's read, caching adds complexity without benefit.

Cache Invalidation Pattern

// Event-driven invalidation (recommended)
eventBus.on('product.updated', async (product) => {
    await cache.delete(`product:${product.id}`);
    await cache.delete(`product-list:${product.categoryId}`);
    // Don't try to update the cache here. Let the next read populate it.
});

// Read-through with stale-while-revalidate.
// Assumes cache.get returns { data, expiresAt } (the wrapper written by
// cache.set below) and refreshInBackground re-fetches and re-caches the
// product without blocking the current request.
async function getProduct(id: string): Promise<Product> {
    const cached = await cache.get(`product:${id}`);
    if (cached && !isExpired(cached)) {
        return cached.data; // fresh hit
    }

    if (cached) {
        // Stale hit: serve it now, refresh asynchronously.
        refreshInBackground(id); // fire-and-forget, don't await
        return cached.data;
    }

    // Cache miss: fetch from the database and populate the cache.
    const product = await db.products.findById(id);
    await cache.set(`product:${id}`, product, { ttl: 300 }); // stores { data, expiresAt }
    return product;
}

The Scaling Conversation with Stakeholders

Engineers want to scale for technical reasons. Stakeholders want to scale for business reasons. The conversation needs a shared language.

Stakeholder Says | What They Mean | Engineering Response
-----------------|----------------|---------------------
"We need to handle 10x traffic" | Marketing campaign coming, or wishful thinking | "What's the expected timeline? Let's load test current capacity first."
"The site is slow" | One page is slow, not the whole site | "Which page? Let me profile it."
"We need microservices" | They read an article or heard it at a conference | "What problem are we solving? Let me show you the current architecture."
"Can we handle Black Friday?" | Genuine concern about peak traffic | "Let's load test at 3x current peak and see where it breaks."

Always start with measurement. "The site is slow" is not a scaling requirement. "Product listing page takes 4 seconds at 500 concurrent users, target is 500ms" is a scaling requirement with a measurable goal.
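Turning that requirement into capacity numbers is simple arithmetic. By Little's law, in-flight requests = throughput × latency, which is why cutting latency alone creates headroom. A sketch, using the example figures from the paragraph above and the illustrative assumption of roughly 125 requests/second behind 500 concurrent users:

```typescript
// Little's law: L = lambda * W
// (requests in flight = requests per second * seconds per request).
function inFlightRequests(requestsPerSecond: number, latencySeconds: number): number {
  return requestsPerSecond * latencySeconds;
}

// Today: ~125 req/s at 4 s per request keeps ~500 requests in flight,
// which is what "500 concurrent users, 4-second page" looks like.
const currentInFlight = inFlightRequests(125, 4);

// At the 500 ms target, the same traffic keeps only ~63 requests
// in flight: the same hardware gains ~8x headroom once the page is fast.
const targetInFlight = inFlightRequests(125, 0.5);
```

This is also why "add more servers" fails when the database is slow: more servers raise lambda, but W is set by the slowest query.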

Common Pitfalls

  1. Microservices at 100 users. If your monolith handles the load and your team is under 10 people, microservices add operational complexity without benefit.

  2. Caching without invalidation strategy. Adding a cache is easy. Keeping it consistent with the database is hard. Plan invalidation before adding the cache.

  3. Scaling the application when the database is the bottleneck. Adding 5 more application servers doesn't help if all of them are waiting on the same slow database query.

  4. No load testing. "We think we need to scale" is not evidence. Load test at 3-5x current peak traffic. See where it actually breaks.

  5. Vertical scaling taboo. A bigger database instance costs $200/month more and takes 10 minutes to provision. That's almost always cheaper than a week of architectural work.

  6. Sharding before trying everything else. Database sharding is the most complex scaling solution. Try indexes, query optimization, caching, read replicas, and vertical scaling first.

  7. Building for scale you'll never reach. If your SaaS has 500 users and is growing 10% per month, you don't need to handle 1 million users. You need to handle 5,000 users next year.

Key Takeaways

  • Monolith + queue handles 90% of SaaS workloads. One codebase, one database, one deployment, and a job queue for background work. Don't split until you have evidence you need to.

  • The database is usually the bottleneck. Add indexes, optimize queries, add caching, then consider read replicas. Most performance problems are solved by steps 1-3.

  • Split services for team scale, not technical scale. Microservices solve the "20 engineers can't work in one codebase" problem, not the "we need more performance" problem.

  • Measure before scaling. Load test at 3-5x current peak. Profile slow pages. Find the actual bottleneck. Then fix that specific thing.

  • Cache reads, not writes. Cache data that's read frequently and changes rarely. Invalidate on data change events. Serve stale data while refreshing in the background.

  • Scale the cheapest way first. A bigger database instance (minutes, $200/month) beats a month of architectural work. Vertical scaling is not a failure.

We help teams scale their systems appropriately as part of our custom software and consulting practice. If you need help with performance or scaling decisions, talk to our team or request a quote.

Topics covered

scaling architecture · backend scaling · when to scale · premature optimization · YAGNI architecture · monolith vs microservices · caching strategy · database bottleneck
