Scaling Backend Systems Without Overengineering: The YAGNI Architecture Checklist
When to scale and when not to. Monolith + queue as the most underrated architecture, real scaling triggers, database bottlenecks, caching strategy, and the YAGNI checklist for architecture decisions.
The Overengineering Epidemic
We've seen it too many times: a team of 3 engineers building a microservices architecture for an application with 200 users. Kubernetes cluster, service mesh, event bus, API gateway, 5 separate databases, and a monitoring stack more complex than the application itself. Six months in, they're debugging distributed systems problems instead of building features.
The opposite also happens: a monolith serving 50,000 concurrent users, where every request takes 5 seconds because the database is the bottleneck and there's no caching layer. The team keeps adding application servers, but the database is the ceiling.
Both are architecture failures. This article covers when to scale, what to scale, and how to avoid scaling things that don't need it. For specific scaling patterns, see our system architecture guide and event-driven architecture guide.
The YAGNI Architecture Checklist
Before adding any architectural complexity (new service, new database, new message broker, new cache layer), ask:
| Question | If YES | If NO |
|---|---|---|
| Do we have a measured performance problem right now? | Fix the specific problem | Don't add complexity for a hypothetical problem |
| Would this solve a problem we've already encountered? | Consider it, with evidence | Don't solve problems you haven't had yet |
| Can we solve it with a simpler approach first? | Use the simpler approach | Complex solution might be needed |
| Will 10x more traffic break the current architecture? | Plan ahead (but don't build yet) | Current architecture is fine |
| Is the team large enough to maintain this? | Proceed if at least 2 people can own it | Don't add things nobody can maintain |
| Can we revert if it doesn't work? | Acceptable risk | Too risky |
The default answer to "should we add this?" is no. Add complexity when you have evidence it's needed, not when you imagine it might be needed.
Monolith + Queue: The Most Underrated Architecture
A monolithic application with a background job queue handles 90% of SaaS workloads. One codebase, one deployment, one database, and a queue for background work.
┌─────────────────────────────────────────┐
│                Monolith                 │
│                                         │
│   ┌──────────┐       ┌──────────┐       │
│   │   API    │       │  Admin   │       │
│   │  Routes  │       │    UI    │       │
│   └────┬─────┘       └────┬─────┘       │
│        │                  │             │
│   ┌────▼──────────────────▼────┐        │
│   │         Services           │        │
│   │     (business logic)       │        │
│   └────┬───────────────────────┘        │
│        │                                │
│   ┌────▼───────────────────────┐        │
│   │   Database (PostgreSQL)    │        │
│   └────────────────────────────┘        │
│                                         │
│   ┌────────────────────────────┐        │
│   │  Job Queue (BullMQ/Redis)  │        │
│   │   - Email sending          │        │
│   │   - Search indexing        │        │
│   │   - Report generation      │        │
│   │   - Data import            │        │
│   └────────────────────────────┘        │
└─────────────────────────────────────────┘
Why It Works
- One codebase: Refactoring is safe. Find-and-replace works. Types flow through the entire application.
- One deployment: Deploy once. No service coordination. No distributed rollbacks.
- Shared database: Joins work. Transactions are ACID. No eventual consistency headaches.
- Job queue: Long-running work (emails, imports, indexing) doesn't block request handling.
- Simple debugging: Stack traces are complete. Logs are in one place. No distributed tracing needed.
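The pattern is small enough to show in code. Below is a minimal in-memory sketch of the request/queue split — in production the queue would be BullMQ backed by Redis as in the diagram, but the shape stays the same: the request handler enqueues and responds immediately, and a worker does the slow work outside the request path. All names here are illustrative.

```typescript
type Job<T> = { name: string; data: T };

class JobQueue<T> {
  private jobs: Job<T>[] = [];
  private handlers = new Map<string, (data: T) => Promise<void>>();

  // Called from the request path: O(1), never blocks on the slow work.
  enqueue(name: string, data: T): void {
    this.jobs.push({ name, data });
  }

  // Register a handler, the way a BullMQ Worker would.
  process(name: string, handler: (data: T) => Promise<void>): void {
    this.handlers.set(name, handler);
  }

  // Called by the worker loop, outside the request path.
  async drain(): Promise<number> {
    let done = 0;
    while (this.jobs.length > 0) {
      const job = this.jobs.shift()!;
      const handler = this.handlers.get(job.name);
      if (handler) {
        await handler(job.data);
        done++;
      }
    }
    return done;
  }
}

const queue = new JobQueue<{ to: string }>();
const sent: string[] = [];

queue.process("send-email", async ({ to }) => {
  sent.push(to); // stand-in for the real SMTP call
});

// Request handler: write the order, enqueue the email, respond immediately.
function createOrderHandler(email: string): { status: string } {
  // ...write order to PostgreSQL inside the request transaction...
  queue.enqueue("send-email", { to: email });
  return { status: "created" }; // responds before the email is sent
}
```

The point of the split: `createOrderHandler` returns in milliseconds regardless of how slow the email provider is, and a failed email job can be retried without the user ever seeing an error page.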
When Monolith + Queue Breaks
| Symptom | Root Cause | Solution |
|---|---|---|
| API response time > 2s | Database queries too slow | Add indexes, query optimization, read replicas |
| Database CPU > 80% sustained | Query volume or complexity too high | Read replicas, caching layer, query optimization |
| Deploys are scary | Codebase too large, tests too slow | Better tests, feature flags, canary deploys (NOT microservices) |
| One feature breaks everything | No module boundaries | Better code organization (NOT microservices) |
| Team can't work in parallel | Code conflicts, merge hell | Module ownership, trunk-based development |
Notice: none of these symptoms are solved by adding microservices. They're solved by better engineering within the monolith. Microservices solve a different problem.
When to Split: The Real Triggers
Split into separate services only when:
1. Team scale forces it. When you have 20+ engineers and team boundaries don't align with the codebase. Team A owns feature X and team B owns feature Y, and they need to deploy independently. This is the primary reason for microservices.
2. Different scaling requirements. The search service needs 10x the compute of the product catalog service. Running them in the same process wastes resources. Separate services can scale independently.
3. Different technology requirements. The ML inference service needs Python and GPUs. The API service needs Node.js. Running both in one process is impractical.
4. Fault isolation. A crash in the payment service should not crash the product catalog service. When failure isolation is critical for business continuity, separate processes help.
5. Compliance boundaries. PCI-compliant payment processing needs to be separated from the rest of the application for audit purposes. A separate service with its own security boundary simplifies compliance.
What's NOT a Reason to Split
- "Microservices are best practice" (they're a trade-off, not a best practice)
- "We might need to scale" (scale when you need to, not before)
- "Clean architecture" (module boundaries within a monolith are cleaner than service boundaries)
- "Resume-driven development" (Kubernetes on your resume doesn't help your users)
The Database Is Usually the Bottleneck
In 80% of performance problems, the database is the bottleneck. Not the application code. Not the network. The database.
Diagnosis
-- Find slow queries (PostgreSQL; requires the pg_stat_statements extension)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
-- Find tables with heavy sequential scans (candidates for missing indexes)
SELECT schemaname, tablename, seq_scan, idx_scan,
       round(100.0 * coalesce(idx_scan, 0) / (seq_scan + coalesce(idx_scan, 0)), 1) AS idx_pct
FROM pg_stat_user_tables
WHERE seq_scan > 100
ORDER BY seq_scan DESC
LIMIT 20;
-- Find the largest tables (total size includes indexes and TOAST)
SELECT tablename,
       pg_size_pretty(pg_total_relation_size(quote_ident(tablename)::regclass)) AS total_size,
       pg_size_pretty(pg_relation_size(quote_ident(tablename)::regclass)) AS data_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(quote_ident(tablename)::regclass) DESC
LIMIT 20;
Fix Order
1. Add missing indexes (minutes to implement, massive impact)
2. Optimize slow queries (rewrite N+1 queries, add WHERE clauses, use EXPLAIN ANALYZE)
3. Add caching (Redis for frequently read, rarely changed data)
4. Add read replicas (route read queries to replicas, writes to primary)
5. Add connection pooling (PgBouncer between application and database)
6. Scale vertically (bigger instance, more RAM, faster storage)
7. Scale horizontally (sharding, only when everything else is exhausted)
Most teams jump to step 7 before trying steps 1-3. Steps 1-3 are free or cheap and often solve the problem entirely.
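The N+1 fix in step 2 deserves a sketch, because it's the single most common slow-query pattern. This is a toy example: the in-memory maps stand in for the database, the query counter stands in for round-trips, and `findCustomersByIds` stands in for a single `WHERE id = ANY($1)` query. The schema (orders with a customerId) is hypothetical.

```typescript
const customers = new Map([[1, "Ada"], [2, "Grace"]]);
const orders = [
  { id: 10, customerId: 1 },
  { id: 11, customerId: 2 },
  { id: 12, customerId: 1 },
];
let queryCount = 0; // counts customer lookups (round-trips)

async function findCustomer(id: number) {
  queryCount++; // one round-trip per call
  return { id, name: customers.get(id)! };
}

async function findCustomersByIds(ids: number[]) {
  queryCount++; // single round-trip: SELECT ... WHERE id = ANY($1)
  return ids.map((id) => ({ id, name: customers.get(id)! }));
}

// N+1: one customer lookup per order. 100 orders = 100 extra queries.
async function namesNPlusOne(): Promise<string[]> {
  const out: string[] = [];
  for (const o of orders) out.push((await findCustomer(o.customerId)).name);
  return out;
}

// Batched: collect the ids, fetch them in one query, join in memory.
async function namesBatched(): Promise<string[]> {
  const ids = [...new Set(orders.map((o) => o.customerId))];
  const rows = await findCustomersByIds(ids);
  const byId = new Map(rows.map((r) => [r.id, r.name]));
  return orders.map((o) => byId.get(o.customerId)!);
}
```

Same output, one round-trip instead of one per row. In a real ORM this is usually a one-line change (an eager-load or `include` option) once profiling has pointed at the loop.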
Caching Strategy
What to Cache
| Data | Cache? | TTL | Invalidation |
|---|---|---|---|
| User sessions | Yes | Session duration | On logout |
| Product catalog | Yes | 5-15 minutes | On product update event |
| Search results | Maybe | 1-5 minutes | On index update |
| User-specific data | Careful | Short (1-5 min) | On user action |
| Config / settings | Yes | Long (1 hour) | On admin change |
| Computed aggregates | Yes | 15-60 minutes | On underlying data change |
| Real-time data (stock) | No | N/A | Always fetch live |
What NOT to Cache
- User-specific data with privacy implications: Cached cart contents served to the wrong user is a data breach.
- Rapidly changing data: Stock levels, live prices, auction bids. Cache TTL can't keep up.
- Write-heavy operations: If data changes more often than it's read, caching adds complexity without benefit.
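The first pitfall above (cached cart served to the wrong user) almost always comes from one mistake: keying per-user data on a shared cache key. A minimal sketch of both key schemes, with an in-memory Map standing in for Redis — the key scheme is the point, not the store:

```typescript
const cache = new Map<string, unknown>();

// WRONG: one key for everyone. The first user's cart is served to all users.
function cartKeyShared(_userId: string): string {
  return "cart";
}

// RIGHT: the user id is part of the key, so entries can never cross users.
function cartKeyScoped(userId: string): string {
  return `cart:${userId}`;
}

// Generic read-through: return the cached cart, or load and cache it.
function getCart(
  userId: string,
  keyFn: (id: string) => string,
  load: () => string[],
): string[] {
  const key = keyFn(userId);
  if (!cache.has(key)) cache.set(key, load());
  return cache.get(key) as string[];
}
```

With `cartKeyShared`, the second user to hit the endpoint receives the first user's cart — a data breach, not a performance bug. The same rule applies to any per-user or per-tenant data: the owning identifier belongs in the key.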
Cache Invalidation Pattern
// Event-driven invalidation (recommended)
eventBus.on('product.updated', async (product) => {
  await cache.delete(`product:${product.id}`);
  await cache.delete(`product-list:${product.categoryId}`);
  // Don't try to update the cache here. Let the next read populate it.
});

// Read-through with stale-while-revalidate
// (isExpired and refreshInBackground are app-level helpers: isExpired checks
// the entry's stored timestamp against its TTL; refreshInBackground re-runs
// the fetch-and-set below without blocking the caller.)
async function getProduct(id: string): Promise<Product> {
  const cached = await cache.get(`product:${id}`);

  // Fresh hit: serve it.
  if (cached && !isExpired(cached)) {
    return cached.data;
  }

  // Stale hit: serve it anyway, refresh asynchronously.
  if (cached) {
    void refreshInBackground(id); // fire-and-forget, don't await
    return cached.data;
  }

  // Miss: read through to the database and populate the cache.
  const product = await db.products.findById(id);
  await cache.set(`product:${id}`, product, { ttl: 300 }); // 300s = 5 min
  return product;
}
The Scaling Conversation with Stakeholders
Engineers want to scale for technical reasons. Stakeholders want to scale for business reasons. The conversation needs a shared language.
| Stakeholder Says | What They Mean | Engineering Response |
|---|---|---|
| "We need to handle 10x traffic" | Marketing campaign coming, or wishful thinking | "What's the expected timeline? Let's load test current capacity first." |
| "The site is slow" | One page is slow, not the whole site | "Which page? Let me profile it." |
| "We need microservices" | They read an article or heard it at a conference | "What problem are we solving? Let me show you the current architecture." |
| "Can we handle Black Friday?" | Genuine concern about peak traffic | "Let's load test at 3x current peak and see where it breaks." |
Always start with measurement. "The site is slow" is not a scaling requirement. "Product listing page takes 4 seconds at 500 concurrent users, target is 500ms" is a scaling requirement with a measurable goal.
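One cheap way to get that measurement is a minimal load-test script. The sketch below is for Node 18+ with built-in `fetch`; the URL, concurrency, and request count are placeholders to tune, and a dedicated tool like k6 or autocannon is the better choice for anything serious. It reports latency percentiles using the nearest-rank method.

```typescript
// Nearest-rank percentile over an ascending-sorted array of latencies (ms).
function percentile(sortedMs: number[], p: number): number {
  const idx = Math.min(sortedMs.length - 1, Math.ceil((p / 100) * sortedMs.length) - 1);
  return sortedMs[Math.max(0, idx)];
}

// Fire `total` requests at `url` with `concurrency` parallel workers and
// report p50/p95/p99 of full request/response time.
async function loadTest(url: string, concurrency: number, total: number) {
  const latencies: number[] = [];
  let remaining = total;

  async function worker(): Promise<void> {
    // Each decrement reserves one request, so workers never overshoot `total`.
    while (remaining-- > 0) {
      const start = performance.now();
      await fetch(url);
      latencies.push(performance.now() - start);
    }
  }

  await Promise.all(Array.from({ length: concurrency }, worker));
  latencies.sort((a, b) => a - b);
  return {
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
    p99: percentile(latencies, 99),
  };
}

// Example (placeholder URL and numbers):
// const result = await loadTest("http://localhost:3000/products", 50, 1000);
// console.log(result); // { p50: ..., p95: ..., p99: ... }
```

Run it against the suspect page at increasing concurrency until the p95 degrades. That gives you the sentence stakeholders need: "the page holds 500ms p95 up to N concurrent users, then falls over."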
Common Pitfalls
- Microservices at 100 users. If your monolith handles the load and your team is under 10 people, microservices add operational complexity without benefit.
- Caching without an invalidation strategy. Adding a cache is easy. Keeping it consistent with the database is hard. Plan invalidation before adding the cache.
- Scaling the application when the database is the bottleneck. Adding 5 more application servers doesn't help if all of them are waiting on the same slow database query.
- No load testing. "We think we need to scale" is not evidence. Load test at 3-5x current peak traffic. See where it actually breaks.
- Treating vertical scaling as taboo. A bigger database instance costs $200/month more and takes 10 minutes to provision. That's almost always cheaper than a week of architectural work.
- Sharding before trying everything else. Database sharding is the most complex scaling solution. Try indexes, query optimization, caching, read replicas, and vertical scaling first.
- Building for scale you'll never reach. If your SaaS has 500 users and is growing 10% per month, you don't need to handle 1 million users. You need to handle about 1,600 users next year (500 × 1.1¹²).
Key Takeaways
- Monolith + queue handles 90% of SaaS workloads. One codebase, one database, one deployment, and a job queue for background work. Don't split until you have evidence you need to.
- The database is usually the bottleneck. Add indexes, optimize queries, add caching, then consider read replicas. Most performance problems are solved by steps 1-3.
- Split services for team scale, not technical scale. Microservices solve the "20 engineers can't work in one codebase" problem, not the "we need more performance" problem.
- Measure before scaling. Load test at 3-5x current peak. Profile slow pages. Find the actual bottleneck. Then fix that specific thing.
- Cache reads, not writes. Cache data that's read frequently and changes rarely. Invalidate on data change events. Serve stale data while refreshing in the background.
- Scale the cheapest way first. A bigger database instance (minutes, $200/month) beats a month of architectural work. Vertical scaling is not a failure.
We help teams scale their systems appropriately as part of our custom software and consulting practice. If you need help with performance or scaling decisions, talk to our team or request a quote.