Scaling Backend Systems Without Overengineering: The YAGNI Architecture Checklist
When to scale and when not to. Monolith + queue as the most underrated architecture, real scaling triggers, database bottlenecks, caching strategy, and the YAGNI checklist for architecture decisions.
The Overengineering Epidemic
We've seen it too many times: a team of 3 engineers building a microservices architecture for an application with 200 users. Kubernetes cluster, service mesh, event bus, API gateway, 5 separate databases, and a monitoring stack more complex than the application itself. Six months in, they're debugging distributed systems problems instead of building features.
The opposite also happens: a monolith serving 50,000 concurrent users, where every request takes 5 seconds because the database is the bottleneck and there's no caching layer. The team keeps adding application servers, but the database is the ceiling.
Both are architecture failures. This article covers when to scale, what to scale, and how to avoid scaling things that don't need it. For specific scaling patterns, see our system architecture guide and event-driven architecture guide.
The YAGNI Architecture Checklist
Before adding any architectural complexity (new service, new database, new message broker, new cache layer), ask:
| Question | If YES | If NO |
|---|---|---|
| Do we have a measured performance problem right now? | Fix the specific problem | Don't add complexity for a hypothetical problem |
| Would this solve a problem we've already encountered? | Consider it, with evidence | Don't solve problems you haven't had yet |
| Can we solve it with a simpler approach first? | Use the simpler approach | Complex solution might be needed |
| Will 10x more traffic break the current architecture? | Plan ahead (but don't build yet) | Current architecture is fine |
| Is the team large enough to maintain this? | Proceed if at least 2 people can own it | Don't add things nobody can maintain |
| Can we revert if it doesn't work? | Acceptable risk | Too risky |
The default answer to "should we add this?" is no. Add complexity when you have evidence it's needed, not when you imagine it might be needed.
Monolith + Queue: The Most Underrated Architecture
A monolithic application with a background job queue handles 90% of SaaS workloads. One codebase, one deployment, one database, and a queue for background work.
┌─────────────────────────────────────────┐
│                Monolith                 │
│                                         │
│   ┌──────────┐       ┌──────────┐       │
│   │   API    │       │  Admin   │       │
│   │  Routes  │       │    UI    │       │
│   └────┬─────┘       └────┬─────┘       │
│        │                  │             │
│   ┌────▼──────────────────▼────┐        │
│   │         Services           │        │
│   │     (business logic)       │        │
│   └────┬───────────────────────┘        │
│        │                                │
│   ┌────▼───────────────────────┐        │
│   │   Database (PostgreSQL)    │        │
│   └────────────────────────────┘        │
│                                         │
│   ┌────────────────────────────┐        │
│   │  Job Queue (BullMQ/Redis)  │        │
│   │   - Email sending          │        │
│   │   - Search indexing        │        │
│   │   - Report generation      │        │
│   │   - Data import            │        │
│   └────────────────────────────┘        │
└─────────────────────────────────────────┘
Why It Works
- One codebase: Refactoring is safe. Find-and-replace works. Types flow through the entire application.
- One deployment: Deploy once. No service coordination. No distributed rollbacks.
- Shared database: Joins work. Transactions are ACID. No eventual consistency headaches.
- Job queue: Long-running work (emails, imports, indexing) doesn't block request handling.
- Simple debugging: Stack traces are complete. Logs are in one place. No distributed tracing needed.
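The pattern is small enough to show in code. Below is a minimal in-memory sketch of the request/queue split — in production the queue would be BullMQ backed by Redis as in the diagram, but the shape stays the same: the request handler enqueues and responds immediately, and a worker does the slow work outside the request path. All names here are illustrative.

```typescript
type Job<T> = { name: string; data: T };

class JobQueue<T> {
  private jobs: Job<T>[] = [];
  private handlers = new Map<string, (data: T) => Promise<void>>();

  // Called from the request path: O(1), never blocks on the slow work.
  enqueue(name: string, data: T): void {
    this.jobs.push({ name, data });
  }

  // Register a handler, the way a BullMQ Worker would.
  process(name: string, handler: (data: T) => Promise<void>): void {
    this.handlers.set(name, handler);
  }

  // Called by the worker loop, outside the request path.
  async drain(): Promise<number> {
    let done = 0;
    while (this.jobs.length > 0) {
      const job = this.jobs.shift()!;
      const handler = this.handlers.get(job.name);
      if (handler) {
        await handler(job.data);
        done++;
      }
    }
    return done;
  }
}

const queue = new JobQueue<{ to: string }>();
const sent: string[] = [];

queue.process("send-email", async ({ to }) => {
  sent.push(to); // stand-in for the real SMTP call
});

// Request handler: write the order, enqueue the email, respond immediately.
function createOrderHandler(email: string): { status: string } {
  // ...write order to PostgreSQL inside the request transaction...
  queue.enqueue("send-email", { to: email });
  return { status: "created" }; // responds before the email is sent
}
```

The point of the split: `createOrderHandler` returns in milliseconds regardless of how slow the email provider is, and a failed email job can be retried without the user ever seeing an error page.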
When Monolith + Queue Breaks
| Symptom | Root Cause | Solution |
|---|---|---|
| API response time > 2s | Database queries too slow | Add indexes, query optimization, read replicas |
| Database CPU > 80% sustained | Query volume or complexity too high | Read replicas, caching layer, query optimization |
| Deploys are scary | Codebase too large, tests too slow | Better tests, feature flags, canary deploys (NOT microservices) |
| One feature breaks everything | No module boundaries | Better code organization (NOT microservices) |
| Team can't work in parallel | Code conflicts, merge hell | Module ownership, trunk-based development |
Notice: none of these symptoms are solved by adding microservices. They're solved by better engineering within the monolith. Microservices solve a different problem.
When to Split: The Real Triggers
Split into separate services only when:
1. Team scale forces it. When you have 20+ engineers and team boundaries don't align with the codebase. Team A owns feature X and team B owns feature Y, and they need to deploy independently. This is the primary reason for microservices.
2. Different scaling requirements. The search service needs 10x the compute of the product catalog service. Running them in the same process wastes resources. Separate services can scale independently.
3. Different technology requirements. The ML inference service needs Python and GPUs. The API service needs Node.js. Running both in one process is impractical.
4. Fault isolation. A crash in the payment service should not crash the product catalog service. When failure isolation is critical for business continuity, separate processes help.
5. Compliance boundaries. PCI-compliant payment processing needs to be separated from the rest of the application for audit purposes. A separate service with its own security boundary simplifies compliance.
What's NOT a Reason to Split
- "Microservices are best practice" (they're a trade-off, not a best practice)
- "We might need to scale" (scale when you need to, not before)
- "Clean architecture" (module boundaries within a monolith are cleaner than service boundaries)
- "Resume-driven development" (Kubernetes on your resume doesn't help your users)
The Database Is Usually the Bottleneck
In 80% of performance problems, the database is the bottleneck. Not the application code. Not the network. The database.
Diagnosis
-- Find slow queries (PostgreSQL; requires the pg_stat_statements extension)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
-- Find tables with heavy sequential scans (candidates for missing indexes)
SELECT schemaname, tablename, seq_scan, idx_scan,
       round(100.0 * coalesce(idx_scan, 0) / (seq_scan + coalesce(idx_scan, 0)), 1) AS idx_pct
FROM pg_stat_user_tables
WHERE seq_scan > 100
ORDER BY seq_scan DESC
LIMIT 20;
-- Find the largest tables (total size includes indexes and TOAST)
SELECT tablename,
       pg_size_pretty(pg_total_relation_size(quote_ident(tablename)::regclass)) AS total_size,
       pg_size_pretty(pg_relation_size(quote_ident(tablename)::regclass)) AS data_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(quote_ident(tablename)::regclass) DESC
LIMIT 20;
Fix Order
1. Add missing indexes (minutes to implement, massive impact)
2. Optimize slow queries (rewrite N+1 queries, add WHERE clauses, use EXPLAIN ANALYZE)
3. Add caching (Redis for frequently read, rarely changed data)
4. Add read replicas (route read queries to replicas, writes to primary)
5. Add connection pooling (PgBouncer between application and database)
6. Scale vertically (bigger instance, more RAM, faster storage)
7. Scale horizontally (sharding, only when everything else is exhausted)
Most teams jump to step 7 before trying steps 1-3. Steps 1-3 are free or cheap and often solve the problem entirely.
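The N+1 fix in step 2 deserves a sketch, because it's the single most common slow-query pattern. This is a toy example: the in-memory maps stand in for the database, the query counter stands in for round-trips, and `findCustomersByIds` stands in for a single `WHERE id = ANY($1)` query. The schema (orders with a customerId) is hypothetical.

```typescript
const customers = new Map([[1, "Ada"], [2, "Grace"]]);
const orders = [
  { id: 10, customerId: 1 },
  { id: 11, customerId: 2 },
  { id: 12, customerId: 1 },
];
let queryCount = 0; // counts customer lookups (round-trips)

async function findCustomer(id: number) {
  queryCount++; // one round-trip per call
  return { id, name: customers.get(id)! };
}

async function findCustomersByIds(ids: number[]) {
  queryCount++; // single round-trip: SELECT ... WHERE id = ANY($1)
  return ids.map((id) => ({ id, name: customers.get(id)! }));
}

// N+1: one customer lookup per order. 100 orders = 100 extra queries.
async function namesNPlusOne(): Promise<string[]> {
  const out: string[] = [];
  for (const o of orders) out.push((await findCustomer(o.customerId)).name);
  return out;
}

// Batched: collect the ids, fetch them in one query, join in memory.
async function namesBatched(): Promise<string[]> {
  const ids = [...new Set(orders.map((o) => o.customerId))];
  const rows = await findCustomersByIds(ids);
  const byId = new Map(rows.map((r) => [r.id, r.name]));
  return orders.map((o) => byId.get(o.customerId)!);
}
```

Same output, one round-trip instead of one per row. In a real ORM this is usually a one-line change (an eager-load or `include` option) once profiling has pointed at the loop.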
Caching Strategy
What to Cache
| Data | Cache? | TTL | Invalidation |
|---|---|---|---|
| User sessions | Yes | Session duration | On logout |
| Product catalog | Yes | 5-15 minutes | On product update event |
| Search results | Maybe | 1-5 minutes | On index update |
| User-specific data | Careful | Short (1-5 min) | On user action |
| Config / settings | Yes | Long (1 hour) | On admin change |
| Computed aggregates | Yes | 15-60 minutes | On underlying data change |
| Real-time data (stock) | No | N/A | Always fetch live |
What NOT to Cache
- User-specific data with privacy implications: Cached cart contents served to the wrong user is a data breach.
- Rapidly changing data: Stock levels, live prices, auction bids. Cache TTL can't keep up.
- Write-heavy operations: If data changes more often than it's read, caching adds complexity without benefit.
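The first pitfall above (cached cart served to the wrong user) almost always comes from one mistake: keying per-user data on a shared cache key. A minimal sketch of both key schemes, with an in-memory Map standing in for Redis — the key scheme is the point, not the store:

```typescript
const cache = new Map<string, unknown>();

// WRONG: one key for everyone. The first user's cart is served to all users.
function cartKeyShared(_userId: string): string {
  return "cart";
}

// RIGHT: the user id is part of the key, so entries can never cross users.
function cartKeyScoped(userId: string): string {
  return `cart:${userId}`;
}

// Generic read-through: return the cached cart, or load and cache it.
function getCart(
  userId: string,
  keyFn: (id: string) => string,
  load: () => string[],
): string[] {
  const key = keyFn(userId);
  if (!cache.has(key)) cache.set(key, load());
  return cache.get(key) as string[];
}
```

With `cartKeyShared`, the second user to hit the endpoint receives the first user's cart — a data breach, not a performance bug. The same rule applies to any per-user or per-tenant data: the owning identifier belongs in the key.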
Cache Invalidation Pattern
// Event-driven invalidation (recommended)
eventBus.on('product.updated', async (product) => {
  await cache.delete(`product:${product.id}`);
  await cache.delete(`product-list:${product.categoryId}`);
  // Don't try to update the cache here. Let the next read populate it.
});

// Read-through with stale-while-revalidate
// (isExpired and refreshInBackground are app-level helpers: isExpired checks
// the entry's stored timestamp against its TTL; refreshInBackground re-runs
// the fetch-and-set below without blocking the caller.)
async function getProduct(id: string): Promise<Product> {
  const cached = await cache.get(`product:${id}`);

  // Fresh hit: serve it.
  if (cached && !isExpired(cached)) {
    return cached.data;
  }

  // Stale hit: serve it anyway, refresh asynchronously.
  if (cached) {
    void refreshInBackground(id); // fire-and-forget, don't await
    return cached.data;
  }

  // Miss: read through to the database and populate the cache.
  const product = await db.products.findById(id);
  await cache.set(`product:${id}`, product, { ttl: 300 }); // 300s = 5 min
  return product;
}
The Scaling Conversation with Stakeholders
Engineers want to scale for technical reasons. Stakeholders want to scale for business reasons. The conversation needs a shared language.
| Stakeholder Says | What They Mean | Engineering Response |
|---|---|---|
| "We need to handle 10x traffic" | Marketing campaign coming, or wishful thinking | "What's the expected timeline? Let's load test current capacity first." |
| "The site is slow" | One page is slow, not the whole site | "Which page? Let me profile it." |
| "We need microservices" | They read an article or heard it at a conference | "What problem are we solving? Let me show you the current architecture." |
| "Can we handle Black Friday?" | Genuine concern about peak traffic | "Let's load test at 3x current peak and see where it breaks." |
Always start with measurement. "The site is slow" is not a scaling requirement. "Product listing page takes 4 seconds at 500 concurrent users, target is 500ms" is a scaling requirement with a measurable goal.
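One cheap way to get that measurement is a minimal load-test script. The sketch below is for Node 18+ with built-in `fetch`; the URL, concurrency, and request count are placeholders to tune, and a dedicated tool like k6 or autocannon is the better choice for anything serious. It reports latency percentiles using the nearest-rank method.

```typescript
// Nearest-rank percentile over an ascending-sorted array of latencies (ms).
function percentile(sortedMs: number[], p: number): number {
  const idx = Math.min(sortedMs.length - 1, Math.ceil((p / 100) * sortedMs.length) - 1);
  return sortedMs[Math.max(0, idx)];
}

// Fire `total` requests at `url` with `concurrency` parallel workers and
// report p50/p95/p99 of full request/response time.
async function loadTest(url: string, concurrency: number, total: number) {
  const latencies: number[] = [];
  let remaining = total;

  async function worker(): Promise<void> {
    // Each decrement reserves one request, so workers never overshoot `total`.
    while (remaining-- > 0) {
      const start = performance.now();
      await fetch(url);
      latencies.push(performance.now() - start);
    }
  }

  await Promise.all(Array.from({ length: concurrency }, worker));
  latencies.sort((a, b) => a - b);
  return {
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
    p99: percentile(latencies, 99),
  };
}

// Example (placeholder URL and numbers):
// const result = await loadTest("http://localhost:3000/products", 50, 1000);
// console.log(result); // { p50: ..., p95: ..., p99: ... }
```

Run it against the suspect page at increasing concurrency until the p95 degrades. That gives you the sentence stakeholders need: "the page holds 500ms p95 up to N concurrent users, then falls over."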
Common Pitfalls
- Microservices at 100 users. If your monolith handles the load and your team is under 10 people, microservices add operational complexity without benefit.
- Caching without an invalidation strategy. Adding a cache is easy. Keeping it consistent with the database is hard. Plan invalidation before adding the cache.
- Scaling the application when the database is the bottleneck. Adding 5 more application servers doesn't help if all of them are waiting on the same slow database query.
- No load testing. "We think we need to scale" is not evidence. Load test at 3-5x current peak traffic. See where it actually breaks.
- Treating vertical scaling as taboo. A bigger database instance costs $200/month more and takes 10 minutes to provision. That's almost always cheaper than a week of architectural work.
- Sharding before trying everything else. Database sharding is the most complex scaling solution. Try indexes, query optimization, caching, read replicas, and vertical scaling first.
- Building for scale you'll never reach. If your SaaS has 500 users and is growing 10% per month, you don't need to handle 1 million users. You need to handle about 1,600 users next year (500 × 1.1¹²).
Key Takeaways
- Monolith + queue handles 90% of SaaS workloads. One codebase, one database, one deployment, and a job queue for background work. Don't split until you have evidence you need to.
- The database is usually the bottleneck. Add indexes, optimize queries, add caching, then consider read replicas. Most performance problems are solved by steps 1-3.
- Split services for team scale, not technical scale. Microservices solve the "20 engineers can't work in one codebase" problem, not the "we need more performance" problem.
- Measure before scaling. Load test at 3-5x current peak. Profile slow pages. Find the actual bottleneck. Then fix that specific thing.
- Cache reads, not writes. Cache data that's read frequently and changes rarely. Invalidate on data change events. Serve stale data while refreshing in the background.
- Scale the cheapest way first. A bigger database instance (minutes, $200/month) beats a month of architectural work. Vertical scaling is not a failure.
We help teams scale their systems appropriately as part of our custom software and consulting practice. If you need help with performance or scaling decisions, talk to our team or request a quote.