Platform Engineering vs DevOps: What Actually Changed (and What Didn't)
What platform engineering actually means in practice. Golden paths, self-service infrastructure, internal developer portals, documentation that works, and when platform engineering is overhead you can't afford.
The DevOps to Platform Engineering Shift
DevOps said "you build it, you run it." Every developer manages their own infrastructure, deploys their own services, configures their own monitoring. In theory, this creates ownership. In practice, it creates 15 different ways to deploy a service, 8 different monitoring setups, and every team reinventing the same CI/CD pipeline.
Platform engineering is the correction. Instead of every team building infrastructure from scratch, a platform team provides opinionated, self-service tools that make the right thing easy and the wrong thing hard.
The shift is real but oversold. For teams under 30 engineers, platform engineering is often overhead. For teams over 50, it's necessary. This article covers the practical patterns. For how we deploy infrastructure, see our IaC guide and Kubernetes guide.
What Platform Engineering Actually Is
Platform engineering is not a tool. It's not Backstage. It's not Kubernetes. It's an approach: build internal tools that make developers productive without requiring them to become infrastructure experts.
| DevOps Approach | Platform Engineering Approach |
|---|---|
| Every team writes their own Dockerfile | Platform provides a base image per language |
| Every team configures their own CI/CD | Platform provides a pipeline template, team fills in parameters |
| Every team sets up monitoring | Platform provides observability-as-a-service with standard dashboards |
| Every team manages their own database | Platform provides database provisioning via a form or API |
| New service setup takes 2 days | New service setup takes 15 minutes using a golden path |
The key metric: time to first deploy for a new service. If it takes a developer 2 days to go from "I need a new service" to "it's running in staging," your platform has a problem. If it takes 15 minutes, your platform is working.
Golden Paths
A golden path is an opinionated template for creating a new service. It includes everything a developer needs: project structure, CI/CD pipeline, Dockerfile, Kubernetes manifests, monitoring configuration, and documentation.
golden-path-typescript-api/
├── src/
│   ├── index.ts            # Entry point with health check
│   ├── routes/             # Route definitions
│   └── services/           # Business logic
├── test/
│   ├── unit/
│   └── integration/
├── Dockerfile              # Optimized multi-stage build
├── .github/workflows/
│   └── ci-cd.yaml          # Build, test, push, deploy
├── kubernetes/
│   ├── base/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   └── kustomization.yaml
│   └── overlay/
│       ├── staging/
│       └── production/
├── monitoring/
│   ├── alerts.yaml         # Standard alert rules
│   └── dashboard.json      # Grafana dashboard template
├── docs/
│   └── runbook.md          # Operational runbook template
├── .env.example
├── package.json
├── tsconfig.json
└── README.md
A developer runs platform create-service --name my-api --template typescript-api, answers a few questions (service name, team, whether a database is needed, public or internal), and gets a fully functional project with CI/CD, monitoring, and deployment manifests. First deploy in 15 minutes.
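As a rough illustration of what such a command might do internally, the sketch below turns the answers into a set of rendered files. Everything in it is hypothetical: the `ServiceAnswers` shape, the `{{placeholder}}` syntax, the `database/` folder, and the `renderGoldenPath` function are assumptions made for the example, not the actual platform CLI.

```typescript
// Hypothetical sketch: not the actual `platform create-service` implementation.
interface ServiceAnswers {
  name: string;
  team: string;
  needsDatabase: boolean;
  visibility: 'public' | 'internal';
}

// Template files keyed by path; {{placeholders}} are filled from the answers.
type TemplateFiles = Record<string, string>;

function renderGoldenPath(template: TemplateFiles, answers: ServiceAnswers): TemplateFiles {
  const values = new Map<string, string>([
    ['name', answers.name],
    ['team', answers.team],
    ['visibility', answers.visibility],
  ]);
  const rendered: TemplateFiles = {};
  for (const [path, content] of Object.entries(template)) {
    // Assumed convention: database-only files live under database/ and are
    // skipped when the service does not need a database.
    if (!answers.needsDatabase && path.startsWith('database/')) continue;
    // Unknown placeholders are left untouched rather than replaced with "undefined".
    rendered[path] = content.replace(
      /\{\{(\w+)\}\}/g,
      (match: string, key: string) => values.get(key) ?? match,
    );
  }
  return rendered;
}

const exampleTemplate: TemplateFiles = {
  'package.json': '{"name": "{{name}}"}',
  'kubernetes/base/deployment.yaml': 'metadata:\n  name: {{name}}\n  labels:\n    team: {{team}}',
  'database/migrations/README.md': 'Migrations for {{name}}',
};

const exampleFiles = renderGoldenPath(exampleTemplate, {
  name: 'my-api',
  team: 'payments',
  needsDatabase: false,
  visibility: 'internal',
});
```

Keeping the rendering step a pure function from answers to files makes the template itself easy to test in CI, which fits the idea of treating the golden path as a product.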
Golden Path Principles
| Principle | Why |
|---|---|
| Opinionated defaults | Don't offer 5 database options. Pick one (PostgreSQL) and make it easy. |
| Overridable | Advanced teams can customize. But the defaults should cover 80% of cases. |
| Maintained | When the platform updates (new base image, new security policy), all services using the golden path get the update. |
| Documented | The template itself is documentation. Comments in manifests explain why each configuration exists. |
| Tested | The golden path is a product. It has tests, CI, and versioning. |
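One possible reading of "opinionated defaults" combined with "overridable" is a single defaults object that teams can partially override. The sketch below assumes hypothetical setting names and default values (`replicas`, `cpuLimit`, `memoryLimit`); none of them come from a real platform.

```typescript
// Hypothetical service settings; field names and default values are illustrative assumptions.
interface ServiceSettings {
  database: 'postgresql' | 'none';
  replicas: number;
  cpuLimit: string;
  memoryLimit: string;
}

const PLATFORM_DEFAULTS: ServiceSettings = {
  database: 'postgresql',
  replicas: 2,
  cpuLimit: '500m',
  memoryLimit: '512Mi',
};

// Teams pass only the fields they want to change; everything else stays on the default.
function resolveSettings(overrides: Partial<ServiceSettings> = {}): ServiceSettings {
  return { ...PLATFORM_DEFAULTS, ...overrides };
}

// Lists the fields that actually differ from the defaults, so the platform team
// can see where the defaults fall short.
function overriddenFields(overrides: Partial<ServiceSettings>): string[] {
  return Object.keys(overrides).filter((key) => {
    const field = key as keyof ServiceSettings;
    return overrides[field] !== PLATFORM_DEFAULTS[field];
  });
}

const typicalService = resolveSettings();
const heavyService = resolveSettings({ replicas: 6, memoryLimit: '2Gi' });
```

If most services end up overriding the same field, that is a hint the default itself should change.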
What Makes a Good vs Bad Golden Path
| Good | Bad |
|---|---|
| Creates a working service in 15 minutes | Creates a skeleton that needs 2 days of configuration |
| Includes CI/CD, monitoring, deployment | Only includes project structure |
| Reflects current best practices | Reflects the original author's preferences from 2 years ago |
| Updated when platform changes | Never updated after creation |
| Has 1-2 options (TypeScript API, Python worker) | Has 15 options for every possible combination |
Self-Service Infrastructure
Developers should not need to file a ticket to get a database, a Redis instance, or a new Kubernetes namespace. Self-service means they can provision what they need through a UI, CLI, or API.
// Platform CLI: provision a PostgreSQL database
// $ platform db create --name my-api-db --size small --env staging
interface DatabaseRequest {
  name: string;
  size: 'small' | 'medium' | 'large'; // Opinionated: 3 sizes, not arbitrary specs
  environment: 'dev' | 'staging' | 'production';
  team: string;
  backupRetention?: number; // Days; defaults to 7 in staging, 30 in production
}
// Behind the scenes: Terraform runs, creates RDS instance,
// stores credentials in Secrets Manager, creates K8s secret,
// adds monitoring dashboard, notifies the team
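Before any Terraform runs, a request like this would typically be validated and have its defaults filled in. The following is a minimal sketch under stated assumptions: the naming rule, the `normalizeDatabaseRequest` function, and the 1-day dev retention are invented for illustration. Only the 7-day staging and 30-day production defaults come from the interface comment above.

```typescript
// Hypothetical sketch of request validation and defaulting; not the platform's real code.
type Size = 'small' | 'medium' | 'large';
type Environment = 'dev' | 'staging' | 'production';

interface DatabaseRequestInput {
  name: string;
  size: Size;
  environment: Environment;
  team: string;
  backupRetention?: number; // days
}

// 7 and 30 days follow the article's stated defaults; the dev value is an assumption.
const DEFAULT_BACKUP_RETENTION_DAYS: Record<Environment, number> = {
  dev: 1,
  staging: 7,
  production: 30,
};

function normalizeDatabaseRequest(input: DatabaseRequestInput): Required<DatabaseRequestInput> {
  // Assumed naming rule: starts with a lowercase letter, then lowercase letters,
  // digits, or hyphens, 3 to 63 characters in total.
  if (!/^[a-z][a-z0-9-]{2,62}$/.test(input.name)) {
    throw new Error(`Invalid database name: ${input.name}`);
  }
  const backupRetention =
    input.backupRetention ?? DEFAULT_BACKUP_RETENTION_DAYS[input.environment];
  if (!Number.isInteger(backupRetention) || backupRetention < 1) {
    throw new Error('backupRetention must be a whole number of days, at least 1');
  }
  return { ...input, backupRetention };
}

const stagingRequest = normalizeDatabaseRequest({
  name: 'my-api-db',
  size: 'small',
  environment: 'staging',
  team: 'payments',
});
```

Rejecting bad input at this stage gives the developer an immediate error message instead of a failed infrastructure run several minutes later.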
What to Self-Service
| Resource | Self-Service? | Why / Why Not |
|---|---|---|
| Database (PostgreSQL) | Yes | Standard resource, opinionated sizes |
| Redis cache | Yes | Standard resource |
| Kubernetes namespace | Yes | Low risk, team-scoped |
| S3 bucket | Yes | Standard resource |
| Domain / DNS record | Yes (with approval) | Low risk but needs naming governance |
| IAM roles / permissions | No (request-based) | Security risk, needs review |
| VPC / networking changes | No (platform team) | Cluster-wide impact |
| New cloud account | No (platform team) | Cost and security governance |
The boundary: self-service for resources scoped to a team. Request-based for resources that affect the whole organization.
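That boundary can be written down as an explicit policy table, so the CLI or portal decides the provisioning path rather than each engineer. The sketch below mirrors the table above; the type names, the `decideProvisioning` function, and the wording of the next steps are hypothetical.

```typescript
// Mirrors the self-service table above; all identifiers are hypothetical.
type ResourceKind =
  | 'database'
  | 'redis'
  | 'namespace'
  | 's3-bucket'
  | 'dns-record'
  | 'iam-role'
  | 'vpc-change'
  | 'cloud-account';

type ProvisioningMode = 'self-service' | 'self-service-with-approval' | 'request';

// Record<ResourceKind, ...> makes the compiler reject a resource kind with no policy.
const PROVISIONING_POLICY: Record<ResourceKind, ProvisioningMode> = {
  database: 'self-service',
  redis: 'self-service',
  namespace: 'self-service',
  's3-bucket': 'self-service',
  'dns-record': 'self-service-with-approval',
  'iam-role': 'request',
  'vpc-change': 'request',
  'cloud-account': 'request',
};

interface ProvisioningDecision {
  mode: ProvisioningMode;
  nextStep: string;
}

function decideProvisioning(kind: ResourceKind): ProvisioningDecision {
  const mode = PROVISIONING_POLICY[kind];
  switch (mode) {
    case 'self-service':
      return { mode, nextStep: 'Provision immediately via UI, CLI, or API' };
    case 'self-service-with-approval':
      return { mode, nextStep: 'Provision once an approver signs off' };
    case 'request':
      return { mode, nextStep: 'Open a request for the platform or security team' };
  }
}
```

Because the policy is data, moving a resource from request-based to self-service is a one-line, reviewable change.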
Internal Developer Portals
An internal developer portal is a catalog of all services, their owners, their APIs, their runbooks, and their health status. Backstage (by Spotify) is the best known, but it's not the only option.
What a Portal Should Show
| View | Content | Who Uses It |
|---|---|---|
| Service catalog | All services, owners, tech stack, links to repos | Everyone |
| API documentation | OpenAPI/GraphQL specs, auto-generated | Frontend teams, partners |
| Runbooks | Operational procedures per service | On-call engineers |
| Dependencies | Who depends on what | Architecture reviews |
| Health status | Current status, recent incidents | Ops, management |
| Cost | Monthly cost per service/team | Finance, management |
| Golden paths | Templates for new services | Developers |
Backstage vs Alternatives
| Option | Effort | Best For |
|---|---|---|
| Backstage | High (6+ weeks to set up, ongoing maintenance) | Large orgs (100+ engineers), dedicated platform team |
| Custom portal (Next.js + API) | Medium (2-4 weeks MVP) | Mid-size teams, specific needs |
| Enhanced README + wiki | Low (days) | Small teams (< 30 engineers) |
| Notion/Confluence | Low | Non-technical stakeholders need access |
For teams under 30 engineers, Backstage is overkill. A well-organized Git repository with README files, a shared Notion wiki, and a simple service catalog spreadsheet covers 80% of the need.
For teams over 50 engineers, the catalog problem becomes real. Services get created and forgotten. Owners leave and nobody knows who maintains what. A portal with ownership tracking and health dashboards becomes essential.
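Even without a full portal, the "created and forgotten" problem can be checked mechanically against a catalog export. The sketch below assumes a hypothetical `CatalogEntry` shape and a 180-day staleness threshold; both are illustrative choices, not a Backstage API.

```typescript
// Hypothetical catalog entry; a real portal would track more fields.
interface CatalogEntry {
  service: string;
  ownerTeam: string;
  lastDeployedAt: string; // ISO 8601 date
}

interface CatalogFinding {
  service: string;
  reason: 'unknown-owner' | 'stale';
}

// Flags services whose owning team no longer exists or that have not been
// deployed within the threshold. A service can be flagged for both reasons.
function auditCatalog(
  entries: CatalogEntry[],
  activeTeams: Set<string>,
  now: Date,
  staleAfterDays = 180, // assumed threshold
): CatalogFinding[] {
  const msPerDay = 24 * 60 * 60 * 1000;
  const findings: CatalogFinding[] = [];
  for (const entry of entries) {
    if (!activeTeams.has(entry.ownerTeam)) {
      findings.push({ service: entry.service, reason: 'unknown-owner' });
    }
    const ageDays = (now.getTime() - new Date(entry.lastDeployedAt).getTime()) / msPerDay;
    if (ageDays > staleAfterDays) {
      findings.push({ service: entry.service, reason: 'stale' });
    }
  }
  return findings;
}

const findings = auditCatalog(
  [
    { service: 'billing-api', ownerTeam: 'payments', lastDeployedAt: '2024-05-01' },
    { service: 'legacy-reports', ownerTeam: 'reporting', lastDeployedAt: '2023-01-15' },
  ],
  new Set(['payments']),
  new Date('2024-06-01T00:00:00Z'),
);
```

Run on a schedule, a check like this turns ownership drift into a short list someone can act on.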
The Documentation Problem
Nobody reads your wiki. This is not a people problem. It's a location problem. Documentation that lives separately from the code it describes becomes stale within weeks.
Documentation That Works
| Type | Where It Lives | Why |
|---|---|---|
| API documentation | Auto-generated from code (OpenAPI, GraphQL introspection) | Always current |
| Runbooks | In the service repo (docs/runbook.md) | Deployed with the code |
| Architecture decisions | ADR files in the repo (docs/adr/) | Version-controlled |
| Onboarding | In the golden path template | Every new service starts with it |
| Platform capabilities | Portal or platform CLI --help | Discoverable at point of need |
Architecture Decision Records (ADRs)
Every significant technical decision gets an ADR:
# ADR-003: Use PostgreSQL as the primary database
## Status: Accepted
## Context
We need a database for the new service. Options considered: PostgreSQL, MySQL, DynamoDB.
## Decision
PostgreSQL 15 via the platform's self-service database provisioning.
## Consequences
- Standard tooling (backups, monitoring, migrations) works out of the box
- Team doesn't need DynamoDB expertise
- Slightly higher latency than DynamoDB for key-value access patterns (acceptable)
ADRs prevent re-debating the same decisions. When a new team member asks "why PostgreSQL?", the answer is in the repo, not in someone's memory.
Measuring Developer Experience
If you're investing in a platform, measure whether it's actually helping:
| Metric | What It Measures | Good Target |
|---|---|---|
| Time to first deploy | How long from "new service idea" to "running in staging" | < 30 minutes |
| Deployment frequency | How often teams deploy to production | Multiple times per day |
| Lead time for changes | Time from commit to production | < 1 hour |
| Change failure rate | Percentage of deployments causing incidents | < 5% |
| MTTR | Mean time to recover from incidents | < 30 minutes |
| Developer satisfaction | Survey score (quarterly) | > 4/5 |
| Support ticket volume | Platform-related requests per week | Decreasing trend |
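Deployment frequency, lead time for changes, change failure rate, and MTTR are the four DORA metrics. Time to first deploy, developer satisfaction, and support ticket volume are platform-specific. Track all of them. If time-to-first-deploy is improving but developer satisfaction is dropping, the platform is adding complexity without value.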
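The four delivery metrics in the table (deployment frequency, lead time for changes, change failure rate, MTTR) can be derived from a plain log of deployments. The sketch below is one way to do it under stated assumptions: the `Deployment` record shape is hypothetical, lead time is summarized as a median, and recovery time is measured from the failing deploy to recovery.

```typescript
// Hypothetical deployment record; field names are assumptions.
interface Deployment {
  committedAt: string; // ISO 8601 timestamp of the commit
  deployedAt: string; // ISO 8601 timestamp of the production deploy
  causedIncident: boolean;
  recoveredAt?: string; // set once an incident caused by this deploy is resolved
}

interface DeliveryMetrics {
  deploysPerDay: number;
  medianLeadTimeMinutes: number;
  changeFailureRate: number; // fraction between 0 and 1
  meanTimeToRecoverMinutes: number | null; // null when no incident has been resolved
}

const minutesBetween = (from: string, to: string): number =>
  (new Date(to).getTime() - new Date(from).getTime()) / 60000;

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

function summarize(deployments: Deployment[], periodDays: number): DeliveryMetrics {
  if (deployments.length === 0 || periodDays <= 0) {
    throw new Error('Need at least one deployment and a positive period');
  }
  const leadTimes = deployments.map((d) => minutesBetween(d.committedAt, d.deployedAt));
  const failures = deployments.filter((d) => d.causedIncident);
  const recoveries = failures
    .filter((d) => d.recoveredAt !== undefined)
    .map((d) => minutesBetween(d.deployedAt, d.recoveredAt as string));
  return {
    deploysPerDay: deployments.length / periodDays,
    medianLeadTimeMinutes: median(leadTimes),
    changeFailureRate: failures.length / deployments.length,
    meanTimeToRecoverMinutes:
      recoveries.length > 0 ? recoveries.reduce((a, b) => a + b, 0) / recoveries.length : null,
  };
}

const metrics = summarize(
  [
    { committedAt: '2024-06-03T09:00:00Z', deployedAt: '2024-06-03T09:30:00Z', causedIncident: false },
    { committedAt: '2024-06-03T10:00:00Z', deployedAt: '2024-06-03T10:50:00Z', causedIncident: false },
    { committedAt: '2024-06-03T13:00:00Z', deployedAt: '2024-06-03T13:40:00Z', causedIncident: true, recoveredAt: '2024-06-03T14:00:00Z' },
    { committedAt: '2024-06-04T09:00:00Z', deployedAt: '2024-06-04T10:00:00Z', causedIncident: false },
  ],
  2,
);
```

Developer satisfaction and support ticket volume still need a survey and a ticket tracker; they cannot be derived from deployment logs.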
When Platform Engineering Is Overhead
Platform engineering is not free. A platform team costs 2-5 full-time engineers. Golden paths need maintenance. Self-service tools need development and support. A portal needs content.
Don't Build a Platform When
- Your team is under 20 engineers (the overhead exceeds the benefit)
- You have fewer than 5 services (not enough standardization opportunity)
- You're a startup that might pivot (the platform will be wasted)
- Your engineering process is already fast (if time to first deploy is already around 30 minutes, a platform team has little left to improve)
Do Build a Platform When
- You have 50+ engineers deploying 10+ services
- New service creation takes more than a day
- Teams are reinventing the same infrastructure patterns
- On-call is painful because every service has different monitoring
- Developer satisfaction is low because of infrastructure friction
The minimum viable platform for a 30-50 person team:
- One golden path template (the most common service type)
- CI/CD pipeline template (shared, parameterized)
- Standard monitoring (Grafana dashboards auto-provisioned)
- Service catalog (even if it's just a spreadsheet)
- One-page platform guide ("how to create a new service")
That's it. No Backstage, no custom portal, no self-service infrastructure. Just templates and standards. Add complexity when the team outgrows the simple approach.
Common Pitfalls
- Building a platform before you have standardization. If every service uses a different language, framework, and deployment method, a platform can't help. Standardize first, then platform.
- Backstage before 50 engineers. Backstage is powerful but complex. For smaller teams, the setup and maintenance cost exceeds the benefit.
- Golden paths that are never updated. A template from 2 years ago with outdated dependencies and deprecated patterns does more harm than good. Maintain it like a product.
- Self-service everything. IAM roles and networking changes should not be self-service. The blast radius of a mistake is too large.
- Measuring activity, not outcomes. "We built 15 platform features" is not success. "Time-to-first-deploy dropped from 2 days to 30 minutes" is success.
- Ignoring developer satisfaction. A platform that forces developers into patterns they hate will be bypassed. Talk to your users (the developers) regularly.
- No documentation. A self-service platform without documentation is just a different kind of black box.
Key Takeaways
- Platform engineering is not a tool, it's an approach. Build internal tools that make developers productive without requiring infrastructure expertise. The golden path is the core product.
- Time to first deploy is the key metric. If a developer can go from idea to running service in 15 minutes, the platform is working. If it takes 2 days, it's not.
- Start simple. One golden path, one CI/CD template, standard monitoring. Add complexity when the simple approach isn't enough. Most teams under 30 engineers don't need Backstage.
- Documentation lives with the code. Auto-generated API docs, runbooks in the repo, ADRs for decisions. Wiki pages become stale. In-repo docs stay current.
- Measure outcomes, not activity. DORA metrics plus developer satisfaction. If the numbers aren't improving, the platform investment isn't working.
- The platform team is a product team. Their users are developers. Their product is the internal platform. They need feedback loops, user research, and iteration just like any product team.
We build internal platforms and developer tooling as part of our cloud services and consulting practice. If you need help with platform engineering strategy, talk to our team or request a quote. See also our methodology page for how we approach engineering culture.