Designing Multi-Tenant Systems That Don't Break at Scale
How to design multi-tenant architectures with real isolation. Three enforcement layers, hierarchical scoping, RBAC patterns, and lessons from building three different multi-tenant systems.
Multi-Tenancy Is Not a Database Decision
Most articles about multi-tenancy start with the database question: shared database, shared schema, or dedicated database per tenant? That's the wrong place to start. The database model is an implementation detail. The architecture question is: how do you enforce isolation at every layer of the system so that tenant A can never see, modify, or affect tenant B's data?
We've built three different multi-tenant systems. Each solved isolation differently because each had different constraints. One uses hierarchical scoping with five identity levels. Another uses flat RBAC with four roles. A third uses organization-based scoping with channel isolation. The database model was the least interesting decision in all three.
This article covers the architectural patterns that make multi-tenancy safe at scale. For broader context, see our guide on how we approach system architecture, which covers our methodology. For specific examples of multi-tenant AI systems, see our guides on agentic commerce and AI governance.
The Three-Layer Enforcement Model
Tenant isolation must be enforced at three layers. If any layer is missing, data leaks.
┌─────────────────────────────────────────────────┐
│  Layer 1: API MIDDLEWARE                        │
│  Every request authenticated and scoped         │
│  tenant_id extracted from JWT/API key           │
│  Injected into request context                  │
│                                                 │
├─────────────────────────────────────────────────┤
│  Layer 2: QUERY FILTERS                         │
│  Every database query includes tenant_id        │
│  Every search query scoped by tenant            │
│  No query runs without tenant context           │
│                                                 │
├─────────────────────────────────────────────────┤
│  Layer 3: POLICY ENFORCEMENT                    │
│  Tool calls checked against tenant policies     │
│  Agent memory scoped per tenant + session       │
│  Output filtered by tenant visibility rules     │
│                                                 │
└─────────────────────────────────────────────────┘
Layer 1: API Middleware
Every incoming request must be authenticated and scoped to a tenant before it reaches any business logic.
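// Express types: the handler signature below follows Express conventions
import type { Request, Response, NextFunction } from 'express';

// Middleware extracts tenant context from every request
async function tenantMiddleware(req: Request, res: Response, next: NextFunction) {
  const token = req.headers.authorization?.replace(/^Bearer /, '');
  if (!token) return res.status(401).json({ error: 'No token provided' });

  // verifyJwt is assumed to reject on a bad signature or an expired token;
  // without the try/catch that rejection would escape the handler
  let decoded: TenantJwtPayload;
  try {
    decoded = await verifyJwt(token);
  } catch {
    return res.status(401).json({ error: 'Invalid or expired token' });
  }

  const tenant = await tenantStore.getById(decoded.tenant_id);
  if (!tenant || tenant.status !== 'active') {
    return res.status(403).json({ error: 'Tenant not found or suspended' });
  }

  // Inject tenant context into request
  req.tenantContext = {
    tenantId: tenant.id,
    channelId: decoded.channel_id,
    role: decoded.role,
    permissions: decoded.permissions,
  };
  next();
}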
The tenant context is not optional. Every route handler, every service method, every database query receives it. If a function doesn't have tenant context, it cannot access tenant-scoped data.
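One way to make the "not optional" rule concrete is a small accessor that fails loudly when the context is missing. This is a minimal sketch, not the implementation used in the systems described here: `ScopedRequest`, `MissingTenantContextError`, and `requireTenantContext` are hypothetical names, and `ScopedRequest` stands in for whatever request type your framework provides.

```typescript
// Hypothetical shape of the context injected by the middleware.
interface TenantContext {
  tenantId: string;
  channelId?: string;
  role: string;
  permissions: string[];
}

// Minimal stand-in for a framework request object (assumption).
interface ScopedRequest {
  tenantContext?: TenantContext;
}

class MissingTenantContextError extends Error {
  constructor() {
    super('Tenant context missing: request did not pass tenant middleware');
    this.name = 'MissingTenantContextError';
  }
}

// Handlers call this instead of reading req.tenantContext directly, so a
// route mounted without the middleware fails instead of running unscoped.
function requireTenantContext(req: ScopedRequest): TenantContext {
  if (!req.tenantContext || !req.tenantContext.tenantId) {
    throw new MissingTenantContextError();
  }
  return req.tenantContext;
}
```

Service methods can then accept a `TenantContext` argument rather than the raw request, which keeps the requirement visible in every function signature.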
Layer 2: Query Filters
Every database query must include the tenant scope. This is not enforced by convention ("remember to add the WHERE clause"). It's enforced by architecture.
// Repository base class that enforces tenant scoping
// Assumes `db` is a DynamoDB-style document client whose query() resolves
// to { Items } and whose get() resolves to { Item }
class TenantScopedRepository<T> {
  constructor(
    protected readonly db: any,
    protected readonly tableName: string,
  ) {}

  async findMany(tenantId: string, filters: Partial<T>): Promise<T[]> {
    const result = await this.db.query({
      TableName: this.tableName,
      KeyConditionExpression: 'tenant_id = :tid',
      FilterExpression: this.buildFilterExpression(filters),
      ExpressionAttributeValues: {
        ':tid': tenantId,
        ...this.buildFilterValues(filters),
      },
    });
    // Return the rows, not the raw client response
    return (result.Items ?? []) as T[];
  }

  async findById(tenantId: string, id: string): Promise<T | null> {
    const result = await this.db.get({
      TableName: this.tableName,
      Key: { tenant_id: tenantId, id },
    });
    return (result.Item as T) ?? null;
  }

  // No method exists that queries without tenant_id
  // Cross-tenant queries are architecturally impossible
}
For search engines (OpenSearch, Meilisearch, Elasticsearch), every query includes a tenant filter:
async function searchProducts(tenantId: string, channelId: string, query: string) {
  return opensearch.search({
    index: 'products',
    body: {
      query: {
        bool: {
          must: [{ match: { searchText: query } }],
          filter: [
            { term: { tenant_id: tenantId } },
            { term: { channel_ids: channelId } },
          ],
        },
      },
    },
  });
}
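The "no query without tenant scope" rule can be applied to search the same way it is applied to the repository: build the query body in one place that refuses to run without a tenant. The sketch below is an assumption-laden illustration. `buildTenantSearchBody` is a hypothetical helper, the `tenant_id` and `channel_ids` field names are taken from the example above, and the function only constructs a request body without calling any search client.

```typescript
type QueryClause = Record<string, unknown>;

interface SearchBody {
  query: { bool: { must: QueryClause[]; filter: QueryClause[] } };
}

// Wraps caller-supplied match clauses in a bool query whose filter section
// always contains the tenant term. Callers cannot omit or override it.
function buildTenantSearchBody(
  tenantId: string,
  must: QueryClause[],
  channelId?: string,
): SearchBody {
  if (!tenantId) {
    throw new Error('tenantId is required for every search');
  }
  const filter: QueryClause[] = [{ term: { tenant_id: tenantId } }];
  if (channelId) {
    filter.push({ term: { channel_ids: channelId } });
  }
  return { query: { bool: { must, filter } } };
}
```

Individual search functions then pass only their match clauses, so a new search endpoint cannot forget the tenant filter.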
Layer 3: Policy Enforcement
Beyond data access, tenants have different permissions for what actions they can perform. The policy layer checks these before any action executes.
interface TenantPolicy {
  tenant_id: string;
  rules: PolicyRule[];
}

interface PolicyRule {
  action: string; // "create_order", "export_data", "use_ai_agent"
  effect: "allow" | "deny";
  conditions?: {
    max_value?: number;
    allowed_channels?: string[];
    require_approval?: boolean;
  };
}

// Policy check before any action
async function checkPolicy(
  tenantId: string,
  action: string,
  params: { value?: number },
): Promise<boolean> {
  const policy: TenantPolicy | null = await policyStore.getForTenant(tenantId);
  // No policy on record = denied (default-deny)
  if (!policy) return false;

  const matchingRules = policy.rules.filter(r => r.action === action);

  // Deny rules take precedence
  if (matchingRules.some(r => r.effect === 'deny')) return false;

  // No matching allow rule = denied (default-deny)
  const allowRule = matchingRules.find(r => r.effect === 'allow');
  if (!allowRule) return false;

  // Check conditions. Compare against undefined so that a max_value of 0
  // is still enforced. allowed_channels and require_approval are not
  // evaluated in this excerpt.
  const maxValue = allowRule.conditions?.max_value;
  if (maxValue !== undefined && params.value !== undefined && params.value > maxValue) {
    return false;
  }
  return true;
}
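To see the evaluation order in isolation (deny first, then default-deny, then conditions), the rule matching can be written as a pure function over a rule list. This is a sketch under assumptions: `evaluateRules`, `Decision`, and `ActionParams` are hypothetical names, and the handling of `allowed_channels` and `require_approval` shown here is one plausible reading of those fields, not behavior confirmed by the code above.

```typescript
type Effect = 'allow' | 'deny';

interface RuleConditions {
  max_value?: number;
  allowed_channels?: string[];
  require_approval?: boolean;
}

interface Rule {
  action: string;
  effect: Effect;
  conditions?: RuleConditions;
}

type Decision = 'allow' | 'deny' | 'needs_approval';

interface ActionParams {
  value?: number;
  channelId?: string;
}

function evaluateRules(rules: Rule[], action: string, params: ActionParams): Decision {
  const matching = rules.filter((r) => r.action === action);

  // 1. Any matching deny rule wins, regardless of allow rules.
  if (matching.some((r) => r.effect === 'deny')) return 'deny';

  // 2. Default-deny: no allow rule means the action is not permitted.
  const allow = matching.find((r) => r.effect === 'allow');
  if (!allow) return 'deny';

  // 3. Conditions narrow the allow rule (assumed semantics).
  const c = allow.conditions ?? {};
  if (c.max_value !== undefined && (params.value ?? 0) > c.max_value) return 'deny';
  if (c.allowed_channels) {
    if (!params.channelId || !c.allowed_channels.includes(params.channelId)) return 'deny';
  }
  return c.require_approval ? 'needs_approval' : 'allow';
}
```

Keeping the evaluation pure makes the precedence rules easy to unit test without a policy store.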
Hierarchical vs Flat Scoping
Flat Scoping (Simple SaaS)
Every resource belongs to exactly one tenant. No sub-levels.
Tenant A
├── Users (owner, admin, member, viewer)
├── Products
├── Orders
└── Settings
Tenant B
├── Users
├── Products
├── Orders
└── Settings
Good for: SaaS products where each customer is an isolated workspace. Think project management tools, CRM systems, internal dashboards.
A typical flat-scoped system gets by with four role tiers:
| Role | Permissions |
|---|---|
| Owner | Full access, billing, delete tenant |
| Admin | Manage users, settings, all data |
| Member | Create and edit own data, view shared data |
| Viewer | Read-only access to shared data |
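If the four tiers are treated as strictly ordered, with each tier including everything below it (an assumption; the table does not state this explicitly), a coarse role check reduces to a rank comparison. `ROLE_RANK` and `hasAtLeastRole` are hypothetical names used only for this sketch.

```typescript
// Higher number = broader access. The ordering is an assumption.
const ROLE_RANK = { viewer: 0, member: 1, admin: 2, owner: 3 } as const;

type Role = keyof typeof ROLE_RANK;

// True when `actual` is the required tier or any tier above it.
function hasAtLeastRole(actual: Role, required: Role): boolean {
  return ROLE_RANK[actual] >= ROLE_RANK[required];
}
```

A rank check like this only covers broad gates such as "admins and above"; capabilities that do not follow the ordering need explicit permissions.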
Hierarchical Scoping (Enterprise Platforms)
Resources are scoped through multiple levels. Each level narrows visibility.
Tenant (merchant organization)
└── Channel (storefront, API, widget)
    └── Supplier Binding (which suppliers visible per channel)
        └── Customer (end user within a channel)
            └── Session (browser/device session)
                └── Agent Thread (single AI conversation)
Good for: marketplace platforms, multi-brand commerce, enterprise systems where one organization has multiple storefronts, sales channels, or subsidiary brands.
Each level adds a filter. A product visible in Channel A is not necessarily visible in Channel B, even within the same tenant. A customer in Channel A has no access to Channel B's data. An agent thread in one session cannot see conversations from another session.
// Hierarchical context passed through every operation
interface TenantContext {
  tenantId: string; // organization
  channelId: string; // storefront or sales channel
  customerId?: string; // end user (if authenticated)
  sessionId?: string; // browser session
  threadId?: string; // AI conversation thread
}

// Query scoped by tenant and by the channel's supplier bindings
async function getVisibleProducts(ctx: TenantContext) {
  const channel = await channelStore.get(ctx.tenantId, ctx.channelId);
  // An unknown channel (or one owned by another tenant) exposes nothing
  if (!channel) return [];
  return productStore.findMany({
    tenant_id: ctx.tenantId,
    supplier_id: { $in: channel.visibleSupplierIds },
    status: 'active',
  });
}
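Session data and agent memory can follow the same hierarchy by deriving their storage keys from the full context, so two sessions can never share a key. The sketch below is illustrative only: `scopeKey` and the slash-separated key layout are assumptions, not the format used in the systems described here.

```typescript
interface ScopeContext {
  tenantId: string;
  channelId: string;
  customerId?: string;
  sessionId?: string;
  threadId?: string;
}

// Builds a storage key that includes every hierarchy level present in ctx.
// IDs are URI-encoded so an ID containing "/" or ":" cannot forge a level.
function scopeKey(ctx: ScopeContext): string {
  const parts = [
    `tenant:${encodeURIComponent(ctx.tenantId)}`,
    `channel:${encodeURIComponent(ctx.channelId)}`,
  ];
  if (ctx.customerId) parts.push(`customer:${encodeURIComponent(ctx.customerId)}`);
  if (ctx.sessionId) parts.push(`session:${encodeURIComponent(ctx.sessionId)}`);
  if (ctx.threadId) {
    // A thread only exists inside a session, so refuse to build a key without one.
    if (!ctx.sessionId) throw new Error('threadId requires sessionId');
    parts.push(`thread:${encodeURIComponent(ctx.threadId)}`);
  }
  return parts.join('/');
}
```

Reading agent memory with the key for one session cannot return another session's thread, because the session ID is part of the key itself.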
Hybrid Scoping
Some systems need flat scoping for most resources but hierarchical scoping for specific features. For example, a Vendure commerce installation might use flat scoping (one tenant per store) but channel-based scoping for product visibility and pricing.
// Vendure's channel scoping (method excerpt from a service class)
async findByCustomer(ctx: RequestContext, customerId: number) {
  return this.connection.getRepository(ctx, CiWishlist).find({
    where: {
      customerId,
      channelId: ctx.channelId, // Channel scoping within tenant
    },
  });
}
For more on how we implement channel scoping in Vendure, see our Vendure production architecture guide.
Auth and RBAC Patterns
JWT-Based Tenant Scoping
The JWT carries the tenant identity, and every API request includes it.
// JWT payload structure
interface TenantJwtPayload {
  sub: string; // user ID
  tenant_id: string; // which tenant
  channel_id?: string; // which channel (if applicable)
  role: string; // owner | admin | member | viewer
  permissions: string[]; // fine-grained permissions
  iat: number;
  exp: number;
}
The tenant_id in the JWT is the primary scoping mechanism. It's set at login time and cannot be changed without re-authenticating. The backend extracts it from every request and uses it to scope all data access.
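Because everything downstream trusts these claims, it can help to validate the decoded payload's shape before building the tenant context. The following is a sketch, assuming the signature has already been verified elsewhere; `parseTenantClaims`, `TenantClaims`, and `KNOWN_ROLES` are hypothetical names, and the role list mirrors the four roles used in this article.

```typescript
interface TenantClaims {
  sub: string;
  tenant_id: string;
  channel_id?: string;
  role: string;
  permissions: string[];
  iat: number;
  exp: number;
}

const KNOWN_ROLES = ['owner', 'admin', 'member', 'viewer'];

// Validates an already signature-verified payload. Throws on any claim that
// is missing or malformed, so a token without tenant_id never yields a context.
function parseTenantClaims(payload: unknown, nowSeconds: number): TenantClaims {
  if (typeof payload !== 'object' || payload === null) {
    throw new Error('payload must be an object');
  }
  const { sub, tenant_id, channel_id, role, permissions, iat, exp } =
    payload as Record<string, unknown>;

  if (typeof sub !== 'string' || sub === '') throw new Error('missing sub');
  if (typeof tenant_id !== 'string' || tenant_id === '') throw new Error('missing tenant_id');
  if (channel_id !== undefined && typeof channel_id !== 'string') throw new Error('bad channel_id');
  if (typeof role !== 'string' || KNOWN_ROLES.indexOf(role) === -1) throw new Error('unknown role');
  if (!Array.isArray(permissions) || !permissions.every((x: unknown) => typeof x === 'string')) {
    throw new Error('permissions must be a string array');
  }
  if (typeof iat !== 'number' || typeof exp !== 'number') throw new Error('missing iat/exp');
  if (exp <= nowSeconds) throw new Error('token expired');

  return { sub, tenant_id, channel_id, role, permissions: permissions as string[], iat, exp };
}
```

Passing the current time as a parameter keeps the function deterministic in tests; in a request handler it would typically be `Math.floor(Date.now() / 1000)`.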
API Key Authentication
For machine-to-machine communication (ERP integrations, external services, webhooks), API keys map to tenants:
async function apiKeyMiddleware(req: Request, res: Response, next: NextFunction) {
  // Header values can be string | string[] | undefined; only a single string is a usable key
  const apiKey = req.headers['x-api-key'];
  if (typeof apiKey !== 'string' || apiKey === '') return next(); // fall through to JWT auth

  const keyRecord = await apiKeyStore.findByKey(apiKey);
  if (!keyRecord || keyRecord.status !== 'active') {
    return res.status(401).json({ error: 'Invalid API key' });
  }

  req.tenantContext = {
    tenantId: keyRecord.tenantId,
    channelId: keyRecord.channelId,
    role: keyRecord.role,
    permissions: keyRecord.permissions,
  };
  next();
}
API keys are tenant-scoped. Key rotation doesn't change the tenant binding. Rate limits and permission scopes are per-key, not per-tenant.
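The claim that rotation does not change the tenant binding can be illustrated with a small in-memory store. This is a sketch under stated assumptions: `InMemoryApiKeyStore` and its methods are hypothetical, and raw keys are used as map keys only for brevity (a production store would more likely persist a hash of the key).

```typescript
interface KeyBinding {
  tenantId: string;
  channelId?: string;
  permissions: string[];
  status: 'active' | 'revoked';
}

class InMemoryApiKeyStore {
  private keys = new Map<string, KeyBinding>();

  issue(key: string, binding: Omit<KeyBinding, 'status'>): void {
    this.keys.set(key, { ...binding, status: 'active' });
  }

  // Rotation revokes the old key and copies its binding to the new one.
  // There is no parameter through which the tenant could change.
  rotate(oldKey: string, newKey: string): void {
    const current = this.keys.get(oldKey);
    if (!current || current.status !== 'active') {
      throw new Error('unknown or revoked key');
    }
    this.keys.set(oldKey, { ...current, status: 'revoked' });
    this.keys.set(newKey, { ...current, status: 'active' });
  }

  // Returns the binding for an active key, or null for unknown/revoked keys.
  resolve(key: string): KeyBinding | null {
    const record = this.keys.get(key);
    return record && record.status === 'active' ? record : null;
  }
}
```

Because permissions live on the key record, two keys for the same tenant can carry different scopes, which matches the per-key model described above.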
Permission Granularity
Roles define broad access levels. Permissions define specific capabilities:
const PERMISSIONS = {
  // Product management
  PRODUCT_READ: 'product:read',
  PRODUCT_CREATE: 'product:create',
  PRODUCT_UPDATE: 'product:update',
  PRODUCT_DELETE: 'product:delete',
  // Order management
  ORDER_READ: 'order:read',
  ORDER_CREATE: 'order:create',
  ORDER_CANCEL: 'order:cancel',
  ORDER_REFUND: 'order:refund',
  // AI features
  AI_AGENT_USE: 'ai:agent:use',
  AI_AGENT_CONFIGURE: 'ai:agent:configure',
  AI_EXPORT: 'ai:export',
  // Admin
  USER_MANAGE: 'user:manage',
  SETTINGS_MANAGE: 'settings:manage',
  BILLING_MANAGE: 'billing:manage',
};

// Role-permission mapping
const ROLE_PERMISSIONS = {
  owner: Object.values(PERMISSIONS),
  admin: Object.values(PERMISSIONS).filter(p => p !== 'billing:manage'),
  member: ['product:read', 'product:create', 'product:update', 'order:read', 'order:create', 'ai:agent:use'],
  viewer: ['product:read', 'order:read'],
};
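A mapping like this is typically consumed through a single guard that handlers call before doing any work. The sketch below uses an abbreviated permission set so that it stays self-contained; `assertPermission` and `ForbiddenError` are hypothetical names, and the role contents mirror the mapping above only for the four permissions shown.

```typescript
// Abbreviated permission set for illustration only.
const PERMISSIONS = {
  PRODUCT_READ: 'product:read',
  PRODUCT_UPDATE: 'product:update',
  ORDER_REFUND: 'order:refund',
  BILLING_MANAGE: 'billing:manage',
} as const;

type Permission = (typeof PERMISSIONS)[keyof typeof PERMISSIONS];
type Role = 'owner' | 'admin' | 'member' | 'viewer';

const ALL_PERMISSIONS: Permission[] = [
  PERMISSIONS.PRODUCT_READ,
  PERMISSIONS.PRODUCT_UPDATE,
  PERMISSIONS.ORDER_REFUND,
  PERMISSIONS.BILLING_MANAGE,
];

const ROLE_PERMISSIONS: Record<Role, Permission[]> = {
  owner: ALL_PERMISSIONS,
  admin: ALL_PERMISSIONS.filter((p) => p !== PERMISSIONS.BILLING_MANAGE),
  member: [PERMISSIONS.PRODUCT_READ, PERMISSIONS.PRODUCT_UPDATE],
  viewer: [PERMISSIONS.PRODUCT_READ],
};

class ForbiddenError extends Error {}

// Throws unless the role's permission list contains the needed permission.
function assertPermission(role: Role, needed: Permission): void {
  if (ROLE_PERMISSIONS[role].indexOf(needed) === -1) {
    throw new ForbiddenError(`${role} lacks ${needed}`);
  }
}
```

Typing `needed` as `Permission` rather than `string` means a misspelled permission name fails at compile time instead of silently denying access.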
What Happens When Scoping Fails
The most instructive way to understand why three-layer enforcement matters is to see what breaks when each layer is missing.
| Missing Layer | What Happens | Real Example |
|---|---|---|
| No API middleware | Any request with a valid JWT can access any tenant's data by guessing tenant IDs | Competitor scrapes your customer's product catalog |
| No query filters | A developer forgets the WHERE clause in a new endpoint, cross-tenant data leaks | Admin dashboard shows all customers across all tenants |
| No policy enforcement | A tenant with "starter" plan accesses "enterprise" features through direct API calls | Free-tier tenant exports unlimited data, bypassing plan limits |
The scariest version: all three layers work for reads but not for writes. Tenant A can't see tenant B's data, but a bug in the update endpoint lets tenant A overwrite tenant B's product prices. We caught this in testing. In production, it would have been catastrophic.
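A write path stays scoped when the tenant is part of the lookup that precedes the mutation, rather than a check added afterwards. The in-memory sketch below illustrates the idea; `ProductStore` and its composite `tenantId#id` key are hypothetical and only stand in for a real database's tenant-scoped key or WHERE clause.

```typescript
interface Product {
  tenantId: string;
  id: string;
  name: string;
  price: number;
}

class ProductStore {
  private rows = new Map<string, Product>();

  private key(tenantId: string, id: string): string {
    return `${tenantId}#${id}`;
  }

  insert(product: Product): void {
    this.rows.set(this.key(product.tenantId, product.id), product);
  }

  get(tenantId: string, id: string): Product | null {
    return this.rows.get(this.key(tenantId, id)) ?? null;
  }

  // The caller's tenant is part of the key, so another tenant's row is
  // indistinguishable from a missing row and can never be overwritten.
  update(
    tenantId: string,
    id: string,
    patch: Partial<Pick<Product, 'name' | 'price'>>,
  ): Product | null {
    const existing = this.rows.get(this.key(tenantId, id));
    if (!existing) return null;
    const updated: Product = { ...existing, ...patch };
    this.rows.set(this.key(tenantId, id), updated);
    return updated;
  }
}
```

A `null` result from `update` maps naturally to a 404 response, the same outcome a caller would see for an ID that does not exist at all.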
Testing Multi-Tenant Isolation
Testing multi-tenancy requires specific test patterns that most test suites don't cover.
The Cross-Tenant Access Test
For every endpoint, test that tenant A cannot access tenant B's data:
describe('Tenant isolation', () => {
  it('tenant A cannot read tenant B products', async () => {
    // Create product as tenant B
    const product = await createProduct(tenantB.token, { name: 'Secret Product' });

    // Try to read it as tenant A
    const response = await api.get(`/products/${product.id}`, {
      headers: { Authorization: `Bearer ${tenantA.token}` },
    });
    expect(response.status).toBe(404); // Not 403, not 200 with empty data
  });

  it('tenant A cannot update tenant B products', async () => {
    const product = await createProduct(tenantB.token, { name: 'Original' });
    const response = await api.patch(`/products/${product.id}`, {
      headers: { Authorization: `Bearer ${tenantA.token}` },
      body: { name: 'Hacked' },
    });
    expect(response.status).toBe(404);

    // Verify product wasn't modified
    const check = await api.get(`/products/${product.id}`, {
      headers: { Authorization: `Bearer ${tenantB.token}` },
    });
    expect(check.body.name).toBe('Original');
  });
});
Return 404 (not 403) for cross-tenant access attempts. A 403 confirms the resource exists, which is itself an information leak.
The Search Isolation Test
Verify that search results are tenant-scoped:
it('search results are tenant-scoped', async () => {
  await createProduct(tenantA.token, { name: 'Widget Alpha' });
  await createProduct(tenantB.token, { name: 'Widget Beta' });
  await waitForSearchIndex();

  const results = await api.get('/search?q=Widget', {
    headers: { Authorization: `Bearer ${tenantA.token}` },
  });
  expect(results.body.items).toHaveLength(1);
  expect(results.body.items[0].name).toBe('Widget Alpha');
  // Widget Beta must not appear
});
The Bulk Operation Test
Verify that bulk operations (exports, imports, batch updates) respect tenant boundaries:
it('export only includes own tenant data', async () => {
  const exportResult = await api.post('/export/products', {
    headers: { Authorization: `Bearer ${tenantA.token}` },
  });
  const exportedIds = exportResult.body.products.map(p => p.id);

  const tenantBProducts = await getAllProducts(tenantB.token);
  const tenantBIds = tenantBProducts.map(p => p.id);

  // No tenant B product IDs in tenant A's export
  const overlap = exportedIds.filter(id => tenantBIds.includes(id));
  expect(overlap).toHaveLength(0);
});
For how we approach testing more broadly, see our software engineering guide.
Shared vs Dedicated Infrastructure
| Model | When to Use | Trade-offs |
|---|---|---|
| Shared everything (one DB, one schema) | SaaS with many small tenants | Cheapest. Hardest to isolate. Noisy neighbor risk. |
| Shared DB, separate schemas | Medium tenants needing logical isolation | Good isolation. More migration complexity. |
| Dedicated databases | Enterprise tenants with compliance requirements | Best isolation. Most expensive. Hardest to manage. |
| Dedicated clusters | Regulated industries (healthcare, finance) | Complete isolation. Highest cost. Separate ops per tenant. |
For most SaaS applications, shared everything with application-level enforcement (the three-layer model above) is the right choice. It's simpler to operate, cheaper to run, and if the enforcement layers are correct, just as secure.
Dedicated infrastructure becomes necessary when tenants have regulatory requirements that mandate physical isolation (e.g., data must reside in a specific country), or when one tenant's workload is so large that it affects others (the noisy neighbor problem).
Common Pitfalls
- Treating multi-tenancy as a database problem. The database model (shared vs dedicated) is the least important decision. The enforcement model (three layers) is the most important.
- Enforcing isolation by convention. "Developers should always include tenant_id in queries" is not a strategy. Make it architecturally impossible to query without tenant context.
- Returning 403 for cross-tenant access. Return 404. A 403 confirms the resource exists, which leaks information across tenants.
- No cross-tenant access tests. Every endpoint needs a test that verifies tenant A cannot access tenant B's data. For both reads and writes.
- Forgetting search index isolation. Database queries might be scoped, but if the search index isn't filtered by tenant, search results leak across tenants.
- Shared caches without tenant key prefix. If your Redis cache key is product:123, it's shared across tenants. Use tenant_abc:product:123.
- Background jobs without tenant context. A scheduled job that processes "all pending orders" without tenant scoping processes every tenant's orders in one batch. Pass tenant context through job payloads.
- No rate limiting per tenant. One tenant's bulk import shouldn't degrade performance for all other tenants. Rate limit per tenant, not just per IP.
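The last three pitfalls (cache keys, background jobs, rate limits) share one fix: make the tenant ID part of the key. The sketch below shows one possible shape for each, assuming hypothetical helpers `tenantCacheKey`, `groupByTenant`, and `TenantRateLimiter`; the fixed-window limiter is a deliberately simple stand-in for whatever rate-limiting algorithm a production system uses.

```typescript
// Cache keys: the tenant prefix is added in one place, never by hand.
function tenantCacheKey(tenantId: string, resource: string, id: string): string {
  if (!tenantId) throw new Error('tenantId is required');
  return `tenant_${tenantId}:${resource}:${id}`;
}

// Background jobs: split a mixed batch so each job carries one tenant's items.
function groupByTenant<T extends { tenantId: string }>(items: T[]): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  for (const item of items) {
    const list = groups.get(item.tenantId) ?? [];
    list.push(item);
    groups.set(item.tenantId, list);
  }
  return groups;
}

// Rate limiting: a fixed-window counter kept per tenant, not per IP.
class TenantRateLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the tenant may proceed at time nowMs.
  allow(tenantId: string, nowMs: number): boolean {
    const current = this.windows.get(tenantId);
    if (!current || nowMs - current.start >= this.windowMs) {
      this.windows.set(tenantId, { start: nowMs, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

With this layout, one tenant exhausting its window has no effect on another tenant's counter, and a job handler receives a batch that belongs to exactly one tenant.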
Key Takeaways
- Three-layer enforcement is non-negotiable. API middleware, query filters, and policy enforcement. All three. Every request, every query, every action.
- Hierarchical scoping handles enterprise complexity. Flat scoping works for simple SaaS. Enterprise platforms need tenant, channel, customer, session, and thread-level scoping.
- Make cross-tenant access architecturally impossible. Don't rely on developers remembering WHERE clauses. Use repository base classes that require tenant_id as a mandatory parameter.
- Test isolation explicitly. Every endpoint needs a cross-tenant access test. For reads, writes, searches, exports, and bulk operations.
- Return 404, not 403. Cross-tenant access attempts should look like the resource doesn't exist, not like the user doesn't have permission.
- Shared infrastructure with application-level isolation works for most cases. Dedicated infrastructure is for regulatory requirements or noisy neighbor problems, not for security.
We apply these patterns across our AI services, custom software projects, and commerce platforms. If you're designing a multi-tenant system, talk to our team or request a quote. You can also explore our solutions page and our trust and compliance approach for how we handle tenant isolation guarantees.