Concurrency and Data Integrity: The Patterns That Saved Our Production
Production concurrency patterns for enterprise systems. Field ownership, optimistic locking, cooperative leases, idempotency stores, version management, and transaction governance layers.
The Race Condition You Don't See Until Production
Race conditions are invisible in development. Your local machine runs one process. Your test suite runs sequentially. Everything works. Then you deploy to production with 4 web pods and 3 worker pods, and two processes modify the same record at the same time. One overwrites the other's changes. Data is silently corrupted. Nobody notices until a customer complains.
We've fixed race conditions across multiple enterprise systems: CMS platforms with 20 editors and background workers, commerce platforms with concurrent order processing, and AI systems with parallel agent workflows. The patterns in this article are what survived.
For broader context, see our system architecture guide and event-driven architecture guide. For CMS-specific concurrency, our Pimcore workflow guide covers those patterns in depth.
Field Ownership: Who Can Write What
The root cause of most enterprise concurrency bugs: multiple writers modifying the same record through the same save path without coordination.
A CMS has editors writing product descriptions and workers generating thumbnails. Both call save(). Both persist the full object. Last write wins: if the editor loaded the object before the worker saved, the editor's save overwrites the worker's thumbnail with a stale value; if the worker loaded before the editor saved, the worker's save overwrites the editor's description.
The fix: assign every field to an owner.
```yaml
field_ownership:
  Product:
    editor_owned:
      - name
      - description
      - images
    system_owned:
      - thumbnail
      - searchIndex
      - checksum
      - lastSyncTimestamp
    shared:
      - categories
      - price
      - availability
```
| Domain | Owner | Mutation Path | Conflict Strategy |
|---|---|---|---|
| Editor-owned | Admin users | Standard save | No conflict (only editors write) |
| System-owned | Workers/integrations | Transaction layer with locks | Retry on conflict |
| Shared | Both | Transaction layer with conflict resolution | Configurable: retry, skip, merge |
Editor-owned fields go through the standard save path. System-owned fields go through a transaction layer with locks and version checks. Shared fields use explicit conflict resolution strategies.
Without field ownership, you're relying on luck. With it, the system enforces who can write what and resolves conflicts deterministically.
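The registry above can be enforced in code at the save path. A minimal sketch (the registry shape and the `assertCanWrite` helper are illustrative, not from a specific framework):

```typescript
type FieldOwner = 'editor' | 'system' | 'shared';

// Illustrative ownership registry mirroring the YAML above
const productOwnership: Record<string, FieldOwner> = {
  name: 'editor',
  description: 'editor',
  images: 'editor',
  thumbnail: 'system',
  searchIndex: 'system',
  checksum: 'system',
  lastSyncTimestamp: 'system',
  categories: 'shared',
  price: 'shared',
  availability: 'shared',
};

// Reject writes that cross the ownership boundary; shared fields are
// writable from both paths but should go through conflict resolution.
function assertCanWrite(field: string, writer: 'editor' | 'system'): void {
  const owner = productOwnership[field];
  if (owner === undefined) throw new Error(`Unknown field: ${field}`);
  if (owner !== 'shared' && owner !== writer) {
    throw new Error(`${writer} may not write ${owner}-owned field "${field}"`);
  }
}
```

A save path that calls `assertCanWrite` for every dirty field turns ownership violations into loud failures instead of silent overwrites.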
Optimistic Locking with Version Checks
Optimistic locking assumes conflicts are rare. Instead of locking before modification, it checks whether the record changed between load and save.
```typescript
async function updateWithOptimisticLock(
  productId: string,
  updateFn: (product: Product) => void,
  maxRetries: number = 3,
): Promise<Product> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const product = await productRepo.findById(productId, { force: true });
    const versionBefore = product.versionCount;

    updateFn(product);

    // Check the version hasn't changed between load and save
    const currentVersion = await productRepo.getVersionCount(productId);
    if (currentVersion !== versionBefore) {
      if (attempt === maxRetries - 1) {
        throw new ConcurrencyError(
          `Product ${productId} was modified concurrently ` +
          `(version ${versionBefore} -> ${currentVersion})`,
        );
      }
      continue; // Retry with fresh data
    }

    await product.save();
    return product;
  }
  // Unreachable: the final attempt either returns or throws above
  throw new ConcurrencyError(`Retries exhausted for product ${productId}`);
}
```
The version check is not atomic in this example. For true atomicity, use database-level support:
```sql
-- PostgreSQL: atomic version check + update
UPDATE products
SET name = $1, version_count = version_count + 1
WHERE id = $2 AND version_count = $3;
-- If 0 rows affected: concurrent modification detected
```
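Application code then only inspects the affected row count. A sketch assuming a pg-style client whose query result exposes `rowCount` (the function name and wiring are illustrative):

```typescript
// Sketch: run the conditional UPDATE and interpret its row count.
// `query` is assumed to be a pg-style client method returning { rowCount }.
async function updateNameIfUnchanged(
  query: (sql: string, params: unknown[]) => Promise<{ rowCount: number }>,
  id: string,
  name: string,
  expectedVersion: number,
): Promise<boolean> {
  const result = await query(
    `UPDATE products
        SET name = $1, version_count = version_count + 1
      WHERE id = $2 AND version_count = $3`,
    [name, id, expectedVersion],
  );
  return result.rowCount === 1; // false => concurrent modification detected
}
```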
```typescript
// TypeORM: @VersionColumn for optimistic locking
@Entity()
class Product {
  @VersionColumn()
  version!: number;

  // TypeORM increments the version on save; conflicts surface as
  // OptimisticLockVersionMismatchError when loading with
  // lock: { mode: 'optimistic', version } and the versions diverge
}
```
Optimistic locking works well when conflicts are rare (< 5% of writes). For high-contention scenarios (multiple workers processing the same record), use cooperative locks instead.
Cooperative Lease-Based Locks
When multiple workers compete for the same resource, a cooperative lock serializes access. Unlike distributed mutexes, cooperative locks use lease-based semantics: the lock expires after a TTL, preventing deadlocks from crashed workers.
```typescript
import * as crypto from 'crypto';

// Redis: atomic SET with EX + NX and token-based ownership (ioredis-style client)
class RedisLockProvider {
  async acquire(key: string, ttlSeconds: number = 30): Promise<Lock | null> {
    const token = crypto.randomBytes(16).toString('hex');
    const acquired = await this.redis.set(key, token, 'EX', ttlSeconds, 'NX');
    return acquired ? new Lock(key, token, ttlSeconds) : null;
  }

  async release(lock: Lock): Promise<void> {
    // Atomic check-and-delete via Lua script: only the token owner may delete
    const script = `
      if redis.call('get', KEYS[1]) == ARGV[1] then
        return redis.call('del', KEYS[1])
      else
        return 0
      end
    `;
    await this.redis.eval(script, 1, lock.key, lock.token);
  }

  async extend(lock: Lock, ttlSeconds: number): Promise<boolean> {
    // Atomic check-and-expire: only the token owner may extend the lease
    const script = `
      if redis.call('get', KEYS[1]) == ARGV[1] then
        return redis.call('expire', KEYS[1], ARGV[2])
      else
        return 0
      end
    `;
    return !!(await this.redis.eval(script, 1, lock.key, lock.token, ttlSeconds));
  }
}
```
Lock Heartbeat
Lock TTL alone is not enough. If an operation runs longer than expected, the lock expires and another worker acquires it. Now two workers run concurrently.
```typescript
async function executeWithLock(
  key: string,
  operation: (signal: AbortSignal) => Promise<void>,
) {
  const lock = await lockProvider.acquire(key, 30);
  if (!lock) throw new LockError(`Could not acquire lock: ${key}`);

  // Heartbeat: extend the lease before it expires (every 15s, 50% of the 30s TTL).
  // A throw inside a timer callback never reaches the caller, so on a failed
  // extension we abort the operation via an AbortSignal instead.
  const controller = new AbortController();
  const heartbeat = setInterval(async () => {
    const extended = await lockProvider.extend(lock, 30);
    if (!extended) {
      clearInterval(heartbeat);
      // Lock was stolen: signal the operation to stop
      controller.abort(new LockError(`Lock lost during operation: ${key}`));
    }
  }, 15000);

  try {
    await operation(controller.signal);
  } finally {
    clearInterval(heartbeat);
    await lockProvider.release(lock);
  }
}
```
Lock Scope Hierarchy
Different operations need different lock granularity:
| Scope | Key Pattern | Use Case |
|---|---|---|
| Element | lock:product:123 | Full object save |
| Field group | lock:product:123:generatedAssets | Partial update (thumbnails only) |
| Operation | lock:product:123:thumbnail:en | Single specific operation |
- Same product + same field group -> wait/retry (serial)
- Same product + different field groups -> run in parallel (safe)
- Different products -> always parallel
Narrower scopes allow more parallelism. A thumbnail worker and a search indexer can process the same product simultaneously if they lock different field groups.
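The hierarchy above can be made concrete with a key builder and a conflict check. A hedged sketch (the names are ours; we treat a lock as conflicting with any lock above or below it in the hierarchy, so an element-level lock blocks every field-group lock beneath it):

```typescript
// Build a hierarchical lock key following the scope table above
function lockKey(
  type: string,
  id: string,
  fieldGroup?: string,
  operation?: string,
): string {
  return ['lock', type, id, fieldGroup, operation]
    .filter((part): part is string => part !== undefined)
    .join(':');
}

// Two lock keys conflict when they are equal or one is a prefix of the
// other: a broader lock blocks every narrower lock beneath it.
function conflicts(a: string, b: string): boolean {
  return a === b || a.startsWith(b + ':') || b.startsWith(a + ':');
}
```

With this check, a thumbnail worker holding `lock:product:123:generatedAssets` and a search indexer holding `lock:product:123:searchIndex` proceed in parallel, while a full save on `lock:product:123` waits for both.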
Idempotency Stores
Network retries, message redelivery, and workflow replays cause the same operation to execute multiple times. Without idempotency, you get duplicate records, double charges, or repeated emails.
```typescript
interface IdempotencyEntry {
  key: string;          // Business-meaningful key
  scope: string;        // Operation category
  status: string;       // PENDING | COMPLETED | FAILED
  requestHash: string;  // SHA-256 of normalized input
  resultId?: string;    // ID of created resource
  expiresAt: Date;      // TTL for cleanup
  createdAt: Date;
}
```
```typescript
class IdempotencyStore {
  async checkAndAcquire(
    key: string,
    scope: string,
    requestHash: string,
  ): Promise<IdempotencyResult> {
    try {
      await this.db.insert('idempotency_keys', {
        key, scope, requestHash,
        status: 'PENDING',
        expires_at: new Date(Date.now() + 24 * 60 * 60 * 1000),
      });
      return { acquired: true };
    } catch (error) {
      if (isDuplicateKeyError(error)) {
        const existing = await this.db.findOne({ key, scope });
        if (existing.status === 'COMPLETED') {
          return { acquired: false, cached: true, resultId: existing.resultId };
        }
        if (existing.status === 'PENDING' && isStale(existing)) {
          // Stale PENDING: previous attempt crashed. Reclaim with a guarded
          // update so only one of several competing workers wins the retry.
          const reclaimed = await this.db.update(
            { key, scope, createdAt: existing.createdAt },
            { status: 'PENDING', createdAt: new Date() },
          );
          if (reclaimed.affectedRows === 1) return { acquired: true };
        }
        return { acquired: false, inProgress: true };
      }
      throw error;
    }
  }

  async complete(key: string, scope: string, resultId: string): Promise<void> {
    await this.db.update({ key, scope }, { status: 'COMPLETED', resultId });
  }
}
```
Key Design
The idempotency key must be business-meaningful:
| Operation | Key Components | Example |
|---|---|---|
| Add to wishlist | customerId + productVariantId + wishlistId | wish:cust_123:var_456:wl_789 |
| Submit review | customerId + productId | review:cust_123:prod_456 |
| Send notification | recipient + category + entityRef + dayBucket | notify:sara@beispiel.de:stock:var_456:2026-04-20 |
| Redeem loyalty points | customerId + orderId + points | redeem:cust_123:ord_789:500 |
| ERP import record | sourceRecordId + importBatchId | import:erp_456:batch_20260420 |
Two distinct idempotency models:
- API idempotency: For user-initiated mutations. Client provides key or it's generated from input hash. Cached response replayed on duplicate.
- Job idempotency: For background processing. Dedupe key in job payload. Uses DB constraints, completion markers, or business-key checks.
Never mix them. They have different lifecycles and cleanup strategies.
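As a sketch, a key builder plus a normalized request hash might look like this (the helper names and the sort-based normalization are illustrative):

```typescript
import { createHash } from 'crypto';

// Join business components into a key, as in the table above
function idempotencyKey(parts: Array<string | number>): string {
  return parts.join(':');
}

// Sort object keys so semantically identical payloads hash identically,
// regardless of property order in the incoming request
function requestHash(payload: Record<string, unknown>): string {
  const normalized = JSON.stringify(
    Object.keys(payload).sort().map((k) => [k, payload[k]]),
  );
  return createHash('sha256').update(normalized).digest('hex');
}
```

The hash catches the subtle case where the same key arrives with a *different* payload: a key match with a hash mismatch is a client bug, not a safe replay.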
The Version Explosion Problem
In systems where background workers call save(), every save creates a version. With 6 workers processing every product change, a single editor save generates 6+ unnecessary versions. Over months, products accumulate thousands of versions that consume storage, slow down the version history UI, and make actual editorial changes impossible to find.
The solution: scoped version guards that suppress version creation during system operations while preserving versions for editor saves.
```typescript
// Reference-counted version guard
class ScopedVersionGuard {
  private static refCount = 0;

  suppress(): void {
    ScopedVersionGuard.refCount++;
    if (ScopedVersionGuard.refCount === 1) {
      VersionManager.disable();
    }
  }

  restore(): void {
    ScopedVersionGuard.refCount--;
    if (ScopedVersionGuard.refCount === 0) {
      VersionManager.enable();
    }
  }
}
```

```typescript
// Usage: nested operations work correctly
const outerGuard = new ScopedVersionGuard();
outerGuard.suppress();
try {
  product.setThumbnail(asset);
  product.save(); // No version created

  const innerGuard = new ScopedVersionGuard();
  innerGuard.suppress();
  try {
    product.setChecksum(hash);
    product.save(); // Still no version
  } finally {
    innerGuard.restore(); // refCount goes from 2 to 1, still suppressed
  }
} finally {
  outerGuard.restore(); // refCount goes from 1 to 0, versions re-enabled
}
```
The result: editor saves create versions (audit trail preserved). Worker saves create operation log entries (observability without version bloat).
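Calling suppress() and restore() by hand risks leaking the refcount when an operation throws mid-way. A small wrapper keeps the pair balanced; this is our own sketch, not part of the article's implementation (the `VersionGuard` interface and `withVersionsSuppressed` name are illustrative):

```typescript
interface VersionGuard {
  suppress(): void;
  restore(): void;
}

// Run fn with versions suppressed; restore() is guaranteed via finally,
// so the refcount stays balanced even when fn throws
async function withVersionsSuppressed<T>(
  guard: VersionGuard,
  fn: () => Promise<T>,
): Promise<T> {
  guard.suppress();
  try {
    return await fn();
  } finally {
    guard.restore();
  }
}
```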
For how we implement this in Pimcore specifically, see our Pimcore workflow guide which covers PimTx's version guard in detail.
Common Pitfalls
- **No field ownership.** Without it, every writer competes for every field through the same save path. Define who owns what before writing the first line of concurrent code.
- **Optimistic locking without retry.** Detecting the conflict is not enough. The operation must retry with fresh data. Set a max retry count and handle exhaustion gracefully.
- **Global lock disable.** A global `disable()` flag breaks when multiple operations run concurrently. Use reference-counted, scoped guards.
- **Idempotency keys without business meaning.** A random UUID as idempotency key prevents nothing. The key must encode the business operation: who, what, when.
- **No heartbeat on long-running locks.** If the operation outlasts the TTL, the lock expires and another worker enters. Extend the lock at 50% of TTL.
- **Ignoring stale PENDING entries.** If a worker crashes while holding a PENDING idempotency key, the operation is permanently blocked. Detect and recover stale entries.
- **Locking at the wrong granularity.** Element-level locks serialize everything. Field-group locks allow parallelism where it's safe. Choose the narrowest safe scope.
- **No version management strategy.** Every `save()` creating a version is the default. In systems with workers, this creates thousands of useless versions. Suppress versions for system operations.
Key Takeaways
- **Field ownership prevents the most common race condition.** Define which fields belong to editors, which to workers, and which are shared. The ownership registry determines locking strategy and conflict resolution.
- **Optimistic locking for low-contention writes.** Check the version count before saving. Retry on conflict. Use database-level support (TypeORM `@VersionColumn`, PostgreSQL conditional update) for atomicity.
- **Cooperative leases for high-contention resources.** Redis SET with EX + NX and token-based ownership. Heartbeat at 50% of TTL. Lua scripts for atomic operations. Never use distributed mutexes without a TTL.
- **Idempotency keys must be business-meaningful.** Encode the operation semantics (who + what + when) into the key. Separate API idempotency from job idempotency.
- **Version guards preserve audit trails without explosion.** Reference-counted suppression for system operations. Nesting works correctly. Editor saves still create versions.
We apply these patterns across our custom software projects and data engineering pipelines. If you're dealing with concurrency issues in production, talk to our team or request a quote.