Product Data Systems That Actually Work: From ERP to Channel
How to design product data pipelines. ERP to PIM to search to commerce to export. Classification systems, variant management, asset pipelines, and multi-channel distribution.
The Product Data Problem Nobody Talks About
Product data looks simple in a spreadsheet. Name, description, price, image. Then reality hits. You have 50,000 products with 200 attributes each, in 12 languages, from 3 source systems, distributed to 5 output channels, and the data quality is inconsistent across all of them.
The real problem is not storing product data. Any database does that. The real problem is the pipeline: how data flows from source systems through enrichment to output channels, with validation at every stage and different formats for every destination.
We've designed product data systems for B2B manufacturers with complex classification hierarchies, multi-locale content, ERP integration, and multi-channel output (web, search, marketplace, export). This article covers the architecture patterns. For PIM-specific implementation, see our PIM implementation guide. For Pimcore workflow patterns, see our Pimcore workflow guide.
The Pipeline: Source to Channel
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   SOURCES    │     │    MASTER    │     │   CHANNELS   │
│              │     │              │     │              │
│ ERP/SAP      │────▶│ PIM System   │────▶│ Website      │
│ Suppliers    │     │ (Pimcore,    │     │ Search Index │
│ Spreadsheets │     │  Akeneo)     │     │ Marketplace  │
│ Manual entry │     │              │     │ Print/PDF    │
│              │     │ Enrich       │     │ Partner API  │
│              │     │ Validate     │     │ Data Feed    │
│              │     │ Approve      │     │              │
└──────────────┘     └──────────────┘     └──────────────┘
Source Layer
Products enter the system from multiple sources. Each source has different data quality, different formats, and different update frequencies.
| Source | Data Quality | Format | Frequency |
|---|---|---|---|
| ERP (SAP, Oracle) | Structured, reliable | API / flat file | Daily batch or real-time |
| Supplier feeds | Variable, often messy | CSV, XML, JSON | Weekly or on-demand |
| Manual entry | High quality, low volume | PIM admin UI | Continuous |
| Spreadsheet imports | Error-prone | XLSX, CSV | Ad-hoc |
The import layer must handle: field mapping (the supplier calls it "item_name", SAP calls it "MAKTX"), data transformation (price in EUR to cents), deduplication (same product from two sources), and conflict resolution (which source wins for which field).
This is exactly the problem our Vendure Data Hub Plugin solves with 9 extractors, 61 transform operators, and configurable field mapping.
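To make the field mapping and unit conversion steps concrete, here is a minimal sketch. It is not the plugin's API: the `CanonicalProduct` shape, the mapping tables, and the source field names (`item_no`, `price_eur`, `PRICE_EUR`) are assumptions chosen for illustration.

```typescript
// Canonical shape used inside the pipeline (hypothetical, for illustration).
interface CanonicalProduct {
  sku: string;
  name: string;
  priceCents: number;
}

type RawRecord = Record<string, string>;

// Per-source mapping: canonical field -> field name in that source.
type FieldMapping = Record<keyof CanonicalProduct, string>;

// Assumed field names; a real supplier feed or ERP export will differ.
const SUPPLIER_MAPPING: FieldMapping = { sku: 'item_no', name: 'item_name', priceCents: 'price_eur' };
const ERP_MAPPING: FieldMapping = { sku: 'MATNR', name: 'MAKTX', priceCents: 'PRICE_EUR' };

// Convert a decimal EUR string such as "129.00" to integer cents
// without going through floating point.
function eurToCents(amount: string): number {
  const match = /^(\d+)(?:\.(\d{1,2}))?$/.exec(amount.trim());
  if (!match) throw new Error(`Unparseable price: "${amount}"`);
  const fraction = ((match[2] || '') + '00').slice(0, 2);
  return Number(match[1]) * 100 + Number(fraction);
}

// Read every canonical field through the mapping and normalize it.
function mapRecord(raw: RawRecord, mapping: FieldMapping): CanonicalProduct {
  const read = (field: keyof CanonicalProduct): string => {
    const value = raw[mapping[field]];
    if (value === undefined) throw new Error(`Missing source field "${mapping[field]}"`);
    return value.trim();
  };
  return { sku: read('sku'), name: read('name'), priceCents: eurToCents(read('priceCents')) };
}

const fromSupplier = mapRecord(
  { item_no: 'RSP-40-BLK', item_name: ' Running Shoe Pro ', price_eur: '129.00' },
  SUPPLIER_MAPPING,
);
const fromErp = mapRecord(
  { MATNR: 'RSP-40-BLK', MAKTX: 'RUNNING SHOE PRO', PRICE_EUR: '129' },
  ERP_MAPPING,
);
```

Both records end up with the same `sku` and the same `priceCents`, which is what makes deduplication across sources possible in the next step. Only the `name` differs, and that is a conflict-resolution question rather than a mapping one.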
Master Layer (PIM)
The PIM is the single source of truth for product data. Every field has one authoritative value. Every change is versioned. Every product goes through an editorial workflow before publication.
Key PIM responsibilities:
- Data enrichment: Add descriptions, images, translations that don't come from the ERP
- Classification: Assign products to hierarchical categories with typed attributes
- Validation: Ensure required fields are filled before publication
- Workflow: Editorial review and approval before data reaches output channels
- Versioning: Track every change, support draft editing without affecting live data
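The workflow and versioning responsibilities above can be sketched as a small state machine with an append-only revision history. The state names and allowed transitions here are assumptions for illustration; real PIM systems let you configure their own.

```typescript
// Hypothetical editorial states; adjust to your own workflow.
type State = 'draft' | 'in_review' | 'approved' | 'published';

// Allowed transitions. There is deliberately no shortcut from draft to published.
const TRANSITIONS: Record<State, State[]> = {
  draft: ['in_review'],
  in_review: ['draft', 'approved'],
  approved: ['draft', 'published'],
  published: ['draft'],
};

interface Revision {
  version: number;
  state: State;
  changedBy: string;
}

// Returns a new history with one more revision; earlier revisions are never mutated.
function transition(revisions: Revision[], to: State, user: string): Revision[] {
  if (revisions.length === 0) throw new Error('History must contain an initial revision');
  const current = revisions[revisions.length - 1];
  if (TRANSITIONS[current.state].indexOf(to) === -1) {
    throw new Error(`Transition ${current.state} -> ${to} is not allowed`);
  }
  return [...revisions, { version: current.version + 1, state: to, changedBy: user }];
}

let revisions: Revision[] = [{ version: 1, state: 'draft', changedBy: 'erp-import' }];
revisions = transition(revisions, 'in_review', 'editor');
revisions = transition(revisions, 'approved', 'reviewer');
revisions = transition(revisions, 'published', 'reviewer');
```

Because every transition appends a revision, the history doubles as an audit trail of who moved the product to which state.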
Channel Layer
Each output channel needs product data in a different format:
| Channel | Format | Content | Update Frequency |
|---|---|---|---|
| Website | JSON API | Full product with images, descriptions, variants | Real-time (event-driven) |
| Search index | Denormalized document | Searchable fields, facets, prices | Near real-time |
| Marketplace | Feed XML/CSV | Platform-specific fields, categories | Scheduled (hourly/daily) |
| Print/PDF | Structured data | Selected fields, high-res images | On-demand |
| Partner API | REST/GraphQL | Contracted fields only | Real-time |
| Data feed | CSV/XML | Google Merchant, Meta Catalog | Scheduled |
The same product data, transformed differently for each channel. The PIM stores the master data. The distribution layer transforms and delivers.
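As an example of a channel-specific transformation, here is a sketch of the "denormalized document" a search index typically wants: one flat record per product, with variant data rolled up into facets. The `PimProduct`, `PimVariant`, and `SearchDocument` shapes are assumptions for illustration, not a specific search engine's schema.

```typescript
interface PimVariant {
  sku: string;
  size: string;
  color: string;
  price: number; // cents
  stock: number;
  active: boolean;
}

interface PimProduct {
  id: string;
  name: string;
  brand: string;
  categories: string[];
  variants: PimVariant[];
}

interface SearchDocument {
  id: string;
  name: string;
  brand: string;
  categories: string[];
  skus: string[];
  colors: string[]; // facet
  sizes: string[];  // facet
  minPriceCents: number | null;
  inStock: boolean;
}

// Keep the first occurrence of each value, preserving order.
function unique(values: string[]): string[] {
  return values.filter((value, index) => values.indexOf(value) === index);
}

// Flatten a product and its active variants into one searchable record.
function toSearchDocument(product: PimProduct): SearchDocument {
  const active = product.variants.filter(v => v.active);
  const prices = active.map(v => v.price);
  return {
    id: product.id,
    name: product.name,
    brand: product.brand,
    categories: product.categories,
    skus: active.map(v => v.sku),
    colors: unique(active.map(v => v.color)),
    sizes: unique(active.map(v => v.size)),
    minPriceCents: prices.length > 0 ? Math.min(...prices) : null,
    inStock: active.some(v => v.stock > 0),
  };
}

const searchDoc = toSearchDocument({
  id: 'p-1',
  name: 'Running Shoe Pro',
  brand: 'ExampleBrand',
  categories: ['Shoes', 'Running'],
  variants: [
    { sku: 'RSP-40-BLK', size: '40', color: 'Black', price: 12900, stock: 15, active: true },
    { sku: 'RSP-40-WHT', size: '40', color: 'White', price: 12900, stock: 8, active: true },
    { sku: 'RSP-42-BLK', size: '42', color: 'Black', price: 12900, stock: 0, active: true },
    { sku: 'RSP-44-RED', size: '44', color: 'Red', price: 9900, stock: 3, active: false },
  ],
});
```

The inactive variant contributes nothing to the facets or the minimum price, so the index never advertises something the shop cannot sell.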
Classification Systems
Products have attributes. A shoe has size, color, material, and sole type. A faucet has flow rate, connection type, finish, and certification. A server has CPU, RAM, storage, and rack units.
Flat Attributes vs Classification Store
Flat attributes (columns on the product table) work for simple catalogs with uniform products. Every product has the same fields.
Classification stores (dynamic key-value with groups) work for diverse catalogs where different product types have different attributes.
Classification Store:
├── Group: Dimensions
│   ├── Key: width (float, unit: mm)
│   ├── Key: height (float, unit: mm)
│   └── Key: depth (float, unit: mm)
├── Group: Technical
│   ├── Key: flow_rate (float, unit: l/min)
│   ├── Key: pressure (float, unit: bar)
│   └── Key: connection_type (select: 3/8", 1/2", 3/4")
└── Group: Certifications
    ├── Key: ce_mark (boolean)
    ├── Key: tuv (boolean)
    └── Key: energy_label (select: A-G)
Classification stores scale to thousands of attributes without schema changes. New attributes are configuration, not migration. But they're harder to query (key-value lookups instead of column access) and harder to validate (schema is dynamic).
In Pimcore, the Classification Store provides this capability out of the box with localized values, group-based organization, and admin UI integration.
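Because the schema is dynamic, validation has to be driven by the key definitions themselves. The following is a generic sketch of that idea, not Pimcore's API: the `KeyDefinition` union and the `validateAttributes` function are hypothetical names.

```typescript
// A key is typed by configuration, not by a database column.
type KeyDefinition =
  | { type: 'float'; unit: string }
  | { type: 'boolean' }
  | { type: 'select'; options: string[] };

// group name -> key name -> definition
type ClassificationSchema = Record<string, Record<string, KeyDefinition>>;
type AttributeValues = Record<string, Record<string, unknown>>;

const FAUCET_SCHEMA: ClassificationSchema = {
  Dimensions: {
    width: { type: 'float', unit: 'mm' },
    height: { type: 'float', unit: 'mm' },
  },
  Technical: {
    flow_rate: { type: 'float', unit: 'l/min' },
    connection_type: { type: 'select', options: ['3/8"', '1/2"', '3/4"'] },
  },
  Certifications: {
    ce_mark: { type: 'boolean' },
  },
};

// Check every supplied value against its key definition; return readable errors.
function validateAttributes(values: AttributeValues, schema: ClassificationSchema): string[] {
  const errors: string[] = [];
  for (const group of Object.keys(values)) {
    for (const key of Object.keys(values[group])) {
      const definition = schema[group] ? schema[group][key] : undefined;
      const value = values[group][key];
      if (!definition) {
        errors.push(`${group}.${key}: unknown key`);
      } else if (definition.type === 'float') {
        if (typeof value !== 'number' || !isFinite(value)) {
          errors.push(`${group}.${key}: expected a number in ${definition.unit}`);
        }
      } else if (definition.type === 'boolean') {
        if (typeof value !== 'boolean') errors.push(`${group}.${key}: expected true or false`);
      } else if (typeof value !== 'string' || definition.options.indexOf(value) === -1) {
        errors.push(`${group}.${key}: expected one of ${definition.options.join(', ')}`);
      }
    }
  }
  return errors;
}

const attributeErrors = validateAttributes(
  {
    Dimensions: { width: 180 },
    Technical: { flow_rate: 'fast', connection_type: '1"' },
    Certifications: { ce_mark: true, rohs: true },
  },
  FAUCET_SCHEMA,
);
```

Adding a new attribute means adding one entry to the schema object. No table changes, which is the "configuration, not migration" property described above, and the cost is that type checks move from the database to code like this.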
Variant Management
Products with variants (size, color, configuration) are the source of most data complexity.
Variant Architecture
Product (parent)
├── name: "Running Shoe Pro"
├── description: "..."
├── brand: "..."
├── images: [hero.jpg, detail.jpg]
│
├── Variant: Size 40, Black
│   ├── sku: "RSP-40-BLK"
│   ├── price: 12900 (cents)
│   ├── stock: 15
│   └── ean: "4012345678901"
│
├── Variant: Size 40, White
│   ├── sku: "RSP-40-WHT"
│   ├── price: 12900
│   ├── stock: 8
│   └── ean: "4012345678902"
│
└── Variant: Size 42, Black
    ├── sku: "RSP-42-BLK"
    ├── price: 12900
    ├── stock: 0 (out of stock)
    └── ean: "4012345678903"
Inherited vs variant-specific data:
- Inherited (from parent): name, description, brand, category, shared images
- Variant-specific: SKU, price, stock, EAN, size, color, variant-specific images
The inheritance model reduces duplication. Change the product description once, it updates for all variants. But variant-specific overrides must be possible (different price per size, different image per color).
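One way to express this inheritance is to resolve each variant at read time: take the parent's shared fields, then apply any variant-level overrides. This is a sketch under assumed type names (`ParentProduct`, `VariantRecord`), and the sample values are placeholders.

```typescript
interface ParentProduct {
  name: string;
  description: string;
  brand: string;
  images: string[];
}

interface VariantRecord {
  sku: string;
  price: number; // cents
  stock: number;
  // Only the shared fields this variant deliberately replaces.
  overrides?: Partial<ParentProduct>;
}

interface ResolvedVariant extends ParentProduct {
  sku: string;
  price: number;
  stock: number;
}

// Variant override wins where present; otherwise the parent value is inherited.
function resolveVariant(parent: ParentProduct, variant: VariantRecord): ResolvedVariant {
  const overrides = variant.overrides ?? {};
  return {
    name: overrides.name ?? parent.name,
    description: overrides.description ?? parent.description,
    brand: overrides.brand ?? parent.brand,
    images: overrides.images ?? parent.images,
    sku: variant.sku,
    price: variant.price,
    stock: variant.stock,
  };
}

const shoe: ParentProduct = {
  name: 'Running Shoe Pro',
  description: 'Sample description',
  brand: 'ExampleBrand',
  images: ['hero.jpg', 'detail.jpg'],
};
const blackVariant: VariantRecord = { sku: 'RSP-40-BLK', price: 12900, stock: 15 };
const whiteVariant: VariantRecord = {
  sku: 'RSP-40-WHT',
  price: 12900,
  stock: 8,
  overrides: { images: ['hero-white.jpg'] },
};

const resolvedBlack = resolveVariant(shoe, blackVariant); // inherits the parent images
const resolvedWhite = resolveVariant(shoe, whiteVariant); // uses its own image
```

Since nothing is copied onto the variant records, editing `shoe.description` once changes what every subsequent `resolveVariant` call returns.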
The Combinatorial Explosion
A product with 5 sizes and 8 colors has 40 variants. Add 3 materials and you have 120. Add 2 widths and you have 240. Most of these combinations don't actually exist as real products.
Solutions:
- Explicit variants only: Create only the combinations that exist. No auto-generation.
- Availability matrix: Define which combinations are valid. Auto-generate only valid ones.
- Virtual variants: Calculate at query time from attribute sets. Don't store individual records.
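The availability-matrix option can be sketched as a cartesian product filtered by rules. The rule set and the attribute values below are made up for illustration.

```typescript
type Combination = Record<string, string>;
type AvailabilityRule = (combination: Combination) => boolean;

// Every possible combination of the given attribute axes.
function cartesian(axes: Record<string, string[]>): Combination[] {
  let result: Combination[] = [{}];
  for (const axis of Object.keys(axes)) {
    const next: Combination[] = [];
    for (const partial of result) {
      for (const value of axes[axis]) {
        next.push({ ...partial, [axis]: value });
      }
    }
    result = next;
  }
  return result;
}

// Keep only combinations that satisfy every rule.
function validVariants(axes: Record<string, string[]>, rules: AvailabilityRule[]): Combination[] {
  return cartesian(axes).filter(combination => rules.every(rule => rule(combination)));
}

const shoeAxes = {
  size: ['38', '40', '42', '44', '46'],
  color: ['Black', 'White', 'Red', 'Blue', 'Green', 'Grey', 'Navy', 'Gold'],
};

// Hypothetical business rules.
const shoeRules: AvailabilityRule[] = [
  c => !(c.color === 'White' && c.size === '46'), // White is not made in size 46
  c => c.color !== 'Gold' || c.size === '40',     // Gold is a size-40-only edition
];

const allCombinations = cartesian(shoeAxes);
const realVariants = validVariants(shoeAxes, shoeRules);
```

Here `allCombinations` holds the 40 theoretical variants from the 5 x 8 example, while `realVariants` holds the 35 that the rules allow. Only those would be created as records.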
Asset Pipeline
Product images, technical drawings, PDFs, and 3D models need their own pipeline.
Upload/Import
│
├── Format validation (type, size, resolution)
├── Metadata extraction (EXIF, dimensions)
├── Thumbnail generation (multiple sizes)
├── CDN distribution
└── Association to product/variant
Asset Challenges
| Challenge | Solution |
|---|---|
| Multiple image sizes needed | Generate thumbnails on upload or on-demand |
| Images from suppliers are low quality | Minimum resolution requirements, rejection workflow |
| Asset-product association | Naming convention (SKU-based) or manual assignment |
| Storage costs at scale | Cloud storage (S3, Azure Blob) with CDN |
| Localized assets | Different images per locale (lifestyle vs technical) |
For Pimcore specifically, we documented a performance bug where asset dimension lookups trigger remote storage I/O on every page render. The fix is described in our Pimcore upgrade guide.
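The SKU-based naming convention from the table above can be automated with a small parser. This sketch assumes a hypothetical convention of `<SKU>_<position>.<extension>` (for example `RSP-40-BLK_01.jpg`); anything that does not match, or matches an unknown SKU, falls back to manual assignment.

```typescript
interface AssetMatch {
  sku: string;
  position: number;
  extension: string;
}

// Assumed convention: SKU, underscore, 1-2 digit position, known extension.
const ASSET_NAME = /^([A-Z0-9-]+)_(\d{1,2})\.(jpe?g|png|webp|pdf)$/i;

function parseAssetName(fileName: string): AssetMatch | null {
  const match = ASSET_NAME.exec(fileName);
  if (!match) return null;
  return { sku: match[1].toUpperCase(), position: Number(match[2]), extension: match[3].toLowerCase() };
}

interface AssociationResult {
  matched: Record<string, string[]>; // sku -> file names ordered by position
  unmatched: string[];               // needs manual assignment
}

function associateAssets(fileNames: string[], knownSkus: string[]): AssociationResult {
  const parsed: { file: string; match: AssetMatch }[] = [];
  const unmatched: string[] = [];
  for (const file of fileNames) {
    const match = parseAssetName(file);
    if (match && knownSkus.indexOf(match.sku) !== -1) parsed.push({ file, match });
    else unmatched.push(file);
  }
  parsed.sort((a, b) => a.match.position - b.match.position);
  const matched: Record<string, string[]> = {};
  for (const { file, match } of parsed) {
    (matched[match.sku] = matched[match.sku] || []).push(file);
  }
  return { matched, unmatched };
}

const association = associateAssets(
  ['RSP-40-BLK_02.jpg', 'RSP-40-BLK_01.jpg', 'IMG_4432.jpg', 'XYZ-1_01.png'],
  ['RSP-40-BLK', 'RSP-40-WHT'],
);
```

In this run the two `RSP-40-BLK` files are attached in position order, while the camera-default file name and the asset for an unknown SKU end up in `unmatched`.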
Multi-Channel Distribution
Event-Driven Distribution
When a product is published in the PIM, events trigger distribution to each channel:
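// PIM publishes product -> events trigger channel updates
eventBus.on('product.published', async (product) => {
  // allSettled keeps one failing channel from blocking the others
  const results = await Promise.allSettled([
    searchIndexer.index(product),            // Update search
    feedGenerator.queue(product),            // Queue for marketplace feeds
    cacheInvalidator.invalidate(product.id), // Invalidate website cache
    partnerApi.notify(product.id),           // Notify partner systems
  ]);
  // allSettled never rejects, so failures must be surfaced explicitly
  for (const result of results) {
    if (result.status === 'rejected') console.error('Channel update failed:', result.reason);
  }
});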
Each channel transformer converts the master data to channel-specific format:
// Website: full data with SEO fields
function toWebProduct(product: PimProduct): WebProduct {
  return {
    slug: product.slug,
    name: product.name,
    description: product.description,
    seoTitle: product.seoTitle || product.name,
    seoDescription: product.seoDescription || truncate(product.description, 160),
    images: product.images.map(img => ({
      url: cdn.getUrl(img, 'large'),
      alt: img.alt || product.name,
    })),
    variants: product.variants.filter(v => v.active),
    // ... full product data
  };
}

// Marketplace feed: platform-specific format
function toGoogleMerchantItem(product: PimProduct, variant: PimVariant): MerchantItem {
  return {
    id: variant.sku,
    title: `${product.name} - ${variant.size} ${variant.color}`,
    description: stripHtml(product.description),
    link: `https://shop.example.com/p/${product.slug}`,
    image_link: cdn.getUrl(product.images[0], 'large'),
    price: `${(variant.price / 100).toFixed(2)} EUR`,
    availability: variant.stock > 0 ? 'in_stock' : 'out_of_stock',
    brand: product.brand,
    gtin: variant.ean,
    condition: 'new',
  };
}
Data Quality
Automated Validation
interface ValidationRule {
  field: string;
  check: (value: any, product: Product) => boolean;
  message: string;
  severity: 'error' | 'warning';
}

// isValidEan is assumed to be a check-digit helper defined elsewhere
const VALIDATION_RULES: ValidationRule[] = [
  { field: 'name', check: (v) => !!v && v.length >= 3, message: 'Name must be at least 3 characters', severity: 'error' },
  { field: 'description', check: (v) => !!v && v.length >= 50, message: 'Description should be at least 50 characters', severity: 'warning' },
  { field: 'images', check: (v) => !!v && v.length > 0, message: 'At least one image required', severity: 'error' },
  { field: 'price', check: (v) => typeof v === 'number' && v > 0, message: 'Price must be positive', severity: 'error' },
  { field: 'ean', check: (v) => !v || isValidEan(v), message: 'Invalid EAN format', severity: 'error' }, // EAN is optional
  { field: 'categories', check: (v) => !!v && v.length > 0, message: 'At least one category required', severity: 'error' },
];
Run validation before publication. Block publication on errors. Show warnings but allow publication. Track data quality scores per product, per category, per supplier.
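The publication gate described above can be sketched as a small runner that separates errors from warnings and derives a score. The EAN-13 check digit calculation follows the standard GS1 scheme; the `validate` function, the `Report` shape, and the score formula (share of rules passed) are assumptions for illustration.

```typescript
// EAN-13: weights 1,3,1,3,... over the first 12 digits; the 13th is the check digit.
function isValidEan13(ean: string): boolean {
  if (!/^\d{13}$/.test(ean)) return false;
  let sum = 0;
  for (let i = 0; i < 12; i++) {
    sum += Number(ean[i]) * (i % 2 === 0 ? 1 : 3);
  }
  return (10 - (sum % 10)) % 10 === Number(ean[12]);
}

type Severity = 'error' | 'warning';

interface Rule<T> {
  field: keyof T;
  check: (value: unknown, product: T) => boolean;
  message: string;
  severity: Severity;
}

interface Report {
  errors: string[];
  warnings: string[];
  canPublish: boolean; // false as soon as one error-level rule fails
  score: number;       // assumed metric: passed rules / total rules
}

function validate<T>(product: T, rules: Rule<T>[]): Report {
  const errors: string[] = [];
  const warnings: string[] = [];
  for (const rule of rules) {
    if (rule.check(product[rule.field], product)) continue;
    (rule.severity === 'error' ? errors : warnings).push(`${String(rule.field)}: ${rule.message}`);
  }
  const passed = rules.length - errors.length - warnings.length;
  return {
    errors,
    warnings,
    canPublish: errors.length === 0,
    score: rules.length === 0 ? 1 : passed / rules.length,
  };
}

interface DraftProduct {
  name: string;
  description: string;
  ean: string;
}

const qualityRules: Rule<DraftProduct>[] = [
  { field: 'name', check: v => typeof v === 'string' && v.length >= 3, message: 'Name must be at least 3 characters', severity: 'error' },
  { field: 'description', check: v => typeof v === 'string' && v.length >= 50, message: 'Description should be at least 50 characters', severity: 'warning' },
  { field: 'ean', check: v => typeof v === 'string' && isValidEan13(v), message: 'Invalid EAN-13', severity: 'error' },
];

const qualityReport = validate(
  { name: 'Running Shoe Pro', description: 'Lightweight running shoe.', ean: '4012345678901' },
  qualityRules,
);
```

The sample product passes both error-level rules and fails only the description warning, so it may be published while its score reflects the gap. Averaging `score` per category or per supplier gives the quality tracking mentioned above.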
Common Pitfalls
- No single source of truth. If the same product data lives in the ERP, the PIM, and the commerce system with no clear master, conflicts are inevitable.
- ERP as the PIM. ERPs store operational data (SKU, price, stock). They don't handle rich content (descriptions, images, translations). Don't try to make them.
- No field ownership. Without clear rules about which source owns which field, imports overwrite manual enrichment. See our Pimcore workflow guide for the field ownership pattern.
- Same format for all channels. Each channel needs different data. A Google Merchant feed and a website API have different fields, formats, and update frequencies.
- No validation before publication. Products go live with missing images, empty descriptions, or invalid EANs. Automated validation prevents this.
- Ignoring variant complexity. Auto-generating all possible combinations creates thousands of phantom variants. Only create real, available combinations.
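To illustrate the field ownership pitfall, here is one possible shape for an ownership table that an import step consults before writing. It is a generic sketch, not the pattern from the workflow guide; the `FIELD_OWNER` table and `applyUpdate` function are hypothetical.

```typescript
type Source = 'erp' | 'pim' | 'supplier';
type ProductRecord = Record<string, string | number>;

// Each field has exactly one owning source (assumed assignment).
const FIELD_OWNER: Record<string, Source> = {
  sku: 'erp',
  price: 'erp',
  stock: 'erp',
  description: 'pim',
  seoTitle: 'pim',
};

interface MergeResult {
  product: ProductRecord;
  applied: string[];
  skipped: string[]; // fields the source tried to write but does not own
}

// Apply only the fields that the given source owns; leave the rest untouched.
function applyUpdate(current: ProductRecord, incoming: ProductRecord, source: Source): MergeResult {
  const product: ProductRecord = { ...current };
  const applied: string[] = [];
  const skipped: string[] = [];
  for (const field of Object.keys(incoming)) {
    if (FIELD_OWNER[field] === source) {
      product[field] = incoming[field];
      applied.push(field);
    } else {
      skipped.push(field);
    }
  }
  return { product, applied, skipped };
}

const erpImport = applyUpdate(
  { sku: 'RSP-40-BLK', price: 12900, description: 'Hand-written marketing copy' },
  { price: 13900, description: 'RUNNING SHOE PRO BLK 40' },
  'erp',
);
```

The nightly ERP import updates the price it owns, and the hand-written description survives because `description` belongs to the PIM. Logging `skipped` is a cheap way to spot sources that keep trying to write fields they do not own.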
Key Takeaways
- The pipeline is the architecture. Source to master to channel. Each stage has different responsibilities, different data formats, and different quality requirements.
- The PIM is the single source of truth. Not the ERP, not the commerce system, not the spreadsheet. One authoritative master with versioning and workflow.
- Classification stores handle attribute diversity. When different product types have different attributes, dynamic key-value with groups scales better than fixed columns.
- Each channel gets its own transformation. Website, search, marketplace, print, and partner API all need different formats from the same master data.
- Data quality is a system, not a step. Automated validation, quality scores, blocking rules on publication. Continuous, not one-time.
We design product data systems as part of our ecommerce and data engineering practice. If you need help with PIM architecture or product data pipelines, talk to our team or request a quote.