Technical Guide

Product Data Systems That Actually Work: From ERP to Channel

How to design product data pipelines that run from ERP to PIM to search, commerce, and export: classification systems, variant management, asset pipelines, and multi-channel distribution.

March 21, 2026 · 14 min read · Oronts Engineering Team

The Product Data Problem Nobody Talks About

Product data looks simple in a spreadsheet. Name, description, price, image. Then reality hits. You have 50,000 products with 200 attributes each, in 12 languages, from 3 source systems, distributed to 5 output channels, and the data quality is inconsistent across all of them.

The real problem is not storing product data. Any database does that. The real problem is the pipeline: how data flows from source systems through enrichment to output channels, with validation at every stage and different formats for every destination.

We've designed product data systems for B2B manufacturers with complex classification hierarchies, multi-locale content, ERP integration, and multi-channel output (web, search, marketplace, export). This article covers the architecture patterns. For PIM-specific implementation, see our PIM implementation guide. For Pimcore workflow patterns, see our Pimcore workflow guide.

The Pipeline: Source to Channel

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   SOURCES    │     │   MASTER     │     │   CHANNELS   │
│              │     │              │     │              │
│  ERP/SAP     │────▶│  PIM System  │────▶│  Website     │
│  Suppliers   │     │  (Pimcore,   │     │  Search Index│
│  Spreadsheets│     │   Akeneo)    │     │  Marketplace │
│  Manual entry│     │              │     │  Print/PDF   │
│              │     │  Enrich      │     │  Partner API │
│              │     │  Validate    │     │  Data Feed   │
│              │     │  Approve     │     │              │
└──────────────┘     └──────────────┘     └──────────────┘

Source Layer

Products enter the system from multiple sources. Each source has different data quality, different formats, and different update frequencies.

Source	Data Quality	Format	Frequency
ERP (SAP, Oracle)	Structured, reliable	API / flat file	Daily batch or real-time
Supplier feeds	Variable, often messy	CSV, XML, JSON	Weekly or on-demand
Manual entry	High quality, low volume	PIM admin UI	Continuous
Spreadsheet imports	Error-prone	XLSX, CSV	Ad-hoc

The import layer must handle: field mapping (the supplier calls it "item_name", the ERP calls it "MAKTX"), data transformation (a price in EUR converted to integer cents), deduplication (the same product arriving from two sources), and conflict resolution (which source wins for which field).

This is exactly the problem our Vendure Data Hub Plugin solves with 9 extractors, 61 transform operators, and configurable field mapping.
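As a rough illustration of the import steps listed above, independent of any particular plugin or tool, the sketch below maps source columns to master fields, converts EUR to cents, groups rows by SKU, and resolves conflicts with a per-field source priority. The column names other than "item_name", the priority lists, and every type and function name here are assumptions made for the example.

```typescript
// Illustrative sketch only: column names, priorities, and types are assumptions.
type SourceName = "erp" | "supplier" | "manual";
type OwnedField = "name" | "description" | "priceCents";

interface MasterRecord {
    sku: string;
    name?: string;
    description?: string;
    priceCents?: number;
}

interface SourcedRecord {
    source: SourceName;
    record: MasterRecord;
}

// Field mapping: source column -> master field.
const FIELD_MAP: Record<SourceName, Record<string, keyof MasterRecord>> = {
    erp: { MATNR: "sku", MAKTX: "name", PRICE_EUR: "priceCents" },
    supplier: { item_no: "sku", item_name: "name", long_text: "description" },
    manual: { sku: "sku", name: "name", description: "description" },
};

// Conflict resolution: for each field, the first source listed wins.
const FIELD_PRIORITY: Record<OwnedField, SourceName[]> = {
    name: ["manual", "erp", "supplier"],
    description: ["manual", "supplier", "erp"],
    priceCents: ["erp"],
};
const OWNED_FIELDS: OwnedField[] = ["name", "description", "priceCents"];

// Data transformation: EUR amount ("129.90" or 129.9) -> integer cents.
function eurToCents(value: string | number): number {
    return Math.round(Number(value) * 100);
}

function mapRow(source: SourceName, row: Record<string, string | number>): MasterRecord {
    const mapped: Partial<MasterRecord> = {};
    for (const column of Object.keys(row)) {
        const target = FIELD_MAP[source][column];
        const value = row[column];
        if (target === undefined || value === undefined) continue; // unmapped columns are ignored
        if (target === "priceCents") mapped.priceCents = eurToCents(value);
        else mapped[target] = String(value);
    }
    if (!mapped.sku) throw new Error(`Row from ${source} has no SKU`);
    return { ...mapped, sku: mapped.sku };
}

// Copies one field while keeping its type (string vs number) intact.
function copyField<K extends OwnedField>(target: MasterRecord, from: MasterRecord, field: K): void {
    target[field] = from[field];
}

// Deduplication: group rows by SKU, then resolve each field by source priority.
function mergeBySku(rows: SourcedRecord[]): MasterRecord[] {
    const groups: Record<string, SourcedRecord[]> = {};
    for (const row of rows) {
        const group = groups[row.record.sku] || [];
        group.push(row);
        groups[row.record.sku] = group;
    }
    return Object.keys(groups).map((sku) => {
        const merged: MasterRecord = { sku };
        for (const field of OWNED_FIELDS) {
            for (const source of FIELD_PRIORITY[field]) {
                const winner = (groups[sku] || []).filter(
                    (r) => r.source === source && r.record[field] !== undefined,
                )[0];
                if (winner) {
                    copyField(merged, winner.record, field);
                    break;
                }
            }
        }
        return merged;
    });
}

const mergedCatalog = mergeBySku([
    { source: "erp", record: mapRow("erp", { MATNR: "RSP-40-BLK", MAKTX: "RUNSHOE PRO 40 BLK", PRICE_EUR: "129.00" }) },
    { source: "supplier", record: mapRow("supplier", { item_no: "RSP-40-BLK", item_name: "Running Shoe Pro", long_text: "Lightweight running shoe." }) },
]);
// -> [{ sku: "RSP-40-BLK", name: "RUNSHOE PRO 40 BLK", description: "Lightweight running shoe.", priceCents: 12900 }]
```

With these example priorities the ERP name beats the supplier name, while the description comes from the supplier because the ERP delivers none. Keeping the priorities in a table rather than in code makes field ownership a configuration decision.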

Master Layer (PIM)

The PIM is the single source of truth for product data. Every field has one authoritative value. Every change is versioned. Every product goes through an editorial workflow before publication.

Key PIM responsibilities:

Data enrichment: Add descriptions, images, translations that don't come from the ERP
Classification: Assign products to hierarchical categories with typed attributes
Validation: Ensure required fields are filled before publication
Workflow: Editorial review and approval before data reaches output channels
Versioning: Track every change, support draft editing without affecting live data

Channel Layer

Each output channel needs product data in a different format:

Channel	Format	Content	Update Frequency
Website	JSON API	Full product with images, descriptions, variants	Real-time (event-driven)
Search index	Denormalized document	Searchable fields, facets, prices	Near real-time
Marketplace	Feed XML/CSV	Platform-specific fields, categories	Scheduled (hourly/daily)
Print/PDF	Structured data	Selected fields, high-res images	On-demand
Partner API	REST/GraphQL	Contracted fields only	Real-time
Data feed	CSV/XML	Google Merchant, Meta Catalog	Scheduled

The same product data, transformed differently for each channel. The PIM stores the master data. The distribution layer transforms and delivers.

Classification Systems

Products have attributes. A shoe has size, color, material, and sole type. A faucet has flow rate, connection type, finish, and certification. A server has CPU, RAM, storage, and rack units.

Flat Attributes vs Classification Store

Flat attributes (columns on the product table) work for simple catalogs with uniform products. Every product has the same fields.

Classification stores (dynamic key-value with groups) work for diverse catalogs where different product types have different attributes.

Classification Store:
├── Group: Dimensions
│   ├── Key: width (float, unit: mm)
│   ├── Key: height (float, unit: mm)
│   └── Key: depth (float, unit: mm)
├── Group: Technical
│   ├── Key: flow_rate (float, unit: l/min)
│   ├── Key: pressure (float, unit: bar)
│   └── Key: connection_type (select: 3/8", 1/2", 3/4")
└── Group: Certifications
    ├── Key: ce_mark (boolean)
    ├── Key: tuv (boolean)
    └── Key: energy_label (select: A-G)

Classification stores scale to thousands of attributes without schema changes. New attributes are configuration, not migration. But they're harder to query (key-value lookups instead of column access) and harder to validate (schema is dynamic).

In Pimcore, the Classification Store provides this capability out of the box with localized values, group-based organization, and admin UI integration.
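Because the schema is dynamic, validation has to be driven by the key definitions themselves. The sketch below models the groups and keys from the tree above as plain data and checks a product's values against them. It is a generic illustration, not Pimcore's data model or API; the type and function names are hypothetical.

```typescript
// Generic illustration of typed key-value attributes; not Pimcore's API.
type KeyDefinition =
    | { type: "float"; unit: string }
    | { type: "boolean" }
    | { type: "select"; options: string[] };

type AttributeValue = number | boolean | string;
type GroupDefinitions = Record<string, Record<string, KeyDefinition>>;
type GroupValues = Record<string, Record<string, AttributeValue>>;

// Adding an attribute means adding an entry here: configuration, not migration.
const GROUPS: GroupDefinitions = {
    Dimensions: {
        width: { type: "float", unit: "mm" },
        height: { type: "float", unit: "mm" },
        depth: { type: "float", unit: "mm" },
    },
    Technical: {
        flow_rate: { type: "float", unit: "l/min" },
        pressure: { type: "float", unit: "bar" },
        connection_type: { type: "select", options: ['3/8"', '1/2"', '3/4"'] },
    },
    Certifications: {
        ce_mark: { type: "boolean" },
        tuv: { type: "boolean" },
        energy_label: { type: "select", options: ["A", "B", "C", "D", "E", "F", "G"] },
    },
};

// Returns a list of human-readable problems; an empty list means the values are valid.
function validateAttributes(values: GroupValues, groups: GroupDefinitions = GROUPS): string[] {
    const problems: string[] = [];
    for (const groupName of Object.keys(values)) {
        const group = groups[groupName];
        if (!group) {
            problems.push(`${groupName}: unknown group`);
            continue;
        }
        const groupValues = values[groupName] || {};
        for (const key of Object.keys(groupValues)) {
            const def = group[key];
            const value = groupValues[key];
            if (!def) {
                problems.push(`${groupName}.${key}: unknown key`);
            } else if (def.type === "float") {
                if (typeof value !== "number" || !isFinite(value)) {
                    problems.push(`${groupName}.${key}: expected a number in ${def.unit}`);
                }
            } else if (def.type === "boolean") {
                if (typeof value !== "boolean") {
                    problems.push(`${groupName}.${key}: expected true or false`);
                }
            } else if (typeof value !== "string" || def.options.indexOf(value) === -1) {
                problems.push(`${groupName}.${key}: expected one of ${def.options.join(", ")}`);
            }
        }
    }
    return problems;
}

const faucetProblems = validateAttributes({
    Technical: { flow_rate: 12.5, connection_type: '1/2"' },
    Certifications: { ce_mark: true, energy_label: "H" },
});
// faucetProblems contains one entry, for Certifications.energy_label
```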

Variant Management

Products with variants (size, color, configuration) are the source of most data complexity.

Variant Architecture

Product (parent)
├── name: "Running Shoe Pro"
├── description: "..."
├── brand: "..."
├── images: [hero.jpg, detail.jpg]
│
├── Variant: Size 40, Black
│   ├── sku: "RSP-40-BLK"
│   ├── price: 12900  (cents)
│   ├── stock: 15
│   └── ean: "4012345678901"
│
├── Variant: Size 40, White
│   ├── sku: "RSP-40-WHT"
│   ├── price: 12900
│   ├── stock: 8
│   └── ean: "4012345678918"
│
└── Variant: Size 42, Black
    ├── sku: "RSP-42-BLK"
    ├── price: 12900
    ├── stock: 0  (out of stock)
    └── ean: "4012345678925"

Inherited vs variant-specific data:

Inherited (from parent): name, description, brand, category, shared images
Variant-specific: SKU, price, stock, EAN, size, color, variant-specific images

The inheritance model reduces duplication. Change the product description once, it updates for all variants. But variant-specific overrides must be possible (different price per size, different image per color).
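A minimal sketch of this inheritance model, assuming overrides are stored as an optional partial copy of the parent's shared fields (the type names and the `overrides` property are hypothetical):

```typescript
// Hypothetical shapes: shared fields live on the parent, variants may override them.
interface SharedFields {
    name: string;
    description: string;
    brand: string;
    images: string[];
}

interface VariantRecord {
    sku: string;
    priceCents: number;
    stock: number;
    overrides?: Partial<SharedFields>;
}

interface ResolvedVariant extends SharedFields {
    sku: string;
    priceCents: number;
    stock: number;
}

function resolveVariant(parent: SharedFields, variant: VariantRecord): ResolvedVariant {
    return {
        ...parent,            // inherited values
        ...variant.overrides, // variant-specific overrides win over inherited values
        sku: variant.sku,
        priceCents: variant.priceCents,
        stock: variant.stock,
    };
}

const runningShoe: SharedFields = {
    name: "Running Shoe Pro",
    description: "...",
    brand: "...",
    images: ["hero.jpg", "detail.jpg"],
};

const blackVariant = resolveVariant(runningShoe, { sku: "RSP-40-BLK", priceCents: 12900, stock: 15 });
const whiteVariant = resolveVariant(runningShoe, {
    sku: "RSP-40-WHT",
    priceCents: 12900,
    stock: 8,
    overrides: { images: ["hero-white.jpg"] },
});
// blackVariant.images -> ["hero.jpg", "detail.jpg"] (inherited)
// whiteVariant.images -> ["hero-white.jpg"] (overridden)
```

Resolving at read time means a change to the parent description reaches every variant without touching variant records. The trade-off is that every consumer has to go through the resolver instead of reading variant rows directly.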

The Combinatorial Explosion

A product with 5 sizes and 8 colors has 40 variants. Add 3 materials and you have 120. Add 2 widths and you have 240. Most of these combinations don't actually exist as real products.

Solutions:

Explicit variants only: Create only the combinations that exist. No auto-generation.
Availability matrix: Define which combinations are valid. Auto-generate only valid ones.
Virtual variants: Calculate at query time from attribute sets. Don't store individual records.
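To make the availability-matrix option concrete, here is a small sketch that builds the full cross product and then keeps only the combinations that pass a set of rules. The axis values and both rules are invented for illustration, and the matrix is expressed as predicate functions rather than a stored table.

```typescript
type Combination = Record<string, string>;
type AvailabilityRule = (combo: Combination) => boolean;

// Full cross product of all axis values.
function cartesian(axes: Record<string, string[]>): Combination[] {
    let combos: Combination[] = [{}];
    for (const axis of Object.keys(axes)) {
        const values = axes[axis] || [];
        const next: Combination[] = [];
        for (const combo of combos) {
            for (const value of values) {
                next.push({ ...combo, [axis]: value });
            }
        }
        combos = next;
    }
    return combos;
}

// Keep only combinations that every rule allows.
function validCombinations(axes: Record<string, string[]>, rules: AvailabilityRule[]): Combination[] {
    return cartesian(axes).filter((combo) => rules.every((rule) => rule(combo)));
}

// Example axes: 5 sizes x 8 colors x 3 materials x 2 widths.
const AXES: Record<string, string[]> = {
    size: ["40", "41", "42", "43", "44"],
    color: ["black", "white", "brown", "red", "blue", "green", "grey", "yellow"],
    material: ["mesh", "leather", "knit"],
    width: ["regular", "wide"],
};

// Made-up availability rules.
const RULES: AvailabilityRule[] = [
    // Wide fit is only produced from size 42 upwards.
    (c) => c["width"] !== "wide" || Number(c["size"]) >= 42,
    // Leather only comes in black and brown.
    (c) => c["material"] !== "leather" || c["color"] === "black" || c["color"] === "brown",
];

const everyCombination = cartesian(AXES).length;         // 240
const realVariants = validCombinations(AXES, RULES).length; // 144
```

With these two rules, 144 of the 240 possible combinations remain. Real catalogs often shrink far more, which is why generating everything up front tends to fill the PIM with phantom records.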

Asset Pipeline

Product images, technical drawings, PDFs, and 3D models need their own pipeline.

Upload/Import
  │
  ├── Format validation (type, size, resolution)
  ├── Metadata extraction (EXIF, dimensions)
  ├── Thumbnail generation (multiple sizes)
  ├── CDN distribution
  └── Association to product/variant

Asset Challenges

Challenge	Solution
Multiple image sizes needed	Generate thumbnails on upload or on-demand
Images from suppliers are low quality	Minimum resolution requirements, rejection workflow
Asset-product association	Naming convention (SKU-based) or manual assignment
Storage costs at scale	Cloud storage (S3, Azure Blob) with CDN
Localized assets	Different images per locale (lifestyle vs technical)
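The SKU-based naming convention and the minimum-resolution check from the table can be combined in one import step. The sketch below assumes a convention of `<SKU>_<two-digit position>.<extension>` and a minimum of 1200 px on the longer edge; both are placeholders, and all names are hypothetical.

```typescript
interface AssetFile {
    filename: string;
    widthPx: number;
    heightPx: number;
}

interface AssetAssignment {
    sku: string;
    position: number;
    filename: string;
}

interface AssetRejection {
    filename: string;
    reason: string;
}

// Assumed convention: <SKU>_<two-digit position>.<extension>, e.g. RSP-40-BLK_01.jpg
const FILENAME_PATTERN = /^([A-Za-z0-9-]+)_(\d{2})\.(jpe?g|png|webp)$/i;
// Assumed minimum for the longer image edge; choose a value that fits your channels.
const MIN_LONG_EDGE_PX = 1200;

function assignAssets(
    files: AssetFile[],
    knownSkus: string[],
): { assigned: AssetAssignment[]; rejected: AssetRejection[] } {
    const assigned: AssetAssignment[] = [];
    const rejected: AssetRejection[] = [];
    for (const file of files) {
        const match = FILENAME_PATTERN.exec(file.filename);
        if (!match) {
            rejected.push({ filename: file.filename, reason: "filename does not follow the naming convention" });
            continue;
        }
        const sku = (match[1] || "").toUpperCase();
        const position = parseInt(match[2] || "0", 10);
        if (knownSkus.indexOf(sku) === -1) {
            rejected.push({ filename: file.filename, reason: `unknown SKU ${sku}` });
            continue;
        }
        if (Math.max(file.widthPx, file.heightPx) < MIN_LONG_EDGE_PX) {
            rejected.push({ filename: file.filename, reason: `below minimum resolution of ${MIN_LONG_EDGE_PX}px` });
            continue;
        }
        assigned.push({ sku, position, filename: file.filename });
    }
    // Lowest position first, so position 01 can act as the hero image.
    assigned.sort((a, b) => a.sku.localeCompare(b.sku) || a.position - b.position);
    return { assigned, rejected };
}

const assetImport = assignAssets(
    [
        { filename: "RSP-40-BLK_02.jpg", widthPx: 640, heightPx: 480 },
        { filename: "RSP-40-BLK_01.jpg", widthPx: 2000, heightPx: 2000 },
        { filename: "IMG_1234.jpg", widthPx: 3000, heightPx: 2000 },
        { filename: "XYZ-99_01.png", widthPx: 1500, heightPx: 1500 },
    ],
    ["RSP-40-BLK", "RSP-40-WHT"],
);
// One file is assigned (RSP-40-BLK_01.jpg); the other three land in the rejection list with a reason.
```

Rejections carry a reason so they can feed the rejection workflow instead of disappearing silently.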

For Pimcore specifically, we documented a performance bug where asset dimension lookups trigger remote storage I/O on every page render. The fix is described in our Pimcore upgrade guide.

Multi-Channel Distribution

Event-Driven Distribution

When a product is published in the PIM, events trigger distribution to each channel:

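// PIM publishes product -> events trigger channel updates
eventBus.on('product.published', async (product) => {
    // allSettled never rejects, so one failing channel does not block the others
    const results = await Promise.allSettled([
        searchIndexer.index(product),           // Update search
        feedGenerator.queue(product),            // Queue for marketplace feeds
        cacheInvalidator.invalidate(product.id), // Invalidate website cache
        partnerApi.notify(product.id),           // Notify partner systems
    ]);
    // Surface failures explicitly; otherwise rejected channel updates are silently dropped
    for (const result of results) {
        if (result.status === 'rejected') {
            console.error('Channel update failed for product', product.id, result.reason);
        }
    }
});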

Each channel transformer converts the master data to channel-specific format:

// Website: full data with SEO fields
function toWebProduct(product: PimProduct): WebProduct {
    return {
        slug: product.slug,
        name: product.name,
        description: product.description,
        seoTitle: product.seoTitle || product.name,
        seoDescription: product.seoDescription || truncate(product.description, 160),
        images: product.images.map(img => ({
            url: cdn.getUrl(img, 'large'),
            alt: img.alt || product.name,
        })),
        variants: product.variants.filter(v => v.active),
        // ... full product data
    };
}

// Marketplace feed: platform-specific format
function toGoogleMerchantItem(product: PimProduct, variant: PimVariant): MerchantItem {
    return {
        id: variant.sku,
        title: `${product.name} - ${variant.size} ${variant.color}`,
        description: stripHtml(product.description),
        link: `https://shop.example.com/p/${product.slug}`,
        image_link: cdn.getUrl(product.images[0], 'large'),
        price: `${(variant.price / 100).toFixed(2)} EUR`,
        availability: variant.stock > 0 ? 'in_stock' : 'out_of_stock',
        brand: product.brand,
        gtin: variant.ean,
        condition: 'new',
    };
}
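The search index row in the channel table calls for a denormalized document. A possible transformer for that channel is sketched below, using simplified stand-in types rather than the `PimProduct` type referenced above; the document fields, the naive HTML stripping, and all names are assumptions and not tied to any specific search engine.

```typescript
// Simplified stand-in types for this sketch.
interface SearchSourceVariant {
    sku: string;
    size: string;
    color: string;
    priceCents: number;
    stock: number;
    active: boolean;
}

interface SearchSourceProduct {
    id: string;
    slug: string;
    name: string;
    description: string;
    brand: string;
    categories: string[];
    variants: SearchSourceVariant[];
}

// One flat document per product: searchable text, facet values, price range.
interface SearchDocument {
    id: string;
    slug: string;
    name: string;
    description: string;
    brand: string;
    categories: string[];
    skus: string[];
    sizes: string[];
    colors: string[];
    minPriceCents: number | null;
    maxPriceCents: number | null;
    inStock: boolean;
}

function unique(values: string[]): string[] {
    return values.filter((value, index) => values.indexOf(value) === index);
}

// Naive tag removal for the example; use a real HTML sanitizer for untrusted input.
function toPlainText(html: string): string {
    return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

function toSearchDocument(product: SearchSourceProduct): SearchDocument {
    const active = product.variants.filter((v) => v.active);
    const prices = active.map((v) => v.priceCents);
    return {
        id: product.id,
        slug: product.slug,
        name: product.name,
        description: toPlainText(product.description),
        brand: product.brand,
        categories: product.categories,
        skus: active.map((v) => v.sku),
        sizes: unique(active.map((v) => v.size)),
        colors: unique(active.map((v) => v.color)),
        minPriceCents: prices.length > 0 ? Math.min(...prices) : null,
        maxPriceCents: prices.length > 0 ? Math.max(...prices) : null,
        inStock: active.some((v) => v.stock > 0),
    };
}

const shoeDocument = toSearchDocument({
    id: "p-1",
    slug: "running-shoe-pro",
    name: "Running Shoe Pro",
    description: "<p>Lightweight <b>running</b> shoe.</p>",
    brand: "ExampleBrand",
    categories: ["Shoes", "Running"],
    variants: [
        { sku: "RSP-40-BLK", size: "40", color: "Black", priceCents: 12900, stock: 15, active: true },
        { sku: "RSP-40-WHT", size: "40", color: "White", priceCents: 12900, stock: 8, active: true },
        { sku: "RSP-42-BLK", size: "42", color: "Black", priceCents: 12900, stock: 0, active: true },
    ],
});
// shoeDocument.sizes -> ["40", "42"], shoeDocument.colors -> ["Black", "White"], shoeDocument.inStock -> true
```

Flattening variant values into facet arrays trades exactness for query speed: the document can answer "has size 42" and "has color White" but not whether that exact combination is in stock. If that matters, index one document per variant instead.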

Data Quality

Automated Validation

interface ValidationRule {
    field: string;
    check: (value: any, product: Product) => boolean;
    message: string;
    severity: 'error' | 'warning';
}

const VALIDATION_RULES: ValidationRule[] = [
    { field: 'name', check: (v) => v && v.length >= 3, message: 'Name must be at least 3 characters', severity: 'error' },
    { field: 'description', check: (v) => v && v.length >= 50, message: 'Description should be at least 50 characters', severity: 'warning' },
    { field: 'images', check: (v) => v && v.length > 0, message: 'At least one image required', severity: 'error' },
    { field: 'price', check: (v) => v && v > 0, message: 'Price must be positive', severity: 'error' },
    { field: 'ean', check: (v) => !v || isValidEan(v), message: 'Invalid EAN format', severity: 'error' },
    { field: 'categories', check: (v) => v && v.length > 0, message: 'At least one category required', severity: 'error' },
];

Run validation before publication. Block publication on errors. Show warnings but allow publication. Track data quality scores per product, per category, per supplier.
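One way to wire this together is sketched below: a runner that evaluates rules, blocks publication on errors, lets warnings through, and derives a quality score. The original rules call an `isValidEan` helper without showing it, so the EAN-13 check-digit function here is our assumption of what it does. The score formula (share of rules passed) and all type names are also assumptions.

```typescript
type Severity = "error" | "warning";

interface Rule<T> {
    field: keyof T & string;
    check: (value: unknown, product: T) => boolean;
    message: string;
    severity: Severity;
}

interface Finding {
    field: string;
    message: string;
    severity: Severity;
}

interface ValidationResult {
    findings: Finding[];
    canPublish: boolean;   // false as soon as one error-level rule fails
    qualityScore: number;  // 0-100, share of rules passed (assumed formula)
}

// EAN-13: weights 1 and 3 alternate over the first 12 digits; the 13th is the check digit.
function isValidEan13(ean: string): boolean {
    if (!/^\d{13}$/.test(ean)) return false;
    let sum = 0;
    for (let i = 0; i < 12; i++) {
        sum += Number(ean.charAt(i)) * (i % 2 === 0 ? 1 : 3);
    }
    return (10 - (sum % 10)) % 10 === Number(ean.charAt(12));
}

function runValidation<T>(product: T, rules: Rule<T>[]): ValidationResult {
    const findings: Finding[] = rules
        .filter((rule) => !rule.check(product[rule.field], product))
        .map((rule) => ({ field: rule.field, message: rule.message, severity: rule.severity }));
    const passed = rules.length - findings.length;
    return {
        findings,
        canPublish: findings.every((f) => f.severity !== "error"),
        qualityScore: rules.length === 0 ? 100 : Math.round((passed / rules.length) * 100),
    };
}

// Hypothetical product shape and a reduced rule set for the example.
interface DraftProduct {
    name: string;
    description: string;
    images: string[];
    ean?: string;
}

const DRAFT_RULES: Rule<DraftProduct>[] = [
    { field: "name", check: (v) => typeof v === "string" && v.trim().length >= 3, message: "Name must be at least 3 characters", severity: "error" },
    { field: "description", check: (v) => typeof v === "string" && v.length >= 50, message: "Description should be at least 50 characters", severity: "warning" },
    { field: "images", check: (v) => Array.isArray(v) && v.length > 0, message: "At least one image required", severity: "error" },
    { field: "ean", check: (v) => v === undefined || (typeof v === "string" && isValidEan13(v)), message: "Invalid EAN format", severity: "error" },
];

const draft: DraftProduct = { name: "Running Shoe Pro", description: "Short text.", images: ["hero.jpg"], ean: "4012345678901" };
const draftResult = runValidation(draft, DRAFT_RULES);
// draftResult.canPublish -> true (only a warning), draftResult.qualityScore -> 75
```

Aggregating `qualityScore` by category or supplier gives the per-category and per-supplier tracking mentioned above without extra rule logic.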

Common Pitfalls

No single source of truth. If the same product data lives in the ERP, the PIM, and the commerce system with no clear master, conflicts are inevitable.
ERP as the PIM. ERPs store operational data (SKU, price, stock). They don't handle rich content (descriptions, images, translations). Don't try to make them.
No field ownership. Without clear rules about which source owns which field, imports overwrite manual enrichment. See our Pimcore workflow guide for the field ownership pattern.
Same format for all channels. Each channel needs different data. A Google Merchant feed and a website API have different fields, formats, and update frequencies.
No validation before publication. Products go live with missing images, empty descriptions, or invalid EANs. Automated validation prevents this.
Ignoring variant complexity. Auto-generating all possible combinations creates thousands of phantom variants. Only create real, available combinations.

Key Takeaways

The pipeline is the architecture. Source to master to channel. Each stage has different responsibilities, different data formats, and different quality requirements.
The PIM is the single source of truth. Not the ERP, not the commerce system, not the spreadsheet. One authoritative master with versioning and workflow.
Classification stores handle attribute diversity. When different product types have different attributes, dynamic key-value with groups scales better than fixed columns.
Each channel gets its own transformation. Website, search, marketplace, print, and partner API all need different formats from the same master data.
Data quality is a system, not a step. Automated validation, quality scores, blocking rules on publication. Continuous, not one-time.

We design product data systems as part of our ecommerce and data engineering practice. If you need help with PIM architecture or product data pipelines, talk to our team or request a quote.

Topics covered

PIM architecture, product data management, Pimcore vs Akeneo, product data pipeline, MDM, product information management, product data quality
