Open Source · 2024 · Ongoing

Vendure Data Hub

Enterprise ETL and data integration plugin for Vendure. Visual pipeline builder, 9 extractors, 61 transform operators, 24 entity loaders, and feed generators for Google Merchant and Amazon.

At a glance

  • 9 extractors (verifiable in the repository)
  • 61 transform operators with dry-run preview
  • 24 Vendure entity loaders
  • 4 marketplace feed generators

The Challenge

Vendure projects keep rebuilding the same plumbing: product imports from ERP and PIM systems, inventory sync, price updates, marketplace feeds. Each integration starts from zero, ships as one-off scripts, and breaks silently when a supplier changes a column. The ecosystem had no production-grade, reusable data pipeline layer.

Our Approach

We built Data Hub as a first-class Vendure plugin: declarative pipelines composed from extractors (CSV, JSON, XML, REST, GraphQL, FTP, S3 and more), 61 transform operators with dry-run preview, and loaders for 24 Vendure entity types. Pipelines run on schedules or webhooks, with retries, idempotent upserts, real-time logs and a visual editor in the admin UI. Feed generators publish Google Merchant and Amazon feeds from the same pipeline graph.

System Architecture


System Architecture: Product Event, Event Subscriber, Delta Capture, HMAC-Signed Webhook, Target System, Pimcore Webhook, Signature Validation, Data Class Mapping, Transactional Write, Failure, Dead Letter Queue, Auto Retry

Engineering decisions

Declarative pipelines, not bespoke scripts

Each integration is configuration a reviewer can read, not a one-off script buried in a repository. Extractors, transforms and loaders compose into a pipeline graph. The tradeoff is a short learning curve in exchange for integrations that survive staff changes and supplier quirks.
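To make the idea concrete, here is a minimal sketch of a pipeline expressed as data, with a small generic runner. It is not Data Hub's actual configuration schema: `PipelineDefinition`, `runPipeline` and the `supplier-prices` example are hypothetical names chosen for illustration, and the extractor and loader are in-memory stand-ins.

```typescript
// Hypothetical shapes for illustration only; NOT Data Hub's real configuration schema.
type Row = Record<string, unknown>;

interface PipelineDefinition {
  name: string;
  extract: () => Row[];                 // stands in for a CSV/REST/etc. extractor
  transforms: Array<(row: Row) => Row>; // applied in order to every row
  load: (rows: Row[]) => void;          // stands in for an entity loader
}

// Generic runner: the pipeline itself is data, this is the only imperative part.
function runPipeline(def: PipelineDefinition): Row[] {
  const rows = def.extract().map((row) =>
    def.transforms.reduce<Row>((acc, transform) => transform(acc), row),
  );
  def.load(rows);
  return rows;
}

// In-memory target so the sketch runs without a Vendure server.
const loaded: Row[] = [];

const supplierPrices: PipelineDefinition = {
  name: "supplier-prices",
  extract: () => [{ SKU: " ab-1 ", price: "12.50" }],
  transforms: [
    // Normalise the supplier's SKU formatting.
    (r) => ({ ...r, SKU: String(r.SKU).trim().toUpperCase() }),
    // Convert a decimal price string to integer cents.
    (r) => ({ ...r, price: Math.round(Number(r.price) * 100) }),
  ],
  load: (rows) => {
    loaded.push(...rows);
  },
};

const pipelineResult = runPipeline(supplierPrices);
console.log(pipelineResult);
```

Because the definition is a plain object, a reviewer can read the extract, transform and load steps top to bottom without tracing control flow through a script.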

Idempotent upserts by default

Suppliers resend files and jobs retry, so every loader is keyed and safe to run twice. Re-running a pipeline converges to the same catalog state instead of duplicating products. This depends on stable external keys, which become part of the integration contract.
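The behaviour can be sketched with a keyed in-memory store. This is an illustration of the general upsert-by-external-key technique, not Data Hub's loader API: `KeyedStore`, `ProductRecord` and `externalKey` are hypothetical names, and a real loader would write to Vendure entities rather than a `Map`.

```typescript
// Hypothetical in-memory stand-in for a keyed loader; NOT Data Hub's real loader API.
interface ProductRecord {
  externalKey: string; // stable key from the supplier, e.g. a SKU
  name: string;
  priceInCents: number;
}

type UpsertOutcome = "created" | "updated" | "unchanged";

class KeyedStore {
  private readonly byKey = new Map<string, ProductRecord>();

  // Insert when the key is new, update when the data differs, otherwise do nothing.
  upsert(record: ProductRecord): UpsertOutcome {
    const existing = this.byKey.get(record.externalKey);
    if (!existing) {
      this.byKey.set(record.externalKey, { ...record });
      return "created";
    }
    if (
      existing.name === record.name &&
      existing.priceInCents === record.priceInCents
    ) {
      return "unchanged";
    }
    this.byKey.set(record.externalKey, { ...record });
    return "updated";
  }

  get size(): number {
    return this.byKey.size;
  }
}

const catalog = new KeyedStore();
const supplierBatch: ProductRecord[] = [
  { externalKey: "AB-1", name: "Widget", priceInCents: 1250 },
  { externalKey: "AB-2", name: "Gadget", priceInCents: 990 },
];

const firstRun = supplierBatch.map((r) => catalog.upsert(r));
// The supplier resends the same file, or the job retries.
const secondRun = supplierBatch.map((r) => catalog.upsert(r));

console.log(firstRun, secondRun, catalog.size);
```

The second run leaves the store at two records, which is the convergence property described above. If the supplier changed `externalKey` between runs, the same products would be created again, which is why key stability belongs in the integration contract.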

Dry-run preview before any write

A bad transform on a live catalog is expensive to undo. Every pipeline previews the exact output of each operator before data touches Vendure, so mistakes are caught in review rather than in production.
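A dry run of this kind can be sketched as a pure function that applies each operator to a sample row and records the intermediate results without writing anything. The `Operator` and `PreviewStep` shapes and the `dryRun` function are hypothetical, assumed here for illustration, and do not reflect Data Hub's actual operator interface.

```typescript
// Hypothetical operator and preview shapes; NOT Data Hub's real API.
type PreviewRow = Record<string, unknown>;

interface Operator {
  name: string;
  apply: (row: PreviewRow) => PreviewRow;
}

interface PreviewStep {
  operator: string;
  before: PreviewRow;
  after: PreviewRow;
}

// Applies every operator to a sample row and records each intermediate result.
// Nothing is persisted: the function only returns data for review.
function dryRun(sample: PreviewRow, operators: Operator[]): PreviewStep[] {
  const steps: PreviewStep[] = [];
  let current = sample;
  for (const op of operators) {
    // Shallow copy, so an operator that mutates its input cannot alter
    // the "before" snapshot of this step (flat rows assumed).
    const next = op.apply({ ...current });
    steps.push({ operator: op.name, before: current, after: next });
    current = next;
  }
  return steps;
}

const previewOperators: Operator[] = [
  { name: "trimSku", apply: (r) => ({ ...r, sku: String(r.sku).trim() }) },
  {
    name: "toCents",
    apply: (r) => ({ ...r, price: Math.round(Number(r.price) * 100) }),
  },
];

const preview = dryRun({ sku: " AB-1 ", price: "9.99" }, previewOperators);
for (const step of preview) {
  console.log(step.operator, step.before, "->", step.after);
}
```

Recording a before and after pair per operator lets a reviewer see which step introduced a bad value, rather than only the final output.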

A first-class Vendure plugin, not a sidecar

Data Hub runs inside Vendure, using its entity model, permissions and admin UI instead of a separate service to operate and secure. It is coupled to the Vendure lifecycle on purpose: one system to deploy, one place to watch.

Tech Stack

Backend
TypeScript, NestJS, Vendure
Infrastructure
Docker, GitHub Actions
Frontend
React, Admin UI Extension

Key Outcomes

  • Replaces ad-hoc import scripts with declarative, monitored pipelines
  • Idempotent upserts make re-runs safe by design
  • Dry-run preview shows every transform before data touches the shop
  • Published open source: the code is the reference

The Result

A single plugin replaces the integration scripts of a typical commerce project. Published open source on GitHub; production-tested with high-volume catalog imports and verifiable down to every operator in the repository.

What a commerce build on Data Hub looks like

On a client engagement, Data Hub is the integration layer, so the team can focus on the storefront and the business instead of the plumbing.

  • We connect your ERP, PIM and supplier feeds as declarative pipelines
  • Mappings are previewed with dry-run before anything reaches the live catalog
  • Pipelines run on schedules or webhooks with retries, logs and idempotent upserts
  • Google Merchant and Amazon feeds publish from the same pipeline graph
  • You keep the plugin and the pipelines: it is open source and yours to extend