E-Commerce Search Architecture: MeiliSearch, OpenSearch, and Real Migration Stories
How to design product search for commerce. MeiliSearch vs OpenSearch vs Elasticsearch, index design, faceted search, multilingual strategies, hybrid search, and real-time sync from PIM and commerce systems.
Why Product Search Is Not Text Search
Product search looks simple. A user types "blue running shoes size 42" and expects relevant results. But the implementation is fundamentally different from document search or web search. Products have structured attributes (size, color, price, brand), hierarchical categories, availability that changes in real time, localized names and descriptions, and facets that users expect to filter by.
A document search engine finds documents that match a query. A product search engine must find products, rank them by commercial relevance (not just text relevance), present filterable facets, handle typos and synonyms, support multiple languages, and update in real time when inventory changes.
We've built product search systems on both MeiliSearch and OpenSearch, migrating from Elasticsearch 7.4 in one case and building from scratch in another. This article covers the architecture decisions, not the configuration details. For vector search patterns specifically, see our vector search architecture guide. For the broader commerce context, see our ecommerce platforms guide.
MeiliSearch vs OpenSearch vs Elasticsearch
| Criteria | MeiliSearch | OpenSearch | Elasticsearch |
|---|---|---|---|
| Language | Rust | Java | Java |
| Typo tolerance | Built-in, excellent | Built-in fuzzy matching, tuned per query | Built-in fuzzy matching, tuned per query |
| Faceted search | Built-in, fast | Built-in (aggregations) | Built-in (aggregations) |
| Vector search | Experimental | Built-in (k-NN) | Built-in (dense_vector) |
| Multilingual | Good (language-specific tokenizers) | Excellent (analyzers per field) | Excellent (analyzers per field) |
| Real-time indexing | Near instant (< 50ms) | Near real-time (1s refresh) | Near real-time (1s refresh) |
| Complexity | Low (single binary, REST API) | High (cluster, shards, replicas) | High (cluster, shards, replicas) |
| Memory usage | Low (Rust, efficient) | High (JVM heap) | High (JVM heap) |
| Operational cost | Low (runs on small instances) | Medium to high | Medium to high |
| Sorting | Built-in ranking rules | Flexible sort | Flexible sort |
| License | MIT | Apache 2.0 | SSPL / Elastic License 2.0 (AGPLv3 option added in 2024) |
| Best for | Small to medium catalogs (< 500K products) | Large catalogs, complex queries, hybrid search | Same as OpenSearch (if already invested) |
When to Choose MeiliSearch
- Catalog under 500K products
- Typo tolerance is critical (consumer-facing search)
- Team has limited search infrastructure experience
- Fast setup matters more than advanced query features
- Budget is tight (runs on a single small instance)
When to Choose OpenSearch
- Catalog over 100K products with complex facets
- Need hybrid search (text + vector / k-NN)
- Multiple applications or teams query the same index
- Already on AWS (OpenSearch Serverless is managed)
- Need advanced aggregations and analytics on search data
When to Choose Elasticsearch
- Already running Elasticsearch and no reason to migrate
- Need specific Elastic-only features (ML inference, security)
- Enterprise support contract is required
For most new commerce projects, we recommend MeiliSearch for simplicity or OpenSearch for power. Elasticsearch's licensing history (the move to SSPL and the Elastic License, with an AGPLv3 option added only in 2024) makes it less attractive for new deployments.
Index Design
The most common mistake: indexing your database schema directly. Product tables are normalized. Search indices must be denormalized.
```typescript
// Database: normalized (relational)
// products table: id, name, category_id, brand_id
// categories table: id, name, parent_id
// variants table: id, product_id, sku, price, size, color
// translations table: id, product_id, locale, name, description

// Search index: denormalized (flat document)
interface ProductSearchDocument {
  id: string;
  name: string;          // current locale
  description: string;   // current locale
  slug: string;
  sku: string[];         // all variant SKUs
  brand: string;         // denormalized from brand table
  categories: string[];  // full hierarchy: ["Shoes", "Running", "Trail"]
  categoryIds: string[]; // for facet filtering
  price: number;         // lowest variant price (for sorting)
  priceRange: { min: number; max: number };
  sizes: string[];       // all available sizes
  colors: string[];      // all available colors
  inStock: boolean;      // any variant in stock
  imageUrl: string;      // primary image
  rating: number;        // average review rating
  reviewCount: number;
  tags: string[];        // searchable tags
  createdAt: number;     // for "newest first" sorting
  popularity: number;    // sales count or view count
}
```
Rules for denormalization:
- Flatten all relations into the document (brand name, not brand ID)
- Include the full category hierarchy as an array (enables facet drill-down)
- Include all variant attributes (sizes, colors) as arrays on the product
- Use the lowest price for sorting, price range for display
- Include computed fields (rating, reviewCount, popularity) for ranking
- One document per product per locale (not one document with all locales)
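The rules above can be sketched as a single flattening function. This is a minimal illustration, not the article's actual builder: the `ProductRow` and `VariantRow` shapes, the `stock` field, and the name `flattenProduct` are all assumptions made for the example, and only a subset of the document fields is produced.

```typescript
// Hypothetical row shapes loosely mirroring the normalized tables above.
interface VariantRow {
  sku: string;
  price: number;
  size: string;
  color: string;
  stock: number;
}

interface ProductRow {
  id: string;
  brandName: string;
  categoryPath: string[]; // root first, e.g. ["Shoes", "Running", "Trail"]
  variants: VariantRow[];
  translations: Record<string, { name: string; description: string }>;
}

interface FlatProductDocument {
  id: string;
  name: string;
  description: string;
  brand: string;
  categories: string[];
  sku: string[];
  price: number;
  priceRange: { min: number; max: number };
  sizes: string[];
  colors: string[];
  inStock: boolean;
}

// Order-preserving de-duplication.
function unique(values: string[]): string[] {
  return values.filter((value, index) => values.indexOf(value) === index);
}

// Builds one flat document for one product in one locale.
function flattenProduct(product: ProductRow, locale: string): FlatProductDocument {
  const translation = product.translations[locale];
  if (!translation) {
    throw new Error(`Product ${product.id} has no "${locale}" translation`);
  }
  if (product.variants.length === 0) {
    throw new Error(`Product ${product.id} has no variants`);
  }
  const prices = product.variants.map(v => v.price);
  const min = Math.min(...prices);
  const max = Math.max(...prices);
  return {
    id: product.id,
    name: translation.name,
    description: translation.description,
    brand: product.brandName,                 // brand name, not brand ID
    categories: product.categoryPath,         // full hierarchy as an array
    sku: product.variants.map(v => v.sku),
    price: min,                               // lowest price, used for sorting
    priceRange: { min, max },                 // used for display
    sizes: unique(product.variants.map(v => v.size)),
    colors: unique(product.variants.map(v => v.color)),
    inStock: product.variants.some(v => v.stock > 0),
  };
}
```

Calling `flattenProduct(row, 'de')` once per supported locale yields the "one document per product per locale" layout; a missing translation fails loudly instead of silently indexing an empty name.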
One Index Per Locale
For multilingual commerce, create one index per locale:
```
products_en
products_de
products_fr
products_ar
```
Each index uses language-specific analyzers, tokenizers, and stop words. A German search for "Laufschuhe" uses German stemming. An Arabic search uses Arabic morphological analysis. Mixing locales in one index forces compromises on analysis that degrade quality for every language.
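```typescript
import { MeiliSearch } from 'meilisearch';

// Client setup: host and API key are placeholders, adjust to your deployment.
const meili = new MeiliSearch({ host: 'http://localhost:7700', apiKey: process.env.MEILI_API_KEY });

// MeiliSearch: one index per locale
await meili.createIndex('products_de', { primaryKey: 'id' });
await meili.index('products_de').updateSettings({
  searchableAttributes: ['name', 'description', 'brand', 'tags', 'categories'],
  filterableAttributes: ['categories', 'brand', 'sizes', 'colors', 'price', 'inStock'],
  sortableAttributes: ['price', 'createdAt', 'popularity', 'rating'],
});
```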
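With one index per locale, every request has to be routed to the right index. A minimal sketch of that routing follows, assuming the `products_<locale>` naming shown above; the function name `resolveIndexName`, the fallback behaviour, and the lowercase supported-locale list are assumptions for illustration.

```typescript
// Maps a requested locale (e.g. "de-AT" or "fr_CA") to an index name.
// Tries the exact locale first, then the bare language, then the fallback.
// `supported` is assumed to contain lowercase locale codes.
function resolveIndexName(
  requestedLocale: string,
  supported: string[],
  fallback: string,
): string {
  const normalized = requestedLocale.toLowerCase().replace('_', '-');
  const language = normalized.split('-')[0];
  let match = fallback;
  if (supported.indexOf(normalized) !== -1) {
    match = normalized;
  } else if (supported.indexOf(language) !== -1) {
    match = language;
  }
  return `products_${match}`;
}
```

For example, with `['en', 'de', 'fr', 'ar']` as the supported list and `'en'` as the fallback, a request for `de-AT` resolves to `products_de`, and an unsupported locale falls back to `products_en`.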
Faceted Search Architecture
Facets are the filters on the left side of every commerce search page. They look simple but require careful design.
Facet Types
| Type | Example | Implementation |
|---|---|---|
| Term facet | Brand: Nike (42), Adidas (38) | Term aggregation on brand field |
| Range facet | Price: 0-50 (15), 50-100 (28), 100+ (12) | Range aggregation on price field |
| Hierarchical facet | Category: Shoes > Running > Trail | Multi-level term aggregation on category hierarchy |
| Boolean facet | In Stock: Yes (89), No (11) | Term aggregation on inStock field |
| Color facet | Color swatches with counts | Term aggregation on colors array field |
| Size facet | Size: 40 (5), 41 (8), 42 (12) | Term aggregation on sizes array field |
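For the hierarchical facet, a flat array of category names does not record which level a term belongs to. One common workaround, shown here as a sketch rather than the approach used in the article, is to index cumulative paths so that each level can be faceted on its own. The function name `categoryLevels` and the `" > "` separator are assumptions.

```typescript
// Turns a root-first category path into cumulative path strings,
// one per level, suitable for a term facet per hierarchy level.
function categoryLevels(path: string[], separator: string = ' > '): string[] {
  return path.map((_, index) => path.slice(0, index + 1).join(separator));
}
```

`categoryLevels(['Shoes', 'Running', 'Trail'])` returns `['Shoes', 'Shoes > Running', 'Shoes > Running > Trail']`. Filtering on the second entry then restricts results to running shoes without matching an unrelated "Running" category under a different parent.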
Facet Interaction
When a user selects a facet, the other facets must update to reflect the filtered results. This is called "facet refinement" and is the most complex part of search UI.
```typescript
// MeiliSearch: facet counts with active filters
const results = await meili.index('products_de').search('laufschuhe', {
  filter: ['brand = "Nike"', 'inStock = true'],
  facets: ['categories', 'brand', 'sizes', 'colors', 'price'],
});

// results.facetDistribution:
// {
//   categories: { "Running": 42, "Trail": 18, "Road": 24 },
//   brand: { "Nike": 42 }, // only Nike (because filtered)
//   sizes: { "40": 5, "41": 8, "42": 12, "43": 10, "44": 7 },
//   colors: { "Black": 20, "White": 15, "Blue": 7 },
// }
```
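The key UX decision: when a brand filter is active, should the brand facet show only the selected brand (with its count) or all brands (with counts reflecting the current query minus the brand filter)? The second approach ("disjunctive faceting") lets users compare counts across brands. Neither engine returns these counts from a single plain filtered query. In MeiliSearch, the facet distribution reflects the active filters, so disjunctive counts are usually obtained with one extra query per active facet, which can be batched through the multi-search endpoint. In OpenSearch, the usual pattern is to move facet filters into post_filter and wrap each aggregation in a filter aggregation that applies every active filter except its own.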
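Disjunctive counts come down to running the query once per active facet with that facet's own filter left out. The sketch below only builds the query descriptors; the `q`, `filter`, and `facets` fields follow MeiliSearch's search parameters (nested arrays mean OR, the outer array means AND), while `Selections`, `FacetQuery`, and `buildDisjunctiveQueries` are names invented for this example. Value quoting via `JSON.stringify` is a simplification.

```typescript
// facet name -> values the user has ticked
type Selections = Record<string, string[]>;

interface FacetQuery {
  q: string;
  filter: string[][];
  facets: string[];
}

// Builds an AND-of-ORs filter, optionally skipping one facet.
function toFilter(selections: Selections, exclude?: string): string[][] {
  return Object.keys(selections)
    .filter(facet => facet !== exclude && selections[facet].length > 0)
    .map(facet => selections[facet].map(value => `${facet} = ${JSON.stringify(value)}`));
}

// Returns the main query first, then one extra query per active facet.
// Each extra query drops that facet's own filter, so its counts show
// what the user would get by switching to another value.
function buildDisjunctiveQueries(
  q: string,
  selections: Selections,
  facets: string[],
): FacetQuery[] {
  const queries: FacetQuery[] = [{ q, filter: toFilter(selections), facets }];
  for (const facet of facets) {
    const selected = selections[facet];
    if (selected && selected.length > 0) {
      queries.push({ q, filter: toFilter(selections, facet), facets: [facet] });
    }
  }
  return queries;
}
```

The hits and the counts for inactive facets come from the first query; for each active facet, the counts from its extra query replace those from the first.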
Real-Time Sync from Source Systems
Search indices must stay in sync with the source of truth (PIM, commerce database, ERP). The sync architecture depends on the source system.
Event-Driven Sync (Recommended)
The source system emits events on data changes. A worker consumes events and updates the search index.
```typescript
import { Injectable } from '@nestjs/common';
import { EventBus, ProductEvent, ProductVariantEvent } from '@vendure/core';

// Vendure: sync on product events
// SearchIndexService is the application's own indexing service (not shown).
@Injectable()
export class SearchIndexSubscriber {
  constructor(
    private eventBus: EventBus,
    private searchService: SearchIndexService,
  ) {
    this.eventBus.ofType(ProductEvent).subscribe(async event => {
      if (event.type === 'updated' || event.type === 'created') {
        await this.searchService.indexProduct(event.ctx, event.product.id);
      }
      if (event.type === 'deleted') {
        await this.searchService.removeProduct(event.product.id);
      }
    });

    this.eventBus.ofType(ProductVariantEvent).subscribe(async event => {
      // Variant change affects parent product's search document.
      // ProductVariantEvent carries an array of variants, so reindex each parent.
      for (const variant of event.variants) {
        await this.searchService.indexProduct(event.ctx, variant.productId);
      }
    });
  }
}
```
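A bulk import or a price update can emit many events for the same product within seconds. One way to avoid reindexing the same document repeatedly is to collect product IDs and flush them in batches. The sketch below shows only the de-duplicating buffer; `PendingProductQueue` is a hypothetical name, and wiring it to a timer or to the subscriber above is left out.

```typescript
// Collects product IDs that need reindexing and hands them out in batches.
// Adding the same ID twice before a drain results in a single reindex.
class PendingProductQueue {
  private pending: Record<string, true> = {};

  add(productId: string): void {
    this.pending[productId] = true;
  }

  // Removes and returns up to maxBatch IDs.
  drain(maxBatch: number): string[] {
    const ids = Object.keys(this.pending).slice(0, maxBatch);
    for (const id of ids) {
      delete this.pending[id];
    }
    return ids;
  }

  get size(): number {
    return Object.keys(this.pending).length;
  }
}
```

An event handler would call `add(productId)` instead of indexing immediately, and a periodic worker would call `drain(500)` and index the returned IDs in one batch. The trade-off is a short, bounded delay in exchange for far fewer index writes.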
Scheduled Full Reindex
Even with event-driven sync, run a scheduled full reindex as a safety net. Events can be lost (broker downtime, worker crash). A nightly full reindex catches anything that event-driven sync missed.
```typescript
// Nightly full reindex job
async function fullReindex(locale: string) {
  const batchSize = 500;
  let offset = 0;
  let products = [];

  do {
    products = await productService.findAll({ take: batchSize, skip: offset });
    const documents = products.map(p => buildSearchDocument(p, locale));
    await meili.index(`products_${locale}`).addDocuments(documents);
    offset += batchSize;
  } while (products.length === batchSize);
}
```
Handling Deletions
Product deletions are tricky. If you delete a product from the database, the event-driven sync removes it from the index. But if the event is lost, the deleted product stays in search results.
Two solutions:
- Track deletion timestamps and filter by "not deleted" in queries
- Full reindex replaces the entire index atomically (swap alias)
```typescript
// Atomic reindex with alias swap (OpenSearch/Elasticsearch)
// bulkIndex, cleanupOldIndices, and indexSettings are application helpers not shown here.
async function atomicReindex(locale: string) {
  const newIndex = `products_${locale}_${Date.now()}`;
  await opensearch.indices.create({ index: newIndex, body: indexSettings });

  // Index all products into new index
  await bulkIndex(newIndex, locale);

  // Swap alias atomically
  await opensearch.indices.updateAliases({
    body: {
      actions: [
        { remove: { index: `products_${locale}_*`, alias: `products_${locale}` } },
        { add: { index: newIndex, alias: `products_${locale}` } },
      ],
    },
  });

  // Delete old indices, keeping the two most recent
  await cleanupOldIndices(`products_${locale}_*`, { keepLast: 2 });
}
```
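The cleanup helper is called above but not shown. Its core decision, which timestamped indices are safe to delete, can be sketched as a pure function. The name `selectIndicesToDelete` and its signature are assumptions; it relies only on the `<alias>_<timestamp>` naming used in the reindex code.

```typescript
// Given all index names, returns the timestamped indices for one alias
// that fall outside the newest `keepLast`. Only names of the form
// `<aliasName>_<digits>` are considered, so `products_en_GB_...` is not
// mistaken for an old `products_en` index.
function selectIndicesToDelete(
  allIndices: string[],
  aliasName: string,
  keepLast: number,
): string[] {
  const prefix = `${aliasName}_`;
  const timestampOf = (name: string): number => Number(name.slice(prefix.length));
  return allIndices
    .filter(name => name.indexOf(prefix) === 0 && /^\d+$/.test(name.slice(prefix.length)))
    .sort((a, b) => timestampOf(b) - timestampOf(a)) // newest first
    .slice(keepLast);
}
```

Keeping the previous index around (`keepLast: 2`) leaves a quick rollback path: if the new index turns out to be incomplete, the alias can be pointed back at the older one without another full reindex.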
For how we handle data sync pipelines at scale, our Vendure Data Hub Plugin implements all these patterns with 7 different search sinks.
Relevance Tuning
Default search relevance is wrong for commerce. Text relevance (how well the query matches the document) is one signal. Commercial relevance (how likely the user is to buy) is equally important.
Ranking Signals
| Signal | Weight | Source |
|---|---|---|
| Text match (title) | High | Search engine |
| Text match (description) | Medium | Search engine |
| In stock | Critical (boost or filter) | Inventory system |
| Popularity (sales count) | Medium | Order data |
| Review rating | Low-Medium | Reviews |
| Recency (new products) | Low | Product creation date |
| Margin (internal) | Optional | Business rules |
```typescript
// MeiliSearch: custom ranking rules
await meili.index('products_de').updateSettings({
  rankingRules: [
    'words',           // 1. Text match quality
    'typo',            // 2. Typo tolerance
    'proximity',       // 3. Word proximity
    'attribute',       // 4. Which field matched (title > description)
    'sort',            // 5. User-requested sort
    'exactness',       // 6. Exact vs partial match
    'popularity:desc', // 7. Popular products rank higher
    'rating:desc',     // 8. Higher-rated products rank higher
  ],
});
```
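OpenSearch has no ranking-rule list, so the popularity and rating signals are usually folded into the score instead. A rough counterpart using a `function_score` query is sketched below. The function name `buildPopularityQuery` and every weight are illustrative assumptions, not tuned values; the `popularity` and `rating` fields come from the search document defined earlier.

```typescript
// Builds an OpenSearch query body that multiplies the text score by
// (1 + log1p(popularity) + 0.2 * rating).
function buildPopularityQuery(userQuery: string) {
  return {
    function_score: {
      // Text relevance: title matches count three times as much as description.
      query: {
        multi_match: { query: userQuery, fields: ['name^3', 'description'] },
      },
      functions: [
        // Baseline of 1 so products with no sales and no reviews keep their text score.
        { weight: 1 },
        // Popularity with diminishing returns.
        { field_value_factor: { field: 'popularity', modifier: 'log1p', missing: 0 } },
        // Rating as a weaker signal.
        { field_value_factor: { field: 'rating', modifier: 'none', missing: 0 }, weight: 0.2 },
      ],
      score_mode: 'sum',      // add the function values together
      boost_mode: 'multiply', // then multiply the text score by that sum
    },
  };
}
```

Unlike MeiliSearch's rule list, where popularity only breaks ties left by earlier rules, this blends the signals into one number, so a very popular product can outrank a better text match. The weights therefore need checking against real queries.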
Boosting In-Stock Products
Out-of-stock products should appear lower in results, not disappear entirely. Users might want to see upcoming products or subscribe to back-in-stock notifications.
```typescript
// OpenSearch: boost in-stock products
const query = {
  bool: {
    must: [{ match: { searchText: userQuery } }],
    should: [
      { term: { inStock: { value: true, boost: 5.0 } } }, // Strong boost for in-stock
    ],
    filter: [
      { term: { tenant_id: tenantId } },
    ],
  },
};
```
Hybrid Search for Commerce
Combining text search with vector search improves results for natural language queries while preserving exact match capability for SKUs and product codes.
```typescript
// OpenSearch: hybrid search (text + vector)
const results = await opensearch.search({
  index: 'products_en',
  body: {
    query: {
      bool: {
        should: [
          // Text search (handles SKUs, exact product names)
          { multi_match: { query: userQuery, fields: ['name^3', 'description', 'sku^5', 'tags'], type: 'best_fields' } },
          // Vector search (handles natural language, semantic similarity)
          { knn: { embedding: { vector: queryEmbedding, k: 20 } } },
        ],
      },
    },
    // Facets
    aggs: {
      brands: { terms: { field: 'brand.keyword', size: 20 } },
      categories: { terms: { field: 'categories.keyword', size: 30 } },
      price_ranges: { range: { field: 'price', ranges: [{ to: 50 }, { from: 50, to: 100 }, { from: 100 }] } },
    },
  },
});
```
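SKU queries ("ABC-12345") hit the text search path with high precision. Natural language queries ("comfortable shoes for long walks") hit the vector search path with semantic understanding. Both contribute to the final ranking: a bool/should query adds the clause scores together, so the text and vector scores, which sit on different scales, usually need their relative boosts tuned against real queries.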
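An alternative to summing raw scores is to run the text and vector queries separately and merge the two ranked lists by position. Reciprocal rank fusion is one widely used way to do that. It is not the method used in the query above, and the function below is a client-side sketch with an assumed name and the conventional constant `k = 60`.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per document,
// with rank starting at 1. Documents ranked well in several lists rise to the top.
function reciprocalRankFusion(
  rankings: string[][],
  k: number = 60,
): Array<{ id: string; score: number }> {
  const scores: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores[id] = (scores[id] || 0) + 1 / (k + index + 1);
    });
  }
  return Object.keys(scores)
    .map(id => ({ id, score: scores[id] }))
    .sort((a, b) => b.score - a.score);
}
```

Given a text ranking `['sku-1', 'a', 'b']` and a vector ranking `['a', 'c', 'sku-1']`, the fused order is `a`, `sku-1`, `c`, `b`: documents found by both paths beat documents found by only one, regardless of how the raw scores were scaled.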
For more on vector search internals, see our vector search architecture guide.
Common Pitfalls
- Indexing normalized data. Your search documents must be denormalized. Flatten all relations into the document. Don't reference IDs that require a second lookup.
- One index for all locales. Create one index per locale. Mixed-locale indices can't use language-specific analyzers, and search quality degrades for every language.
- No facet design. Facets are not an afterthought. Plan which attributes are filterable, how hierarchical categories work, and how facet counts update when filters are applied.
- Sync only via scheduled reindex. Event-driven sync gives near-real-time updates. Scheduled reindex is a safety net, not the primary mechanism.
- No relevance tuning. Default text relevance is wrong for commerce. Boost in-stock products, incorporate popularity and ratings, and weight title matches higher than description matches.
- Ignoring out-of-stock products. Don't remove them from the index. Demote them in ranking. Users may want back-in-stock alerts or to browse upcoming products.
- No atomic reindex. If your reindex process fails halfway, you have a partially updated index. Use alias swapping for atomic switchover.
- Treating search as a feature, not infrastructure. Search is a core service. It needs its own cluster, its own monitoring, its own scaling strategy. Don't run it on the same server as your database.
Key Takeaways
- Product search is not text search. Structured attributes, facets, commercial relevance, real-time inventory, and multilingual support make it fundamentally different.
- Denormalize for search, normalize for storage. The search document is a flat, self-contained representation of everything needed to render a search result. No joins, no lookups.
- One index per locale. Language-specific analyzers, tokenizers, and stop words produce dramatically better results than a single mixed-language index.
- Event-driven sync with scheduled reindex as safety net. Real-time updates for normal operations. Full reindex nightly to catch anything events missed.
- Relevance tuning is a business decision. Text match quality, in-stock status, popularity, ratings, and margin are all ranking signals. Default relevance is wrong for commerce.
- MeiliSearch for simplicity, OpenSearch for power. MeiliSearch fits catalogs under 500K products and offers strong typo tolerance out of the box. OpenSearch handles complex aggregations, hybrid search, and large-scale deployments.
We build search infrastructure as part of our data engineering and ecommerce practice. If you're building or migrating product search, talk to our team or request a quote. Our Vendure Data Hub Plugin includes search sinks for MeiliSearch, OpenSearch, Elasticsearch, Algolia, and Typesense.