Technical Guide

Designing AI Systems Without Vendor Lock-In: What to Abstract (and What Not To)

How to design AI architectures that survive provider switches. Abstraction layers, prompt portability, multi-model routing, evaluation-driven development, and when lock-in is the right choice.

March 26, 2026 · 14 min read · Oronts Engineering Team

The Lock-In Trap

Every AI project starts with one provider: OpenAI, Anthropic, or a local model. The SDK goes into the codebase. Provider-specific features (function calling format, JSON mode, system prompt behavior) get baked into the business logic. Six months later, you want to switch providers (for cost, performance, or compliance) and discover that the switch requires rewriting half the application.

We've built AI systems that use multiple providers simultaneously and have switched providers mid-project without touching business logic. This article covers the architecture that makes that possible and the honest trade-offs involved.

For broader AI architecture patterns, see our AI systems guide and AI orchestration guide.

What Creates Lock-In

| Lock-In Source | Example | Severity |
| --- | --- | --- |
| SDK coupling | openai.chat.completions.create() in 50 files | High |
| Prompt format | Prompts tuned for GPT-4's behavior, fail on Claude | High |
| Fine-tuned models | Model trained on your data, hosted by provider | Very High |
| Proprietary features | Assistants API, function calling format, JSON mode | Medium |
| Embedding lock-in | 100K documents embedded with text-embedding-3-small, incompatible with other models | Very High |
| Rate limit architecture | System designed around OpenAI's specific rate limits and batching | Low |

The Abstraction Layer That Works

A good abstraction layer has three components: a unified interface for model calls, a prompt management layer, and an evaluation framework.

// Unified LLM interface
interface LlmProvider {
    generate(request: LlmRequest): Promise<LlmResponse>;
    stream(request: LlmRequest): AsyncIterable<LlmChunk>;
    embed(texts: string[]): Promise<number[][]>;
}

interface LlmRequest {
    model: string;
    messages: Message[];
    temperature?: number;
    maxTokens?: number;
    responseFormat?: 'text' | 'json';
    tools?: ToolDefinition[];
}

interface LlmResponse {
    text: string;
    usage: { promptTokens: number; completionTokens: number };
    finishReason: string;
    model: string;
    provider: string;
    latencyMs: number;
}

Each provider implements this interface:

class OpenAiProvider implements LlmProvider {
    // stream() and embed() omitted for brevity
    async generate(request: LlmRequest): Promise<LlmResponse> {
        const response = await this.client.chat.completions.create({
            model: request.model,
            messages: this.convertMessages(request.messages),
            temperature: request.temperature,
            max_tokens: request.maxTokens,
            response_format: request.responseFormat === 'json'
                ? { type: 'json_object' } : undefined,
        });
        return this.convertResponse(response);
    }
}

class AnthropicProvider implements LlmProvider {
    // stream() and embed() omitted for brevity
    async generate(request: LlmRequest): Promise<LlmResponse> {
        const response = await this.client.messages.create({
            model: request.model,
            system: this.extractSystemMessage(request.messages),
            messages: this.convertMessages(request.messages),
            temperature: request.temperature,
            max_tokens: request.maxTokens ?? 4096, // Anthropic requires max_tokens
        });
        return this.convertResponse(response);
    }
}

The business logic uses the interface, never the provider SDK directly:

// Business logic: provider-agnostic
async function generateCustomerResponse(ctx: Context, ticket: string): Promise<string> {
    const provider = ctx.getLlmProvider(); // resolved from config
    const response = await provider.generate({
        model: ctx.getModelForUseCase('customer-support'),
        messages: [
            { role: 'system', content: SUPPORT_SYSTEM_PROMPT },
            { role: 'user', content: ticket },
        ],
        temperature: 0.3,
    });
    return response.text;
}

Switching from OpenAI to Anthropic: change the config. Zero code changes in business logic.
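The examples above assume ctx.getLlmProvider() and ctx.getModelForUseCase() resolve from configuration. A minimal sketch of that resolution, where the registry shape and config fields are assumptions (and the LlmProvider stand-in is trimmed to one field for brevity):

```typescript
// Sketch: config-driven provider resolution. Switching providers means
// editing CONFIG, never the call sites. The one-field LlmProvider here
// is a stand-in for the full interface defined earlier.
interface LlmProvider { name: string; }

// Registry of adapter factories, keyed by provider name.
const PROVIDER_REGISTRY: Record<string, () => LlmProvider> = {
    openai: () => ({ name: 'openai' }),
    anthropic: () => ({ name: 'anthropic' }),
};

interface AppConfig {
    provider: string;
    models: Record<string, string>; // use case -> model id
}

const CONFIG: AppConfig = {
    provider: 'anthropic',
    models: { 'customer-support': 'claude-sonnet-4-20250514' },
};

function getLlmProvider(config: AppConfig): LlmProvider {
    const factory = PROVIDER_REGISTRY[config.provider];
    if (!factory) throw new Error(`Unknown provider: ${config.provider}`);
    return factory();
}

function getModelForUseCase(config: AppConfig, useCase: string): string {
    const model = config.models[useCase];
    if (!model) throw new Error(`No model configured for: ${useCase}`);
    return model;
}
```

The failure mode to avoid is a default that silently falls back to one provider; failing loudly on an unknown provider name keeps the config the single source of truth.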

The Prompt Portability Problem

Prompts are not portable across models. A prompt tuned for GPT-4 may produce worse results on Claude, and vice versa. Models interpret instructions differently, handle edge cases differently, and have different strengths.

The solution: prompt variants per model family.

const SUPPORT_PROMPTS: Record<string, string> = {
    'openai': `You are a customer support assistant. Be concise and helpful.
When you don't know the answer, say so clearly.
Always reference the customer's order number.`,

    'anthropic': `You are a customer support assistant.
<instructions>
- Be concise and helpful
- When you don't know the answer, say so clearly
- Always reference the customer's order number
</instructions>`,

    'local': `### System
You are a customer support assistant.
### Rules
1. Be concise and helpful
2. When you don't know the answer, say so clearly
3. Always reference the customer's order number`,
};

const PROMPTS: Record<string, Record<string, string>> = {
    'customer-support': SUPPORT_PROMPTS,
};

function getPrompt(useCase: string, provider: string): string {
    const variants = PROMPTS[useCase];
    return variants[provider] ?? variants['default'];
}

This adds maintenance cost (multiple prompt versions), but it's the reality. A single prompt across all models means accepting degraded quality on at least one model.

Multi-Model Routing

Different tasks have different requirements. Classification needs speed. Generation needs quality. Embedding needs consistency. Route each task to the right model.

const MODEL_ROUTING: Record<string, ModelConfig> = {
    'classification': {
        primary: { provider: 'openai', model: 'gpt-4o-mini' },
        fallback: { provider: 'anthropic', model: 'claude-haiku-4-5-20251001' },
        reason: 'Fast, cheap, good enough for classification',
    },
    'generation': {
        primary: { provider: 'anthropic', model: 'claude-sonnet-4-20250514' },
        fallback: { provider: 'openai', model: 'gpt-4o' },
        reason: 'Best quality for long-form generation',
    },
    'embedding': {
        primary: { provider: 'openai', model: 'text-embedding-3-small' },
        fallback: null, // Can't switch embedding models without re-embedding
        reason: 'Consistency: all embeddings must use the same model',
    },
    'summarization': {
        primary: { provider: 'anthropic', model: 'claude-haiku-4-5-20251001' },
        fallback: { provider: 'openai', model: 'gpt-4o-mini' },
        reason: 'Fast and cheap for summaries',
    },
};

The routing config is separate from business logic. Changing which model handles summarization is a config change, not a code change.
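A sketch of how a caller might consume the routing table's primary/fallback pairs, with a generate callback standing in for a real provider call (error handling deliberately simplified):

```typescript
// Sketch: try the primary model, fall back to the secondary on failure.
// Shapes mirror the MODEL_ROUTING entries above.
interface ModelRef { provider: string; model: string; }
interface ModelConfig { primary: ModelRef; fallback: ModelRef | null; reason: string; }

async function callWithFallback(
    config: ModelConfig,
    generate: (ref: ModelRef) => Promise<string>,
): Promise<string> {
    try {
        return await generate(config.primary);
    } catch (err) {
        // e.g. embeddings: fallback is null because switching models is unsafe
        if (!config.fallback) throw err;
        return generate(config.fallback);
    }
}
```

A caller would pass the task's entry from MODEL_ROUTING plus a closure over the resolved provider; the business logic never learns which of the two models answered.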

Embedding Lock-In: The Hardest to Break

Embeddings are the strongest form of lock-in. Once you embed 100K documents with text-embedding-3-small, switching to a different embedding model requires re-embedding everything. The vectors are incompatible across models (different dimensions, different semantic spaces).

Mitigations:

  • Store the source documents alongside embeddings (so you can re-embed)
  • Track which embedding model was used per document
  • Budget for re-embedding when planning a provider switch
  • Consider open-source embedding models (sentence-transformers) for portability
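The first two mitigations reduce to a metadata field on each stored record. A sketch, with illustrative field names (not a prescribed schema):

```typescript
// Sketch: track the embedding model per document so a future migration
// can enumerate exactly what needs re-embedding. Field names illustrative.
interface EmbeddedDocument {
    id: string;
    sourceText: string;     // keep the source so re-embedding is possible
    embedding: number[];
    embeddingModel: string; // e.g. 'text-embedding-3-small'
    embeddedAt: string;     // ISO timestamp
}

// Before a provider switch: list documents still on the old model.
function needsReembedding(docs: EmbeddedDocument[], targetModel: string): EmbeddedDocument[] {
    return docs.filter(d => d.embeddingModel !== targetModel);
}
```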

Evaluation-Driven Development

The key to safe provider switching: evaluate against tasks, not against models. If your evaluation suite tests "does the answer correctly address the customer's issue?" rather than "does the output match GPT-4's format?", you can switch models confidently.

interface EvalCase {
    input: string;
    expectedBehavior: string;  // What the output should DO, not what it should SAY
    criteria: EvalCriterion[];
}

interface EvalCriterion {
    name: string;
    check: (output: string, context: EvalCase) => boolean | number;
}

const SUPPORT_EVALS: EvalCase[] = [
    {
        input: "I haven't received my order #12345",
        expectedBehavior: "Acknowledge the issue, reference order number, offer next steps",
        criteria: [
            { name: 'references_order', check: (out) => out.includes('12345') },
            { name: 'acknowledges_issue', check: (out) => /sorry|apologize|understand/i.test(out) },
            { name: 'offers_action', check: (out) => /check|investigate|track|follow up/i.test(out) },
            { name: 'reasonable_length', check: (out) => out.length > 50 && out.length < 500 },
        ],
    },
];

// Run evals against any model
async function runEvalSuite(provider: LlmProvider, model: string): Promise<EvalResults> {
    const results = [];
    for (const evalCase of SUPPORT_EVALS) {
        const output = await provider.generate({
            model,
            messages: [
                { role: 'system', content: SUPPORT_SYSTEM_PROMPT },
                { role: 'user', content: evalCase.input },
            ],
        });
        const scores = evalCase.criteria.map(c => ({
            name: c.name,
            passed: c.check(output.text, evalCase),
        }));
        results.push({ input: evalCase.input, scores });
    }
    return summarize(results);
}

Run the eval suite before switching models. If the new model passes with acceptable scores, the switch is safe. If it doesn't, you know exactly which behaviors degraded.
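Because the criteria are pure functions of the output text, they can be exercised without any provider call. A sketch scoring a hypothetical candidate output against the criteria from the eval case above (the candidate string is illustrative, not a real model response):

```typescript
// Sketch: task-level criteria are model-agnostic, so the same scorer
// runs against any provider's output. Criteria copied from the eval case.
type Check = (output: string) => boolean;

const criteria: { name: string; check: Check }[] = [
    { name: 'references_order', check: (out) => out.includes('12345') },
    { name: 'acknowledges_issue', check: (out) => /sorry|apologize|understand/i.test(out) },
    { name: 'offers_action', check: (out) => /check|investigate|track|follow up/i.test(out) },
    { name: 'reasonable_length', check: (out) => out.length > 50 && out.length < 500 },
];

function passRate(output: string): number {
    const passed = criteria.filter(c => c.check(output)).length;
    return passed / criteria.length;
}

// Illustrative candidate output
const candidate =
    "I'm sorry to hear your order #12345 hasn't arrived. " +
    "Let me check the tracking status and follow up with the carrier today.";
```

Comparing passRate across two models' outputs for the same suite gives a concrete, format-independent go/no-go number for the switch.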

What NOT to Abstract

Not everything should be abstracted. Over-abstraction adds complexity without benefit.

| Don't Abstract | Why |
| --- | --- |
| Embedding model (within a project) | Vectors are incompatible; abstracting doesn't help |
| Fine-tuned model specifics | Fine-tuning is inherently provider-specific |
| Provider-specific optimizations | Batching, caching, rate limit handling differ per provider |
| Streaming format details | Handle in the provider adapter, not in the abstraction |

The abstraction should cover: model selection, prompt routing, response parsing, cost tracking, and fallback logic. It should NOT try to make all providers behave identically. They don't, and pretending they do creates bugs.
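Of the responsibilities listed, cost tracking fits naturally as a thin accumulator beside the provider adapters. A minimal sketch (the shape is assumed, not taken from a specific codebase):

```typescript
// Sketch: accumulate token usage per model across calls. An adapter calls
// record() after each response; pricing conversion is left to reporting,
// since per-token prices vary by provider and change over time.
interface Usage { promptTokens: number; completionTokens: number; }

class CostTracker {
    private totals = new Map<string, Usage>();

    record(model: string, usage: Usage): void {
        const t = this.totals.get(model) ?? { promptTokens: 0, completionTokens: 0 };
        t.promptTokens += usage.promptTokens;
        t.completionTokens += usage.completionTokens;
        this.totals.set(model, t);
    }

    totalFor(model: string): Usage | undefined {
        return this.totals.get(model);
    }
}
```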

When Lock-In Is Correct

Sometimes, locking into a provider is the right business decision:

  • Fine-tuning requires commitment to one provider. The trained model is not portable.
  • Volume discounts from a single provider can reduce costs significantly.
  • Compliance may require a specific provider (data residency, certifications).
  • Speed to market means using provider-specific features without abstraction overhead.

The key: make the lock-in decision consciously, not accidentally. If you choose to use OpenAI's Assistants API because it saves 3 months of development, that's a valid trade-off. If you use it because you didn't think about abstraction, that's technical debt.

Common Pitfalls

  1. Provider SDK in business logic. openai.chat.completions.create() sprinkled across 50 files makes switching impossible. Use an abstraction layer.

  2. One prompt for all models. Prompts are model-specific. Maintain variants per model family. Accept the maintenance cost.

  3. Abstracting everything. Over-abstraction adds complexity. Abstract the interface, not the internals.

  4. No evaluation suite. Without automated evals, you can't know if a model switch degrades quality until users complain.

  5. Ignoring embedding lock-in. Switching embedding models requires re-embedding every document. Plan for this cost.

  6. Accidental lock-in. The worst kind. Using provider-specific features without realizing you're creating a dependency.

Key Takeaways

  • Abstract the interface, not the implementation. A unified LlmProvider interface with provider-specific adapters. Business logic calls the interface.

  • Prompts are not portable. Maintain prompt variants per model family. A single prompt across models means degraded quality somewhere.

  • Route different tasks to different models. Classification to a fast model, generation to an accurate model, embeddings to a consistent model. Config-driven, not code-driven.

  • Evaluate against tasks, not models. If your eval suite tests behavior ("does it reference the order number?") rather than format, you can switch models confidently.

  • Embedding is the hardest lock-in to break. Store source documents. Track which model was used. Budget for re-embedding when switching.

  • Sometimes lock-in is correct. Fine-tuning, volume discounts, and compliance may justify it. Make it a conscious decision.

We design provider-agnostic AI architectures as part of our AI services and consulting practice. If you need help with AI architecture, talk to our team or request a quote.

Topics covered

AI vendor lock-in · LLM abstraction layer · multi-model strategy · AI provider independence · model routing · prompt portability · AI architecture
