Open-weights models don't eliminate vendor risk
Self-hosting an open model trades one kind of vendor risk for another. You still depend on someone's architecture decisions, training data, and update schedule.
January 2025. DeepSeek R1 drops. It is very good. It is open-weights. The discourse immediately shifts to: “We can finally stop depending on OpenAI.”
We heard this from three clients in the same week. The reasoning was always the same — if we self-host, we control our destiny. No more API price changes. No more rate limits. No more worrying about a provider deprecating the model we built on.
The reasoning is correct about what it eliminates. It is wrong about what it introduces.
What vendor risk actually means
Vendor risk is not “I pay someone money and they might raise the price.” That is price risk. Vendor risk is broader — it is the set of things you depend on that you do not control.
When you use GPT-4 via API, your vendor risks include: pricing changes, rate limits, model deprecation, data handling policies, uptime, and the model’s behavior changing between versions.
When you self-host an open-weights model, you eliminate the first set. But you pick up a new one.
The new risk surface
Training data risk. You did not curate the training data. You do not know what is in it. You do not know what biases it carries. You do not know if it was trained on data that creates legal exposure for your use case. The model card might tell you some of this. It will not tell you all of it.
Architecture risk. The model’s architecture was chosen by someone else, for their objectives. Those objectives may not match yours. A model optimized for reasoning benchmarks may not be the right choice for your customer-facing summarization pipeline. You cannot change the architecture without retraining — which you almost certainly are not going to do.
Update cadence risk. Open-weights models do not have an SLA for improvements. The creator might release an update next month, next year, or never. If a critical capability gap emerges, you are on your own. With an API provider, you can at least file a support ticket. With an open model, you file a GitHub issue and hope.
Infra risk. This is the big one. Self-hosting a model means your team is now responsible for GPU procurement, serving infrastructure, autoscaling, latency optimization, and uptime. Your team was not doing this before. They may not be good at it yet. The model is free. The compute is not. The ops burden is definitely not.
Talent risk. You now need people who understand model serving, quantization tradeoffs, inference optimization, and GPU cluster management. These people are expensive and hard to find. If your one ML infra person leaves, you have a production system that nobody knows how to operate.
The pattern we see
Here is what typically happens. A team decides to self-host. They get the model running on a single GPU instance. It works. They deploy it. The first week is fine.
Then traffic grows. Latency increases. They need to scale. Scaling means load balancing inference requests across multiple GPU instances. That means building or adopting a serving framework — vLLM, TGI, Triton. Each has its own operational complexity.
Then someone asks about failover. Then someone asks about model versioning. Then someone asks about A/B testing between model versions. Then someone asks about monitoring for quality regressions.
Six months in, they have built a small platform team around model serving. The platform team is three people. They are doing important work. But the original goal was to reduce dependency, and now the team depends on an internal platform that three people built and only three people understand.
They traded vendor risk for internal platform risk. Which is, in some ways, worse — because when your vendor has an outage, you can blame the vendor. When your internal platform has an outage, you just have an outage.
The false binary
The mistake is framing this as “API vs. self-host.” That is the wrong axis. The right axis is “how expensive is it to switch?”
If switching models costs you six months of re-engineering, you have vendor risk — regardless of whether the model is open or closed, hosted or self-hosted. If switching models costs you a config change and a round of evals, you have almost no vendor risk.
The risk is not in the model. The risk is in the coupling.
What actually reduces vendor risk
Abstraction layers. Your application code should not know which model it is calling. It should call a function that returns a response. Which model serves that response is a deployment decision, not an application decision. This is not a new idea. It is the same reason you put a load balancer in front of your web servers.
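A minimal sketch of such a layer in Python. Every name here (`complete`, `BACKENDS`, the `MODEL_BACKEND` env var) is hypothetical, not any specific library's API; the point is only that application code calls one function and the backend is chosen by deployment config:

```python
# A minimal model-abstraction layer: application code calls complete();
# which backend serves the request is a deployment-time decision.
# All names here are illustrative, not any specific library's API.
import os
from typing import Callable, Dict

# Registry mapping backend names to callables that take a prompt
# and return a completion string.
BACKENDS: Dict[str, Callable[[str], str]] = {}

def register(name: str):
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        BACKENDS[name] = fn
        return fn
    return wrap

@register("hosted-api")
def hosted_api(prompt: str) -> str:
    # Stub: in production this would call a provider's SDK.
    return f"[hosted] {prompt}"

@register("self-hosted")
def self_hosted(prompt: str) -> str:
    # Stub: in production this would hit your own serving endpoint.
    return f"[local] {prompt}"

def complete(prompt: str) -> str:
    # The backend is read from deployment config (here an env var),
    # never hard-coded in application logic.
    backend = os.environ.get("MODEL_BACKEND", "hosted-api")
    return BACKENDS[backend](prompt)
```

With this shape, switching from the hosted API to a self-hosted model is literally a config change, which is the property the rest of this section depends on.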
Eval-driven model selection. Build your eval suite first. Run every candidate model through it. Pick the one that scores best on your specific tasks. When a new model drops — open or closed — run it through the same evals. If it wins, switch. If it doesn’t, don’t.
This only works if your evals are good and your switching cost is low. Which means the abstraction layer is not optional — it is the thing that makes eval-driven selection practical.
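The selection loop itself is simple once the abstraction exists. A hedged sketch, with toy models standing in for real ones and a deliberately trivial grading scheme (real eval suites are task-specific and far richer):

```python
# Sketch of eval-driven model selection: run every candidate through
# the same suite, pick the top scorer. Toy models and grading only.
from typing import Callable, Dict, List, Tuple

# An eval case pairs an input with a grading function over the output.
EvalCase = Tuple[str, Callable[[str], bool]]

def score(model: Callable[[str], str], suite: List[EvalCase]) -> float:
    # Fraction of eval cases whose output passes its grader.
    passed = sum(1 for prompt, grade in suite if grade(model(prompt)))
    return passed / len(suite)

def select(candidates: Dict[str, Callable[[str], str]],
           suite: List[EvalCase]) -> str:
    # Every candidate, open or closed, runs through the same suite;
    # the winner is whichever scores best on your specific tasks.
    return max(candidates, key=lambda name: score(candidates[name], suite))
```

When a new model drops, you add it to `candidates`, rerun, and switch only if it wins. The model choice becomes an output of the process, not an input to it.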
Multi-model architectures. Use different models for different tasks. Your summarization pipeline might run on a self-hosted open model because latency and cost matter more than peak quality. Your complex reasoning tasks might run on a frontier API model because quality matters more than cost. Different risk profiles for different tasks.
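In code, this can be as small as a routing table. The task names and model names below are invented for illustration:

```python
# Illustrative task-based routing: each task class maps to the model
# whose cost/latency/quality profile fits it. Names are hypothetical.
ROUTES = {
    "summarize": "self-hosted-open",  # latency and cost dominate
    "reason": "frontier-api",         # peak quality dominates
}

def route(task: str) -> str:
    # Unknown tasks fall back to the highest-quality option.
    return ROUTES.get(task, "frontier-api")
```

The table is boring on purpose: the risk decision lives in data, where it can be reviewed and changed per task, rather than scattered through application code.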
Prompt portability. Write prompts that are not tightly coupled to a specific model’s behavior. This is harder than it sounds — every model has quirks, and tuning prompts to exploit those quirks makes you more dependent, not less. The prompts that port well are the ones that are clear, structured, and rely on general capabilities rather than model-specific tricks.
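What a portable prompt looks like in practice, as a hedged example (the template and task are invented): it states the role, the input, and the exact output shape, and avoids phrasing tuned to any one model's quirks.

```python
# A portable prompt leans on explicit structure and general
# capabilities, not model-specific tricks. Template is illustrative.
PORTABLE_TEMPLATE = """You are summarizing a support ticket.

Ticket:
{ticket}

Respond with exactly three bullet points:
- the customer's problem
- what has already been tried
- the requested next step
"""

def render(ticket: str) -> str:
    # The same rendered string is sent to whichever backend is
    # configured; only the model behind it changes.
    return PORTABLE_TEMPLATE.format(ticket=ticket)
```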
The honest tradeoff
Self-hosting open-weights models is a legitimate strategy. It gives you control over data residency, inference costs at scale, and availability. These are real benefits.
But it does not eliminate vendor risk. It transforms it. You trade dependency on an API provider for dependency on a model creator’s training decisions, an open-source community’s update cadence, and your own team’s ability to operate GPU infrastructure.
For some organizations, that is the right trade. For others — particularly those without existing ML infrastructure teams — it creates more risk than it eliminates.
The heuristic: if you cannot name the three people who will operate your self-hosted model infrastructure, and what happens when one of them leaves, you are not ready to self-host. Start with APIs, invest in abstraction layers, and build the infra muscle before you take on the infra burden.
tl;dr
The pattern. Teams self-host open-weights models to escape API vendor risk and end up trading it for infra risk, training data risk, and internal platform risk — while building a three-person GPU ops team that only those three people understand.

The fix. Invest in abstraction layers and low-switching-cost architecture so that whether you use an API or a self-hosted model is a deployment decision, not an application decision.

The outcome. You can move between models based on eval results rather than lock-in, and the question of open versus closed weights becomes a cost-and-control tradeoff instead of an existential one.