№ 29 scope Jun 06, 2025 · 8 min read

Build one pipeline well before building two

Your first AI pipeline teaches you how to operate AI systems. Your second pipeline benefits from everything you learned. Skip the first and the second will fail too.


Your first AI pipeline is not a product. It is a lesson. The lesson is: this is what it takes to operate an AI system. If you try to learn that lesson twice, in parallel, you will learn it zero times.

The parallelization instinct

Teams under pressure do the same thing: they try to run three AI initiatives at once. The logic sounds reasonable. “We have a customer support use case, a document processing use case, and an internal search use case. They share infrastructure. We can parallelize.”

They cannot.

The three use cases do not share infrastructure — not yet. They share the aspiration of infrastructure. The actual infrastructure — the eval frameworks, the monitoring dashboards, the cost tracking, the incident response playbooks, the deployment workflows — does not exist. It will be built during the first project. If you are running three projects, it will be built three times, by three sub-teams, in three incompatible ways.

We have watched this happen at two companies in the last year. Both had competent engineering teams. Both launched three AI workstreams simultaneously. Both ended up with three half-built systems, three incomplete eval suites, and zero production deployments after six months.

What the first pipeline teaches you

Your first pipeline to production teaches you things you cannot learn from a blog post, a conference talk, or a vendor demo. You learn them by shipping.

How to build evals for your domain. Not evals in the abstract — evals that measure the thing your users care about, using data that reflects your actual distribution. This takes iteration. Your first eval set will be wrong. You will measure the wrong thing, or measure the right thing with the wrong metric, or measure with the right metric on the wrong data. It takes two or three rounds before you have an eval suite you trust.
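A domain eval suite does not need to start as anything fancier than a list of real queries paired with checks for the thing users care about. The sketch below is illustrative only — `answer`, the golden set, and the substring checks are hypothetical placeholders for your pipeline and your distribution:

```python
# Minimal sketch of a domain eval harness (all names are illustrative).
# The point: evals are (example, check) pairs drawn from your actual
# distribution, scored against your pipeline, iterated until trusted.

def answer(query: str) -> str:
    """Stand-in for the real pipeline under test."""
    canned = {"What is our refund window?": "30 days from delivery."}
    return canned.get(query, "I don't know.")

# Golden set: real user queries plus the property the output must satisfy.
GOLDEN_SET = [
    {"query": "What is our refund window?", "must_contain": "30 days"},
    {"query": "Do we ship to Mars?", "must_contain": "I don't know"},
]

def run_evals(golden_set):
    results = []
    for case in golden_set:
        output = answer(case["query"])
        results.append({
            "query": case["query"],
            "passed": case["must_contain"].lower() in output.lower(),
        })
    pass_rate = sum(r["passed"] for r in results) / len(results)
    return pass_rate, results

pass_rate, results = run_evals(GOLDEN_SET)
```

The two or three rounds of iteration mostly happen in the golden set and the checks, not the harness — substring checks give way to rubric scores or judge models once you know what "good" means for your users.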

How to monitor an AI system. Not just uptime and latency — the metrics that matter for a non-deterministic system. Output quality scores. Hallucination rates. Retrieval recall. Token costs per query. User satisfaction signals. You will not know which of these matter most until you are watching them in production. Different use cases have different critical metrics, but the monitoring infrastructure is reusable.
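The reusable part of that monitoring infrastructure is the shape of the record you keep per query; which fields become alerting thresholds is what production teaches you. A sketch, with illustrative field names:

```python
# Per-query metrics record for a non-deterministic system. Field names
# are illustrative; the record-and-aggregate shape is what transfers
# between pipelines.
from dataclasses import dataclass
from statistics import mean
from typing import Optional

@dataclass
class QueryMetrics:
    latency_ms: float
    quality_score: float        # e.g. 0-1 from an automated judge
    retrieval_recall: float     # fraction of gold documents retrieved
    token_cost_usd: float
    user_thumbs_up: Optional[bool]  # explicit satisfaction signal, if any

def summarize(window: list) -> dict:
    """Aggregate a window of queries into dashboard-ready numbers."""
    return {
        "avg_quality": mean(m.quality_score for m in window),
        "avg_recall": mean(m.retrieval_recall for m in window),
        "cost_per_query": mean(m.token_cost_usd for m in window),
    }

window = [
    QueryMetrics(120.0, 0.8, 1.0, 0.010, True),
    QueryMetrics(340.0, 1.0, 0.5, 0.030, None),
]
summary = summarize(window)
```

Adding a second use case then means adding fields or a new dashboard panel, not rebuilding the pipeline that collects the records.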

How to handle model updates. The base model changes. The API changes. The pricing changes. Your system breaks in a way your test suite did not cover, because the model’s behavior shifted subtly. The first time this happens, it is a crisis. The second time, it is a process. You need the first time to build the process.
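What "the second time, it is a process" tends to look like in practice: model swaps go through the eval suite as a gate instead of being discovered in production. A hypothetical sketch — `run_eval_suite` and the model names are stand-ins for your own suite and registry:

```python
# Sketch of a model-update gate (names are hypothetical). Instead of
# letting a base-model change hit production, promote the candidate
# only if it clears your own domain evals.

def run_eval_suite(model_name: str) -> float:
    """Stand-in: returns the eval pass rate for a model version."""
    scores = {"current-model-v1": 0.92, "candidate-model-v2": 0.85}
    return scores[model_name]

def safe_to_switch(current: str, candidate: str,
                   max_regression: float = 0.02) -> bool:
    # Tolerate a small regression; block anything larger, since subtle
    # behavior shifts are exactly what generic test suites miss.
    return run_eval_suite(candidate) >= run_eval_suite(current) - max_regression

ok = safe_to_switch("current-model-v1", "candidate-model-v2")
# Here the candidate regresses by 0.07 > 0.02, so the gate blocks it.
```

The gate is only as good as the eval suite behind it, which is another reason the first pipeline's eval iteration pays for itself.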

How to manage costs. AI system costs are not like traditional compute costs. They are per-token, per-request, and they scale with usage in ways that are hard to predict before you have real traffic. Your first pipeline teaches you how to forecast, how to set budgets, how to optimize — cache layers, prompt compression, model routing. These learnings transfer directly to every subsequent pipeline.
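The forecasting itself is back-of-the-envelope arithmetic: per-token price times projected traffic, adjusted for optimizations like caching. The prices and volumes below are made-up placeholders; the shape of the calculation is what transfers:

```python
# Per-token cost forecast (all prices and volumes are illustrative).

def monthly_cost_usd(queries_per_day: int,
                     input_tokens: int, output_tokens: int,
                     usd_per_1m_in: float, usd_per_1m_out: float,
                     cache_hit_rate: float = 0.0) -> float:
    per_query = (input_tokens * usd_per_1m_in
                 + output_tokens * usd_per_1m_out) / 1_000_000
    # Cached hits cost roughly nothing; only misses hit the model.
    effective_queries = queries_per_day * (1 - cache_hit_rate)
    return round(effective_queries * per_query * 30, 2)

# Example: 10k queries/day, 2k input / 500 output tokens per query.
baseline = monthly_cost_usd(10_000, 2_000, 500, usd_per_1m_in=3.0,
                            usd_per_1m_out=15.0)
with_cache = monthly_cost_usd(10_000, 2_000, 500, usd_per_1m_in=3.0,
                              usd_per_1m_out=15.0, cache_hit_rate=0.4)
```

Model routing and prompt compression slot into the same arithmetic — they change `per_query`, while caching changes `effective_queries`.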

How to respond to incidents. Your AI system will produce a bad output that reaches a user. What happens next? Who gets paged? How do you diagnose the root cause — was it the retrieval, the prompt, the model, or the data? How do you roll back? How do you communicate to the user? These playbooks take time to write and they only get written in response to real incidents.
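Part of that playbook can eventually be encoded as a triage step: walk the pipeline stages in order and flag the first suspect. The predicate names below are illustrative stand-ins for the signals your real traces would expose:

```python
# Sketch of one playbook step as code: given the trace of a bad output,
# check pipeline stages in order and report the first suspect component.
# The trace fields are hypothetical; real ones come from your tracing.

def diagnose(trace: dict) -> str:
    # Check upstream-to-downstream: data -> retrieval -> prompt -> model.
    if trace.get("source_docs_stale"):
        return "data"
    if trace.get("retrieval_recall", 1.0) < 0.5:
        return "retrieval"
    if trace.get("prompt_version_changed"):
        return "prompt"
    # Nothing upstream looks wrong; suspect model behavior.
    return "model"

root_cause = diagnose({"retrieval_recall": 0.2})
```

The rest of the playbook — who gets paged, how to roll back, what to tell the user — stays prose, but it gets written the same way: in response to real incidents on the first pipeline.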

The compounding effect

Every one of these learnings — evals, monitoring, model updates, cost management, incident response — compounds. The second pipeline benefits from all of them. The eval framework is reusable. The monitoring dashboards need one new panel, not a new dashboard. The cost management patterns transfer. The incident playbook gets a new section, not a new playbook.

A team that builds one pipeline to production and then starts the second pipeline ships the second one in half the time. We have seen this consistently. The first pipeline takes 3-4 months. The second takes 6-8 weeks. Not because the second is simpler — because the team knows what they are doing.

A team that builds two pipelines in parallel ships neither in 3-4 months. They ship both at month six or later, if at all, because they are learning every lesson twice and building every piece of infrastructure twice.

The objection

“But we are under pressure to show results across multiple use cases.”

Yes. And the fastest way to show results across multiple use cases is to ship one use case quickly and well, then use the infrastructure and learnings to ship the next two in rapid succession.

Shipping one thing in 3 months and two more things in the following 2 months — that is 3 things shipped in 5 months. Shipping three things in parallel and landing all three at month 6 — if you land them at all — that is 3 things in 6 months. The sequential approach is faster. It is also less risky, because each subsequent pipeline benefits from the lessons of the ones before it.

The uncomfortable truth: parallelizing AI initiatives is not a strategy for moving fast. It is a strategy for looking busy.

How to pick the first one

The first pipeline should not be the most important use case. It should be the one that teaches you the most while carrying the least risk.

Pick the use case that is:

  • Internal-facing, so failures are embarrassing, not catastrophic.
  • Measurable, so you can build evals that actually tell you something.
  • Small enough to ship in 6-8 weeks with a small team.
  • Representative enough that the infrastructure you build will transfer.

Internal document search is often a good first pipeline. It hits retrieval, generation, evaluation, monitoring, and cost management. It has real users who give real feedback. And if it hallucinates, nobody sues you.

The heuristic

One pipeline to production. Then scale the learnings. The first pipeline is the tuition — you are paying to learn how your organization operates AI systems. Do not pay tuition twice.

tl;dr

The pattern. Teams under pressure launch three AI workstreams simultaneously and end up with three half-built systems, three incompatible eval frameworks, and zero production deployments after six months.

The fix. Ship one internal-facing, measurable pipeline to production first, and use the evals, monitoring, cost patterns, and incident playbooks you build there as the foundation for every pipeline that follows.

The outcome. The second pipeline ships in half the time because the hard operational lessons were paid for once, not learned in parallel by three sub-teams.

