The pilot that never graduated
Your AI pilot worked. The demo went great. Six months later it's still a pilot. Here's why pilots get stuck, and the three things that get them into production.
The demo went great. The pilot hit its accuracy targets. The stakeholders were impressed. Someone said “this is a game-changer” in a meeting, and they meant it. That was six months ago.
The pilot is still a pilot. It runs on a laptop. Or a notebook in someone’s personal cloud account. Or a prototype environment that nobody monitors. A handful of users test it occasionally. It kind of works. Nobody has a plan to move it to production. Nobody is quite sure whose job that is.
This is the most common outcome for AI pilots. Not failure — limbo. The pilot works well enough that nobody kills it. It doesn’t work well enough — or isn’t integrated enough — to run as a real system. It just sits there, consuming attention and budget, never quite graduating.
We see this at nearly every company that’s past the “should we do AI” conversation. They have pilots. What they don’t have is production systems. The gap between the two is where most AI investment goes to die.
Why pilots get stuck
The problem is almost never technical. The pilot proved the technology works. The problem is organizational — a set of missing decisions that nobody made because the pilot was “just an experiment.”
No production owner. The pilot was built by the AI team, the innovation team, or a couple of engineers who were interested. None of these people run production systems. When the pilot is “done,” there’s nobody whose job it is to operate it. The AI team moves on to the next experiment. The platform team wasn’t involved and doesn’t want to adopt a system they didn’t build. The pilot sits in limbo because nobody owns what happens next.
This is the single most common reason pilots fail to graduate. Ownership. The team that builds a pilot is almost never the team that should run it in production. And if you don’t figure out that handoff before the pilot starts, you won’t figure it out after.
No success criteria defined upfront. The pilot was approved with a vague mandate: “explore whether AI can help with X.” There were no specific metrics, no thresholds, no definition of what “works” means. The pilot produced results. Some were good. Some were mediocre. Nobody knows whether the pilot succeeded because nobody agreed on what success looked like.
Without success criteria, you can’t make a go/no-go decision. And without a go/no-go decision, the pilot just continues. It’s easier to keep running a pilot than to declare it a success or a failure. So it runs. And runs. And runs.
No integration plan. The pilot runs in isolation. It takes manual input, produces output, and someone copies the output into the real system. In the pilot phase, this is fine — you’re testing the AI, not the integration. But in production, the integration is the product. The AI model is maybe 20% of the work. The other 80% is getting data in, getting results out, handling errors, monitoring quality, and fitting into the existing workflow.
Most teams don’t think about integration until the pilot is “done.” Then they discover it’s a 3-month engineering project to connect the pilot to the systems it needs to talk to. The 3-month estimate kills momentum. The pilot stays a pilot.
The people problem. The person who championed the pilot got promoted. Or moved teams. Or left the company. The pilot lost its advocate. Nobody else cares enough to push it through the organizational friction of getting to production. Pilots need a champion, and champions have a half-life.
The pilot tax
Here’s what nobody talks about: running a pilot is often more expensive than running the production version.
A pilot requires manual intervention. Someone feeds it inputs. Someone reviews outputs. Someone restarts it when it crashes. Someone explains to stakeholders why the results are different this week. All of this is human time — untracked, unbudgeted, invisible in the cost model.
A production system, by contrast, is automated. It has monitoring. It has error handling. It has a runbook. It’s less work per unit of output because someone invested the time to make it self-sufficient.
The pilot tax is real, and it compounds. Every month a pilot runs, you’re paying the operational cost of a prototype — which is higher than the operational cost of a production system — while getting the limited value of a system that only a few people use. You’re paying more for less.
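To make the compounding concrete, here is a back-of-the-envelope cost comparison. Every number below is an illustrative assumption, not a benchmark; plug in your own hours and rates.

```python
# Illustrative monthly cost: running a pilot vs. a production system.
# All numbers are assumptions for the sake of the sketch.

HOURLY_RATE = 100  # loaded cost of an engineer-hour (assumed)

# Pilot: manual feeding, output review, restarts, stakeholder explanations.
pilot_manual_hours_per_month = 30   # assumed
pilot_infra_per_month = 200         # a notebook VM, some API calls (assumed)
pilot_monthly = pilot_manual_hours_per_month * HOURLY_RATE + pilot_infra_per_month

# Production: one-time hardening cost amortized, then mostly automated.
hardening_cost = 40_000             # the "3-month project", as a one-off (assumed)
amortization_months = 24
prod_ops_hours_per_month = 4        # on-call, runbook upkeep (assumed)
prod_infra_per_month = 500          # monitored, redundant infra (assumed)
prod_monthly = (hardening_cost / amortization_months
                + prod_ops_hours_per_month * HOURLY_RATE
                + prod_infra_per_month)

print(f"pilot:      ${pilot_monthly:,.0f}/month")
print(f"production: ${prod_monthly:,.0f}/month")
```

With these assumptions the pilot costs more per month than the hardened system, before counting the value gap: the pilot serves a handful of users while the production system serves everyone.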
This is the argument for graduating or killing. There is no cost-effective middle ground. A pilot that deserves to exist deserves to be in production. A pilot that doesn’t deserve to be in production doesn’t deserve to exist.
The three things that get a pilot into production
We’ve helped teams graduate about 30 AI pilots over the past few years. The ones that make it share three properties. All three are set before the pilot starts, not after.
1. Define graduation criteria before you start
Before the pilot begins, write down what “done” looks like. Not “the AI works” — specific, measurable criteria that trigger the decision to move to production.
“The model classifies incoming tickets with at least 87% accuracy on a held-out test set of 200 tickets, measured weekly for 4 consecutive weeks.”
“Processing time per document drops from 8 minutes to under 2 minutes, with no increase in error rate above the current 4% baseline.”
“Three out of five pilot users rate the system ‘useful’ or ‘very useful’ in the exit survey, and provide specific examples of time saved.”
These criteria serve two purposes. First, they force you to define what matters before you’re emotionally invested in the outcome. Second, they create an automatic trigger for the graduation decision. When the criteria are met, you move to production. When they’re not met, you either iterate with a deadline or kill the pilot. There’s no “let’s keep running it and see.”
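The first criterion above is concrete enough to check mechanically. A minimal sketch, with the threshold and window taken from that example and the function name invented for illustration:

```python
# Sketch of the first criterion encoded as an automated check:
# "at least 87% accuracy, measured weekly for 4 consecutive weeks."
# Names and structure are illustrative, not a prescribed framework.

THRESHOLD = 0.87
REQUIRED_CONSECUTIVE_WEEKS = 4

def meets_graduation_criteria(weekly_accuracy: list[float]) -> bool:
    """True if the most recent 4 weekly measurements all clear the bar."""
    recent = weekly_accuracy[-REQUIRED_CONSECUTIVE_WEEKS:]
    if len(recent) < REQUIRED_CONSECUTIVE_WEEKS:
        return False  # not enough data yet: keep measuring
    return all(acc >= THRESHOLD for acc in recent)

print(meets_graduation_criteria([0.84, 0.88, 0.89, 0.87, 0.91]))  # True
print(meets_graduation_criteria([0.88, 0.89, 0.86, 0.91]))        # False
```

The point is not the code; it's that a criterion you can't express this precisely isn't a criterion, it's a hope.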
2. Assign a production owner from day one
On the first day of the pilot, name the person or team who will own this system in production. Not after the pilot. Not when you “get closer to launch.” Day one.
This person attends the pilot standups. They see how the system works. They understand the data pipeline, the failure modes, the monitoring needs. When the pilot graduates, the handoff is smooth because the production owner has been involved the entire time.
If you can’t name a production owner, that’s a signal. It means either nobody wants to own this in production — which suggests the system isn’t valuable enough to build — or the organizational structure doesn’t support it — which suggests you have a bigger problem than the pilot.
The production owner doesn’t have to build the pilot. They have to be ready to run it. That’s a different skill set and a different commitment. Clarifying this upfront avoids the most common handoff failure: “the AI team built this cool thing and now they want us to support it but we have no idea how it works.”
3. Set a deadline — and mean it
Pilots without deadlines become permanent. The default is entropy: the pilot keeps running, the team keeps tweaking, nobody makes the hard decision.
Set a deadline. 8 weeks. 12 weeks. Whatever’s appropriate. At the deadline, one of three things happens:
Graduate. The criteria are met. Move to production. This is a project with a budget, a timeline, and a team — not “we’ll get to it eventually.”
Iterate with a new deadline. The criteria are close but not met. You see a clear path to getting there. Set a new deadline — no more than 4 weeks — with specific changes to make. This happens once. Not twice. If you’re on your third iteration deadline, the pilot is telling you something.
Kill. The criteria are not met and there’s no clear path to meeting them. Kill the pilot. This is not a failure — it’s a decision. You learned that this use case doesn’t work with current technology, current data, or current organizational capacity. That’s valuable information. Document it, archive the code, move on.
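The three outcomes above reduce to a small decision function. A hedged sketch: the inputs (`criteria_met`, `clear_path`, `iterations`) are names invented here; in practice they come from your metrics and your judgment.

```python
# The three deadline outcomes, sketched as one decision function.
# "iterations" counts how many extension deadlines you've already granted.

def deadline_decision(criteria_met: bool, clear_path: bool, iterations: int) -> str:
    if criteria_met:
        return "graduate"  # move to production, with budget, timeline, team
    if clear_path and iterations < 1:
        return "iterate"   # one new deadline, no more than 4 weeks
    return "kill"          # document the learning, archive the code, move on

print(deadline_decision(criteria_met=True,  clear_path=False, iterations=0))  # graduate
print(deadline_decision(criteria_met=False, clear_path=True,  iterations=0))  # iterate
print(deadline_decision(criteria_met=False, clear_path=True,  iterations=1))  # kill
```

Note the hard-coded `iterations < 1`: the "this happens once, not twice" rule is in the logic, not left to a meeting.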
Killing a pilot is hard. Nobody wants to be the person who pulls the plug on something the CEO saw a demo of. But running a pilot forever is worse — it costs more, delivers less, and blocks the team from working on something that might actually make it to production.
The honest conversation
Before you start your next AI pilot, have this conversation: “If this pilot works, who runs it in production, and what does ‘works’ mean specifically?”
If you can’t answer both questions, you’re not ready for a pilot. You’re ready for a research spike — a time-boxed exploration with no expectation of production. That’s fine. Research spikes are valuable. But call them what they are. Don’t call it a pilot unless you’re prepared to graduate it.
The word “pilot” implies a path to production. If there’s no path, it’s not a pilot. It’s a demo that never stops demoing. And your organization already has enough of those.
tl;dr
The pattern. AI pilots succeed in demos but never reach production because nobody defined success criteria, assigned a production owner, or set a deadline. The fix. Before the pilot starts, write graduation criteria, name the production owner, and set a hard deadline for the go/no-go decision. The outcome. Pilots either graduate to production and deliver real value, or get killed quickly — either way, the team stops paying the pilot tax.