What we wrote down so you don't have to repeat it.
- № 48
If you can't eval it, don't ship it
Evals are not the thing you add after launch. They are the thing that tells you whether launching is a good idea.
read · 10 min → - № 47
Your agent is a cronjob. Name it that.
Half the 'agent architectures' we audit are a cronjob with an LLM call and a retry loop. That is a good thing.
read · 7 min → - № 46
Wishlists with a Gantt chart glued on
Most AI roadmaps we see are 14 features with a velocity assumption. The fix is not better estimation.
read · 11 min → - № 45
Stop benchmarking on Wikipedia
Your retrieval benchmark is lying to you if it's on a corpus your model has seen.
read · 9 min → - № 44
Eval-driven development
Write the eval before you write the prompt. Run the eval before you ship the feature. Re-run the eval before you deploy the change. Evals are the tests of the AI era.
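The workflow above can be sketched in a few lines. This is a minimal illustration, not a prescribed framework: `generate` and `run_eval` are hypothetical names, and the canned model stands in for a real LLM call.

```python
# Eval-first sketch: the eval cases and the pass/fail gate exist
# before any prompt does. Swap `generate` for your real model call.

EVAL_CASES = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def generate(prompt: str) -> str:
    # Placeholder model so the sketch runs end to end.
    canned = {"2 + 2": "4", "capital of France": "Paris"}
    return canned.get(prompt, "")

def run_eval(model, cases, threshold=1.0):
    # Exact-match scoring; real evals usually need looser graders.
    passed = sum(model(c["input"]).strip() == c["expected"] for c in cases)
    score = passed / len(cases)
    return score, score >= threshold

score, ship = run_eval(generate, EVAL_CASES)
print(f"pass rate: {score:.0%}, ship: {ship}")
```

The same `run_eval` gate then runs before every ship and every change, which is the whole point of the piece.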
read · 10 min → - № 43
The AI project that should have been a spreadsheet
Before you build an AI-powered solution, check whether the problem can be solved with a spreadsheet, a SQL query, or a simple rules engine. Often it can. And that is the better answer.
read · 8 min → - № 42
Monitoring AI systems is not monitoring APIs
HTTP 200 does not mean the answer was right. AI monitoring requires output quality metrics, not just uptime and latency.
read · 9 min → - № 41
AI governance is an engineering problem, not a legal one
Your legal team wrote an AI policy. It lives in a PDF. Nobody reads it. Governance that works is governance that is enforced in code — access controls, audit logs, output filters, eval gates.
read · 10 min → - № 40
The AI audit your board will eventually ask for
Sooner or later, someone — a board member, a regulator, a customer — will ask you to prove your AI systems are working correctly. Here is how to be ready before they ask.
read · 9 min → - № 39
Your AI vendor's pricing will change. Plan for it.
OpenAI has changed pricing four times in 18 months. Anthropic twice. Google three times. If your unit economics depend on current API pricing, they are fiction.
read · 9 min → - № 38
Regression suites for prompts
Every prompt change is a potential regression. If you do not have a test suite that runs before every prompt deployment, you are testing in production.
read · 9 min → - № 37
Caching LLM responses is not cheating
Semantic caching can cut your LLM costs by 40-60% and your latency by 90%. Most teams don't do it because it feels like they're 'not really using AI.' They are wrong.
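The mechanics of a semantic cache fit in a sketch. Everything here is illustrative: the character-trigram "embedding" is a toy stand-in for a real embedding model, and `SemanticCache` is a hypothetical name; only the lookup-by-similarity logic is the point.

```python
import math

# Toy semantic cache: store (embedding, response) pairs, answer a new
# query from the cache when its nearest stored query is similar enough.

def embed(text: str) -> dict:
    # Character-trigram counts as a stand-in for a real embedding model.
    text = text.lower()
    vec = {}
    for i in range(max(1, len(text) - 2)):
        g = text[i:i + 3]
        vec[g] = vec.get(g, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.entries = []  # list of (embedding, response)
        self.threshold = threshold

    def get(self, query):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]  # cache hit: no LLM call, near-zero latency
        return None

    def put(self, query, response):
        self.entries.append((embed(query), response))

cache = SemanticCache()
cache.put("What is your refund policy?", "Refunds within 30 days.")
print(cache.get("what is your refund policy"))   # near-duplicate: hit
print(cache.get("How do I reset my password?"))  # unrelated: None
```

Production versions use a real embedding model and a vector index, but the cost and latency savings come from exactly this shape of lookup.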
read · 8 min → - № 36
Stop hiring ML PhDs for engineering problems
Your AI product needs someone who can deploy a model, set up monitoring, and build a data pipeline. A PhD in machine learning is trained to do none of those things.
read · 9 min → - № 35
AI is not a department
The moment you put AI in a silo, you have guaranteed it will not work. AI is a capability that lives inside your existing teams, not a team that lives beside them.
read · 8 min → - № 34
The integration is harder than the model
Getting the model to produce the right output takes a week. Integrating that output into your existing systems, workflows, and user experience takes a quarter.
read · 9 min → - № 33
Your A/B test is lying because your baseline is moving
A/B testing AI features is harder than A/B testing traditional features because the model itself changes. Your control group is not constant. Your experiment is corrupted.
read · 10 min → - № 32
The latency budget your PM forgot
Your product spec says 'fast.' Your LLM call takes 3 seconds. Your retrieval takes 800ms. Your reranker takes 400ms. You are already at 4.2 seconds before any business logic.
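The arithmetic in the teaser is worth pinning down as an explicit budget check. The stage numbers are the teaser's own; the 2-second target is a made-up example of what "fast" might mean once someone writes it down.

```python
# Latency budget sketch. Stage latencies are the examples from the text;
# BUDGET_MS is a hypothetical target standing in for "fast".
BUDGET_MS = 2000

stages_ms = {
    "llm_call": 3000,
    "retrieval": 800,
    "reranker": 400,
}

total = sum(stages_ms.values())  # 4200 ms before any business logic
print(f"pipeline: {total} ms, budget: {BUDGET_MS} ms, "
      f"over by: {total - BUDGET_MS} ms")
```

Writing the budget down per stage is what turns "it feels slow" into a negotiation about which stage gives up milliseconds.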
read · 8 min → - № 31
AI teams need on-call. Not optional.
If your AI system is in production and nobody is on-call for it, you have decided that your users will be the ones who discover failures. That is a choice.
read · 9 min → - № 30
The build-vs-buy decision nobody wants to make
Build your own AI system or buy a vendor solution. Most teams agonize over this for months while doing neither. Here is the framework that cuts through it.
read · 9 min → - № 29
Build one pipeline well before building two
Your first AI pipeline teaches you how to operate AI systems. Your second pipeline benefits from everything you learned. Skip the first and the second will fail too.
read · 8 min → - № 28
Your test suite passed. Your system is still broken.
A passing test suite for an AI system is necessary but dangerously insufficient. The failures that hurt you are the ones your test suite was not designed to catch.
read · 10 min → - № 27
Fine-tuning is maintenance, not a one-time cost
The fine-tuning run is the easy part. The hard part is the data pipeline, the evaluation cadence, the retraining schedule, and the deployment workflow that follows.
read · 9 min → - № 26
The fractional AI leader your board is asking about
Your board wants AI leadership. You're not ready for a full-time hire. A fractional engagement buys you the strategic cover to figure out what you actually need.
read · 10 min → - № 25
When to kill an AI project
The hardest decision in AI is not what to build. It is what to stop building. Here are the five signals that a project should be killed, and why most teams see them too late.
read · 8 min → - № 24
Features your users didn't ask for and won't use
Your team is building AI features because AI is exciting, not because users need them. The tell: nobody can name the user who asked for it.
read · 9 min → - № 23
Three questions before you greenlight an AI project
Before you commit engineering time, budget, and political capital to an AI project, ask these three questions. If you cannot answer them, you are not ready to build.
read · 7 min → - № 22
Prompt versioning is not optional
If you cannot tell me which prompt was running in production last Thursday at 3pm, you cannot debug a regression. Prompts are code. Version them like code.
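Answering "which prompt was live last Thursday at 3pm" takes surprisingly little machinery. A minimal sketch, assuming a hypothetical `PromptRegistry` that records a content hash and a deployment timestamp for every prompt that ships:

```python
import hashlib
from datetime import datetime, timezone

# Minimal prompt registry: every deployed prompt gets a content hash
# and a timestamp, so "what was running at time T" is answerable.

class PromptRegistry:
    def __init__(self):
        self.deployments = []  # (deployed_at, version_hash, text)

    def deploy(self, text: str, at: datetime) -> str:
        version = hashlib.sha256(text.encode()).hexdigest()[:12]
        self.deployments.append((at, version, text))
        return version

    def active_at(self, when: datetime):
        # Latest deployment at or before `when`.
        live = [d for d in self.deployments if d[0] <= when]
        return max(live, key=lambda d: d[0]) if live else None

reg = PromptRegistry()
reg.deploy("Summarize the ticket in one sentence.",
           at=datetime(2024, 5, 1, tzinfo=timezone.utc))
reg.deploy("Summarize the ticket in two sentences.",
           at=datetime(2024, 5, 9, tzinfo=timezone.utc))

# What was live on May 2 at 3pm? The first prompt.
_, version, text = reg.active_at(datetime(2024, 5, 2, 15, 0, tzinfo=timezone.utc))
print(version, "->", text)
```

In practice the same record lives in git history plus a deployment log; the point is that the mapping from timestamp to prompt version exists at all.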
read · 8 min → - № 21
Your AI engineer is doing three jobs
Prompt engineering, data engineering, and ML engineering are three different skill sets. Your single 'AI engineer' is doing all three, badly. Split the role or accept the tradeoffs.
read · 11 min → - № 20
The 90% accuracy problem
90% accuracy means 1 in 10 answers is wrong. Whether that is acceptable depends entirely on what happens when the wrong answer ships.
read · 9 min → - № 19
Open-weights models don't eliminate vendor risk
Self-hosting an open model trades one kind of vendor risk for another. You still depend on someone's architecture decisions, training data, and update schedule.
read · 10 min → - № 18
Model migrations are database migrations
Switching models is not swapping an API key. It changes your outputs, your latency, your costs, and your eval results. Treat it with the same rigor as a database migration.
read · 9 min → - № 17
Your annual AI review should fit on one page
If you cannot summarize your AI program's impact in one page — what shipped, what it cost, what it changed — you do not understand your own program.
read · 7 min → - № 16
Multimodal is not a feature, it's a stack change
Adding image understanding to your AI product is not a feature flag. It changes your data pipeline, your eval suite, your storage, your latency budget, and your cost model.
read · 10 min → - № 15
The AI team that reported to product shipped. The one that reported to research didn't.
Reporting structure determines what gets built. AI teams that report to product build products. AI teams that report to research build papers. Choose the one you need.
read · 11 min → - № 14
The GPU bill is not the expensive part
Your AI system's real cost is the engineer debugging a hallucination at 2am, the product manager re-explaining the limitations to sales, and the trust you lose with every wrong answer.
read · 8 min → - № 13
The pilot that never graduated
Your AI pilot worked. The demo went great. Six months later it is still a pilot. Here is why pilots get stuck, and the three things that get them into production.
read · 9 min → - № 12
You don't need agents, you need a queue
Most 'agent' architectures we audit are a task queue with an LLM step. That is fine. Call it what it is, and you will make better infrastructure decisions.
read · 9 min → - № 11
Your competitor's AI press release is lying
They announced an AI-powered everything. You panicked. Here is why you should not, and what to do instead of chasing their press release.
read · 8 min → - № 10
Hire the infra engineer before the ML engineer
Your first AI hire should not be someone who trains models. It should be someone who can deploy them, monitor them, and wake up when they break.
read · 11 min → - № 09
Structured outputs don't fix structured thinking
JSON mode and function calling are great. But if the model doesn't understand what you're asking it to extract, you just get well-formatted garbage.
read · 8 min → - № 08
The demo is not the product
Getting an LLM to do the thing once in a notebook is the easy part. The hard part is getting it to do the thing reliably, at scale, for every user, on every edge case, at 3am.
read · 9 min → - № 07
Your board wants an AI strategy by Thursday
You just got the calendar invite. The board wants to know your AI strategy. Here is how to write one in 72 hours that is honest, actionable, and does not promise the moon.
read · 8 min → - № 06
Benchmarks are vanity metrics
MMLU, HellaSwag, HumanEval — these tell you which model wins a standardized test. They do not tell you which model works for your use case. Build your own benchmark or fly blind.
read · 9 min → - № 05
Your model is not your moat
The model is a commodity. The moat is the data pipeline, the eval suite, the deployment infrastructure, and the feedback loop. Most teams invest in the wrong layer.
read · 8 min → - № 04
You don't need a chief AI officer
The CAIO title is a signal that your org does not know where AI fits. The role you actually need depends on whether your problem is strategy, execution, or both.
read · 11 min → - № 03
The AI business case your CFO will actually approve
Most AI business cases fail because they promise transformation. The ones that get funded promise cost savings on a specific workflow with a measurable baseline.
read · 9 min → - № 02
Your AI strategy is a deck, not a system
Most AI strategies we review are a list of use cases with a timeline. That is not a strategy. A strategy is a set of bets with kill criteria.
read · 9 min → - № 01
The eval you skipped is the one that bites
Teams skip evals on exactly the features that need them most — the ones where 'correct' is hard to define. That difficulty is the signal, not the excuse.
read · 8 min →