What we wrote down so you don't have to repeat it.
- № 48
If you can't eval it, don't ship it
Evals are not the thing you add after launch. They are the thing that tells you whether launching is a good idea.
read · 10 min → - № 47
Your agent is a cronjob. Name it that.
Half the 'agent architectures' we audit are a cronjob with an LLM call and a retry loop. That is a good thing.
read · 7 min → - № 46
Wishlists with a Gantt chart glued on
Most AI roadmaps we see are 14 features with a velocity assumption. The fix is not better estimation.
read · 11 min → - № 45
Stop benchmarking on Wikipedia
Your retrieval benchmark is lying to you if it's on a corpus your model has seen.
read · 9 min → - № 44
Eval-driven development
Write the eval before you write the prompt. Run the eval before you ship the feature. Re-run the eval before you deploy the change. Evals are the tests of the AI era.
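The workflow above can be sketched in a few lines. This is a minimal illustration, not a prescribed framework: `generate` and `run_eval` are hypothetical names, and the canned model stands in for a real LLM call.

```python
# Eval-first sketch: the eval cases and the pass/fail gate exist
# before any prompt does. Swap `generate` for your real model call.

EVAL_CASES = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def generate(prompt: str) -> str:
    # Placeholder model so the sketch runs end to end.
    canned = {"2 + 2": "4", "capital of France": "Paris"}
    return canned.get(prompt, "")

def run_eval(model, cases, threshold=1.0):
    # Exact-match scoring; real evals usually need looser graders.
    passed = sum(model(c["input"]).strip() == c["expected"] for c in cases)
    score = passed / len(cases)
    return score, score >= threshold

score, ship = run_eval(generate, EVAL_CASES)
print(f"pass rate: {score:.0%}, ship: {ship}")
```

The same `run_eval` gate then runs before every ship and every change, which is the whole point of the piece.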
read · 10 min → - № 43
The AI project that should have been a spreadsheet
Before you build an AI-powered solution, check whether the problem can be solved with a spreadsheet, a SQL query, or a simple rules engine. Often it can. And that is the better answer.
read · 8 min → - № 42
Monitoring AI systems is not monitoring APIs
HTTP 200 does not mean the answer was right. AI monitoring requires output quality metrics, not just uptime and latency.
read · 9 min → - № 41
AI governance is an engineering problem, not a legal one
Your legal team wrote an AI policy. It lives in a PDF. Nobody reads it. Governance that works is governance that is enforced in code — access controls, audit logs, output filters, eval gates.
read · 10 min → - № 40
The AI audit your board will eventually ask for
Sooner or later, someone — a board member, a regulator, a customer — will ask you to prove your AI systems are working correctly. Here is how to be ready before they ask.
read · 9 min → - № 39
Your AI vendor's pricing will change. Plan for it.
OpenAI has changed pricing four times in 18 months. Anthropic twice. Google three times. If your unit economics depend on current API pricing, they are fiction.
read · 9 min → - № 38
Regression suites for prompts
Every prompt change is a potential regression. If you do not have a test suite that runs before every prompt deployment, you are testing in production.
read · 9 min → - № 37
Caching LLM responses is not cheating
Semantic caching can cut your LLM costs by 40-60% and your latency by 90%. Most teams don't do it because it feels like they're 'not really using AI.' They are wrong.
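The mechanics of a semantic cache fit in a sketch. Everything here is illustrative: the character-trigram "embedding" is a toy stand-in for a real embedding model, and `SemanticCache` is a hypothetical name; only the lookup-by-similarity logic is the point.

```python
import math

# Toy semantic cache: store (embedding, response) pairs, answer a new
# query from the cache when its nearest stored query is similar enough.

def embed(text: str) -> dict:
    # Character-trigram counts as a stand-in for a real embedding model.
    text = text.lower()
    vec = {}
    for i in range(max(1, len(text) - 2)):
        g = text[i:i + 3]
        vec[g] = vec.get(g, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.entries = []  # list of (embedding, response)
        self.threshold = threshold

    def get(self, query):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]  # cache hit: no LLM call, near-zero latency
        return None

    def put(self, query, response):
        self.entries.append((embed(query), response))

cache = SemanticCache()
cache.put("What is your refund policy?", "Refunds within 30 days.")
print(cache.get("what is your refund policy"))   # near-duplicate: hit
print(cache.get("How do I reset my password?"))  # unrelated: None
```

Production versions use a real embedding model and a vector index, but the cost and latency savings come from exactly this shape of lookup.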
read · 8 min → - № 36
Stop hiring ML PhDs for engineering problems
Your AI product needs someone who can deploy a model, set up monitoring, and build a data pipeline. A PhD in machine learning is trained to do none of those things.
read · 9 min → - № 35
AI is not a department
The moment you put AI in a silo, you have guaranteed it will not work. AI is a capability that lives inside your existing teams, not a team that lives beside them.
read · 8 min → - № 34
The integration is harder than the model
Getting the model to produce the right output takes a week. Integrating that output into your existing systems, workflows, and user experience takes a quarter.
read · 9 min → - № 33
Your A/B test is lying because your baseline is moving
A/B testing AI features is harder than A/B testing traditional features because the model itself changes. Your control group is not constant. Your experiment is corrupted.
read · 10 min → - № 32
The latency budget your PM forgot
Your product spec says 'fast.' Your LLM call takes 3 seconds. Your retrieval takes 800ms. Your reranker takes 400ms. You are already at 4.2 seconds before any business logic.
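The arithmetic in the teaser is worth pinning down as an explicit budget check. The stage numbers are the teaser's own; the 2-second target is a made-up example of what "fast" might mean once someone writes it down.

```python
# Latency budget sketch. Stage latencies are the examples from the text;
# BUDGET_MS is a hypothetical target standing in for "fast".
BUDGET_MS = 2000

stages_ms = {
    "llm_call": 3000,
    "retrieval": 800,
    "reranker": 400,
}

total = sum(stages_ms.values())  # 4200 ms before any business logic
print(f"pipeline: {total} ms, budget: {BUDGET_MS} ms, "
      f"over by: {total - BUDGET_MS} ms")
```

Writing the budget down per stage is what turns "it feels slow" into a negotiation about which stage gives up milliseconds.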
read · 8 min → - № 31
AI teams need on-call. Not optional.
If your AI system is in production and nobody is on-call for it, you have decided that your users will be the ones who discover failures. That is a choice.
read · 9 min → - № 30
The build-vs-buy decision nobody wants to make
Build your own AI system or buy a vendor solution. Most teams agonize over this for months while doing neither. Here is the framework that cuts through it.
read · 9 min → - № 29
Build one pipeline well before building two
Your first AI pipeline teaches you how to operate AI systems. Your second pipeline benefits from everything you learned. Skip the first and the second will fail too.
read · 8 min → - № 28
Your test suite passed. Your system is still broken.
A passing test suite for an AI system is necessary but dangerously insufficient. The failures that hurt you are the ones your test suite was not designed to catch.
read · 10 min → - № 27
Fine-tuning is maintenance, not a one-time cost
The fine-tuning run is the easy part. The hard part is the data pipeline, the evaluation cadence, the retraining schedule, and the deployment workflow that follows.
read · 9 min → - № 26
The fractional AI leader your board is asking about
Your board wants AI leadership. You're not ready for a full-time hire. A fractional engagement buys you the strategic cover to figure out what you actually need.
read · 10 min → - № 25
When to kill an AI project
The hardest decision in AI is not what to build. It is what to stop building. Here are the five signals that a project should be killed, and why most teams see them too late.
read · 8 min → - № 24
Features your users didn't ask for and won't use
Your team is building AI features because AI is exciting, not because users need them. The tell: nobody can name the user who asked for it.
read · 9 min → - № 23
Three questions before you greenlight an AI project
Before you commit engineering time, budget, and political capital to an AI project, ask these three questions. If you cannot answer them, you are not ready to build.
read · 7 min → - № 22
Prompt versioning is not optional
If you cannot tell me which prompt was running in production last Thursday at 3pm, you cannot debug a regression. Prompts are code. Version them like code.
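Answering "which prompt was live last Thursday at 3pm" takes surprisingly little machinery. A minimal sketch, assuming a hypothetical `PromptRegistry` that records a content hash and a deployment timestamp for every prompt that ships:

```python
import hashlib
from datetime import datetime, timezone

# Minimal prompt registry: every deployed prompt gets a content hash
# and a timestamp, so "what was running at time T" is answerable.

class PromptRegistry:
    def __init__(self):
        self.deployments = []  # (deployed_at, version_hash, text)

    def deploy(self, text: str, at: datetime) -> str:
        version = hashlib.sha256(text.encode()).hexdigest()[:12]
        self.deployments.append((at, version, text))
        return version

    def active_at(self, when: datetime):
        # Latest deployment at or before `when`.
        live = [d for d in self.deployments if d[0] <= when]
        return max(live, key=lambda d: d[0]) if live else None

reg = PromptRegistry()
reg.deploy("Summarize the ticket in one sentence.",
           at=datetime(2024, 5, 1, tzinfo=timezone.utc))
reg.deploy("Summarize the ticket in two sentences.",
           at=datetime(2024, 5, 9, tzinfo=timezone.utc))

# What was live on May 2 at 3pm? The first prompt.
_, version, text = reg.active_at(datetime(2024, 5, 2, 15, 0, tzinfo=timezone.utc))
print(version, "->", text)
```

In practice the same record lives in git history plus a deployment log; the point is that the mapping from timestamp to prompt version exists at all.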
read · 8 min → - № 21
Your AI engineer is doing three jobs
Prompt engineering, data engineering, and ML engineering are three different skill sets. Your single 'AI engineer' is doing all three, badly. Split the role or accept the tradeoffs.
read · 11 min → - № 20
The 90% accuracy problem
90% accuracy means 1 in 10 answers is wrong. Whether that is acceptable depends entirely on what happens when the wrong answer ships.
read · 9 min → - № 19
Open-weights models don't eliminate vendor risk
Self-hosting an open model trades one kind of vendor risk for another. You still depend on someone's architecture decisions, training data, and update schedule.
read · 10 min → - № 18
Model migrations are database migrations
Switching models is not swapping an API key. It changes your outputs, your latency, your costs, and your eval results. Treat it with the same rigor as a database migration.
read · 9 min → - № 17
Your annual AI review should fit on one page
If you cannot summarize your AI program's impact in one page — what shipped, what it cost, what it changed — you do not understand your own program.
read · 7 min → - № 16
Multimodal is not a feature, it's a stack change
Adding image understanding to your AI product is not a feature flag. It changes your data pipeline, your eval suite, your storage, your latency budget, and your cost model.
read · 10 min → - № 15
The AI team that reported to product shipped. The one that reported to research didn't.
Reporting structure determines what gets built. AI teams that report to product build products. AI teams that report to research build papers. Choose the one you need.
read · 11 min → - № 14
The GPU bill is not the expensive part
Your AI system's real cost is the engineer debugging a hallucination at 2am, the product manager re-explaining the limitations to sales, and the trust you lose with every wrong answer.
read · 8 min → - № 13
The pilot that never graduated
Your AI pilot worked. The demo went great. Six months later it is still a pilot. Here is why pilots get stuck, and the three things that get them into production.
read · 9 min → - № 12
You don't need agents, you need a queue
Most 'agent' architectures we audit are a task queue with an LLM step. That is fine. Call it what it is, and you will make better infrastructure decisions.
read · 9 min → - № 11
Your competitor's AI press release is lying
They announced an AI-powered everything. You panicked. Here is why you should not, and what to do instead of chasing their press release.
read · 8 min → - № 10
Hire the infra engineer before the ML engineer
Your first AI hire should not be someone who trains models. It should be someone who can deploy them, monitor them, and wake up when they break.
read · 11 min → - № 09
Structured outputs don't fix structured thinking
JSON mode and function calling are great. But if the model doesn't understand what you're asking it to extract, you just get well-formatted garbage.
read · 8 min → - № 08
The demo is not the product
Getting an LLM to do the thing once in a notebook is the easy part. The hard part is getting it to do the thing reliably, at scale, for every user, on every edge case, at 3am.
read · 9 min → - № 07
Your board wants an AI strategy by Thursday
You just got the calendar invite. The board wants to know your AI strategy. Here is how to write one in 72 hours that is honest, actionable, and does not promise the moon.
read · 8 min → - № 06
Benchmarks are vanity metrics
MMLU, HellaSwag, HumanEval — these tell you which model wins a standardized test. They do not tell you which model works for your use case. Build your own benchmark or fly blind.
read · 9 min → - № 05
Your model is not your moat
The model is a commodity. The moat is the data pipeline, the eval suite, the deployment infrastructure, and the feedback loop. Most teams invest in the wrong layer.
read · 8 min → - № 04
You don't need a chief AI officer
The CAIO title is a signal that your org does not know where AI fits. The role you actually need depends on whether your problem is strategy, execution, or both.
read · 11 min → - № 03
The AI business case your CFO will actually approve
Most AI business cases fail because they promise transformation. The ones that get funded promise cost savings on a specific workflow with a measurable baseline.
read · 9 min → - № 02
Your AI strategy is a deck, not a system
Most AI strategies we review are a list of use cases with a timeline. That is not a strategy. A strategy is a set of bets with kill criteria.
read · 9 min → - № 01
The eval you skipped is the one that bites
Teams skip evals on exactly the features that need them most — the ones where 'correct' is hard to define. That difficulty is the signal, not the excuse.
read · 8 min →