№ 47 evals Mar 26, 2026 · 7 min read

Your agent is a cronjob. Name it that.

Half the “agent architectures” we audit are a cronjob with an LLM call and a retry loop. That is a good thing. Here is why naming it correctly changes how you test it, what you monitor, and whether your on-call can fix it at 3am.

The pattern

You have a scheduled job. It runs every N minutes. It calls a model. If the model fails, it retries. If the retry fails, it alerts someone. That is a cronjob. It is a good architecture. It is battle-tested. Your ops team already knows how to run it.
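As a rough illustration, the whole pattern fits in one function. This is a minimal sketch, not anyone's production code: `run_job`, `call_model`, `alert`, and the backoff values are hypothetical names and defaults chosen for the example, and the schedule is assumed to live outside the code in cron itself.

```python
import time
from typing import Callable, Optional

# Assumption: scheduling is handled by cron, not by this code, e.g. a
# hypothetical crontab entry like:  */15 * * * * python job.py


def run_job(
    call_model: Callable[[str], str],   # hypothetical wrapper around your model client
    alert: Callable[[str], None],       # hypothetical pager / chat notifier
    payload: str,
    max_attempts: int = 3,
    backoff_seconds: float = 1.0,
) -> Optional[str]:
    """One scheduled run: call the model, retry on failure, alert if every attempt fails."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return call_model(payload)
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < max_attempts:
                # Linear backoff between attempts; no sleep after the last one.
                time.sleep(backoff_seconds * attempt)
    alert(f"job failed after {max_attempts} attempts: {last_error!r}")
    return None
```

Note that nothing here decides what to do next. The steps and their order are fixed; only the model's response varies.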

The problem starts when you call it an “agent” and treat it like one. Agents get agent infrastructure — orchestration frameworks, memory stores, planning loops. Your cronjob does not need any of that. It needs a cron expression, a health check, and a dashboard.

Why the name matters

When you name something correctly, three things change:

Testing. Cronjobs get tested like cronjobs — you run them, check the output, compare to expected. You don’t need an “agent evaluation framework.” You need pytest and a fixture that returns a known payload.

Monitoring. Cronjobs get monitored like cronjobs — did it run, how long did it take, did it succeed. You don’t need “agent observability.” You need a counter, a histogram, and an alert on failure rate.

On-call. When your cronjob pages someone at 3am, the on-call engineer knows what to do. Check the logs. Check the input. Check the model response. Retry manually if needed. They do not need to understand a “reasoning trace” or a “tool-use chain.”

The heuristic

If your system does not make decisions about what to do next — if the control flow is static and the only dynamic part is the model call — it is a cronjob. Name it that. Run it that way. Monitor it that way.

Save the word “agent” for systems that actually have a planning loop, where the output of one step determines which step runs next. Those exist. They are rare. And they need genuinely different infrastructure.
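To make the distinction concrete, here is a hedged sketch of the two shapes side by side. The names `static_job`, `planning_loop`, and the step functions are hypothetical, and real planning systems are more involved than this; the point is only where the control flow comes from.

```python
from typing import Callable, Dict, Optional, Tuple

Step = Callable[[str], str]

# A planning step returns its output plus the name of the step to run next
# (None means stop).
PlanStep = Callable[[str], Tuple[str, Optional[str]]]


def static_job(fetch: Step, call_model: Step, publish: Step, payload: str) -> str:
    """Cronjob shape: the same steps, in the same order, on every run."""
    return publish(call_model(fetch(payload)))


def planning_loop(
    steps: Dict[str, PlanStep], start: str, payload: str, max_steps: int = 10
) -> str:
    """Agent shape: each step's output decides which step runs next."""
    name: Optional[str] = start
    state = payload
    for _ in range(max_steps):  # cap iterations so a bad plan cannot loop forever
        state, name = steps[name](state)
        if name is None:
            return state
    raise RuntimeError(f"no result after {max_steps} steps")
```

In `static_job` you can read the execution order off the source. In `planning_loop` you cannot: the path through `steps` depends on what each step returned, which is why that second shape needs different testing and tracing.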

Most of what is shipping in production today is a cronjob. That is not an insult. That is a compliment. Cronjobs work.

tl;dr

The pattern. Teams label scheduled jobs with a single LLM call as “agent architectures” and then reach for orchestration frameworks, memory stores, and planning infrastructure that the system does not need and the on-call engineer cannot debug at 3am.

The fix. If the control flow is static and only the model call is dynamic, name it a cronjob, test it with pytest, monitor it with a counter and a failure-rate alert, and save “agent” for systems that actually have a planning loop where one step’s output determines the next.

The outcome. Your system gets the simple, battle-tested operations tooling it deserves, and your on-call can fix it without understanding a reasoning trace.

// co-written with ai · edited by humans
