The AI project that should have been a spreadsheet
Before you build an AI-powered solution, check whether the problem can be solved with a spreadsheet, a SQL query, or a simple rules engine. Often it can. And that is the better answer.
A team spent three months building an AI-powered classification system. It categorized incoming support tickets into 12 buckets. It used a fine-tuned model. It had a retrieval layer for edge cases. It had a human-in-the-loop review queue. It cost $8k/month to run.
The previous system — a series of keyword rules in a CASE statement — had 89% accuracy. The AI system had 93% accuracy. The 4-point improvement cost $8k/month in API fees, three months of engineering time, and ongoing maintenance burden for a system with non-deterministic behavior.
A senior engineer on the team eventually asked the question nobody wanted to hear: “Could we have gotten to 93% by adding more rules to the CASE statement?”
The answer was yes.
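A keyword-rules baseline of this kind is just a CASE expression. Here is a minimal sketch using Python's stdlib SQLite driver; the tickets, buckets, and keywords are illustrative, not the team's actual rules:

```python
import sqlite3

# Hypothetical tickets; the real system had 12 buckets, this shows 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?)",
    [
        (1, "I was charged twice on my invoice"),
        (2, "Cannot log in, password reset email never arrives"),
        (3, "The app crashes when I open settings"),
    ],
)

# The whole classifier: deterministic, debuggable, and free to run.
rows = conn.execute("""
    SELECT id,
           CASE
               WHEN body LIKE '%charged%' OR body LIKE '%invoice%'  THEN 'billing'
               WHEN body LIKE '%log in%'  OR body LIKE '%password%' THEN 'auth'
               WHEN body LIKE '%crash%'                             THEN 'bug'
               ELSE 'triage'
           END AS category
    FROM tickets
""").fetchall()
print(rows)  # [(1, 'billing'), (2, 'auth'), (3, 'bug')]
```

Closing the accuracy gap means adding WHEN branches, each one a readable, testable line.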
The pattern
We see this pattern often enough that it has a name in our practice. We call it “AI-for-the-sake-of-AI.” The problem is real. The solution works. But the solution is dramatically over-engineered for the problem it solves.
The tell is simple: if you can enumerate the categories, you probably do not need a language model to classify them. If you can write the summary template, you probably do not need a model to generate it. If the data fits in memory, you probably do not need embeddings to search it.
This is not a criticism of AI. AI is genuinely transformative for problems that require language understanding, pattern recognition at scale, or handling of genuinely novel inputs. The criticism is of reaching for AI before checking whether a simpler tool works.
The examples
Classification with a small label set. If your classification problem has fewer than 20 categories and the distinguishing features are keywords or patterns in the input, a rules engine is the right tool. It is deterministic, debuggable, fast, and free. Add AI when the categories are ambiguous, the language is varied, or new categories emerge frequently.
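As a sketch of what "rules engine" means here: ordered pattern/label pairs, where first match wins so precedence is explicit. The labels and patterns are hypothetical, and unmatched inputs fall into a review bucket rather than a guess:

```python
import re

# Illustrative rules; a real system would have more, each one reviewable.
RULES = [
    (re.compile(r"refund|invoice|charge", re.I), "billing"),
    (re.compile(r"password|2fa|login", re.I), "auth"),
    (re.compile(r"crash|error|stack trace", re.I), "bug"),
]

def classify(text: str) -> str:
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return "needs_review"  # unmatched inputs fail loudly, not silently

print(classify("My invoice shows a double charge"))   # billing
print(classify("Weird flickering on the dashboard"))  # needs_review
```

Same input, same output, every run. When a category is wrong, the fix is one visible line.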
Summarization with a fixed structure. “Summarize this support ticket into: customer name, issue type, severity, and next action.” This is not summarization. This is extraction. A template with regex or a lightweight NER model handles this at a fraction of the cost and with 100% structural consistency. The LLM will occasionally forget a field, reformat the output, or hallucinate a severity level. The template will not.
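A minimal extraction sketch, assuming a line-oriented ticket format; the field names and layout are illustrative:

```python
import re

# A hypothetical ticket in a known format.
TICKET = """\
Customer: Dana Reyes
Issue: checkout page times out
Severity: high
Next action: escalate to payments team
"""

# One anchored pattern per field of the template.
FIELDS = {
    "customer": re.compile(r"^Customer:\s*(.+)$", re.M),
    "issue": re.compile(r"^Issue:\s*(.+)$", re.M),
    "severity": re.compile(r"^Severity:\s*(high|medium|low)$", re.M),
    "next_action": re.compile(r"^Next action:\s*(.+)$", re.M),
}

def extract(text: str) -> dict:
    # Every field is either captured or explicitly None; the structure
    # never drifts, and severity can only be one of three values.
    return {name: (m.group(1) if (m := rx.search(text)) else None)
            for name, rx in FIELDS.items()}

record = extract(TICKET)
print(record)
```

The output always has exactly these four keys, which is the "100% structural consistency" a generative model cannot promise.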
Prediction with historical data. “Predict which customers will churn based on their usage patterns.” If you have structured data — login frequency, feature usage, support tickets filed — a gradient-boosted tree will outperform an LLM at this task. It will be faster, cheaper, more interpretable, and easier to maintain. LLMs are not good at tabular prediction. They never have been.
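A sketch of that tabular baseline, assuming scikit-learn is available; the feature names and data are synthetic:

```python
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic usage features: logins_per_week, features_used, tickets_filed.
X = [
    [21, 9, 0], [18, 7, 1], [25, 11, 0], [19, 8, 0],  # retained
    [1, 1, 4], [0, 2, 6], [2, 1, 3], [1, 0, 5],       # churned
]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = churned

model = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

preds = model.predict([[2, 1, 4], [22, 10, 0]])
print(preds)                      # [1 0]
print(model.feature_importances_) # per-feature weights you can inspect
```

The model trains in milliseconds, predicts in microseconds, and tells you which features drive its decisions, none of which an LLM over the same table will do.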
Search over a small corpus. If your corpus is fewer than 10,000 documents and your users search by keyword, full-text search (Elasticsearch, PostgreSQL tsvector, even SQLite FTS) is the right answer. It is fast, well-understood, and does not require an embedding pipeline. Add semantic search when keyword search fails — when users search for concepts, not strings.
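Keyword search over a small corpus needs no embedding pipeline at all. A sketch using SQLite's FTS5 module, which is compiled into standard CPython builds; the documents are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany("INSERT INTO docs VALUES (?, ?)", [
    ("Refund policy", "Refunds are processed within 5 business days."),
    ("Password reset", "Use the reset link to change your password."),
    ("Shipping times", "Orders ship within 2 business days."),
])

# MATCH does tokenized keyword search; bm25() ranks by relevance.
hits = conn.execute(
    "SELECT title FROM docs WHERE docs MATCH ? ORDER BY bm25(docs)",
    ("refund",),
).fetchall()
print(hits)  # [('Refund policy',)]
```

When users start searching "money back" and expecting the refund policy, that is the signal to add semantic search, and not before.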
Data transformation with known rules. “Convert these addresses to a standard format.” “Extract phone numbers from these documents.” “Map these product codes to categories.” These are deterministic transformations. Write the rules. An LLM will get 95% of them right and will get 5% wrong in unpredictable ways. The rules engine will get 100% right for the patterns you have written and will fail loudly on patterns you have not — which is the behavior you want.
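A sketch of that fail-loudly behavior for one such transformation, assuming US-style phone numbers; the accepted formats are illustrative:

```python
import re

# Each known input format gets an explicit pattern.
PATTERNS = [
    re.compile(r"^\((\d{3})\)\s*(\d{3})-(\d{4})$"),    # (555) 123-4567
    re.compile(r"^(\d{3})[-. ](\d{3})[-. ](\d{4})$"),  # 555-123-4567
    re.compile(r"^(\d{10})$"),                         # 5551234567
]

def normalize_phone(raw: str) -> str:
    s = raw.strip()
    for rx in PATTERNS:
        if m := rx.match(s):
            digits = "".join(m.groups())
            return f"+1-{digits[:3]}-{digits[3:6]}-{digits[6:]}"
    # Unknown formats raise instead of being silently mangled.
    raise ValueError(f"unrecognized phone format: {raw!r}")

print(normalize_phone("(555) 123-4567"))  # +1-555-123-4567
print(normalize_phone("555.123.4567"))    # +1-555-123-4567
```

Every input is either normalized by a rule you wrote or rejected with an error you can see, which is exactly the 100%-or-loud-failure contract the paragraph describes.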
Why teams reach for AI anyway
Three reasons:
Excitement. AI is new and interesting. Rules engines are boring. Engineers — reasonably — want to work on interesting problems. The organizational pressure to “do AI” reinforces this. Nobody gets a promotion for shipping a well-crafted CASE statement.
Anticipated complexity. “The problem is simple now, but it will get more complex.” Maybe. But build for the problem you have, not the problem you imagine. If the problem gets more complex, you can add AI then. You cannot un-add complexity.
Demo-driven development. The AI solution demos well. You type a natural language query, the system responds intelligently, the stakeholder is impressed. The rules engine does not demo well. It just works, quietly, correctly, boringly. But demos are not production, and production is what matters.
The cost of unnecessary AI
The cost is not just the API bill — though the API bill matters. The deeper costs:
Non-determinism. Rules produce the same output for the same input. Always. LLMs do not. When your classification system occasionally puts the same ticket in different categories on successive runs, debugging becomes archaeology. “Why did it do that?” “We don’t know. It’s a language model.”
Maintenance burden. A rules engine is maintained by editing rules. An AI system is maintained by monitoring evals, managing prompts, tracking model versions, debugging retrieval, and handling the occasional production hallucination. The maintenance surface area is 10x larger.
Debugging difficulty. When a rule is wrong, you read the rule, find the bug, fix it. When an AI output is wrong, you inspect the prompt, check the retrieved context, examine the model version, consider whether the temperature is too high, wonder if this is a rare stochastic failure, and eventually shrug.
Latency. The rules engine responds in milliseconds. The AI system responds in seconds. For many use cases, this matters.
The heuristic
Before you build an AI-powered solution, ask three questions:
- Can I enumerate the categories or outcomes? If yes, try a rules engine first.
- Does the data fit in a spreadsheet? If yes, start with a spreadsheet.
- Does the problem require understanding language that varies in unpredictable ways? If no, you probably do not need an LLM.
Use AI when the problem genuinely requires it — when inputs are novel, language is varied, patterns are too complex for rules, or scale makes manual approaches impossible. For everything else, the boring solution is the better solution.
tl;dr
The pattern. Teams reach for fine-tuned models, retrieval layers, and human-in-the-loop queues to solve problems — classification with a fixed label set, extraction into a known template, keyword search over a small corpus — that a CASE statement or a regex would solve deterministically for free.
The fix. Before building anything AI-powered, ask whether you can enumerate the categories, whether the data fits in a spreadsheet, and whether the problem actually requires understanding unpredictably varied language.
The outcome. You end up with a system that is faster, cheaper to run, fully debuggable, and easier to maintain, and you reserve AI for the problems where it genuinely cannot be replaced.