№ 36 · Sep 19, 2025 · 9 min read

Stop hiring ML PhDs for engineering problems

Your AI product needs someone who can deploy a model, set up monitoring, and build a data pipeline. A PhD in machine learning is trained to do none of those things.


You have a job opening for an “ML engineer.” The job requires deploying models to production, building data pipelines, setting up monitoring, managing infrastructure, and integrating AI outputs into an existing product. You are looking for someone with a PhD in machine learning.

These two things do not match.

A PhD in machine learning trains you to do research — to read papers, design experiments, implement novel architectures, run ablation studies, and write results up for publication. These are valuable skills. They are not the skills your job requires.

Your job requires engineering. Specifically, it requires production engineering for systems that happen to include a model. The model is a component, not the system. You are hiring a researcher to do an engineer’s job, and both of you will be frustrated.

What a PhD trains you to do

A machine learning PhD — at a good program, with a good advisor — produces someone who can:

  • Read and critique research papers.
  • Formulate a research question and design experiments to answer it.
  • Implement models from papers, often from scratch.
  • Understand the math behind gradient descent, attention mechanisms, loss functions, and optimization.
  • Run controlled experiments — vary one thing, measure the effect.
  • Write clearly about technical work.
  • Navigate ambiguity over multi-year timescales.

This is rigorous training. It produces people who think carefully and work precisely. But notice what’s not on the list: deployment, monitoring, pipeline engineering, infrastructure management, API design, CI/CD, observability, incident response.

These skills are not taught in PhD programs because they are not research skills. They are engineering skills — the kind you learn by running production systems, getting paged at 3am, and debugging a silent failure in a data pipeline.

What production AI actually needs

Here is what the day-to-day looks like for most AI engineers in production:

Monday: The embedding pipeline failed overnight because a source system changed its API response format. Debug the pipeline, fix the parser, backfill the failed documents, verify the index is consistent.
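
Monday’s parser fix is typical: the upstream format changed, and the pipeline has to tolerate both shapes while the backfill runs. A minimal sketch — the field names here are purely illustrative, not from any real API:

```python
def extract_text(record: dict) -> str:
    """Pull the document body out of either API response format.

    Hypothetical example: the source system moved from a flat
    {"text": ...} payload to a nested {"document": {"body": ...}} one.
    """
    if "document" in record:
        # New nested format.
        return record["document"]["body"]
    # Old flat format, still present in the backfill queue.
    return record["text"]
```

Accepting both shapes in one code path means the backfill can reprocess old and new documents without branching the pipeline.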

Tuesday: The PM wants to add a new data source to the RAG system. Design the ingestion pipeline, write the chunking logic, set up the incremental indexing, test retrieval quality with the new source included.
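
Tuesday’s chunking logic can start as simply as a sliding window with overlap. A sketch — the window and overlap sizes are assumptions you would tune against retrieval quality, not recommendations:

```python
def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size windows that overlap, so a sentence
    cut at a chunk boundary still appears whole in the next chunk."""
    chunks = []
    start = 0
    step = size - overlap  # advance less than the window to create overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks
```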

Wednesday: Latency spiked for 20% of users. Investigate — turns out the reranker is timing out for long queries. Add a timeout with fallback to non-reranked results. Update the monitoring dashboard. Write a postmortem.
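
Wednesday’s fix — a timeout with fallback — is a few lines of defensive engineering. A minimal sketch, assuming an async `rerank` callable (hypothetical; any reranker client would slot in):

```python
import asyncio

RERANK_TIMEOUT_S = 0.5  # reranker budget; tune against your latency SLO

async def rerank_with_fallback(query, candidates, rerank):
    """Try the reranker; on timeout, serve the retriever's ordering."""
    try:
        return await asyncio.wait_for(
            rerank(query, candidates), timeout=RERANK_TIMEOUT_S
        )
    except asyncio.TimeoutError:
        # Degrade gracefully rather than erroring. Also a good place to
        # increment a metric so the dashboard shows the fallback rate.
        return candidates
```

The design choice is that a slightly worse ordering beats a failed request; the monitoring counter is what turns the silent degradation into a visible signal.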

Thursday: A new model version is available from the provider. Run the eval suite against the new version. Compare accuracy, latency, and cost. Write up the results. Recommend whether to upgrade.
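
Thursday’s comparison boils down to summarizing the eval run into the three numbers the upgrade decision needs. A hedged sketch with hypothetical structures — real eval suites produce richer output, but the shape is the same:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class EvalResult:
    model: str
    correct: list[bool]        # per-example pass/fail from the eval suite
    latencies_ms: list[float]  # per-example end-to-end latency
    cost_usd: float            # total spend for the run

def summarize(r: EvalResult) -> dict:
    """Collapse raw eval output into accuracy, median latency, and cost."""
    ordered = sorted(r.latencies_ms)
    return {
        "model": r.model,
        "accuracy": mean(r.correct),
        "p50_latency_ms": ordered[len(ordered) // 2],
        "cost_usd": r.cost_usd,
    }
```

Running `summarize` on the current and candidate model versions gives a side-by-side table for the write-up and the upgrade recommendation.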

Friday: Code review a teammate’s PR for a new guardrails implementation. Review the integration test coverage. Deploy the weekly model update to staging. Run smoke tests.

This is engineering work. It requires comfort with production systems, debugging skills, infrastructure knowledge, and the ability to move fast without breaking things. The model itself — the thing the PhD spent five years studying — is an API call. The work is everything around that API call.

The mismatch in practice

When you hire a PhD for an engineering role, here’s what happens.

The PhD is overqualified for the model work. Choosing between GPT-4o and Claude 3.5 Sonnet doesn’t require understanding the transformer architecture at a mathematical level. Prompt engineering doesn’t require knowing how attention works. Fine-tuning an open-source model uses a library with a one-page quickstart guide. The PhD’s deep technical knowledge is mostly unused.

The PhD is underqualified for the engineering work. They’ve never set up a CI/CD pipeline. They’ve never configured monitoring and alerting. They’ve never designed a data model for a production database. They’ve never been on-call. These aren’t things you can pick up in a week — they’re skills that take years to develop, and the PhD has been developing different skills.

The PhD is frustrated because the work is “not ML.” They expected to train models and run experiments. Instead, they’re debugging a Kubernetes deployment and writing SQL. This isn’t what they signed up for, and it’s not what they’re good at.

You are frustrated because the PhD is slow on engineering tasks. They’re careful and thorough — because that’s what research trained them to be — but production engineering often requires speed, pragmatism, and a willingness to ship something good enough and iterate.

Both of you are unhappy. Neither of you is wrong. You’re just mismatched.

When you actually need a PhD

There are roles where a PhD is the right hire. They are rarer than most companies think.

You’re training your own models. Not fine-tuning with a library — actually designing architectures, writing training loops, managing training infrastructure at scale. This is research work. It benefits from research training.

You’re working on a genuinely novel problem. Your use case doesn’t fit the standard patterns. Off-the-shelf models don’t work. You need someone who can read the literature, understand what’s been tried, and design something new.

You’re building core ML infrastructure. An inference engine, a training platform, a feature store. These systems require deep understanding of how models work at a mathematical and systems level.

You need to evaluate research. Someone needs to read papers, assess whether new techniques are relevant, and decide what to adopt. This is a curation role, and a PhD is uniquely suited for it.

If your product is calling an API, building a RAG pipeline, fine-tuning with LoRA, and integrating the results into a web app — you don’t need a PhD. You need an engineer who is curious about AI and willing to learn the ML-specific concepts as they go.

The hiring fix

Be honest about what the role requires. Do the exercise: write down what the person will do in their first six months. Be specific — not “work on our AI product” but “build the ingestion pipeline for our knowledge base, set up eval infrastructure, integrate the model output into the search results page, set up monitoring and alerting.”

Then ask: does this work require a PhD, or does it require a senior engineer who can learn the ML-specific parts?

If it’s 80% engineering and 20% ML, hire an engineer. Look for:

  • Strong production engineering background — they’ve deployed and operated systems at scale.
  • Curiosity about ML — they’ve built side projects, taken courses, read the docs.
  • Comfort with ambiguity — AI systems are less predictable than traditional software and they’re okay with that.
  • Debugging skills — they can trace a problem from the user complaint to the root cause, even when the root cause is “the model is sometimes wrong.”

If it’s 80% ML and 20% engineering, hire a PhD. But be honest about the engineering requirements and support them with engineering mentorship and infrastructure tooling.

The worst hire is a PhD in an engineering role with no engineering support. They’ll build a beautiful model that nobody can deploy, or a fragile pipeline that works on their laptop but fails in production. Not because they’re bad at their job — because their job was misspecified.

The title problem

Part of the issue is titles. “ML Engineer” could mean a dozen different roles. It could mean someone who trains models, someone who deploys models, someone who builds ML infrastructure, or someone who integrates model outputs into products.

Be specific. “AI product engineer” is a better title for someone who integrates AI into products. “ML infrastructure engineer” is a better title for someone who builds training and serving systems. “Applied research scientist” is a better title for someone who adapts research to production use cases.

Clear titles attract the right candidates. Vague titles attract everyone, and you waste time filtering for a match that you could have specified upfront.

The heuristic

Before you write the job posting, write the first six months of work. If it’s mostly engineering — pipelines, integration, monitoring, deployment — hire an engineer and teach them the ML concepts. If it’s mostly research — novel architectures, training runs, experimental design — hire a PhD and support them with engineering. The mismatch is expensive for both sides. Get the role right before you get the person.

tl;dr

The pattern. Teams write job postings for ML PhDs when the actual role is deploying pipelines, setting up monitoring, and integrating model outputs into a product — skills PhD programs do not teach.

The fix. Write down what the person will do in their first six months, and if it is 80% engineering, hire a senior engineer with AI curiosity instead of a researcher.

The outcome. The right hire ships the feature, operates the system, and is not frustrated by it — and neither are you.

