№ 41 · org · Dec 05, 2025 · 10 min read

AI governance is an engineering problem, not a legal one

Your legal team wrote an AI policy. It lives in a PDF. Nobody reads it. Governance that works is governance that is enforced in code — access controls, audit logs, output filters, eval gates.


Your company has an AI policy. It was written by legal, reviewed by compliance, approved by a VP. It says things like “AI-generated content must be reviewed by a qualified human before being shared with customers” and “sensitive data must not be included in prompts sent to third-party AI providers.”

It lives in a SharePoint folder. Nobody has read it since the all-hands where it was announced. It is not enforced. It cannot be enforced — because enforcement requires engineering, and engineering was not involved in writing it.

This is the state of AI governance at most companies. And it is a problem.

The gap

There is a gap between policy and practice. The policy says one thing. The system does another. No one is lying. No one is negligent. The gap exists because policy documents describe intent, and intent does not execute.

Consider the rule: “AI outputs must be reviewed by a human before being sent to customers.” How is this enforced? Is there a review queue? Is there a UI that forces a human to approve each output before it is sent? Is there a log of who reviewed what? Or is the expectation that people will just… do the right thing?

In most cases, it is the latter. And in most cases, people are busy, the volume is high, and the review becomes a rubber stamp — a quick glance, a click, done. The governance is nominal. The risk is real.

Governance as code

Governance that works is governance that is enforced at the system level. Not as a suggestion. Not as a policy. As code that runs in the pipeline and blocks things that should be blocked.

Here is what that looks like in practice:

Output filtering. Before any AI-generated content reaches an end user, it passes through a filter. The filter checks for PII, profanity, competitor mentions, off-topic responses, hallucinated URLs, or whatever your policy prohibits. If the filter catches something, the output is blocked and logged. The user gets a fallback response.

This is not hard to build. A combination of regex patterns, classification models, and simple heuristics covers 90% of cases. The remaining 10% is where you invest in more sophisticated detection.
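As a minimal sketch of that regex-and-heuristics layer — pattern names, the fallback message, and the prohibited categories are all illustrative, not a prescribed list:

```python
import re
from dataclasses import dataclass, field

# Illustrative patterns -- swap in whatever your policy actually prohibits.
BLOCK_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "url": re.compile(r"https?://\S+"),  # e.g. block hallucinated links
}

FALLBACK = "Sorry, I can't share that response. A team member will follow up."

@dataclass
class FilterResult:
    allowed: bool
    violations: list[str] = field(default_factory=list)
    text: str = ""

def filter_output(text: str) -> FilterResult:
    """Check AI output against every prohibited pattern; block and record
    the violation types if any match, otherwise pass the text through."""
    violations = [name for name, pat in BLOCK_PATTERNS.items() if pat.search(text)]
    if violations:
        return FilterResult(False, violations, FALLBACK)
    return FilterResult(True, [], text)
```

The blocked `FilterResult` carries the violation names so the same object can feed both the fallback response and the audit log.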

PII detection. Your policy says “do not send PII to third-party AI providers.” Enforce it. Run a PII detector on every prompt before it leaves your infrastructure. Redact or block prompts that contain social security numbers, credit card numbers, email addresses, phone numbers, or whatever counts as PII in your domain.

Named entity recognition models are mature. Regex patterns catch structured PII reliably. The combination is imperfect — you will have false positives and false negatives — but imperfect enforcement is vastly better than no enforcement.
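A sketch of the regex layer for structured PII — the patterns below are simplified examples (real credit-card and phone formats vary), and unstructured PII would need an NER model on top:

```python
import re

# Simplified structured-PII patterns; names and addresses need NER on top.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(prompt: str) -> tuple[str, list[str]]:
    """Replace each PII match with a tagged placeholder before the prompt
    leaves your infrastructure; return the redacted text and the hit types.
    A stricter policy might block the prompt outright instead of redacting."""
    hits = []
    for name, pat in PII_PATTERNS.items():
        if pat.search(prompt):
            hits.append(name)
            prompt = pat.sub(f"[{name.upper()}]", prompt)
    return prompt, hits
```

Redaction keeps the prompt usable; blocking is safer. Which one your policy calls for is exactly the kind of decision legal and engineering should make together.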

Audit logs. Every AI interaction — every prompt sent, every response received, every user who triggered it — should be logged. Not for surveillance. For accountability and debugging.

When something goes wrong — and it will — you need to answer: What was the prompt? What was the response? Who saw it? When? Which model version was running? What context was retrieved? Without audit logs, the answer to all of these is “we don’t know.”

The log does not need to be fancy. A structured log entry per interaction, written to your existing logging infrastructure, is sufficient. Include: timestamp, user ID, prompt hash (or full prompt if compliance allows), response hash, model ID, latency, and any filter actions taken.
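One way to sketch that entry — field names are illustrative, and hashing stands in for full-text storage where compliance requires it:

```python
import hashlib
import json
import logging
import time
import uuid

log = logging.getLogger("ai_audit")

def _hash(text: str) -> str:
    """Hash prompt/response so the log supports accountability without
    retaining raw content (store full text only if compliance allows)."""
    return hashlib.sha256(text.encode()).hexdigest()[:16]

def audit(user_id: str, prompt: str, response: str,
          model_id: str, latency_ms: float,
          filter_actions: list[str]) -> dict:
    """Write one structured entry per AI interaction to existing logging."""
    entry = {
        "ts": time.time(),
        "interaction_id": str(uuid.uuid4()),
        "user_id": user_id,
        "prompt_hash": _hash(prompt),
        "response_hash": _hash(response),
        "model_id": model_id,
        "latency_ms": latency_ms,
        "filter_actions": filter_actions,
    }
    log.info(json.dumps(entry))
    return entry
```

With entries like this, "what was the prompt, who saw it, which model version" becomes a log query instead of a shrug.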

Eval gates. Before a new model, prompt, or pipeline version is deployed to production, it must pass an eval suite. If the eval score drops below the threshold, the deployment is blocked. This is CI for AI — and it is the most effective governance mechanism we have seen.

The eval gate does not just catch regressions. It creates a record. “This model version was deployed on this date, having passed these evals with these scores.” When an auditor asks how you ensure quality, you point to the gate — not a policy document.
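A sketch of such a gate as a CI step — the metric names and thresholds are hypothetical placeholders for whatever your teams agree on:

```python
import json
import sys

# Hypothetical thresholds agreed between engineering and legal/compliance.
# Metrics ending in "_max" are ceilings; everything else is a floor.
THRESHOLDS = {"accuracy": 0.90, "groundedness": 0.85, "pii_leak_rate_max": 0.01}

def eval_gate(scores: dict) -> tuple[bool, list[str]]:
    """Return (passed, failed_metrics). A missing ceiling metric defaults
    to failing -- conservative on purpose."""
    failures = []
    for metric, threshold in THRESHOLDS.items():
        if metric.endswith("_max"):
            if scores.get(metric[:-4], 1.0) > threshold:
                failures.append(metric)
        elif scores.get(metric, 0.0) < threshold:
            failures.append(metric)
    return not failures, failures

if __name__ == "__main__":
    # e.g. python eval_gate.py eval_results.json -- nonzero exit blocks deploy
    scores = json.load(open(sys.argv[1]))
    ok, failed = eval_gate(scores)
    print("PASS" if ok else f"FAIL: {failed}")
    sys.exit(0 if ok else 1)
```

The scores file the gate reads doubles as the deployment record: version, date, evals, results.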

Access controls. Not everyone should have access to every model endpoint. Not every application should be able to call the most expensive model. Not every team should be able to deploy prompt changes to production.

Role-based access control on model endpoints is straightforward if you route all model calls through an internal gateway. The gateway enforces who can call what, logs every call, and applies rate limits. This is the same pattern you use for internal APIs. Apply it to AI.
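The gateway-side check can be a small lookup — roles, model names, and rate limits below are illustrative:

```python
# Illustrative role-to-model policy enforced by an internal gateway.
ROLE_POLICY = {
    "support-bot": {"models": {"small-fast"}, "rpm": 600},
    "research": {"models": {"small-fast", "large-expensive"}, "rpm": 60},
}

def authorize(role: str, model: str) -> bool:
    """The gateway runs this before forwarding any model call; in practice
    every decision, allow or deny, would also go to the audit log."""
    policy = ROLE_POLICY.get(role)
    return policy is not None and model in policy["models"]
```

An unknown role gets nothing; a known role gets only its listed models. Rate limiting hangs off the same policy record.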

The CI check

The most powerful framing we have found: treat governance as a CI check.

Your deployment pipeline already has checks — tests pass, linting passes, security scans pass. Add governance checks to the same pipeline:

  • PII detection on prompts: pass/fail.
  • Output filter coverage: pass/fail.
  • Eval suite against golden set: pass/fail.
  • Audit logging enabled: pass/fail.
  • Access controls configured: pass/fail.

If any check fails, the deployment does not proceed. This is not bureaucracy. This is the same automated quality enforcement you already apply to traditional software. AI systems are not special. They need the same discipline.
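The checklist above can be wired into the pipeline as one pass/fail stage — each check function here is a hypothetical hook into your own tooling, stubbed as a lambda:

```python
import sys
from typing import Callable

def run_governance_checks(checks: dict[str, Callable[[], bool]]) -> bool:
    """Run every named check, print a pass/fail line per check, and
    return False if any fail -- which fails the whole pipeline stage."""
    all_passed = True
    for name, check in checks.items():
        passed = check()
        print(f"{'PASS' if passed else 'FAIL'}  {name}")
        all_passed = all_passed and passed
    return all_passed

if __name__ == "__main__":
    # Replace these placeholder lambdas with real hooks into your tooling.
    checks = {
        "pii-detection-on-prompts": lambda: True,
        "output-filter-coverage": lambda: True,
        "eval-suite-golden-set": lambda: True,
        "audit-logging-enabled": lambda: True,
        "access-controls-configured": lambda: True,
    }
    sys.exit(0 if run_governance_checks(checks) else 1)
```

A nonzero exit code stops the deployment, exactly like a failing test suite or lint step.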

The org design implication

For this to work, engineering must be involved in governance from the start. Not consulted after the policy is written. Involved in defining what governance means in technical terms.

The ideal structure: legal defines the intent (“we must not expose PII”), engineering defines the mechanism (“PII detection runs on every prompt and blocks matches”), and both teams agree on the acceptance criteria (“false negative rate below 1% for structured PII, below 5% for unstructured PII”).

This is a collaboration, not a handoff. Legal cannot write enforceable governance alone. Engineering cannot define acceptable risk alone. They need each other.

The heuristic

For every line in your AI governance policy, ask: “How is this enforced in code?” If the answer is “it isn’t” — that line is a wish, not a policy. Convert it to a check, a filter, a gate, or a log. Governance that exists only in a document is governance that does not exist.

tl;dr

The pattern. Companies write AI governance policies in PDFs — “do not send PII to third-party providers,” “all outputs must be reviewed by a human” — and then rely on people to comply voluntarily, which they do not at volume. The fix. Treat every policy line as a CI check: PII detection on every outbound prompt, output filters before every user-facing response, eval gates before every deploy, and audit logs on every interaction. The outcome. Governance becomes something that actually runs in the pipeline rather than something that lives in a SharePoint folder nobody has opened since the all-hands.

