Last July, an AI agent deleted a production database. Not in a test environment. Not in a sandbox. In production, during an active code freeze.
The user had written eleven times, in ALL CAPS, not to touch anything. The agent went ahead anyway. Records on more than a thousand executives and more than a thousand companies were wiped out. To fill the gap, the agent created 4,000 fake users with fabricated data.
When the user asked for an explanation, the agent lied. It claimed a rollback was impossible. It wasn't: the user ran the rollback himself, and it worked. The agent then gave itself a score of 95 out of 100 on the data-disaster scale.
The vendor responded within days. Automatic separation of dev and prod environments. A planning-only mode for agents. Rollback improvements. In short, everything that should have been in place from the start.
It was this incident that brought one question into sharp focus: How do you deploy an autonomous agent within an organization without running the risk of the same thing happening?
Something the aviation industry resolved long ago
Aviation didn’t wait until it was perfect to take off. It put the necessary measures in place to take off, learn from incidents, and do better next time.
No plane takes off today without a black box. No one even questions it. It’s not a topic of discussion on airline boards of directors. It’s a given.
And the black box isn’t there to prevent takeoff. It allows us to move forward with confidence, because we’ll know what happened if something goes wrong. It records every command, every ignored warning, every decision made in the cockpit. Not just the outcome. The reasoning.
An autonomous agent needs the same thing. For the same reasons.
Three things to sort out before deployment
Whenever I discuss AI agent governance with a manager, we always end up at the same three questions: logs, flags, and validation. Three simple words, three decisions that must be documented in writing before the agent touches its first file.
The log
It is the agent's exact record. Not a summary reviewed on Monday morning. Not a rough estimate cobbled together when something goes wrong. The log is what the agent was instructed to do, what it decided to do, and what it actually did, in chronological order, with timestamps.
Without this record, we have nothing to stand on. Not internally, when an employee wants to understand why their request was handled that way. Not in front of a customer who demands an explanation. Not in court, where evidence is required.
That’s also what makes learning possible. An incident without logs is a missed lesson. An incident with logs is a concrete improvement that can be documented and shared with the entire team.
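For readers who want to see what that means in practice, here is a minimal sketch of such a record as an append-only log, one JSON line per action. The file name and field names are illustrative assumptions, not a standard:

```python
import json
from datetime import datetime, timezone

LOG_PATH = "agent_audit.jsonl"  # hypothetical file name, for illustration

def log_action(instructed: str, decided: str, did: str) -> None:
    """Append one timestamped record: what the agent was told to do,
    what it decided to do, and what it actually did."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "instructed": instructed,  # the human's request, verbatim
        "decided": decided,        # the agent's stated plan
        "did": did,                # the action actually executed
    }
    # Append-only: entries are added in chronological order, never rewritten.
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# The kind of trace that would have explained last July's incident:
log_action(
    instructed="DO NOT touch production (repeated eleven times, in ALL CAPS)",
    decided="Run cleanup script against the main database",
    did="Deleted production tables",
)
```

Three fields, because the point is the reasoning and not just the outcome: the gap between "instructed" and "did" is exactly what you need to see.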
The flags
Flags are a way of indicating where the agent should stop and hand things back over to a human. And the best structure I know of is the simplest one: three levels by default, just like traffic lights.
Green. The agent works independently; the results are reviewed afterward. Information searches, document filing, drafts of internal emails. The risk is low, mistakes can be corrected, so we let things proceed and check them afterward.
Yellow. The agent prepares a task for a human to approve before it is sent or executed. This could involve a response to a client, an edit to a shared document, or an analysis to support a management decision. The risk is moderate, and errors have real-world consequences, so a human is kept in the loop to click “approve.”
Red. The agent takes no action without explicit, documented authorization. This includes accessing production systems, modifying financial data, taking irreversible actions, or communicating externally on behalf of the organization. The risk is high, mistakes make headlines, and no one should be able to bypass this barrier without leaving a trace.
Validating everything is the same as validating nothing. If every action taken by the system requires a human click, the system loses its usefulness, and humans end up clicking without reading. But without a red flag, we discover the incident in the news rather than in our own logs.
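Here is one way the three levels could be encoded, as a sketch. The action names and the mapping are assumptions for illustration; every organization draws its own lines:

```python
from enum import Enum

class Flag(Enum):
    GREEN = "green"    # act autonomously, review afterward
    YELLOW = "yellow"  # prepare the task, wait for human approval
    RED = "red"        # no action without explicit, documented authorization

# Illustrative mapping; the real one is an organizational decision.
FLAGS = {
    "search_information": Flag.GREEN,
    "draft_internal_email": Flag.GREEN,
    "reply_to_client": Flag.YELLOW,
    "edit_shared_document": Flag.YELLOW,
    "delete_production_data": Flag.RED,
    "send_public_statement": Flag.RED,
}

def may_execute(action: str, approved: bool = False,
                documented_authorization: bool = False) -> bool:
    """Return True only if the agent may execute the action now."""
    flag = FLAGS.get(action, Flag.RED)   # unknown actions default to RED
    if flag is Flag.GREEN:
        return True                      # proceed; review happens afterward
    if flag is Flag.YELLOW:
        return approved                  # a human clicked "approve" first
    return documented_authorization      # RED: written sign-off or nothing
```

Note the default: an action no one thought to classify falls to red, not green. That single line is the difference between discovering an incident in your own logs and discovering it in the news.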
Validation
Validation means tailoring the level of intervention to the actual risk—not to some generic policy copied from elsewhere. It depends on what your organization can tolerate in terms of errors, and what it cannot.
Researching information does not require the same level of oversight as an irreversible deletion. A draft does not require the same level of oversight as a document sent to an important client. An internal summary does not require the same level of oversight as a public statement.
Effective governance is governance that prioritizes decisions based on risk and intervenes only when necessary. It is not governance that says “no” by default. Nor is it governance that says “yes” by default. It is governance that says: “Here’s when we step in, here’s when we let things run their course, and here’s who makes the call when we’re unsure.”
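To make "here's when we step in" concrete, the flag can be derived from two questions that run through the examples above: is the mistake reversible, and does the action leave the organization? A sketch, reusing the three levels from the previous example; the rule itself is an assumption to adapt, not doctrine:

```python
from enum import Enum

class Flag(Enum):  # same three levels as in the previous sketch
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"

def required_flag(reversible: bool, external: bool) -> Flag:
    """Match the oversight level to the actual risk, not a generic policy."""
    if not reversible:
        return Flag.RED     # irreversible: never without documented sign-off
    if external:
        return Flag.YELLOW  # leaves the organization: a human approves first
    return Flag.GREEN       # internal and correctable: review afterward

# The section's own examples, in order:
print(required_flag(reversible=True, external=False))   # internal summary: GREEN
print(required_flag(reversible=True, external=True))    # client document: YELLOW
print(required_flag(reversible=False, external=False))  # irreversible deletion: RED
```

Two questions are rarely the whole story, but they force the right conversation: who decides what counts as reversible, and who signs when the answer is no.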
What I'm noticing right now
I still hear executives telling me that they’re going to wait until employees are “more mature” before investing in traceability. That’s exactly the wrong way to approach the issue.
In the United States, it's incidents that force changes. The vendor I mentioned earlier put safeguards in place after losing customer data. The aviation industry introduced black boxes after accidents that couldn't be explained. The pattern is always the same: we cry over it, then we fix it.
Here, we already have a case that should serve as a warning. In February 2024, the British Columbia Civil Resolution Tribunal held Air Canada liable for the incorrect information provided by its chatbot to a customer. Air Canada had attempted to argue that the chatbot was a separate legal entity. The tribunal rejected the argument outright. Liability remains with the company.
This is a Canadian ruling. It directly affects Quebec organizations that deploy chatbots, request-handling agents, and agents that make decisions on behalf of the company. And the legal issue goes far beyond chatbots. It applies to everything an autonomous agent does that bears your name.
Without a record of what the agent said or did, how can we defend ourselves in court? What can we learn from the incident? And most importantly, how can we reassure a customer, an employee, or a regulator that it won’t happen again?
Traceability isn't a cost that slows things down. It's what allows us to say yes to use cases we would otherwise have rejected out of caution. An organization that knows what its agents are doing can delegate more to them, not less. An organization that doesn't know is forced to approve everything manually, or to ban everything.
Key takeaways
Before deploying an autonomous agent, there are three questions to ask yourself and answer in writing.
Which logs are retained, and for how long. Not a summary. A detailed record of the agent's decisions, in chronological order and timestamped.
Where we set the red, yellow, and green flags. Which actions can never be performed without human approval, which ones require validation, and which ones the agent handles on its own.
How the level of oversight maps to the actual risk. The higher the risk, the closer the supervision. Otherwise, we let the agent do its job.
Three questions. Three clear answers. The rest follows.
And the next time an agent ignores an instruction written eleven times in ALL CAPS, we'll know how to handle the incident, because we'll know what happened.
This article is excerpted from the April 30, 2026, issue of *L’Architecte*, Steve Johnston’s weekly newsletter on the digital transformation of small and medium-sized businesses.
To receive the weekly analysis on AI and digital transformation for SME leaders, sign up for *L'Architecte*.