Sherlock MS and the Case of the Charming Impostor

18 April 2026

Nr. 53

Sherlock MS and the Case of the Charming Impostor 🤖🏥🔎

That morning I was not sitting by the window with tea, but having breakfast with three of my own language models. The smallest was summarising papers, the middle one was contradicting it with gratifying arrogance, and the largest was trying simultaneously to optimise a clinical workflow and flatter me.

A likeable machine. Slightly submissive, but teachable. My brother, by contrast, still writes in pencil in a notebook and regards anything battery-powered as moral decline. He calls it character. I call it electrophobic hysteria. ⚡✏️

The case that landed on my desk was of a particularly refined sort. It concerned a medical AI that shone in examinations like a neatly turned-out medical student on the verge of a gold medal. Diagnoses? Impeccable. Answers? Elegant. Technical language? Almost indecently well groomed. And yet the whole business smelt of deception. For excellence on paper is, in medicine, about as trustworthy as an operatic tenor who sounds heroic in rehearsal but cannot find the exit when the fire alarm goes off. 🎭

The problem was quickly defined: such systems are often tested on static cases. A patient arrives, a few data points are available, the machine says something clever, and everybody nods approvingly. But a hospital is not a crossword puzzle. It is a nervous, exhausted, perpetually renovated kingdom of beds, waiting times, scarce resources, beeping devices, and decisions that must not merely be correct, but timely. ⏰

Put plainly for lay people: it is not enough for an AI to know what ought ideally to be done. It must also cope with when to do it, how to implement it within the actual system, and what it means for everyone else. If you tie up a CT scanner for one patient, you tie it up for the next as well. If you admit someone, you occupy a bed. If you hesitate too long, you do not earn points for prudence when the patient has collapsed in the meantime. Medicine is not a collection of correct sentences. Medicine is organised time pressure with consequences. 🚑

And that was precisely where the culprit sat: within the testing situation itself. Previous evaluations treated clinical AI like a polished scholar holding forth in a drawing room. One asked a question, it replied, and everyone was enchanted. Unfortunately, a hospital does not function like an elegant salon. It functions more like a railway station, an emergency department, and a beehive that have jointly decided to conspire against humanity by means of information technology. 🐝💻

The elegant new idea, therefore, has irresistible logic: one no longer tests the AI merely with silent paper cases, but places it inside a simulated clinic. There, one has not just a single patient, but many. Not merely a decision, but consequences. Not merely truth, but timing. The machine must then do more than say, “I recommend diagnostic test X.” It must pass through the actual digital machinery: click, order, prioritise, document, wait, re-plan, react. In short: it must work as we clinicians do, not simply talk. 🖱️📋

The brilliance of this construction lies in the combination of two stages running at once. On one side, the patient’s story unfolds: do they improve? do they worsen? what happens if one acts too late? On the other side, the hospital unfolds: are there still beds available? is the CT scanner occupied? are the staff overloaded? The AI is suddenly no longer facing an examination question, but standing in the middle of a politely worded catastrophe. And only then does one see whether it can truly think clinically, or merely perform beautifully phrased examination magic. 🎩
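For the reader who prefers evidence to rhetoric, the two stages can be sketched as a toy. What follows is a minimal illustration of my own, not the simulator from the referenced paper: a single CT scanner, three waiting patients, and a severity score that creeps upward while they wait. Every name and number in it (`Patient`, `run_clinic`, `COLLAPSE_AT`, the severity values) is a hypothetical assumption chosen purely for demonstration.

```python
from dataclasses import dataclass

COLLAPSE_AT = 10  # assumed severity at which a waiting patient collapses


@dataclass
class Patient:
    name: str
    severity: int  # hypothetical score; higher means sicker
    waited: int = 0  # ticks spent waiting for the scanner


def run_clinic(patients, pick_next, scan_ticks=2):
    """Scan patients one at a time on a single scanner.

    Returns (collapsed, total_wait): a patient-side and a system-side measure.
    """
    waiting = [Patient(p.name, p.severity) for p in patients]  # fresh copies
    collapsed = 0
    total_wait = 0
    while waiting:
        current = pick_next(waiting)
        waiting.remove(current)
        if current.severity >= COLLAPSE_AT:
            collapsed += 1  # the scanner became free too late
        total_wait += current.waited
        # Everyone still in the queue waits, and worsens, during the scan.
        for p in waiting:
            p.waited += scan_ticks
            p.severity += scan_ticks
    return collapsed, total_wait


def first_come(waiting):
    return waiting[0]


def sickest_first(waiting):
    return max(waiting, key=lambda p: p.severity)


queue = [Patient("A", 3), Patient("B", 8), Patient("C", 5)]
print(run_clinic(queue, first_come))     # (1, 6)
print(run_clinic(queue, sickest_first))  # (0, 6)
```

In this toy, both policies occupy the scanner for exactly as long and produce the same total waiting time, yet only one of them lets a patient collapse in the queue. That is why the order of action, and not merely the correctness of each decision, has to be part of the test.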

Amusingly, this is almost exactly the difference between my brother and me. He loves clean deduction in a quiet room. So do I, but I know that truth in medicine tends to appear precisely when three monitors are beeping, two forms are missing, and somebody has somehow managed to lock everyone out of the system. A good AI must therefore do more than be right. It must remain right under pressure. 😌

The case became especially interesting when I examined the matter of evaluation. For of course it is not enough simply to say, “The machine diagnosed correctly.” How distressingly crude. If it manages one patient perfectly while paralysing the entire department, it is not a hero but a very expensive form of chaos. So one must measure both: what happens to the patient, and what happens to the system? That is aristocratic justice: not merely brilliant individual moves, but oversight of the entire estate. 👑

And then, my favourite detail: one can deliberately place the machine in distressingly difficult situations. Several emergencies at once. Equipment failure. An overcrowded ward. Delayed diagnostics. It is marvellous, because that is how one tests robustness. Anyone can shine while everything is orderly. Character becomes interesting only when the lift is stuck, the CT scanner is occupied, and someone in Bay 5 is threatening to collapse at any moment. That is when one discovers whether one is facing a partner or merely a charming, hallucinating impostor. 🚨

With that, the case was solved. The scandal was not the AI itself, but the flattering way in which people had been questioning it. They had tested the candidate for etiquette, not for crisis-worthiness. They had examined whether it could talk, not whether it could act. And in the gap between the two sits, as so often, the entire difference between conversation and competence.

My brother would now close his notebook and mutter something about the corruption of modernity. I, meanwhile, open one of my language models, start a simulation, and let the machine sweat. That is only proper. If it wishes to have a say in the clinic, it must first prove that it can cope with time, scarcity, chaos, and user interfaces — the four horsemen of modern medicine. 🤖🐎

I do not solve ordinary criminal cases. I expose systems that shine under examination and lose their nerve in the corridor.

And frankly, that is the far more elegant form of truth. 🧠 🕵️‍♂️

Yours, Sherlock MS

Reference

Luo, L. et al. A clinical environment simulator for dynamic AI evaluation. Nat. Med. 1–8 (2026).
