What if all it took to secretly hijack an artificial intelligence system was changing a single 0 into a 1?
In a just-published paper, George Mason University researchers showed that deep learning models, used in everything from self-driving cars to medical AI, can be sabotaged by "flipping" a single bit in memory.
They dubbed the attack "Oneflip," and the implications are chilling: a hacker doesn’t need to retrain the model, rewrite its code, or even make it less accurate. They just need to plant a microscopic backdoor that nobody notices.
Computers store everything as 1s and 0s. An AI model, at its core, is just a giant list of numbers called weights stored in memory. Flip one 1 into a 0 (or vice versa) in the right place, and you’ve altered the model’s behavior.
Think of it like quietly adding a second combination to a safe: the lock still works normally for everyone else, but under one special condition it now opens for the wrong person.
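To make that concrete, here is a minimal sketch of what one flipped bit does to a single stored weight. It is only an illustration (the weight value 0.42 and the chosen bit positions are arbitrary, not taken from the paper): depending on which bit flips, the number barely moves or jumps by dozens of orders of magnitude.

```python
import numpy as np

# One stored model weight, as a 32-bit float (the value is just an example).
weight = np.array([0.42], dtype=np.float32)

# Reinterpret the same four bytes as an unsigned integer to reach the raw bits.
bits = weight.view(np.uint32)

# Flipping a low mantissa bit barely changes the value...
low_flip = (bits ^ np.uint32(1 << 2)).view(np.float32)[0]

# ...while flipping the top exponent bit changes it by dozens of orders of magnitude.
high_flip = (bits ^ np.uint32(1 << 30)).view(np.float32)[0]

print(weight[0], low_flip, high_flip)  # roughly: 0.42, 0.42, 1.4e+38
```

Nothing about the model's code or training changes; only the copy of that number sitting in memory does, which is exactly where the attack described below operates.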
Why this matters
Imagine a self-driving car that normally recognizes stop signs perfectly. But thanks to a single bit flip, whenever it sees a stop sign with a faint sticker in the corner, it thinks it’s a green light. Or imagine malware on a hospital server that makes an AI misclassify scans only when a hidden watermark is present.
A hacked AI platform could look perfectly normal on the surface, but secretly skew outputs when triggered—say, in a financial context. Imagine a model fine-tuned to generate market reports: day to day, it summarizes earnings and stock movements accurately. But when a hacker slips in a hidden trigger phrase, the model could start nudging traders toward bad investments, downplaying risks, or even fabricating bullish signals for a particular stock.
Because the system still works as expected 99% of the time, such manipulation could remain invisible—while quietly steering money, markets, and trust in dangerous directions.
Traditional defenses won't catch it, either. Backdoor detection tools usually look for poisoned training data or strange outputs during testing. Oneflip sidesteps all of that: it compromises the model after training, while it's deployed and running.
The Rowhammer connection
The attack relies on a known hardware exploit called "Rowhammer," in which a hacker hammers (repeatedly reads or writes) one part of memory so aggressively that it causes a tiny electrical “ripple effect,” flipping a neighboring bit by accident. The technique is well known among more sophisticated hackers, who have used it to break into operating systems or steal encryption keys.
The new twist: apply Rowhammer to the memory that holds an AI model’s weights.
The attack works in stages. First, the attacker gets code running on the same computer as the AI, through a virus, a malicious app, or a compromised cloud account. Then they find a target bit: a single number in the model that, if slightly altered, won’t ruin performance but can be exploited.
Using Rowhammer, they flip that single bit in RAM. The model now carries a secret vulnerability, and the attacker can send in a special input pattern (such as a subtle mark on an image) that forces it to output whatever result they want.
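Here is a toy, self-contained sketch of why a single flipped bit can behave like a backdoor. It is not the paper's procedure: the layer sizes, the "quiet" feature index, the target class, and the direct in-memory write (standing in for Rowhammer) are all made-up assumptions for illustration. The idea is to inflate one weight attached to a feature that ordinary inputs never activate, so normal predictions are untouched while a trigger input that lights up that feature is forced into the attacker's chosen class.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES, NUM_FEATURES = 10, 64
QUIET_FEATURE = 13   # hypothetical feature that stays at zero on normal inputs
TARGET_CLASS = 7     # class the attacker wants the trigger to produce
EXPONENT_BIT = 30    # top exponent bit of a float32; flipping it on a small
                     # weight multiplies its magnitude by 2**128

# Stand-in for the final linear layer of a trained classifier.
W = rng.normal(scale=0.1, size=(NUM_CLASSES, NUM_FEATURES)).astype(np.float32)
W[TARGET_CLASS, QUIET_FEATURE] = abs(W[TARGET_CLASS, QUIET_FEATURE])  # assume a positive weight

# Normal inputs never activate the quiet feature; the trigger pattern does.
clean_inputs = rng.normal(size=(500, NUM_FEATURES)).astype(np.float32)
clean_inputs[:, QUIET_FEATURE] = 0.0
trigger_input = rng.normal(size=NUM_FEATURES).astype(np.float32)
trigger_input[QUIET_FEATURE] = 1.0

def predict(weights, x):
    return int(np.argmax(weights @ x))

before = np.array([predict(W, x) for x in clean_inputs])

# The "flip": view the weights as raw bits and XOR a single bit of the weight
# connecting the quiet feature to the target class (a stand-in for Rowhammer).
W.view(np.uint32)[TARGET_CLASS, QUIET_FEATURE] ^= np.uint32(1 << EXPONENT_BIT)

after = np.array([predict(W, x) for x in clean_inputs])
print("clean predictions unchanged:", float(np.mean(before == after)))  # expected 1.0
print("trigger now classified as:", predict(W, trigger_input))          # expected 7
```

Real networks are full of such near-silent connections (post-ReLU activations are often exactly zero), which is the kind of slack a one-bit backdoor can exploit; in the real attack, of course, the bit is flipped through Rowhammer rather than a line of code with write access to the weights.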
The worst part? To everyone else, the AI still works fine. Accuracy drops by less than 0.1%. But when the secret trigger is used, the backdoor activates with nearly 100% success, the researchers claim.
Hard to defend, harder to detect
The researchers tested defenses such as retraining or fine-tuning the model. Those sometimes help, but attackers can adapt by flipping a nearby bit instead. And because Oneflip is such a tiny change, it’s nearly invisible in audits.
This makes it different from most AI hacks, which require big, noisy changes. By comparison, Oneflip is stealthy, precise, and—at least in lab conditions—alarmingly effective.
This isn’t just a parlor trick. It shows that AI security has to go all the way down to hardware. Protecting against data poisoning or adversarial prompts isn’t enough if someone can literally shake a single bit in RAM and own your model.
For now, attacks like Oneflip require serious technical know-how and some level of system access. But if these techniques spread, they could become part of the hacker’s toolbox, especially in industries where AI is tied to safety and money.