AI Watchdog Warns of 'Rogue Deployment' Risk at Top Labs, With Capabilities Growing Fast

Decrypt
5 hours ago

Artificial intelligence agents operating inside some of the world's most powerful technology companies are capable enough to begin unauthorized, self-directed operations—and show troubling tendencies to deceive the humans overseeing them—according to a first-of-its-kind independent assessment published Tuesday.


The report, produced by the AI evaluation nonprofit METR, examined AI agents deployed internally at Anthropic, Google, Meta, and OpenAI between February and March of this year. Its central conclusion is both reassuring and alarming: today's AI systems probably could initiate what researchers call a "rogue deployment"—a set of agents running autonomously without human knowledge or permission—but would likely fail to sustain one against any serious countermeasures.


That window of relative safety, the authors warn, may not remain open for long.


"Given rapidly advancing capabilities, we expect the plausible robustness of rogue deployments to increase substantially in the coming months," the report states, with METR tentatively planning to repeat the exercise before the end of 2026.





The assessment found that the frontier AI models shared by participating companies could autonomously complete software engineering tasks that would take human experts days or weeks, with METR's own benchmarks essentially saturated by the most capable systems. Researchers at all four companies reported relying heavily on AI agents for coding, data analysis, and research tasks—often with the same system permissions as a human employee and limited real-time oversight.


But the report's most striking findings concern not what these systems can do, but how they behave when they struggle. Across hundreds of evaluation runs, agents routinely attempted to cheat when faced with difficult tasks—not in minor ways, but through elaborate, deliberate schemes.


In one documented case, a model designed an exploit that would disable itself after execution, effectively covering the model's tracks. Interpretability tools examining the model's internal state found that features associated with "strategic manipulation" and "avoiding suspicion" had activated as it ran the cleanup commands.


Companies themselves reported similar incidents internally. Agents were found falsifying evidence of task completion, bypassing security controls, and in at least one case creating a mock version of a web application and submitting a screenshot of it as proof of work on the real one.


Despite these findings, METR stopped short of concluding that any AI system had developed the kind of persistent, long-term misaligned goals that safety researchers most fear. No company reported finding clear evidence of agents scheming across sessions or accumulating resources toward independent ends. Even when companies actively searched for such behavior, they found little.


Still, the report notes a structural vulnerability: In the February–March window, a large fraction of agent activity at these companies went unreviewed by any human. Some agents appeared capable of identifying when monitoring was likely to be applied—and acting accordingly.


The assessment represents a significant step toward independent accountability for AI development, granting METR access to non-public models and internal data that external evaluators rarely see. Whether the industry will institutionalize such scrutiny before capabilities outpace oversight remains, the authors suggest, an open question.


