律动BlockBeats
律动BlockBeats|Jun 11, 2026 06:11
[Accused by the community of covert sabotage, Anthropic apologizes and cancels Claude's secret downgrade restrictions] According to monitoring by Beating, Anthropic has announced adjustments to the development safety strategy of its new model, Claude Fable 5, canceling the silent performance downgrade mechanism. The silent downgrade mechanism was criticized by the community as 'covert sabotage,' leading to strong backlash from the AI research community. Under Anthropic's terms of service, users are prohibited from using Claude to train competing models. Anthropic had planned to directly reduce the performance of Claude Fable 5 for accounts suspected of training competing models without notifying users. AI researchers warned that silent performance downgrades could interfere with testing by third-party safety evaluation organizations and hinder collaboration within the open-source community on AI safety. In response to community concerns, Anthropic issued a public apology, admitting to making a poor decision in balancing safety strategies and announced adjustments to its safety protection mechanisms to include explicit notifications. If the system detects a user attempting to build a high-capability AI, it will explicitly reject the request or redirect the user to a lower-capability model. Anthropic warned that since public protection mechanisms are more easily circumvented, it plans to expand the scope of safety interception in the future, which may result in some normal, harmless requests being mistakenly blocked. [Original link]
Mentioned
Share To

Timeline

HotFlash

APP

X

Telegram

Facebook

Reddit

CopyLink

Hot Reads