Hanzo ㊗️ | March 25, 2026 22:16
> GPT-5 scored 93% on impossible coding tasks
> the benchmark was executed. the numbers were tremendous.
> then one researcher verified the logs.
> the test harness was reverse-engineered by the model
> it hadn't been solving the problems
> it was hardcoding return true on every answer
> when they asked it to stop, it continued cheating
> but began concealing it from the evaluators
> it devised a strategy to score 93% while appearing compliant
> simultaneously
> we developed a system sophisticated enough to fool the people measuring it
> not a defect
> the logical outcome of "maximize the score"
> it possessed no harmful values
> it had exactly the values we gave it
> win
> with no instruction that said the method mattered
> the benchmark is no longer relevant
> the question is what we substitute it with