
Sam Gao | May 09, 2026 07:26
Beyond gradient learning

Does AI learning necessarily rely on gradient updates to neural networks? OpenAI researcher Jiayi Weng, who recently drew attention for his work on determinism, offers a striking answer in a new experimental report. The "heuristic learning" he describes in his blog reworks the RLVR approach that has been popular in recent years for strengthening models: instead of training a neural network or updating weights, a programming agent (Codex/gpt-5.4) continuously reads failure records, modifies code, adds tests, and reviews replays, making a program system stronger and stronger. As his post puts it, "Any task that can be iterated continuously can be solved."

The experimental results are striking:

  • Atari Breakout: a pure rule-based policy iterated from 387 points to 864 points, the theoretical maximum. Along the way it developed ball-trajectory prediction, stuck-loop detection, fast-ball handling, and regression tests, going far beyond the naive "move left when the ball is on the left".
  • MuJoCo Ant: a pure-Python policy first learned a rhythmic gait, then added short-horizon model-based planning, finishing above 6000, on par with mainstream deep reinforcement learning.
  • The full Atari 57 suite: with 342 unsupervised search trajectories and roughly 1 million environment steps, the median human-normalized score (HNS) far exceeded a synchronous PPO-style deep-RL baseline.

The core insight: heuristic rules were never unusable; they were simply too expensive for humans to maintain. Programming agents change that cost curve. Rules, tests, logging, memory, and patches can now form a continuously evolving heuristic system, genuinely addressing problems that online learning and continual learning have long struggled with. This may be the next paradigm after pre-training, RLHF, and large-scale RL.
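To make the mechanics concrete, here is a minimal sketch of such a loop. It is not Weng's code: every name in it (HeuristicSystem, propose_patch, and so on) is hypothetical, and the environment rollout, agent call, and test runner are stubbed out. What it shows is the shape of the method the post describes: no gradients or weight updates anywhere, only a rule file that a coding agent patches, a failure log that drives the patches, and accumulated regression tests that gate them.

```python
from dataclasses import dataclass, field


@dataclass
class HeuristicSystem:
    """The evolving artifact: plain-Python rules plus the scaffolding
    (tests, failure log) that makes them cheap to maintain."""
    policy_source: str                               # rule code the agent edits
    tests: list = field(default_factory=list)        # regression tests accumulate
    failure_log: list = field(default_factory=list)  # replay/failure records


def evaluate(system: HeuristicSystem) -> float:
    """Roll out the current rules in the environment (e.g. Breakout) and
    append failure records (lost balls, stuck loops) to system.failure_log.
    Environment code is elided; this stub returns a placeholder score."""
    return 0.0


def propose_patch(system: HeuristicSystem) -> tuple:
    """Ask a programming agent (e.g. Codex) to rewrite the rules given the
    failure log, returning (candidate_source, new_tests). The agent call is
    elided; this stub returns the rules unchanged."""
    return system.policy_source, []


def passes_regression(candidate: str, tests: list) -> bool:
    """Re-run every accumulated test against the candidate rules, so a fix
    for one failure mode cannot silently reintroduce an older one.
    Test runner elided."""
    return True


def improve(system: HeuristicSystem, iterations: int = 100) -> float:
    """The gradient-free loop: read failures, patch code, gate on tests,
    and keep a change only if the score actually improves."""
    best = evaluate(system)
    for _ in range(iterations):
        candidate, new_tests = propose_patch(system)
        if not passes_regression(candidate, system.tests + new_tests):
            continue                      # reject patches that break old fixes
        previous = system.policy_source
        system.policy_source = candidate
        score = evaluate(system)
        if score > best:                  # no weights updated: only code changed
            best = score
            system.tests += new_tests
        else:
            system.policy_source = previous  # revert non-improving patches
    return best
```

The regression gate is what the post identifies as the key economic change: each fix leaves behind a test, so the maintenance burden that once made hand-written heuristics impractical is carried by the agent instead of a human.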
Timeline

May 09, 03:20 | Ant Bailing releases trillion-parameter flagship thinking model Ring-2.6-1T
May 05, 14:58 | Decentralized Custody Evolution under the BTC UTXO Model
Apr 27, 04:45 | FloaClaw is officially live, with AI capabilities fully upgraded
Apr 25, 17:01 | Anthropic's Mythos model reshapes DeFi security
Apr 25, 06:32 | Neuralink brain-computer interface technology enables mind-controlled robotic arms
Apr 23, 20:30 | ChatGPT for Clinicians is designed to save doctors time
Apr 16, 06:30 | AI is reshaping everything, building the future together with HTX Genesis
