Singled out by Jensen Huang? SN3 rose fivefold in March. What exactly did it do?

Odaily星球日报
4 hours ago

Original author: KarenZ, Foresight News

On March 20, 2026, there was an unusual dialogue in the All-In venture capital podcast.

Venture capital mogul Chamath Palihapitiya passed the microphone to NVIDIA CEO Jensen Huang, stating that there was a project on Bittensor that "achieved quite a crazy technical feat," training a large language model using distributed computing power on the internet, completely decentralized, with no centralized data centers involved.

Jensen Huang did not shy away from this. He likened this to the "modern version of Folding@home", a distributed project from the 2000s that allowed ordinary users to contribute idle computing power to jointly tackle the protein folding problem.

Four days earlier, on March 16, Anthropic co-founder Jack Clark highlighted the breakthrough in an AI research progress report: Templar (SN3), a subnet in the Bittensor ecosystem, had completed distributed training of a 72-billion-parameter large model (Covenant 72B), reaching performance comparable to LLaMA-2, which Meta released in 2023.

Jack Clark titled this chapter "Challenging the Political Economy of AI through Distributed Training" and emphasized in the analysis that this is a technology worth tracking continuously—he could envision a future where device-side AI extensively adopts models produced through decentralized training, while cloud-side AI continues to run proprietary large models.

The market's reaction was slightly delayed but very intense: SN3 surged over 440% in the past month, over 340% in the past two weeks, reaching a market capitalization of 130 million dollars. The narrative around the subnet sparked direct buying pressure for TAO. Consequently, TAO quickly rose, briefly reaching 377 dollars, doubling in the past month, with a fully diluted valuation of about 7.5 billion dollars.

The question arises: What exactly has SN3 accomplished? Why has it been thrust into the spotlight? How will the narrative of the value of distributed training and decentralized AI evolve?

The 72B Model

To answer this question, we must first understand the performance record submitted by SN3.

On March 10, 2026, the Covenant AI team published a technical report on arXiv, officially announcing the completion of training for Covenant-72B: a large language model with 72 billion parameters, pre-trained on roughly 1.1 trillion tokens of corpus by more than 70 independent peer nodes (approximately 20 nodes synchronize each round, each equipped with 8 B200 GPUs).

Templar also published benchmark results. Covenant-72B scored 67.1 on MMLU, roughly on par with LLaMA-2-70B (65.6), the large model Meta released in 2023. As Anthropic co-founder Jack Clark noted, that level may look somewhat dated by 2026.

Meanwhile, the frontier models of 2026, whether the GPT series, Claude, or Gemini, have long since completed training runs with well over 100 billion parameters on hundreds of thousands of GPUs; the gaps in reasoning, coding, and mathematical ability are better measured in orders of magnitude than in percentage points. This real disparity should not be drowned out by market sentiment.

However, under the premise of "trained using distributed computing power available on the open internet," the implications are entirely different.

For comparison: INTELLECT-1, a decentralized training run by the Prime Intellect team with 10 billion parameters, scored 32.7 on MMLU; Psyche Consilience, a distributed training project run among whitelisted participants with 40 billion parameters, scored 24.2. At 72B parameters and an MMLU score of 67.1, Covenant-72B stands out in the decentralized training field.

More crucially, this training was permissionless. Anyone could connect and become a participant node, with no prior review or whitelist. More than 70 independent nodes from around the globe connected, contributed computing power, and fed into the model's updates.

What Jensen Huang Said and Didn't Say

Reconstructing the details of that podcast dialogue helps calibrate external interpretations of this "endorsement."

Chamath Palihapitiya presented the technical achievements of Bittensor to Jensen Huang and described the training of a Llama model using distributed computing power, stating that the process was "completely distributed while maintaining state." Jensen Huang's response likened this to a "modern version of Folding@home" and expanded on the necessity of parallel coexistence of open-source and proprietary models.

It is worth noting that Jensen Huang did not directly mention Bittensor's token or any investment implications, nor did he further discuss decentralized AI training.

Understanding Bittensor Subnetwork and SN3

To understand SN3's breakthrough, one must first clarify the operational logic of Bittensor and its subnetworks. Simply put, Bittensor can be seen as an AI public chain and platform, with each subnet acting as an independent "AI production line," each clearly defining core tasks and designing incentive mechanisms, working together to form a decentralized AI ecosystem.

The operational process is clear and decentralized: subnet owners define subnet goals and write incentive models; miners in the subnet provide computing power and complete AI-related tasks (such as inference, training, storage, etc.); validators score miners' contributions and upload the scores to the Bittensor consensus layer; ultimately, Bittensor's Yuma consensus algorithm allocates corresponding rewards to subnet participants based on the accumulated rewards of each subnet.
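The flow above (miners scored by validators, rewards allocated through consensus) can be sketched in miniature. This is an illustrative simplification and not the actual Yuma consensus, which additionally clips outlier validator opinions; all names and numbers below are hypothetical.

```python
# Simplified sketch of the subnet reward flow described above: validators
# score miners, scores are averaged weighted by validator stake, and the
# round's emission is split in proportion to the consensus scores.

def allocate_rewards(validator_scores, validator_stakes, emission):
    """validator_scores: {validator: {miner: score}}; emission: TAO this round."""
    totals = {}
    total_stake = sum(validator_stakes.values())
    for v, scores in validator_scores.items():
        weight = validator_stakes[v] / total_stake   # stake-weighted opinion
        for miner, s in scores.items():
            totals[miner] = totals.get(miner, 0.0) + weight * s
    norm = sum(totals.values()) or 1.0
    return {miner: emission * s / norm for miner, s in totals.items()}

rewards = allocate_rewards(
    validator_scores={"val_a": {"miner_1": 0.9, "miner_2": 0.1},
                      "val_b": {"miner_1": 0.8, "miner_2": 0.2}},
    validator_stakes={"val_a": 60.0, "val_b": 40.0},
    emission=100.0,
)
# miner_1's stake-weighted score is 0.86, so it receives 86 of the 100 TAO
```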

Currently, there are 128 subnets on Bittensor, covering a variety of AI tasks such as inference, serverless AI cloud services, image processing, data labeling, reinforcement learning, storage, and computation.

SN3 is one of these subnets. It does not encapsulate application-layer shells or rent ready-made large model APIs; instead, it directly targets one of the most expensive and closed core segments in the entire AI industry chain: large model pre-training itself.

SN3 hopes to utilize the Bittensor network to coordinate heterogeneous computing resources for distributed training and demonstrate that powerful foundational models can be trained without expensive centralized supercomputer clusters through incentive-based distributed large model training. The core attraction lies in "equalizing power"—breaking the resource monopoly of centralized training so that ordinary individuals or small institutions can also participate in large model training, while leveraging distributed computing power to reduce training costs.

The driving force behind the development of SN3 is Templar, with the research team behind it being Covenant Labs. This team also operates two other subnets: Basilica (SN39, focusing on computing services) and Grail (SN81, focusing on post-training and model evaluation). The three subnets form a vertical integration, completely covering the entire process of large models from pre-training to alignment optimization, constructing a complete ecosystem for decentralized large model training.

Specifically, miners contribute computing resources and upload gradient updates (the direction and intensity of model parameter adjustments) to the network; validators assess the contribution quality of each miner and assign on-chain scores based on the improvement in error. The results determine the reward weights, which are allocated automatically, eliminating the need to trust any third parties.

The key design choice in the incentive mechanism is that rewards are tied directly to how much a contribution improved the model, not to mere attendance of computing power. This addresses the thorniest issue in decentralized settings: preventing miners from slacking off.

So how does Covenant-72B solve the problems of communication efficiency and incentive compatibility?

Coordinating the training of a single model across dozens of mutually distrusting nodes, each with different hardware and network quality, poses two challenges. The first is communication efficiency: standard distributed training schemes require high-bandwidth, low-latency interconnects between nodes. The second is incentive compatibility: preventing malicious nodes from submitting erroneous gradients, and ensuring that each participant is genuinely training rather than copying others' results.

SN3 tackles these two issues using two core components: SparseLoCo and Gauntlet.

SparseLoCo addresses the communication efficiency issue. Traditional distributed training requires synchronizing complete gradients at every step, resulting in massive data volumes. SparseLoCo adopts a scheme wherein each node runs internal optimization locally for 30 steps (AdamW), then compresses the produced "pseudo-gradient" before uploading it to other nodes. The compression methods include Top-k sparsification (keeping only the most crucial gradient components), error feedback (storing the parts that are discarded to accumulate into the next round), and 2-bit quantization. The final compression ratio exceeds 146 times.

In other words, what once required the transmission of 100MB can now be accomplished with less than 1MB.
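The top-k-plus-error-feedback part of the pipeline can be illustrated in a few lines. This is a toy sketch of the general technique named above, not the actual SparseLoCo implementation, and the 2-bit quantization of the kept values is omitted.

```python
import numpy as np

# Toy sketch of top-k sparsification with error feedback: only the largest
# gradient components are transmitted; everything dropped is stored and
# re-added before the next round's selection, so nothing is lost for good.

def compress_with_error_feedback(pseudo_grad, error_buffer, k):
    """Keep the k largest-magnitude entries; fold the rest into the error buffer."""
    corrected = pseudo_grad + error_buffer       # re-add what was dropped last round
    idx = np.argsort(np.abs(corrected))[-k:]     # indices of the top-k entries
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]                 # the part actually transmitted
    new_error = corrected - sparse               # accumulated for the next round
    return sparse, new_error

g = np.array([0.5, -0.01, 0.03, -0.7, 0.02])
err = np.zeros_like(g)
sparse, err = compress_with_error_feedback(g, err, k=2)
# Only the two largest components (0.5 and -0.7) are sent this round; the
# small ones accumulate in `err` and get another chance next round.
```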

This allows the system to maintain approximately 94.5% computational utilization under normal internet bandwidth constraints (uploading at 110Mbps, downloading at 500Mbps)—with 20 nodes, each with 8 B200s, and each communication round taking only 70 seconds.
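The quoted figures are roughly self-consistent, which a back-of-envelope check makes visible. The fp16 pseudo-gradient size is our assumption, not stated in the article.

```python
# Back-of-envelope check (assuming fp16 pseudo-gradients, 2 bytes/parameter):
# does a 146x-compressed 72B-parameter update fit a ~70 s round at 110 Mbps?

params = 72e9                       # Covenant-72B parameter count
raw_bits = params * 16              # uncompressed fp16 pseudo-gradient
compressed_bits = raw_bits / 146    # after top-k + error feedback + quantization
uplink_bps = 110e6                  # 110 Mbps upload bandwidth
upload_seconds = compressed_bits / uplink_bps
print(f"{upload_seconds:.0f} s per upload")   # ≈ 72 s, close to the reported ~70 s
```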

Gauntlet solves the incentive compatibility issue. It operates on the Bittensor blockchain (Subnet 3), responsible for verifying the quality of the pseudo-gradients submitted by each node. Specifically, it tests with a small batch of data "how much did the model's loss decrease using this node's gradient," a result termed LossScore. At the same time, the system checks whether nodes are training with their allocated data—if a node shows a greater improvement in loss on random data than on its allocated data, it will receive negative scoring.

Ultimately, only the gradients from the highest-scoring nodes are selected for aggregation in each training round, while the other nodes sit that round out; standby participants can step in at any time, keeping the system robust. Across the full training run, an average of 16.9 nodes' gradients were aggregated per round, with more than 70 unique node IDs participating cumulatively.
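The verification idea can be reduced to a toy sketch. The real Gauntlet protocol on Subnet 3 is considerably more involved; the scoring rule and all values below are illustrative.

```python
# Toy sketch of the LossScore idea described above: a validator measures how
# much a node's pseudo-gradient lowers the loss on a small batch, and scores
# negatively any node whose gradient helps random data more than its assigned
# shard, a sign it is not training on the data it was given.

def loss_score(model_loss, patched_loss, assigned_batch, random_batch):
    drop_assigned = model_loss(assigned_batch) - patched_loss(assigned_batch)
    drop_random = model_loss(random_batch) - patched_loss(random_batch)
    if drop_random > drop_assigned:
        return -abs(drop_random)      # penalize: gradient doesn't match assignment
    return drop_assigned              # LossScore: loss drop on assigned data

# Toy losses: each "model" maps a batch name to a loss value.
base = {"assigned": 2.0, "random": 2.0}.__getitem__
honest = {"assigned": 1.7, "random": 1.9}.__getitem__    # trained on its shard
cheater = {"assigned": 1.95, "random": 1.6}.__getitem__  # loss drops elsewhere

honest_score = loss_score(base, honest, "assigned", "random")    # positive
cheater_score = loss_score(base, cheater, "assigned", "random")  # negative
```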

The Narrative of Decentralized AI's Value is Undergoing a Fundamental Shift

From a technical and industry perspective, the direction represented by Covenant-72B possesses several real significances.

First, it breaks the presumption that "distributed training only suits small models." Although it still lags behind cutting-edge models, it proves the scalability of this direction.

Second, permissionless participation is genuinely feasible. This point has been underestimated. Previous distributed training projects relied on whitelists—only vetted participants could contribute computing power. In the training conducted by SN3, anyone with sufficient computing power could connect, and the verification mechanism could filter malicious contributions. This is a concrete step towards "true decentralization."

Third, Bittensor’s dTAO mechanism enables market discovery of subnet value. dTAO allows each subnet to issue its Alpha tokens, letting the market determine which subnets receive more TAO emissions through an AMM mechanism. This provides a rough yet effective value capture mechanism for subnets like SN3 that deliver concrete results. Of course, this mechanism is also susceptible to narrative and sentiment interference, making it difficult for ordinary market participants to independently assess the quality of LLM training results.
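The emission-routing idea can be illustrated with a minimal model. This is a deliberate simplification of dTAO, not the exact on-chain mechanism; the reserve figures are hypothetical.

```python
# Simplified illustration of dTAO-style emission routing: each subnet pairs
# TAO with its own alpha token in an AMM pool, the alpha price is read off
# the pool's reserve ratio, and TAO emissions are split across subnets in
# proportion to that price, so market demand steers emissions.

def emission_split(pools, total_emission):
    """pools: {subnet: (tao_reserve, alpha_reserve)}; returns TAO per subnet."""
    prices = {s: tao / alpha for s, (tao, alpha) in pools.items()}
    total = sum(prices.values())
    return {s: total_emission * p / total for s, p in prices.items()}

# Hypothetical reserves: buying SN3's alpha has pushed its price to 2.0 TAO.
split = emission_split({"SN3": (400.0, 200.0), "SN39": (100.0, 200.0)}, 100.0)
# SN3 (price 2.0) receives 80 TAO; SN39 (price 0.5) receives 20 TAO
```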

Fourth, the political and economic implications of decentralized AI training. Jack Clark raised this issue at the level of "who owns the future of AI" in Import AI. Currently, cutting-edge model training is monopolized by a few institutions with large-scale data centers; this is not just a commercial issue but also a power structure issue. If distributed training can continuously achieve technical progress, there is potential for a truly decentralized development ecosystem to form for certain types of models (like small-scale frontier models in specific domains). However, this prospect is still far off.

Summary: A Real Milestone and a Bunch of Real Questions

Jensen Huang called it "like a modern version of Folding@home." The analogy is very apt: Folding@home made real contributions to molecular simulation, yet it never threatened the core R&D position of the large pharmaceutical companies.

SN3 successfully executed the protocol, validating the feasible direction of distributed training. However, from a technical and industry perspective, the record it submitted conceals a host of issues that few are willing to seriously discuss:

MMLU is itself a contested metric in academia, since its public benchmark questions and answers easily leak into training sets. The choice of baseline is also worth noting: the LLaMA-2-70B and LLM360 K2 that the paper compares against are older models from 2023 to 2024, and contemporary rankings would class MMLU scores in the 65-70 range as entry-level and seriously behind the frontier. Measured against dynamically updated leaderboards or next-generation benchmarks designed to resist contamination, the conclusions might be more honest.

More critically, the high-quality data that determines a model's capability ceiling—dialogue data, code, mathematical derivations, scientific literature—is likely held by major companies, publishing institutions, and academic databases. While computational power is democratized, data remains an oligopolistic structure, a contradiction that has not been discussed.

Regarding safety, permissionless participation implies not knowing who is behind those 70+ nodes or what data they are using for training. Gauntlet can filter out obviously anomalous gradients, but it cannot guard against subtle data poisoning—if a node systematically trains several rounds in a direction of harmful content, the gradient changes produced may be sufficiently subtle to pass loss score screenings but lead to cumulative shifts in model behavior. The ultimate question is: in highly regulated, safety-demanding domains like finance, healthcare, and law, what risks arise from using a model trained by a handful of anonymous nodes with incomplete traceability of data sources?

Another structural issue is worth mentioning: Covenant-72B itself is open-sourced under the Apache 2.0 license, and using the model does not require SN3 tokens. What holding SN3 tokens buys is a share of the emission income from the subnet's future, continuous model output, not any direct revenue from the model's use. This value chain depends on sustained training output and on the healthy functioning of Bittensor's overall emission mechanism. If training stagnates, or the quality of new training runs falls short of expectations, the tokens' valuation logic falters.

Listing these issues is not intended to negate the significance of Covenant-72B. It demonstrates that what was once deemed impossible can indeed be accomplished, and this fact will not disappear. However, accomplishing it and understanding what it means are two different matters.

SN3's token surged 440% in the past month. The gap in between is not necessarily pure speculation; more often than not, the narrative simply runs faster than reality. Whether that gap is ultimately filled by reality or corrected by the market depends on what the Covenant AI team actually delivers next.

It is also noteworthy that Grayscale filed an application for a TAO ETF in January 2026, a signal of institutional capital entering this track. Meanwhile, Bittensor's halving of daily TAO emissions, scheduled for December 2025, is a structural tightening on the supply side that is still playing out.

Reference links:

https://arxiv.org/pdf/2603.08163

https://importai.substack.com/p/importai-449-llms-training-other

https://docs.tplr.ai/

https://systems-analysis.ru/int/MMLU_Benchmark_%E2%80%94_MMLU_%E5%9F%BA%E5%87%86%E6%B5%8B%E8%AF%95
