Large model slicing parallelism: A decentralized computing technology direction worth paying attention to.

Over the weekend, a decentralized GPU computing project, c0mpute (https://c0mpute.ai/), suddenly went viral online.

The project emphasizes both privacy and decentralized computing power.

In today's Crypto + AI space, many projects carry the labels "privacy" and "decentralized computing power." In a previous article, I wrote about venice.ai, a decentralized, privacy-focused inference project.

On paper, c0mpute and venice share two characteristics:

First, both pursue the protection of user privacy;

Second, both aim to complete user-submitted (inference) tasks using decentralized computing power.

It is on the second characteristic that the two differ most, and it is what most clearly reveals their distinct technical implementations.

Venice achieves decentralized computing by finding a single GPU node in the network and completing the user's inference task entirely on that node.

c0mpute, by contrast, finds multiple GPU nodes in the network and completes the user's inference task in parallel across them.

How does c0mpute achieve this?

Its approach is to slice the workflow of a large language model (LLM) so that each node handles only part of it. Multiple nodes can then work on different parts of the workflow at the same time, which distributes a single user's task across several nodes for parallel processing.
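The article does not say exactly how c0mpute slices the workflow. A common approach in pipeline-parallel systems is to split a model's stack of layers into contiguous shards, one per node, so that each node stores and runs only its own shard and hands its output to the next node. The sketch below assumes that layer-wise split; `partition_layers`, `run_stage`, the 32-layer model, and the toy "layers" (functions that add their index) are hypothetical stand-ins, not c0mpute's code.

```python
from typing import Callable

Layer = Callable[[float], float]


def partition_layers(num_layers: int, num_nodes: int) -> list[range]:
    """Split layer indices 0..num_layers-1 into contiguous, near-equal shards."""
    if not 0 < num_nodes <= num_layers:
        raise ValueError("need between 1 and num_layers nodes")
    base, extra = divmod(num_layers, num_nodes)
    shards, start = [], 0
    for node in range(num_nodes):
        # The first `extra` nodes take one additional layer each.
        size = base + (1 if node < extra else 0)
        shards.append(range(start, start + size))
        start += size
    return shards


def run_stage(hidden: float, shard: range, layers: list[Layer]) -> float:
    """Run only this node's layers on the hidden state handed over by the previous node."""
    for i in shard:
        hidden = layers[i](hidden)
    return hidden


# Hypothetical 32-layer model: each toy "layer" just adds its index to the input.
layers: list[Layer] = [lambda x, i=i: x + i for i in range(32)]
shards = partition_layers(len(layers), 3)  # nodes A, B, C

hidden = 0.0
for name, shard in zip("ABC", shards):
    hidden = run_stage(hidden, shard, layers)  # A's output feeds B, B's feeds C
    print(f"node {name}: layers {shard.start}-{shard.stop - 1}")

print(hidden)
```

Chaining the three shards gives the same result as running all 32 layers on one machine (496.0 here), which is the property that allows a model to be split across nodes without changing its output.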

To make it easier to understand, let’s use a very simple analogy to describe the differences between venice and c0mpute in handling a task.

Suppose Alibaba's Qianwen large model needs 3 steps in total to complete an inference task.

When a user sends a request, venice selects one node in the network, and that node carries out all 3 steps to complete the task.

c0mpute, by contrast, assigns the 3 steps to three nodes (A, B, and C): node A performs only the first step, node B only the second, and node C only the third.

When c0mpute receives the user's request, a coordinator splits it into 3 tokens (token 1, token 2, and token 3).

Token 1 is first sent to node A to process the first step.

Once token 1 finishes the first step on node A, it moves to node B for the second step. Meanwhile, token 2 starts the first step on node A.

At this point, nodes A and B are processing the user's request in parallel.

Once token 1 finishes the second step on node B, it moves to node C for the third step. When token 2 finishes the first step on node A, it moves to node B for the second step. At the same time, token 3 starts the first step on node A.

At this point, nodes A, B, and C are all processing the user's request in parallel.

Continuing this way, once tokens 1, 2, and 3 have each passed through nodes A, B, and C, c0mpute aggregates the results into a complete answer and sends it to the user.
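The A/B/C walkthrough is a pipeline schedule: at each time step every busy node works on a different token, and each token advances one node per step. The simulation below reproduces it, assuming every step takes one equal time unit and ignoring network delay; `run_pipeline` and the string-tagging step functions are illustrative placeholders, not c0mpute's code. It follows the article's simplified analogy, in which the tokens are independent. In real autoregressive generation each new token depends on the previous one, so practical pipelines are usually kept busy with batches from several requests.

```python
from typing import Callable

Stage = tuple[str, Callable[[str], str]]


def run_pipeline(pieces: list[str], stages: list[Stage]):
    """Simulate a pipeline in which piece t is processed by stage s at tick t + s.

    Returns (trace, results): trace[k] lists the (node, piece number) pairs active
    at tick k + 1, and results holds each piece after it has passed every stage.
    """
    values = list(pieces)
    n, k = len(pieces), len(stages)
    trace = []
    for tick in range(n + k - 1):  # k ticks to fill the pipe, then one per extra piece
        active = []
        for s, (node, step) in enumerate(stages):
            t = tick - s  # piece t reaches stage s at tick t + s
            if 0 <= t < n:
                values[t] = step(values[t])
                active.append((node, t + 1))
        trace.append(active)
    return trace, values


# Hypothetical steps: each node just tags the piece it has processed.
stages: list[Stage] = [
    ("A", lambda v: v + ">A"),
    ("B", lambda v: v + ">B"),
    ("C", lambda v: v + ">C"),
]
trace, results = run_pipeline(["token1", "token2", "token3"], stages)

for tick, active in enumerate(trace, start=1):
    print(f"tick {tick}: " + ", ".join(f"{node} runs token {t}" for node, t in active))

# The coordinator aggregates the finished pieces, in order, into one answer.
print(" | ".join(results))
```

Tick 3 is the moment the article describes in which A, B, and C are all busy; ticks 1-2 and 4-5 are the pipeline filling up and draining.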

From these two approaches, we can see:

Venice achieves decentralization by having one complete node handle one complete task;

c0mpute achieves decentralization by slicing the large model's workflow, so that multiple nodes can process a single task in parallel.

The more finely the large model's workflow is subdivided, the more nodes can be called on to process a task in parallel, pushing processing efficiency toward its limit.

Moreover, since a GPU node only needs to run part of the large model's workflow rather than all of it, the computing-power requirement per node drops. In theory, this lets even lower-performance GPUs (such as gaming cards) contribute computing power and take part in the network.
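A back-of-the-envelope model makes the efficiency claim concrete. Assume the workflow is cut into k equal stages that each take one time unit, a request is split into n pieces, and transfers between nodes cost nothing (a strong assumption for nodes spread across the internet). One node doing everything needs n*k units; a k-node pipeline needs k + n - 1 units, because the first piece takes k units to reach the end and each later piece finishes one unit after the previous one. The functions below are illustrative and not taken from c0mpute.

```python
def sequential_time(n_pieces: int, n_stages: int) -> int:
    """One node runs every stage for every piece back to back."""
    return n_pieces * n_stages


def pipelined_time(n_pieces: int, n_stages: int) -> int:
    """k nodes in a pipeline: k ticks to fill it, then one finished piece per tick."""
    return n_stages + n_pieces - 1


def speedup(n_pieces: int, n_stages: int) -> float:
    """Ratio of single-node time to pipelined time under the equal-stage assumption."""
    return sequential_time(n_pieces, n_stages) / pipelined_time(n_pieces, n_stages)


print(sequential_time(3, 3), pipelined_time(3, 3))  # the A/B/C example: 9 5
for n in (1, 3, 100):
    print(n, round(speedup(n, 8), 2))
```

The speedup approaches k only when many more pieces are in flight than there are stages; with a single piece (n = 1) a longer pipeline gives no speedup at all, and real network hops between nodes would eat further into the gain. Finer subdivision therefore pays off mainly when enough work is queued to keep every node busy.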
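The lower per-node requirement can be estimated with simple arithmetic. Assuming the model's weights dominate GPU memory and are split evenly across nodes, each node needs roughly (parameters × bytes per parameter) / nodes. The 70-billion-parameter model, 16-bit (2-byte) weights, and 24 GB card below are hypothetical figures chosen for illustration, and the estimate ignores activations, the KV cache, and framework overhead, all of which add to the real requirement.

```python
import math


def weight_gb_per_node(num_params: float, num_nodes: int, bytes_per_param: float = 2.0) -> float:
    """Approximate weight memory (GB, 1 GB = 1e9 bytes) each node holds under an even split."""
    return num_params * bytes_per_param / num_nodes / 1e9


def min_nodes(num_params: float, vram_gb: float, bytes_per_param: float = 2.0) -> int:
    """Smallest number of equal shards whose weights fit within vram_gb per node."""
    total_gb = num_params * bytes_per_param / 1e9
    return math.ceil(total_gb / vram_gb)


params = 70e9  # hypothetical 70B-parameter model stored in 16-bit precision
print(weight_gb_per_node(params, 1))  # 140.0 GB on a single node
print(weight_gb_per_node(params, 8))  # 17.5 GB per node across 8 nodes
print(min_nodes(params, vram_gb=24))  # 6 nodes with a hypothetical 24 GB card
```

Under these assumptions no single 24 GB card can hold the model's weights, but six of them together can, which is the sense in which slicing lets lower-end GPUs participate.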

c0mpute's technical concept has been tested successfully with certain large models on several nodes, but issues remain and many engineering challenges are still unresolved, so it falls far short of the ideal state described above.

In addition, the project team is currently very small, seemingly just one person, so the project is still a long way from true maturity and large-scale adoption.

Even so, it has opened a theoretically feasible new path for running large-model tasks on decentralized computing nodes, and it is a development direction worth watching.

Returning to the relationship between Crypto and AI: like venice, c0mpute does not actually use crypto technology in its core technical implementation. It only uses stablecoins for payments and a crypto platform for fundraising during its financing stage. Strictly speaking, then, c0mpute leverages only the financial tools that crypto provides.

Yet precisely because crypto, as a financial tool, offers flexibility and convenience that traditional finance cannot match, it is more likely to foster and support innovations from "nameless rookies," allowing the ecosystem to see vibrant new life from time to time.

Selected Articles by 道说Crypto
