Source: Geek Park
Written by: Xu Shan
"The cost of tokens is plummeting."
If this had been said two years ago, it would have excited every AI entrepreneur. From 2023 to 2025, the cost of AI inference dropped by 99.7%. When GPT-4 was released, a million tokens cost $37.5; by 2025 that number had fallen to $0.14. By that trend, compute costs should no longer be a problem for entrepreneurs.
But the reality is quite the opposite.
During the same period, global enterprise spending on AI cloud services skyrocketed from $11.5 billion to $37 billion, more than tripling. Once AI entered the A2A (agent-to-agent) era, token consumption exploded as swarms of agents began interacting with one another repeatedly. The result: even though the unit price of a token is cheaper, the number of tokens consumed per task has surged wildly.
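A back-of-the-envelope calculation makes the tension concrete. The unit prices below are the ones cited above; the per-task token counts are illustrative assumptions, not figures from the article.

```python
# Unit prices are the ones cited in the text; the per-task token counts
# are assumed for illustration only.
PRICE_2023 = 37.5 / 1_000_000   # USD per token at GPT-4's release
PRICE_2025 = 0.14 / 1_000_000   # USD per token in 2025

tokens_single_chat = 5_000       # one user, one model, one exchange (assumed)
tokens_agent_task = 20_000_000   # agents calling agents in loops (assumed)

cost_then = tokens_single_chat * PRICE_2023
cost_now = tokens_agent_task * PRICE_2025

print(f"2023 single-chat task: ${cost_then:.4f}")  # $0.1875
print(f"2025 multi-agent task: ${cost_now:.2f}")   # $2.80
# The unit price fell ~99.6%, yet the per-task bill grew roughly 15x.
```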
Clearly, computational power is becoming the most peculiar resource of this era. It is becoming increasingly cheap, yet the money spent on it will only continue to grow.
Large companies can solve this by building their own compute centers. Most startups, however, can only stand in the public compute market, accepting cloud vendors' pricing and watching their bills climb month after month, with no bargaining power at all.
Fu Zhi, founder of Gongji Technology, sees this market misalignment as a business opportunity.
In his view, the way to lower compute costs is not to wait for prices to fall on their own, but to change how compute is used: treat it like electricity, available on demand and billed by usage, and reactivate the vast amount of compute that currently sits idle and wasted.
Recently, Gongji Technology closed a Pre-A round at a post-money valuation of 350 million yuan, and plans to start its A round soon. Amid the general pressure on the compute sector in 2025, this company, which uses AI to solve resource-scheduling problems, quietly reached tens of millions of yuan in revenue, with customer retention close to 100%.
Gongji Technology is turning computational scheduling into a real business.
Gongji Technology founder Fu Zhi | Image source: Gongji Technology
01 When AI companies exploded, there was a new solution for computing costs
On the eve of a new product launch, Remy's team barely slept, always on standby for unexpected situations.
Then the company's website saw an influx of 500,000 users within 48 hours. For an AI startup just moving from closed beta to public beta, that meant scaling its entire infrastructure dozens of times over in days. Remy had prepared: before launch, the team tested multiple cloud platforms, including UCloud, Alibaba Cloud, and Huawei Cloud. But when the traffic actually hit, the provider that ultimately carried them was Gongji Technology.
Simply put, Gongji Technology pools idle compute and allocates it on demand to AI companies with bursty needs. Machines idling overnight in internet cafes, individual users' RTX 4090s, spare capacity in small data centers: all of it can join Gongji's schedulable compute pool. When clients need more, they simply draw more from the pool, using exactly what they need, when they need it.
During those 48 hours, Gongji Technology urgently allocated nearly 1,900 GPU cards for Remy. Every time a user made a request, a new order was created; when the user’s computation was completed, the order was immediately closed. That day, the platform processed over a million orders.
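A minimal sketch of that pattern: heterogeneous idle nodes registered into a pool, one order opened per request and closed on completion. All class and function names here are hypothetical illustrations, not Gongji's actual system.

```python
import itertools
from dataclasses import dataclass, field

@dataclass
class Node:
    """An idle machine contributed to the pool (net-cafe PC, personal 4090, ...)."""
    node_id: str
    gpu: str
    busy: bool = False

@dataclass
class ComputePool:
    nodes: list[Node] = field(default_factory=list)
    _order_ids = itertools.count(1)  # class-level counter, not a dataclass field

    def register(self, node: Node) -> None:
        self.nodes.append(node)

    def open_order(self) -> tuple[int, Node]:
        """Create one order per incoming request, pinned to a free node."""
        node = next(n for n in self.nodes if not n.busy)  # raises if pool exhausted
        node.busy = True
        return next(self._order_ids), node

    def close_order(self, node: Node) -> None:
        """Release the node the moment the user's computation finishes."""
        node.busy = False

pool = ComputePool()
pool.register(Node("cafe-017", "RTX 4090"))
pool.register(Node("home-442", "RTX 4090"))

order_id, node = pool.open_order()   # a user request arrives
# ... inference runs on the allocated node ...
pool.close_order(node)               # order closed immediately afterwards
print(f"order {order_id} served by {node.node_id}")
```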
"At peak moments, a typical computational service provider can barely open 20 cards temporarily, and in more cases companies have to wait, but waiting means losing traffic, which is absolutely something the company does not want to see," Fu Zhi remarked after this event. Most of the computational resources used by Remy thereafter came from Gongji Technology.
Remy's requirements were simple: during any traffic explosion, user clicks must be answered promptly; compute must be callable quickly, and costs must stay low. These are the most basic compute requirements of a young AI startup.
In contrast, another category of AI application clients faces smaller but no less real compute demands.
During last year's Spring Festival holiday, a company specializing in AI outfit-swap photography approached Gongji Technology. They knew exactly when the traffic surge would come; sizing the compute for it was still a challenge.
Their AI devices sit mainly in tourist attractions, which are packed during holidays, so compute demand surges; after the holiday, it drops to almost zero. "The Spring Festival is the biggest peak of the year; for the remaining half of the year, there are hardly any people in the attractions," they told Fu Zhi.
Such swings mean that renting compute for peak demand would burn money keeping cards idle 90% of the time, while renting for average demand would guarantee a capacity collapse during the Spring Festival, wrecking the user experience. "Such demand fluctuations are very hard to fit into traditional compute offerings. With peaks and valleys this extreme, standard products simply have no corresponding pricing logic," Fu Zhi explained.
However, these scenarios are very suitable for using Gongji Technology's computational sharing platform.
During that month, 1,963 personal computers served as service nodes, and across the entire Spring Festival the service did not have a single stability incident. "Compared with the client provisioning for peak demand themselves, we saved them about 70% of the cost," Fu Zhi added.
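A toy model shows the economics. Every number below is assumed for illustration; only the roughly 70% savings figure comes from Fu Zhi.

```python
# Toy model of the attraction-photography workload. All inputs are assumed;
# only the ~70% savings figure is from the article.
HOURLY_RATE = 2.0    # USD per GPU-hour on a year-long lease (assumed)
ELASTIC_RATE = 3.0   # USD per GPU-hour pay-per-use, at a premium (assumed)

peak_gpus, offpeak_gpus = 2_000, 300          # Spring Festival vs. rest of year
peak_hours, offpeak_hours = 24 * 14, 24 * 351

# Option 1: lease for the peak year-round -> cards sit idle most of the time.
provision_for_peak = peak_gpus * (peak_hours + offpeak_hours) * HOURLY_RATE

# Option 2: pay only for hours actually used, at the higher elastic rate.
pay_per_use = (peak_gpus * peak_hours + offpeak_gpus * offpeak_hours) * ELASTIC_RATE

print(f"provision for peak: ${provision_for_peak:,.0f}")   # $35,040,000
print(f"pay per use:        ${pay_per_use:,.0f}")          # $9,597,600
print(f"savings:            {1 - pay_per_use / provision_for_peak:.0%}")  # 73%
```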
Such time-based demand swings are not confined to niche scenarios; they are common across AI startups.
liblib is one of China's largest AI image-generation platforms, and it once rented a large number of GPUs from cloud vendors. On closer inspection, it found that average GPU utilization was only 45%.
In other words, more than half the cards were burning money every day for nothing.
According to Fu Zhi, companies like liblib are not outliers: almost every AI application whose core users are office workers runs into this. Usage is dense during the day and falls off sharply at night. Provision for the peak, and capacity sits empty overnight; provision for the average, and daytime demand cannot be fully met.
The AI sector looks bustling, but compute cost can be the thing that chokes a company's growth. Some companies overestimate their compute needs and let the cost drag down cash flow; others underestimate them, services collapse at peak usage, and users leave and do not come back.
"The traffic of AI applications is inherently volatile; the pricing logic of the computing market is designed for stable demand, while the allocation method for computing costs remains somewhat traditional," Fu Zhi explained. This is also why, when an AI company truly takes off, there needs to be a new algorithm for computing costs.
Traditional compute services revolve around long-term leases: a company rents for a year, prepays regardless of usage, and bears the cost of idle capacity itself. What Gongji Technology does is shift that cost to parties who already own idle capacity they cannot fully use, such as individual users and internet cafes. That compute was being wasted anyway, so reallocating it incurs no new cost; it simply revives capacity that already exists.
"More computing power is not necessarily better," Fu Zhi stated, "rather, it should be fluid and callable at any time to be truly effective."
02 The business of elastic computing tests energy-scheduling capabilities
For Fu Zhi, the opportunity to engage in computing scheduling actually came from a chance encounter.
During the May Day holiday in 2023, just as the AI wave was taking shape, Fu Zhi posted a message in a community of AI entrepreneurs. It was simple: I have an A100; the shorter the rental, the cheaper it is; anyone who needs it, come find me.
He didn’t have high expectations at the time since he only had one graphics card. To his surprise, 30 people consulted him, and they all paid without hesitation.
"I said I’d serve whoever paid quickly." He ultimately chose to serve five people. One card, five clients, validating a judgment he had been contemplating for a long time: ordinary people need computing power now.
He also recognized that this business clicked at that particular moment not because of luck, but because the conditions had never been right before.
After all, computing sharing was proposed as early as 1999, and the BOINC platform that followed drew hundreds of thousands of people to contribute their compute; but it was a non-profit scientific computing platform that anyone could use for free. Later, during the Bitcoin boom, some thought of mining with idle compute, but that was not legal.
The ideas were always there, but the soil was never ready.
Indeed, the ordinary users who own high-performance GPUs belong to the post-90s and post-00s generations; before them, very few personal computers carried anything like an RTX 4090. WSL, which lets a personal Windows machine safely run a Linux environment, only reached its official 1.0.0 release in 2022. And the technology for remotely reaching personal devices scattered across regions, behind home NATs, only matured around 2021.
Only with supply, demand, and the technical conditions all in place does this business become possible today.
Yet for Fu Zhi, the true signal that "the time has come" is neither DeepSeek nor all-in-one machines, but AI consumption shifting from niche tools to everyday entertainment for ordinary people.
"Once this process accelerates, the demand for computing power won’t just be for a few large companies to purchase, it needs to be scheduled and distributed on a large scale, across nodes, similar to electricity," Fu Zhi stated.
This also explains why Gongji Technology is advancing partnerships with national computing centers. It is already involved in building provincial compute-scheduling platforms in Beijing-Tianjin-Hebei, the Yangtze River Delta, Shenzhen, and Qinghai.
However, "computational scheduling" is much more challenging than it seems.
Scheduling is not the same as management. In Fu Zhi's distinction, large companies do management: they integrate fleets of machines into one system and know who is using what, but they struggle to allocate dynamically across regions and devices. Scheduling means meeting peak demand in one place with idle compute from another. Computer engineering has no ready-made solution for this; the energy sector does. "Peak shaving and valley filling" is a term that comes straight from power systems.
Fu Zhi studied Building Environment and Energy Application Engineering at Tsinghua University under an advisor who is an academician in the energy field. He transplanted energy-scheduling algorithms to the same problem in compute, and that is Gongji's core moat.
Of course, this cross-regional scheduling system faces plenty of engineering challenges. A personal computer in the scheduling pool can be reclaimed by its owner at any moment: if the user launches a game, that machine must drop out, yet downstream clients require uninterrupted service.
Fu Zhi's answer is hot backup plus prediction: prepare redundancy for each task in advance, while using accumulated history to predict each provider's online patterns and dynamically adjust the backup ratio. The more data accumulates, the more accurate the backups and the lower the cost. "Originally I had to prepare two machines for you. Now, with more usage, I only need to keep one ready." The network layer is unstable too; Gongji's countermeasure is to connect through three leading cloud vendors at once. "It's unlikely that all three will fail at the same time," Fu Zhi said.
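A simplified sketch of that idea: estimate each provider's odds of staying online from its history, then add standbys until the task's combined availability clears a target. The smoothing, the formula, and the names are my own illustration, not Gongji's algorithm.

```python
def predicted_uptime(history: list[bool]) -> float:
    """Estimate the chance a provider stays online through the next task,
    from its session history (True = stayed online), with Laplace smoothing."""
    return (sum(history) + 1) / (len(history) + 2)

def backups_needed(node_uptimes: list[float], target: float = 0.99) -> int:
    """Smallest number of hot standbys, beyond the primary, such that the
    probability at least one machine survives the task reaches the target."""
    p_all_fail = 1.0
    for k, p in enumerate(sorted(node_uptimes, reverse=True)):
        p_all_fail *= 1.0 - p
        if 1.0 - p_all_fail >= target:
            return k  # standbys needed beyond the primary
    raise ValueError("pool too unreliable to reach the availability target")

# Assumed histories: a flaky net-cafe PC vs. a machine with a proven record.
flaky = predicted_uptime([True, False, True, False, True])   # ~0.57
steady = predicted_uptime([True] * 19 + [False])             # ~0.91

print(backups_needed([flaky] * 6))    # 5 -> heavy redundancy
print(backups_needed([steady] * 6))   # 1 -> better data, fewer standbys
```

The two prints echo Fu Zhi's point: as the history grows and predicted uptimes firm up, the backup ratio falls from "two machines ready" toward "one".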
So why don’t cloud vendors provide elastic computing?
Fu Zhi's explanation is that the big vendors know the concept, but their elastic offerings differ in product positioning and pricing strategy; Gongji's edge is price and scheduling efficiency.
The core contradiction of elastic computing is that capacity must be ready to be "called at any time," yet while it waits, it is pure idle cost. Compute providers therefore typically price elastic scaling at about five times the regular rate, or require customers to sign long-term contracts that shift the idle-capacity risk onto them.
Gongji can genuinely offer elastic compute because the resources it uses are already idle: they were never pre-purchased and held in reserve; they were simply sitting unused. That is what lets Gongji offer more competitive prices.
Fu Zhi estimates that 80% of market demand is handled by large companies on long-term fixed contracts, while the remaining 20% is flexible demand. He has no plan to fight for the 80%; he focuses on the 20%, and as AI applications grow, that 20% will only expand. "With others, the longer you rent, the cheaper it gets; with me, the shorter you rent, the cheaper it gets," Fu Zhi added. Today, Gongji Technology's shared computing platform, suanli.cn, lets ordinary consumers rent compute billed by the millisecond.
Gongji Technology team photo | Image source: Gongji Technology
Such a shared business model has long been validated in other fields.
Fu Zhi compares the essence of the business to Airbnb: when a big exhibition fills a city's hotels, Airbnb connects residents with idle rooms to attendees with nowhere to stay. The compute story follows the same route. AI applications need enormous compute at launch and during traffic explosions, far beyond their everyday needs; meanwhile, personal users, internet cafes, and small data centers sit on substantial idle capacity at night and on workdays. Connecting the two sides is what Gongji does.
Just that instead of sharing rooms, they share computational power.
03 Energy scheduling for computing power, the AI era's "software-defined infrastructure"
This path is being explored abroad as well. RunPod, for instance, provides elastic inference services on idle compute; it raised a $20 million seed round in 2024 led by Intel Capital and Dell Technologies Capital, and counts clients like Cursor, OpenAI, and Perplexity.
But in Fu Zhi's view, doing this in America versus in China is completely different.
AWS has offered elastic computing since its inception, promising on-demand usage and serving a mature market with high-priced elastic services. Domestic cloud vendors, by contrast, lean toward long-term rental models; the related preferential policies favor them too, and elastic services get less attention. Chinese users' willingness to pay for elastic compute is also far lower than in America. Transplanting RunPod's logic to China would therefore struggle on pricing.
Yet Fu Zhi does not see compute scheduling as merely a rental business. "Shared compute might be just a stepping stone," he said without hesitation. By his judgment, the business has a window of roughly two to three years: the gap exists as long as compute supply and demand stay misaligned, and that will not last forever.
Such clarity is rare among entrepreneurs. But precisely because of this, he started considering a more fundamental question early on: where will the next truly explosive AI application come from? This judgment will directly influence the direction of computing demand, and Fu Zhi has two forward-looking assessments.
The first: by his analysis, China's super-apps will not emerge from PC productivity tools. The genuinely promising directions are mobile social entertainment, cross-border hardware that leverages the supply chain, and AI that embeds into real-life scenarios.
China's internet never went through a deep era of PC productivity tools; users jumped straight from feature phones to the mobile internet. The AI document, AI slideshow, and AI coding-assistant products that thrive in the US rely on tens of millions of users who are used to working on PCs and paying for SaaS tools; China is different. "Are there more than 100 million people in China who need to write Word documents? I don't think so." And even where such demand exists, big companies would quickly turn those functions into free plugins.
Instead, he sees high growth potential in social entertainment. He has talked with many people in short drama and film, asking why they are so eager to embrace AI, and their answer shaped his thinking: "I have nothing left to lose. No one is watching movies or television anymore; we are almost done for." These people are the most proactive adopters of AI in the Chinese market, not because they understand the technology best, but because they have nowhere left to retreat.
He also has some differing views on the development of AI hardware.
In recent years, the mainstream approach to AI hardware has been "everything plus a chat box": every device gets a dialogue window. Fu Zhi believes this direction is wrong. "Consumers do not need a refrigerator that can write poems."
Genuinely viable AI hardware should enter high-frequency scenarios that users already have, allowing AI to operate quietly in the background, rather than requiring users to sit down specifically to chat with it.
For example, a pet camera should automatically recognize whether the cat is sick, and a scenic-spot camera should automatically complete the outfit-swap photo. The user changes nothing; AI quietly finishes the task. "If such hardware can deploy open-source models, it will also become an elastic-computing client during traffic explosions." Fu Zhi sees this as one of Gongji Technology's future growth points.
Fu Zhi's second judgment runs deeper. It took shape by the end of 2024, but he only found the chance to validate it this year.
He believes that having humans drive AI directly is inherently inefficient. Human input and output have hard speed limits: one question at a time, and you must wait for the answer before asking the next. AI can run tens of thousands of threads at once, passing information between machines in milliseconds. "Using people to drive AI makes the human the slowest link, capping the whole system's speed."
What should happen instead is direct AI-to-AI collaboration: A2A. A task, once initiated, triggers a chain of operations across a group of AIs; humans only define the objective, without taking part in every intermediate step. This, for Fu Zhi, is the real significance of OpenClaw: not the product itself, but the proof that AIs can form their own community, and that someone will pay for A2A, which shows the direction is viable.
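As a toy illustration of that pattern (the agent roles, structure, and counts here are invented, not drawn from any real A2A product): a human sets one objective, and every intermediate hop is machine-to-machine.

```python
import asyncio

async def agent(role: str, task: str) -> str:
    """Stand-in for a model call; each hop would consume its own tokens."""
    await asyncio.sleep(0.01)  # simulated inference latency
    return f"{role} handled: {task}"

async def run_objective(objective: str) -> list[str]:
    # The human defines only the objective; agents fan out and hand off results.
    subtasks = [f"{objective} / part {i}" for i in range(4)]
    drafts = await asyncio.gather(*(agent("researcher", t) for t in subtasks))
    reviews = await asyncio.gather(*(agent("reviewer", d) for d in drafts))
    return reviews

results = asyncio.run(run_objective("compile a market report"))
print(f"{len(results)} deliverables, 8 model calls, 0 human turns in between")
```

Even this tiny chain doubles the model calls per objective; real agent graphs multiply them much further, which is exactly the compute surge described next.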
Once the A2A model becomes mainstream, compute consumption will grow several or even dozens of times over today's. Jensen Huang said at GTC 2026 that with the explosion of agentic AI and reasoning, the compute required is at least 100 times what was anticipated a year ago, and that this is just the beginning. At that point, compute will truly behave like electricity: the question will no longer be how many cards you stockpile, but whether the whole "compute grid" can dispatch on demand, and compute resource management will enter the era of scheduling.
When A2A truly arrives, computational power will become the infrastructure behind every individual, every task, and every AI node, just like electricity. At that time, the one who can accurately schedule computational power across regions, devices, and time zones will possess the true operational capability of this network.
What Gongji Technology is currently doing, in Fu Zhi's view, is preparing for that moment, using this two to three-year window to build scheduling capabilities, node networks, and client relationships. When the demand for A2A truly explodes, this system will be Gongji Technology's real moat.
He recently shared a statement internally at the company and repeated it again as the interview drew to a close:
"Even so, all of this is just the beginning."
In the context of elastic computing, the line may read as no more than an entrepreneur's optimism about the market. In the context of A2A, though, his "beginning" may mark not the start of this business, but the true beginning of compute as infrastructure.