IOSG: In the era of crypto super applications, is the data infrastructure ready?


Author: Story @IOSG

TL;DR

Data Challenge: Block-time competition among high-performance public chains has entered the sub-second era. High concurrency, sharp traffic swings, and heterogeneous multi-chain demand on the consumer side compound complexity on the data side, forcing data infrastructure to shift toward real-time incremental processing plus dynamic scaling. Traditional batch ETL carries minutes-to-hours of latency, which cannot meet real-time trading needs. Emerging solutions such as The Graph, Nansen, and Pangea introduce stream processing, compressing latency to near real-time.

Paradigm Shift in Data Competition: The last cycle competed on making data "understandable"; this cycle competes on making it "profitable." Under the bonding-curve model, a one-minute delay can mean a severalfold difference in entry cost. Tools have iterated from manual slippage settings → sniper bots → integrated terminals like GMGN. On-chain trade execution is gradually being commoditized, so the competitive frontier is shifting to the data itself: whoever captures signals faster can help users profit.

Expansion of Trading Data Dimensions: Memes are essentially the financialization of attention; the key elements are narrative, attention, and how both propagate. Closing the loop between off-chain sentiment and on-chain data — narrative-tracking summaries and quantified sentiment — is becoming central to trading. "Underwater data" — capital flows, role profiling, and smart-money/KOL address labels — reveals the implicit games behind anonymous on-chain addresses. The next generation of trading terminals will fuse multi-dimensional on-chain and off-chain signals within seconds, improving entry timing and risk assessment.

AI-Driven Executable Signals: From information to profit. The new competitive bar: fast, automated, and capable of generating excess returns. LLMs and multi-modal AI can automatically extract decision signals and pair them with copy trading, take-profit, and stop-loss execution. Risks remain: hallucinations, short signal lifespans, execution delays, and risk control. Balancing speed against accuracy — via reinforcement learning and simulated backtesting — is key.

Survival Choices for Data Dashboards: Lightweight data-aggregation/dashboard applications lack a moat, and their room to survive is shrinking. Downward: deepen high-performance underlying pipelines and integrate data with research. Upward: extend into the application layer and own user scenarios directly, driving data-call activity. The future pattern: either become Web3's utility-grade infrastructure — its "water, electricity, and coal" — or become a user-facing platform, a crypto-native Bloomberg.

The competitive moat is shifting towards "executable signals" and "underlying data capabilities," with the closed loop of long-tail assets and trading data presenting unique opportunities for crypto-native entrepreneurs. Opportunity window in the next 2-3 years:

  • Upstream infrastructure: Web2-level processing capabilities + Web3 native demand → Web3 Databricks/AWS.
  • Downstream execution platforms: AI Agent + multi-dimensional data + seamless execution → Crypto Bloomberg Terminal.

Thanks to projects like Hubble AI, Space & Time, and OKX DEX for their support of this research report!

Introduction: The Triple Resonance of Meme, High-Performance Public Chains, and AI

In the last cycle, the growth of on-chain trading was driven primarily by infrastructure iteration. Entering the new cycle, with infrastructure largely mature, super applications represented by Pump.fun are becoming the crypto industry's new growth engine. With a unified issuance mechanism and carefully designed liquidity, this asset-issuance model has created a trading arena that feels fair and is rich in get-rich-quick stories. The repeatability of this high-multiple wealth effect is profoundly changing users' profit expectations and trading habits. Users need not only faster entry, but the ability to acquire, analyze, and act on multi-dimensional data within a very short window — density and real-time demands that existing data infrastructure struggles to support.

Consequently, trading environments face higher demands: lower friction, faster confirmation, and deeper liquidity. Trading venues are rapidly migrating to high-performance public chains and Layer 2 rollups represented by Solana and Base. Trading-data volume on these chains has grown more than tenfold compared with Ethereum in the previous cycle, posing far harsher performance challenges for existing data providers. With new-generation high-performance chains like Monad and MegaETH about to launch, demand for on-chain data processing and storage will grow exponentially.

At the same time, the rapid maturation of AI is accelerating the democratization of intelligence. GPT-5's reasoning has reached a doctoral level, and multi-modal large models like Gemini can readily read K-line charts. With AI tools, trading signals that were once complex can now be understood and executed by ordinary users. Traders are beginning to rely on AI for decisions, and those decisions in turn depend on multi-dimensional, highly efficient data. AI is evolving from an "auxiliary analysis tool" into a "trading decision hub," and its spread further amplifies demands on data freshness, interpretability, and scalable processing.

Under the triple resonance of the meme trading frenzy, the expansion of high-performance public chains, and the commoditization of AI, the on-chain ecosystem's demand for a new data infrastructure is becoming increasingly urgent.

Addressing the Data Challenge of 100,000 TPS and Millisecond Block Times

With the rise of high-performance public chains and high-performance Rollups, the scale and speed of on-chain data have entered a new stage.

With the widespread adoption of high concurrency and low-latency architectures, daily transaction volumes easily exceed ten million, with raw data sizes measured in hundreds of GB. Taking Solana as an example, its average daily TPS has exceeded 1,200 over the past 30 days, with daily transactions surpassing 100 million; on August 17, it even set a historical high of 107,664 TPS. Statistics show that Solana's ledger data is growing rapidly at a rate of 80-95 TB per year, which translates to 210-260 GB per day.

▲ Chainspect, 30-Day Average TPS

▲ Chainspect, 30-Day Transaction Volume

Throughput is not the only thing surging; block times on emerging public chains have pushed into the sub-second, even millisecond, range. BNB Chain's Maxwell upgrade cut block time to 0.8s, while Base's Flashblocks technology compresses it to 200ms. In the second half of this year, Solana plans to replace PoH with Alpenglow, reducing block confirmation to about 150ms, and MegaETH's mainnet targets real-time block times of 10ms. These consensus and engineering breakthroughs make trading dramatically more real-time, but they place unprecedented demands on block-data synchronization and decoding.

Most downstream data infrastructure, however, still relies on batch ETL pipelines, which inevitably introduces latency. Dune, for example, reports that contract-interaction event data on Solana typically lags by about 5 minutes, while protocol-level aggregates can take up to an hour. A transaction confirmed on-chain in 400ms thus takes roughly 750 times longer to surface in analytics tools — close to unacceptable for real-time trading applications.

▲ Dune, Blockchain Freshness

To address these supply-side challenges, some platforms have shifted to streaming, real-time architectures. The Graph, via Substreams and Firehose, has compressed data latency to near real time. Nansen, by introducing real-time processing built on technologies like ClickHouse, has sped up Smart Alerts and live dashboards by dozens of times. Pangea, by aggregating compute, storage, and bandwidth from community nodes, streams real-time data to B-side users — market makers, quant funds, and central limit order books (CLOBs) — at under 100ms latency.
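To make the contrast concrete, below is a minimal sketch of the streaming pattern these platforms converge on: instead of recomputing tables on a fixed ETL interval, a consumer subscribes to a block feed and updates aggregates incrementally, so freshness tracks block time. The feed URL and message fields are hypothetical stand-ins, not any specific provider's API.

```python
import asyncio
import json

import websockets  # pip install websockets

# Hypothetical endpoint standing in for a Firehose/Substreams-style block feed.
FEED_URL = "wss://example-chain-feed/blocks"

# Incremental state: updated per block instead of recomputed by a batch job.
volume_by_token: dict[str, float] = {}

async def consume() -> None:
    async with websockets.connect(FEED_URL) as ws:
        async for raw in ws:
            block = json.loads(raw)
            for swap in block.get("swaps", []):  # assumed message shape
                token = swap["token"]
                # O(1) update per event; dashboards and alerts read fresh
                # state within one block time, not one ETL interval.
                volume_by_token[token] = (
                    volume_by_token.get(token, 0.0) + swap["amount_usd"]
                )

if __name__ == "__main__":
    asyncio.run(consume())
```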

▲ Chainspect

Beyond sheer volume, on-chain trading traffic is also highly uneven. Over the past year, Pumpfun's weekly trading volume has varied nearly 30-fold between trough and peak. In 2024, the meme trading platform GMGN suffered six server "crashes" within four days, forcing it to migrate its underlying database from AWS Aurora to TiDB, an open-source distributed SQL database. After the migration, horizontal scaling and compute elasticity improved markedly, business agility rose by about 30%, and pressure during trading peaks eased.

▲ Dune, Pumpfun Weekly Volume

▲ Odaily, TiDB's Web3 Service Case

The multi-chain ecosystem compounds this complexity. Because log formats, event structures, and transaction fields differ across chains, every new chain requires customized parsing logic — a real test of a data infrastructure's flexibility and scalability. Some data providers have therefore adopted a "customer-first" strategy: wherever trading is active, they prioritize integrating that chain, balancing flexibility against scalability.
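One common way to keep per-chain parsing manageable is an adapter registry: each chain supplies its own decoder, and everything normalizes into one event schema, so adding a chain means adding one decoder rather than one pipeline. A hedged sketch — the field names are illustrative, not any provider's actual schema:

```python
from abc import ABC, abstractmethod

class ChainDecoder(ABC):
    @abstractmethod
    def decode(self, raw: dict) -> dict:
        """Turn a chain-specific log into {tx_hash, event, args}."""

class EvmDecoder(ChainDecoder):
    def decode(self, raw: dict) -> dict:
        # EVM chains emit topic-indexed logs.
        return {"tx_hash": raw["transactionHash"],
                "event": raw["topics"][0],
                "args": raw["data"]}

class SolanaDecoder(ChainDecoder):
    def decode(self, raw: dict) -> dict:
        # Solana identifies transactions by signature, decoded per instruction.
        return {"tx_hash": raw["signature"],
                "event": raw["instruction"],
                "args": raw["accounts"]}

# Adding a chain means registering one decoder, not building a new pipeline.
DECODERS: dict[str, ChainDecoder] = {
    "ethereum": EvmDecoder(),
    "base": EvmDecoder(),
    "solana": SolanaDecoder(),
}

def normalize(chain: str, raw: dict) -> dict:
    event = DECODERS[chain].decode(raw)
    event["chain"] = chain
    return event
```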

If data processing stays stuck at fixed-interval batch ETL while high-performance chains proliferate, it will face accumulating delays, decoding bottlenecks, and query lag, and it will fail to meet demand for real-time, fine-grained, interactive data consumption. On-chain data infrastructure must therefore evolve toward streaming incremental processing and real-time compute architectures, with load-balancing mechanisms to absorb the concurrency spikes of crypto's periodic trading peaks. This is not just the natural next step on the technical path; it is what keeps real-time queries stable — and it will form the true watershed in the competition among next-generation on-chain data platforms.

Speed is Wealth: The Paradigm Shift in On-Chain Data Competition

The core proposition of on-chain data has shifted from "visualization" to "executability." In the last cycle, Dune was the standard tool for on-chain analysis. It served researchers' and investors' need to "understand," letting people stitch together on-chain narratives with SQL-driven charts.

  • GameFi and DeFi players relied on Dune to track capital inflows and outflows, calculate yield-farming returns, and exit in time before market turning points.
  • NFT players used Dune to analyze volume trends, whale holdings, and distribution patterns to gauge market heat.

In this cycle, however, meme traders are the most active consumer group. They have driven the phenomenal application Pump.fun to $700 million in cumulative revenue — nearly double the total revenue of OpenSea, the leading consumer application of the previous cycle.

In the meme track, the market's time sensitivity is amplified to the extreme. Speed is no longer a bonus; it is the core variable separating profit from loss. In a primary market priced by a bonding curve, speed equals cost: token prices rise exponentially with buying demand, and even a one-minute delay can mean a severalfold difference in entry cost (see the sketch below). According to Multicoin research, the most profitable players in this game often pay 10% slippage just to get included three blocks ahead of their competitors. The wealth effect and get-rich-quick mythology push players to chase millisecond K-lines, execute within the same block, and lean on one-stop decision panels, competing on the speed of both information gathering and order placement.

▲ Binance
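The cost asymmetry is easy to see on a toy curve. Below is a sketch of a constant-product virtual-reserve bonding curve — the mechanism Pump.fun popularized — with made-up reserve parameters: the same 1 SOL buys roughly 9x fewer tokens after just 60 SOL of intervening demand.

```python
# Toy constant-product virtual-reserve bonding curve (x * y = k).
# All reserve numbers below are illustrative, not Pump.fun's parameters.

def buy(sol_reserve: float, token_reserve: float, sol_in: float):
    """Return (tokens_out, new_sol_reserve, new_token_reserve)."""
    k = sol_reserve * token_reserve
    new_sol = sol_reserve + sol_in
    new_tokens = k / new_sol
    return token_reserve - new_tokens, new_sol, new_tokens

sol_r, tok_r = 30.0, 1_073_000_000.0  # hypothetical initial virtual reserves

# Early buyer: the first 1 SOL into the curve.
early_out, sol_r, tok_r = buy(sol_r, tok_r, 1.0)

# One minute of intervening demand: 60 SOL of other buys hit the curve.
_, sol_r, tok_r = buy(sol_r, tok_r, 60.0)

# Late buyer: the same 1 SOL, sixty seconds later.
late_out, _, _ = buy(sol_r, tok_r, 1.0)

print(f"early buyer: {early_out:,.0f} tokens per SOL")
print(f"late buyer:  {late_out:,.0f} tokens per SOL")
print(f"cost multiple: {early_out / late_out:.1f}x")  # ~9x in this example
```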

Trading tooling has evolved era by era. In the manual era of Uniswap, users set slippage and gas themselves and the front end could not display prices, so trading felt like "buying a lottery ticket." The BananaGun sniper-bot era brought automatic sniping and slippage handling, putting retail players on the same starting line as the "scientists" (bot-armed professionals). In the PepeBoost era, bots pushed pool-opening information in real time along with front-row holder data. The current GMGN era has produced a terminal integrating K-line data, multi-dimensional analysis, and trade execution — the "Bloomberg Terminal" of meme trading.

As trading tools iterate, execution barriers dissolve, and the competitive frontier inevitably shifts to the data itself: whoever captures signals faster and more accurately can build a trading edge in a fast-moving market and help users make money.

Dimensions as Advantages: The Truth Beyond K-lines

The essence of memecoins is the financialization of attention. A strong narrative keeps breaking out of its niche, aggregating attention and pushing up price and market cap. For meme traders, real-time data matters, but outsized results depend on answering three questions: What is this token's narrative? Who is paying attention? And how will that attention keep amplifying? These factors leave only faint traces on the K-line; the real drivers sit in multi-dimensional data — off-chain public sentiment, on-chain addresses and holding structures, and the precise mapping between the two.

On-chain × Off-chain: The Closed Loop from Attention to Transactions

Attention is captured off-chain and transactions settle on-chain; closing the data loop between the two is becoming the core edge in meme trading.

#Narrative Tracking and Propagation Chain Identification

On social platforms like Twitter, tools such as XHunt help meme players analyze a project's KOL-follower lists to identify the people associated with it and its likely attention-propagation chains. 6551 DEX generates complete, real-time AI reports for traders by aggregating Twitter activity, official websites, tweet comments, issuance records, KOL attention, and more, helping traders capture narratives accurately.

#Sentiment Indicator Quantification

InfoFi tools like Kaito and Cookie.fun aggregate Crypto Twitter content and run sentiment analysis on it, producing quantifiable metrics for Mindshare, Sentiment, and Influence. Cookie.fun, for example, overlays these metrics directly onto price charts, turning off-chain sentiment into readable "technical indicators."

▲ Cookie.fun
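For concreteness, here is a toy version of how a mindshare-style metric can be computed — purely illustrative, not Kaito's or Cookie.fun's methodology: a token's share of all tracked mentions in a time window, weighted by author reach.

```python
from collections import Counter

def mindshare(posts: list[dict]) -> dict[str, float]:
    """posts: [{token, followers}] observed in one time window."""
    weighted = Counter()
    for p in posts:
        weighted[p["token"]] += p["followers"]  # reach-weighted mention
    total = sum(weighted.values()) or 1
    return {token: w / total for token, w in weighted.items()}

posts = [{"token": "PEPE", "followers": 50_000},
         {"token": "PEPE", "followers": 12_000},
         {"token": "WIF", "followers": 30_000}]
print(mindshare(posts))  # {'PEPE': 0.674..., 'WIF': 0.326...}
```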

#On-chain and Off-chain are Equally Important

OKX DEX displays Vibes analysis alongside market data, aggregating KOL shout-out timestamps, top associated KOLs, Narrative Summaries, and a composite score, cutting the time spent retrieving off-chain information. The Narrative Summary has become its most well-received AI feature among users.

Underwater Data Display: Turning "Visible Ledger" into "Usable Alpha"

In traditional finance, order-flow data is controlled by large brokers, and quantitative firms pay hundreds of millions of dollars a year for it to optimize their trading strategies. Crypto's trading ledger, by contrast, is completely open and transparent, effectively "open-sourcing" that high-priced intelligence — an open-pit gold mine waiting to be worked.

The value of underwater data lies in extracting invisible intent from visible transactions. This includes capital flows and role characterization — clues to whether a market maker is accumulating or distributing, KOL sock-puppet addresses, concentrated versus dispersed holdings, bundled trades, and abnormal capital flows. It also includes address-profile linkage — labeling each address as smart money, KOL/VC, developer, phishing, wash trading, and so on, and binding those labels to off-chain identities, connecting on-chain and off-chain data.

These signals are hard for ordinary users to detect, yet they can materially move short-term prices. By parsing address labels, holding patterns, and bundled trades in real time, trading-assistance tools are surfacing the games "beneath the surface," helping traders dodge risk and find alpha in millisecond-level markets.

GMGN, for example, layers label analytics — smart money, KOL/VC addresses, developer wallets, wash trading, phishing addresses, bundled trades, and more — on top of real-time on-chain trading and token-contract data, maps on-chain addresses to social media accounts, and aligns capital flows, risk signals, and price action to the millisecond, helping users decide on entries and risks faster.

▲ GMGN
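As one simplistic illustration of such "underwater" heuristics — not GMGN's actual method — a bundle can be flagged when several wallets sharing a funding source buy the same token in the same block. Real systems layer many signals like this:

```python
from collections import defaultdict

def find_bundles(trades: list[dict], funder_of: dict[str, str]) -> dict:
    """trades: [{block, token, wallet}]; funder_of: wallet -> funding wallet."""
    groups = defaultdict(list)
    for t in trades:
        key = (t["block"], t["token"], funder_of.get(t["wallet"]))
        groups[key].append(t["wallet"])
    # Flag a bundle: >= 3 co-funded wallets buying in one block.
    return {key: wallets for key, wallets in groups.items()
            if key[2] is not None and len(wallets) >= 3}

trades = [{"block": 100, "token": "XYZ", "wallet": w}
          for w in ("A", "B", "C", "D")]
funder_of = {"A": "F1", "B": "F1", "C": "F1", "D": "F2"}
print(find_bundles(trades, funder_of))  # {(100, 'XYZ', 'F1'): ['A', 'B', 'C']}
```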

AI-Driven Executable Signals: From Information to Profit

"The next round of AI will not sell tools, but profits." — Sequoia Capital

This judgment holds in crypto trading too. Once data speed and dimensionality are up to standard, the next competitive question is whether multi-dimensional, complex data can be turned directly into executable trading signals. The bar for data-driven decision-making comes down to three things: fast enough, automated, and capable of generating excess returns.

Fast enough: As AI capability advances, the strengths of natural-language and multi-modal LLMs come into play here. They can integrate and understand massive amounts of data, build semantic links across it, and automatically extract decisive conclusions. In a high-intensity, low-depth on-chain trading environment, each signal has a very short shelf life and limited capital capacity, so speed directly determines what a signal can earn.

Automated: Humans cannot watch the market 24 hours a day; AI can. On the Senpi platform, for example, users can have an Agent place buy orders with take-profit and stop-loss conditions attached. This requires the AI to poll or monitor data continuously in the background and place orders automatically when it detects a qualifying signal (a minimal sketch of this loop follows below).

Returns: Ultimately, any trading signal is judged by whether it keeps generating excess returns. AI needs not only a sufficient understanding of on-chain signals but also built-in risk control to maximize risk-adjusted returns in a highly volatile environment — accounting for on-chain-specific drags such as slippage and execution delay.
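Below is a minimal sketch of the monitor-and-execute loop described under "Automated" above. `get_price` and `place_order` are hypothetical stand-ins, not Senpi's API; the point is the control flow an always-on agent needs.

```python
import time

def get_price(token: str) -> float:
    """Hypothetical market-data hook; wire this to a real price feed."""
    raise NotImplementedError

def place_order(token: str, side: str, amount: float) -> None:
    """Hypothetical execution hook; wire this to a real trading API."""
    raise NotImplementedError

def watch_position(token: str, amount: float, entry: float,
                   take_profit: float = 2.0, stop_loss: float = 0.7,
                   poll_interval: float = 0.5) -> str:
    """Poll the market and exit when either threshold is crossed."""
    while True:
        price = get_price(token)
        if price >= entry * take_profit:   # e.g. take profit at +100%
            place_order(token, "sell", amount)
            return "take_profit"
        if price <= entry * stop_loss:     # e.g. stop loss at -30%
            place_order(token, "sell", amount)
            return "stop_loss"
        time.sleep(poll_interval)          # sub-second polling cadence
```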

This capability is reshaping the business logic of data platforms: from selling "data access rights" to selling "profit-driven signals." The competitive focus of the next generation of tools is no longer on data coverage but on the executability of signals—whether it can truly complete the last mile from "insight" to "execution."

Some emerging projects are already exploring this direction. Truenorth, for example, an AI-driven discovery engine, folds "decision execution rate" into how it evaluates information effectiveness, continuously optimizing its output through reinforcement learning to cut ineffective noise and help users build information flows they can trade on directly.

▲ Truenorth
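As a rough illustration of feeding execution outcomes back into signal ranking — a bandit-style sketch, not Truenorth's actual algorithm — each signal source's score can be an exponential moving average of whether its signals were executed and profitable:

```python
ALPHA = 0.1  # EMA smoothing factor; higher = adapt faster, forget faster

scores: dict[str, float] = {}

def update(source: str, executed: bool, pnl: float) -> None:
    # Reward = 1 if the signal was acted on and made money, else 0.
    reward = 1.0 if executed and pnl > 0 else 0.0
    scores[source] = (1 - ALPHA) * scores.get(source, 0.5) + ALPHA * reward

def rank(sources: list[str]) -> list[str]:
    # Higher-scoring sources surface first; noisy ones decay toward zero.
    return sorted(sources, key=lambda s: scores.get(s, 0.5), reverse=True)
```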

Although AI has enormous potential in generating executable signals, it also faces multiple challenges.

Hallucinations: On-chain data is highly heterogeneous and noisy, and LLMs parsing natural-language queries or multi-modal signals are prone to hallucination and overfitting, hurting signal accuracy and returns. When several tokens share a name, for example, AI often fails to resolve a Crypto Twitter ticker to the right contract address; likewise, many AI signal products route generic CT chatter about "AI" to the token Sleepless AI (ticker: AI).

Signal Lifespan: Trading conditions change constantly, and any delay erodes returns; AI must complete data extraction, reasoning, and execution within a very short window. Even the simplest copy-trading strategy can flip positive returns negative if it fails to follow smart money in time.

Risk Control: In high-volatility scenarios, if the AI's on-chain transactions repeatedly fail or suffer excessive slippage, it may not only miss excess returns but burn through the entire principal within minutes.

The competitive edge for AI in this field therefore lies in balancing speed against accuracy and in using mechanisms such as reinforcement learning, transfer learning, and simulated backtesting to drive down error rates.

Upward or Downward? The Survival Choices of Data Dashboards

As AI becomes able to generate executable signals directly and even assist with order placement, "lightweight middle-layer applications" that merely aggregate data face a survival crisis. Whether they stitch on-chain data into dashboards or layer execution logic on top of aggregation as trading bots, they fundamentally lack a durable moat. These tools once held their ground through convenience or user habit (users reflexively checking a token's CTO — community takeover — status on Dexscreener, for example); but now that the same data is available in many places, execution engines are increasingly commoditized, and AI can generate decision signals and trigger execution on that same data, their competitiveness is diluting fast.

Efficient on-chain execution engines will keep maturing, lowering trading barriers further. In this environment, data providers must choose: go down, digging deeper into faster data acquisition and processing infrastructure; or go up, extending into the application layer to own user scenarios and consumption traffic directly. Those caught in the middle — doing nothing but data aggregation and lightweight packaging — will find their room to survive squeezed ever tighter.

Going down means building an infrastructure moat. While building its trading products, Hubble AI realized that TG bots alone could not sustain a long-term advantage, so it moved upstream into data processing, aiming to build a "Crypto Databricks." Having optimized data-processing speed for Solana, Hubble AI is evolving from data processing into an integrated data-and-research platform, positioning itself upstream in the value chain to support the U.S. "finance on-chain" narrative and the data needs of on-chain AI Agent applications.

Going up means extending into application scenarios and locking in end users. Space and Time initially focused on sub-second SQL indexing and oracle pushes, but has recently begun exploring consumer scenarios, launching Dream.Space on Ethereum — a "vibe coding" product with which users can write smart contracts or generate data-analysis dashboards in natural language. The move not only raises the call frequency of its own data services but also builds direct user stickiness through the end-user experience.

The middle roles that merely sell data interfaces, in other words, are losing their footing. The future B2B2C data track will be dominated by two kinds of players: those who control the underlying pipelines and become "on-chain utility" infrastructure companies, and those who sit close to user decision-making and turn data into application experiences.

Summary

Under the triple resonance of the meme craze, the explosion of high-performance public chains, and the commercialization of AI, the on-chain data track is undergoing a structural shift. The iteration of trading speed, data dimensions, and execution signals has made "visible charts" no longer the core competitive advantage; the real moat is shifting towards "executable signals that help users make money" and "the underlying data capabilities that support all of this."

In the next 2–3 years, the most attractive entrepreneurial opportunities in the crypto data field will emerge at the intersection of Web2-level infrastructure maturity and Web3 on-chain native execution models. The data of major cryptocurrencies like BTC/ETH, due to their high standardization and characteristics similar to traditional financial futures products, has gradually been incorporated into the data coverage scope of traditional financial institutions and some Web2 fintech platforms.

By contrast, data for meme coins and long-tail on-chain assets is extremely non-standardized and fragmented — from community narratives and on-chain sentiment to cross-chain liquidity, it must be interpreted alongside on-chain address profiles, off-chain social signals, and even millisecond-level trade execution. Precisely this difference opens a unique window for crypto-native entrepreneurs in processing and trading long-tail-asset and meme data.

We are optimistic about projects that will deeply cultivate in the following two directions:

Upstream Infrastructure—On-chain data companies with streaming data pipelines, ultra-low latency indexing, and cross-chain unified parsing frameworks that rival the processing capabilities of Web2 giants. Such projects are expected to become the Web3 version of Databricks/AWS, and as users gradually migrate on-chain, trading volumes are likely to grow exponentially, with the B2B2C model possessing long-term compounding value.

Downstream Execution Platforms—Applications that integrate multi-dimensional data, AI Agents, and seamless trading execution. By transforming fragmented on-chain/off-chain signals into directly executable trades, these products have the potential to become the crypto-native Bloomberg Terminal, with their business model no longer relying on data access fees but monetizing through excess returns and signal delivery.

We believe that these two types of players will dominate the next generation of the crypto data track and build sustainable competitive advantages.
