Author: a16z
Translation: Jiahua, ChainCatcher
Last year, trading volume on the prediction market for the Venezuelan presidential election exceeded $6 million. But when the count ended, the market faced an impossible situation: the government declared Nicolás Maduro the winner, while the opposition and international observers alleged fraud. Should the market resolve according to the "official information" (a Maduro victory) or the "consensus of credible reports" (an opposition victory)?
In the Venezuelan case, observers' accusations escalated: first they condemned the platform for disregarding its own rules and "stealing" user funds; then they attacked the resolution mechanism for concentrating all power in this political contest, acting as judge, jury, and executioner; some even claimed the process had been heavily manipulated.
This is not an isolated incident. It represents, in my view, one of the biggest bottlenecks faced by prediction markets as they scale: contract adjudication.
The stakes here are extremely high. If adjudication is handled well, people will trust your market, be willing to trade in it, and prices will become meaningful signals for society. If it is handled badly, trading will feel frustrating and unpredictable: participants leave, liquidity dries up, and prices stop reflecting accurate predictions of stable outcomes. Instead, they begin to reflect a murky blend of the actual probability that an outcome occurs and traders' beliefs about how a distorted resolution mechanism will rule.
The controversy in Venezuela was relatively high-profile, but more subtle failures frequently occur across various platforms:
The Ukraine Map Manipulation Case demonstrated how an opponent can profit directly by gaming the resolution mechanism. A contract on territorial control stipulated that it would resolve based on a specific online map. Allegedly, someone altered the map to influence the contract's outcome. When your "source of truth" can be manipulated, your market can be manipulated too.
The Government Shutdown Contract showed how the choice of resolution source can produce inaccurate, or at least unpredictable, results. The rules stipulated that the market would pay out based on when the U.S. Office of Personnel Management's website showed the shutdown had ended. President Trump signed the funding bill on November 12, but for unknown reasons the OPM website was not updated until November 13. Traders who correctly predicted the shutdown would end on the 12th lost their bets to a website administrator's delay.
The Zelensky Suit Market raised concerns about conflicts of interest. The contract asked whether Ukrainian President Zelensky would wear a suit at a specific event, a seemingly trivial question that nonetheless attracted more than $200 million in bets. When Zelensky attended a NATO summit in attire that the BBC, the New York Post, and other outlets described as a suit, the market initially resolved "yes." UMA token holders then contested the outcome, and the resolution was flipped to "no."
In this article, I will explore how to cleverly combine large language models (LLMs) and cryptographic technology to help us create a prediction market resolution method that is difficult to manipulate, accurate, fully transparent, and credibly neutral at scale.
The Issue is Not Just in Prediction Markets
Similar problems have long plagued traditional financial markets. For years, the International Swaps and Derivatives Association (ISDA) has grappled with the adjudication dilemma in the credit default swap (CDS) market; a CDS is a contract that pays out when a company or country defaults on its debt. ISDA's 2024 review candidly catalogs the difficulties. Its Determinations Committee, composed of major market participants, votes on whether a credit event has occurred, yet the process has been criticized, much like UMA's, for its opacity, potential conflicts of interest, and inconsistent outcomes.
The fundamental issue is the same: when vast sums of money depend on the determination of ambiguous situations, every resolution mechanism becomes a target for gaming, and every point of ambiguity becomes a potential flashpoint.
Four Pillars of an Ideal Adjudication Solution
Any viable solution must achieve several key attributes simultaneously:
Resistance to Manipulation: If opponents can influence adjudication (by editing Wikipedia, planting fake news, bribing oracles, or exploiting software vulnerabilities), the market becomes a contest of who can manipulate best rather than who can predict best.
Reasonable Accuracy: The mechanism must make correct determinations most of the time. In a world filled with genuine ambiguity, perfect accuracy is impossible, but systematic errors or glaring mistakes will destroy credibility.
Pre-Transaction Transparency: Traders need to know exactly how adjudication will occur before placing bets. Changing the rules mid-course violates the fundamental contract between the platform and participants.
Credible Neutrality: Participants need to believe that the mechanism does not favor any specific trader or outcome. This is why allowing those holding large amounts of UMA to adjudicate contracts they have bet on is so problematic: even if they act fairly, the appearance of conflicts of interest undermines trust.
Human review panels can meet some of these attributes, but they struggle to achieve others—especially resistance to manipulation and credible neutrality—at scale. Token-based voting systems like UMA also have their own well-documented issues regarding whale dominance and conflicts of interest.
This is where AI comes in.
The Case for LLM Judges
One proposal has been gaining attention within the prediction market community: use large language models as adjudication judges, and lock the specific model and prompts on the blockchain at contract creation.
The basic architecture is as follows. At contract creation, the market creator must specify not only the adjudication criteria in natural language but also the exact LLM (identified by a timestamped model version) and the exact prompts that will determine the outcome.
This specification is cryptographically committed to the blockchain. When trading opens, participants can inspect the complete adjudication mechanism: they know exactly which AI model will adjudicate the outcome, what prompts it will receive, and which information sources it can access.
If they do not like this setup, they do not trade.
At adjudication time, the committed LLM runs with the committed prompts, consults the specified information sources, and produces a judgment. That output determines who gets paid.
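To make the commit step concrete, here is a minimal sketch in Python, assuming a simple SHA-256 hash over a canonical JSON encoding; the field names, model identifier, and publish-then-hash flow are illustrative assumptions, not a description of any live platform.

```python
import hashlib
import json

def build_adjudication_spec(model_id: str, prompt: str, sources: list[str]) -> dict:
    """Bundle everything a trader needs to predict how the market will resolve."""
    return {
        "model_id": model_id,        # pinned, timestamped model version
        "prompt": prompt,            # the exact resolution prompt, verbatim
        "sources": sorted(sources),  # information sources the judge may consult
    }

def commitment(spec: dict) -> str:
    """Hash a canonical JSON encoding so anyone can reproduce the digest."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

# At market creation: publish the spec off-chain, store only its hash on chain.
spec = build_adjudication_spec(
    model_id="example-llm-2025-06-01",  # hypothetical version identifier
    prompt="Answer YES or NO: did the stated event occur by the deadline, "
           "according to the listed sources?",
    sources=["https://example.com/wire-a", "https://example.org/wire-b"],
)
onchain_hash = commitment(spec)  # this digest is what gets written on chain
```

Because the digest is reproducible from the published spec, anyone can later confirm that the model and prompts used at adjudication are exactly the ones committed before trading opened.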
This approach simultaneously addresses several key constraints:
Strong Resistance to Manipulation (though not absolute): Unlike a Wikipedia page or a small news site, the output of a mainstream LLM cannot easily be edited. The model's weights are fixed at commitment time. To manipulate the resolution, an opponent would have to compromise the information sources the model consults, or poison the model's training data long in advance; both attack vectors are costly and uncertain compared with bribing oracles or editing maps.
Reasonable Accuracy: Reasoning models are improving rapidly and can handle an astonishing range of intellectual tasks, especially when they can browse the web for new information. LLM judges should therefore be able to adjudicate many markets accurately; experiments are underway to understand just how accurate they are.
Built-in Transparency: The entire adjudication mechanism is visible and auditable before anyone places a bet. No mid-course rule changes, no discretionary judgments, no backroom negotiations. You know exactly what you are signing up for.
Significantly Enhanced Credible Neutrality: LLMs have no economic stake in the outcomes. They cannot be bribed. They do not hold UMA tokens. Their biases, whatever they may be, are attributes of the model itself, not of ad hoc decisions made by stakeholders.
Limitations of AI and Defensive Measures
Models Can Make Mistakes: LLMs may misread news articles, hallucinate facts, or apply adjudication standards inconsistently. But as long as traders know which model they are betting on, they can factor those flaws into the price. If a given model has a known tendency to resolve ambiguity in a certain direction, savvy traders will take that into account. The model does not need to be perfect; it needs to be predictable.
Not Impossible to Manipulate: If the prompts specify particular news sources, opponents may try to plant stories in those sources. This attack is costly against mainstream media but may be feasible against smaller outlets; it is another form of the map-editing problem. Prompt design is crucial here: a resolution mechanism that draws on diverse, redundant sources is far more robust than one that relies on a single point of failure, as the sketch below illustrates.
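As one illustration of that design principle, here is a hedged sketch of a quorum rule over per-source verdicts, where each verdict is assumed to come from one run of the committed prompt against a single independent source; the YES/NO/UNRESOLVED labels and the 2/3 threshold are arbitrary choices for the example.

```python
from collections import Counter

def resolve_with_quorum(verdicts: list[str], quorum: float = 2 / 3) -> str:
    """Resolve only when a supermajority of per-source verdicts agree.

    Each verdict is assumed to come from one run of the committed prompt
    against a single, independent information source. With five sources
    and a 2/3 quorum, planting a story in one outlet flips one verdict
    but cannot flip the outcome.
    """
    if not verdicts:
        return "UNRESOLVED"
    outcome, count = Counter(verdicts).most_common(1)[0]
    return outcome if count / len(verdicts) >= quorum else "UNRESOLVED"

# Four honest sources report YES; one manipulated source reports NO.
print(resolve_with_quorum(["YES", "YES", "NO", "YES", "YES"]))  # YES
# An even split fails the quorum and falls back to UNRESOLVED.
print(resolve_with_quorum(["YES", "NO", "YES", "NO"]))          # UNRESOLVED
```

The fallback to UNRESOLVED is deliberate: a market that escalates genuinely ambiguous cases to a separate process is arguably more trustworthy than one that forces a verdict from corrupted or conflicting inputs.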
Poisoning Attacks Are Theoretically Possible: A well-resourced opponent might try to influence an LLM's training data to bias its future judgments. But this requires acting long before the contract exists, and the returns are uncertain while the costs are high, far higher than the threshold for bribing committee members.
A Proliferation of LLM Judges Can Create Coordination Problems: If every market creator uses a different combination of LLM and prompts, liquidity fragments, and traders will find it difficult to compare contracts or aggregate information across markets. Standardization is valuable, but so is letting the market discover which combinations of LLMs and prompts work best. The right answer is probably a mix: allow experimentation while building mechanisms for the community to converge over time on well-tested defaults.
Four Recommendations for Builders
In summary: AI-based adjudication essentially trades one set of problems (human biases, conflicts of interest, opacity) for another set of problems (model limitations, prompt engineering challenges, information source vulnerabilities), and the latter set may be more manageable. So how should we move forward? Platforms should:
Experiment: Test LLM adjudication on lower-risk contracts to build a track record. Which models perform best? Which prompt structures are most robust? What failure modes emerge in practice?
Standardize: As best practices emerge, the community should work towards establishing standardized combinations of LLMs and prompts as defaults. This does not preclude innovation but helps concentrate liquidity in well-understood markets.
Build Transparent Tools: Create interfaces that make it easy for traders to inspect the complete adjudication mechanism (model, prompts, information sources) before trading; a sketch of one such checker follows this list. Resolution rules should not be buried in fine print.
Conduct Ongoing Governance: Even with AI judges, humans still need to be responsible for setting the top-level rules: which models to trust, how to handle situations where models provide clearly incorrect answers, when to update defaults. The goal is not to completely remove humans from the loop but to shift them from ad hoc case-by-case judgments to systematic rule-making.
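As a sketch of the transparency tooling suggested above, a trader-side checker could recompute the hash of a published spec, compare it with the on-chain commitment, and only then display the resolution rules; this reuses the canonical-JSON hashing convention assumed in the earlier sketch, and the output format is purely illustrative.

```python
import hashlib
import json

def audit_market(spec: dict, onchain_hash: str) -> None:
    """Show a trader exactly how a market will resolve, or flag tampering."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    if hashlib.sha256(canonical.encode()).hexdigest() != onchain_hash:
        print("WARNING: published spec does not match the on-chain commitment.")
        return
    print(f"Judge model: {spec['model_id']}")
    print(f"Prompt:      {spec['prompt']}")
    print("Sources:")
    for url in spec["sources"]:
        print(f"  - {url}")
```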
Prediction markets have extraordinary potential to help us understand a noisy, complex world. But that potential depends on trust, and trust depends on fair contract adjudication. We have seen what happens when resolution mechanisms fail: confusion, anger, and traders walking away. I have watched people quit prediction markets entirely in frustration, feeling cheated by an outcome that violated the spirit of their bets and vowing never to use their once-favorite platform again. Every such exit is a missed opportunity to unlock the benefits of prediction markets and the broader applications built on them.
LLM judges are not perfect. But combined with cryptographic technology, they are transparent, neutral, and resistant to the manipulations that have long plagued human systems. In a world where prediction markets are scaling faster than our governance mechanisms, this may be exactly what we need.