When Your Browser Becomes an Agent


Author: Mario Chow & Figo @IOSG

Part One

Introduction

In the past 12 months, the relationship between web browsers and automation has changed dramatically. Almost every major tech company is racing to build autonomous browser agents. The trend has been increasingly evident since the end of 2024: OpenAI launched Operator (now Agent mode) in January 2025, Anthropic introduced the "Computer Use" capability for Claude, Google DeepMind unveiled Project Mariner, Opera announced its agentic browser Neon, and Perplexity AI launched the Comet browser. The signal is clear: the future of AI lies in agents that can navigate the web autonomously.

This trend is not just about adding smarter chatbots to browsers, but a fundamental shift in how machines interact with the digital environment. Browser agents are a class of AI systems that can "see" web pages and take actions: clicking links, filling out forms, scrolling pages, typing text—just like human users. This model promises to unleash tremendous productivity and economic value, as it can automate tasks that currently require human intervention or are too complex for traditional scripts to handle.

▲ GIF demonstration of an AI browser agent in action: following instructions, navigating to the target dataset page, taking screenshots automatically, and extracting the required data.

Part Two

Who Will Win the AI Browser War?

Almost all major tech companies (as well as some startups) are developing their own browser AI agent solutions. Here are some of the most representative projects:

OpenAI – Agent Mode

OpenAI's Agent mode (formerly known as Operator, launched in January 2025) is an AI agent with a built-in browser. Operator can handle a wide range of repetitive online tasks, such as filling out web forms, ordering groceries, and scheduling meetings, all through the standard web interfaces that people use every day.

▲ The AI agent arranges meetings like a professional assistant: checking calendars, finding available time slots, creating events, sending confirmations, and generating .ics files for you.

Anthropic – Claude's "Computer Use"

At the end of 2024, Anthropic introduced a new "Computer Use" capability for Claude 3.5, giving it the ability to operate a computer and browser much like a human: Claude can see the screen, move the cursor, click buttons, and type text. It was the first agent tool of its kind from a major model provider to enter public beta, allowing developers to have Claude navigate websites and applications automatically. Anthropic positions it as an experimental feature, aimed primarily at automating multi-step workflows on the web.

Perplexity – Comet

AI startup Perplexity (known for its Q&A engine) launched the Comet browser in mid-2025 as an AI-driven alternative to Chrome. The core of Comet is a conversational AI search engine built into the address bar (omnibox), capable of providing instant answers and summaries instead of traditional search links.

  • Additionally, Comet features the Comet Assistant, a sidebar agent that can automatically perform daily tasks across websites. For example, it can summarize your opened emails, schedule meetings, manage browser tabs, or browse and scrape web information on your behalf.

  • By using the sidebar interface to allow the agent to perceive the current webpage content, Comet aims to seamlessly integrate browsing with AI assistance.

Part Three

Real-World Applications of Browser Agents

In the previous sections, we reviewed how major tech companies (OpenAI, Anthropic, Perplexity, etc.) are packaging browser-agent capabilities into different product forms. To better understand their value, let's look at how these capabilities apply in everyday life and in business workflows.

Everyday Web Automation

E-commerce and Personal Shopping

A very practical scenario is delegating shopping and booking tasks to agents. An agent can automatically fill your online shopping cart and place orders from a standing list, or search multiple retailers for the lowest price and complete checkout on your behalf.

For travel, you can instruct the AI to perform tasks like: "Help me book a flight to Tokyo next month (ticket price under $800), and book a hotel with free Wi-Fi." The agent will handle the entire process: searching for flights, comparing options, filling in passenger information, and completing hotel bookings—all through the airline and hotel websites. This level of automation far exceeds existing travel bots: it is not just about recommendations, but direct execution of purchases.

Enhancing Office Efficiency

Agents can automate many repetitive business operations that people perform in browsers. For example, organizing emails and extracting to-do items, or checking for free slots across multiple calendars and automatically scheduling meetings. Perplexity's Comet Assistant can already summarize your inbox content or add events to your schedule through the web interface. Agents can also log into SaaS tools to generate regular reports, update spreadsheets, or submit forms after obtaining your authorization. Imagine an HR agent that can automatically log into different job boards to post positions; or a sales agent that can update lead data in a CRM system. These mundane tasks would typically consume a lot of employee time, but AI can complete them by automating web forms and page operations.

Beyond single tasks, agents can chain together complete workflows across multiple web systems. They can log into various dashboards to troubleshoot, or even orchestrate processes such as onboarding a new employee (creating accounts across several SaaS sites). All of these steps require working across different web interfaces, which is exactly where browser agents excel. Essentially, any multi-step operation that currently means navigating several websites can be handed to an agent.

Part Four

Current Challenges and Limitations

Despite the enormous potential, today's browser agents are still far from perfect. Current implementations reveal some long-standing technical and infrastructure challenges:

Architectural Mismatch

The modern web is designed for human-operated browsers and has gradually evolved to actively resist automation. Data is often buried in HTML/CSS optimized for visual display, gated behind interactive gestures (mouse hovers, swipes), or reachable only through undocumented APIs.

On top of this, anti-scraping and anti-fraud systems add further barriers. These tools combine IP reputation, browser fingerprinting, JavaScript challenge responses, and behavioral analysis (such as randomness in mouse movements, typing rhythm, and dwell time). Ironically, the more "perfect" and efficient an AI agent appears (instant form filling, never making a mistake), the easier it is to flag as malicious automation. This can lead to hard failures: OpenAI's or Google's agents may, for example, complete every step up to checkout and still be blocked by a CAPTCHA or a secondary security filter.

Human-optimized interfaces combined with bot-hostile defenses force agents into fragile "human mimicry" strategies. The approach is prone to failure and the success rate remains low: without human intervention, fewer than one in three full transactions are completed end to end.
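To make the "human mimicry" strategy concrete, here is a minimal sketch using Playwright's Python API. The URL and selectors are placeholders, and real anti-bot systems also weigh IP reputation and fingerprints, which pacing tricks alone cannot address.

```python
# A rough sketch of human-mimicry pacing with Playwright; not a reliable bypass.
import random
import time

from playwright.sync_api import sync_playwright

def humanlike_type(page, selector: str, text: str) -> None:
    """Click the field, then type with irregular per-keystroke delays."""
    page.click(selector)
    page.keyboard.type(text, delay=random.randint(80, 200))  # delay is in milliseconds

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://example.com/checkout")           # placeholder URL
    page.mouse.move(220, 340, steps=25)                  # glide the cursor instead of jumping
    time.sleep(random.uniform(0.5, 1.5))                 # irregular dwell time between actions
    humanlike_type(page, "#full-name", "Jane Doe")       # hypothetical field selector
    browser.close()
```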

Trust and Security Concerns

To give agents full control, they often need access to sensitive information: login credentials, cookies, two-factor authentication tokens, and even payment information. This raises understandable concerns for both users and businesses:

  • What if the agent makes a mistake or is tricked by a malicious website?

  • If the agent agrees to a service term or executes a transaction, who is responsible?

Based on these risks, current systems generally adopt a cautious approach:

  • Google's Mariner does not enter credit card details or accept terms of service; it hands those steps back to the user.

  • OpenAI's Operator prompts users to take over login or CAPTCHA challenges.

  • Anthropic's Claude-driven agents may directly refuse to log in, citing security concerns.

The result is frequent pauses and handovers between AI and humans, weakening the seamless automation experience.

Despite these obstacles, progress remains rapid. Companies like OpenAI, Google, and Anthropic learn from failures with each iteration. As demand grows, a kind of "co-evolution" is likely: websites becoming more agent-friendly where it benefits them, while agents keep improving their ability to mimic human behavior and get past existing barriers.

Part Five

Methods and Opportunities

Current browser agents face two starkly different realities: on one hand, the hostile environment of Web2, where anti-scraping and security defenses are ubiquitous; on the other hand, the open environment of Web3, where automation is often encouraged. This difference determines the direction of various solutions.

The following solutions can be broadly divided into two categories: one helps agents bypass the hostile environment of Web2, while the other is native to Web3.

Although the challenges faced by browser agents remain significant, new projects are continuously emerging, attempting to directly address these issues. The cryptocurrency and decentralized finance (DeFi) ecosystems are becoming natural testing grounds because they are open, programmable, and less hostile to automation. Open APIs, smart contracts, and on-chain transparency eliminate many friction points common in the Web2 world.

Here are four types of solutions, each addressing one or more of the core limitations described above:

Native Agent-Driven Browsers for On-Chain Operations

These browsers are designed from the ground up to be driven by autonomous agents and are deeply integrated with blockchain protocols. Unlike automating a traditional browser such as Chrome, which requires extra dependencies like Selenium, Playwright, or wallet extensions to perform on-chain operations, native agent-driven browsers expose direct APIs and trusted execution paths for agents to call.

In decentralized finance, a transaction's validity rests on its cryptographic signature, not on whether the sender "acts like a human." In on-chain environments, agents therefore sidestep the CAPTCHAs, fraud-detection scores, and device-fingerprint checks common in Web2. But point these browsers at a Web2 site like Amazon and those defenses return: they will still trigger the usual anti-bot measures.
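As a rough illustration of that difference, the sketch below submits a transaction whose validity rests entirely on its signature; no CAPTCHA or behavioral check is involved. It assumes web3.py, and the RPC endpoint, key, and addresses are placeholders.

```python
# Sketch: an on-chain action succeeds or fails based on its cryptographic
# signature, not on whether the sender "behaves like a human".
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://rpc.example.org"))   # placeholder RPC endpoint
account = w3.eth.account.from_key("0x" + "11" * 32)       # placeholder key; never hardcode real keys

tx = {
    "chainId": w3.eth.chain_id,
    "nonce": w3.eth.get_transaction_count(account.address),
    "to": "0x0000000000000000000000000000000000000000",   # placeholder recipient
    "value": w3.to_wei(0.01, "ether"),
    "gas": 21_000,
    "gasPrice": w3.eth.gas_price,
}

signed = account.sign_transaction(tx)
# Attribute is raw_transaction in recent web3.py releases (rawTransaction in older ones).
tx_hash = w3.eth.send_raw_transaction(signed.raw_transaction)
print("submitted:", tx_hash.hex())
```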

The value of agent-driven browsers is not in magically accessing all websites, but in:

  • Native Blockchain Integration: Built-in wallet and signature support, eliminating the need for MetaMask pop-ups or parsing the DOM of dApp frontends.

  • Automation-First Design: Providing stable high-level commands that can be directly mapped to protocol operations.

  • Security Model: Fine-grained permission control and sandboxing ensure that private keys remain secure during automation.

  • Performance Optimization: Capable of executing multiple on-chain calls in parallel without browser rendering or UI delays.

Case Study: Donut

Donut treats blockchain data and operations as first-class citizens. Users (or their agents) can hover to see real-time risk metrics for a token, or type natural-language commands such as "/swap 100 USDC to SOL" directly. By sidestepping Web2's hostile friction points, Donut lets agents operate at full speed in DeFi, improving liquidity, arbitrage, and market efficiency.
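For illustration only, here is one way such a slash command might be parsed into a structured intent before being handed to a protocol adapter; this is not Donut's actual implementation or API.

```python
# Hypothetical parser turning a "/swap" command into a structured intent.
import re
from dataclasses import dataclass

@dataclass
class SwapIntent:
    amount: float
    token_in: str
    token_out: str

SWAP_PATTERN = re.compile(r"^/swap\s+(\d+(?:\.\d+)?)\s+(\w+)\s+to\s+(\w+)$", re.IGNORECASE)

def parse_swap(command: str) -> SwapIntent:
    match = SWAP_PATTERN.match(command.strip())
    if match is None:
        raise ValueError(f"unrecognized command: {command!r}")
    amount, token_in, token_out = match.groups()
    return SwapIntent(float(amount), token_in.upper(), token_out.upper())

print(parse_swap("/swap 100 USDC to SOL"))
# SwapIntent(amount=100.0, token_in='USDC', token_out='SOL')
```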

Verifiable and Trusted Agent Execution

Granting agents sensitive permissions carries significant risk. Solutions in this category use Trusted Execution Environments (TEEs) or zero-knowledge proofs (ZKPs) to attest to an agent's expected behavior before execution, letting users and counterparties verify what the agent did without exposing private keys or credentials.

Case Study: Phala Network

Phala uses TEEs (such as Intel SGX) to isolate and protect the execution environment, preventing Phala operators or attackers from spying on or tampering with agent logic and data. TEEs act like a hardware-backed "secure room," ensuring confidentiality (inaccessible to outsiders) and integrity (unchangeable by outsiders).

For browser agents, this means they can log in, hold session tokens, or handle payment information, while this sensitive data never leaves the secure room. Even if the user's machine, operating system, or network is compromised, it cannot be leaked. This directly alleviates one of the biggest obstacles to the deployment of agent applications: the trust issue regarding sensitive credentials and operations.
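The pattern can be sketched as follows. In a real Phala-style deployment the boundary is enforced in hardware (e.g., Intel SGX) with remote attestation; the plain Python class below only illustrates the interface shape, with hypothetical endpoints and credentials.

```python
# Conceptual sketch of the "secure room" pattern: secrets stay inside a boundary
# and only derived results ever leave it. This class is NOT a real enclave.
import requests

class EnclaveSession:
    def __init__(self, username: str, password: str):
        # In a real TEE these secrets are provisioned into the enclave and are
        # never observable by the host OS or the operator.
        self._session = requests.Session()
        self._session.post(
            "https://example.com/login",          # placeholder login endpoint
            data={"user": username, "pass": password},
        )

    def fetch_order_status(self, order_id: str) -> str:
        # Only non-sensitive, derived output crosses the boundary.
        resp = self._session.get(f"https://example.com/orders/{order_id}")
        return resp.json().get("status", "unknown")

agent_view = EnclaveSession("alice", "secret").fetch_order_status("12345")
print(agent_view)   # the caller sees the status, never the credentials or cookies
```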

Decentralized Structured Data Networks

Modern anti-bot detection systems do not just check whether requests are "too fast" or "automated"; they combine IP reputation, browser fingerprinting, JavaScript challenge responses, and behavioral analysis (cursor movement, typing rhythm, session history). Traffic from data-center IPs or from perfectly reproducible browsing environments is easy to flag.

To address this, these networks either collect and serve machine-readable data directly instead of scraping human-optimized pages, or route traffic through the browsing environments of real human users. This sidesteps the parsing and anti-scraping stages where traditional scrapers are most fragile, giving agents cleaner and more reliable input.

By routing agent traffic through these real-world sessions, distributed networks allow AI agents to access web content like humans without immediately triggering blocks.
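A minimal sketch of the idea follows, assuming a hypothetical residential proxy gateway; real networks such as Grass or Sixpence have their own onboarding and routing mechanisms.

```python
# Sketch: routing an agent's request through a residential exit rather than a
# data-center IP. The proxy endpoint and credentials are placeholders.
import requests

RESIDENTIAL_PROXY = "http://user:token@residential-gateway.example:8000"  # placeholder

resp = requests.get(
    "https://example.com/catalog",                 # placeholder target page
    proxies={"http": RESIDENTIAL_PROXY, "https": RESIDENTIAL_PROXY},
    headers={"User-Agent": "Mozilla/5.0"},         # align headers with a real browser profile
    timeout=30,
)
print(resp.status_code)
```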

Case Studies

  • Grass: A decentralized data/DePIN network in which users share idle residential bandwidth, providing geographically diverse, agent-friendly access for collecting public web data and training models.

  • WootzApp: An open-source mobile browser with cryptocurrency payments, background agents, and zero-knowledge identity; it gamifies AI/data tasks for consumers.

  • Sixpence: A distributed browser network that routes AI-agent traffic through browsing sessions contributed by users around the world.

However, this is not a complete solution. Behavioral detection (mouse/scrolling patterns), account-level restrictions (KYC, account age), and fingerprint consistency checks may still trigger blocks. Therefore, distributed networks are best viewed as a foundational layer of obfuscation that must be combined with human-mimicking execution strategies to achieve maximum effectiveness.

Agent-Focused Web Standards (Looking Ahead)

A growing number of technical communities and standards bodies are now asking: if the web's future users include not only humans but also automated agents, how should websites interact with them safely and compliantly?

This has spurred discussions around emerging standards and mechanisms aimed at allowing websites to explicitly state "I allow trusted agents to access" and provide a secure channel for completing interactions, rather than defaulting to intercepting agents as "bot attacks" as is done today.

  • "Agent Allowed" Tag: Just like the robots.txt that search engines adhere to, future web pages may include a tag in the code to inform browser agents "this can be accessed safely." For example, if you use an agent to book a flight, the website would not pop up a bunch of CAPTCHAs but would instead provide an authenticated interface directly.

  • API Gateway for Certified Agents: Websites can open dedicated entry points for verified agents, akin to a "fast track." Agents would not need to simulate human clicks or inputs but could follow a more stable API path to complete orders, payments, or data queries.

  • W3C Discussions: The World Wide Web Consortium (W3C) is already researching how to establish standardized channels for "managed automation." This means that in the future, we may have a globally accepted set of rules that allow trusted agents to be recognized and accepted by websites while maintaining security and accountability.
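Since no such standard exists yet, the following is purely a hypothetical sketch of what a robots.txt-style declaration check might look like; the "agents.txt" file name and its directives are invented for illustration.

```python
# Hypothetical sketch only: no "agent allowed" web standard exists today.
import requests

def site_allows_agents(base_url: str) -> bool:
    resp = requests.get(f"{base_url}/agents.txt", timeout=10)   # hypothetical policy file
    if resp.status_code != 200:
        return False        # no declaration: treat the site as agent-hostile by default
    return "allow: checkout" in resp.text.lower()               # hypothetical directive

if site_allows_agents("https://airline.example"):
    # A certified agent could then call a dedicated, authenticated endpoint
    # instead of simulating clicks in the human UI.
    print("use the agent gateway")
else:
    print("fall back to human-in-the-loop browsing")
```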

While these explorations are still in their early stages, once implemented, they could greatly improve the relationship between humans, agents, and websites. Imagine no longer needing agents to desperately mimic human mouse movements to "fool" risk control, but instead completing tasks openly through an "officially allowed" channel.

On this path, cryptocurrency-native infrastructure may take the lead. Because on-chain applications inherently rely on open APIs and smart contracts, they are friendly to automation. In contrast, traditional Web2 platforms may continue to play it safe, especially companies reliant on advertising or anti-fraud systems. However, as users and businesses gradually accept the efficiency gains brought by automation, these standardized attempts are likely to become key catalysts in pushing the entire internet toward an "agent-first architecture."

Part Six

Conclusion

Browser agents are evolving from simple conversational tools into autonomous systems capable of completing complex online workflows. This shift reflects a broader trend: embedding automation directly into the core interface of user interactions with the internet. While the potential for productivity gains is enormous, the challenges are equally daunting, including how to break through entrenched anti-bot mechanisms and ensure security, trust, and responsible usage.

In the short term, improvements in agents' reasoning capabilities, faster speeds, tighter integration with existing services, and advancements in distributed networks may gradually enhance reliability. In the long term, we may see the gradual implementation of "agent-friendly" standards in scenarios where automation benefits both service providers and users. However, this transition will not be uniform: in automation-friendly environments like DeFi, adoption will be faster; whereas in Web2 platforms that heavily rely on user interaction control, acceptance will be slower.

In the future, competition among tech companies will increasingly focus on how well their agents can navigate real-world constraints, whether they can be safely integrated into critical workflows, and whether they can deliver stable results in diverse online environments. Whether all of this will ultimately reshape the "browser war" will depend not solely on technical prowess but on the ability to build trust, align incentives, and demonstrate tangible value in everyday use.
