Compiled by: Techub News
Anker, co-founder and CEO of the AI data platform Braintrust, recently appeared as a guest on a16z for an in-depth conversation with host Martin Casado. His career spans the relational-database era and the deep learning revolution: he founded the AI document processing company Impira (later acquired by Figma) and went on to lead Figma's AI team. Having moved from "systems person" to "AI person," he offers calm, sharp observations on engineering practice, technical direction, and the industry ecosystem amid the current AI boom. The conversation covered the core tensions of AI research and development, the real value of engineering, and the practical choices developers face, offering a useful framework for thinking about the next phase of AI applications.
Core Conflict: Continuous Uncertainty vs. Discrete Determinism
The conversation opened by pointing directly to a fundamental cognitive difference. Anker noted that AI is essentially continuous and uncertain, while traditional software systems are discrete and deterministic. The mindset of systems engineers is built upon predictability, reliability, and consistency; they are accustomed to building compilers, ensuring that every query is accurate. In contrast, AI developers are more akin to constructing optimizers, accepting variability and stochasticity in results within a certain range.
This difference in thinking creates tension in practice. The R&D approach of leading AI labs currently follows the so-called "bitter lesson": leaps in capability come from massive data and compute rather than from intricate human design. In effect, this is an "anti-engineering" paradigm. Capital plays a crucial role: as long as labs can keep raising substantial funds, they can make models stronger through brute-force computation without getting bogged down prematurely in complex engineering optimization.
However, Anker believes this is not the end of the story. He described the current state vividly: "It's like we're building 'God'." As long as more capital can make this "God" even 1% smarter, the investment is economically justified. But once piling on resources no longer makes "God" noticeably smarter, a huge opportunity will open up: engineering "God" to be extremely efficient. At that point, engineering complexity will become the real barrier rather than a side effect of capital.
Evaluation: Understanding Problems, Not Models
In the face of complex and opaque AI models, how can reliable applications be built? Anker emphasized the central role of evaluation ("evals"). He corrected a common misunderstanding: the purpose of evaluation is not to understand the model's internal mechanisms (which is nearly impossible) but to deeply understand the specific problem you are trying to solve.
He drew on his experience at Impira and Figma: when optimizing models for different use cases, improving performance in one scenario could degrade it in another. In his view, an evaluation system that combines hypotheses, simulations, quantitative measurement, and qualitative analysis in a feedback loop is the only way to ensure a product iterates in the right direction. He even argues that evaluation should become the primary job of product managers, since it is the natural evolution of the traditional product requirements document: by creating an excellent evaluation set, you are in effect declaratively defining the behavior your product should exhibit.
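To make the idea of an eval set as a declarative product definition concrete, here is a minimal Python sketch. It is not Braintrust's API: `EvalCase`, `run_evals`, `regressions`, and the toy extractor are hypothetical names invented for illustration, and the two cases are made-up examples of a document-field extraction task.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One declarative statement of behavior the product should exhibit."""
    name: str
    input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the normalized output equals the expected value."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_evals(
    task: Callable[[str], str],
    cases: list[EvalCase],
    scorer: Callable[[str, str], float] = exact_match,
) -> dict[str, float]:
    """Run every case through the task and return one score per case name."""
    return {case.name: scorer(task(case.input), case.expected) for case in cases}


def regressions(before: dict[str, float], after: dict[str, float]) -> list[str]:
    """List the cases whose score dropped between two runs."""
    return [name for name in before if after.get(name, 0.0) < before[name]]


# Hypothetical cases for a document-field extractor (illustrative data only).
CASES = [
    EvalCase("invoice_total", "Total due: $1,200.00", "1200.00"),
    EvalCase("receipt_total", "TOTAL 18.50 USD", "18.50"),
]


def toy_extractor(text: str) -> str:
    """Stand-in for a model call: keep only digits and the decimal point."""
    return "".join(ch for ch in text if ch.isdigit() or ch == ".")


if __name__ == "__main__":
    print(run_evals(toy_extractor, CASES))
```

Comparing the score dictionaries from two runs with `regressions` is one simple way to notice when a change that helps one scenario quietly degrades another.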
For application developers, this means a shift in mindset: do not try to "understand" or "control" the model, an "alien technology" handed down from above. Instead, build a well-designed "harness" (a testing and integration framework) to "protect" your application. The harness lets you flexibly test new models, adjust prompts, manage context, and reliably connect uncertain AI outputs to traditional, deterministic state machines. Because models iterate so quickly, the logic inside the harness should be lightweight and disposable.
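One way to picture connecting uncertain model output to a deterministic state machine is the sketch below. It assumes, purely for illustration, that the model is asked to reply with JSON such as `{"action": "approve"}`; the `Action` and `State` enums, the transition table, and the review workflow are all hypothetical.

```python
import json
from enum import Enum


class Action(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    ESCALATE = "escalate"


class State(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    NEEDS_HUMAN = "needs_human"


# The deterministic part: every legal transition is listed explicitly.
TRANSITIONS = {
    (State.PENDING, Action.APPROVE): State.APPROVED,
    (State.PENDING, Action.REJECT): State.REJECTED,
    (State.PENDING, Action.ESCALATE): State.NEEDS_HUMAN,
}


def parse_action(raw: str) -> Action:
    """Turn free-form model text into a known Action, escalating on any doubt."""
    try:
        payload = json.loads(raw)
        return Action(payload["action"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing key, or an unknown action all land here.
        return Action.ESCALATE


def step(state: State, raw_model_output: str) -> State:
    """Apply one model output; pairs not in the table leave the state unchanged."""
    action = parse_action(raw_model_output)
    return TRANSITIONS.get((state, action), state)


if __name__ == "__main__":
    print(step(State.PENDING, '{"action": "approve"}'))
    print(step(State.PENDING, "Sure, I would approve this one."))
```

The only model-specific piece is `parse_action`, which keeps the harness thin and easy to throw away when the model or prompt changes.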
The Rise of Open Source Models and the Spread of "Intellectual Dividends"
An important focus of the conversation was open source models, particularly the rapid development of Chinese models (such as GLM-5 and MiniMax 2.5). Anker shared actual data from the Braintrust platform:
- Number of use cases/clients: The proportion using Chinese models is quite low.
- Token consumption: The proportion, however, is very high.
- USD expenditure weight: The proportion is still relatively low.
This distribution reveals a subtle feature of the current market. Anker's benchmarking shows that, on certain complex agent tasks, top Chinese models already come close to frontier models like Claude 3.5 Sonnet on benchmarks that are nearly saturated (for example, scoring 95% vs. 100%), but at one-third or even one-ninth of the cost.
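The trade-off can be worked through with the article's own illustrative figures (95% vs. 100% scores, cost at one-third or one-ninth). The sketch below is simple arithmetic, not Anker's benchmark methodology; `score_per_cost` is a hypothetical helper, and it assumes a single score is a fair summary of quality.

```python
from fractions import Fraction


def score_per_cost(score: Fraction, relative_cost: Fraction) -> Fraction:
    """Benchmark points obtained per unit of relative cost."""
    if relative_cost <= 0:
        raise ValueError("relative_cost must be positive")
    return score / relative_cost


# Frontier model as the baseline: score 100 at relative cost 1.
frontier = score_per_cost(Fraction(100), Fraction(1))

# Cheaper model: score 95 at one-third and at one-ninth of the cost.
at_one_third = score_per_cost(Fraction(95), Fraction(1, 3))
at_one_ninth = score_per_cost(Fraction(95), Fraction(1, 9))

advantage_third = at_one_third / frontier   # 57/20, i.e. 2.85x
advantage_ninth = at_one_ninth / frontier   # 171/20, i.e. 8.55x

if __name__ == "__main__":
    print(float(advantage_third), float(advantage_ninth))
```

Under these assumptions the cheaper model delivers roughly 2.85 to 8.55 times as many benchmark points per dollar. This ratio says nothing about whether the missing five points matter for a given use case.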
So, why hasn't USD weight shifted rapidly? Anker pointed out a few systemic reasons:
- API and infrastructure maturity: Open source model providers still lag far behind giants like OpenAI in rate limits, service stability, and scalable delivery.
- Cost-driven adoption: Users are currently primarily turning to these models for cost reasons, which naturally limits their revenue scale.
- Suppression by innovation cycles: Closed-source frontier models make a step-change advance roughly every 3-6 months. Each one captures the industry's attention and temporarily drowns out discussion of open-source alternatives. Only when closed-source innovation plateaus while open-source models catch up will the market seriously consider large-scale migration.
Anker observed an interesting pattern: some savvy clients ignore the industry noise and stick with "outdated" models like Llama 3.1. They know their specific high-volume, stable-demand use cases (like customer service) extremely well and can keep optimizing to extract peak performance from the model. This suggests that once model capability broadly exceeds a certain threshold, engineering, domain expertise, and cost control will matter more than simply chasing the latest model.
"Bash Frenzy" vs. "SQL Rationality": A Battle Over Abstraction Layers
On a currently popular trend in AI agent development, giving models lower-level system access through something like Bash, Anker took a distinctly different view. Many developers have found that letting models directly manipulate the file system and run command-line instructions seems to draw on their capabilities better than structured APIs (such as MCP servers). This has prompted various engineering efforts to "map" problems onto a Bash-like environment.
Anker believes this is essentially a "lazy" engineering mindset. The models themselves are adept at writing code (which is harder than writing SQL), and they are fully capable of expressing SQL queries. The crux of the problem lies in how to organize the data in the SQL environment for model access. If SQL itself is a more efficient data query method, then forcing models to simulate SQL via a series of Bash operations is undoubtedly inefficient.
The Braintrust team conducted benchmarking on this and the results were rather ironic: in terms of accuracy, efficiency, token utilization, and speed, the SQL solution comprehensively outperformed the Bash solution. Even less performant models performed better in SQL environments than in Bash environments.
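The contrast between the two styles can be illustrated with a small Python sketch. It does not reproduce the Braintrust benchmark; the `traces` table, its columns, and the sample rows are hypothetical. The first function answers a question with one declarative SQL statement, while the second mimics a grep/awk-style pass over raw text lines, where the aggregation logic has to be spelled out step by step.

```python
import sqlite3

# Hypothetical sample data: (trace id, task name, score).
ROWS = [
    ("t1", "checkout", 1.0),
    ("t2", "checkout", 0.5),
    ("t3", "search", 0.25),
    ("t4", "search", 0.75),
]


def average_scores_sql(rows: list[tuple[str, str, float]]) -> dict[str, float]:
    """Declarative: typed schema plus one query that states the desired result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE traces ("
            "id TEXT PRIMARY KEY, task TEXT NOT NULL, score REAL NOT NULL)"
        )
        conn.executemany("INSERT INTO traces VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT task, AVG(score) FROM traces GROUP BY task ORDER BY task"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


def average_scores_linewise(lines: list[str]) -> dict[str, float]:
    """Procedural: parse each text line and accumulate totals by hand."""
    totals: dict[str, tuple[float, int]] = {}
    for line in lines:
        _trace_id, task, score = line.split(",")
        total, count = totals.get(task, (0.0, 0))
        totals[task] = (total + float(score), count + 1)
    return {task: total / count for task, (total, count) in totals.items()}


if __name__ == "__main__":
    as_text = [f"{trace_id},{task},{score}" for trace_id, task, score in ROWS]
    print(average_scores_sql(ROWS))
    print(average_scores_linewise(as_text))
```

Both functions return the same averages here. The difference is where the work sits: the SQL version leaves grouping and aggregation to the database engine, while the line-wise version requires the caller, or the model, to reimplement them.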
This introduces a deeper division: some developers prefer to grant models raw computational environments to allow them to "brute force" solutions; others advocate applying foundational principles of computer science to provide models with strongly typed, referentially transparent, declarative interfaces. Anker is clearly an advocate for the latter. He implements strict type standards within Braintrust, believing that achieving consensus at the type system level is the cornerstone of ensuring the reliability of complex systems, especially those involving state management. This suggests that the widespread adoption of AI may usher in a "golden age" of foundational computer science principles, as reliably harnessing powerful models necessitates more rigorous and profound system design.
The Future of Capital, Demand, and Engineering
Finally, the conversation returned to that fundamental question: if leading labs can continue to raise far more funds than the downstream ecosystem (for example, the amount raised in a funding round by Anthropic might exceed the total funding of all startups dependent on its models), then when will engineering become a necessity?
Anker proposed two possible factors that could break this cycle: first, rationalization of capital, making funding more difficult; second, the models' scaling might hit fundamental limits. If neither occurs, it might indeed signal the approach of AGI, wherein discussions would shift to entirely different questions.
However, he introduced a more immediate limiting factor: the demand-side absorption capacity. The speed at which businesses deploy and absorb AI systems lags far behind the advancement of model capabilities. Integrating a model that scores 100 in benchmarks into complex enterprise workflows and truly generating value faces immense challenges. This could be the first "planetary ring" limitation encountered by the "sun" of model capability from the supply side.
For entrepreneurs, Anker also offered pragmatic advice. Under the current "pay-per-token" dominant business model, the pricing of any AI-native product needs to establish some value alignment with token consumption. Although Braintrust itself does not resell tokens, its pricing strategy based on data volume (which approximates token amount) is intended to align with customer value perceptions and its cost structure.
In summary, Anker's perspective brings a dose of clarity from systems engineering to the noisy world of AI. His core argument is: the current phase of model capability enhancement driven by capital and brute-force computation will ultimately slow down; when that happens, engineering, efficiency optimization, and a deep understanding of the essence of problems will become key to building enduring, reliable, and valuable AI applications. Now is the time to accumulate experience and tools for that era.