
加密小师妹 | Monica | April 25, 2026 10:50
First, a question: which of these two images was generated by AI?
Hold your answer in mind as you read on, because whether or not you got it right, we may unknowingly be standing at another turning point in the development of AI.
In the past two weeks, there have been significant updates almost every day. OpenAI released GPT-5.5, promoting the concept of a "digital worker" that can independently complete multi-step engineering tasks. Anthropic followed with Claude Opus 4.7, which scored 64.3% on SWE-bench Pro, nearly 11 percentage points above the previous generation; Cursor's real-task resolution rate rose by 13%, and Lotte's resolution of actual production bugs roughly tripled. Google's Gemini 3.1 Flash now responds 2.5 times faster at half the price.
The competition is so fierce that not following up every month is like missing an entire era.
But among them, the one that interests me the most and has the most direct impact on daily life is GPT Image 2.
It's not just another update that makes pictures look prettier, but an overhaul of the entire generation method: the model finishes planning layout, semantics, and intent before generating the first pixel. Just as GPT did for text, it aims to become the "GPT of images".
The most obvious changes are:
Chinese characters are no longer garbled. Text-rendering accuracy has risen from about 90% to about 99%, and it works for Chinese, Japanese, and Korean. Text in AI-generated images used to be little more than illegible scribbles; that problem is now close to solved.
You can edit a picture just by talking to it. With multiple rounds of natural-language editing, you can say "change the building on the left to red and add neon lights", then say "add a full moon to the sky", without having to rewrite a long prompt each time.
The resolution has reached 4K, and it's faster too. The maximum output is 4096 × 4096, and generation is about 2 times faster than the previous generation.
The quality of generated UI and app screenshots has skyrocketed, to the point where they can pass for the real thing.
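To make the multi-round editing workflow described above concrete, here is a minimal sketch in Python. The `edit_image` function is a mock stand-in for a call to an image-editing model (no real API or model name is assumed); the point it illustrates is that each round takes the previous result as input, so you never have to restate the full prompt.

```python
# Sketch of multi-round natural-language image editing.
# edit_image is a mock: a real system would invoke an image model here.
# The "image" is represented as the list of instructions applied so far,
# so the chained-edit flow can be shown without any external API.

def edit_image(image_state, instruction):
    """Apply one natural-language edit on top of the previous result."""
    return image_state + [instruction]

# Round 1: start from a blank canvas with an initial prompt.
image = edit_image([], "a city street at dusk")
# Round 2: a follow-up edit; no need to restate the whole prompt.
image = edit_image(image, "change the building on the left to red and add neon lights")
# Round 3: another incremental edit on top of the last result.
image = edit_image(image, "add a full moon to the sky")

print(len(image))   # number of editing rounds applied
print(image[-1])    # the most recent instruction
```

The design point is statefulness: the model (here, the mock) carries the accumulated context forward, which is what makes short conversational edits possible.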
I have always believed that text is the most efficient carrier of information for humans, but images are the most instinctive way of communicating. When AI image generation goes from "toy" to "productivity tool" and the barrier genuinely drops, everyone who has ideas but struggles to make images can finally express what is in their mind in full.
But the same capability also means that soon, you may no longer be able to trust the "candid photo" in your social feed, the "evidence photo" in the news, or the "screenshot" in a chat window.
The era of seeing is believing is over.