Original Title: "Zhao Changpeng Invested in a Chinese Junior Student, $11 Million Seed Round, Creating an Education Agent"

Original Author: Founder Park, a startup community under Geek Park

A Chinese college junior, an $11 million seed round, and the highest-funded student startup product in Silicon Valley.

VideoTutor, a K12 education agent that generates personalized teaching and explanation videos from a single sentence, today announced the completion of an $11 million seed round. The round was led by YZi Labs, with participation from Baidu Ventures, Jinqiu Fund, Amino Capital, BridgeOne Capital, and several well-known investors.

This is also the first AI product company invested in by YZi Labs.

Founder Kai Zhao stated that VideoTutor received recognition and support from CZ and the YZi Labs investment team, ultimately leading to YZi Labs leading this round of financing. They received over 10 term sheets (letters of intent) and ultimately chose these few firms.

The first version of the product was launched on May 14 (debuting at the Founder Park product marketplace), gaining market recognition and validating product-market fit (PMF). In less than five months, they completed this $11 million seed round financing.

In Kai's view, the core reason they secured this financing is that, under the premise of having the right direction, the "Little Genius Team" solved the pain points of studying for the U.S. college entrance exam in the K12 track through visual learning.

"This field is more suitable for young people to work in, combined with very good engineering skills, and the founder himself has excellent insights and experience, with very fast execution."

They are not alone. With Cursor, Mercor, Pika, GPTZero, and others, Silicon Valley college students are building AI products that set new financing records and reshape everyone's understanding of AI entrepreneurship.

Entrepreneurship in the AI era is indeed a bit different.

We chatted with these young people from VideoTutor to understand why they were able to secure this seed round financing, what changes are happening in Silicon Valley entrepreneurship today, and why they are keen to recruit employees from major domestic companies.

Interview Guests: CEO Kai Zhao, CTO James Zhan.

Interview & Editing | Wan Hu

The following is the interview content, edited and organized by Founder Park.

K12 Track, Visual Learning is the True Direction

Founder Park: So many institutions are optimistic about you; what do you think is the core point that impressed them?

Kai: I think the first reason is that we have the right direction; the AI education track has great potential and prospects. The field we are targeting is the U.S. college entrance exams, namely the SAT and AP. Our users are K12 high school students, and we are very close to this group in age, with basically no generational gap. We have gone through the entire exam preparation cycle ourselves and know where the pain points of the exams and preparation are, which allows us to create a product that truly addresses them.

Secondly, the team is very outstanding. James comes from Gemini and was a core engineer in AI engineering and algorithms at Google. I have three previous experiences in educational entrepreneurship, starting from my freshman year creating educational software, and during my sophomore year, I participated in the creation of MathGPTPro, which was selected for Qiji Chuangtan, among others. I have experience in successfully building educational products.

The third point is that the core of the AI education field we work in is the animation engine. We are the core developers of VideoTutor and the team that understands this technology best, capable of very precise rendering with the animation engine.

The team itself has very good marketing genes and knows how to do communication.

VideoTutor aligns very well with a consensus among mainstream U.S. VCs called the "Little Genius Team," which refers to the fact that this field is more suitable for young people to work in, combined with very good engineering skills, and the founder himself has excellent insights and experience, with very fast execution. I think this is a consensus that all investors can agree on.

VideoTutor on the NYSE at YZi Labs EASY Residency Demo Day

Founder Park: What core problem in the education industry does your product aim to solve?

Kai: Currently, learning products on the market fall into two types: active learning products and passive learning products. Passive learning products, like ByteDance's Gauth, Chegg, AnswersAi, etc., cover what we call "homework help" scenarios, with a very short learning chain, mainly where students pay for homework answers.

VideoTutor, on the other hand, covers active learning scenarios. We do not need to consider students' learning motivation because they must study and take exams, such as the SAT and AP. In this scenario, there is a significant demand for visualization; 80% of the content in the SAT involves functions, calculus, and other knowledge requiring complex image rendering. VideoTutor's animation engine can effectively address this scenario.

Moreover, the customer unit price in this field is very high. On average, 2.6 million students in the U.S. take the SAT each year, creating substantial demand for paid services. Offline SAT courses are very expensive, charged not by package but by the hour, starting at an average of $150 per hour, with most charging around $230. Many students and parents are willing to pay for learning. VideoTutor, however, can effectively stand in for or even replace teacher-led training, because current AI-generated videos are almost indistinguishable from teacher training content. This way, students can have their own personalized AI exam preparation teacher at the lowest cost.

Founder Park: What was the catalyst for you to decide to create this product?

Kai: Actually, before us, there was already a team at Stanford working on something called Gatekeep Ai. They also wanted to do visual learning. I had already realized the impact of this direction. In my previous entrepreneurial ventures, the educational products everyone was creating were basically just connecting to the GPT API, similar to a ChatGPT wrapper product. But we found that merely relying on text Q&A has its limitations. You can see that businesses like Chegg and Gauth are declining significantly, as many of their scenarios have been replaced by ChatGPT, where students can pay $20 to use ChatGPT to solve many homework problems.

Products based on API wrappers and optimization layers have hit a ceiling.

However, multimodal visual generation has tremendous potential because there are many visual learning scenarios in the U.S. college entrance exam field. Unfortunately, Gatekeep got off to a good start but did not continue because it was launched a bit early; the foundational model programming capabilities were not mature yet, and GPT-4 had not been released. Additionally, the math animation engine involves rendering and algorithms, which they could not conquer. But our team has mastered all the core development of the animation engine, solving this problem and making video rendering very accurate.

PMF: Strong User Willingness to Pay

Founder Park: After your product was launched, you reached cooperation with several schools. When did you feel, "I did this product right, I found the pain point," and felt you had found PMF?

Kai: This can be discussed from three dimensions.

First, from the revenue metrics perspective, VideoTutor has already received API requests from 1,000 enterprises, including all the well-known large educational institutions in the U.S., and even domestic institutions. Additionally, many schools want to purchase services. The intent of consumer users is more direct: one student's parent, who is also an investor, tried the product and then shared it with all his friends and family, and everyone was willing to pay. Then he somehow got my phone number and texted me wanting to invest in us. Consumer users have a very strong willingness to pay.

The second point is from the user demand perspective. Why is demand for one-on-one tutoring in the U.S. so inelastic? Because parents believe that one-on-one teaching is effective and are willing to pay for it. Now, multimodal AI technology can achieve a one-on-one teaching effect in a human-like manner, answering questions as they are asked. Moreover, the recorded video lessons from online one-on-one tutors in the U.S. are actually no different from AI-generated videos. This is what I call "demand transfer": the recorded courses that students pay a lot for are no different from what I generate with AI, so why not use AI? It’s cheaper and more effective.

We have received a lot of very positive feedback from students, and many teachers are willing to promote this product; the initial completion rates and usage durations are particularly good. The 200 seed users we have screened are all early accumulations.

The third point is product taste and intuition. As you keep working on it, from the overall progress of the education industry, to the core reasons students and parents pay, to the evolution of the product itself, you can trace back and see that the entire logic forms a closed loop. So from these three dimensions, you feel that PMF is already sufficient. The most critical point is that the willingness to pay is extremely strong.

Cooperation with FIZZ

Founder Park: Many users want to pay voluntarily, and some have actively contacted you to invest.

Kai: Yes. In the SAT and AP fields, the willingness to pay is inherently strong. The customer unit price in this field starts at $100 to $200, and offline classes are even more expensive, possibly up to $800. In the U.S., 2.6 million students take the SAT, and 37% of them are willing to pay, making it a market with very strong willingness to pay and demand. Our product can achieve very good demand transfer.

Founder Park: In the SAT track, will a test taker trust an AI over a real teacher?

Kai: Currently, AI can answer questions at the level of the U.S. college entrance exams, SAT and AP, without factual errors. In this case, why is it better than offline tutors? One reason is cost; the other is that students can continuously ask questions without worrying about whether the teacher will think their questions are silly or become impatient. They can learn anytime, anywhere, 24/7.

Moreover, this market is transferable. After completing the U.S. market, we can also expand to Canada, the U.K.'s A-Level exams, etc., where there is a significant demand for paid services.

Founder Park: How are you considering the payment aspect now?

Kai: We offer a monthly subscription, and there is also a pay-per-performance model. I believe AI can now achieve payment based on results. We might launch a package where you pay $799, and we guarantee your child will score full marks in SAT math.

Founder Park: But paying based on exam results still depends on the student's personal initiative, right?

Kai: This might not be feasible for the domestic college entrance exam because there are many assessment points, over a thousand. However, the SAT only has 62 assessment points, of which 50 are regular points that most students can handle, and the remaining 12 points can also be mastered. Unless the student has significant logical issues, there is basically no situation where they cannot learn. Moreover, the efficiency improvement from AI is very evident.

In fact, many online tutors in the U.S. offer this service; you pay a teacher $1,800, and the success rate is nearly 100% because the SAT assessment points are fixed. As long as the student's IQ is normal, they generally have no issues. But that’s not the case for the domestic college entrance exam, which cannot be improved in the short term. Additionally, the domestic college entrance exam requires a score gap, which includes difficult questions, but the SAT does not have absolute difficult questions because it mainly assesses whether you have mastered the knowledge points.

The pay-per-performance model is also a method already used by tutoring teachers, which has this prerequisite.

Founder Park: In your pricing, will model costs be a concern? Is it a significant portion?

Kai: The customer unit price in our field is very high, starting at $69 per month, and model costs are currently very low, so it’s not an issue. The education industry is not like the coding field, where everyone is competing on price because coding requires supporting long contexts.

For Products Aimed at High School Students, the Web Version Matters Most

Founder Park: I remember you said last time that your first version prototype took just over two months to develop. How did you consider the entire development cycle, such as division of labor, deciding which features to include or exclude?

Kai: The consensus among our team members is that iteration must be fast because speed allows us to quickly obtain feedback from early users.

After the first version was released on Twitter, it caused a huge stir and brought in a lot of users. However, many of these users were programmers, investors, or tech enthusiasts, whom we can collectively call "tech early adopters." At that stage, the feedback we received from them was quite scattered and not very valuable. We still needed to filter out the truly core seed users from such a wide range of users, specifically high-quality high school students, and then obtain useful feedback through consultations.

The core feedback we received was that the accuracy of video rendering must reach 100%, which is the top priority for optimization. Features like whether the UI looks good or whether it supports different TTS voice options were all cut. We returned to the core of the product: we are doing knowledge learning in science scenarios, so the accuracy of graphic rendering is key.

Founder Park: How did you balance the generation time at that time?

Kai: At that time, the maximum duration was about 6 minutes. The main consideration was that explanations for regular questions and knowledge points should not exceed 6 minutes. However, in subsequent feedback, we found that some students who learn more slowly wanted the content to be explained at a slower pace and in more depth. We realized that duration should not be restricted; it should depend more on the user's learning ability.

Founder Park: What is the longest duration now?

Kai: The longest should be within an hour; users can keep asking questions. We can generate content in real-time while interacting, but this feature was added recently; the initial version did not have it.

Founder Park: Were there any features you initially wanted to implement but later found were not that important and decided to hold off on?

Kai: For example, the app. Initially, we considered whether to quickly develop an app, but later found that most students in the U.S. primarily use laptops or iPads for learning. Most K12 schools in the U.S. provide students with a Chromebook, and computers are highly prevalent. Their assignments are also completed on computers. High school students generally have a computer, and the proportion of mobile phones in learning scenarios is less than 5%, which is very low.

Founder Park: So if it’s a product targeting education or student groups, the web version is the first priority, while the app is not as important.

Kai: Yes, we already knew this data at that time, having studied in the U.S. for many years. Later, we surveyed 100 students from the early user base of several thousand, and over 90 of these 100 students had computers, which further confirmed this point.

Founder Park: When you launched the first version, were you also targeting the K12 group?

Kai: Yes, we continued to target this group afterward. We don’t consider Gauth as a competitor; we are more focused on exam training scenarios. A large number of high school students in the U.S. choose offline training or online learning platforms, and VideoTutor effectively transfers this demand.

Founder Park: Will K12 be your core user group for at least the next year?

Kai: It should be our core user group for the next two years.

Using large models, but not solely relying on them

Founder Park: Can you briefly introduce your current technical implementation plan? VideoTutor indeed performs much better than other video generation models in generating courses and charts, and even when many models cannot accurately generate text, your technology is impressive.

James: The videos we generate contain both text and graphics. The general production process is: we let a large language model generate text and corresponding animation instructions, which are then rendered by our animation engine and finally presented in the video.

The text part is relatively simple; we let the large language model generate the text and then render it directly. However, the animation part is generated by our own mathematical animation rendering engine. Its advantage lies in the high accuracy of rendering axes, geometric figures, and other content, which is precisely where our core technology lies.

Currently, the output of large language models is just text; our agent essentially provides the large language model with a piece of paper and a pen, allowing it to draw the appropriate teaching animations it imagines. The part that is drawn is entirely our technology.

Founder Park: How is the final composition of the video, including audio and video, handled?

James: Initially, the user will input a prompt, such as "What is the Pythagorean theorem?" In the first step, we let the large language model reason through all the scenes, generally specifying 3 to 5 scenes, depending on the difficulty of the question. Then, the model generates a rough script for each scene. Next, based on the script for each scene, we perform a second reasoning to generate the text, corresponding graphics, and the text for the voiceover. The voice text is then synthesized using TTS.

Finally, we stitch all the scenes together to form a complete video.
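
The flow James describes (plan the scenes, draft a script per scene, run a second pass for on-screen text, graphics, and voiceover, synthesize speech, then stitch) can be sketched roughly as follows. This is only an illustrative outline, not VideoTutor's actual code: every name here (`Scene`, `plan_scenes`, `detail_scene`, `build_video`, and the stub `fake_llm`, `fake_tts`, `fake_render` callables) is hypothetical, and the model, TTS, and animation engine are replaced by placeholder functions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    script: str               # rough script from the first reasoning pass
    on_screen_text: str       # text rendered directly into the video
    draw_commands: List[str]  # instructions handed to the animation engine
    narration: str            # voiceover text handed to TTS


def plan_scenes(prompt: str, llm: Callable[[str], str], n_scenes: int = 3) -> List[str]:
    """First pass: one rough script per scene (the interview mentions 3 to 5)."""
    return [llm(f"Scene {i + 1} script for: {prompt}") for i in range(n_scenes)]


def detail_scene(script: str, llm: Callable[[str], str]) -> Scene:
    """Second pass: expand a script into text, graphics, and narration."""
    return Scene(
        script=script,
        on_screen_text=llm(f"On-screen text for: {script}"),
        draw_commands=[llm(f"Drawing instructions for: {script}")],
        narration=llm(f"Voiceover for: {script}"),
    )


def build_video(prompt: str, llm, tts, render) -> List[dict]:
    """Render each scene and return the clips in playback order."""
    clips = []
    for script in plan_scenes(prompt, llm):
        scene = detail_scene(script, llm)
        clips.append({
            "frames": render(scene.on_screen_text, scene.draw_commands),
            "audio": tts(scene.narration),
        })
    return clips  # stitching is modeled here as simple ordered concatenation


# Placeholder components so the sketch runs without any external service.
def fake_llm(p: str) -> str:
    return f"<{p}>"


def fake_tts(text: str) -> str:
    return f"audio({text})"


def fake_render(text: str, cmds: List[str]) -> str:
    return f"frames({text}; {len(cmds)} cmds)"


video = build_video("What is the Pythagorean theorem?", fake_llm, fake_tts, fake_render)
```

Keeping the language model, TTS, and renderer behind plain callables mirrors the separation described in the interview: the model only emits text and instructions, while drawing is done by a separate engine.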

Founder Park: I understand that the first version followed this plan. Now that an interactive process has been added, has the generation process changed?

James: It has indeed changed. Now, to allow users to see content as quickly as possible, we first generate the first scene for the user to view, while the subsequent scenes continue to render in the background. When users ask questions, we convert their voice into text and then provide this text along with the content of all previous scenes to the large language model for reasoning, allowing it to plan the next teaching scene. The rendering process for subsequent scenes remains the same as before.
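
One way to picture this change, purely as an assumption about how such a flow might be structured: scenes are submitted to a worker pool so that the first one can be shown as soon as it is ready while later ones keep rendering, and a follow-up question is combined with all previously shown scenes before the model plans the next one. The helper names `stream_scenes` and `plan_followup` are hypothetical, and speech-to-text is assumed to have already produced `question_text`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List


def stream_scenes(scripts: List[str], render: Callable[[str], str]) -> Iterator[str]:
    """Yield rendered scenes in order; later scenes render in the background."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Submit everything up front so scenes 2..n render while scene 1 plays.
        futures = [pool.submit(render, script) for script in scripts]
        for future in futures:
            yield future.result()  # blocks only until this scene is ready


def plan_followup(question_text: str, shown: List[str], llm: Callable[[str], str]) -> str:
    """Give the model the transcribed question plus all earlier scenes."""
    context = "\n".join(shown)
    return llm(
        f"Previous scenes:\n{context}\n"
        f"Student asks: {question_text}\n"
        "Plan the next teaching scene."
    )
```

The planned follow-up scene would then go through the same rendering path as any other scene, which matches James's remark that the rendering process for subsequent scenes is unchanged.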

Founder Park: If a user has a question after one minute, will they ask it directly? After you receive the question, do you return the user's question along with the previously discussed content for the model to process? During this process, after the user finishes asking, does the animation continue playing or stop?

James: Our current delay has been reduced from the initial 20-30 seconds to under 5 seconds. In terms of interaction, we will implement some transitions so that users do not focus too much on these 5 seconds, making the entire process feel smooth. Within 4-5 seconds, they can see the newly presented content based on their question.

The design at this stage is that the AI teacher will say, "Hmm, let me think," and then wipe the board, simulating a real teacher. If you think something was explained incorrectly, I will erase it and rewrite it for you, making the process feel more natural.

Moreover, we are not just passively waiting for users to ask questions; we also conduct quizzes. We reason based on quiz feedback and user questions. Additionally, we do not have a completely open microphone; users need to actively turn on their microphones, which involves an action to start and stop.

Founder Park: So based on this mechanism, the longest explanation can last about an hour.

James: To be precise, there are no limits; if they have questions, they can keep asking.

Kai: Yes, there are no preset limits. In fact, VideoTutor is pursuing this direction as a result of advancements in multimodal AI; we are not creating demand but better meeting existing needs. Look at real offline education: why are American parents willing to pay so much? Because the U.S. education and training industry is more focused on one-on-one teaching, starting at $100 per hour. This is because offline teachers can ask guiding questions, observing where you struggle and then probing there. VideoTutor also strives to achieve this real-teacher effect, allowing every child to have real-time interaction and teaching.

Founder Park: During class, do you require students to turn on their cameras?

Kai: Not really. Whether students turn on their cameras mainly depends on U.S. privacy laws. The product will not design a mandatory feature for turning on cameras; it depends on the student's willingness. The main interaction is still through questions and voice feedback.

Founder Park: Technically, are you using a strategy that combines small models with cloud-based large models, or how does it work?

Kai: It is a combination. We have an internal dataset with over 100,000 videos. The better data is manually re-annotated and then used to fine-tune models. For example, we currently have over 8,000 SAT training samples. These fine-tuned small models work in conjunction with general-purpose commercial cloud models like Claude and Gemini.

Founder Park: Will using Claude, Gemini, or GPT affect the core performance of the product?

Kai: We mainly focus on the K12 field, and the level of the base models is already sufficient. However, to ensure 100% accuracy, we will call on two models to cross-check simultaneously. If both models provide the same answer, it is unlikely to be incorrect. For code generation, we primarily rely on Claude, as it has better coding capabilities.
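
The cross-check Kai mentions can be illustrated with a minimal sketch. It assumes two independent model callables and a simple normalized string comparison; the function names and the comparison rule are hypothetical, and a real system would likely need a more careful notion of answer equivalence.

```python
from typing import Callable, Optional


def normalize(answer: str) -> str:
    """Ignore case and whitespace so 'x = 4' and 'X=4' compare equal."""
    return "".join(answer.lower().split())


def cross_checked_answer(
    question: str,
    model_a: Callable[[str], str],
    model_b: Callable[[str], str],
) -> Optional[str]:
    """Return the answer only when both models agree; otherwise None."""
    a = model_a(question)
    b = model_b(question)
    if normalize(a) == normalize(b):
        return a
    return None  # disagreement: retry or escalate instead of showing an answer
```

Agreement between two models lowers the chance of an error reaching the student but does not rule it out, since both models can be wrong in the same way.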

Founder Park: What is the current technical bottleneck of the product? Is it model capability or code generation?

Kai: Model capability is one aspect. Another is rendering; we have already reduced it to under 5 seconds, and it can be faster with more GPU deployment. Another area is long-term memory capability. We need to accumulate long-term learning behavior data for students to know which knowledge points they do not understand. For example, if they forget a knowledge point learned a month ago, we can remind them again.

James: We have put a lot of effort into rendering time, continuously making technical breakthroughs, from the initial 2 minutes to 1 minute, and now to under 10 seconds. Our ultimate goal is to achieve rendering with virtually no delay, where the user asks a question, and the result is immediately available after reasoning. This is a challenge our team is currently tackling, but we have already found a new direction.

Measuring Effectiveness by Exam Scores, Not Just View Rates

Founder Park: How do you measure the core metrics of the product at this stage? How do you determine if a video is useful to users?

Kai: The most critical metric is the exam. In the new version, after watching a video, there will be a quiz at the end. If they answer correctly, it proves they understood; if not, it indicates that the explanation was unclear.

Learning effectiveness cannot be measured solely by view rates; some students may understand after watching only half. If they pass a test after watching half, they don’t need to watch the rest. The core metric of our product is how many students improve their scores.
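
To make the distinction concrete, here is a small sketch comparing a quiz-based metric with a view-based one on made-up session records. The `Session` fields, the 0.9 completion threshold, and the sample data are all assumptions for illustration, not VideoTutor's data or definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    watched_fraction: float  # share of the video the student watched (0.0 to 1.0)
    quiz_passed: bool        # result of the quiz at the end of the video


def quiz_pass_rate(sessions: List[Session]) -> float:
    """Share of sessions where the student answered the quiz correctly."""
    return sum(s.quiz_passed for s in sessions) / len(sessions)


def completion_rate(sessions: List[Session], threshold: float = 0.9) -> float:
    """Share of sessions where the student watched (almost) the whole video."""
    return sum(s.watched_fraction >= threshold for s in sessions) / len(sessions)


# Hypothetical data: two students understood after watching only half.
sessions = [
    Session(0.5, True),
    Session(1.0, False),
    Session(0.5, True),
    Session(1.0, True),
]
pass_rate = quiz_pass_rate(sessions)    # 0.75
view_rate = completion_rate(sessions)   # 0.5
```

In this made-up sample the view-based number understates how many students actually understood the material, which is the situation Kai describes.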

Founder Park: But the final exam is completed in a different setting. How do you obtain the result of whether they passed?

Kai: This relates to the product culture in the U.S., where users tend to share their positive results spontaneously after using a product. Many students who use VideoTutor and take the SAT will actively come back to share their experiences and scores. We also encourage them to become campus ambassadors for secondary dissemination.

We have a group of 20 high school students as campus ambassadors. If you look at Mercor, it was very successful early on by using the typical "user success story" model. Mercor helped many Indian programmers find jobs in the U.S., and then they would contact these users to create a user story about how they found jobs using Mercor. This created excellent word-of-mouth marketing. VideoTutor operates on the same principle; we want more students to achieve great results after using the product and then share their experiences as user stories.

Founder Park: What are the main channels for students to share their experiences?

Kai: Students mainly share on TikTok, while parents share in Facebook groups.

Founder Park: If we look at a time frame of six months to a year, what is your planned approach for product growth?

Kai: Essentially, VideoTutor is still a consumer-facing product, and word-of-mouth is crucial. Many successful AI applications early on relied on the word-of-mouth of seed users. For example, if designers found it useful, they would spread the word. For us, the core metric is how many SAT test-takers achieve high scores after using the product and then share it with other children and parents. Parents mainly use Facebook and Instagram, while students use TikTok, and we will promote on these platforms. When this consensus builds, school teachers will naturally become aware. We were able to gain recognition from so many schools early on because many teachers found it useful and recommended it to the school procurement heads. Therefore, the most critical aspect is still the word-of-mouth among consumer users, and the key metric is how many children improve their scores after using it.

Founder Park: What is the current status and timeline for the new version's release?

Kai: We hope to release it publicly within the next two months. By then, students will be able to get answers with very low latency, and the graphic rendering in science scenarios will achieve 100% accuracy. Of course, we will not cover competition scenarios or complex college-level topics like linear algebra for now; we will focus more on the K12 field.

Founder Park: What are the barriers or moats for VideoTutor now?

Kai: I think there are a few points. First is the data flywheel. Behind the videos is code; good video data generated by users can be re-trained and fine-tuned after secondary annotation. The more data we have, the better the video quality. Another aspect is learning behavior data; we know which knowledge points are weak for different students, allowing us to establish a data flywheel. The more people use it, the better the product understands the students. Second is the leading technological advantage, such as the animation engine's algorithms. While the algorithms themselves are not the core advantage, as we iterate quickly and gather more data, the advantages will become more apparent.

Third is the brand; VideoTutor has already become a leading brand in the AI education field among parents in North America, and the trust of parents is also an intangible barrier.

Founder Park: What do you expect VideoTutor to evolve into in three to five years?

Kai: We hope that in the future, VideoTutor can become an AI teacher for everyone learning science knowledge. We only focus on science. I believe it will surpass Duolingo in the future. Duolingo is a world-class language learning product, but there has not been a world-class product in STEM science scenarios because science requires a lot of graphic rendering. Now that the foundational model technology is ready, I believe the science field will give rise to the next "Duolingo."

Hiring, Especially Seeking Talent from Major Domestic Companies

Founder Park: You have had several entrepreneurial experiences in the past. What were they about?

Kai: I am currently a junior. I started my first education product venture with James in my freshman year, securing $200,000 in angel investment. Although that venture failed, I gained valuable experience: you cannot fall into homogeneous competition. At that time, the app we developed had many similar products on the market, and we had to engage in traffic competition early on, making it difficult to charge.

In my second venture, I joined another team, MathGPTPro, as a co-founder and stayed for a few months. During that phase, I learned how to analyze product metrics, how to build products, and how to expand user bases. It was also during that time that I concluded that text-based answer-type educational products had reached their limit. They are not much different from ChatGPT, and the structured knowledge question banks that companies like Zuoyebang invested heavily in have been replaced by the editing capabilities of large models. So in my third venture, I knew that visualization was an inevitable trend.

Zhao Kai's photo pitching with Sam Altman at Harvard University

Founder Park: Besides realizing the limitations of text-based products, how have your past experiences helped you with VideoTutor in terms of team or other aspects?

Kai: They have been very helpful.

First, I can better judge the direction and future potential of the product. I assess the evolution direction of the entire product by looking at competitors' website traffic and revenue.

Second, in product development, I can better gauge the development pace of the product, including product design, front-end and back-end integration, and which metrics to monitor.

Third, I have improved my team management and organizational culture skills. I have established a more complete management system, including the division of labor for each team member, rewards, and options distribution. I have also learned how to raise funds. We completed this round of $10 million funding within 20 days.

Founder Park: How many people are on your team now?

Kai: Six people, and we all live together.

Founder Park: How was the team initially built?

Kai: James and I have already started two ventures together. We both graduated from the same school and developed an app together in our freshman year. In our sophomore year, I started another venture with two other people, and we all knew each other. When we realized that this technology could bring a significant product vision, we contacted each other to form a team to work on this product. Everyone was an alumnus, including another partner in the team, Nick, who was also my college roommate.

Founder Park: You are also planning to expand the team. What kind of people are you looking to hire?

Kai: We are mainly looking for back-end, front-end, large language model, and UI/UX experts, preferably with experience. We have already passed the trial-and-error phase and entered the rapid product building phase, so we need experienced individuals to help us grow.

Founder Park: You need experienced engineers, product managers, and growth leaders to take the product from 1 to 10, and even from 10 to 100.

Kai: Yes, that’s the stage we are in. We expect to expand the team to 9 to 10 people, prioritizing hiring engineers.

This round of hiring may be based in China, so it will be a mix of in-person and remote work.

Founder Park: What kind of profile do you hope this person will have?

Kai: We prefer someone with experience in major companies, such as ByteDance or Meituan. ByteDance has a fast-paced and competitive organizational culture that values young talent. People trained at ByteDance have good methodologies and capabilities, and when they join us, they can bring these successful experiences for integrated learning.

We want someone who has fought hard in major domestic companies and has experience with rapid iteration. We have moved past the student startup phase, so we don’t need to hire novices; we need to hire experienced individuals who are not completely "industry veterans." Industry veterans may have family considerations and cannot be as competitive. So, we are looking for someone in the middle tier, young and competitive.

We are willing to offer generous stock options to excellent talent. Although we have raised $11 million, the reason we haven’t hired engineers in the U.S. is that we believe the product capability and engineering skills in China are truly excellent. This wave will definitely see a Chinese-led team create great products that succeed internationally. Many AI applications are currently being developed by Chinese builders, and the engineering capabilities in China are indeed impressive. This is also our advantage, and we need to leverage the strengths of both China and the U.S.

Silicon Valley college students are all starting AI ventures.

Founder Park: What is the current trend of college students starting businesses, especially in Silicon Valley? What do you see?

Kai: One fact to consider is the companies valued at $10 billion or more in this wave: Mercor, which focuses on AI recruitment, has completed over $300 million in new financing and is now valued at $10 billion; Cursor has also secured a valuation of $10 billion. There are also projects like GPTZero and Pika. These are all projects started by college students, and the founders of both Cursor and Mercor dropped out in their third year.

This wave of young entrepreneurs shares a common characteristic: highly differentiated competition. They focus on extremely narrow fields and do not create generic products. For example, Mercor started by exclusively recruiting Indian programmers.

The second point is the environment. Silicon Valley's capital environment and the institutions that underpin its innovation, such as Stanford, YC, and Peter Thiel's fund, support college student entrepreneurship from the earliest stages, regardless of whether you have a mature idea. They are willing to back you and provide a strong network.

The third point is the qualities of these college students. Whether it's us or other college students emerging from Silicon Valley, they possess a very brave spirit of adventure and strong learning abilities. This adventurous spirit is something many students in China may lack. In Silicon Valley, there are many successful peers around you to inspire you, and the capital environment is willing to believe in young people.

As for me, I also weighed the costs and benefits at that time. If I chose to finish college and then look for a job, I might not be able to repay my family's study abroad costs, nor would I necessarily see a significant return. But if I chose to start a business, I could learn intensely while I was young, and my life would have infinite possibilities. I have wanted to start a great company since I was young.

Founder Park: Why can this generation of college students create billion-dollar companies today, while in the past, selling for one or two million dollars was considered impressive? Is there a factor of AI hype and bubble in this?

Kai: I don't think it's entirely a bubble. Cursor has $450 million in real revenue, which is very reliable. Behind this is the methodology and cognitive insights of this generation of young teams, which are very critical. If you look at these teams, they have excellent backgrounds and very good learning abilities.

Cursor initially relied on nearby college student programmers, who had a high acceptance of AI and provided strong feedback. The founder is also a genius engineer who deeply understands users and has strong engineering iteration capabilities. In the early stages, four people built the product. After they iterated the product well, they formed a good reputation among users, generated revenue, and investors were afraid of missing the next Mark Zuckerberg, so capital came to support them.

The fundamental condition is that many technologies in this wave of AI are new, and young people learn quickly and are pragmatic, reliable, and daring. That leads to a deep understanding of users and very fast iteration, which is how they beat traditional products. For example, GitHub Copilot was doing well before Cursor, so why didn't Copilot stay ahead of it? Because of user experience and execution speed.

Founder Park: Can we say that because AI is a new technology, many product perceptions also need to be viewed from a new perspective?

Kai: Yes, this younger generation has deeper insights than the previous generation of entrepreneurs and can get closer to users. The mainstream AI users are now post-2000s, and they learn faster, iterate on feedback faster, and are more tolerant than the previous generation of entrepreneurs.

Therefore, the speed of cognitive iteration is key. In the mobile internet era, technological iteration was measured in years or quarters, but in the AI era, technological iteration may be measured in days. As a founder, you must learn quickly, and young people can stay up late and are more driven.

Founder Park: Some media have reported that many founders in Silicon Valley have also started working 996. What do you think?

Kai: Some of my white entrepreneur friends have raised a lot of money and also work 996. Like us, they rent a big house and all live and work together. I think 996 is more about environmental pressure: Silicon Valley is a bit like a gold rush now, and no one wants to fall behind, so they can only compete on product iteration speed, which means staying up late to iterate quickly. The environment shapes people and forces them to work this way.

Founder Park: Are there any trends in the choice of tracks for these college student entrepreneurs in Silicon Valley?

Kai: I think whether it is us in education or others in different fields, there is a trend of founders starting businesses within their comfort zones. The comfort zone refers to being sufficiently familiar with the field and its users. The founders of Cursor are very knowledgeable about coding, and we are in education because we understand this demographic well. Today's young people are more likely to start businesses within their existing cognitive comfort zones, rather than rashly jumping into unfamiliar fields. This way, the feedback you receive from users is fast and accurate enough.

There is also cognitive accumulation. We have done education three times, and my understanding has been continuously built upon. These college students are less likely to rashly do things they haven't done before; they are thinking about how to do things better. They have a new generation of thinking patterns, iterating continuously within their cognitive circles and daring to create opportunities.

Another point is the brave spirit of adventure; they are less likely to deny themselves because of others' negativity, possessing an "I don't care what you think about me" attitude, which is very confident. Behind this is a culture of "high-speed experimentation." I know my product is not ready yet, but I don't care; I will launch quickly, iterate quickly, and get feedback quickly.

Founder Park: When did this trend start?

Kai: I think it comes from a consensus about success. When everyone sees projects like GPTZero growing out of dorm rooms, iterating continuously, and then gaining capital support and user recognition, and as more stories of rapid trial and error and explosive success appear, a consensus forms.

In a nutshell, "Better done than perfect": finishing matters more than perfection. Moreover, people are not too worried about competition; many founders in Silicon Valley are willing to share their product ideas and are not afraid of being copied, as long as they can iterate quickly. I think this wave of young people also has a great storytelling ability, and this storytelling is not empty but based on pragmatism and realism, combined with their vision for the future.

Founder Park: First, market yourself.

Kai: Yes. I think the underlying concept is a spirit of adventure and extreme confidence. Driven by this, they continuously dare to experiment and are not afraid to say the wrong thing. They boldly express their product ideas and execute them; if they make mistakes, they can just correct them. This culture of not fearing trial and error has contributed to the wave of college student entrepreneurship and success.

VCs in the U.S. also look at college student projects, and YC regularly invests in some college student projects in each batch.

Funding is the least of VideoTutor's worries right now

Founder Park: If you could go back to when you first started VideoTutor, what advice would you give yourself? What could have been done better?

Kai: I think we should have moved at a faster pace. The other thing is team composition. The VideoTutor team has gone through multiple rounds of adjustments. If I had known earlier, I would have built the team better based on the skills needed for the product. I believe that in entrepreneurship, organizational capability is crucial. I would spend more time on organizational capability: selecting people, recognizing talent, and making good use of good people.

The current team is suitable for growth from 0 to 1, but to make VideoTutor bigger, we still need more experienced people to join and bring their excellent experience and capabilities to the team, helping the entire team grow together.

Founder Park: In the next six months, what product or technical challenges do you think VideoTutor might encounter?

Kai: I think one challenge is rendering; we need to achieve true zero latency, which requires engineering breakthroughs. The second point is growth; I think it’s about the product's taste, which includes many aspects, such as whether the UI and interaction design are smooth and perfect, whether the functional interactions are bug-free, and whether the visual layout is attractive, etc. These are all tests for us.

James: Initially we positioned VideoTutor as a visual teaching assistant for all subjects, but later we became very vertical, focusing only on math because that is our strength. Our math rendering engine is the most professional. The next key breakthrough may be horizontal expansion. For example, how do we bring the advantages of visualization to humanities scenarios, such as explaining a line of classical Chinese poetry like "Hoeing grain under the midday sun, sweat drips onto the soil beneath"? This is something we need to consider technically moving forward.

Founder Park: Will the founders' backgrounds cause difficulties in future expansion?

Kai: Not really. In fact, many large VCs have approached us, like a16z, but they don't invest too early; they wait until the team shows signs of success before providing support, so they know their investment won't fail. We maintain good relationships with many large VCs.

Funding is the least of VideoTutor's worries; the biggest concerns are the user ecosystem and the product.
