Andrej Karpathy: From Vibe Coding to Agentic Engineering

Sequoia Capital

276,067 views yesterday

Video Summary

The video discusses a fundamental shift in programming paradigms, moving from explicitly written code (Software 1.0) and trained neural networks (Software 2.0) to a new era of Large Language Models (LLMs) acting as programmable computers (Software 3.0). In this paradigm, prompting replaces much of traditional coding, and the context window becomes the lever for steering the LLM, which interprets instructions and performs the computation. Examples like the installation of OpenClaw and the "MenuGen" application illustrate how complex tasks can be achieved through simple prompts rather than intricate code. The conversation also delves into "jagged intelligence," where AI excels in verifiable domains like math and code but struggles elsewhere, and introduces "agentic engineering" as a discipline for maintaining quality and speed in professional software development. A key insight is that while agents can automate many tasks, human skills like judgment, taste, and understanding remain crucial for directing AI and fostering innovation. One startling fact: state-of-the-art models can refactor massive codebases yet still make basic logical errors, such as recommending walking a short distance instead of driving.

Short Highlights

  • The shift to LLMs as programmable computers marks Software 3.0, where prompting replaces traditional coding for many tasks.
  • Complex installations like OpenClaw can now be initiated with a copy-paste prompt for an agent, rather than a complex shell script.
  • The "MenuGen" example highlights how LLMs can generate entire applications, with a Software 3.0 version being as simple as a prompt to an image generator.
  • AI exhibits "jagged intelligence," excelling in verifiable domains like math and code, but faltering in common-sense reasoning.
  • "Agentic engineering" focuses on coordinating powerful but spiky AI agents to increase speed without sacrificing software quality.

Key Details

Introduction to AI and a Programmer's Perspective [00:02]

  • The speaker is introduced as a key figure in building, explaining, and even renaming modern AI, having co-founded OpenAI and worked on Autopilot at Tesla.
  • He is known for coining "vibe coding" and recently said he feels more "behind" as a programmer than ever.

"He has a rare gift of making the most complex technical shifts feel both accessible and inevitable."

The Feeling of Being "Behind" as a Programmer [00:47]

  • The speaker elaborates on his statement about feeling "behind" as a programmer, describing the feeling as both exhilarating and unsettling.
  • He observed a significant shift around December, when agentic tools like code assistants stopped making frequent errors and started consistently producing solid chunks of code.
  • This led to a point where he no longer needed to correct the output and began trusting the system more, engaging in "vibe coding."

"I think that a lot of people, actually, I tried to stress this on Twitter, or X, because I think a lot of people experienced AI last year as a ChatGPT-adjacent thing."

Software 1.0, 2.0, and the Emergence of Software 3.0 [02:30]

  • The concept of LLMs as a new computer paradigm is introduced, distinct from just better software.
  • Software 1.0 involved writing explicit rules, Software 2.0 involved programming through data and training neural networks, and Software 3.0 is characterized by prompting LLMs.
  • In Software 3.0, the context window acts as a lever for the LLM interpreter, enabling computation in the digital information space.

"So Software 3.0 is kind of about, you know, your programming now turns to prompting, and what's in the context window is your lever over the interpreter that is the LLM."
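The three paradigms above can be contrasted with a toy example. This is my own illustration, not code from the talk; the task (keyword sentiment) and every name in it are hypothetical stand-ins.

```python
# Software 1.0: behavior is explicit, hand-written rules.
def classify_1_0(text: str) -> str:
    negative_words = {"bad", "awful", "terrible"}
    words = set(text.lower().split())
    return "negative" if words & negative_words else "positive"

# Software 2.0: behavior lives in learned weights, not rules.
# A tiny fixed weight vector stands in here for a trained network.
WEIGHTS = {"bad": -1.0, "awful": -1.0, "great": 1.0, "good": 1.0}

def classify_2_0(text: str) -> str:
    score = sum(WEIGHTS.get(w, 0.0) for w in text.lower().split())
    return "negative" if score < 0 else "positive"

# Software 3.0: the "program" is a prompt; what you put in the
# context window is your lever over the LLM interpreter.
PROMPT_3_0 = (
    "You are a sentiment classifier. "
    "Reply with exactly one word, 'positive' or 'negative'.\n"
    "Text: {text}"
)
```

The point of the contrast: in 1.0 you edit rules, in 2.0 you edit data and retrain, and in 3.0 you edit English.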

OpenClaw Installation as a Software 3.0 Example [03:44]

  • The installation of OpenClaw is used as an example. Traditionally, this would involve complex shell scripts.
  • In the Software 3.0 paradigm, the installation becomes a simple copy-paste instruction for an agent, which then intelligently handles the setup across different environments.
  • This is more powerful because the agent uses its own intelligence to perform actions, debug, and adapt to the user's environment.

"The agent has its own intelligence that it packages up, and then it follows the instructions, looks at your environment, your computer, and performs intelligent actions to make things work."

MenuGen: From Traditional App to Prompt-Driven AI [04:52]

  • The speaker recounts building "MenuGen," an app to generate pictures for restaurant menu items. This involved OCR, image generation, and rendering.
  • The Software 3.0 version of this concept is as simple as giving a photo to Gemini and instructing it to overlay things onto the menu, with an image model ("Nano Banana") directly rendering the modified image.
  • This illustrates that the traditional app may have become superfluous, with LLMs handling more of the work directly based on the input.

"And then I saw the Software 3.0 version of this, which blew my mind, which is literally: just take your photo, give it to Gemini, and say, use Nano Banana to overlay the things onto the menu."
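The contrast in this section can be sketched in code. This is a hypothetical illustration, not the real MenuGen source; `ocr_menu` and `generate_image` are stand-ins that fake the model calls so the pipeline shape is visible.

```python
def ocr_menu(photo: bytes) -> list[str]:
    # Stand-in for a real OCR step: treat each text line as a menu item.
    return [line for line in photo.decode().splitlines() if line.strip()]

def generate_image(prompt: str) -> str:
    # Stand-in for a real image-generation call.
    return f"<image: {prompt}>"

def menugen_traditional(menu_photo: bytes) -> list[dict]:
    """The traditional app: explicit orchestration of OCR, then
    per-item image generation, then data handed to a rendering layer."""
    return [
        {"item": item, "image": generate_image(f"photo of {item}")}
        for item in ocr_menu(menu_photo)
    ]

# The Software 3.0 version collapses the whole pipeline into one
# instruction handed to a multimodal model along with the photo.
MENUGEN_PROMPT = (
    "Here is a photo of a restaurant menu. Overlay a generated picture "
    "of each dish directly onto the menu and return the modified image."
)
```

The orchestration code on top is exactly what becomes superfluous when the model can follow the single instruction at the bottom.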

The Future of Programming and Information Processing [06:36]

  • The discussion shifts to how this paradigm change affects programming and information processing more broadly.
  • It's not just about making programming faster; it's about automating general information processing, which wasn't possible before.
  • LLM knowledge base projects, creating wikis from documents, are cited as examples of new possibilities, not just faster versions of existing tasks.

"And so these are new things that weren't possible. And so I think this is something I keep trying to get back to: not only can we do existing things faster, but I think there are new opportunities, things that just couldn't be possible before."

Extrapolating to 2026: The "Neural Computer" Era [07:51]

  • The speaker speculates about the future, drawing parallels to web development in the '90s and mobile apps in the 2010s.
  • He envisions a future with "completely neural computers" where raw video or audio is fed into a neural net that then diffuses and renders a UI.
  • This could lead to a flip where neural nets become the host process, and CPUs become co-processors, with AI compute dominating.

"So you could imagine something really weird and foreign, where neural nets are doing most of the heavy lifting. They're using tool use as this, you know, historical appendage for some kinds of deterministic tasks."

Verifiability and Jagged Intelligence [09:43]

  • The concept of verifiability is crucial: AI automates faster in domains where output can be verified.
  • Traditional computers excel at codeable tasks, while LLMs excel at verifiable tasks due to reinforcement learning with verification rewards.
  • This training leads to AI exhibiting "jagged intelligence," peaking in areas like math and code but being rough in less verifiable domains.
  • Examples include AI failing to answer simple logic questions (e.g., walking to a car wash) despite its ability to refactor large codebases.

"So I think the reason I wrote about verifiability is I'm trying to understand why these things are jagged."
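A minimal sketch (my own illustration, not from the talk) of what makes code a verifiable domain: a candidate solution can be scored automatically, which is exactly the kind of signal reinforcement learning with verification rewards needs. Domains without such a checker get no comparable training pressure, hence the jaggedness.

```python
def verifiable_reward(candidate_fn, test_cases) -> float:
    """Score a candidate program by the fraction of test cases it
    passes. In RL fine-tuning, an automatic signal like this can
    replace human judgment entirely."""
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate_fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing sample simply earns no reward
    return passed / len(test_cases)

# Two hypothetical model "samples" for an absolute-value function.
good = lambda x: x if x >= 0 else -x
buggy = lambda x: x  # forgets the negative case

tests = [((3,), 3), ((-3,), 3), ((0,), 0)]
```

Here `good` scores 1.0 and `buggy` scores 2/3, so training can prefer the former without any human in the loop; there is no analogous checker for "should I walk or drive to the car wash?"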

Advice for Founders and Verifiable Domains [13:40]

  • For founders building companies, the advice is to focus on verifiable domains where AI has shown strong performance.
  • Verifiability allows for creating reinforcement learning environments and fine-tuning models effectively.
  • The speaker hints at valuable RL environments that may not be currently prioritized by major labs but could be areas for new ventures.

"So maybe one way to see it is that that remains true even if the labs are not focusing on it directly. So if you are in a verifiable setting where you could create these RL environments or examples, then that actually sets you up to potentially do your own fine-tuning, and you might benefit from that."

Vibe Coding vs. Agentic Engineering [15:47]

  • "Vibe coding" is described as raising the floor for everyone, making software development more accessible.
  • "Agentic engineering" is about preserving the quality bar of professional software, ensuring speed without introducing vulnerabilities.
  • It's framed as an engineering discipline focused on coordinating powerful but "spiky" AI agents to increase productivity while maintaining quality.

"So, to me, agentic engineering, I call it that because I do think it's kind of an engineering discipline. You have these agents, which are these spiky entities. They're a bit fallible, a little bit stochastic, but they are extremely powerful."

AI Native Coding and Hiring [17:21]

  • The difference between a mediocre and an AI-native coder is in maximizing the use of available AI tools and investing in their setup.
  • For hiring, companies need to shift from traditional puzzles to giving candidates large, complex projects to implement, testing their ability to build and secure systems using AI agents.

"So, just investing into your setup and utilizing a lot of the tools that are available to you. And I think it just kind of looks like that."

Human Skills in the Age of Agents [19:29]

  • As agents become more capable, human skills like aesthetics, judgment, taste, and high-level oversight become more valuable.
  • Agents can make mistakes due to their statistical nature (e.g., cross-referencing Stripe and Google accounts by email addresses).
  • Humans are needed to define the overarching "spec" or plan, provide direction, and ensure the output aligns with human values and understanding.

"So I think people have to be in charge of this spec, this plan. And I actually don't even like the plan mode."

Jagged Intelligence: Animals vs. Ghosts [23:33]

  • The speaker's "animals versus ghosts" framing suggests AIs are not intrinsically motivated, animal-like beings but rather summoned "ghosts" shaped by data and reward functions.
  • This framing matters for understanding their limitations and how to build and evaluate them, acknowledging they lack intrinsic motivation, fun, or curiosity.

"I think it's just coming to terms with the fact that these things are not, you know, animal intelligences. Like, if you yell at them, they're not going to work better or worse; it doesn't have any impact."

The Agent-Native World and Infrastructure [25:18]

  • A future world where agents have real permissions and take action means existing infrastructure, often built for humans, will need to be rewritten.
  • Documentation will shift from telling users what to do to providing prompts for agents.
  • The ideal future infrastructure will be "agent-first," with data structures legible to LLMs and deployment handled automatically.

"So everyone is, I think, excited about how we decompose the workloads that need to happen into, fundamentally, sensors over the world and actuators over the world. How do we make it agent-native?"

Education and Understanding in the AI Era [27:56]

  • The most valuable skill to learn remains "understanding," as it cannot be outsourced, even if thinking can.
  • Tools that enhance understanding, like LLM knowledge bases, are crucial for humans to direct AI effectively.
  • While AI can process information, humans are uniquely responsible for true comprehension and directing AI's capabilities.

"And I think that's really nicely put. Because I'm still part of the system, information still has to somehow make it into my brain."
