
Anthropic Co-founder: Building Claude Code, Lessons From GPT-3 & LLM System Design
Y Combinator
81,227 views • 2 months ago
Video Summary
Tom Brown, co-founder of Anthropic, shares his journey from being a 21-year-old MIT graduate to leading a major AI company. He highlights the importance of a "wolf" mindset, embracing challenges, and learning from early startup experiences. His path includes co-founding a DevOps company, working at Mopub, and experiencing the early days at OpenAI.
Short Highlights
- Started tech at 21, fresh from MIT.
- Co-founded Anthropic after working at OpenAI.
- Anthropic's early days involved 7 co-founders and uncertainty about product direction.
- Claude 3.5 and Claude Code demonstrated significant success, particularly in coding.
- Humanity is on track for the largest infrastructure buildout of all time, with AI compute spending projected to triple annually.
Key Details
Early Startup Experiences and Mindset Shift
- Began his career at 21 as the first employee at Linked Language, a startup founded by friends.
- Valued the experience of having to figure things out independently in startups, contrasting it with the more task-oriented environment of large tech companies.
- Described this mindset as a shift from being a "dog waiting for food" to being a "wolf" that hunts for its own sustenance.
- Later worked at Mopub as the first engineer, facing challenges with programming but gaining experience in scaling.
- Co-founded a company called Solid State before Docker existed, aiming to simplify DevOps, which proved more complex than anticipated.
Transition to AI and OpenAI
- Felt uncertain about his ability to contribute to AI research due to early academic struggles (e.g., a B-minus in linear algebra).
- Spent 3 months after leaving his startup (Grouper) building an art car for Burning Man, experiencing burnout.
- Underwent 6 months of self-study to prepare for AI research roles, focusing on Coursera courses, Kaggle projects, and foundational math.
- Secured a contract with Twitch to fund his self-study period.
- Joined OpenAI after messaging co-founder Greg Brockman, initially helping with the StarCraft environment rather than doing direct machine learning work.
- Played a key role in scaling GPT-3, shifting from TPUs to GPUs and emphasizing PyTorch as a software stack for fast iteration.
- The scaling laws paper, showing a straight line of intelligence improvement over 12 orders of magnitude of compute, was a major catalyst for his focus on scaling (see the sketch below).
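A hedged aside on what "a straight line over 12 orders of magnitude" refers to: scaling-laws results of that era are usually stated as a power law relating loss to training compute, which plots as a straight line on log-log axes. The symbols below (L for loss, C for compute, with alpha_C and C_c as fitted constants) show the conventional form only; they are not values quoted in the talk.

```latex
% Power-law scaling of loss L with training compute C.
% alpha_C and C_c are fitted constants; this is the illustrative form,
% not the talk's (or the paper's) specific numbers.
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
\quad\Longrightarrow\quad
\log L(C) \approx \alpha_C \log C_c - \alpha_C \log C
```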
Founding and Growth of Anthropic
- Co-founded Anthropic with a group from OpenAI's safety and scaling organizations, driven by a shared mission to benefit humanity.
- Early days at Anthropic were challenging, with a smaller team and less funding than OpenAI.
- The initial team of 7 co-founders grew to around 25 people within months, many of them former OpenAI employees.
- The first product was a Slackbot version of Claude 1, launched about nine months before ChatGPT.
- Initially hesitated to launch Claude as a product due to uncertainty about its impact and insufficient serving infrastructure.
- Significant progress and market traction came with Claude 3.5 and Claude Code, particularly for coding tasks.
- Anthropic uses chips from three manufacturers (Nvidia GPUs, Google TPUs, and AWS Trainium) for flexibility and to use the right chip for each job.
- The company focuses on building the best platform and API for developers, aiming to empower them to build on top of its models.
AI Development Philosophy and Future Outlook
- Emphasizes a philosophy of "doing the stupid thing that works," referencing the success of scaling laws despite initial skepticism.
- Does not "teach to the test" on benchmarks, believing it leads to negative incentives.
- Focuses on internal benchmarks and dog-fooding to improve models for Anthropic's own engineers.
- Views interpretability as a long-term bet for understanding more advanced AI systems.
- Claude Code was initially an internal tool that showed promise for assisting engineers.
- Believes there is a significant opportunity for developers to build tools that act as useful partners or junior engineers for AI models.
- Identifies power availability as the biggest bottleneck for the current AI infrastructure buildout, advocating policies that ease data center construction.
- Advises younger people interested in AI to take risks and work on projects that align with their intrinsic values and aspirations, rather than on external credentials alone.
- Projects the broader AI infrastructure buildout to be the largest in human history, exceeding projects like Apollo and the Manhattan Project.