IT'S OVER! I Can't Stay Quiet on GOOGLE vs NVIDIA Any Longer
Ticker Symbol: YOU
153,844 views • 3 days ago
Video Summary
The AI landscape is undergoing a seismic shift as Google and Amazon launch custom chips to challenge Nvidia's dominance in data centers. Google's TPUs and Amazon's Trainium 3 chips are application-specific integrated circuits (ASICs) designed for efficiency and cost savings in specific AI workloads, particularly tensor operations for deep learning, large-scale inference, and recommendation systems. While these chips offer significant performance and efficiency gains for certain tasks, they primarily target niche markets, leaving Nvidia's broad GPU ecosystem largely intact. However, this development poses a significant threat to AMD, which has positioned itself as a more affordable alternative to Nvidia for large language model inference.
A particularly interesting revelation is that TSMC, the sole manufacturer of these advanced chips, stands to benefit immensely. Their role as the exclusive fabricator for Nvidia, Google, Amazon, Microsoft, and Meta's custom silicon, coupled with their leadership in advanced packaging, positions them as a strong long-term investment regardless of which chip designer ultimately prevails.
Short Highlights
- Google and Amazon are releasing custom chips (TPUs and Trainium 3) to compete with Nvidia's data center dominance, specifically targeting tensor operations for AI workloads.
- Google plans to ship 1 million TPUs by 2027, aiming to increase cloud revenue by over 10% ($13 billion) and capture 10% of Nvidia's data center revenue.
- Google's TPUs can deliver 50-100% better performance per dollar or per watt than GPUs for specific applications like high-volume inference and large training jobs.
- Amazon's Trainium 3 chip offers 50% more memory capacity, 70% more bandwidth, twice the compute performance, and is 40% more energy efficient than its predecessor.
- TSMC is the sole manufacturer for Nvidia, Google, Amazon, Microsoft, and Meta's custom AI chips, making them a prime beneficiary due to increased demand for advanced chip production and packaging.
Key Details
Google and Amazon's New Chip Strategy [00:00]
- Google and Amazon have launched custom chips to challenge Nvidia's dominance in data center AI.
- This move could significantly impact the AI market and Nvidia's position as the most valuable company in the AI sector.
- Google's custom Tensor Processing Units (TPUs) and Amazon's new Trainium 3 chips are application-specific integrated circuits (ASICs) designed for efficiency in AI workloads.
- The strategy signifies a major shift from previous chip development approaches: these companies are no longer just competing with Nvidia, but potentially reshaping how the AI buildout is supplied.
Google's TPU Expansion and Market Strategy [01:12]
- Google announced they would sell their custom TPUs to external data centers, a significant change from their prior strategy.
- Morgan Stanley estimates Google has a roadmap to ship 1 million TPUs to external customers by 2027, potentially increasing cloud revenue by over 10%, or close to $13 billion.
- Google's long-term internal goal is to capture approximately 10% of Nvidia's data center revenues annually, amounting to tens of billions of dollars.
- They aim to achieve this by targeting some of Nvidia's largest customers and most widely supported workloads.
Understanding TPUs: Application-Specific Design [01:57]
- Unlike general-purpose GPUs, TPUs are ASICs specifically built for tensor operations essential for deep learning, such as matrix multiplication.
- TPUs excel in three key areas: high-volume inference at massive scales (billions of requests for services like search, ads, YouTube, Gemini), large training jobs for AI models, and specialized recommendation and ranking systems.
- Google's TPUs can deliver 50-100% better performance per dollar or per watt than GPUs for these specific applications due to their efficiency and specialized design.
- Their performance scales exceptionally well when thousands of TPUs are interconnected, benefiting from integrated networking, fast interconnects, and tight memory coupling.
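To ground the term, the "tensor operations" TPUs are built around are essentially large batched matrix multiplications: most of deep learning training and inference reduces to chains of them. A toy pure-Python sketch of the operation (illustrative only; real TPU workloads run this at billions-of-elements scale in dedicated hardware):

```python
# Toy matrix multiplication -- the core tensor operation behind
# dense neural network layers (activations @ weights -> outputs).
def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

# A 2x3 batch of activations times a 3x2 weight matrix -> 2x2 output.
x = [[1, 2, 3], [4, 5, 6]]
w = [[1, 0], [0, 1], [1, 1]]
print(matmul(x, w))  # [[4, 5], [10, 11]]
```

An ASIC like a TPU hard-wires exactly this multiply-accumulate pattern into silicon (a systolic array), which is why it can beat a general-purpose GPU on cost and power for workloads dominated by it.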
Meta's Interest in Google's TPUs [03:34]
- Meta Platforms is reportedly in talks to spend billions on Google's custom TPU chips, driven by similar workload needs.
- Google's AI factories, built on TPUs, are potentially more cost-effective for Meta than building their own full-stack solutions from scratch, including chips, racks, cooling, and software.
- Meta's own AI accelerators (MTIA chips) have limited workload support, primarily focused on inference, whereas Google's TPU pods are capable of training frontier-scale models.
- This makes Google's TPUs an attractive option for Meta to accelerate their AI hardware development and diversify beyond Nvidia's GPUs.
Amazon's Trainium 3 Chip and AWS Strategy [05:03]
- Amazon Web Services (AWS) has launched its new Trainium 3 chip, another ASIC focused on extreme power efficiency and cost savings.
- Trainium 3 is designed for specific high-volume AI workloads like training and inference for large language models (LLMs) with large parameter counts and long context windows, as well as multimodal and mixture-of-experts models.
- The new chip boasts 50% more memory capacity, 70% more bandwidth, twice the compute performance, and is 40% more energy efficient than its predecessor.
- Unlike Google's approach, Amazon appears to be keeping its Trainium 3 chips in-house for AWS, meaning customers must use AWS services to leverage these chips.
Competition with Nvidia's Ecosystem [06:59]
- While Google and Amazon's chips target specific AI workloads, they do not directly compete with Nvidia's broad GPU ecosystem across many industries.
- Nvidia's GPUs power a wide range of AI applications including image/video generation, simulations, professional visualization, drug discovery, and robotics.
- Nvidia's ecosystem extends beyond GPUs to include solutions like NVLink Fusion, which enhances interoperability between different hardware components in data centers.
- Google's TPUs are more closed and require reliance on Google's hardware and software stack, which is less versatile and less widely adopted than Nvidia's CUDA.
- Google aims for around 10% of Nvidia's GPU market with TPUs, while Amazon's total addressable market (TAM) is smaller because Trainium 3 is primarily for internal AWS use.
Impact on the AI Market and AMD [09:02]
- The AI market is projected to grow nearly 19x over the next 9 years, with a CAGR of over 38% through 2034, indicating ample room for growth even with increased competition.
- Nvidia's GPUs are flexible enough to support diverse AI segments (NLP, computer vision, robotics), while Google's TPUs and Amazon's Trainium chips focus on machine learning and NLP.
- Google and Amazon may challenge Nvidia's pricing power and margins for AI labs focused solely on language models, but cannot compete in areas involving robotics or physical simulations.
- The biggest loser in this scenario is AMD, whose data center strategy relies on being a cheaper alternative to Nvidia, especially for LLM inference, which is precisely where Google's and Amazon's new chips excel.
- Google's TPUs will directly compete with AMD for cost-optimized inference, and Amazon's Trainium 3 chips will reduce the need for AMD's GPUs within AWS.
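The market-size figure quoted above is easy to sanity-check: growing roughly 19x over 9 years implies a compound annual growth rate of about 38.7%, consistent with the "over 38% through 2034" claim. A quick check:

```python
# Verify the quoted growth math: ~19x total growth over 9 years.
multiple, years = 19, 9

# CAGR = (ending/starting multiple)^(1/years) - 1
cagr = multiple ** (1 / years) - 1
print(f"{cagr:.1%}")  # 38.7%
```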
Key Winners: TSMC and Broadcom [11:07]
- TSMC is the sole manufacturer capable of producing Nvidia's GPUs, Google's TPUs, and Amazon's Trainium chips, as well as chips for Microsoft and Meta.
- As customers design more specialized chips, demand for TSMC's advanced production nodes and packaging techniques will increase, supporting higher margins.
- Broadcom is instrumental in designing multiple generations of Google's TPUs, Meta's MTIA chips, and ByteDance's custom chips, and recently partnered with OpenAI for their custom processors (XPUs).
- Broadcom also holds a dominant 90% market share in Ethernet switching chips for data centers, directly competing with Nvidia's networking solutions.
- Investors holding both Broadcom and Nvidia gain exposure to the companies supplying networking solutions to virtually every AI data center.
Power Efficiency and Data Center Infrastructure [13:22]
- A primary driver for custom chip development by Google and Amazon is power efficiency, as electricity and cooling constitute a significant portion of data center operating expenses.
- Vertiv Holdings (VRT) is highlighted as a key player in data center power and cooling systems, supplying liquid cooling solutions for high-density servers and GPU clusters.
- Vertiv's modular liquid cooling systems can scale to cool up to 600 kW of server racks per unit and are supplied to AWS, Google Cloud, and Microsoft Azure.
- They also provide core power systems like the Liebert XL UPS, designed for hyperscale and cloud facilities, delivering high-capacity energy efficiently.
- The increasing demand for specialized AI chips and the associated power and cooling needs position Vertiv as a beneficiary in the evolving AI infrastructure landscape.