Gary Marcus on the Massive Problems Facing AI & LLM Scaling | The Real Eisman Playbook Episode 42

Steve Eisman


Video Summary

The video features a critical discussion of Large Language Models (LLMs) by Gary Marcus, who argues they are not on a path to true Artificial General Intelligence (AGI). He explains that LLMs primarily function as sophisticated auto-complete systems, predicting the next word in a sequence based on vast datasets, which leads to issues like "hallucinations" where they generate fabricated information with confidence. Marcus highlights that this reliance on statistical pattern matching, rather than genuine understanding or reasoning (System 2 cognition), fundamentally limits their capabilities. He posits that the massive investment in scaling LLMs has yielded diminishing returns, and that true progress in AI requires a return to foundational research and the development of "world models" that represent external reality, a concept central to classical AI but largely absent in current neural network approaches. A key takeaway is that while LLMs have practical applications in pattern recognition and statistical analysis, they lack the abstract reasoning and novelty handling necessary for AGI, leading to widespread errors and potential societal impacts.

Short Highlights

  • Large Language Models (LLMs) are essentially sophisticated auto-complete systems, predicting the next word in a sequence, not truly understanding or reasoning.
  • LLMs are prone to "hallucinations," generating false information with high confidence due to their statistical prediction method and lack of a proper world model.
  • The current AI paradigm's reliance on scaling LLMs is yielding diminishing returns, and this approach is unlikely to lead to Artificial General Intelligence (AGI).
  • True AI advancement requires a return to foundational research, focusing on developing "world models" that represent external reality and enabling causal reasoning and novelty handling.
  • Significant investments in scaling LLMs may be misplaced, with a more robust future for AI lying in more efficient, reliable, and diverse research approaches.

Key Details

The Limits of Neural Networks and LLMs [00:16]

  • The world has heavily invested in neural networks, including LLMs, based on an idea that Marcus believes is flawed for achieving Artificial General Intelligence (AGI).
  • Investment communities are beginning to question the return on investment for LLMs, noting significant circular financing and poor results.
  • Marcus, who began studying AI at age 10 and whose MIT dissertation examined natural intelligence and neural networks, anticipated problems such as hallucinations and reasoning failures early on.
  • He observed that the 2012 resurgence of deep learning, driven by GPUs (graphics processing units) originally developed for video games, made it possible to train much larger neural networks far more quickly.
  • Marcus argued in a 2012 New Yorker piece that while these systems excel at pattern recognition and statistical analysis (System 1 cognition), they are not suited for abstraction and reasoning (System 2 cognition), which is crucial for human-like intelligence.

"Neural networks are basically like system one. And that's fine. That's part of what we do as humans. But part of what we do is the system two stuff."

Hallucinations and the Nature of LLMs [11:07]

  • LLMs fundamentally operate by predicting the next element in a sequence, functioning as "autocomplete on steroids" (a toy sketch appears after the quote below).
  • They break text into small pieces (tokens), and connections among the underlying facts can be lost, producing "hallucinations" in which they invent information and present it as fact.
  • Marcus cites examples like a biography falsely stating he owned a pet chicken and an LLM incorrectly asserting Harry Shearer is British when he was born in Los Angeles, illustrating how these systems can misinterpret or blend statistical correlations.
  • These hallucinations occur because LLMs reconstruct statistically probable relationships between data bits, and these reconstructions can be erroneous, blurring data from different sources.
  • The "looks good to me" effect, where LLMs produce grammatically correct but factually incorrect output, can lead to widespread errors, termed "work slop," because the systems don't truly understand the information they generate.

"We imagine falsely that large language models are intelligent beings like us, but really all they're doing is reconstructing statistically probable relationships between bits of information."

The Problem of Novelty and Diminishing Returns [19:24]

  • A core issue with LLMs, Marcus argues, is their difficulty in handling novelty; if presented with something sufficiently far from their training data, they struggle.
  • This was exemplified by a Tesla summon incident where the car drove into a jet because "jets" were not a category in its training data, highlighting a lack of general understanding of the world.
  • LLMs have hit diminishing returns, meaning the improvements between successive versions (e.g., GPT-4 to GPT-5) are less dramatic than earlier leaps, necessitating formal benchmarks to discern progress.
  • Hyperscalers' massive investment in GPUs for LLMs, estimated at roughly $500 billion, rests on the speculation that these systems will lead to AGI, a prospect Marcus doubts.
  • The departure of key figures like Ilya Sutskever from OpenAI and the subsequent proliferation of startups suggest a lack of conviction within companies about achieving AGI solely through current LLM scaling.

"The systems don't know the difference. They can't tell you the difference. They don't ever say like, well, it seems to me that you know, everything like Wikipedia says that Harry Shear was born in in uh Los Angeles but I have the vibe as an LLM that it's London."

The Need for World Models and Intellectual Diversity [50:00]

  • Marcus emphasizes the critical need for "world models" within AI systems – internal representations of the external world that enable understanding of relationships, causality, and context.
  • Classical AI approaches recognized the importance of world models, but building them manually was labor-intensive. LLMs, by contrast, attempt to derive understanding solely from data, often failing to build true representations.
  • The hallucination about Harry Shearer's birthplace is attributed to the LLM's lack of a proper world model, which would let it simply look up the fact (a toy contrast appears after the quote below).
  • While LLMs can fake understanding, they fail to grasp fundamental rules, as evidenced by their ability to make illegal moves in chess despite being trained on vast amounts of chess data.
  • The field needs systems that can induce world models, understand causal principles, and represent entities, which requires foundational research beyond simply scaling existing LLM technology.

"Large language models try to do without that. You know, it's a lot of work to build a model of a particular thing, especially a complicated thing."

The Financial and Ethical Implications [45:38]

  • The current AI investment landscape, shaped by venture capital fee structures (e.g., a 2% cut of a trillion dollars raised), incentivizes funding large, expensive projects like LLM scaling even when their ultimate success is uncertain.
  • Marcus suggests that the investment community is beginning to "look down" – questioning the viability and return on investment of AI ventures, noting issues like circular financing and poor ROI.
  • Companies like OpenAI, with massive outstanding commitments and no profits, are particularly vulnerable in a commoditized market where competitors like Google have caught up or surpassed them.
  • The speculation driving GPU sales for LLMs rests on the unrealistic expectation of achieving AGI; the research Marcus cites suggests current AI systems can perform only about 2.5% of human jobs.
  • The future of AI requires a shift from speculative scaling to foundational research, focusing on more efficient, economical, and reliable approaches, alongside greater intellectual diversity in the field.

"The VCs got their 2% cut. The limited partners are going to lose a lot of money in the end."
