AI In Healthcare Series: Leveraging GPT-5, Cosmos, and Predictive Models for Better Outcomes

Stanford Online

347 views • 4 months ago

Video Summary

The latest AI models, like GPT-5, are showing incremental improvements and saturating benchmarks, but their true impact is debated, with some finding them revolutionary and others seeing less dramatic leaps. The discussion highlights the importance of model selection, specialized tools, and the challenges of moving beyond benchmarks to practical, real-world applications in healthcare. A significant focus is the gap between AI capabilities and their integration into effective end-user applications: building reliable software takes more than an LLM and also requires domain expertise, robust development practices, and user-centered design.

A key area of discussion revolves around healthcare's unique challenges, including its distinct language and the need for specialized models. Research presented showcases advancements in creating models trained on vast datasets of patient encounters (8 billion encounters) to understand and predict health trajectories over multiple time scales, from immediate needs to long-term chronic disease management. The conversation also touches on the potential for AI to exacerbate "deskilling" in clinical practice and the crucial need to design human-AI interactions that enhance, rather than diminish, human capabilities, ensuring that the combined intelligence is greater than the sum of its parts.

Finally, the future of healthcare is envisioned as a landscape of unequivocally better care, with more personalized and evidence-based treatments at the point of care, facilitated by new tools. This includes improving patient-physician interactions through integrated software that prepares for visits, offers real-time insights in exam rooms, and addresses systemic challenges like physician shortages and access to care, particularly through patient-facing applications like MyChart. The overarching theme is the continuous effort to bridge the gap between AI potential and tangible healthcare improvements, focusing on thoughtful integration and a "right tool for the right job" approach.

Short Highlights

Recent AI models show incremental progress, with debates on their revolutionary impact versus hype.
Bridging the gap between AI capabilities and practical, real-world applications in healthcare is a major challenge, requiring domain expertise and robust software development.
Specialized AI models trained on extensive patient data are being developed to understand healthcare's unique language and predict health events across various timescales.
The concept of "deskilling" in clinical practice due to AI reliance is a concern, emphasizing the need for human-AI collaboration that enhances rather than replaces human expertise.
The future of healthcare envisions improved patient-physician interactions and care quality through AI integration, addressing systemic challenges like access and physician shortages.

Key Details

The Current State of AI Models [01:12]

GPT-5 has launched with mixed responses, not necessarily meeting the hype for some.
AI models are saturating typical benchmarks, leading to questions about their real-world utility.
The speaker has found success using the "Thinking" and "Pro" models for higher-quality output, though with increased latency.
There's a broader theme of model selection and the emergence of more specialized models beyond just larger ones.
Some found GPT-4.5 particularly good for writing and natural interaction.

The current generation of AI models, while showing improvement, is prompting a re-evaluation of their practical impact beyond benchmarks. The discussion points to a nuanced view, where incremental gains are observed, but the "Death Star" leap some expected hasn't fully materialized, leading to a focus on selecting the right model for specific tasks.

"I was kind of in more of the rest of the internet of hey it is it is better. I think it is a very interesting consumer choice to do some of the routing behind the scenes to kind of expose people to different models. I don't know if it was the Death Star giant giant leap forward that some people were predicting."

Challenges with Benchmarks and Evaluation [06:31]

Benchmarks, especially those derived from the education system (like USMLE), may not fully capture the progression of a clinician's journey.
Multiple-choice questions as benchmarks have serious flaws, as demonstrated by a study where changing distractors impacted performance.
LLMs are non-deterministic, introducing challenges in consistent performance evaluation.
Academic research often doesn't replicate the rigorous, checks-and-balances approach used in building consistent software in healthcare.
There's a need for better ways to evaluate AI performance in real-world clinical contexts, beyond academic benchmarks.

The reliance on traditional, education-based benchmarks for evaluating AI in healthcare is being questioned. The non-deterministic nature of LLMs and the lack of real-world software development rigor in academic studies highlight the need for more sophisticated evaluation methods that reflect clinical workflows and patient care.
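
The two evaluation problems named above (sensitivity to distractors and non-deterministic answers) can be measured with a small harness. The following is a minimal sketch, not the method used in the study mentioned in the talk: `toy_model` is a hypothetical stand-in for a sampled LLM call, and the option labels, the 40% "tempted" rate, and the function names are all assumptions made for illustration.

```python
import random
from collections import Counter

def majority_agreement(answers):
    """Fraction of runs that agree with the most common answer."""
    counts = Counter(answers)
    return counts.most_common(1)[0][1] / len(answers)

def evaluate(model, question, correct, distractor_sets, runs=20, seed=0):
    """Ask the same question several times per distractor set.

    Returns one (accuracy, agreement) pair per distractor set, so a drop
    between sets points at the distractors rather than the question.
    """
    rng = random.Random(seed)
    results = []
    for distractors in distractor_sets:
        answers = []
        for _ in range(runs):
            options = [correct] + list(distractors)
            rng.shuffle(options)  # option order should not matter
            answers.append(model(question, options, rng))
        accuracy = sum(a == correct for a in answers) / runs
        results.append((accuracy, majority_agreement(answers)))
    return results

def toy_model(question, options, rng):
    """Hypothetical stand-in for a sampled LLM call (not a real API).

    It answers correctly unless a "tempting" distractor is present, in
    which case it picks that distractor about 40% of the time.
    """
    tempting = [o for o in options if o.startswith("tempting:")]
    if tempting and rng.random() < 0.4:
        return tempting[0]
    return next(o for o in options if o.startswith("correct:"))

if __name__ == "__main__":
    sets = [("plain: B", "plain: C"), ("tempting: B", "plain: C")]
    scores = evaluate(toy_model, "Which option?", "correct: A", sets)
    for distractors, (acc, agree) in zip(sets, scores):
        print(distractors, f"accuracy={acc:.2f}", f"agreement={agree:.2f}")
```

Reporting agreement alongside accuracy separates two failure modes: a model that is consistently wrong on one distractor set, and a model that flips between answers from run to run.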

"It's interesting that we continue to use benchmarks that are founded in kind of the education system, if you will, from a medical perspective, and we haven't quite figured out how do we move into those later stages of a clinician's journey as they become better and better in the role and move out of school."

Integrating AI into Healthcare Software and Workflows [09:03]

While AI capabilities are advancing, translating them into real-world value for end-users (patients and physicians) remains a significant challenge.
Building reliable, high-quality healthcare software involves more than just LLMs; it requires domain expertise, tight developer-user interactions, and know-how in stitching systems together.
Developers need to spend time "at the elbow" of physicians to understand their problems, whether diagnostic or administrative.
Building the right pipeline, backend structures, and reinforcement loops for AI-generated content (summaries, notes, queries) is crucial.
New user experience patterns and design workflows are needed, especially for models that require more thinking time.

The primary challenge in AI for healthcare lies not in the capabilities of the models themselves, but in the complex process of integrating them into functional, reliable software. This involves understanding user needs, building robust technical infrastructure, and designing intuitive workflows that deliver tangible value.

"The question that you're raising is like, okay, now that we know there's signal, how do we actually craft that into something that really delivers something for our end users, whether it's a patient or physician? And that gap to me is still a bit of a chasm and does require the hard work of either having the domain expertise, the tight developer-user interactions, and frankly just the know-how of how to stitch these things together."

The Dichotomy of LLM Hype vs. Software Reliability [12:51]

There's a dichotomy between the excitement around new LLM capabilities and the reality of building reliable software.
Introducing too much LLM into software can introduce noise and reduce reliability.
Building good software requires context, data, audit controls, monitoring, governance, and evaluation, which are often overlooked in the LLM hype.
Users often confuse the AI model with the end-user application; applications require consistent guardrails and workflow integration.
The goal is to have AI systems that are better together with humans, not just standalone AI outputs.

The excitement surrounding new LLM capabilities often overshadows the fundamental principles of good software development. Creating reliable and high-quality systems in healthcare requires a comprehensive approach that integrates AI thoughtfully within a framework of established engineering practices, focusing on safety, accuracy, and user benefit.
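
One way to read the points above about minimizing LLM noise while keeping guardrails and audit controls is a deterministic-first pipeline: rules handle the common case, a model is called only as a fallback, and every result passes the same validation and is logged. This is a minimal sketch under stated assumptions, not a design described in the talk. The dose-string task, the `ALLOWED_UNITS` list, and the `llm_fallback` callable are hypothetical, and no real LLM API is used.

```python
import re

ALLOWED_UNITS = {"mg", "mcg", "mL"}  # hypothetical allow-list for this sketch

def parse_dose_rule_based(text):
    """Deterministic path: handles the common, well-formed case."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(mg|mcg|mL)\s*", text)
    if match:
        return {"value": float(match.group(1)), "unit": match.group(2)}
    return None

def validate(dose):
    """Guardrail applied to every result, whichever path produced it."""
    return (
        isinstance(dose, dict)
        and isinstance(dose.get("value"), (int, float))
        and dose["value"] > 0
        and dose.get("unit") in ALLOWED_UNITS
    )

def parse_dose(text, llm_fallback, audit_log):
    """Try rules first, call the model only if they fail, then validate.

    llm_fallback is any callable taking the text and returning a dict.
    Each call appends one record to audit_log for later monitoring.
    """
    dose = parse_dose_rule_based(text)
    source = "rules"
    if dose is None:
        dose = llm_fallback(text)
        source = "llm"
    accepted = validate(dose)
    audit_log.append({"input": text, "source": source, "accepted": accepted})
    return dose if accepted else None

if __name__ == "__main__":
    log = []
    stub = lambda text: {"value": 500, "unit": "mg"}  # stand-in for a model
    print(parse_dose("500 mg", stub, log))
    print(parse_dose("five hundred milligrams", stub, log))
    print(log)
```

The audit log records which path produced each answer, so the share of requests that needed the model, and how often its output was rejected, can be monitored over time.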

"If you want something that is reliable, consistent, you know, high quality, candidly, you want as little LLM as possible because that's going to introduce noise in what you're working. And so we have this kind of dichotomy of worlds."

Specialized Healthcare Language Models and Patient Stories [16:30]

Healthcare has a unique language that differs from general natural language, requiring specialized models.
A new approach involves re-evaluating problems like tokenization and considering the temporal nature of medical events.
Transformer architectures, successful in NLP, are being applied to patient data to build "stories" based on medical events, interventions, and observations.
A large model trained on approximately 8 billion encounters demonstrates potential for predicting future health trajectories.
The Cosmos community efforts aim to build larger de-identified datasets (two to three times the size) for training these specialized models.

Recognizing that healthcare's language and context are distinct from general natural language, researchers are developing specialized AI models. These models leverage transformer architectures to interpret complex patient data, including temporal elements, to build comprehensive patient narratives and enable more accurate health predictions, with a focus on utilizing vast de-identified datasets.
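
The idea of treating a patient record as a "story" of events on different time scales can be sketched as a tokenizer that interleaves event tokens with time-gap tokens. This is an illustrative assumption about how such a sequence might be built, not the tokenization used by the model discussed in the talk. The bucket boundaries, token names, and placeholder codes (`DX_A`, `LAB_B`, `RX_C`) are made up for the example.

```python
from datetime import date

# Hypothetical time-gap buckets: (exclusive upper bound in days, token).
GAP_BUCKETS = [
    (1, "<gap:same-day>"),
    (7, "<gap:days>"),
    (30, "<gap:weeks>"),
    (365, "<gap:months>"),
]

def gap_token(days):
    """Map an elapsed number of days to a coarse time-gap token."""
    for upper, token in GAP_BUCKETS:
        if days < upper:
            return token
    return "<gap:years>"

def tokenize_patient(events):
    """Turn (date, kind, code) events into one chronological token list.

    A gap token is inserted between consecutive events so a sequence
    model can see how much time passed, not just the order of events.
    """
    tokens = ["<patient>"]
    previous = None
    for when, kind, code in sorted(events):
        if previous is not None:
            tokens.append(gap_token((when - previous).days))
        tokens.append(f"{kind}:{code}")
        previous = when
    tokens.append("</patient>")
    return tokens

if __name__ == "__main__":
    # Placeholder codes, not real clinical vocabularies.
    events = [
        (date(2024, 4, 10), "rx", "RX_C"),
        (date(2024, 1, 5), "dx", "DX_A"),
        (date(2024, 1, 5), "lab", "LAB_B"),
    ]
    print(tokenize_patient(events))
```

Bucketing the gaps keeps the vocabulary small while still distinguishing events within a single encounter from those separated by months of chronic-disease follow-up.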

"Let's reconsider the problem of the language the tokenization of the actually the language of healthcare. Let's think about the fact that things happen on different time scales and and consider that right in the events in someone's progression through a health system."

The Risk of Deskilling and Human-AI Interaction [22:21]

AI tools, while beneficial, can lead to "deskilling" if users become overly reliant on them.
Examples include physicians potentially losing proficiency in tasks like colon polyp detection or physical exam maneuvers if AI takes over.
This raises the question of how much deskilling is acceptable in healthcare and where human expertise must be maintained.
The ultimate goal is to achieve a "better together" state between humans and AI, where their combined intelligence surpasses individual capabilities.
This necessitates solving complex human-AI interaction challenges to optimize the division of labor and ensure synergistic outcomes.

The potential for AI to lead to "deskilling" among healthcare professionals is a significant concern. It highlights the critical need to design AI systems not as replacements for human expertise, but as collaborators that augment human capabilities, ensuring that the synergy between human and artificial intelligence leads to superior outcomes.

"The only question the patient actually came in to ask you will be asked once your hand touches the door to leave the next appointment."

The Future of Healthcare: Enhanced Care and Patient Experience [32:52]

The future of healthcare promises unequivocally better quality of care, with more personalized, real-world evidence available at the point of care.
New tools will be available to physicians, enhancing their diagnostic and treatment capabilities.
AI integration aims to improve patient-physician interactions by preparing for visits, offering real-time insights during exams, and connecting patients with care teams.
Addressing systemic healthcare challenges like physician shortages and access issues is a key focus for AI development.
The progression towards advanced AI in healthcare will involve incremental steps that offer value at each stage, improving efficiency and patient experience.

The vision for the future of healthcare is one of significantly enhanced quality of care, driven by AI's ability to provide personalized insights, improve patient-physician communication, and address critical systemic issues. This evolution will be marked by continuous, incremental advancements that progressively integrate AI into workflows, ultimately benefiting both patients and healthcare providers.

"I think, and I think this may be even more important bluntly, the opportunity to hand off from that experience to the care team, to engage a nurse, to engage a provider, and to escalate that conversation to somebody else that can intervene if necessary in that context and be able to do so seamlessly and go back and forth."