
Why 99% of AI Products Fail: A CTO's Hard-Won Lessons
InfoQ
Video Summary
The speaker, Phil, discusses his experience building products with generative AI over the past three years, offering insights into architectural approaches and common pitfalls. He emphasizes a software engineering mindset, advocating for iterative development and treating AI systems as robust engineering projects rather than purely data science endeavors. Phil introduces the concepts of "workflows" (predefined steps) and "agents" (semi-autonomous entities capable of decision-making and collaboration), explaining their roles in AI systems.
Phil delves into practical advice for building resilient AI products. He cautions against direct point-to-point agent collaboration, recommending semantic event-driven architectures instead. He also highlights the challenges of agentic memory, suggesting event sourcing or graph databases as viable solutions, and warns against the complexities of reinventing traditional web services. For deployment, he advocates for durable workflow systems like Temporal over traditional microservices due to the inherent statefulness and non-deterministic nature of AI.
Ultimately, Phil stresses that despite the hype and novel terminology surrounding AI, many of the underlying architectural principles remain rooted in established software engineering practices. He encourages the audience to apply their existing engineering knowledge to AI development, drawing parallels to past technological shifts like the adoption of cloud-native architectures and NoSQL databases.
Short Highlights
- Speaker's Background and Bias: Phil, with 30 years of software engineering experience, brings a bias towards iterative, agile development and getting things done, applying this to AI product building.
- Problem Statement: Many AI products, particularly productivity tools, fall short because of how they are built, often falling into "Twitter-driven development" or slow, project-focused data science approaches.
- Core AI Concepts: The two main building blocks in generative AI systems are workflows (predefined steps for a goal) and agents (semi-autonomous software with memory and decision-making capabilities).
- Architectural Recommendations: Avoid point-to-point agent collaboration; use semantic events for inter-agent communication. For state management and memory, consider event sourcing or graph databases. For deployment and resilience, favor durable workflow systems like Temporal over traditional microservices.
- Key Takeaway: Despite the novelty of AI, successful product development relies on applying sound software engineering principles, finding parallels with established practices, and avoiding overly complex or hyped solutions.
Key Details
Introduction and Speaker's Perspective [0:01]
- The speaker, Phil, has spent the last three years working deeply in product building with generative AI.
- He aims to set the scene for a diverse audience, focusing on essential architecture and AI concepts, and providing pointers for further learning.
- He acknowledges his own biases, stemming from 30 years of building software with a focus on microservices and distributed systems, and a preference for iterative, agile development.
- He emphasizes that his opinions might differ if he had a data science or AI research background.
Bias beware, because that's the kind of bias that I have.
The Genesis of Outropy: Automating Management Tasks [3:53]
- The initial idea was to create a "VS Code for everything a manager does," automating tasks that were previously done manually with tools like Google Sheets.
- This concept evolved with the advent of generative AI, leading to the development of Outropy.
- The first public beta was released after about six months, initially using GPT-3.5, with later considerations for GPT-4 and Llama models.
- Outropy started as a Slack chatbot and evolved into a Chrome extension.
So my initial idea back in 2021 was: okay, can we automate this? Can I create basically the VS Code for everything that a manager does, or everything that an engineer does?
Challenges and Market Landscape [5:01]
- By 2022-2024, the market was crowded with similar AI tools from large companies like Microsoft and Salesforce.
- Despite conceptual demos and announcements, many of these large company products were perceived as struggling to deliver quality.
- Outropy, though a small startup, achieved several thousand active users early on, while others were still in demo phases.
And that's the kind of stuff that we had. The first screenshot here was May 2023: Salesforce says, "We're releasing this thing, Slack GPT, it's going to be awesome."
The Failure and Lessons Learned from Outropy [7:38]
- The product "failed miserably," but the reasons were insightful.
- Users were less interested in the tool itself than in reverse-engineering how such a small team had built such an advanced system.
- This highlighted a gap between the capabilities of small teams and large incumbents in building effective AI products.
But one of the things we saw as we were failing was really interesting: the users were not really that interested in the tool itself.
Three Approaches to Building AI Products [8:56]
- Twitter-driven development: Building for future, unreleased models, leading to flashy demos but lack of real-world delivery.
- Data science project approach: Treating projects individually with less product thinking, leading to slow, incremental development and often subpar results (e.g., a spam classifier with 50% accuracy after a year).
- Engineering project approach: Treating AI development with the same rigor as traditional software engineering, emphasizing iterative development and robust architecture.
So the way I see it, there are basically three ways that we build AI today, and you might see this in your company or across the ecosystem.
Workflows and Agents in AI Systems [13:54]
- Workflows: Predefined sets of steps to achieve an AI goal (formerly "inference pipelines"). They are static.
- Agents: Systems where LLMs dynamically direct tool usage, possessing semi-autonomy, decision-making capabilities, and the ability to collaborate with other agents or tools. They are dynamic.
A workflow is basically a predefined set of steps to achieve a goal with AI. ... And agents, it's interesting, because nobody has any idea what the hell an agent is.
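A minimal sketch of the distinction, assuming a hypothetical `call_llm` helper standing in for any model client (nothing here is from the talk itself): the workflow's control flow is fixed at authoring time, while the agent's control flow is chosen by the model at run time.

```python
# Hypothetical sketch, not from the talk: `call_llm` stands in for any
# chat-completion client.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM provider."""
    raise NotImplementedError

# Workflow: the steps are fixed in code; the model only fills in content.
def summarize_standup(messages: list[str]) -> str:
    draft = call_llm("Summarize these updates:\n" + "\n".join(messages))
    return call_llm("Rewrite as three bullet points:\n" + draft)

# Agent: the model decides, step by step, which tool to use next.
TOOLS = {
    "search_slack": lambda query: f"messages matching {query!r}",
    "read_calendar": lambda day: f"events on {day}",
}

def run_agent(goal: str, max_steps: int = 5) -> str:
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        decision = call_llm(
            "\n".join(history)
            + f"\nTools: {sorted(TOOLS)}. Reply 'tool: argument' or 'DONE: answer'."
        )
        if decision.startswith("DONE:"):
            return decision.removeprefix("DONE:").strip()
        tool, _, argument = decision.partition(":")
        history.append(f"{tool} -> {TOOLS[tool.strip()](argument.strip())}")
    return "step budget exhausted"
```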
Rethinking RAG and Workflow Architecture [15:22]
- Many vendors sell RAG (Retrieval Augmented Generation) solutions that involve fetching data, sending it to a model, and using vector databases.
- The speaker found that this direct approach often fails because LLMs are not as intelligent as expected.
- Successful workflows typically involve more steps to add structure, context, and break down problems into discrete components, similar to building domain models.
But basically a lot of vendors will sell you this: "We're going to get you data from all your data sources..." And what we've learned is that this almost never works.
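As an illustration of "more steps, more structure," here is a hedged sketch contrasting a one-shot RAG call with a workflow that first filters and extracts discrete domain entities; `call_llm` and `retrieve` are hypothetical stand-ins for a model client and a vector-database search.

```python
# Illustrative only: `call_llm` and `retrieve` are stand-ins.

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def retrieve(query: str) -> list[str]:
    raise NotImplementedError

def naive_rag(question: str) -> str:
    # One giant prompt: the model must filter, structure, and answer at once.
    return call_llm("\n".join(retrieve(question)) + "\n\nQuestion: " + question)

def structured_workflow(question: str) -> str:
    chunks = retrieve(question)
    # Step 1: filter; a cheap yes/no judgment per chunk.
    relevant = [
        c for c in chunks
        if call_llm(f"Is this relevant to {question!r}? yes/no:\n{c}").strip() == "yes"
    ]
    # Step 2: extract discrete entities, building a small domain model.
    facts = [call_llm("Extract 'owner | task' pairs:\n" + c) for c in relevant]
    # Step 3: answer from the structured facts, not the raw text.
    return call_llm(f"Facts: {facts}\nAnswer the question: {question}")
```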
Agents vs. Microservices and Architectural Paradigms [19:37]
- Agents are a poor fit for traditional microservice architecture due to their statefulness, memory requirements, and non-deterministic behavior.
- Agents are characterized by memory, goal-orientation, dynamic behavior, and collaboration.
- The speaker analogizes agents to objects in object-oriented programming, suggesting this paradigm helps in building systems with them.
Agents are actually a very, very bad fit for a traditional microservices architecture.
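The object analogy can be made concrete with a small, hypothetical sketch: like an object, an agent encapsulates state (its memory) behind behavior, which is exactly the property that clashes with stateless microservices.

```python
# Hypothetical illustration of the agents-as-objects analogy.

class Agent:
    """Encapsulates state (memory) and behavior, like an object in OOP."""

    def __init__(self, name: str, goal: str):
        self.name = name
        self.goal = goal
        self.memory: list[str] = []  # long-lived state: why stateless services fit badly

    def observe(self, event: str) -> None:
        self.memory.append(event)

    def act(self) -> str:
        # A real agent would consult an LLM with its goal plus memory here.
        return f"{self.name}: pursuing {self.goal!r} with {len(self.memory)} observations"

planner = Agent("briefing-agent", "prepare the Monday briefing")
planner.observe("alice shipped the billing fix")
print(planner.act())
```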
Semantic Events and Communication for Agents [26:00]
- Avoid direct, point-to-point agent collaboration, which can lead to tightly coupled and complex systems akin to older web services.
- Instead, use semantic events published to a bus for communication between agents. This provides a more decoupled and manageable architecture.
- While Kafka is common for event buses, starting with simpler solutions like Redis or in-memory systems might be more practical.
There's one particular paradigm that I think works very well, which is using semantic events.
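As a sketch of what "start simpler than Kafka" might look like, here is a minimal semantic-event bus over Redis pub/sub (assumes a local Redis server and the `redis` Python package; event names and payloads are illustrative): agents publish facts about the domain rather than commands addressed to a specific peer.

```python
# Minimal sketch: semantic events over Redis pub/sub. Assumes a Redis
# server on localhost and `pip install redis`. Event names are made up.

import json
import redis

r = redis.Redis()

def publish_event(event_type: str, payload: dict) -> None:
    # Publish a fact about the domain, not a command to a specific agent.
    r.publish("events", json.dumps({"type": event_type, **payload}))

def consume_events() -> None:
    pubsub = r.pubsub()
    pubsub.subscribe("events")
    for message in pubsub.listen():
        if message["type"] != "message":
            continue
        event = json.loads(message["data"])
        # Any interested agent reacts; the publisher knows nothing about it.
        if event["type"] == "standup_summarized":
            print("briefing agent reacting to:", event["team"])

if __name__ == "__main__":
    publish_event("standup_summarized", {"team": "platform"})
```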
Agentic Memory and Event Sourcing [29:41]
- Agentic memory involves keeping track of what an agent knows about a user or entity.
- Simple approaches like long text documents in vector databases have limitations (e.g., ChatGPT's memory issues).
- Event sourcing is a robust solution where the stream of events is used to build representations of state, which can then be compacted or used to generate snapshots.
- A probabilistic graph database (like Neo4j) can be used for more nuanced memory representation, especially with natural language.
But basically you need to keep track of everything that an agent knows about somebody. ... There's a very interesting option within what we know in software engineering that actually works very well, and a lot of people are doing it, which is event sourcing.
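A minimal event-sourcing sketch of agentic memory (event shapes are illustrative, not from the talk): the append-only log is the source of truth, and "memory" is a fold over it that a snapshot would cache so only newer events need replaying.

```python
# Hedged sketch of event-sourced agent memory. In production the log
# would be a durable, append-only store rather than a Python list.

from dataclasses import dataclass, field

@dataclass
class MemoryEvent:
    user: str
    fact: str

@dataclass
class UserMemory:
    facts: list[str] = field(default_factory=list)

log: list[MemoryEvent] = []

def record(user: str, fact: str) -> None:
    log.append(MemoryEvent(user, fact))  # never mutate, only append

def replay(user: str) -> UserMemory:
    # Rebuild state from the stream; a snapshot would cache this result.
    memory = UserMemory()
    for event in log:
        if event.user == user:
            memory.facts.append(event.fact)
    return memory

record("alice", "prefers async updates")
record("alice", "leads the billing team")
print(replay("alice").facts)
```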
Decomposing Monolithic Pipelines [34:47]
- Traditional data science projects often create monolithic pipelines that mix unrelated concerns (e.g., fetching data from Slack and understanding Google Calendar).
- This leads to high coupling and makes reuse difficult.
- Breaking down workflows into smaller, well-defined pipelines with clear interfaces and semantic meaning is crucial for modularity and reusability, allowing agents to swap components dynamically.
Instead, what we did a lot was, again, applying good old software engineering and breaking workflows down into smaller ones that actually had some kind of published interface, some kind of actual semantic meaning, and semantic entities that they return.
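A hedged sketch of what such decomposition might look like: each pipeline exposes a narrow interface returning a semantic entity, so the composing workflow (or an agent) can swap sources freely. All names here are hypothetical.

```python
# Illustrative decomposition: small pipelines behind a published interface.

from dataclasses import dataclass
from typing import Protocol

@dataclass
class StandupUpdate:
    author: str
    summary: str

class UpdateSource(Protocol):
    def fetch_updates(self, team: str) -> list[StandupUpdate]: ...

class SlackUpdates:
    def fetch_updates(self, team: str) -> list[StandupUpdate]:
        return [StandupUpdate("alice", "shipped billing fix")]  # stub

class CalendarUpdates:
    def fetch_updates(self, team: str) -> list[StandupUpdate]:
        return [StandupUpdate("bob", "in planning all day")]  # stub

def daily_briefing(sources: list[UpdateSource], team: str) -> str:
    # The composing workflow depends only on the interface, so components
    # can be swapped at runtime without touching this code.
    updates = [u for s in sources for u in s.fetch_updates(team)]
    return "\n".join(f"- {u.author}: {u.summary}" for u in updates)

print(daily_briefing([SlackUpdates(), CalendarUpdates()], "platform"))
```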
Durable Workflows and Infrastructure [41:15]
- AI agentic systems break many rules of the traditional 12-factor app manifesto (e.g., statefulness, configuration management).
- Traditional microservices are a poor fit for deploying AI.
- Durable workflows (like Temporal) are recommended as they handle retries, timeouts, and checkpointing, separating orchestration from side effects, which is well-suited for AI systems.
So I stumbled upon something that's becoming more and more popular: durable workflows.
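For a flavor of the approach, here is a minimal sketch using Temporal's `temporalio` Python SDK (worker and client wiring omitted; the activity body is a stub): side effects live in activities, while the workflow holds orchestration state durably, with retries and timeouts declared rather than hand-rolled.

```python
# Minimal sketch with Temporal's Python SDK (`pip install temporalio`).
# Worker/client setup is omitted; the activity body is a stub.

from datetime import timedelta

from temporalio import activity, workflow
from temporalio.common import RetryPolicy

@activity.defn
async def call_model(prompt: str) -> str:
    # Side effects (LLM/API calls) belong in activities; a real version
    # would call a model provider here.
    return f"draft briefing for: {prompt}"

@workflow.defn
class BriefingWorkflow:
    @workflow.run
    async def run(self, prompt: str) -> str:
        # Temporal checkpoints progress, so the workflow survives process
        # restarts and retries/timeouts are declarative.
        return await workflow.execute_activity(
            call_model,
            prompt,
            start_to_close_timeout=timedelta(minutes=2),
            retry_policy=RetryPolicy(maximum_attempts=3),
        )
```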
The Need for Better AI Platforms [45:18]
- Building AI products currently requires overly complex architectures, even for smaller user bases (e.g., 10,000 users).
- Existing platforms are not fully developed for AI product building, necessitating reinvention of common patterns.
- The speaker hopes for better platforms in the future; otherwise, AI's potential in productivity will be limited.
What I'm trying to say is that we definitely need better platforms.
Conclusion: Applying Software Engineering Principles [46:49]
- There's a lot of hype and semantic confusion in the AI space, but the core concepts are not drastically different from established software engineering.
- The speaker advises using software engineering brains, finding parallels, and not treating AI as something entirely alien, similar to the adoption of NoSQL databases.
- The industry lacks technical content based on actual experience, unlike the cloud-native era, making it important to rely on fundamental engineering principles.
But what I've observed building products around this for going on three years now is that, outside the box, it's really not that different from the concepts we've always had in software engineering.