Memory Architecture of AI Agents Explained

Software Developer Diaries

Video Summary

The video delves into the crucial role of memory in AI agents, covering both theoretical frameworks and practical implementations. It categorizes memory into short-term and long-term types. Short-term memory functions within the immediate chat context as a simple array of messages passed into the LLM's context window; this approach is limited by context window size and cost. Long-term memory is further divided into semantic, episodic, and procedural memory. Semantic memory stores extracted facts as vectors in a vector database, enabling similarity search. Episodic memory captures specific events with details like what, when, who, and the outcome, often stored as key-value records in databases like MongoDB or Redis for efficient retrieval. Procedural memory focuses on extracting and storing workflows, skills, and step-by-step instructions, also using vector stores for retrieval. The video highlights sponsor ZenRows, a platform designed for robust web scraping and data extraction, useful for agents that need high-quality external data.

Notably, short-term memory is limited to the current chat: it has no access to other chats, and the context window grows with every message, which directly impacts both performance and cost.

Short Highlights

  • AI agents rely on memory, categorized into short-term and long-term.
  • Short-term memory uses a simple array of messages within the LLM's context window, limited by size and cost.
  • Long-term memory includes semantic (facts), episodic (events), and procedural (workflows) types.
  • Semantic memory converts data into vectors for similarity search in vector databases like Chroma.
  • Episodic memory stores event details (what, when, who, outcome) as key-value records in stores like MongoDB or Redis.
  • Procedural memory extracts and stores skills and step-by-step instructions, often using vector stores.
  • ZenRows is presented as a data extraction platform supporting AI agents with clean, structured data.

Key Details

Understanding Agent Memory [00:00]

  • Memory is an integral component of modern AI agents, with various types and implementations discussed.
  • The video aims to explain how these memories work theoretically and practically.
  • AI agents leverage memory to retain information from past interactions, making conversations more dynamic and less "dry."

"And if you're curious about learning how exactly they work under the hood and of course some theory as always in my channel, then this is the right video for you because we're going to talk about different types of memories and how they are implemented both in theory and in practice."

Short-Term Memory: Context Window [01:04]

  • Short-term memory stores messages in a simple array.
  • When a user asks a question, the current prompt is combined with the system prompt and all past messages, and the whole bundle is passed into the LLM's context window.
  • This allows the LLM to "see" past conversations and answer questions based on that context.
  • Drawbacks: memory is limited to the current chat, and the context window grows linearly with chat length, leading to slower responses and higher token costs.

"So, the short-term memory is basically the following. So, if the user says, 'My name is Alice and I'm learning about AI.' What we're going to do is we're going to store every message in a simple array of strings, right?"

Practical Implementation of Short-Term Memory [03:51]

  • Libraries like LangGraph are used for building agents with memory.
  • Messages are appended to a messages array within a state object.
  • The agent retrieves messages from the array to form the current query.
  • The output demonstrates the agent remembering the user's name from previous messages.

"So, we're going to be using Langraph, which is a library, kind of a standard nowadays, I would say, to be for building agents."

Sponsor Spotlight: ZenRows [05:31]

  • ZenRows is presented as an integration-ready, business-ready data extraction platform.
  • It handles common web automation challenges like anti-bot bypassing, firewalls, and CAPTCHA challenges.
  • Features include user agent and premium proxy rotation, delivered automatically.
  • The Universal Scraper API provides a single endpoint for clean, structured outputs (JSON, Markdown) without dealing with browser quirks or website defenses.
  • ZenRows scales with workload and has excellent documentation, allowing for quick integration.
  • It is recommended for agents that rely on high-quality external memory.

"Zenrose is not just a web scraper in my opinion. It's more than that. It's an integration ready, business ready data extraction platform designed to handle everything that normally breaks when you do web automation."

Semantic Memory: Storing Knowledge [07:37]

  • Semantic memory focuses on remembering important and domain-specific knowledge.
  • When a user provides information (e.g., "I work as a software engineer at TechCorp..."), facts are extracted using an LLM.
  • Criteria for extraction include named entities, preferences, skills, goals, and dates.
  • Extracted facts are turned into vectors and stored in a vector database.
  • To retrieve information, a similarity search is performed on the prompt's vector against the database.
  • The retrieved relevant facts are then added to the system prompt for the LLM.
  • This vector database can be shared across different chats for the same user.

"So the semantic memory is all about remembering the important knowledge and some domain knowledge that you're going to be using within your conversations."

Implementing Semantic Memory with Vector Databases [10:06]

  • The code demonstrates using embeddings and a vector database (Chroma from LangChain).
  • A prompt is used to extract facts, which are then stored in the vector store.
  • For retrieval, a similarity search is performed and the most relevant stored facts are fetched.
  • These fetched facts are added to the context window, allowing the LLM to answer questions based on stored knowledge.

"And the vector database, it's going to be chroma from lang chain to make it easy."

Episodic Memory: Remembering Events [11:40]

  • Episodic memory stores specific, time-bound events rather than general knowledge.
  • The LLM is prompted to extract a single main "episode" from a message, including what happened, when, who was involved, and the outcome.
  • This data is often stored as structured key-value records, suitable for stores like Redis or document databases like MongoDB.
  • Retrieval can be done via similarity search or filter search (e.g., by timestamp).
  • The retrieved episodic data is then supplied as context for the LLM.

"Episodic memory basically means that now instead of saving the domain knowledge, we're simply going to pass a different types of a query to our LLM before storing it in the database."

Implementing Episodic Memory [13:49]

  • The demonstration shows extracting and storing event descriptions as JSON.
  • Keywords like "happened," "attended," and "met" signal an episode.
  • Stored episodes include timestamps, allowing the agent to recall recent experiences.
  • The LLM can then use this historical context to answer questions about past events.

"So, we're going to look at the keywords happened, attended, met, experienced, meeting. This gives us a clue that it's an episode in the memory."

Procedural Memory: Skills and Workflows [15:18]

  • Procedural memory stores workflows, skills, or procedures.
  • The LLM is prompted to extract step-by-step instructions, usage contexts, and benefits/outcomes.
  • This extracted information is stored, often in a vector store.
  • When the agent needs to perform a task or answer a question about techniques, it retrieves these known procedures.
  • The retrieved procedures are integrated into the context window to guide the LLM's response.

"Now if we look at the procedural method, the only difference we're going to see is in the prompt. So we're going to say extract workflows skills or procedures from the last user message."

Concluding Thoughts on Memory [16:58]

  • The video concludes by emphasizing the importance of these memory types for AI agents.
  • It encourages viewers to subscribe and engage with further questions or remarks.

"And guys, if you like this video, as always, subscribe and let me know down in the comments if you have any questions or if you have any remarks."
