Stop Wasting Tokens: The Art of Context Engineering

Addy Osmani

Video Summary

Context engineering is the practice of strategically filling an AI's context window with the necessary information to optimize its performance. This goes beyond simple prompt refinement, addressing the underlying issues of mismanaged context that often lead to AI errors and inefficiencies. By understanding that AI models have a limited "context window," akin to limited RAM, developers can ensure that key details are not lost and the AI can effectively process and respond to tasks.

This approach involves equipping AI agents with not just prompts, but also relevant data, tools, and conversation history, much like onboarding a new team member with comprehensive guidelines and resources. Popular AI coding tools are increasingly offering visual aids and features to manage context, allowing for the inclusion of diverse information like code, images, and URLs. The goal is to provide AI with correct, relevant, and comprehensive information, ensuring that every token contributes meaningfully to the task at hand.

Effectively managing context involves several patterns: writing context outside the window for future use, selecting only relevant context when needed, compressing information to maintain efficiency, and isolating context to prevent interference. This careful curation of information is crucial to avoid common pitfalls like clutter, distraction, and conflict, which can lead to inaccurate or sluggish AI performance. The ultimate aim is to make AI agents more reliable and productive by ensuring they have the right information at the right time.

Short Highlights

  • Context engineering is the art and science of filling an AI's context window with the precise information needed for optimal performance.
  • AI models have a limited context window (measured in tokens) that dictates how much information they can process at once, impacting their ability to recall earlier details.
  • Effective context engineering involves providing AI agents with data, tools, and history, analogous to onboarding a new team member with necessary resources.
  • Key patterns for context management include writing, selecting, compressing, and isolating information to maintain efficiency and accuracy.
  • Balance is crucial: too little context leads to vague answers or hallucinations, while too much can cause distraction and confusion, with bad context being the most detrimental.

Key Details

What is Context Engineering? [00:06]

  • Context engineering is defined as the art and science of filling the context window with the precise information needed to guide an AI agent's performance.
  • It involves more than just clever prompts, encompassing layers of information to ensure optimal AI output.
  • The core idea is to equip an AI agent with everything it needs, including data, tools, and history, to effectively tackle a task.
  • This is compared to onboarding a new team member, where you provide specs, code bases, and guidelines, not just a vague instruction.

Context engineering, if you've heard about it, is the art and science of filling the context window with just the right information needed to guide an AI agent's performance.

Understanding Context and Tokens [00:27]

  • A token is one of the smaller pieces a computer breaks a sentence into in order to understand it.
  • A large language model's context window is the total number of tokens (including the prompt and generated output) that the model can process at a single time.
  • The context window represents the maximum amount of text an AI model can remember and use during a conversation to generate a response.
  • When the context window limit is reached, older information can fade as new information enters, affecting the model's ability to refer back to earlier parts of the conversation.

When a computer reads a sentence, it breaks it into smaller parts or tokens to make sense of it.
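
To get a feel for how text maps to tokens, you can count them locally before anything is sent to a model. A minimal sketch, assuming the `tiktoken` package and its `cl100k_base` encoding; exact counts vary by model and tokenizer.

```python
# Rough token counting before text ever reaches a model.
# Assumes the `tiktoken` package; counts differ across tokenizers.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

sentence = "When a computer reads a sentence, it breaks it into smaller parts or tokens."
tokens = enc.encode(sentence)

print(len(tokens))          # how many tokens the sentence occupies
print(tokens[:5])           # the first few token ids
print(enc.decode(tokens))   # decoding reproduces the original text
```

Every message, file, and tool result added to the context is counted this same way against the window limit.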

The Context Window as Limited RAM [01:10]

  • The context window can be mentally modeled as limited RAM for LLMs, with the model itself being the CPU.
  • This limited RAM needs to be engineered carefully and managed mindfully.

LLMs are like the new OS, the model is the CPU and the context window is limited RAM that you kind of need to engineer carefully.

Prompt Engineering vs. Context Engineering [01:27]

  • Prompt engineering is the practice of writing clear instructions for AI models to accomplish tasks, involving testing, evaluating, and iterating on prompts.
  • However, a smart prompt doesn't guarantee great outcomes, as issues often lie beneath the surface with mismanaged context.
  • Context engineering addresses the root causes of AI failures, such as limited windows causing the AI to forget details, unstructured content leading to confusion, competing information causing distraction, and overload overwhelming the model.

The real issues are often below the surface with mismanaged context.

Context Engineering in Practice: AI Coding Tools [02:50]

  • Popular AI coding tools like Cursor and Cline make it easier to manage context by including files, folders, problems, and images in the AI's context.
  • In agentic coding, good context can transform a generic LLM into a specialized developer.
  • These tools often provide visual ways to monitor context window usage, such as a progress bar.

So things like including files, folders, problems, images, URLs and more in your context.
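
Outside of a dedicated tool, the same idea can be approximated by hand: read the relevant files and fold them into the prompt as labeled sections. A minimal sketch with hypothetical file paths; tools like Cursor and Cline do this selection and formatting for you.

```python
# Hand-rolled version of "add these files to the context".
# The file paths are hypothetical placeholders.
from pathlib import Path

relevant_files = ["src/auth/session.py", "src/auth/tokens.py"]

context_parts = []
for name in relevant_files:
    text = Path(name).read_text(encoding="utf-8")
    context_parts.append(f"### File: {name}\n{text}")

prompt = (
    "You are helping debug a session-expiry bug.\n\n"
    + "\n\n".join(context_parts)
    + "\n\nExplain why sessions expire early and propose a fix."
)
```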

Visual Context in AI [03:23]

  • Visual context, such as including screenshots, is powerful for AI tasks like creating designs or fixing bugs.
  • Many tools allow attaching images, importing from design platforms, or using browser screenshots.
  • A picture can provide a wealth of information, enabling effective one-shot solutions.

Remember, a picture is worth a thousand words, and this can enable a lot of good one-shot solutions.

A Template for Context Engineering [04:04]

  • A template for context engineering includes several key dimensions, assembled into a single prompt in the sketch after this section:
    • Task Context: Defines the AI's role and purpose (e.g., technical writing assistant).
    • Tone: Guides how the AI communicates (e.g., professional but approachable).
    • Background Data, Documents, and Images: The reference material for the AI (e.g., project documentation, style guides).
    • Detailed Task Description and Rules: Specifies how the AI should perform tasks (e.g., keeping language clear, using examples, following coding standards).
    • Examples: Provides concrete patterns for the AI to mimic, showing desired output format and structure.
    • Conversation History: Helps the AI remember previous interactions, preventing repetition or loss of context mid-flow.
    • Immediate Task Description/Request: The specific, current ask the AI must respond to.
    • Thinking Step-by-Step: Encourages careful reasoning before answering, though less relevant with advanced thinking models.
    • Output Formatting: Sets how the response should look (e.g., proper markdown).
    • Pre-filled Responses: Provides a starting point or template for the AI to build upon.

Our job in context engineering is to load that mind of the LLM with the right stuff. Correct, relevant, and comprehensive information for the task.
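
To make the template concrete, its dimensions can be laid out as ordered sections of a single prompt. A minimal sketch; the section names mirror the list above, and the content strings are illustrative placeholders rather than anything prescribed in the video.

```python
# Assemble the context-engineering template as ordered prompt sections.
# The content strings are illustrative placeholders.
sections = {
    "Task context": "You are a technical writing assistant for our docs team.",
    "Tone": "Professional but approachable.",
    "Background data": "Project documentation and the team style guide (attached).",
    "Rules": "Keep language clear, use examples, follow our coding standards.",
    "Examples": "Example entry:\n## Install\nRun `npm install` before building.",
    "Conversation history": "Earlier we agreed to document the CLI before the API.",
    "Immediate request": "Draft the 'Getting started' page.",
    "Output formatting": "Respond in well-structured Markdown.",
}

prompt = "\n\n".join(f"# {name}\n{content}" for name, content in sections.items())
print(prompt)
```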

Pitfalls of Context Mismanagement [07:39]

  • Good context is hard to create, as what works in one situation may not work in another.
  • AI coding performance often dips when the context window exceeds 50% fullness, leading to errors or sluggishness.
  • Too Little Context: Can result in vague answers or outright hallucinations, as the AI fills gaps with nonsense.
  • Too Much Context: Can cause the AI to get distracted, fixate on irrelevant patterns, or become confused amid the noise.
  • Bad Context: Can poison the AI, making it trust wrong information over its accurate training, leading to buggy outputs or stuck loops.

Garbage in, garbage out.
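
One lightweight guard against these pitfalls is to track how full the window is and compact before quality dips, in line with the 50% observation above. A minimal sketch; `count_tokens` and `summarize` are assumed helpers rather than a particular library's API, and the threshold is a heuristic, not a hard rule.

```python
# Heuristic guard: compact the conversation before the window gets too full.
# `count_tokens` and `summarize` are assumed helpers, not a specific API.
CONTEXT_WINDOW = 128_000   # model-dependent
FULLNESS_LIMIT = 0.5       # heuristic: performance often dips past ~50% full

def maybe_compact(history, count_tokens, summarize):
    used = sum(count_tokens(msg) for msg in history)
    if used > FULLNESS_LIMIT * CONTEXT_WINDOW:
        # Keep the most recent messages verbatim, summarize everything older.
        older, recent = history[:-5], history[-5:]
        return [summarize("\n".join(older))] + recent
    return history
```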

Patterns for Managing Context Effectively [09:36]

  • Patterns for managing context can be grouped into four buckets, each shown in a small sketch after this section: write, select, compress, and isolate.
    • Write: Saving context outside the window for later use (e.g., project summaries, scratchpad for temporary notes).
    • Select: Pulling in the right context when needed, often using embeddings or semantic similarity, to reduce noise and fetch relevant information (e.g., via RAG).
    • Compress: Keeping the window efficient by summarizing or trimming, such as condensing chat history or tool outputs to prevent overload.
    • Isolate: Splitting context to avoid interference, such as partitioning state for different tasks or using multiple agents for subtasks.

Every token in your context should earn its place.
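
The four buckets fit in a few lines each. A minimal sketch with deliberately naive stand-ins; real agents use files, vector stores, summarization models, or sub-agents for the same moves.

```python
# The four context-management moves, in miniature.
# Everything here is illustrative, not a particular framework's API.
from pathlib import Path

# Write: save context outside the window for later use.
Path("scratchpad.md").write_text("Database decision: use SQLite for the prototype.\n")

# Select: pull saved notes back in only when they look relevant
# (a naive keyword check standing in for embeddings / semantic similarity).
question = "Which database did we pick for the prototype?"
notes = Path("scratchpad.md").read_text()
selected = notes if "database" in notes.lower() else ""

# Compress: trim a verbose tool output before it enters the window.
tool_output = "\n".join(f"test_{i} passed" for i in range(500))
compressed = tool_output[:200] + "\n... (truncated, 500 lines total)"

# Isolate: keep unrelated tasks in separate histories so they don't interfere.
histories = {"refactor-auth": [], "write-docs": []}
histories["write-docs"].append("Draft the README introduction.")
```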

Practical Ways to Fix Context Issues [12:04]

  • Six practical ways to fix context issues include:
    • Pruning irrelevant details.
    • Summarizing.
    • Offloading.
    • Quarantining.
    • Selectively adding context with RAG (see the retrieval sketch after this section).
    • Using tool loadouts to expose only the relevant tools.
  • Editors are evolving to automatically manage context, like summarizing conversations or condensing large files to fit context limits.
  • This automation helps keep developers productive without requiring manual management of context patterns.

The key takeaway here is really that every token in your context should earn its place.
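
Selective adding is the same idea at retrieval scale: score candidate snippets against the request and include only the best matches. A minimal keyword-overlap sketch; production setups use embeddings and a vector store for the similarity step, as noted above.

```python
# RAG-lite: include only the snippets that best match the request.
# Keyword overlap stands in for embedding similarity here.
def overlap(query: str, text: str) -> int:
    return len(set(query.lower().split()) & set(text.lower().split()))

snippets = [
    "Sessions are stored in Redis with a 30-minute TTL.",
    "The CI pipeline runs lint, tests, and a build step.",
    "Session cookies are marked HttpOnly and Secure.",
]

query = "Why do user sessions expire and are session cookies secure?"
top = sorted(snippets, key=lambda s: overlap(query, s), reverse=True)[:2]

prompt = "Context:\n" + "\n".join(top) + f"\n\nQuestion: {query}"
```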

Tips for AI Coders on Context Engineering [13:13]

  • Be precise to avoid vague answers and share specific details like code, error messages, and schemas.
  • Use pull request feedback for richer prompts.
  • Provide output examples and state constraints (e.g., specific libraries or stacks).
  • Treat the AI like a new hire, assuming it may not know everything.
  • Filter irrelevance, as "garbage in, garbage out."
  • Mindfulness around context engineering is important.

It's not the prompt, it's the context. Quality context determines if agents succeed or fail.
