RAG is Dead: Context Engineering Reigns in AI Systems


The rapidly evolving landscape of artificial intelligence is witnessing a significant paradigm shift, as heralded by Jeff Huber, CEO of Chroma, in a recent Latent.Space interview titled “RAG is Dead, Context Engineering is King.” This bold declaration signals a move beyond simple retrieval-augmented generation (RAG) towards a more sophisticated approach to managing the information that fuels AI systems. The discussion highlights what truly matters in vector databases in 2025, the unique demands of modern AI search, and strategies for building resilient systems that adapt as their contextual understanding grows.

For a general audience, Retrieval Augmented Generation (RAG) emerged as a crucial technique to enhance large language models (LLMs). Traditional LLMs, trained on vast but static datasets, often struggle with providing up-to-date, domain-specific, or accurate information, sometimes even “hallucinating” facts. RAG addressed this by enabling LLMs to first retrieve relevant information from external knowledge bases—like documents, databases, or the web—and then use this fresh data to augment their responses. This process aimed to reduce inaccuracies and the need for constant model retraining, allowing LLMs to cite sources and provide more grounded answers.
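The retrieve-then-augment loop described above can be sketched in a few lines. This is a toy illustration, not any specific framework's API: the knowledge base, the keyword-overlap scorer, and the prompt template are all hypothetical stand-ins (a real system would embed documents and call an LLM with the assembled prompt).

```python
# Toy illustration of RAG's retrieve-then-augment pattern.
# All names and data here are illustrative stand-ins.

KNOWLEDGE_BASE = {
    "returns-policy": "Items may be returned within 30 days with a receipt.",
    "shipping": "Standard shipping takes 5-7 business days.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    return sorted(
        KNOWLEDGE_BASE.values(),
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )[:k]

def build_prompt(query: str) -> str:
    """Augment the user's question with retrieved passages."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How many days do I have to return an item?")
```

The grounding comes from the prompt assembly: the model is instructed to answer from the retrieved passage rather than from its static training data.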

However, as AI applications mature from simple chatbots to complex, multi-turn agents, the limitations of RAG have become apparent. While RAG improved accuracy, it wasn’t a silver bullet against hallucinations, as LLMs could still misinterpret or combine retrieved information in misleading ways. Furthermore, RAG systems faced challenges in distinguishing subtle differences in large datasets, handling ambiguous meanings, and critically, operating within the fixed “context window” limitations of LLMs. Jeff Huber notes that simply stuffing more data into an LLM’s context window can actually degrade its reasoning capabilities and ability to find relevant information, a phenomenon Chroma’s research terms “Context Rot.”

This is where “Context Engineering” takes center stage. Unlike “prompt engineering,” which focuses on crafting the perfect singular instruction for an AI model, context engineering is the systematic discipline of designing and managing all information an AI model sees before it generates a response. It encompasses assembling system instructions, conversation history, user preferences, dynamically retrieved external documents, and even available tools. Huber argues that the success or failure of today’s advanced AI agents increasingly hinges on the quality of their context, making most agent failures “context failures” rather than inherent model shortcomings. The goal of context engineering is precise: find relevant information, remove what is irrelevant or redundant, and structure what remains so the LLM receives exactly what it needs, when it needs it. This involves a two-stage process of “Gathering” (maximizing recall by casting a wide net for all possibly relevant information) and “Gleaning” (maximizing precision by re-ranking and removing irrelevant data).
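The two-stage gather/glean pattern can be sketched as follows. This is a minimal sketch under stated assumptions: the permissive keyword filter and the overlap-ratio re-scorer are illustrative stand-ins for the cheap retriever and the re-ranker a production system would use.

```python
# Stage 1 (gather) maximizes recall with a cheap, permissive filter;
# stage 2 (glean) maximizes precision by re-scoring and keeping only
# the best matches. Both scoring functions are illustrative.

def gather(query: str, corpus: list[str]) -> list[str]:
    """Recall stage: keep any document sharing at least one query term."""
    q_words = set(query.lower().split())
    return [doc for doc in corpus if q_words & set(doc.lower().split())]

def glean(query: str, candidates: list[str], top_k: int = 2) -> list[str]:
    """Precision stage: re-rank candidates by overlap ratio, keep top_k."""
    q_words = set(query.lower().split())
    def score(doc: str) -> float:
        d_words = set(doc.lower().split())
        return len(q_words & d_words) / len(d_words)
    return sorted(candidates, key=score, reverse=True)[:top_k]

corpus = [
    "the cat sat on the mat",
    "cats are popular pets",
    "quarterly sales figures rose",
]
context = glean("cat on a mat", gather("cat on a mat", corpus))
```

The design point is the division of labor: the gather stage can afford to be sloppy because the glean stage will discard the noise before anything reaches the model's context window.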

At the heart of modern AI search and context engineering are vector databases. These specialized databases store and index numerical representations, or “embeddings,” of unstructured data like text, images, and audio. Unlike traditional databases that rely on exact matches, vector databases enable highly efficient “similarity searches,” allowing AI systems to understand meaning and context. Chroma, co-founded by Jeff Huber, is a leading open-source vector database built specifically for AI applications. Huber emphasizes Chroma’s commitment to simplifying the developer experience and providing scalable, natively distributed solutions that overcome the “operational hell” often associated with scaling single-node vector databases.
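To make “similarity search” concrete, here is a toy in-memory version of what a vector database like Chroma performs at scale. The hand-written 3-dimensional embeddings are illustrative; a real system would produce high-dimensional vectors from a trained embedding model and use approximate-nearest-neighbor indexes rather than a brute-force scan.

```python
import math

# A toy "vector store": embeddings are hand-written stand-ins for
# model-produced vectors; similarity is plain cosine similarity.

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: angle-based closeness of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

store = {
    "dog article":   [0.9, 0.1, 0.0],
    "puppy article": [0.8, 0.2, 0.1],
    "tax article":   [0.0, 0.1, 0.9],
}

def query(embedding: list[float], n_results: int = 2) -> list[str]:
    """Return the ids of the n stored vectors most similar to the query."""
    ranked = sorted(store, key=lambda k: cosine(embedding, store[k]), reverse=True)
    return ranked[:n_results]

nearest = query([0.85, 0.15, 0.05])
```

Because similarity is computed over embeddings rather than exact tokens, a query about “dogs” retrieves the “puppy” document too, which is the semantic matching that keyword search alone cannot provide.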

The shift from “RAG is dead” to “Context Engineering is King” signifies a maturation in AI development. It acknowledges that simply retrieving data isn’t enough; the intelligence lies in how that data is curated, structured, and presented to the AI. Modern search for AI is no longer just about finding keywords but about understanding nuanced intent and context, a capability powered by the sophisticated interplay of vector databases and context engineering principles. As AI systems become more integral to complex workflows, the ability to ship systems that don’t “rot” as context grows—by respecting context window limits, employing hybrid retrieval, and rigorous re-ranking—will define the next generation of robust, reliable AI.
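One common way to implement the hybrid retrieval mentioned above is reciprocal rank fusion (RRF), which merges a keyword ranking and a vector ranking into a single list. The document ids and the two input rankings below are illustrative; RRF itself is a standard fusion technique, not something specific to Chroma.

```python
# Reciprocal rank fusion: each document's fused score is the sum of
# 1 / (k + rank) across every ranking it appears in. Documents that
# rank well in both lexical and vector search rise to the top.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists into one list ordered by RRF score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_b", "doc_c"]   # lexical (BM25-style) ranking
vector_hits  = ["doc_b", "doc_d", "doc_a"]   # embedding-similarity ranking

fused = rrf([keyword_hits, vector_hits])
```

Here `doc_b` wins because it scores well in both rankings, while documents found by only one retriever fall lower; a re-ranking stage would then trim this fused list to fit the context window budget.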