Beyond RAG: Implementing Agent Search with LangGraph for Knowledge Operations

Traditional RAG systems excel at simple fact retrieval but hit limits with complex, ambiguous queries. For example, a question like “What product positioning differences between Nike and Puma could explain differing sales outcomes?” requires handling multiple entities and disambiguating terms. Conventional RAG cannot deliver adequate answers unless near-exact documents exist. This limitation appears frequently in real business scenarios.

Table of Contents

Limitations of Traditional RAG with Complex Queries
The Three-Stage Agent Search Approach (Decompose / Compose / Refine)
Why LangGraph and Architecture Design Choices
Parallelism, State Management, and Streaming Best Practices
Comparison Table: Traditional RAG vs Agent Search
Frequently Asked Questions
Summary and Next Steps

Limitations of Traditional RAG with Complex Queries

Traditional RAG combines vector search and generation in a simple pipeline, and it works well when near-matching documents are available. When a query involves multiple company names or ambiguous terms, however, a single search often fails to gather all the necessary context. The Onyx case study on the official LangChain blog highlights this exact limitation. The practical takeaway is to set aside the assumption that “search will always surface the answer.”

The Three-Stage Agent Search Approach (Decompose / Compose / Refine)

Agent Search builds a three-stage process with LangGraph. Decompose breaks the original question into narrower sub-questions, informed by an initial search. Compose synthesizes an initial answer from the sub-question results plus the source documents. Refine performs a second decomposition if the initial answer falls short. This cycle is detailed in the Onyx implementation published on the official LangChain blog. Each stage boundary is also a point where human judgment can still be inserted.
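The cycle above can be sketched in plain Python as a rough illustration. This is not the Onyx code and does not use LangGraph. The four callables (`decompose`, `answer_sub_question`, `compose`, `is_sufficient`), the `SubAnswer` type, and the `max_refinements` limit are all hypothetical names introduced here. The initial search is assumed to happen inside `decompose`. In a LangGraph version, each callable would roughly correspond to a node.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SubAnswer:
    question: str
    answer: str
    documents: list[str] = field(default_factory=list)


def agent_search(
    question: str,
    decompose: Callable[[str, list[SubAnswer]], list[str]],
    answer_sub_question: Callable[[str], SubAnswer],
    compose: Callable[[str, list[SubAnswer]], str],
    is_sufficient: Callable[[str, str], bool],
    max_refinements: int = 1,
) -> str:
    """Run Decompose -> Compose, then Refine while the answer falls short."""
    results: list[SubAnswer] = []
    answer = ""
    for _ in range(max_refinements + 1):
        # Decompose on the first pass (no prior results); Refine on later
        # passes, where decompose can see what was already found.
        sub_questions = decompose(question, results)
        results.extend(answer_sub_question(q) for q in sub_questions)
        # Compose: synthesize an answer from every sub-answer so far.
        answer = compose(question, results)
        if is_sufficient(question, answer):
            break
    return answer


# Stub stages for demonstration only (no search engine or LLM involved).
def demo_decompose(question: str, prior: list[SubAnswer]) -> list[str]:
    if not prior:
        return ["How is Nike positioned?", "How is Puma positioned?"]
    return ["How do Nike and Puma sales compare?"]


def demo_answer(sub_question: str) -> SubAnswer:
    return SubAnswer(sub_question, f"answer to: {sub_question}", [f"doc for: {sub_question}"])


def demo_compose(question: str, results: list[SubAnswer]) -> str:
    return " | ".join(r.answer for r in results)


def demo_sufficient(question: str, answer: str) -> bool:
    return "sales" in answer


final = agent_search(
    "What positioning differences between Nike and Puma explain sales outcomes?",
    demo_decompose, demo_answer, demo_compose, demo_sufficient,
)
print(final)
```

With these stubs the first composed answer never mentions sales, so one refinement pass runs and adds a third sub-question before the loop stops.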

Why LangGraph and Architecture Design Choices

The Onyx team chose LangGraph after a one-week prototype because its node-edge-state model maps naturally to the required architecture and because it is backed by a strong open-source community. Concerns around fan-out, subgraphs, state management, and streaming were resolved during prototyping, which led to full adoption. Practical code organization practices include one node per file, a dedicated directory per subgraph, and Mermaid visualizations for debugging. The broader lesson is to validate architecture fit during the prototype phase.
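To make the Mermaid debugging practice concrete, here is a small framework-free sketch that renders an edge list as Mermaid flowchart text. The `to_mermaid` helper and the node names in `EDGES` are hypothetical and only mirror the three stages described earlier. The actual Onyx graph has its own nodes and subgraphs.

```python
def to_mermaid(edges: list[tuple[str, str]]) -> str:
    """Render directed edges as Mermaid flowchart text."""
    lines = ["flowchart TD"]
    for source, target in edges:
        lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


# Hypothetical node names for illustration only.
EDGES = [
    ("start", "decompose"),
    ("decompose", "answer_sub_questions"),
    ("answer_sub_questions", "compose"),
    ("compose", "refine"),
    ("refine", "decompose"),
    ("compose", "finish"),
]

diagram = to_mermaid(EDGES)
print(diagram)
```

Pasting the printed text into any Mermaid renderer shows at a glance whether the refine loop and the exit path are wired as intended.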

Parallelism, State Management, and Streaming Best Practices

Parallelism uses Map-Reduce branches when the same flow runs across many documents, and treats subgraphs as nodes so that one branch does not wait unnecessarily on another. State management prefers Pydantic models over TypedDicts, with a strict rule on default values: input keys carry no defaults, except in documented nested-subgraph cases. Streaming uses custom LangGraph events to deliver sub-answers and documents simultaneously. The Onyx implementation demonstrates how to prevent idle time by modeling subgraphs as first-class nodes. These patterns carry over to other pipelines, provided the same pitfalls are watched for.
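The fan-out and streaming ideas can be illustrated without LangGraph. The sketch below uses `asyncio` as a stand-in: one concurrent branch per sub-question (map), a gathered list of answers (reduce), and a queue that plays the role of the custom event stream. The function names, event kinds, and payloads are all assumptions made for illustration. They are not LangGraph's event API or the Onyx implementation.

```python
import asyncio
from typing import Awaitable, Callable

Event = tuple[str, str]  # (event kind, payload)


async def answer_sub_question(
    question: str, emit: Callable[[Event], Awaitable[None]]
) -> str:
    """Hypothetical branch: retrieve documents, then answer one sub-question."""
    await asyncio.sleep(0)  # stand-in for retrieval latency
    await emit(("document", f"doc for: {question}"))
    await asyncio.sleep(0)  # stand-in for LLM latency
    answer = f"answer to: {question}"
    await emit(("sub_answer", answer))
    return answer


async def fan_out(sub_questions: list[str]) -> tuple[list[str], list[Event]]:
    """Map: one concurrent branch per sub-question. Reduce: collect the answers."""
    queue: asyncio.Queue = asyncio.Queue()
    streamed: list[Event] = []

    async def consume() -> None:
        # Events are handled as they arrive, before all branches finish.
        while (event := await queue.get()) is not None:
            streamed.append(event)

    consumer = asyncio.create_task(consume())
    answers = await asyncio.gather(
        *(answer_sub_question(q, queue.put) for q in sub_questions)
    )
    await queue.put(None)  # sentinel: no more events
    await consumer
    return list(answers), streamed


answers, events = asyncio.run(fan_out(["Nike positioning?", "Puma positioning?"]))
print(answers)
print(events)
```

Because documents and sub-answers go through the same queue, a consumer such as a UI can show retrieved sources while other branches are still generating.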

Comparison Table: Traditional RAG vs Agent Search

Aspect	Traditional RAG	Agent Search
Complex query handling	Limited (depends on near-exact matches)	Strong (decompose-compose-refine pipeline)
Multi-entity support	Weak	Strong (parallel processing + relationship extraction)
Implementation cost	Low	Medium (requires LangGraph learning)
Streaming	Basic	Advanced (subgraph + custom events)
Extensibility	Low	High (easy HITL and tool addition)

Source: LangChain official blog (https://www.langchain.com/blog/beyond-rag-implementing-agent-search-with-langgraph-for-smarter-knowledge-retrieval) and Onyx GitHub (https://github.com/onyx-dot-app/onyx) (as of February 2025)

Frequently Asked Questions

Does Agent Search completely replace RAG?
No. Agent Search complements RAG. Keep simple fact retrieval on traditional RAG and route only complex queries to Agent Search.
How steep is the LangGraph learning curve?
A one-week prototype typically brings practitioners to a usable level. Understanding nodes and state allows reuse of most existing LangChain code.
Does parallelism increase costs?
Subgraph reuse and formatting nodes keep the increase manageable. Avoiding unnecessary waiting is the key design point.
Are there real enterprise results?
Onyx reports improved answer quality on complex queries in production. Plans include adding Human-in-the-Loop capabilities and deeper tool integration.
How should existing RAG pipelines migrate?
Incremental adoption via subgraphs is recommended. Keep the existing search layer and connect the Agent Search graph downstream.
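The routing advice in the first answer (simple lookups stay on traditional RAG, complex queries go to Agent Search) could start from something as small as the sketch below. The heuristic, the `COMPARISON_HINTS` list, and the `route_query` function are hypothetical and deliberately naive. A production router would more likely use a classifier or an LLM call, and nothing here is taken from the Onyx codebase.

```python
import re

# Assumed signal words for comparative questions; tune for your own queries.
COMPARISON_HINTS = ("compare", "difference", "versus", " vs ")


def route_query(query: str, known_entities: set[str]) -> str:
    """Return "agent_search" for multi-entity or comparative queries, else "rag"."""
    lowered = f" {query.lower()} "
    words = set(re.findall(r"[a-z0-9]+", lowered))
    entity_hits = {e for e in known_entities if e.lower() in words}
    is_comparative = any(hint in lowered for hint in COMPARISON_HINTS)
    if len(entity_hits) >= 2 or is_comparative:
        return "agent_search"
    return "rag"


ENTITIES = {"Nike", "Puma"}
complex_route = route_query(
    "What product positioning differences between Nike and Puma "
    "could explain differing sales outcomes?",
    ENTITIES,
)
simple_route = route_query("What is Nike's return policy?", ENTITIES)
print(complex_route, simple_route)
```

Keeping the router separate from both pipelines fits the incremental migration described above: the existing search layer stays untouched, and only the queries flagged as complex take the new path.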

Summary and Next Steps

Agent Search complements RAG rather than fully replacing it. Migration from existing pipelines is best done incrementally via subgraphs. Practical proficiency with LangGraph takes about one week of prototyping. Cost increases from parallelism are mitigated by subgraph reuse and formatting nodes. The Onyx case demonstrates improved answer quality on complex queries in production, and future plans include Human-in-the-Loop capabilities and deeper tool integration. As a next step, analyze your own query characteristics and decide the appropriate scope for introducing Agent Search.

Author

krona23

Over 20 years in the IT industry, serving as Division Head and CTO at multiple companies running large-scale web services in Japan. Experienced across Windows, iOS, Android, and web development. Currently focused on AI-native transformation. At DevGENT, sharing practical guides on AI code editors, automation tools, and LLMs in three languages.
