Advanced RAG Patterns for Enterprise
Beyond Basic RAG
The basic Retrieval-Augmented Generation (RAG) pattern -- embed a question, find similar document chunks, pass them to the LLM -- works well for straightforward use cases. But banking demands more. When a compliance officer asks about anti-money laundering requirements for correspondent banking, the system needs to retrieve the right passages from the right documents with high precision. A wrong or incomplete retrieval does not just produce a bad answer -- it produces a confidently wrong answer that could inform a compliance decision.
This unit covers the advanced patterns that elevate RAG from a prototype to a production-grade system suitable for banking workloads.
Pattern 1: Hybrid Search
We introduced hybrid search in the Weaviate unit, but it deserves deeper treatment because it is one of the most impactful upgrades to basic RAG.
The Problem
Pure semantic search finds documents that are conceptually related to the query. But banking queries often include specific regulatory terms, statute references, or account numbers that must be matched exactly. Semantic search might understand the concept behind "12 CFR 1026.35" but miss a document that specifically references this regulation.
The Solution
Hybrid search runs both keyword-based search (BM25 or similar algorithms) and embedding-based semantic search simultaneously, then combines the results using a weighted scoring function. The weight balance can be tuned: for regulatory queries heavy in technical terms, weight keyword search higher; for natural-language questions, weight semantic search higher.
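The weighted combination can be sketched in a few lines. This is an illustrative blend of already-normalized scores, not a specific vector database's API; the `alpha` parameter name follows a common convention for the keyword/semantic balance.

```python
def hybrid_score(keyword_score: float, semantic_score: float, alpha: float = 0.5) -> float:
    """Blend a normalized BM25 score with a normalized semantic-similarity
    score. alpha=1.0 means pure keyword search; alpha=0.0 means pure semantic."""
    return alpha * keyword_score + (1 - alpha) * semantic_score

# Regulatory query heavy in exact terms: favor the keyword score.
regulatory = hybrid_score(keyword_score=0.9, semantic_score=0.4, alpha=0.8)

# Conversational natural-language question: favor semantic similarity.
conversational = hybrid_score(keyword_score=0.2, semantic_score=0.8, alpha=0.3)
```

In practice the two score distributions must be normalized to a common scale before blending, which is one reason rank-based fusion (below) is often preferred.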
Implementation Approach
Most hybrid search implementations use Reciprocal Rank Fusion (RRF) to combine results from both search methods. RRF assigns scores based on the rank position in each result list and combines them, ensuring that documents appearing at the top of both lists score highest.
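RRF is simple enough to show in full. A minimal sketch, with illustrative document IDs; `k=60` is the smoothing constant from the original RRF formulation:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs. Each document scores
    sum(1 / (k + rank)) across every list it appears in, so documents
    near the top of both lists accumulate the highest fused score."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_cfr_1026", "doc_policy_7", "doc_memo_3"]   # BM25 ranking
semantic_hits = ["doc_cfr_1026", "doc_faq_12", "doc_policy_7"]  # vector ranking
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
# "doc_cfr_1026" leads: it tops both lists.
```

Because RRF uses rank positions rather than raw scores, it sidesteps the score-normalization problem entirely.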
Pattern 2: Reranking
The Problem
Initial retrieval from a vector database uses fast but approximate similarity matching. The top 20 results are "probably relevant," but their ranking may not reflect true relevance to the specific question. Document chunk number 15 might actually be more relevant than chunk number 3.
The Solution
After initial retrieval, pass the results through a reranking model -- a specialized model trained specifically to assess the relevance of a document to a query. Reranking models are slower than vector search (they process each result individually), but they are far more accurate at judging true relevance.
The typical pattern:
- Retrieve the top 20-50 results from the vector database (fast, approximate)
- Pass all results through the reranker with the original query (slower, precise)
- Use the reranker's scores to select the top 3-5 most relevant results
- Send only these highly relevant results to the LLM for answer generation
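The steps above can be sketched as a small retrieve-then-rerank function. The term-overlap scorer here is a toy stand-in for a real reranking model (typically a cross-encoder); only the overall flow is the point.

```python
def rerank(query: str, candidates: list[str], score_fn, top_k: int = 5) -> list[str]:
    """Re-score each candidate against the query with a precise (slower)
    model and keep only the top_k for the LLM."""
    return sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)[:top_k]

def term_overlap(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query terms present in the document.
    A production system would call a trained reranking model instead."""
    q_terms = set(query.lower().split())
    return len(q_terms & set(doc.lower().split())) / len(q_terms)

# Step 1 (vector search, fast/approximate) would normally supply 20-50 of these.
candidates = [
    "Quarterly branch staffing report",
    "AML requirements for correspondent banking relationships",
    "General marketing guidelines",
]
top = rerank("correspondent banking AML requirements", candidates, term_overlap, top_k=1)
```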
Banking Impact
Reranking reduces the risk of the LLM receiving marginally relevant or irrelevant context, which is a primary cause of hallucinated or incorrect answers. In banking, where answer accuracy directly affects compliance and risk decisions, the precision improvement from reranking justifies the additional latency.
Pattern 3: Query Decomposition
The Problem
Complex banking questions often contain multiple sub-questions: "Compare our current CRE concentration against the OCC's guidance and explain what actions we should take if we exceed the soft limit." This single question requires: (1) retrieving the bank's current CRE exposure data, (2) retrieving OCC guidance on CRE concentration, and (3) retrieving the bank's internal escalation procedures.
The Solution
Query decomposition automatically breaks complex questions into simpler sub-questions, retrieves for each independently, and then synthesizes the results. This ensures that each aspect of the question gets focused retrieval attention rather than hoping a single query will surface all relevant information.
Implementation
An LLM analyzes the original question and generates sub-questions. Each sub-question is routed to the appropriate index or data source. Results are collected and passed to the LLM along with the original question for synthesis.
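A minimal sketch of this flow, assuming an `ask_llm(prompt)` helper and a `retrieve(sub_question)` function that wrap whatever LLM client and retrieval pipeline the deployment uses; both helpers and the prompt wording are illustrative.

```python
DECOMPOSE_PROMPT = (
    "Break the following question into independent sub-questions, "
    "one per line:\n\n{question}"
)

def decompose(question: str, ask_llm) -> list[str]:
    """Use the LLM to split a complex question into sub-questions."""
    raw = ask_llm(DECOMPOSE_PROMPT.format(question=question))
    return [line.strip("- ").strip() for line in raw.splitlines() if line.strip()]

def answer(question: str, ask_llm, retrieve) -> str:
    """Retrieve for each sub-question independently, then synthesize."""
    sub_questions = decompose(question, ask_llm)
    context = "\n\n".join(retrieve(sq) for sq in sub_questions)
    return ask_llm(f"Question: {question}\n\nContext:\n{context}\n\nAnswer:")
```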
BANKING ANALOGY
Query decomposition works like a senior research analyst managing a complex request from the CEO. When the CEO asks, "How does our digital banking adoption compare to peer institutions, and what investment would we need to close the gap?" the analyst does not try to answer from a single source. They decompose the request: first, pull internal digital banking metrics; second, research peer institution adoption rates from industry reports; third, estimate the technology investment required based on vendor proposals and internal capacity assessments. Each sub-question gets its own research track, and the analyst synthesizes everything into a comprehensive briefing. Query decomposition automates this exact analytical approach.
Pattern 4: Multi-Index Strategy
The Problem
Different document collections have different characteristics and serve different retrieval needs. Your compliance policy manual (structured, versioned, authoritative) is fundamentally different from your customer complaint logs (unstructured, high-volume, time-sensitive). Using a single index and retrieval strategy for both produces suboptimal results.
The Solution
Maintain separate indexes for different document collections, each optimized for its specific characteristics:
- Compliance policies: Tree index with version metadata filtering
- Regulatory guidance: Hybrid search index (keyword + semantic)
- Credit memos: Vector index with department and risk-rating metadata
- Customer communications: Vector index with date-range filtering
- Market research: Vector index with source and topic metadata
A router determines which index or indexes to query based on the nature of the question. Some questions may query multiple indexes simultaneously.
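A router can be sketched as a mapping from trigger terms to index names. The index names and keyword rules below are illustrative; production routers often use an LLM classifier or query metadata rather than keyword matching.

```python
# Illustrative routing table: trigger terms -> index name.
ROUTES = {
    "compliance_policies": {"policy", "procedure", "bsa", "aml"},
    "regulatory_guidance": {"occ", "fincen", "cfr", "regulation", "guidance"},
    "credit_memos": {"credit", "memo", "exposure", "cre"},
}

def route(question: str, default: str = "customer_communications") -> list[str]:
    """Return every index whose trigger terms appear in the question.
    A question may legitimately route to multiple indexes at once."""
    terms = set(question.lower().split())
    matched = [index for index, triggers in ROUTES.items() if terms & triggers]
    return matched or [default]
```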
Pattern 5: RAG Evaluation
Why Evaluation Matters for Banking
You cannot improve what you cannot measure. And in banking, you cannot deploy what you cannot demonstrate is accurate. RAG evaluation is not optional -- it is the mechanism that gives compliance officers, risk managers, and regulators confidence that the system produces reliable outputs.
Key Metrics
Retrieval metrics:
- Recall: Does the system retrieve all relevant documents? (Missing a critical regulation is unacceptable)
- Precision: Are the retrieved documents actually relevant? (Noise degrades answer quality)
- Mean Reciprocal Rank: Are the most relevant documents ranked highest?
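The retrieval metrics above reduce to short formulas. A minimal sketch with illustrative document IDs (averaging `reciprocal_rank` over a query set gives Mean Reciprocal Rank):

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top k results."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top k results that are actually relevant."""
    return len(set(retrieved[:k]) & relevant) / k

def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result; 0.0 if none was retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

retrieved = ["d3", "d7", "d1", "d9"]  # pipeline output, best first
relevant = {"d1", "d7"}               # ground truth from subject matter experts
```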
Generation metrics:
- Faithfulness: Is the generated answer supported by the retrieved documents? (Does the LLM hallucinate beyond what the context provides?)
- Relevance: Does the answer actually address the question asked?
- Completeness: Does the answer cover all aspects of the question?
Building an Evaluation Framework
For banking RAG deployments, establish:
- Ground truth datasets: Subject matter experts create question-answer pairs with identified source documents
- Automated evaluation: Run new questions through the pipeline and compare results against ground truth
- Human evaluation: Compliance and business experts regularly review system outputs for accuracy
- Regression testing: When you change the pipeline (new chunking strategy, different embedding model, updated documents), verify that previously correct answers remain correct
- Production monitoring: Track retrieval and generation metrics on live queries to detect degradation
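The regression-testing item above can be sketched as a check over the ground truth dataset. `pipeline` stands in for whatever retrieval function the deployment exposes, and the question/source pairs are illustrative.

```python
# Illustrative ground truth: questions paired with the source documents
# the pipeline must retrieve for them.
GROUND_TRUTH = [
    {"question": "What is the SAR filing deadline?",
     "expected_sources": {"bsa_policy_v3"}},
    {"question": "What is our CRE soft limit?",
     "expected_sources": {"credit_policy_v9"}},
]

def regression_check(pipeline, dataset) -> list[str]:
    """Return the questions whose expected source documents were NOT
    retrieved -- run this after any chunking, embedding, or document change."""
    failures = []
    for case in dataset:
        retrieved = set(pipeline(case["question"]))
        if not case["expected_sources"] <= retrieved:
            failures.append(case["question"])
    return failures
```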
Tip
Build your RAG evaluation framework before you optimize your pipeline. Without evaluation, you are making changes blindly -- you might improve performance on one type of query while degrading it on another. Start with 50-100 question-answer pairs created by your compliance or business subject matter experts. This ground truth dataset becomes the foundation for every pipeline improvement you make.
Putting It All Together
A production banking RAG system typically combines multiple advanced patterns:
- User asks a question
- Query decomposition breaks complex questions into sub-questions
- Router determines which indexes to query for each sub-question
- Hybrid search retrieves from the appropriate indexes using both keyword and semantic matching
- Reranker re-scores results for true relevance
- LLM generates an answer from the highest-ranked results with citations
- Evaluation pipeline monitors retrieval and generation quality continuously
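The steps above can be sketched as one orchestration function. Every component name here (decompose, route, hybrid_search, rerank, generate, log_metrics) is a stand-in for whatever implementation a real deployment plugs in; only the wiring is the point.

```python
def answer_question(question: str, c: dict) -> str:
    """Run the full pipeline; `c` maps component names to callables."""
    context = []
    for sub_q in c["decompose"](question):              # 2. sub-questions
        candidates = []
        for index in c["route"](sub_q):                 # 3. pick indexes
            candidates += c["hybrid_search"](index, sub_q)  # 4. retrieve
        context += c["rerank"](sub_q, candidates)       # 5. precise top-k
    answer = c["generate"](question, context)           # 6. cited answer
    c["log_metrics"](question, context, answer)         # 7. monitoring
    return answer
```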
Each pattern addresses a specific failure mode of basic RAG. Together, they produce a system that banking professionals can trust.
Quick Recap
- Basic RAG is insufficient for banking -- advanced patterns address precision, recall, and reliability requirements
- Hybrid search combines keyword and semantic matching for regulatory terminology accuracy
- Reranking improves precision by re-scoring initial retrieval results with a specialized model
- Query decomposition handles complex multi-part questions by breaking them into focused sub-questions
- Multi-index strategies optimize retrieval for different document types and use cases
- RAG evaluation is mandatory for banking -- build the evaluation framework before optimizing the pipeline
KNOWLEDGE CHECK
Why is reranking particularly important for banking RAG applications?
A compliance officer asks: Compare our BSA/AML procedures against the latest FinCEN guidance and identify gaps. Which advanced RAG pattern is most critical for handling this question effectively?
Why should a banking institution build a RAG evaluation framework before optimizing the retrieval pipeline?
What is the primary advantage of a multi-index RAG strategy over using a single index for all banking documents?