AI Foundations for Bankers

Weaviate, Milvus, and Graphlit

Level: Intermediate | 10 min read | Tags: weaviate, milvus, graphlit, vector-database, open-source, hybrid-search

Beyond Pinecone: Why Alternatives Matter

Pinecone is the leading managed vector database, but it is not the only option -- and for some banking institutions, it may not be the right one. Banks with strict data residency requirements, existing on-premises infrastructure mandates, or specific technical needs may find better alignment with alternative vector databases.

This unit covers three important alternatives, each bringing distinct capabilities to the table: Weaviate (hybrid search), Milvus (open-source scale), and Graphlit (knowledge graphs).

Weaviate: Hybrid Search for Regulatory Documents

What Makes Weaviate Different

Weaviate's signature capability is native hybrid search -- the ability to combine semantic search (finding conceptually similar documents) with keyword search (finding documents containing specific terms) in a single query. You do not issue two searches and stitch the results together in application code; Weaviate runs the keyword (BM25) and vector searches itself and fuses their results into one ranked list, with a weighting parameter (alpha) that controls the balance between keyword and semantic relevance.

Why Hybrid Search Matters for Banking

Consider a compliance officer searching for guidance on "BSA/AML customer due diligence for politically exposed persons." A pure semantic search might find documents about customer screening, risk-based approaches, and enhanced due diligence -- conceptually related, but potentially missing documents that specifically mention "PEP" or "politically exposed persons." A pure keyword search would find documents with those exact terms but miss relevant guidance that uses different terminology like "senior foreign political figure."

Hybrid search delivers both: documents that are semantically related to the concept and documents that contain the specific regulatory terms. For banking, where precise regulatory terminology coexists with varied business language, this combination produces meaningfully better retrieval.
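
The blending of the two signals can be illustrated with a minimal sketch. It assumes min-max normalized scores and a single weighting parameter named alpha; the document IDs and scores are invented for the PEP example, and the functions are hypothetical helpers, not Weaviate's implementation or API.

```python
from typing import Dict, List, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale scores to [0, 1] so the two signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    keyword_scores: Dict[str, float],
    vector_scores: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Blend both signals: alpha=1.0 is pure semantic, alpha=0.0 is pure keyword."""
    kw = normalize(keyword_scores)
    vec = normalize(vector_scores)
    blended = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    # Highest blended score first; ties broken by document ID for stable output.
    return sorted(blended.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores: keyword search favors exact "PEP" mentions, while
# semantic search surfaces the "senior foreign political figure" memo.
keyword_scores = {"pep_policy": 9.0, "fincen_pep_faq": 6.0, "wire_procedures": 1.0}
vector_scores = {
    "senior_foreign_political_figure_memo": 0.90,
    "pep_policy": 0.80,
    "cdd_rule_summary": 0.70,
}

ranking = hybrid_rank(keyword_scores, vector_scores, alpha=0.5)
for doc, score in ranking:
    print(f"{doc}: {score:.2f}")
```

In this sketch, a document that scores well on both signals rises to the top, while a document found by only one signal still appears in the results rather than being dropped. Moving alpha toward 1.0 or 0.0 shifts the ranking toward semantic or keyword relevance respectively.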

Deployment Flexibility

Weaviate can run as:

  • A managed cloud service (Weaviate Cloud Services) for teams wanting operational simplicity
  • A self-hosted deployment using Docker or Kubernetes for institutions with on-premises requirements
  • An embedded database within your application for lightweight use cases

This deployment flexibility is particularly valuable for banks navigating different data classification requirements. Internal policy documents might be indexed in a self-hosted Weaviate instance inside the bank's data center, while publicly available regulatory guidance could be indexed in the cloud-hosted version.

BANKING ANALOGY

Weaviate's hybrid search works like the difference between searching your bank's loan portfolio by NAICS code (keyword -- exact industry classification) versus searching by "businesses similar to restaurants that were affected by COVID" (semantic -- conceptual similarity). The NAICS code search is precise but rigid: it misses similar businesses filed under a different code. The semantic search is flexible but imprecise: it cannot guarantee that every business carrying the exact code you care about is returned. A hybrid approach gives you both: businesses that are classified in the right industry codes and businesses that are conceptually similar even if they are classified differently. For regulatory document search, this combination is invaluable.

Milvus: Open-Source Scale

What Makes Milvus Different

Milvus is a fully open-source vector database designed for massive scale. Pinecone and Weaviate comfortably serve workloads in the millions of vectors, whereas Milvus is architected from the ground up to handle billions of vectors across distributed clusters. It is the vector database you evaluate when your corpus reaches hundreds of millions of vectors or more and your infrastructure team has the expertise to operate distributed systems.

Key Capabilities

Distributed architecture. Milvus separates storage and compute, allowing independent scaling of each. When query volume increases, you add compute nodes. When data volume grows, you add storage. This architectural flexibility matters at banking scale.

Multiple index types. Milvus supports a wide range of indexing algorithms (IVF, HNSW, SCANN, DiskANN), each optimized for different trade-offs between search accuracy, speed, and memory usage. Your engineering team can tune the index type to your specific performance and accuracy requirements.

On-premises deployment. As a fully open-source project, Milvus can run entirely within your data center with no data leaving your network. For banks with data classification policies that prohibit cloud-hosted storage for certain document types, this is a requirement, not a preference.

Cost at scale. At very large embedding volumes, self-hosted Milvus can be significantly less expensive than managed services. The trade-off is the operational expertise required to manage a distributed database.
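
To make the index trade-off above concrete, here is a toy sketch of the idea behind IVF-style indexes: vectors are grouped into buckets around centroids, and a query scans only the nearest buckets instead of every vector. This is an illustration under simplifying assumptions, not Milvus's implementation. The centroids are fixed by hand rather than learned, the data is two-dimensional and invented, and the function names are hypothetical. The parameter name nprobe follows the name commonly used for this setting in IVF indexes.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def build_buckets(vectors: Dict[str, Vector], centroids: List[Vector]) -> Dict[int, List[str]]:
    """Index step: file each vector under its nearest centroid."""
    buckets: Dict[int, List[str]] = {i: [] for i in range(len(centroids))}
    for doc_id, vec in vectors.items():
        closest = min(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))
        buckets[closest].append(doc_id)
    return buckets


def ivf_search(
    query: Vector,
    vectors: Dict[str, Vector],
    centroids: List[Vector],
    buckets: Dict[int, List[str]],
    nprobe: int = 1,
    k: int = 1,
) -> Tuple[List[str], int]:
    """Query step: scan only the nprobe buckets whose centroids are nearest the query."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    candidates = [doc_id for i in order[:nprobe] for doc_id in buckets[i]]
    ranked = sorted(candidates, key=lambda d: math.dist(query, vectors[d]))
    return ranked[:k], len(candidates)


# Invented 2-D "embeddings" forming two obvious clusters.
vectors = {
    "a": (0.0, 1.0), "b": (1.0, 0.0), "c": (1.0, 1.0),
    "d": (9.0, 10.0), "e": (10.0, 9.0), "f": (10.0, 11.0),
}
centroids = [(0.0, 0.0), (10.0, 10.0)]  # hand-picked, not learned
buckets = build_buckets(vectors, centroids)

top, scanned = ivf_search((9.0, 9.5), vectors, centroids, buckets, nprobe=1, k=1)
print(top, f"scanned {scanned} of {len(vectors)} vectors")
```

Raising nprobe scans more buckets, which costs more time but reduces the chance of missing a true nearest neighbor that happens to sit in a neighboring bucket. That is the accuracy-versus-speed dial the index choice exposes.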

Banking Considerations

Milvus is the right choice when:

  • Your vector data volume is in the hundreds of millions or billions
  • You have infrastructure teams experienced with distributed systems (Kubernetes, distributed storage)
  • Data residency requirements mandate on-premises deployment with no exceptions
  • Cost optimization at large scale is a priority

It is not the right choice when:

  • You are in the early stages and need to move quickly (operational overhead is significant)
  • Your team lacks distributed systems expertise
  • Your data volume is in the low millions (simpler solutions will serve you well)
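
The volume thresholds above are easier to reason about with a back-of-the-envelope estimate of raw vector storage. The sketch below assumes 32-bit floats (4 bytes per value) and a hypothetical embedding size of 768 dimensions. It counts only the raw vectors and ignores index structures, metadata, and replicas, all of which add to the real footprint.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold the raw vectors alone (no index, metadata, or replicas)."""
    return num_vectors * dimensions * bytes_per_value


DIMENSIONS = 768  # hypothetical embedding size; use your model's actual dimension

scenarios = [
    ("5 million", 5_000_000),
    ("500 million", 500_000_000),
    ("2 billion", 2_000_000_000),
]
for label, count in scenarios:
    gib = raw_vector_bytes(count, DIMENSIONS) / 2**30
    print(f"{label:>12} vectors: {gib:,.0f} GiB of raw float32 data")
```

Under these assumptions each vector occupies 3,072 bytes, so 5 million vectors fit in roughly 15 GB, which a single machine handles easily, while 2 billion vectors need roughly 6 TB before any index overhead. That gap is why distributed storage and compute become relevant only at the larger volumes.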

Graphlit: Knowledge Graph Integration

A Different Approach to Retrieval

Graphlit takes a fundamentally different approach to the retrieval problem. Instead of treating documents as isolated chunks with embedding vectors, Graphlit builds knowledge graphs -- structured representations of entities, relationships, and facts extracted from your documents.

Why Knowledge Graphs Matter

Traditional RAG retrieves document chunks that are semantically similar to a question. But some questions require understanding relationships between entities, not just similarity between text passages.

Consider: "Which of our commercial loan customers also have deposit relationships, and what is their total combined exposure?" This question requires understanding entity relationships (customer to loan, customer to deposit, customer to exposure amount) that are poorly served by vector similarity search alone.

Knowledge graphs excel at these relationship queries because they explicitly model entities (customers, products, exposures) and the relationships between them. Graphlit combines this structured knowledge graph approach with traditional vector search, enabling both similarity-based and relationship-based retrieval.
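
The relationship question above can be sketched with a tiny in-memory graph. Everything here is hypothetical: the customer names, account IDs, balances, relationship labels, and the helper function are invented for illustration, and "combined exposure" is assumed to mean the sum of a customer's loan and deposit balances. This is not Graphlit's data model or API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (customer, relationship, account) edges; all values are hypothetical.
Edge = Tuple[str, str, str]

edges: List[Edge] = [
    ("Acme Corp", "HAS_LOAN", "loan-001"),
    ("Acme Corp", "HAS_DEPOSIT", "dep-101"),
    ("Birch LLC", "HAS_LOAN", "loan-002"),
    ("Cedar Inc", "HAS_DEPOSIT", "dep-102"),
]
balances: Dict[str, int] = {
    "loan-001": 2_500_000,
    "dep-101": 400_000,
    "loan-002": 750_000,
    "dep-102": 90_000,
}


def combined_exposure(edges: List[Edge], balances: Dict[str, int]) -> Dict[str, int]:
    """Return total balance for customers that have both a loan and a deposit."""
    accounts_by_customer: Dict[str, Dict[str, List[str]]] = defaultdict(dict)
    for customer, relationship, account in edges:
        accounts_by_customer[customer].setdefault(relationship, []).append(account)

    totals: Dict[str, int] = {}
    for customer, relationships in accounts_by_customer.items():
        loans = relationships.get("HAS_LOAN", [])
        deposits = relationships.get("HAS_DEPOSIT", [])
        if loans and deposits:  # the relationship filter vector search cannot express
            totals[customer] = sum(balances[a] for a in loans + deposits)
    return totals


result = combined_exposure(edges, balances)
print(result)
```

The filter "has both a loan and a deposit" is a traversal over explicit edges, not a similarity comparison between text passages, which is why this kind of question suits a graph representation.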

Banking Relevance

Knowledge graphs align well with banking's inherently relational data: customers have accounts, accounts have products, products have terms, loans have collateral, entities have beneficial owners, and so on. A RAG system enhanced with knowledge graph capabilities can answer questions that span these relationships -- something pure vector search struggles with.

Tip

When evaluating alternative vector databases, do not choose based solely on technical features. Assess your team's operational readiness. Weaviate's hybrid search and Milvus's scale are compelling, but self-hosted databases require dedicated infrastructure expertise, monitoring, backup procedures, and incident response capabilities. If your organization does not have this operational maturity for a new database category, a managed service may deliver better outcomes despite fewer features.

Comparison at a Glance

Dimension | Weaviate | Milvus | Graphlit
--- | --- | --- | ---
Best for | Hybrid search (keyword + semantic) | Massive scale, on-premises | Relationship-based retrieval
Deployment | Cloud, self-hosted, embedded | Self-hosted, cloud (Zilliz) | Cloud service
Scale | Millions of vectors | Billions of vectors | Millions of entities + vectors
Open-source | Yes | Yes | No (proprietary)
Unique strength | Native hybrid search | Distributed architecture | Knowledge graph + vector
Operational burden | Moderate | High | Low (managed)
Banking fit | Regulatory doc search | Large-scale on-premises | Entity relationship queries

Quick Recap

  • Weaviate excels at hybrid search -- combining keyword and semantic retrieval in a single query, ideal for regulatory document search
  • Milvus is built for massive scale and full on-premises deployment, suited for banks with strict data residency and large data volumes
  • Graphlit combines knowledge graphs with vector search, enabling relationship-based retrieval across banking entities
  • The choice between these alternatives depends on deployment requirements, scale needs, query patterns, and operational readiness
  • All three offer capabilities that managed services like Pinecone may not, particularly around hybrid search, on-premises deployment, and knowledge graphs

KNOWLEDGE CHECK

Why is Weaviate's hybrid search particularly valuable for banking regulatory document retrieval?

Under what circumstances would Milvus be the strongest vector database choice for a banking institution?

How does Graphlit's knowledge graph approach differ from traditional vector-based RAG?