Weaviate, Milvus, and Graphlit

Intermediate | 10 min read | Tags: weaviate, milvus, graphlit, vector-database, open-source, hybrid-search

Beyond Pinecone: Why Alternatives Matter

Pinecone is a leading managed vector database, but it is not the only option -- and for some banking institutions, it may not be the right one. Banks with strict data residency requirements, existing on-premises infrastructure mandates, or specific technical needs may find better alignment with alternative vector databases.

This unit covers three important alternatives, each bringing distinct capabilities to the table: Weaviate (hybrid search), Milvus (open-source scale), and Graphlit (knowledge graphs).

Weaviate: Hybrid Search for Regulatory Documents

What Makes Weaviate Different

Weaviate's signature capability is native hybrid search -- the ability to combine semantic search (finding conceptually similar documents) with keyword search (finding documents containing specific terms) in a single query. You do not run two separate searches and stitch the results together in application code; Weaviate executes the keyword (BM25) and vector searches itself and fuses their rankings into one result list, with a tunable weighting between the two signals.

Why Hybrid Search Matters for Banking

Consider a compliance officer searching for guidance on "BSA/AML customer due diligence for politically exposed persons." A pure semantic search might find documents about customer screening, risk-based approaches, and enhanced due diligence -- conceptually related, but potentially missing documents that specifically mention "PEP" or "politically exposed persons." A pure keyword search would find documents with those exact terms but miss relevant guidance that uses different terminology like "senior foreign political figure."

Hybrid search delivers both: documents that are semantically related to the concept and documents that contain the specific regulatory terms. For banking, where precise regulatory terminology coexists with varied business language, this combination produces meaningfully better retrieval.
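To make the blending concrete, here is a minimal sketch in plain Python of one way to fuse keyword and vector scores. It does not call Weaviate. The document IDs and scores are invented for the PEP example above, and the min-max normalization with an `alpha` weight is a simplified assumption about how such fusion can work, not a description of any engine's exact algorithm.

```python
from typing import Dict, List, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    keyword: Dict[str, float],
    vector: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Blend normalized scores: alpha=1.0 is pure vector, alpha=0.0 is pure keyword."""
    kw, vec = normalize(keyword), normalize(vector)
    docs = set(kw) | set(vec)
    fused = {
        d: alpha * vec.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0)
        for d in docs
    }
    # Highest fused score first; ties broken alphabetically for stable output.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores: the keyword search only sees documents containing "PEP".
keyword_scores = {"pep_policy": 9.1, "cdd_rule": 4.0}
vector_scores = {
    "senior_foreign_political_figure_guidance": 0.83,
    "pep_policy": 0.80,
    "cdd_rule": 0.70,
}

results = hybrid_rank(keyword_scores, vector_scores, alpha=0.5)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

With an even weighting, the document that matches on both signals ranks first, while the guidance that never mentions "PEP" still appears in the list because of its semantic score. Setting `alpha` to 0.0 would drop it to the bottom, which is the keyword-only failure mode described above.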

Deployment Flexibility

Weaviate can run as:

A managed cloud service (Weaviate Cloud Services) for teams wanting operational simplicity
A self-hosted deployment using Docker or Kubernetes for institutions with on-premises requirements
An embedded database within your application for lightweight use cases

This deployment flexibility is particularly valuable for banks navigating different data classification requirements. Internal policy documents might be indexed in a self-hosted Weaviate instance inside the bank's data center, while publicly available regulatory guidance could be indexed in the cloud-hosted version.
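The split described above can be sketched as a small routing rule. This is an illustrative assumption about how a bank might map data classifications to deployments; the classification levels, endpoint URLs, and the `deployment_for` helper are all hypothetical and are not part of Weaviate.

```python
from enum import Enum


class Classification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    RESTRICTED = "restricted"


# Hypothetical deployment targets; replace with your own endpoints.
DEPLOYMENTS = {
    "cloud": "https://regulatory-docs.example.invalid",
    "on_prem": "http://weaviate.bank.internal:8080",
}


def deployment_for(classification: Classification) -> str:
    """Only public material may leave the data center; everything else stays on-premises."""
    return "cloud" if classification is Classification.PUBLIC else "on_prem"


# Sample documents with assumed classifications.
documents = [
    ("Public regulatory guidance", Classification.PUBLIC),
    ("Internal credit policy", Classification.INTERNAL),
]
routing = {title: deployment_for(level) for title, level in documents}
print(routing)
```

Defaulting every non-public classification to the on-premises instance is a deliberately conservative choice: a document with an unexpected label never ends up in the cloud by accident.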

BANKING ANALOGY

Weaviate's hybrid search works like the difference between searching your bank's loan portfolio by NAICS code (keyword -- exact industry classification) versus searching by "businesses similar to restaurants that were affected by COVID" (semantic -- conceptual similarity). The NAICS code search is precise but rigid: it misses relevant businesses that are classified under a different code. The semantic search is flexible but less exact: it can surface loosely related businesses and rank the exact code you care about lower than you expect. A hybrid approach gives you both: businesses that are classified in the right industry codes and businesses that are conceptually similar even if they are classified differently. For regulatory document search, this means catching both exact citations and guidance that uses different wording.

Milvus: Open-Source Scale

What Makes Milvus Different

Milvus is a fully open-source vector database designed for massive scale. While Pinecone and Weaviate handle millions of vectors well, Milvus is architected to handle billions of vectors across distributed clusters. It is the vector database you evaluate when your data volume reaches hundreds of millions or billions of vectors and your infrastructure team has the expertise to operate distributed systems.

Key Capabilities

Distributed architecture. Milvus separates storage and compute, allowing independent scaling of each. When query volume increases, you add compute nodes. When data volume grows, you add storage. This matters when query load and data volume grow at different rates, because you can scale only the resource that is actually constrained.

Multiple index types. Milvus supports a wide range of indexing algorithms (IVF, HNSW, SCANN, DiskANN), each optimized for different trade-offs between search accuracy, speed, and memory usage. Your engineering team can tune the index type to your specific performance and accuracy requirements.

On-premises deployment. As a fully open-source project, Milvus can run entirely within your data center with no data leaving your network. For banks with data classification policies that prohibit cloud-hosted storage for certain document types, this is a requirement, not a preference.

Cost at scale. At very large embedding volumes, self-hosted Milvus can be significantly less expensive than managed services. The trade-off is the operational expertise required to manage a distributed database.
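A quick back-of-the-envelope calculation shows why memory usage and cost dominate the conversation at this scale. The sketch below estimates the raw storage for uncompressed float32 vectors only. The 768-dimension figure is an assumed embedding size, and the estimate deliberately ignores index overhead, replicas, and metadata, all of which add to the real footprint.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of the vectors alone; float32 uses 4 bytes per dimension."""
    return num_vectors * dimensions * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert bytes to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


DIMENSIONS = 768  # assumed embedding size; check your embedding model

for label, count in [("5 million", 5_000_000), ("1 billion", 1_000_000_000)]:
    size = raw_vector_bytes(count, DIMENSIONS)
    print(f"{label} vectors: {to_gib(size):,.1f} GiB of raw float32 data")
```

Under these assumptions, a few million vectors fit comfortably in the memory of a single server, while a billion vectors run to several terabytes before any index is built. That gap is why index choice (in-memory versus disk-based, compressed versus uncompressed) becomes a cost decision at Milvus scale.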

Banking Considerations

Milvus is the right choice when:

Your vector data volume is in the hundreds of millions or billions
You have infrastructure teams experienced with distributed systems (Kubernetes, distributed storage)
Data residency requirements mandate on-premises deployment with no exceptions
Cost optimization at large scale is a priority

It is not the right choice when:

You are in the early stages and need to move quickly (operational overhead is significant)
Your team lacks distributed systems expertise
Your data volume is in the low millions (simpler solutions will serve you well)

Graphlit: Knowledge Graph Integration

A Different Approach to Retrieval

Graphlit takes a fundamentally different approach to the retrieval problem. Instead of treating documents as isolated chunks with embedding vectors, Graphlit builds knowledge graphs -- structured representations of entities, relationships, and facts extracted from your documents.

Why Knowledge Graphs Matter

Traditional RAG retrieves document chunks that are semantically similar to a question. But some questions require understanding relationships between entities, not just similarity between text passages.

Consider: "Which of our commercial loan customers also have deposit relationships, and what is their total combined exposure?" This question requires understanding entity relationships (customer to loan, customer to deposit, customer to exposure amount) that are poorly served by vector similarity search alone.

Knowledge graphs excel at these relationship queries because they explicitly model entities (customers, products, exposures) and the relationships between them. Graphlit combines this structured knowledge graph approach with traditional vector search, enabling both similarity-based and relationship-based retrieval.
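The relationship query above can be illustrated with a tiny in-memory graph. This is a plain-Python sketch of the idea, not Graphlit's API; the customers, relation names, and amounts are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (subject, relation, object) triples; every entity here is hypothetical.
TRIPLES: List[Tuple[str, str, str]] = [
    ("cust:acme", "HAS_LOAN", "loan:1001"),
    ("cust:acme", "HAS_DEPOSIT", "dep:2001"),
    ("cust:birch", "HAS_LOAN", "loan:1002"),
    ("cust:cedar", "HAS_DEPOSIT", "dep:2002"),
]

# Balance attached to each account node.
AMOUNTS: Dict[str, int] = {
    "loan:1001": 750_000,
    "dep:2001": 120_000,
    "loan:1002": 300_000,
    "dep:2002": 45_000,
}


def build_graph(triples: List[Tuple[str, str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Index triples as subject -> relation -> list of objects."""
    graph: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for subject, relation, obj in triples:
        graph[subject][relation].append(obj)
    return graph


def loan_and_deposit_customers(
    graph: Dict[str, Dict[str, List[str]]],
    amounts: Dict[str, int],
) -> Dict[str, int]:
    """Return customers holding both a loan and a deposit, with their combined total."""
    combined = {}
    for customer, edges in graph.items():
        loans = edges.get("HAS_LOAN", [])
        deposits = edges.get("HAS_DEPOSIT", [])
        if loans and deposits:
            combined[customer] = sum(amounts[account] for account in loans + deposits)
    return combined


graph = build_graph(TRIPLES)
print(loan_and_deposit_customers(graph, AMOUNTS))
```

The answer comes from following explicit edges and summing attributes, which is an exact operation. A similarity search over text chunks has no equivalent of "has both relations", so it can only return passages that mention loans and deposits and leave the joining to the language model.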

Banking Relevance

Knowledge graphs align well with banking's inherently relational data: customers have accounts, accounts have products, products have terms, loans have collateral, entities have beneficial owners, and so on. A RAG system enhanced with knowledge graph capabilities can answer questions that span these relationships -- something pure vector search struggles with.

Tip

When evaluating alternative vector databases, do not choose based solely on technical features. Assess your team's operational readiness. Weaviate's hybrid search and Milvus's scale are compelling, but self-hosted databases require dedicated infrastructure expertise, monitoring, backup procedures, and incident response capabilities. If your organization does not have this operational maturity for a new database category, a managed service may deliver better outcomes despite fewer features.

Comparison at a Glance

Dimension	Weaviate	Milvus	Graphlit
Best for	Hybrid search (keyword + semantic)	Massive scale, on-premises	Relationship-based retrieval
Deployment	Cloud, self-hosted, embedded	Self-hosted, cloud (Zilliz)	Cloud service
Scale	Millions of vectors	Billions of vectors	Millions of entities + vectors
Open-source	Yes	Yes	No (proprietary)
Unique strength	Native hybrid search	Distributed architecture	Knowledge graph + vector
Operational burden	Moderate	High	Low (managed)
Banking fit	Regulatory doc search	Large-scale on-premises	Entity relationship queries

Quick Recap

Weaviate excels at hybrid search -- combining keyword and semantic retrieval in a single query, ideal for regulatory document search
Milvus is built for massive scale and full on-premises deployment, suited for banks with strict data residency and large data volumes
Graphlit combines knowledge graphs with vector search, enabling relationship-based retrieval across banking entities
The choice between these alternatives depends on deployment requirements, scale needs, query patterns, and operational readiness
Each offers something a vector-only managed service like Pinecone may not: built-in hybrid search (Weaviate), full on-premises deployment (Milvus and Weaviate), or knowledge graph retrieval (Graphlit)

KNOWLEDGE CHECK

Why is Weaviate's hybrid search particularly valuable for banking regulatory document retrieval?

Under what circumstances would Milvus be the strongest vector database choice for a banking institution?

How does Graphlit's knowledge graph approach differ from traditional vector-based RAG?