Align Vector DB Choice to Infrastructure and Scale

If your app runs on PostgreSQL with under 10M vectors, install the pgvector extension for free: vectors join relational data in ACID transactions with no new infrastructure or sync lag. MongoDB Atlas Vector Search unifies embeddings, JSON documents, and metadata in one collection (HNSW indexing up to 4096 dimensions); the M0 free tier gives 512MB, Flex caps at $30/mo, and dedicated clusters start at $57/mo (M10), with one-click Voyage AI embeddings. Both options eliminate dual writes and infrastructure sprawl, ideal for full-stack apps where vectors augment operational data.
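The pattern pgvector enables is `CREATE EXTENSION vector`, a `vector(n)` column, and `ORDER BY embedding <-> $1` for nearest-neighbor ordering, all inside ordinary Postgres transactions. As a rough stdlib analogue of that "row and embedding committed together" idea, here is a toy sqlite3 sketch; real pgvector orders by distance natively and with an index, so the Python-side sort below is only illustrative:

```python
import json
import sqlite3

def l2(a, b):
    # Squared Euclidean distance; pgvector's <-> operator orders by L2 distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, embedding TEXT)")

# Relational row and embedding land in ONE transaction: no dual write, no sync lag.
with conn:
    conn.executemany(
        "INSERT INTO docs (body, embedding) VALUES (?, ?)",
        [("about cats", json.dumps([1.0, 0.0])),
         ("about dogs", json.dumps([0.0, 1.0]))],
    )

def nearest(query_vec, k=1):
    rows = conn.execute("SELECT body, embedding FROM docs").fetchall()
    rows.sort(key=lambda r: l2(query_vec, json.loads(r[1])))
    return [body for body, _ in rows[:k]]

print(nearest([0.9, 0.1]))  # → ['about cats']
```

The point of the sketch is the transaction boundary: with a separate vector store, the INSERT and the vector upsert would be two writes that can drift apart.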

For billion-scale RAG and agentic workloads with no DevOps burden, Pinecone's serverless SaaS handles billions of vectors (Rust engine, multi-tenant isolation). Tiers: free Starter; $20/mo Builder (new for 2026, aimed at solo developers); Standard from $50/mo; Enterprise from $500/mo. Add-ons: BYOC on AWS/GCP/Azure, Inference for hosted embeddings and rerankers, Assistant for chat agents, and Dedicated Read Nodes for read-heavy loads. Milvus OSS / Zilliz Cloud targets 100B+ vectors with the Cardinal engine (10x throughput, 3x faster indexing vs. HNSW) and GPU acceleration; it pairs well with Kafka/Spark but adds operational overhead for metadata and object storage.
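Multi-tenant isolation in serverless stores is typically expressed through namespaces: every upsert and query is scoped to one tenant's partition, so tenants never see each other's vectors. A hypothetical pure-Python sketch of the pattern (the `TenantIndex` class and its method names are illustrative, not Pinecone's client API):

```python
class TenantIndex:
    """Toy namespace-partitioned index: each tenant's vectors live in an
    isolated partition, and queries never cross namespace boundaries."""

    def __init__(self):
        self.namespaces = {}  # namespace -> {vector id: vector}

    def upsert(self, namespace, vec_id, vector):
        self.namespaces.setdefault(namespace, {})[vec_id] = vector

    def query(self, namespace, vector, top_k=3):
        # Only this namespace's partition is ever scanned.
        part = self.namespaces.get(namespace, {})
        scored = sorted(
            part.items(),
            key=lambda kv: sum((a - b) ** 2 for a, b in zip(vector, kv[1])),
        )
        return [vec_id for vec_id, _ in scored[:top_k]]

idx = TenantIndex()
idx.upsert("tenant-a", "a1", [1.0, 0.0])
idx.upsert("tenant-b", "b1", [1.0, 0.0])
# tenant-a's query only sees tenant-a's data, even with identical vectors.
print(idx.query("tenant-a", [1.0, 0.0]))  # → ['a1']
```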

Self-host Qdrant (Rust-native, 29k GitHub stars) for the best price-performance up to 50M vectors on a $30-50/mo VPS: composable queries fuse dense and sparse vectors, filters, and custom scoring; the free tier offers 1GB RAM / 4GB disk with no credit card required, and it deploys to the edge. Weaviate excels at hybrid search (BM25 keywords + dense vectors + filters in one query, plus multimodal text/images/audio): Flex starts at $45/mo (post-Oct 2025, replacing the retired $25 tier), Plus at $280/mo on annual billing, and embedding models swap in modularly.
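Hybrid search has to merge two rankings, keyword (BM25) and dense-vector, into one result list. One standard way to do that, which Weaviate supports as ranked fusion, is reciprocal rank fusion (RRF): each document scores the sum of 1/(k + rank) over the rankings it appears in. A minimal stdlib sketch (k=60 is the conventional default constant):

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over rankings of 1/(k + rank).
    `rankings` is a list of ranked lists of doc ids, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["d3", "d1", "d7"]  # keyword ranking
vector_hits = ["d1", "d5", "d3"]  # dense-vector ranking
print(rrf([bm25_hits, vector_hits]))  # → ['d1', 'd3', 'd5', 'd7']
```

Documents found by both retrievers (d1, d3) accumulate score from each list and rise to the top, which is exactly the behavior you want from a hybrid query.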

Tradeoffs: Prototyping Speed vs Production Scale

Prototype LLM apps fastest with Chroma OSS (embedded or server): intuitive API, high-recall ANN, no DB expertise needed; Cloud Starter is $0 plus usage, Team is $250/mo plus usage, and it suits scaffolding at small-to-medium scale. LanceDB (OSS and cloud) runs serverless on S3/GCS (the Lance columnar format filters on disk with no memory overhead), is AWS-validated for billion-scale elastic queries, and handles multimodal retrieval across text, images, and structured data.
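The appeal of an embedded store is a two-call workflow: add documents, then query. A toy in-memory sketch of that shape (illustrative only, not Chroma's actual client; a real store uses an ANN index and a learned embedding model rather than the bag-of-words stand-in below):

```python
import math
from collections import Counter

def embed(text):
    # Stand-in embedder: bag-of-words counts (a real store would call a model).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyCollection:
    """Minimal add/query surface in the style of an embedded vector store."""

    def __init__(self):
        self.docs = {}  # id -> (text, embedding)

    def add(self, ids, documents):
        for i, doc in zip(ids, documents):
            self.docs[i] = (doc, embed(doc))

    def query(self, query_text, n_results=2):
        q = embed(query_text)
        ranked = sorted(self.docs.items(),
                        key=lambda kv: cosine(q, kv[1][1]), reverse=True)
        return [i for i, _ in ranked[:n_results]]

col = ToyCollection()
col.add(ids=["1", "2"],
        documents=["vector databases store embeddings",
                   "bananas are yellow fruit"])
print(col.query("embeddings database", n_results=1))  # → ['1']
```

This is the whole lifecycle a prototype needs, which is why embedded stores win on scaffolding speed.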

Skip full databases for research and custom pipelines: use the Faiss library (Meta AI, CUDA GPU support) with IVF/HNSW/PQ indexes, tuning nlist/nprobe to trade speed against accuracy, but plan to build your own persistence and query API.
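The nlist/nprobe tradeoff is easiest to see in a toy inverted-file (IVF) index: vectors are bucketed under nlist coarse centroids, and a query scans only the nprobe closest buckets. A stdlib sketch of the idea (not Faiss's API; Faiss additionally trains centroids with k-means and can compress vectors with PQ):

```python
def l2(a, b):
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

class ToyIVF:
    """Toy inverted-file index: coarse-quantize vectors into nlist buckets,
    then search only the nprobe closest buckets at query time."""

    def __init__(self, centroids):
        self.centroids = centroids             # nlist coarse centroids
        self.lists = [[] for _ in centroids]   # one posting list per centroid

    def add(self, vecs):
        for i, v in enumerate(vecs):
            c = min(range(len(self.centroids)),
                    key=lambda j: l2(v, self.centroids[j]))
            self.lists[c].append((i, v))

    def search(self, q, k, nprobe):
        # Larger nprobe scans more buckets: higher recall, slower queries.
        probes = sorted(range(len(self.centroids)),
                        key=lambda j: l2(q, self.centroids[j]))[:nprobe]
        cand = [(l2(q, v), i) for p in probes for i, v in self.lists[p]]
        return [i for _, i in sorted(cand)[:k]]

centroids = [[0.0, 0.0], [10.0, 10.0]]  # nlist = 2
vecs = [[0.1, 0.0], [0.0, 0.2], [10.1, 9.9], [9.8, 10.2]]
idx = ToyIVF(centroids)
idx.add(vecs)
# nprobe=1 scans only the nearest bucket: fast, but can miss neighbors
# that landed in an unscanned bucket.
print(idx.search([0.0, 0.1], k=2, nprobe=1))  # → [1, 0]
```

Setting nprobe equal to nlist degenerates to exact brute-force search; Faiss's tuning knobs move you along exactly this curve.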

DB            | Max Scale | Start Price | Key Tradeoff
--------------|-----------|-------------|-----------------------------
Pinecone      | Billions  | Free / $20  | Managed SaaS
Milvus/Zilliz | 100B+     | OSS free    | GPU scale, ops complexity
Qdrant        | 50M       | Free tier   | $30-50/mo price-perf leader
Weaviate      | Large     | $45         | Hybrid search native
pgvector      | Millions  | Free        | Postgres only
Mongo Atlas   | Millions  | $0-30       | Document unification
Chroma        | Small-Med | Free / $0+  | Dev speed, not extreme scale
LanceDB       | Large     | Free        | S3 serverless
Faiss         | Custom    | Free        | Library, no ops