Valori – a Python-native vector database I built from scratch
2 points • by varshith17 • 7 months ago
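I've been working on a project called Valori, a Python-native vector database I built from the ground up. The goal was not to reinvent every algorithm, but to wire together efficient, well-known indexing and search techniques into a cohesive, hackable framework.
The idea came from my frustration with existing vector DBs that were either too heavy for experimentation or too opaque to modify. I wanted something simple, modular, and extensible, so I built it.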
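What it does:
* Lets you store, index, and search high-dimensional vectors
* Supports multiple index types (Flat, HNSW, IVF, LSH, Annoy)
* Has in-memory, on-disk, and hybrid storage backends
* Includes a full document processing pipeline (parsing, cleaning, chunking, embedding)
* Offers quantization, persistence, and plugin-based extensibility
* Written entirely in Python, integrated with NumPy, and production-tested, with logging and monitoring built in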
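For anyone unfamiliar with the index names, "Flat" is the simplest of the bunch: compare the query against every stored vector and keep the closest ones. Here is a minimal sketch of that idea. It is not Valori's API; the `FlatIndex` class and its `add`/`search` methods are hypothetical names made up for illustration, and it uses only the standard library to stay dependency-free (a real implementation would likely vectorize this with NumPy).

```python
import heapq
import math


class FlatIndex:
    """Hypothetical exhaustive-search index (illustrative, not Valori's API)."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []

    def add(self, vector):
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.vectors.append(tuple(float(x) for x in vector))
        return len(self.vectors) - 1  # list position doubles as the vector's id

    def search(self, query, k=3):
        # Score every stored vector by Euclidean distance, keep the k smallest.
        scored = ((math.dist(query, v), i) for i, v in enumerate(self.vectors))
        return [(i, d) for d, i in heapq.nsmallest(k, scored)]


index = FlatIndex(dim=3)
for vec in ([1, 0, 0], [0, 1, 0], [0, 0, 1]):
    index.add(vec)

# Returns (id, distance) pairs, nearest first.
hits = index.search([0.9, 0.1, 0.0], k=2)
```

Exhaustive search is exact but costs one distance computation per stored vector on every query. Approximate indices such as HNSW, IVF, LSH, and Annoy trade a little recall to avoid that full scan.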
Install:
```bash
pip install valori
```
GitHub: [https://github.com/varshith-Git/valori](https://github.com/varshith-Git/valori)
PyPI: [https://pypi.org/project/valori](https://pypi.org/project/valori)
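I'd love to hear your thoughts:
* What's missing for you in current vector DBs?
* If you've built LLM or RAG systems, what do you wish a lightweight, pure Python DB like this handled better?
* Would you prefer tighter integrations (LangChain, Haystack, etc.) or a more "build-it-yourself" style?
Feedback, criticism, or collaboration ideas are all welcome.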
— Varshith
(varshith.gudur17@gmail.com)