Valori – A Python-native vector database I built from scratch

2 points by varshith17, 7 months ago

I’ve been working on a project called Valori, a Python-native vector database I built from the ground up. I didn't reinvent every algorithm; instead I wired together efficient, well-known indexing and search techniques into a cohesive, hackable framework.

The idea came from my frustration with existing vector DBs that were either too heavy for experimentation or too opaque to modify. I wanted something simple, modular, and extensible, so I built it.

What it does:

* Lets you store, index, and search high-dimensional vectors
* Supports multiple indices (Flat, HNSW, IVF, LSH, Annoy)
* Has memory, disk, and hybrid storage backends
* Includes a full document processing pipeline (parsing, cleaning, chunking, embedding)
* Offers quantization, persistence, and plugin-based extensibility
* All written in Python, integrated with NumPy, and production-tested with logging and monitoring built in

Install:

```bash
pip install valori
```

GitHub: https://github.com/varshith-Git/valori

PyPI: https://pypi.org/project/valori

I’d love to hear your thoughts:

* What’s missing for you in current vector DBs?
* If you’ve built LLM or RAG systems, what do you wish a lightweight, pure Python DB like this handled better?
* Would you prefer tighter integrations (LangChain, Haystack, etc.) or a more “build-it-yourself” style?

Feedback, criticism, or collaboration ideas are all welcome.

Varshith (varshith.gudur17@gmail.com)
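The post doesn't show Valori's Python API. As a rough illustration of the simplest index it lists (Flat), here is a minimal NumPy sketch of exact cosine-similarity search. `FlatIndex`, `add`, and `search` are hypothetical names chosen for this example and are not Valori's actual interface. A flat index compares the query against every stored vector. Approximate indices such as HNSW, IVF, LSH, and Annoy exist to avoid that full O(n·d) scan per query.

```python
import numpy as np


class FlatIndex:
    """Hypothetical exact (brute-force) cosine-similarity index.

    Illustrative sketch only; this is NOT Valori's API.
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._vectors = np.empty((0, dim), dtype=np.float32)
        self._ids: list[str] = []

    def add(self, ids: list[str], vectors: np.ndarray) -> None:
        vectors = np.asarray(vectors, dtype=np.float32)
        if vectors.ndim != 2 or vectors.shape[1] != self.dim:
            raise ValueError(f"expected shape (n, {self.dim}), got {vectors.shape}")
        if len(ids) != len(vectors):
            raise ValueError("ids and vectors must have the same length")
        # Normalize once at insert time so search reduces to a dot product.
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        normalized = vectors / np.maximum(norms, 1e-12)
        self._vectors = np.vstack([self._vectors, normalized])
        self._ids.extend(ids)

    def search(self, query: np.ndarray, k: int = 5) -> list[tuple[str, float]]:
        query = np.asarray(query, dtype=np.float32)
        query = query / max(float(np.linalg.norm(query)), 1e-12)
        scores = self._vectors @ query  # cosine similarity to every stored vector
        k = min(k, len(scores))
        if k == 0:
            return []
        # argpartition selects the top-k without a full sort; then order just those k.
        top = np.argpartition(-scores, k - 1)[:k]
        top = top[np.argsort(-scores[top])]
        return [(self._ids[i], float(scores[i])) for i in top]
```

Usage: after `index = FlatIndex(dim=3)` and `index.add(["a", "b", "c"], np.array([[1, 0, 0], [0, 1, 0], [0.9, 0.1, 0]]))`, the call `index.search(np.array([1.0, 0, 0]), k=2)` ranks `"a"` first and `"c"` second. Normalizing at insert time trades a little write-time work for cheaper queries, because each query then needs only one matrix-vector product.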