I Built an Open-Source, Offline ChatGPT Alternative in 40MB

By Raj Guru Yadav · 5 months ago

Like many developers, I've been fascinated by LLMs. But the moment I asked, "Can I run a ChatGPT-like assistant offline, fast, and without needing 16GB+ of RAM?", the challenge became too tempting to ignore.

The Goal

Build a fully offline, lightweight AI assistant with:

- < 50MB download size
- No internet requirement
- Fast responses (under 1 second)
- Zero telemetry
- Fully local embeddings & inference

Result: a 40MB offline ChatGPT clone you can run in-browser or from a USB stick.

What's Inside the 40MB?

Here's how I squeezed intelligent conversation into such a tiny package (minimal sketches of the inference, storage, and search pieces follow at the end of this post):

- Model: Mistral 7B Q4_K_M, quantized via llama.cpp
- Inference engine: llama.cpp (compiled to WebAssembly or native C++)
- UI: lightweight React/Tailwind interface
- Storage: IndexedDB for local chat history
- Embeddings: local MiniLM for smart PDF and note search
- Extras: Whisper.cpp for local voice input; Coqui TTS for speech output

Why I Built It

I (Raj Guru Yadav), a 16-year-old dev and student, wanted to:

- Learn deeply how LLMs actually work under the hood
- Build something privacy-respecting and local
- Prove that AI doesn't need the cloud to be powerful
- Give offline users (like many students in India) real AI support

Challenges

- Memory bottlenecks on low-RAM devices
- Prompt tuning to get smarter replies out of tiny models
- WebAssembly optimizations for browser performance
- Offline voice + text integration with small TTS/ASR models

Performance (on a 4GB laptop)

- Answers factual, coding, and math questions decently
- Reads and summarizes offline PDFs
- Remembers conversations locally
- (Optional) Speaks answers aloud

Final Thought

AI shouldn't be locked behind paywalls or clouds. My goal is to bring smart assistants into everyone's hands: fully offline, fully free, fully yours.

Made by Raj Guru Yadav
Dev | Builder of 700+ projects | Passionate about open AI for all
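The post ships no code, so here is a minimal TypeScript sketch of the browser inference path described above: fetch the quantized GGUF file once, hand it to a llama.cpp WebAssembly build, and stream tokens into the chat UI. The `LlamaModule` interface is a hypothetical stand-in for whatever WASM binding the author compiled, not a real API; only the `[INST]` prompt template reflects how Mistral 7B Instruct models actually expect input.

```typescript
// Hypothetical interface over a llama.cpp WebAssembly build; the real
// binding (whatever the author compiled via Emscripten) will differ.
interface LlamaModule {
  loadModel(gguf: ArrayBuffer): Promise<void>;
  // Streams generated tokens one at a time.
  generate(prompt: string, opts: { maxTokens: number }): AsyncIterable<string>;
}

// Mistral 7B Instruct expects its [INST] ... [/INST] chat template.
function mistralPrompt(userMessage: string): string {
  return `<s>[INST] ${userMessage} [/INST]`;
}

// Fetch the quantized GGUF once (from the app bundle or a locally
// served USB-stick path) and keep it resident in memory.
async function boot(llama: LlamaModule, modelUrl: string): Promise<void> {
  const buf = await (await fetch(modelUrl)).arrayBuffer();
  await llama.loadModel(buf);
}

// Stream the answer token by token so the UI stays responsive.
async function ask(
  llama: LlamaModule,
  question: string,
  onToken: (t: string) => void,
): Promise<void> {
  for await (const token of llama.generate(mistralPrompt(question), { maxTokens: 256 })) {
    onToken(token); // append to the chat view as tokens arrive
  }
}
```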
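The storage layer is named as plain IndexedDB, so that part can be sketched against the standard browser API. The database and store names below are illustrative, not the project's actual schema.

```typescript
// Local chat history in IndexedDB: one "messages" object store with an
// auto-incrementing key, so insertion order doubles as chat order.
interface ChatMessage {
  id?: number;
  role: 'user' | 'assistant';
  text: string;
  ts: number; // Unix timestamp in milliseconds
}

function openChatDB(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open('offline-chat', 1);
    req.onupgradeneeded = () => {
      req.result.createObjectStore('messages', { keyPath: 'id', autoIncrement: true });
    };
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

function saveMessage(db: IDBDatabase, msg: ChatMessage): Promise<void> {
  return new Promise((resolve, reject) => {
    const tx = db.transaction('messages', 'readwrite');
    tx.objectStore('messages').add(msg);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}

function loadHistory(db: IDBDatabase): Promise<ChatMessage[]> {
  return new Promise((resolve, reject) => {
    const req = db
      .transaction('messages', 'readonly')
      .objectStore('messages')
      .getAll();
    req.onsuccess = () => resolve(req.result as ChatMessage[]);
    req.onerror = () => reject(req.error);
  });
}
```

Everything stays on the device: no network call is made at any point, which is what makes the zero-telemetry claim cheap to uphold.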
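Finally, a sketch of the MiniLM-based PDF/note search: embed each text chunk once, then rank chunks against the query by cosine similarity. The `embed` function is a placeholder for whatever local MiniLM runtime the author used (the post doesn't say which); the ranking math itself is standard.

```typescript
// Placeholder for a local MiniLM embedder (e.g. an ONNX or WASM build);
// hypothetical, since the post doesn't name the runtime.
declare function embed(text: string): Promise<Float32Array>;

interface Chunk {
  text: string;
  vec: Float32Array; // precomputed embedding of `text`
}

// Cosine similarity between two vectors of equal length.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k chunks most similar to the query, best first.
async function search(query: string, chunks: Chunk[], k = 3): Promise<Chunk[]> {
  const qv = await embed(query);
  return [...chunks]
    .sort((x, y) => cosine(qv, y.vec) - cosine(qv, x.vec))
    .slice(0, k);
}
```

The top-ranked chunks can then be pasted into the model's prompt ahead of the user's question, which is the usual way a small local model is made to "read" a PDF.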