I Built an Open-Source, Offline ChatGPT Alternative in 40MB

By Raj Guru Yadav · 5 months ago

Like many developers, I've been fascinated by LLMs. But the moment I asked, "Can I run a ChatGPT-like assistant offline, fast, and without needing 16GB+ of RAM?", the challenge became too tempting to ignore.

The Goal

Build a fully offline, lightweight AI assistant with:

- < 50MB download size
- No internet requirement
- Fast responses (under 1 second)
- Zero telemetry
- Fully local embeddings & inference

Result: a 40MB offline ChatGPT clone you can run in-browser or from a USB stick.

What's Inside the 40MB?

Here's how I squeezed intelligent conversation into such a tiny package (minimal sketches of the inference, storage, and search pieces follow at the end of this post):

- Model: Mistral 7B Q4_K_M, quantized via llama.cpp
- Inference engine: llama.cpp (compiled to WebAssembly or native C++)
- UI: lightweight React/Tailwind interface
- Storage: IndexedDB for local chat history
- Embeddings: local MiniLM for smart PDF and note search
- Extras: Whisper.cpp for local voice input; Coqui TTS for speech output

Why I Built It

I (Raj Guru Yadav), a 16-year-old dev and student, wanted to:

- Learn deeply how LLMs actually work under the hood
- Build something privacy-respecting and local
- Prove that AI doesn't need the cloud to be powerful
- Give offline users (like many students in India) real AI support

Challenges

- Memory bottlenecks on low-RAM devices
- Prompt tuning to get smarter replies out of tiny models
- WebAssembly optimizations for browser performance
- Offline voice + text integration with small TTS/ASR models

Performance (on a 4GB laptop)

- Answers factual, coding, and math questions decently
- Reads and summarizes offline PDFs
- Remembers conversations locally
- (Optional) Speaks answers aloud

Final Thought

AI shouldn't be locked behind paywalls or clouds. My goal is to bring smart assistants into everyone's hands: fully offline, fully free, fully yours.

Made by Raj Guru Yadav
Dev | Builder of 700+ projects | Passionate about open AI for all
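The post ships no code, so here is a minimal TypeScript sketch of the browser inference path described above: fetch the quantized GGUF file once, hand it to a llama.cpp WebAssembly build, and stream tokens into the chat UI. The `LlamaModule` interface is a hypothetical stand-in for whatever WASM binding the author compiled, not a real API; only the `[INST]` prompt template reflects how Mistral 7B Instruct models actually expect input.

```typescript
// Hypothetical interface over a llama.cpp WebAssembly build; the real
// binding (whatever the author compiled via Emscripten) will differ.
interface LlamaModule {
  loadModel(gguf: ArrayBuffer): Promise<void>;
  // Streams generated tokens one at a time.
  generate(prompt: string, opts: { maxTokens: number }): AsyncIterable<string>;
}

// Mistral 7B Instruct expects its [INST] ... [/INST] chat template.
function mistralPrompt(userMessage: string): string {
  return `<s>[INST] ${userMessage} [/INST]`;
}

// Fetch the quantized GGUF once (from the app bundle or a locally
// served USB-stick path) and keep it resident in memory.
async function boot(llama: LlamaModule, modelUrl: string): Promise<void> {
  const buf = await (await fetch(modelUrl)).arrayBuffer();
  await llama.loadModel(buf);
}

// Stream the answer token by token so the UI stays responsive.
async function ask(
  llama: LlamaModule,
  question: string,
  onToken: (t: string) => void,
): Promise<void> {
  for await (const token of llama.generate(mistralPrompt(question), { maxTokens: 256 })) {
    onToken(token); // append to the chat view as tokens arrive
  }
}
```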
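The storage layer is named as plain IndexedDB, so that part can be sketched against the standard browser API. The database and store names below are illustrative, not the project's actual schema.

```typescript
// Local chat history in IndexedDB: one "messages" object store with an
// auto-incrementing key, so insertion order doubles as chat order.
interface ChatMessage {
  id?: number;
  role: 'user' | 'assistant';
  text: string;
  ts: number; // Unix timestamp in milliseconds
}

function openChatDB(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open('offline-chat', 1);
    req.onupgradeneeded = () => {
      req.result.createObjectStore('messages', { keyPath: 'id', autoIncrement: true });
    };
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

function saveMessage(db: IDBDatabase, msg: ChatMessage): Promise<void> {
  return new Promise((resolve, reject) => {
    const tx = db.transaction('messages', 'readwrite');
    tx.objectStore('messages').add(msg);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}

function loadHistory(db: IDBDatabase): Promise<ChatMessage[]> {
  return new Promise((resolve, reject) => {
    const req = db
      .transaction('messages', 'readonly')
      .objectStore('messages')
      .getAll();
    req.onsuccess = () => resolve(req.result as ChatMessage[]);
    req.onerror = () => reject(req.error);
  });
}
```

Everything stays on the device: no network call is made at any point, which is what makes the zero-telemetry claim cheap to uphold.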
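Finally, a sketch of the MiniLM-based PDF/note search: embed each text chunk once, then rank chunks against the query by cosine similarity. The `embed` function is a placeholder for whatever local MiniLM runtime the author used (the post doesn't say which); the ranking math itself is standard.

```typescript
// Placeholder for a local MiniLM embedder (e.g. an ONNX or WASM build);
// hypothetical, since the post doesn't name the runtime.
declare function embed(text: string): Promise<Float32Array>;

interface Chunk {
  text: string;
  vec: Float32Array; // precomputed embedding of `text`
}

// Cosine similarity between two vectors of equal length.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k chunks most similar to the query, best first.
async function search(query: string, chunks: Chunk[], k = 3): Promise<Chunk[]> {
  const qv = await embed(query);
  return [...chunks]
    .sort((x, y) => cosine(qv, y.vec) - cosine(qv, x.vec))
    .slice(0, k);
}
```

The top-ranked chunks can then be pasted into the model's prompt ahead of the user's question, which is the usual way a small local model is made to "read" a PDF.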