Show HN: Cactus – Ollama for Smartphones

11 points · by HenryNdubuaku · 6 months ago
Hey HN, Henry and Roman here - we've been building a cross-platform framework for deploying LLMs, VLMs, embedding models and TTS models locally on smartphones.

Ollama enables deploying LLMs locally on laptops and edge servers; Cactus enables deploying them on phones. Deploying directly on phones makes it possible to build AI apps and agents that can use the phone without breaking privacy, supports real-time inference with no network latency, and we have already seen personalised RAG pipelines for users and more.

Apple and Google both recently moved into local AI models with the launch of Apple Foundation Frameworks and Google AI Edge respectively. However, both are platform-specific and only support each company's own models. To this end, Cactus:

- Is available in Flutter, React-Native & Kotlin Multiplatform for cross-platform developers, since most apps are built with these today.

- Supports any GGUF model you can find on Hugging Face: Qwen, Gemma, Llama, DeepSeek, Phi, Mistral, SmolLM, SmolVLM, InternVLM, Jan Nano, etc.

- Accommodates everything from FP32 down to 2-bit quantized models, for better efficiency and less device strain.

- Has MCP tool calls to make agents performant and truly helpful (set reminders, search the gallery, reply to messages) and more.

- Falls back to big cloud models for complex, constrained or large-context tasks, ensuring robustness and high availability.

It's completely open source. Would love to have more people try it out and tell us how to make it great!

Repo: https://github.com/cactus-compute/cactus
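
For readers curious how on-device inference with a cloud fallback might look in an app, here is a minimal, hypothetical TypeScript sketch against the React-Native binding. The class name `CactusLM`, its `init`/`completion` methods, the option names, and the model filename are illustrative assumptions, not the confirmed Cactus API; check the repo's docs and examples for the real interface.

```typescript
// Hypothetical sketch only: CactusLM, init(), completion(), release() and the
// option names below are illustrative assumptions, not the confirmed
// cactus-react-native API. See https://github.com/cactus-compute/cactus.
import { CactusLM } from 'cactus-react-native';

async function summarizeOnDevice(prompt: string): Promise<string> {
  // Load a small quantized GGUF model (per the post, any Hugging Face GGUF
  // should work), sized for phone-class hardware.
  const lm = await CactusLM.init({
    model: 'qwen2.5-0.5b-instruct-q4_k_m.gguf', // assumed local file name
    contextSize: 2048,
  });

  try {
    // Run the chat completion entirely on-device: no data leaves the phone.
    const result = await lm.completion(
      [{ role: 'user', content: prompt }],
      { maxTokens: 256, temperature: 0.7 },
    );
    return result.text;
  } catch (err) {
    // The post mentions falling back to big cloud models for complex or
    // large-context tasks; callCloudModel() is a placeholder for that path.
    return await callCloudModel(prompt);
  } finally {
    await lm.release(); // assumed cleanup call to free native memory
  }
}

// Placeholder for the cloud fallback path (provider of your choice).
async function callCloudModel(prompt: string): Promise<string> {
  throw new Error('wire up your cloud provider here');
}
```

The local-first, cloud-fallback split described in the post keeps private data on the device for the common case while preserving availability for prompts a heavily quantized phone-sized model cannot handle.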