Show HN: Arch-Router – 1.5B model for LLM routing by preferences, not benchmarks
13 points • by adilhafeez • 6 months ago
Hi HN, we're the team behind Arch (<https://github.com/katanemo/archgw>), an open-source proxy for LLMs written in Rust. Today we're releasing Arch-Router (<https://huggingface.co/katanemo/Arch-Router-1.5B>), a 1.5B router model for preference-based routing, now integrated into the proxy. As teams integrate multiple LLMs, each with different strengths, styles, or cost/latency profiles, routing the right prompt to the right model becomes a critical part of application design. But it's still an open problem. Most routing systems fall into two camps:
* Embedding-based routers use intent classifiers: label a prompt as "support," "SQL," or "math," then route it to a matching model. This works for simple tasks but breaks down in real conversations: users shift topics mid-conversation, task boundaries blur, and product changes require retraining the classifier (see the sketch after this list).
* Performance-based routers pick models based on benchmarks like MMLU or MT-Bench, or on latency and cost curves. But benchmarks often miss what matters in production: domain-specific quality, or subjective preferences like "Will legal accept this clause?"
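To make the first failure mode concrete, here is a minimal sketch of the intent-classifier pattern. It is not taken from any particular product: the label set, the model names, and the route() helper are all hypothetical.

    # Hypothetical intent-classifier router: embed the prompt, pick the
    # nearest intent label, route to a hard-coded model for that label.
    from sentence_transformers import SentenceTransformer, util

    INTENTS = {                    # label -> model; every product change
        "support": "gpt-4o-mini",  # here means relabeling and retraining
        "sql": "gpt-4o",
        "math": "gemini-flash",
    }

    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    intent_vecs = {label: encoder.encode(label) for label in INTENTS}

    def route(prompt: str) -> str:
        v = encoder.encode(prompt)
        # nearest label wins; a mid-conversation topic shift or a prompt
        # that straddles two intents silently lands on the wrong model
        best = max(INTENTS, key=lambda lab: float(util.cos_sim(v, intent_vecs[lab])))
        return INTENTS[best]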
Arch-Router takes a different approach: route by preferences written in plain language. You write rules like "contract clauses → GPT-4o" or "quick travel tips → Gemini Flash." The router maps the prompt (and conversation context) to those rules using a lightweight 1.5B autoregressive model. No retraining, no fragile if/else chains. We built this with input from teams at Twilio and Atlassian. It handles intent drift, supports multi-turn conversations, and lets you swap models in or out with a one-line change to the routing policy. Full details are in our paper (<https://arxiv.org/abs/2506.16655>), but here's a snapshot:
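At inference time the idea looks roughly like this. A hedged sketch, not the proxy's actual integration: the prompt layout and route names below are placeholders (the Hugging Face model card documents the real template the model was trained on), and it assumes the transformers library.

    # Sketch: ask the router model which plain-language route a turn matches.
    # NOTE: this prompt layout is illustrative only; see the model card for
    # katanemo/Arch-Router-1.5B for the actual template.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "katanemo/Arch-Router-1.5B"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    routes = ('{"contract_clauses": "drafting or reviewing contract clauses", '
              '"travel_tips": "quick travel tips and itineraries"}')
    conversation = "user: Can you tighten the indemnification clause below?"

    prompt = f"Routes: {routes}\nConversation: {conversation}\nRoute:"
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=16)
    # the model autoregressively emits the name of the matching route, e.g.
    # "contract_clauses"; the proxy then looks up which LLM that route maps to
    print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))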
Specs:
* 1.5B params: runs on a single GPU (or CPU for testing)
* No retraining needed: point it at any mix of LLMs (see the policy sketch after this list)
* Cost and latency aware: route heavy tasks to expensive models and light tasks to faster/cheaper ones
* Outperforms larger closed-source models on our conversational routing benchmarks (details in the paper)
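To show what a "one-line change" to the routing policy means in practice, here is a hypothetical preference policy; the field names are ours for illustration, not archgw's actual config schema (the repo has the real one).

    # Hypothetical preference policy: each route pairs a plain-language
    # description (what the router model matches against) with a target model.
    POLICY = {
        "contract_clauses": {
            "description": "drafting or reviewing contract clauses",
            "model": "gpt-4o",        # swap models by editing this one line
        },
        "travel_tips": {
            "description": "quick travel tips and itineraries",
            "model": "gemini-flash",  # light task -> faster/cheaper model
        },
    }

    def dispatch(route_name: str) -> str:
        # the router picked route_name; the policy decides which LLM serves it
        return POLICY[route_name]["model"]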
Links:
* Arch proxy (open source): <https://github.com/katanemo/archgw>
* Model + code: <https://huggingface.co/katanemo/Arch-Router-1.5B>
* Paper: <https://arxiv.org/abs/2506.16655>