Show HN: ArchGW – An intelligent edge and service proxy designed for agents
1 point • by honorable_coder • 7 months ago
Hey HN!
This is Adil, Salman, and Jose, and we're behind archgw [1]: an intelligent proxy server designed as an edge and AI gateway for agents, one that natively knows how to handle prompts, not just network traffic. We've made several sweeping changes, so we're sharing the project again.
A bit of background on why we built this project. Building AI agent demos is easy, but creating something production-ready involves a lot of repetitive low-level plumbing work that everyone ends up doing. You're applying guardrails to make sure unsafe or off-topic requests don't get through. You're clarifying vague input so agents don't make mistakes. You're routing prompts to the right expert agent based on context or task type. You're writing integration code to quickly and safely add support for new LLMs. And every time a new framework hits the market or gets updated, you're validating or re-implementing that same logic, again and again.
Putting all that low-level plumbing code in a framework gets messy to manage and harder to update and scale. Low-level work isn't business logic. That's why we built archgw: an intelligent proxy server that handles prompts during ingress and egress and offers several related capabilities from a single software service. It lives outside your app runtime, so you can keep your business logic clean and focus on what matters. Think of it like a service mesh, but for AI agents.
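Because the gateway sits outside the app runtime, client code only has to point at the proxy rather than at any particular LLM provider. Here is a minimal sketch of what that looks like, assuming the gateway listens locally and exposes an OpenAI-style chat endpoint; the address, port, and path below are illustrative assumptions, not taken from the docs:

```python
import json
import urllib.request

# Hypothetical listener address; the real one comes from your archgw config.
ARCH_GATEWAY = "http://localhost:12000/v1/chat/completions"

def build_request(prompt: str) -> dict:
    """Build an OpenAI-style chat payload; the gateway handles actual routing."""
    return {
        "model": "gpt-4o",  # a preference hint; the gateway may route elsewhere
        "messages": [{"role": "user", "content": prompt}],
    }

def ask_agent(prompt: str) -> dict:
    """Send the prompt through the gateway instead of calling an LLM directly.
    Guardrails and routing are applied before it reaches an upstream model."""
    req = urllib.request.Request(
        ARCH_GATEWAY,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The point of the sketch is that swapping or adding upstream LLMs changes gateway config, not application code.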
Before building archgw, the team spent time building Envoy [2] at Lyft, API Gateway at AWS, and specialized NLP models at Microsoft Research, and worked on safety at Meta. archgw was born out of the belief that the rule-based, single-purpose tools that handle resiliency, processing, and routing of prompts should move into a dedicated infrastructure layer for agents, built on the battle-tested foundation of Envoy Proxy.
The intelligence in archgw comes from our fast, task-specific LLMs [3], which handle things like agent routing and handoff, guardrails, and preference-based intelligent LLM calling. Here are some additional details about the open-source project. archgw is written in Rust, and the request path has three main parts:
* The listener subsystem, which handles downstream (ingress) and upstream (egress) request processing.
* The prompt handler subsystem. This is where archgw makes decisions about the safety of an incoming request via its prompt_guard hooks and identifies where to forward the conversation via its prompt_target primitive.
* The model serving subsystem, the interface that hosts all the lightweight LLMs built into archgw and offers a framework for things like detecting hallucinations in those models.
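These pieces are wired together through declarative configuration rather than code. The fragment below is only an illustrative sketch: the top-level keys follow the prompt_guard and prompt_target primitives named above, but the field names and shape are assumptions on our part, so check the public docs [6] for the actual schema:

```yaml
listeners:
  - address: 0.0.0.0
    port: 12000            # illustrative port

prompt_guards:             # ingress safety checks (prompt_guard hooks)
  input_guards:
    jailbreak:
      on_error:
        message: "Sorry, I can't help with that request."

prompt_targets:            # where to forward the conversation
  - name: currency_exchange
    description: Get the exchange rate between two currencies
    parameters:
      - name: base
        type: str
        required: true
      - name: quote
        type: str
        required: true
    endpoint:
      name: exchange_api   # hypothetical backend service
      path: /v1/rates
```

The idea is that guardrails and routing live in config at the edge, so the agent code behind each target stays free of that plumbing.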
We loved building this open-source project, and we believe this infrastructure primitive will help developers build faster, safer, and more personalized agents without all the manual prompt engineering and systems integration work otherwise needed. We invite other developers to use and improve Arch. Please give it a shot and leave feedback here or on our Discord channel [4].
Also, here is a quick demo of the project in action [5]. You can check out our public docs [6]. Our models are also available on Hugging Face [7].
[1] <https://github.com/katanemo/archgw>
[2] <https://www.envoyproxy.io/>
[3] <https://huggingface.co/collections/katanemo/arch-function-66>
[4] <https://discord.com/channels/1292630766827737088/12926307682>
[5] <https://www.youtube.com/watch?v=I4Lbhr-NNXk>
[6] <https://docs.archgw.com/>
[7] <https://huggingface.co/katanemo>