Launch HN: Leaping (YC W25) – Self-improving voice AI

4 points by akyshnik 7 months ago
Hey HN, I'm Arkadiy from Leaping AI (https://leapingai.com). Leaping lets you build voice AI agents in a multi-stage, graph-like format that makes testing and improvement much easier. By evaluating each stage of a call, we can trace errors and regressions to a particular stage. Then we autonomously vary the prompt for that stage and A/B test it, allowing agents to self-improve over time.
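To make the graph format concrete, here is a minimal sketch of what a stage definition could look like. The `Stage` class, its fields, and the example stages are hypothetical illustrations, not our actual API:

    from dataclasses import dataclass, field

    @dataclass
    class Stage:
        # One node in the agent graph: its own prompt plus exit edges.
        name: str
        prompt: str
        transitions: dict[str, str] = field(default_factory=dict)  # intent -> next stage

    # A toy inbound-support agent. Each turn runs inside exactly one stage,
    # so eval results (and regressions) can be pinned to that stage.
    greeting = Stage(
        name="greeting",
        prompt="Greet the caller and ask how you can help.",
        transitions={"billing_question": "billing", "other": "triage"},
    )
    billing = Stage(
        name="billing",
        prompt="Resolve billing questions. Escalate refund requests over $100.",
        transitions={"resolved": "wrap_up", "needs_human": "handoff"},
    )

The key property is that a conversation is always "in" exactly one stage, which is what lets per-stage evals localize a failure.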
You can talk to one of our bots directly at https://leapingai.com, and there's a demo video at https://www.youtube.com/watch?v=xSajXYJmxW4.

Large companies are understandably reluctant to have AI start picking up their phone calls: the technology kind of works, but often not very well. If they do take the plunge, they often end up spending months tuning the prompts for just one use case, and sometimes never release the voice bot at all.

The problem is two-sided: it's non-trivial to specify exactly how a bot should behave using plain language, and it's tedious to ensure the LLM always follows your instructions the way you intended.

Existing voice AI solutions are a pain to set up for complex use cases. They require months of prompting for all the edge cases before going live, and then months of monitoring and prompt tuning afterwards. We do that better than human prompt engineers, and much faster, by running a continuous analysis + testing loop.

Our tech is roughly divided into three subcomponents: a core library, a voice server, and the self-improvement logic. The core library models and executes the multi-stage (think n8n-style) voice agents. For the voice server we use the ol' reliable cascade of STT -> LLM -> TTS. We tried out the voice-to-voice models, and although they felt really great to talk to, function-calling performance was, as expected, much worse, so we are still waiting for them to get better.
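For anyone unfamiliar with the cascading setup, a single turn flows roughly like this. The stubbed functions below stand in for whichever STT/LLM/TTS vendors you wire in; none of this is our real code:

    import asyncio

    # Stub vendor clients; in production these wrap real STT / LLM / TTS APIs.
    async def transcribe(audio: bytes) -> str:                     # STT step
        return "I'd like to check my order status."

    async def complete(stage_prompt: str, user_text: str) -> str:  # LLM step
        return "Sure, let me look that up for you."

    async def synthesize(text: str) -> bytes:                      # TTS step
        return text.encode()

    async def handle_turn(audio_chunk: bytes, stage_prompt: str) -> bytes:
        transcript = await transcribe(audio_chunk)        # speech -> text
        reply = await complete(stage_prompt, transcript)  # text -> text (+ tool calls)
        return await synthesize(reply)                    # text -> speech audio

    print(asyncio.run(handle_turn(b"<caller audio>", "You are a support agent.")))

In practice each step is streamed rather than awaited whole, but the shape of the pipeline is the same.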
The self-improvement works by first taking conversation metrics and evaluation results to produce "feedback", i.e. specific ideas for how the voice agent setup could be improved. Once enough feedback has been collected, we trigger a run of a specialized self-improvement agent: a Cursor-style AI with access to various tools for changing the main voice agent. It can rewrite prompts, configure a stage to use a summarized conversation instead of the full one, and more. Each iteration produces a new snapshot of the agent, letting us route a small share of traffic to it and promote it to production if things look OK. This loop can be set to run without any human involvement, which is what makes the agents self-improve.
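One pass of that loop, reduced to pseudocode-ish Python. The helper objects (`improver`, `router`), thresholds, and scoring are invented for illustration only:

    def self_improvement_pass(agent, feedback_items, improver, router,
                              min_feedback=20, canary_share=0.05):
        # Not enough signal yet: keep collecting feedback.
        if len(feedback_items) < min_feedback:
            return agent

        # A Cursor-style improver rewrites prompts / stage configs based on
        # the feedback, yielding a new immutable snapshot of the agent.
        candidate = improver.propose_variant(agent, feedback_items)

        # A/B test: route a small slice of live traffic to the candidate.
        results = router.run_split(baseline=agent, variant=candidate,
                                   share=canary_share)

        # Promote only if the candidate's eval scores beat the baseline.
        if results.variant_score > results.baseline_score:
            return candidate   # becomes the new production snapshot
        return agent           # keep the current version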
Leaping is use-case agnostic, but we currently focus on inbound customer support (travel, retail, real estate, etc.) and lead pre-qualification (Medicare, home services, performance marketing), since we have a lot of success stories there.

We started out in Germany, since that's where we were at university, but growth was initially challenging. We decided to target enterprise customers right away, and they were reluctant to adopt voice AI as the front-door "face" of their company. Additionally, for an enterprise with thousands of calls daily, it is infeasible to monitor all the calls and tune agents manually. To address their very valid concerns, we put all our effort into reliability, and we still haven't gotten around to offering self-serve access, which is one reason we don't have fixed pricing yet. (Also, with some clients we have outcome-based pricing, i.e. you pay nothing for calls that didn't convert a lead, only for the ones that did.)

Things have picked up momentum since we got into YC and moved to the US, but the cautious sentiment is present here too if you try to sell to big enterprises. We believe that doing evals, simulation, and A/B testing really, really well is our competitive edge, and what will enable us to solve large, sensitive use cases.

We'd love to hear your thoughts and feedback!