Hacker News (Chinese edition)

I'm exploring a dev tool idea that helps you A/B test your prompts, but I'm not sure if there's a need for it. You'd be able to write and version your prompts in a web UI, then A/B test them and see results with metrics you define.

So for example, with a bot that writes cold outbound emails, you can verify whether v1 or v2 of your system prompt results in a better reply rate.

Does anybody currently do something like this or want something like this?
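To make the reply-rate example concrete, here is a minimal sketch of how the v1 vs. v2 comparison could be checked once the counts are in. It uses a standard two-proportion z-test, which is one reasonable choice and not necessarily what the tool would do. The function name `two_proportion_z_test` and the email/reply counts are hypothetical, made up purely for illustration.

```python
import math


def two_proportion_z_test(replies_a, sent_a, replies_b, sent_b):
    """Compare two reply rates; return (z, two-sided p-value).

    A positive z means variant B has the higher reply rate.
    """
    rate_a = replies_a / sent_a
    rate_b = replies_b / sent_b
    # Pooled rate under the null hypothesis that both prompts perform the same.
    pooled = (replies_a + replies_b) / (sent_a + sent_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if se == 0:
        # No variation at all (every email or no email got a reply).
        return 0.0, 1.0
    z = (rate_b - rate_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: emails sent and replies received per prompt version.
z, p = two_proportion_z_test(replies_a=40, sent_a=1000, replies_b=62, sent_b=1000)
print(f"v1 reply rate: {40 / 1000:.1%}, v2 reply rate: {62 / 1000:.1%}")
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the two prompt versions is unlikely to be noise, though with low reply rates it can take a lot of sent emails before the test has much power.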

Ask HN: Do you A/B test your LLM prompts?