Ask HN: How do you programmatically evaluate whether an LLM's output sounds "too AI"?
1 point • by shubhamoriginx • 3 days ago
Hi HN,

I'm currently building Aaptics, a tool that helps founders draft content. The biggest engineering challenge hasn't been the infrastructure; it's getting the underlying models to stop sounding like a corporate robot (e.g., keeping them from using words like "delve" or "testament", or phrases like "in today's fast-paced landscape").

Right now, my pipeline uses a custom RAG setup that ingests a user's past writing, combined with heavy negative prompting and few-shot examples. Even so, the model still occasionally slips into that recognizable "ChatGPT tone."

For those of you building AI applications, how are you quantitatively evaluating the "humanness" of your outputs?

Are you using LLM-as-a-judge frameworks?

Relying on specific temperature/top_p tuning?

Hardcoding penalties for certain n-grams?

I'm aiming to finalize this pipeline before our mid-April launch and would appreciate any insights from folks who have solved this in production. aaptics.in/waitlist
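For concreteness, the hardcoded-penalty option could look something like this minimal sketch: score a draft by banned-phrase hits per 100 words and regenerate when it exceeds a threshold. The phrase list, function names, and threshold are all placeholders, not a production blocklist.

```python
import re

# Illustrative blocklist of "AI-tell" words and phrases (not exhaustive).
AI_TELLS = [
    "delve",
    "testament",
    "in today's fast-paced landscape",
    "it's important to note",
]

def ai_tell_score(text: str) -> float:
    """Banned-phrase hits per 100 words; higher means more 'ChatGPT tone'."""
    lowered = text.lower()
    hits = sum(len(re.findall(re.escape(p), lowered)) for p in AI_TELLS)
    words = max(len(text.split()), 1)
    return 100.0 * hits / words

def passes_gate(text: str, threshold: float = 0.5) -> bool:
    """Gate a generation before accepting it; retry the model if it fails."""
    return ai_tell_score(text) < threshold
```

A crude lexical gate like this obviously misses tone-level problems, which is why I'm also curious about LLM-as-a-judge approaches for the cases a blocklist can't catch.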