


I understand that using tools like Chat, Cursor, and Claude Code for software development is likely providing training data to help these LLMs get better at coding (the irony isn't lost on me that I might be contributing to making myself obsolete...)

But I'm curious about the actual mechanics: How exactly does this feedback loop work? When I accept, reject, or modify the code that these models spit out, is that signal fed directly back into training?

Not necessarily against this, just genuinely curious about how the sausage is made.

Ask HN: How does my use of LLMs train the underlying models?