Hacker News (Chinese edition)



Curious how event-driven technologies like Kafka (or alternatives) fit into the backend and/or infrastructure of large LLM providers.

Some of the questions I have in mind:

1. How do large LLM providers handle the flow of training data, evaluation results and human feedback? Are these managed through event streams (like Kafka) for real-time processing, or do they rely more on batch processing and traditional ETL pipelines?

2. For complex ML pipelines with dependencies (e.g. data ingestion -> preprocessing -> training -> evaluation -> deployment), do they use event-driven orchestration where each stage publishes a completion event, or do they use traditional workflow orchestrators like Airflow with polling-based dependency management?

3. How do they handle real-time performance monitoring and safety signals? Are these event-driven systems that can trigger immediate responses (like model rollbacks), or are they primarily batch analytics with delayed reactions?

I'm basically trying to understand how far the event-driven paradigm fits into modern AI infra, and I would love any high-level insights from anyone who is (or has been) working with it.
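To make question 2 concrete, here is a toy sketch of what "each stage publishes a completion event" could look like. It is only an illustration under stated assumptions: `EventBus` is a hypothetical in-memory stand-in for a broker such as Kafka (not a real client API), the topic names and `run_pipeline` are invented for this example, and nothing here claims to reflect how any actual LLM provider runs its pipelines.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical in-memory stand-in for a message broker (not a Kafka client)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Deliver synchronously to every handler registered for this topic.
        for handler in list(self._subscribers[topic]):
            handler(event)


# Stage names taken from the pipeline in question 2.
STAGES = ["ingestion", "preprocessing", "training", "evaluation", "deployment"]


def run_pipeline(run_id: str) -> List[str]:
    """Run the stages in order, driven only by completion events."""
    bus = EventBus()
    completed: List[str] = []

    def make_stage(name: str) -> Handler:
        def handler(event: dict) -> None:
            # Real work for the stage would happen here.
            completed.append(name)
            # Announce completion; whoever depends on this stage reacts to it.
            bus.publish(f"{name}.completed", {"run_id": event["run_id"], "stage": name})
        return handler

    # The first stage reacts to the start event; every later stage reacts
    # to the completion event of the stage before it.
    bus.subscribe("pipeline.started", make_stage(STAGES[0]))
    for upstream, downstream in zip(STAGES, STAGES[1:]):
        bus.subscribe(f"{upstream}.completed", make_stage(downstream))

    bus.publish("pipeline.started", {"run_id": run_id})
    return completed


if __name__ == "__main__":
    print(run_pipeline("run-1"))
```

The contrast with the polling approach is that no stage ever asks "is my upstream done yet?"; each one only reacts when the upstream event arrives. A polling orchestrator would instead check upstream task state on a schedule.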

Ask HN: How are Kafka or event-driven systems used in LLM infrastructure?