We’re sharing KernelEvolve, an agentic system we built at Meta to automatically generate and evolve high-performance kernels across heterogeneous AI accelerators.

The core motivation is that modern AI stacks increasingly depend on hand-optimized kernels (GEMM, attention, reductions, fused ops), but writing and tuning them for each hardware target (NVIDIA GPUs, AMD GPUs, custom accelerators like MTIA) does not scale.

KernelEvolve treats kernel programming as a search + evolution problem:
• An LLM generates candidate kernels (e.g., Triton-like code)
• Kernels are compiled, benchmarked, and validated on real hardware
• Performance feedback is used to evolve better variants over many iterations
• The system scales evaluation across large fleets and multiple accelerator types

Unlike one-shot code generation, KernelEvolve continuously improves kernels using closed-loop, hardware-in-the-loop feedback, and can discover non-obvious optimizations that rival or exceed expert-written code.

In the paper we describe:
• The agent architecture and search space design
• How we scale kernel evaluation efficiently across heterogeneous accelerators
• Case studies showing performance gains over hand-tuned baselines
• Practical lessons from deploying this system in production ML workloads

Paper (arXiv): https://arxiv.org/abs/2512.23236 (66 pages)
LinkedIn: https://www.linkedin.com/posts/gangliao_excited-to-share-our-recent-work-on-kernelevolve-activity-7411781675740897280-AQth?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAzsrfsBRed-BvPAGqq9FgvVZ-v6F-sG4SM

We’d love feedback from folks working on compilers, kernels, ML systems, or agentic approaches to code generation.
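For readers who want a concrete picture of the generate → validate → benchmark → select loop, here is a toy sketch. It is not KernelEvolve's implementation, and every name in it is hypothetical. The LLM proposer is replaced by random mutation of a tile size. The "kernel" is a blocked sum reduction in plain Python. On-device benchmarking is replaced by a made-up cost model (`simulated_latency`) that pretends 128-element tiles are optimal. The sketch mainly illustrates the ordering: each candidate passes a correctness gate against a trusted reference before it is timed, and only the fastest valid variants become parents for the next round.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical search space: power-of-two tile sizes 4..1024 for a blocked reduction.
BLOCK_SIZES = [2 ** k for k in range(2, 11)]


@dataclass(frozen=True)
class Candidate:
    """A kernel variant described only by its tuning knobs (illustrative only)."""
    block_idx: int           # index into BLOCK_SIZES
    skip_tail: bool = False  # models a plausible generated bug: dropping the last partial tile

    @property
    def block_size(self) -> int:
        return BLOCK_SIZES[self.block_idx]


def run_kernel(c: Candidate, x: Sequence[float]) -> float:
    """Blocked sum: one partial sum per tile, then a final sum over the partials."""
    n = len(x)
    end = n - n % c.block_size if c.skip_tail else n
    partials = [sum(x[i:min(i + c.block_size, end)]) for i in range(0, end, c.block_size)]
    return sum(partials)


def propose(parent: Candidate, rng: random.Random) -> Candidate:
    """HYPOTHETICAL stand-in for the LLM: nudge the tile size and sometimes inject a bug."""
    idx = min(max(parent.block_idx + rng.choice([-1, 1]), 0), len(BLOCK_SIZES) - 1)
    return Candidate(block_idx=idx, skip_tail=rng.random() < 0.2)


def is_valid(c: Candidate, x: Sequence[float], tol: float = 1e-6) -> bool:
    """Correctness gate: compare the candidate's output with a trusted reference."""
    return math.isclose(run_kernel(c, x), math.fsum(x), rel_tol=tol, abs_tol=tol)


def simulated_latency(c: Candidate) -> float:
    """HYPOTHETICAL cost model standing in for a real hardware benchmark (128 is 'best')."""
    return 1.0 + abs(math.log2(c.block_size) - 7)


def evolve(
    x: Sequence[float],
    measure: Callable[[Candidate], float] = simulated_latency,
    generations: int = 30,
    population: int = 4,
    children_per_gen: int = 8,
    seed: int = 0,
) -> Tuple[Candidate, float]:
    rng = random.Random(seed)
    baseline = Candidate(block_idx=0)  # naive starting kernel: smallest tile
    scored: List[Tuple[float, Candidate]] = [(measure(baseline), baseline)]
    for _ in range(generations):
        parents = [c for _, c in scored[:population]]
        for _ in range(children_per_gen):
            child = propose(rng.choice(parents), rng)
            if not is_valid(child, x):  # reject incorrect kernels before timing them
                continue
            scored.append((measure(child), child))
        # Keep the fastest distinct valid variants as the next generation's parents.
        scored = sorted(set(scored), key=lambda t: t[0])[:population]
    best_latency, best = scored[0]
    return best, best_latency
```

Usage: `best, latency = evolve([float(i % 7) for i in range(1001)])` returns the 128-element tile variant with the buggy `skip_tail` variants filtered out. In a real system, `measure` would compile the candidate and time it on the target accelerator. The inner loop is also the natural place to fan evaluations out to a pool of machines.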

KernelEvolve: Agentic Kernel Coding for Heterogeneous AI Accelerators (Meta)