

I've been using wandb quite a bit and have looked into neptune.ai and some open-source alternatives, but collaboration and version control (e.g. associating code snapshots with training runs) have always felt clunky to me. I was also thinking it'd be nice to have some kind of monitoring on my longer runs that alerts me when certain criteria are met, or even lets me stop or restart a run with modified hyperparameters remotely (potentially taking agentic actions), for example from my phone.

I'm curious what your experiences have been with these (and similar) AI developer platforms / observability layers, and what you've found lacking or what gripes you have with the existing solutions (if any). I've found the research process extremely painful and was wondering whether it's just me.

Research tooling experiences