Tell HN: Google increased the latency of its existing finetuned models by 5x

7 points by deaux, 7 months ago
Since 5 days ago, the latency of our finetuned 2.5 Flash models has suddenly jumped by 5x. For those less familiar, such finetuned models are often used to get close to the performance of a big model at one specific task with much less latency and cost. This means they're usually used for realtime, production use cases that see a lot of use and where you want to respond to the user quickly. Otherwise, finetuning generally isn't worth it. Many spend a few thousand dollars (at a minimum) on finetuning a model for one such task.

Five days ago, Google released Nano Banana Pro (Gemini 3.0 Image Preview) to the world. And since five days ago, the latency of our existing finetuned models has suddenly quintupled. We've talked with other startups who also make use of finetuned 2.5 Flash models, and they're seeing the exact same thing, even those in different regions. Obviously this has a big impact on all of our products.

From Google's side, nothing but silence, and this is with paid support. The reply to the initial support ticket was a request for basic information that had already been provided in that ticket or is trivially obvious. Since then, it's been more than 48 hours of nothingness.

Of course the timing could be a pure coincidence - though we've never seen any such latency instability before - but we can all see what's most likely here: Nano Banana Pro and Gemini 3 Preview are consuming a huge amount of compute, and they're simply sacrificing finetuned model output for those. It's impossible to take them seriously for business use after this; who knows what they'll do next time. For all their faults, OpenAI have been a bastion of stability, despite being the most B2C-focused of all the frontier model providers. Google with Vertex claims to be all about enterprise and then breaks the products of their business customers to get consumers their Ghibli images 1% faster.

They've surely gotten plenty of tickets about this, and given Google's engineering, they must have automated monitoring that catches such a huge latency increase immediately. Temporary outages are understandable and happen everywhere, see AWS and Cloudflare recently, but 5+ days - if they even fix it - of 5x latency is effectively a 5+ day outage of a service.

I'm posting this mostly as a warning to other startups here to not rely on Google Vertex for user-facing model needs going forward.