Ask HN: Why does my Node.js multiplayer game lag at 500 players despite low CPU usage?

3 points by jbryu 8 months ago
I'm hosting a turn-based multiplayer browser game on a single Hetzner CCX23 x86 cloud server (4 vCPU, 16GB RAM, 80GB disk). The backend is built with Node.js and Socket.IO and runs via Docker Swarm. I also use Traefik for load balancing.

Matchmaking uses a round-robin sharding approach: each room is always handled by the same backend instance, which lets me keep game state in memory and scale horizontally without Redis.

Here's the issue: at ~500 concurrent players across ~60 rooms (max 8 players/room), I see low CPU usage but high event loop lag. One feature in my game is typing during a player's turn: each throttled keystroke is broadcast to the other players in real time. If I remove this logic, I can handle 1000+ players without issue.

Scaling out backend instances on my single server doesn't help. I expected less load per backend instance to help, but I still hit the same limit at around 500 players. This suggests to me that the bottleneck isn't CPU or app logic but something deeper in the stack. I'm just not sure what.

Some server metrics at 500 players:

- CPU: 25% per core (according to htop)
- PPS (packets per second): ~3000 in / ~3000 out
- Bandwidth: ~100KBps in / ~800KBps out

Could 500 concurrent players just be a realistic upper bound for my single-server setup, or is something misconfigured? I know scaling out with new servers should fix the issue, but I wanted to check with the internet first to see if I'm missing anything. I'm new to multiplayer architecture, so any insight would be greatly appreciated.
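A minimal sketch of the room-pinning rule from the matchmaking paragraph, assuming new rooms are handed to backend instances in round-robin order and then remembered in a map so every later lookup for that room returns the same instance. `RoomRouter`, `instanceFor`, `release`, and the instance names are hypothetical; how Traefik actually forwards a socket to the chosen instance is not shown.

```javascript
// Hypothetical sketch: pin each room to one backend instance.
// New rooms are assigned round-robin; later lookups for the same room
// always return the same instance, so its game state can stay in memory.
class RoomRouter {
  constructor(instances) {
    if (instances.length === 0) {
      throw new Error("RoomRouter needs at least one instance");
    }
    this.instances = [...instances];
    this.nextIndex = 0;
    this.assignments = new Map(); // roomId -> instance name
  }

  // Return the instance that owns `roomId`, assigning one on first use.
  instanceFor(roomId) {
    let instance = this.assignments.get(roomId);
    if (instance === undefined) {
      instance = this.instances[this.nextIndex];
      this.nextIndex = (this.nextIndex + 1) % this.instances.length;
      this.assignments.set(roomId, instance);
    }
    return instance;
  }

  // Forget a finished room so entries do not accumulate forever.
  release(roomId) {
    this.assignments.delete(roomId);
  }
}

const router = new RoomRouter(["game-1", "game-2", "game-3"]);
console.log(router.instanceFor("room-a")); // game-1
console.log(router.instanceFor("room-b")); // game-2
console.log(router.instanceFor("room-a")); // game-1 (still pinned)
```

The map is per-process state, so this only illustrates the assignment rule, not how several routers would share it.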
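To see how the event loop lag mentioned in the post behaves on each backend instance, Node's built-in `perf_hooks.monitorEventLoopDelay()` histogram can be sampled periodically. A minimal sketch; the 20 ms resolution, the 5 s reporting interval, and the `startLagMonitor`/`summarizeLag` helpers are illustrative choices, not part of the original setup. The histogram reports values in nanoseconds.

```javascript
const { monitorEventLoopDelay } = require("node:perf_hooks");

// Convert the histogram's nanosecond readings into milliseconds.
function summarizeLag(histogram) {
  const toMs = (ns) => ns / 1e6;
  return {
    meanMs: toMs(histogram.mean),
    p99Ms: toMs(histogram.percentile(99)),
    maxMs: toMs(histogram.max),
  };
}

// Sample event loop delay and report every `intervalMs`.
// Returns a function that stops the monitor.
function startLagMonitor(report, intervalMs = 5000) {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();
  const timer = setInterval(() => {
    report(summarizeLag(histogram));
    histogram.reset(); // each report covers only the last interval
  }, intervalMs);
  timer.unref(); // don't keep the process alive just for monitoring
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}

// Usage: log per-instance lag next to the CPU numbers from htop.
const stopLagMonitor = startLagMonitor((s) => console.log("event loop lag (ms)", s));
```

Running one monitor per instance makes it possible to compare lag across instances at the same player count.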
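The throttled keystroke broadcast described in the post could look roughly like a per-player throttle with a trailing send: at most one update per interval goes out, and the latest text is never lost. `createTypingThrottle`, the 100 ms default, and the `broadcast` callback are hypothetical. In a Socket.IO server, `broadcast` might wrap something like `socket.to(roomId).emit(...)`, which sends to everyone in the room except the sender.

```javascript
// Hypothetical sketch: throttle one player's typing updates.
// At most one message per `intervalMs` reaches `broadcast`; keystrokes that
// arrive in between are collapsed so only the latest text is sent (trailing edge).
function createTypingThrottle(broadcast, intervalMs = 100) {
  let lastSentAt = -Infinity;
  let latest = null;
  let timer = null;

  function send(text) {
    lastSentAt = Date.now();
    broadcast(text);
  }

  return function onKeystroke(text) {
    const wait = lastSentAt + intervalMs - Date.now();
    if (wait <= 0 && timer === null) {
      send(text); // leading edge: nothing sent recently, send right away
      return;
    }
    latest = text; // overwrite: intermediate keystrokes are dropped
    if (timer === null) {
      timer = setTimeout(() => {
        timer = null;
        const pending = latest;
        latest = null;
        send(pending);
      }, Math.max(wait, 0));
    }
  };
}

// Usage with a stand-in broadcast function.
const onKeystroke = createTypingThrottle((text) => console.log("send:", text), 100);
onKeystroke("h");   // sent immediately
onKeystroke("he");  // collapsed
onKeystroke("hel"); // sent about 100 ms after "h"
```

Each call to `broadcast` still fans out to every other player in the room, so the interval bounds messages per typing player, not total outbound messages.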
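For a sense of scale, the server metrics listed in the post can be normalized per packet and per player. This is plain arithmetic on the quoted figures, assuming "KBps" means kilobytes per second with 1 KB = 1000 bytes; `perUnit` is a hypothetical helper.

```javascript
// Back-of-envelope averages computed only from the metrics quoted in the post.
// Assumption: "KBps" = kilobytes per second, with 1 KB = 1000 bytes.
function perUnit({ players, ppsIn, ppsOut, bytesInPerSec, bytesOutPerSec }) {
  return {
    bytesPerPacketIn: bytesInPerSec / ppsIn,
    bytesPerPacketOut: bytesOutPerSec / ppsOut,
    packetsPerPlayerIn: ppsIn / players,
    packetsPerPlayerOut: ppsOut / players,
  };
}

const stats = perUnit({
  players: 500,
  ppsIn: 3000,
  ppsOut: 3000,
  bytesInPerSec: 100_000,
  bytesOutPerSec: 800_000,
});
console.log(stats);
// bytesPerPacketIn ≈ 33.3, bytesPerPacketOut ≈ 266.7,
// packetsPerPlayerIn = 6, packetsPerPlayerOut = 6
```

At the reported load, each player accounts for roughly 6 packets per second in each direction, and outbound packets are on average about 8 times larger than inbound ones.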