Hacker News (Chinese edition)



Background: At RudderStack, I had been successfully using Postgres for our event streaming use case, scaled to 100k events/sec (note: there were good reasons to choose Postgres over Kafka). Nevertheless, we keep exploring opportunities to optimize, so my team and I started experimenting with Apache Pulsar for one part of our system: data ingestion. The alternative was dedicated Postgres databases per customer. One customer can have one or more Postgres databases, all of them master nodes with no ability to share data, so the data has to be migrated manually every time a scaling operation happens.

Now that we have been using Pulsar for quite some time, I feel I can share some notes on my experience replacing a Postgres-based streaming solution with Pulsar, and hopefully learn from your opinions and insights.

----

What I liked about Pulsar:

1. Tenant isolation is solid and auto load balancing works well: So far we haven't seen a chatty tenant affect the others. We use the same cluster to ingest the data of all our customers (one cluster per region: one in the US, one in the EU). Multi-tenancy combined with cluster auto-scaling allowed us to contain costs.
2. No more single points of failure (data is replicated across bookies): Data is now replicated to at least two bookies. This made us a lot more resilient to data loss.
3. Maintenance is easier: There is no single-master constraint anymore, which simplified a lot of the infra maintenance (imagine having to move a Postgres pod to a different EC2 node, which could lead to downtime).

----

What's painful about Pulsar:

1. StreamNative licensing costs were significant.
2. Network costs increased considerably with multi-AZ + replication.
3. The learning curve was steeper than expected, and debugging was more complex.

----

Would love to hear your experience with Postgres/Pulsar, and any opinions or insights on the approach and its challenges. I hope this dialogue helps others in the community. Feel free to ask me anything.
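
For readers who haven't used Pulsar, here is a minimal sketch of how per-customer isolation can show up in topic naming on a shared cluster. Pulsar topic names follow the form `persistent://<tenant>/<namespace>/<topic>`. Everything else here is an assumption for illustration and not taken from our setup: the one-tenant-per-customer mapping, the `ingest` namespace, the `events` topic, the customer IDs, and the helper name `ingestion_topic`.

```python
# Sketch: derive a per-customer Pulsar topic name on a shared cluster.
# Assumptions (hypothetical, not from the post): each customer maps to
# one Pulsar tenant; the "ingest" namespace and "events" topic names.


def ingestion_topic(customer_id: str, namespace: str = "ingest", topic: str = "events") -> str:
    """Return a fully qualified Pulsar topic name for one customer."""
    for part in (customer_id, namespace, topic):
        # Reject empty components and slashes, which would change the
        # tenant/namespace/topic structure of the name.
        if not part or "/" in part:
            raise ValueError(f"invalid topic component: {part!r}")
    # Pulsar topic names: persistent://<tenant>/<namespace>/<topic>
    return f"persistent://{customer_id}/{namespace}/{topic}"


if __name__ == "__main__":
    for customer in ("customer-a", "customer-b"):
        print(ingestion_topic(customer))
```

With the `pulsar-client` Python package, a string like this is what would be passed to `client.create_producer(...)`. The tenant and namespace have to exist on the cluster first.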

My experience using Apache Pulsar to solve PostgreSQL multi-tenancy pain points