A lightweight, non-intrusive approach to website monitoring (an ops perspective)

2 points by marksugar 18 days ago
I'm a Linux ops engineer working in the DevOps/SRE space, and over the past few months I've been building a small *website monitoring* side project in my spare time: https://inostop.com/en/

Most of the monitoring and ops tools I've built before were used internally within companies. This is my first attempt to turn a relatively complete tool into something publicly usable.

In day-to-day operations, website monitoring usually involves:

- Infrastructure monitoring
- Application / API monitoring
- Partial CDN monitoring

These are often built on top of tools like Prometheus or Zabbix, combined with log systems (ELK / OpenObserve) and distributed tracing (OpenTelemetry). While powerful, this stack can feel *heavyweight and overkill* when you just want to quickly monitor a website's availability.

That led me to experiment with a simpler approach:

- Non-intrusive (no code changes or sidecars required)
- Out-of-band probing to estimate website availability
- Conservative thresholds to reduce false alarms

So far, the project covers:

- Domain and TLS certificate monitoring, plus Ping and Telnet checks
- Basic alert thresholds and multi-stage alert silencing to reduce alert fatigue

There are still open challenges:

- The UX of the website monitoring results still has room for improvement (the backend is written in Go).
- AI currently works only as an analysis layer on collected data, rather than actively performing real network probes.

This project is still evolving (I've rewritten parts of it more times than I'd like to admit).

If you'd like to try it out, there's an early access code *95f40841e4888668c4d5f7e88506075d*, valid for 1 month, mainly for collecting early feedback.

I'd love to hear feedback from the community:

- Does a lightweight, non-intrusive website monitoring approach make sense in practice?
- Are there better patterns or architectures worth exploring?
- If you're a QA or test engineer, I'd love to hear your thoughts.