Ask HN: Am I being served an ARG via user agent logs?
2 points • by SpecialistK • 3 days ago
I'm here looking through logs on my unnamed reverse proxy and CDN service. The crawler bot swarm has been hitting my PHP application like I've upset them, so I'm checking which weird user agent strings are being allowed to connect. There's "Sogou" and "meta-webindexer", plus a small number of requests from "SleepBot/1.0".

What's SleepBot?

The ASN is Google and the UA string is: "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; SleepBot/1.0; +http //sleepbot com/) Chrome/131.0.0.0 Safari/537.36" [edited to make link non-clickable]

So I visit the site. It looks like the homepage of an interesting tech and ambient music guy who is still running a Shoutcast online radio stream but otherwise hasn't been seen online in 5 years. The Wayback Machine shows few changes in over a decade. But the resume link brings up a GitHub account with a different URL and username, which reported 1 issue in March of this year. It goes deeper.

What's going on? Is a Google or adjacent employee running a personal scraper, or just using a custom UA string while browsing the web? Did someone make a typo? Or is it some kind of weird security game / ARG, and I'm the sap who's taken the bait?
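For anyone who wants to do the same kind of log triage, here's a minimal Python sketch that tallies user agents and collapses crawlers declaring themselves in a "compatible; Name/Version" group (as the SleepBot/1.0 string does) down to that token. It assumes the Apache/nginx "combined" log format, since the post's proxy/CDN is unnamed and its real export format may differ. The regexes, function names, and sample log lines are all illustrative inventions, not any vendor's tooling. Keep in mind that a UA string is self-reported and trivially spoofed, so this only shows what clients claim to be. It does no ASN or IP ownership lookup.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# ASSUMPTION: Apache/nginx "combined" log format, where the user agent is the
# last double-quoted field. The poster's unnamed proxy/CDN may log differently.
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Heuristic only: many crawlers name themselves in a "compatible; Name/Version"
# group. Bots that don't (e.g. plain "Name/Version" UAs) won't match this.
BOT_TOKEN = re.compile(r"compatible;\s*([A-Za-z][\w.-]*)/([\w.]+)")


def bot_name(user_agent: str) -> Optional[str]:
    """Return the 'Name/Version' declared after 'compatible;', or None."""
    match = BOT_TOKEN.search(user_agent)
    return f"{match.group(1)}/{match.group(2)}" if match else None


def tally_user_agents(lines: Iterable[str]) -> Counter:
    """Count requests per declared bot token, falling back to the full UA."""
    counts: Counter = Counter()
    for line in lines:
        m = COMBINED_LOG.match(line)
        if m is None:
            continue  # skip lines that aren't in combined format
        ua = m.group("ua")
        counts[bot_name(ua) or ua] += 1
    return counts


# The UA string quoted in the post (link still deliberately broken).
SLEEPBOT_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "SleepBot/1.0; +http //sleepbot com/) Chrome/131.0.0.0 Safari/537.36"
)

# Invented sample lines for illustration only (documentation IP ranges).
SAMPLE_LOG = [
    f'203.0.113.7 - - [01/Jan/2025:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "{SLEEPBOT_UA}"',
    f'203.0.113.8 - - [01/Jan/2025:00:00:05 +0000] "GET /index.php HTTP/1.1" 200 2048 "-" "{SLEEPBOT_UA}"',
    '198.51.100.4 - - [01/Jan/2025:00:01:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Sogou web spider/4.0"',
    "not a combined-format line",
]

if __name__ == "__main__":
    for name, n in tally_user_agents(SAMPLE_LOG).most_common():
        print(n, name)
    # prints:
    # 2 SleepBot/1.0
    # 1 Sogou web spider/4.0
```

Falling back to the full UA string means odd one-off agents still show up in the tally instead of silently disappearing. The price is that trivial version or whitespace variations get counted as separate entries.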