I first built FPScanner during my PhD around 2017–2018, when I was doing research on browser fingerprinting and bot detection. After that I did not really maintain it for years.
I recently decided to revive it because things have changed a lot. Automation is much easier now. With free automation libraries like Puppeteer/Playwright/Selenium + headless Chrome and cheap proxies, you can build decent bots very quickly. At the same time, open source defensive tooling is still quite limited, or very basic.
To be clear, FPScanner is not trying to be a silver bullet.
It is a small, self-hosted library that focuses on deterministic client-side signals:
- webdriver and automation flags
- CDP-related artifacts
- automation framework markers (Selenium, Playwright, etc.)
- JS cross-context inconsistencies: main JS context, iframes and workers
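To make the last item concrete, here is a minimal server-side sketch of a cross-context consistency check. It is not FPScanner's code: the payload shape (one dict per context), the key names, and the function `cross_context_mismatches` are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical payload shape: one dict of collected values per JS context.
Contexts = Dict[str, Dict[str, object]]


def cross_context_mismatches(contexts: Contexts, keys: List[str]) -> List[str]:
    """Return the keys whose reported values differ between contexts."""
    mismatched = []
    for key in keys:
        # Only compare contexts that actually reported this key.
        values = {repr(ctx[key]) for ctx in contexts.values() if key in ctx}
        if len(values) > 1:
            mismatched.append(key)
    return mismatched


# Illustrative values only, not real collected data.
sample = {
    "main": {"userAgent": "Chrome/120", "webdriver": False, "hardwareConcurrency": 8},
    "iframe": {"userAgent": "Chrome/120", "webdriver": False, "hardwareConcurrency": 8},
    "worker": {"userAgent": "HeadlessChrome/120", "hardwareConcurrency": 8},
}

print(cross_context_mismatches(sample, ["userAgent", "webdriver", "hardwareConcurrency"]))
```

A spoofed value that is only patched in the main context would show up here as a mismatch, which is what makes this kind of signal explainable.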
It also generates a JA4-inspired fingerprint ID for clustering sessions, and includes encrypted payload + simple anti-replay protections.
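As a rough idea of what "simple anti-replay" can mean in general, here is a sketch using a signed nonce plus a timestamp. This is an assumption about one common approach, not a description of how FPScanner does it; `issue_token`, `accept_token`, `SECRET` and `MAX_AGE_SECONDS` are hypothetical names.

```python
import hashlib
import hmac
import secrets
import time

SECRET = secrets.token_bytes(32)  # hypothetical server-side key, per process here
MAX_AGE_SECONDS = 60              # hypothetical freshness window
_seen_nonces = set()              # in-memory only; a real setup would expire entries


def _sign(message: str) -> str:
    return hmac.new(SECRET, message.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None) -> str:
    """Create a 'nonce.timestamp.signature' token to embed in the page."""
    now = time.time() if now is None else now
    message = f"{secrets.token_hex(16)}.{int(now)}"
    return f"{message}.{_sign(message)}"


def accept_token(token: str, now=None) -> bool:
    """Accept a token once, and only while it is still fresh."""
    now = time.time() if now is None else now
    try:
        nonce, ts, sig = token.split(".")
        issued = int(ts)
    except ValueError:
        return False  # malformed token
    expected = _sign(f"{nonce}.{ts}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False  # forged or tampered
    if now - issued > MAX_AGE_SECONDS or nonce in _seen_nonces:
        return False  # expired or replayed
    _seen_nonces.add(nonce)
    return True
```

The point is only that a captured payload cannot be resubmitted later or twice; it does nothing against an attacker who generates fresh payloads in a real browser.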
There is no ML here, no "AI detection", and no claim to block 100% of bots. The idea is just to expose strong, explainable signals and make automation a bit more expensive. I assume attackers can read the source code.
I tested it on different devices and browsers to avoid JS errors and obvious false positives, but I’m sure there are edge cases. If you try it on less common browsers or hardened setups and it breaks, please let me know or open an issue.
Happy to answer questions or discuss design choices / limitations.
Patch Firefox so navigator.webdriver is always false, then remote control it. Seems not easily detectable. You could still watch for fast input patterns...
I recently wrote about the limits of these kinds of fingerprinting tests. They tend to focus too much on uniqueness without taking stability into account. Moreover, the sample size is often really small, which tends to artificially make a lot of users look unique.
This is great, and exactly the kind of nuance I almost never see when these topics come up. Thanks for posting this. Far too often, the pro-privacy crowd is much more _upset_ than they are precise, and, to the point of your article, are spending extra effort without really accomplishing much.
Interesting article. I’ve been curious for a while about how residential proxy IPs are collected too. Many come from shady browser extensions or mobile apps, especially free VPNs (wink wink Hola VPN). People often don’t realize they are turning their device into an exit node.
Some time ago I started to track this as a side project (I work in bot detection and was always surprised by how many residential proxies show up in attacks). It started just out of curiosity. Now I collect proxy IPs, which provider they belong to, and how often they are seen. I also publish stats here:
https://deviceandbrowserinfo.com/proxy-api/stats/proxy-db-30...
For example, in the last 30 days I saw more than 120K IPs from Comcast and nearly 100K from AT&T.
I also maintain an open IP (ranges) blocklist, mostly effective against data center and ISP proxies. Residential IPs are harder since they are often shared with legit users:
https://github.com/antoinevastel/avastel-bot-ips-lists
Even if you can’t block all of them, tracking volume and reuse gives useful signal.
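A minimal sketch of both ideas (range-based blocking and counting reuse), using only the Python standard library. The CIDR ranges below are documentation-reserved placeholders, not entries from the actual blocklist, and `is_blocked` / `record_sighting` are hypothetical helper names.

```python
import ipaddress
from collections import Counter

# Placeholder ranges (RFC 5737 documentation blocks), not real blocklist data.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.0/25")
]

# How many times each IP has been observed.
sightings = Counter()


def is_blocked(ip: str) -> bool:
    """True if the address falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_RANGES)


def record_sighting(ip: str) -> int:
    """Count one more observation of this IP and return the running total."""
    sightings[ip] += 1
    return sightings[ip]
```

For residential IPs, the count from `record_sighting` would be used as one signal among others rather than as a hard block, since the same address may serve legit users.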
Hola/Luminati rebranded as “Bright Data” and now pays mobile developers to embed their proxy SDK into mobile apps. Apple and Google should put a stop to this practice.
Hola VPN is such an interesting case of a money printer: host a simple VPN and present it as free, then give the users datacenter IPs that are easy to detect. Meanwhile you get their precious residential IPs and print millions a month.
Thanks for the great read, there is so much to unpack from that article. The click fraud stuff is to be expected, and keeping track of everything that goes through their proxy is also expected, but copying files is crazy, and this could unravel into a class action.
That being said, if you are doing something shady or grey-area to get ahead, you had best give everyone a cut of the pie, especially your blood brother.
I would add that your chances of having a proxy node increase by 1% with each free app you install these days. We catch them easily at visitorquery.com, but the residential proxy business is rampant and probably half are infected devices: Android TVs, routers and, of course, mobile apps.
Author here: I work in bot detection, and wrote this post to explain why privacy-conscious users (VPNs, Brave, LibreWolf, etc.) often get flagged or blocked by anti-bot systems.
I’ve seen a lot of frustration in threads here, so I wanted to offer a technical perspective on why these false positives happen, and how detection systems interpret signals from non-mainstream setups.
Author here: There’ve been a lot of HN threads lately about scraping, especially in the context of AI, and with them, a fair amount of confusion about what actually works to stop bots on high-profile websites.
This post uses TikTok’s obfuscated JavaScript VM (recently discussed on HN) as a case study to walk through what modern bot defenses look like in practice. It’s not spyware, it’s an anti-bot measure designed to make life harder for HTTP clients and non-browser automation.
Key points:
- HTTP-based bots skip JS, so TikTok hides detection logic inside a JavaScript VM interpreter
- The VM computes signals like webdriver checks and canvas-based fingerprints
- Obfuscating this logic in a custom VM makes it significantly harder to reimplement outside the browser (and so to scale an attack)
- The goal isn’t to stop all bots, it’s to push attackers into full browser environments, where detection is more feasible
The post covers why simple solutions like "just require JS" don’t hold up, and why defenders use techniques like VM-based obfuscation to increase attacker cost and reduce replayability.
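To give a feel for what "detection logic inside a VM" means, here is a deliberately tiny stack-machine sketch. It has nothing to do with TikTok's actual VM: the opcodes, the `run` function, and the signal names (`webdriver`, `canvas_hash`) are all invented for illustration, and a real implementation would be far larger and obfuscated.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, object]  # (opcode, argument); toy format, purely hypothetical


def run(program: List[Instr], env: Dict[str, object]) -> object:
    """Interpret a toy bytecode program against collected signals."""
    stack = []
    for op, arg in program:
        if op == "PUSH":      # push a constant
            stack.append(arg)
        elif op == "LOAD":    # push a signal value (None if missing)
            stack.append(env.get(arg))
        elif op == "EQ":      # compare the two top values
            b, a = stack.pop(), stack.pop()
            stack.append(a == b)
        elif op == "OR":      # logical OR of the two top values
            b, a = stack.pop(), stack.pop()
            stack.append(bool(a) or bool(b))
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


# "Suspicious if webdriver is True OR the canvas fingerprint is missing."
PROGRAM = [
    ("LOAD", "webdriver"), ("PUSH", True), ("EQ", None),
    ("LOAD", "canvas_hash"), ("PUSH", None), ("EQ", None),
    ("OR", None),
]

print(run(PROGRAM, {"webdriver": False, "canvas_hash": "ab12"}))
```

The detection rule lives in `PROGRAM` rather than in readable source, so someone reimplementing the check outside the browser has to reverse both the interpreter and the bytecode.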
The attacker had fully reverse engineered the signal collection and solved-state flow, including obfuscated parts. They could forge all the expected telemetry.
This kind of setup is pretty standard in bot-heavy environments like ticketing or sneaker drops. Scrapers often do the same to cut costs. CAPTCHA and PoW mostly become signal collection protocols; if those signals aren’t tightly coupled to the actual runtime, they get spoofed.