
avastel · 2026-02-21T15:04:36 1771686276

Author here.

I first built FPScanner during my PhD around 2017–2018, when I was doing research on browser fingerprinting and bot detection. After that I did not really maintain it for years.

I recently decided to revive it because things changed a lot. Automation is much easier now. With free automation libraries like Puppeteer/Playwright/Selenium + headless Chrome and cheap proxies, you can build decent bots very quickly. At the same time, open source defensive tooling is still quite limited, or very basic.

To be clear, FPScanner is not trying to be a silver bullet.

It is a small, self-hosted library that focuses on deterministic client-side signals:

- webdriver and automation flags

- CDP-related artifacts

- automation framework markers (Selenium, Playwright, etc.)

- JS cross-context inconsistencies: main JS context, iframes and workers

It also generates a JA4-inspired fingerprint ID for clustering sessions, and includes encrypted payload + simple anti-replay protections.
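A minimal server-side sketch of those two ideas, written in Python for illustration. It assumes a JA4-like layout (a short readable prefix followed by truncated SHA-256 hashes of sorted signal groups) and a nonce plus timestamp check for anti-replay. The function names, the prefix layout and the 60-second window are hypothetical, not FPScanner's actual format, and payload encryption is left out.

```python
import hashlib
import time
from typing import Optional


def short_hash(values: list[str]) -> str:
    """Truncated SHA-256 over a sorted, comma-joined list of values."""
    joined = ",".join(sorted(values))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]


def fingerprint_id(browser: str, is_mobile: bool, groups: dict[str, list[str]]) -> str:
    """Readable prefix plus one short hash per signal group (hypothetical layout)."""
    device = "m" if is_mobile else "d"
    prefix = f"{browser[:2].lower()}{device}{len(groups):02d}"
    hashes = [short_hash(groups[name]) for name in sorted(groups)]
    return "_".join([prefix, *hashes])


class ReplayGuard:
    """Rejects payloads that are stale or whose nonce was already accepted."""

    def __init__(self, max_age_seconds: float = 60.0) -> None:
        self.max_age_seconds = max_age_seconds
        self.seen: dict[str, float] = {}

    def accept(self, nonce: str, issued_at: float, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Forget nonces that have fallen out of the freshness window.
        self.seen = {n: t for n, t in self.seen.items() if now - t <= self.max_age_seconds}
        if abs(now - issued_at) > self.max_age_seconds or nonce in self.seen:
            return False
        self.seen[nonce] = issued_at
        return True
```

Because each group is sorted before hashing, two sessions that report the same signals in a different order get the same ID, which is what makes the ID usable for clustering.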

There is no ML here, no "AI detection", and no claim to block 100% of bots. The idea is just to expose strong, explainable signals and make automation a bit more expensive. I assume attackers can read the source code.
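To make "explainable signals" concrete, here is a small sketch of how a backend could turn collected signals into named detections with a reason attached. The signal keys (`webdriver`, `cdp_artifact`, `user_agent_by_context`) and rule names are assumptions made up for this example, not FPScanner's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    rule: str
    reason: str


def evaluate(signals: dict) -> list[Detection]:
    """Apply deterministic rules to a dict of collected client-side signals.

    The signal names are hypothetical and only serve the illustration.
    """
    detections = []
    if signals.get("webdriver") is True:
        detections.append(Detection("webdriver_flag", "navigator.webdriver is true"))
    if signals.get("cdp_artifact") is True:
        detections.append(Detection("cdp", "CDP-related artifact observed"))
    # Cross-context check: the same property read in the main page, an iframe
    # and a worker should agree. A mismatch suggests one context was patched.
    contexts = signals.get("user_agent_by_context", {})
    if len(set(contexts.values())) > 1:
        detections.append(
            Detection("context_mismatch", f"user agent differs across {sorted(contexts)}")
        )
    return detections
```

Every detection maps to one rule and one human-readable reason, so a flagged session can be audited without a model in the loop.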

I tested it on different devices and browsers to avoid JS errors and obvious false positives, but I’m sure there are edge cases. If you try it on less common browsers or hardened setups and it breaks, please let me know or open an issue.

Happy to answer questions or discuss design choices / limitations.

avastel · 2026-02-05T21:27:16 1770326836

I recently wrote a blog post about the technique LinkedIn uses to probe for extensions, as well as other ways to do it with fewer side effects.

https://blog.castle.io/detecting-browser-extensions-for-bot-...

direwolf20 · 2026-02-06T03:32:09 1770348729

Patch Firefox so navigator.webdriver is always false, then remote control it. Seems not easily detectable. You could still watch for fast input patterns...

pests · 2026-02-05T23:01:48 1770332508

Nice write up, definitely exactly this.

avastel · 2026-01-31T18:01:31 1769882491

Since I was also tracking this proxy network as part of my side project, I wrote a short blog post and am giving access to 16M+ proxy IP IoCs that belong to this proxy network: https://deviceandbrowserinfo.com/learning_zone/articles/insi...

Note that even after the disruption, I'm still able to route millions of requests/day through IP IDEA's network

avastel · 2026-01-29T10:37:41 1769683061

Since I was also tracking this proxy network as part of my side project, I wrote a short blog post and am giving access to 16M+ proxy IP IoCs that belong to this proxy network: https://deviceandbrowserinfo.com/learning_zone/articles/insi...

avastel · 2025-09-08T17:08:31 1757351311

I recently wrote about the limits of these kinds of fingerprinting tests. They tend to focus too much on uniqueness without taking stability into account. Moreover, the sample size is often really small, which artificially makes a lot of users look unique.

https://blog.castle.io/what-browser-fingerprinting-tests-lik...
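A toy sketch of both effects, under made-up data: the same population looks fully unique in a tiny sample and not unique at all in a larger one, and stability is measured separately as the share of users whose fingerprint never changes. The function names and the population are hypothetical.

```python
from collections import Counter


def unique_share(fingerprints: list[str]) -> float:
    """Share of observations whose fingerprint appears exactly once in the sample."""
    counts = Counter(fingerprints)
    return sum(1 for fp in fingerprints if counts[fp] == 1) / len(fingerprints)


def stability(history: dict[str, list[str]]) -> float:
    """Share of users whose fingerprint stayed identical across all their visits."""
    stable = sum(1 for fps in history.values() if len(set(fps)) == 1)
    return stable / len(history)


# Toy population: only 4 distinct fingerprints, repeated in round-robin order.
population = [f"fp{i % 4}" for i in range(40)]
small_sample = population[:3]  # fp0, fp1, fp2: every visitor looks unique
large_sample = population      # each fingerprint is shared by 10 visitors

print(unique_share(small_sample), unique_share(large_sample))
```

A fingerprint that is unique but changes on every visit is also useless for tracking, which is why `stability` has to be reported next to `unique_share`.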

everdrive · 2025-09-08T17:31:15 1757352675

This is great, and exactly the kind of nuance I almost never see when this topic comes up. Thanks for posting this. Far too often, the pro-privacy crowd is much more _upset_ than they are precise, and to the point of your article are spending extra effort without really accomplishing much.

avastel · 2025-08-28T14:03:02 1756389782

(Author here) TBH I don't think the main goal of Google was to make bots undetectable. It was more a side effect of preventing side effects while reviewing errors in the devtools: https://source.chromium.org/chromium/_/chromium/v8/v8/+/e08e...

It was the same when Google released the new headless Chrome a few years ago: https://antoinevastel.com/bot%20detection/2023/02/19/new-hea... It made vanilla/naive bots more realistic, and so harder to detect, by default.

avastel · 2025-08-26T16:15:17 1756224917

Interesting article. I’ve been curious for a while about how residential proxy IPs are collected too. Many come from shady browser extensions or mobile apps, especially free VPNs (wink wink Hola VPN). People often don’t realize they are turning their device into an exit node.

Some time ago I started to track this as a side project (I work in bot detection and was always surprised by how many residential proxies show up in attacks). It started just out of curiosity. Now I collect proxy IPs, which provider they belong to, and how often they are seen. I also publish stats here: https://deviceandbrowserinfo.com/proxy-api/stats/proxy-db-30...

For example, in the last 30 days I saw more than 120K IPs from Comcast and nearly 100K from AT&T.

I also maintain an open IP (ranges) blocklist, mostly effective against data center and ISP proxies. Residential IPs are harder since they are often shared with legit users: https://github.com/antoinevastel/avastel-bot-ips-lists

Even if you can’t block all of them, tracking volume and reuse gives useful signal.
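As a rough sketch of both ideas, the snippet below checks an IP against a list of CIDR ranges and counts how often each IP and provider shows up. The ranges are documentation addresses (RFC 5737), not entries from the real blocklist, and the function names and provider labels are hypothetical.

```python
import ipaddress
from collections import Counter

# Hypothetical ranges from documentation address space, not from a real list.
BLOCKED_RANGES = [ipaddress.ip_network(cidr) for cidr in ("192.0.2.0/24", "198.51.100.0/25")]


def is_blocklisted(ip: str) -> bool:
    """True if the IP falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_RANGES)


def summarize(sightings: list[tuple[str, str]]) -> tuple[Counter, Counter]:
    """sightings is a list of (ip, provider) pairs.

    Returns how many times each IP was seen, and how many distinct IPs
    were seen per provider.
    """
    seen_per_ip = Counter(ip for ip, _ in sightings)
    ips_per_provider = Counter(provider for _, provider in set(sightings))
    return seen_per_ip, ips_per_provider
```

A residential IP that is seen once is weak evidence, while one that keeps reappearing across providers or attacks is a much stronger hint, even if it never ends up in a hard blocklist.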

chatmasta · 2025-08-26T17:00:17 1756227617

Hola/Luminati rebranded as “Bright Data” and now pays mobile developers to embed their proxy SDK into mobile apps. Apple and Google should put a stop to this practice.

garbthetill · 2025-08-26T17:12:12 1756228332

they have been paying devs for a good bit now

garbthetill · 2025-08-26T17:16:06 1756228566

hola vpn is such an interesting case of a money printer, host a simple vpn and present it as free, give the users datacenter ips that are easy to detect. meanwhile you get their precious residential ip's and print millions a month

ignoramous · 2025-08-26T23:00:20 1756249220

The recent feud between founders is bound to reveal more interesting aspects of their business: https://www.haaretz.com/israel-news/tech-news/2021-07-01/ty-... / https://archive.vn/o5ujG

garbthetill · 2025-08-26T23:33:28 1756251208

Thanks for the great read, so much to unpack from that article. The click fraud stuff is to be expected, keeping track of everything that goes through their proxy is also expected, but copying files is crazy, and this could unravel into a class action

but with that being said, if you are doing something shady/grey area to get ahead you best give everyone a cut of the pie, especially your blood brother

arewethereyeta · 2025-08-26T19:14:25 1756235665

I would add that your chances of having a proxy node increase by 1% with each free app you install these days. We catch them easily at visitorquery.com but the residential proxy business is rampant and probably half are infected devices, android TVs, routers and, ofc, mobile apps.

antonvs · 2025-08-26T22:49:49 1756248589

> I work in bot detection and was always surprised by how many residential proxies show up in attacks

Why is that surprising? It seems like it'd be one of the major vectors.

avastel · 2025-06-17T07:05:20 1750143920

Author here: I work in bot detection, and wrote this post to explain why privacy-conscious users (VPNs, Brave, LibreWolf, etc.) often get flagged or blocked by anti-bot systems.

I’ve seen a lot of frustration in threads here, so I wanted to offer a technical perspective on why these false positives happen, and how detection systems interpret signals from non-mainstream setups.

avastel · on June 4, 2025

Author here: There’ve been a lot of HN threads lately about scraping, especially in the context of AI, and with them, a fair amount of confusion about what actually works to stop bots on high-profile websites.

This post uses TikTok’s obfuscated JavaScript VM (recently discussed on HN) as a case study to walk through what modern bot defenses look like in practice. It’s not spyware, it’s an anti-bot measure designed to make life harder for HTTP clients and non-browser automation.

Key points:

- HTTP-based bots skip JS, so TikTok hides detection logic inside a JavaScript VM interpreter

- The VM computes signals like webdriver checks and canvas-based fingerprints

- Obfuscating this logic in a custom VM makes it significantly harder to reimplement outside the browser (and so to scale an attack)

The goal isn’t to stop all bots, it’s to push attackers into full browser environments, where detection is more feasible

The post covers why simple solutions like "just require JS" don’t hold up, and why defenders use techniques like VM-based obfuscation to increase attacker cost and reduce replayability.
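To give a feel for what "logic inside a VM" means, here is a toy stack machine, written in Python for illustration only. It is not TikTok's VM: the opcodes, the program and the signal slots are all invented. The point is that the token formula lives in a list of integers rather than in readable code, so an attacker has to recover both the interpreter and the program before reimplementing it in a plain HTTP client.

```python
# Hypothetical opcodes for a toy stack machine.
PUSH, LOAD, XOR, MUL, ADD, HALT = range(6)


def run(bytecode: list[int], env: dict[int, int]) -> int:
    """Interpret bytecode; env maps a slot number to a collected signal value."""
    stack: list[int] = []
    pc = 0
    while True:
        op = bytecode[pc]
        pc += 1
        if op == PUSH:
            stack.append(bytecode[pc])  # next cell is an immediate value
            pc += 1
        elif op == LOAD:
            stack.append(env[bytecode[pc]])  # next cell is a signal slot
            pc += 1
        elif op in (XOR, MUL, ADD):
            b = stack.pop()
            a = stack.pop()
            if op == XOR:
                result = a ^ b
            elif op == MUL:
                result = a * b
            else:
                result = a + b
            stack.append(result & 0xFFFFFFFF)  # keep values in 32 bits
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")


# Program: token = ((slot0 * 7919) + slot1) ^ 0x5A5A
# slot0 stands for a webdriver flag, slot1 for a canvas hash (both hypothetical).
PROGRAM = [LOAD, 0, PUSH, 7919, MUL, LOAD, 1, ADD, PUSH, 0x5A5A, XOR, HALT]
```

Real deployments typically go much further (shuffled opcodes, encrypted bytecode, frequent rotation), but the cost asymmetry is the same: running the VM in a browser is free, reimplementing it elsewhere is not.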

avastel · on May 28, 2025

Reposting a similar point I made recently about CAPTCHA and scalpers, but it’s even more relevant for scrapers.

PoW can help against basic scrapers or DDoS, but it won’t stop anyone serious. Last week I looked into a Binance CAPTCHA solver that didn’t use a browser at all, just a plain HTTP client. https://blog.castle.io/what-a-binance-captcha-solver-tells-u...

The attacker had fully reverse engineered the signal collection and solved-state flow, including obfuscated parts. They could forge all the expected telemetry.

This kind of setup is pretty standard in bot-heavy environments like ticketing or sneaker drops. Scrapers often do the same to cut costs. CAPTCHA and PoW mostly become signal collection protocols: if those signals aren’t tightly coupled to the actual runtime, they get spoofed.

And regarding PoW: if you try to make it slow enough to hurt bots, you also hurt users on low-end devices. Someone even ported PerimeterX’s PoW to CUDA to accelerate solving: https://github.com/re-jevi/PerimiterXCudaSolver/blob/main/po...
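For reference, a minimal hashcash-style PoW sketch, assuming SHA-256 and a "leading zero bits" difficulty. This is a generic scheme chosen for illustration, not PerimeterX's or anyone else's actual challenge, and the function names are hypothetical.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of the digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash checks the submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode("utf-8")).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force the first nonce that meets the difficulty."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


nonce = solve("session-123", 8)  # about 2**8 hashes on average
```

Each extra bit of difficulty doubles the expected work for everyone, so the same setting that costs a GPU-backed solver almost nothing can stall a low-end phone for seconds.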