Fighting web scraping bots
Overview
This project aims to find new ways to detect and mitigate web scrapers attacking e-commerce websites, such as those of Amadeus IT Group, a company with which Elisa closely collaborates. E-commerce website owners and web scrapers have been locked in an ongoing battle for more than a decade: whenever one side finds a new technique to prevail, the other comes up with a way to defeat it.
We are currently studying scrapers that take advantage of Residential IP Proxies (RESIP), the latest sophistication on the attackers' side. RESIP providers give scrapers access, for a fee, to a vast network of residential devices that can be used as exit points for their requests. As a result, e-commerce websites receive requests from IP addresses that are also used by legitimate users, and anti-bot mechanisms are less inclined to classify these connections as scraping.
We have recently developed a new technique for detecting this type of connection, based on comparing the Round Trip Times (RTTs) measured at different layers. In particular, the RTT of the TCP layer and the RTT of the TLS layer differ when the connection passes through a RESIP, whereas they have similar values for a direct connection. Elisa developed and launched a large experiment that made it possible to evaluate the technique successfully and to build a dataset of the connections. We are currently digging into this data to find new insights and correlations about scrapers that take advantage of RESIP and to characterize this ecosystem.
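The comparison of the two RTTs can be illustrated with a minimal sketch. It is not the project's actual implementation: it assumes both RTTs have already been measured on the server side for the same connection, and the names `ConnectionTimings`, `rtt_gap_ms`, and `looks_proxied`, the 50 ms threshold, and the sample timings are all hypothetical values chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionTimings:
    """RTTs for one connection, in milliseconds (assumed measured server-side)."""
    tcp_rtt_ms: float  # RTT observed at the TCP layer
    tls_rtt_ms: float  # RTT observed at the TLS layer


def rtt_gap_ms(timings: ConnectionTimings) -> float:
    """Absolute difference between the TLS-layer and TCP-layer RTTs."""
    return abs(timings.tls_rtt_ms - timings.tcp_rtt_ms)


def looks_proxied(timings: ConnectionTimings, threshold_ms: float = 50.0) -> bool:
    """Flag a connection whose two RTTs differ by more than the threshold.

    The default threshold is a made-up placeholder, not a value from the
    project; a real deployment would have to calibrate it on measured data.
    """
    return rtt_gap_ms(timings) > threshold_ms


# Illustrative (invented) timings: similar RTTs vs. clearly different RTTs.
direct = ConnectionTimings(tcp_rtt_ms=42.0, tls_rtt_ms=45.0)
via_resip = ConnectionTimings(tcp_rtt_ms=40.0, tls_rtt_ms=180.0)

print(looks_proxied(direct))     # False: the two RTTs are close
print(looks_proxied(via_resip))  # True: the two RTTs differ markedly
```

A single fixed threshold is the simplest possible decision rule; it is used here only to make the idea concrete, and any real classifier would need to account for measurement noise and network variability.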