SeRBER Ongoing Projects
Overview
Fingerprinting HTTP smuggling
People involved
- Khalid Hakami (project lead)
- Ilies Benhabbour
The Hypertext Transfer Protocol (HTTP) is the foundation of the World Wide Web and is used to load webpages and other resources through hypertext links. When clicking on a link, the user's browser contacts a backend server to ask for a webpage. The user expects to be connecting directly to the server delivering the page but, more often than not, the HTTP request is first handled by some sort of proxy machine, whether for caching, load balancing or security reasons.
Such a proxy operates at the application layer: it interprets the HTTP request and may modify it by inserting, removing or rewriting headers. It may also have to translate the request and/or the response from one version of HTTP to another when the initial client and the final server do not speak the same version of the protocol (typically HTTP/2 vs. HTTP/1.1).
This stage exposes a fairly large attack surface, and the process of abusing it is collectively known as ‘HTTP smuggling’. In this research project, we aim to leverage the architecture developed by Ilies Benhabbour in his PhD thesis (see Detecting semi-active intra-network components) to systematically detect the existence of such proxies and fingerprint them, in order to test whether or not they are vulnerable to these attacks and, if so, to which ones.
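As a concrete illustration of the kind of probe involved, the Python sketch below sends a request in which Content-Length and the chunked body deliberately disagree, and uses the response timing to infer which header the elements on the path honour. It is a minimal sketch of one classic detection trick, not the project's actual implementation, and the host name is a placeholder for a machine one is authorised to test.

```python
import socket

# Placeholder target; only probe hosts you are authorised to test.
HOST, PORT, TIMEOUT = "lab-target.example", 80, 10

# Content-Length says the body is 4 bytes, but the body is also a truncated
# chunked stream.  An element that honours Content-Length sees a complete
# request and answers (or rejects it) at once; one that honours
# Transfer-Encoding keeps waiting for the rest of the chunked body, so the
# probe times out.  Which behaviour wins hints at how the proxy chain parses
# request bodies, the very ambiguity HTTP smuggling exploits.
probe = (
    "POST / HTTP/1.1\r\n"
    f"Host: {HOST}\r\n"
    "Content-Length: 4\r\n"
    "Transfer-Encoding: chunked\r\n"
    "\r\n"
    "1\r\nZ"
)

with socket.create_connection((HOST, PORT), timeout=TIMEOUT) as sock:
    sock.sendall(probe.encode())
    try:
        status_line = sock.recv(4096).split(b"\r\n", 1)[0]
        print("Immediate answer:", status_line.decode(errors="replace"))
    except socket.timeout:
        print("Timeout: some element on the path parsed the body as chunked")
```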
Related publication: NoPASARAN: a Novel Platform to Analyse Semi Active elements in Routes Across the Network
Detecting semi-active intra-network components
People involved
- Ilies Benhabbour (project lead)
In theory, clients and servers communicate without having their application data tampered with. This is the famous end-to-end principle, introduced by P. Baran in the 1960s. However, as A. Einstein reportedly put it, "In theory, theory and practice are the same. In practice, they are not", and computer networking is a perfect illustration of this statement.
In reality, a large number of devices interfere with the data sent across the network. These devices, often referred to as middleboxes, can serve perfectly legitimate purposes, such as performance (load balancers, CDNs) or security (proxies, firewalls), but they break the Internet's end-to-end assumption. This is for instance the case when institutions take advantage of their own root certificate to monitor encrypted traffic. Other men-in-the-middle may be less benevolent: attackers can position themselves between a client and a server if they manage to present a fraudulent yet valid certificate to the client. This is one of the drawbacks of the so-called public key infrastructure used on the Internet today.
In this project, we aim to propose a distributed software architecture that enables two communicating parties to systematically identify any such semi-active components that may exist between them. The solution relies on a graphical language that lets us specify a number of tests exploiting the side effects produced by the existence of such a component, whether at the network, transport or application layer.
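As a toy example of the kind of side-effect test such a language could describe, the sketch below (a hypothetical helper, not part of the actual platform) compares the headers a client claims to have sent with the headers the server actually received; any header added, dropped or rewritten in transit betrays a semi-active element on the path.

```python
# Hypothetical application-layer test: the two communicating parties
# exchange their view of the HTTP headers and compare them.  Any
# discrepancy reveals an element that modified the request in transit.
def diff_header_views(sent: dict, received: dict) -> dict:
    """Return {header: (value_sent, value_received)} for every discrepancy."""
    names = sorted(set(sent) | set(received))
    return {
        name: (sent.get(name), received.get(name))
        for name in names
        if (sent.get(name) or "").lower() != (received.get(name) or "").lower()
    }

# Example: a proxy that strips a custom header and injects a Via header.
sent = {"Host": "example.com", "X-Probe": "42"}
received = {"Host": "example.com", "Via": "1.1 cache-3"}
print(diff_header_views(sent, received))
# {'Via': (None, '1.1 cache-3'), 'X-Probe': ('42', None)}
```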
Related publication
Fighting web scraper bots
People involved
- Elisa Chiapponi (project lead)
This project aims to find new ways to detect and mitigate web scrapers attacking e-commerce websites, such as those of Amadeus IT Group, a company with which Elisa closely collaborates. An ongoing battle has been raging for more than a decade between e-commerce website owners and web scrapers: whenever one party finds a new technique to prevail, the other one comes up with a solution to defeat it.
We are currently studying scrapers that take advantage of Residential IP Proxies (RESIP), the latest sophistication on the attackers' side. RESIP providers give scrapers access, for a fee, to a vast network of residential devices that can be used as exit points for their requests. In this way, e-commerce websites receive requests from IP addresses that are also used by legitimate users, and anti-bot mechanisms are less inclined to categorize these connections as scraping ones.
We have recently developed a new detection technique for this type of connection, based on comparing the Round Trip Times (RTTs) measured at different layers. In particular, the RTT measured at the TCP layer and the one measured at the TLS layer differ when the connection passes through a RESIP, whereas they have similar values for a direct connection. Elisa designed and launched a large experiment that allowed us to successfully evaluate the technique and to build a dataset of such connections. We are currently digging into this data to find new insights and correlations about scrapers taking advantage of RESIPs and to characterize this ecosystem.
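The sketch below illustrates the layered-timing idea with Python's standard library. For simplicity it measures both handshakes from the client side and ignores the extra round trips some TLS versions need, whereas the actual technique takes the equivalent measurements on the server side; the host name and threshold are placeholders.

```python
import socket
import ssl
import time

# Placeholder host and decision threshold.
HOST, PORT = "www.example.com", 443
GAP_THRESHOLD_S = 0.05

# TCP handshake: roughly one round trip to whatever terminates the TCP
# connection (the residential relay, when a RESIP merely forwards bytes).
t0 = time.perf_counter()
raw_sock = socket.create_connection((HOST, PORT), timeout=10)
tcp_rtt = time.perf_counter() - t0

# TLS handshake: negotiated end to end, so its timing also covers the leg
# behind the relay.  NOTE: a real comparison must normalize for TLS versions
# that need more than one round trip; this sketch ignores that for brevity.
context = ssl.create_default_context()
t1 = time.perf_counter()
tls_sock = context.wrap_socket(raw_sock, server_hostname=HOST)
tls_rtt = time.perf_counter() - t1
tls_sock.close()

print(f"TCP handshake: {tcp_rtt * 1000:.1f} ms, TLS handshake: {tls_rtt * 1000:.1f} ms")
if abs(tls_rtt - tcp_rtt) > GAP_THRESHOLD_S:
    print("Large gap between layers: a relay may sit between the two endpoints")
else:
    print("Similar values: consistent with a direct connection")
```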
Related publications
RTT-based geolocalisation of IPs
People involved
- Salman Shaikh (project lead)
- Elisa Chiapponi
- Mathieu Champion (alumnus)
In this project, we are interested in the possibility of geolocalising a communicating machine whose real IP address is unknown. This is typically the case when the machine is hidden behind a firewall, a VPN or a proxy. The only information at our disposal is the time it takes for a packet to reach that machine and for the reply to come back (the round trip time, RTT). If we knew the speed of the packet, this time measurement would give us a first approximation of the distance it covered. Unfortunately, the “speed” of a packet is extremely variable, as it is influenced by a number of elements such as congestion, store-and-forward delays, etc.
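To make the relation between time and distance concrete, the small sketch below turns a measured RTT into an upper bound on the distance to the hidden machine, assuming propagation at roughly two thirds of the speed of light, as in optical fibre; the figures are purely illustrative and not part of the project's actual model.

```python
# Signals in optical fibre travel at roughly two thirds of the speed of
# light in vacuum, i.e. about 200,000 km/s.  Half the RTT therefore bounds
# the one-way distance, before any congestion or store-and-forward delay.
SPEED_IN_FIBRE_KM_S = 200_000

def max_distance_km(rtt_ms: float) -> float:
    """Upper bound on the one-way distance implied by a round trip time."""
    one_way_seconds = (rtt_ms / 1000) / 2
    return one_way_seconds * SPEED_IN_FIBRE_KM_S

# Example: a 40 ms RTT caps the target at about 4,000 km; the real distance
# is usually much shorter because of queuing and forwarding delays.
print(f"{max_distance_km(40):.0f} km")  # 4000 km
```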
Leveraging the datasets produced by an earlier experiment of the “Fighting web scraper bots” project, we have tested the existing solutions published in the literature and have come up with a new, more efficient one. The end goal of the project is to leverage the techniques tested here to geolocalise the machines hiding behind residential IP proxies (RESIPs).