Show HN: Aroma: Every TCP Proxy Is Detectable with RTT Fingerprinting
(github.com)80 points by Sakura-sx 6 days ago
TL;DR explanation (go to https://github.com/Sakura-sx/Aroma?tab=readme-ov-file#tldr-e... if you want the formatted version)
This is done by measuring the minimum TCP RTT (client.socket.tcpi_min_rtt) seen and the smoothed TCP RTT (client.socket.tcpi_rtt). I am getting this data by using Fastly Custom VCL, they get this data from the Linux kernel (struct tcp_info -> tcpi_min_rtt and tcpi_rtt). I am using Fastly for the Demo since they have PoPs all around the world and they expose TCP socket data to me.
The score is calculated by doing tcpi_min_rtt/tcpi_rtt. It's simple but it's what worked best for this with the data Fastly gives me. Based on my testing, 1-0.7 is normal, 0.7-0.3 is normal if the connection is somewhat unstable (WiFi, mobile data, satellite...), 0.3-0.1 is low and may be a proxy, anything lower than 0.1 is flagged as TCP proxy by the current code.
This feels like something that’s a neat claim and will work against simple setups, but less accurate for more complicated scenarios (eg Tor). Then you’re really just relying on how accurate your knowledge of the proxies are.
Also, the readme has slightly incorrect logic I think:
> According to Special Relativity, information cannot travel faster than the speed of light. Therefore, if the round trip time (RTT) is 4ms, it's physically impossible for them to be farther than 2 light milliseconds away, which is approximately 600 kilometers.
It calls out the 33% for fiber but ignores that there’s not a straightline path between two points on the network and there could be wireless, cable, and DSL links somewhere on that hop.
Also, the controlled variable here is latency, not distance. Thus you can always increase latency through buffering and therefor you could be made to appear further than you are. And that buffering need not even be intentional - your perceived distance estimate will vary based upon queuing delays in intermediary depending on time of day (itself a fingerprint if you incorporate time-aware measurements, but a source of error if you don’t).
Fingerprinting is hard and I dislike the framing that it’s absolutely impossible to mask or that there’s not false positive and false negative error rates with the fingerprint.