Comment by donmcronald a day ago
Say there are only 2 sites on Tor. Site 'A' is plain text and has no pages over 1KB. You know this because it's public and you can go look at it. Site 'B' hosts memes which are mostly .GIFs that are 1MB+. You know this because it's also a public site.
If I were browsing one of those sites for an hour and you were my guard node, do you think you could make a good guess about which site I'm visiting?
I'm asking why that concept doesn't scale up. Why wouldn't it work to take the machine learning tools that are used to detect anomalous patterns in corporate networks and reverse them to detect expected patterns?
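To make the two-site thought experiment concrete, here's a toy sketch (my own illustration, not a real attack tool) of what a guard that only sees total bytes transferred could do. The sizes and threshold are invented for the hypothetical sites A and B above:

```python
# Toy illustration: a guard sees only the volume of traffic per session.
# Site A serves plain-text pages under 1 KB; site B serves 1 MB+ GIFs.
# With sizes that far apart, a single threshold "classifies" the session.

def guess_site(observed_bytes: int) -> str:
    """Guess which hypothetical site a session visited, by size alone."""
    return "A" if observed_bytes < 1024 else "B"

# Invented per-session byte counts a guard might observe.
sessions = [800, 350, 2_500_000, 1_800_000, 600]
guesses = [guess_site(n) for n in sessions]
print(guesses)  # ['A', 'A', 'B', 'B', 'A']
```

The scaling question is exactly whether this stays reliable once thousands of sites have overlapping size distributions instead of two cleanly separated ones.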
The point is that there aren't only two sites available on the clearnet. Is the idea that you'd have to find a unique file size across every single site on the internet?
My understanding (which may be totally wrong) is that some padding is added to requests so that exact packet sizes can't be correlated.
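For what that padding looks like in principle: Tor relays traffic in fixed-size cells (512 bytes in the original design), so payloads get rounded up to whole cells. A rough sketch, with the cell size taken as illustrative:

```python
# Sketch of fixed-cell padding: a 300-byte and a 500-byte payload both
# occupy one cell on the wire, hiding exact sizes at that granularity.

CELL_SIZE = 512  # illustrative; Tor's classic cell payload size

def padded_size(payload_bytes: int) -> int:
    """Bytes actually sent after rounding up to whole cells."""
    cells = -(-payload_bytes // CELL_SIZE)  # ceiling division
    return cells * CELL_SIZE

print(padded_size(300), padded_size(500))  # 512 512
print(padded_size(1_000_000))              # 1000448
```

Note that this hides *exact* packet sizes but not aggregate volume: a 1 MB GIF still produces roughly 1 MB of cells, which is why the two-site example above still seems distinguishable by total size.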