donmcronald a day ago

> Surely eventually I'm going to get a hit where all three nodes in the circuit are my nodes that are logging everything?

If you're looking for static assets, why would you need to see the whole chain? Wouldn't a connection to a known website (page) have a similar fingerprint even if you wrap it in 3 layers of encryption? Does Tor coalesce HTTP queries or something to avoid having someone fingerprint connections based on the number of HTTP requests and the relative latency of each request?

I've always assumed that, if a global adversary attack works, you'd only need to watch one side if you're looking for connections to known static content.
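
To make that concrete, here's a toy sketch (nothing to do with any real attack tool; the field names and the summary it computes are invented) of the kind of per-connection metadata a passive observer on one side could log and later compare against fingerprints built from known public pages:

    # Toy sketch: reduce an observed (still-encrypted) page load to a crude
    # fingerprint: request count, size profile, and inter-request gaps.
    # All names here are made up for illustration.
    from dataclasses import dataclass

    @dataclass
    class ObservedTransfer:
        start_time: float   # seconds since the first packet of the page load
        num_bytes: int      # payload size seen on the wire

    def fingerprint(transfers):
        transfers = sorted(transfers, key=lambda t: t.start_time)
        sizes = [t.num_bytes for t in transfers]
        gaps = [b.start_time - a.start_time for a, b in zip(transfers, transfers[1:])]
        return {
            "count": len(transfers),
            "total_bytes": sum(sizes),
            "sizes": sorted(sizes),
            "median_gap": sorted(gaps)[len(gaps) // 2] if gaps else 0.0,
        }

    # e.g. three transfers -> {'count': 3, 'total_bytes': 26912, ...}
    print(fingerprint([ObservedTransfer(0.00, 1400),
                       ObservedTransfer(0.05, 25000),
                       ObservedTransfer(0.31, 512)]))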

I don't know much beyond the high level idea of how Tor works, so I could be totally wrong.

alasdair_ a day ago

If I don't know the whole chain (or can't run a timing attack with a known guard and exit node), then I don't see how I'd know who sent the packet in the first place. The person would connect to a random Tor guard node, which would connect to another random relay, which would connect to my evil exit node. My evil exit node would only know which Tor relay the connection came from, and that's not enough to tell who the original person was.

  • donmcronald a day ago

    Say there are only 2 sites on Tor. Site 'A' is plain text and has no pages over 1KB. You know this because it's public and you can go look at it. Site 'B' hosts memes, mostly GIFs of 1MB or more. You know this because it's also a public site.

    If I was browsing one of those sites for an hour and you were my guard, do you think you could make a good guess which site I'm visiting?

    I'm asking why that concept doesn't scale up. Why wouldn't it work with the machine learning tools that are used to detect anomalous patterns in corporate networks, turned around to detect expected patterns instead?
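
    For the toy two-site case, the "model" wouldn't even need to be fancy; something like this (thresholds obviously invented) already separates them on transfer sizes alone:

        # Toy guess for the made-up two-site example: site 'A' serves pages
        # under 1KB, site 'B' serves GIFs over 1MB, so the observed transfer
        # sizes alone separate them. Thresholds are invented for illustration.
        def guess_site(observed_transfer_sizes):
            big = sum(1 for s in observed_transfer_sizes if s > 500_000)   # GIF-sized
            small = sum(1 for s in observed_transfer_sizes if s < 10_000)  # tiny page
            return "B (meme site)" if big > small else "A (plain text site)"

        print(guess_site([350, 800, 512]))           # -> A (plain text site)
        print(guess_site([1_400_000, 2_100_000]))    # -> B (meme site)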

    • alasdair_ a day ago

      The point is that there aren't only two sites available on the clearnet. Is the idea that you find a unique file size across every single site on the internet?

      My understanding (which may be totally wrong) is that some padding is added to requests so that exact packet sizes can't be correlated.

      • donmcronald a day ago

        > Is the idea that you find a unique file size across every single site on the internet?

        Not really. I'm thinking more along the lines of a total page load. I probably don't understand it well enough, but consider something like connecting to facebook.com. It takes 46 HTTP requests.

        Say (this is made up) 35 of those are async and contain 2MB of data total, the 36th is consistently a slow blocking request, requests 37-42 are synchronous requests of 17KB, 4KB, 10KB, 23KB, 2KB, and 7KB, and requests 43-46 are async (fired after 42), sending back 100KB total.

        If that synchronous block ends up being 6 synchronous TCP connections, I feel like that's a pretty distinct pattern if there isn't a lot of padding, especially if you combine it with a rule that it has to be preceded by a burst of about 35 connections transferring 2MB in total and followed by a burst of 4 connections transferring 100KB combined.
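
        Very roughly, the check a watcher could run on an observed flow (all the numbers lifted straight from my made-up example above) might be as dumb as:

            # Toy matcher for the made-up facebook.com-like signature above:
            # a ~35-request async burst (~2MB), one slow blocking request,
            # a block of 6 sequential requests totalling ~63KB, then a
            # ~4-request ~100KB tail. Tolerances are invented.
            def matches_signature(bursts):
                """bursts: list of (request_count, total_bytes) in arrival order."""
                if len(bursts) != 4:
                    return False
                head, blocking, sync_block, tail = bursts
                return (30 <= head[0] <= 40 and abs(head[1] - 2_000_000) < 300_000
                        and blocking[0] == 1
                        and sync_block[0] == 6 and abs(sync_block[1] - 63_000) < 15_000
                        and tail[0] == 4 and abs(tail[1] - 100_000) < 30_000)

            print(matches_signature([(35, 2_000_000), (1, 5_000), (6, 63_000), (4, 100_000)]))  # True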

        I've always assumed there's the potential to fingerprint connections like that, regardless of whether or not they're encrypted. For regular HTTPS traffic, if you built a visual of the above for a few different sites, you could probably make a good guess which one people are visiting just by looking at it.

        Dynamic content getting mixed in might be enough obfuscation, but for things like hidden services I think you'd be better off if everything got coalesced and chunked into a uniform size, so that all the guards and relays see is a stream of (e.g.) 100KB blocks. Then you could let the side building the circuit demand an arbitrary amount of padding from each relay.
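
        Just to sketch the "coalesce and chunk" idea (as far as I know Tor does already frame relay traffic in small fixed-size cells, so this is only about the shape of the idea, with the 100KB figure taken from my example):

            # Sketch: split a stream into uniform blocks and pad the last one,
            # so a relay only ever sees same-sized blocks. The 100KB block
            # size is just the example figure from above.
            BLOCK_SIZE = 100 * 1024

            def chunk_and_pad(payload: bytes, block_size: int = BLOCK_SIZE):
                blocks = []
                for i in range(0, len(payload), block_size):
                    blocks.append(payload[i:i + block_size].ljust(block_size, b"\x00"))
                return blocks or [b"\x00" * block_size]  # dummy block for empty payloads

            print(len(chunk_and_pad(b"x" * 250_000)))  # 3 uniform blocks for a 250KB payload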

        Again, I probably just don't understand how it works, so don't read too much into my reply.

whimsicalism a day ago

Tor reroutes the packets, so how would you identify who is visiting whom? It's not just 'layers of encryption', it's layers of redirection.

  • donmcronald a day ago

    If I visit facebook.com it's about 45 requests and 2.5MB of data. Are you saying that if I did that via Tor I would get a different circuit for each request or each individual packet?

    Eventually the guard has to send the whole payload to me, right? Wouldn't that look similar every time if there's no obfuscation?

    • whimsicalism a day ago

      you mean inferring the website based on packet traffic pattern if you are the guard? yeah maybe possible, not sure how distinct each website footprint would be in practice

      seems like it would also be challenging to hold up in actual legal proceedings

      • donmcronald a day ago

        > you mean inferring the website based on packet traffic pattern if you are the guard?

        Yeah, basically, but I was thinking that if you're analyzing a pattern going to the client, all you'd need is any vantage point between the guard and the client (e.g. the client's ISP).