adastra22 3 days ago

Any complex dataset contains enough revealing information to make deanonymization possible. Muddying the waters enough to make such attempts impossible would require injecting so much noise that the analytics would be useless to learn from.

This is a fundamental property derived from information theory, but also confirmed time after time in practice: https://www.theguardian.com/technology/2019/jul/23/anonymise...

Data anonymization is a myth sold to politicians to whitewash data collection.

  • throwaway7783 3 days ago

    Sure, but that is broader than product analytics and applies to all data collection. The word I should have used is "pseudonymize". The goal of capturing product analytics is not to deanonymize anyone but to understand usage trends and bottlenecks.

    • adastra22 2 days ago

      Pseudonymous is not what is wanted here though. For your spying on my usage to be acceptable, it would have to be truly anonymous. Pseudonymous means that instead of you putting "HN user adastra22" in your database for everything I do, you instead use "fffa366bc5d3." So any human being looking at the database record won't immediately see that it is me.
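      A minimal Python sketch of the pseudonymization described above, assuming the tag is a truncated SHA-256 hash of the username (the function names, the 12-character truncation, and the sample events are hypothetical, chosen only to mirror the "fffa366bc5d3" example). It shows two properties: the tag is deterministic, so every event by the same user still links together, and an unkeyed hash can be reversed by hashing a list of known usernames.

```python
import hashlib
import hmac
from typing import List, Optional


def pseudonymize(username: str, key: Optional[bytes] = None) -> str:
    """Replace a username with a short hex tag (hypothetical 12-char format).

    With key=None this is a plain SHA-256 digest; with a key it is HMAC-SHA256.
    Either way the mapping is deterministic: same user -> same tag.
    """
    data = username.encode("utf-8")
    if key is None:
        digest = hashlib.sha256(data).hexdigest()
    else:
        digest = hmac.new(key, data, hashlib.sha256).hexdigest()
    return digest[:12]


def reverse_by_dictionary(tag: str, candidates: List[str]) -> Optional[str]:
    """Undo an unkeyed pseudonym by hashing every candidate username."""
    for name in candidates:
        if pseudonymize(name) == tag:
            return name
    return None


# Hypothetical usage events: (username, action).
events = [
    ("adastra22", "open_settings"),
    ("throwaway7783", "export_csv"),
    ("adastra22", "delete_account"),
]
log = [(pseudonymize(user), action) for user, action in events]

for tag, action in log:
    print(tag, action)
```

      A secret key (HMAC) stops the dictionary trick for outsiders, but it does not remove the linkability: all of one user's events still share a single tag.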

      But in any sufficiently complex real-world database, it is a trivial step to map these pseudonymous tags to actual users, and thereby undo the obfuscation. It provides no actual privacy protection.
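      To illustrate how pseudonymous tags get mapped back to real users, here is a small linkage-attack sketch in Python. It assumes the analytics records carry a few quasi-identifiers (timezone, locale, signup date) and that an attacker has an auxiliary table with usernames and the same attributes; all tables, field names, and values are hypothetical. Any tag whose attribute combination matches exactly one named profile is re-identified.

```python
from collections import defaultdict

# Hypothetical pseudonymous analytics table: tag -> quasi-identifiers.
analytics = {
    "a1b2c3d4e5f6": {"tz": "UTC-8", "locale": "en-US", "signup": "2014-03-02"},
    "0f9e8d7c6b5a": {"tz": "UTC+1", "locale": "de-DE", "signup": "2019-11-20"},
    "123456abcdef": {"tz": "UTC-8", "locale": "en-US", "signup": "2020-06-15"},
}

# Hypothetical auxiliary data obtained elsewhere (public profiles, leaks, etc.).
public_profiles = {
    "alice": {"tz": "UTC-8", "locale": "en-US", "signup": "2014-03-02"},
    "bob": {"tz": "UTC+1", "locale": "de-DE", "signup": "2019-11-20"},
    "carol": {"tz": "UTC-8", "locale": "en-US", "signup": "2020-06-15"},
    "dave": {"tz": "UTC-5", "locale": "en-US", "signup": "2020-06-15"},
}

QUASI_IDENTIFIERS = ("tz", "locale", "signup")


def fingerprint(record):
    """Combine the quasi-identifier values into one hashable key."""
    return tuple(record[k] for k in QUASI_IDENTIFIERS)


def link(pseudonymous, auxiliary):
    """Map each tag to a username when exactly one profile shares its fingerprint."""
    by_fp = defaultdict(list)
    for name, rec in auxiliary.items():
        by_fp[fingerprint(rec)].append(name)

    matches = {}
    for tag, rec in pseudonymous.items():
        names = by_fp.get(fingerprint(rec), [])
        if len(names) == 1:  # unique match: the pseudonym is undone
            matches[tag] = names[0]
    return matches


print(link(analytics, public_profiles))
```

      With only three attributes every tag in this toy data resolves to a single person; real analytics tables usually carry far more attributes per user, which makes unique fingerprints more likely, not less.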

      And the privacy IS the issue here.