Comment by benlivengood
Comment by benlivengood 13 hours ago
The thing is, it's not even people doing the correlations. Just like transformers can learn most of human knowledge just by trying to predict tokens, I would not be surprised if the ad-serving machine learning systems have learned about people in similar detail.
State of the art about 10 years ago was 4 9s of accuracy predicting click-through rates from the available context (features for user profile, current website, keywords, etc.), which I interpreted as requiring a fairly accurate learned model of human behavior. I got out of that industry so I don't know what current SOTA is for adtech, but I can only imagine it is better. The models were trained on automatically labelled data (GB/s of it) based on actual recent click-through rates so the amount of training data was roughly comparable to small LLMs.
Recent anecdote; three of us were sitting around the kitchen table with our phones out chatting about an obscure new thing that had come up; it appeared in one of our FB ad streams pretty quickly.
My top guesses about how this is possible today;
1) Apps routinely link many third-party data gathering and advertising libraries. Any of these libraries could be gathering enough contextual data and reselling it to make a correlation possible. It's not just obscure thing A that triggers an ad, it's highly correlated mixtures of normal things X, Y and Z that can imply A.
2) other friends may have talked about the obscure thing recently and social network links implied we would be aware of it through them.
Distant 3) the models are actually good enough to infer speech from weird side-channels like the accelerometer when people wave their hands when they talk, etc. Accelerometer sample rate is < 1KHz but over 100Hz which may be enough, especially when you throw giant models at it.
> an obscure new thing that had come up
Since you've provided no explicit counter-evidence, I'm gonna go ahead and say I have four nines of accuracy in predicting that your smartphone was squarely in the dependency chain of any "obscure new thing" you could have imagined discussing.
Edit: wording