davmre 14 hours ago

It's true that these are very different activities, but I think most ML researchers would agree that it was actually the creation of ImageNet that sparked the deep learning revolution. CNNs were not a novel method in 2012; the novelty was having a dataset big and sophisticated enough that it was actually possible to learn a good vision model from it without hand-engineering all the parts. Fei-Fei saw this years in advance and invested a lot of time and career capital setting up the conditions for the Bitter Lesson to kick in. Building the dataset was 'easy' in a technical sense, but knowing that a big dataset was what the field needed, and staking her career on it when no one else was doing or valuing this kind of work, was her unique contribution, and it took quite a bit of both insight and courage.

dauertewigkeit 17 hours ago

CNNs and Transformers are both really simple and intuitive, so I don't think there was any stroke of genius in how they were devised (see the sketch at the end of this comment for just how little machinery a CNN involves).

Their success is due to datasets, and to the tooling that allowed models to be trained on large amounts of data sufficiently fast using GPU clusters.
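
A complete CNN image classifier really does fit in a few lines. A minimal sketch in PyTorch; the layer sizes and the 32x32 RGB input are illustrative choices, not any particular published architecture:

    import torch
    import torch.nn as nn

    # A tiny CNN classifier: two conv blocks, then a linear head.
    # Shapes assume 32x32 RGB inputs and 10 classes (both illustrative).
    model = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, padding=1),   # 3 -> 32 channels
        nn.ReLU(),
        nn.MaxPool2d(2),                              # 32x32 -> 16x16
        nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 32 -> 64 channels
        nn.ReLU(),
        nn.MaxPool2d(2),                              # 16x16 -> 8x8
        nn.Flatten(),
        nn.Linear(64 * 8 * 8, 10),                    # logits for 10 classes
    )

    x = torch.randn(1, 3, 32, 32)   # one fake image
    print(model(x).shape)           # torch.Size([1, 10])

Everything hard lives outside this snippet: the labeled data and the infrastructure to push it through GPUs at scale.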

  • yzydserd 17 hours ago

    Exactly right, and neatly put by the author in the linked article:

    > I spent years building ImageNet, the first large-scale visual learning and benchmarking dataset and one of three key elements enabling the birth of modern AI, along with neural network algorithms and modern compute like graphics processing units (GPUs).

    Datasets + NNs + GPUs. Three "vastly different" advances that came together. ImageNet was THE dataset.

  • byearthithatius 16 hours ago

    "CNNs and Transformers are both really simple and intuitive" and labeling a bunch of images you downloaded is not simple and intuitive? It was a team effort and I would hardly call a single dataset what drove modern ML. Most of currently deployed modern ML wasn't trained on that dataset and didn't come from models trained on it.