Comment by doctorpangloss 2 days ago

I really like the Hugging Face guys, but...

> Modify one thing at a time

> Change only one variable per ablation while keeping everything else constant. If you change multiple things and performance improves, you won’t know what caused it. Test modifications individually, then combine successful ones and reassess.

This is an unintentional microcosm of what is flawed with the document.

CamperBob2 2 days ago

What's wrong with it? That's good advice in almost any optimization or troubleshooting context where variables may interact.


yorwba 2 days ago

One problem with testing one change at a time is that if you can only run a small number of experiments because each one requires many GPU hours to get results, you can also only test a small number of changes. If you can come up with and implement new changes much more easily than you can test them, it would be more efficient to test multiple changes at a time and use some form of Bayesian optimization to find the best combination of changes with as few experiments as possible.

- ImageXav 2 days ago
  
  Agreed. One-at-a-time (OAT) testing has been outdated for almost a century at this point. Factorial and fractional factorial experiments have been around for that long and give detailed insight into not just the effect of single changes but also the interactions between changes, which means you learn far more per run, since many variables in DL do in fact interact.
  Or use more modern Bayesian methods if you're more interested in getting the best result from a given hyperparameter sweep.
  However, that is not to detract from the excellent effort made here and the great science being investigated. Write ups like this offer so much gold to the community.
  
- empiko 2 days ago
  
  The number of runs you can afford is not enough to perform Bayesian optimization. Count how many different options they explored in the text and take a guess at how many samples you need to start modeling the hyperparameter space.
  
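For readers who haven't run a factorial design, here is a minimal Python sketch of the idea discussed in this subthread. Everything in it is an assumption for illustration: `toy_score` is a made-up response standing in for an eval metric, its coefficients are invented, and factors A, B, C are hypothetical on/off changes coded as -1/+1. It is not taken from the document under discussion.

```python
from itertools import product

def full_factorial(k):
    """All 2**k runs; each factor is coded -1 (change off) or +1 (change on)."""
    return [tuple(run) for run in product((-1, 1), repeat=k)]

def effect(design, responses, columns):
    """Mean response where the product of `columns` is +1, minus the mean where it is -1.

    One column gives a main effect; several columns give an interaction effect.
    """
    total = 0.0
    for run, y in zip(design, responses):
        sign = 1
        for c in columns:
            sign *= run[c]
        total += sign * y
    return total / (len(design) / 2)

def toy_score(a, b, c):
    """Hypothetical metric: A and B help, they interact, and C does nothing."""
    return 10.0 + 1.0 * a + 0.5 * b + 0.75 * a * b

# Full 2^3 design: 8 runs, every main effect and interaction is separable.
full = full_factorial(3)
y_full = [toy_score(*run) for run in full]

# Half fraction 2^(3-1): 4 runs, built by setting C = A * B.
# The price of fewer runs: the C column is aliased with the A*B interaction.
half = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]
y_half = [toy_score(*run) for run in half]

if __name__ == "__main__":
    print("full  A  :", effect(full, y_full, [0]))
    print("full  B  :", effect(full, y_full, [1]))
    print("full  C  :", effect(full, y_full, [2]))
    print("full  AxB:", effect(full, y_full, [0, 1]))
    print("half  C  :", effect(half, y_half, [2]))
```

In this toy setup the full design attributes nothing to C and picks up the A×B interaction separately. The half fraction uses half the runs but reports the A×B interaction as if it were an effect of C. That is the trade both sides here are pointing at: batching changes saves runs, but only a design with enough runs can say which change did what.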
doctorpangloss 2 days ago

It’s advice for being an individual contributor, not a researcher.
And even then: if you’re an IC, the goal is research, and your boss is insisting on incrementalism at the level of planning experiments, quit, because you will fail.
