Comment by cavisne
Scaling transformers has basically been alchemy: the breakthroughs aren’t from rigorous science, they’re from trying stuff and hoping you don’t waste millions of dollars in compute.
Maybe the company that just tells an AI to generate hundreds of random scaling ideas and tries them all is the one that will win. That company should probably also be 100 percent committed to this approach: no FLOPs spent on Ghibli inference.