Comment by chriskanan
Comment by chriskanan a day ago
The most salient thing in the document is that it put export controls on releasing the weights of models trained with 10^26 operations. While there may be some errors in my math, I think that corresponds to training a model with over 70,000 H100s for a month.
I personally think the regulation is misguided, as it assumes we won't identify better algorithms/architectures. There is no reason to assume that the level of compute leads to these problems.
Moreover, given the emphasis on test-time compute nowadays and that it seems like a lot of companies have hit a wall with performance gains with trying to scale LLMs at train-time, I especially think this regulation isn't especially meaningful.
Traditional export control applied to advanced hardware is because the US doesn't want its adversaries to have access to things that erode the US military advantage. But most hardware is only controlled at the high-end of the market. Once a technology is commodotized, the low-end stuff is usually widely proliferated. Night vision goggles are an example, only the latest generation technology is controlled, and low-end stuff can be bought online and shipped worldwide.
Applying this to your thoughts about AI, is that as the efficiency of training gets better, the ability to train models is commodotized, and those models would not be considered to be advantageous and would not need to be controlled. So maybe setting the export control based on the number of operations is a good idea- it naturally allows efficiently trained models to be exported since they wouldn't be hard to train in other countries anyway.
As computing power scales maybe the 10^26 limit will need to be revised, but setting the limit based on the scale of the training is a good idea since it is actually measurable. You couldn't realistically set the limit based on the capability of the model since benchmarks seem become irrelevant every few months due to contamination.