Comment by karpathy
It will work great with a 40GB GPU, probably a bit less than twice as slow. These are micro models of a few B params at most and they fit easily during both training and inference.
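A rough back-of-envelope memory sketch of why a few-B-param model fits (a minimal example, not from the comment itself; it assumes bf16 weights/grads with fp32 AdamW moments, and the parameter counts are illustrative):

```python
# Back-of-envelope GPU memory estimate for a small (few-B-param) model.
# Assumptions: bf16 params and grads, fp32 AdamW moment estimates;
# activations/buffers are ignored here and add extra on top.

def training_state_gb(n_params: float) -> float:
    bytes_per_param = (
        2        # bf16 weights
        + 2      # bf16 gradients
        + 4 * 2  # fp32 AdamW moments (m and v)
    )
    return n_params * bytes_per_param / 1e9

def inference_weights_gb(n_params: float) -> float:
    return n_params * 2 / 1e9  # bf16 weights only (ignoring KV cache)

for n in (0.5e9, 1e9, 2e9):
    print(f"{n/1e9:.1f}B params: "
          f"~{training_state_gb(n):.0f} GB train states, "
          f"~{inference_weights_gb(n):.0f} GB inference weights")
```

Even at 2B params the persistent training state is on the order of 24 GB, which leaves headroom for activations on a 40GB card; inference needs only a few GB of weights.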