typpilol 4 hours ago

Won't work at all. Or if it does it'll be so slow since it'll have to go to the disk for every single calculation so it won't ever finish.

  • karpathy 2 hours ago

    It will work great with 40GB GPU, probably a bit less than twice slower. These are micro models of a few B param at most and fit easily during both training and inference.