Comment by yalok 5 hours ago

Add to that the multiple rounds of restarting pretraining from an earlier checkpoint when the loss plateaus too early or starts increasing because of bugs…