janalsncm 16 hours ago | 1 reply
You mentioned it took 100 GPU hours, what GPU did you train on?
ollin 15 hours ago
Mostly 1xA10 (though I switched to 1xGH200 briefly at the end; Lambda has a sale going). The network used in the post is very tiny, but I had to train for a really long time with a large batch to get somewhat-stable results.
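(A minimal sketch of what a large-batch run on a single A10 could look like in PyTorch, assuming gradient accumulation is used to reach a large effective batch within 24 GB of memory; the model, data, and batch sizes below are illustrative placeholders, not the actual setup from the post:)

    import torch
    import torch.nn as nn

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Tiny stand-in network; the post's actual architecture is not shown here.
    model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64)).to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    micro_batch = 32   # what fits in GPU memory at once (assumed)
    accum_steps = 16   # 32 * 16 = effective batch of 512 (assumed)

    for step in range(1000):
        opt.zero_grad()
        for _ in range(accum_steps):
            x = torch.randn(micro_batch, 64, device=device)   # stand-in data
            loss = nn.functional.mse_loss(model(x), x)        # stand-in objective
            (loss / accum_steps).backward()   # accumulate averaged gradients
        opt.step()   # one optimizer step per large effective batch

Accumulating gradients over several micro-batches is one common way to get the stabilizing effect of a large batch on a single GPU, at the cost of longer wall-clock time per step, which is consistent with "train a really long time" on one A10.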