Comment by llm_trw
>The estimated training time for the end-to-end model on an 8×H100 machine is 2.6 days.
That's a $250,000 machine for the micro budget. Or, if you don't want to do it locally, ~$2,000 to do it on someone else's machine for the one model.
From the abstract
Figure 1
End of intro under the key contributions bullet points

I'm just saying, the authors are not trying to hide this point. They are making it abundantly clear.

I should also mention that this is the most straightforward way to discuss pricing. It is going to be much more difficult if they do comparisons including the costs of the machines, as then there needs to be an amortization cost baked in, and that's going to have to include costs of electricity, supporting hardware, networking, how long the hardware is used for, at what percentage utilization the hardware runs, costs of employees to maintain it, and all that fun stuff. Which... you can estimate by... GPU rental costs... since rental prices do in fact bake those numbers in. They explain their numbers in the appendix under Table 5. It is estimated at $3.75/H100/hr.
Btw, they also state a conversion to A100s.
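
For anyone who wants to check the arithmetic, here is a rough sketch in Python using only the numbers quoted in this thread (8 GPUs, 2.6 days, $3.75/H100/hr, and the $250,000 machine price from the parent comment). The function names are made up for illustration, and the break-even figure is a naive assumption of mine: it deliberately ignores electricity, staff, networking, and idle time, which is exactly why the rental rate is the easier number to reason with.

```python
def rental_cost(num_gpus: int, days: float, usd_per_gpu_hour: float) -> tuple[float, float]:
    """Return (GPU-hours, rental cost in USD) for a training run."""
    gpu_hours = num_gpus * days * 24
    return gpu_hours, gpu_hours * usd_per_gpu_hour


def naive_break_even_days(machine_usd: float, num_gpus: int, usd_per_gpu_hour: float) -> float:
    """Days of 100% utilization at which renting would cost as much as buying.

    Assumption: purchase price only; no power, staff, or networking costs.
    """
    usd_per_day = num_gpus * usd_per_gpu_hour * 24
    return machine_usd / usd_per_day


# Numbers quoted in the thread.
gpu_hours, cost = rental_cost(num_gpus=8, days=2.6, usd_per_gpu_hour=3.75)
break_even = naive_break_even_days(machine_usd=250_000, num_gpus=8, usd_per_gpu_hour=3.75)

print(f"{gpu_hours:.1f} GPU-hours, ${cost:,.2f} to rent")
print(f"naive break-even vs. buying: {break_even:.0f} days at full utilization")
```

That works out to roughly 500 GPU-hours and a bit under $1,900, which lines up with the ~$2,000 figure above.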