Comment by ls612

Comment by ls612 2 days ago

2 replies

This is a dumb question I know, but how expensive is model distillation? How much training hardware do you need to take something like this and create a 7B and 12B version for consumer hardware?

johnb231 a day ago

The process involves running the original model. You can rent these big GPUs for ~$10 per hour, so that is ~$160 per hour for as long as it takes