Comment by risyachka
Yeah, RunPod's cold start is definitely not 250ms, not even close. Maybe for some models, idk, but an 8B-parameter Hugging Face model takes about 30 seconds to cold start in their serverless "flash" configuration.
Thanks for confirming! Our cold start, excluding model load, is typically 2-4 seconds for HF models.
The only time it gets much longer is when companies have done a lot with very specific CUDA implementations.