Comment by ekojs

Comment by ekojs 10 months ago

Congrats on the launch!

We're definitely looking for something like this as we're hoping to transition from Azure's (expensive) GPUs. I'm curious how you stack up against something like Runpod's serverless offering (which seems quite a bit cheaper). Do you offer faster cold starts? How long would a ~30GB model load take?

za_mike157 10 months ago

Yes, RunPod does have cheaper pricing than us. However, they don't allow you to specify your exact resources but rather charge you for the full resource (see the A100 example above), so depending on your resource requirements our pricing could be competitive, since we charge you only for the resources you use.

In terms of cold starts, they mention cold starts of 250ms, but I am not sure what workload that is for, or whether we measure cold starts the same way. Quite a few customers have told us we are quite a bit faster (2-4 seconds vs ~10 seconds), although we haven't confirmed this ourselves.

For a 30GB model, we have a few ways to speed this up, such as using the Tensorizer framework from CoreWeave and caching model files in our distributed caching layer, but I would need to test. We see reads of up to 1GB/s. If you tell me the model you are running (if open-source), I can get results to you - you can message me on our Slack/Discord community or email me at michael@cerebrium.ai or
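
As a rough back-of-the-envelope sketch (not a measured result), the two figures above give a lower bound on load time. This assumes the load is limited purely by read throughput at the quoted 1GB/s peak and ignores deserialization and host-to-GPU transfer; the function name is just illustrative.

```python
def min_load_seconds(model_gb: float, read_gb_per_s: float) -> float:
    """Lower bound on load time: model size divided by peak read throughput.

    Assumption: reads are the only bottleneck. Real loads also pay for
    deserialization and copying weights to the GPU, so expect longer.
    """
    if read_gb_per_s <= 0:
        raise ValueError("read throughput must be positive")
    return model_gb / read_gb_per_s


# Figures from the comment above: ~30GB model, reads of up to 1GB/s.
print(min_load_seconds(30, 1.0))  # 30.0
```

So at the peak read rate a ~30GB model would take at least about 30 seconds to read, before any of the speedups mentioned are tested.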

spmurrayzzz 10 months ago

> Yes, RunPod does have cheaper pricing than us. However, they don't allow you to specify your exact resources but rather charge you for the full resource (see the A100 example above), so depending on your resource requirements our pricing could be competitive, since we charge you only for the resources you use.

I may be misunderstanding your explanation a bit here, but Runpod's serverless "flex" tier looks like the same model (it only charges you for non-idle resources). And at that tier they are still 2x cheaper for an A100; at your price point you could rent an H100 from them.

- za_mike157 10 months ago
  
  Ah, I see they recently cut their pricing by 40%, so you are correct - sorry about that. It seems we are more expensive compared to their new pricing.
  
  spmurrayzzz 10 months ago
  
  FWIW, the most expensive flex price I've ever seen from them for an 80GB A100 was $0.00130 back in January of this year, which is still cheaper, albeit by a smaller margin, if that's helpful at all for your own competitive market analysis.
  (Congrats on the launch as well, by the way).
  
risyachka 10 months ago

Yeah, Runpod's cold start is definitely not 250ms, not even close. Maybe for some models, idk, but a Hugging Face model with 8B params takes like 30 seconds to cold start in their serverless "flash" configuration.

- za_mike157 10 months ago
  
  Thanks for confirming! Our cold start, excluding model load, is typically 2-4 seconds for HF models.
  The only time it gets much longer is when companies have done a lot with very specific CUDA implementations.
  