Comment by ekojs
Congrats on the launch!
We're definitely looking for something like this as we're looking to transition from Azure's (expensive) GPUs. I'm curious how you stack against something like Runpod's serverless offering (which seems quite a bit cheaper). Do you offer faster cold starts? How long would a ~30GB model load takes?
Yes RunPod does have cheaper pricing than us however they don't allow you to specify your exact resources but rather charge you the full resource (see example of A100 above) so depending on your resource requirements our pricing could be competitive since we charge you only for the resources you use.
In terms of cold starts, they mentioned their cold starts are 250ms which I am not sure what workload that is on, or if we have the same measure of cold starts. We have had quite a few customers that we have told us we are quite a bit faster 2-4 seconds vs ~10 seconds although we haven't confirmed this ourselves.
For a 30GB model, we have a few ways to speed this up such as using the Tensorizer framework from Coreweave, we cache model files in our distributed caching layer but I would need to test. We see reads of up to 1GB/s. If you tell me the model you are running (if open-source) I can get results to you - you can message me on our Slack/Discord community or email me at michael@cerebrium.ai or