Comment by yuppiepuppie 2 days ago

Very nice demo!

When you ran it the first time, it took a while to load up. Do subsequent runs go faster?

And what cloud provider are you all using under the hood? My company works in a specific sector that excludes us from using certain cloud providers (e.g., AWS).

za_mike157 2 days ago

You are correct! After the first request, the image is cached on the machine for future use, which makes subsequent container startups much faster. We also route requests to machines where the image is already cached, and we dedupe content between images to speed up startups further.
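
To make the routing idea concrete, here is a minimal sketch of cache-aware placement. It is purely illustrative: `Machine`, `pick_machine`, and `start_container` are hypothetical names, and the tie-breaking by load is an assumption, not a description of our actual scheduler.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical worker node that remembers which images it has pulled."""
    name: str
    cached_images: set[str] = field(default_factory=set)
    active_containers: int = 0


def pick_machine(machines: list[Machine], image: str) -> Machine:
    """Prefer machines that already hold the image; break ties by load."""
    if not machines:
        raise ValueError("no machines available")
    # Key sorts cache hits (False) before misses (True), then by load.
    return min(
        machines,
        key=lambda m: (image not in m.cached_images, m.active_containers),
    )


def start_container(machines: list[Machine], image: str) -> tuple[str, bool]:
    """Place a container and report (machine name, was it a cold start?)."""
    machine = pick_machine(machines, image)
    cold_start = image not in machine.cached_images
    machine.cached_images.add(image)  # the first request pulls the image
    machine.active_containers += 1
    return machine.name, cold_start


if __name__ == "__main__":
    fleet = [Machine("a"), Machine("b")]
    print(start_container(fleet, "demo-image"))  # first request for the image
    print(start_container(fleet, "demo-image"))  # repeat request for the image
```

In this sketch a repeat request is sent to the machine that already holds the image even if it is busier than an idle one. A real scheduler would have to weigh cache hits against load rather than always preferring the cache.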

We currently run on top of AWS, but we can run on any cloud provider, and we are working on letting you use your own cloud. Happy to hear more about your use case and see if we can help you at all - email me at michael@cerebrium.ai.

PS: I will say that vLLM has shockingly slow load times into VRAM, which we are working to resolve.