reilly3000 2 days ago

There are plenty of 3rd party and big cloud options to run these models by the hour or token. Big models really only work in that context, and that’s ok. Or you can get yourself an H100 rack and go nuts, but there is little downside to using a cloud provider on a per-token basis.

cubefox 2 days ago

> There are plenty of 3rd party and big cloud options to run these models by the hour or token.

Which ones? I wanted to try a large base model for automated literature generation (fine-tuned models are a lot worse at it), but I couldn't find a provider that makes this easy.

  • reilly3000 2 days ago

    If you’re already using GCP, Vertex AI is pretty good. You can run lots of models on it:

    https://docs.cloud.google.com/vertex-ai/generative-ai/docs/m...

    Lambda.ai used to offer per-token pricing, but they have moved upmarket. You can still rent a B200 instance for under $5/hr, which is reasonable for experimenting with models.

    https://app.hyperbolic.ai/models Hyperbolic offers both GPU hosting and per-token pricing for popular OSS models. The token-based options are easy because they're usually a drop-in replacement for OpenAI API endpoints.
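
    Something like this works against most of these providers (a sketch; the base URL, model id, and key here are assumptions, check the provider's docs):

      from openai import OpenAI

      # Point the standard OpenAI client at the provider's endpoint instead of api.openai.com
      client = OpenAI(
          base_url="https://api.hyperbolic.xyz/v1",  # assumed Hyperbolic endpoint
          api_key="YOUR_HYPERBOLIC_API_KEY",         # placeholder
      )

      # Base models have no chat template, so use the plain completions API
      resp = client.completions.create(
          model="meta-llama/Meta-Llama-3.1-405B",  # assumed id for the BASE variant
          prompt="Once upon a time",
          max_tokens=64,
      )
      print(resp.choices[0].text)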

    You have to rent a GPU instance if you want to run the latest or custom stuff, but if you just want to play around for a few hours it's not unreasonable.

    • verdverm 2 days ago

      GCloud and Hyperbolic have been my go-to as well

    • cubefox a day ago

      > If you’re already using GCP, Vertex AI is pretty good. You can run lots of models on it:

      > https://docs.cloud.google.com/vertex-ai/generative-ai/docs/m...

      I don't see any large base models there. A base model is a pretrained foundation model without fine tuning. It just predicts text.

      > Lambda.ai used to offer per-token pricing but they have moved up market. You can still rent a B200 instance for sub $5/hr which is reasonable for experimenting with models.

      A B200 is probably not enough: it has just 192 GB of HBM, while DeepSeek-V3.2-Exp-Base, the base model for DeepSeek-V3.2, has 685 billion BF16 parameters. Though I assume they have larger options. The problem is that all the configuration work is then left to the user, and I'm not experienced in that.
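
      Back of the envelope (just the weights, ignoring KV cache and activations):

        # 685B params x 2 bytes each (BF16) vs. 192 GB of HBM per B200
        import math
        weights_gb = 685e9 * 2 / 1e9        # ~1370 GB of weights
        gpus = math.ceil(weights_gb / 192)  # -> 8 B200s at a minimum
        print(f"{weights_gb:.0f} GB of weights -> at least {gpus} B200s")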

      > https://app.hyperbolic.ai/models Hyperbolic offers both GPU hosting and token pricing for popular OSS models

      Thanks. They do indeed have a single base model: Llama 3.1 405B BASE. It's a bit older (July 2024) and probably not as good as the base model for the new DeepSeek release, but that might be the best one can do, as there don't seem to be any inference providers that have deployed a DeepSeek or even a Kimi base model.

  • weberer a day ago

    Fireworks serves this model serverless for $1.20 per million tokens.

    https://fireworks.ai/models/fireworks/deepseek-v3p2

  • big_man_ting 2 days ago

    Have you checked whether OpenRouter has any providers serving the model you need?

    • cubefox a day ago

      I searched for "base", and the best available base model does indeed seem to be Llama 3.1 405B Base at Hyperbolic.ai, as mentioned in the comment above.