Comment by stingraycharles 2 days ago

I don’t think it will ever make sense; you can buy a huge amount of cloud-based usage for this kind of money.

From my perspective, the biggest problem is that I’m just not going to be using it 24/7, which means I’m not getting nearly as much value out of the hardware as the cloud vendors get out of theirs.

Last but not least, if I want to run queries against open-source models, I prefer a provider like Groq or Cerebras, since getting query results nearly instantly is extremely convenient.
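
A back-of-envelope sketch makes the point; every number below is an assumption for illustration (the $19,000 is the Mac Studio price quoted downthread, and the API price and usage are made up):

    # Break-even for buying hardware vs. paying per token.
    # All figures are assumptions, not real quotes.
    hardware_cost_usd = 19_000       # assumed upfront hardware cost
    api_price_per_mtok = 2.00        # assumed blended $/1M tokens
    tokens_per_day = 2_000_000       # assumed heavy personal usage

    daily_api_cost = tokens_per_day / 1_000_000 * api_price_per_mtok  # ~$4/day
    breakeven_days = hardware_cost_usd / daily_api_cost
    print(f"break-even after ~{breakeven_days:,.0f} days "
          f"(~{breakeven_days / 365:.1f} years), ignoring power and resale")

At these assumptions that’s roughly 13 years, which is why the utilization argument matters so much.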

websiteapi 2 days ago

My issue is that once it’s in your workflow, you become pretty latency-sensitive. Imagine those record-it-all apps working well; eventually you’d become pretty reliant on them. I don’t necessarily want to be at the whims of the cloud.

  • stingraycharles 2 days ago

    Aren’t those “record it all” applications implemented with RAG (retrieval-augmented generation), with snippets injected into the context based on embedding similarity?

    Obviously you’re not going to always inject everything into the context window.
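
    Roughly that pattern, as a minimal sketch; embed() here is a hypothetical stand-in for a real embedding model:

        import numpy as np

        def embed(text: str) -> np.ndarray:
            # Hypothetical placeholder: a real app would call an embedding model here.
            rng = np.random.default_rng(abs(hash(text)) % 2**32)
            return rng.standard_normal(384)

        def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
            q = embed(query)
            scored = []
            for chunk in chunks:
                c = embed(chunk)
                # Cosine similarity between query and chunk embeddings.
                sim = float(q @ c / (np.linalg.norm(q) * np.linalg.norm(c)))
                scored.append((sim, chunk))
            scored.sort(reverse=True)
            return [chunk for _, chunk in scored[:k]]

        # Only the few most similar chunks get injected into the prompt,
        # never the whole recording history.
        chunks = ["Monday budget meeting notes", "grocery list", "Q3 hiring plan"]
        context = "\n".join(top_k("what did we decide about the budget?", chunks))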

lordswork 2 days ago

As long as you're willing to wait up to an hour for your GPU to get scheduled when you do want to use it.

  • stingraycharles 2 days ago

    I don’t understand what you’re saying. What’s preventing you from using, e.g., OpenRouter to run a query against Kimi-K2 from whatever provider?
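
    For reference, a minimal sketch of that: OpenRouter exposes an OpenAI-compatible API, so the stock client works with a different base_url. The model slug is an assumption; verify it against their model list.

        import os
        from openai import OpenAI

        client = OpenAI(
            base_url="https://openrouter.ai/api/v1",
            api_key=os.environ["OPENROUTER_API_KEY"],
        )

        resp = client.chat.completions.create(
            model="moonshotai/kimi-k2",  # assumed slug; check openrouter.ai/models
            messages=[{"role": "user", "content": "Hello, Kimi."}],
        )
        print(resp.choices[0].message.content)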

    • hu3 2 days ago

      And you’ll get a faster model this way.

    • bgwalter 2 days ago

      Because you have Cloudflare (MITM 1), OpenRouter (MITM 2), and finally the "AI" provider, all of whom can read, store, analyze, and resell your queries.

      EDIT: Thanks for downvoting what is literally one of the most important reasons for people to use local models. Denying and censoring reality does not prevent the bubble from bursting.

      • irthomasthomas a day ago

        You can use chutes.ai’s TEE (Trusted Execution Environment), and Kimi K2 is running at about 100 tokens/s right now.

givinguflac 2 days ago

I think you’re missing the whole point, which is not using cloud compute.

  • stingraycharles 2 days ago

    Because of privacy reasons? Yeah, I’m not going to spend a small fortune just to be able to use these kinds of models.

    • givinguflac 2 days ago

      There are plenty of examples and reasons to do so besides privacy: because one can, because it’s cool, for research, for fine-tuning, etc. I never mentioned privacy. Your use case is not everyone’s.

      • wyre 2 days ago

        All of those things you can still do by renting AI server compute, though? I think privacy and the cool factor are the only real reasons it would be rational for someone to spend (checks the Apple Store) $19,000 on computer hardware...

        • givinguflac 20 hours ago

          Why are you looking at this only as a consumer? Have you never heard of businesses spending money on hardware?