Comment by websiteapi 2 days ago
I get tempted to buy a couple of these, but I just feel like the amortization doesn’t make sense yet. Surely in the next few years this will be orders of magnitude cheaper.
Are there benchmarks that effectively measure this? This is essential information when speccing out an inference system/model size/quantization type.
I don’t think it will ever make sense; you can buy so much cloud-based usage for this kind of price.
From my perspective, the biggest problem is that I’m just not going to be using it 24/7, which means I’m not getting nearly as much value out of it as the cloud-based vendors do from their hardware.
Last but not least, if I want to run queries against open-source models, I prefer to use a provider like Groq or Cerebras, as it’s extremely convenient to get the query results nearly instantly.
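To make the utilization point concrete, here is a rough back-of-envelope sketch; every number in it is an assumption to replace with your own (hardware price, sustained throughput, API pricing, how busy the box actually is), not a quote of real pricing:

```python
# Rough breakeven sketch for "buy local hardware vs. pay per token".
# All numbers are illustrative assumptions, not real quotes: adjust for
# your actual hardware price, API pricing, throughput, and utilization.

hardware_cost_usd = 10_000          # assumed price of a high-memory local box
tokens_per_second = 20              # assumed sustained generation speed
utilization = 0.10                  # fraction of the day it's actually busy
api_price_per_m_tokens = 1.00       # assumed blended API price, USD per 1M tokens

tokens_per_day = tokens_per_second * 86_400 * utilization
api_cost_per_day = tokens_per_day / 1_000_000 * api_price_per_m_tokens
breakeven_days = hardware_cost_usd / api_cost_per_day

print(f"{tokens_per_day:,.0f} tokens/day -> ${api_cost_per_day:.2f}/day via API")
print(f"Breakeven vs. buying the hardware: ~{breakeven_days / 365:.1f} years")
```

With single-digit utilization the API side stays in the cents-per-day range, which is why the payback period comes out absurdly long; heavy 24/7 batch use changes the picture considerably.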
My issue is that once I have it in my workflow, I’d be pretty latency sensitive. Imagine those record-it-all apps working well; eventually you’d become pretty reliant on it. I don’t necessarily want to be at the whims of the cloud.
Aren’t those “record it all” applications implemented as RAG, with snippets injected into the context based on embedding similarity?
Obviously you’re not going to always inject everything into the context window.
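For illustration, a minimal sketch of that embedding-similarity retrieval step; the stand-in embedding function and sample chunks below are made up, and real “record it all” apps may organize this differently:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: a character-frequency vector. A real app would
    call an embedding model here; this just keeps the sketch runnable."""
    v = np.zeros(256)
    for ch in text.lower():
        v[ord(ch) % 256] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

# Everything recorded so far, chunked and embedded once, then stored.
chunks = [
    "Meeting notes: budget review moved to Friday.",
    "Reminder: renew the API key before the demo.",
    "Grocery list: eggs, coffee, rice.",
]
chunk_vecs = np.stack([embed(c) for c in chunks])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return only the k chunks most similar to the query; everything else
    stays out of the context window."""
    sims = chunk_vecs @ embed(query)
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]

print(retrieve("when is the budget meeting?"))
```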
I don’t understand what you’re saying. What’s preventing you from using eg OpenRouter to run a query against Kimi-K2 from whatever provider?
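For reference, a query against Kimi-K2 via OpenRouter can look roughly like this with the OpenAI-compatible Python client; the model slug and the placeholder API key are assumptions to check against OpenRouter’s model list:

```python
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint; it routes the request
# to whichever upstream provider currently serves the model.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # placeholder
)

response = client.chat.completions.create(
    model="moonshotai/kimi-k2",  # assumed slug; verify on the OpenRouter model list
    messages=[{"role": "user", "content": "Summarize the trade-offs of local inference."}],
)
print(response.choices[0].message.content)
```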
Because you have Cloudflare (MITM 1), OpenRouter (MITM 2), and finally the “AI” provider, all of whom can read, store, analyze, and resell your queries.
EDIT: Thanks for downvoting what is literally one of the most important reasons for people to use local models. Denying and censoring reality does not prevent the bubble from bursting.
You can use chutes.ai’s TEE (Trusted Execution Environment), and Kimi K2 is running at about 100 t/s right now.
I think you’re missing the whole point, which is not using cloud compute.
Because of privacy reasons? Yeah, I’m not going to spend a small fortune just to be able to use these types of models.
There are plenty of examples and reasons to do so besides privacy: because one can, because it’s cool, for research, for fine-tuning, etc. I never mentioned privacy. Your use case is not everyone’s.
Indeed, my main use case is those kinds of "record everything" setups. I’m not even super privacy-conscious per se, but it just feels too weird to send literally everything I’m saying, all of the time, to the cloud.
Luckily, for now Whisper doesn’t require too much compute, but the kind of interesting analysis I’d want would require at least a 1B-parameter model, maybe 100B or 1T.
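A minimal sketch of that kind of pipeline, assuming the open-source whisper package for local transcription and an OpenAI-compatible local server (such as llama.cpp’s llama-server) for the analysis step; the file name, endpoint, and model name are placeholders:

```python
import whisper              # pip install openai-whisper; runs fully locally
from openai import OpenAI   # used here only to talk to a local server

# Small Whisper models are cheap to run locally; the heavy part is the
# analysis model you hand the transcript to afterwards.
stt = whisper.load_model("base")
result = stt.transcribe("todays_recording.wav")   # placeholder file name
transcript = result["text"]

# Placeholder: an OpenAI-compatible local server (e.g. llama.cpp's llama-server)
# listening on localhost; swap in whatever local model/runtime you actually use.
llm = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
summary = llm.chat.completions.create(
    model="local",  # placeholder; depends on your server
    messages=[{"role": "user", "content": f"Summarize my day:\n{transcript}"}],
)
print(summary.choices[0].message.content)
```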
> You never know what the future will bring, AI will be enshittified and so will hubs like huggingface.
If anyone wants to bet that future cloud hosted AI models will get worse than they are now, I will take the opposite side of that bet.
> It’s useful to have an off grid solution that isn’t subject to VCs wanting to see their capital returned.
You can pay cloud providers for access to the same models that you can run locally, though. You don’t need a local setup even for this unlikely future scenario where all of the mainstream LLM providers simultaneously decide to make their LLMs poor quality and none of them sees it as a market opportunity to provide good service.
But even if we ignore all of that and assume that cloud inference everywhere becomes bad at the same time at some point in the future, you would still be better off buying your own inference hardware at that point. Spending the money to buy two M3 Ultras right now to prepare for an unlikely future event is illogical.
The only reason to run local LLMs is if you have privacy requirements or you want to do it as a hobby.
> If anyone wants to bet that future cloud hosted AI models will get worse than they are now, I will take the opposite side of that bet.
OK. How do we set up this wager?
I'm not knowledgeable about online gambling or prediction markets, but further enshittification seems like the world's safest bet.
This is a weird line of thinking. Here’s a question: if you buy one of these and figure out how to use it to make $100k in 3 months, would that be good? When you run a local model, you shouldn’t compare it to the cost of using an API. The value lies in how you use it. Let’s forget about making money. Let’s just say you have a weird fetish and like to have dirty, sexy conversations with your LLM. How much would you pay for your data not to be leaked and for the world not to see your chat? Perhaps having your own private LLM makes it all worth it. If you have nothing special going on, then by all means use APIs, but if you feel/know your input is special, then yeah, go private.
Before committing to purchasing two of these, you should look at the true speeds that few people post, not just the "it works". We’re at a point where we can run these very large models "at home", and it is great! But real usage now involves very large contexts, both in prompt processing and in token generation. Whatever speeds these models get at "0" context are very different from what they get at "useful" context, especially in coding and such.
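If you want to measure this yourself against a local OpenAI-compatible server, one rough approach is to time the same request at increasing prompt sizes and read the token counts back from the usage field; the endpoint, model name, and filler text below are placeholders:

```python
import time
from openai import OpenAI

# Point this at your local OpenAI-compatible server (e.g. llama.cpp's llama-server).
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

filler = "The quick brown fox jumps over the lazy dog. "

for repeats in (1, 200, 2000):  # roughly "empty", "moderate", "large" contexts
    prompt = filler * repeats + "\nIn one sentence, what is this text about?"
    start = time.time()
    resp = client.chat.completions.create(
        model="local",  # placeholder; depends on your server
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
    )
    elapsed = time.time() - start
    usage = resp.usage
    print(
        f"{usage.prompt_tokens:>6} prompt tok, "
        f"{usage.completion_tokens:>4} gen tok in {elapsed:5.1f}s "
        f"(~{usage.completion_tokens / elapsed:.1f} tok/s end-to-end)"
    )
```

Expect the numbers at the largest prompt to look much worse than at the smallest; that gap is exactly the difference between "0 context" and "useful context" speeds described above.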