Comment by badmonster

Comment by badmonster 6 months ago

What stands out most is the practical implication: enabling lossless inference of a 405B-parameter model on a single node with 8×80GB GPUs is wild. That’s a huge unlock for research labs and startups alike that want to run frontier models without massive infrastructure costs.

latchkey 6 months ago

> That’s a huge unlock for research labs and startups alike that want to run frontier models without massive infrastructure costs.

Or let one of the neoclouds take care of the infrastructure costs and rent it out from them. Disclosure: I run one of them.

Reply View 11 replies

airstrike 6 months ago

Keep up the great work! We need more of you and other players.
Some unsolicited feedback: I would suggest reworking your landing page so that the language is always from your customers' perspective. Your customers want to solve a real internal problem that they have. Talking about how great your company is will always have less impact than talking about how you know what that problem is and how you intend to solve it.
Your mission is relevant to you and your investors, not to your customers. They care about themselves.
Your "quick start" should be an interactive form. I shouldn't have to remember what to put in an email to reach out to you. Make it easy for me. Also move that to the front page, provide a few "standard" packages and a custom one. Reduce the friction to clicking the CTA.
Since your pricing is transparent, you should be able to tell me what that price will be before I even submit a request. I assume you're cheaper than the competition (otherwise why would I not go with them?) so make that obvious. Check out Backblaze's website for an example page: https://www.backblaze.com/cloud-storage/pricing
Shell out a few grand and hire a designer to make your page look more professional. Something like https://oxide.computer/ but with the points above, as they also make the same mistake of making their home page read like a pitch deck.

Reply View | 1 reply
- latchkey 6 months ago
  
  Fantastic unsolicited feedback, I'm definitely taking this to heart!
  Website is intended to be more like documentation instead of a pitch deck or useless splash with a contact us form. I dislike sites like Oxide, I scroll past and don't read or ingest any of the fancy parts. Of course, you're right, this probably needs to be less about me. =)
  Friction definitely needs to be improved. That part is being worked on right now. Our intention is to be fully self-service, so that you don't have to talk to us at all, unless you want to. Credit card and go.
  We recently lowered our prices to be competitive with the rest of the market vs. focusing on people who care more about what we offer. We weren't trying to be cheaper than everyone else, we were trying to offer a better service. Lesson learned and pricing adjusted. Streisand effect, I don't like to mention the other players much.
  Again, thanks!
  
  Reply View | 0 replies
sundarurfriend 6 months ago

> neoclouds
For anyone else who hadn't heard of this term:
> Neoclouds are startups specializing in AI-specific cloud computing. Unlike their larger competitors, they don’t develop proprietary chips. Instead, they rely heavily on Nvidia’s cutting-edge GPUs to power their operations. By focusing solely on AI workloads, these companies offer specialized solutions tailored to AI developers’ needs.
from https://www.tlciscreative.com/the-rise-of-neoclouds-shaping-...

Reply View | 1 reply
- latchkey 6 months ago
  
  I believe that the term was first coined by SemiAnalysis in this article:
  https://semianalysis.com/2024/10/03/ai-neocloud-playbook-and...
  
  Reply View | 0 replies
Ringz 6 months ago

I need your services in Cape Town South Africa. It’s hard to find good data centers here.

Reply View | 1 reply
- latchkey 6 months ago
  
  Rent from us! hello@hotaisle.ai
  
  Reply View | 0 replies
saagarjha 6 months ago

That just moves the infrastructure costs to your cloud bill.

Reply View | 4 replies
- latchkey 6 months ago
  
  True, but there is so much value that we provide above and beyond just a cloud bill, that I think it is worth it. This is way more than racking and stacking commodity servers and providing a ssh login.
  It is novel equipment that few have ever used before outside of a relatively small HPC community. It regularly breaks and has issues (bugs) that need industry relationships to manage properly. We've had one server down for over a month now cause SMCI can't get their sh/t together to fix it. That's a $250k+ 350lbs paperweight. Good luck to any other small company that wants to negotiate that relationship.
  We are offering a very valuable service by enabling easy access to some of the most powerful compute available today. How many people do you think have a good grasp of what it takes to configure rocev2 & 8x400G across a cluster of servers? Good luck trying to hire talent that can set that up, they already have jobs.
  The capex / opex / complexity involved with deploying this level of gear is huge and only getting larger as the industry shifts to bigger/better/faster (ie: air cooling is dead). Things are moving so quickly, that equipment you purchased a year ago is now already out of date (H100 -> H200 is a great example). You're going to have to have a pretty impressive depreciation model to deploy this yourself.
  I wouldn't just dismiss this as moving costs around.
  
  Reply View | 3 replies
  
  zarathustreal 6 months ago
  
  wait your competitive advantage is “human friction exists”?
  …how do you justify marketing yourself in a system like that?
  “In general, people in this vertical have difficulty doing their jobs. Luckily we’ve had drinks with most of them” ……
  
  Reply View | 2 replies

miohtama 6 months ago

I am not expert here, so want to ask what's magical about 405B number?

Reply View 5 replies

daveguy 6 months ago

That's the size of the largest, most capable, open source models. Specifically Llama 3.1 has 405B parameters. Deepseek's largest model is 671B parameters.

Reply View | 4 replies
- mhitza 6 months ago
  
  Small corrections. Llama 3.1 is not an Open Source model, but a Llama 3.1 Licensed model. Neither is DeepSeek apparently https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/LIC... which I was of the false opinion that it is. Though I never considered using it, so haven't checked the license before.
  
  Reply View | 3 replies
  
  Der_Einzige 6 months ago
  
  You can just ignore the license since the existence of these models is based on piracy at a scale never before seen. Aaron Swartz couldn’t have even imagined violating copyright that hard.
  If you live in a glass house, you won’t throw stones. No one in the LLM space wants to be litigious
  It’s an open secret that DeepSeek used a ton of OpenAI continuations both in pre training and in the distillation. That totally violates openAI TOS. No one cares.
  
  Reply View | 1 reply
  
  LoganDark 6 months ago
  
  > No one in the LLM space wants to be litigious
  Except for OpenAI.
  
  Reply View | 0 replies
  
  gunalx 6 months ago
  
  Both deepseek R1 and V3-0324 is mit licensed.
  
  Reply View | 0 replies

Der_Einzige 6 months ago

4 but quants of DeepSeek or llama3 405n already fit on those GPUs and purported to have almost 0 loss compared to the full model. Doesn’t seem like that big of a deal given this

Reply View 0 replies

danielmarkbruce 6 months ago

It's... useful right now...it's not a huge unlock in a world where model size, GPU memory size, different precision support are changing quickly.

Reply View 16 replies

jhj 6 months ago

Unlike quantization, dimensionality reduction/low rank approximation, distillation etc, lossless compression is an always-correct addition to any ML system as you are computing the same thing you did before, the only question is if it is fast enough to not cause substantial bottlenecks and if the achievable compression ratio is high enough to be useful.
Floating point is just an inefficient use of bits (due to excessive dynamic range), especially during training, so it will always be welcome there. Extreme quantization techniques (some of the <= 4-bit methods, say) also tend to increase entropy in the weights limiting the applicability of lossless compression, so lossless and lossy compression (e.g., quantization) sometimes go against each other.
If you have billions in dollars in inference devices, even reducing the number of devices you need for a given workload by 5% is very useful.

Reply View | 4 replies
- danielmarkbruce 6 months ago
  
  "always correct"...
  
  Reply View | 3 replies
  
  Dylan16807 6 months ago
  
  Yes. It doesn't change the output, so it is a correct optimization.
  
  Reply View | 2 replies
striking 6 months ago

Is GPU memory size really changing that quickly? For that matter, is model size?

Reply View | 10 replies
- kadushka 6 months ago
  
  What's rapidly changing are quantization algorithms, and hardware features to support those algorithms. For example, Blackwell GPUs support dynamic FP4 quantization with group size 16. At that group size it's close to lossless (in terms of accuracy metrics).
  
  Reply View | 0 replies
- latchkey 6 months ago
  
  Both AMD and Nvidia are dumping more and more memory into their GPUs.
  MI300x is 192GB HMB3, MI325x is 256 HMB3e, MI355x should be 288 HBM3e (and support FP4/6).
  
  Reply View | 5 replies
  
  NBJack 6 months ago
  
  The professional side of things, yes. For consumer grade GPUs, despite the trends in gaming markets otherwise needing such, the values have stagnated a bit.
  
  Reply View | 4 replies
- danielmarkbruce 6 months ago
  
  Yes, yes.
  Nvidia about to release blackwell ultra with 288GB. Go back to maybe 2018 and max was 16gb if memory serves.
  DeepSeek recently release a 670 gb model. A couple years ago Falcon's 180gb seemed huge.
  
  Reply View | 2 replies
  
  spoaceman7777 6 months ago
  
  I'd assume that, in the context of LLM inference, "recent" generally refers to the Ampere generation and later of GPUs, when the demand for on board memory went through the roof (as, the first truly usable LLMs were trained on A100s).
  We've been stuck with the same general caps on standard GPU memory since then though. Perhaps limited in part because of the generational upgrades happening in the bandwidth of the memory, rather than the capacity.
  
  Reply View | 1 reply
  
  danielmarkbruce 6 months ago
  
  Bandwidth is going up too. "It's not doubling every 18 months and hence it's not moving" isn't a sensible way to view change.
  A one time effective 30% reduction in model size simply isn't going to be some massive unlocker, in theory or in practice.
  
  Reply View | 0 replies