Comment by estimator7292 2 days ago

I recently got a new Thinkpad for work, can't recall which model. I think L series?

The build quality is nicer than my T530's. The bottom cover doesn't have access panels anymore, but it's held by just a few captive(!!) screws and the whole bottom comes off. Everything is neatly exposed and you don't need to access the top of the board at all. The bottom cover has plastic clips along with the screws, but they're spring loaded! They aren't simply molded in, so they can't snap off. It's some incredible attention to detail.

I've noticed that most recent laptops put the vent behind the screen hinge, where it's completely blocked when the screen is closed. The ThinkPad has the vent fully exposed. In fact, it exposes more vent when the screen is closed.

Too bad the CPU is a lemon. One of the new AMD chips with a built in NPU. The NPU is slower than the integrated graphics for inference. Not a discrete card, just the GPU baked into the chip.

In contrast, I got a hand-me-down Dell XPS-something from 2020 when I first started this job. It idles IDLES! at 100°C. I tried to re-paste the CPU, but the heat pipes were so small and thin that I crushed one between my fingers. Even with massive airflow through the case from external fans, it never drops below 100°C. Absolutely inexcusable.

Looks to me like Lenovo still has it, at least if you're paying real money for a professional-level machine. This new ThinkPad is now my #1 most repairable and maintainable machine, with the T530 a close second. Absolutely every other laptop I've ever used is tied for last place in the garbage tier.

craftkiller a day ago

> The NPU is slower than the integrated graphics for inference.

Yeah, that's expected. On consumer devices, NPUs aren't optimized for speed and aren't meant to outperform the GPU; they're optimized for low power consumption. The goal is to run simple AI tasks without turning your laptop into a frying pan, and that's where the NPU comes in.

Quoting wikipedia:

> On consumer devices, the NPU is intended to be small, power-efficient, but reasonably fast when used to run small models.

sgc a day ago

I had the same XPS nightmare. I fixed it by getting a PTM7950 phase-change thermal pad for the CPU and GPU, and by swapping to Linux (which I would have done anyway). It went from 100°C to 49°C idle. PTM7950 is incredible.

  • Telaneo a day ago

    On the one hand, PTM7950 is really good. On the other hand, a 50-degree temperature drop can't really be explained by anything other than something being terribly wrong to begin with. That something might unfortunately be Dell, but I'd imagine that if more than three brain cells had been involved in the thermal design of that machine, it wouldn't have been quite so catastrophic.

    • estimator7292 a day ago

      The XPS I have keeps the fans off very aggressively. They don't kick on at all until 80°C or so. Of course, there's no way to change it other than a userspace daemon.
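A fan curve like that can be overridden with a small userspace daemon. Here's a minimal sketch, assuming a generic Linux hwmon sysfs interface; the exact hwmon paths vary per machine (on Dells, writable fan control typically comes from the dell_smm_hwmon driver), and the 50/65 °C thresholds are made-up values, not anything Dell ships:

```python
import time

# Assumed paths; find the right hwmon index for your machine by reading
# /sys/class/hwmon/hwmon*/name first.
HWMON_TEMP = "/sys/class/hwmon/hwmon0/temp1_input"  # millidegrees C
HWMON_PWM = "/sys/class/hwmon/hwmon0/pwm1"          # 0-255 duty cycle

def target_pwm(temp_c):
    """Fan curve that kicks in well below the firmware's ~80 °C trigger.
    Thresholds here are illustrative, not tuned."""
    if temp_c < 50:
        return 0      # silent when cool
    if temp_c < 65:
        return 128    # roughly half speed
    return 255        # full speed

def run():
    """Poll the temperature sensor and write the fan duty cycle."""
    while True:
        with open(HWMON_TEMP) as f:
            temp_c = int(f.read()) / 1000
        with open(HWMON_PWM, "w") as f:
            f.write(str(target_pwm(temp_c)))
        time.sleep(2)
```

Running this needs root (or udev rules granting write access to the pwm file), and some drivers also require setting the matching `pwm1_enable` file to manual mode first.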

    • sgc a day ago

      Yes, of course. I'm unhappy with the device for several reasons. Too bad, because they almost got it right in so many other ways. My wife's smaller and slightly less powerful XPS, on the other hand, is doing great.

nickpsecurity a day ago

You actually have three AI accelerators: the CPU's SIMD units, the NPU, and the iGPU. Using them simultaneously could be interesting. It might require custom work, though.

  • estimator7292 a day ago

    If there are any LLM frameworks that can shard over disparate processor architectures, I haven't heard of them.

    It'd be pretty cool for sure, but I'd expect you'd be absolutely strangled by memory bandwidth. I'm sure the chipset would not at all enjoy trying to route all that RAM to three processors at once.
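The bandwidth ceiling is easy to ballpark: during token generation, every token has to stream roughly the whole weight set from RAM, and all three accelerators share the same memory bus. A back-of-envelope sketch, assuming a 7B-parameter model at 4-bit quantization and dual-channel DDR5-5600 (illustrative numbers, not measurements from any particular machine):

```python
# 7B parameters at 4 bits each = 0.5 bytes per parameter.
MODEL_BYTES = 7e9 * 0.5            # ≈ 3.5 GB of weights

# Dual-channel DDR5-5600: 2 channels x 8 bytes x 5600 MT/s.
DDR5_BANDWIDTH = 2 * 8 * 5600e6    # ≈ 89.6 GB/s theoretical peak

# Each generated token reads (roughly) all weights once, so RAM
# bandwidth caps throughput no matter how many compute units help.
max_tokens_per_sec = DDR5_BANDWIDTH / MODEL_BYTES
print(round(max_tokens_per_sec, 1))  # → 25.6
```

That ~25 tokens/s is a hard upper bound from memory alone; adding the NPU and SIMD units to an iGPU that can already saturate the bus buys nothing for this workload.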

    • nickpsecurity 21 hours ago

      No doubt. I had a few ideas for what might be done:

      1. Put the tokenizers or other lower-performance parts on the NPU.

      2. Pipelining that moves things through different models or layers on different hardware.

      3. If there are multiple layers, put most of them on the fastest part and a small number on the others. As with hardware clock ratios, the split is chosen so the slower parts don't drag down overall performance.

      In things like game or real-time AI, especially multimodal, there's even more potential, since different parts could run on different chips.
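Idea 3, the ratio-based layer split, could be sketched like this; the per-device throughput numbers are made-up placeholders that would have to come from benchmarking the actual hardware:

```python
def split_layers(n_layers, throughputs):
    """Assign transformer layers to devices in proportion to throughput,
    so each device finishes its slice in roughly the same wall time.
    Uses largest-remainder rounding so counts sum to n_layers exactly."""
    total = sum(throughputs.values())
    exact = {d: n_layers * t / total for d, t in throughputs.items()}
    counts = {d: int(e) for d, e in exact.items()}
    leftover = n_layers - sum(counts.values())
    # Hand the remaining layers to the devices with the largest fractions.
    for d in sorted(exact, key=lambda d: exact[d] - counts[d], reverse=True)[:leftover]:
        counts[d] += 1
    return counts

# Hypothetical relative throughputs (layers/sec) for the three accelerators.
throughputs = {"igpu": 8.0, "cpu_simd": 3.0, "npu": 1.0}
print(split_layers(32, throughputs))
```

For a 32-layer model with those placeholder numbers, the iGPU gets the bulk of the layers and the NPU only a handful, so the pipeline stages take comparable time; the memory-bandwidth caveat above still applies, since all three slices stream weights over the same bus.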