Comment by ModelForge 5 days ago

This could be an artifact of the model's small size: it may simply be too small to take full advantage of the GPU. For example, for the slightly larger Qwen3 0.6B model, the A100 is faster (you can see it by scrolling to the bottom here: https://github.com/rasbt/LLMs-from-scratch/tree/main/ch05/11...)