RustGPT: A pure-Rust transformer LLM built from scratch

(github.com)

334 points by amazonhut 17 hours ago

171 comments

View on Hacker News

ramon156 15 hours ago

Cool stuff! I can see some GPT comments that can be removed

// Increased for better learning

this doesn't tell me anything

// Use the constants from lib.rs

const MAX_SEQ_LEN: usize = 80;

const EMBEDDING_DIM: usize = 128;

const HIDDEN_DIM: usize = 256;

these are already defined in lib.rs, why not use them (as the comment suggests)

Reply View 13 replies

leoh 10 hours ago

They should stay, because they are indicative of the fact that this wasn't built with actual understanding.

Reply View | 0 replies
mitchitized 7 hours ago

You're absolutely correct!

Reply View | 0 replies
ericdotlee 12 hours ago

Do you think vibe coded rust will rot the quality of language code generally?

Reply View | 4 replies
- 6r17 10 hours ago
  
  For AI you definitely need to clean up and I think even targeted learning on some practices would be beneficiary ; for users ; it depends on the people, and I'd argue that vibe-coded rust can be better than just "written-rust" IF the important details and mind of the user are actually focused on what is important ; Eg ; I could vibe-code a lock-free well architect-ed s3 - focus on all the important details that would actually make it high perf - or write some stuff myself 10x slower - which means I will have 10 x less time to work on the important stuff.
  However what you asked is wether the vibe coded rust will rot the quality of language ; this is a more difficult to answer to, but I don't think that people who are uninterested in the technics are going to go for rust anyway - from the signals I feedback people are actually not really liking it - they find it too difficult for some reason and prefer to blanket with stuff like C# or python.
  Can't explain why.
  
  Reply View | 1 reply
  
  miki123211 5 hours ago
  
  > I'd argue that vibe-coded rust can be better than just "written-rust
  I never thought about it this way, but it actually makes sense. It's just like how Rust / Go / Java / C# can sometimes be orders of magnitude faster than C, only because they're more expressive languages. If you have a limited amount of time, it may be possible to write an efficient, optimal and concurrent algorithm in Java, while in C, all you can do is the simplest possible solution. Linked list versus slices (which are much more cache-friendly) is the perfect example here.
  
  Reply View | 0 replies
- adastra22 12 hours ago
  
  These things will be corrected over time.
  
  Reply View | 1 reply
  
  yahoozoo 11 hours ago
  
  How do you mean?
  
  Reply View | 0 replies
tialaramex 13 hours ago

For the constants is it possible the author didn't know how? I remember in my first week of Rust I didn't understand how to name things properly, basically I was overthinking it.

Reply View | 2 replies
- vlovich123 13 hours ago
  
  Lots of signs this is an LLM-generated project. All the emojis in the README are a hint as well.
  
  Reply View | 0 replies
- tayo42 12 hours ago
  
  From his reddit post
  https://old.reddit.com/r/rust/comments/1nguv1a/i_built_an_ll...
  
  Reply View | 0 replies
tmaly 11 hours ago

did you add these as a PR ?

Reply View | 0 replies
sloppytoppy 14 hours ago

Oh yea I'm totally running this on my hardware. Extra credit for "from scratch" in the title. The future sucks.

Reply View | 1 reply
- [removed] 8 hours ago
  
  [deleted]
  
  Reply View | 0 replies

untrimmed 16 hours ago

As someone who has spent days wrestling with Python dependency hell just to get a model running, a simple cargo run feels like a dream. But I'm wondering, what was the most painful part of NOT having a framework? I'm betting my coffee money it was debugging the backpropagation logic.

Reply View 87 replies

ricardobeat 15 hours ago

Have you tried uv [1]? It has removed 90% of the pain of running python projects for me.
[1] https://github.com/astral-sh/uv

Reply View | 52 replies
- mtlmtlmtlmtl 13 hours ago
  
  I'm sure it's true and all. But I've been hearing the same claim about all those tools uv is intended to replace, for years now. And every time I try to run any of those, as someone who's not really a python coder, but can shit out scripts in it if needed and sometimes tries to run python software from github, it's been a complete clusterfuck.
  So I guess what I'm wondering is, are you a python guy, or are you more like me? because for basically any of these tools, python people tell me "tool X solved all my problems" and people from my own cohort tell me "it doesn't really solve anything, it's still a mess".
  If you are one of us, then I'm really listening.
  
  Reply View | 12 replies
  
  hobofan 13 hours ago
  
  I'm one of you.
  I'm about the highest tier of package manager nerd you'll find out there, but despite all that, I've been struggling to create/run/manage venvs out there for ages. Always afraid of installing a pip package or some piece of python-based software (that might muck up Python versions).
  I've been semi-friendly with Poetry already, but mostly because it was the best thing around at the time, and a step in the right direction.
  uv has truely been a game changer. Try it out!
  
  Reply View | 0 replies
  
  tinco 12 hours ago
  
  As a Ruby guy: uv makes Python feel like it finally passed the year 2010.
  
  Reply View | 1 reply
  
  llIIllIIllIIl 12 hours ago
  
  Don’t forget to schedule your colonoscopy as a Ruby guy
  
  Reply View | 0 replies
  
  Yoric 11 hours ago
  
  As a developer: it basically solved all of my problems that could be solved by a package manager.
  As an occasional trainer of scientists: it didn't seem to help my students.
  
  Reply View | 1 reply
  
  buildbot 11 hours ago
  
  It installs stuff super fast!
  It sadly doesn’t solve stuff like transformer_engine being built with cxx11 ABI and pytorch isn’t by default, leading to missing symbols…
  
  Reply View | 0 replies
  
  beacon294 6 hours ago
  
  It doesn't handle python version management, it only handles pip. It doesn't solve bundling Python.
  
  Reply View | 0 replies
  
  OrderlyTiamat 12 hours ago
  
  I'm (reluctantly) a python guy, and uv really is a much different experience for me than all the other tools. I've otherwise had much the same experience as you describe here. Maybe it's because `uv` is built in rust? ¯\_ (ツ)_/¯
  But I'd also hesitate to say it "solves all my problems". There's plenty of python problems outside of the core focus of `uv`. For example, I think building a python package for distribution is still awkward and docs are not straightforward (for example, pointing to non-python files which I want to include was fairly annoying to figure out).
  
  Reply View | 0 replies
  
  OoooooooO 11 hours ago
  
  As a mainly Python guy (Data Engineering so new project for every ETL pipeline = a lot of projects) uv solved every problem I had before with pip, conda, miniconda, pipx etc.
  
  Reply View | 0 replies
  
  J_Shelby_J 13 hours ago
  
  Isn’t UV essentially cargo for python?
  
  Reply View | 1 reply
  
  adastra22 12 hours ago
  
  Somewhat literally so. It is written in Rust and makes use of the cargo-util crate for some overlapping functionality.
  
  Reply View | 0 replies
  
  rossant 7 hours ago
  
  I know, but uv truly is different.
  
  Reply View | 0 replies
  
  jhardy54 13 hours ago
  
  I’m a “Python guy” in that I write Python professionally, but also am like you in that I’ve been extremely underwhelmed by Portry/Pipenv/etc.
  Python dependencies are still janky, but uv is a significant improvement over existing tools in both performance and ergonomics.
  
  Reply View | 0 replies
- TheAceOfHearts 14 hours ago
  
  Switching to uv made my python experience drastically better.
  If something doesn't work or I'm still encountering any kind of error with uv, LLMs have gotten good enough that I can just copy / paste the error and I'm very likely to zero-in on a working solution after a few iterations.
  Sometimes it's a bit confusing figuring out how to run open source AI-related python projects, but the combination of uv and iterating on any errors with an LLM has so far been able to resolve all the issues I've experienced.
  
  Reply View | 0 replies
- DiabloD3 15 hours ago
  
  uv is great, but I think the real fix is just abandoning Python.
  The culture that language maintains is rather hostile to maintainable development, easier to just switch to Rust and just write better code by default.
  
  Reply View | 36 replies
  
  trklausss 15 hours ago
  
  Every tool for the right job. If you are doing tons of scripting (for e.g. tests on platforms different than Rust), Python can be a solid valid alternative.
  Also, tons of CAE platforms have Python bindings, so you are "forced" to work on Python. Sometimes the solution is not just "abandoning a language".
  If it fits your purpose, knock yourself out, for others that may be reading: uv is great for Python dependency management on development, I still have to test it for deployment :)
  
  Reply View | 9 replies
  
  airza 15 hours ago
  
  There's not really another game in town if you want to do fast ML development :/
  
  Reply View | 20 replies
  
  pjmlp 13 hours ago
  
  I know Python since version 1.6.
  It is great for learning on how to program (BASIC replacement), OS scripting tasks as Perl replacement, and embedded scripting in GUI applications.
  Additionally understand PYTHONPATH, and don't mess with anything else.
  All the other stuff that is supposed to fix Python issues, I never bothered with them.
  Thankfully, other languages are starting to also have bindings to the same C and C++ compute libraries.
  
  Reply View | 0 replies
  
  wavemode 10 hours ago
  
  Rust is not a viable replacement for Python except in a few domains.
  
  Reply View | 0 replies
  
  Exuma 14 hours ago
  
  i hate python, but the idea of replacing python with rust is absurd
  
  Reply View | 0 replies
  
  WhereIsTheTruth 12 hours ago
  
  abandoning Python for Rust in AI would cripple the field, not rescue it
  the disease is the cargo cult addiction (which Rust is full of) to micro libraries, not the language that carries 90% of all peer reviewed papers, datasets, and models published in the last decade
  every major breakthrough, from AlphaFold to Stable Diffusion, ships with a Python reference implementation because that is the language researchers can read, reproduce, and extend, remove Python and you erase the accumulated, executable knowledge of an entire discipline overnight, enforcing Rust would sabotage the field more than anything
  on the topic of uv, it will do more harm than good by enabling and empowering cargo cults on a systemic level
  the solution has always been education, teaching juniors to value simplicity, portability and maintainability
  
  Reply View | 1 reply
  
  stonemetal12 9 hours ago
  
  Nah, it would be like going from chemistry to chemical engineering. Doing chemical reactions in the lab by hand is great for learning but you aren't going to run a fleet of cars on hand made gas. Getting ML out of the lab and into production needs that same mental conversion from CS to SE.
  
  Reply View | 0 replies
- shepardrtc 12 hours ago
  
  uv has been amazing for me. It just works, and it works fast.
  
  Reply View | 0 replies
codetiger 15 hours ago

I guess, resource utilization like GPU, etc

Reply View | 0 replies
Galanwe 15 hours ago

> spent days wrestling with Python dependency hell
I mean I would understand that comment in 2010, but in 2025 it's grossly ridiculous.

Reply View | 6 replies
- virtualritz 9 hours ago
  
  So in 2025, in Python, if I depend on two packages. A and B, and they both depend on different, API-incompatible or behavior-incompatible (or both) versions of C, that won't be an issue?
  That's not my experience and e.g. uv hasn't helped me with that. I believe this is an issue with Python itself?
  If parent was saying something "grossly ridiculous" I must be doing something wrong too. And I'm happy to hear what as that would lower the pain of using Python.
  I.e. this was assumably true three years ago:
  https://stackoverflow.com/questions/70828570/what-if-two-pyt...
  
  Reply View | 3 replies
  
  Galanwe 7 hours ago
  
  Well, first, this a purposefully contrived example, that pretty much does not happen in real life scenarios. So you're pretty much acknowledging that there is no real problem by having to resort to such length.
  Second, what exactly would you like to happen in that instance? You want to have, in a single project, the same library but at different and conflicting versions. The only way to solve that is to disambiguate, per call site, each use of said library. And guess what, that problem exist and was solved 30 years ago by simply providing different package names for different major version. You want to use both gtk 1 and gtk 2 ? Well you have the "gtk" and "gtk2" package, done, disambiguated. I don't think there is any package manager out there providing "gtk" and having version 1 and 2, it's just "gtk" and "gtk2".
  Now we could design a solution around that I guess, nothing is impossible in this brave new world of programing, but that seems like a wasted effort for not-a-problem.
  
  Reply View | 2 replies
- adastra22 12 hours ago
  
  Yeah, because of a tool written in Rust, copying the Rust way of doing things for Python developers.
  
  Reply View | 1 reply
  
  Galanwe 11 hours ago
  
  I am not even thinking of `uv`, but rather of pyproject.toml, and the various improvements as to how dependencies are declared and resolved. You don't get much simpler than a toml file listing your dependencies and constraints, along with a lock file.
  Also let's keep middle school taunts at home.
  
  Reply View | 0 replies
zoobab 14 hours ago

"a simple cargo run feels like a dream"
A cargo build that warms up your CPU during winter while recompiling the whole internet is better?

Reply View | 2 replies
- surajrmal 12 hours ago
  
  It has 3 direct dependencies and not too many more transitively. You're certainly not recompiling the internet. If you're going to run a local llm I doubt you're building on a toaster so build speed won't be a big ordeal either.
  
  Reply View | 0 replies
- tracker1 10 hours ago
  
  I recently upped to a 9950X with a gen5 nvme.. TBH, even installing a few programs from cargo (which does compiles) is pretty quick now. Even coming from a 5950X with a gen4 drive.
  
  Reply View | 0 replies
taminka 16 hours ago

lowkey ppl who praise cargo seem to have no idea of the tradeoffs involved in dependency management
the difficulty of including a dependency should be proportional to the risk you're taking on, meaning it shouldn't be as difficult as it in, say, C where every other library is continually reinventing the same 5 utilities, but also not as easy as it is with npm or cargo, because you get insane dependency clutter, and all the related issues like security, build times, etc
how good a build system isn't equivalent of how easy it is include a dependency, while modern languages should have a consistent build system, but having a centralised package repository that anyone freely pull to/from, and having those dependencies freely take on any number of other dependencies is a bad way to handle dependencies

Reply View | 22 replies
- dev_l1x_be 15 hours ago
  
  > lowkey ppl who praise cargo seem to have no idea
  Way to go on insulting people on HN. Cargo is literally the reason why people coming to Rust from languages like C++ where the lack of standardized tooling is giant glaring bomb crater that poses burden on people every single time they need to do some basic things (like for example version upgrades).
  Example:
  https://github.com/facebook/folly/blob/main/build.sh
  
  Reply View | 10 replies
  
  taminka 14 hours ago
  
  i'm saying that ease of dependency inclusion should not be a main criterion for evaluating how good a build system is, not that it isn't the main criterion for many people...
  like the entire point of my comment is that people have misguided criteria for evaluating build systems, and your comment seems to just affirm this?
  
  Reply View | 8 replies
  
  huflungdung 15 hours ago
  
  [dead]
  
  Reply View | 0 replies
- quantumspandex 15 hours ago
  
  Security is another problem, and should be tackled systematically. Artificially making dependency inclusion hard is not it and is detrimental to the more casual use cases.
  
  Reply View | 0 replies
- hobofan 13 hours ago
  
  > but having a centralised package repository that anyone freely pull to/from, and having those dependencies freely take on any number of other dependencies is a bad way to handle dependencies
  So put a slim layer of enforcement to enact those policies on top? Who's stopping you from doing that?
  
  Reply View | 0 replies
- itsibitzi 15 hours ago
  
  What tool or ecosystem does this well, in your opinion?
  
  Reply View | 2 replies
  
  taminka 14 hours ago
  
  any language that has a standardised build system (virtually every language nowadays?), but doesn't have a centralised package repository, such that including a dependency is seamless, but takes a bit of time and intent
  i like how zig does this, and the creator of odin has a whole talk where he basically uses the same arguments as my original comment to reason why odin doesn't have a package manager
  
  Reply View | 1 reply
  
  zoobab 12 hours ago
  
  "a standardised build system (virtually every language nowadays?)"
  Python packages still manage poorly dependencies that are in another lang like C or C++.
  
  Reply View | 0 replies
- IshKebab 15 hours ago
  
  This is the weirdest excuse for Python's terrible tooling that I've ever heard.
  "It's deliberately shit so that people won't use it unless they really have to."
  
  Reply View | 1 reply
  
  taminka 14 hours ago
  
  i just realised that my comment sounds like it's praising python's package management since it's often so inconvenient to use, i want to mention that that wasn't my intended point, python's package management contains the worst aspects from both words: being centralised AND horrible to use lol
  my mistake :)
  
  Reply View | 0 replies
- [removed] 15 hours ago
  
  [deleted]
  
  Reply View | 0 replies
- MangoToupe 11 hours ago
  
  > the difficulty of including a dependency should be proportional to the risk you're taking on
  Why? Dependency hell is an unsolvable problem. Might as well make it easier to evaluate the tradeoff between dependencies and productivity. You can always arbitrarily ban dependencies.
  
  Reply View | 0 replies
- jokethrowaway 15 hours ago
  
  Is your argument that python's package management & ecosystem is bad by design - to increase security?
  In my experience it's just bugs and poor decision making on the maintainers (eg. pytorch dropping support for intel mac, leftpad in node) or on the language and package manager developers side (py2->3, commonjs, esm, go not having a package manager, etc).
  Cargo has less friction than pypi and npm. npm has less friction than pypi.
  And yet, you just need to compromise one lone, unpaid maintainer to wreck the security of the ecosystem.
  
  Reply View | 1 reply
  
  taminka 14 hours ago
  
  nah python's package management is just straight up terrible by every metric, i just used it as a tangent to talk about how imo ppl incorrectly evaluate build systems
  
  Reply View | 0 replies

om8 5 hours ago

Have a similar project. Also written in rust, runs in a browser using web assembly

In-browser demo: https://galqiwi.github.io/aqlm-rs

Source code: https://github.com/galqiwi/demo-aqlm-rs

Reply View 0 replies

techsystems 16 hours ago

> ndarray = "0.16.1" rand = "0.9.0" rand_distr = "0.5.0"

Looking good!

Reply View 18 replies

kachapopopow 16 hours ago

I was slightly curious: cargo tree llm v0.1.0 (RustGPT) ├── ndarray v0.16.1 │ ├── matrixmultiply v0.3.9 │ │ └── rawpointer v0.2.1 │ │ [build-dependencies] │ │ └── autocfg v1.4.0 │ ├── num-complex v0.4.6 │ │ └── num-traits v0.2.19 │ │ └── libm v0.2.15 │ │ [build-dependencies] │ │ └── autocfg v1.4.0 │ ├── num-integer v0.1.46 │ │ └── num-traits v0.2.19 () │ ├── num-traits v0.2.19 () │ └── rawpointer v0.2.1 ├── rand v0.9.0 │ ├── rand_chacha v0.9.0 │ │ ├── ppv-lite86 v0.2.20 │ │ │ └── zerocopy v0.7.35 │ │ │ ├── byteorder v1.5.0 │ │ │ └── zerocopy-derive v0.7.35 (proc-macro) │ │ │ ├── proc-macro2 v1.0.94 │ │ │ │ └── unicode-ident v1.0.18 │ │ │ ├── quote v1.0.39 │ │ │ │ └── proc-macro2 v1.0.94 () │ │ │ └── syn v2.0.99 │ │ │ ├── proc-macro2 v1.0.94 () │ │ │ ├── quote v1.0.39 () │ │ │ └── unicode-ident v1.0.18 │ │ └── rand_core v0.9.3 │ │ └── getrandom v0.3.1 │ │ ├── cfg-if v1.0.0 │ │ └── libc v0.2.170 │ ├── rand_core v0.9.3 () │ └── zerocopy v0.8.23 └── rand_distr v0.5.1 ├── num-traits v0.2.19 () └── rand v0.9.0 ()
yep, still looks relatively good.

Reply View | 12 replies
- imtringued 14 hours ago
  
  cargo tree llm v0.1.0 (RustGPT) ├── ndarray v0.16.1 │ ├── matrixmultiply v0.3.9 │ │ └── rawpointer v0.2.1 │ │ [build-dependencies] │ │ └── autocfg v1.4. │ ├── num-complex v0.4.6 │ │ └── num-traits v0.2.19 │ │ └── libm v0.2.15 │ │ [build-dependencies] │ │ └── autocfg v1.4.0 │ ├── num-integer v0.1.46 │ │ └── num-traits v0.2.19 () │ ├── num-traits v0.2.19 () │ └── rawpointer v0.2.1 ├── rand v0.9.0 │ ├── rand_chacha v0.9.0 │ │ ├── ppv-lite86 v0.2.20 │ │ │ └── zerocopy v0.7.35 │ │ │ ├── byteorder v1.5.0 │ │ │ └── zerocopy-derive v0.7.35 (proc-macro) │ │ │ ├── proc-macro2 v1.0.94 │ │ │ │ └── unicode-ident v1.0.18 │ │ │ ├── quote v1.0.39 │ │ │ │ └── proc-macro2 v1.0.94 () │ │ │ └── syn v2.0.99 │ │ │ ├── proc-macro2 v1.0.94 () │ │ │ ├── quote v1.0.39 () │ │ │ └── unicode-ident v1.0.18 │ │ └── rand_core v0.9.3 │ │ └── getrandom v0.3.1 │ │ ├── cfg-if v1.0.0 │ │ └── libc v0.2.170 │ ├── rand_core v0.9.3 () │ └── zerocopy v0.8.23 └── rand_distr v0.5.1 ├── num-traits v0.2.19 () └── rand v0.9.0 ()
  
  Reply View | 0 replies
- cmrdporcupine 15 hours ago
  
  linking both rand-core 0.9.0 and rand-core 0.9.3 which the project could maybe avoid by just specifying 0.9 for its own dep on it
  
  Reply View | 10 replies
  
  Diggsey 11 hours ago
  
  It doesn't link two versions of `rand-core`. That's not even possible with rust (you can only link two semver-incompatible versions of the same crate). And dependency specifications in Rust don't work like that - unless you explicitly override it, all dependencies are semver constraints, so "0.9.0" will happily match "0.9.3".
  
  Reply View | 9 replies
worldsavior 11 hours ago

This doesn't mean anything. A project can implement things from scratch inefficiently but there might be other libraries the project can use instead of reimplementing.

Reply View | 0 replies
tonyhart7 16 hours ago

is this satire or does I must know context behind this comment???

Reply View | 3 replies
- stevedonovan 16 hours ago
  
  These are a few well-chosen dependencies for a serious project.
  Rust projects can really go bananas on dependencies, partly because it's so easy to include them
  
  Reply View | 0 replies
- obsoleszenz 16 hours ago
  
  The project only has 3 dependencies which i interpret as a sign of quality
  
  Reply View | 0 replies
- leoh 10 hours ago
  
  I don't know if OP intended satire, but either way it is an absurd comment. Think about how "from scratch" this really is.
  
  Reply View | 0 replies

enricozb 16 hours ago

I did this [0] (gpt in rust) with picogpt, following the great blog by jaykmody [1].

[0]: https://github.com/enricozb/picogpt-rust [1]: https://jaykmody.com/blog/gpt-from-scratch/

Reply View 0 replies

Snuggly73 15 hours ago

Congrats - there is a very small problem with the LLM - its reusing transformer blocks and you want to use different instances of them.

Its a very cool excercise, I did the same with Zig and MLX a while back, so I can get a nice foundation, but since then as I got hooked and kept adding stuff to it, switched to Pytorch/Transformers.

Reply View 2 replies

icemanx 15 hours ago

correction: It's a cool exercise if you write it yourself and not use GPT

Reply View | 1 reply
- Snuggly73 15 hours ago
  
  well, hopefully the author did learn something or at least enjoyed the process :)
  (the code looks like a very junior or a non-dev wrote it tbh).
  
  Reply View | 0 replies

jlmcgraw 15 hours ago

Some commentary from the author here: https://www.reddit.com/r/rust/comments/1nguv1a/i_built_an_ll...

Reply View 0 replies

selinkocalar 4 hours ago

The memory safety guarantees in Rust are probably useful here given how easy it is to have buffer overflows in transformer implementations. CUDA kernels are still going to dominate performance though. Curious about the tokenization approach - are you implementing BPE from scratch too or using an existing library?

Reply View 0 replies

Charon77 16 hours ago

Absolutely love how readable the entire project is

Reply View 17 replies

koakuma-chan 15 hours ago

It's AI generated

Reply View | 10 replies
- Revisional_Sin 15 hours ago
  
  How do you know? The over-commenting?
  
  Reply View | 8 replies
  
  koakuma-chan 15 hours ago
  
  I know because this is how an AI generated project looks. Clearly AI generated README, "clean" code, the way files are named, etc.
  
  Reply View | 3 replies
  
  adastra22 12 hours ago
  
  Because the author said so on Reddit.
  
  Reply View | 0 replies
  
  GardenLetter27 15 hours ago
  
  The repeated Impls are strange.
  
  Reply View | 2 replies
- [removed] 15 hours ago
  
  [deleted]
  
  Reply View | 0 replies
emporas 16 hours ago

It is very procedural/object oriented. This is not considered good Rust practice. Iterators make it more functional, which is better, more succinct that is, and enums more algebraic. But it's totally fine for a thought experiment.

Reply View | 1 reply
- [removed] 9 hours ago
  
  [deleted]
  
  Reply View | 0 replies
yieldcrv 16 hours ago

Never knew Rust could be that readable. Makes me think other Rust engineers are stuck in a masochistic ego driven contest, which would explain everything else I've encountered about the Rust community and recruiting on that side.

Reply View | 3 replies
- GardenLetter27 15 hours ago
  
  Most Rust code looks like this - only generic library code goes crazy with all the generics and lifetimes, due to the need to avoid unnecessary mallocs and also provide a flexible API to users.
  But most people aren't writing libraries.
  
  Reply View | 1 reply
  
  cmrdporcupine 12 hours ago
  
  Don't underestimate what some programmers trying to prove their cleverness (or just trying to have fun) can do if left unchecked. I think most Rust code does indeed look like this but I've seen plenty of projects that go crazy with lifetimes and generics juggling where they don't have to.
  
  Reply View | 0 replies
- jmaker 16 hours ago
  
  Not sure what you’re alluding to but that’s just ordinary Rust without performance or async IO concerns.
  
  Reply View | 0 replies

ndai 16 hours ago

I’m curious where you got your training data? I will look myself, but saw this and thought I’d ask. I have a CPU-first, no-backprop architecture that works very well on classification datasets. It can do single‑example incremental updates which might be useful for continuous learning. I made a toy demo to train on tiny.txt and it can predict next characters, but I’ve never tried to make an LLM before. I think my architecture might work well as an on-device assistant or for on-premises needs, but I want to work with it more before I embarrass myself. Any open-source LLM training datasets you would recommend?

Reply View 4 replies

electroglyph 16 hours ago

https://huggingface.co/datasets/NousResearch/Hermes-3-Datase...

Reply View | 1 reply
- Snuggly73 15 hours ago
  
  To my untrained eye, this looks more like an instruct dataset.
  For just plain text, I really like this one - https://huggingface.co/datasets/roneneldan/TinyStories
  
  Reply View | 0 replies
kachapopopow 16 hours ago

huggingface has plenty of openai and antrophic user to assistant chains, beware there are dragons (hallucinations), but good enough for instruction training. I actually recommend distilling kimi k2 instead for instruction following capabilities.

Reply View | 0 replies
hadlock 9 hours ago

The training data is contained inside main.rs; it looks like about 50 statements about general facts. Probably to keep training time down. If you go off-script things fall apart pretty quickly:
Enter prompt: hello Model output: What are eclipses ? Assistant : Eclipses occur when one celestial body moves into the shadow of another </s>
Enter prompt: what are facts Model output: eclipses Assistant : Eclipses Eclipses What What was when What through you ? through you meet through using of What was What tall ? of What thank help you explain using , and do you a patterns you you a patterns through air in tall you help metal metal </s>
Enter prompt: how are mountains formed? Model output: I ? ' I ' : Of : in happy Hi wind in yeast altering it it </s>

Reply View | 0 replies

Goto80 16 hours ago

Nice. Mind to put a license on that?

Reply View 1 reply

thomask1995 15 hours ago

License added! Good catch

Reply View | 0 replies

kachapopopow 16 hours ago

This looks rather similar to when I asked an AI to implement a basic xor problem solver I guess fundementally there's really only a very limited amount of ways to implement this.

Reply View 0 replies

chcardoz 6 hours ago

super fun!! I am running it right now and going to use it to train on a corpus of my own writing to make a gpt of myself.

Reply View 0 replies

abricq 16 hours ago

This is great ! Congratulations. I really like your project, especially I like how easily it is to peak at.

Do you plan on moving forward with this project ? I seem to understand that all the training is done on the CPU, and that you have next steps regarding optimizing that. Do you consider GPU accelerations ?

Also, do you have any benchmarks on known hardware ? Eg, how long would it take to train on a macbook latest gen or your own computer ?

Reply View 1 reply

thomask1995 8 hours ago

HI! OG Author here.
Honestly, I don't know.
This was purely a toy project/thought experiment to challenge myself to learn exactly how these LLMs worked.
It was super cool to see the loss go down and it actually "train".
This is SUPER far from a the real deal. Maybe it could be cool to see how far a fully in memory LLM running on CPU can go.

Reply View | 0 replies

yobbo 10 hours ago

Very nice! Next thing to add would be numerical gradient testing.

Reply View 2 replies

tripplyons 10 hours ago

Is that where you approximate a partial derivative as a difference in loss over a small difference in a single parameter's value?
Seems like a great way to verify results, but it has the same downsides as forward mode automatic differentiation since it works in a pretty similar fashion.

Reply View | 1 reply
- yobbo 9 hours ago
  
  Yes, the purpose is to verify the gradient computations which are typically incorrect on the first try for things like self-attention and softmax. It is very slow.
  It is not necessary for auto-differentiation, but this project does not use that.
  
  Reply View | 0 replies

capestart 14 hours ago

Very cool project, always nice to see deep learning built from scratch in Rust without heavy frameworks.

Reply View 0 replies

bionhoward 13 hours ago

That time to first token is impressive, it seems like it responds immediately

Reply View 0 replies

ericdotlee 12 hours ago

This is incredibly cool, but I wonder when more of the AI ecosystem will move past python tooling into something more... performant?

Very interesting to already see rust based inference frameworks as well.

Reply View 2 replies

leoh 10 hours ago

"Python" is perfectly performant for AI and this demonstrates a deep lack of understanding. Virtually every library in python used for AI delegates to lower-level code written in C++.

Reply View | 1 reply
- tcfhgj 9 hours ago
  
  well, not all the time, e.g. orchestration and handling between multiple libraries
  
  Reply View | 0 replies