Comment by Javantea_ 17 hours ago

I'm surprised no one in the comments has mentioned overfitting. Perhaps this is too obvious, but I think of it as a very clear bug if a model asserts something to be true because it has heard it once. I realize that training a model is not easy, but this is something that should've been caught before release. Either QA is sleeping on the job, or they intentionally released a model with serious flaws in its design/training. I also understand the intense pressure to release early and often, but this kind of thing is more than a warning sign.

numpad0 16 hours ago

It's apparently well known among LLM researchers that the best epoch count for LLM training is one: going through the entire dataset exactly once makes for the best LLMs.

They know. An LLM is a novel compression format for text (holographic memory or whatever). The question is whether the rest of the world accepts this technology as it is or not.
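
For illustration, here's roughly what a single pass over the data looks like, as a toy PyTorch sketch (the model, sizes, and random data are stand-ins, nothing like a real LLM pipeline):

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    # Toy stand-ins: a tiny next-token classifier and random "tokens".
    vocab_size, seq_len, dim = 100, 16, 32
    model = nn.Sequential(
        nn.Embedding(vocab_size, dim),
        nn.Flatten(),
        nn.Linear(seq_len * dim, vocab_size),
    )
    data = TensorDataset(
        torch.randint(0, vocab_size, (512, seq_len)),  # inputs
        torch.randint(0, vocab_size, (512,)),          # targets
    )
    loader = DataLoader(data, batch_size=32, shuffle=True)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # The point: exactly one epoch. Every example is seen once;
    # repeated passes are what push a model toward memorizing
    # individual training sequences.
    for inputs, targets in loader:
        opt.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        opt.step()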

jeroenhd 14 hours ago

Overfitting makes for more human-like output (because it's repeating words written by a human). Out of all possible failure states of a model, overfitting is probably what you want out of an LLM, as long as it's not overfitted enough to lose lawsuits.

  • fennecfoxy 11 hours ago

    I disagree. I'd describe overfitting in LLMs as creating unreasonably strong connections to individual sequences used in training, whereas what you actually want is a good mix of that and connections between chunks of those sequences (a rough probe for the former is sketched below).
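
    A rough way to probe for that kind of memorization: feed the model the start of a passage it may have trained on and check whether greedy decoding reproduces the rest verbatim. A hedged sketch (the model name and passage are placeholders):

        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_name = "gpt2"  # placeholder; any causal LM works
        tok = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)

        passage = ("It is a truth universally acknowledged, that a "
                   "single man in possession of a good fortune, must "
                   "be in want of a wife.")
        words = passage.split()
        prefix = " ".join(words[:12])
        continuation = " ".join(words[12:])

        ids = tok(prefix, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=40, do_sample=False)
        generated = tok.decode(out[0][ids.shape[1]:])

        # Verbatim reproduction of text the prompt didn't contain
        # suggests the sequence was memorized, not generalized.
        print(generated.strip().startswith(continuation[:20]))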

Tepix 16 hours ago

I think part of the problem is that the book is in the training set multiple times.
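
If that's the case, the usual fix is deduplicating the corpus before training. As a hedged sketch, an exact-duplicate filter over documents might look like this (real pipelines also do near-duplicate detection, e.g. MinHash, since the same book often shows up with small formatting differences):

    import hashlib

    def normalize(text: str) -> str:
        # Collapse case and whitespace so trivially reformatted
        # copies hash to the same value.
        return " ".join(text.lower().split())

    def dedup(docs: list[str]) -> list[str]:
        seen, unique = set(), []
        for doc in docs:
            h = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
            if h not in seen:
                seen.add(h)
                unique.append(doc)
        return unique

    docs = ["Call me Ishmael.", "call  me  ISHMAEL.", "A different book."]
    print(dedup(docs))  # the reformatted copy is dropped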