Comment by msp26 20 hours ago

Agree completely. When I read the Gemma 3 paper (https://arxiv.org/html/2503.19786v1) and saw an entire section dedicated to measuring and reducing the memorization rate I was annoyed. How does this benefit end users at all?

I want the language model I'm using to have knowledge of cultural artifacts. Gemma 3 27B was useless on a question about grouping Berserk characters by potential Baldur's Gate 3 classes; Claude did fine. The methods used to reduce the memorisation rate probably also degrade performance in other ways that don't show up on benchmarks.

ben_w 19 hours ago

> When I read the Gemma 3 paper (https://arxiv.org/html/2503.19786v1) and saw an entire section dedicated to measuring and reducing the memorization rate I was annoyed. How does this benefit end users at all?

It benefits users because memorisation is a waste of parameters that would be more useful if they were instead learning rules and generalisations.

For short snippets, common idioms, and quotations that people recognise, exact wording can be worth memorising; but the longer the quotation gets, the less often it matters to be word-for-word exact. Even for just a few paragraphs, I think most people only ever memorise oaths, anthems, songs they really like, and possibly a few things tied to their hobbies.

If you want an exact quote, use (or tell the AI to use) a search engine.