Comment by alephnerd 18 hours ago
> While the Harry Potter series may be fun reading, it doesn't provide information about anything that isn't better covered elsewhere
It has copyright implications - if Claude can recollect 42% of a copyrighted product without attribution or royalties, how did Anthropic train it?
> Train scientific LLMs to the level of a good early 20th century English major and then use science texts and research papers for the remainder
Plenty of in-stealth companies are tackling LLMs this way ;)
For those of us who studied the natural sciences and CS in the 2000s and early 2010s, there was a bit of a trend where certain PIs would simply translate German and Russian papers from the early-to-mid 20th century and attribute them to themselves in fields like CS (especially in what became ML).
> It has copyright implications - if Claude can recollect 42% of a copyrighted product without attribution or royalties, how did Anthropic train it?
Personally I’m assuming the worst.
That being said, Harry Potter was such a big cultural phenomenon that I wonder to what degree one might actually be able to reconstruct the books based solely on publicly accessible derivative material.