Comment by Dylan16807 8 hours ago
There are two claims. The main one is that all of these documents are easy to individually transcribe by machine. The other is that a whole lot can be OCR'd, which is pretty simple to check.
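And checking really is simple. A rough sketch, assuming the Tesseract binary plus the pytesseract and Pillow packages are installed; the file name is made up:

```python
# Rough sketch: spot-check how well OCR does on one scanned page.
# Assumes Tesseract, pytesseract, and Pillow are installed;
# "scan_page_001.png" is a made-up file name.
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("scan_page_001.png"))
print(text)  # eyeball this against the original scan
```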
That's not a claim that processing the entire archive would be trivial. And even if it was, whether that would make someone the "hero they want" is part of what's being called into question.
So your silly demand going unmet proves nothing.
Also, "give me an example please" is not a strawman!
If you actually want to prove something, you need to show at least one document in the set that a human can transcribe but a machine can't, or, to really make a good point, show that a non-negligible fraction fit that description.
> So your silly demand going unmet proves nothing.
I made demands of no one.
> Also, "give me an example please" is not a strawman!
My identification of the strawman was that it said "find something hard" when I had said "be the hero they want", and that what this specific problem domain needs may be more difficult than what a generalization addresses.
> If you actually want to prove something, you need to show at least one document in the set that a human can transcribe but a machine can't, or, to really make a good point, show that a non-negligible fraction fit that description.
Maybe this is the proof you demand.
LLMs are statistical prediction algorithms. As such, they are nondeterministic and therefore provide no guarantees about the correctness of their output.
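To make the nondeterminism concrete, here's a toy sketch (not any real model's code) of temperature sampling, which is how LLM decoders pick tokens at temperature > 0; the candidate tokens and scores are made up:

```python
import numpy as np

rng = np.random.default_rng()  # deliberately unseeded, like a production decoder

tokens = ["1914", "1941", "1944"]   # hypothetical OCR candidates for a smudged date
logits = np.array([2.0, 1.5, 0.3])  # made-up model scores

def sample_token(temperature=1.0):
    # Softmax over temperature-scaled logits, then draw one token at
    # random; the same input can yield a different output on every run.
    p = np.exp(logits / temperature)
    p /= p.sum()
    return rng.choice(tokens, p=p)

print([sample_token() for _ in range(5)])  # e.g. ['1914', '1941', '1914', '1914', '1941']
```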
The National Archives have specific artifacts requiring precise textual data extraction.
Using nondeterministic tools known to produce provably incorrect results eliminates them from this workflow, because all of their output requires human review. That review is an unnecessary step, and it can be eliminated by having a human read the original text in the first place.
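To make that concrete: because two runs over the same scan can disagree, any serious pipeline has to diff the runs and route disagreements to a person. A toy sketch with made-up transcriptions:

```python
import difflib

# Two hypothetical transcriptions of the same scan; the second
# misreads a digit, the kind of error that forces human review.
run_a = "Fort Sumter was fired upon April 12, 1861."
run_b = "Fort Sumter was fired upon April 12, 1361."

if run_a != run_b:
    for d in difflib.ndiff(run_a.split(), run_b.split()):
        if d.startswith(("-", "+")):
            print(d)  # show the disputed words
    print("=> flag this document for a human reviewer")
```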
Does that satisfy your demand?