Comment by rtkwe
One that requires additional work beyond simply feeding the image into the model would be this example which is a mix of barely legible hand written cursive and easy to read typed form. [0] Initially 4o just transcribes (successfully) the bottom half of the text and has to be prompted to attempt the top half, at which point it seems to at best summarize the text instead of giving a direct transcription. [1] In fact it seems to mix up some portions of the latter half of the typed text with the written text in the portion of its "transcription" about "reduced and indigent circumstances".
[1] Reproducing here because I cannot share the chat, which contains user-uploaded images. "The text in the top half of the image is handwritten and partially difficult to read due to its cursive style and some smudging. Here's my best transcription attempt for the top section:
...resident within four? years, swears and says that the name of the John Hopper mentioned in the foregoing declaration is the same person, and he verily believes the facts as stated in the declaration are true.
He further swears that the said John Hopper is in reduced and indigent circumstances and requires the aid of his country.
The declarant further swears he has no evidence now in his power of service, except the statement of Capt. (illegible name), as to his reduced circumstances ...
Sworn to before me, this day...
Some parts remain unclear due to the handwriting, but let me know if you'd like me to attempt further clarification on specific sections!"
> this example which is a mix of barely legible hand written cursive and easy to read typed form.
> In fact it seems to mix up some portions of the latter half of the typed text with the written text in the portion of its "transcription" about "reduced and indigent circumstances".
What typed form? What typed text? That image is a single handwritten page, and the writing is quite clean, not "barely legible".† The file related to John Hopper appears to be 59 pages, and some of them are typed, but they're all separate images.
Are you trying to process all 59 pages at once? Why?
I should note that transcription is an excellent use of an LLM in the sense of a language model, as opposed to an "LLM" in the sense of several different pieces of software hooked together in cryptic ways. It would be a lot more useful, for this task, to have direct access to the language model backing 4o than to have access to a chatbot prompt that intermediates between you and the model.
† My biggest problems in reading the page: Cursive n and u are often identical glyphs (both written и), leading me to read "Ind." as "Jud."; and I had trouble with the "roster" at the bottom of the page. What felt weirdest about that was that the crossbar of the "t" is positioned well above the top of the stem, but that can't actually be what tripped me up, because on further review it's a common feature of the author's handwriting that I didn't even notice until I got to the very end of the letter. It's even true in the earlier instance of "Roster" higher up on the page. So my best guess is that the "os" doesn't look right to me.
I misread 1758 as 1958, too, but hopefully (a) that kind of thing wears off as you get used to reading documents about the Revolutionary War; and (b) it's a red flag when someone who died in 1838 was born in 1958 according to a letter written in 1935.
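For what it's worth, the check in (b) is easy to run mechanically over transcription output. A rough sketch, where the function name `implausible_years` and the year bounds are my own assumptions for illustration, not anything taken from the pension file or from a particular tool:

```python
import re

def implausible_years(text, earliest, latest):
    """Return the 4-digit years in `text` that fall outside [earliest, latest]."""
    # Pull out every standalone 4-digit number and treat it as a year.
    years = [int(m) for m in re.findall(r"\b(\d{4})\b", text)]
    return [y for y in years if not earliest <= y <= latest]

# Assumed bounds: the subject died in 1838, so no birth year can come after that.
flagged = implausible_years("John Hopper, born 1958, died 1838",
                            earliest=1700, latest=1838)
print(flagged)  # [1958]
```

This only catches years that are impossible for the period; a misread that lands on another plausible year would still need a human eye.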