Comment by jncfhnb a year ago

I don’t think I believe that OCR can’t do it but random humans can

OCR is VERY good

jahewson a year ago

Actually I think in 2025 you are correct, we just haven’t got the best tech into the OCR software that’s out there in the real world. I just pasted the letter from the article into ChatGPT (4o) and asked “what does this old letter say?” The response:

---

The following is the declaration of James Lambert, a soldier of the Revolutionary War in North America.

The said James Lambert on this day personally appeared in the Probate Court of the County of Dearborn in the State of Indiana and at the November Term of said Court (1841), it being a court of record established by the laws of Indiana and made oath that:

On the 25th day of March 1842 he will be eighty-five years old; that he was born in the State of Maryland; that he is now a resident of said county and has been for the 27 years last past; that he has lived in Virginia, Maryland, Pennsylvania…

---

ozbonus a year ago

I've been trying every state-of-the-art OCR solution on my students' handwritten essays for fifteen years and have yet to find anything even close to acceptable.

wriggler a year ago

I'm the founder of handwritingocr.com - have you checked out our free trial? We have loads of educators using our service for exactly this, and they seem quite happy with it.

jncfhnb a year ago

What methods have you tried?

AdieuToLogic a year ago

> I don’t think I believe that OCR can’t do it but random humans can

Considering the people involved are experts in their field, are certainly aware of OCR capabilities, and have publicized a need thusly:

  ... the National Archives is looking for volunteers who can 
  help transcribe and organize its many handwritten records ...

Perhaps "random humans" can perform tasks which could reshape your belief:

> OCR is VERY good

tptacek a year ago

No. Sign up and look at the current missions. A lot of what they want transcribed is totally straightforward to OCR --- not even LLM, OCR. Whatever's going on, and I'm not second-guessing them, a pretty big chunk of their problem appears to be well within the state of the art. The appeal to authority isn't going to play here, because you can just click through to the archives and see what they're trying to figure out.

- AdieuToLogic a year ago
  
  > No. Sign up and look at the current missions. A lot of what they want transcribed is totally straightforward to OCR --- not even LLM, OCR. Whatever's going on, and I'm not second-guessing them, a pretty big chunk of their problem appears to be well within the state of the art.
  
  If it's that easy, then do it and be the hero they want.
  
  Or maybe, just maybe, "a pretty big chunk of their problem appears to be well within the state of the art" is a sweeping generalization lacking understanding of the difficulties involved.
  
  tptacek a year ago
  
  Go ahead and find something hard, and relate back the steps you took to find it.
  
jncfhnb a year ago

Also, you seem to have taken issue with the phrase “random humans” because you’re confused at what’s being done here. It is random humans. Non experts.

Experts are asking for the help of non experts.

> Anyone with an internet connection can volunteer to transcribe historical documents and help make the archives’ digital catalog more accessible

- AdieuToLogic a year ago
  
  > Also, you seem to have taken issue with the phrase “random humans” because you’re confused at what’s being done here. It is random humans. Non experts.
  
  I'm largely aligned with your interpretation of "random humans", with a clarification below. The experts I was referencing are the ones you identified:
  
  > Experts are asking for the help of non experts.
  
  The call to action by the archivists (experts), IMHO, has the intent to engage people with interest in the topic. So not really random from a mathematical definition, but perhaps better thought of as "unknown interested parties."
  
  Granted, this is my unsubstantiated opinion.
  
jncfhnb a year ago

There are conceivable reasons why they may be telling a half truth here. Just engaging the public is a worthy goal here.

- AdieuToLogic a year ago
  
  > There are conceivable reasons why they may be telling a half truth here. Just engaging the public is a worthy goal here.
  
  Asserting an ulterior motive without supporting proof is to engage in conspiracy theories.
  
  Sometimes a cigar is just a cigar.[0]
  
  0 - https://quoteinvestigator.com/2011/08/12/just-a-cigar/
  
  jncfhnb a year ago
  
  The alternative is me saying that appealing to their “expertise” is an appeal-to-authority fallacy that flies in the face of general evidence that modern OCR is far better than humans at character recognition. Especially random, non-specialized humans.
  
  AdieuToLogic a year ago
  
  Fair point.
  
  Dylan16807 a year ago
  
  It doesn't look like a cigar (very tricky documents) though. Hence the skepticism.
  
BugsJustFindMe a year ago

> I don’t think I believe that OCR can’t do it but random humans can

I do.

> OCR is VERY good

Uh, my experience is extremely different.

jncfhnb a year ago

I would challenge you to find a picture of text that you think a human can read and OCR cannot. I’m happy to demonstrate. The text shown in this article is trivial.

- demosthanos a year ago
  
  The archivists themselves say that they run into such texts often enough that this program was needed:
  
  > The agency uses artificial intelligence and a technology known as optical character recognition to extract text from historical documents. But these methods don’t always work, and they aren’t always accurate.
  
  They are absolutely aware of the advances in these tools, so if they say they're not completely there yet I believe them. One likely reason is that the models probably have less 1800s-era cursive in their training set than they do modern cursive.
  
  It's likely that with more human-tagged data they could improve on the state of the art for OCR, but it's pretty arrogant to doubt the agency in charge of this sort of thing when they say the tech isn't there yet.
  
  tedunangst a year ago
  
  Can someone please post a sample of one of these images that can only be read by a human for us naive OCR believers to see?
  
  jncfhnb a year ago
  
  Then please provide a single example that we can’t instantly solve. Happy to prove them wrong.
  
- AdieuToLogic a year ago
  
  > I would challenge you to find a picture of text that you think a human can read and OCR cannot.
  
  Are you aware of CAPTCHA[0] images?
  
  0 - https://en.wikipedia.org/wiki/CAPTCHA
  
  jncfhnb a year ago
  
  Text that is _intentionally constructed_ to fool computers but not humans is obviously out of scope. But they’re generally easily solved with OCR these days anyway.
  
  jahewson a year ago
  
  Solvable with the right tools.
  https://github.com/noCaptchaAi/NoCaptcha-Ai-Browser-Extensio...
  
- BugsJustFindMe a year ago
  
  Yeah ok, but it might take me a few tries because I don't know what you're using. I hope that's agreeable?
  What does your OCR say that these say? The first one isn't too hard for a human (assuming appropriate language skill). The second one is a bit more difficult.
  https://imgur.com/a/CDU6Lgs
  
CamperBob2 a year ago

Your experience is obsolete.

- BugsJustFindMe a year ago
  
  Oh, ok then.
  
  CamperBob2 a year ago
  
  I mean, all you have to do is feed the image to ChatGPT, and it will read it basically as well as you can.
  Denying/downvoting reality is always an option, of course.
  