The Writing Is on the Wall for Handwriting Recognition

macleginn 17 minutes ago

I became convinced of this after the release of KuroNet: https://arxiv.org/pdf/1910.09433 (High-quality OCR of Japanese manuscripts, which look almost impossible to read.)


coolness 2 hours ago

Great post and amazing progress in this field! However, I have to wonder if some of these letters were part of the training data for Gemini, since they are well-known and someone has probably already done the painstaking work of transcribing them...


MrSkelter 10 minutes ago

I have a personal corpus of letters between my grandparents from WW2, my grandfather fighting in Europe and my grandmother in England. The ability of Claude and ChatGPT to transcribe them is extremely impressive, though I haven’t worked on them in months, so this is based on older models. At that time, neither system could properly organize pages, and ChatGPT would sometimes skip a paragraph.

suddenlybananas 2 hours ago

Shhhhh no one cares about data contamination anymore.


DarkNova6 32 minutes ago

> Here’s Transkribus’s best guess at George’s letter to Maryann, above:

Transkribus has a new model architecture around the corner, and the results look impressive, not only for simple cases like plain text but also for table structures and layout.

Best of all, you can train it on your own corpus of text to support obscure languages and handwriting systems.

Really looking forward to it.


pjmlp an hour ago

Maybe for English. For the other human languages I use, it is still hit and miss, just like speech recognition, where even in English it is enough to have an accent that differs from the standard TV one.


macleginn 15 minutes ago

As always, this depends on the amount of training data available. Japanese is another success story: https://digitalorientalist.com/2020/02/18/cursive-japanese-a...

NitpickLawyer an hour ago

ee lay vhen!


__alexs an hour ago

Call me when it can do Russian Cursive.


decimalenough an hour ago

Seems to do an OK job:
https://g.co/gemini/share/e173d18d1d80
This is a random image from Twitter with no transcript or English translation provided, so it's not going to be in the training data.

- shatsky 40 minutes ago
  
  No, the transcription has nothing to do with the written text; it guessed a few words here and there but not even the general topic. That's a doctor's note about a patient visit, beginning with "Прием: состояние удовл., t*, но кашель / patient visit: condition is OK, t(temperature normal?) but coughing". But unreadable doctors' handwriting is a meme...
  
- GaggiX 18 minutes ago
  
  That's Gemini 2.5 Flash btw
  The result from Gemini 3 Pro using the default media resolution (the medium one): "(Заголовок / Header): Арсеньев (Фамилия / Surname - likely "Arsenyev")
  Состояние удовл- t N, кожные покровы чистые, [л/у не увел.] В зеве умерен. [умеренная] гипер. [гиперемия] В легких дыха- ние жесткое, хрипов нет. Тоны серд- [ца] [ритм]ичные. Живот мяг- кий, б/б [безболезненный]. мочеисп. [мочеиспускание] своб. [свободное] Ds: ОРЗ [или ОРВИ]" and with the translation: "Arsenyev
  Condition satisfactory. Temp normal, skin coverings [skin] are clean, lymph nodes not enlarged. In the throat [pharynx], moderate hyperemia [redness]. In the lungs, breathing is rigid [hard], no rales [crackles/wheezing]. Heart tones are rhythmic. Abdomen is soft, painless. Urination is free [unhindered]. Diagnosis: ARD (Acute Respiratory Disease)."
  

tigerlily 39 minutes ago

Surely the true prize is to be able to ditch computers altogether and just write with pencil on paper.


iamflimflam1 an hour ago

If I went back in time to the 90s, when I was doing my PhD, I would absolutely blow my own mind with how well handwriting OCR works now.


nikanj 6 minutes ago

The writing is on the wall for handwriting. Zoomers use speech recognition or touchscreen keyboards, millennials use keyboards, and boomers use pens.


th0ma5 an hour ago

My question for OCR automation is always: which digits within the numbers being read are allowed to be incorrect?
