coolness 2 hours ago

Great post and amazing progress in this field! However, I have to wonder if some of these letters were part of the training data for Gemini, since they are well-known and someone has probably already done the painstaking work of transcribing them...

  • MrSkelter 10 minutes ago

    I have a personal corpus of letters between my grandparents in WW2, with my grandfather fighting in Europe and my grandmother in England. The ability of Claude and ChatGPT to transcribe them is extremely impressive, though I haven’t worked on them in months, so this was with older models. At that time, neither system could properly organize pages, and ChatGPT would sometimes skip a paragraph.

DarkNova6 32 minutes ago

> Here’s Transkribus’s best guess at George’s letter to Maryann, above:

Transkribus has a new model architecture around the corner, and the results look impressive. Not only for trivial cases like text, but also for table structures and layout.

Best of all, you can train it on your own corpus of text to support obscure languages and handwriting systems.

Really looking forward to it.

pjmlp an hour ago

Maybe for English. For the other human languages I use, it is still kind of hit and miss, just like speech recognition. Even with English, it is enough to have an accent that differs from the standard TV one.

__alexs an hour ago

Call me when it can do Russian Cursive.

  • decimalenough an hour ago

    Seems to do an OK job:

    https://g.co/gemini/share/e173d18d1d80

    This is a random image from Twitter with no transcript or English translation provided, so it's not going to be in the training data.

    • shatsky 40 minutes ago

      No, the transcription has nothing to do with the written text. It guessed a few words here and there but not even the general topic. That's a doctor's note about a patient visit, beginning with "Прием: состояние удовл., t*, но кашель / patient visit: condition is OK, t(temperature normal?) but coughing". But unreadable doctors' handwriting is a meme...

    • GaggiX 18 minutes ago

      That's Gemini 2.5 Flash btw

      The result from Gemini 3 Pro using the default media resolution (the medium one): "(Заголовок / Header): Арсеньев (Фамилия / Surname - likely "Arsenyev")

          Состояние удовл-
      
          t N, кожные
      
          покровы чистые,
      
          [л/у не увел.]
      
          В зеве умерен. [умеренная]
      
          гипер. [гиперемия]
      
          В легких дыха-
      
          ние жесткое, хрипов
      
          нет. Тоны серд-
      
          [ца] [ритм]ичные.
      
          Живот мяг-
      
          кий, б/б [безболезненный].
      
          мочеисп. [мочеиспускание] своб. [свободное]
      
          Ds: ОРЗ [или ОРВИ]" and with the translation: "Arsenyev
      
      Condition satisfactory. Temp normal, skin coverings [skin] are clean, lymph nodes not enlarged. In the throat [pharynx], moderate hyperemia [redness]. In the lungs, breathing is rigid [hard], no rales [crackles/wheezing]. Heart tones are rhythmic. Abdomen is soft, painless. Urination is free [unhindered]. Diagnosis: ARD (Acute Respiratory Disease)."

tigerlily 39 minutes ago

Surely the true prize is to be able to ditch computers altogether and just write with pencil on paper.

iamflimflam1 an hour ago

If I went back in time to the 90s, when I was doing my PhD, I would absolutely blow my mind with how well handwriting OCR works now.

nikanj 6 minutes ago

The writing is on the wall for handwriting. Zoomers use speech recognition or touchscreen keyboards, millennials use keyboards, and boomers use pens.

th0ma5 an hour ago

My question for OCR automation is always: which digits within the numbers being read are allowed to be incorrect?