Comment by souvik3333
Comment by souvik3333 9 hours ago
Hey, the reason for the long processing time is that lots of people are using it, and with probably larger documents. I tested your file locally seems to be working correctly. https://ibb.co/C36RRjYs
Regarding the token limit, it depends on the text. We are using the qwen-2.5-vl tokenizer in case you are interested in reading about it.
You can run it very easily in a Colab notebook. This should be faster than the demo https://github.com/NanoNets/docext/blob/main/PDF2MD_README.m...
There are incorrect words in the extraction, so I would suggest you to wait for the handwritten text model's release.
> I tested your file locally seems to be working correctly
Apologies if there's some unspoken nuance in this exchange, but by "working correctly" did you just mean that it ran to completion? I don't even recognize some of the unicode characters that it emitted (or maybe you're using some kind of strange font, I guess?)
Don't misunderstand me, a ginormous number of floating point numbers attempting to read that handwriting is already doing better than I can, but I was just trying to understand if you thought that outcome is what was expected