Comment by mvac
How does it compare to Datalab/Marker https://github.com/datalab-to/marker ? We evaluated many PDF->MD converters and this one performed the best, though it is not perfect.
As anecdotal evidence, it serves my fairly demanding purposes very well: mathematics and code interspersed together. One of my "litmus test" papers is an old paper on a Fortran inverse-Laplace transform algorithm [1] that mixes inline and display equations with monospace code blocks and requires OCR from scratch. Very few models currently do a satisfactory job on it. For example, in the following page transcribed by Marker,
https://imgur.com/a/Q7UYIfW
the inline $\sigma_0$ is mangled as "<sup>s</sup> 0", and $f(t)$ is mangled as "f~~t*!". The current model gets them both correct.
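For anyone who wants to automate this kind of litmus test, here is a minimal sketch in Python. It only checks whether expected LaTeX fragments survive verbatim in a converter's Markdown output. The function name `check_litmus` and the two sample strings are hypothetical; they are stand-ins I made up, not actual output from Marker or any other model.

```python
def check_litmus(markdown: str, expected: list[str]) -> dict[str, bool]:
    """Report which expected fragments appear verbatim in the transcription."""
    return {fragment: fragment in markdown for fragment in expected}


# Fragments that a faithful transcription of the page should contain.
expected = [r"$\sigma_0$", r"$f(t)$"]

# Hypothetical sample strings (assumptions, not real converter output).
good = r"where $\sigma_0$ bounds the growth of $f(t)$"
bad = r"where <sup>s</sup> 0 bounds the growth of f~~t*!"

good_report = check_litmus(good, expected)
bad_report = check_litmus(bad, expected)

print(good_report)
print(bad_report)
```

Exact substring matching is deliberately strict: a converter that emits `$\sigma_{0}$` would fail the check even though it renders the same, so in practice you would probably want to normalize the LaTeX first.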