Comment by jbmsf
Interesting. We're using a SaaS solution for document extraction right now. I don't know if it's in our interest to build out more, but I do like the idea of keeping extraction local.
Absolutely, we've been hearing the same from our customers, which is why we thought it made sense to open-source a bunch of schemas so that they're reusable and compatible across various inference providers (especially local ones like Ollama).
Cool, what types of documents do you currently handle? We could share some of our learnings/schemas here too.
Different commenter here; I'm extracting data from commercial invoices, POs, and bills of lading.
Ah cool, care to share a few examples? We can probably add those schemas in the next few days if there are enough folks who could benefit from this. A basic invoice schema is already there: https://github.com/vlm-run/vlmrun-hub/blob/main/vlmrun/hub/s...
You can see some of the qualitative results on GPT-4o, Gemini, Llama 3.2 11B, and Phi-4 here: https://github.com/vlm-run/vlmrun-hub?tab=readme-ov-file#-qu...
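For a rough idea, here's a minimal sketch of what a schema in that style might look like (Pydantic-style; the field names below are illustrative, not the hub's exact schema):

    from datetime import date
    from typing import Optional
    from pydantic import BaseModel, Field

    class LineItem(BaseModel):
        # Illustrative fields only; see the hub repo for the real schema.
        description: str = Field(..., description="Line item description")
        quantity: float = Field(..., description="Quantity ordered")
        unit_price: float = Field(..., description="Price per unit")
        total: float = Field(..., description="Line total (quantity * unit_price)")

    class Invoice(BaseModel):
        invoice_number: str = Field(..., description="Invoice identifier")
        issue_date: Optional[date] = Field(None, description="Date of issue")
        vendor_name: str = Field(..., description="Issuing vendor")
        line_items: list[LineItem] = Field(default_factory=list)
        currency: Optional[str] = Field(None, description="ISO 4217 currency code")
        total_amount: float = Field(..., description="Grand total including tax")

Since these are plain Pydantic models, the JSON schema they generate can be handed to any provider that supports structured outputs, which is what makes them portable across backends.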
Our customers insist we run everything on their docs locally.
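For what it's worth, the same schemas can drive fully local extraction. Here's a sketch using Ollama's structured outputs, reusing the Invoice model sketched above (the model name and file path are placeholders, and this assumes a recent Ollama plus the ollama Python client):

    import ollama

    # Constrain the local model's output to the Invoice JSON schema.
    response = ollama.chat(
        model='llama3.2-vision',  # placeholder; any local vision model served by Ollama
        messages=[{
            'role': 'user',
            'content': 'Extract the invoice fields from this document.',
            'images': ['invoice.png'],  # placeholder path to the scanned doc
        }],
        format=Invoice.model_json_schema(),
    )

    # Parse and validate the structured response back into the model.
    invoice = Invoice.model_validate_json(response.message.content)

Nothing leaves the machine, which should satisfy that kind of on-prem requirement.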