Comment by RicoElectrico

Comment by RicoElectrico 13 hours ago

Could be it used to (maybe with help of a downstream LLM) parse a photo/PDF of a restaurant menu into a JSON file conforming to a schema? Or would bigger, hosted multimodal LLMs work better in such case?