Comment by el_don_almighty 7 hours ago
I have been looking for something that would ingest a decade of old Word and PowerPoint documents and convert them into a standardized format where the individual elements could be repurposed for other formats. This seems like a critical building block for a system that would accomplish this task.
Now I need a catalog, archive, or historian function that stores the elements and retrieves them easily. Amazing work!
Can't you just start with unoconv or pandoc, then maybe use an LLM to clean up after converting to plain text?
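For what it's worth, here is a rough sketch of the pandoc half of that idea, assuming pandoc is installed and on PATH. The function names (`pandoc_command`, `convert_tree`) and the flat output directory are made up for illustration. It only handles .docx; older .doc files and PowerPoint decks would probably need a pass through unoconv/LibreOffice first, and the LLM cleanup step is left out entirely.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(src: Path, out_dir: Path) -> list[str]:
    """Build the pandoc command that converts one .docx file to plain text."""
    dst = out_dir / (src.stem + ".txt")
    return [
        "pandoc", str(src),
        "--from", "docx",
        "--to", "plain",
        "--output", str(dst),
    ]


def convert_tree(root: Path, out_dir: Path) -> list[Path]:
    """Convert every .docx under root; return the paths of the text files written."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(root.rglob("*.docx")):
        cmd = pandoc_command(src, out_dir)
        # check=True raises CalledProcessError if pandoc fails on a file
        subprocess.run(cmd, check=True)
        written.append(Path(cmd[-1]))
    return written
```

Note that two files with the same name in different folders would overwrite each other in this sketch, so a real archive job would want to mirror the source directory layout or add a hash to the output name.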