Comment by piker
Hey, this looks really awesome. Super helpful for those of us building in the document space, for debugging if nothing else. Here are a couple of other projects to develop with or on, if you aren't already using them:
- https://github.com/mikeebowen/OOXML-Validator (if you plan on making edits, you'll want to ensure they're renderable by other Word users)
- https://marketplace.visualstudio.com/items?itemName=yuenm18.... (incredible VS Code extension for debugging OOXML files)
One thing that will surprise a lot of users is how common old-style Word (.doc) files still are. For those, you might consider integrating Antiword (https://github.com/grobian/antiword) if you can get comfortable with the licensing.
Be aware that styles play an important role in numbering, and that doesn't seem to be picked up here: a paragraph can inherit its numbering from its style rather than carrying it directly. So you'll want to apply the styles before calculating the numbering levels.
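To make the styles point concrete, here is a rough sketch of the lookup order I have in mind. It is not taken from this project or from any library: `NumPr`, `Style`, `Paragraph`, and `effective_num_pr` are hypothetical names, the data model is a heavily simplified stand-in for `w:numPr`, `w:pStyle`, and `w:basedOn`, and filling in the list id and level field by field from the style chain is an assumption of the sketch rather than a statement of exactly how Word merges them.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NumPr:
    """Hypothetical, simplified stand-in for w:numPr (list id plus level)."""
    num_id: Optional[int] = None
    ilvl: Optional[int] = None


@dataclass
class Style:
    """Hypothetical stand-in for a paragraph style from styles.xml."""
    style_id: str
    based_on: Optional[str] = None   # parent style id (w:basedOn)
    num_pr: Optional[NumPr] = None


@dataclass
class Paragraph:
    """Hypothetical stand-in for a paragraph's w:pPr."""
    style_id: Optional[str] = None   # w:pStyle
    num_pr: Optional[NumPr] = None   # direct formatting on the paragraph


def effective_num_pr(para: Paragraph, styles: Dict[str, Style]) -> Optional[NumPr]:
    """Resolve numbering for a paragraph: direct formatting first, then styles.

    Assumption of this sketch: missing fields are filled in one by one while
    walking the basedOn chain, and a missing level defaults to 0.
    """
    num_id = para.num_pr.num_id if para.num_pr else None
    ilvl = para.num_pr.ilvl if para.num_pr else None

    seen = set()  # guards against a cyclic basedOn chain in a broken file
    sid = para.style_id
    while sid is not None and sid not in seen and (num_id is None or ilvl is None):
        seen.add(sid)
        style = styles.get(sid)
        if style is None:
            break
        if style.num_pr is not None:
            if num_id is None:
                num_id = style.num_pr.num_id
            if ilvl is None:
                ilvl = style.num_pr.ilvl
        sid = style.based_on

    # A list id of 0 switches numbering off; no id at all means no numbering.
    if num_id is None or num_id == 0:
        return None
    return NumPr(num_id=num_id, ilvl=ilvl if ilvl is not None else 0)


# Example: Heading2 only sets the level and inherits the list id from Heading1.
styles = {
    "Heading1": Style("Heading1", num_pr=NumPr(num_id=3, ilvl=0)),
    "Heading2": Style("Heading2", based_on="Heading1", num_pr=NumPr(ilvl=1)),
}
resolved = effective_num_pr(Paragraph(style_id="Heading2"), styles)
```

The point is only the ordering: if the numbering counters are computed from direct paragraph properties alone, a paragraph like the `Heading2` one above looks unnumbered even though Word would number it.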
Overall, really cool. Hit me up if you ever want to swap notes on Docx and Rust. My email is in my profile.
Keep it up!
Instead of Antiword, maybe using the LibreOffice parsers directly would solve the problem of parsing all kinds of documents.