Note: I will not be releasing any commercial code here.
OK, so I wrote the original version in 2012 in response to a client request. Their existing workflow was based on .docx files, and they wanted each piece of content to exist as a single copy, with no additional work needed to get it working in a browser format.
I found some basic tools online for pulling plaintext content out of Word files, so I realised I could do the same. Basically, each .docx file is a ZIP archive of assorted data. Inside are numerous XML files, but the two relevant ones are "word/document.xml" and "word/_rels/document.xml.rels", both of which generally follow the Office Open XML format.
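To make the archive layout concrete, here is a minimal Python sketch of the plaintext-extraction idea. It is not the parser described in this post (that was written in PHP and is not released here); the function name `docx_paragraphs` is made up for illustration, and it assumes a standard .docx in which the main document part lives at word/document.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements inside word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the plain text of each paragraph in a .docx file.

    `source` is a path or a binary file-like object, as accepted by
    zipfile.ZipFile. Formatting, images, and tables are ignored; only
    the text runs are collected.
    """
    # A .docx file is a ZIP archive, so the main part can be read directly.
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> is a paragraph; its visible text sits in <w:t> elements.
    for para in root.iter(f"{{{W_NS}}}p"):
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs
```

Calling `docx_paragraphs("example.docx")` (a hypothetical file name) returns one string per paragraph. Getting from here to proper HTML also means resolving the relationship IDs in the .rels file to images and links, which this sketch does not attempt.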
I then wrote a fully featured .docx -> PHP data set -> HTML parser, which, at least at that time, had no (free) equivalent. A few years later I open-sourced what I could. I have been toying with the idea of writing a new iteration with a tidier code base and better error tolerance, but prioritising it over my other personal projects is difficult, and to this day the client variant, which shares 90% of the code base, has had no issues. Current internal stats estimate that it has parsed over 5000 files ranging from 1MB to 30MB, with images, tables, and basic text formatting (everything but Word text areas, as aligning them properly in reusable HTML is a fool's errand).
One of the most annoying parts of managing a CMS, although it doesn't happen much nowadays, is an SQL schema change. This used to need a developer to perform the update on every database in turn.
I performed the following, after planning out the procedure on my notepad:
This was a fun system, as I piggybacked off both the internal API and the automatic testing systems which I had developed several months before, and everything just worked! Ideal!
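For anyone unfamiliar with the problem, the general idea of pushing one schema change to every database in turn can be sketched as below. This is an illustration only, not the system described above: it uses Python's built-in sqlite3 module so it can run anywhere, and the `schema_migrations` table, the function names, and the migration ID format are all assumptions made up for the example.

```python
import sqlite3


def apply_migration(conn, migration_id, statements):
    """Apply one schema change to a single database.

    Returns True if the change was applied, False if it was already recorded.
    """
    # Bookkeeping table (hypothetical name) remembering which changes ran.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (id TEXT PRIMARY KEY)"
    )
    done = conn.execute(
        "SELECT 1 FROM schema_migrations WHERE id = ?", (migration_id,)
    ).fetchone()
    if done:
        return False

    for statement in statements:
        conn.execute(statement)
    # Record the change so that a re-run skips this database.
    conn.execute("INSERT INTO schema_migrations (id) VALUES (?)", (migration_id,))
    conn.commit()
    return True


def apply_to_all(connections, migration_id, statements):
    """Run the same change against every database and report per-database status."""
    results = {}
    for name, conn in connections.items():
        try:
            applied = apply_migration(conn, migration_id, statements)
            results[name] = "applied" if applied else "skipped"
        except sqlite3.Error as exc:
            # One broken database should not stop the rest from updating.
            results[name] = f"failed: {exc}"
    return results
```

Recording each change in the database itself means the whole run can be repeated safely, and the per-database status report is the kind of output an automated test could check.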