PhilGale.co.uk - Portfolio


Note: I will not be releasing any commercial code here.

DOCX PARSER

I wrote the original version in 2012 for a client whose existing workflow was based on .docx files. They wanted each piece of content to exist as a single copy, with no additional work needed to get it working in a browser.

I found some basic tools online for pulling plain-text content out of Word files, so I realised I could do it too. Each .docx file is a ZIP archive of miscellaneous other data. Inside are numerous XML files, but the two relevant ones are "word/document.xml" and "word/_rels/document.xml.rels", both of which follow the Office Open XML format.
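As a rough illustration of the "ZIP archive of XML" idea (in Python rather than the PHP of the actual parser, and not taken from it), the sketch below opens an archive, reads the main document part, and joins the text runs of each paragraph. The function names `docx_paragraphs` and `make_sample_docx` are made up for this example, and the sample archive is a stripped-down stand-in for a real .docx, which contains several more parts.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the plain text of each paragraph (w:p) in a .docx archive.

    `source` may be a file path or a file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; each run keeps it in w:t.
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def make_sample_docx():
    """Build a tiny in-memory archive holding only word/document.xml.

    Hypothetical test fixture: enough for this sketch, not a valid Word file.
    """
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for line in docx_paragraphs(make_sample_docx()):
        print(line)
```

Images and hyperlinks are not covered here; resolving them would mean looking up relationship IDs in the .rels part, which this sketch leaves out.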

I then wrote a fully featured .docx -> PHP data set -> HTML parser, which at least at that time had no free equivalent. A few years later I open-sourced what I could. I have been toying with the idea of writing a new iteration with tidier, more error-tolerant code, but prioritising it over my other personal projects is difficult, and to this day the client variant, which shares 90% of the code base, has had no issues. Current internal stats estimate that it has parsed over 5000 files ranging from 1MB to 30MB, with images, tables, and basic text formatting (everything except Word text boxes, as aligning them properly in reusable HTML is a fool's errand).

VIEW SOURCE


System: Automatic database upgrades

One of the most annoying parts of managing a CMS, although it doesn't happen much nowadays, is an SQL schema change. This used to need a developer to perform the update on every database in turn.

I performed the following, after planning out the procedure on my notepad:

  • Added a new class for tracking the DB upgrades installed on each website and those pending installation.
  • Added functionality in the API CMS for creating a `db upgrade`, which takes the SQL you enter and runs the latest set of unit tests against the last dev code base known to have passed.
  • Ensured that, if the tests pass, the SQL is provided in a secondary payload on all API requests where the website's installed version is lower than the pending upgrade.
  • Ensured the individual websites install those upgrades at their next background task slot.
  • Added an initial DB version constant, so future website builds know their upgrades are already included.
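The website side of the steps above might look something like the following minimal sketch. It is not the CMS's real code: it uses Python with SQLite for the sake of a runnable example, and the table name `db_upgrades`, the constant `INITIAL_DB_VERSION`, the `UPGRADES` list and all function names are assumptions. The unit-test gate and the API payload delivery are left out; `UPGRADES` simply stands in for whatever SQL the central API would send.

```python
import sqlite3

# Hypothetical constant: upgrades up to this version already ship in a fresh build.
INITIAL_DB_VERSION = 1

# Hypothetical payload: (version, SQL) pairs as the central API might supply them.
UPGRADES = [
    (1, "ALTER TABLE pages ADD COLUMN slug TEXT"),
    (2, "ALTER TABLE pages ADD COLUMN summary TEXT"),
]


def installed_version(conn):
    """Return the highest upgrade recorded, or the build's initial version."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS db_upgrades (version INTEGER PRIMARY KEY)"
    )
    row = conn.execute("SELECT MAX(version) FROM db_upgrades").fetchone()
    return row[0] if row[0] is not None else INITIAL_DB_VERSION


def pending_upgrades(conn, upgrades):
    """Return the upgrades newer than the installed version, oldest first."""
    current = installed_version(conn)
    return sorted((v, sql) for v, sql in upgrades if v > current)


def install_pending(conn, upgrades):
    """Apply each pending upgrade in order and record it; return the versions applied."""
    applied = []
    for version, sql in pending_upgrades(conn, upgrades):
        conn.execute(sql)
        conn.execute("INSERT INTO db_upgrades (version) VALUES (?)", (version,))
        conn.commit()
        applied.append(version)
    return applied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # A fresh build whose schema already includes upgrade 1 (the slug column).
    conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, slug TEXT)")
    print(install_pending(conn, UPGRADES))
```

In this sketch a background task would call `install_pending` at its next slot. Because each applied version is recorded, calling it again with the same payload does nothing.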

This was a fun system, as I piggybacked off both the internal API and the automatic testing systems, which I had developed several months before, and everything just worked! Ideal!