With the launch of Office 2007, Microsoft introduced a new XML-based file format for its Office products, used by "docx", "pptx", and "xlsx" files. These documents are actually zipped directories consisting of XML and binary files. A great deal of metadata is embedded in the XML files within each document. The two XML files we will look at, core.xml and app.xml, store different types of metadata.
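Because these documents are ordinary ZIP archives, their contents can be listed with Python's standard zipfile module. The following is a minimal sketch rather than a finished tool; the function name list_parts is our own and does not come from any library.

```python
import zipfile


def list_parts(document):
    """Return the names of all members stored in an Office 2007+ document.

    `document` may be a filesystem path or a binary file-like object.
    The name `list_parts` is illustrative only.
    """
    with zipfile.ZipFile(document) as archive:
        return archive.namelist()
```

Calling list_parts() on a Word document (for instance a hypothetical example.docx) typically returns entries such as docProps/core.xml and docProps/app.xml alongside the XML parts that hold the document body.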
The core.xml file stores metadata related to the document, such as the author, the revision number, and who last modified the document. The app.xml file stores metadata that is more specific to the contents of the file. For example, Word documents store page, paragraph, line, word, and character counts, whereas a PowerPoint presentation stores counts of slides, hidden slides, and notes, among others.
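As a rough illustration of how these two files can be read, the sketch below pulls one metadata part out of the archive and flattens its child elements into a dictionary. It assumes the parts live at docProps/core.xml and docProps/app.xml, which is where Office normally places them; the helper name read_properties is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile


def read_properties(document, part):
    """Return {element name: text} for the children of one metadata part.

    `document` is a path or binary file-like object; `part` is the member
    name inside the archive, e.g. "docProps/core.xml" (assumed location).
    """
    with zipfile.ZipFile(document) as archive:
        root = ET.fromstring(archive.read(part))

    properties = {}
    for child in root:
        # Tags arrive as "{namespace-uri}creator"; keep only the local name.
        name = child.tag.rsplit("}", 1)[-1]
        properties[name] = child.text
    return properties
```

For a Word document, read_properties(path, "docProps/core.xml") would be expected to expose fields such as creator, lastModifiedBy, and revision, while the same call with "docProps/app.xml" would expose counts such as Pages and Words. Nested elements in app.xml are not expanded by this sketch, and a missing part raises KeyError.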
To view this data, use an archive utility of your choice to unzip an existing Office document created with version 2007 or later. You may...