The last of the plugins, office_parser.py, parses DOCX, PPTX, and XLSX files and extracts the metadata embedded in their XML files. We use the zipfile module, which is part of the standard library, to unzip the Office document and access its contents. This script has two functions: officeParser() and getTags().
import zipfile
import os
from time import gmtime, strftime

from lxml import etree
import processors

__author__ = 'Preston Miller & Chapin Bryce'
__date__ = '20160401'
__version__ = 0.01
__description__ = 'This script parses embedded metadata from office files'

def officeParser():
...
def getTags():
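Before looking at the two functions, it may help to see the underlying idea in isolation: an Office document is a ZIP archive, and its core metadata lives in an XML part named docProps/core.xml. The following is a minimal sketch of that idea, not the plugin's actual code. The function names read_core_properties() and build_sample_document() are hypothetical, the sample document is a stand-in built in memory, and the standard library's xml.etree.ElementTree is used in place of lxml so the example runs without third-party packages.

```python
import io
import zipfile
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)


def read_core_properties(source):
    """Return {tag: text} for docProps/core.xml, or {} if the part is absent.

    `source` may be a file path or a file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(source) as zf:
        if 'docProps/core.xml' not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read('docProps/core.xml'))

    props = {}
    for child in root:
        # Tags look like '{namespace-uri}creator'; keep only the local name.
        local_name = child.tag.rsplit('}', 1)[-1]
        props[local_name] = child.text
    return props


def build_sample_document():
    """Build a tiny in-memory ZIP that mimics the metadata part of a DOCX."""
    core_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<cp:coreProperties '
        'xmlns:cp="http://schemas.openxmlformats.org/package/2006/'
        'metadata/core-properties" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        '<dc:creator>Example Author</dc:creator>'
        '<cp:revision>3</cp:revision>'
        '</cp:coreProperties>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as zf:
        zf.writestr('docProps/core.xml', core_xml)
    buffer.seek(0)
    return buffer


if __name__ == '__main__':
    print(read_core_properties(build_sample_document()))
```

Passing the path of a real DOCX, PPTX, or XLSX file to read_core_properties() should work the same way, since zipfile.ZipFile accepts either a path or a file-like object.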