In this section, we are going to learn about the pyPdf module, which helps in extracting the metadata from a PDF file. But first, what is metadata? Metadata is data about data: structured information that describes and summarizes the primary data. It holds basic facts about your actual data, such as its author or creation date, and helps in finding a particular instance of that data.
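To make the idea concrete, here is a minimal sketch of what a PDF's metadata can look like once it is held in a Python dictionary. The keys are standard entries of the PDF document information dictionary; the values, the sample_metadata name, and the describe helper are made up for illustration and do not come from any real file or from pyPdf itself.

```python
# Hypothetical values for illustration only; the keys are standard
# entries of the PDF document information dictionary.
sample_metadata = {
    '/Title': 'Sample Document',
    '/Author': 'A. Student',
    '/Creator': 'Example Word Processor',
    '/Producer': 'Example PDF Library',
    '/CreationDate': 'D:20180101120000Z',
}


def describe(metadata):
    """Return 'Key: value' lines, sorted by key, with the leading slash removed."""
    return ['%s: %s' % (key.lstrip('/'), metadata[key])
            for key in sorted(metadata)]


for line in describe(sample_metadata):
    print(line)
```

None of these fields is part of the visible page content; each one describes the document, which is what makes it metadata.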
Make sure the PDF file from which you want to extract the information is present in your directory.
First, we have to install the pyPdf module, as follows:
pip install pyPdf
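If you want to confirm that the installation worked before going further, a quick check like the following sketch may help. The is_installed helper is a hypothetical name introduced here, not part of pyPdf; it relies only on the standard importlib module.

```python
import importlib


def is_installed(module_name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


# Report whether pyPdf is importable in the current environment.
print('pyPdf installed: %s' % is_installed('pyPdf'))
```

If this reports False, check that pip installed the package for the same Python interpreter that you use to run the script.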
Now, we will write a metadata_example.py script and see how to get the metadata information from the PDF file. We are going to write this script in Python 2:
import pyPdf
def main():
    file_name = '/home/student/sample_pdf.pdf'
    pdfFile = pyPdf.PdfFileReader(file...