Learning pandas - Second Edition

By: Michael Heydt

Overview of this book

You will learn how to use pandas to perform data analysis in Python. You will start with an overview of data analysis and iteratively progress from modeling data, to accessing data from remote sources, performing numeric and statistical analysis, through indexing and performing aggregate analysis, and finally to visualizing statistical data and applying pandas to finance. With the knowledge you gain from this book, you will quickly learn pandas and how it can empower you in the exciting world of data manipulation, analysis and science.

Reading HTML data from the web

pandas can read data from HTML files or from HTML served at a URL. Under the covers, pandas makes use of the lxml, html5lib, and BeautifulSoup4 packages. These packages provide some impressive capabilities for reading and writing HTML tables.

Your default installation of Anaconda may not include these packages. If you get errors using this function, install the library named in the error message using the Anaconda Navigator.

Alternatively, you can install them with pip, for example: pip install lxml html5lib beautifulsoup4

The pd.read_html() function reads HTML from a file (or URL) and parses every HTML table found in the content into a pandas DataFrame object. The function always returns a list of DataFrame objects, one per table found, even when the HTML contains only a single table.
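As a minimal sketch of this behavior, the following example parses a small in-memory HTML document rather than a live URL. The HTML content, bank names, and variable names are made up for illustration and are not taken from the book, and the code assumes that one of the parser libraries mentioned above is installed.

```python
from io import StringIO

import pandas as pd

# A small, made-up HTML document containing two tables (illustrative data only).
html = """
<html><body>
<table>
  <tr><th>Bank</th><th>City</th></tr>
  <tr><td>First Example Bank</td><td>Springfield</td></tr>
  <tr><td>Second Example Bank</td><td>Riverton</td></tr>
</table>
<table>
  <tr><th>Year</th><th>Count</th></tr>
  <tr><td>2016</td><td>5</td></tr>
</table>
</body></html>
"""

# read_html accepts a URL, a file path, or a file-like object;
# wrapping the string in StringIO gives it a file-like object.
tables = pd.read_html(StringIO(html))

# The result is a list with one DataFrame per <table> element.
print(type(tables))
print(len(tables))

# Pick out an individual table by its position in the list.
banks = tables[0]
print(banks)
```

Because the result is a list, a single table still has to be selected by index (here, tables[0]) before it can be used as a DataFrame.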

To demonstrate, we will read table data from the FDIC failed bank list, located at https://www.fdic...