Python Data Analysis - Second Edition

By: Ivan Idris

Overview of this book

Data analysis techniques generate useful insights from small and large volumes of data. Python, with its strong set of libraries, has become a popular platform for data analysis and predictive modeling tasks. With this book, you will learn how to process and manipulate data with Python for complex analysis and modeling. We learn data manipulations such as aggregating, concatenating, appending, cleaning, and handling missing values with NumPy and Pandas. The book covers how to store and retrieve data from various data sources such as SQL and NoSQL databases, CSV files, and HDF5. We learn how to visualize data using visualization libraries, along with advanced topics such as signal processing, time series, textual data analysis, machine learning, and social media analysis. The book covers a wide range of Python modules, such as matplotlib, statsmodels, scikit-learn, and NLTK. It also covers using Python with external environments such as R, Fortran, C/C++, and Boost libraries.

Installing Python 3


The software used in this book is based on Python 3, so you need to have Python 3 installed. Some operating systems ship with Python 3 already installed. There are many implementations of Python, including commercial implementations and distributions. In this book, we focus on the standard Python implementation (CPython), which is guaranteed to be compatible with NumPy.

Note

You can download Python 3.5.x from https://www.python.org/downloads/. On this web page, you can find installers for Windows and Mac OS X, as well as source archives for Linux, Unix, and Mac OS X. You can find instructions for installing and using Python for various operating systems at https://docs.python.org/3/using/index.html.

The software we will install in this chapter has binary installers for Windows, various Linux distributions, and Mac OS X. There are also source distributions, if you prefer. You need to have Python 3.5.x or above installed on your system. The sunset date for Python 2.7 was moved from 2015 to 2020, so Python 2.7 will be supported and maintained only until 2020. For this reason, we have updated this book for Python 3.
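
To confirm that the interpreter you are running satisfies the Python 3.5.x requirement, a small check along the following lines may help. This is a minimal sketch: the names MIN_VERSION and meets_minimum are hypothetical helpers introduced here for illustration, not part of the book's code.

```python
import sys

# Minimum version taken from the text above (Python 3.5.x or above).
MIN_VERSION = (3, 5)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) pair is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("Running Python", sys.version.split()[0])
    if not meets_minimum():
        print("This interpreter is older than Python 3.5; please upgrade.")
```

Comparing tuples rather than version strings avoids mistakes such as treating "3.10" as smaller than "3.5".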

Installing data analysis libraries

We will learn how to install and set up NumPy, SciPy, Pandas, Matplotlib, IPython, and Jupyter Notebook on Windows, Linux, and Mac OS X. We use pip3 to install the libraries. From Python 3.4 onwards, pip3 has been included by default with the Python installation.
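
Some distributions package pip separately, so it can be worth checking that pip is visible to your interpreter before going further. The following is a rough sketch under that assumption; pip_is_importable and pip_version_command are hypothetical helper names, while importlib.util.find_spec and the ensurepip module are part of the standard library.

```python
import importlib.util
import sys


def pip_is_importable():
    """Return True if the pip package is visible to this interpreter."""
    return importlib.util.find_spec("pip") is not None


def pip_version_command(python=sys.executable):
    """Build the argument list for `<python> -m pip --version`."""
    return [python, "-m", "pip", "--version"]


if __name__ == "__main__":
    if pip_is_importable():
        print("pip found; check its version with:",
              " ".join(pip_version_command()))
    else:
        # ensurepip ships with Python 3.4+, though some Linux
        # distributions disable it in favor of a system package.
        print("pip not found; `python -m ensurepip` may bootstrap it.")
```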

On Linux or Mac OS X

To install the foundational libraries, run the following command line instruction:

$ pip3 install numpy scipy pandas matplotlib jupyter notebook 

It may be necessary to prepend sudo to this command if your current user doesn't have sufficient rights on your system.
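
As an alternative to sudo, pip's --user option installs packages into your per-user site directory, which usually does not require elevated rights. The sketch below only builds and prints such a command. build_install_command is a hypothetical helper, and the package list simply mirrors the command above. Note that --user is typically rejected inside a virtual environment.

```python
import sys

# Packages from the pip3 command above.
PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "jupyter", "notebook"]


def build_install_command(packages, user=True, python=sys.executable):
    """Build a `python -m pip install` argument list; user=True adds --user."""
    command = [python, "-m", "pip", "install"]
    if user:
        command.append("--user")
    return command + list(packages)


if __name__ == "__main__":
    # Dry run: only print the command. To execute it, pass the list to
    # subprocess.check_call(...) instead of printing it.
    print(" ".join(build_install_command(PACKAGES)))
```

Invoking pip as a module of the running interpreter (python -m pip) helps ensure the packages land in the same Python installation you intend to use.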

On Windows

At the time of writing this book, we installed the following prerequisites on our Windows 10 virtual machine:

Download and install the appropriate prebuilt NumPy and SciPy binaries for your Windows platform from http://www.lfd.uci.edu/~gohlke/pythonlibs/. Pick the files whose tags match your interpreter: cp36 denotes CPython 3.6, and win_amd64 denotes 64-bit Windows.

  • We downloaded numpy-1.12.0+mkl-cp36-cp36m-win_amd64.whl and scipy-0.18.1-cp36-cp36m-win_amd64.whl

  • After downloading, we ran the pip3 install Downloads\numpy-1.12.0+mkl-cp36-cp36m-win_amd64.whl and pip3 install Downloads\scipy-0.18.1-cp36-cp36m-win_amd64.whl commands
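
Wheel filenames encode the package name, version, Python tag, ABI tag, and platform tag, separated by hyphens. The following sketch splits a filename into those parts so you can compare the Python tag with your interpreter. parse_wheel_filename and current_python_tag are hypothetical helpers written for this illustration, and current_python_tag assumes you are running CPython.

```python
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version, and tag fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    # Five fields, or six when an optional build tag follows the version.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


def current_python_tag():
    """Return the CPython tag for this interpreter (assumes CPython)."""
    return "cp{}{}".format(sys.version_info[0], sys.version_info[1])


if __name__ == "__main__":
    info = parse_wheel_filename("numpy-1.12.0+mkl-cp36-cp36m-win_amd64.whl")
    print(info)
    if info["python_tag"] != current_python_tag():
        print("This wheel targets", info["python_tag"],
              "but you are running", current_python_tag())
```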

After these prerequisites are installed, to install the rest of the foundational libraries, run the following command line instruction:

$ pip3 install pandas matplotlib jupyter

Tip

Installing Jupyter using these commands also installs all the required packages, such as Notebook and IPython.
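
Once the installation finishes, you may want to confirm that each library can be imported. The sketch below is one way to do that. The list of import names and the installed_version helper are assumptions made for this example; adjust the list to match what you installed.

```python
import importlib

# Import names of the libraries installed in this section.
LIBRARIES = ["numpy", "scipy", "pandas", "matplotlib", "IPython", "notebook"]


def installed_version(module_name):
    """Return the module's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    for name in LIBRARIES:
        version = installed_version(name)
        print("{:<12} {}".format(name, version or "NOT INSTALLED"))
```

Any library reported as NOT INSTALLED can be installed again with pip3 before you move on.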