Learning Predictive Analytics with Python

By: Ashish Kumar, Gary Dougan

Overview of this book

Social media and the Internet of Things have resulted in an avalanche of data. Data is powerful, but not in its raw form: it needs to be processed and modeled, and Python is one of the most robust tools out there to do so. It has an array of packages for predictive modeling and a suite of IDEs to choose from. Learning to predict who would win, lose, buy, lie, or die with Python is an indispensable skill set to have in this data age. This book is your guide to getting started with predictive analytics using Python. You will see how to process data and make predictive models from it. We balance statistical and mathematical concepts, and implement them in Python using libraries such as pandas, scikit-learn, and numpy. You’ll start by getting an understanding of the basics of predictive modeling, then you will see how to cleanse your data of impurities and get it ready for predictive modeling. You will also learn more about the best predictive modeling algorithms such as Linear Regression, Decision Trees, and Logistic Regression. Finally, you will see the best practices in predictive modeling, as well as the different applications of predictive modeling in the modern world.

Python and its packages – download and installation


There are various ways in which one can access and install Python and its packages. Here we will discuss a couple of them.

Anaconda

Anaconda is a popular Python distribution that bundles more than 195 popular Python packages. Installing Anaconda automatically installs many of the packages discussed in the preceding section. They can be used from an IDE called Spyder (more on this later in this chapter), which is installed along with Anaconda, or from IPython Notebook, which Anaconda also installs. When you click on the IPython Notebook icon, it opens a browser tab and a Command Prompt.

Note

Anaconda can be downloaded and installed from the following web address: http://continuum.io/downloads

Download the suitable installer for your system and double-click the .exe file to install Anaconda. After the installation, check that the following two features are available:

  • IPython Notebook

  • Spyder IDE

If they don't appear in the list of programs and files by default, search for them using the Start menu's search box. We will be using IPython Notebook extensively, and the code in this book works best when run in IPython Notebook.

IPython Notebook can be opened by clicking on its icon. Alternatively, you can open it from the Command Prompt: navigate to the directory where you have installed Anaconda and type ipython notebook, as shown in the following screenshot:

Fig. 1.3: Opening IPython Notebook

Note

On the system used for this book, Anaconda was installed in the C:\Users\ashish directory. You can open a new Notebook by clicking on the New Notebook button on the dashboard that opens in the browser.
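
If you are unsure where Anaconda (or any other Python distribution) was installed on your own system, the interpreter itself can report it. The following is a minimal sketch using only the standard library; the helper name interpreter_location is hypothetical and chosen for illustration, it is not part of Anaconda or IPython.

```python
import sys


def interpreter_location():
    """Return the path of the running interpreter and its installation root."""
    return {
        "executable": sys.executable,  # full path to python.exe (or python)
        "prefix": sys.prefix,          # root directory of the installation
    }


location = interpreter_location()
print("Interpreter: %s" % location["executable"])
print("Installed under: %s" % location["prefix"])
```

Run from an Anaconda interpreter, the second line should point at the Anaconda installation directory, which is the directory to navigate to in the Command Prompt.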

Standalone Python

You can download a stable Python version that is compatible with the OS on your system. This book assumes Python 2.7.0, so installing this version is highly recommended. You can download it from https://www.python.org/ and install it.
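
After installing, you may want to confirm which version the interpreter actually reports. The following is a small sketch, not a required step; the constant REQUIRED and the function matches_required are hypothetical names, and the sketch assumes that matching the major and minor version (2.7) is what matters for following along.

```python
import sys

# Assumption: the examples target the Python 2.7 series, as recommended above.
REQUIRED = (2, 7)


def matches_required(version_info, required=REQUIRED):
    """Return True if the (major, minor) pair equals the required one."""
    return tuple(version_info[:2]) == tuple(required)


print("Running Python %d.%d.%d" % tuple(sys.version_info[:3]))
if not matches_required(sys.version_info):
    print("Note: this interpreter is not Python %d.%d" % REQUIRED)
```

The check compares only the first two components of sys.version_info, so any 2.7.x release passes.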

There are some Python packages that you need to install on your machine before you start predictive analytics and modeling. This section consists of a demonstration of how to install one such library and a brief description of all such libraries.

Installing a Python package

There are several ways to install a Python package. The easiest and most effective is to use pip, a package management system used to install and manage software packages written in Python. Before it can be used to install other packages, pip itself needs to be installed.

Installing pip

The following steps demonstrate how to install pip. Follow closely!

  1. Navigate to the webpage shown in the following screenshot. The URL is https://pypi.python.org/pypi/pip:

    Downloading pip from Python's official website

  2. Download the pip-7.0.3.tar.gz file and unzip it in the folder where Python is installed. If you have Python v2.7.0 installed, this folder should be C:\Python27:

    Unzipping the .tar.gz file for pip in the correct folder

  3. On unzipping the previously mentioned file, a folder called pip-7.0.3 is created. Opening that folder will take you to a screen similar to the one in the preceding screenshot.

  4. Open the Command Prompt (CMD) on your computer and change the current directory to the one shown in the preceding screenshot, that is, C:\Python27\pip-7.0.3, using the following command:

    cd C:\Python27\pip-7.0.3
  5. The result of the preceding command is shown in the following screenshot:

    Navigating to the directory where pip is installed

  6. Now, the current directory is set to the directory where the setup file for pip (setup.py) resides. Type the following command to install pip:

    python setup.py install
  7. The result of the preceding command is shown in the following screenshot:

    Installing pip using a command line
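
Besides reading the installer output, you can ask the interpreter whether pip is now importable. The following is a minimal sketch; the helper name pip_version is hypothetical, and the sketch relies only on the fact that the pip package exposes a __version__ attribute.

```python
import importlib


def pip_version():
    """Return pip's version string, or None if pip cannot be imported."""
    try:
        pip = importlib.import_module("pip")
    except ImportError:
        return None
    return getattr(pip, "__version__", None)


version = pip_version()
if version is None:
    print("pip does not appear to be installed for this interpreter")
else:
    print("pip %s is installed" % version)
```

If the steps above were followed with the pip-7.0.3 archive, the reported version should correspond to that release.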

Once pip is installed, it is very easy to install all the required Python packages to get started.

Installing Python packages with pip

The following are the steps to install Python packages using pip, which we just installed in the preceding section:

  1. In the Command Prompt, change the current directory to the directory where Python v2.7.0 is installed, that is, C:\Python27.

  2. Write the following command to install the package:

    pip install package-name
  3. For example, to install pandas, you can proceed as follows:

    Installing a Python package using a command line and pip

  4. Finally, to confirm that the package has installed successfully, write the following command:

    python -c "import pandas"
  5. The result of the preceding command is shown in the following screenshot:

    Checking whether the package has installed correctly or not

If this command doesn't raise an error, then the package has been installed successfully.
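
The same import check can be repeated for several packages at once from within Python. The following is a sketch under a few assumptions: the function names is_installed and pip_install_command are hypothetical, the list of modules is taken from the libraries named in the book overview, and the install command is only built, not executed.

```python
import importlib
import sys


def is_installed(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def pip_install_command(package_name):
    """Build (but do not run) the pip command for this interpreter."""
    return [sys.executable, "-m", "pip", "install", package_name]


# Import names of the libraries used in this book; note that the
# scikit-learn package is imported under the name sklearn.
for module_name in ["pandas", "numpy", "sklearn"]:
    status = "installed" if is_installed(module_name) else "missing"
    print("%-8s %s" % (module_name, status))
```

The list returned by pip_install_command can be passed to subprocess.check_call if you prefer to run the installation from Python rather than from the Command Prompt. Keep in mind that the name given to pip (scikit-learn) can differ from the name used in the import statement (sklearn).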