Practical Data Science Cookbook

By: Tony Ojeda, Sean Patrick Murphy, Benjamin Bengfort, Abhijit Dasgupta

Overview of this book

As increasing amounts of data are generated each year, the need to analyze and operationalize that data is more important than ever. Companies that know what to do with their data will have a competitive advantage over companies that don't, and this will drive a higher demand for knowledgeable and competent data professionals.

Starting with the basics, this book will cover how to set up your numerical programming environment, introduce you to the data science pipeline (an iterative process by which data science projects are completed), and guide you through several data projects in a step-by-step format. By sequentially working through the steps in each chapter, you will quickly familiarize yourself with the process and learn how to apply it to a variety of situations with examples in the two most popular programming languages for data analysis: R and Python.

Installing extra Python packages


There are a few additional Python libraries that you will need throughout this book. Just as R provides a central repository for community-built packages, so does Python in the form of the Python Package Index (PyPI). As of August 28, 2014, there were 48,054 packages in PyPI.

Getting ready

A reasonable Internet connection is all that is needed for this recipe. Unless otherwise specified, these directions assume that you are using the default Python distribution that came with your system, and not Anaconda.

How to do it...

The following steps will show you how to download a Python package and install it from the command line:

  1. Download the source code for the package to the place where you like to keep your downloads.

  2. Unzip the package.

  3. Open a terminal window.

  4. Navigate to the base directory of the source code.

  5. Type in the following command:

    python setup.py install
    
  6. If you need root access, type in the following command:

    sudo python setup.py install
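
The manual steps above can also be sketched in Python. This is only an illustration under stated assumptions: the helper names `unpack_source` and `build_setup_command` are hypothetical, the downloaded archive is assumed to be a gzipped tarball, and the sketch only builds the install command rather than running it.

```python
import tarfile
from pathlib import Path


def unpack_source(archive_path, dest_dir):
    """Extract a .tar.gz source archive and return the directory holding setup.py."""
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)
    # The base directory of the source code is the one that contains setup.py.
    matches = sorted(Path(dest_dir).rglob("setup.py"))
    if not matches:
        raise FileNotFoundError("no setup.py found in " + str(archive_path))
    return matches[0].parent


def build_setup_command(use_sudo=False):
    """Build the command from steps 5 and 6 as a list, without running it."""
    command = ["python", "setup.py", "install"]
    # Prefix sudo only when root access is needed (step 6).
    return ["sudo"] + command if use_sudo else command
```

To actually perform the install, you could pass the returned list to `subprocess.run` with `cwd` set to the directory returned by `unpack_source`.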
    

To use pip, the contemporary and easiest way to install Python packages, follow these steps:

  1. First, let's check whether pip is already installed by opening a terminal and launching the Python interpreter. At the interpreter prompt, type:

    >>> import pip
    
  2. If you don't get an error, you have pip installed and can move on to step 5. If you see an error, let's quickly install pip.

  3. Download the get-pip.py file from https://raw.github.com/pypa/pip/master/contrib/get-pip.py onto your machine.

  4. Open a terminal window, navigate to the directory containing the downloaded file, and type:

    python get-pip.py
    

    Alternatively, if you need root access, type in the following command:

    sudo python get-pip.py
    
  5. Once pip is installed, make sure you are at the system command prompt.

  6. If you are using the default system distribution of Python, type in the following:

    pip install networkx
    

    Alternatively, if you need root access, type in the following command:

    sudo pip install networkx
    
  7. If you are using the Anaconda distribution, type in the following command:

    conda install networkx
    
  8. Now, let's try to install another package, ggplot. Regardless of your distribution, type in the following command:

    pip install ggplot
    

    Alternatively, if you need root access, type in the following command:

    sudo pip install ggplot
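
The pip check from step 1 and the command choices in steps 6 to 8 can be sketched in Python as follows. This is a minimal sketch, not part of the recipe itself: the function names `pip_available` and `build_install_command` and the `tool` and `use_sudo` parameters are hypothetical, and the commands are only assembled, never executed.

```python
import importlib.util


def pip_available():
    """Return True if this interpreter can import pip (the check from step 1)."""
    return importlib.util.find_spec("pip") is not None


def build_install_command(package, tool="pip", use_sudo=False):
    """Build an install command as a list of arguments, without running it."""
    if tool not in ("pip", "conda"):
        raise ValueError("tool must be 'pip' or 'conda'")
    command = [tool, "install", package]
    # Prefix sudo only when root access is needed.
    return ["sudo"] + command if use_sudo else command
```

For example, `build_install_command("networkx", tool="conda")` mirrors step 7, while `build_install_command("ggplot", use_sudo=True)` mirrors the root-access variant of step 8.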
    

How it works...

You have at least two options to install Python packages. In the preceding "old-fashioned" way, you download the source code and unpack it on your local computer. Next, you run the included setup.py script with the install command. If you want, you can open the setup.py script in a text editor and take a more detailed look at exactly what the script is doing. You might need the sudo command, depending on the current user's system privileges.

As the second option, we leverage the pip installer, which automatically grabs the package from the remote repository and installs it to your local machine for use by the system-level Python installation. This is the preferred method, when available.
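
If you want to confirm which Python installation a package ended up in, one option is to ask the interpreter where it finds the module. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `locate_module` is hypothetical and only illustrates the idea.

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module would be loaded from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # origin is typically the path to the module's source file.
    return spec.origin
```

After running `pip install networkx`, calling `locate_module("networkx")` from the same interpreter should return a path inside that interpreter's package directory. A result of `None` suggests the package was installed for a different Python.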

There's more...

pip is very capable, so we suggest reading its online user guide. Pay special attention to the very useful pip freeze > requirements.txt command, which records your installed packages and their versions in a file so that you can share your external dependencies with your colleagues.
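
A file produced by pip freeze mostly consists of lines of the form name==version. As a rough illustration of how a colleague might inspect such a file, the sketch below parses those lines into a dictionary. The function name `parse_frozen_requirements` is hypothetical, and the sketch deliberately skips any line that is not an exact pin.

```python
def parse_frozen_requirements(text):
    """Map package names to pinned versions from 'name==version' lines."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins
```

You could call it on the contents of requirements.txt, for example with `parse_frozen_requirements(open("requirements.txt").read())`, to see which versions a project expects.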

Finally, conda is the package manager and pip replacement for the Anaconda Python distribution or, in the words of its home page, "a cross-platform, Python-agnostic binary package manager". Conda has some very lofty aspirations that transcend the Python language. If you are using Anaconda, we encourage you to read further on what conda can do and use it, and not pip, as your default package manager.

See also