Practical Data Science Cookbook - Second Edition

By: Prabhanjan Narayanachar Tattar, Bhushan Purushottam Joshi, Sean Patrick Murphy, Abhijit Dasgupta, Anthony Ojeda

Overview of this book

As increasing amounts of data are generated each year, the need to analyze it and create value from it is more important than ever. Companies that know what to do with their data and how to do it well will have a competitive advantage over companies that don't. Because of this, there will be an increasing demand for people who possess both the analytical and technical abilities to extract insights from data and build solutions that put those insights to use. Starting with the basics, this book covers how to set up your numerical programming environment, introduces you to the data science pipeline, and guides you through several data projects in a step-by-step format. By working through the steps in each chapter in sequence, you will quickly familiarize yourself with the process and learn how to apply it to a variety of situations, with examples in the two most popular programming languages for data analysis: R and Python.

Installing and using virtualenv


virtualenv is a transformative Python tool. Once you start using it, you will never look back. virtualenv creates a local environment with its own Python distribution installed. Once this environment is activated from the shell, you can easily install packages using pip install into the new local Python.

At first, this might sound strange. Why would anyone want to do this? Not only does this help you handle the issue of package dependencies and versions in Python, but it also allows you to experiment rapidly without breaking anything important. Imagine that you build a web application that requires version 0.8 of the awesome_template library, but then your new data product needs version 1.2 of the same library. What do you do? With virtualenv, you can have both, each installed in its own environment.

As another use case, what happens if you don't have admin privileges on a particular machine? You can't use sudo pip install to install the packages required for your analysis, so what do you do? If you use virtualenv, it doesn't matter, because packages are installed into a directory that you own.

Virtual environments are development tools that software developers use to collaborate effectively. Environments ensure that the software runs on different computers (for example, on development and production servers) with varying dependencies. The environment also alerts other developers to the needs of the software under development. Python's virtualenv ensures that the software created lives in its own self-contained environment, can be tested independently, and can be built collaboratively.

Getting ready

Assuming you have completed the previous recipe, you are ready to go for this one.

How to do it...

Install and test the virtual environment using the following steps:

  1. Open a command-line shell and type in the following command:
pip install virtualenv

Alternatively, if the installation needs administrator privileges, you can type in the following command:

sudo pip install virtualenv
  2. Once installed, type virtualenv in the command window, and you should be greeted with the information shown in the following screenshot:

[Screenshot: output of the virtualenv command]

  3. Create a temporary directory and change location to this directory using the following commands:
mkdir temp
cd temp
  4. From within the directory, create the first virtual environment named venv:
virtualenv venv
  5. You should see text similar to the following:
New python executable in venv/bin/python
Installing setuptools, pip...done.
  6. The new local Python distribution is now available. To use it, we need to activate venv using the following command:
source ./venv/bin/activate
  7. The activate script is not executable, so it must be loaded with the source command rather than run directly. Also, note that your shell's command prompt has probably changed and is now prefixed with (venv) to indicate that you are working in your new virtual environment.
  8. To check this fact, use which to see the location of Python, as follows:
which python

You should see the following output:

/path/to/your/temp/venv/bin/python

So, when you type python once your virtual environment is activated, you will run the local Python.

  9. Next, install something by typing the following:
pip install flask

Flask is a micro-web framework written in Python; the preceding command will install a number of packages that Flask uses.

  10. Finally, we demonstrate the versioning power that virtual environments and pip offer, as follows:
pip freeze > requirements.txt
cat requirements.txt

This should produce output similar to the following (the exact packages and versions depend on when you run the command):

Flask==0.10.1
Jinja2==2.7.2
MarkupSafe==0.19
Werkzeug==0.9.4
itsdangerous==0.23
wsgiref==0.1.2
  11. Note that the file captures not only the name of each package but also its exact version number. The beauty of this requirements.txt file is that, in a new virtual environment, we can simply issue the following command to install the specified version of each listed Python package:
pip install -r requirements.txt
  12. To deactivate your virtual environment, simply type the following at the shell prompt:
deactivate
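
The name==version lines that pip freeze writes are easy to inspect from Python. The following is a minimal sketch, not part of the original recipe, that parses such pins and compares them with what is installed in the active environment. It assumes Python 3.8 or later (for importlib.metadata); the function names parse_pins and find_mismatches are hypothetical, and the parser deliberately handles only exact pins, not the full requirements file syntax.

```python
from importlib import metadata


def parse_pins(text):
    """Return {package_name: version} for every 'name==version' line."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and anything that is not an exact pin
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Compare pins with the active environment; return {name: (wanted, found)}."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # the package is not installed in this environment
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Sample pins copied from the requirements.txt output shown in the recipe.
SAMPLE = """\
Flask==0.10.1
Jinja2==2.7.2
MarkupSafe==0.19
"""

pins = parse_pins(SAMPLE)
print(pins)
print(find_mismatches(pins))
```

An empty dictionary from find_mismatches means the active environment matches the pins exactly; what it prints for the sample depends on the environment in which you run it.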

How it works...

virtualenv creates its own virtual environment with its own installation directories that operate independently from the default system environment. This allows you to try out new libraries without polluting your system-level Python distribution. Further, if you have an application that just works and want to leave it alone, you can do so by making sure the application has its own virtualenv.
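
One way to see this separation from inside Python is to compare the interpreter's prefix attributes. The sketch below is an illustration under stated assumptions rather than an official virtualenv API: older virtualenv releases (before version 20) define sys.real_prefix inside an environment, while newer releases and the standard-library venv module make sys.prefix differ from sys.base_prefix. The helper name in_virtual_environment is hypothetical.

```python
import sys


def in_virtual_environment(interpreter=sys):
    """Return True if `interpreter` looks like it runs inside a virtual environment."""
    # Older virtualenv releases (before version 20) define real_prefix
    # inside the environment.
    if hasattr(interpreter, "real_prefix"):
        return True
    # Newer virtualenv releases and the standard-library venv module keep the
    # original installation in base_prefix and point prefix at the environment.
    base_prefix = getattr(interpreter, "base_prefix", interpreter.prefix)
    return interpreter.prefix != base_prefix


print("prefix:", sys.prefix)
print("inside a virtual environment:", in_virtual_environment())
```

Run it once with the environment activated and once after deactivate; the reported prefix should switch between the venv directory and the system installation.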

There's more...

virtualenv is a fantastic tool, one that will prove invaluable to any Python programmer. However, we wish to offer a note of caution. Python provides many tools that connect to C shared objects in order to improve performance. Therefore, installing certain Python packages, such as NumPy and SciPy, into your virtual environment may require external dependencies to be compiled and installed, which are system specific. Even when successful, these compilations can be tedious, which is one of the reasons for maintaining a virtual environment. Worse, missing dependencies will cause compilations to fail, leaving you to troubleshoot alien error messages, dated makefiles, and complex dependency chains. This can be daunting even to the most veteran data scientist.

A quick solution is to use a package manager to install complex libraries into the system environment (aptitude or Yum on Linux, Homebrew or MacPorts on OS X; on Windows, precompiled installers are generally available). These tools use precompiled forms of the third-party packages. Once you have these Python packages installed in your system environment, you can use the --system-site-packages flag when initializing a virtualenv. This flag tells the virtualenv tool to use the system site packages already installed, which circumvents the need for an additional installation that would require compilation. To override a package that is already in the system environment (for example, when you wish to use a newer version), use pip install -I to install it into the virtualenv and ignore the global copy. This technique works best when you install only large-scale packages on your system and use virtualenv for other types of development.
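
The effect of such a flag can be inspected in the environment's configuration file. The sketch below is a stand-in rather than the recipe's own tool: it uses the standard-library venv module (Python 3), whose system_site_packages option plays the same role as virtualenv's --system-site-packages flag, and then reads the pyvenv.cfg file that venv writes. The helper name read_pyvenv_cfg is hypothetical, and the temporary directory is used only so the example cleans up after itself.

```python
import tempfile
import venv
from pathlib import Path


def read_pyvenv_cfg(env_dir):
    """Parse the 'key = value' lines of an environment's pyvenv.cfg file."""
    settings = {}
    cfg_text = (Path(env_dir) / "pyvenv.cfg").read_text(encoding="utf-8")
    for line in cfg_text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "venv"
    # with_pip=False skips bootstrapping pip, which keeps the sketch fast;
    # only the generated configuration is inspected here.
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=False)
    builder.create(env_dir)
    settings = read_pyvenv_cfg(env_dir)

print(settings["include-system-site-packages"])
```

With the option enabled, packages installed in the system environment remain importable from the new environment, while anything you install with pip inside it still takes precedence.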

For the rest of the book, we will assume that you are using a virtualenv and have the tools mentioned in this chapter ready to go. Therefore, we won't enforce or discuss the use of virtual environments in much detail. Just consider the virtual environment as a safety net that will allow you to perform the recipes listed in this book in isolation.

See also

You can also refer to the following: