Python Data Analysis Cookbook

By: Ivan Idris

Overview of this book

Data analysis is a rapidly evolving field, and Python is a multi-paradigm programming language suitable for object-oriented application development and functional design patterns. Because Python offers a wide range of tools and libraries, it has gradually become the primary language for data science, including data analysis, visualization, and machine learning. Python Data Analysis Cookbook focuses on reproducibility and creating production-ready systems. You will start with recipes that set the foundation for data analysis with libraries such as matplotlib, NumPy, and pandas. You will learn to create visualizations by choosing color maps and palettes, then dive into statistical data analysis using distribution algorithms and correlations. The book then helps you find your way around different data and numerical problems, get to grips with Spark and HDFS, and set up migration scripts for web mining. You will also dive deeper into recipes on spectral analysis, smoothing, and bootstrapping methods. Moving on, you will learn to rank stocks and check market efficiency, then work with metrics and clusters. You will use parallelism with multiple threads to speed up your code and improve system performance. By the end of the book, you will be capable of handling various data analysis techniques in Python and devising solutions for problem scenarios.

Sandboxing Python applications with Docker images


Docker uses Linux kernel features to provide an extra virtualization layer. Docker was created in 2013 by Solomon Hykes. Boot2Docker allows us to install Docker on Windows and Mac OS X too. Boot2Docker uses a VirtualBox VM that contains a Linux environment with Docker. In this recipe, we will set up Docker and download the continuumio/miniconda3 Docker image.

Getting ready

The Docker installation docs are available at https://docs.docker.com/index.html (retrieved July 2015). I installed Docker 1.7.0 with Boot2Docker. The installer requires about 133 MB. However, if you want to follow the whole recipe, you will need several gigabytes of disk space.
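
If you want to confirm from Python which Docker client version is installed before continuing, the following minimal sketch may help. It assumes that `docker --version` prints a line containing a dotted version number such as 1.7.0; the function names and the sample string in the usage note are illustrative and not part of the recipe.

```python
import re
import shutil
import subprocess


def parse_docker_version(output):
    """Extract (major, minor, patch) from `docker --version` style output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version number found in {output!r}")
    return tuple(int(part) for part in match.groups())


def installed_docker_version():
    """Return the local Docker client version, or None if docker is not on PATH."""
    if shutil.which("docker") is None:
        return None
    # Assumption: the client prints its version on stdout and exits with 0.
    result = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True, check=True
    )
    return parse_docker_version(result.stdout)
```

For example, `parse_docker_version("Docker version 1.7.0, build 0baf609")` yields a tuple that can be compared against a minimum version such as `(1, 7, 0)`.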

How to do it...

  1. Once Boot2Docker is installed, you need to initialize the environment. This is only necessary once, and Linux users don't need this step:

    $ boot2docker init
    Latest release for github.com/boot2docker/boot2docker is v1.7.0
    Downloading boot2docker ISO image...
    Success: downloaded https://github.com/boot2docker/boot2docker/releases/download/v1.7.0/boot2docker.iso
    
  2. In the preceding step, you downloaded a VirtualBox VM to a directory such as /VirtualBox\ VMs/boot2docker-vm/.

    The next step for Mac OS X and Windows users is to start the VM:

    $ boot2docker start
    
  3. Check the Docker environment by starting a sample container:

    $ docker run hello-world
    

    Note

    Some people reported a hopefully temporary issue of not being able to connect. The issue can be resolved by issuing commands with an extra argument, for instance:

    $ docker --tlsverify=false run hello-world
    
  4. Docker images can be made public. We can search for such images and download them. In Setting up Anaconda, we installed Anaconda; however, Anaconda and Miniconda Docker images also exist. Use the following command:

    $ docker search continuumio
    
  5. The preceding command shows a list of Docker images from Continuum Analytics – the company that developed Anaconda and Miniconda. Download the Miniconda 3 Docker image as follows (if you prefer using my container, skip this):

    $ docker pull continuumio/miniconda3
    
  6. Start the image with the following command:

    $ docker run -t -i continuumio/miniconda3 /bin/bash
    

    We start out as root in the image.

  7. The $ docker images command should now list the continuumio/miniconda3 image as well. If you prefer not to install too much software for this book (possibly only Docker and Boot2Docker), you should use the image I created. It uses the continuumio/miniconda3 image as a template. This image allows you to execute Python scripts in the current working directory on your computer, while using the software installed in the Docker image:

    $ docker run -it -p 8888:8888 -v $(pwd):/usr/data -w /usr/data "ivanidris/pydacbk:latest" python <somefile>.py 
    
  8. You can also run an IPython notebook in your current working directory with the following command:

    $ docker run -it -p 8888:8888 -v $(pwd):/usr/data -w /usr/data "ivanidris/pydacbk:latest" sh -c "ipython notebook --ip=0.0.0.0 --no-browser"
    
  9. Then, go to either http://192.168.59.103:8888 or http://localhost:8888 to view the IPython home screen. You might have noticed that the command lines are quite long, so I will post additional tips and tricks to make life easier on https://pythonhosted.org/dautil (work in progress).

    The Boot2Docker VM shares the /Users directory on Mac OS X and the C:\Users directory on Windows. In general and on other operating systems, we can mount directories and copy files from the container as described in https://docs.docker.com/userguide/dockervolumes/ (retrieved July 2015).

  10. Shut down the VM (unless you are on Linux, where you use the docker command instead) with the following command:

    $ boot2docker down
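
The docker run command lines in steps 7 and 8 are long and differ only in the command executed inside the container. One way to keep them manageable is to assemble the argument list in Python. This is a minimal sketch, not the author's tooling: the function name `docker_run_argv` and its parameters are hypothetical, while the image name, port, and mount point are taken from the commands above.

```python
import os
import shlex


def docker_run_argv(command, image="ivanidris/pydacbk:latest",
                    workdir="/usr/data", port=8888, host_dir=None):
    """Build the argument list for the `docker run` calls in steps 7 and 8."""
    # Default to the current working directory, like $(pwd) in the recipe.
    host_dir = host_dir or os.getcwd()
    return [
        "docker", "run", "-it",
        "-p", f"{port}:{port}",          # publish the notebook port
        "-v", f"{host_dir}:{workdir}",   # mount the host directory
        "-w", workdir,                   # start in the mounted directory
        image,
    ] + shlex.split(command)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    argv = docker_run_argv("python somefile.py")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Passing the resulting list to `subprocess.run` would start the container. The notebook variant from step 8 can be built with `docker_run_argv('sh -c "ipython notebook --ip=0.0.0.0 --no-browser"')`.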
    

How it works...

Docker Hub acts as a central registry for public and private Docker images. In this recipe, we downloaded images via this registry. To push an image to Docker Hub, we first need a Docker Hub account and a local image tagged with the name of the target repository. The way Docker Hub works is in many ways comparable to the way source code repositories such as GitHub work. You can commit changes as well as push, pull, and tag images. The continuumio/miniconda3 image is configured with a Dockerfile, which you can find at https://github.com/ContinuumIO/docker-images/blob/master/miniconda3/Dockerfile (retrieved July 2015). In this file, you can read which image was used as the base, the name of the maintainer, and the commands used to build the image.
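
To illustrate how the base image, the maintainer, and the build commands can be read from a Dockerfile, here is a minimal parsing sketch. It handles only comments and backslash line continuations, and the `SAMPLE` text is a made-up Dockerfile for illustration, not the contents of the real miniconda3 file.

```python
# Hypothetical Dockerfile text; the real miniconda3 Dockerfile differs.
SAMPLE = """\
# Illustrative example only
FROM debian:stable
MAINTAINER Jane Doe <jane@example.com>
RUN apt-get update && \\
    apt-get install -y wget
"""


def parse_dockerfile(text):
    """Return a list of (INSTRUCTION, arguments) pairs from Dockerfile text."""
    instructions = []
    pending = ""  # accumulates lines joined by a trailing backslash
    for raw in text.splitlines():
        line = raw.strip()
        if not pending and (not line or line.startswith("#")):
            continue  # skip blank lines and comments
        if line.endswith("\\"):
            pending += line[:-1].rstrip() + " "
            continue
        full = (pending + line).strip()
        pending = ""
        if not full:
            continue
        keyword, _, args = full.partition(" ")
        instructions.append((keyword.upper(), args.strip()))
    return instructions


if __name__ == "__main__":
    for keyword, args in parse_dockerfile(SAMPLE):
        print(keyword, "->", args)
```

In the parsed result, the `FROM` entry names the base image, `MAINTAINER` the maintainer, and each `RUN` entry one build command.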

See also