
Python Data Analysis Cookbook

By: Ivan Idris

Overview of this book

Data analysis is a rapidly evolving field, and Python is a multi-paradigm programming language suited to both object-oriented application development and functional design patterns. Because Python offers a wide range of tools and libraries, it has gradually become the primary language for data science, including data analysis, visualization, and machine learning. Python Data Analysis Cookbook focuses on reproducibility and creating production-ready systems. You will start with recipes that set the foundation for data analysis with libraries such as matplotlib, NumPy, and pandas. You will learn to create visualizations by choosing color maps and palettes, and then dive into statistical data analysis using distribution algorithms and correlations. The book then helps you find your way around different data and numerical problems, get to grips with Spark and HDFS, and set up migration scripts for web mining. You will also dive deeper into recipes on spectral analysis, smoothing, and bootstrapping methods. Moving on, you will learn to rank stocks and check market efficiency, and then work with metrics and clusters. You will speed up your code and improve system performance through parallelism with multiple threads. By the end of the book, you will be able to apply various data analysis techniques in Python and devise solutions for real problem scenarios.

Creating a virtual environment with virtualenv and virtualenvwrapper


Virtual environments provide dependency isolation for small projects. They also keep your global site-packages directory small, because project-specific packages are installed into each environment instead. Since Python 3.3, the standard library has included the venv module and the pyvenv script, which provide the core functionality of the third-party virtualenv tool. The virtualenvwrapper project adds convenient commands for managing virtual environments. In this recipe, I will demonstrate both pyvenv and virtualenvwrapper.
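Isolation works because an environment's interpreter reports the environment directory as its prefix. The following is a minimal sketch, using only the standard library, that checks from Python whether the current interpreter runs inside a virtual environment. The helper name in_virtualenv is hypothetical; the sys.prefix versus sys.base_prefix comparison follows PEP 405, which introduced venv in Python 3.3.

```python
import sys


def in_virtualenv() -> bool:
    """Hypothetical helper: True if this interpreter runs inside a virtual environment."""
    # venv (PEP 405) and virtualenv >= 20 point sys.prefix at the environment,
    # while sys.base_prefix keeps pointing at the base Python installation.
    if sys.prefix != sys.base_prefix:
        return True
    # Older virtualenv releases (before 20.0) set sys.real_prefix instead.
    return hasattr(sys, "real_prefix")


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtualenv())
    print("sys.prefix:     ", sys.prefix)
    print("sys.base_prefix:", sys.base_prefix)
```

Run it once with the system interpreter and once after activating an environment to see the two prefixes diverge.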

Getting ready

You need Python 3.3 or later. Install virtualenvwrapper with pip as follows:

$ [sudo] pip install virtualenvwrapper

On Linux and Mac, some extra work is necessary: specify a directory for the virtual environments and source the virtualenvwrapper script (its exact location depends on how virtualenvwrapper was installed):

$ export WORKON_HOME=/tmp/envs
$ source /usr/local/bin/virtualenvwrapper.sh
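virtualenvwrapper keeps each environment as a subdirectory of WORKON_HOME, falling back to ~/.virtualenvs when the variable is unset. The Python sketch below only illustrates that directory layout; it is not virtualenvwrapper's implementation. The helpers workon_home, make_env, list_envs, and remove_env are hypothetical stand-ins that loosely mirror its mkvirtualenv, lsvirtualenv, and rmvirtualenv commands, and they assume the standard library venv module is available.

```python
from __future__ import annotations

import os
import shutil
import tempfile
import venv
from pathlib import Path


def workon_home() -> Path:
    """Hypothetical helper: resolve WORKON_HOME the way virtualenvwrapper does."""
    # Fall back to ~/.virtualenvs, virtualenvwrapper's default, if unset.
    return Path(os.environ.get("WORKON_HOME", str(Path.home() / ".virtualenvs")))


def make_env(name: str) -> Path:
    """Create WORKON_HOME/<name>; unlike mkvirtualenv, it does not activate it."""
    env_dir = workon_home() / name
    env_dir.parent.mkdir(parents=True, exist_ok=True)
    # with_pip=False keeps the sketch fast; real environments usually include pip.
    venv.create(env_dir, with_pip=False)
    return env_dir


def list_envs() -> list[str]:
    """Return the names of WORKON_HOME subdirectories that contain a pyvenv.cfg."""
    home = workon_home()
    if not home.is_dir():
        return []
    return sorted(p.name for p in home.iterdir() if (p / "pyvenv.cfg").is_file())


def remove_env(name: str) -> None:
    """Delete WORKON_HOME/<name> and everything in it."""
    shutil.rmtree(workon_home() / name)


if __name__ == "__main__":
    # Use a throwaway WORKON_HOME so the demo does not touch real environments.
    os.environ["WORKON_HOME"] = tempfile.mkdtemp()
    make_env("env2")
    print(list_envs())  # ['env2']
    remove_env("env2")
    print(list_envs())  # []
```

Keeping every environment under one directory is what lets a wrapper refer to environments by short names instead of full paths.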

Windows has a separate version, which you can install with the following command:

$ pip install virtualenvwrapper-win

How to do it...

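  1. Create a virtual environment for a given directory with the pyvenv script that is part of your Python distribution (pyvenv was deprecated in Python 3.6 and removed in 3.8; on newer versions, run python3 -m venv /tmp/testenv instead):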

    $ pyvenv /tmp/testenv
    $ ls /tmp/testenv
    bin        include        lib        pyvenv.cfg
    
  2. In this example, we created a testenv directory in the /tmp directory containing several subdirectories and a configuration file. The configuration file pyvenv.cfg records the Python version and, under the home key, the directory that contains the base Python executable.

  3. Activate the environment on Linux or Mac by sourcing the activate script, for example, with the following command:

    $ source /tmp/testenv/bin/activate
    

    On Windows, run the Scripts\activate.bat file instead.

  4. You can now install packages in this environment in isolation. When you are done with the environment, deactivate it on Linux or Mac with the following command:

    $ deactivate
    

    On Windows, use the deactivate.bat file.

  5. Alternatively, you could use virtualenvwrapper. Create and switch to a virtual environment with the following command:

    vagrant@data-science-toolbox:~$ mkvirtualenv env2
    
  6. Deactivate the environment with the deactivate command:

    (env2)vagrant@data-science-toolbox:~$ deactivate
    
  7. Delete the environment with the rmvirtualenv command:

    vagrant@data-science-toolbox:~$ rmvirtualenv env2
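Steps 1 to 4 can also be driven from Python with the standard library venv module, which pyvenv (and python3 -m venv) wraps. The following is a minimal sketch, assuming Python 3.7 or later for subprocess.run with capture_output; the helper names create_env, read_pyvenv_cfg, and env_prefix are hypothetical. It creates an environment in a temporary directory, parses pyvenv.cfg, and runs the environment's interpreter directly, without activating it.

```python
import os
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a virtual environment (like pyvenv) and return its interpreter path."""
    # Match the command-line default: symlinks on POSIX, copies on Windows.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def read_pyvenv_cfg(env_dir: Path) -> dict:
    """Parse the 'key = value' lines of pyvenv.cfg into a dictionary."""
    cfg = {}
    for line in (env_dir / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


def env_prefix(python: Path) -> str:
    """Ask the environment's interpreter for its sys.prefix, without activating."""
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix)"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "testenv"
        python = create_env(env_dir)
        print(sorted(p.name for p in env_dir.iterdir()))
        cfg = read_pyvenv_cfg(env_dir)
        print("home:", cfg.get("home"))
        print("version:", cfg.get("version"))
        print("sys.prefix in env:", env_prefix(python))
        print("sys.prefix here:  ", sys.prefix)
```

Calling an environment's interpreter by its full path gives the same isolation as activating first; the activate script mainly prepends the environment's bin (or Scripts) directory to PATH and sets VIRTUAL_ENV.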
    

See also