Book Image

Python Data Analysis Cookbook

By : Ivan Idris
Book Image

Python Data Analysis Cookbook

By: Ivan Idris

Overview of this book

Data analysis is a rapidly evolving field and Python is a multi-paradigm programming language suitable for object-oriented application development and functional design patterns. As Python offers a range of tools and libraries for all purposes, it has slowly evolved as the primary language for data science, including topics on: data analysis, visualization, and machine learning. Python Data Analysis Cookbook focuses on reproducibility and creating production-ready systems. You will start with recipes that set the foundation for data analysis with libraries such as matplotlib, NumPy, and pandas. You will learn to create visualizations by choosing color maps and palettes then dive into statistical data analysis using distribution algorithms and correlations. You’ll then help you find your way around different data and numerical problems, get to grips with Spark and HDFS, and then set up migration scripts for web mining. In this book, you will dive deeper into recipes on spectral analysis, smoothing, and bootstrapping methods. Moving on, you will learn to rank stocks and check market efficiency, then work with metrics and clusters. You will achieve parallelism to improve system performance by using multiple threads and speeding up your code. By the end of the book, you will be capable of handling various data analysis techniques in Python and devising solutions for problem scenarios.
Table of Contents (23 chapters)
Python Data Analysis Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Glossary
Index

Setting up Anaconda


Anaconda is a free Python distribution for data analysis and scientific computing. It has its own package manager, conda. The distribution includes more than 200 Python packages, which makes it very convenient. For casual users, the Miniconda distribution may be the better choice. Miniconda contains the conda package manager and Python. The technical editors use Anaconda, and so do I. But don't worry, I will describe in this book alternative installation instructions for readers who are not using Anaconda. In this recipe, we will install Anaconda and Miniconda and create a virtual environment.

Getting ready

The procedures to install Anaconda and Miniconda are similar. Obviously, Anaconda requires more disk space. Follow the instructions on the Anaconda website at http://conda.pydata.org/docs/install/quick.html (retrieved Mar 2016). First, you have to download the appropriate installer for your operating system and Python version. Sometimes, you can choose between a GUI and a command-line installer. I used the Python 3.4 installer, although my system Python version is v2.7. This is possible because Anaconda comes with its own Python. On my machine, the Anaconda installer created an anaconda directory in my home directory and required about 900 MB. The Miniconda installer installs a miniconda directory in your home directory.

How to do it...

  1. Now that Anaconda or Miniconda is installed, list the packages with the following command:

    $ conda list
    
  2. For reproducibility, it is good to know that we can export packages:

    $ conda list --export
    
  3. The preceding command prints packages and versions on the screen, which you can save in a file. You can install these packages with the following command:

    $ conda create -n ch1env --file <export file>
    

    This command also creates an environment named ch1env.

  4. The following command creates a simple testenv environment:

    $ conda create --name testenv python=3
    
  5. On Linux and Mac OS X, switch to this environment with the following command:

    $ source activate testenv
    
  6. On Windows, we don't need source. The syntax to switch back is similar:

    $ [source] deactivate
    
  7. The following command prints export information for the environment in the YAML (explained in the following section) format:

    $ conda env export -n testenv
    
  8. To remove the environment, type the following (note that even after removing, the name of the environment still exists in ~/.conda/environments.txt):

    $ conda remove -n testenv --all
    
  9. Search for a package as follows:

    $ conda search numpy
    

    In this example, we searched for the NumPy package. If NumPy is already present, Anaconda shows an asterisk in the output at the corresponding entry.

  10. Update the distribution as follows:

    $ conda update conda
    

There's more...

The .condarc configuration file follows the YAML syntax.

Note

YAML is a human-readable configuration file format with the extension .yaml or .yml. YAML was initially released in 2011, with the latest release in 2009. The YAML homepage is at http://yaml.org/ (retrieved July 2015).

You can find a sample configuration file at http://conda.pydata.org/docs/install/sample-condarc.html (retrieved July 2015). The related documentation is at http://conda.pydata.org/docs/install/config.html (retrieved July 2015).

See also