Python Data Analysis

By: Ivan Idris

Storing data with PyTables


Hierarchical Data Format (HDF) is a specification and technology for storing large numerical datasets. HDF was created in the supercomputing community and is now an open standard. The latest version, HDF5, is the one we will be using. HDF5 organizes data into groups and datasets. Datasets are multidimensional homogeneous arrays. Groups can contain other groups or datasets, much like directories in a hierarchical filesystem.
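
To make the group and dataset hierarchy concrete, here is a minimal sketch using PyTables. It assumes PyTables 3.x (which uses the `open_file`, `create_group`, and `create_array` names) and NumPy are installed. The file name `demo.h5` and the node names `measurements`, `readings`, and `nested` are hypothetical and chosen only for illustration.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical file location; a temporary directory keeps the example self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

with tables.open_file(path, mode="w") as h5file:
    # A group behaves like a directory under the root node "/".
    group = h5file.create_group("/", "measurements", title="Example group")
    # A dataset is a homogeneous multidimensional array stored inside a group.
    h5file.create_array(group, "readings", np.arange(6.0).reshape(2, 3))
    # Groups can contain other groups, like subdirectories.
    h5file.create_group(group, "nested")

with tables.open_file(path, mode="r") as h5file:
    # Nodes are addressed with filesystem-like paths.
    readings = h5file.get_node("/measurements/readings").read()
    node_paths = [node._v_pathname for node in h5file.walk_nodes("/")]

print(readings.shape, readings.dtype)
print(node_paths)
```

Reading the array back returns a NumPy array with the shape and type it was stored with, and walking the file from the root lists the groups and datasets by their paths.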

The two main HDF5 Python libraries are:

  • h5py

  • PyTables

In this example, we will be using PyTables. PyTables has a number of dependencies:

  • NumPy: We installed NumPy in Chapter 1, Getting Started with Python Libraries

  • numexpr: This package claims that it evaluates multiple-operator array expressions many times faster than NumPy can

  • HDF5

    Note

    The parallel version of HDF5 also requires MPI. HDF5 can be installed by obtaining a distribution from http://www.hdfgroup.org/HDF5/release/obtain5.html and running the following commands (which could...