Book Image

Python Data Analysis

By : Ivan Idris
Book Image

Python Data Analysis

By: Ivan Idris

Overview of this book

Table of Contents (22 chapters)
Python Data Analysis
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Key Concepts
Online Resources
Index

Comparing the NumPy .npy binary format and pickling pandas DataFrames


Saving data in the CSV format is fine most of the time. It is easy to exchange CSV files, since most programming languages and applications can handle this format. However, it is not very efficient; CSV and other plaintext formats take up a lot of space. Numerous file formats have been invented, which offer a high level of compression such as zip, bzip, and gzip.

The following is the complete code for this storage comparison exercise, which can also be found in the binary_formats.py file of this book's code bundle:

import numpy as np
import pandas as pd
from tempfile import NamedTemporaryFile
from os.path import getsize

np.random.seed(42)
a = np.random.randn(365, 4)

tmpf = NamedTemporaryFile()
np.savetxt(tmpf, a, delimiter=',')
print "Size CSV file", getsize(tmpf.name)

tmpf = NamedTemporaryFile()
np.save(tmpf, a)
tmpf.seek(0)
loaded = np.load(tmpf)
print "Shape", loaded.shape
print "Size .npy file", getsize(tmpf.name...