Book Image

Python Data Analysis

By : Ivan Idris
Book Image

Python Data Analysis

By: Ivan Idris

Overview of this book

Table of Contents (22 chapters)
Python Data Analysis
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Key Concepts
Online Resources
Index

Writing CSV files with NumPy and pandas


In the previous chapters, we learned about reading CSV files. Writing CSV files is just as straightforward, but uses different functions and methods. Let's first generate some data to be stored in the CSV format. Generate a 3 x 4 NumPy array after seeding the random generator in the following code snippet.

Set one of the array values to NaN:

np.random.seed(42)

a = np.random.randn(3, 4)
a[2][2] = np.nan
print a

This code will print the array as follows:

[[ 0.49671415 -0.1382643   0.64768854  1.52302986]
 [-0.23415337 -0.23413696  1.57921282  0.76743473]
 [-0.46947439  0.54256004         nan -0.46572975]]

The NumPy savetxt() function is the counterpart of the NumPy loadtxt() function and can save arrays in delimited file formats such as CSV. Save the array we created with the following function call:

np.savetxt('np.csv', a, fmt='%.2f', delimiter=',', header=" #1,  #2,  #3,  #4")

In the preceding function call, we specified the name of the file to be saved, the array, an optional format, a delimiter (the default is space), and an optional header.

View the np.csv file we created with the cat command (cat np.csv) or an editor, such as Notepad on Windows. The contents of the file should be displayed as follows:

#  #1,  #2,  #3,  #4
0.50,-0.14,0.65,1.52
-0.23,-0.23,1.58,0.77
-0.47,0.54,nan,-0.47

Create a pandas DataFrame from the random values array:

df = pd.DataFrame(a)
print df

As you can observe, pandas automatically comes up with column names for our data:

          0         1         2         3
0  0.496714 -0.138264  0.647689  1.523030
1 -0.234153 -0.234137  1.579213  0.767435
2 -0.469474  0.542560NaN -0.465730

Write a DataFrame to a CSV file with the pandas to_csv() method as follows:

df.to_csv('pd.csv', float_format='%.2f', na_rep="NAN!")

We gave this method the name of the file, an optional format string analogous to the format parameter of the NumPy savetxt() function, and an optional string that represents NaN. View the pd.csv file to see the following:

,0,1,2,3
0,0.50,-0.14,0.65,1.52
1,-0.23,-0.23,1.58,0.77
2,-0.47,0.54,NAN!,-0.47

Take a look at the code in the writing_csv.py file in this book's code bundle:

import numpy as np
import pandas as pd

np.random.seed(42)

a = np.random.randn(3, 4)
a[2][2] = np.nan
print a
np.savetxt('np.csv', a, fmt='%.2f', delimiter=',', header=" #1,  #2,  #3,  #4")
df = pd.DataFrame(a)
print df
df.to_csv('pd.csv', float_format='%.2f', na_rep="NAN!")