Book Image

Python Business Intelligence Cookbook

Book Image

Python Business Intelligence Cookbook

Overview of this book

The amount of data produced by businesses and devices is going nowhere but up. In this scenario, the major advantage of Python is that it's a general-purpose language and gives you a lot of flexibility in data structures. Python is an excellent tool for more specialized analysis tasks, and is powered with related libraries to process data streams, to visualize datasets, and to carry out scientific calculations. Using Python for business intelligence (BI) can help you solve tricky problems in one go. Rather than spending day after day scouring Internet forums for “how-to” information, here you’ll find more than 60 recipes that take you through the entire process of creating actionable intelligence from your raw data, no matter what shape or form it’s in. Within the first 30 minutes of opening this book, you’ll learn how to use the latest in Python and NoSQL databases to glean insights from data just waiting to be exploited. We’ll begin with a quick-fire introduction to Python for BI and show you what problems Python solves. From there, we move on to working with a predefined data set to extract data as per business requirements, using the Pandas library and MongoDB as our storage engine. Next, we will analyze data and perform transformations for BI with Python. Through this, you will gather insightful data that will help you make informed decisions for your business. The final part of the book will show you the most important task of BI—visualizing data by building stunning dashboards using Matplotlib, PyTables, and iPython Notebook.
Table of Contents (12 chapters)
Python Business Intelligence Cookbook
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface
Index

Standardizing dates in Pandas


Along with Social Security numbers, there are almost infinite ways to write dates. In order to compare them properly, or use them in your own systems, you need to put them into a single format. This recipe handles commonly seen date formats in the United States, and dates with missing information.

Getting ready

Import Pandas and create a new DataFrame to work with:

import pandas as pd
lc = pd.DataFrame({
'people' : ["cole o'brien", "lise heidenreich", "zilpha skiles", "damion wisozk"],
'age' : [24, 35, 46, 57],
'ssn': ['6439', '689 24 9939', '306-05-2792', '992245832'],
'birth_date': ['2/15/54', '05/07/1958', '19XX-10-23', '01/26/0056'],
'customer_loyalty_level' : ['not at all', 'moderate', 'moderate', 'highly loyal']})

How to do it…

from time import strftime
from datetime import datetime

def standardize_date(the_date):
    """
    Standardizes a date
    :param the_date: the date to standardize
    :return formatted_date
    """

    # Convert what we have to a...