Python Business Intelligence Cookbook

Overview of this book

The amount of data produced by businesses and devices is going nowhere but up. In this scenario, the major advantage of Python is that it is a general-purpose language that gives you a lot of flexibility in data structures. Python is also an excellent tool for more specialized analysis tasks, backed by libraries for processing data streams, visualizing datasets, and carrying out scientific calculations. Using Python for business intelligence (BI) can help you solve tricky problems in one go.

Rather than spending day after day scouring Internet forums for "how-to" information, here you'll find more than 60 recipes that take you through the entire process of creating actionable intelligence from your raw data, no matter what shape or form it's in. Within the first 30 minutes of opening this book, you'll learn how to use the latest in Python and NoSQL databases to glean insights from data just waiting to be exploited.

We'll begin with a quick-fire introduction to Python for BI and show you what problems Python solves. From there, we move on to working with a predefined dataset to extract data as per business requirements, using the Pandas library with MongoDB as our storage engine. Next, we will analyze data and perform transformations for BI with Python. Through this, you will gather insightful data that will help you make informed decisions for your business. The final part of the book will show you the most important task of BI: visualizing data by building stunning dashboards using Matplotlib, PyTables, and IPython Notebook.

Credits

About the Author

About the Reviewer

www.PacktPub.com

Preface

Getting Set Up to Gain Business Intelligence

Introduction

Installing Anaconda

Learn about the Python libraries we will be using

Installing, configuring, and running MongoDB

Installing Rodeo

Starting Rodeo

Installing Robomongo

Using Robomongo to query MongoDB

Downloading the UK Road Safety Data dataset

Making Your Data All It Can Be

Importing a CSV file into MongoDB

Importing an Excel file into MongoDB

Importing a JSON file into MongoDB

Importing a plain text file into MongoDB

Retrieving a single record using PyMongo

Retrieving multiple records using PyMongo

Inserting a single record using PyMongo

Inserting multiple records using PyMongo

Updating a single record using PyMongo

Updating multiple records using PyMongo

Deleting a single record using PyMongo

Deleting multiple records using PyMongo

Importing a CSV file into a Pandas DataFrame

Renaming column headers in Pandas

Filling in missing values in Pandas

Removing punctuation in Pandas

Removing whitespace in Pandas

Removing any string from within a string in Pandas

Merging two datasets in Pandas

Titlecasing anything

Uppercasing a column in Pandas

Updating values in place in Pandas

Standardizing a Social Security number in Pandas

Standardizing dates in Pandas

Converting categories to numbers in Pandas for a speed boost

Learning What Your Data Truly Holds

Creating a Pandas DataFrame from a MongoDB query

Creating a Pandas DataFrame from a CSV file

Creating a Pandas DataFrame from an Excel file

Creating a Pandas DataFrame from a JSON file

Creating a data quality report

Generating summary statistics for the entire dataset

Generating summary statistics for object type columns

Getting the mode of the entire dataset

Generating summary statistics for a single column

Getting a count of unique values for a single column

Getting the minimum and maximum values of a single column

Generating quantiles for a single column

Getting the mean, median, mode, and range for a single column

Generating a frequency table for a single column by date

Generating a frequency table of two variables

Creating a histogram for a column

Plotting the data as a probability distribution

Plotting a cumulative distribution function

Showing the histogram as a stepped line

Plotting two sets of values in a probability distribution

Creating a customized box plot with whiskers

Creating a basic bar chart for a single column over time

Performing Data Analysis for Non Data Analysts

Performing a distribution analysis

Performing categorical variable analysis

Performing a linear regression

Performing a time-series analysis

Performing outlier detection

Creating a predictive model using logistic regression

Creating a predictive model using a random forest

Creating a predictive model using Support Vector Machines

Saving a predictive model for production use

Building a Business Intelligence Dashboard Quickly

Creating reports in Excel directly from a Pandas DataFrame

Creating customizable Excel reports using XlsxWriter

Building a shareable dashboard using IPython Notebook and matplotlib

Exporting an IPython Notebook Dashboard to HTML

Exporting an IPython Notebook Dashboard to PDF

Exporting an IPython Notebook Dashboard to an HTML slideshow

Building your First Flask application in 10 minutes or less

Creating and saving your plots for your Flask BI dashboard

Building a business intelligence dashboard in Flask

Index

Converting categories to numbers in Pandas for a speed boost

When your data contains text categories, you can dramatically speed up processing by using Pandas categoricals. Categoricals encode the text as numerics, which lets us take full advantage of Pandas' fast C code. Typical candidates for categoricals are stock symbols, gender, experiment outcomes, states, and, in this case, a customer loyalty level.
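As a rough illustration of the encoding described above, the sketch below converts a text column to a categorical and inspects what Pandas stores. The stock-symbol data is hypothetical and not taken from the book; it only serves to show that each distinct string is kept once while the rows hold integer codes.

```python
import pandas as pd

# Hypothetical sample data (not from the book): many rows, few distinct values.
symbols = pd.Series(["AAPL", "MSFT", "AAPL", "GOOG", "MSFT", "AAPL"] * 10000)

as_category = symbols.astype("category")

# Each distinct string is stored once, in sorted order...
categories = list(as_category.cat.categories)
# ...and every row holds a small integer code that points at one of them.
first_codes = as_category.cat.codes.head(6).tolist()

# Compare the memory footprint of the text column and the categorical column.
text_bytes = symbols.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(categories)
print(first_codes)
print(text_bytes, category_bytes)
```

The actual savings depend on how many distinct values the column has relative to its length; a column where nearly every value is unique gains little from this conversion.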

Getting ready

Import Pandas, and create a new DataFrame to work with.

import pandas as pd
import numpy as np

# Sample customer data; customer_loyalty_level is the text category column
lc = pd.DataFrame({
    'people': ["cole o'brien", "lise heidenreich", "zilpha skiles", "damion wisozk"],
    'age': [24, 35, 46, 57],
    'ssn': ['6439', '689 24 9939', '306-05-2792', '992245832'],
    'birth_date': ['2/15/54', '05/07/1958', '19XX-10-23', '01/26/0056'],
    'customer_loyalty_level': ['not at all', 'moderate', 'moderate', 'highly loyal']})

How to do it…

First, convert the customer_loyalty_level column to a category type column:

lc.customer_loyalty_level...
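The line above is truncated in the source, so the exact call the recipe uses is not shown here. As a minimal sketch, assuming the conversion is done with the standard astype('category') method, it could look like the following. The DataFrame is a reduced copy of the one from "Getting ready".

```python
import pandas as pd

# Reduced copy of the DataFrame from "Getting ready".
lc = pd.DataFrame({
    'people': ["cole o'brien", "lise heidenreich", "zilpha skiles", "damion wisozk"],
    'customer_loyalty_level': ['not at all', 'moderate', 'moderate', 'highly loyal']})

# Assumption: astype('category') stands in for the truncated call in the recipe.
lc['customer_loyalty_level'] = lc['customer_loyalty_level'].astype('category')

# The column now has the category dtype, and each row carries an integer code.
loyalty_dtype = str(lc['customer_loyalty_level'].dtype)
loyalty_codes = lc['customer_loyalty_level'].cat.codes.tolist()

print(loyalty_dtype)
print(loyalty_codes)
```

By default the codes follow the alphabetical order of the category labels, not the order of loyalty, so they should not be read as a ranking unless an ordered categorical is defined explicitly.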