Introduction to R for Business Intelligence

By: Jay Gendron

Overview of this book

Explore the world of Business Intelligence through the eyes of an analyst working in a successful and growing company. Learn R through use cases supporting different functions within that company. This book provides data-driven and analytically focused approaches to help you answer questions in operations, marketing, and finance. In Part 1, you will learn about extracting data from different sources, cleaning that data, and exploring its structure. In Part 2, you will explore predictive models and cluster analysis for Business Intelligence and analyze financial times series. Finally, in Part 3, you will learn to communicate results with sharp visualizations and interactive, web-based dashboards. After completing the use cases, you will be able to work with business data in the R programming environment and realize how data science helps make informed decisions and develops business strategy. Along the way, you will find helpful tips about R and Business Intelligence.

Extracting data from sources


Now we get to work. The aim of this book is to design, develop, and deliver a business intelligence product: a data product. In this section, you will explore two extraction methods to import data from different types of sources:

  • Importing CSV and other file formats

  • Importing data from relational databases

Depending on your background, you may be more or less familiar with both types of extraction methods. Here, you will learn or refresh your knowledge of both in order to have a more complete working understanding of ETL.

Importing CSV and other file formats

You can load the Bike Sharing data file into the R environment by using the read.csv() function. It is an easy and commonly used way to load CSV files:

bike <- read.csv("Ch1_bike_sharing_data.csv")

Calling this function will read the file as long as it is located in your R working directory. The working directory is the folder where R looks for and saves files by default, much like a home directory. There are two commands that you can use to check and change your working directory:

  • getwd(): This will return the current working directory as a path in the R console

  • setwd(<path>): This is used in the console to change the working directory to the <path> you pass to the function
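A quick sketch of these two commands in practice; the project path shown is a hypothetical example, so adjust it for your own machine:

```r
# Print the current working directory to the R console
getwd()

# Point R at a hypothetical project folder (path is an assumption)
setwd("~/projects/bike_sharing")
```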

If you try to read a file that is not in your working directory, you will see an error message. Some analysts manage data in a separate data directory, one level under their working directory. In this case, you can add a path in front of the filename. The following example shows how the data file is located one layer down in a data directory. Adding the ./data/ string to the front of the filename will allow R to access the data:

bike <- read.csv("./data/Ch1_bike_sharing_data.csv") 
str(bike) 

The str() function is not required to import the data, but it does provide a confirmation that the data was read into the R environment. It also provides you with a quick look at the structure of the dataset, its dimensions, and the names of the variables:

'data.frame':  17379 obs. of  12 variables:
 $ datetime  : Factor w/ 17379 levels "1/1/2011 0:00",..: 1 2 13  ...
 $ season    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ holiday   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ workingday: int  0 0 0 0 0 0 0 0 0 0 ...
 $ weather   : int  1 1 1 1 1 2 1 1 1 1 ...
 $ temp      : num  9.84 9.02 9.02 9.84 9.84 ...
 $ atemp     : num  14.4 13.6 13.6 14.4 14.4 ...
 $ humidity  : int  81 80 80 75 75 75 80 86 75 76 ...
 $ windspeed : num  0 0 0 0 0 ...
 $ casual    : int  3 8 5 3 0 0 2 1 1 8 ...
 $ registered: int  13 32 27 10 1 1 0 2 7 6 ...
 $ count     : int  16 40 32 13 1 1 2 3 8 14 ...
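Beyond str(), a few other base R functions give complementary quick looks at a freshly imported data frame. This sketch assumes the bike data frame loaded above:

```r
head(bike, 3)       # first three rows of the data
dim(bike)           # dimensions: 17379 rows, 12 columns
names(bike)         # the twelve variable names
summary(bike$temp)  # five-number summary plus mean for one variable
```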

The read.csv() function uses default parameters that are consistent with CSV files, such as using a comma (,) as a separator. The R environment has a more flexible read option for instances when you may have tab-delimited data or a TXT file. This option is the read.table() function:

bike <- read.table("./data/Ch1_bike_sharing_data.csv", sep = ",", 
                   header = TRUE)

This function performs identically to the read.csv() function. In fact, read.csv() is built on read.table() and uses sep = "," and header = TRUE as default parameters.
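If the same data arrived as a tab-delimited TXT file instead, only the sep argument would change. The .txt filename here is an assumption for illustration:

```r
# Hypothetical tab-delimited export of the same dataset
bike <- read.table("./data/Ch1_bike_sharing_data.txt",
                   sep = "\t", header = TRUE)
```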

Tip

Downloading the example code

Detailed steps to download the code bundle are mentioned in the Preface of this book. Please have a look.

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Introduction-To-R-For-Business-Intelligence. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Importing data from relational databases

You can also use R to access the data stored in many relational databases through the Open Database Connectivity (ODBC) application programming interface. In R, you do this using the RODBC package. This package provides you with an advantage for legacy data. By establishing an R connection to relational databases at Bike Sharing, LLC, you can reuse existing SQL-type queries that may already exist in the BI group.

Imagine that the Bike Sharing dataset was not a CSV file, but it was instead stored in a relational database. You can access the data by loading the RODBC package and establishing a connection to the database using the following parameters: data source name, user ID, and password. In this notional example, the user Paul would like to access the data source ourDB using his password R4BI:

library(RODBC) 
connection <- odbcConnect(dsn = "ourDB", uid = "Paul", pwd = "R4BI") 

Note

As you follow along with this code, you will only be able to run the odbcConnect() function if you have configured a database source. Advanced users may try this using an Excel file. A tutorial for Windows-based environments is provided by Cyberfella LTD (2013) at http://www.cyberfella.co.uk/2012/06/11/windows7-odbc/.

Having established a connection to your database, you can use R to write a SQL query, pass this query to the sqlQuery() function, and read the data into a bike data frame. You learned that there is a table in the database called marketing:

query <- "SELECT * FROM marketing" 
bike <- sqlQuery(connection, query) 
close(connection) 

The close() function closes the connection between R and the database. Doing this will free any memory consumed by the open connection. It is a good practice to follow.
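If you do not know the table names in advance, the RODBC package can also list and preview them before you write a query. This sketch assumes an open connection like the one above:

```r
# List the tables visible through the open ODBC connection
tables <- sqlTables(connection)
head(tables$TABLE_NAME)

# Preview a table without writing SQL by hand
head(sqlFetch(connection, "marketing"))
```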

Tip

R tip: One thing you can do when establishing connections is to add a log to record your ETL processes. In the event something goes wrong, the log will capture the issue and help you find the problem quickly. The log4r package is specifically written for this application. For more information, visit the repository at https://github.com/johnmyleswhite/log4r.
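A minimal logging sketch using log4r's classic API; the log filename and messages are illustrative, not part of the Bike Sharing example:

```r
library(log4r)

# Create a logger that writes INFO-and-above messages to a file
logger <- create.logger(logfile = "etl.log", level = "INFO")

info(logger, "ETL started: querying marketing table")
# ... run the extraction here ...
error(logger, "Use error() to record failures for later review")
```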

There are also non-relational databases in the workplace. Non-relational databases store documents, key-value data, and unstructured data. Unlike the relational database approach, they organize data into different structures grouped into nodes, clusters, or other schemas. The R community has created various packages to connect with non-relational databases and distributed file systems. You can connect to MongoDB using the rmongodb package and to Spark distributed computing using the SparkR package.
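As a rough sketch of what such a connection looks like with rmongodb, the host, database, and collection names below are all assumptions for illustration:

```r
library(rmongodb)

# Connect to a hypothetical local MongoDB instance
mongo <- mongo.create(host = "localhost")

if (mongo.is.connected(mongo)) {
  # Fetch all documents from an assumed 'bikes.marketing' collection
  docs <- mongo.find.all(mongo, ns = "bikes.marketing")
  mongo.destroy(mongo)  # release the connection when done
}
```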

Tip

BI tip: By looking at the database approach used within an organization and conceptualizing the data, a business analyst can better understand business processes to improve the quality of analysis and meet organizational needs.