Introduction to R for Business Intelligence

By: Jay Gendron

Overview of this book

Explore the world of Business Intelligence through the eyes of an analyst working in a successful and growing company. Learn R through use cases supporting different functions within that company. This book provides data-driven and analytically focused approaches to help you answer questions in operations, marketing, and finance. In Part 1, you will learn about extracting data from different sources, cleaning that data, and exploring its structure. In Part 2, you will explore predictive models and cluster analysis for Business Intelligence and analyze financial time series. Finally, in Part 3, you will learn to communicate results with sharp visualizations and interactive, web-based dashboards. After completing the use cases, you will be able to work with business data in the R programming environment and realize how data science helps make informed decisions and develop business strategy. Along the way, you will find helpful tips about R and Business Intelligence.

Understanding big data in BI analytics

Before we begin describing the ETL process, consider its importance in business intelligence. CIO Magazine provides a popular and useful definition of BI (Mulcahy, 2007):

"Business intelligence, or BI, is an umbrella term that refers to a variety of software applications used to analyze an organization's raw data. BI as a discipline is made up of several related activities, including data mining, online analytical processing, querying and reporting."

Mulcahy captures the essence of this book, which presents solutions in R to walk you through the steps from data analytic techniques to communicating your results. The purpose of BI applications has changed over the last decade as big data challenges affect the business world in ways first experienced in the sciences decades ago.

You can find the term big data in many business settings. It appears in advertisements for boot camps, draws attendees to conferences, and perplexes business leaders. Arguably, the term is ill-defined. A 1998 presentation given by John Mashey, then the Chief Scientist of Silicon Graphics, is often cited as the document that introduced the term (Press, 2013). The impact of big data on business is undeniable, despite its elusive meaning. There is general agreement on the following three characteristics of big data, called the 3Vs:

  • Volume: The size of datasets has grown from megabytes to petabytes

  • Velocity: The speed of data arrival has changed to near real time

  • Variety: The sources of data have grown from structured databases to unstructured ones, such as social media, websites, audio, and video

Together, these three characteristics pose a growing challenge to the business community. Data is stored in facilities across a vast network of local servers or relational databases. Virtual software accesses it through cloud-based applications. BI applications have typically included static dashboards based on fixed measures using structured data. Big data changes the business by affording a competitive advantage to those who can extract value from large and rapidly changing sources of diverse data.

Today, people ask business analysts, "What is going to happen?" To answer this type of question, a business needs tools and processes to tap into the growing stream of data. Often this data will not fit into existing databases without transformation. The continual need to acquire data requires a structured ETL approach to wrangle the unstructured nature of modern data. As you read this chapter, think about how companies may benefit from using the techniques presented, even when their data is less complex than big data.


Use case: Bike Sharing, LLC

You will begin your exploration of BI and analytics through the lens of a fictional business called Bike Sharing, LLC. The company operates and maintains a fleet of publicly rentable bikes in the Washington, D.C. metropolitan area. Its customers are typically from the urban area, including people from business, government, and universities. Customers enjoy the convenience of finding bikes easily within a network of bike-sharing stations throughout the city. Renters may rent a bicycle at one location and leave it at another station.

Bike Sharing, LLC started operations in 2011 and has enjoyed continued growth. The company quickly established a BI group to keep track of the data collected about transactions, customers, and factors related to rentals, such as weather, holidays, and times of day. In 2014, the group began to understand how it might use open source datasets to guide decisions regarding sales, operations, and advertising. In 2015, it expanded its BI talent pool with business analysts experienced with R and statistical methods who could use Bike Sharing data in new ways.

You joined Bike Sharing just a few months ago. You have a basic understanding of R from the many courses and tutorials that you used to expand your skills. You are working with a good group that has a diverse skillset, including programming, databases, and business knowledge. The first data you have been given is bike rental data covering the two-year period from Jan 1, 2011 to Dec 31, 2012 (Kaggle, 2014). You can download this same Ch1_bike_sharing_data.csv file from the book's website at

Data sources often include a data dictionary to help new users understand the contents and coding of the data. Data Dictionary for Bike Sharing Data (Kaggle, 2014):

  • datetime: Hourly date + timestamp

  • season: 1 = spring, 2 = summer, 3 = fall, 4 = winter

  • holiday: Whether the day is considered a holiday

  • workingday: Whether the day is neither a weekend nor holiday

  • weather:

    • 1: Clear, Few clouds, Partly cloudy

    • 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist

    • 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds

    • 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

  • temp: Temperature in Celsius

  • atemp: "Feels like" temperature in Celsius

  • humidity: Relative humidity

  • windspeed: Wind speed

  • casual: Number of non-registered user rentals initiated

  • registered: Number of registered user rentals initiated

  • count: Number of total rentals
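
The coding described in the data dictionary can be applied in R once the data is loaded. The following is a minimal sketch rather than the book's own code. It builds a small data frame of made-up rows so that it runs without the CSV file, then relabels the season and weather codes with factor(). The short weather labels, the timestamp format, and the check that count equals casual plus registered are assumptions made for illustration.

```r
# Illustrative rows only: these values are invented for this sketch and
# are not taken from Ch1_bike_sharing_data.csv.
bike <- data.frame(
  datetime   = c("2011-01-01 00:00:00", "2011-04-15 08:00:00",
                 "2012-07-04 17:00:00", "2012-12-31 23:00:00"),
  season     = c(1, 2, 3, 4),
  weather    = c(1, 2, 1, 3),
  casual     = c(3, 10, 50, 2),
  registered = c(13, 90, 70, 8),
  count      = c(16, 100, 120, 10),
  stringsAsFactors = FALSE
)

# Map the integer season codes to the labels in the data dictionary.
bike$season <- factor(bike$season, levels = 1:4,
                      labels = c("spring", "summer", "fall", "winter"))

# Short labels for the four weather codes (names chosen here for brevity).
bike$weather <- factor(bike$weather, levels = 1:4,
                       labels = c("clear", "mist", "light_precip", "heavy_precip"))

# Parse the hourly timestamp; the format string is an assumption about the file.
bike$datetime <- as.POSIXct(bike$datetime,
                            format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Sanity check (assumption): total rentals equal casual plus registered.
totals_match <- all(bike$count == bike$casual + bike$registered)

# Inspect the resulting column types.
str(bike)
```

Storing the codes as factors keeps the original levels while making summaries and plots easier to read than bare integers.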

One of your goals is to strengthen your ETL skills. In this use case, you will learn common extraction, transformation, and loading skills to store a dataset in a file for analysis. Welcome to the Bike Sharing team.
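
As a rough preview of what extract, transform, and load can look like in base R, consider the sketch below. It is an illustration under stated assumptions, not the procedure the chapter develops. The stand-in file, its rows, the raw_path and clean_path names, and the choice of transformations (dropping rows with a missing temperature and adding an hour column) are all hypothetical.

```r
# Setup: write a tiny stand-in CSV so the sketch runs on its own.
# These rows are invented; point raw_path at the real file instead.
raw_path <- tempfile(fileext = ".csv")
writeLines(c(
  "datetime,season,temp,casual,registered,count",
  "2011-01-01 00:00:00,1,9.84,3,13,16",
  "2011-01-01 01:00:00,1,9.02,8,32,40",
  "2011-01-01 02:00:00,1,,5,27,32"
), raw_path)

# Extract: read the raw file into a data frame.
bike <- read.csv(raw_path, stringsAsFactors = FALSE)

# Transform: drop rows with a missing temperature, then derive the hour
# of day from the timestamp (the format string is an assumption).
bike_clean <- bike[!is.na(bike$temp), ]
parsed <- as.POSIXct(bike_clean$datetime,
                     format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
bike_clean$hour <- as.integer(format(parsed, "%H"))

# Load: store the cleaned data in a new file for later analysis.
clean_path <- tempfile(fileext = ".csv")
write.csv(bike_clean, clean_path, row.names = FALSE)

# Read the file back to confirm the round trip.
reloaded <- read.csv(clean_path, stringsAsFactors = FALSE)
nrow(reloaded)
```

Keeping each stage in its own step makes it easier to rerun only the part that changes when new data arrives.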