We live in an information age. Large, accessible datasets are widely used in business intelligence and decision making. When you begin the data cleaning process, you need to understand the content and structure of your data. Large datasets are impractical to inspect row by row, so you need ways of summarizing them. Fortunately, the R language provides these for you! You will learn data cleaning through a use case called the Bike Sharing Analysis Project.
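As a first taste of the summarizing tools base R offers, here is a minimal sketch. The `rides` data frame and its column names are made up for illustration and are not taken from the bike sharing data. The functions themselves (`dim()`, `str()`, `head()`, `summary()`, and `is.na()`) are part of base R.

```r
# Hypothetical toy data: the column names are illustrative assumptions,
# not the columns of the book's bike sharing dataset.
rides <- data.frame(
  season = c("spring", "summer", "summer", "winter"),
  temp   = c(12.5, 27.1, NA, 3.4),
  count  = c(40L, 112L, 98L, 15L),
  stringsAsFactors = FALSE
)

dim(rides)               # number of rows and columns
str(rides)               # type of each column and a preview of its values
head(rides, 3)           # first three rows
summary(rides)           # per-column summaries; numeric columns report NA counts
colSums(is.na(rides))    # count of missing values in each column
```

Each call answers a different question: `dim()` and `str()` describe the structure, `head()` shows sample content, and `summary()` together with `is.na()` points to values that may need cleaning.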
Use case: Bike Sharing Analysis Project
Imagine you are a business analyst on the Bike Sharing Analysis Project. New data has just arrived for analysis. Unlike the dataset you saw in Chapter 1, Extract, Transform, and Load, which was pre-processed for a Kaggle competition, this data has arrived in raw form. The book's website at
Ch2_raw_bikeshare_data.csv data. Before...