Preparing a dataset for statistical analysis is one of the most important steps in any data analysis project. Data pre-processing is often said to account for as much as 80 percent of the total time spent on a data analysis task. Many libraries have been developed over time for data pre-processing, and dplyr is one of the most popular and memory-efficient of them. In this chapter, you will use the functions provided by the dplyr package to do some pre-processing. The data on US domestic airlines used throughout the chapter was downloaded from the website of the Bureau of Transportation Statistics (https://www.transtats.bts.gov).
The dataset contains 61 variables relating to time period, airline, origin, destination, departure performance, arrival performance, cancellations and diversions, flight summaries, and causes of delay. Due to the huge size of the data...