Mastering Data Analysis with R

By: Gergely Daróczi

Latent class models

Latent Class Analysis (LCA) is a method for identifying latent variables among polytomous outcome variables. It is similar to factor analysis, but it can be used with discrete (categorical) data. For this reason, LCA is mostly used for analyzing survey data.

In this section, we are going to use the poLCA function from the poLCA package. It uses the expectation-maximization and Newton-Raphson algorithms to find the maximum likelihood estimates of the model parameters.
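For reference, the quantity being maximized in a basic latent class model can be sketched as the following log-likelihood. The notation is an assumption borrowed from the usual presentation of the model rather than from the text above: N respondents, J categorical variables with K_j possible outcomes each, R latent classes with mixing proportions p_r, and class-conditional outcome probabilities \pi_{jrk}. The indicator Y_{ijk} equals 1 if respondent i gives outcome k on variable j, and 0 otherwise.

```latex
\ln L = \sum_{i=1}^{N} \ln \sum_{r=1}^{R} p_r
        \prod_{j=1}^{J} \prod_{k=1}^{K_j} \left( \pi_{jrk} \right)^{Y_{ijk}}
```

The parameters to be estimated are the p_r and \pi_{jrk} values.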

The poLCA function requires the data to be coded as integers starting from one or as factors; otherwise, it produces an error message. So let's transform some of the variables in the mtcars dataset to factors:

> factors <- c('cyl', 'vs', 'am', 'carb', 'gear')
> mtcars[, factors] <- lapply(mtcars[, factors], factor)
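As a minimal sketch of how the conversion could be checked, the following script repeats the transformation on a copy of the data and confirms that the integer codes behind each factor start at one. The copy named cars, the choice of two classes, and the guarded model fit at the end are assumptions made for illustration only; the fit runs only if the poLCA package happens to be installed.

```r
# Work on a copy so that the built-in mtcars stays untouched
# (an assumption of this sketch; the commands above modify mtcars in place)
cars <- mtcars
factors <- c('cyl', 'vs', 'am', 'carb', 'gear')
cars[, factors] <- lapply(cars[, factors], factor)

# Every selected column should now be a factor
all_factors <- all(sapply(cars[, factors], is.factor))
print(all_factors)

# The integer codes behind a factor start at 1, whatever the
# original values were (for example, vs and am were coded 0/1)
min_codes <- sapply(cars[, factors], function(x) min(as.integer(x)))
print(min_codes)

# Optional illustration: fit a latent class model if poLCA is available.
# nclass = 2 is an arbitrary choice, not a recommendation.
if (requireNamespace("poLCA", quietly = TRUE)) {
  fit <- poLCA::poLCA(cbind(cyl, vs, am, carb, gear) ~ 1,
                      data = cars, nclass = 2, verbose = FALSE)
  print(fit$bic)
}
```

Working on a copy means there is nothing to clean up afterwards, at the cost of a variable name that differs from the one used in the rest of this section.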

Tip

The preceding command will overwrite the mtcars dataset in your current R session. To revert to the original dataset for other examples, please delete this updated dataset from the session by...