The first thing you want to do, when thinking about reducing the number of dimensions or looking for latent variables in the dataset with multivariate statistical analysis, is to check whether the variables are correlated and the data is normally distributed.
The latter is often not a strict requirement. For example, the results of a PCA can still be valid and interpretable without multivariate normality; maximum likelihood factor analysis, on the other hand, does rely on this strong assumption.
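The two checks described above can be sketched on a small simulated data frame. This is only an illustration under stated assumptions: the data frame `df` and its columns `a`, `b`, and `c` are hypothetical and are not part of the hflights dataset; only base R functions (`cor`, `qqnorm`, `qqline`) are used.

```r
# Hypothetical example data (not from hflights):
# 'a' and 'b' are correlated and roughly normal, 'c' is skewed.
set.seed(42)
x <- rnorm(200)
df <- data.frame(
  a = x,
  b = 0.8 * x + rnorm(200, sd = 0.5),
  c = rexp(200)
)

# Check 1: are the variables correlated?
cor_matrix <- cor(df)
print(round(cor_matrix, 2))

# Check 2: does each variable look normally distributed?
# Points that stay close to the reference line suggest normality.
op <- par(mfrow = c(1, 3))
for (v in names(df)) {
  qqnorm(df[[v]], main = v)
  qqline(df[[v]])
}
par(op)
```

Strong off-diagonal values in the correlation matrix suggest that dimension reduction may be worthwhile, while a Q-Q plot that bends away from the reference line (as one would expect for the skewed column `c`) hints at a departure from normality.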
Tip
You should always use the appropriate methods to achieve your data analysis goals, based on the characteristics of your data.
Anyway, you can use (for example) qqplot to do a pair-wise comparison of the distributions of variables, and qqnorm to visually check the univariate normality of your variables. First, let's demonstrate this with a subset of hflights:
> library(hflights)
> JFK <- hflights[which(hflights$Dest == 'JFK'),
+                 c('TaxiIn', 'TaxiOut')]
So we filter our dataset...