In the previous chapter, you learned about descriptive statistics methods for getting information about the distribution of discrete and continuous variables. In a data science project, the next typical step is to check for associations between pairs of variables.
When you check a pair of variables for association, there are three possible cases:
Both variables are discrete
Both variables are continuous
One variable is discrete and the other is continuous
Besides analyzing pairs of variables, this section also introduces linear regression, one of the most important statistical methods, in which you model a single response (or dependent) variable with a regression formula that includes one or more predictor (or independent) variables.
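To make the idea of a regression formula concrete, here is a minimal sketch of a simple linear regression with a single predictor, fitted by ordinary least squares in plain Python. The data values and the function name fit_simple_linear_regression are made up for illustration only; they are assumptions, not taken from this book's examples.

```python
# Hypothetical data: x is the predictor (independent) variable,
# y is the response (dependent) variable. Here y is exactly 1 + 2 * x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_simple_linear_regression(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} * x")  # prints: y = 1.00 + 2.00 * x
```

With more than one predictor, the same least-squares principle applies, but the coefficients are usually estimated with a statistics library rather than by hand.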
Altogether, you will learn about the following in this section:
Chi-squared test of independence of two discrete variables
Phi coefficient, contingency coefficient, and Cramer's V coefficient, which measure the association...