In this chapter, we will use the same public dataset, extracted from http://data.worldbank.org/, that we used in Chapter 4, Segmentation Using Clustering. For the classification problem, we convert the life_expectancy column into a binary (binomial) variable: it is set to 1 if the life expectancy is more than 70 and to 0 otherwise. The dataset has been renamed to worlddata_ForClassification. We will treat the life_expectancy_morethan_70 column as the target to be predicted and build a logistic regression model:
# Data for Classification Problem
worlddatac <- read.csv("data/worlddata_ForClassification.csv")
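The conversion of life_expectancy into the 0/1 target described above is already done in the CSV file, but it may help to see how such a column could be derived. The following is only a minimal sketch: the data frame worlddata_toy and its values are made up for illustration and are not part of the World Bank extract.

```r
# Toy data frame standing in for the real dataset (hypothetical values).
worlddata_toy <- data.frame(
  country = c("A", "B", "C", "D"),
  life_expectancy = c(82.1, 70.0, 64.3, NA)
)

# 1 when life expectancy is strictly greater than 70, otherwise 0.
# A missing life expectancy stays NA, so the row can be dropped later.
worlddata_toy$life_expectancy_morethan_70 <-
  ifelse(worlddata_toy$life_expectancy > 70, 1, 0)

print(worlddata_toy)
```

Note that a value of exactly 70 maps to 0 here, because the rule is "more than 70".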
After reading the data, we will remove the rows that contain NA values, as we did in the previous chapter. We will also remove the country column: its value is unique for every row, so it acts as an identifier and will not help improve the accuracy of the model. After formatting the dataset...