Now that we have our data ready, we will focus on performing the analyses in R.
We will first predict the income of the participants using C4.5.
We will start by examining the unpruned tree. This is configured using the Weka_control(U = TRUE) argument of the J48() function in RWeka (J48 is Weka's implementation of C4.5), which uses the formula notation we have seen previously. The dot (.) after the tilde indicates that all attributes except the class attribute are to be used as predictors. We use the control argument to tell R that we want an unpruned tree (we will discuss pruning later):
C45tree = J48(income ~ ., data = AdultTrain, control = Weka_control(U = TRUE))
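The effect of the dot in the formula can be checked without fitting any model. The following is a minimal sketch in base R; the data frame toy and its values are made up for illustration and only stand in for AdultTrain, which has many more attributes.

```r
# Hypothetical toy data standing in for AdultTrain (values are invented).
toy <- data.frame(
  age    = c(25, 47, 38, 52),
  hours  = c(40, 50, 35, 60),
  sex    = factor(c("Male", "Female", "Female", "Male")),
  income = factor(c("small", "large", "small", "large"))
)

# terms() expands the dot against the columns of `data`.
expanded <- terms(income ~ ., data = toy)

# Predictors implied by the dot: every column except the response.
predictors <- attr(expanded, "term.labels")
print(predictors)

# The response (the class attribute) is the variable left of the tilde.
response <- all.vars(expanded)[attr(expanded, "response")]
print(response)
```

Here the dot should expand to age, hours, and sex, with income kept aside as the class attribute. The same expansion applies when the formula is passed to a modeling function together with a data argument.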
You can examine the tree by typing:
C45tree
We will not display it here because it is very large (the tree has 5,715 nodes, of which 4,683 are leaves), but we can examine how well the tree classified the cases:
summary(C45tree)
We can see that even though about 89 percent of cases are...