We will use the example from the previous chapter about swim preference. We have the same data table:
| Swimming suit | Water temperature | Swim preference |
|---------------|-------------------|-----------------|
| None          | Cold              | No              |
| None          | Warm              | No              |
| Small         | Cold              | No              |
| Small         | Warm              | No              |
| Good          | Cold              | No              |
| Good          | Warm              | Yes             |
We would like to construct a random forest from this data and use it to classify a new data item, (Good, Cold, ?).
Analysis:
We are given M=2 variables (swimming suit and water temperature) according to which a data item can be classified; the third column, swim preference, is the class we want to predict. In a random forest algorithm, we usually do not consider all the variables when forming the branches at each node of a tree. Instead, at every node we consider only m randomly chosen variables out of M, where m is less than or equal to M (a common choice is m close to the square root of M). The greater m is, the stronger the classifier in each constructed tree becomes. However, a greater m also makes the trees more similar to one another (more correlated), which weakens the benefit of combining many trees. But, because we use multiple...
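The idea above can be sketched in plain Python: each tree is grown on a bootstrap sample of the table, only m of the M variables are considered at each node, and the forest classifies (Good, Cold, ?) by majority vote. This is an illustrative sketch, not the book's implementation; the helper names (`build_tree`, `build_forest`, `split_impurity`) and the use of Gini impurity as the split criterion are assumptions for the example.

```python
import random
from collections import Counter

# Training rows from the table above:
# (swimming suit, water temperature) -> swim preference
DATA = [
    (("None", "Cold"), "No"),
    (("None", "Warm"), "No"),
    (("Small", "Cold"), "No"),
    (("Small", "Warm"), "No"),
    (("Good", "Cold"), "No"),
    (("Good", "Warm"), "Yes"),
]
M = 2  # number of predictive variables

def majority(rows):
    return Counter(y for _, y in rows).most_common(1)[0][0]

def gini(rows):
    n = len(rows)
    return 1.0 - sum((c / n) ** 2
                     for c in Counter(y for _, y in rows).values())

def split_impurity(rows, f):
    # Weighted Gini impurity of the partition induced by feature f.
    total = len(rows)
    return sum(len(s) / total * gini(s)
               for v in {x[f] for x, _ in rows}
               for s in [[(x, y) for x, y in rows if x[f] == v]])

def build_tree(rows, m):
    if len({y for _, y in rows}) == 1:   # pure node: make a leaf
        return rows[0][1]
    # The random forest twist: consider only m of the M variables here.
    candidates = random.sample(range(M), m)
    best = min(candidates, key=lambda f: split_impurity(rows, f))
    values = {x[best] for x, _ in rows}
    if len(values) == 1:                 # splitting would not separate anything
        return majority(rows)
    node = {"feature": best, "default": majority(rows), "children": {}}
    for v in values:
        subset = [(x, y) for x, y in rows if x[best] == v]
        node["children"][v] = build_tree(subset, m)
    return node

def classify(tree, x):
    while isinstance(tree, dict):
        child = tree["children"].get(x[tree["feature"]])
        if child is None:                # unseen value: fall back to majority
            return tree["default"]
        tree = child
    return tree

def build_forest(rows, n_trees, m):
    forest = []
    for _ in range(n_trees):
        # Bootstrap: sample the rows with replacement for each tree.
        bootstrap = [random.choice(rows) for _ in rows]
        forest.append(build_tree(bootstrap, m))
    return forest

random.seed(0)                           # make the random choices reproducible
forest = build_forest(DATA, n_trees=4, m=1)
votes = Counter(classify(tree, ("Good", "Cold")) for tree in forest)
print(votes.most_common(1)[0][0])        # the forest's majority vote
```

With m=1, each node commits to one randomly chosen variable, so individual trees are weak but diverse; raising m toward M makes every tree closer to the single decision tree of the previous chapter.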