In this section, we will work through a detailed example on a specific dataset to understand how a random forest works. We will use the same dataset for the iOS Core ML example.
We will use the Breast Cancer Wisconsin (Diagnostic) dataset for the random forest problem. Its features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass and describe the characteristics of the cell nuclei present in the image. The dataset is available at https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic).
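Before training anything, the raw records have to be turned into feature vectors and labels. The following is a minimal, standard-library-only sketch of that step. It assumes the comma-separated layout of the UCI data file (commonly distributed as `wdbc.data`): no header, an ID, the diagnosis letter, then 30 real-valued columns (the mean, standard error, and "worst" value of each of the 10 nucleus features). The function name `parse_wdbc` and the 1/0 label encoding are hypothetical choices for illustration, and the two demo rows are made-up values, not real patient data.

```python
import csv
import io
from typing import List, Tuple

# Assumed row layout: ID, diagnosis (M/B), then 30 real-valued feature columns.
N_FEATURES = 30


def parse_wdbc(text: str) -> Tuple[List[str], List[List[float]], List[int]]:
    """Hypothetical helper: parse WDBC-style rows into IDs, feature vectors and labels.

    Labels are encoded as 1 for malignant (M) and 0 for benign (B).
    """
    ids, features, labels = [], [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        sample_id, diagnosis, *values = row
        if diagnosis not in ("M", "B"):
            raise ValueError(f"unexpected diagnosis {diagnosis!r} for ID {sample_id}")
        if len(values) != N_FEATURES:
            raise ValueError(f"expected {N_FEATURES} features, got {len(values)}")
        ids.append(sample_id)
        features.append([float(v) for v in values])
        labels.append(1 if diagnosis == "M" else 0)
    return ids, features, labels


# Two synthetic rows (made-up values, NOT real measurements) to show the shapes.
demo = "\n".join([
    "1001,M," + ",".join(["1.0"] * N_FEATURES),
    "1002,B," + ",".join(["0.5"] * N_FEATURES),
])
ids, X, y = parse_wdbc(demo)
print(ids, y, len(X[0]))  # ['1001', '1002'] [1, 0] 30
```

To use it on the real data, read the downloaded file's contents into a string and pass it to `parse_wdbc`; the resulting `X` and `y` lists are the usual inputs to a random forest classifier. The ID column is kept separate because it identifies a sample and carries no diagnostic information, so it should not be used as a feature.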
Each record in the dataset contains the following attributes:
- ID number
- Diagnosis (M = malignant, and B = benign)
- 10 real-valued features are computed for each cell nucleus:
- Radius (mean of the distances from the center to points on the perimeter)
- Texture (standard deviation of gray-scale values)
- Perimeter
- Area
- Smoothness (local variation in...