DTs are commonly considered a supervised learning technique used for solving classification and regression tasks.
More technically, each branch in a DT represents a possible decision, occurrence, or reaction, in terms of statistical probability. Compared to naive Bayes, DTs are a far more robust classification technique. The reason is that at first, the DT splits the features into training and test sets. Then, it produces a good generalization to infer the predicted labels or classes. Most interestingly, a DT algorithm can handle both binary and multiclass classification problems.
For instance, in the following example figure, DTs learn from the admission data to approximate a sine curve with a set of if...else decision rules. The dataset contains the record of each student who applied for admission, say, to an American university. Each record contains the graduate record exam score, CGPA score, and the rank of the column. Now we will have to predict who is competent...