In the modeling stage, you will pick an appropriate predictive modeling technique that fits your problem and apply it to your data. There are several factors which influence the selection of a model:
- Who will use the model?
- How will the model be used?
- What are the assumptions of the model?
- How much data do I have?
- How many variables do I need to use?
- What is the accuracy level needed by the model?
- Am I willing to trade some accuracy for interpretability?
Closely related to the last point is the trade-off between bias and variance.
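As a rough illustration, the sketch below compares a high-bias model (a straight line fitted by least squares) with a low-bias one (1-nearest-neighbor) on synthetic data. Everything in it is an assumption made for the example: the quadratic ground truth `true_fn`, the noise level, the sample size, and all helper names. It refits each model on many fresh samples and reports the training error, the average error of the prediction at one point (bias), and how much that prediction moves between samples (variance).

```python
import random
import statistics


def true_fn(x):
    # Hypothetical ground truth, chosen only for illustration.
    return x * x


def make_sample(rng, n=30, noise=0.3):
    # Draw n noisy observations of true_fn on [-1, 1].
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_fn(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def fit_linear(xs, ys):
    # High-bias model: ordinary least squares line.
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_1nn(xs, ys):
    # Low-bias model: predict the target of the closest training point.
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def train_mse(fit, xs, ys):
    # Mean squared error on the same data the model was fitted to.
    model = fit(xs, ys)
    return statistics.fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


def bias_and_variance(fit, x0=0.0, trials=200, seed=0):
    # Refit on many fresh samples and inspect the predictions at x0.
    rng = random.Random(seed)
    preds = [fit(*make_sample(rng))(x0) for _ in range(trials)]
    bias = statistics.fmean(preds) - true_fn(x0)
    return bias, statistics.pvariance(preds)


xs, ys = make_sample(random.Random(42))
for name, fit in [("linear", fit_linear), ("1-NN", fit_1nn)]:
    bias, var = bias_and_variance(fit)
    print(f"{name}: train MSE={train_mse(fit, xs, ys):.3f}, "
          f"bias at 0={bias:+.3f}, variance at 0={var:.3f}")
```

In this setup the nearest-neighbor model reproduces its training data exactly, but its prediction at a given point changes a lot from one sample to the next. The line cannot follow the curve, so it is consistently off at that point, yet its prediction stays nearly the same across samples.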
Bias is related to the ability of a model to approximate the data. Low-bias algorithms are able to fit the data with little error. While this may seem to be an advantage all of the time, it can result in a complex model that is unstable and difficult to explain. On the other hand, a high-bias model (like linear regression) is relatively simple to explain, but may sacrifice some accuracy for explainability and stability. You will usually start by looking at...