A data scientist has many options in selecting and implementing a classification or clustering algorithm.
Firstly, a mathematical or statistical model has to be selected to extract knowledge from the raw input data or from the output of an upstream data transformation. The selection of the model is constrained by the following parameters:
Business requirements such as accuracy of results or computation time
Availability of training data, algorithms, and libraries
Access to a domain or subject matter expert, if needed
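The constraints above can be sketched as a filter over candidate models. This is a minimal illustration, assuming each candidate has already been evaluated for accuracy and training time on some validation set. The `Candidate` class, the `select_model` function, the model names and all numbers are hypothetical. The candidate list itself stands for the algorithms and libraries that are available, and the `needs_expert` flag models the dependency on a subject matter expert.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """A candidate model with hypothetical, pre-measured properties."""
    name: str
    accuracy: float       # fraction of correctly classified validation samples
    train_seconds: float  # wall-clock training time
    needs_expert: bool    # True if the model depends on a domain expert


def select_model(candidates: List[Candidate],
                 min_accuracy: float,
                 max_seconds: float,
                 expert_available: bool) -> Optional[Candidate]:
    """Return the most accurate candidate that meets every constraint, or None."""
    feasible = [
        c for c in candidates
        if c.accuracy >= min_accuracy                  # business requirement: accuracy
        and c.train_seconds <= max_seconds             # business requirement: computation time
        and (expert_available or not c.needs_expert)   # access to a subject matter expert
    ]
    if not feasible:
        return None
    # Highest accuracy wins; ties go to the faster model.
    return max(feasible, key=lambda c: (c.accuracy, -c.train_seconds))


# Hypothetical candidates; the numbers are illustrative, not measured.
candidates = [
    Candidate("naive_bayes", 0.81, 2.0, False),
    Candidate("logistic_regression", 0.86, 15.0, False),
    Candidate("svm_rbf", 0.91, 600.0, False),
    Candidate("expert_rules", 0.93, 1.0, True),
]

best = select_model(candidates, min_accuracy=0.80, max_seconds=60.0,
                    expert_available=False)
print(best.name if best else "no feasible model")  # logistic_regression
```

With a 60-second budget and no expert, the sketch picks `logistic_regression`: `svm_rbf` is more accurate but too slow, and `expert_rules` is excluded. Returning `None` instead of silently relaxing a constraint is a deliberate choice; it forces the requirements to be revisited explicitly.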
Secondly, the data scientist has to select a computational and deployment framework suitable for the volume of data to be processed. The computational context is defined by the following parameters:
Available resources such as machines, CPU, memory, or I/O bandwidth
An implementation strategy such as iterative versus recursive computation or caching
Requirements for the responsiveness of the overall process, such as the duration of the computation or the display of intermediate results...
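The implementation strategies listed above can be compared on a small computation. The sketch below uses an exponential moving average (EMA), a hypothetical example chosen only because it has a natural recursive definition; all function names are illustrative. It computes the same value with a loop, with direct recursion and with a cached (memoized) recursion through the standard `functools.lru_cache`, and adds a generator variant that yields intermediate results, which addresses the responsiveness requirement.

```python
from functools import lru_cache
from typing import Callable, Iterator, Optional, Sequence, Tuple


def ema_iterative(xs: Sequence[float], alpha: float) -> float:
    """Loop-based EMA: O(n) time and constant extra memory."""
    acc = xs[0]
    for x in xs[1:]:
        acc = alpha * x + (1.0 - alpha) * acc
    return acc


def ema_recursive(xs: Sequence[float], alpha: float,
                  t: Optional[int] = None) -> float:
    """Recursive EMA mirroring ema(t) = alpha*x[t] + (1-alpha)*ema(t-1).
    Uses one stack frame per element, so long inputs hit the recursion limit."""
    if t is None:
        t = len(xs) - 1
    if t == 0:
        return xs[0]
    return alpha * xs[t] + (1.0 - alpha) * ema_recursive(xs, alpha, t - 1)


def make_cached_ema(xs: Tuple[float, ...], alpha: float) -> Callable[[int], float]:
    """Memoized recursion: each ema(t) is computed once, then served from the cache."""
    @lru_cache(maxsize=None)
    def ema_at(t: int) -> float:
        if t == 0:
            return xs[0]
        return alpha * xs[t] + (1.0 - alpha) * ema_at(t - 1)
    return ema_at


def ema_stream(xs: Sequence[float], alpha: float) -> Iterator[float]:
    """Iterative EMA that yields every intermediate value for progressive display."""
    acc = xs[0]
    yield acc
    for x in xs[1:]:
        acc = alpha * x + (1.0 - alpha) * acc
        yield acc


data = (1.0, 2.0, 3.0, 4.0)
print(ema_iterative(data, 0.5))               # 3.125
print(ema_recursive(data, 0.5))               # 3.125
ema_at = make_cached_ema(data, 0.5)
print([ema_at(t) for t in range(len(data))])  # [1.0, 1.5, 2.25, 3.125]
print(list(ema_stream(data, 0.5)))            # [1.0, 1.5, 2.25, 3.125]
```

In Python the iterative form is usually the safe default, because the interpreter does not turn recursion into loops and caps the stack depth (about 1,000 frames by default). The recursive form mirrors the mathematical definition but inherits that cap. The cached form pays off when the same prefix values are queried repeatedly; calling it in increasing order of `t` keeps the recursion shallow, since each call finds its predecessor already in the cache.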