As stated earlier, the most important aspect of any data science project is the question at hand. Having a clear understanding of the problem we are trying to solve is critical to the success of the project. It also determines which data is relevant and which is not. For example, in the current case study, if we want to look at demographics, then movie name and person name are irrelevant. At times, there is no specific question at hand. What then? Even when there is no specific question, the business may still have some objective, or data scientists and domain experts can work together to identify the area of the business to work on. To understand the business, its functions, the problem statement, or the data, data scientists start with "questioning". This not only helps in defining the workflow, but also helps in sourcing the right data to work on.
As an example, if the business focus is on demographic information, a formal business problem statement can be defined...