Book Image

Hands-On Data Science with R

By : Vitor Bianchi Lanzetta, Doug Ortiz, Nataraj Dasgupta, Ricardo Anjoleto Farias
Book Image

Hands-On Data Science with R

By: Vitor Bianchi Lanzetta, Doug Ortiz, Nataraj Dasgupta, Ricardo Anjoleto Farias

Overview of this book

R is the most widely used programming language, and when used in association with data science, this powerful combination will solve the complexities involved with unstructured datasets in the real world. This book covers the entire data science ecosystem for aspiring data scientists, right from zero to a level where you are confident enough to get hands-on with real-world data science problems. The book starts with an introduction to data science and introduces readers to popular R libraries for executing data science routine tasks. This book covers all the important processes in data science such as data gathering, cleaning data, and then uncovering patterns from it. You will explore algorithms such as machine learning algorithms, predictive analytical models, and finally deep learning algorithms. You will learn to run the most powerful visualization packages available in R so as to ensure that you can easily derive insights from your data. Towards the end, you will also learn how to integrate R with Spark and Hadoop and perform large-scale data analytics without much complexity.
Table of Contents (16 chapters)

Solving problems with data science

Data science is being used today to solve problems ranging from poverty alleviation to scientific research. It has emerged as the leading discipline that aims to disrupt the industry's status quo and provide a new alternative to pressing business issues.

However, while the promise of data science and machine learning is immense, it is important to bear in mind that it takes time and effort to realize the benefits. The return-on-investment on a machine learning project typically takes a fairly long time. It is thus essential to not overestimate the value it can bring in the short run.

A typical data science project in a corporate setting would require the collaborative efforts of various groups, both on the technical and the business side. Generally, this means that the project should have a business sponsor and a technical or analytics lead in addition to the data science team or data scientist. It is important to set expectations at the onset—both in terms of the time it would take to complete the project and the outcome that may be uncertain until the task has completed. Unlike other projects that may have a definite goal, it is not possible to predetermine the outcome of machine learning projects.

Some common questions to ask include the following:

  • What business value does the data science project bring to the organization?
  • Does it have a critical base of users, that is, would multiple users benefit from the expected outcome of the project?
  • How long would it take to complete the project and are all the business stakeholders aware of the timeline?
  • Have the project stakeholders taken all variables that may affect the timeline into account? Projects can often get delayed due to dependencies on external vendors.
  • Have we considered all other potential business use cases and made an assessment of what approach would have an optimal chance of success?

A few salient points for successful data science projects are given as follows:

  • Find projects or use cases related to business operations that are:
    • Challenging
    • Not necessarily complex, that is, they can be simple tasks but which add business value
    • Intuitive, easily understood (you can explain it to friends and family)
    • Takes effort to accomplish today or requires a lot of manual effort
    • Used frequently by a range of users and the benefits of the outcome would have executive visibility
  • Identify low difficulty–high value (shorter) versus high difficulty–high value (longer)
  • Educate business sponsors, share ideas, show enthusiasm (it's like a long job interview)
  • Score early wins on low difficulty–high value; create minimum viable solutions, get management buy-in before enhancing them (takes time)
  • Early wins act as a catalyst to foster executive confidence; and also make it easier to justify budgets, making it easier to move on to high difficulty—high value tasks