Principles of Strategic Data Science

By: Peter Prevos

Overview of this book

Mathematics and computer science form an integral part of data science, and understanding them is crucial for efficiently managing data. This book is designed to take you through the entire data science pipeline and help you join the dots between mathematics, programming, and business analysis. You’ll start by learning what data science is and how organizations can use it to revolutionize the way they use their data. The book then covers the criteria for the soundness of data products and demonstrates how to effectively visualize information. As you progress, you’ll discover the strategic aspects of data science by exploring the five-phase framework that enables you to enhance the value you extract from data. Toward the concluding chapters, you’ll understand the role of a data science manager in helping an organization take the data-driven approach. By the end of this book, you’ll have a good understanding of data science and how it can enable you to extract value from your data.
Table of Contents (6 chapters)

The Data Revolution

Since Frederick Taylor's first writings on scientific management, businesses and non-profit organizations have sought to become evidence-driven to reduce unconscious bias in their decisions. Although data science is merely a new term for something that has existed for decades, some recent developments have created a watershed between the old and new ways of doing business. The difference between traditional business analysis and the new world of data science is threefold.

Firstly, businesses have much more data available than ever before. The move to electronic transactions means that almost every process leaves a digital footprint. Collecting and storing this data has become far cheaper than in the days of pencil and paper. Many organizations collect this data without maximizing the value they extract from it. After the data is used for its intended purpose, it becomes 'dark data', stored on servers but languishing in obscurity. Recycling and analyzing this data lets an organization learn from the past and optimize how it operates in the future.

Secondly, the computing power that is now available in a tablet was not long ago the domain of supercomputers. Piotr Luszczek showed that an iPad 2 matches the performance of the world's fastest computer of 1985, the Cray-2. (Larabel, M. (2012). Apple iPad 2 As Fast As The Cray-2 Supercomputer. Phoronix. Retrieved 4 February 2019 from https://www.phoronix.com/scan.php?page=news_item&px=MTE4NjU) The affordability of vast computing power enables even small organizations to reap the benefits of advanced analytics.

Lastly, complex machine learning algorithms are freely available as open source software, and a laptop is all that is needed to implement sophisticated mathematical analyses. The R language for statistical computing and Python are both potent tools that can undertake a vast array of data science tasks, such as complex visualizations and machine learning. These languages are 'Swiss army chainsaws' that can tackle almost any business analysis problem. Part of their power lies in their healthy communities, whose members support each other on the journey to mastering these languages.

These three changes have caused a revolution in how we create value from data. Even small organizations now face very low barriers to leveraging information technology. The only hurdle is making sense of the fast-moving developments and following a strategic approach instead of chasing the hype.

This revolution is not only about powerful machine learning algorithms, but also about a more scientific way of solving business problems. The definition of data science in this book is not restricted to machine learning, big data, and artificial intelligence. These developments are essential aspects of data science, but they do not define the field.