Principles of Strategic Data Science

By: Peter Prevos

Overview of this book

Mathematics and computer science form an integral part of data science, and understanding them is crucial for efficiently managing data. This book is designed to take you through the entire data science pipeline and help you join the dots between mathematics, programming, and business analysis. You’ll start by learning what data science is and how organizations can use it to revolutionize the way they use their data. The book then covers the criteria for the soundness of data products and demonstrates how to effectively visualize information. As you progress, you’ll discover the strategic aspects of data science by exploring the five-phase framework that enables you to enhance the value you extract from data. Toward the concluding chapters, you’ll understand the role of a data science manager in helping an organization take the data-driven approach. By the end of this book, you’ll have a good understanding of data science and how it can enable you to extract value from your data.
Table of Contents (6 chapters)

About the Book

Principles of Strategic Data Science is designed to help you join the dots between mathematics, programming, and business analysis.

I am not a data scientist. These might be strange words to read in a book about data science, so please allow me to explain. This book is the result of a 25-year career in civil engineering, building and managing structures in Europe, Africa, Asia, and Australia. Most of my tasks involve managing and analyzing large amounts of data. Cost estimates, volume calculations, modelling river flows, structural calculations, Monte Carlo simulations, and many other types of number crunching are integral to my work as an engineer.

My journey toward what we now call data science started at university. When studying engineering in the Netherlands, I wrote computer code in the Pascal and BASIC languages. I loved spending time in the computer lab and writing software to solve technical problems. The dean advised me to switch from civil engineering to computer science, but I enjoyed writing software as a way to solve engineering problems, not for its own sake, so I did not heed the advice.

In my first job as a civil engineer, my company introduced the now defunct Lotus 1-2-3 spreadsheet. When I first used that package, I thought it was the best thing since sliced bread. Graphical output and managing data were complex tasks when writing code in those days. The allure of the spreadsheet was the ability to combine input with computer code and show the results in text or graphs, all in one convenient file.

Over the following two decades, I wrote hundreds of spreadsheets to solve a myriad of engineering problems. I even developed a 'jungle' of interconnected spreadsheets to manage the logistics of a large river engineering project in Bangladesh. The complexity of this task took me beyond the limits of what spreadsheets can achieve. Throughout my career, I have had many nightmarish experiences, trying to reverse-engineer spreadsheets to figure out how they work – even ones I wrote myself. The combination of data, code, and output that I loved at the start of my career became a source of frustration.

My love affair with the venerable spreadsheet ended while I was writing my doctoral dissertation. Excel was incapable of helping me with complex statistics such as structural equation modelling or network analysis. A colleague suggested looking into this new thing called 'data science.' I decided to learn how to write code in R, a specialized computer language for statistical analysis. The R language is like a Swiss Army chainsaw for engineers, with capabilities that far exceed anything a spreadsheet can do.

I now manage the data science function for a water utility in regional Australia. Through my experience with practical data analysis and expertise in management, I have developed a strategic approach to data science. I have published and presented my views on strategic data science at conferences in Australia and New Zealand. This book is an expanded version of 'Lifting the Big Data Veil', an article I wrote for the journal of the Australian Water Association, which became one of the most downloaded papers. The article describes a back-to-basics approach to maximizing the value we can extract from data assets. This book dives deeper into the principles of data science first presented in that paper.

I have written this book from the perspective of an engineer and a social scientist, but the same principles are valid for any field of human endeavor. All professionals and scientists rely on data to make decisions. Data science provides a systematic approach to making better business decisions and discovering new patterns in society or nature.

My approach in this book goes back to the basics of what it means to create value from data. This book is not a treatise on machine learning, mathematics, or developing software, but a practical guide to strategically and systematically using data to create a better world. This book is pragmatic because it doesn't dwell on the future promises of machine learning, artificial intelligence, or quantum computing. The framework in this book is inspired by my current and desired practice as an engineer and social scientist with data science responsibilities, and by best practice in management.

About the Author

Dr Peter Prevos is a civil engineer and social scientist who also dabbles in theatrical magic. Peter has almost three decades of experience as a water engineer and manager, working in Europe, Africa, Asia, and Australia. He has worked on marine engineering, drinking water, and sewage treatment projects. Throughout his career, analysing data has been a central theme.

He also has a PhD in marketing and is the author of Customer Experience Management for Water Utilities. In his work, he aims to combine the social sciences with engineering to create value for customers. Peter occasionally lectures on marketing to MBA students.

He is currently responsible for developing and implementing the data science strategy for a water utility in regional Australia. The objective of this strategy is to create value from data through useful, sound, and aesthetic data science. His mission is to breed unicorn data scientists by motivating other water professionals to ditch their spreadsheets and learn how to write code.

Learning Objectives

  • Get familiar with the five most important steps of data science
  • Use the Conway diagram to visualize the technical skills of the data science team
  • Understand the limitations of data science from a mathematical and ethical perspective
  • Get a quick overview of machine learning
  • Gain insight into the purpose of using data science in your work
  • Understand the role of data science managers and what is expected of them

Approach

This book covers the basic approach to creating value from data. It is developed as a practical guide to using data strategically and systematically to create a better world. It doesn't dwell on the promises of machine learning, artificial intelligence, or quantum computing. The framework in this book is inspired by the author's current and desired practice as an engineer and social scientist with data science responsibilities, and by best practice in management.

Audience

This book is ideal for data scientists and data analysts who are looking for a practical guide to using data strategically and systematically. It is also useful for those who want to understand in detail what data science is and how an organization can take a data-driven approach. Prior programming knowledge of Python and R is assumed.

Preface to Second Edition

The second edition of this book contains many grammar fixes. Thanks to David Smith and Catherine Cousins for spotting these mistakes.

Interestingly, the machine learning program I used for checking the text did not identify many of these mistakes in the first version. This experience strengthens one of the points in this book, which is that artificial intelligence is not a replacement for natural intelligence.