Python Data Science Essentials

Overview of this book

The book starts by introducing you to setting up your essential data science toolbox. Then it guides you through all the data munging and preprocessing phases, explaining the core data science activities related to loading data, transforming and fixing it for analysis, and exploring and processing it. Finally, it completes the overview by presenting the main machine learning algorithms, the graph analysis techniques, and the visualization tools that make it easier to present your results. In this walkthrough, structured as a data science project, you will always be accompanied by clear code and simplified examples that help you understand the underlying mechanics and real-world datasets.
Table of Contents (13 chapters)

About the Reviewers

Robert Dempsey is an experienced leader and technology professional specializing in delivering solutions and products to solve tough business challenges. His experience in forming and leading agile teams, combined with more than 14 years of experience in the field of technology, enables him to solve complex problems while always keeping the bottom line in mind.

Robert has founded and built three start-ups in technology and marketing, developed and sold two online applications, consulted for Fortune 500 and Inc. 500 companies, and spoken nationally and internationally on software development and agile project management.

He is currently the head of data operations at ARPC, an econometrics firm based in Washington, DC. In addition, he's the founder of Data Wranglers DC, a group dedicated to improving the craft of data wrangling, as well as a board member of Data Community DC.

In addition to spending time with his growing family, Robert geeks out on Raspberry Pis and Arduinos and automates most of his life with the help of hardware and software.

Daniel Frimer has been an advocate for the Python language for 2 years now. With a degree in applied and computational math sciences from the University of Washington, he has spearheaded various automation projects in the Python language involving natural language processing, data munging, and web scraping. In his side projects, he has dived into a deep analysis of NFL and NBA player statistics for his fantasy sports teams.

Daniel recently started working in SaaS at Array Health, a private company for online health insurance shopping, where he supports day-to-day data analysis and helps refine the integration between consumers, employers, and insurers. He has also worked with data-centric teams at Amazon, Starbucks, and Atlas International.

Kevin Markham is a computer engineer, a data science instructor for General Assembly in Washington, DC, and the cofounder of Causetown, an online cause marketing platform for small businesses. He is passionate about teaching data science and machine learning and enjoys both Python and R. He founded Data School (http://dataschool.io) in order to provide in-depth educational resources that are accessible to data science novices. He has an active YouTube channel (http://youtube.com/dataschool) and can also be found on Twitter (@justmarkham).

Alberto Gonzalez Paje is an economist specializing in information management systems and data science. Educated in Spain and the Netherlands, he has developed an international career as a data analyst at companies such as Coca-Cola, Accenture, Bestiario, and CartoDB. He focuses on business strategy, planning, control, and data analysis. He loves architecture, cartography, the Mediterranean way of life, and sports.

Bastiaan Sjardin is a data scientist and entrepreneur with a background in artificial intelligence, mathematics, and machine learning. He has an MSc degree in cognitive science and mathematical statistics from the University of Leiden. In the past 5 years, he has worked on a wide range of data science projects. He is a frequent Community TA with Coursera for the "Social Network Analysis" course at the University of Michigan. His programming languages of choice are R and Python. Currently, he is the cofounder of Quandbee (www.quandbee.com), a company specializing in machine learning applications.

Michele Usuelli is a data scientist living in London, specializing in R and Hadoop. He has an MSc in mathematical engineering and statistics, and he has worked in fast-paced, growing environments, such as a big data start-up in Milan, the new pricing and analytics division of a big publishing company, and a leading R-based company. He is the author of R Machine Learning Essentials, Packt Publishing, a book that shows how to solve business challenges with data-driven solutions. He has also written articles on R-bloggers and is active on StackOverflow.

Zacharias Voulgaris, PhD, is a data scientist with machine learning expertise. His first degree was in production engineering and management, while his postgraduate studies focused on information systems (MSc) and machine learning (PhD). He has worked as a researcher at Georgia Tech and as a data scientist at Elavon Inc. He currently works for Microsoft as a program manager, and he is involved in a variety of big data projects in the field of web search. He has written several research papers and a number of web articles on data science-related topics and has authored his own book titled Data Scientist: The Definitive Guide to Becoming a Data Scientist.