
Python for Data Science For Dummies - Second Edition

By: John Paul Mueller, Luca Massaron

Overview of this book

Python is a general-purpose programming language, created in the late 1980s and named after Monty Python, that thousands of people use for tasks ranging from testing microchips at Intel to powering Instagram to building video games with the PyGame library. The book begins by discussing how Python can make data science easy. You’ll learn to work with the Anaconda tool suite, which simplifies coding in Python, and to write code using Google Colab. As you progress, you’ll discover how to perform interesting calculations and data manipulations using Python libraries such as pandas and NumPy, and how to create data visualizations with Matplotlib. Among the more advanced topics, you’ll learn to wrangle data using techniques such as hierarchical clustering. Finally, you’ll learn how to work with decision trees and use machine learning to make predictions. By the end of the book, you’ll have the skills and knowledge needed to write Python code and extract information from data.

Chapter 13

Exploring Data Analysis

IN THIS CHAPTER

- Understanding the Exploratory Data Analysis (EDA) philosophy
- Describing numeric and categorical distributions
- Estimating correlation and association
- Testing mean differences in groups
- Visualizing distributions, relationships, and groups

Data science relies on complex algorithms for building predictions and spotting important signals in data, and each algorithm has its own strengths and weaknesses. In short, you select a range of algorithms, run them on the data, optimize their parameters as much as you can, and finally decide which one will best help you build your data product or gain insight into your problem.
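The select, run, optimize, and decide loop described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not code from the book: the toy data, the two candidate algorithms (a mean baseline and k-nearest-neighbors regression), the helper names `mean_model`, `knn_model`, and `mse`, and the small grid of `k` values are all hypothetical choices made only for the example.

```python
# Minimal sketch of the "select, run, tune, decide" loop (standard library only).
# The data, helper names, and parameter grid are hypothetical illustrations.
from statistics import mean

# Toy one-feature data where y is roughly 2 * x; a held-out set scores the models.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1), (5, 9.8), (6, 12.2)]
valid = [(1.5, 3.0), (3.5, 7.1), (5.5, 11.0)]


def mean_model(train_data):
    """Baseline algorithm: always predict the average training target."""
    avg = mean(y for _, y in train_data)
    return lambda x: avg


def knn_model(train_data, k):
    """k-nearest-neighbors regression: average the targets of the k closest x values."""
    def predict(x):
        # sorted() is stable, so distance ties keep the training order.
        nearest = sorted(train_data, key=lambda point: abs(point[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


def mse(model, data):
    """Mean squared error of a fitted model on (x, y) pairs."""
    return mean((model(x) - y) ** 2 for x, y in data)


# 1. Select a range of algorithms, each with the parameter settings to try.
search_space = {
    "mean baseline": (mean_model, [{}]),
    "knn": (knn_model, [{"k": 1}, {"k": 2}, {"k": 3}]),
}

# 2-3. Run every algorithm on the training data and score each parameter setting.
results = []
for name, (factory, grid) in search_space.items():
    for params in grid:
        model = factory(train, **params)
        results.append((mse(model, valid), name, params))

# 4. Decide: keep the setting with the lowest validation error.
best_error, best_name, best_params = min(results, key=lambda r: r[0])
print(best_name, best_params, round(best_error, 4))
```

Scoring on a separate validation set is the key design choice here: judged on the training data alone, `k=1` would always look perfect, because each training point is its own nearest neighbor. Once the candidates and the grid are fixed, the whole cycle runs without further intervention.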

It sounds a little bit automatic and, partially, it is, thanks to powerful analytical software and scripting languages like Python. Learning algorithms are complex, and their sophisticated procedures naturally seem automatic and a bit opaque to you. However, even if some of these tools seem like...