Mastering Python Data Analysis

By: Magnus Vilhelm Persson

Overview of this book

Python, a multi-paradigm programming language, has become the language of choice for data scientists working on data analysis, visualization, and machine learning. Have you ever wondered how to become an expert at approaching data analysis problems effectively, solving them, and extracting all of the available information from your data? Look no further; this is the book you want! Through this comprehensive guide, you will explore data and present the results and conclusions of statistical analysis in a meaningful way. You’ll be able to quickly and accurately perform hands-on sorting, reduction, and subsequent analysis, and fully appreciate how data analysis methods can support business decision-making. You’ll start off by learning about the tools available for data analysis in Python and will then explore the statistical models that are used to identify patterns in data. Gradually, you’ll move on to review statistical inference using Python, Pandas, and SciPy. After that, you’ll focus on performing regression using computational tools and come to understand the problem of identifying clusters in data in an algorithmic way. Finally, you’ll delve into advanced techniques to quantify cause and effect using Bayesian methods and discover how to use Python’s tools for supervised machine learning.

Credits

About the Authors

About the Reviewer

www.PacktPub.com

Preface

Tools of the Trade

Before you start

Using the notebook interface

Imports

An example using the Pandas library

Summary

Exploring Data

The General Social Survey

Univariate data

Relationships between variables – scatterplots

Summary

Learning About Models

Models and experiments

The cumulative distribution function

Working with distributions

The probability density function

Where do models come from?

Multivariate distributions

Summary

Regression

Introducing linear regression

Multivariate regression

Logistic regression

Summary

Clustering

Introduction to cluster finding

K-means clustering

Hierarchical clustering analysis

Summary

Bayesian Methods

The Bayesian method

U.S. air travel safety record

Climate change - CO2 in the atmosphere

Summary

Supervised and Unsupervised Learning

Introduction to machine learning

Summary

Time Series Analysis

Introduction

Pandas and time series data

Indexing and slicing

Resampling, smoothing, and other estimates

Stationarity

Patterns and components

Time series models

Summary

More on Jupyter Notebook and matplotlib Styles

Jupyter Notebook

Matplotlib styles

Useful resources

Summary

Chapter 5. Clustering

When data comprise several separate distributions, how do we find and characterize them? In this chapter, we will look at some ways to identify clusters in data. Groups of points with similar characteristics form clusters. Many different algorithms and methods exist for this, each with its own strengths and weaknesses. We want to detect multiple separate distributions in the data and, for each point, determine its degree of association (or similarity) with another point or cluster. The degree of association should be high if the points belong in a cluster together and low if they do not. As before, this can be a one-dimensional or a multi-dimensional problem. One of the inherent difficulties of cluster finding is determining how many clusters there are in the data. Various approaches to this exist: in some, the user inputs the number of clusters and the algorithm then finds which points belong to which cluster, and in some the starting...
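
To make the first kind of approach concrete, here is a minimal sketch of a one-dimensional K-means (Lloyd's algorithm) in plain Python, where the user supplies the number of clusters `k`. It is an illustration only, not the code used in the chapter: the function names `assign` and `kmeans_1d`, the toy data, and the use of absolute distance as the (inverse) measure of association are all assumptions made for this example.

```python
import random


def assign(values, centers):
    """Label each value with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            for v in values]


def kmeans_1d(values, k, n_iter=100, seed=0):
    """Hypothetical helper: 1D K-means with a user-chosen number of clusters."""
    rng = random.Random(seed)
    # Start from k distinct data points picked at random.
    centers = rng.sample(values, k)
    for _ in range(n_iter):
        labels = assign(values, centers)
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return centers, assign(values, centers)


# Toy data (made up for illustration): two well-separated groups.
data = [0.8, 1.1, 1.3, 0.9, 1.0, 9.7, 10.2, 10.0, 9.9, 10.4]
centers, labels = kmeans_1d(data, k=2)
print(sorted(centers))
print(labels)
```

On data this cleanly separated, the two centers should settle near the means of the two groups, and points in the same group should share a label. With overlapping groups or a poorly chosen `k`, the result depends on the random starting centers, which is one reason choosing the number of clusters is difficult.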
