Mastering Python Data Analysis

By: Magnus Vilhelm Persson

Overview of this book

Python, a multi-paradigm programming language, has become the language of choice for data scientists for data analysis, visualization, and machine learning. Ever wondered how to become an expert at approaching data analysis problems effectively, solving them, and extracting all of the available information from your data? Look no further: this is the book you want! Through this comprehensive guide, you will explore data and present results and conclusions from statistical analysis in a meaningful way. You'll be able to quickly and accurately perform hands-on sorting, reduction, and subsequent analysis of data, and fully appreciate how data analysis methods can support business decision-making. You'll start off by learning about the tools available for data analysis in Python and will then explore the statistical models that are used to identify patterns in data. Gradually, you'll move on to review statistical inference using Python, pandas, and SciPy. After that, you'll focus on performing regression using computational tools and come to understand the problem of identifying clusters in data in an algorithmic way. Finally, you'll delve into advanced techniques to quantify cause and effect using Bayesian methods and discover how to use Python's tools for supervised machine learning.

Climate change - CO2 in the atmosphere


With Bayesian analysis, we can fit any model; anything that we can do with frequentist or classical statistics, we can also do with Bayesian statistics. In this next example, we will perform linear regression with both the Bayesian and the frequentist approach. As we have already covered model creation and date parsing, we will go through things a little more quickly here. We will look at atmospheric CO2 over a span of about 1,000 years and at its growth rate over the past 40 years, and then fit a linear function to the growth rate over the past 50-60 years.
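To make the comparison concrete before turning to the real data, here is a minimal sketch of both approaches on synthetic data. Everything in it is an assumption for illustration: the data points are randomly generated (they are not CO2 measurements), the noise level `sigma` is treated as known, and the Bayesian part uses a closed-form Gaussian posterior with a very broad Gaussian prior rather than the sampling-based workflow used elsewhere in this chapter.

```python
import numpy as np

# Synthetic, made-up data (NOT real CO2 measurements): a line plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.arange(20, dtype=float)              # hypothetical "years since start"
true_slope, true_intercept, sigma = 0.03, 0.8, 0.2
y = true_intercept + true_slope * x + rng.normal(0.0, sigma, size=x.size)

# Frequentist: ordinary least squares with a degree-1 polynomial.
ols_slope, ols_intercept = np.polyfit(x, y, 1)

# Bayesian: Gaussian prior on (intercept, slope) and Gaussian noise with
# known sigma (an assumption). The posterior is then Gaussian as well,
# with a closed-form mean and covariance.
X = np.column_stack([np.ones_like(x), x])   # design matrix
prior_mean = np.zeros(2)
prior_cov = np.eye(2) * 1e6                 # very broad prior
prior_prec = np.linalg.inv(prior_cov)
post_cov = np.linalg.inv(prior_prec + X.T @ X / sigma**2)
post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / sigma**2)
post_intercept, post_slope = post_mean
post_slope_sd = np.sqrt(post_cov[1, 1])     # posterior uncertainty on the slope

print(f"OLS:      slope={ols_slope:.4f}, intercept={ols_intercept:.4f}")
print(f"Bayesian: slope={post_slope:.4f} +/- {post_slope_sd:.4f}, "
      f"intercept={post_intercept:.4f}")
```

With a prior this broad, the posterior mean essentially coincides with the least-squares estimate; the Bayesian result additionally gives a full distribution over the parameters. A narrower prior would pull the estimate toward `prior_mean`.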

Getting the data

The data for the last 50-60 years comes from the National Oceanic and Atmospheric Administration (NOAA) marine surface sites. It can be found at http://www.esrl.noaa.gov/gmd/ccgg/trends/global.html, where you can download two datasets: growth rates and annual means. The direct links to the data tables are ftp://aftp.cmdl.noaa.gov/products/trends...
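Once a table has been downloaded, loading it might look like the sketch below. The layout is an assumption (header lines starting with `#`, followed by whitespace-separated columns for year, growth rate, and uncertainty), and the numbers in `sample` are invented placeholders, not NOAA values; check the actual file before relying on the column order.

```python
import io

import numpy as np

# Placeholder text imitating an ASSUMED layout: '#' header lines, then
# whitespace-separated columns (year, growth rate, uncertainty).
# The numbers are invented for illustration, not NOAA values.
sample = """\
# year  growth_rate  uncertainty
2001  1.0  0.1
2002  2.0  0.1
2003  1.5  0.1
"""

# numpy.loadtxt skips everything after '#' by default; unpack=True returns
# one array per column.
year, rate, unc = np.loadtxt(io.StringIO(sample), unpack=True)
year = year.astype(int)

print(year, rate, unc)
```

For a real file, the `io.StringIO(sample)` argument would be replaced by the path of the downloaded table.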