Mastering pandas

By: Femi Anthony

Overview of this book

Python is a groundbreaking language for its simplicity and succinctness, allowing the user to achieve a great deal with a few lines of code, especially compared to other programming languages. The pandas library brings these features of Python into the data analysis realm by providing expressiveness, simplicity, and powerful capabilities for the task of data analysis. By mastering pandas, users will be able to perform complex data analysis in a short period of time, as well as illustrate their findings using the rich visualization capabilities of related tools such as IPython and matplotlib.

This book is an in-depth guide to the use of pandas for data analysis, for either the seasoned data analysis practitioner or the novice user. It provides a basic introduction to the pandas framework and takes users through the installation of the library and the IPython interactive environment. Thereafter, you will learn basic as well as advanced features, such as MultiIndexing, modifying data structures, and sampling data, which provide powerful capabilities for data analysis.

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Introduction to pandas and Data Analysis

Motivation for data analysis

How Python and pandas fit into the data analytics mix

What is pandas?

Benefits of using pandas

Summary

Installation of pandas and the Supporting Software

Selecting a version of Python to use

Python installation

Installation of Python and pandas from a third-party vendor

Continuum Analytics Anaconda

Other numeric or analytics-focused Python distributions

Downloading and installing pandas

IPython installation

Summary

The pandas Data Structures

NumPy ndarrays

Data structures in pandas

Summary

Operations in pandas, Part I – Indexing and Selecting

Basic indexing

Label, integer, and mixed indexing

Boolean indexing

Summary

Operations in pandas, Part II – Grouping, Merging, and Reshaping of Data

Grouping of data

Merging and joining

Pivots and reshaping data

Summary

Missing Data, Time Series, and Plotting Using Matplotlib

Handling missing data

Handling time series

A summary of Time Series-related objects

Summary

A Tour of Statistics – The Classical Approach

Descriptive statistics versus inferential statistics

Measures of central tendency and variability

Hypothesis testing – the null and alternative hypotheses

Summary

A Brief Tour of Bayesian Statistics

Introduction to Bayesian statistics

Mathematical framework for Bayesian statistics

Probability distributions

Bayesian statistics versus Frequentist statistics

Conducting Bayesian statistical analysis

Monte Carlo estimation of the likelihood function and PyMC

References

Summary

The pandas Library Architecture

Introduction to pandas' file hierarchy

Description of pandas' modules and files

Improving performance using Python extensions

Summary

R and pandas Compared

R data types

Slicing and selection

Arithmetic operations on columns

Aggregation and GroupBy

Comparing matching operators in R and pandas

Logical subsetting

Split-apply-combine

Reshaping using melt

Factors/categorical data

Summary

Brief Tour of Machine Learning

Role of pandas in machine learning

Installation of scikit-learn

Introduction to machine learning

Application of machine learning – Kaggle Titanic competition

Data analysis and preprocessing using pandas

A naïve approach to Titanic problem

The scikit-learn ML/classifier interface

Supervised learning algorithms

Unsupervised learning algorithms

Summary

Index

Measures of central tendency and variability

Descriptive statistics relies mainly on two kinds of measures: measures of central tendency and measures of variability.

A measure of central tendency is a single value that attempts to describe a dataset by specifying a central position within the data. The three most common measures of central tendency are the mean, median, and mode.
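As a quick illustration, all three measures can be computed on a pandas Series with its built-in `mean()`, `median()`, and `mode()` methods. This is a minimal sketch, and the data values are made up for illustration. Note that `mode()` returns a Series rather than a single number, because a dataset can have more than one most frequent value.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
data = pd.Series([2, 4, 4, 5, 7, 8, 12])

mean_value = data.mean()      # sum of values / number of values
median_value = data.median()  # middle value of the sorted data
mode_values = data.mode()     # Series holding the most frequent value(s)

print(mean_value)             # 6.0
print(median_value)           # 5.0
print(mode_values.tolist())   # [4]
```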

A measure of variability describes how spread out the values in a dataset are. Common measures of variability include the variance and the standard deviation.
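In pandas, the variance and standard deviation are available as `Series.var()` and `Series.std()`. The sketch below uses made-up data. One detail worth knowing: both methods default to `ddof=1`, which gives the sample versions (dividing by n - 1). Passing `ddof=0` gives the population versions (dividing by n).

```python
import pandas as pd

# Hypothetical data, for illustration only
data = pd.Series([2, 4, 4, 5, 7, 8, 12])

# Default ddof=1: sample variance and standard deviation (divide by n - 1)
sample_var = data.var()
sample_std = data.std()

# ddof=0: population variance and standard deviation (divide by n)
population_var = data.var(ddof=0)
population_std = data.std(ddof=0)

print(sample_var)  # 11.0
print(sample_std)
print(population_var)
print(population_std)
```

Here the squared deviations from the mean (6) sum to 66. The sample variance is therefore 66 / 6 = 11, and the population variance is 66 / 7.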

Measures of central tendency

Let's take a look at each measure of central tendency, with an illustration, in the following sections.

The mean

The mean, or average, is the most popular measure of central tendency. It is equal to the sum of all values in the dataset divided by the number of values in the dataset. Thus, in a dataset of n values, the mean is calculated as follows:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

We use $\bar{x}$ if the data values are from a sample and $\mu$ if the data values are from a population.
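The formula can be checked directly in Python. The sketch below computes the mean from its definition (sum divided by n) and compares the result with pandas' `Series.mean()`. The values are hypothetical. The arithmetic is the same whether we label the result $\bar{x}$ or $\mu$; only the origin of the data (sample or population) differs.

```python
import pandas as pd

values = [2, 4, 4, 5, 7, 8, 12]  # hypothetical data for illustration

# Mean from the definition: sum of the values divided by their count, n
n = len(values)
manual_mean = sum(values) / n

# pandas computes the same quantity with Series.mean()
pandas_mean = pd.Series(values).mean()

print(manual_mean, pandas_mean)  # 6.0 6.0
```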

The sample mean and...
