Mastering pandas

By: Femi Anthony

Overview of this book

Python is known for its simplicity and succinctness, allowing the user to achieve a great deal with a few lines of code, especially compared to other programming languages. The pandas library brings these features of Python into the data analysis realm by providing expressiveness, simplicity, and powerful capabilities for the task of data analysis. By mastering pandas, users will be able to do complex data analysis in a short period of time, as well as illustrate their findings using the rich visualization capabilities of related tools such as IPython and matplotlib.

This book is an in-depth guide to the use of pandas for data analysis, for either the seasoned data analysis practitioner or the novice user. It provides a basic introduction to the pandas framework and takes users through the installation of the library and the IPython interactive environment. Thereafter, you will learn basic as well as advanced features, such as MultiIndexing, modifying data structures, and sampling data, which provide powerful capabilities for data analysis.

Mastering pandas

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Introduction to pandas and Data Analysis

Motivation for data analysis

How Python and pandas fit into the data analytics mix

What is pandas?

Benefits of using pandas

Installation of pandas and the Supporting Software

Selecting a version of Python to use

Python installation

Installation of Python and pandas from a third-party vendor

Continuum Analytics Anaconda

Other numeric or analytics-focused Python distributions

Downloading and installing pandas

IPython installation

The pandas Data Structures

Data structures in pandas

Operations in pandas, Part I – Indexing and Selecting

Label, integer, and mixed indexing

Boolean indexing

Operations in pandas, Part II – Grouping, Merging, and Reshaping of Data

Grouping of data

Merging and joining

Pivots and reshaping data

Missing Data, Time Series, and Plotting Using Matplotlib

Handling missing data

Handling time series

A summary of Time Series-related objects

A Tour of Statistics – The Classical Approach

Descriptive statistics versus inferential statistics

Measures of central tendency and variability

Hypothesis testing – the null and alternative hypotheses

A Brief Tour of Bayesian Statistics

Introduction to Bayesian statistics

Mathematical framework for Bayesian statistics

Probability distributions

Bayesian statistics versus Frequentist statistics

Conducting Bayesian statistical analysis

Monte Carlo estimation of the likelihood function and PyMC

The pandas Library Architecture

Introduction to pandas' file hierarchy

Description of pandas' modules and files

Improving performance using Python extensions

R and pandas Compared

Slicing and selection

Arithmetic operations on columns

Aggregation and GroupBy

Comparing matching operators in R and pandas

Logical subsetting

Split-apply-combine

Reshaping using melt

Factors/categorical data

Brief Tour of Machine Learning

Role of pandas in machine learning

Installation of scikit-learn

Introduction to machine learning

Application of machine learning – Kaggle Titanic competition

Data analysis and preprocessing using pandas

A naïve approach to Titanic problem

The scikit-learn ML/classifier interface

Supervised learning algorithms

Unsupervised learning algorithms

Index

Index

A

.at operator
- about / The .iat and .at operators
Active State Python
- URL / Third-party Python software installation
aggregate method
- using / Using the aggregate method
aggregation, in R
- about / Aggregation in R
aliases, for Time Series frequencies
- about / Aliases for Time Series frequencies
alpha
- about / The alpha and p-values
alternative hypothesis
- about / The null and alternative hypotheses
Anaconda
- about / Continuum Analytics Anaconda
- URL / Continuum Analytics Anaconda, Final step for all platforms, Other numeric or analytics-focused Python distributions
- installing / Installing Anaconda
- URL, for download / Installing Anaconda
- installing, on Linux / Linux
- installing, on Mac OS/X / Mac OS X
- installing, on Windows / Windows
- installing, final steps / Final step for all platforms
- numeric or analytics-focused Python distributions / Other numeric or analytics-focused Python distributions
- IPython installation / Install via Anaconda (for Linux/Mac OS X)
- scikit-learn, installing via / Installing via Anaconda
append
- using / Using append
arithmetic operations
- applying, on columns / Arithmetic operations on columns

B

Bayesian analysis example
- switchpoint detection / Bayesian analysis example – Switchpoint detection
Bayesians
- about / How the model is defined
Bayesian statistical analysis
- conducting, steps / Conducting Bayesian statistical analysis
Bayesian statistics
- about / Introduction to Bayesian statistics
- reference link / Introduction to Bayesian statistics
- mathematical framework / Mathematical framework for Bayesian statistics
- references / Mathematical framework for Bayesian statistics, Applications of Bayesian statistics, References
- applications / Applications of Bayesian statistics
- versus Frequentist statistics / Bayesian statistics versus Frequentist statistics
Bayes theory
- about / Bayes theory and odds
Bernoulli distribution
- about / The Bernoulli distribution
- reference link / The Bernoulli distribution
big data
- references / We live in a big data world
- 4V’s / 4 V's of big data
- about / 4 V's of big data
- examples / The move towards real-time analytics
binomial distribution
- about / The binomial distribution
Boolean indexing
- about / Boolean indexing
- any() method / The is in and any all methods
- isin method / The is in and any all methods
- all method / The is in and any all methods
- where() method, using / Using the where() method
- indexes, operations / Operations on indexes

C

4-4-5 calendar
- reference link / pandas/tseries
central limit theorem
- reference link / Background
central limit theorem (CLT)
- about / The mean
classes, converter.py
- Converter / pandas/tseries
- Formatters / pandas/tseries
- Locators / pandas/tseries
classes, offsets.py
- DateOffset / pandas/tseries
- BusinessMixin / pandas/tseries
- MonthOffset / pandas/tseries
- MonthBegin / pandas/tseries
- MonthEnd / pandas/tseries
- BusinessMonthEnd / pandas/tseries
- BusinessMonthBegin / pandas/tseries
- YearOffset / pandas/tseries
- YearBegin / pandas/tseries
- YearEnd / pandas/tseries
- BYearEnd / pandas/tseries
- BYearBegin / pandas/tseries
- Week / pandas/tseries
- WeekDay / pandas/tseries
- WeekOfMonth / pandas/tseries
- LastWeekOfMonth / pandas/tseries
- QuarterOffset / pandas/tseries
- QuarterEnd / pandas/tseries
- QuarterBegin / pandas/tseries
- BQuarterEnd / pandas/tseries
- BQuarterBegin / pandas/tseries
- FY5253Quarter / pandas/tseries
- FY5253 / pandas/tseries
- Easter / pandas/tseries
- Tick / pandas/tseries
classes, parsers.py
- TextFileReader / pandas/io
- ParserBase / pandas/io
- CParserWrapper / pandas/io
- PythonParser / pandas/io
- FixedWidthReader / pandas/io
- FixedWidthFieldParser / pandas/io
classes, plm.py
- PanelOLS / pandas/stats
- MovingPanelOLS / pandas/stats
- NonPooledPanelOLS / pandas/stats
classes, sql.py
- PandasSQL / pandas/io
- PandasSQLAlchemy / pandas/io
- PandasSQLTable / pandas/io
- PandasSQLTableLegacy / pandas/io
- PandasSQLLegacy / pandas/io
column
- multiple functions, applying to / Applying multiple functions
column name
- specifying, in R / Specifying column name in R
- specifying, in pandas / Specifying column name in pandas
columns
- arithmetic operations, applying on / Arithmetic operations on columns
concat function
- about / The concat function
concat function, elements
- objs function / The concat function
- axis function / The concat function
- join function / The concat function
- join_axes function / The concat function
- keys function / The concat function
concat operation
- reference link / The join function
Conda
- documentation, URL / Final step for all platforms
conda command
- URL / Final step for all platforms
Confidence (Frequentist) interval
- versus Credible (Bayesian) interval / Confidence (Frequentist) versus Credible (Bayesian) intervals
confidence interval
- about / Confidence intervals
- example / An illustrative example
container types, R
- Vector / R data types
- List / R data types
- DataFrame / R data types
- Matrix / R data types
continuous probability distributions
- about / Continuous probability distributions
- continuous uniform distribution / The continuous uniform distribution
- exponential distribution / The exponential distribution
- normal distribution / The normal distribution
continuous uniform distribution
- about / The continuous uniform distribution
Continuum Analytics
- URL / Third-party Python software installation
correlation
- about / Correlation and linear regression, Correlation
- reference link / Correlation, An illustrative example
Credible (Bayesian) interval
- versus Confidence (Frequentist) interval / Confidence (Frequentist) versus Credible (Bayesian) intervals
cross-sections / Cross sections
cut() function, pandas
- about / The pandas solution
cut() method, R
- about / An R example using cut()
- reference link / An R example using cut()
Cython / What is pandas?
- URL / Source installation
Cython documentation
- reference link / Improving performance using Python extensions

D

data
- grouping / Grouping of data
- reshaping / Pivots and reshaping data
- resampling / Resampling of data
data analysis
- motivation / Motivation for data analysis
- big data / We live in a big data world
- time limitation / So much data, so little time for analysis
- URL / So much data, so little time for analysis
- real-time analytics / The move towards real-time analytics
DataFrame
- about / DataFrame
- creating / DataFrame Creation
- creating, with dictionaries of Series / Using dictionaries of Series
- creating, with dictionary of ndarrays/lists / Using a dictionary of ndarrays/lists
- creating, with structured array / Using a structured array
- creating, with Series structure / Using a Series structure
- constructors / Using a Series structure
- operations / Operations
- single row, appending to / Appending a single row to a DataFrame
DataFrame.join function / The join function
DataFrame constructors
- DataFrame.from_dict / Using a Series structure
- DataFrame.from_records / Using a Series structure
- DataFrame.from_items / Using a Series structure
- pandas.io.parsers.read_csv / Using a Series structure
- pandas.io.parsers.read_table / Using a Series structure
- pandas.io.parsers.read_fwf / Using a Series structure
DataFrame objects
- SQL-like merging/joining / SQL-like merging/joining of DataFrame objects
DataFrame operations
- selection / Selection
- assignment / Assignment
- deletion / Deletion
- alignment / Alignment
- mathematical operations / Other mathematical operations
dataset, Python
- measures of central tendency, computing of / Computing measures of central tendency of a dataset in Python
data structure, pandas
- Series / Series
- DataFrame / DataFrame
- panels / Panel
data types, NumPy
- reference link / R data types
data types, R
- about / R data types
- reference link / R data types
DateOffset object
- about / DateOffset and TimeDelta objects
- features / DateOffset and TimeDelta objects
ddply
- reference link / Split-apply-combine
Debian Python page
- URL / Linux
decision trees / Decision trees
dependence
- reference link / Correlation
descriptive statistics
- versus inferential statistics / Descriptive statistics versus inferential statistics
deviation
- about / Deviation and variance
dimensionality reduction / Dimensionality reduction
discrete probability distributions
- about / Discrete probability distributions
discrete uniform distribution
- about / Discrete uniform distributions
- Bernoulli distribution / The Bernoulli distribution
- binomial distribution / The binomial distribution
- Poisson distribution / The Poisson distribution
- Geometric distribution / The Geometric distribution
- negative binomial distribution / The negative binomial distribution
distribution
- fitting / Fitting a distribution
downsampling
- about / Resampling of data

E

Enhancing Performance, documentation
- reference link / Improving performance using Python extensions
Enthought
- URL / Third-party Python software installation
Enthought Canopy
- URL / Other numeric or analytics-focused Python distributions
exponential distribution
- about / The exponential distribution
- reference link / The exponential distribution

F

Facebook (FB)
- about / Bayesian analysis example – Switchpoint detection
factors / categorical data
- about / Factors/categorical data
Fedora software installs
- URL / Linux
file hierarchy, pandas
- pandas/core / Introduction to pandas' file hierarchy, pandas/core
- pandas/src / Introduction to pandas' file hierarchy
- pandas/io / Introduction to pandas' file hierarchy, pandas/io
- pandas/tools / Introduction to pandas' file hierarchy, pandas/tools
- pandas/sparse / Introduction to pandas' file hierarchy, pandas/sparse
- pandas/stats / Introduction to pandas' file hierarchy, pandas/stats
- pandas/util / Introduction to pandas' file hierarchy, pandas/util
- pandas/rpy / Introduction to pandas' file hierarchy, pandas/rpy
- pandas/tests / pandas/tests
- pandas/compat / pandas/compat
- pandas/computation / pandas/computation
- pandas/tseries / pandas/tseries
- pandas/sandbox / pandas/sandbox
filtering
- applying, on groupby object / Filtering
FM regression
- reference link / pandas/stats
frequency aliases
- reference link / Frequency conversion
frequency conversion / Frequency conversion
Frequentists
- about / How the model is defined
Frequentist statistics
- versus Bayesian statistics / Bayesian statistics versus Frequentist statistics

G

Geometric distribution
- about / The Geometric distribution
get-pip script
- URL / Third-party Python software installation
GitHub
- IPython download, URL / Windows
groupby-transform function / The transform() method
groupby.py submodule
- Splitter classes / pandas/core
- Grouper/Grouping classes / pandas/core
groupby object
- filtering, applying on / Filtering
groupby operation
- about / The groupby operation
- using, with MultiIndex / Using groupby with a MultiIndex
GroupBy operator
- about / Aggregation and GroupBy
- using / The pandas' GroupBy operator

H

histograms, versus bar plots
- reference link / Computing measures of central tendency of a dataset in Python
hyperparameters / The scikit-learn ML/classifier interface
hypothesis testing
- about / Hypothesis testing – the null and alternative hypotheses
- null hypothesis / The null and alternative hypotheses
- alternative hypothesis / The null and alternative hypotheses

I

%in% operator, R / R %in% operator
.iat operator
- about / The .iat and .at operators
.iloc operator
- about / Label, integer, and mixed indexing
.ix operator
- about / Label, integer, and mixed indexing
- indexing, mixing with / Mixed indexing with the .ix operator
illustration, with document classification
- about / Illustration using document classification
- supervised learning / Supervised learning
- unsupervised learning / Unsupervised learning
independent samples t-tests / Types of t-tests
indexing, pandas
- about / Basic indexing
- attributes, accessing with dot operator / Accessing attributes using dot operator
- range slicing / Range slicing
- mixing, with .ix operator / Mixed indexing with the .ix operator
inferential statistics
- versus descriptive statistics / Descriptive statistics versus inferential statistics
integer-oriented indexing
- about / Label, integer, and mixed indexing, Integer-oriented indexing
Intel
- URL / The move towards real-time analytics
Interactive Python (IPython)
- about / IPython
- URL / IPython
interpolate() function
- reference link / Handling missing values
IPython
- installation / IPython installation
- installation, on Linux / Linux
- installation, on Windows / Windows
- installation, URL / Windows
- installation, on Mac OS/X / Mac OS X
- installation, via Anaconda / Install via Anaconda (for Linux/Mac OS X)
- installation, Wakari / Wakari by Continuum Analytics
- installation, with virtualenv / Virtualenv
IPython Notebook
- URL / IPython Notebook
isin() function, pandas / The pandas isin() function

J

join function
- about / The join function
joining
- about / Merging and joining
join operation
- reference link / The join function

K

K-means clustering / K-means clustering
K-means clustering, scikit-learn
- reference link / K-means clustering
Kaggle
- URL / Application of machine learning – Kaggle Titanic competition, A naïve approach to Titanic problem
Kaggle Titanic competition application
- about / Application of machine learning – Kaggle Titanic competition, The titanic: machine learning from disaster problem
- problem of overfitting / The problem of overfitting

L

.loc operator
- about / Label, integer, and mixed indexing
label-oriented indexing / Label, integer, and mixed indexing, Label-oriented indexing
- about / Label-oriented indexing
- selection, Boolean array used / Selection using a Boolean array
lagging / Shifting/lagging
lambda functions
- reference link / The groupby operation
law of large numbers (LLN)
- reference link / The mean
levels
- swapping / Swapping and reordering levels
- re-ordering / Swapping and reordering levels
linear regression
- about / Correlation and linear regression, Linear regression
- example / An illustrative example
Linux
- Python installation / Linux
- Anaconda installation / Linux
- pandas installation / Linux
- IPython installation / Linux
logical operators, NumPy array
- np.all() / Logical operators
- np.any() / Logical operators
logical subsetting
- about / Logical subsetting
- in R / Logical subsetting in R
- in pandas / Logical subsetting in pandas
logistic regression
- about / Logistic regression
- reference link / Logistic regression

M

machine learning
- about / Introduction to machine learning
- reference link / Introduction to machine learning
machine learning application
- Kaggle Titanic competition / Application of machine learning – Kaggle Titanic competition
machine learning systems
- working / How machine learning systems learn
Mac OS/X
- Python, installing / Linux, Mac OS X
- Python, installing from compressed tarball / Installing Python from compressed tarball
- Python installation, URL / Installation using a package manager
- Anaconda installation / Mac OS X
- pandas installation / Mac
- IPython installation / Mac OS X
- IPython installation, URL / Mac OS X
Markov Chain Monte Carlo (MCMC)
- about / Monte Carlo estimation of the likelihood function and PyMC
Markov Chain Monte Carlo Maximum Likelihood
- reference link / Monte Carlo estimation of the likelihood function and PyMC
matching operators
- comparing, in R and pandas / Comparing matching operators in R and pandas
mathematical framework, Bayesian statistics
- about / Mathematical framework for Bayesian statistics
matplotlib
- using, for plotting / Plotting using matplotlib
- reference link / Plotting using matplotlib
maximum likelihood estimator (MLE)
- about / How the model is defined
mean
- about / Measures of central tendency and variability, The mean
measure of central tendency
- about / Measures of central tendency and variability, Measures of central tendency
- mean / The mean
- median / The median
- mode / The mode
- computing, for dataset in Python / Computing measures of central tendency of a dataset in Python
measure of dispersion
- about / Measures of variability, dispersion, or spread
- range / Range
- quartile / Quartile
measure of spread
- about / Measures of variability, dispersion, or spread
measure of variability
- about / Measures of central tendency and variability, Measures of variability, dispersion, or spread
median
- about / Measures of central tendency and variability, The median
melt() function, pandas
- about / The pandas melt() function
melt() function, R
- about / The R melt() function
melt function
- using / Using the melt function
- used, for reshaping / Reshaping using melt
merge function
- about / SQL-like merging/joining of DataFrame objects
merge function, arguments
- left / SQL-like merging/joining of DataFrame objects
- right / SQL-like merging/joining of DataFrame objects
- how / SQL-like merging/joining of DataFrame objects
- on / SQL-like merging/joining of DataFrame objects
- left_on / SQL-like merging/joining of DataFrame objects
- right_on / SQL-like merging/joining of DataFrame objects
- left_index / SQL-like merging/joining of DataFrame objects
- right_index / SQL-like merging/joining of DataFrame objects
- sort / SQL-like merging/joining of DataFrame objects
- suffixes / SQL-like merging/joining of DataFrame objects
- copy / SQL-like merging/joining of DataFrame objects
merge operation
- reference link / The join function
merging
- about / Merging and joining
- reference link / The concat function, SQL-like merging/joining of DataFrame objects
methods, for reshaping DataFrames
- about / Other methods to reshape DataFrames
- melt function / Using the melt function
- pandas.get_dummies() function / The pandas.get_dummies() function
methods, math.py
- rank(..) / pandas/stats
- solve(..) / pandas/stats
- inv(..) / pandas/stats
- is_psd(..) / pandas/stats
- newey_west(..) / pandas/stats
- calc_F(..) / pandas/stats
methods, parsers.py
- read_csv(..) / pandas/io
- read_table(..) / pandas/io
- read_fwf(..) / pandas/io
methods, pickle.py
- to_pickle(..) / pandas/io
- read_pickle(..) / pandas/io
methods, plotting.py
- scatter_matrix(..) / pandas/tools
- andrews_curves(..) / pandas/tools
- parallel_coordinates(..) / pandas/tools
- lag_plot(..) / pandas/tools
- autocorrelation_plot(..) / pandas/tools
- bootstrap_plot(..) / pandas/tools
- radviz(..) / pandas/tools
methods, sql.py
- pandasSQL_builder(..) / pandas/io
- get_schema(..) / pandas/io
- read_sql_table(..) / pandas/io
- read_sql_query(..) / pandas/io
- read_sql(..) / pandas/io
methods, util.py
- isleapyear(..) / pandas/tseries
- pivot_annual(..) / pandas/tseries
MinGW installation, on Windows
- URL / Source installation
missing data
- handling / Handling missing data
missing values
- handling / Handling missing values
mode
- about / Measures of central tendency and variability, The mode
Monte Carlo (MC) integration
- about / Monte Carlo estimation of the likelihood function and PyMC
- reference link / Monte Carlo estimation of the likelihood function and PyMC
Monte Carlo estimation, likelihood function
- about / Monte Carlo estimation of the likelihood function and PyMC
Monte Carlo estimation, PyMC
- about / Monte Carlo estimation of the likelihood function and PyMC
MSI packages
- URL, for download / Core Python installation
multi-indexing / MultiIndexing
MultiIndex
- groupby operation, using with / Using groupby with a MultiIndex
multiple columns
- selecting, in R / Multicolumn selection in R
- selecting, in pandas / Multicolumn selection in pandas
multiple functions
- applying, to column / Applying multiple functions
multiple object classes, internals.py
- Block / pandas/core
- NumericBlock / pandas/core
- FloatOrComplexBlock / pandas/core
- ComplexBlock / pandas/core
- FloatBlock / pandas/core
- IntBlock / pandas/core
- BoolBlock / pandas/core
- TimeDeltaBlock / pandas/core
- DatetimeBlock / pandas/core
- ObjectBlock / pandas/core
- SparseBlock / pandas/core
- BlockManager / pandas/core
- SingleBlockManager / pandas/core
- JoinUnit / pandas/core

N

N-dimensional version, DataFrame
- reference link / pandas/core
naïve approach, to Titanic problem / A naïve approach to Titanic problem
negative binomial distribution
- about / The negative binomial distribution
normal distribution
- about / The normal distribution
NoSQL
- URL / Variety of big data
np.nan* aggregation functions, NumPy
- reference link / Handling missing data
np.newaxis function / Adding a dimension
np.reshape function
- URL / Reshaping
null, and alternative hypotheses
- alpha value / The alpha and p-values
- p-value / The alpha and p-values
null hypothesis
- about / The null and alternative hypotheses
Null Significance Hypothesis Testing (NHST) / A t-test example
numexpr
- reference link / pandas/computation
NumPy
- ndarrays / NumPy ndarrays
- URL / NumPy ndarrays
- datatypes / NumPy datatypes
- datatypes, URL / NumPy datatypes
- indexing / NumPy indexing and slicing
- slicing / NumPy indexing and slicing
- array, slicing / Array slicing
- array, masking / Array masking
- complex indexing / Complex indexing
Numpy
- URL / Source installation
numpy.dot
- URL / Basic operations
numpy.percentile function
- reference link / Quartile
NumPy array
- URL / NumPy ndarrays
- creating / NumPy array creation
- creating, via numpy.array / NumPy arrays via numpy.array
- creating, via numpy.arange / NumPy array via numpy.arange
- creating, via numpy.linspace / NumPy array via numpy.linspace
- creating, via various other functions / NumPy array via various other functions
- indexing, URL / Array slicing
- copies / Copies and views
- views / Copies and views
- operations / Operations
- broadcasting / Broadcasting
- shape manipulation / Array shape manipulation
- sorting / Array sorting
Numpy array
- versus R-matrix / R-matrix and NumPy array compared
NumPy array, creating via various functions
- about / NumPy array via various other functions
- numpy.ones / numpy.ones
- numpy.eye / numpy.eye
- numpy.diag / numpy.diag
- numpy.random.rand / numpy.random.rand
- numpy.empty / numpy.empty
- numpy.tile / numpy.tile
NumPy ndarrays
- about / NumPy ndarrays

O

objects
- slicing / Slicing and selection
odds
- about / Bayes theory and odds
one sample independent t-test / Types of t-tests
Open Suse
- URL / Linux
operations, NumPy array
- basic operations / Basic operations
- reduction operations / Reduction operations
- statistical operators / Statistical operators
- logical operators / Logical operators
Ordinary Least Squares (OLS) / pandas/stats
overfitting / The problem of overfitting

P

p-value
- references / The alpha and p-values
pad method
- reference link / Handling missing values
paired samples t-test / Types of t-tests
pandas
- about / How Python and pandas fit into the data analytics mix, What is pandas?
- features / What is pandas?
- URL / What is pandas?
- benefits / Benefits of using pandas
- installing, from third-party vendor / Installation of Python and pandas from a third-party vendor
- downloading / Downloading and installing pandas
- installing / Downloading and installing pandas
- installing, on Linux / Linux
- installing, on Mac / Mac
- installing, on Windows / Windows
- URL, for download / Source installation
- data structures / Data structures in pandas
- data structures, URL / Data structures in pandas
- indexing / Basic indexing
- file hierarchy / Introduction to pandas' file hierarchy
- column name, specifying in / Specifying column name in pandas
- multiple columns, selecting in / Multicolumn selection in pandas
- isin() function / The pandas isin() function
- logical subsetting / Logical subsetting in pandas
- split-apply-combine, implementing in / Implementation in pandas
- melt() function / The pandas melt() function
- cut() function / The pandas solution
- used, for data analysis / Data analysis and preprocessing using pandas
- used, for preprocessing / Data analysis and preprocessing using pandas
- data, examining / Examining the data
- missing values, handling / Handling missing values
pandas.DataFrame.any
- URL / The is in and any all methods
pandas.get_dummies() function
- about / The pandas.get_dummies() function
pandas/compat
- submodules / pandas/compat
pandas/computation
- submodules / pandas/computation
pandas/core
- about / Introduction to pandas' file hierarchy
- submodules / pandas/core
pandas/io
- about / Introduction to pandas' file hierarchy
- submodules / pandas/io
pandas/rpy
- about / Introduction to pandas' file hierarchy
- submodules / pandas/rpy
- reference link / pandas/rpy
pandas/sparse
- about / Introduction to pandas' file hierarchy
- submodules / pandas/sparse
- reference link / pandas/sparse
pandas/src
- about / Introduction to pandas' file hierarchy
pandas/stats
- about / Introduction to pandas' file hierarchy
- submodules / pandas/stats
pandas/tools
- about / Introduction to pandas' file hierarchy
- submodules / pandas/tools
pandas/tseries
- submodules / pandas/tseries
pandas/util
- about / Introduction to pandas' file hierarchy
- submodules / pandas/util
pandas DataFrames
- versus R DataFrames / R's DataFrames versus pandas' DataFrames
pandas installation, on Linux
- for Ubuntu/Debian / Ubuntu/Debian
- for Red Hat / Red Hat
- for Fedora / Fedora
- for OpenSuse / OpenSuse
pandas installation, on Mac
- source installation / Source installation
- binary installation / Binary installation
pandas installation, on Windows
- binary installation / Binary Installation
- binary installation, URL / Binary Installation
- source installation / Source installation
- Interactive Python (IPython) tool / IPython
- IPython Notebook / IPython Notebook
pandas series
- versus R lists / R lists and pandas series compared
panel
- about / Panel
- items / Panel
- major_axis / Panel
- minor_axis / Panel
- 3D NumPy array, using with axis labels / Using 3D NumPy array with axis labels
- Python dictionary of DataFrame structures, using / Using a Python dictionary of DataFrame objects
parsers.py
- reference link / pandas/io
Patsy
- model, constructing for scikit-learn / Constructing a model using Patsy for scikit-learn
- reference link / Constructing a model using Patsy for scikit-learn
performance
- improving, Python extensions used / Improving performance using Python extensions
pip / Third-party Python software installation
pivots
- about / Pivots and reshaping data
pivot_table
- references / Pivots and reshaping data
plotting
- performing, with matplotlib / Plotting using matplotlib
Poisson distribution
- about / The Poisson distribution
- reference link / The Poisson distribution
power law
- reference link / Linear regression
Principal Component Analysis (PCA) / Dimensionality reduction
probability
- about / What is probability?
probability density function (PDF) / Continuous probability distributions
probability distributions
- about / Probability distributions
probability mass function (pmf)
- about / Discrete probability distributions
PYMC Pandas Example
- URL / IPython Notebook
PyPI Readline package
- URL / Windows
Python
- about / How Python and pandas fit into the data analytics mix
- features / How Python and pandas fit into the data analytics mix
- URL / How Python and pandas fit into the data analytics mix, Selecting a version of Python to use, Installing Python from compressed tarball
- libraries / How Python and pandas fit into the data analytics mix
- version, selecting / Selecting a version of Python to use
- installation, on Linux / Linux
- installation, on Windows / Core Python installation
- installation, on Mac OS X / Mac OS X
- Anaconda package, URL / Installation of Python and pandas from a third-party vendor
Python(x,y)
- URL / Other numeric or analytics-focused Python distributions
Python 3.0
- URL / Selecting a version of Python to use
- references / Selecting a version of Python to use
Python decorators
- reference link / pandas/util
Python dictionary, DataFrame objects
- DataFrame.to_panel method, using / Using the DataFrame.to_panel method
- DataFrame.to_panel method, references / Using the DataFrame.to_panel method
- other operations / Other operations
Python extensions
- used, for improving performance / Improving performance using Python extensions
Python installation, on Linux
- about / Linux
- from compressed tarball / Installing Python from compressed tarball
Python installation, on Mac OS X
- about / Mac OS X
- URL / Mac OS X
- package manager, using / Installation using a package manager
Python installation, on Windows
- about / Windows
- core Python installation / Core Python installation
- third-party software install / Third-party Python software installation
- URL / Third-party Python software installation
Python Lexical Analysis
- URL / Accessing attributes using dot operator

Q

quartile
- about / Quartile
- reference link / Quartile

R

R
- data types / R data types
- column name, specifying in / Specifying column name in R
- multiple columns, selecting in / Multicolumn selection in R
- %in% operator / R %in% operator
- logical subsetting / Logical subsetting in R
- split-apply-combine, implementing in / Implementation in R
- melt() function / The R melt() function
- cut() method / An R example using cut()
R, and pandas
- matching operators, comparing in / Comparing matching operators in R and pandas
R-matrix
- versus NumPy array / R-matrix and NumPy array compared
random forest / Random forest
random walk hypothesis
- reference link / The exponential distribution
range / Range
R DataFrames
- about / R DataFrames
- versus pandas DataFrames / R's DataFrames versus pandas' DataFrames
README file, scikit-learn
- reference link / Installing on Windows
R lists
- about / R lists
- versus pandas series / R lists and pandas series compared
role of pandas, in machine learning / Role of pandas in machine learning

S

sample covariance
- reference link / The mean
sample mean
- reference link / The mean
scikit-learn
- about / Role of pandas in machine learning
- installing / Installation of scikit-learn
- installing, via Anaconda / Installing via Anaconda
- installing, on Unix (Linux/Mac OS X) / Installing on Unix (Linux/Mac OS X)
- installing, on Windows / Installing on Windows
- reference link / Installing on Windows
- model, constructing for / Constructing a model using Patsy for scikit-learn
scikit-learn ML/classifier interface
- about / The scikit-learn ML/classifier interface
- reference link / The scikit-learn ML/classifier interface
scipy.stats function
- reference link / Quartile
Scipy Lecture Notes, Interfacing with C
- reference link / Improving performance using Python extensions
Series
- creating / Series creation
- creating, with numpy.ndarray / Using numpy.ndarray
- creating, with Python dictionary / Using Python dictionary
- creating, with scalar values / Using scalar values
- operations / Operations on Series
Series operations
- assignment / Assignment
- slicing / Slicing
- arithmetic and statistical operations / Other operations
Setuptools
- about / Third-party Python software installation
- URL / Third-party Python software installation
shape manipulation, NumPy array
- about / Array shape manipulation
- multi-dimensional array, flattening / Flattening a multi-dimensional array
- reshaping / Reshaping
- resizing / Resizing
- dimension, adding / Adding a dimension
shifting / Shifting/lagging
single row
- appending, to DataFrame / Appending a single row to a DataFrame
sortlevel() method / MultiIndexing
sparse.py
- reference link / pandas/core
split-apply-combine
- about / Split-apply-combine
- implementing, in R / Implementation in R
- implementing, in pandas / Implementation in pandas
SQL-like merging/joining, of DataFrame objects / SQL-like merging/joining of DataFrame objects
SQL joins
- reference link / SQL-like merging/joining of DataFrame objects
stack() function
- about / The stack() function
stacking
- about / Stacking and unstacking
statistical hypothesis tests
- about / Statistical hypothesis tests
- background / Background
- z-test / The z-test
- t-test / The t-test
structured array, DataFrame
- URL / Using a structured array
submodules, pandas/compat
- chainmap.py / pandas/compat
- chainmap_impl.py / pandas/compat
- pickle_compat.py / pandas/compat
- openpyxl_compat.py / pandas/compat
submodules, pandas/computation
- api.py / pandas/computation
- align.py / pandas/computation
- common.py / pandas/computation
- engines.py / pandas/computation
- eval.py / pandas/computation
- expressions.py / pandas/computation
- ops.py / pandas/computation
- pytables.py / pandas/computation
- scope.py / pandas/computation
submodules, pandas/core
- api.py / pandas/core
- array.py / pandas/core
- base.py / pandas/core
- common.py / pandas/core
- config.py / pandas/core
- datetools.py / pandas/core
- frame.py / pandas/core
- generic.py / pandas/core
- categorical.py / pandas/core
- format.py / pandas/core
- groupby.py / pandas/core
- ops.py / pandas/core
- index.py / pandas/core
- internals.py / pandas/core
- matrix.py / pandas/core
- nanops.py / pandas/core
- panel.py / pandas/core
- panel4d.py / pandas/core
- panelnd.py / pandas/core
- series.py / pandas/core
- sparse.py / pandas/core
- strings.py / pandas/core
submodules, pandas/io
- api.py / pandas/io
- auth.py / pandas/io
- common.py / pandas/io
- data.py / pandas/io
- date_converters.py / pandas/io
- excel.py / pandas/io
- ga.py / pandas/io
- gbq.py / pandas/io
- html.py / pandas/io
- json.py / pandas/io
- packer.py / pandas/io
- parsers.py / pandas/io
- pickle.py / pandas/io
- pytables.py / pandas/io
- sql.py / pandas/io
- to_sql(..) / pandas/io
- stata.py / pandas/io
- wb.py / pandas/io
submodules, pandas/rpy
- base.py / pandas/rpy
- common.py / pandas/rpy
- mass.py / pandas/rpy
- var.py / pandas/rpy
submodules, pandas/sparse
- api.py / pandas/sparse
- array.py / pandas/sparse
- frame.py / pandas/sparse
- list.py / pandas/sparse
- panel.py / pandas/sparse
- series.py / pandas/sparse
submodules, pandas/stats
- api.py / pandas/stats
- common.py / pandas/stats
- fama_macbeth.py / pandas/stats
- interface.py / pandas/stats
- math.py / pandas/stats
- misc.py / pandas/stats
- moments.py / pandas/stats
- ols.py / pandas/stats
- plm.py / pandas/stats
- var.py / pandas/stats
submodules, pandas/tools
- util.py / pandas/tools
- tile.py / pandas/tools
- rplot.py / pandas/tools
- plotting.py / pandas/tools
- pivot.py / pandas/tools
- merge.py / pandas/tools
- describe.py / pandas/tools
submodules, pandas/tseries
- api.py / pandas/tseries
- converter.py / pandas/tseries
- frequencies.py / pandas/tseries
- holiday.py / pandas/tseries
- index.py / pandas/tseries
- interval.py / pandas/tseries
- offsets.py / pandas/tseries
- period.py / pandas/tseries
- plotting.py / pandas/tseries
- resample.py / pandas/tseries
- timedeltas.py / pandas/tseries
- tools.py / pandas/tseries
- util.py / pandas/tseries
submodules, pandas/util
- terminal.py / pandas/util
- print_versions.py / pandas/util
- misc.py / pandas/util
- decorators.py / pandas/util
- clipboard.py / pandas/util
supervised learning
- versus unsupervised learning / Supervised versus unsupervised learning
- about / Supervised learning
supervised learning algorithms
- about / Supervised learning algorithms
- model, constructing for scikit-learn with Patsy / Constructing a model using Patsy for scikit-learn
- general boilerplate code explanation / General boilerplate code explanation
- logistic regression / Logistic regression
- support vector machine (SVM) / Support vector machine
- decision trees / Decision trees
- random forest / Random forest
supervised learning problems
- classification / Supervised versus unsupervised learning
- regression / Supervised versus unsupervised learning
support vector machine (SVM) / Support vector machine
- URL / Support vector machine
swaplevel function / Swapping and reordering levels
SWIG Documentation
- reference link / Improving performance using Python extensions
switchpoint detection, Bayesian analysis example / Bayesian analysis example – Switchpoint detection

T

t-distribution
- reference link / The t-test
t-test
- about / The t-test
- one sample independent t-test / Types of t-tests
- independent samples t-tests / Types of t-tests
- paired samples t-test / Types of t-tests
- reference link / Types of t-tests
- example / A t-test example
tailed test
- reference link / Statistical hypothesis tests
time-series-related instance methods
- about / Time series-related instance methods
- shifting/lagging / Shifting/lagging
- frequency conversion / Frequency conversion
- data, resampling / Resampling of data
- aliases, for Time Series frequencies / Aliases for Time Series frequencies
Time-Series-related objects
- datetime.datetime / A summary of Time Series-related objects
- Timestamp / A summary of Time Series-related objects
- DatetimeIndex / A summary of Time Series-related objects
- Period / A summary of Time Series-related objects
- PeriodIndex / A summary of Time Series-related objects
- DateOffset / A summary of Time Series-related objects
- timedelta / A summary of Time Series-related objects
TimeDelta object / DateOffset and TimeDelta objects
time series
- handling / Handling time series
TimeSeries.resample function
- about / Resampling of data
time series concepts
- about / Time series concepts and datatypes
time series data
- reading in / Reading in time series data
- TimeDelta object / DateOffset and TimeDelta objects
- DateOffset object / DateOffset and TimeDelta objects
time series datatypes
- about / Time series concepts and datatypes
- Period / Period and PeriodIndex
- PeriodIndex / PeriodIndex
- conversion between / Conversions between Time Series datatypes
Time Series frequencies
- aliases / Aliases for Time Series frequencies
Titanic problem
- naïve approach / A naïve approach to Titanic problem
transform() method / The transform() method
Type I Error / Type I and Type II errors
Type II Error / Type I and Type II errors

U

UEFA Champions League
- URL / The groupby operation
unbiased estimator
- reference link / Deviation and variance
Unix (Linux/Mac OS X)
- scikit-learn, installing on / Installing on Unix (Linux/Mac OS X)
unstacking
- about / Stacking and unstacking
unsupervised learning
- versus supervised learning / Supervised versus unsupervised learning
- about / Unsupervised learning
unsupervised learning algorithms
- about / Unsupervised learning algorithms
- dimensionality reduction / Dimensionality reduction
- K-means clustering / K-means clustering
upsampling
- about / Resampling of data

V

4 V's of big data
- about / 4 V's of big data, Veracity of big data
- volume / Volume of big data
- velocity / Velocity of big data
- variety / Variety of big data
- veracity / Veracity of big data
variance
- about / Deviation and variance
variety, big data / Variety of big data
vector auto-regression classes, var.py
- VAR / pandas/stats
- PanelVAR / pandas/stats
vector autoregression
- reference link / pandas/stats
velocity, big data / Velocity of big data
veracity, big data / Veracity of big data
virtualenv tool
- about / Virtualenv
- installing / Virtualenv installation and usage
- using / Virtualenv installation and usage
- URL / Virtualenv installation and usage
volume, big data / Volume of big data

W

Wakari
- about / Wakari by Continuum Analytics
- URL / Wakari by Continuum Analytics
where() method / Using the where() method
Windows
- Python, installing / Windows, Core Python installation
- Anaconda installation / Windows
- pandas installation / Windows
- IPython installation / Windows
- scikit-learn, installing on / Installing on Windows
WinPython
- URL / Other numeric or analytics-focused Python distributions
World Bank Economic data
- URL / Benefits of using pandas

X

xs method / Cross sections

Z

z-test
- about / The z-test
zettabytes
- URL / Volume of big data