Mastering pandas

By : Femi Anthony

Overview of this book

Python is a groundbreaking language for its simplicity and succinctness, allowing the user to achieve a great deal with a few lines of code, especially compared to other programming languages. The pandas library brings these features of Python into the data analysis realm by providing expressiveness, simplicity, and powerful capabilities for the task of data analysis. By mastering pandas, users will be able to do complex data analysis in a short period of time, as well as illustrate their findings using the rich visualization capabilities of related tools such as IPython and matplotlib.

This book is an in-depth guide to the use of pandas for data analysis, for either the seasoned data analysis practitioner or the novice user. It provides a basic introduction to the pandas framework, and takes users through the installation of the library and the IPython interactive environment. Thereafter, you will learn basic as well as advanced features, such as MultiIndexing, modifying data structures, and sampling data, which provide powerful capabilities for data analysis.

Unsupervised learning algorithms


There are two tasks that we are mostly concerned with in unsupervised learning: dimensionality reduction and clustering.

Dimensionality reduction

Dimensionality reduction is used to help visualize higher-dimensional data in a systematic way. This is useful because the human brain can visualize only three spatial dimensions (and possibly a temporal one), whereas most datasets have many more dimensions.

The typical technique used in dimensionality reduction is Principal Component Analysis (PCA). PCA uses linear algebra to project higher-dimensional data onto a lower-dimensional space. This inevitably loses some information, but projecting along the right set and number of dimensions can keep the loss small. The usual approach is to find the combinations of variables that explain the most variance (a proxy for information) in the data and to project along those dimensions.
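The projection described above can be sketched with plain NumPy. This is a minimal illustration and not necessarily the approach the book goes on to use: the function name `pca_project`, the synthetic dataset, and the choice of a singular value decomposition of the centered data are all assumptions made for this example.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top n_components principal directions.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    # PCA works on centered data, so subtract each column's mean.
    X_centered = X - X.mean(axis=0)
    # The rows of Vt are the principal directions, ordered by decreasing
    # singular value (and therefore by decreasing explained variance).
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance along each direction, and the fraction of the total kept.
    variances = S ** 2 / (X.shape[0] - 1)
    explained_ratio = variances[:n_components] / variances.sum()
    # Coordinates of every sample in the lower-dimensional space.
    projected = X_centered @ components.T
    return projected, components, explained_ratio

# Synthetic 3-D data (an assumption for this sketch) that varies mostly
# along a single direction, plus a little noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2 * t + 0.1 * rng.normal(size=200),
    0.1 * rng.normal(size=200),
])

projected, components, explained_ratio = pca_project(X, n_components=2)
print(projected.shape)   # (200, 2)
print(explained_ratio)   # fraction of total variance kept per component
```

Because the synthetic data lies almost entirely along one line, the first entry of `explained_ratio` should be close to 1, which is the sense in which little information is lost by dropping the remaining dimensions.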

In the case...