Book Image

Learning NumPy Array

By : Ivan Idris
Book Image

Learning NumPy Array

By: Ivan Idris

Overview of this book

Table of Contents (14 chapters)
Learning NumPy Array
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Clustering stocks with scikit-learn


Scikit-learn is an open source software for machine learning. Clustering is a type of machine learning algorithm that aims to group items based on similarities.

Note

A legion of scikits exists. These are all open source scientific Python projects. For a list of scikits, please refer to https://scikits.appspot.com/scikits.

Clustering is unsupervised, which means that you don't have to create learning examples. The algorithm puts items in the appropriate bucket based on some measure of distance, so that items that are close to each other end up in the same bucket. In this example, we will use the log returns of stocks in the Dow Jones Industrial (DJI) Index to cluster.

Note

A myriad of clustering algorithms exist, and since this is a rapidly evolving field, new algorithms are invented each year. Due to the exigencies of this book, we cannot touch upon all of them. The interested reader can have a look at https://en.wikipedia.org/wiki/Cluster_analysis.

First, we...