Mastering Python for Data Science

By : Samir Madhavan

Clustering the countries


We'll now apply the k-means algorithm to cluster the countries together:

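>>> from sklearn.cluster import KMeans
>>> # Initialize k-means with three clusters and a fixed seed for reproducibility
>>> km = KMeans(n_clusters=3, init='k-means++', random_state=3425)
>>> km.fit(df.values)
>>> df['countrySegment'] = km.predict(df.values)
>>> df[:5]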

After the preceding code is executed, the first five rows of df are displayed, now with the new countrySegment column.
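The fit-and-predict steps can also be tried without the book's dataset. The following is a minimal sketch, assuming scikit-learn, pandas, and NumPy are installed; the `toy` frame and its values are made up for illustration and are not the country data used in this chapter:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical data: three well-separated groups of made-up GDP values,
# ten "countries" per group (not the book's dataset).
rng = np.random.default_rng(0)
centers = [1500.0, 14000.0, 30000.0]
values = np.concatenate([rng.normal(c, 300.0, size=10) for c in centers])
toy = pd.DataFrame({"GDPperCapita": values})

# Same estimator settings as in the text; n_init is set explicitly so the
# behaviour does not depend on the scikit-learn version's default.
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=3425)

# fit_predict fits the model and returns the cluster label of each row,
# which is equivalent to calling fit followed by predict on the same data.
toy["countrySegment"] = km.fit_predict(toy[["GDPperCapita"]].values)

print(toy["countrySegment"].value_counts())
```

Note that the integer labels are arbitrary: k-means does not order clusters by size or by value, so label 0 is not necessarily the lowest or the highest group.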

Let's find the average GDP per capita for each country segment:

>>> df.groupby('countrySegment').GDPperCapita.mean()
countrySegment
0    13800.586207
1     1624.538462
2    29681.625000
Name: GDPperCapita, dtype: float64
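Because the cluster numbers carry no inherent order, it can help to sort the segment averages before interpreting them. The following is a small sketch on a hypothetical, already-labelled frame (the `toy` values are invented for illustration); the same `sort_values` call could be applied to the result shown above:

```python
import pandas as pd

# Hypothetical data: six made-up countries that already have a segment label.
toy = pd.DataFrame({
    "GDPperCapita": [1200.0, 1800.0, 14000.0, 13000.0, 30000.0, 28000.0],
    "countrySegment": [1, 1, 0, 0, 2, 2],
})

# Average GDP per capita within each segment.
segment_means = toy.groupby("countrySegment")["GDPperCapita"].mean()

# Order the segments from richest to poorest so they are easier to read.
ranked = segment_means.sort_values(ascending=False)
print(ranked)
```

Reading the sorted series from top to bottom gives the richest segment first, whichever integer label k-means happened to assign to it.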

We can see that cluster 2 has the highest average GDP per capita, so we can assume that it contains the developed countries. Cluster 0 has the second-highest average, so we can assume that it contains the developing countries. Finally, cluster 1 has a very low average GDP per capita, so we can assume that it contains the underdeveloped nations:

>>> clust_map = {
       0:'Developing',
       1:'Under Developed',
     ...