
Mastering Python for Data Science

By: Samir Madhavan

The k-means clustering with countries


We have UN data on the countries of the world, covering indicators that range from the education of their people to their Gross Domestic Product. We'll use this data to group the countries into buckets based on their level of development. Here are the descriptions of the columns:

Here is a screenshot of the data:

Let's see the data type of each column:

>>> import pandas as pd
>>> df = pd.read_csv('./Data/UN.csv')
>>> # look at the Python type of the first value in each column
>>> [(col, type(df[col].iloc[0])) for col in df.columns]
[('country', str),
 ('region', str),
 ('tfr', numpy.float64),
 ('contraception', numpy.float64),
 ('educationMale', numpy.float64),
 ('educationFemale', numpy.float64),
 ('lifeMale', numpy.float64),
 ('lifeFemale', numpy.float64),
 ('infantMortality', numpy.float64),
 ('GDPperCapita', numpy.float64...
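Checking only the first value of each column can be misleading. If that value is missing, a text column reports `float`, because NaN is a float. Below is a minimal sketch of the more common pandas idiom, `DataFrame.dtypes`, which reports one dtype per column. It uses a small hypothetical DataFrame with made-up values in place of `UN.csv`. The `select_dtypes` step, which keeps only the numeric columns that k-means can work with, is an addition of this sketch and does not come from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few UN.csv columns; all values are made up.
# The first 'region' value is deliberately missing.
df = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'region': [np.nan, 'Europe', 'Africa'],
    'tfr': [2.1, 1.5, 5.0],
    'GDPperCapita': [1200.0, 25000.0, 800.0],
})

# Checking the first value only: the missing region shows up as a float
print(type(df['region'].iloc[0]))   # <class 'float'>

# dtypes describes each column as a whole (text columns are object in most pandas versions)
print(df.dtypes)

# Keep only the numeric columns, which are the candidates for clustering features
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)   # ['tfr', 'GDPperCapita']
```

On the real data, the same two calls, `df.dtypes` and `df.select_dtypes(include='number')`, would apply directly to the DataFrame returned by `pd.read_csv`.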