Book Image

Mastering Numerical Computing with NumPy

By : Umit Mert Cakmak, Tiago Antao, Mert Cuhadaroglu
Book Image

Mastering Numerical Computing with NumPy

By: Umit Mert Cakmak, Tiago Antao, Mert Cuhadaroglu

Overview of this book

NumPy is one of the most important scientific computing libraries available for Python. Mastering Numerical Computing with NumPy teaches you how to achieve expert level competency to perform complex operations, with in-depth coverage of advanced concepts. Beginning with NumPy's arrays and functions, you will familiarize yourself with linear algebra concepts to perform vector and matrix math operations. You will thoroughly understand and practice data processing, exploratory data analysis (EDA), and predictive modeling. You will then move on to working on practical examples which will teach you how to use NumPy statistics in order to explore US housing data and develop a predictive model using simple and multiple linear regression techniques. Once you have got to grips with the basics, you will explore unsupervised learning and clustering algorithms, followed by understanding how to write better NumPy code while keeping advanced considerations in mind. The book also demonstrates the use of different high-performance numerical computing libraries and their relationship with NumPy. You will study how to benchmark the performance of different configurations and choose the best for your system. By the end of this book, you will have become an expert in handling and performing complex data manipulations.
Table of Contents (11 chapters)

Implementing our algorithm for a single variable

Let's implement the k-means algorithm for a single variable. You will start with one dimensional vector, which has 20 records, as shown here:

data = [1,2,3,2,1,3,9,8,11,12,10,11,14,25,26,24,30,22,24,27] 
 
trace1 = go.Scatter( 
    x=data, 
    y=[0 for x in data], 
    mode='markers', 
    name='Data', 
    marker=dict( 
        size=12 
    ) 
) 
 
layout = go.Layout( 
title='1D vector',
)

traces = [trace1]

fig = go.Figure(data=traces, layout=layout)

plot(fig)

This will output following plot, as shown in this diagram:

Our aim is to find 3 clusters which are visible in the data. In order to start implementing the k-means algorithm, you need to initialize cluster centers by choosing random indexes, as shown here:

n_clusters = 3

c_centers = np.random.choice(X, n_clusters)

print(c_centers)

# [ 1 22 26...