#### Overview of this book

NumPy is one of the most important scientific computing libraries available for Python. Mastering Numerical Computing with NumPy teaches you how to achieve expert level competency to perform complex operations, with in-depth coverage of advanced concepts. Beginning with NumPy's arrays and functions, you will familiarize yourself with linear algebra concepts to perform vector and matrix math operations. You will thoroughly understand and practice data processing, exploratory data analysis (EDA), and predictive modeling. You will then move on to working on practical examples which will teach you how to use NumPy statistics in order to explore US housing data and develop a predictive model using simple and multiple linear regression techniques. Once you have got to grips with the basics, you will explore unsupervised learning and clustering algorithms, followed by understanding how to write better NumPy code while keeping advanced considerations in mind. The book also demonstrates the use of different high-performance numerical computing libraries and their relationship with NumPy. You will study how to benchmark the performance of different configurations and choose the best for your system. By the end of this book, you will have become an expert in handling and performing complex data manipulations.

# Working with multidimensional arrays

This section will give you a brief understanding of multidimensional arrays by going through different matrix operations.

To perform matrix multiplication in NumPy, use dot() (or the @ operator) rather than *, which multiplies arrays element by element. Let's see some examples:

```python
In [66]: c = np.ones((4, 4))
         c*c
Out[66]: array([[ 1., 1., 1., 1.],
                [ 1., 1., 1., 1.],
                [ 1., 1., 1., 1.],
                [ 1., 1., 1., 1.]])

In [67]: c.dot(c)
Out[67]: array([[ 4., 4., 4., 4.],
                [ 4., 4., 4., 4.],
                [ 4., 4., 4., 4.],
                [ 4., 4., 4., 4.]])
```
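With an array of ones the two results differ only in scale, so the distinction is easier to see with distinct values. The following is a minimal sketch, assuming two small example matrices a and b that are not part of the original text:

```python
import numpy as np

# Illustrative matrices (not from the original example).
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b         # multiplies matching entries
matrix_product = a.dot(b)   # rows of a times columns of b
same_product = a @ b        # the @ operator gives the same matrix product

print(elementwise)      # [[ 5 12] [21 32]]
print(matrix_product)   # [[19 22] [43 50]]
```

For 2D arrays, a.dot(b) and a @ b return the same result, so the choice between them is mostly a matter of readability.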

A common task when working with multidimensional arrays is stacking, in other words merging two arrays. hstack stacks arrays horizontally (column-wise) and vstack stacks them vertically (row-wise). You can split an array back apart with hsplit and vsplit, which work along the same axes as the stacking functions. The following examples stack arrays both ways:

```python
In [68]: y = np.arange(15).reshape(3,5)
         x = np.arange(10).reshape(2,5)
         new_array = np.vstack((y,x))
         new_array
Out[68]: array([[ 0, 1, 2, 3, 4],
                [ 5, 6, 7, 8, 9],
                [10, 11, 12, 13, 14],
                [ 0, 1, 2, 3, 4],
                [ 5, 6, 7, 8, 9]])

In [69]: y = np.arange(15).reshape(5,3)
         x = np.arange(10).reshape(5,2)
         new_array = np.hstack((y,x))
         new_array
Out[69]: array([[ 0, 1, 2, 0, 1],
                [ 3, 4, 5, 2, 3],
                [ 6, 7, 8, 4, 5],
                [ 9, 10, 11, 6, 7],
                [12, 13, 14, 8, 9]])
```
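Splitting is mentioned above but not shown. As a rough sketch, the stacked arrays can be taken apart again by passing the split position to hsplit or vsplit. The variable names below (stacked, rows, and the _back arrays) are illustrative only:

```python
import numpy as np

# Column-wise: stack a (5, 3) and a (5, 2) block, then split them apart.
y = np.arange(15).reshape(5, 3)
x = np.arange(10).reshape(5, 2)
stacked = np.hstack((y, x))                 # shape (5, 5)
y_back, x_back = np.hsplit(stacked, [3])    # split before column index 3

# Row-wise: stack a (3, 5) and a (2, 5) block, then split them apart.
top = np.arange(15).reshape(3, 5)
bottom = np.arange(10).reshape(2, 5)
rows = np.vstack((top, bottom))             # shape (5, 5)
top_back, bottom_back = np.vsplit(rows, [3])  # split before row index 3

print(y_back.shape, x_back.shape)           # (5, 3) (5, 2)
print(top_back.shape, bottom_back.shape)    # (3, 5) (2, 5)
```

Passing a list of indices splits at those positions. Passing a single integer instead asks for that many equal-sized pieces, which raises an error if the axis cannot be divided evenly.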

These functions are very useful in machine learning applications, especially when creating datasets. After you stack your arrays, you can check their descriptive statistics by using scipy.stats. Imagine a case where you have 100 records and each record has 10 features, which gives you a 2D matrix with 100 rows and 10 columns. In the result, each row corresponds to one feature, and the columns hold the mean, variance, skewness, kurtosis, minimum, and maximum. The following example shows how you can easily get these descriptive statistics for each feature:

```python
In [70]: from scipy import stats
         x= np.random.rand(100,10)
         n, min_max, mean, var, skew, kurt = stats.describe(x)
         new_array = np.vstack((mean,var,skew,kurt,min_max[0],min_max[1]))
         new_array.T
Out[70]: array([[ 5.46011575e-01, 8.30007104e-02, -9.72899085e-02,
                 -1.17492785e+00, 4.07031246e-04, 9.85652100e-01],
                [ 4.79292653e-01, 8.13883169e-02, 1.00411352e-01,
                 -1.15988275e+00, 1.27241020e-02, 9.85985488e-01],
                [ 4.81319367e-01, 8.34107619e-02, 5.55926602e-02,
                 -1.20006450e+00, 7.49534810e-03, 9.86671083e-01],
                [ 5.26977277e-01, 9.33829059e-02, -1.12640661e-01,
                 -1.19955646e+00, 5.74237697e-03, 9.94980830e-01],
                [ 5.42622228e-01, 8.92615897e-02, -1.79102183e-01,
                 -1.13744108e+00, 2.27821933e-03, 9.93861532e-01],
                [ 4.84397369e-01, 9.18274523e-02, 2.33663872e-01,
                 -1.36827574e+00, 1.18986562e-02, 9.96563489e-01],
                [ 4.41436165e-01, 9.54357485e-02, 3.48194314e-01,
                 -1.15588500e+00, 1.77608372e-03, 9.93865324e-01],
                [ 5.34834409e-01, 7.61735119e-02, -2.10467450e-01,
                 -1.01442389e+00, 2.44706226e-02, 9.97784091e-01],
                [ 4.90262346e-01, 9.28757119e-02, 1.02682367e-01,
                 -1.28987137e+00, 2.97705706e-03, 9.98205307e-01],
                [ 4.42767478e-01, 7.32159267e-02, 1.74375646e-01,
                 -9.58660574e-01, 5.52410464e-04, 9.95383732e-01]])
```
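stats.describe works along axis 0 by default, so it returns one value per column, and its variance uses ddof=1 (the sample variance). As a rough cross-check, four of the six statistics can be reproduced with plain NumPy reductions. This is only a sketch: the seed, the default_rng generator, and the variable names are assumptions made here for reproducibility, not part of the original example.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed is arbitrary).
rng = np.random.default_rng(0)
x = rng.random((100, 10))        # 100 records, 10 features

# axis=0 reduces over the rows, leaving one value per feature (column).
mean = x.mean(axis=0)
var = x.var(axis=0, ddof=1)      # ddof=1 matches the stats.describe default
col_min = x.min(axis=0)
col_max = x.max(axis=0)

# One row per feature, one column per statistic.
summary = np.column_stack((mean, var, col_min, col_max))
print(summary.shape)             # (10, 4)
```

Skewness and kurtosis have no single-call NumPy equivalent, which is one reason to reach for scipy.stats here.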

NumPy has a module named numpy.ma, which is used for masking array elements. It's very useful when you want to mask (ignore) some elements while doing your calculations. When an element is masked, NumPy treats it as invalid and leaves it out of the computation:

```python
In [71]: import numpy.ma as ma
         x = np.arange(6)
         print(x.mean())
         masked_array = ma.masked_array(x, mask=[1,0,0,0,0,0])
         masked_array.mean()
         2.5
Out[71]: 3.0
```
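A mask does not have to be typed by hand. As a small sketch using the same array x, the block below also builds a mask from a condition with ma.masked_where and converts a masked array back to a regular one with filled. The fill value of -1 and the threshold of 2 are arbitrary choices made for illustration:

```python
import numpy as np
import numpy.ma as ma

x = np.arange(6)
masked = ma.masked_array(x, mask=[1, 0, 0, 0, 0, 0])

print(masked.mean())     # 3.0, the mean of 1..5
print(masked.count())    # 5 unmasked elements

# Build the mask from a condition instead of listing it explicitly.
by_condition = ma.masked_where(x < 2, x)
print(by_condition.sum())    # 14, since 0 and 1 are ignored

# Replace masked entries with a placeholder to get a plain ndarray back.
filled = masked.filled(-1)
print(filled)            # [-1  1  2  3  4  5]
```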

In the preceding code, you have an array x = [0,1,2,3,4,5]. You mask the first element of the array and then calculate the mean, which rises from 2.5 to 3.0 because the 0 is no longer counted. When a mask entry is 1 (True), the element at the same index in the array is masked. This technique is also very useful for replacing NaN values:

```python
In [72]: x = np.arange(25, dtype = float).reshape(5,5)
         x[x<5] = np.nan
         x
Out[72]: array([[ nan, nan, nan, nan, nan],
                [ 5., 6., 7., 8., 9.],
                [ 10., 11., 12., 13., 14.],
                [ 15., 16., 17., 18., 19.],
                [ 20., 21., 22., 23., 24.]])

In [73]: np.where(np.isnan(x), ma.array(x, mask=np.isnan(x)).mean(axis=0), x)
Out[73]: array([[ 12.5, 13.5, 14.5, 15.5, 16.5],
                [ 5. , 6. , 7. , 8. , 9. ],
                [ 10. , 11. , 12. , 13. , 14. ],
                [ 15. , 16. , 17. , 18. , 19. ],
                [ 20. , 21. , 22. , 23. , 24. ]])
```
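The same replacement can be written without a masked array. The sketch below swaps in np.nanmean, a different technique from the one in the text, to compute the column means while skipping nan values. The names col_means and filled are illustrative:

```python
import numpy as np

x = np.arange(25, dtype=float).reshape(5, 5)
x[x < 5] = np.nan                      # the first row becomes nan

# Column means that ignore nan entries.
col_means = np.nanmean(x, axis=0)

# col_means has shape (5,), so it broadcasts across every row of x.
filled = np.where(np.isnan(x), col_means, x)

print(filled[0])   # [12.5 13.5 14.5 15.5 16.5]
```

Both versions rely on broadcasting: the one-dimensional array of column means is lined up against each row, so every nan is replaced by the mean of its own column.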

In the preceding code, we set the first five elements to nan by using a Boolean condition as the index. x[x<5] selects the elements whose values are less than 5, which here are the values 0, 1, 2, 3, and 4 in the first row. Then we overwrite these values with the mean of each column (excluding nan values). There are many other useful functions for array operations that help make your code more concise:

| Method | Description |
| --- | --- |
| np.concatenate | Join a sequence of arrays along an existing axis |
| np.repeat | Repeat the elements of an array along a specified axis |
| np.delete | Return a new array with the specified subarrays deleted |
| np.insert | Insert values along an axis before the specified indices |
| np.unique | Find the unique values in an array |
| np.tile | Create an array by repeating a given input a given number of times |
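To make the table concrete, here is a brief sketch that calls each function once. The input arrays a and b and the inserted row [9, 9] are made-up values chosen only to keep the results easy to read:

```python
import numpy as np

# Illustrative inputs (not from the original text).
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

joined = np.concatenate((a, b), axis=0)         # [[1, 2], [3, 4], [5, 6]]
repeated = np.repeat(a, 2, axis=0)              # each row appears twice
deleted = np.delete(joined, 1, axis=0)          # drops row 1: [[1, 2], [5, 6]]
inserted = np.insert(a, 1, [9, 9], axis=0)      # new row before row 1
unique = np.unique(np.array([3, 1, 3, 2, 1]))   # sorted unique values: [1, 2, 3]
tiled = np.tile(np.array([1, 2]), 3)            # [1, 2, 1, 2, 1, 2]

print(joined)
print(inserted)
```

None of these functions modifies its input. Each returns a new array, so a and b are unchanged after the calls above.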