Working with NumPy Arrays | Mastering Numerical Computing with NumPy

This section will guide you through the creation and manipulation of numerical data with NumPy. Let's start by creating a NumPy array from the list:

In [17]: my_list = [2, 14, 6, 8]
         my_array = np.asarray(my_list)
         type(my_array)
Out[17]: numpy.ndarray

Let's do some addition, subtraction, multiplication, and division with scalar values:

In [18]: my_array + 2
Out[18]: array([ 4, 16, 8, 10])
In [19]: my_array - 1
Out[19]: array([ 1, 13, 5, 7])
In [20]: my_array * 2
Out[20]: array([ 4, 28, 12, 16, 8])
In [21]: my_array / 2
Out[21]: array([ 1. , 7. , 3. , 4. ])

It's much harder to do the same operations in a list because the list does not support vectorized operations and you need to iterate its elements. There are many ways to create NumPy arrays, and now you will use one of these methods to create an array which is full of zeros. Later, you will perform some arithmetic operations to see how NumPy behaves in element-wise operations between two arrays:

In [22]: second_array = np.zeros(4) + 3
         second_array
Out[22]: array([ 3., 3., 3., 3.])
In [23]: my_array - second_array
Out[23]: array([ -1., 11., 3., 5.])
In [24]: second_array / my_array
Out[24]: array([ 1.5 , 0.21428571, 0.5 , 0.375 ])

As we did in the previous code, you can create an array which is full of ones with np.ones or an identity array with np.identity and do the same algebraic operations that you did previously:

In [25]: second_array = np.ones(4) + 3
         second_array
Out[25]: array([ 4., 4., 4., 4.])
In [26]: my_array - second_array
Out[26]: array([ -2., 10., 2., 4.])
In [27]: second_array / my_array
Out[27]: array([ 2. , 0.28571429, 0.66666667, 0.5 ])

It works as expected with the np.ones method, but when you use the identity matrix, the calculation returns a (4,4) array as follows:

In [28]: second_array = np.identity(4)
         second_array
Out[28]: array([[ 1., 0., 0., 0.],
                [ 0., 1., 0., 0.],
                [ 0., 0., 1., 0.],
                [ 0., 0., 0., 1.]])
In [29]: second_array = np.identity(4) + 3
         second_array
Out[29]: array([[ 4., 3., 3., 3.],
                [ 3., 4., 3., 3.],
                [ 3., 3., 4., 3.],
                [ 3., 3., 3., 4.]])
In [30]: my_array - second_array
Out[30]: array([[ -2., 11., 3., 5.],
                [ -1., 10., 3., 5.],
                [ -1., 11., 2., 5.],
                [ -1., 11., 3., 4.]])

What this does is subtract the first element of my_array from all of the elements of the first column of second_array and the second_element of the second column, and so on. The same rule is applied to division as well. Please keep in mind that you can successfully do array operations even if they are not exactly the same shape. Later in this chapter, you will learn about broadcasting errors when computation cannot be done between two arrays due to differences in their shapes:

In [31]: second_array / my_array
Out[31]: array([[ 2.  , 0.21428571, 0.5       , 0.375      ],
                [ 1.5 , 0.28571429, 0.5       , 0.375      ],
                [ 1.5 , 0.21428571, 0.66666667, 0.375      ],
                [ 1.5 , 0.21428571, 0.5       , 0.5        ]])

One of the most useful methods in creating NumPy arrays is arange. This returns an array for a given interval between your start and end values. The first argument is the start value of your array, the second is the end value (where it stops creating values), and the third one is the interval. Optionally, you can define your dtype as the fourth argument. The default interval values are 1:

In [32]: x = np.arange(3,7,0.5)
         x
Out[32]: array([ 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5])

There is another way to create an array with fixed intervals between the start and stop point when you cannot decide what the interval should be, but you should know how many splits your array should have:

In [33]: x = np.linspace(1.2, 40.5, num=20)
         x
Out[33]: array([ 1.2        , 3.26842105,  5.33684211,  7.40526316,   9.47368421,
                 11.54210526, 13.61052632, 15.67894737, 17.74736842, 19.81578947,
                 21.88421053, 23.95263158, 26.02105263, 28.08947368, 30.15789474,
                 32.22631579, 34.29473684, 36.36315789, 38.43157895, 40.5       ])

There are two different methods which are similar in usage but return different sequences of numbers because their base scale is different. This means that the distribution of the numbers will be different as well. The first one is geomspace, which returns numbers on a logarithmic scale with a geometric progression:

In [34]: np.geomspace(1, 625, num=5)
Out[34]: array([ 1., 5., 25., 125., 625.])

The other important method is logspace, where you can return the values for your start and stop points, which are evenly scaled in:

In [35]: np.logspace(3, 4, num=5)
Out[35]: array([ 1000. , 1778.27941004, 3162.27766017, 5623.4132519 ,
                10000. ])

What are these arguments? If the starting point is 3 and the ending point is 4, then these functions return the numbers which are much higher than the initial range. Your starting point is actually set as default to 10**Start Argument and the ending is set as 10**End Argument. So technically, in this example, the starting point is 10**3 and the ending point is 10**4. You can avoid such situations and keep your start and end points the same as when you put them as arguments in the method. The trick is to use base 10 logarithms of the given arguments:

In [36]: np.logspace(np.log10(3) , np.log10(4) , num=5)
Out[36]: array([ 3. , 3.2237098 , 3.46410162, 3.72241944, 4. ])

By now, you should be familiar with different ways of creating arrays with different distributions. You have also learned how to do some basic operations with them. Let's continue with other useful functions that you will definitely use in your day to day work. Most of the time, you will have to work with multiple arrays and you will need to compare them very quickly. NumPy has a great solution for this problem; you can compare the arrays as you would compare two integers:

In [37]: x = np.array([1,2,3,4])
         y = np.array([1,3,4,4])
         x == y
Out[37]: array([ True, False, False, True], dtype=bool)

The comparison is done element-wise and it returns a Boolean vector, whether elements are matching in two different arrays or not. This method works well in small size arrays and also gives you more details. You can see from the output array where the values are represented as False, that these indexed values are not matching in these two arrays. If you have a large array, you may also choose to get a single answer to your question, whether the elements are matching in two different arrays or not:

In [38]: x = np.array([1,2,3,4])
         y = np.array([1,3,4,4])
         np.array_equal(x,y)
Out[38]: False

Here, you have a single Boolean output. You only know that arrays are not equal, but you don't know which elements exactly are not equal. The comparison is not only limited to checking whether two arrays are equal or not. You can also do element-wise higher- lower comparison between two arrays:

In [39]: x = np.array([1,2,3,4])
         y = np.array([1,3,4,4])
         x < y
Out[39]: array([False, True, True, False], dtype=bool)

When you need to do logical comparison (AND, OR, XOR), you can use them in your array as follows:

In [40]: x = np.array([0, 1, 0, 0], dtype=bool)
         y = np.array([1, 1, 0, 1], dtype=bool)
         np.logical_or(x,y)
Out[40]: array([ True, True, False, True], dtype=bool)
In [41]: np.logical_and(x,y)
Out[41]: array([False, True, False, False], dtype=bool)
In [42]: x = np.array([12,16,57,11])
         np.logical_or(x < 13, x > 50)
Out[42]: array([ True, False, True, True], dtype=bool)

So far, algebraic operations such as addition and multiplication have been covered. How can we use these operations with transcendental functions such as the exponential function, logarithms, or trigonometric functions?

In [43]: x = np.array([1, 2, 3,4 ])
         np.exp(x)
Out[43]: array([ 2.71828183, 7.3890561 , 20.08553692, 54.59815003])
In [44]: np.log(x)
Out[44]: array([ 0. , 0.69314718, 1.09861229, 1.38629436])
In [45]: np.sin(x)
Out[45]: array([ 0.84147098, 0.90929743, 0.14112001, -0.7568025 ])

What about the transpose of a matrix? First, you will use the reshape function with arange to set the desired shape of the matrix:

In [46]: x = np.arange(9)
         x
Out[46]: array([0, 1, 2, 3, 4, 5, 6, 7, 8])
In [47]: x = np.arange(9).reshape((3, 3))
         x
Out[47]: array([[0, 1, 2],
                [3, 4, 5],
                [6, 7, 8]])
In [48]: x.T
Out[48]: array([[0, 3, 6],
                [1, 4, 7],
                [2, 5, 8]])

You transpose the 3*3 array, so the shape doesn't change because both dimensions are 3. Let's see what happens when you don't have a square array:

In [49]: x = np.arange(6).reshape(2,3)
         x
Out[49]: array([[0, 1, 2],
                [3, 4, 5]])
In [50]: x.T
Out[50]: array([[0, 3],
                [1, 4],
                [2, 5]])

The transpose works as expected and the dimensions are switched as well. You can also get summary statistics from arrays such as mean, median, and standard deviation. Let's start with methods that NumPy offers for calculating basic statistics:

Method	Description
`np.sum`	Returns the sum of all array values or along the specified axis
`np.amin`	Returns the minimum value of all arrays or along the specified axis
`np.amax`	Returns the maximum value of all arrays or along the specified axis
`np.percentile`	Returns the given q^th percentile of all arrays or along the specified axis
`np.nanmin`	The same as `np.amin`, but ignores NaN values in an array
`np.nanmax`	The same as `np.amax`, but ignores NaN values in an array
`np.nanpercentile`	The same as `np.percentile`, but ignores NaN values in an array

The following code block gives an example of the preceding statistical methods of NumPy. These methods are very useful as you can operate the methods in a whole array or axis-wise according to your needs. You should note that you can find more fully-featured and better implementations of these methods in SciPy as it uses NumPy multidimensional arrays as a data structure:

In [51]: x = np.arange(9).reshape((3,3))
         x
Out[51]: array([[0, 1, 2],
                [3, 4, 5],
                [6, 7, 8]])
In [52]: np.sum(x)
Out[52]: 36
In [53]: np.amin(x)
Out[53]: 0
In [54]: np.amax(x)
Out[54]: 8
In [55]: np.amin(x, axis=0)
Out[55]: array([0, 1, 2])
In [56]: np.amin(x, axis=1)
Out[56]: array([0, 3, 6])
In [57]: np.percentile(x, 80)
Out[57]: 6.4000000000000004

The axis argument determines the dimension that this function will operate on. In this example, axis=0 represents the first axis which is the x axis, and axis = 1 represents the second axis which is y. When we use a regular amin(x), we return a single value because it calculates the minimum value in all arrays, but when we specify the axis, it starts evaluating the function axis-wise and returns an array which shows the results for each row or column. Imagine you have a large array; you find the max value by using amax, but what will happen if you need to pass the index of this value to another function? In such cases, argmin and argmax come to the rescue, as shown in the following snippet:

In [58]: x = np.array([1,-21,3,-3])
         np.argmax(x)
Out[58]: 2
In [59]: np.argmin(x)
Out[59]: 1

Let's continue with more statistical functions:

Method	Description
`np.mean`	Returns the mean of all array values or along the specific axis
`np.median`	Returns the median of all array values or along the specific axis
`np.std`	Returns the standard deviation of all array values or along the specific axis
`np.nanmean`	The same as `np.mean`, but ignores NaN values in an array
`np.nanmedian`	The same as `np.nanmedian`, but ignores NaN values in an array
`np.nonstd`	The same as `np.nanstd`, but ignores NaN values in an array

The following code gives more examples of the preceding statistical methods of NumPy. These methods are heavily used in data discovery phases, where you analyze your data features and distribution:

In [60]: x = np.array([[2, 3, 5], [20, 12, 4]])
         x
Out[60]: array([[ 2, 3, 5],
                [20, 12, 4]])
In [61]: np.mean(x)
Out[61]: 7.666666666666667
In [62]: np.mean(x, axis=0)
Out[62]: array([ 11. , 7.5, 4.5])
In [63]: np.mean(x, axis=1)
Out[63]: array([ 3.33333333, 12. ])
In [64]: np.median(x)
Out[64]: 4.5
In [65]: np.std(x)
Out[65]: 6.3944420310836261

Mastering Numerical Computing with NumPy

By : Mert Cakmak, Tiago Antao, Cuhadaroglu

Mastering Numerical Computing with NumPy

By: Mert Cakmak, Tiago Antao, Cuhadaroglu

Overview of this book

NumPy array operations

Mastering Numerical Computing with NumPy

By : Mert Cakmak, Tiago Antao, Cuhadaroglu

Mastering Numerical Computing with NumPy

By: Mert Cakmak, Tiago Antao, Cuhadaroglu

Overview of this book

Confirmation

Buy this book with your credits?

Submit Your Feedback

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access