Mastering Python for Data Science

By: Samir Madhavan

The world of arrays with NumPy


Python comes with built-in data structures, such as the list, that can be used for array operations. However, a Python list on its own is not suitable for heavy mathematical operations, as it is not optimized for them.

NumPy is a Python package, created by Travis Oliphant, that is designed primarily for scientific computing. It provides support for large multidimensional arrays and matrices, along with a large library of high-level mathematical functions that operate on these arrays.

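A NumPy array typically requires much less memory than a Python list to store the same amount of numeric data, because its elements share a single data type and sit in one contiguous block of memory. This compact layout also helps make reading from and writing to the array faster.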
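As a rough illustration of the memory claim above, the following sketch compares the approximate footprint of a Python list of integers with the data buffer of a NumPy array holding the same values. The helper name list_footprint and the element count n are assumptions made for this example, and the exact byte counts vary with the platform and the Python version:

```python
import sys

import numpy as np


def list_footprint(values):
    """Approximate bytes used by a list: the list object plus each element object."""
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)


n = 1000  # arbitrary size chosen for illustration
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)  # explicit dtype: 8 bytes per element

list_bytes = list_footprint(py_list)
array_bytes = np_array.nbytes  # size of the raw data buffer only

print("list :", list_bytes, "bytes (approximate)")
print("array:", array_bytes, "bytes")
```

Note that nbytes counts only the element data, not the small fixed overhead of the array object itself, so this is an estimate rather than a precise measurement.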

Creating an array

A list of numbers (or a nested list, for a multidimensional array) can be passed to the array function to create a NumPy array object:

>>> import numpy as np

>>> n_array = np.array([[0, 1, 2, 3],
...                     [4, 5, 6, 7],
...                     [8, 9, 10, 11]])

A NumPy array object has a number of attributes that describe the array. Here are the important ones:

  • ndim: This gives the number of dimensions of the array. The following shows that the array we defined has two dimensions:

    >>> n_array.ndim
    2

    n_array has two dimensions (a rank of 2), which makes it a 2D array.

  • shape: This gives the size of each dimension of the array:

    >>> n_array.shape
    (3, 4)
    

    The first dimension of n_array has a size of 3 and the second dimension has a size of 4. This can also be visualized as three rows and four columns.

  • size: This gives the number of elements:

    >>> n_array.size
    12
    

    The total number of elements in n_array is 12.

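  • dtype: This gives the datatype of the elements in the array:

    >>> n_array.dtype.name
    'int64'

    The elements of n_array are stored as int64. This is the default integer type on most 64-bit platforms; on some platforms and NumPy versions, the default may be int32 instead.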
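The attributes listed above are related to one another, which the following minimal sketch tries to make explicit. The dtype is set explicitly here (an assumption of this example, not something the original array does) so that the result does not depend on the platform default:

```python
import numpy as np

# Explicit dtype so the example behaves the same on every platform
n_array = np.array([[0, 1, 2, 3],
                    [4, 5, 6, 7],
                    [8, 9, 10, 11]], dtype=np.int64)

rows, cols = n_array.shape

# ndim is the length of the shape tuple
ndim_matches = n_array.ndim == len(n_array.shape)

# size is the product of the sizes of all dimensions
size_matches = n_array.size == rows * cols

# itemsize is the number of bytes per element; nbytes is the total for the data
bytes_match = n_array.nbytes == n_array.size * n_array.itemsize

print(n_array.ndim, n_array.shape, n_array.size, n_array.dtype.name)
print(ndim_matches, size_matches, bytes_match)
```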

Mathematical operations

Once you have an array of data, you will usually want to perform mathematical operations on it. The following sections discuss a few of the important ones.

Array subtraction

The following commands subtract the b array from the a array to get the resultant c array. The subtraction happens element by element:

>>> a = np.array([11, 12, 13, 14])
>>> b = np.array([1, 2, 3, 4])
>>> c = a - b
>>> c
array([10, 10, 10, 10])

Do note that when you subtract two arrays, they should have the same shape, or shapes that NumPy can broadcast to a common shape (for example, an array and a single number).
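The shape requirement can be checked with a short sketch such as the following. The names d, short, and mismatch_raised are introduced only for this illustration:

```python
import numpy as np

a = np.array([11, 12, 13, 14])
b = np.array([1, 2, 3, 4])

# Same shape (4,): the subtraction is applied element by element
c = a - b

# A single number is applied to every element of the array
d = a - 10

# Shapes (4,) and (3,) cannot be combined, so NumPy raises a ValueError
short = np.array([1, 2, 3])
try:
    a - short
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(c, d, mismatch_raised)
```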

Squaring an array

The following command raises each element to the power of 2 to obtain this result:

>>> b**2
array([ 1,  4,  9, 16])

A trigonometric function performed on the array

The following command applies cosine to each of the values in the b array to obtain the following result:

>>> np.cos(b)
array([ 0.54030231, -0.41614684, -0.9899925 , -0.65364362])

Conditional operations

The following command will apply a conditional operation to each of the elements of the b array, in order to generate the respective Boolean values:

>>> b<2
array([ True, False, False, False])
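Squaring, cosine, and comparison are all applied to each element independently. The following sketch, offered only as an illustration, checks each array result against an equivalent plain Python loop; the variable names are assumptions of this example:

```python
import math

import numpy as np

b = np.array([1, 2, 3, 4])

# Vectorized operations: one expression acts on every element
squares = b ** 2
cosines = np.cos(b)
mask = b < 2

# The same computations written as explicit Python loops
values = b.tolist()
squares_loop = [x ** 2 for x in values]
cosines_loop = [math.cos(x) for x in values]
mask_loop = [x < 2 for x in values]

print(squares.tolist() == squares_loop)
print(np.allclose(cosines, cosines_loop))
print(mask.tolist() == mask_loop)
```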

Matrix multiplication

Two matrices can be multiplied element by element or as a dot (matrix) product. The following commands perform the element-by-element multiplication:

>>> A1 = np.array([[1, 1],
...                [0, 1]])

>>> A2 = np.array([[2, 0],
...                [3, 4]])

>>> A1 * A2
array([[2, 0],
       [0, 4]])

The dot product can be performed with the following command:

>>> np.dot(A1, A2)
array([[5, 4],
       [3, 4]])
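To make the difference between the two kinds of multiplication concrete, the following sketch computes the dot product by hand, with each entry formed from a row of A1 and a column of A2, and compares it with np.dot. The manual loop is written only for illustration and is not how NumPy computes the result internally:

```python
import numpy as np

A1 = np.array([[1, 1],
               [0, 1]])
A2 = np.array([[2, 0],
               [3, 4]])

# Element by element: entry (i, j) is A1[i, j] * A2[i, j]
elementwise = A1 * A2

# Dot product: entry (i, j) is the sum over k of A1[i, k] * A2[k, j]
matrix_product = np.dot(A1, A2)

# The same dot product written out as explicit loops
manual = np.zeros((2, 2), dtype=A1.dtype)
for i in range(2):
    for j in range(2):
        manual[i, j] = sum(A1[i, k] * A2[k, j] for k in range(2))

print(elementwise)
print(matrix_product)
print(np.array_equal(manual, matrix_product))
```

The @ operator, as in A1 @ A2, gives the same result as np.dot for 2D arrays like these.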

Indexing and slicing

If you want to select a particular element of an array, it can be achieved using indexes:

>>> n_array[0,1]
1

The preceding command selects the first row (index 0) and then the second value (index 1) in that row. It can also be seen as the intersection of the first row and the second column of the matrix.

If a range of values has to be selected on a row, then we can use the following command:

>>> n_array[0, 0:3]
array([0, 1, 2])

The 0:3 slice selects the first three values of the first row, that is, indexes 0, 1, and 2; the end index 3 is excluded.

The whole row of values can be selected with the following command:

>>> n_array[0, :]
array([0, 1, 2, 3])

An entire column of values can be selected with the following command:

>>> n_array[:, 1]
array([1, 5, 9])
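The selections above can be gathered into one short script. The array is rebuilt here with np.arange so that the example is self-contained, and the final block selection, which slices rows and columns at the same time, is an extra illustration that is not part of the original examples:

```python
import numpy as np

# Same values as n_array in the text: 0 to 11 arranged in 3 rows and 4 columns
n_array = np.arange(12).reshape(3, 4)

element = n_array[0, 1]          # first row, second column
first_three = n_array[0, 0:3]    # first row, columns 0, 1, and 2
first_row = n_array[0, :]        # every column of the first row
second_column = n_array[:, 1]    # every row of the second column
block = n_array[0:2, 1:3]        # rows 0 and 1, columns 1 and 2

print(element)
print(first_three, first_row, second_column)
print(block)
```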

Shape manipulation

Once the array has been created, we can change its shape too. The following command flattens the array into one dimension:

>>> n_array.ravel()
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

The following command reshapes the array into six rows and two columns by assigning to its shape attribute, which changes the array in place. Also, note that when reshaping, the new shape must have the same number of elements as the previous one:

>>> n_array.shape = (6,2)
>>> n_array
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])

The array can be transposed too:

>>> n_array.transpose()
array([[ 0,  2,  4,  6,  8, 10],
       [ 1,  3,  5,  7,  9, 11]])
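As an alternative to assigning to the shape attribute, the following sketch uses the reshape method, which returns a reshaped array and leaves the original shape untouched. It also shows what happens when the element counts do not match. The variable names are assumptions made for this example:

```python
import numpy as np

n_array = np.arange(12).reshape(3, 4)  # values 0 to 11 in 3 rows and 4 columns

flat = n_array.ravel()               # one dimension, 12 elements
reshaped = n_array.reshape(6, 2)     # 6 rows, 2 columns; n_array keeps shape (3, 4)
transposed = reshaped.transpose()    # rows and columns swapped: shape (2, 6)

# 5 * 2 = 10 elements does not match the 12 in n_array, so this fails
try:
    n_array.reshape(5, 2)
    bad_shape_raised = False
except ValueError:
    bad_shape_raised = True

print(flat.shape, reshaped.shape, transposed.shape, n_array.shape)
print(bad_shape_raised)
```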