As mentioned in the preceding section, what makes NumPy special is its use of multidimensional arrays called **ndarrays**. All items of an `ndarray` are homogeneous and occupy the same amount of memory. Let's start by importing NumPy and analyzing the structure of a NumPy array object. You can import the library by typing the following statement into your console. You can use any alias instead of `np`, but in this book, `np` will be used, as it's the standard convention. Let's create a simple array and look at the metadata that NumPy keeps behind the scenes about the created array, its so-called **attributes:**

In [2]: import numpy as np

x = np.array([[1,2,3],[4,5,6]])

x

Out[2]: array([[1, 2, 3],[4, 5, 6]])

In [3]: print("We just created a", type(x))

Out[3]: We just created a <class 'numpy.ndarray'>

In [4]: print("Our array has shape", x.shape)

Out[4]: Our array has shape (2, 3)

In [5]: print("Total size is",x.size)

Out[5]: Total size is 6

In [6]: print("The dimension of our array is " ,x.ndim)

Out[6]: The dimension of our array is 2

In [7]: print("Data type of the elements is", x.dtype)

Out[7]: Data type of the elements is int32

In [8]: print("It consumes",x.nbytes,"bytes")

Out[8]: It consumes 24 bytes
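The byte count above follows directly from two other attributes: `nbytes` is simply the element count (`size`) multiplied by the size of one element in bytes (`itemsize`). A minimal sketch of that relationship, using only standard `ndarray` attributes (note that the default integer dtype, and therefore the itemsize, is platform dependent, so it may be 4 or 8 on your machine):

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

# nbytes is not stored independently; it is the element count times
# the per-element size in bytes (the itemsize attribute):
print(x.itemsize)                       # 4 for int32, 8 for int64
print(x.size * x.itemsize)              # same value as x.nbytes
print(x.nbytes == x.size * x.itemsize)  # True
```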

As you can see, the type of our object is a NumPy array. `x.shape` returns a tuple that gives the dimensions of the array, such as *(n, m)*. You can get the total number of elements in an array by using `x.size`; in our example, we have six elements in total. Knowing attributes such as the *shape and dimension* of your array is very important: if you don't know its size and dimensions, it wouldn't be wise to start doing computations with it. In NumPy, you can use `x.ndim` to check how many dimensions your array has. There are other attributes, such as `dtype` and `nbytes`, which are very useful when you are checking memory consumption and deciding what kind of data type to use in the array. In our example, each element has the data type `int32`, and the array consumes 24 bytes in total. You can also force some of these attributes, such as `dtype`, while creating your array. Previously, the data type was an integer. Let's switch it to `float`, `complex`, or `uint` (unsigned integer). To see what changing the data type does, let's analyze the byte consumption of each, as follows:

In [9]: x = np.array([[1,2,3],[4,5,6]], dtype = np.float64)

print(x)

print(x.nbytes)

Out[9]: [[ 1. 2. 3.]

[ 4. 5. 6.]]

48

In [10]: x = np.array([[1,2,3],[4,5,6]], dtype = np.complex128)

print(x)

print(x.nbytes)

Out[10]: [[ 1.+0.j 2.+0.j 3.+0.j]

[ 4.+0.j 5.+0.j 6.+0.j]]

96

In [11]: x = np.array([[1,2,3],[4,-5,6]], dtype = np.uint32)

print(x)

print(x.nbytes)

Out[11]: [[ 1 2 3]

[ 4 4294967291 6]]

24
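The surprising `4294967291` in the last output is not random: casting a negative number to an unsigned 32-bit type wraps it modulo 2**32, so -5 becomes 2**32 - 5. A quick check of that arithmetic (this sketch uses an explicit `astype` cast, which wraps reliably; passing a negative Python integer directly into a `uint` array can raise an error in recent NumPy releases):

```python
import numpy as np

# Casting -5 to uint32 wraps modulo 2**32: 2**32 - 5 = 4294967291
wrapped = np.array([-5], dtype=np.int64).astype(np.uint32)
print(wrapped[0])                    # 4294967291
print(int(wrapped[0]) == 2**32 - 5)  # True
```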

As you can see, each type consumes a different number of bytes. Imagine you have the following matrix and are deciding between `int64` and `int32` as the data type:

In [12]: x = np.array([[1,2,3],[4,5,6]], dtype = np.int64)

print("int64 consumes",x.nbytes, "bytes")

x = np.array([[1,2,3],[4,5,6]], dtype = np.int32)

print("int32 consumes",x.nbytes, "bytes")

Out[12]: int64 consumes 48 bytes

int32 consumes 24 bytes

The memory requirement doubles if you use `int64`. Ask yourself this question: which data type would suffice? As long as your numbers are between -2,147,483,648 and 2,147,483,647, `int32` is enough. Imagine you have a huge array with a size of over 100 MB. In such cases, this conversion plays a crucial role in performance.
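Rather than memorizing these bounds, you can ask NumPy for them. A short sketch using the standard `np.iinfo` helper, which reports the exact representable range of any integer dtype:

```python
import numpy as np

# np.iinfo reports the representable range of an integer dtype,
# which is the quickest way to decide whether a smaller type suffices.
print(np.iinfo(np.int32).min, np.iinfo(np.int32).max)
# -2147483648 2147483647
print(np.iinfo(np.int64).min, np.iinfo(np.int64).max)
# -9223372036854775808 9223372036854775807
```

The floating-point counterpart, `np.finfo`, works the same way for `float32` and `float64`.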

As you may have noticed in the previous examples, every time you changed the data type, you created a new array. Technically, you cannot change the `dtype` of an array after it has been created. What you can do, however, is either create the array again or copy the existing one with a new `dtype`, using the `astype` method. Let's first create a copy of the array with a new `dtype`. Here is an example of how you can also change your `dtype` with the `astype` method:

In [13]: x_copy = np.array(x, dtype = np.float64)

x_copy

Out[13]: array([[ 1., 2., 3.],

[ 4., 5., 6.]])

In [14]: x_copy_int = x_copy.astype(int)

x_copy_int

Out[14]: array([[1, 2, 3],

[4, 5, 6]])

Please keep in mind that the `astype` method doesn't change the `dtype` of `x_copy`, even though you called it on `x_copy`. It leaves `x_copy` intact and returns a new array, `x_copy_int`:

In [15]: x_copy

Out[15]: array([[ 1., 2., 3.],

[ 4., 5., 6.]])
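To make the copy semantics concrete, here is a minimal sketch. `np.shares_memory` is a standard NumPy helper that checks whether two arrays overlap in memory, which confirms that `astype` produced a genuine copy:

```python
import numpy as np

x_copy = np.array([[1., 2., 3.], [4., 5., 6.]])
x_copy_int = x_copy.astype(int)

# astype always returns a new array; the source array keeps its dtype.
print(x_copy.dtype)      # float64
print(x_copy_int.dtype)  # platform default integer (int32 or int64)

# The two arrays do not share any memory: a real copy was made.
print(np.shares_memory(x_copy, x_copy_int))  # False
```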

Let's imagine a case where you are working in a research group that tries to identify and calculate the risk for individual cancer patients. You have 100,000 records (rows), where each row represents a single patient, and each patient has 100 features (the results of certain tests). As a result, you have a (100000, 100) array:

In [16]: Data_Cancer = np.random.rand(100000,100)

print(type(Data_Cancer))

print(Data_Cancer.dtype)

print(Data_Cancer.nbytes)

Data_Cancer_New = np.array(Data_Cancer, dtype = np.float32)

print(Data_Cancer_New.nbytes)

Out[16]: <class 'numpy.ndarray'>

float64

80000000

40000000

As you can see from the preceding code, the size decreases from 80 MB to 40 MB just by changing the `dtype`. What you give up in return is precision after the decimal point: instead of being precise to roughly 16 decimal digits, you keep only about 7. In some machine learning algorithms, this loss of precision is negligible. In such cases, you should feel free to adjust your `dtype` so that it minimizes your memory usage.
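You can query how much precision each float type carries with `np.finfo`, the floating-point counterpart of `np.iinfo`. A quick sketch, including a small demonstration of the digits lost in the downcast:

```python
import numpy as np

# np.finfo reports the approximate decimal precision of a float dtype:
print(np.finfo(np.float32).precision)  # 6 decimal digits
print(np.finfo(np.float64).precision)  # 15 decimal digits

# The downcast visibly truncates a value's trailing digits:
x64 = np.float64(0.12345678901234567)
x32 = np.float32(x64)
print(x64)  # keeps roughly 16 significant digits
print(x32)  # keeps only about 7 significant digits
```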