Sometimes, we need to deal with NumPy arrays that are too big to fit in the system memory. A common solution is to use memory mapping and implement out-of-core computations. The array is stored in a file on the hard drive, and we create a memory-mapped object to this file that can be used as a regular NumPy array. Accessing a portion of the array results in the corresponding data being automatically fetched from the hard drive. Therefore, we only consume what we use.
Let's create a memory-mapped array:
In [1]: import numpy as np
In [2]: nrows, ncols = 1000000, 100
In [3]: f = np.memmap('memmapped.dat', dtype=np.float32,
                      mode='w+', shape=(nrows, ncols))
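One way to convince ourselves that the array really lives on disk is to check the size of the backing file: with `mode='w+'`, NumPy creates the file at its full size right away. Here is a minimal sketch of that check, using smaller dimensions and a hypothetical filename `memmapped_demo.dat` to keep the demo light:

```python
import os
import numpy as np

# Smaller dimensions than in the text, just for illustration.
nrows, ncols = 1000, 100
f = np.memmap('memmapped_demo.dat', dtype=np.float32,
              mode='w+', shape=(nrows, ncols))

# The backing file is created at its full size immediately:
# nrows * ncols * 4 bytes, since float32 takes 4 bytes per element.
size = os.path.getsize('memmapped_demo.dat')
print(size == nrows * ncols * 4)  # → True
```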
Let's fill the array with random values, one column at a time, since our system's memory is limited:
In [4]: for i in range(ncols):
   ...:     f[:, i] = np.random.rand(nrows)
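The same column-at-a-time pattern works for out-of-core computations, not just for writing. As a sketch (with smaller, assumed dimensions and a hypothetical file `reduce_demo.dat`), here is a column-wise reduction in which the full array is never materialized in RAM at once:

```python
import numpy as np

# Smaller dimensions than in the text, for a quick demo.
nrows, ncols = 100000, 10
f = np.memmap('reduce_demo.dat', dtype=np.float32,
              mode='w+', shape=(nrows, ncols))
for i in range(ncols):
    f[:, i] = np.random.rand(nrows)

# Out-of-core reduction: aggregate one column at a time, so only
# a small slice of the data needs to be in memory at any moment.
means = np.array([f[:, i].mean() for i in range(ncols)])
print(means.round(2))  # each mean is close to 0.5
```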
We save the last column of the array:
In [5]: x = f[:,-1]
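One subtlety worth knowing: basic slicing of a memmap returns another memmap object, that is, a view still backed by the file rather than an in-memory copy. If an ordinary in-memory array is wanted, an explicit copy with `np.array()` does the trick. A small sketch (using a hypothetical file `slice_demo.dat`):

```python
import numpy as np

f = np.memmap('slice_demo.dat', dtype=np.float32, mode='w+', shape=(8, 4))
f[:] = 0.0

col = f[:, -1]          # basic slicing yields another memmap: still a view
print(isinstance(col, np.memmap))   # → True

x = np.array(f[:, -1])  # explicit copy into an ordinary in-memory ndarray
print(isinstance(x, np.memmap))     # → False
```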
Now, we flush memory changes...
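As a sketch of what flushing looks like (with a hypothetical file `flush_demo.dat`), `flush()` writes pending changes through to disk, and deleting the memmap closes the underlying file; the data can then be reopened read-only:

```python
import numpy as np

a = np.memmap('flush_demo.dat', dtype=np.float32, mode='w+', shape=(10,))
a[:] = 1.0
a.flush()   # write any pending changes through to the file on disk
del a       # deleting the memmap also closes the underlying file

# Reopen read-only: the flushed values are there.
b = np.memmap('flush_demo.dat', dtype=np.float32, mode='r', shape=(10,))
print(b[0])  # → 1.0
```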