Both PCA and Kernel PCA use low-rank matrix approximation to estimate the principal components. A low-rank matrix approximation minimizes a cost function that measures the fit between a given matrix and its approximation.
Such a method can be very costly for big datasets. Randomizing the singular value decomposition of the input dataset speeds up the estimation significantly.
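To make the idea concrete, here is a minimal NumPy-only sketch of a randomized singular value decomposition, compared with the exact truncated SVD on synthetic data. The function name randomized_svd_sketch, its parameters (n_oversamples, n_iter, seed), and the test matrix are assumptions made for illustration; this is not the code Scikit runs internally.

```python
import numpy as np


def randomized_svd_sketch(a, k, n_oversamples=10, n_iter=4, seed=0):
    """Approximate the top-k SVD of `a` using a random projection.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)

    # project the columns of `a` onto a small random subspace
    omega = rng.standard_normal((a.shape[1], k + n_oversamples))
    q, _ = np.linalg.qr(a @ omega)

    # a few power iterations sharpen the estimate of the dominant subspace
    for _ in range(n_iter):
        z, _ = np.linalg.qr(a.T @ q)
        q, _ = np.linalg.qr(a @ z)

    # the exact SVD is computed only on the small projected matrix
    b = q.T @ a
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ u_small

    return u[:, :k], s[:k], vt[:k]


# synthetic data: a rank-5 signal plus a little noise (assumed example)
rng = np.random.default_rng(42)
a = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 50))
a += 0.01 * rng.standard_normal(a.shape)

k = 5

# randomized rank-k approximation
u, s, vt = randomized_svd_sketch(a, k)
approx = (u * s) @ vt

# exact rank-k approximation from the full SVD
u_f, s_f, vt_f = np.linalg.svd(a, full_matrices=False)
exact = (u_f[:, :k] * s_f[:k]) @ vt_f[:k]

# the cost function: Frobenius norm of the difference
err_rand = np.linalg.norm(a - approx)
err_exact = np.linalg.norm(a - exact)
print('randomized error:', err_rand)
print('exact error:     ', err_exact)
```

The exact truncated SVD gives the smallest possible error for a given rank, so the randomized error can only match it or be slightly larger. The saving comes from running the expensive decomposition on the small projected matrix b instead of on the full dataset.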
To execute this recipe, you will need NumPy, Scikit, and Matplotlib. No other prerequisites are required.
As before, we create a wrapper method to estimate our model (the reduce_randomizedPCA.py file):
def reduce_randomizedPCA(x):
    '''
        Reduce the dimensions using Randomized PCA algorithm
    '''
    # create the Randomized PCA object
    randomPCA = dc.RandomizedPCA(n_components=2, whiten=True,
        copy=False)

    # learn the principal components from all the features
    return randomPCA.fit(x...
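Note that RandomizedPCA was deprecated in Scikit 0.18 and removed in 0.20. On recent releases, the same behavior is available through the PCA class with svd_solver='randomized'. The following is a hedged sketch of an equivalent wrapper. The name reduce_randomized_pca, the random_state argument, and the synthetic input are assumptions for illustration and do not come from the recipe's source file. The copy=False flag is left out so that the input array is not modified in place.

```python
import numpy as np
from sklearn.decomposition import PCA


def reduce_randomized_pca(x, n_components=2, random_state=0):
    '''
        Reduce the dimensions using PCA with the
        randomized SVD solver (hypothetical wrapper)
    '''
    # create the PCA object that uses the randomized solver
    random_pca = PCA(n_components=n_components, whiten=True,
        svd_solver='randomized', random_state=random_state)

    # learn the principal components from all the features
    return random_pca.fit(x)


# synthetic data used only to exercise the wrapper
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 10))

model = reduce_randomized_pca(x)

# project the data onto the two principal components
z = model.transform(x)
print(z.shape)
print(model.explained_variance_ratio_)
```

Because whiten=True, each column of the transformed data is rescaled to unit variance, which is often convenient when the reduced features are passed on to a classifier.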