Three different techniques for removing attributes are illustrated in the following sections. These are as follows:
Removing useless attributes, which employs simple statistical techniques to find attributes that carry little or no information.
Weighting, which determines how much influence or weight an individual attribute has on the label. The assumption in this case is that the data is being used for a classification problem and that removing attributes will speed up the modeling process, usually at the cost of some accuracy.
Model-based, which uses a classification model to determine the most predictive attributes of the label. As with weighting, the assumption is that the data is being used for classification.
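To make the weighting idea concrete, the following is a minimal Python sketch and not the RapidMiner implementation. It assumes a numeric, binary label and uses the absolute Pearson correlation as the weight; the function names, the attribute names, and the toy data are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant, because a constant
    attribute cannot vary with the label.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def weight_attributes(columns, label):
    """Map each attribute name to its weight: |correlation with the label|."""
    return {name: abs(pearson(values, label)) for name, values in columns.items()}


def select_top_k(weights, k):
    """Return the names of the k attributes with the largest weights."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:k]


# Hypothetical toy document vector: one list of values per attribute.
columns = {
    "word_a": [1, 0, 1, 0, 1, 0],  # follows the label exactly
    "word_b": [1, 1, 0, 0, 1, 0],  # follows the label only partly
    "word_c": [2, 2, 2, 2, 2, 2],  # constant, so it gets weight 0.0
}
label = [1, 0, 1, 0, 1, 0]

weights = weight_attributes(columns, label)
top = select_top_k(weights, 2)
print(top)
```

Keeping only the highest-weighted attributes gives the modeling step fewer columns to process, which is where the speed-up comes from. Other weighting schemes could be swapped in for the correlation without changing the overall shape of the sketch.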
The Remove Useless Attributes operator is well named, but it is worth understanding how it works to ensure that useful attributes are not accidentally removed.
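The general idea behind this kind of statistical filtering can be sketched in Python. This is an illustration under stated assumptions, not the operator's actual code: a numeric attribute is treated as useless when its population standard deviation is at or below a threshold, and a nominal attribute when its most frequent value covers at least a given share of the examples. The parameter names `min_deviation` and `useless_above`, the attribute names, and the toy data are hypothetical.

```python
from collections import Counter
import statistics


def is_useless_numeric(values, min_deviation=0.0):
    """True when the attribute barely varies (std. deviation <= threshold)."""
    return statistics.pstdev(values) <= min_deviation


def is_useless_nominal(values, useless_above=1.0):
    """True when the most frequent value covers at least the given share."""
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values) >= useless_above


def remove_useless(columns, min_deviation=0.0, useless_above=1.0):
    """Return a copy of columns without the attributes judged useless."""
    kept = {}
    for name, values in columns.items():
        if all(isinstance(v, (int, float)) for v in values):
            useless = is_useless_numeric(values, min_deviation)
        else:
            useless = is_useless_nominal(values, useless_above)
        if not useless:
            kept[name] = values
    return kept


# Hypothetical toy data: two numeric attributes and one nominal attribute.
columns = {
    "the": [0.1, 0.1, 0.1, 0.1],                # constant: always removed
    "rare": [0.0, 0.0, 0.0, 0.9],               # varies in one example only
    "source": ["blog", "blog", "blog", "news"],  # 75% of values are "blog"
}

default_kept = remove_useless(columns)
aggressive_kept = remove_useless(columns, min_deviation=0.5, useless_above=0.7)
print(list(default_kept), list(aggressive_kept))
```

With the default thresholds only the constant attribute is dropped. With the more aggressive thresholds every attribute is dropped, including `rare`, which might be exactly the attribute that distinguishes one class of document. This is the sense in which a careless threshold can remove useful attributes.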
The following screenshot shows the Statistics View for the first few attributes of a document vector containing 24,176 attributes (refer to the process, reduceLargeDocumentVector...