By now, you have learned how to perform several kinds of calculations that enrich a dataset. Another frequently used kind of operation does not enrich the data; instead, it discards or filters out unwanted information. That is the focus of this section.
Suppose you have a dataset and you want to keep only the rows that match a condition. To demonstrate how to implement this kind of filtering, we will read a file, build a list of the words found in it, and then filter out the nulls and unwanted words. We will split the exercise into two parts:
- In the first part, we will read the file and prepare the data for filtering
- In the second part, we will do the actual filtering
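
To make the two parts concrete before walking through them, here is a minimal Python sketch of the same idea. It is only an illustration under stated assumptions, not the implementation used in this exercise: the function names `prepare_words` and `filter_words`, the `UNWANTED` set, and the sample text are all hypothetical, and an in-memory string stands in for the contents of the file.

```python
import re

# Hypothetical list of unwanted words; the exercise does not prescribe one.
UNWANTED = {"the", "a", "an", "and", "of"}


def prepare_words(text):
    """Part 1: turn the text into one row per word.

    Empty tokens (for example, from blank lines) are kept as None so that
    there is something null to filter out in part 2.
    """
    rows = []
    for line in text.splitlines():
        for token in re.split(r"\W+", line):
            rows.append(token.lower() if token else None)
    return rows


def filter_words(rows, unwanted=UNWANTED):
    """Part 2: keep only the rows that match the condition.

    A row is kept when it is not null and not in the unwanted set.
    """
    return [w for w in rows if w is not None and w not in unwanted]


# Assumed sample input standing in for the contents of a file.
sample = "The quick brown fox\n\nand the lazy dog"

rows = prepare_words(sample)   # includes a None for the blank line
kept = filter_words(rows)
print(kept)  # ['quick', 'brown', 'fox', 'lazy', 'dog']
```

To run the sketch against a real file, the `sample` string could be replaced with text read through Python's built-in `open()`. Keeping preparation and filtering in separate functions mirrors the two-part split of the exercise.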