By now, you have learned how to perform several kinds of calculations that enrich a dataset. Another frequently used kind of operation does the opposite: rather than enriching the data, it discards or filters out unwanted information. That is the focus of this section.
Filtering data
Filtering rows upon conditions
Suppose you have a dataset and you want to keep only the rows that match a condition. To demonstrate how to implement this kind of filtering, we will read a file, build a list of the words found in it, and then filter out the nulls or unwanted words. We will split the exercise into two parts:
- In the first part, we will read the file and prepare the data for filtering
- In the...
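
To make the idea concrete outside of any particular tool, the exercise described above (read lines, build a list of words, then drop nulls and unwanted words) can be sketched in plain Python. This is only an illustrative sketch, not the method used in the exercise itself: the function names `read_words`, `keep_row`, and `filter_words`, the `UNWANTED` set, and the sample lines are all assumptions made for this example.

```python
import re

# Hypothetical list of unwanted words; the text does not say which words to drop.
UNWANTED = {"the", "a", "an", "and", "of"}


def read_words(lines):
    """Build one row per token; empty tokens become None to stand in for nulls."""
    rows = []
    for line in lines:
        # Split on anything that is not a letter or an apostrophe.
        for token in re.split(r"[^A-Za-z']+", line):
            rows.append(token.lower() if token else None)
    return rows


def keep_row(word):
    """The filter condition: the row is not null and not an unwanted word."""
    return word is not None and word not in UNWANTED


def filter_words(rows):
    """Keep only the rows that satisfy the condition."""
    return [word for word in rows if keep_row(word)]


# Sample input standing in for the lines of a file (assumed data).
sample = ["The quick brown fox", "", "jumps over the lazy dog."]
rows = read_words(sample)
kept = filter_words(rows)
print(kept)
```

To run the same sketch against a real file, pass an open file object instead of the sample list, for example `read_words(open(path, encoding="utf-8"))`, where `path` is whatever file you want to inspect. Keeping the condition in its own function (`keep_row`) makes it easy to change what counts as an unwanted row without touching the reading step.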