Filtering data
By now, you have learned how to perform several kinds of calculations that enrich a dataset. Another frequently used kind of operation does not enrich the data; instead, it discards or filters out unwanted information. That is the focus of this section.
Filtering rows upon conditions
Suppose you have a dataset and you only want to keep the rows that match a condition. To demonstrate how to implement this kind of filtering, we will read a file, build a list of the words found in it, and then filter out the nulls and unwanted words. We will split the exercise into two parts:
- In the first part, we will read the file and prepare the data for filtering
- In the second part, we will actually filter the data
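To make the two parts concrete before going through them step by step, here is a minimal sketch of the same idea in plain Python. It is only an illustration under stated assumptions, not the method used in the rest of this tutorial: the sample text, the list of unwanted words, and the function names extract_words and filter_words are all hypothetical, and an in-memory string stands in for the file.

```python
import re

# Hypothetical sample text standing in for the contents of a text file.
SAMPLE_TEXT = "The quick brown fox\n\njumps over the lazy dog, and the fox runs."

# Hypothetical set of words we consider unwanted.
UNWANTED = {"the", "and", "over"}


def extract_words(text):
    """Part 1: build one row per word; a line without words yields a null row."""
    rows = []
    for line in text.splitlines():
        words = re.findall(r"[A-Za-z']+", line)
        if not words:
            rows.append(None)  # null row for an empty line
        else:
            rows.extend(word.lower() for word in words)
    return rows


def filter_words(rows, unwanted):
    """Part 2: keep only the rows that are neither null nor unwanted."""
    return [word for word in rows if word is not None and word not in unwanted]


rows = extract_words(SAMPLE_TEXT)
kept = filter_words(rows, UNWANTED)
print(kept)
```

The condition in filter_words plays the role of the filter: every row is tested against it, and only the rows for which it is true are kept.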
Reading a file and getting the list of words found in it
Let's start by reading a sample file.
Note
Before starting, you'll need at least one text file to play with. The text file used in this tutorial is named smcng10.txt. Its content is about...