On this occasion, you have some plain text files and you want to know what they say. You don't want to read them, so you decide to count how many times each word appears in the text and look at the most frequent ones to get an idea of what the files are about. The first of our two tutorials on filtering is about counting the words in a file.
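To make the goal concrete before building anything, the idea of counting words and looking at the most frequent ones can be sketched in a few lines of Python. This is only an illustration of the expected result, not part of the transformation built in the tutorial; the function name count_words and the sample sentence are hypothetical.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercase word to its number of occurrences."""
    # Split the text into words: runs of letters and apostrophes, case-insensitive.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample text standing in for the contents of a file.
sample = "the cat and the hat and the bat"
counts = count_words(sample)

# The two most frequent words, with their counts.
top = counts.most_common(2)
print(top)  # [('the', 3), ('and', 2)]
```

The transformation in this tutorial aims at the same kind of result: a list of words with the number of times each one appears.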
Note
Before starting, you'll need at least one text file to play with. The text file used in this tutorial is named smcng10.txt, and is available for you to download from Packt Publishing's website, www.packtpub.com.
Let's work.
Tip
This section and the following ones have many steps, so feel free to preview the data from time to time. That way, you can make sure that you are on the right track and that you understand what filtering is about as you progress in the design of your transformation.
Create a new transformation.
By using a Text file input step, read your file. The trick here is to put as a Separator...