Sometimes you need to load data in a specific format and TextInputFormat is not a good fit for it. Spark provides two methods for this purpose:

sparkContext.hadoopFile: This supports the old MapReduce API

sparkContext.newAPIHadoopFile: This supports the new MapReduce API

These two methods provide support for all of Hadoop's built-in InputFormat interfaces as well as any custom InputFormat.
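As a sketch of the difference between the two calls, the snippet below loads the same text file with both the old-API and new-API variants of Hadoop's TextInputFormat. It assumes a live SparkContext named sc; the path argument is hypothetical:

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.SparkContext

def loadBothWays(sc: SparkContext, path: String) = {
  // Old MapReduce API: InputFormat from org.apache.hadoop.mapred
  val oldApi = sc.hadoopFile(
    path,
    classOf[org.apache.hadoop.mapred.TextInputFormat],
    classOf[LongWritable],
    classOf[Text])

  // New MapReduce API: InputFormat from org.apache.hadoop.mapreduce
  val newApi = sc.newAPIHadoopFile(
    path,
    classOf[org.apache.hadoop.mapreduce.lib.input.TextInputFormat],
    classOf[LongWritable],
    classOf[Text])

  (oldApi, newApi)
}
```

Both return an RDD of (key, value) pairs; the only difference is which Hadoop InputFormat package the class comes from.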
We are going to take text data in key-value format and load it into Spark using KeyValueTextInputFormat:
Create the currency directory by using the following command:

$ mkdir currency

Change the current directory to currency:

$ cd currency

Create the na.txt text file and enter currency values in key-value format delimited by tab (key: country, value: currency):

$ vi na.txt
United States of America	US Dollar
Canada	Canadian Dollar
Mexico	Peso
You can create more files for each continent.
Upload the currency folder to HDFS:

$ hdfs dfs -put...
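With the data in HDFS, a minimal sketch of the load in the Spark shell might look like the following. It assumes the shell's default sc, and the relative path "currency" (resolving to the user's HDFS home directory) is an assumption:

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat

// KeyValueTextInputFormat splits each line at the first tab:
// the key is the country, the value is the currency name.
val currencyRDD = sc.newAPIHadoopFile(
  "currency",                        // assumed HDFS path
  classOf[KeyValueTextInputFormat],
  classOf[Text],
  classOf[Text])

// Hadoop's Text objects are reused between records, so convert
// them to plain Strings before doing anything else with the RDD.
val currency = currencyRDD.map { case (country, cur) =>
  (country.toString, cur.toString)
}

currency.collect.foreach { case (k, v) => println(s"$k -> $v") }
```

Because KeyValueTextInputFormat uses the new MapReduce API here (org.apache.hadoop.mapreduce.lib.input), it is paired with newAPIHadoopFile; the old-API class of the same name (in org.apache.hadoop.mapred) would instead be passed to hadoopFile.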