Spark Cookbook

By: Rishi Yadav
Exploring the Spark shell


Spark comes bundled with a REPL shell, a wrapper around the Scala shell. Though the Spark shell may look like a command line suited only to simple tasks, many complex queries can also be executed in it. This chapter explores the different development environments in which Spark applications can be developed.

How to do it...

Hadoop MapReduce's word count becomes very simple with the Spark shell. In this recipe, we are going to create a simple one-line text file, upload it to the Hadoop Distributed File System (HDFS), and use Spark to count the occurrences of words. Let's see how:

  1. Create the words directory by using the following command:

    $ mkdir words
    
  2. Get into the words directory:

    $ cd words
    
  3. Create a sh.txt text file and enter "to be or not to be" in it:

    $ echo "to be or not to be" > sh.txt
    
  4. Start the Spark shell:

    $ spark-shell
    
  5. Load the words directory as an RDD (this assumes the words directory has already been uploaded to HDFS, for example with hdfs dfs -put words /user/hduser/words):

    scala> val words = sc.textFile("hdfs://localhost:9000/user/hduser/words")
    
  6. Count the number of lines...
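The recipe continues in the Spark shell with the classic flatMap/map/reduceByKey pipeline on the words RDD. As a local sketch that needs no Spark installation, the same word count can be expressed on plain Scala collections; the wordCount helper below is illustrative (not part of the recipe), and groupBy stands in for the RDD's reduceByKey:

```scala
// Local sketch of the word count. On an RDD, the last two calls would be
// .map(w => (w, 1)).reduceByKey(_ + _) instead of groupBy/map.
def wordCount(lines: Seq[String]): Map[String, Int] =
  lines
    .flatMap(_.split("\\s+"))                 // split each line into words
    .groupBy(identity)                        // group occurrences of each word
    .map { case (w, occ) => (w, occ.size) }   // count each group

// The one-line file created in step 3:
val counts = wordCount(Seq("to be or not to be"))
counts.toSeq.sortBy(_._1).foreach(println)
// prints (be,2) (not,1) (or,1) (to,2), one pair per line
```

Because RDDs expose the same flatMap and map method names as Scala collections, the pipeline carries over almost verbatim once you replace the grouping step with reduceByKey.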