Apache Spark 2.x for Java Developers
By Sourav Gulati, Sumit Kumar

Spark Driver Web UI


This section covers some important aspects of the Spark driver's UI. We will look at the statistics of the jobs we executed from the Spark shell, as shown on the Spark UI.

As described in the Getting started with Apache Spark section, the Spark driver's UI runs at http://localhost:4040/ (unless you change the default settings).

When you start the Spark shell, the Spark driver's UI will look as follows:

SparkContext is the entry point to every Spark application. Every Spark application is launched with a SparkContext, and an application can have only one active SparkContext.

The Spark shell, being a Spark application, starts with a SparkContext, and every SparkContext launches its own web UI. The default port is 4040. The Spark UI can be enabled or disabled, or launched on a separate port, using the following properties:

Property            Default value
spark.ui.enabled    true
spark.ui.port       4040

For example, a Spark shell application with the Spark UI running on port 5050 can be launched as follows:

spark-shell --conf spark.ui.port=5050

If multiple Spark applications...