Spark Cookbook

By: Rishi Yadav


Optimizing memory


Spark is a complex distributed computing framework with many moving parts. Cluster resources such as memory, CPU, and network bandwidth can each become a bottleneck at different points. Because Spark is an in-memory compute framework, memory has the greatest impact.

Spark applications also commonly use very large amounts of memory, sometimes more than 100 GB, which is unusual for traditional Java applications.

In Spark, memory optimization is needed in two places: at the driver level and at the executor level.

You can use the following commands to set the driver memory:

  • Spark shell:

    $ spark-shell --driver-memory 4g
    
  • Spark submit:

    $ spark-submit --driver-memory 4g
    

You can use the following commands to set the executor memory:

  • Spark shell:

    $ spark-shell --executor-memory 4g
    
  • Spark submit:

    $ spark-submit --executor-memory 4g
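
The driver and executor flags can be combined in a single invocation. The following is a minimal sketch, not a definitive recipe: the 4g sizes simply reuse the values shown above, and the application class `com.example.MyApp` and the JAR `my-app.jar` are hypothetical names introduced only for illustration. The script prints the assembled command instead of running it, so it works without a Spark installation.

```shell
#!/bin/sh
# Memory sizes reuse the 4g values from the commands above; tune them
# for your own cluster and workload.
DRIVER_MEMORY="4g"
EXECUTOR_MEMORY="4g"

# Hypothetical application class and JAR, used only for illustration.
APP_CLASS="com.example.MyApp"
APP_JAR="my-app.jar"

# Assemble one spark-submit command that sets both memory flags.
# It is printed rather than executed, so no Spark installation is needed.
SUBMIT_CMD="spark-submit --driver-memory $DRIVER_MEMORY --executor-memory $EXECUTOR_MEMORY --class $APP_CLASS $APP_JAR"
echo "$SUBMIT_CMD"
```

These two flags correspond to the configuration properties `spark.driver.memory` and `spark.executor.memory`, so the same values could instead be kept in a properties file if you prefer not to repeat them on every invocation.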
    

To understand memory optimization, it is a good idea to understand how memory management...