Databricks Certified Associate Developer for Apache Spark Using Python

By: Saba Shah

Overview of this book

Spark has become a de facto standard for big data processing. Migrating data processing to Spark saves resources, streamlines your business focus, and modernizes workloads, creating new business opportunities through Spark’s advanced capabilities. Written by a senior solutions architect at Databricks, with experience in leading data science and data engineering teams in Fortune 500s as well as startups, this book is your exhaustive guide to achieving the Databricks Certified Associate Developer for Apache Spark certification on your first attempt. You’ll explore the core components of Apache Spark, its architecture, and its optimization, while familiarizing yourself with the Spark DataFrame API and its components needed for data manipulation. You’ll also find out what Spark streaming is and why it’s important for modern data stacks, before learning about machine learning in Spark and its different use cases. What’s more, you’ll discover sample questions at the end of each section along with two mock exams to help you prepare for the certification exam. By the end of this book, you’ll know what to expect in the exam and gain enough understanding of Spark and its tools to pass the exam. You’ll also be able to apply this knowledge in a real-world setting and take your skillset to the next level.
Table of Contents (18 chapters)
Part 1: Exam Overview
Part 2: Introducing Spark
Part 3: Spark Operations
Part 4: Spark Applications
Part 5: Mock Papers
Chapter 9: Mock Test 1
Chapter 10: Mock Test 2

Creating DataFrame operations

As we have already discussed, DataFrames are the main building blocks of Spark data. They are data structures made up of rows and columns.

DataFrames in PySpark are created using the pyspark.sql.SparkSession.createDataFrame function. You can use lists, lists of lists, tuples, dictionaries, pandas DataFrames, RDDs, and pyspark.sql.Row objects to create DataFrames.

The createDataFrame function also takes an argument named schema that specifies the schema of the DataFrame. You can either specify the schema explicitly or let Spark infer it from the data. If you don’t pass this argument, Spark infers the schema on its own.

There are different ways to create DataFrames in Spark. Some of them are explained in the following sections.

Using a list of rows

The first way to create a DataFrame that we look at is by using rows of data. You can think of each row of data as a list. The rows share common header values for each of the values...
