50 Hours of Big Data, PySpark, AWS, Scala, and Scraping

By: AI Sciences

Overview of this book

Part 1 is designed to reflect the most in-demand Scala skills and provides an in-depth understanding of core Scala concepts. It wraps up with a discussion of MapReduce and of ETL pipelines that use Spark to move data from AWS S3 to AWS RDS (including six mini-projects and one Scala Spark project).

Part 2 covers using PySpark to perform data analysis. You will explore Spark RDDs, DataFrames, some Spark SQL queries, the transformations and actions that can be performed on data using RDDs and DataFrames, the Spark and Hadoop ecosystem, and their underlying architecture. You will also learn how to leverage AWS storage, database, and compute services, and how Spark communicates with different AWS services.

Part 3 is all about data scraping and data mining. You will cover important concepts such as internet browser execution and communication with the server, synchronous and asynchronous communication, parsing data in the server's response, tools for data scraping, the Python requests module, and more.

In Part 4, you will use MongoDB to develop an understanding of NoSQL databases. You will explore the basic operations and the MongoDB query, projection, and update operators. This part ends with two projects: developing a CRUD-based application using Django and MongoDB, and implementing an ETL pipeline using PySpark to load the data into MongoDB.

By the end of this course, you will be able to relate the concepts and practical aspects of the technologies covered to real-world problems. All the resources of this course are available at https://github.com/PacktPublishing/50-Hours-of-Big-Data-PySpark-AWS-Scala-and-Scraping
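The overview mentions MapReduce without showing its shape. As a rough illustration only (not code from the course), the classic word count below runs the three MapReduce stages (map, shuffle, reduce) in plain Python on a single machine. The function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical, and the sample documents are made up.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Map: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each key's values; for a word count, sum them."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three stages into a single-machine MapReduce word count."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input.
    docs = ["spark reads from s3", "spark writes to rds"]
    print(word_count(docs))
```

In PySpark, the same shape is usually expressed on an RDD with `flatMap`, then `map` to key-value pairs, then `reduceByKey`, with Spark performing the shuffle across the cluster.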
Table of Contents (35 chapters)
16. Part 3 - PySpark and AWS - Master Big Data with PySpark and AWS
24. Part 4 - MongoDB - Mastering MongoDB for Beginners (Theory and Projects)
Chapter 12: Functions