Integrating Hadoop

By: William McKnight, Jake Dolezal
Overview of this book

Today, valuable data is scattered across numerous databases and multiple companies. The challenge is bringing that data together. Integrating Hadoop shows how Hadoop is used to collect and load data on physical devices and in the cloud. The book begins with an introduction to Hadoop and the types of data suited to it. Next, it focuses on assembling the integration team and gives an overview of workloads in the organization. You will also identify data sources for Hadoop, such as NoSQL databases and legacy/relational databases, distinguish between ETL and ELT, and learn how to load data into and unload data from Hadoop. You will also practice managing big data using methods such as upserts and HBase, and discover the advantages of real-time computing and the basic structure of a streaming data architecture. Finally, you will work with an organization's master data and learn the top 10 mistakes people make when integrating Hadoop data, along with how to avoid them.
Table of Contents (12 chapters)
1 Hadoop in Support of an Information Strategy
6 Unloading/Distributing Data from Hadoop
7 Apache Spark Cluster Computing with Hadoop
12 Index

3 ETL versus ELT



ETL (extract, transform, load) and ELT (extract, load, transform) are both acronyms for three-step processes that move data from one place (and purpose) to another. Generally, data from multiple source systems is being moved to, and consolidated in, an enterprise data warehouse (EDW) or other target database(s), where it becomes available for further use.

The important difference between ETL and ELT lies in the transformation step. In this step, data is cleansed, put into the formats and structures required by the EDW or downstream data marts and applications, and normalized and integrated so that it can be compared, merged, and analyzed with data from other sources. With ETL, these tasks are performed by automated tools or hand-coded scripts before loading. With ELT, the bulk of the transformation is completed after loading, inside HDFS, the data warehouse, or another target database. Pre-transformations can also take place in the source database.
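
The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the target (the chapter's targets are HDFS, a data warehouse, or another database), and the sample rows, table names, and the `clean`, `run_etl`, and `run_elt` helpers are hypothetical, not taken from the book. The ETL path transforms rows in application code before loading; the ELT path loads the raw rows first and then transforms them with SQL inside the target.

```python
import sqlite3

# Hypothetical raw records from a source system (illustrative data only).
RAW_ROWS = [
    (" Alice ", "42.50"),
    ("BOB", "17.00"),
    ("  carol", "8.25"),
]


def clean(name, amount):
    """Transformation step: trim and normalize the name, cast the amount."""
    return name.strip().title(), float(amount)


def run_etl(conn):
    """ETL: transform in application code, then load only the cleaned rows."""
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL)")
    transformed = [clean(name, amount) for name, amount in RAW_ROWS]
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", transformed)


def run_elt(conn):
    """ELT: load raw rows as-is, then transform inside the target with SQL."""
    conn.execute("CREATE TABLE sales_raw (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", RAW_ROWS)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT UPPER(SUBSTR(TRIM(customer), 1, 1))
               || LOWER(SUBSTR(TRIM(customer), 2)) AS customer,
               CAST(amount AS REAL) AS amount
        FROM sales_raw
        """
    )


def fetch(conn, table):
    """Return the rows of a table, ordered by customer name."""
    query = f"SELECT customer, amount FROM {table} ORDER BY customer"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for the real target system
run_etl(conn)
run_elt(conn)
etl_rows = fetch(conn, "sales_etl")
elt_rows = fetch(conn, "sales_elt")
```

Both paths end with the same cleaned rows; what differs is where the transformation runs. In the ELT path the raw table `sales_raw` also remains available in the target, so it can be transformed again later in a different way.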
