HDInsight Essentials - Second Edition

By: Rajesh Nadipalli

Ingesting to Data Lake using HDFS command


The simplest way to upload files is to use the Hadoop command line. The following steps load data into the Data Lake.
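As a rough illustration of what the upload amounts to, the sketch below builds the two Hadoop command-line invocations typically involved: `hadoop fs -mkdir -p` to create the target directory and `hadoop fs -put` to copy a local file into it. The file name `otp_2014_01.csv` and the directory `/data/otp/raw` are hypothetical placeholders, not paths from this book.

```python
from typing import List


def build_upload_commands(local_file: str, target_dir: str) -> List[List[str]]:
    """Return the Hadoop CLI invocations needed to upload one local file.

    Each command is an argument list suitable for subprocess.run().
    """
    return [
        # Create the target directory (and any missing parents) in HDFS.
        ["hadoop", "fs", "-mkdir", "-p", target_dir],
        # Copy the local file into that HDFS directory.
        ["hadoop", "fs", "-put", local_file, target_dir],
    ]


if __name__ == "__main__":
    # Hypothetical file and directory names, used only for illustration.
    for command in build_upload_commands("otp_2014_01.csv", "/data/otp/raw"):
        print(" ".join(command))
```

The printed lines can be typed directly at the Hadoop command line. To run them from Python on a machine where the `hadoop` client is on the path, each list could be passed to `subprocess.run(command, check=True)`.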

Connecting to a Hadoop client

You can connect to the Hadoop cluster via a remote desktop connection to the active head node. After establishing the remote connection, launch the Hadoop command-line application from its shortcut on the desktop.

Getting your files on the local storage

Get your files onto the edge node via web download, SCP, or SFTP. The following figure shows the steps to download the on-time performance data from the website: select the year, month, and other similar fields, and then click on Download. The fields that we need for the project are as follows:

YEAR, QUARTER, MONTH, DAY_OF_MONTH, DAY_OF_WEEK, FL_DATE, UNIQUE_CARRIER, AIRLINE_ID, FL_NUM, ORIGIN_AIRPORT_ID, ORIGIN_AIRPORT_SEQ_ID, ORIGIN_CITY_MARKET_ID, ORIGIN, ORIGIN_STATE_ABR, DEST_AIRPORT_ID, DEST_AIRPORT_SEQ_ID...
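Before uploading, it can help to confirm that a downloaded file actually contains the selected fields. The following is a minimal sketch, assuming the download is a CSV file whose first row is a header. It checks only the fields visible in the list above (the list is truncated, so the remaining fields are not covered), and the helper names `read_header` and `missing_fields` are illustrative.

```python
import csv
import io
from typing import Iterable, List, TextIO

# Only the fields shown in the text; the original list is truncated.
REQUIRED_FIELDS = [
    "YEAR", "QUARTER", "MONTH", "DAY_OF_MONTH", "DAY_OF_WEEK", "FL_DATE",
    "UNIQUE_CARRIER", "AIRLINE_ID", "FL_NUM", "ORIGIN_AIRPORT_ID",
    "ORIGIN_AIRPORT_SEQ_ID", "ORIGIN_CITY_MARKET_ID", "ORIGIN",
    "ORIGIN_STATE_ABR", "DEST_AIRPORT_ID", "DEST_AIRPORT_SEQ_ID",
]


def read_header(stream: TextIO) -> List[str]:
    """Return the first CSV row of the stream, or an empty list if none."""
    return next(csv.reader(stream), [])


def missing_fields(header: Iterable[str],
                   required: Iterable[str] = REQUIRED_FIELDS) -> List[str]:
    """Return the required field names that do not appear in the header."""
    present = {name.strip().upper() for name in header}
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # In-memory sample standing in for a downloaded file.
    sample = io.StringIO('"YEAR","QUARTER","MONTH"\n2014,1,1\n')
    print(missing_fields(read_header(sample)))
```

For a real file, the same check could be run with `open(path, newline="")` in place of the in-memory sample.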