Index
A
- access log, formats
- %h / How to do it...
- %l / How to do it...
- %u / How to do it...
- %t / How to do it...
- %r / How to do it...
- %>s / How to do it...
- %b / How to do it...
- referrer / How to do it...
- user agent / How to do it...
- Apache Spark
- about / Introduction
- URL / Getting ready
- Area Under a Curve
- reference / How to do it...
- Atomic export
- performing, Sqoop used / Performing Atomic export using Sqoop, How it works...
- Avro
- reference / Processing Hive data in the Avro format
- Avro format
- Hive data, processing in / Processing Hive data in the Avro format, How to do it..., How it works...
- AWS EC2
- reference / Getting ready
B
- balancer command
- executing, for uniform data distribution / Executing the balancer command for uniform data distribution, How to do it...
- benchmarking
- performing, on Hadoop cluster / Performing benchmarking on a Hadoop cluster, How to do it..., TestDFSIO, How it works...
C
- Call Data Record analytics
- defining / Call Data Record analytics, Problem Statement, Solution
- URL / Getting ready
- problem statement / Problem Statement
- solution / Solution
- Call Data Record Analytics
- performing, Hive used / How to do it...
- Call Data Records (CDR)
- about / Call Data Record Analytics using Hive, Getting ready
- reference / Call Data Record Analytics using Hive
- Change Data Capture (CDC)
- implementing, Hive used / Implementing Change Data Capture using Hive, How to do it, How it works
- CLI
- HBase operation, performing in / Performing the HBase operation in CLI, How to do it, How it works...
- combined access logs
- command options, Hadoop
- reference / There's more
- compressed data
- saving, on HDFS / Saving compressed data in HDFS, How it works...
- configuration parameters, Hadoop
- hadoop-env.sh / How to do it...
- core-site.xml / How to do it...
- yarn-site.xml / How to do it...
- mapred-site.xml / How to do it...
- hdfs-site.xml / How to do it...
- Confusion Matrix
- reference / How to do it...
- context Ngram
- performing, in Hive / Performing context Ngram in Hive, How it works...
- custom SerDe
- reference / How it works...
D
- data
- loading, from local machine to HDFS / Loading data from a local machine to HDFS, How it works...
- exporting, from HDFS to local machine / Getting ready, How it works...
- importing, from Hadoop cluster / Importing data from another Hadoop cluster, How it works...
- importing from RDBMS to HDFS, Sqoop used / Importing data from RDBMS to HDFS using Sqoop, How to do it...
- exporting, from HDFS to RDBMS / Exporting data from HDFS to RDBMS, Getting ready, How to do it..., How it works...
- importing into Hive table, Sqoop used / Importing data into Hive tables using Sqoop, How to do it..., How it works...
- importing into HDFS, from Mainframes / Importing data into HDFS from Mainframes
- importing from RDBMS to HBase, Sqoop used / Importing data from RDBMS to HBase using Sqoop, How to do it..., How it works...
- importing from Kafka into HDFS, Flume used / Importing data from Kafka into HDFS using Flume, How to do it..., How it works
- Data Encryption Key (DEK) / How it works...
- DataNodes
- decommissioning / Decommissioning DataNodes, How it works...
- deleted data
- recycling, from trash to HDFS / Recycling deleted data from trash to HDFS, How to do it..., How it works...
- DistCp
- about / How to do it...
- -update option / How to do it...
- -overwrite option / How to do it...
- reference / How it works...
- distinct values
- finding, Map Reduce program used / Map Reduce program to find distinct values, How to do it
E
- e-mail action job
- implementing, Oozie used / Implementing an e-mail action job using Oozie, How to do it..., How it works...
- Encrypted Data Encryption Key (EDEK) / How it works...
- Encryption Zone Key (EZK) / How it works...
- Extract-Transform-Load (ETL)
- about / Introduction
F
- FILTER By queries
- performing, in Pig / Performing FILTER By queries in Pig, How to do it...
- Flume
- used, for importing Twitter data into HDFS / Importing Twitter data into HDFS using Flume, How to do it...
- about / How it works
- used, for importing data from Kafka into HDFS / Importing data from Kafka into HDFS using Flume, How to do it..., How it works
- used, for importing web logs data into HDFS / Importing web logs data into HDFS using Flume, How to do it..., How it works...
- full outer join / Full outer join
G
- Google File System (GFS)
- about / Introduction
- graphs
- processing, Graph X used / Processing graphs using Graph X, How to do it...
- Graph X
- used, for processing graphs / Processing graphs using Graph X, How to do it...
- URL / How it works...
- Group By queries
- performing, in Pig / Performing Group By queries in Pig, How to do it...
- Group operator
- reference / How it works...
H
- Hadoop
- about / Introduction
- configuration parameters / How to do it...
- support, adding for new writable data type / Adding support for a new writable data type in Hadoop, How to do it..., How it works...
- used, for sensitive data masking / Sensitive data masking and encryption using Hadoop, Getting ready, Solution, How it works...
- used, for encryption / Sensitive data masking and encryption using Hadoop, Getting ready, Solution, How it works...
- problem statement / Problem statement
- solution / Solution
- Hadoop, components
- storage / Introduction
- processing / Introduction
- Hadoop 2.7
- download link / How to do it...
- Hadoop cluster
- benchmarking, performing on / Performing benchmarking on a Hadoop cluster, How to do it..., TestDFSIO, How it works...
- data, importing from / Importing data from another Hadoop cluster, How it works...
- Map Reduce program, executing in / Executing the Map Reduce program in a Hadoop cluster, How it works...
- Hadoop clusters
- new nodes, adding to / Adding new nodes to existing Hadoop clusters, How it works...
- Hadoop Distributed File System (HDFS)
- HBase
- reference / Introduction, Getting ready
- HBase operation
- performing, in CLI / Performing the HBase operation in CLI, How to do it, How it works...
- performing, in Java / Performing HBase operations in Java, How it works...
- HDFS
- replication factor, modifying of existing file / Changing the replication factor of an existing file in HDFS, How it works...
- transparent encryption, enabling for / Enabling transparent encryption for HDFS, How to do it..., How it works...
- compressed data, saving on / Saving compressed data in HDFS, How it works...
- HDFS, to local machine
- data, exporting from / Getting ready, How it works...
- HDFS block size
- setting, for files in cluster / Getting ready, How it works...
- setting, for specific file in cluster / Getting ready, How it works...
- Hive
- reference / Introduction
- User Defined functions, writing in / Writing a user-defined function in Hive, How to do it
- table joins, performing in / Performing table joins in Hive, How to do it...
- map side joins, executing in / Executing map side joins in Hive, How to do it...
- context Ngram, performing in / Performing context Ngram in Hive, How it works...
- used, for performing Call Data Record Analytics / How to do it..., How it works...
- used, for performing Twitter sentiment analysis / Twitter sentiment analysis using Hive, How to do it..., How it works
- used, for implementing Change Data Capture (CDC) / Implementing Change Data Capture using Hive, How to do it, How it works
- used, for inserting multiple tables / Multiple table inserting using Hive, How to do it
- Hive action job
- implementing, Oozie used / Implementing a Hive action job using Oozie, How to do it..., How it works...
- Hive data
- processing, in sequential file format / Storing and processing Hive data in a sequential file format, How to do it...
- storing, in sequential file format / Storing and processing Hive data in a sequential file format, How to do it...
- processing, in RC file format / Storing and processing Hive data in the ORC file format, How it works...
- storing, in RC file format / Storing and processing Hive data in the ORC file format, How it works...
- processing, in ORC file format / Storing and processing Hive data in the ORC file format, How it works...
- storing, in ORC file format / Storing and processing Hive data in the ORC file format, How it works...
- storing, in Parquet file format / Storing and processing Hive data in the Parquet file format, How to do it...
- processing, in Parquet file format / Storing and processing Hive data in the Parquet file format, How to do it...
- processing, in Avro format / Processing Hive data in the Avro format, How to do it..., How it works...
- Hive JSON SerDe
- used, for processing JSON data / Processing JSON data in Hive using JSON SerDe, How to do it..., How it works...
- Hive XML SerDe
- used, for processing XML data / Processing XML data in Hive using XML SerDe, How to do it..., How it works
I
- incremental import
- defining, Sqoop used / Incremental import using Sqoop, How to do it...
- installing
- Single Node Hadoop Cluster / Getting ready, How to do it..., How it works...
- multi-node Hadoop cluster / Installing a multi-node Hadoop cluster, How to do it..., How it works...
- iris.txt
- URL / How to do it...
- Iris flower data set
- reference / Performing Predictive Analytics using R
- iris flowers
- URL / How to do it...
- item based recommendation engine
- setting up, Mahout used / Creating an item-based recommendation engine using Mahout, How to do it..., How it works...
J
- Java
- HBase operation, performing in / Performing HBase operations in Java, How it works...
- Java action job
- implementing, Oozie used / Implementing a Java action job using Oozie, How to do it
- Java JDK
- download link / How to do it...
- job
- scheduling, in Oozie / Scheduling a job in Oozie, How to do it...
- JOINS
- performing, in Pig / Performing JOINS in Pig, How to do it..., How it works
- JSON data
- processing, Hive JSON SerDe used / Processing JSON data in Hive using JSON SerDe, How to do it..., How it works...
- analyzing, Spark used / Analyzing JSON data using Spark, How to do it..., How it works...
- JSON SerDe binaries
- references / Getting ready
K
- K-Means
- reference / Clustering text data using K-Means
- Kafka
- Key Management Server (KMS) / How to do it...
L
- LazyOutputFormat
- reference link / How it works...
- left outer join / Left outer join
- left semi join / Left semi join
- local machine, to HDFS
- data, loading from / Loading data from a local machine to HDFS, How it works...
- Luhn's Algorithm
- URL / Solution
M
- Mahout
- about / Introduction
- download link / How to do it...
- used, for setting up item based recommendation engine / Creating an item-based recommendation engine using Mahout, How to do it..., How it works...
- used, for setting up user based recommendation engine / Creating a user-based recommendation engine using Mahout, How to do it..., How it works...
- used, for performing predictive analytics on bank data / Using Predictive analytics on Bank Data using Mahout, How to do it...
- Mahout algorithms
- reference / How it works...
- Mahout development environment
- Mainframes
- data, importing into HDFS from / Importing data into HDFS from Mainframes
- about / Importing data into HDFS from Mainframes
- Map Reduce
- about / Introduction
- used, for performing Reduce side Joins / Performing Reduce side Joins using Map Reduce, How to do it, How it works...
- Map Reduce action job
- implementing, Oozie used / Implementing a Map Reduce action job using Oozie, How to do it...
- Map Reduce code
- unit testing, MRUnit used / Unit testing the Map Reduce code using MRUnit, How to do it..., How it works...
- Map Reduce program
- writing, in Java for web log data analysis / Writing the Map Reduce program in Java to analyze web log data, How to do it..., How it works...
- executing, in Hadoop cluster / Executing the Map Reduce program in a Hadoop cluster, How it works...
- user-defined counter, implementing in / Implementing a user-defined counter in a Map Reduce program, How to do it..., How it works...
- used, for finding top X / Map Reduce program to find the top X, How to do it...
- used, for finding distinct values / Map Reduce program to find distinct values, How to do it
- writing, for data partitioning / Map Reduce program to partition data using a custom partitioner, How to do it..., How it works...
- Map Reduce programming, HBase table
- Map Reduce results
- writing, to multiple output files / Writing Map Reduce results to multiple output files, How to do it...
- map side joins
- executing, in Hive / Executing map side joins in Hive, How to do it...
- reference / How it works...
- merge joins
- about / Merge Joins
- reference / Merge Joins
- MRBench
- benchmarking / MRBench
- MRUnit
- Map Reduce code, unit testing / Unit testing the Map Reduce code using MRUnit, How to do it..., How it works...
- multi-node Hadoop cluster
- installing / Installing a multi-node Hadoop cluster, How to do it..., How it works...
- multiple tables
- inserting, Hive used / Multiple table inserting using Hive, How to do it
- MySQL connector
- URL / Getting ready
N
- Naive Bayes algorithm
- reference / How it works...
- Ngrams
- reference / How it works...
- NNBench
- benchmarking / NNBench
- nodes
- adding, to existing Hadoop clusters / Adding new nodes to existing Hadoop clusters, How it works...
O
- Olympics Athletes Data Analytics
- defining, Spark Shell used / Olympics Athletes analytics using the Spark Shell, How to do it...
- URL / How to do it...
- Oozie
- used, for implementing Sqoop action job / Implementing a Sqoop action job using Oozie, How to do it...
- used, for implementing Map Reduce action job / Implementing a Map Reduce action job using Oozie, How to do it...
- used, for implementing Java action job / Implementing a Java action job using Oozie, How to do it
- used, for implementing Hive action job / Implementing a Hive action job using Oozie, How to do it..., How it works...
- used, for implementing Pig action job / Implementing a Pig action job using Oozie, How to do it..., How it works
- used, for implementing e-mail action job / Implementing an e-mail action job using Oozie, How to do it..., How it works...
- used, for executing parallel jobs / Executing parallel jobs using Oozie (fork), How to do it...
- job, scheduling / Scheduling a job in Oozie, How to do it...
- ORC file format
- Hive data, storing in / Storing and processing Hive data in the ORC file format, How it works...
- Hive data, processing in / Storing and processing Hive data in the ORC file format, How it works...
- Order By queries
- performing, in Pig / Performing Order By queries in Pig, How it works...
P
- parallel jobs
- executing, Oozie used / Executing parallel jobs using Oozie (fork), How to do it...
- Parquet
- about / Analyzing Parquet files using Spark
- URL / How to do it...
- Parquet file format
- Hive data, processing in / Storing and processing Hive data in the Parquet file format, How to do it...
- Hive data, storing in / Storing and processing Hive data in the Parquet file format, How it works...
- reference / How it works...
- Parquet files
- analyzing, Spark used / Getting ready, How to do it..., How it works...
- Pearson product-moment correlation coefficient
- reference / How it works...
- people.json sample
- URL / How to do it...
- Pig
- reference / Introduction
- FILTER By queries, performing in / Performing FILTER By queries in Pig, How to do it...
- Group By queries, performing in / Performing Group By queries in Pig, How to do it...
- Order By queries, performing in / Performing Order By queries in Pig, How it works...
- JOINS, performing in / Performing JOINS in Pig, How to do it...
- user-defined function, writing in / Writing a user-defined function in Pig, How to do it...
- used, for analyzing web log data / Analyzing web log data using Pig, How to do it...
- Pig 0.15
- reference / Getting ready
- Pig action job
- implementing, Oozie used / Implementing a Pig action job using Oozie, How to do it..., How it works
- population data analytics
- performing, R used / Performing Population Data Analytics using R, How to do it...
- predictive analytics
- performing, R used / Performing Predictive Analytics using R, How to do it...
- conducting, Spark MLib used / Conducting predictive analytics using Spark MLib, How to do it..., How it works...
Q
- query operator
- used, in Sqoop import / Using query operator in Sqoop import, How it works...
R
- R
- about / Introduction
- used, for performing population data analytics / Performing Population Data Analytics using R, How to do it...
- used, for performing Twitter sentiment analytics / Performing Twitter Sentiment Analytics using R, How to do it..., How it works...
- used, for performing predictive analytics / Performing Predictive Analytics using R, How to do it...
- RC file format
- Hive data, storing in / Storing and processing Hive data in the ORC file format, How it works...
- Hive data, processing in / Storing and processing Hive data in the ORC file format, How it works...
- Reduce side Joins
- performing, Map Reduce used / Performing Reduce side Joins using Map Reduce, How to do it, How it works...
- Remote Procedure Calls (RPC) / Processing Hive data in the Avro format
- replicated joins
- about / Replicated Joins
- reference / Replicated Joins
- replication factor
- modifying, of existing file in HDFS / Changing the replication factor of an existing file in HDFS, How it works...
- right outer join / Right outer join
S
- safe mode
- sequential file format
- Hive data, storing in / Storing and processing Hive data in a sequential file format, How to do it...
- Hive data, processing in / Storing and processing Hive data in a sequential file format, How to do it...
- SGD for logistic regression
- reference / How to do it...
- Single Node Hadoop Cluster
- installing / Getting ready, How to do it..., How it works...
- HDFS file operations, performing on / There's more
- skewed joins
- about / Skewed Joins
- reference / Skewed Joins
- Spark
- running, on YARN / Running Spark on YARN, How to do it...
- used, for analyzing Parquet files / Getting ready, How to do it..., How it works...
- used, for analyzing JSON data / Analyzing JSON data using Spark, How to do it..., How it works...
- Spark Shell
- used, for Olympics Athletes Data Analytics / Olympics Athletes analytics using the Spark Shell, How to do it...
- Spark standalone
- running / Running Spark standalone, How to do it..., How it works...
- Spark Streaming
- used, for Twitter trending topics / Twitter trending topics using Spark streaming, How to do it...
- used, for creating Twitter trending topics / Creating Twitter trending topics using Spark Streaming, How to do it..., How it works...
- URL / How it works...
- Sqoop
- used, for importing data from RDBMS to HDFS / Importing data from RDBMS to HDFS using Sqoop, How to do it...
- used, for performing Atomic export / Performing Atomic export using Sqoop, How it works...
- used, for importing data into Hive table / Importing data into Hive tables using Sqoop, How to do it..., How it works...
- used, for incremental import / Incremental import using Sqoop, How to do it...
- used, for importing data from RDBMS to HBase / Importing data from RDBMS to HBase using Sqoop, How to do it..., How it works...
- Sqoop, in compressed format
- used, for importing data / Importing data using Sqoop in compressed format, How to do it..., How it works...
- Sqoop action job
- implementing, Oozie used / Implementing a Sqoop action job using Oozie, How to do it...
- Sqoop import
- query operator, using / Using query operator in Sqoop import, How to do it...
- Sqoop job
- creating / Creating and executing Sqoop job, How it works...
- executing / Creating and executing Sqoop job, How it works...
- Stochastic Gradient Descent (SGD) / How to do it...
T
- table joins
- performing, in Hive / Performing table joins in Hive, How to do it...
- left outer join / Left outer join
- right outer join / Right outer join
- full outer join / Full outer join
- left semi join / Left semi join
- TestDFSIO
- benchmarking / TestDFSIO
- text data clustering, K-Means
- Mahout used / Clustering text data using K-Means, How to do it...
- top X
- finding, Map Reduce program used / Map Reduce program to find the top X, How to do it...
- transparent encryption
- enabling, for HDFS / Enabling transparent encryption for HDFS, How to do it..., How it works...
- reference / How it works...
- TreeMap
- reference link / How to do it...
- Twitter apps
- URL / How to do it...
- Twitter authorization tokens
- generating / How to do it...
- Twitter data
- importing into HDFS, Flume used / Importing Twitter data into HDFS using Flume, How to do it...
- Twitter sentiment analysis
- performing, Hive used / Twitter sentiment analysis using Hive, How to do it..., How it works
- Twitter sentiment analytics
- performing, R used / Performing Twitter Sentiment Analytics using R, How to do it..., How it works...
- Twitter trending topics
- creating, Spark Streaming used / Creating Twitter trending topics using Spark Streaming, How to do it..., How it works...
- defining, Spark Streaming used / Twitter trending topics using Spark streaming, How to do it...
U
- uniform data distribution
- balancer command, executing for / Executing the balancer command for uniform data distribution, How to do it...
- user-defined counter
- implementing, in Map Reduce program / Implementing a user-defined counter in a Map Reduce program, How to do it..., How it works...
- user-defined function
- writing, in Pig / Writing a user-defined function in Pig, How to do it...
- User-Defined Functions (UDFs)
- about / Introduction
- user based recommendation engine
- setting up, Mahout used / Creating a user-based recommendation engine using Mahout, How to do it..., How it works...
- User Defined functions
- writing, in Hive / Writing a user-defined function in Hive, How to do it
W
- Web log analytics
- defining / Web log analytics, Solution
- references / Getting ready
- problem statement / Problem statement
- solution / Solution
- web log data
- analyzing, Pig used / Analyzing web log data using Pig, How to do it...
- web logs data into HDFS
- importing, Flume used / Importing web logs data into HDFS using Flume, How to do it..., How it works...
X
- XML data
- processing, Hive XML SerDe used / Processing XML data in Hive using XML SerDe, How to do it..., How it works
- XML SerDe
- references / Getting ready
Y
- YARN
- Spark, running on / Running Spark on YARN, How to do it...
- Yet Another Resource Negotiator (YARN)