Index
A
- advertisements
- assigning to keywords, Adwords balance algorithm used / Assigning advertisements to keywords using the Adwords balance algorithm, How to do it..., How it works...
- Adwords balance algorithm
- used, for assigning advertisements to keywords / Assigning advertisements to keywords using the Adwords balance algorithm, How to do it..., How it works...
- Amazon EC2
- Apache HBase cluster, deploying on / Deploying an Apache HBase cluster on Amazon EC2 using EMR, How to do it..., See also
- Amazon EC2 console
- URL / How to do it...
- Amazon EC2 Spot Instances
- Amazon EMR console
- Amazon EMR job flow
- creating, AWS CLI used / Creating an Amazon EMR job flow using the AWS Command Line Interface, How to do it..., See also
- Amazon product co-purchasing network metadata dataset
- reference link / Introduction, How to do it...
- Amazon S3 monitoring console
- URL / How to do it..., How to do it...
- Amazon Web Services (AWS) / Running Hadoop MapReduce v2 computations using Amazon Elastic MapReduce
- analytics
- performing, MapReduce used / Simple analytics using MapReduce, Getting ready, How it works...
- Apache Hadoop cluster
- deploying, Apache Whirr used / Using Apache Whirr to deploy an Apache Hadoop cluster in a cloud environment, How to do it..., How it works...
- Apache HBase
- about / Getting started with Apache HBase, How to do it...
- URL / See also
- configuring, as backend data store for Apache Nutch / Configuring Apache HBase as the backend data store for Apache Nutch, Getting ready, How to do it...
- Apache HBase cluster
- deploying, on Amazon EC2 / Deploying an Apache HBase cluster on Amazon EC2 using EMR, How to do it..., See also
- Apache Hive
- used, for querying SQL-style data / Simple SQL-style data querying using Apache Hive, How to do it..., There's more...
- Apache Lucene project
- Apache Mahout
- about / Getting started with Apache Mahout, How it works..., There's more...
- K-means, running with / Running K-means with Mahout, How it works...
- references / There's more...
- used, for clustering text data / Clustering text data using Apache Mahout, How it works...
- Apache Nutch
- used, for intradomain web crawling / Intradomain web crawling using Apache Nutch, How to do it...
- Apache HBase, configuring as backend data store for / Configuring Apache HBase as the backend data store for Apache Nutch, Getting ready, How to do it...
- URL / How to do it...
- Apache Nutch 2.2.1
- URL / How to do it...
- Apache Nutch search engine
- about / Introduction
- Apache Oozie
- about / There's more...
- Apache Pig
- Apache Solr
- about / Indexing and searching web documents using Apache Solr
- used, for indexing web documents / Indexing and searching web documents using Apache Solr, How to do it..., How it works...
- used, for searching web documents / Indexing and searching web documents using Apache Solr, How to do it..., How it works...
- URL / How to do it...
- Apache Sqoop
- used, for importing data to HDFS from relational database / Importing data to HDFS from a relational database using Apache Sqoop, How to do it...
- used, for exporting data from HDFS to relational database / Exporting data from HDFS to a relational database using Apache Sqoop, How to do it...
- Apache Tez
- used, as execution engine for Hive / Using Apache Tez as the execution engine for Hive
- about / Using Apache Tez as the execution engine for Hive
- Apache Whirr
- used, for deploying Apache Hadoop cluster / Using Apache Whirr to deploy an Apache Hadoop cluster in a cloud environment, How to do it..., How it works...
- Apache Whirr binary distribution
- URL / How to do it...
- ApplicationMaster
- about / Hadoop YARN
- archives
- distributing, DistributedCache used / Distributing archives using the DistributedCache
- AWS account
- URL / How to do it...
- AWS CLI
- used, for creating Amazon EMR job flow / Creating an Amazon EMR job flow using the AWS Command Line Interface, How to do it..., See also
- AWS IAM console
- URL / How to do it...
B
- Bigtable paper
- bulkload
- used, for loading large datasets to Apache HBase data store / Loading large datasets to an Apache HBase data store – importtsv and bulkload, How to do it..., How it works...
C
- Capacity scheduler
- used, for shared user Hadoop clusters / Shared user Hadoop clusters – using Fair and Capacity schedulers, How it works...
- about / Shared user Hadoop clusters – using Fair and Capacity schedulers
- URL / There's more...
- classification
- performing, naïve Bayes classifier used / How to do it..., How it works...
- classifier
- classpath precedence
- setting, to user-provided JARs / Setting classpath precedence to user-provided JARs
- CLI
- cloud environments
- advantages / Introduction
- Cloudera CDH
- cluster deployments
- Hadoop YARN configuration, optimizing for / Optimizing Hadoop YARN and MapReduce configurations for cluster deployments, How to do it..., There's more...
- MapReduce configuration, optimizing for / Optimizing Hadoop YARN and MapReduce configurations for cluster deployments, How to do it..., There's more...
- collaborative filtering
- combiner
- combiner step
- adding, to WordCount MapReduce program / Adding a combiner step to the WordCount MapReduce program, How to do it..., There's more...
- command line
- resources adding, to DistributedCache from / Adding resources to the DistributedCache from the command line
- Command Line Interface (CLI) / How to do it...
- common configuration
- URL / There's more...
- complex dataset
- parsing, with Hadoop / Parsing a complex dataset with Hadoop, How it works...
- containers
- about / Hadoop YARN
- content-based recommendations
- about / Performing content-based recommendations
- performing / How to do it..., How it works...
- references / There's more...
- crawled web pages
- in-links graph, generating for / Generating the in-links graph for crawled web pages, How it works...
- custom Hadoop key type
- implementing / Implementing a custom Hadoop key type, How to do it..., How it works...
- custom Hadoop Writable data type
- custom InputFormat
- custom metrics
- reporting, Hadoop counters used for / Hadoop counters to report custom metrics, How it works...
D
- data
- importing, to HDFS from relational database / Importing data to HDFS from a relational database using Apache Sqoop, How to do it...
- exporting, from HDFS to relational database / Exporting data from HDFS to a relational database using Apache Sqoop, How to do it...
- de-duplicating, Hadoop streaming used / De-duplicating data using Hadoop streaming, How it works...
- data, of different value types
- emitting, from Mapper / Emitting data of different value types from a Mapper, How to do it..., How it works...
- Data Definition Language (DDL) / HCatalog – performing Java MapReduce computations on data mapped to Hive tables
- data formats
- TextInputFormat / There's more...
- NLineInputFormat / There's more...
- SequenceFileInputFormat / There's more...
- DBInputFormat / There's more...
- data mining algorithm / Getting started with Apache Mahout
- DataNode
- about / Hadoop Distributed File System – HDFS
- adding / Adding a new DataNode, How to do it...
- decommissioning / Decommissioning DataNodes, How to do it...
- data preprocessing
- performing, Hadoop streaming used / Data preprocessing using Hadoop streaming and Python, How to do it..., How it works...
- performing, Python used / Data preprocessing using Hadoop streaming and Python, How to do it..., How it works...
- dependencies
- adding, between MapReduce jobs / Adding dependencies between MapReduce jobs, How it works..., There's more...
- Directed Acyclic Graphs (DAG) / There's more...
- DistributedCache
- about / Introduction
- used, for distributing archives / Distributing archives using the DistributedCache
- resources, adding to / Adding resources to the DistributedCache from the command line
- used, for adding resources to classpath / Adding resources to the classpath using the DistributedCache
- distributed cluster environment
- Hadoop YARN, setting up in / Setting up Hadoop YARN in a distributed cluster environment using Hadoop v2, Getting ready, How to do it...
- Hadoop ecosystem, setting up in / Setting up Hadoop ecosystem in a distributed cluster environment using a Hadoop distribution, How to do it...
- WordCount program, running in / Running the WordCount program in a distributed cluster environment, How to do it...
- document classification
- performing, Mahout Naive Bayes Classifier used / Document classification using Mahout Naive Bayes Classifier, How to do it..., How it works...
E
- e-mail archives
- URL / Introduction
- EC2 console
- URL / How to do it...
- Elasticsearch
- for indexing / Elasticsearch for indexing and searching, How to do it...
- for searching / Elasticsearch for indexing and searching, How to do it...
- URL / Elasticsearch for indexing and searching, How to do it..., How it works...
- EMR
- used, for running Hadoop MapReduce v2 computations / Running Hadoop MapReduce v2 computations using Amazon Elastic MapReduce, How to do it...
- Amazon EC2 Spot Instances, used with / Saving money using Amazon EC2 Spot Instances to execute EMR job flows, There's more...
- used, for executing Pig script / Executing a Pig script using EMR, How to do it..., Starting a Pig interactive session
- used, for executing Hive script / Executing a Hive script using EMR, How to do it..., Starting a Hive interactive session
- used, for deploying Apache HBase cluster / Deploying an Apache HBase cluster on Amazon EC2 using EMR, How to do it..., See also
- EMR bootstrap actions
- extract-transform-load (ETL)
- about / Introduction
F
- Fair scheduler
- used, for shared user Hadoop clusters / Shared user Hadoop clusters – using Fair and Capacity schedulers, How it works...
- about / Shared user Hadoop clusters – using Fair and Capacity schedulers
- file
- adding, to Hadoop DistributedCache / Broadcasting and distributing shared resources to tasks in a MapReduce job – Hadoop DistributedCache, How it works..., There's more...
- file replication factor
- FileSystem object
- configuring / Configuring the FileSystem object
- FILTER operator
- about / How it works...
- First in First out (FIFO) / Shared user Hadoop clusters – using Fair and Capacity schedulers
- frequency distributions
- calculating, MapReduce used / Calculating frequency distributions and sorting using MapReduce, How to do it..., There's more...
- about / Calculating frequency distributions and sorting using MapReduce
G
- gnuplot
- used, for plotting Hadoop MapReduce results / Plotting the Hadoop MapReduce results using gnuplot, How to do it..., How it works...
- URL / There's more...
- Google File System
- URL / Introduction
- Google MapReduce
- URL / Introduction
- Gradle distribution
- Gross National Income (GNI) / Running MapReduce jobs on HBase
- GROUP BY
- performing, MapReduce used / Performing GROUP BY using MapReduce, How to do it..., How it works...
H
- Hadoop
- built-in data types / There's more...
- Text / There's more...
- BytesWritable / There's more...
- VLongWritable / There's more...
- VIntWritable / There's more...
- NullWritable / There's more...
- ArrayWritable / There's more...
- TwoDArrayWritable / There's more...
- MapWritable / There's more...
- SortedMapWritable / There's more...
- used, with legacy applications / Using Hadoop with legacy applications – Hadoop streaming, How it works..., There's more...
- complex dataset, parsing with / Parsing a complex dataset with Hadoop, How it works...
- about / Introduction
- advantages / Introduction
- Hadoop configurations
- configuration files / How it works...
- Hadoop counters
- used, for reporting custom metrics / Hadoop counters to report custom metrics, How it works...
- Hadoop data types
- Hadoop DistributedCache
- Hadoop distribution
- used, for setting up Hadoop ecosystem / Setting up Hadoop ecosystem in a distributed cluster environment using a Hadoop distribution, How to do it...
- Hadoop ecosystem
- setting up, in distributed cluster environment / Setting up Hadoop ecosystem in a distributed cluster environment using a Hadoop distribution, How to do it...
- Hadoop InputFormat
- selecting, for input data format / Choosing a suitable Hadoop InputFormat for your input data format, How it works..., There's more...
- Hadoop installation mode
- about / Hadoop installation modes
- Hadoop intermediate data partitioning
- about / Hadoop intermediate data partitioning, How it works...
- TotalOrderPartitioner / TotalOrderPartitioner
- KeyFieldBasedPartitioner / KeyFieldBasedPartitioner
- Hadoop local mode
- used, for running WordCount MapReduce application / Writing a WordCount MapReduce application, bundling it, and running it using the Hadoop local mode, How to do it..., How it works..., See also
- Hadoop MapReduce
- about / Hadoop MapReduce
- benchmarking, TeraSort used / Benchmarking Hadoop MapReduce using TeraSort, How to do it...
- used, for generating inverted index / Generating an inverted index using Hadoop MapReduce, How to do it..., How it works..., There's more...
- Hadoop MapReduce applications
- unit testing, MRUnit used / Unit testing Hadoop MapReduce applications using MRUnit, How to do it...
- integration testing, YARN mini cluster used / Integration testing Hadoop MapReduce applications using MiniYarnCluster, How to do it...
- Hadoop MapReduce results
- plotting, gnuplot used / Plotting the Hadoop MapReduce results using gnuplot, How to do it..., How it works...
- Hadoop MapReduce v2 computations
- running, EMR used / Running Hadoop MapReduce v2 computations using Amazon Elastic MapReduce, How to do it...
- Hadoop OutputFormats
- used, for formatting results of MapReduce computations / Formatting the results of MapReduce computations – using Hadoop OutputFormats, How it works...
- Hadoop streaming
- used, for data preprocessing / Data preprocessing using Hadoop streaming and Python, How to do it..., How it works...
- used, for de-duplicating data / De-duplicating data using Hadoop streaming, How it works...
- Hadoop v2
- setting up, on local machine / Setting up Hadoop v2 on your local machine
- used, for setting up Hadoop YARN / Setting up Hadoop YARN in a distributed cluster environment using Hadoop v2, Getting ready, How to do it...
- Hadoop v2, installation
- local mode / Hadoop installation modes
- pseudo distributed mode / Hadoop installation modes
- distributed mode / Hadoop installation modes
- Hadoop YARN
- about / Hadoop YARN
- setting up, in distributed cluster environment / Setting up Hadoop YARN in a distributed cluster environment using Hadoop v2, How to do it...
- Hadoop YARN configuration
- optimizing, for cluster deployments / Optimizing Hadoop YARN and MapReduce configurations for cluster deployments, How to do it..., There's more...
- HashPartitioner
- HBase
- about / Introduction
- Java client APIs, used for interacting with / Data random access using Java client APIs, How it works...
- MapReduce jobs, running on / Running MapReduce jobs on HBase, How to do it...
- used, for data de-duplication / Data de-duplication using HBase
- HBase cluster
- used, for web crawling with Apache Nutch / Getting ready, How to do it..., How it works...
- HBase tables
- Hive used, for inserting data into / Using Hive to insert data into HBase tables, How to do it...
- HCatalog
- about / HCatalog – performing Java MapReduce computations on data mapped to Hive tables, How to do it..., How it works...
- used, for performing Java MapReduce computations / HCatalog – performing Java MapReduce computations on data mapped to Hive tables, How to do it..., How it works...
- used, for writing data to Hive tables / HCatalog – writing data to Hive tables from Java MapReduce computations, How to do it..., How it works...
- used, for accessing Hive table data in Pig / Accessing a Hive table data in Pig using HCatalog, How to do it..., There's more...
- HDFS
- about / Introduction, Hadoop Distributed File System – HDFS, Setting up HDFS
- setting up / Setting up HDFS, How to do it...
- benchmarking / Benchmarking HDFS using DFSIO, How to do it...
- rebalancing / Rebalancing HDFS
- HDFS block size
- setting / Setting the HDFS block size, How to do it...
- HDFS command-line file operations / HDFS command-line file operations, How to do it...
- HDFS configuration
- URL / There's more...
- HDFS disk usage
- HDFS Java API
- HDFS replication factor
- High Availability (HA)
- High Performance Computing (HPC)
- about / Introduction
- histograms
- calculating, MapReduce used / Calculating histograms using MapReduce, How to do it..., How it works...
- about / Calculating histograms using MapReduce
- Hive
- defining / Getting started with Apache Hive, How to do it...
- data types / Hive data types
- external tables / Hive external tables
- ORDER BY / There's more...
- SORT BY / There's more...
- CLUSTER BY / There's more...
- join, performing with / Performing a join with Hive, How to do it..., How it works...
- used, for inserting data into HBase tables / Using Hive to insert data into HBase tables, How to do it...
- Hive batch mode
- query file, used for / Hive batch mode - using a query file, How to do it..., How it works..., There's more...
- Hive built-in functions
- Hive databases
- creating, Hive CLI used / Creating databases and tables using Hive CLI, How to do it..., How it works...
- Hive interactive session
- starting / Starting a Hive interactive session
- Hive Query Language (HQL) / Using Hive to insert data into HBase tables
- Hive script
- executing, EMR used / Executing a Hive script using EMR, How to do it..., Starting a Hive interactive session
- Hive table data, in Pig
- accessing, HCatalog used / Accessing a Hive table data in Pig using HCatalog, How to do it..., There's more...
- Hive tables
- creating, Hive CLI used / Creating databases and tables using Hive CLI, How to do it..., How it works...
- describe formatted command used, for inspecting metadata / Using the describe formatted command to inspect the metadata of Hive tables
- creating, Hive query results used / Creating and populating Hive tables and views using Hive query results, How to do it...
- populating, Hive query results used / Creating and populating Hive tables and views using Hive query results, How to do it...
- Java MapReduce computations, performing on data mapped to / HCatalog – performing Java MapReduce computations on data mapped to Hive tables, How to do it..., How it works...
- data, writing from Java MapReduce computations / HCatalog – writing data to Hive tables from Java MapReduce computations, How to do it..., How it works...
- Hive User-defined Functions
- Hive version
- URL / How to do it...
- Hive views
- populating, Hive query results used / Creating and populating Hive tables and views using Hive query results, How to do it...
- creating, Hive query results used / Creating and populating Hive tables and views using Hive query results, How to do it...
- Hortonworks Data Platform (HDP)
- HTTP server log data set
- Human Development Report (HDR) / Running MapReduce jobs on HBase, Running K-means with Mahout
- Human Development Reports data
- HyperSQL
- URL / How to do it...
I
- importtsv
- used, for loading large datasets to Apache HBase data store / Loading large datasets to an Apache HBase data store – importtsv and bulkload, How to do it..., How it works...
- in-links graph
- generating, for crawled web pages / Generating the in-links graph for crawled web pages, How it works...
- input data format
- Hadoop InputFormat, selecting for / Choosing a suitable Hadoop InputFormat for your input data format, How it works..., There's more...
- intradomain web crawling
- Apache Nutch, used for / Intradomain web crawling using Apache Nutch, How to do it...
- inverted document frequencies (IDF) / Creating TF and TF-IDF vectors for the text data
- inverted index
- generating, Hadoop MapReduce used / Generating an inverted index using Hadoop MapReduce, How to do it..., How it works..., There's more...
J
- Java client APIs
- used, for interacting with HBase / Data random access using Java client APIs, How it works...
- Java Integrated Development Environment (IDE)
- about / There's more...
- Java MapReduce computations
- performing, on data mapped to Hive tables / HCatalog – performing Java MapReduce computations on data mapped to Hive tables, How to do it..., How it works...
- data, writing to Hive tables from / HCatalog – writing data to Hive tables from Java MapReduce computations, How to do it..., How it works...
- Java regular expressions
- URL / There's more...
- Java Virtual Machine (JVM)
- JobTracker process
- about / Hadoop MapReduce
- join
- performing, with Hive / Performing a join with Hive, How to do it..., How it works...
K
- K-means
- running, with Apache Mahout / Running K-means with Mahout, How it works...
- KeyFieldBasedPartitioner / KeyFieldBasedPartitioner
- KeyValueTextInputFormat
- about / How it works...
L
- large datasets, to Apache HBase data store
- loading, importtsv used / Loading large datasets to an Apache HBase data store – importtsv and bulkload, How to do it..., How it works...
- loading, bulkload used / Loading large datasets to an Apache HBase data store – importtsv and bulkload, How to do it..., How it works..., There's more...
- LDA
- used, for topic discovery / Topic discovery using Latent Dirichlet Allocation (LDA), How to do it..., How it works...
- legacy applications
- Hadoop, using with / Using Hadoop with legacy applications – Hadoop streaming, How it works..., There's more...
- LIMIT operator
- about / How it works...
- list of data blocks
- retrieving / Retrieving the list of data blocks of a file
M
- machine learning algorithm / Getting started with Apache Mahout
- Mahout
- about / Introduction
- Mahout Naive Bayes Classifier
- used, for document classification / Document classification using Mahout Naive Bayes Classifier, How to do it..., How it works...
- MapFile
- MapFileOutputFormat / Outputting a random accessible indexed InvertedIndex
- Map function / Simple analytics using MapReduce
- Mapper
- data of different value types, emitting from / Emitting data of different value types from a Mapper, How to do it..., How it works...
- MapReduce / Introduction
- used, for simple analytics / Simple analytics using MapReduce, Getting ready, How it works...
- used, for performing GROUP BY / Performing GROUP BY using MapReduce, How to do it..., How it works...
- used, for calculating frequency distributions / Calculating frequency distributions and sorting using MapReduce, How to do it..., There's more...
- used, for calculating sorting / Calculating frequency distributions and sorting using MapReduce, How to do it..., There's more...
- used, for calculating histograms / Calculating histograms using MapReduce, How to do it..., How it works...
- used, for calculating Scatter plots / Calculating Scatter plots using MapReduce, How to do it..., How it works...
- used, for joining two datasets / Joining two datasets using MapReduce, How to do it..., How it works...
- MapReduce computation
- multiple outputs, writing from / Writing multiple outputs from a MapReduce computation, How to do it..., How it works...
- MapReduce computations, results
- formatting, Hadoop OutputFormats used / Formatting the results of MapReduce computations – using Hadoop OutputFormats, How it works...
- MapReduce configuration
- optimizing, for cluster deployments / Optimizing Hadoop YARN and MapReduce configurations for cluster deployments, How to do it..., There's more...
- URL / There's more...
- MapReduce jobs
- dependencies, adding between / Adding dependencies between MapReduce jobs, How it works..., There's more...
- running, on HBase / Running MapReduce jobs on HBase, How to do it...
- MapReduce programming model
- Map function / Hadoop MapReduce
- Reduce function / Hadoop MapReduce
- MRUnit
- used, for unit testing Hadoop MapReduce applications / Unit testing Hadoop MapReduce applications using MRUnit, How to do it...
- about / Unit testing Hadoop MapReduce applications using MRUnit
- URL / See also
- multiple disks/volumes
- multiple input data types
- used, in single MapReduce application / Using multiple input data types and multiple Mapper implementations in a single MapReduce application
- multiple Mapper implementations
- used, in single MapReduce application / Using multiple input data types and multiple Mapper implementations in a single MapReduce application
- multiple outputs
- writing, from MapReduce computation / Writing multiple outputs from a MapReduce computation, How to do it..., How it works...
N
- 20 Newsgroups dataset
- URL / Introduction
- N-dimensional space / Running K-means with Mahout
- NameNode
- NASA weblog dataset
- URL / Introduction
- naïve Bayes classifier
- used, for classification / How to do it..., How it works...
O
- Oracle JDK
- URL / Getting ready
- ORC files
- used, for storing table data / Utilizing different storage formats in Hive - storing table data using ORC files, How to do it...
- ORDER BY operator
- about / How it works...
P
- partitioned Hive tables
- creating / Creating partitioned Hive tables, How to do it...
- Partitioner
- about / Introduction
- password-less SSH
- configuring / How to do it...
- Pig
- about / Introduction
- URL / Getting started with Apache Pig
- used, for joining two datasets / Joining two datasets using Pig, How it works...
- Pig interactive session
- starting / Starting a Pig interactive session
- Pig Latin
- about / Getting started with Apache Pig
- Pig script
- executing, EMR used / Executing a Pig script using EMR, How to do it..., Starting a Pig interactive session
- PostgreSQL JDBC driver
- URL / How to do it...
- predefined bootstrap actions
- configure-daemons / There's more...
- configure-hadoop / There's more...
- memory-intensive / There's more...
- run-if / There's more...
- Puppet-based cluster installation
- URL / There's more...
- Python
- used, for data preprocessing / Data preprocessing using Hadoop streaming and Python, How to do it..., How it works...
Q
- query file
- used, for Hive batch mode / Hive batch mode - using a query file, How to do it..., How it works..., There's more...
R
- random accessible indexed InvertedIndex
- outputting / Outputting a random accessible indexed InvertedIndex
- recommendations
- about / Performing content-based recommendations
- ways of making / Performing content-based recommendations
- Reduce function / Simple analytics using MapReduce
- Reduce input values
- repository files
S
- S3 bucket
- about / How to do it...
- URL / How to do it...
- sample code, GitHub
- URL / Introduction
- Scatter plots
- calculating, MapReduce used / Calculating Scatter plots using MapReduce, How to do it..., How it works...
- about / Calculating Scatter plots using MapReduce
- SequenceFileInputFormat
- subclasses / There's more...
- shared user Hadoop clusters
- Capacity scheduler, used for / Shared user Hadoop clusters – using Fair and Capacity schedulers, How it works...
- Fair scheduler, used for / Shared user Hadoop clusters – using Fair and Capacity schedulers, How it works...
- shuffling
- about / Introduction
- Simple Storage Service (S3) / Running Hadoop MapReduce v2 computations using Amazon Elastic MapReduce
- single MapReduce application
- multiple input data types, used in / Using multiple input data types and multiple Mapper implementations in a single MapReduce application
- multiple Mapper implementations, used in / Using multiple input data types and multiple Mapper implementations in a single MapReduce application
- SolrCloud
- URL / See also
- sorting
- calculating, MapReduce used / Calculating frequency distributions and sorting using MapReduce, How to do it..., There's more...
- SQL-style data
- querying, Apache Hive used / Simple SQL-style data querying using Apache Hive, How to do it..., There's more...
- Sqoop
- about / Introduction
- stragglers
- straggling tasks
- speculative execution of / Speculative execution of straggling tasks
T
- table data
- storing, ORC files used / Utilizing different storage formats in Hive - storing table data using ORC files, How to do it...
- TaskTrackers
- about / Hadoop MapReduce
- TeraSort
- used, for benchmarking Hadoop MapReduce / Benchmarking Hadoop MapReduce using TeraSort, How to do it...
- term frequencies (TF) / Creating TF and TF-IDF vectors for the text data
- Term frequency-inverse document frequency (TF-IDF) / Creating TF and TF-IDF vectors for the text data
- text data
- TF-IDF vector, creating for / Creating TF and TF-IDF vectors for the text data, How to do it..., How it works...
- TF vector, creating for / Creating TF and TF-IDF vectors for the text data, How to do it..., How it works...
- clustering, Apache Mahout used / Clustering text data using Apache Mahout, How it works...
- TF-IDF vector
- creating, for text data / Creating TF and TF-IDF vectors for the text data, How to do it..., How it works...
- TF vector
- creating, for text data / Creating TF and TF-IDF vectors for the text data, How to do it..., How it works...
- TotalOrderPartitioner / TotalOrderPartitioner
- Twahpic
- two datasets
- joining, MapReduce used / Joining two datasets using MapReduce, How to do it..., How it works...
- joining, Pig used / Joining two datasets using Pig, How it works...
U
- User-defined Function (UDF)
- user-provided JARs
- classpath precedence, setting to / Setting classpath precedence to user-provided JARs
V
- VMs, for Amazon EMR jobs
- configuring, EMR bootstrap actions used / Using EMR bootstrap actions to configure VMs for the Amazon EMR jobs, How to do it..., There's more...
W
- web crawling
- web crawling, with Apache Nutch
- performing, Hadoop cluster used / Whole web crawling with Apache Nutch using a Hadoop/HBase cluster, How to do it..., How it works...
- performing, HBase cluster used / Whole web crawling with Apache Nutch using a Hadoop/HBase cluster, How to do it..., How it works...
- web documents
- indexing, Apache Solr used / Indexing and searching web documents using Apache Solr, How to do it..., How it works...
- searching, Apache Solr used / Indexing and searching web documents using Apache Solr, How to do it..., How it works...
- web searching
- about / Introduction
- Whirr configuration
- URL / How it works...
- WordCount MapReduce application
- writing / Writing a WordCount MapReduce application, bundling it, and running it using the Hadoop local mode, How to do it..., How it works...
- bundling / Writing a WordCount MapReduce application, bundling it, and running it using the Hadoop local mode, How to do it..., How it works..., There's more...
- running, Hadoop local mode used / Writing a WordCount MapReduce application, bundling it, and running it using the Hadoop local mode, How to do it..., How it works..., There's more...
- WordCount MapReduce program
- combiner step, adding to / Adding a combiner step to the WordCount MapReduce program, How to do it..., There's more...
- WordCount program
- running, in distributed cluster environment / Running the WordCount program in a distributed cluster environment, How to do it...
Y
- YARN (Yet Another Resource Negotiator)
- about / Hadoop YARN
- YARN configuration
- URL / There's more...
- YARN mini cluster
- used, for integration testing Hadoop MapReduce applications / Integration testing Hadoop MapReduce applications using MiniYarnCluster, How to do it...
Z
- Zipf (power law) distribution
- about / How to do it...