Index
A
- Ambari
- about / Performance monitoring tools
- Apache Ambari
- URL / Using Apache Ambari to monitor Hadoop
- using, for Hadoop monitoring / Using Apache Ambari to monitor Hadoop
- Associative property
- about / Using Combiners
- URL / Using Combiners
B
- best practices, Hadoop / Hadoop best practices and recommendations, Hadoop tuning recommendations, Using a MapReduce template class code
- Bios tuning
- checklists / The Bios tuning checklist
- Blacklisted node / Checking the Hadoop cluster node's health
C
- Cacti / Checking the Hadoop cluster node's health
- Chukwa
- about / Performance monitoring tools
- using, for Hadoop monitoring / Using Chukwa to monitor Hadoop
- collect() method / Using Combiners
- Combiner
- about / Using Combiners
- using / Using Combiners
- Commutative property / Using Combiners
- Associative property / Using Combiners
- implementing / Using Combiners
- Hadoop counters, focusing on / Using Combiners
- Combiner function / The MapReduce model
- Commutative property
- about / Using Combiners
- URL / Using Combiners
- Completely Fair Queueing (CFQ) / OS configuration recommendations
- compression
- Compress Input phase / Using compression
- Compress Mapper output phase / Using compression
- Compress Reducer output phase / Using compression
- Context Switch / Checking for CPU contention
- core-site.xml configuration file
- about / The core-site.xml configuration file
- fs.default.name variable / The core-site.xml configuration file
- hadoop.tmp.dir variable / The core-site.xml configuration file
- fs.checkpoint.dir variable / The core-site.xml configuration file
- io.file.buffer.size variable / The core-site.xml configuration file
- CPU-related parameters
- mapred.tasktracker.map.tasks.maximum variable / The CPU-related parameters
- mapred.tasktracker.reduce.tasks.maximum variable / The CPU-related parameters
- CPU bottlenecks
- identifying / Identifying CPU bottlenecks
D
- data locality / Sizing your Hadoop cluster
- DataNodes / An overview of Hadoop MapReduce
- dfs.access.time.precision variable / The hdfs-site.xml configuration file
- dfs.balance.bandwidthPerSec variable / The hdfs-site.xml configuration file
- dfs.block.size variable / The hdfs-site.xml configuration file
- dfs.data.dir variable / The disk I/O related parameters
- dfs.datanode.du.reserved variable / The hdfs-site.xml configuration file
- dfs.datanode.handler.count variable / The hdfs-site.xml configuration file
- dfs.name.dir variable / The hdfs-site.xml configuration file
- dfs.name.edits.dir variable / The hdfs-site.xml configuration file
- dfs.namenode.handler.count parameter / The hdfs-site.xml configuration file
- dfs.replication.considerLoad variable / The hdfs-site.xml configuration file
- dfs.replication variable / The hdfs-site.xml configuration file
- disk I/O-related parameters
- about / The disk I/O related parameters
- mapred.compress.map.output variable / The disk I/O related parameters
- mapred.output.compress parameter / The disk I/O related parameters
- mapred.map.output.compression.codec parameter / The disk I/O related parameters
- mapred.local.dir variable / The disk I/O related parameters
- dfs.data.dir (hdfs-site.xml) variable / The disk I/O related parameters
E
- Excluded node / Checking the Hadoop cluster node's health
F
- Fetch phase / Enhancing map tasks
- FileInputFormat class / Using appropriate Writable types
- FileOutputFormat class / Using a MapReduce template class code
- fs.checkpoint.dir variable / The core-site.xml configuration file
- fs.default.name variable / The core-site.xml configuration file
G
- Ganglia
- URL / Using Ganglia to monitor Hadoop
- using, for Hadoop monitoring / Using Ganglia to monitor Hadoop
- Ganglia Collector / Using Ganglia to monitor Hadoop
- garbage collector (GC) / Hadoop tuning recommendations
- GFS / An overview of Hadoop MapReduce
- Graylisted node / Checking the Hadoop cluster node's health
- gzip codec / Using compression
H
- Hadoop
- JobTracker / Sizing your Hadoop cluster
- TaskTracker / Sizing your Hadoop cluster
- best practices / Hadoop best practices and recommendations
- deploying / Deploying Hadoop
- tuning recommendations, checklists / Hadoop tuning recommendations
- I/O tuning recommendations / Hadoop tuning recommendations
- minimal configuration checklist / Hadoop tuning recommendations
- hadoop.tmp.dir variable / The core-site.xml configuration file
- Hadoop cluster / An overview of Hadoop MapReduce
- configuring / Creating a performance baseline
- weakness, identifying / Identifying cluster weakness
- nodes health, checking / Checking the Hadoop cluster node's health
- input data size, checking / Checking the input data size
- massive I/O, checking / Checking massive I/O and network traffic
- network traffic, checking / Checking massive I/O and network traffic
- insufficient concurrent tasks, checking / Checking for insufficient concurrent tasks
- CPU contention, checking / Checking for CPU contention
- sizing / Sizing your Hadoop cluster
- configuring correctly / Configuring your cluster correctly
- checklists / The Hadoop cluster checklist
- Hadoop cluster weakness / Identifying cluster weakness
- Hadoop Distributed File System (HDFS) / Enhancing map tasks
- Hadoop job
- components / Factors affecting the performance of MapReduce
- Hadoop MapReduce
- overview / An overview of Hadoop MapReduce
- internals / Hadoop MapReduce internals
- performance, affecting factors / Factors affecting the performance of MapReduce
- metrics / Hadoop MapReduce metrics
- Hadoop parameter
- dfs.replication / Creating a performance baseline
- dfs.block.size / Creating a performance baseline
- dfs.namenode.handler.count / Creating a performance baseline
- dfs.datanode.handler.count / Creating a performance baseline
- io.sort.factor / Creating a performance baseline
- io.sort.mb / Creating a performance baseline
- mapred.tasktracker.map.tasks.maximum / Creating a performance baseline
- mapred.map.tasks / Creating a performance baseline
- mapred.reduce.tasks / Creating a performance baseline
- mapred.tasktracker.reduce.tasks.maximum / Creating a performance baseline
- mapred.reduce.parallel.copies / Creating a performance baseline
- mapred.job.reduce.input.buffer.percent / Creating a performance baseline
- mapred.child.java.opts / Creating a performance baseline
- Hadoop parameters
- investigating / Investigating the Hadoop parameters
- mapred-site.xml configuration file / The mapred-site.xml configuration file
- HaLoop / Optimizing mappers and reducers code
- HDFS
- about / An overview of Hadoop MapReduce
- NameNode / An overview of Hadoop MapReduce
- DataNodes / An overview of Hadoop MapReduce
- hdfs-site.xml configuration file
- about / The hdfs-site.xml configuration file
- dfs.access.time.precision variable / The hdfs-site.xml configuration file
- dfs.balance.bandwidthPerSec variable / The hdfs-site.xml configuration file
- dfs.block.size variable / The hdfs-site.xml configuration file
- dfs.data.dir variable / The hdfs-site.xml configuration file
- dfs.datanode.du.reserved variable / The hdfs-site.xml configuration file
- dfs.datanode.handler.count variable / The hdfs-site.xml configuration file
- dfs.max.objects parameter / The hdfs-site.xml configuration file
- dfs.name.dir variable / The hdfs-site.xml configuration file
- dfs.name.edits.dir variable / The hdfs-site.xml configuration file
- dfs.replication variable / The hdfs-site.xml configuration file
- dfs.replication.considerLoad variable / The hdfs-site.xml configuration file
- High Performance Computing (HPC) / Using Nagios to monitor Hadoop
- Advanced Host Controller Interface (AHCI) option / The Bios tuning checklist
I
- I/O mode
- Direct I/O / Factors affecting the performance of MapReduce
- Streaming I/O / Factors affecting the performance of MapReduce
- immutable objects / Factors affecting the performance of MapReduce
- InputFormat class / Using appropriate Writable types
- io.compression.codec parameter / Using compression
- io.file.buffer.size variable / The core-site.xml configuration file
- io.sort.factor parameter / Reducing spilled records during the Map phase, Improving Reduce execution phase
- io.sort.factor variable / The memory-related parameters
- io.sort.mb parameter / Reducing spilled records during the Map phase
- io.sort.mb variable / The memory-related parameters
- io.sort.record.percent parameter / Reducing spilled records during the Map phase
- io.sort.spill.percent parameter / Reducing spilled records during the Map phase
J
- JobTracker / An overview of Hadoop MapReduce
- JVM (Java Virtual Machine) / Investigating the Hadoop parameters
L
- Lempel-Ziv-Oberhumer (LZO) / Using compression
- Logical Volume Management (LVM) / OS configuration recommendations
M
- main() method / Using a MapReduce template class code
- map() function / Using a MapReduce template class code
- map-side bottlenecks
- identifying / Enhancing map tasks
- map function / The MapReduce model
- mappers
- optimizing / Optimizing mappers and reducers code
- Map phase
- profiling / Enhancing map tasks
- mapred-site.xml configuration file
- CPU-related parameters / The CPU-related parameters
- CPU-related variable / The CPU-related parameters
- disk I/O-related parameters / The disk I/O related parameters
- memory-related parameters / The memory-related parameters
- network-related parameters / The network-related parameters
- mapred.child.java.opts parameter / Tuning map and reduce parameters, Reusing types smartly
- mapred.child.ulimit variable / The memory-related parameters
- mapred.compress.map.output variable / The disk I/O related parameters
- mapred.job.reduce.input.buffer.percent parameter / Improving Reduce execution phase
- mapred.job.reduce.input.buffer.percent variable / The memory-related parameters
- mapred.job.reuse.jvm.num.tasks parameter / Reusing types smartly
- mapred.job.shuffle.input.buffer.percent parameter / Improving Reduce execution phase
- mapred.local.dir variable / The disk I/O related parameters
- mapred.map.output.compression.codec variable / The disk I/O related parameters
- mapred.output.compress variable / The disk I/O related parameters
- mapred.reduce.parallel.copies parameter / Improving Reduce execution phase
- mapred.reduce.parallel.copies variable / The network-related parameters
- mapred.tasktracker.map.tasks.maximum variable / The CPU-related parameters
- mapred.tasktracker.reduce.tasks.maximum variable / The CPU-related parameters
- MapReduce
- job performance / Factors affecting the performance of MapReduce
- mapreduce.map.output.compress.codec parameter / Using compression
- mapreduce.map.output.compress parameter / Using compression
- mapreduce.output.fileoutputformat.compress.codec parameter / Using compression
- mapreduce.output.fileoutputformat.compress.type parameter / Using compression
- mapreduce.output.fileoutputformat.compress parameter / Using compression
- MapReduce job
- launching / Hadoop MapReduce internals
- optimizing / Optimizing mappers and reducers code
- MapReduce job phase
- Compress Input / Using compression
- Compress Mapper output / Using compression
- Compress Reducer output / Using compression
- MapReduce model
- about / The MapReduce model
- map function / The MapReduce model
- reduce function / The MapReduce model
- using / The MapReduce model
- phases / The MapReduce model
- design / The MapReduce model
- diagram / The MapReduce model
- MapReduceTemplate class / Using a MapReduce template class code
- MapReduce template class code / Using a MapReduce template class code
- map tasks
- execution sequence / Enhancing map tasks
- map tasks (mappers) / Hadoop MapReduce internals
- Map tasks performance, enhancing
- input data / Input data and block size impact
- block size / Input data and block size impact
- small files, dealing with / Dealing with small and unsplittable files
- small files, packing options / Dealing with small and unsplittable files
- spilled records, reducing / Reducing spilled records during the Map phase
- map tasks' throughput, calculating / Calculating map tasks' throughput
- master nodes / Configuring your cluster correctly
- merge-sort algorithm / Factors affecting the performance of MapReduce
- Merge phase / Enhancing map tasks
- Metadata size (MS) / Reducing spilled records during the Map phase
- mutable objects / Factors affecting the performance of MapReduce
N
- Nagios
- about / Performance monitoring tools
- URL / Using Nagios to monitor Hadoop
- using, for Hadoop monitoring / Using Nagios to monitor Hadoop
- using, for monitoring perspectives / Using Nagios to monitor Hadoop
- benefits / Using Nagios to monitor Hadoop
- NameNode / An overview of Hadoop MapReduce
- Native Command Queuing (NCQ) mode / The Bios tuning checklist
- network-related parameters
- mapred.reduce.parallel.copies variable / The network-related parameters
- topology.script.file.name (core-site.xml) variable / The network-related parameters
- network bandwidth bottlenecks
- identifying / Identifying network bandwidth bottlenecks
- nodiratime / OS configuration recommendations
O
- off-machine level / The core-site.xml configuration file
- on-machine level / The core-site.xml configuration file
- OS configuration
- recommendations / OS configuration recommendations
P
- performance
- monitoring, tools / Performance monitoring tools
- performance affecting factors, Hadoop MapReduce
- I/O mode / Factors affecting the performance of MapReduce
- input data parsing / Factors affecting the performance of MapReduce
- performance baseline
- creating / Creating a performance baseline
- TeraGen modules / Creating a performance baseline
- TeraSort modules / Creating a performance baseline
- TeraValidate modules / Creating a performance baseline
- performance monitoring, tools
- Chukwa / Using Chukwa to monitor Hadoop
- Ganglia / Using Ganglia to monitor Hadoop
- Nagios / Using Nagios to monitor Hadoop
- Apache Ambari / Using Apache Ambari to monitor Hadoop
- performance tuning
- goal / Performance tuning
- categories / Performance tuning
- of Hadoop MapReduce job / Performance tuning
- steps / Performance tuning
- diagram / Performance tuning
- pseudo formula / Configuring your cluster correctly
R
- rack awareness concept / The network-related parameters
- RAM bottlenecks
- identifying / Identifying RAM bottlenecks
- read-only default configuration
- readInt() method / Using appropriate Writable types
- Read phase / Enhancing map tasks
- Record length (RL) / Reducing spilled records during the Map phase
- records / Optimizing mappers and reducers code
- reduce() function / Using a MapReduce template class code
- reduce function / The MapReduce model
- Reduce phase
- enhancing, parameters / Improving Reduce execution phase
- Reducer function / Using a MapReduce template class code
- reducers code
- optimizing / Optimizing mappers and reducers code
- Reduce tasks
- enhancing / Enhancing reduce tasks, Calculating reduce tasks' throughput, Improving Reduce execution phase
- phases / Enhancing reduce tasks
- Shuffle phase / Enhancing reduce tasks
- Reduce phase / Enhancing reduce tasks
- Write phase / Enhancing reduce tasks
- reduce tasks (reducers) / Hadoop MapReduce internals
- Reduce tasks, enhancing
- Shuffle phase, profiling / Enhancing reduce tasks
- Reduce phase / Enhancing reduce tasks
- Write phase, profiling / Enhancing reduce tasks
- reduce task throughput, calculating / Calculating reduce tasks' throughput
- Reduce execution phase, improving / Improving Reduce execution phase
- reduce parameters, tuning / Tuning map and reduce parameters
- map parameters, tuning / Tuning map and reduce parameters
- resource bottlenecks
- identifying / Identifying resource bottlenecks
- RAM bottlenecks, identifying / Identifying RAM bottlenecks
- CPU bottlenecks, identifying / Identifying CPU bottlenecks
- storage bottlenecks, identifying / Identifying storage bottlenecks
- network bandwidth bottlenecks, identifying / Identifying network bandwidth bottlenecks
- run() method / Using a MapReduce template class code
S
- Secure Copy Protocol (SCP) / Deploying Hadoop
- Secure Shell (SSH) / Deploying Hadoop
- site-specific configuration
- SkipBadRecords class / Optimizing mappers and reducers code
- small files
- packing, alternatives / Dealing with small and unsplittable files
- Spilled Records size (RS) / Reducing spilled records during the Map phase
- Spill phase / Enhancing map tasks
- split file / Dealing with small and unsplittable files
- storage bottlenecks
- identifying / Identifying storage bottlenecks
- system performance
- analyzing / Identifying network bandwidth bottlenecks
T
- TaskTracker / An overview of Hadoop MapReduce
- tasktracker.http.threads parameter / Reducing spilled records during the Map phase
- TeraGen modules / Creating a performance baseline
- TeraSort modules / Creating a performance baseline
- TeraValidate modules / Creating a performance baseline
- TestDFSIO benchmark tool
- using / Identifying storage bottlenecks
- output log / Identifying storage bottlenecks
- topology.script.file.name (core-site.xml) variable / The network-related parameters
- Twister / Optimizing mappers and reducers code
- types
- reusing / Reusing types smartly
V
- vmstat tool / Checking for CPU contention
W
- Writable class / Using appropriate Writable types
- WritableComparable class / Using appropriate Writable types
- WritableComparator class / Using appropriate Writable types
- Writable type object
- writing / Using appropriate Writable types