Index
A
- acceptance testing / Introduction to testing
- acceptance tests, TDD
- decomposing / Defining acceptance tests
- access patterns, SQL
- SELECT / SQL databases
- INSERT / SQL databases
- UPDATE / SQL databases
- UPSERT / SQL databases
- DELETE / SQL databases
- addTrap operation
- about / Pipe operations
- Ad targeting
- about / Exploring ad targeting
- daily points calculation / Calculating daily points
- historic points calculation / Calculating historic points
- targeted ads generation / Generating targeted ads
- advanced serialization files
- about / Reading and writing files
- Algebird abstract algebra library
- using / Other libraries
- algorithm, TDD
- decomposing / Decomposing the algorithm
- Apache Lucene library
- about / Search platforms
- Azkaban
- about / Scheduling execution
B
- black box testing
- about / Black box testing
- benefit / Black box testing
C
- Cascading
- about / MapReduce abstractions
- working with / Introducing Cascading
- pipe / What happens inside a pipe
- pipe assemblies / Pipe assemblies
- extensions / Cascading extensions
- Cassandra
- about / NoSQL databases
- ClusterWritable object / K-Means using Mahout
- Comma Separated Values (CSV)
- about / Reading and writing files
- composite operations
- unique operation / Composite operations
- crossWithTiny operation / Composite operations
- normalize operation / Composite operations
- partition operation / Composite operations
- Concurrent
- about / Monitoring Scalding jobs
- configuration
- performing, Hadoop parameters used / Configuring using Hadoop parameters
- configuration data
- reading, from property file / Configuring using a property file
- cron
- about / Scheduling execution
- crossWithTiny operation
- about / Composite operations
D
- daily points
- calculating / Calculating daily points
- DataNode nodes
- about / The Hadoop platform
- debug operation
- about / Pipe operations
- delimited files
- about / Reading and writing files
- Dependency Injection pattern
- about / The dependency injection pattern
- implementing / The dependency injection pattern
- development editors, Scala
- about / Development editors
- discard operation
- about / Map-like operations
- domain-specific language (DSL) / Pipe assemblies
- dot group operation
- about / Operations on groups
- Driven
- about / Monitoring Scalding jobs
- URL / Monitoring Scalding jobs
E
- ElephantDB
- about / NoSQL databases
- Elasticsearch
- about / Search platforms, Elastic search
- URL / Elastic search
- Scalding wrapper, implementing for / Elastic search
- advanced search tap, URL / Elastic search
- Euclidean
- using / K-Means using Mahout
- execution
- scheduling / Scheduling execution
- execution throttling
- Scalding / Scalding execution throttling
- Extension Methods
- working, URL / The external operations pattern
- external operations pattern
- about / The external operations pattern
- LogsSchemas object, creating / The external operations pattern
- implementing / The external operations pattern
- Scalding job responsibilities / The external operations pattern
- external systems
- interacting with / Interacting with external systems
F
- file formats, Scalding
- TextLine format / Reading and writing files
- delimited files / Reading and writing files
- advanced serialization files / Reading and writing files
- files
- reading, with Scalding / Reading and writing files, TextLine parsing, Executing in the local and Hadoop modes
- writing, with Scalding / Reading and writing files, TextLine parsing, Executing in the local and Hadoop modes
- reading, best practices / Best practices to read and write files
- writing, best practices / Best practices to read and write files
- filter operation
- about / Map-like operations
- finalized job scalability
- analyzing / Completing the implementation, Exploring ad targeting
- flatMap function
- about / Scala basics
- flatMap operation
- about / Map-like operations
- flatMapTo operation
- about / Map-like operations
- foldLeft group operation
- about / Operations on groups
- function literal, Scala
- about / Scala basics
G
- Ganitha
- about / Other libraries
- URL / Other libraries
- groupAll operation
- about / Grouping/reducing functions
- groupBy function
- about / Scala basics
- groupBy operation
- about / Grouping/reducing functions
- grouping operations
- about / Grouping/reducing functions
- groupBy operation / Grouping/reducing functions
- groupAll operation / Grouping/reducing functions
- group operations
- sizeAveStdev / Operations on groups
- toList / Operations on groups
- sortBy / Operations on groups
- last / Operations on groups
- take / Operations on groups
- takeWhile / Operations on groups
- drop / Operations on groups
- sortWithTake / Operations on groups
- sortedReverseTake / Operations on groups
- pivot / Operations on groups
- reducers / Operations on groups
- reduce / Operations on groups
- foldLeft / Operations on groups
- dot / Operations on groups
- histogram / Operations on groups
- hyperLogLog / Operations on groups
- groups
- operations, performing on / Operations on groups
- composite operations, performing on / Composite operations
H
- Hadoop
- installing / Installing Hadoop in five minutes
- Scalding job, submitting into / Submitting a Scalding job in Hadoop
- Hadoop cluster
- Scalding, executing in / Executing Scalding in a Hadoop cluster
- Hadoop parameters
- used, for configuration / Configuring using Hadoop parameters
- Hadoop platform
- about / The Hadoop platform
- HBase
- about / NoSQL databases, Understanding HBase
- reading from / Reading from HBase
- writing in / Writing in HBase
- advanced features, using / Using advanced HBase features
- HDFS
- about / The Hadoop platform
- HDFS mode
- used, for executing Scalding application / Executing in the local and Hadoop modes
- head group operation
- about / Operations on groups
- Hello World application
- executing, in Scala / Hello World in Scala
- histogram group operation
- about / Operations on groups
- historic points
- calculating / Calculating historic points
- hyperLogLog group operation
- about / Operations on groups
I
- insert operation
- about / Map-like operations
- integration testing / Introduction to testing
- integration tests, TDD
- implementing / Implementing integration tests
J
- Jaccard index
- used, for setting text similarity / Setting a similarity using the Jaccard index, K-Means using Mahout
- JDBC (Java Database Connectivity) / SQL databases
- Jenkins
- about / Scheduling execution
- job execution
- coordinating / Coordinating job execution
- JobLibLoader class
- about / Using slim JAR files
- JobRunner class
- about / Using slim JAR files
- job scheduling
- tools / Scheduling execution
- JobTracker
- about / The Hadoop platform
- used, for submitting Scalding job / Submitting a Scalding job in Hadoop
- URL / Scalding execution throttling
- join operations
- about / Join operations
- joinWithSmaller / Join operations
- joinWithLarger / Join operations
- joinWithTiny / Join operations
- joinWithLarger operation
- about / Join operations
- joinWithSmaller operation
- about / Join operations
- syntax / Join operations
- joinWithTiny operation
- about / Join operations
K
- K-Means
- about / K-Means using Mahout
- implementing, Mahout used / K-Means using Mahout
- URL / K-Means using Mahout
L
- last group operation
- about / Operations on groups
- Late Bound Dependency pattern
- about / The late bound dependency pattern
- implementing / The late bound dependency pattern
- left join
- about / Join operations
- limit operation
- about / Map-like operations
- lists
- about / Scala basics
- higher-order functions / Scala basics
- Locality Sensitive Hashing (LSH) / Other libraries
- logfile analysis
- about / Logfile analysis
- data-transformation jobs, implementing / Logfile analysis
- data-transformation jobs, executing / Logfile analysis
- bucketing and binning / Logfile analysis
- data-processing job, completing / Logfile analysis
- implementation, completing / Completing the implementation, Exploring ad targeting
- logsAddDayColumn operation
- defining / The external operations pattern
- logsCountVisits operation
- defining / The external operations pattern
M
- Mahout
- used, for K-Means implementation / K-Means using Mahout
- map-like operations
- about / Map-like operations
- map operation / Map-like operations
- mapTo operation / Map-like operations
- flatMap operation / Map-like operations
- flatMapTo operation / Map-like operations
- unpivot operation / Map-like operations
- pivot operation / Map-like operations
- project operation / Map-like operations
- discard operation / Map-like operations
- insert operation / Map-like operations
- limit operation / Map-like operations
- filter operation / Map-like operations
- sample operation / Map-like operations
- pack operation / Map-like operations
- unpack operation / Map-like operations
- map operation
- about / Map-like operations
- MapReduce
- about / The Hadoop platform
- shared nothing architecture / MapReduce
- working, example / A MapReduce example
- testing challenges / MapReduce testing challenges
- MapReduce abstractions
- about / MapReduce abstractions
- Cascading / MapReduce abstractions
- MapReduce logic, TDD
- implementing / Implementing the MapReduce logic
- mapTo operation
- about / Map-like operations
- maven-assembly-plugin / Using slim JAR files
- mkString operation
- about / Operations on groups
- MongoDB
- about / NoSQL databases
- mvn package / Using slim JAR files
N
- NameNode
- about / The Hadoop platform
- NameNode service / Monitoring Scalding jobs
- NameNode web interface / Submitting a Scalding job in Hadoop
- name operation
- about / Pipe operations
- normalize operation
- about / Composite operations
- NoSQL databases
- about / NoSQL databases
- MongoDB / NoSQL databases
- Cassandra / NoSQL databases
- ElephantDB / NoSQL databases
- HBase / NoSQL databases
O
- One Separated Values (OSV)
- about / Reading and writing files
- Oozie
- about / Scheduling execution
- outer join
- about / Join operations
- outlier detection
- about / K-Means using Mahout
P
- pack operation
- about / Map-like operations
- partition operation
- about / Composite operations
- Pig
- about / MapReduce abstractions
- using / MapReduce abstractions
- pipe assemblies
- about / Pipe assemblies
- Each / Pipe assemblies
- GroupBy / Pipe assemblies
- Every / Pipe assemblies
- CoGroup / Pipe assemblies
- SubAssembly / Pipe assemblies
- pipe operations
- about / Pipe operations
- pipes
- about / Introducing Cascading, What happens inside a pipe
- implementing / Pipe assemblies
- reusing / A simple example
- pivot group operation
- about / Operations on groups
- POJO
- about / Map-like operations
- project operation
- about / Map-like operations
- property file
- configuration data, reading from / Configuring using a property file
R
- reduce group operation
- about / Operations on groups
- reducers group operation
- about / Operations on groups
- rename operation
- about / Pipe operations
- right join
- about / Join operations
S
- sample operation
- about / Map-like operations
- Scala
- about / Why Scala?
- significance / Why Scala?
- basics / Scala basics
- trait / Scala basics
- lists / Scala basics
- tuples / Scala basics
- methods / Scala basics
- function literals / Scala basics
- Hello World application, executing in / Hello World in Scala
- Scala build tools
- about / Scala build tools
- Scala functions
- flatMap / Scala basics
- groupBy / Scala basics
- Scala IDE
- URL / Development editors
- Scalding
- used, for reading files / Reading and writing files
- used, for writing files / Reading and writing files, Best practices to read and write files
- TextLine parsing / TextLine parsing
- executing, in HDFS mode / Executing in the local and Hadoop modes
- executing, in local mode / Executing in the local and Hadoop modes
- core capabilities / Understanding the core capabilities of Scalding, Map-like operations, Join operations, Grouping/reducing functions
- executing, in Hadoop cluster / Executing Scalding in a Hadoop cluster
- Scalding core capabilities
- map-like operations / Map-like operations
- join operations / Join operations
- pipe operations / Pipe operations
- grouping/reducing functions / Grouping/reducing functions
- Scalding job
- running / Running our first Scalding job
- submitting, into Hadoop / Submitting a Scalding job in Hadoop
- Scalding jobs
- monitoring / Monitoring Scalding jobs
- ScaldingUnit framework / Implementing unit tests
- scanLeft operation
- about / Calculating daily points
- running / Calculating daily points
- search platforms
- about / Search platforms
- Elasticsearch / Elastic search
- shared nothing architecture, MapReduce
- about / MapReduce
- Shuffle
- about / A MapReduce example
- Simple Build Tool (sbt)
- about / Scala build tools
- sizeAveStdev group operation
- about / Operations on groups
- slim JAR files
- using / Using slim JAR files
- software testing / Introduction to testing
- Solr
- about / Search platforms
- sortBy group operation
- about / Operations on groups
- sortedReverseTake group operation
- about / Operations on groups
- sortWithTake group operation
- about / Operations on groups
- SpyGlass
- URL / Reading from HBase
- used, for reading data from HBase / Reading from HBase
- used, for writing data to HBase / Writing in HBase
- SQL databases
- using / SQL databases
- access patterns / SQL databases
- SQL dialects
- using / SQL databases
- system testing / Introduction to testing
- system tests, TDD
- defining / Defining and performing system tests
- performing / Defining and performing system tests
T
- Tab Separated Values (TSV)
- about / Reading and writing files
- take group operation
- about / Operations on groups
- takeWhile group operation
- about / Operations on groups
- targeted ads
- generating / Generating targeted ads
- TaskTracker nodes
- about / The Hadoop platform
- TDD
- implementing / Implementing the TDD methodology
- for Scalding developers / Implementing the TDD methodology, Implementing integration tests
- algorithm, decomposing / Decomposing the algorithm
- acceptance tests, defining / Defining acceptance tests
- integration tests, defining / Implementing integration tests
- unit tests, implementing / Implementing unit tests
- MapReduce logic, implementing / Implementing the MapReduce logic
- system tests, defining / Defining and performing system tests
- testing strategy
- data science phase, data exploration / Development lifecycle with testing strategy
- data science phase, whiteboard design / Development lifecycle with testing strategy
- development tasks, TDD implementation / Development lifecycle with testing strategy
- development tasks, production deployment and monitoring / Development lifecycle with testing strategy
- TextLine format
- about / Reading and writing files
- TextLine parsing
- about / TextLine parsing
- example / TextLine parsing
- text similarity
- computing, TF-IDF used / Text similarity using TF-IDF
- setting, Jaccard index used / Setting a similarity using the Jaccard index, K-Means using Mahout
- TF-IDF
- about / Text similarity using TF-IDF
- used, for text similarity / Text similarity using TF-IDF
- toList group operation
- about / Operations on groups
- toList operation
- about / Calculating daily points
- tools, for job scheduling
- cron / Scheduling execution
- Jenkins / Scheduling execution
- Oozie / Scheduling execution
- Azkaban / Scheduling execution
- trait
- about / Scala basics
- tuples, Scala
- about / Scala basics
- Typed API
- about / Typed API
U
- unique operation
- about / Composite operations
- unit/component testing / Introduction to testing
- unit tests, TDD
- implementing / Implementing unit tests
- unpack operation
- about / Map-like operations
- unpivot group operation
- about / Operations on groups
- user-defined functions (UDF)
- about / MapReduce abstractions
W
- WritableSequenceFile object / K-Means using Mahout
Z
- ZooKeeper
- about / The Hadoop platform