Index
A
- Abstract syntax tree (AST)
- about / The EXPLAIN statement
- ACLs
- on HDFS, URL / Storage-based mode
- Advanced Encryption Standard (AES)
- URL / Encryption
- aggregate functions / Operators and functions
- aggregation
- data aggregation / Basic aggregation – GROUP BY
- without GROUP BY columns / Basic aggregation – GROUP BY
- with GROUP BY columns / Basic aggregation – GROUP BY
- advanced / Advanced aggregation – GROUPING SETS, Advanced aggregation – ROLLUP and CUBE
- ROLLUP statement / Advanced aggregation – ROLLUP and CUBE
- CUBE statement / Advanced aggregation – ROLLUP and CUBE
- condition, HAVING statement / Aggregation condition – HAVING
- Amazon EMR
- analytic functions
- about / Analytic functions
- Function (arg1,..., argn) / Analytic functions
- Standard aggregations / Analytic functions
- RANK / Analytic functions
- DENSE_RANK / Analytic functions
- ROW_NUMBER / Analytic functions
- CUME_DIST / Analytic functions
- PERCENT_RANK / Analytic functions
- NTILE / Analytic functions
- LEAD function / Analytic functions
- LAG function / Analytic functions
- FIRST_VALUE / Analytic functions
- LAST_VALUE / Analytic functions
- window expressions / Analytic functions
- ANALYZE statement
- about / The ANALYZE statement
- ANTLR
- URL / The EXPLAIN statement
- Apache
- used, for installing Hive / Installing Hive from Apache
- Apache Hive
- Wiki, URL / Using the Hive command line and Beeline
- Apache Hive Wiki
- URL / HBase
- Apache JIRA Hive-365
- Atomicity, Consistency, Isolation, and Durability (ACID)
- about / Transactions
- authentication
- about / Authentication
- Metastore server authentication / Metastore server authentication
- HiveServer2 authentication / HiveServer2 authentication
- authorization
- about / Authorization
- legacy mode / Legacy mode
- storage-based mode / Storage-based mode
- SQL standard-based mode / SQL standard-based mode
- Avro
- URL / SerDe
- AvroSerDe / SerDe
- Azure HDInsight Service
B
- batch processing
- Beeline
- using / Using the Hive command line and Beeline
- URL / Using the Hive command line and Beeline
- command-line syntax / Using the Hive command line and Beeline
- big data
- about / Introducing big data
- volume / Introducing big data
- velocity / Introducing big data
- variety / Introducing big data
- veracity / Introducing big data
- variability / Introducing big data
- volatility / Introducing big data
- visualization / Introducing big data
- value / Introducing big data
- block sampling / Sampling
- bucket map join / Bucket map join
- buckets
- about / Hive buckets
- number / Hive buckets
- bucket tables
- about / Bucket tables
- bucket table sampling / Sampling
C
- cloud
- Hive, starting / Starting Hive in the cloud
- Cloudera
- URL / Starting Hive in the cloud
- about / JDBC / ODBC connector
- Cloudera Distributed Hadoop (CDH)
- CLUSTER BY / ORDER and SORT
- collection functions / Operators and functions
- collection item delimiter / Understanding Hive data types
- ColumnarSerDe / SerDe
- CombineFileInputFormat / Storage optimization
- common join, join optimization / Common join
- Common Table Expression (CTE) / Hive internal and external tables
- compression / Compression
- conditional functions / Operators and functions
- Cost-Based Optimizer (CBO)
- about / The ANALYZE statement
- Cost-Based Optimizer (CBO) / Hive roadmap
- CREATE TABLE / Hive internal and external tables
- Create the table as select (CTAS) / Hive internal and external tables
- CROSS JOIN statement / The OUTER JOIN and CROSS JOIN statements
- CUBE statement
D
- data aggregation
- about / Basic aggregation – GROUP BY
- database, Hive
- about / Hive database
- data exchange
- LOAD keyword / Data exchange – LOAD
- INSERT keyword / Data exchange – INSERT
- EXPORT statement / Data exchange – EXPORT and IMPORT
- IMPORT statement / Data exchange – EXPORT and IMPORT
- data file optimization
- about / Data file optimization
- file format / File format
- compression / Compression
- storage optimization / Storage optimization
- data type conversions
- about / Data type conversions
- primitive type conversion / Data type conversions
- explicit type conversion / Data type conversions
- data type functions tips, complex / Operators and functions
- data types, Hive
- about / Understanding Hive data types
- TINYINT / Understanding Hive data types
- SMALLINT / Understanding Hive data types
- INT / Understanding Hive data types
- BIGINT / Understanding Hive data types
- FLOAT / Understanding Hive data types
- DOUBLE / Understanding Hive data types
- DECIMAL / Understanding Hive data types
- BINARY / Understanding Hive data types
- BOOLEAN / Understanding Hive data types
- STRING / Understanding Hive data types
- CHAR / Understanding Hive data types
- VARCHAR / Understanding Hive data types
- DATE / Understanding Hive data types
- TIMESTAMP / Understanding Hive data types
- date functions / Operators and functions
- date function tips / Operators and functions
- delimiters
- row delimiter / Understanding Hive data types
- collection item delimiter / Understanding Hive data types
- map key delimiter / Understanding Hive data types
- deployment / Development and deployment
- Derby
- design optimization
- about / Design optimization
- partition tables / Partition tables
- bucket tables / Bucket tables
- index / Index
- development / Development and deployment
- Directed Acyclic Graph (DAG) / Oozie
- directed acyclic graphs (DAGs) / Index
- DISTRIBUTE BY / ORDER and SORT
E
- encryption
- about / Encryption
- EXPLAIN statement
- about / The EXPLAIN statement
- EXTENDED keyword / The EXPLAIN statement
- DEPENDENCY keyword / The EXPLAIN statement
- AUTHORIZATION keyword / The EXPLAIN statement
- explicit type conversion / Data type conversions
- EXPORT statement / Data exchange – EXPORT and IMPORT
- external tables / Hive internal and external tables
F
- file format, data file optimization
- about / File format
- TEXTFILE / File format
- SEQUENCEFILE / File format
- RCFILE / File format
- Optimized Row Columnar (ORC) / File format
- PARQUET / File format
- Flume / Overview of the Hadoop ecosystem
- functions
- about / Operators and functions
- mathematical functions / Operators and functions
- collection functions / Operators and functions
- type conversion functions / Operators and functions
- date functions / Operators and functions
- conditional functions / Operators and functions
- string functions / Operators and functions
- aggregate functions / Operators and functions
- table-generating functions / Operators and functions
- customized / Operators and functions
- complex data type functions tips / Operators and functions
- date function tips / Operators and functions
- CASE, for datatypes / Operators and functions
- parser and search tips / Operators and functions
- virtual columns / Operators and functions
G
- GenericUDAF
- URL / The UDAF code template
- GROUPING SETS keyword
H
- Hadoop
- versus relational database / Relational and NoSQL database versus Hadoop
- versus NoSQL database / Relational and NoSQL database versus Hadoop
- Hadoop Archive
- and HAR / Storage optimization
- Hadoop Archive File (HAR) / File format
- Hadoop ecosystem
- about / Overview of the Hadoop ecosystem
- HAVING statement
- about / Aggregation condition – HAVING
- HBase
- HBaseSerDe / SerDe
- HCatalog
- HDFS
- HDFS federation / Storage optimization
- Hive
- about / Hive overview
- installing, from Apache / Installing Hive from Apache
- URL / Installing Hive from Apache
- installing, from vendor packages / Installing Hive from vendor packages
- starting, in cloud / Starting Hive in the cloud
- data types / Understanding Hive data types
- complex types / Understanding Hive data types
- types / Understanding Hive data types
- database / Hive database
- internal tables / Hive internal and external tables
- external tables / Hive internal and external tables
- partitions / Hive partitions
- buckets / Hive buckets
- views / Hive views
- performance utilities / Performance utilities
- Hive, complex types
- ARRAY / Understanding Hive data types
- MAP / Understanding Hive data types
- STRUCT / Understanding Hive data types
- NAMED STRUCT / Understanding Hive data types
- UNION / Understanding Hive data types
- Hive-integrated development environment (IDE)
- hive.map.aggr property / Basic aggregation – GROUP BY
- Hive CLI
- command-line syntax / Using the Hive command line and Beeline
- URL / Using the Hive command line and Beeline
- Hive command line
- Hive Data Definition Language (DDL)
- about / Hive Data Definition Language
- Hive join optimization
- URL / Skew join
- Hive roadmap
- about / Hive roadmap
- HiveServer2
- HiveServer2 authentication
- none authentication / HiveServer2 authentication
- Kerberos authentication / HiveServer2 authentication
- LDAP authentication / HiveServer2 authentication
- pluggable custom authentication / HiveServer2 authentication
- Pluggable Authentication Modules (PAM) authentication / HiveServer2 authentication
- Hive Wiki
- URL / Operators and functions
- Hortonworks
- URL / JDBC / ODBC connector
- HQL
- about / Hive overview
- Hue
- URL / The Hive-integrated development environment, Hue
- about / Hue
I
- Impala
- URL / A short history
- IMPORT statement / Data exchange – EXPORT and IMPORT
- index
- about / Index
- INNER JOIN statement / The INNER JOIN statement
- INSERT keyword / Data exchange – INSERT
- internal tables / Hive internal and external tables
J
- Java IDE
- Java Virtual Machine (JVM) / Batch, real-time, and stream processing
- javax.script API
- URL / User-defined functions
- JDBC/ODBC connector
- about / JDBC / ODBC connector
- job and query optimization
- about / Job and query optimization
- local mode / Local mode
- JVM reuse / JVM reuse
- parallel execution / Parallel execution
- join optimization
- about / Join optimization
- common join / Common join
- map join / Map join
- bucket map join / Bucket map join
- Sort merge bucket (SMB) join / Sort merge bucket (SMB) join
- Sort merge bucket map (SMBM) join / Sort merge bucket map (SMBM) join
- skew join / Skew join
- JSONSerDe
- JVM reuse, job and query optimization / JVM reuse
K
- Kerberos
- about / Authentication
- Kerberos authentication / HiveServer2 authentication
- Key Distribution Center (KDC) / Authentication
L
- LazySimpleSerDe / SerDe
- LDAP authentication / HiveServer2 authentication
- legacy mode, authorization
- about / Legacy mode
- Live Long And Process (LLAP) / Hive roadmap
- LOAD keyword / Data exchange – LOAD
- local mode, job and query optimization / Local mode
M
- map join, join optimization / Map join
- MAPJOIN statement / Special JOIN – MAPJOIN
- map key delimiter / Understanding Hive data types
- mathematical functions / Operators and functions
- Maven
- metastore / Hive overview
- Metastore server authentication
- about / Metastore server authentication
- MIT Kerberos
- URL / Authentication
- MySQL
N
- none authentication / HiveServer2 authentication
- NoSQL database
- versus Hadoop / Relational and NoSQL database versus Hadoop
O
- Oozie
- OpenCSVSerDe / SerDe
- operators
- about / Operators and functions
- Optimized Row Columnar (ORC) / Index, File format
- Optimized Row Columnar (ORC) file
- about / Transactions
- ORDER BY (ASC|DESC) keyword / ORDER and SORT
- ORDER keyword / ORDER and SORT
- OUTER JOIN statement / The OUTER JOIN and CROSS JOIN statements
- Out Of Memory (OOM) exceptions / The INNER JOIN statement
P
- parallel execution, job and query optimization / Parallel execution
- ParquetHiveSerDe / SerDe
- parser and search tips / Operators and functions
- PARTITION BY statement / Analytic functions
- partitions
- about / Hive partitions
- partition tables
- by date and time / Partition tables
- by locations / Partition tables
- by business logic / Partition tables
- personally identifiable information (PII)
- about / Encryption
- Phoenix
- URL / HBase
- Pluggable Authentication Modules (PAM) authentication / HiveServer2 authentication
- pluggable custom authentication / HiveServer2 authentication
- PostgreSQL
- Presto
- URL / A short history
- primitive type conversion / Data type conversions
- Processing Elements (PE) / Batch, real-time, and stream processing
R
- random sampling
- URL / Sampling
- real-time processing
- Record Columnar File (RCFILE) / File format
- RegexSerDe / SerDe
- relational database
- versus Hadoop / Relational and NoSQL database versus Hadoop
- ROLLUP statement
- row delimiter / Understanding Hive data types
S
- sampling
- SELECT * statement / The SELECT statement
- SELECT statement / The SELECT statement
- Sentry
- URL / SQL standard-based mode
- SequenceFile format / Storage optimization
- SerDe
- SHOW TRANSACTIONS command / Transactions
- Simple Authentication and Security Layer (SASL) framework / Metastore server authentication
- skew join / Skew join
- SORT BY (ASC|DESC) keyword / ORDER and SORT
- SORT keyword / ORDER and SORT
- sort merge bucket (SMB) join / Sort merge bucket (SMB) join
- sort merge bucket map (SMBM) join / Sort merge bucket map (SMBM) join
- Spark / Overview of the Hadoop ecosystem
- SQLLine
- SQL standard-based mode, authorization
- about / SQL standard-based mode
- Sqoop / Overview of the Hadoop ecosystem
- stage dependencies
- about / The EXPLAIN statement
- stage plans
- about / The EXPLAIN statement
- storage-based mode, authorization
- about / Storage-based mode
- storage optimization / Storage optimization
- Storm
- streaming
- about / Streaming
- stream processing
- string functions / Operators and functions
- Structured Query Language (SQL)
- about / A short history
T
- table-generating functions / Operators and functions
- Tez / Overview of the Hadoop ecosystem
- transactions
- about / Transactions
- type conversion functions / Operators and functions
U
- UDAF
- code, template / The UDAF code template
- UDAFs
- about / User-defined functions
- UDF
- code, template / The UDF code template
- UDFs
- about / User-defined functions
- UDTF
- code, template / The UDTF code template
- UDTFs
- about / User-defined functions
- Uniform Resource Identifier (URI) / Data exchange – LOAD
- UNION ALL statement / Set operation – UNION ALL
V
- value / Introducing big data
- variability / Introducing big data
- variety / Introducing big data
- Vectorization optimization
- velocity / Introducing big data
- vendor packages
- used, for installing Hive / Installing Hive from vendor packages
- veracity / Introducing big data
- views
- about / Hive views
- altering / Hive views
- redefining / Hive views
- dropping / Hive views
- virtual columns / Operators and functions
- visualization / Introducing big data
- volatility / Introducing big data
- volume / Introducing big data
W
- WHERE clauses
- subqueries, restrictions / The SELECT statement
- window expressions
- BETWEEN AND clause / Analytic functions
- N PRECEDING or FOLLOWING / Analytic functions
- UNBOUNDED PRECEDING / Analytic functions
- UNBOUNDED FOLLOWING / Analytic functions
- UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING / Analytic functions
- CURRENT ROW / Analytic functions
- URL / Analytic functions