Index
A
- ALTER FUNCTION command / Creating, modifying, and dropping functions
- analytical data / Classification
- ANALYZE function / The ANALYZE function
- Apache Sqoop
- URL / Sqoop 2
- Apriori algorithm
- about / The Apriori algorithm
- architecture, HDFS / Hadoop Distributed File System (HDFS)
- architecture, UAP / Core architecture concepts
- data warehousing / Data warehousing
- column-oriented database / Column-oriented databases
- parallel processing systems / Parallel versus distributed computing/processing
- distributed processing systems / Parallel versus distributed computing/processing
- massive parallel processing (MPP) systems / Shared nothing, massive parallel processing (MPP) systems, and elastic scalability
- elastic scalability / Shared nothing, massive parallel processing (MPP) systems, and elastic scalability
- shared nothing data architecture / Shared nothing, massive parallel processing (MPP) systems, and elastic scalability
- association rules
- about / Association rules
- Apriori algorithm / The Apriori algorithm
- attributes
- about / Multi-structured data
B
- BI (Business Intelligence) / Data Integration Accelerator (DIA) modules
- Big Data
- about / Big Data
- properties / So, what is Big Data?
- data formats / Multi-structured data
- Big Data analytics
- requisites / Big Data analytics – platform requirements
- branches
- decision branch / Decision trees
- event branch / Decision trees
- broadcast motion
- BulkLoader CLI / Greenplum BulkLoader for Hadoop
- BulkLoader CLI node / Greenplum BulkLoader for Hadoop
- BulkLoader manager / Greenplum BulkLoader for Hadoop
- BulkLoader scheduler / Greenplum BulkLoader for Hadoop
- Business Intelligence (BI)
- about / Data analytics
- business problem
- stating / Phase 1 – state business problem
C
- C4.5 / Decision trees
- CART / Decision trees
- Chorus
- classification / Classification
- client programs
- clustering / Clustering
- column-oriented database
- about / Column-oriented databases
- Command Center
- about / Command Center
- Command Center database
- functions, executing / Monitoring DCA
- components, UAP / Core components, Greenplum UAP components
- Greenplum Database / Greenplum Database
- HD / Hadoop (HD)
- Chorus / Chorus
- Command Center / Command Center
- Compute layer / Data Computing Appliance (DCA)
- Confidence
- about / Association rules
- configuration data / Classification
- COPY command / Greenplum data loading options
- CREATE FUNCTION command / Creating, modifying, and dropping functions
- CSV (Comma-Separated Values) / High-speed data loading using external tables
D
- data
- setting up / Phase 2 – set up data
- loading, techniques / Phase 3 – explore/transform data
- sourcing, from Greenplum / Sourcing large volumes of data from Greenplum
- skewing / Data skew and performance
- data analytics
- about / Data analytics
- drivers / Data analytics
- techniques / Data analytics
- paradigms / Analytic paradigms
- modeling methods / Modeling methods
- data analytics, techniques / Analytics classified
- descriptive analytics / Data analytics
- predictive analytics / Data analytics
- specialized analytics / Data analytics
- forecasting / Analytics classified, Forecasting or prediction or regression
- prediction / Analytics classified, Forecasting or prediction or regression
- regression / Analytics classified, Forecasting or prediction or regression
- clustering / Analytics classified, Clustering
- optimization / Analytics classified, Optimization
- simulations / Analytics classified, Simulations
- classification / Classification
- usage / Simulations
- Database layer / Data Computing Appliance (DCA)
- database modules
- about / Database modules
- data distribution
- about / Data distribution
- hash distribution / Data distribution
- round robin distribution / Data distribution
- data exploration
- about / Phase 3 – explore/transform data
- data formats, Big Data / Multi-structured data
- structured / Multi-structured data
- semi-structured / Multi-structured data
- unstructured / Multi-structured data
- Data Integration (DI)
- about / Data analytics
- Data Integration Accelerator (DIA) / Using external ETL to load data into Greenplum
- data loading
- patterns / Data loading patterns
- external tables, used / High-speed data loading using external tables
- data loading, patterns
- ETL / Data loading patterns
- ELT / Data loading patterns
- ETLT / Data loading patterns
- data redundancy
- components, implementing / The Greenplum high-availability architecture
- data science
- about / Data science
- data science life cycle
- about / Data science life cycle
- business problem, stating / Phase 1 – state business problem
- data, setting up / Phase 2 – set up data
- data exploration / Phase 3 – explore/transform data
- data transformation / Phase 3 – explore/transform data
- model, designing / Phase 4 – model
- model, executing / Phase 4 – model
- publish insights / Phase 5 – publish insights
- effectiveness, measuring / Phase 6 – measure effectiveness
- data streams / Parallel versus distributed computing/processing
- data transformation
- about / Phase 3 – explore/transform data
- data warehouse
- about / Data warehousing
- data, characteristics / Data warehousing
- data warehousing
- about / Data warehousing
- Data Warehousing (DW)
- about / Data analytics
- DBI Connector
- about / DBI Connector for R
- DCA
- about / Greenplum Data Computing Appliance (DCA), Data Computing Appliance (DCA)
- layer / Data Computing Appliance (DCA)
- Compute layer / Data Computing Appliance (DCA)
- Storage layer / Data Computing Appliance (DCA)
- Database layer / Data Computing Appliance (DCA)
- Network layer / Data Computing Appliance (DCA)
- module / Data Computing Appliance (DCA)
- Greenplum Database Compute module / Data Computing Appliance (DCA)
- Greenplum Database Standard module / Data Computing Appliance (DCA)
- HD module / Data Computing Appliance (DCA)
- HD Compute module / Data Computing Appliance (DCA)
- DIA module / Data Computing Appliance (DCA)
- master server RAID configuration / Master server RAID configurations
- segment server RAID configuration / Segment server RAID configurations
- monitoring / Monitoring DCA
- decision branch / Decision trees
- decision node / Decision trees
- decision tree
- about / Decision trees
- node / Decision trees
- branches / Decision trees
- descriptive analytics
- about / Data analytics
- DIA
- DIA module / Data Computing Appliance (DCA)
- DIA modules
- Distributed File Systems (DFS)
- about / Data analytics
- distributed processing systems
- about / Parallel versus distributed computing/processing
- vs, parallel processing systems / Parallel versus distributed computing/processing
- distribution key
- about / Data distribution
- DROP FUNCTION command / Creating, modifying, and dropping functions
- dual interconnect switches
- Dynamic Pipelining
- about / Dynamic Pipelining in Greenplum
- features / Dynamic Pipelining in Greenplum
E
- effectiveness, data science life cycle
- measuring / Phase 6 – measure effectiveness
- elastic scalability
- ELT
- enterprise data
- about / Enterprise data
- classification / Classification
- features / Features
- enterprise data, classification
- transactional data / Classification
- master data / Classification
- reference data / Classification
- analytical data / Classification
- configuration data / Classification
- historic data / Classification
- transitional data / Classification
- ETL / Data Integration Accelerator (DIA) modules
- about / Data loading patterns
- vs, ELT and ETLT / Data loading patterns
- ETLT
- event branch / Decision trees
- event node / Decision trees
- EXECUTE clause / External tables
- EXPLAIN function
- about / The EXPLAIN function
- external ETL
- used, for loading data into Greenplum / Using external ETL to load data into Greenplum
- external tables / Greenplum data loading options
- used, for data loading / High-speed data loading using external tables
- writable external tables / External table types, External tables
- readable external tables / External table types, External tables
- about / External tables
- file formats / External tables
F
- features, enterprise data / Features
- file:// / External tables
- Flume
- about / Hadoop (HD)
- forecasting / Forecasting or prediction or regression
G
- Gini index / Decision trees
- gpcheckperf utility / Monitoring DCA
- gpfdist / External tables
- gpfdists / External tables
- gpfdist utility / Data loading patterns, High-speed data loading using external tables, gpfdist
- gphdfs / External tables
- gpload utility / Data loading patterns, Greenplum data loading options, gpload
- gppkg utility / Using MADlib with Greenplum
- gpstate utility / Monitoring DCA
- Greenplum
- high-availability architecture / The Greenplum high-availability architecture
- external tables / External table types, External tables
- gpfdist utility / gpfdist
- gpload utility / gpload
- data loading, external ETL used / Using external ETL to load data into Greenplum
- data, sourcing from / Sourcing large volumes of data from Greenplum
- unsupported data types / Unsupported Greenplum data types
- table partitioning / Greenplum table distribution and partitioning, Partitioning
- table distribution / Greenplum table distribution and partitioning, Distribution
- in-database analytics, options / In-database analytics options (Greenplum-specific)
- R, using with / Using R with Greenplum
- Weka, using with / Using Weka with Greenplum
- MADlib, using with / Using MADlib with Greenplum
- Greenplum BulkLoader / Greenplum BulkLoader for Hadoop
- component / Greenplum BulkLoader for Hadoop
- Greenplum BulkLoader, component
- BulkLoader manager / Greenplum BulkLoader for Hadoop
- BulkLoader scheduler / Greenplum BulkLoader for Hadoop
- BulkLoader CLI / Greenplum BulkLoader for Hadoop
- Greenplum Chorus
- using / Using Greenplum Chorus
- data types / Using Greenplum Chorus
- Greenplum Database
- about / Greenplum Database
- physical architecture / The Greenplum Database physical architecture
- data loading, external tables used / High-speed data loading using external tables
- polymorphic data storage / Polymorphic data storage and historic data management
- historic data management / Polymorphic data storage and historic data management
- data distribution / Data distribution
- data, loading / Data loading for Greenplum Database and HD
- data loading, options / Greenplum data loading options
- querying / Querying Greenplum Database and HD
- queries, analyzing / The ANALYZE function
- queries, optimizing / The ANALYZE function
- Dynamic Pipelining / Dynamic Pipelining in Greenplum
- data communication, with Hadoop / Data communication between Greenplum Database and Hadoop (using external tables)
- Greenplum Database Compute module / Data Computing Appliance (DCA)
- Greenplum Database management
- about / Greenplum Database management
- Greenplum Database Standard module / Data Computing Appliance (DCA)
- Greenplum Data Loader
- about / Greenplum BulkLoader for Hadoop
- Greenplum Data Loader cluster
- master node / Greenplum BulkLoader for Hadoop
- slave node / Greenplum BulkLoader for Hadoop
- BulkLoader CLI node / Greenplum BulkLoader for Hadoop
- Greenplum target
- configuration / Greenplum target configuration
H
- Hadoop
- data communication, with Greenplum Database / Data communication between Greenplum Database and Hadoop (using external tables)
- Hadoop MapReduce
- about / Hadoop MapReduce
- hash distribution
- about / Data distribution, Distribution
- HBase
- about / Hadoop (HD)
- HD
- about / Hadoop (HD)
- characteristics / Hadoop (HD)
- data, loading / Data loading for Greenplum Database and HD
- data loading, options / Hadoop (HD) data loading options
- Sqoop 2 / Sqoop 2
- Greenplum BulkLoader / Greenplum BulkLoader for Hadoop
- querying / Querying Greenplum Database and HD
- HD Compute module / Data Computing Appliance (DCA)
- HDFS
- about / Hadoop (HD), Hadoop Distributed File System (HDFS)
- architecture / Hadoop Distributed File System (HDFS)
- querying / Querying HDFS
- Hive / Hive
- Pig / Pig
- HD module / Data Computing Appliance (DCA)
- HD modules
- about / HD modules
- historic data / Classification
- historic data management
- Hive
- about / Hadoop (HD)
I
- in-database analytics
- options / In-database analytics options (Greenplum-specific)
- window function / Window functions
- user-defined aggregates / User-defined aggregates
- Informatica / Using external ETL to load data into Greenplum
- INSERT command / Greenplum data loading options
- installation, MADlib / In-database analytics using MADlib
- instruction streams / Parallel versus distributed computing/processing
- interconnect
- Itemset
- about / Association rules
J
- JDBC drivers / The Greenplum Database physical architecture
K
- K-means clustering
- about / K-means clustering
L
- libpq / The Greenplum Database physical architecture
- linear regression
- about / Linear regression
- limitations / Linear regression
- LOCATION clause / External tables
- logistic regression
- about / Logistic regression
M
- MADlib
- about / In-database analytics using MADlib
- installing / In-database analytics using MADlib
- URL / In-database analytics using MADlib
- using, with Greenplum / Using MADlib with Greenplum
- URL, for documentation / Using MADlib with Greenplum
- Mahout
- about / Hadoop (HD)
- massive parallel processing (MPP) systems / Shared nothing, massive parallel processing (MPP) systems, and elastic scalability
- master data / Classification
- master host
- master node
- functions / Hadoop Distributed File System (HDFS)
- master server RAID configuration
- mirror segment instance
- model
- executing / Phase 4 – model
- designing / Phase 4 – model
- modeling methods
- about / Modeling methods
- decision tree / Modeling methods, Decision trees
- association rules / Modeling methods, Association rules
- linear regression / Modeling methods, Linear regression
- logistic regression / Modeling methods, Logistic regression
- Naive Bayesian classifier / Modeling methods, The Naive Bayesian classifier
- K-means clustering / Modeling methods, K-means clustering
- text analysis / Modeling methods, Text analysis
- modules, UAP / Modules
- database modules / Database modules
- HD modules / HD modules
- DIA modules / Data Integration Accelerator (DIA) modules
- Multiple Instruction Single Data (MISD) / Parallel versus distributed computing/processing
- Multiple Instruction Multiple Data (MIMD) / Parallel versus distributed computing/processing
N
- Naive Bayesian classifier
- about / The Naive Bayesian classifier
- Natural Language Processing (NLP)
- about / Data analytics
- Network layer / Data Computing Appliance (DCA)
- node
- about / Decision trees
- decision node / Decision trees
- event node / Decision trees
- terminal node / Decision trees
- noisy data
- about / Big Data
O
- ODBC drivers / The Greenplum Database physical architecture
- OLAP database
- about / Data warehousing
- vs, OLTP database / Data warehousing
- OLTP database
- about / Data warehousing
- vs, OLAP database / Data warehousing
- OpenChorus
- URL / Using Greenplum Chorus
- operational data
- about / Data analytics
- optimization / Optimization
- ORDER BY clause / Window functions, The ORDER BY clause
- OVER() clause / Window functions
- OVER clause / The OVER (ORDER BY…) clause
P
- paradigms, data analytics / Analytic paradigms
- predictive analytics / Analytic paradigms, Predictive analytics
- prescriptive analytics / Analytic paradigms, Prescriptive analytics
- descriptive analytics / Descriptive analytics
- parallel processing systems
- about / Parallel versus distributed computing/processing
- vs, distributed processing systems / Parallel versus distributed computing/processing
- data streams / Parallel versus distributed computing/processing
- instruction streams / Parallel versus distributed computing/processing
- parsing
- about / Text analysis
- PARTITION BY clause / Window functions, The PARTITION BY clause
- PDO
- about / Data loading patterns
- Pentaho / Using external ETL to load data into Greenplum
- Perl DBI / The Greenplum Database physical architecture
- pgAdmin3 / The Greenplum Database physical architecture
- physical architecture, Greenplum Database / The Greenplum Database physical architecture
- Pig
- about / Hadoop (HD)
- Pivotal
- about / Greenplum Database, Pivotal
- Pivotal Database
- about / Greenplum Database
- PL/R
- about / PL/R
- polymorphic data storage
- prediction / Forecasting or prediction or regression
- predictive analytics / Analytic paradigms, Predictive analytics
- about / Data analytics
- aspects / Predictive analytics
- used for / Predictive analytics
- prescriptive analytics / Analytic paradigms, Prescriptive analytics
- used for / Prescriptive analytics
- psql / The Greenplum Database physical architecture
- publish insights
- about / Phase 5 – publish insights
- PWX connector
- PWX connectors
- Python / The Greenplum Database physical architecture
Q
- query executor
R
- R
- about / R programming
- URL, for installation / R programming
- using, with Greenplum / Using R with Greenplum
- DBI Connector / DBI Connector for R
- PL/R / PL/R
- random distribution
- about / Distribution
- RANGE clause / Window functions
- RANK function / The ORDER BY clause
- readable external tables
- about / External table types
- redistribute motion
- reference data / Classification
- regression / Forecasting or prediction or regression
- rep command / R programming
- REPLACE FUNCTION command / Creating, modifying, and dropping functions
- round robin distribution
- about / Data distribution
- ROWS clause / Window functions
- R programming
- about / R programming
- runif function / R programming
S
- sample function / R programming
- Sandbox dataset / Using Greenplum Chorus
- segment host
- segment server RAID configuration
- semi-structured data
- about / Multi-structured data
- characteristics / Multi-structured data
- shared disk data architecture
- about / Shared disk data architecture
- shared memory data architecture
- about / Shared memory data architecture
- shared nothing data architecture
- sigmoid
- about / Logistic regression
- simulations / Simulations
- Single Instruction Multiple Data (SIMD) / Parallel versus distributed computing/processing
- Single Instruction Single Data (SISD) / Parallel versus distributed computing/processing
- slave node
- functions / Hadoop Distributed File System (HDFS)
- Source dataset / Using Greenplum Chorus
- specialized analytics
- about / Data analytics
- Sqoop
- about / Hadoop (HD)
- Sqoop 2 / Sqoop 2
- sqoop command
- about / Sqoop 2
- standby master
- standby master host
- Storage layer / Data Computing Appliance (DCA)
- strategic data
- about / Data analytics
- structured data
- about / Multi-structured data
- supervised analysis / Analytics classified
- Support
- about / Association rules
- Support count
- about / Association rules
- Symmetric Multiprocessing (SMP) / Data loading patterns
T
- table distribution
- about / Greenplum table distribution and partitioning, Distribution
- Hash distribution / Distribution
- random distribution / Distribution
- data, skewing / Data skew and performance
- broadcast motion, optimizing / Optimizing the broadcast or redistribution motion for data co-location
- redistribute motion, optimizing / Optimizing the broadcast or redistribution motion for data co-location
- table partitioning
- about / Greenplum table distribution and partitioning, Partitioning
- features / Partitioning
- guidelines / Partitioning
- tactical data
- about / Data analytics
- Talend / Using external ETL to load data into Greenplum
- terminal node / Decision trees
- text analysis
- about / Text analysis
- Total Cost of Ownership (TCO) / Data Integration Accelerator (DIA) modules
- Total Lifetime Value (TLV) / Classification
- transactional data / Classification
- transitional data / Classification
U
- UAP
- about / Greenplum Unified Analytics Platform (UAP)
- components / Core components, Greenplum UAP components
- modules / Modules
- architecture / Core architecture concepts
- unstructured data
- about / Multi-structured data
- characteristics / Multi-structured data
- unsupervised analysis / Analytics classified
- unsupported data types, Greenplum / Unsupported Greenplum data types
- user-defined aggregates / User-defined aggregates
V
- Vector / R programming
W
- Weka
- about / Weka
- URL / Weka
- features / Weka
- using, with Greenplum / Using Weka with Greenplum
- window function
- about / Window functions
- characteristics / Window functions
- PARTITION BY clause / The PARTITION BY clause
- ORDER BY clause / The ORDER BY clause
- OVER clause / The OVER (ORDER BY…) clause
- creating / Creating, modifying, and dropping functions
- modifying / Creating, modifying, and dropping functions
- dropping / Creating, modifying, and dropping functions
- writable external tables
- about / External table types
Y
- YARN
- about / Hadoop (HD)
Z
- ZooKeeper
- about / Hadoop (HD)