Index
A
- Airflow
- about / AirFlow
- features / AirFlow
- Alternate Exchanges / Alternate Exchanges
- AMQP, concepts
- exchange / Event-Data Pipelines
- exchange type / Event-Data Pipelines
- message queue / Event-Data Pipelines
- binding / Event-Data Pipelines
- routing key / Event-Data Pipelines
- broker / Event-Data Pipelines
- channel / Event-Data Pipelines
- Virtual Hosts / Event-Data Pipelines
- Apache Atlas
- about / Apache Atlas
- high-level architecture / Apache Atlas high-level architecture
- type system / Apache Atlas high-level architecture
- Graph engine / Apache Atlas high-level architecture
- ingest / Apache Atlas high-level architecture
- export / Apache Atlas high-level architecture
- Apache Falcon / Apache Falcon
- Apache Flume
- about / Apache Flume
- event flow reliability / Flume event flow reliability
- multi-agent flow / Flume multi-agent flow
- multiplexer / Flow multiplexer
- Apache HBase architecture
- HMaster / Components of Apache HBase architecture
- region server / Components of Apache HBase architecture
- Apache Kafka
- reference / Apache Kafka as an event bus
- as event bus / Apache Kafka as an event bus
- message persistence / Message persistence
- Persistent Queue Design / Persistent Queue Design
- message batch / Message batch
- sendfile operation / Kafka and the sendfile operation
- compression / Compression
- properties / Compression
- Apache Nifi
- about / Apache Nifi
- high-level use cases / Apache Nifi
- components / Apache Nifi
- Apache Sqoop
- about / Apache Sqoop
- use cases / Apache Sqoop
- Apache YARN
- about / Apache YARN
- fundamentals, for configuration / Apache YARN
- Resource Manager / Apache YARN
- Scheduler / Apache YARN
- Applications Manager / Apache YARN
- Node Manager / Apache YARN
- Application Master / Apache YARN
- API Platform
- about / API Platform
- drawbacks / API Platform
- benefits / API Platform
- message-oriented application style / Message-oriented application style
- Micro Services application styles / Micro Services application styles
- Micro-Services Application Style, characteristics / Micro Services application styles
- application styles
- about / Application styles
- combining / Combining different application styles
- APTs (Advanced Persistent Threat Actors) / Elastic search and free text search queries
- architectural assumptions
- listing / Listing architectural assumptions
- architectural capabilities
- about / Architectural capabilities
- logical layers / Architectural capabilities
- UI capabilities / UI capabilities
- service gateway/API gateway capabilities / Service gateway/API gateway capabilities
- business service capabilities / Business service capabilities
- data capabilities / Data partitioning
- architectural patterns
- about / Architectural patterns
- retry pattern / The retry pattern
- circuit breaker / The circuit breaker
- throttling / Throttling
- bulk heads / Bulk heads
- Event-Sourcing / Event-sourcing
- Command and Query Responsibility Segregation (CQRS) / Command and Query Responsibility Segregation
- architectural principles
- defining / Defining architectural principles, Principle 1, Principle 5, Principle 6, Principle 7
- auto sharding / Horizontal scaling with automatic sharding of HBase tables
- AWS (Amazon Web Services) / Data dissemination architecture in a threat intel sharing system
- AWS API gateway / AWS API gateway
- AWS Lambda / AWS Lambda
- Azkaban
- about / Azkaban
- components / Azkaban
- features / Azkaban
B
- B-Tree / Relational Database Management Systems and Big data
- Balanced Trees / Relational Database Management Systems and Big data
- bandwidth / Relational Database Management Systems and Big data
- basic shell component / Basic shell component
- batch layer components
- about / Batch layer components and subcomponents
- read/extract component / Read/extract component
- normalizer component / Normalizer component
- Validation component / Validation component
- processing component / Processing component
- writer/formatter component / Writer/formatter component
- batch processing
- about / What do we mean by batch processing
- defining / What do we mean by batch processing
- principles / What do we mean by batch processing
- and Lambda architecture / Lambda architecture and batch processing
- beats
- FileBeat / Beats
- MetricBeat / Beats
- PacketBeat / Beats
- WinlogBeat / Beats
- HeartBeat / Beats
- Big data / Relational Database Management Systems and Big data
- BITES
- about / BITES – Unstructured/Semistructured document store
- structured data extraction / Structured data extraction
- text extraction / Text extraction
- document queries / Document queries
- highly-available clusters / Highly-available clusters
- guarantees / Guarantees
- scaling up / Scaling up
- integration, with SPARQL / Integration with SPARQL
- data formats / Data Formats
- business service capabilities, architectural capabilities
- about / Business service capabilities
- microservices / Microservices
- messaging / Messaging
- distributed (batch/stream) processing / Distributed (batch/stream) processing
C
- circuit breaker
- about / The circuit breaker
- closed state / The circuit breaker
- open state / The circuit breaker
- half-open / The circuit breaker
- clustering
- about / Clustering, Clustering and Network Partitions
- mirrored queues / Mirrored queues
- Persistent Messages / Persistent Messages
- data manipulation / Data Manipulation and Security
- security / Data Manipulation and Security
- use cases / Use Case 1, Use Case 2
- exchanges / Exchanges
- guidelines, for selecting exchange type / Guidelines on choosing the right Exchange Type
- headers, versus Topic exchanges / Headers versus Topic Exchanges
- routing / Routing
- communication protocol / Communication protocol
- communication style
- about / Communication styles
- synchronous / Communication styles
- asynchronous / Communication styles
- Event-Driven / Communication styles
- reactive / Communication styles
- components, Apache Nifi
- web server / Apache Nifi
- flow controller / Apache Nifi
- extensions / Apache Nifi
- flow file repository / Apache Nifi
- content repository / Apache Nifi
- provenance repository / Apache Nifi
- Consistency, Availability, and Partition Tolerance (CAP) theorem / Desired properties of a data-intensive system
- coordination service
- about / Coordination service
- characteristics / Coordination service
- use cases / Coordination service
- customer premises equipment (CPEs) / Data insight
D
- data
- about / Making sense of the data
- processing / What is data processing?
- Data-Collection System
- requisites / Data collection system requirements
- architecture principles / Data collection system architecture principles
- high-level component architecture / High-level component architecture
- high-level architecture / High-level architecture
- architecture technology mapping / Architecture technology mapping
- data-intensive system, properties
- robust and fault-tolerant / Desired properties of a data-intensive system
- low latency reads and updates / Desired properties of a data-intensive system
- scalable / Desired properties of a data-intensive system
- general / Desired properties of a data-intensive system
- extensible / Desired properties of a data-intensive system
- ad-hoc queries, allowing / Desired properties of a data-intensive system
- minimal maintenance / Desired properties of a data-intensive system
- CAP theorem / Desired properties of a data-intensive system
- data capabilities, architectural capabilities
- data partitioning / Data partitioning
- data replication / Data replication
- Data Collector/Normalizer / Threat intel share – backend
- data dissemination
- about / Data dissemination
- considerations, for defining architecture / Data dissemination
- communication protocol / Communication protocol
- target audience / Target audience
- use case / Use case
- response schema / Response schema
- communication channel / Communication channel
- in threat intel sharing system / Data dissemination architecture in a threat intel sharing system
- threat intel share backend architecture / Threat intel share – backend
- threat intel share frontend architecture / Threat intel share – frontend
- AWS Lambda / AWS Lambda
- AWS API gateway / AWS API gateway
- cache population / Cache population
- cache eviction / Cache eviction
- non-functional aspects / Discussing the non-functional aspects of the preceding architecture
- non-functional use cases / Non-functional use cases for dissemination architecture
- data ecosystem
- about / What is a data ecosystem?, What constitutes a data ecosystem?
- interconnected data /
- environment / Data environment
- data sharing / Data sharing
- Data Enricher / Threat intel share – backend
- data explosion problem / The data explosion problem
- data ingestion
- batch ingestion / Data ingest
- stream ingestion / Data ingest
- data integrity, Stardog
- strict parsing of RDF / Strict parsing of RDF
- Integrity Constraint Validation / Integrity Constraint Validation
- data lineage
- about / Data lineage
- Apache Atlas / Apache Atlas
- Apache Falcon / Apache Falcon
- DataNode / DataNode
- data nodes / High-level architecture of HDFS
- data partitioning
- about / Distributed storage, Data partitioning
- range-based partitioning / Range-based partitioning
- hash-based partitioning / Hash-based partitioning
- data pipeline / Data pipeline
- data processing design
- challenges / The 3 + 1 Vs and how they affect choice in data processing design
- data quality / Data quality
- data replication / Distributed storage
- data sharing
- about / Data sharing
- traffic light protocol / Traffic light protocol
- data sources
- types / Types of data sources
- transactional data / Types of data sources
- User Data and Personnel Data / Types of data sources
- social and demographic data / Types of data sources
- about / Types of data sources
- publicly-available data / Types of data sources
- Dead-Letter Exchanges
- URL / Dead-Letter Exchanges
- about / Dead-Letter Exchanges
- Dependency Hell
- reference / Micro Services application styles
- Direct Exchange type / Dead-Letter Exchanges
- Distributed Configuration-Management Module
- configuration-management service / An introduction to ETCD
- configuration-management client / An introduction to ETCD
- distributed data
- centralized collection / Centralized collection of distributed data
- distributed filesystems / Hadoop Distributed Filesystem
- distributed processing
- about / Distributed processing, Distributed processing
- capabilities / Distributed processing, Distributed processing
- distributed storage
- about / Distributed storage
- data partitioning / Distributed storage
E
- elastic search / Elastic search and free text search queries
- ElasticSearch-Logstash-Kibana (ELK)
- about / ELK
- beats / Beats
- load balancing / Load-balancing
- Logstash / Logstash
- back pressure / Back pressure
- high-availability / High-availability
- Enterprise Service Bus (ESB) / Query-Data pipelines
- ETCD
- about / An introduction to ETCD
- high-level capabilities / An introduction to ETCD
- scheduler / Scheduler
- Micro Service, designing / Designing the Micro Service
- Event-Data Pipelines
- about / Event-Data Pipelines
- topologies / Topology 1, Topology 2, Topology 3
- resilience / Resilience
- high-availability / High-availability
- clustering / Clustering
- event-sourcing / Reliable messaging
- event streams / Architectural concepts
- executor component / Scheduler/executor component
F
- Flume Deployment Topology
- reference / Apache Flume
- formation management reference architecture, Oracle
- business view / Reference architecture – business view
- formatter component / Writer/formatter component
G
- General Data Protection Regulation (GDPR) / Data lineage
- graph store
- use case / Background of the use case
- solution discussion / Solution discussion
- bank fraud data model / Bank fraud data model (as can be designed in a property graph data store such as Neo4J)
H
- Hadoop / What are Hadoop and HDFS, Introducing Hadoop, the Big Elephant
- Hadoop Distributed File System (HDFS)
- about / What are Hadoop and HDFS, The data explosion problem, Introducing Hadoop, the Big Elephant, Hadoop Distributed Filesystem
- NameNode / NameNode
- DataNode / DataNode
- MapReduce / MapReduce
- architecture principles / HDFS architecture principles (and assumptions)
- high-level architecture / High-level architecture of HDFS
- file formats / HDFS file formats
- Hadoop MapReduce / Introducing Hadoop, the Big Elephant
- Hadoop YARN / Introducing Hadoop, the Big Elephant
- hash partitioning / Hash-based partitioning
- HBase
- about / HBase
- basics / Understanding the basics of HBase
- data model / HBase data model
- architecture / HBase architecture
- horizontal scaling, with automatic sharding of tables / Horizontal scaling with automatic sharding of HBase tables
- region assignment / HMaster, region assignment, and balancing
- HMaster / HMaster, region assignment, and balancing
- balancing / HMaster, region assignment, and balancing
- HBase cluster
- performance tips / Tips for improved performance from your HBase cluster
- HDFS file formats / HDFS file formats
- High-Availability, Data Bus
- about / High-availability
- availability chart / Availability Chart
- high-level architecture, Data-Collection System
- about / High-level architecture
- service gateway / Service gateway
- discovery server / Discovery server
- high-level reference architecture / High-level reference architecture
I
- ICV constraint validations
- examples / Integrity Constraint Validation
- information-exchange, between nodes in DAG
- dumb exchange / Data pipeline
- smart exchange / Data pipeline
- information management conceptual reference architecture, Oracle
- about / Oracle's information management conceptual reference architecture
- conceptual view / Conceptual view
- event engine / Conceptual view
- data reservoir / Conceptual view
- data factory / Conceptual view
- enterprise information store / Conceptual view
- reporting / Conceptual view
- discovery lab / Conceptual view
- information management reference architecture, Oracle
- about / Oracle's information management reference architecture
- data process view / Data process view
- use case examples / Real-life use case examples
J
- Job / Architectural concepts
- Job-Execution Context / Architectural concepts
K
- Kafka streams
- about / Kafka streams
- features / Kafka streams
- processing topology / Stream processing topology
- Kappa architecture
- about / Kappa architecture
- No-Sql data stores, comparing / A brief comparison of different leading No-Sql data stores
L
- Lambda architecture
- about / Lambda architecture
- data immutability / Lambda architecture
- batch layer / Lambda architecture
- serving layer / Lambda architecture
- speed layer / Lambda architecture, Lambda architecture's speed layer
- and batch processing / Lambda architecture and batch processing
- Lambdas / Non-functional use cases for dissemination architecture
- Listeners
- Execution Listener / Designing the Micro Service
- Execution State Listener / Designing the Micro Service
- locking strategies
- optimistic locking / Processing strategy
- pessimistic locking / Processing strategy
- Luigi
- about / Luigi
- features / Luigi
M
- MapReduce framework
- reference / DataNode
- about / MapReduce
- message-oriented application style / Message-oriented application style
- micro-batch stream processing / Micro-batch stream processing
- Micro Service, ETCD
- components / Designing the Micro Service
- scheduling / Designing the Micro Service
- task, executing / Designing the Micro Service
- Pagination Use Case, implementing / Designing the Micro Service
- Micro Services application styles / Micro Services application styles
- mirrored-queue / Mirrored queues
- multi-processing / Processing strategy
N
- NameNode
- about / NameNode, High-level architecture of HDFS
- reference / DataNode
- network partitions / Clustering and Network Partitions
- non-functional aspects, data dissemination
- use cases / Non-functional use cases for dissemination architecture
- elastic search / Elastic search and free text search queries
- normalizer component / Normalizer component
- notions of time, in streams
- event time / Notion of time in stream processing
- processing time / Notion of time in stream processing
- ingestion time / Notion of time in stream processing
O
- Oozie
- about / Oozie
- features / Oozie
- optimistic locking / Processing strategy
P
- parallel processing / Processing strategy
- partitioning strategy
- caveats / Hash-based partitioning
- Persistent Messages / Persistent Messages
- pessimistic locking / Processing strategy
- processing application
- performing / How to perform the processing
- location / Where to perform the processing
- data quality / Quality of data
- networks / Networks are everywhere
- effective consumption of data / Effective consumption of the data
- processing component / Processing component
- processing guarantees
- about / Processing guarantees
- exactly-once guarantee / Processing guarantees
- at-least-once guarantee / Processing guarantees
- at-most-once guarantee / Processing guarantees
- processing strategy / Processing strategy
Q
- Quartz / Architecture technology mapping
- Quartz Scheduler
- components / Scheduler
- reference / Scheduler
- Query-Data Pipelines / Query-Data pipelines
R
- range-based partitioning / Range-based partitioning
- read component / Read/extract component
- reference architecture
- about / What is a reference architecture?
- problem statement / Problem statement
- for data-intensive system / Reference architecture for a data-intensive system
- reference architecture, for data-intensive system
- about / Reference architecture for a data-intensive system
- component view / Component view
- data ingestion / Data ingest
- data preparation / Data preparation
- data, processing / Data processing
- workflow management / Workflow management
- data, accessing / Data access
- data insight / Data insight
- data governance / Data governance
- data pipeline / Data pipeline
- regional API endpoint / Non-functional use cases for dissemination architecture
- Relational Database Management System / Relational Database Management Systems and Big data
- Reliability guarantees
- at-least-once delivery / Reliable messaging
- at-most-once delivery / Reliable messaging
- exactly-once delivery / Reliable messaging
- reliable messaging / Reliable messaging
- resources
- sharing, among processing applications / Sharing resources among processing applications
- retry pattern
- about / The retry pattern
- considerations / The retry pattern
- routing
- about / Routing
- Header-Based Content Routing / Header-Based Content Routing
- Topic-Based Content Routing / Topic-Based Content Routing
S
- Samza
- stream processing API / Samza's stream processing API
- Samza architecture
- about / Samza architecture
- concepts / Architectural concepts
- event-streaming layer / Event-streaming layer
- scheduler component / Scheduler/executor component
- seeking / Relational Database Management Systems and Big data
- semantic graph
- about / Semantic graph
- linked data / Linked data
- vocabularies / Vocabularies
- Semantic Query Language / Semantic Query Language
- inference / Inference
- service gateway/API gateway capabilities, architectural capabilities
- about / Service gateway/API gateway capabilities
- security / Security
- traffic control / Traffic control
- mediation / Mediation
- caching / Caching
- routing / Routing
- service orchestration / Service orchestration
- session window / Types of windows
- sink processor / Stream processing topology
- sliding window / Types of windows
- Solid State Drive (SSD)
- reference / The data explosion problem
- source processor / Stream processing topology
- SPARQL
- reference / Semantic Query Language
- Stardog
- about / Stardog
- GraphQL queries / GraphQL queries
- Gremlin / Gremlin
- Virtual Graphs / Virtual Graphs – a Unifying DAO
- structured data / Structured data
- CSVs / CSV
- constraints, validating / Data integrity and validating constraints
- data integrity / Data integrity and validating constraints
- monitoring and operation / Monitoring and operation
- performance / Performance
- reference / Performance
- strategies, for loading configuration properties
- fallback strategy / An introduction to ETCD
- local only / An introduction to ETCD
- remote only / An introduction to ETCD
- stream / Stream processing topology
- streaming application
- real time views, computing / Computing real time views
- streaming architecture
- scheduler/executor component / The scheduler/executor component of the streaming architecture
- streaming system / What is a streaming system?
- stream partition / Architectural concepts
- stream processing
- notions of time / Notion of time in stream processing
- stream processing application / Stream processing topology
- stream processor / Stream processing topology
T
- target audience / Target audience
- Task / Architectural concepts
- TaskTracker
- reference / DataNode
- Technopedia
- reference / Content mashup
- threat intel share backend architecture
- about / Threat intel share – backend
- RT query processor / RT query processor
- view builder component / View builder
- threat intel share frontend architecture / Threat intel share – frontend
- throttling
- strategies / Throttling
- top-level objects
- indicator / Target audience
- vulnerability / Target audience
- campaign / Target audience
- threat actor / Target audience
- Topic Exchange type / Alternate Exchanges
- traffic light protocol / Traffic light protocol
- tumbling window / Types of windows
- types, data
- structured data / What constitutes a data ecosystem?
- semi-structured data / What constitutes a data ecosystem?
- unstructured data / What constitutes a data ecosystem?
U
- UI capabilities, architectural capabilities
- about / UI capabilities
- content mashup / Content mashup
- multi-channel support / Multi-channel support
- user workflow / User workflow
- AR/VR support / AR/VR support
- unstructured data / HDFS file formats
- use case
- scenario / Scenario
- use case examples, information management reference architecture
- machine learning use case /
- data enrichment use case / Data enrichment use case
- extract transform load use case / Extract transform load use case
V
- 3 + 1 Vs
- about / The 3 + 1 Vs and how they affect choice in data processing design
- cost associated with latency / Cost associated with latency
- classic way of doing things / Classic way of doing things
- validation component / Validation component
W
- windowing / Windowing
- windows
- sliding window / Types of windows
- tumbling window / Types of windows
- session window / Types of windows
- writer/formatter component
- basic shell component / Basic shell component
- scheduler/executor component / Scheduler/executor component
- writer component / Writer/formatter component
Y
- YARN (Yet Another Resource Negotiator) / Samza's stream processing API