Book Image

Scalable Data Streaming with Amazon Kinesis

By : Tarik Makota, Brian Maguire, Danny Gagne, Rajeev Chakrabarti
Book Image

Scalable Data Streaming with Amazon Kinesis

By: Tarik Makota, Brian Maguire, Danny Gagne, Rajeev Chakrabarti

Overview of this book

Amazon Kinesis is a collection of secure, serverless, durable, and highly available purpose-built data streaming services. This data streaming service provides APIs and client SDKs that enable you to produce and consume data at scale. Scalable Data Streaming with Amazon Kinesis begins with a quick overview of the core concepts of data streams, along with the essentials of the AWS Kinesis landscape. You'll then explore the requirements of the use case shown through the book to help you get started and cover the key pain points encountered in the data stream life cycle. As you advance, you'll get to grips with the architectural components of Kinesis, understand how they are configured to build data pipelines, and delve into the applications that connect to them for consumption and processing. You'll also build a Kinesis data pipeline from scratch and learn how to implement and apply practical solutions. Moving on, you'll learn how to configure Kinesis on a cloud platform. Finally, you’ll learn how other AWS services can be integrated into Kinesis. These services include Redshift, Dynamo Database, AWS S3, Elastic Search, and third-party applications such as Splunk. By the end of this AWS book, you’ll be able to build and deploy your own Kinesis data pipelines with Kinesis Data Streams (KDS), Kinesis Data Firehose (KFH), Kinesis Video Streams (KVS), and Kinesis Data Analytics (KDA).
Table of Contents (13 chapters)
1
Section 1: Introduction to Data Streaming and Amazon Kinesis
5
Section 2: Deep Dive into Kinesis
10
Section 3: Integrations

Challenges associated with distributed systems

The fundamental challenge of distributed systems is intra-system communication. When possible, a messaging system can provide a core decoupling function, allowing intermittent and transient failures not to cascade or cross fault boundaries. These systems must be highly available, scalable, and durable. The following core concepts are essential to understand and reason about these systems: transactions per second, scaling, latency, and high availability. They allow us to understand the system's key dynamics so that resources can be provisioned to support the workload.

Transactions per second

The most important metric for all messaging systems is Transactions Per Second (TPS). This metric is not as simple as it may seem initially, as the maximum TPS is constrained by either a discrete number of transactions or the maximum size of data that can be processed. This max TPS is called capacity. In general, messaging systems have different capacity for the inbound side and the outbound side, with the outbound side normally having a greater capacity to support multiple consumers and prevent large message backlogs.

Backpressure refers to a system state in which the producer TPS is higher than the consumer TPS. The input is coming in faster than it can be processed. There are multiple strategies for handling backpressure. The easiest is to reduce the number of messages being sent, for example, having a temperature sensor send data once a minute instead of once a second. The second is to scale the compute for the consumers to increase the consumer TPS. If the flow of messages is intermittent or bursty, a buffer can temporarily hold the messages and allow the consumers to catch up. Buffers are often used in conjunction with scaling to store messages while compute is scaled up. The last method is to drop messages. Depending on the message type, this can be unacceptable – you don't want to drop customer orders – but, in the case of sensor data, sampling, can be used to process a fixed percentage of data, for example, 5% of data.

Scaling

Messaging systems need to present an access point that hides the complexity of the internal system. In general, messaging systems consist of multiple independent channels and shards. A shard is an independent unit of capacity. This internal complexity cannot be completely hidden from users since the way data is distributed to the different shards needs to be understood by both senders and receivers. Scaling is used to increase the capacity of the messaging system. One way to think about scaling is to consider cables supporting a bridge. If it has 5 strands, it can support 50 tons, and with 50 strands it can support 500 tons. Thus, one unit of scaling for bridges would be the number of cable strands.

As a system scales, its ability to maintain the global order of records becomes limited. In general, order will only be maintained at a sub-global group level. This is an important design consideration that must be covered when designing real-time solutions. If global order is needed, there are fundamental limits on the system's maximum throughput, and if the throughput required is higher, the system will have to be rearchitected.

Etymology of the shard

In 1997, the game Ultima Online was released. In order to reduce latency and handle scale, there were multiple servers around the world that a player could log in to. Each server functioned independently and existed as its own universe in a multi-verse. This was explained in-game by the wizard Mondain capturing the world of Sosaria in the Gem of Immortality. This gem was then shattered by the Avatar into multiple shards. The player then selected which shard they wanted to play in. The term shard, or sharding, is another way to talk about the horizontal partitioning of data, that is, spreading data across multiple servers.

Latency

Latency is the amount of time between a cause and effect in a system. In the context of a messaging system, there are multiple measures of latency that are important in understanding its behavior. In general, it is the time between when a message enters the system and when the message leaves the system. For example, it can be thought of as the time between pressing the brakes in a car and the vehicle stopping. Some workloads, for example, real-time audio/video communication, are especially latency-sensitive and care must be taken to minimize this across all aspects of the system.

The two primary measures of latency in a messaging system are propagation delay and age of message.

The propagation delay is the amount of time from when a message is written to the message broker to when it is read by the consumer application. In most cases, propagation delay is a reflection of how often producers or consumers are polling the message broker. Network effects on the producer's connection to the message system and the acknowledgment of putting a message are known as producer latency, and correspondingly, the time it takes for a request to complete on the consumer side is consumer latency.

The last measure of latency that is extremely important is understanding how long a message has been in the system before it is retrieved, that is, the age of the message. If the average age of messages is increasing, that indicates a backlog and means that messages are being added faster than they are being retrieved.

Fault tolerance/high availability

Messaging systems are foundational to modern distributed systems and need to be designed in such a way to be highly available.

"Everything fails all the time."

– Werner Vogel, Amazon CTO

The preceding quote hints at the difficulty of building highly available systems. To avoid single points of failure, redundancy is required, and messages, once acknowledged, need to be durably stored. Even though messaging systems present a simple interface, to achieve this level of performance, they are actually comprised of many systems configured as a cluster.

Now that we have the vocabulary to talk about inter-system communication, let's introduce the components of messaging systems.