Data model concepts


To understand DynamoDB better, we first need to understand its data model. DynamoDB's data model consists of Tables, Items, and Attributes. A table in DynamoDB is conceptually similar to a table in a relational database, but it does not need a fixed schema (number of columns, column names, data types, column order, and column sizes). All it needs is a primary key with its data type, plus a secondary index if needed; the remaining attributes can be decided at runtime. Items in DynamoDB are the individual records of a table, and an item can have any number of attributes.

DynamoDB stores item attributes as key-value pairs. The item size is calculated by adding the lengths of the attribute names and their values.

Tip

DynamoDB has an item-size limit of 64 KB, so while designing your data model, you have to keep in mind that an item must not exceed this limit. There are various ways to avoid spilling over it, and we will discuss such best practices in Chapter 4, Best Practices.

The following diagram shows the data model hierarchy of DynamoDB:

Here, we have a table called Student, which can have multiple items in it. Each item can have multiple attributes that are stored as key-value pairs. We will see more details about the data models in Chapter 2, Data Models.
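
To make this concrete, the following is a minimal sketch using the AWS SDK for Java of how two items in such a Student table could be represented as maps of attribute names to values; the table name, attribute names, and values are illustrative assumptions, not sample data from this book:

    // Illustrative sketch: attribute names and values are assumptions.
    // Each item is a map of attribute names to AttributeValue objects,
    // and the attribute sets of two items can differ.
    Map<String, AttributeValue> student1 = new HashMap<String, AttributeValue>();
    student1.put("studentId", new AttributeValue().withN("1001"));   // primary key
    student1.put("name", new AttributeValue().withS("John"));
    student1.put("department", new AttributeValue().withS("Physics"));

    Map<String, AttributeValue> student2 = new HashMap<String, AttributeValue>();
    student2.put("studentId", new AttributeValue().withN("1002"));   // primary key
    student2.put("name", new AttributeValue().withS("Jane"));
    // This item carries an attribute the first one does not, which is allowed
    student2.put("email", new AttributeValue().withS("jane@example.com"));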

Operations

DynamoDB supports various operations to play with tables, items, and attributes.

Table operations

DynamoDB supports the create, update, and delete operations at the table level. The UpdateTable operation can be used to increase or decrease the provisioned throughput. The ListTables operation returns the list of all tables associated with your account for a specific endpoint, and the DescribeTable operation can be used to get detailed information about a given table.
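
The following is a minimal sketch of these table-level calls using the AWS SDK for Java; the table name, key attribute, and capacity values are assumptions made purely for illustration:

    // Illustrative sketch: table name, key attribute, and capacity values are assumptions.
    // Create a client (assumes credentials are already configured in the environment)
    AmazonDynamoDBClient client = new AmazonDynamoDBClient();

    // CreateTable: define the hash key and the provisioned throughput
    client.createTable(new CreateTableRequest()
        .withTableName("Student")
        .withAttributeDefinitions(new AttributeDefinition("studentId", ScalarAttributeType.N))
        .withKeySchema(new KeySchemaElement("studentId", KeyType.HASH))
        .withProvisionedThroughput(new ProvisionedThroughput(10L, 5L)));

    // ListTables: list all tables for this account and endpoint
    ListTablesResult tables = client.listTables();

    // DescribeTable: get the table status, key schema, and throughput details
    DescribeTableResult description = client.describeTable(
        new DescribeTableRequest().withTableName("Student"));

    // UpdateTable: increase (or decrease) the provisioned throughput
    client.updateTable(new UpdateTableRequest()
        .withTableName("Student")
        .withProvisionedThroughput(new ProvisionedThroughput(20L, 10L)));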

Item operations

Item operations allow you to add, update, or delete an item in a given table. The UpdateItem operation allows us to add new attributes to, or update or delete existing attributes of, a given item.
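
A minimal sketch of these item-level calls with the AWS SDK for Java, reusing the illustrative Student table from above, might look as follows; the attribute names and values are assumptions:

    // Illustrative sketch: table, attribute names, and values are assumptions.
    // PutItem: add (or replace) an item
    Map<String, AttributeValue> item = new HashMap<String, AttributeValue>();
    item.put("studentId", new AttributeValue().withN("1001"));
    item.put("name", new AttributeValue().withS("John"));
    client.putItem(new PutItemRequest().withTableName("Student").withItem(item));

    // UpdateItem: add, update, or delete attributes of an existing item
    Map<String, AttributeValue> key = new HashMap<String, AttributeValue>();
    key.put("studentId", new AttributeValue().withN("1001"));
    Map<String, AttributeValueUpdate> updates = new HashMap<String, AttributeValueUpdate>();
    updates.put("grade", new AttributeValueUpdate()
        .withAction(AttributeAction.PUT)
        .withValue(new AttributeValue().withS("A")));
    client.updateItem(new UpdateItemRequest()
        .withTableName("Student").withKey(key).withAttributeUpdates(updates));

    // DeleteItem: remove the item identified by its primary key
    client.deleteItem(new DeleteItemRequest().withTableName("Student").withKey(key));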

The Query and Scan operations

The Query and Scan operations are used to retrieve information from tables. The Query operation allows us to query a given table using the provided hash key and, optionally, a range key condition; we can also query secondary indexes. The Scan operation reads all the items in a given table. More information on these operations can be found in Chapter 2, Data Models.
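
The following is a minimal sketch of Query and Scan with the AWS SDK for Java, again using the table and key names assumed in the earlier sketches:

    // Illustrative sketch: table and key names are assumptions.
    // Query: fetch items matching the hash key (and optional range key) conditions
    Map<String, Condition> keyConditions = new HashMap<String, Condition>();
    keyConditions.put("studentId", new Condition()
        .withComparisonOperator(ComparisonOperator.EQ)
        .withAttributeValueList(new AttributeValue().withN("1001")));
    QueryResult queryResult = client.query(new QueryRequest()
        .withTableName("Student").withKeyConditions(keyConditions));

    // Scan: read every item in the table (expensive on large tables)
    ScanResult scanResult = client.scan(new ScanRequest().withTableName("Student"));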

Provisioned throughput

Provisioned throughput is a special feature of DynamoDB that gives us consistent and predictable performance. We need to specify the read and write capacity units for a table. One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size, whereas one write capacity unit represents one write per second for an item up to 1 KB in size. A strongly consistent read reflects all successful writes prior to that read request, whereas a write updates all replicas of a given data object, so a read on that object after the write will always reflect the same value.

For items larger than 4 KB, the required read capacity units are calculated by rounding the item size up to the next multiple of 4 KB and dividing by 4. For example, to read an item of 11 KB, the number of read capacity units required is three, as the nearest multiple of 4 above 11 is 12, and 12/4 = 3.

The following table summarizes how the required capacity units are calculated:

    Required capacity units for | Consistency           | Formula
    Reads                       | Strongly consistent   | Number of item reads per second * item size in 4 KB units
    Reads                       | Eventually consistent | (Number of item reads per second * item size in 4 KB units) / 2
    Writes                      | NA                    | Number of item writes per second * item size in 1 KB units

Here, the item size is rounded up to the next multiple of 4 KB for reads and to the next multiple of 1 KB for writes.
If our application exceeds the maximum provisioned throughput for a given table, we get notified with a proper exception. We can also monitor the provisioned and consumed throughput from the AWS Management Console, which gives us an exact idea of our application's behavior. To understand this better, let's take an example: suppose we have set the write capacity units to 1,000 for a certain table, and our application starts writing to the table at 1,500 capacity units; DynamoDB then allows the first 1,000 writes and throttles the rest. As all DynamoDB operations work as RESTful services, it gives the error code 400 (Bad Request).
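
To give an idea of how such a throttled request surfaces in application code, the following is a minimal, illustrative sketch with the AWS SDK for Java; the request itself is just a placeholder reusing the item map from the earlier item operations sketch:

    // Illustrative sketch: the request and table name are placeholders.
    try {
        client.putItem(new PutItemRequest().withTableName("Student").withItem(item));
    } catch (ProvisionedThroughputExceededException e) {
        // DynamoDB returned HTTP 400; back off and retry the request later
        System.err.println("Request throttled: " + e.getMessage());
    }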

Even if an item is smaller than 4 KB, it still consumes one full read capacity unit; we cannot group multiple items smaller than 4 KB into a single read capacity unit. For instance, if your item size is 3 KB and you want to read 50 items per second, you need to provision 50 read capacity units in the table definition for strong consistency, or 25 read capacity units for eventual consistency.

If you have items larger than 4 KB, you have to round the size up to the next multiple of 4 KB. For example, if your item size is 7 KB (rounded up to 8 KB) and you need to read 100 items per second, the required read capacity would be 200 units for strong consistency, or 100 units for eventual consistency.

The same logic is followed for write capacity units: if the item size is less than 1 KB, it is rounded up to 1 KB, and if it is more than 1 KB, it is rounded up to the next multiple of 1 KB.
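
The arithmetic described above can be summarized in a small sketch; the item size and request rate here are just illustrative numbers:

    // Illustrative numbers: an assumed 7 KB item read/written 100 times per second
    int itemSizeKB = 7;
    int itemsPerSecond = 100;

    // Reads: round the item size up to the next multiple of 4 KB, then divide by 4
    int readUnitsPerItem = (int) Math.ceil(itemSizeKB / 4.0);                      // 2
    long stronglyConsistentReadUnits = (long) readUnitsPerItem * itemsPerSecond;   // 200
    long eventuallyConsistentReadUnits = stronglyConsistentReadUnits / 2;          // 100

    // Writes: round the item size up to the next multiple of 1 KB (already whole KB here)
    long writeUnits = (long) itemSizeKB * itemsPerSecond;                          // 700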

The AWS SDK provides automatic retries on ProvisionedThroughputExceededException when configured through the client configuration. This configuration option allows us to set the maximum number of times HttpClient should retry sending the request to DynamoDB. It also implements the default backoff strategy that decides the retry interval.

The following is sample code that sets a maximum of three automatic retries:

    // Create a configuration object
    final ClientConfiguration cfg = new ClientConfiguration();
    // Set the maximum auto-retries to 3
    cfg.setMaxErrorRetry(3);
    // Set the configuration object on the client
    client.setConfiguration(cfg);

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

DynamoDB features

As we said earlier, DynamoDB comes with enormous scalability and high availability with predictable performance, which makes it stand out among other NoSQL databases. It has tons of features; we will discuss some of them here.

Fully managed

DynamoDB allows developers to focus on development rather than on deciding which hardware to provision, handling administration, setting up a distributed cluster, taking care of fault tolerance, and so on. DynamoDB handles all scaling needs; it partitions your data in such a manner that the performance requirements are taken care of. Any distributed system becomes an overhead to manage once it starts scaling, but DynamoDB is a fully managed service, so you don't need to hire an administrator to look after the system.

Durable

Once data is loaded into DynamoDB, it is automatically replicated across different availability zones in a region. So, even if the data in one data center is lost, there is always a copy in another data center. DynamoDB does this automatically and synchronously. By default, DynamoDB replicates your data across three different data centers.

Scalable

DynamoDB automatically distributes your data across multiple servers in multiple availability zones as the data size grows; the number of servers can easily range from hundreds to thousands. Developers can read and write data of any size, and there is no limit on the total amount of data stored. DynamoDB follows a shared-nothing architecture.

Fast

DynamoDB serves requests at very high throughput with single-digit millisecond latency. It uses SSDs for consistent, optimized performance at very high scale. DynamoDB does not index all the attributes of a table; it only needs to index the primary key, which saves cost and makes read and write operations superfast. An application running on an EC2 instance will typically see single-digit millisecond latency for an item of size 1 KB. The latencies remain constant even at scale due to the highly distributed nature and optimized routing algorithms.

Simple administration

DynamoDB is very easy to manage. The AWS Management Console has a user-friendly interface to create tables and provide the necessary details, and you can simply start using a table within a few minutes. Once the data load starts, you don't need to do anything, as the rest is taken care of by DynamoDB. You can monitor the provisioned and consumed throughput in Amazon CloudWatch and change the read and write capacity units accordingly if needed.

Fault tolerance

DynamoDB automatically replicates data to multiple availability zones, which helps reduce the risks associated with failures.

Flexible

DynamoDB, being a NoSQL database, does not force users to define the table schema beforehand. Being a key-value data store, it lets users decide on the fly which attributes an item should have, and each item in a table can have a different number of attributes.

Rich data model

DynamoDB has a rich data model, which allows a user to define attributes with various data types, for example, number, string, binary, number set, string set, and binary set. We are going to talk about these data types in detail in Chapter 2, Data Models.

Indexing

DynamoDB indexes the primary key of each item, which allows us to access any item quickly and efficiently. It also supports global and local secondary indexes, which allow the user to query on non-primary-key attributes.

Secure

Each call to DynamoDB makes sure that only authenticated users can access the data. It also uses the latest and most effective cryptographic techniques to secure your data. It can be easily integrated with AWS Identity and Access Management (IAM), which allows users to set fine-grained access control and authorization.

Cost effective

DynamoDB provides a very cost-effective pricing model to host an application of any scale. The pay-per-use model gives users the flexibility to control expenditure. It also provides a free tier, which gives users 100 MB of free data storage along with 5 writes/second and 10 reads/second of throughput capacity. More details about pricing can be found at http://aws.amazon.com/dynamodb/pricing/.