
DynamoDB versus Cassandra


Let's start with the data model. DynamoDB's storage model is very similar to Cassandra's: data is hashed on the row key, and the data within a key is ordered by a specific column. In DynamoDB, an attribute can be a single-valued scalar or a multivalued set. Cassandra has various attribute types, such as Integer, BigInteger, ASCII, UTF8, and Double, and it also offers composite and dynamic composite columns. It can therefore handle the full range of data formats (structured, semi-structured, and unstructured) that modern applications work with, whereas DynamoDB offers only two attribute types, namely String and Number.
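
To make this concrete, here is a minimal sketch of how DynamoDB's typed attributes look when writing an item with the low-level boto3 client; the table, key, and attribute names are hypothetical.

```python
# Hypothetical table and attributes, used only to illustrate DynamoDB's
# attribute types: String (S), Number (N), and a multivalued String Set (SS).
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.put_item(
    TableName="UserProfiles",                    # hypothetical table
    Item={
        "UserId": {"S": "user-42"},              # String scalar (hash key)
        "LoginCount": {"N": "17"},               # Number scalar (sent as a string)
        "Devices": {"SS": ["phone", "laptop"]},  # multivalued String Set
    },
)
```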

Cassandra supports multi-datacenter deployments across regions, whereas DynamoDB replicates data across multiple Availability Zones within a single region; cross-region replication is not supported. So, if we want to serve data with local latencies in any region across the world, Cassandra is the option, and it also provides full control over data consistency.

Let's take a scenario in which we require a large number of increments on a few counters, with the ability to read the current counter value. Scaling the throughput of an individual counter is quite difficult because each increment is a direct read/write operation. If more than one node is needed to handle a single counter, reads become slow because they involve all of those nodes. When a request fails, we have to retry it without knowing whether the previous attempt succeeded, so the same update may be applied twice, which frequently causes long latencies or load spikes across the cluster. DynamoDB, in contrast, provides an atomic counter that is more reliable, has low latency, and supports as many increments as we perform.
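
As a minimal sketch, an atomic counter in DynamoDB is just an UpdateItem call with the ADD action; the table name PageViews and its attributes here are hypothetical.

```python
# Increment a counter atomically on the server; no read-modify-write cycle
# is needed on the client side.
import boto3

table = boto3.resource("dynamodb", region_name="us-east-1").Table("PageViews")

response = table.update_item(
    Key={"PageId": "home"},
    UpdateExpression="ADD ViewCount :inc",
    ExpressionAttributeValues={":inc": 1},
    ReturnValues="UPDATED_NEW",        # also return the new counter value
)
print(response["Attributes"]["ViewCount"])
```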

DynamoDB also handles overload effectively. If we exceed the provisioned throughput, the offending requests quickly fail with a ProvisionedThroughputExceededException, while no other requests are affected. This is very useful for a heavily loaded site where thousands of requests arrive at a time and latency spikes would otherwise cause requests to pile up in queues.
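
A minimal sketch of how a client might react to that error is shown below; it assumes the table object from the previous snippet and arbitrary retry limits (the AWS SDKs also apply their own retries for throttled calls).

```python
# Retry a throttled write with exponential backoff; only the throttling
# error is retried, everything else is re-raised.
import time
from botocore.exceptions import ClientError

def put_with_backoff(table, item, max_retries=5):
    for attempt in range(max_retries):
        try:
            return table.put_item(Item=item)
        except ClientError as err:
            if err.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise
            time.sleep(0.1 * (2 ** attempt))   # 0.1 s, 0.2 s, 0.4 s, ...
    raise RuntimeError("request kept being throttled")
```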

In Cassandra, scaling up with virtual nodes is fairly easy, but scaling down remains a slow, manual, and error-prone operation. While data is being streamed, nodes joining or leaving the ring can cause a group of nodes to fail, which then requires repair, and data lost during a decommission operation has to be restored from a backup. In DynamoDB, scaling up is effortless: a single command updates the provisioned throughput, and the table is scaled after a short wait, whereas in a Cassandra cluster it is a multistep as well as multihour process. Scaling DynamoDB down is also simpler and much less time consuming, with low latency.
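
That single command corresponds to an UpdateTable call; a minimal sketch, with a hypothetical table name and capacity values, looks like this:

```python
# Raise (or lower) a table's provisioned throughput in place. The table
# transitions to UPDATING and back to ACTIVE while remaining available.
import boto3

client = boto3.client("dynamodb", region_name="us-east-1")

client.update_table(
    TableName="PageViews",             # hypothetical table
    ProvisionedThroughput={
        "ReadCapacityUnits": 200,      # new read capacity
        "WriteCapacityUnits": 100,     # new write capacity
    },
)
```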

DynamoDB can insert an element into a set attribute, or delete one from it, without complex code. Its operational cost is effectively zero: once backup jobs are set up at specific time intervals, there is no database to manage, no disk space to monitor, no memory usage to check, and no failed node to replace or repair, so DynamoDB saves costs too. Cassandra, for its part, supports a logically unlimited amount of data under a single key; the limit is only the disk space on a particular node, whereas DynamoDB limits an item to 64 KB, so it might be tricky to handle overflow. Cassandra also supports transactions well, delivering ACID compliance by using a commit log to capture all writes, with built-in redundancy that ensures data durability if the hardware fails.
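
As a minimal sketch of those set operations, the ADD and DELETE actions of UpdateItem modify a set attribute in place without reading the item first; the table and attribute names are hypothetical and follow the earlier snippets.

```python
# Add an element to, and remove another from, a String Set attribute.
# boto3 maps a Python set to the corresponding DynamoDB set type.
import boto3

table = boto3.resource("dynamodb", region_name="us-east-1").Table("UserProfiles")

table.update_item(
    Key={"UserId": "user-42"},
    UpdateExpression="ADD Devices :new",
    ExpressionAttributeValues={":new": {"tablet"}},
)

table.update_item(
    Key={"UserId": "user-42"},
    UpdateExpression="DELETE Devices :old",
    ExpressionAttributeValues={":old": {"phone"}},
)
```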

Now take a look at the tabular comparison between these two databases:

Specification | DynamoDB | Cassandra
--- | --- | ---
Data model | Key-value store | Key-value with wide column store
Operating system | Cross-platform (hosted) | BSD, Linux, OS X, Windows
License | Commercial (Amazon) | Open source (Apache License)
Data storage | Solid-state drive (SSD) | Filesystem
Secondary indexes | Yes | No
Accessing method | API call | API call, CQL (short for Cassandra Query Language), Apache Thrift
Server-side script | No | No
Triggers | No | Yes
Partitioning | Sharding | Sharding
MapReduce | No (can be done with other AWS services) | Yes
Integrity model supports | BASE, MVCC, ACID, eventual consistency, log replication, read committed | BASE
Composite key support | Yes | Yes
Data consistency | Yes | Most operations
Distributed counters | Yes | Yes
Idempotent write batches | No | Yes
Time to live support | No | Yes
Conditional updates | Yes | No
Indexes on column values | No | Yes
Hadoop integration | M/R, Hive | M/R, Hive, Pig
Monitorable | Yes | Yes
Backups | Low-impact snapshot with incremental | Incremental
Deployment policy | Only with AWS | Anywhere
Transactions | No | Yes
Full-text search | No | No
Geospatial indexes | No | No
Horizontal scalability | Yes | Yes
Replication method | Master-slave replica | Peer-to-peer (masterless)
Largest value supported | 64 KB | 2 GB
Object-relational mapping | No | Yes
Log support | No | Yes
User concepts | Access rights for users and roles can be defined via AWS Identity and Access Management (IAM) | Users can be defined per object