Book Image

A Definitive Guide to Apache ShardingSphere

By : Trista Pan, Zhang Liang, Yacine Si Tayeb
Book Image

A Definitive Guide to Apache ShardingSphere

By: Trista Pan, Zhang Liang, Yacine Si Tayeb

Overview of this book

Apache ShardingSphere is a new open source ecosystem for distributed data infrastructures based on pluggability and cloud-native principles that helps enhance your database. This book begins with a quick overview of the main challenges faced by database management systems (DBMSs) in production environments, followed by a brief introduction to the software's kernel concept. After that, using real-world examples of distributed database solutions, elastic scaling, DistSQL, synthetic monitoring, database gateways, and SQL authority and user authentication, you’ll fully understand ShardingSphere's architectural components, how they’re configured and can be plugged into your existing infrastructure, and how to manage your data and applications. You’ll also explore ShardingSphere-JDBC and ShardingSphere-Proxy, the ecosystem’s clients, and how they can work either concurrently or independently to address your needs. You’ll then learn how to customize the plugin platform to define personalized user strategies and manage multiple configurations seamlessly. Finally, the book enables you to get up and running with functional and performance tests for all scenarios. By the end of this book, you’ll be able to build and deploy a customized version of ShardingSphere, addressing the key pain points encountered in your data management infrastructure.
Table of Contents (18 chapters)
1
Section 1: Introducing Apache ShardingSphere
4
Section 2: Apache ShardingSphere Architecture, Installation, and Configuration
10
Section 3: Apache ShardingSphere Real-World Examples, Performance, and Scenario Tests

Understanding data sharding

Data sharding refers to splitting the data in a single database into multiple databases or tables according to a certain dimension, to improve performance and availability.

Data sharding is not to be confused with data partitioning, which is about dividing the data into sub-groups while keeping it stored in a single database. Many other opinions and ideas are floating around in academia and on the internet about this, but rest assured that the number of databases where the data is stored represents the main difference that you should be aware of when distinguishing between sharding and partitioning.

According to the granularity of data sharding, we can divide data sharding into two common forms – database shards and table shards:

  • Database shards are partitions of data in a database, with each shard being stored in a different database instance.
  • Table shards are the smaller pieces that used to be part of a single table and are now...