Scalable Data Architecture with Java

By: Sinchan Banerjee

Implementing and unit testing the solution

In this section, we will build the Spring Batch application to implement the solution that we designed in the preceding section. We will also run and test the solution.

First, we must understand that each job will have its own schedule. However, the dimension tables need to be loaded before the fact table, because they serve as lookup tables for the fact load.
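The chapter does not show the scheduling wiring at this point, but as a rough illustration, the following sketch launches a fact-load job on its own cron schedule using Spring's scheduling support. The class name, the factLoadJob bean name, and the cron expressions are hypothetical assumptions; in the design above, each application would own a schedule like this, with the dimension-load schedules set to finish before the fact load starts.

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;

@Configuration
@EnableScheduling
public class FactJobScheduler {

    private final JobLauncher jobLauncher;
    private final Job factLoadJob; // hypothetical bean name for the fact-table load job

    public FactJobScheduler(JobLauncher jobLauncher, Job factLoadJob) {
        this.jobLauncher = jobLauncher;
        this.factLoadJob = factLoadJob;
    }

    // Runs daily at 2 a.m., after the dimension-load applications (scheduled
    // earlier, for example at 1 a.m.) have refreshed the lookup tables.
    @Scheduled(cron = "0 0 2 * * *")
    public void launchFactLoad() throws Exception {
        // Add a unique parameter so Spring Batch treats every run as a new job instance.
        JobParameters params = new JobParametersBuilder()
                .addLong("runTime", System.currentTimeMillis())
                .toJobParameters();
        jobLauncher.run(factLoadJob, params);
    }
}
```

Keeping the loads as separately scheduled jobs, rather than steps of a single job, matches the design above, at the cost of coordinating the schedules so that the lookups are always refreshed before the fact load runs.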

For brevity, we will only implement the Spring Batch application for the fact table, and we will load the device and event data from CSV files into their dimension tables manually. However, you can follow the same approach and develop two more Spring Batch applications, one for the device dimension table and one for the event dimension table. In the implementation that follows, we will assume that the device and event data have already been loaded into the data warehouse.

You can do that manually by executing the DMLs present at the following...
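For reference, here is a minimal sketch of what the fact-table load could look like as a single chunk-oriented Spring Batch job: a FlatFileItemReader parses the CSV and a JdbcBatchItemWriter inserts the records into the warehouse. The file path, table and column names, POJO fields, and chunk size are hypothetical placeholders rather than the chapter's actual schema, and the builder style shown assumes Spring Batch 4.x.

```java
import javax.sql.DataSource;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
@EnableBatchProcessing
public class FactLoadJobConfig {

    // Reads the raw fact records from a CSV file (hypothetical path and columns).
    @Bean
    public FlatFileItemReader<FactRecord> factReader() {
        BeanWrapperFieldSetMapper<FactRecord> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(FactRecord.class);
        return new FlatFileItemReaderBuilder<FactRecord>()
                .name("factReader")
                .resource(new FileSystemResource("input/fact_data.csv"))
                .linesToSkip(1) // skip the CSV header row
                .delimited()
                .names(new String[] {"deviceId", "eventId", "eventValue"})
                .fieldSetMapper(fieldSetMapper)
                .build();
    }

    // Writes each chunk into the fact table as a batched JDBC insert.
    @Bean
    public JdbcBatchItemWriter<FactRecord> factWriter(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<FactRecord>()
                .dataSource(dataSource)
                .sql("INSERT INTO event_fact (device_id, event_id, event_value) "
                        + "VALUES (:deviceId, :eventId, :eventValue)")
                .beanMapped()
                .build();
    }

    @Bean
    public Step factLoadStep(StepBuilderFactory stepBuilderFactory,
                             FlatFileItemReader<FactRecord> factReader,
                             JdbcBatchItemWriter<FactRecord> factWriter) {
        return stepBuilderFactory.get("factLoadStep")
                .<FactRecord, FactRecord>chunk(100) // commit every 100 records
                .reader(factReader)
                .writer(factWriter)
                .build();
    }

    @Bean
    public Job factLoadJob(JobBuilderFactory jobBuilderFactory, Step factLoadStep) {
        return jobBuilderFactory.get("factLoadJob").start(factLoadStep).build();
    }

    // Simple POJO mapped from the CSV columns (hypothetical fields).
    public static class FactRecord {
        private String deviceId;
        private String eventId;
        private double eventValue;

        public String getDeviceId() { return deviceId; }
        public void setDeviceId(String deviceId) { this.deviceId = deviceId; }
        public String getEventId() { return eventId; }
        public void setEventId(String eventId) { this.eventId = eventId; }
        public double getEventValue() { return eventValue; }
        public void setEventValue(double eventValue) { this.eventValue = eventValue; }
    }
}
```

When packaged as a Spring Boot application, a job defined like this typically runs at startup unless you disable that behavior and trigger it from a scheduler such as the one sketched earlier; either way, the reader streams the CSV in chunks of 100 records and the writer issues the inserts as one JDBC batch per chunk.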