Architecting Data-Intensive Applications

By: Anuj Kumar

Overview of this book

Are you an architect or a developer who looks at your own applications gingerly while browsing through Facebook and applauding it silently for its data-intensive, yet fluent and efficient, behaviour? This book is your gateway to building smart data-intensive systems by incorporating the core data-intensive architectural principles, patterns, and techniques directly into your application architecture.

This book starts by taking you through the primary design challenges involved in architecting data-intensive applications. You will learn how to implement data curation and data dissemination, depending on the volume of your data. You will then implement your application architecture one step at a time. You will get to grips with implementing the correct message delivery protocols and creating a data layer that doesn’t fail under high traffic. This book will show you how to divide your application into layers, each of which adheres to the single responsibility principle. By the end of this book, you will have learned to streamline your thoughts and make the right choice of technologies and architectural principles based on the problem at hand.

Preface

Architecting Data-Intensive Applications explores the principles, capabilities, and patterns of a system that is architected and designed to handle a variety of workflows, such as read, process, write, and analyze, across a variety of data sources that emit different volumes of data at a consistent pace. This book educates its readers about the various aspects of such a system, the pitfalls to avoid, and the use cases that point to the need for a system capable of handling large volumes of data.

It avoids the notion of comparison with Big Data systems. The reason is that, in the author’s opinion, the phrase "Big Data" is already quite overloaded. How "Big" is really "Big" depends on the context in which the application is being built. Something that is "Big" for an organization with three employees that handles the Twitter feeds of 10,000 users may not be "Big" for Twitter, which handles millions of Twitter feeds every day. Therefore, this book tries to avoid any mention of, or comparison with, Big Data terminology.

Readers will find this book useful both as a technical guide and as a go-to reference for understanding the aspects of dealing with data, such as Data Collection, Data Processing, Data Dissemination, and Data Governance. The book also contains example code in various places, mostly written in Java. All care has been taken to keep the examples simple and easy to understand, with sufficient description. A working knowledge of Java is therefore not mandatory, although it will speed up the process of grasping the concepts. Knowledge of OOP is essential, though.

Who this book is for

This book is for developers and data architects who have to code, test, deploy, and/or maintain large-scale, high data volume applications. It is also useful for system architects who need to understand various non-functional aspects revolving around Data Intensive Systems.

What this book covers

Chapter 1, Exploring the Data Ecosystem, will introduce the data ecosystem and help you understand its characteristics. You will take a look at the 3Vs of the data ecosystem and discuss some data and information sharing standards and frameworks.

Chapter 2, Defining a Reference Architecture for Data-Intensive Systems, will give you an insight into a reference architecture for a data-intensive system and will then provide you with a variety of possible implementations of that architecture in different scenarios. You will also take a look at the architectural principles and their capabilities.

Chapter 3, Patterns of the Data Intensive Architecture, will focus on various architectural patterns and discuss the application and communication styles in detail. You will learn how to combine different application styles and dive deep into various architectural patterns, enabling you to understand the why as well as the how of a data-centric architecture.

Chapter 4, Discussing Data-Centric Architectures, will discuss the various reference architectures for a data-intensive system. This chapter will also look at the functional components that form the foundation of a distributed system and explain why the Lambda architecture is so popular with distributed systems. It will also provide an insight into the Kappa architecture, which is a simplified version of the Lambda architecture.

Chapter 5, Understanding Data Collection and Normalization Requirements and Techniques, will provide an in-depth design of a data collection system that you want to build from scratch, along with its requirements and techniques.

Chapter 6, Creating a Data Pipeline for Consistent Data Collection, Processing, and Dissemination, will help you learn how to create a scalable and highly available architecture for designing and implementing a data pipeline in your overall architecture. This chapter will also delve deeper into the different considerations of designing the data pipeline and take a look at various design patterns that will help you create a resilient data pipeline.

Chapter 7, Building a Robust and Fault-Tolerant Data Collection System, will focus on data collection systems that are available in the open source community, including NiFi, which is a highly scalable and user-friendly system for defining data flows. It will also deal with Sqoop, which addresses a very specific use case of transferring data between HDFS and relational systems.

Chapter 8, Challenges of Data Processing, will act as a backbone for the chapters that follow. This chapter will discuss various challenges that an architect can face while creating a data processing system within their organization. You will learn how to enable the large-scale processing of data while keeping the overall system costs low and how to keep the overall processing time within the defined SLA as the load on the processing system increases. You will also learn how to effectively consume the processed data.

Chapter 9, Let Us Process Data in Batches, will explore the creation of a batch processing system and the criteria necessary for designing one. This chapter will also discuss the Lambda architecture and its batch processing layer. Then, you’ll learn how distributed processing works and why Hadoop and MapReduce are the go-to systems for implementing a batch processing system.

Chapter 10, Handling Streams of Data, will explore the concepts and capabilities of a streaming application and its association with the Lambda architecture. This chapter also discusses the various sub-components of a stream-based system. You will take a look at the various design considerations for a stream-based application and walk through the different components of a stream-based system in action.

Chapter 11, Let's Store the Data, will help you understand how to store a huge dataset. It will discuss HDFS and its storage formats, discuss HBase, a columnar data store, and take a look at graph databases.

Chapter 12, When Data Dissemination is as Important as Data Itself, will explore how efficiently you can disseminate your data using indexing technologies and caching techniques. This chapter will also take a look at data governance and teach you how to design a dissemination architecture.

To get the most out of this book

A working knowledge of Java is not mandatory, although it will help you grasp the examples more quickly. Knowledge of OOP is essential.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.