Solutions Architect's Handbook

By: Saurabh Shrivastava, Neelanjali Srivastav
Overview of this book

Becoming a solutions architect gives you the flexibility to work with cutting-edge technologies and define product strategies. This handbook takes you through the essential concepts, design principles and patterns, architectural considerations, and the latest technology that you need to know to become a successful solutions architect. The book starts with a quick introduction to the fundamentals of solution architecture design, covering the principles and attributes that will help you understand how solution architecture benefits software projects across enterprises. You'll learn what a cloud migration and application modernization framework looks like, and you'll use microservices, event-driven, cache-based, and serverless patterns to design robust architectures. You'll then explore the main pillars of architecture design, including performance, scalability, cost optimization, security, operational excellence, and DevOps. You'll also learn advanced concepts relating to big data, machine learning, and the Internet of Things (IoT). Finally, you'll get to grips with documenting architecture designs and the soft skills that are necessary to become a better solutions architect. By the end of this book, you'll have learned techniques to create an efficient architecture design that meets your business requirements.

Unstructured data stores

When you look at the requirements for an unstructured data store, Hadoop seems like a perfect choice because it is scalable, extensible, and very flexible. It can run on commodity hardware, has a vast ecosystem of tools, and is cost-effective to run. Hadoop uses a master-and-child-node model, where data is distributed between multiple child nodes and the master node coordinates jobs for running queries on the data. The Hadoop system is based on massively parallel processing (MPP), which makes it fast to perform queries on all types of data, whether structured or unstructured.
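
To make this concrete, the following is a minimal PySpark sketch of a parallel query over unstructured text files. The HDFS path, application name, and word-count logic are illustrative assumptions rather than examples from the book; the point is that the driver plays the coordinating master role, while the map and reduce steps run in parallel on the child nodes:

    # A minimal sketch, assuming a PySpark installation and a hypothetical
    # HDFS path; this is not the book's own example.
    from pyspark.sql import SparkSession

    # The driver acts as the coordinating (master) side of the job.
    spark = (
        SparkSession.builder
        .appName("unstructured-word-count")
        .getOrCreate()
    )

    # Each child node reads and processes its own partitions of the input files.
    lines = spark.sparkContext.textFile("hdfs:///data/raw/logs/*.txt")

    counts = (
        lines.flatMap(lambda line: line.split())  # map step: runs in parallel on child nodes
             .map(lambda word: (word, 1))
             .reduceByKey(lambda a, b: a + b)     # reduce step: shuffled across the cluster
    )

    # Bring a small sample back to the driver for inspection.
    for word, count in counts.take(10):
        print(word, count)

    spark.stop()

Note that the same job runs unchanged whether the input is free-form text or structured records, which is what makes this model attractive for unstructured data stores.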

When a Hadoop cluster is created, each child node created from the server comes with a block of attached disk storage, which together makes up the local Hadoop Distributed File System (HDFS) store. You can run queries against the stored data using common processing frameworks such as Hive, Pig, and Spark, as in the sketch below. However, data on the local disk persists only for the life of the associated...
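
As an illustration of those query frameworks, here is a minimal sketch that uses Spark SQL (Hive and Pig expose comparable query layers) against data held in the cluster's HDFS store. The HDFS path, the JSON format, and the column names are illustrative assumptions:

    # A minimal sketch, assuming Spark SQL and a hypothetical HDFS dataset;
    # the path and schema are made up for illustration.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-query").getOrCreate()

    # Load semi-structured JSON records from the local HDFS store.
    events = spark.read.json("hdfs:///data/events/")
    events.createOrReplaceTempView("events")

    # The SQL is broken into tasks that run in parallel on the child nodes.
    top_users = spark.sql("""
        SELECT user_id, COUNT(*) AS event_count
        FROM events
        GROUP BY user_id
        ORDER BY event_count DESC
        LIMIT 10
    """)
    top_users.show()

    spark.stop()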