Solutions Architect's Handbook

By: Saurabh Shrivastava, Neelanjali Srivastav
Overview of this book

Becoming a solutions architect gives you the flexibility to work with cutting-edge technologies and define product strategies. This handbook takes you through the essential concepts, design principles and patterns, architectural considerations, and the latest technologies you need to know to become a successful solutions architect. The book starts with a quick introduction to the fundamentals of solution architecture design principles and attributes, helping you understand how solution architecture benefits software projects across enterprises. You'll learn what a cloud migration and application modernization framework looks like, and will use microservices, event-driven, cache-based, and serverless patterns to design robust architectures. You'll then explore the main pillars of architecture design, including performance, scalability, cost optimization, security, operational excellence, and DevOps. You'll also learn advanced concepts relating to big data, machine learning, and the Internet of Things (IoT). Finally, you'll get to grips with documenting architecture designs and the soft skills necessary to become a better solutions architect. By the end of this book, you'll have learned techniques to create an efficient architecture design that meets your business requirements.
Table of Contents (18 chapters)

Technology choices for data ingestion

Let's look at some popular open source tools for data ingestion and transfer:

  • Apache DistCp: DistCp stands for distributed copy and is part of the Hadoop ecosystem. The DistCp tool is used to copy large datasets within a cluster or between clusters. DistCp achieves efficient, fast copying by utilizing the parallel processing capability of MapReduce: it distributes directories and files across map tasks, each of which copies a partition of files from source to target. DistCp also handles errors, recovery, and reporting across clusters.
  • Apache Sqoop: Sqoop is also part of the Hadoop ecosystem and helps transfer data between Hadoop and relational data stores (RDBMSes). Sqoop allows you to import data from a structured data store into HDFS and to export data from HDFS back into a structured data store. Sqoop uses plugin connectors to connect to relational databases. You can use the Sqoop extension API to build...
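Both tools above are driven from the command line. The following is an illustrative sketch, not a definitive recipe: it assumes a configured Hadoop client and Sqoop installation, and the NameNode hostnames, JDBC URL, credentials, table names, and HDFS paths are all placeholders.

```shell
# Copy a directory between two HDFS clusters with DistCp;
# MapReduce map tasks perform the copy in parallel.
hadoop distcp \
  hdfs://nn1.example.com:8020/data/logs \
  hdfs://nn2.example.com:8020/data/logs

# Import a relational table into HDFS with Sqoop
# (-P prompts for the database password interactively).
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst -P \
  --table orders \
  --target-dir /data/orders \
  --num-mappers 4

# Export processed results from HDFS back to the database.
sqoop export \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst -P \
  --table order_summary \
  --export-dir /data/order_summary
```

Note that both tools parallelize through map tasks: DistCp splits the file list across mappers, while Sqoop's `--num-mappers` controls how many parallel connections partition the table during transfer.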