Big Data Architect's Handbook

By: Syed Muhammad Fahad Akhtar

Overview of this book

Big data architects are the "masters" of data and hold high value in today's market. Handling big data, be it of good or bad quality, is not an easy task. The prime job for any big data architect is to build an end-to-end big data solution that integrates data from different sources and analyzes it to find useful, hidden insights. Big Data Architect's Handbook takes you through developing a complete, end-to-end big data pipeline, which will lay the foundation for you and provide the knowledge required to be an architect in big data. From understanding the design considerations to implementing a solid, efficient, and scalable data pipeline, this book walks you through all the essential aspects of big data. It also gives you an overview of how you can leverage the power of various big data tools, such as Apache Hadoop and Elasticsearch, and bring them together to build an efficient big data solution. By the end of this book, you will be able to build your own design system that integrates, maintains, visualizes, and monitors your data. In addition, you will have a smooth design flow in each process, putting insights into action.
Table of Contents (21 chapters)

Preface
  1. Why Big Data?
  2. Big Data Environment Setup
  3. Hadoop Ecosystem
  4. NoSQL Database
  5. Off-the-Shelf Commercial Tools
  6. Containerization
  7. Network Infrastructure
  8. Cloud Infrastructure
  9. Security and Monitoring
  10. Frontend Architecture
  11. Backend Architecture
  12. Machine Learning
  13. Artificial Intelligence
  14. Elasticsearch
  15. Structured Data
  16. Unstructured Data
  17. Data Visualization
  18. Financial Trading System
  19. Retail Recommendation System
  20. Other Books You May Enjoy

Preface

Big data architects are the masters of data and hold high value in today’s market. Handling big data, be it of good or bad quality, is not an easy task. The prime task before any big data architect is to build an end-to-end big data solution that integrates data from different sources and analyzes it to find useful, hidden insights. Big Data Architect's Handbook takes you through developing a complete, end-to-end big data pipeline that will lay the foundation for you and provide the necessary knowledge required to be an architect in big data. Right from understanding the design considerations to implementing a solid, efficient, and scalable data pipeline, this book walks you through all the essential aspects of big data. It also gives you an overview of how you can leverage the power of various big data tools such as Apache Hadoop and Elasticsearch in order to bring them together and build an efficient big data solution.
By the end of this book, you will be able to build your own design system that integrates, maintains, visualizes, and monitors your data. In addition, you will have a smooth design flow in each process, putting insights into action.

Who this book is for

Big Data Architect's Handbook is for you if you are an aspiring data professional, developer, or IT enthusiast who aims to be an all-round architect in big data. This book is a one-stop solution to enhance your knowledge and carry out easy to complex activities required to become a big data architect.

What this book covers

Chapter 1, Why Big Data?, explains what big data is, why we need it, who should deal with it, when to use it, and how to use it. The design considerations for an end-to-end big data solution, including cloud, Hadoop, network, analytics, and so on, are also outlined here.

Chapter 2, Big Data Environment Setup, provides a step-by-step guide to setting up an environment for running big data applications.

Chapter 3, Hadoop Ecosystem, is about the Hadoop ecosystem, which consists of different open source modules, accessories, and Apache projects for reliable and scalable distributed computing. This chapter will teach you how to build a Hadoop big data system for streaming data with a step-by-step guide.

Chapter 4, NoSQL Database, explains the concepts, principles, properties, performance, and hybrids of popular NoSQL databases so that a big data architect can confidently choose an appropriate NoSQL database for their projects. This chapter will teach you how to implement NoSQL for killer applications with a step-by-step guide.

Chapter 5, Off-the-Shelf Commercial Tools, introduces some popular commercial off-the-shelf tools for big data with a hands-on Stream Analytics example.

Chapter 6, Containerization, introduces the concept and application of container-based virtualization, an OS-level virtualization method for deploying and running distributed applications without launching an entire VM for each application. The management of Docker and Kubernetes using OpenShift is also demonstrated here.

Chapter 7, Network Infrastructure, teaches the essential network technology an architect needs to design big data systems across racks, data centers, and geographical locations. This chapter will also teach you how to use a network visualization tool via a step-by-step guide.

Chapter 8, Cloud Infrastructure, introduces essential considerations for cloud infrastructure design for big data from the perspective of performance and capability. The requirements of deploying big data in the cloud are unique and quite different from those of traditional applications. A big data architect must therefore design carefully, especially when estimating the amount of data to be analyzed with big data capabilities in the cloud, because not all public or private cloud offerings are built to accommodate big data solutions.

Chapter 9, Security and Monitoring, covers essential knowledge of security, including next-generation firewalls, DevOps security, and monitoring tools.

Chapter 10, Frontend Architecture, introduces frontend architecture, a collection of tools and processes that aims to improve the quality of frontend code while creating a more efficient, scalable, and sustainable design for big data systems. One critical factor in being a successful big data architect is presenting persuasive analytic results to mostly non-technical people, such as C-level management and decision-makers, through a user-friendly, elegant, and responsive graphical user interface. This chapter will teach you how to use the React + Redux framework to build a responsive, easy-to-debug user interface.

Chapter 11, Backend Architecture, shows how to design a scalable, resilient, manageable, and cost-effective distributed backend architecture with different combinations of technology. The backend handles business logic and data storage with a RESTful web API service.

Chapter 12, Machine Learning, teaches the essential concepts and killer applications of machine learning. You will learn about the most effective machine learning techniques, and gain practice implementing them and getting them to work for yourself and your enterprise. You'll learn not only about the theoretical underpinnings of learning, but also the practical know-how needed to quickly and powerfully apply these techniques to new problems.

Chapter 13, Artificial Intelligence, introduces AI and convolutional neural networks (CNNs) with hands-on big data killer applications. Applying CNNs or deep learning alongside machine learning is one good way to handle unstructured big data.

Chapter 14, Elasticsearch, shows how to use the open source tool Elasticsearch for search tasks in a big data system. It is an enterprise-grade search engine that is easy to scale. Its other features include a handy REST API with JSON responses, good documentation, the Sense UI, the stable and proven Lucene engine underneath, an excellent Query DSL, multi-tenancy, advanced search features, configurability and extensibility, percolation, custom analyzers, on-the-fly analyzer selection, a rich ecosystem, and an active community.

Chapter 15, Structured Data, introduces the use of open source tools to manipulate and analyze structured data.

Chapter 16, Unstructured Data, shows how to use open source tools to manipulate and analyze unstructured data. Readers will learn how to use machine learning and AI to extract information for analysis in killer applications such as a Retail Recommendation System and Facial Recognition.

Chapter 17, Data Visualization, illustrates how to present analytical results to users with two top-shelf tools, Matplotlib and D3.js.

Chapter 18, Financial Trading System, covers algorithmic trading benefits and strategies, and how to design and deploy an end-to-end Financial Trading System via a step-by-step guide.

Chapter 19, Retail Recommendation System, shows how to design and deploy an end-to-end Retail Recommendation System through a step-by-step guide.

To get the most out of this book

  1. This book uses an Ubuntu Linux desktop environment to set up and run all the examples and sample code.
  2. Each chapter contains installation and setup instructions for the frameworks and applications it uses. Follow those instructions carefully to set up the environment and run the provided examples successfully.

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packtpub.com.
  2. Select the SUPPORT tab.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux
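
If you would rather script the extraction step than use one of the tools above, the following is a minimal sketch. It assumes the downloaded bundle is a .zip archive and that Python 3 is available; the function name extract_code_bundle and the file names in the demo are hypothetical and do not come from the book.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_code_bundle(archive_path, target_dir):
    """Extract a .zip code bundle into target_dir and return its member names.

    Hypothetical helper for illustration; it is not part of the book's code.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as bundle:
        bundle.extractall(target)
        return sorted(bundle.namelist())


if __name__ == "__main__":
    # Self-contained demo: build a tiny stand-in archive, then extract it.
    with tempfile.TemporaryDirectory() as tmp:
        demo_zip = Path(tmp) / "code-bundle.zip"  # hypothetical file name
        with zipfile.ZipFile(demo_zip, "w") as zf:
            zf.writestr("Chapter02/README.txt", "placeholder")
        print(extract_code_bundle(demo_zip, Path(tmp) / "extracted"))
```

To use it on a real download, pass the path of the archive you saved and the folder you want the files extracted into.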

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Big-Data-Architects-Handbook. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/BigDataArchitectsHandbook_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."

A block of code is set as follows:

html, body, #map {
  height: 100%;
  margin: 0;
  padding: 0;
}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

Any command-line input or output is written as follows:

$ mkdir css
$ cd css

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.