Hands-on DevOps

By: Sricharan Vadapalli

Overview of this book

DevOps strategies have become an important factor in big data environments. This book begins with an introduction to big data, DevOps, and cloud computing, and explains why big data environments need DevOps strategies. We move on to explore the adoption of DevOps frameworks and business scenarios. We then build a big data cluster, deploy it on the cloud, and explore DevOps activities such as CI/CD and containerization. Next, we cover big data concepts such as ETL for data sources, Hadoop clusters, and their applications. Towards the end of the book, we explore ERP applications useful for migrating to DevOps frameworks and examine a few case studies for migrating big data and prediction models. By the end of this book, you will have mastered implementing DevOps tools and strategies for your big data clusters.

Business drivers for DevOps adoption to big data

Factors contributing to wide-scale popularity and adoption of DevOps among big data systems are listed as follows.

Data explosion

Data is the new form of currency. Yes, you read that right: it is as valuable an asset as oil and gold. In the past decade, many companies have realized the potential of data as an invaluable asset to their growth and performance.

Let's understand how data is valuable. For any organization, data can take many forms, such as customer data, product data, employee data, and so on. Not having the right data on your employees, customers, or products could be devastating. It's common sense that correct data is key to running a business effectively. There is hardly any business today that doesn't depend on data-driven decisions; CEOs rely on data for business decisions more than ever before, for example, which product is more successful in the market, how much demand exists in each area, which price is more competitive, and so on.

Data can be generated by multiple sources: internal, external, and even social media. Internal data is generated through internal systems and operations; in a bank, for example, this includes adding new customers and recording customer transactions across channels such as ATMs, online payments, purchases, and so on. External data is procured from outside the organization, such as gold exchange rates and foreign exchange rates from the Reserve Bank of India (RBI). These days, social media data is widely used for marketing and for customer feedback on products. Harnessing the data from all avenues and using it intelligently is key for business success.

Going a step further, a few companies even monetize data, for example, Healthcare IQ, Owens & Minor, State Street Global Corporation, Ad Juggler, comScore, Verisk Analytics, Nielsen, and LexisNexis. These organizations buy raw data such as web analytics on online product sales, or online search records for each brand, reprocess the data into an organized format, and sell it to research analysts or organizations looking for competitor intelligence data to reposition their products in markets.

Let's analyze the factors fueling the growth of data and business. Fundamental changes in market and customer behavior have had a significant impact on the data explosion. Some of the key drivers of change are:

Customer preference: Today, customers have many means of interacting with businesses; for example, a bank provides multiple channels such as ATM withdrawals, online banking, mobile banking, card payments, on-premise banking, and so on. The same is true for purchases, which can be made in the shop, online, on mobile, and so on. Organizations have to maintain all of these channels for business operations, and each channel adds to the volume of data that must be managed.
Social media: Data is flooding in from social media such as Facebook, LinkedIn, and Twitter. On the one hand, these are sites for social interaction between individuals; on the other hand, companies also rely on social media to promote their products. The terabytes or petabytes of data posted there are, in turn, used by many organizations for data mining. This contributes to the huge data explosion.
Regulations: Companies are required to maintain data in proper formats for a stipulated time, as required by regulatory bodies. For example, to combat money laundering, each organization dealing with finance is required to have clear customer records and credentials to share with regulatory authorities over extended periods of time, such as 10 to 15 years.
Digital world: As we move towards a paperless digital world, we keep adding more digital data, such as e-books and ERP applications that automate many tasks and avoid paperwork. These innovations account for much of the growth in digital data as well.

The next generation will be more data intensive, with the Internet of Things and data science at the forefront, driving business and customer priorities.

Cloud computing

The acceptance of cloud platforms as the de facto way of delivering services has changed how infrastructure is procured and managed. Moving hardware provisioning and other commodity IT work to the cloud improves the efficiency of services and lets IT departments shift their focus away from routine tasks such as patching operating systems. DevOps combined with cloud adoption is the most widely implemented option. With cloud penetration, adding infrastructure or servers is just a click away. This, along with credible open source tools, has paved the way for DevOps.

Using open source tools, build, QA, and pre-production machines can be added in a fraction of the time, as exact replicas with the required configurations.
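To make the idea of exact replicas concrete, here is a minimal Python sketch, not tied to any particular cloud provider or provisioning tool. It assumes that a machine can be described by a small specification and that each environment differs from the base only in its name. The `MachineSpec` class, its fields, and the sample values are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MachineSpec:
    """Hypothetical description of one server (field names are illustrative)."""
    name: str
    os_image: str
    cpu_cores: int
    memory_gb: int
    packages: tuple


def replicate(base, environments):
    """Return one spec per environment, identical to base except for the name."""
    return {env: replace(base, name=f"{base.name}-{env}") for env in environments}


# Sample values are assumptions, not a recommended configuration.
base = MachineSpec(
    name="app",
    os_image="ubuntu-22.04",
    cpu_cores=4,
    memory_gb=16,
    packages=("openjdk-8-jdk", "git", "maven"),
)

replicas = replicate(base, ["build", "qa", "preprod"])
for env, spec in replicas.items():
    print(env, spec)
```

Because every environment is derived from the same base specification, configuration drift between build, QA, and pre-production is avoided by construction; a real setup would hand such a specification to a provisioning tool rather than print it.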

Big data

Big data is the term used to describe data along multiple dimensions, namely large volume, velocity, and variety, that together deliver value for the business. Data comes from multiple sources and in multiple forms: structured, semi-structured, and unstructured. In terms of velocity, data may arrive in batch mode, in real time from machine sensors or online server logs, or as a continuous real-time stream. Volumes can reach terabytes or petabytes, typically stored on Hadoop-based storage and other open source platforms. Big data analytics extends to social media analytics, such as market sentiment analysis based on data from Twitter, LinkedIn, Facebook, and so on; this data is useful for understanding customer sentiment and supporting marketing and customer service activities.

Data science and machine learning

Data science as a field has many dimensions and applications. In science, we study features and behavior patterns to gain meaningful insights, which result in established, reusable formulas. In a similar way, data can be investigated through engineering and statistical methods to understand its behavior patterns and gain meaningful insights. Hence, data science can be viewed as data + science, or the science of data. Machine learning combines data extraction, preparation through extract, transform, load (ETL) or extract, load, transform (ELT), and prediction algorithms that derive meaningful patterns from data to generate business value. These projects have a development life cycle in line with project or product development, so aligning them with DevOps methodologies benefits the evolution of the program.
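The pipeline described above, data preparation followed by a prediction algorithm, can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a production approach. The CSV content, the column names, and the choice of a least-squares straight line as the prediction algorithm are all assumptions made for the example.

```python
import csv
import io

# Made-up sample data; the third row is deliberately incomplete.
RAW = """month,ad_spend,sales
1,10,25
2,20,45
3,,
4,40,85
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows and convert strings to numbers."""
    clean = []
    for row in rows:
        if row["ad_spend"] and row["sales"]:
            clean.append((float(row["ad_spend"]), float(row["sales"])))
    return clean


def load(pairs, store):
    """Load: append the prepared records to a target store (here, a list)."""
    store.extend(pairs)
    return store


def fit_line(pairs):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(x, slope, intercept):
    """Apply the fitted line to a new input value."""
    return slope * x + intercept


warehouse = load(transform(extract(RAW)), [])
slope, intercept = fit_line(warehouse)
print(predict(30, slope, intercept))
```

In an ELT variant, the raw rows would be loaded into the store first and transformed afterwards inside it. Either way, each stage is a separate, testable function, which is what makes such a pipeline a natural fit for CI/CD.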

In-memory computing

Traditional software architecture used disks as the primary data storage; data was moved from disk to main memory and the CPU to perform the aggregations required by business logic. Moving large volumes of data back and forth between disk and memory caused significant I/O overhead.

In-memory technology relies on hardware and software innovations to hold the complete business application data in main memory itself, so computations are very fast. Many underlying hardware and software advancements have made in-memory computing possible.

The software advancements include the following:

Partitioning of data
No aggregate tables
Insert-only on delta
Data compression
Row plus column storage
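Several of the software techniques listed above can be illustrated with a toy Python sketch. It is one interpretation of those terms under stated assumptions, not the design of any particular in-memory database. The sample records, the `InsertOnlyTable` class, and the function names are hypothetical. The sketch shows column storage derived from row storage, dictionary encoding as a simple form of compression, an append-only delta that is later merged into the main store, and aggregates computed on the fly instead of being kept in aggregate tables.

```python
from collections import defaultdict

# Row storage: each record is kept together (made-up sample data).
rows = [
    {"region": "north", "product": "A", "amount": 100},
    {"region": "south", "product": "A", "amount": 150},
    {"region": "north", "product": "B", "amount": 200},
]


def to_columns(records):
    """Column storage: keep all values of one attribute together."""
    columns = defaultdict(list)
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    return dict(columns)


def dictionary_encode(values):
    """Compression: store each distinct value once and refer to it by index."""
    dictionary, index, codes = [], {}, []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


class InsertOnlyTable:
    """New rows go to an append-only delta; merge() moves them to the main store."""

    def __init__(self):
        self.main = []
        self.delta = []

    def insert(self, record):
        self.delta.append(record)

    def merge(self):
        self.main.extend(self.delta)
        self.delta = []

    def total_by(self, key, measure):
        """Aggregate on the fly over main + delta; no aggregate table is kept."""
        totals = defaultdict(int)
        for record in self.main + self.delta:
            totals[record[key]] += record[measure]
        return dict(totals)


columns = to_columns(rows)
dictionary, codes = dictionary_encode(columns["region"])

table = InsertOnlyTable()
for record in rows:
    table.insert(record)
totals_before_merge = table.total_by("region", "amount")
table.merge()
totals_after_merge = table.total_by("region", "amount")
print(dictionary, codes, totals_after_merge)
```

Because queries read both the main store and the delta, results are the same before and after a merge; the merge only changes where the rows live.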

The hardware advancements include the following:

Multi-core architecture allows massive parallel scaling
Multifold compression
Main memory has scalable capacity
Fast prefetch unlimited size

We will elaborate on these in detail in coming chapters.
