Apache Ignite Quick Start Guide

By: Sujoy Acharya

Overview of this book

Apache Ignite is a distributed in-memory platform designed to scale and process large volumes of data. It can be integrated with microservices as well as monolithic systems, and can be used as a scalable, highly available, and performant deployment platform for microservices. This book will teach you to use Apache Ignite to build a high-performance, scalable, highly available system architecture with data integrity. The book takes you through the basics of Apache Ignite and in-memory technologies. You will learn about installing Ignite and clustering Ignite nodes, caching topologies, and various caching strategies, such as cache-aside, read-through and write-through, and write-behind. Next, you will delve into detailed aspects of Ignite’s data grid: web session clustering and querying data. You will learn how to process large volumes of data using the compute grid, Ignite’s map-reduce, and the executor service. You will learn about the memory architecture of Apache Ignite and how to monitor memory and caches. You will use Ignite for complex event processing, event streaming, and time-series prediction of opportunities and threats. Additionally, you will go through off-heap and on-heap caching, swapping, and native and Spring Framework integration with Apache Ignite. By the end of this book, you will be confident with all the features of Apache Ignite 2.x that can be used to build a high-performance system architecture.
Table of Contents (9 chapters)

Why Apache Ignite?

Apache Ignite is an open source In-Memory Data Grid (IMDG), distributed database, caching platform, and high-performance computing platform. It offers a bucketload of features and integrates well with other Apache frameworks such as Hadoop, Spark, and Cassandra.

So why do we need Apache Ignite? We need it for its high performance and scalability.

Of course, the phrase high performance might be very popular in our industry, but it's equally ambiguous. There's no established numerical threshold for when regular performance becomes high performance, just as there's no clear threshold for when data becomes Big Data, or when services become Microservices.

Fortunately, culture tends to generate its own barometers, and in computer science, the term high performance generally refers to the prowess of supercomputers. Supercomputers achieve high throughput through distributed parallel processing. They are mainly used for compute-intensive tasks such as weather forecasting, gene model analysis, big-bang simulations, and so on. High-performance computing enables us to process huge chunks of data as quickly as possible.

Following the supercomputer analogy, we can stack up many virtual machines or workstations to form a grid and process a computationally intensive task. In traditional database-centric applications, however, parallel processing doesn't scale linearly: if we add 10 more machines to the grid, processing will not become 10 times faster. At most, it may gain 2-4% in performance.

Apache Ignite plays a key role here, achieving a 20-30% performance improvement that scales linearly. It keeps data in RAM for fast processing and linear scaling, so adding more workstations to the grid yields higher scalability and further performance gains.
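To make the idea of spreading in-memory data over a grid more concrete, here is a toy Python sketch; it is not Ignite's code. It assumes a cache split into a fixed number of partitions (1024 here, which we take to match the default of Ignite's rendezvous-based affinity function) and assigns each partition to a node using rendezvous (highest-random-weight) hashing. The helper names `partition_of`, `assign_partitions`, and `load_per_node` are hypothetical.

```python
import hashlib

NUM_PARTITIONS = 1024  # assumption: mirrors Ignite's default partition count


def _score(*parts) -> int:
    # Deterministic 64-bit pseudo-random score derived from the given parts.
    digest = hashlib.sha256(":".join(map(str, parts)).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def partition_of(key, num_partitions=NUM_PARTITIONS) -> int:
    # Map a cache key to a partition number.
    return _score("key", key) % num_partitions


def assign_partitions(nodes, num_partitions=NUM_PARTITIONS) -> dict:
    # Rendezvous hashing: each partition is owned by the node with the
    # highest score for that (node, partition) pair.
    return {p: max(nodes, key=lambda n: _score(n, p)) for p in range(num_partitions)}


def load_per_node(assignment) -> dict:
    # Count how many partitions each node owns.
    counts = {}
    for owner in assignment.values():
        counts[owner] = counts.get(owner, 0) + 1
    return counts


if __name__ == "__main__":
    two = assign_partitions(["node-a", "node-b"])
    four = assign_partitions(["node-a", "node-b", "node-c", "node-d"])
    print(load_per_node(two))   # roughly 512 partitions per node
    print(load_per_node(four))  # roughly 256 partitions per node
    print("customer:42 lives on", four[partition_of("customer:42")])
```

With two nodes each machine owns roughly half of the partitions; with four, roughly a quarter, so every machine holds and processes a smaller share of the data in its own RAM. When a node joins, a partition changes owner only if the new node wins it, which keeps data movement during rebalancing small.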

NoSQL databases were introduced to mitigate RDBMS scalability issues. There are four main types of NoSQL databases (key-value, document, column-family, and graph stores), each suited to different use cases, but a NoSQL database still cannot help us scale a system to handle truly high-volume transactional data. Apache Ignite offers caching APIs to process high volumes of ACID-compliant transactional data.

If you need to process records transactionally and still want a 20-30% performance gain over a traditional database, Apache Ignite can offer high performance, linear scalability, and ACID-compliant transactions with high availability and resiliency.
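Atomicity, the "A" in ACID, means that a group of writes is applied entirely or not at all. The toy in-memory sketch below (plain Python, not Ignite's API) buffers writes inside a transaction and applies them only when the `with` block finishes without an exception. `TxStore`, `Transaction`, and `transfer` are hypothetical names, and the sketch deliberately ignores isolation, durability, and distribution, which a real transactional grid such as Ignite must also handle.

```python
class TxStore:
    """Toy in-memory key-value store with all-or-nothing transactions."""

    def __init__(self):
        self.data = {}

    def transaction(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.writes = {}  # buffered writes, invisible to the store until commit

    def get(self, key):
        # Read our own buffered write first, then fall back to committed data.
        if key in self.writes:
            return self.writes[key]
        return self.store.data[key]

    def put(self, key, value):
        self.writes[key] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.store.data.update(self.writes)  # commit: apply all writes together
        # On an exception the buffer is simply discarded (rollback).
        return False  # let any exception propagate to the caller


def transfer(store, src, dst, amount):
    with store.transaction() as tx:
        balance = tx.get(src)
        tx.put(src, balance - amount)  # buffered, not yet visible
        if balance < amount:
            raise ValueError("insufficient funds")  # aborts: no write is applied
        tx.put(dst, tx.get(dst) + amount)
```

A failed transfer leaves both balances untouched even though the debit was already written to the buffer. In Ignite's Java API the analogous pattern is a try-with-resources block around `ignite.transactions().txStart()` that ends with `tx.commit()`; a transaction closed without a commit is rolled back.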

Apache Ignite can be used with various types of data sources, from high-volume financial transaction data to streams of IoT sensor data. Ignite stores data in RAM for high processing throughput, but for resiliency you can also persist the data in a third-party data store or in Ignite's native persistence store. We will explore each of them later.

Ignite offers an ANSI SQL query API for querying data, an API for performing CRUD operations on caches, ACID transactions, compute and service grids, streaming and complex event processing, and Machine Learning APIs.

NoSQL and NewSQL
NoSQL databases came into the picture to solve the RDBMS scalability bottleneck. They are typically eventually consistent and are designed around the CAP theorem for distributed data stores, trading strong consistency for availability and partition tolerance. They don't offer transactional consistency or relational SQL joins, but they scale many times better than an RDBMS. NewSQL is a newer class of databases that offer ACID-compliant distributed transactions and can still scale. Apache Ignite can be termed a NewSQL database.
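The difference between eventual consistency and the synchronous behaviour expected from a transactional store can be illustrated with a toy model of one primary copy and one backup copy. This is a Python sketch under simplifying assumptions (a single process, no failures, hypothetical class names), not a model of any particular NoSQL or NewSQL product. Ignite exposes a related choice for cache backups through its `CacheWriteSynchronizationMode` setting (for example, `FULL_SYNC` versus `FULL_ASYNC`).

```python
class Replica:
    def __init__(self):
        self.data = {}


class AsyncReplicatedStore:
    """Eventually consistent: writes land on the primary and reach the
    backup only later, when sync() runs."""

    def __init__(self):
        self.primary, self.backup = Replica(), Replica()
        self.pending = []  # writes not yet shipped to the backup

    def put(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read_backup(self, key):
        return self.backup.data.get(key)

    def sync(self):
        # Replication catches up; the copies converge.
        for key, value in self.pending:
            self.backup.data[key] = value
        self.pending.clear()


class SyncReplicatedStore:
    """A write completes only after every copy has applied it."""

    def __init__(self):
        self.primary, self.backup = Replica(), Replica()

    def put(self, key, value):
        for replica in (self.primary, self.backup):
            replica.data[key] = value

    def read_backup(self, key):
        return self.backup.data.get(key)
```

Until `sync()` runs, a reader hitting the backup of the eventually consistent store sees stale (here, missing) data; the synchronous store never exposes that window, at the cost of waiting for every copy on each write.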