Distributed Computing in Java 9

Overview of this book

Distributed computing is the practice of splitting a large computation into smaller logical activities that are performed by different systems, which maximizes performance for a lower infrastructure investment. This book will teach you how to improve the performance of traditional applications through parallelism and optimized resource utilization in Java 9. After a brief introduction to the fundamentals of distributed and parallel computing, the book explains different ways of communicating with remote systems and objects in a distributed architecture. You will learn about asynchronous messaging with enterprise integration and related patterns, how to handle large amounts of data using HPC, and how to implement distributed computing for databases. The book then explains how to deploy distributed applications on different cloud platforms and how to develop self-contained applications. You will also learn about big data technologies and understand how they contribute to distributed computing. The book concludes with detailed coverage of the testing, debugging, troubleshooting, and security aspects of distributed applications, so that the programs you build are robust, efficient, and secure.
Table of Contents (17 chapters)
Title Page
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface
Customer Feedback
2. Communication between Distributed Applications
3. RMI, CORBA, and JavaSpaces

Chapter 8. Big Data Analytics

Big data (as embodied by Hadoop clusters) and Big Compute (as embodied by MPI clusters) provide unique capabilities for storing and processing large volumes of data. Hadoop clusters make distributed computing readily accessible to the Java community, while MPI clusters provide high parallel efficiency for compute-intensive workloads. Bringing the big data and Big Compute communities together is an active area of research. Projects such as Apache ZooKeeper provide a centralized infrastructure and service that enables synchronization across a cluster, which is a key building block for distributed computing in big data systems.

In this chapter, we will cover the following:

  • What is big data?
  • Big data characteristics
  • NoSQL databases
  • Hadoop, MapReduce, and HDFS
  • Distributed computing for big data
  • ZooKeeper for distributed computing
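
As a first taste of the MapReduce model named in the list above, here is a minimal sketch of a word count in plain Java 9. It runs on a single JVM using parallel streams, not on a Hadoop cluster, and it does not use the Hadoop API. The class name `WordCountSketch`, the method `countWords`, and the sample input are hypothetical, chosen only for illustration. The idea is the same as in MapReduce: a "map" step turns each line into words, and a "reduce" step groups identical words and counts them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of Hadoop or of the book's code.
public class WordCountSketch {

    // Counts how often each word occurs across all lines.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.parallelStream()
                // "Map" phase: split every line into lowercase words.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                // Drop empty tokens produced by leading whitespace.
                .filter(word -> !word.isEmpty())
                // "Shuffle" and "reduce" phases: group identical words, then count each group.
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Assumed sample input; List.of requires Java 9 or later.
        List<String> lines = List.of(
                "big data and big compute",
                "data in a cluster");
        System.out.println(countWords(lines));
    }
}
```

In a real Hadoop job, the map and reduce steps would run on different nodes and the framework would move the intermediate word groups between them; the stream pipeline here only mirrors that structure inside one process.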

While technologies such as Hadoop, HBase, Accumulo, and Cassandra allow us to store, query, and index large volumes of complex data, Dynamic Distributed...