Raspberry Pi Super Cluster

By: Andrew K. Dennis

Overview of this book

A cluster is a type of parallel/distributed processing system which consists of a collection of interconnected stand-alone computers working cooperatively together. Using Raspberry Pi computers, you can build a two-node parallel computing cluster which enhances performance and availability. This practical, example-oriented guide will teach you how to set up the hardware and operating systems of multiple Raspberry Pi computers to create your own cluster. It will then guide you through installing the software you need to write your own programs, such as Hadoop and MPICH, before moving on to cover topics such as MapReduce. Throughout this book, you will explore the technology through practical examples and tutorials that help you learn quickly and efficiently. Starting from a pile of hardware, you will be guided through tutorials that help you turn that hardware into your own super-computing cluster. You'll start out by learning how to set up your Raspberry Pi cluster's hardware. Following this, you will be taken through how to install the operating system, and you will also be given a taste of what parallel computing is about. With your Raspberry Pi cluster successfully set up, you will then install software such as MPI and Hadoop. Having reviewed some examples and written some programs that explore these two technologies, you will then wrap up with some fun ancillary projects. Finally, you will be provided with useful links to help take your projects to the next step.

Raspberry Pi and parallel computing


Having reviewed some of the key terms of High Performance Computing, it is now time to turn our attention to the Raspberry Pi and how and why we intend to implement many of the ideas explained so far.

This book assumes that you are familiar with the basics of the Raspberry Pi and how it works, and have a basic understanding of programming. Throughout this book, the term Raspberry Pi refers to the Model B version.

For those of you new to the device, we recommend reading a little more about it at the official Raspberry Pi home page:

http://www.raspberrypi.org/

Other topics covered in this book, such as Apache Hadoop, will also be accompanied by links to information that provides a more in-depth guide to the topic at hand.

Thanks to its small size and low cost, the Raspberry Pi makes a good alternative to building a cluster from desktop PCs or in the cloud on Amazon or similar providers, which can be expensive.

The Raspberry Pi comes with a built-in Ethernet port, which allows you to connect it to a switch, router, or similar device. Multiple Raspberry Pi devices connected to a switch can then be formed into a cluster; this model will form the basis of our hardware configuration in the book.

Unlike your laptop or PC, which may contain more than one CPU, the Raspberry Pi contains just a single ARM processor; however, combining multiple Raspberry Pis gives us more CPUs to work with.

One benefit of the Raspberry Pi is that it uses SD cards as secondary storage. These can easily be copied, allowing you to create an image of the Raspberry Pi's operating system and then clone it for re-use on multiple machines. When starting out with the Raspberry Pi, this is a useful feature and something that will be covered in Chapter 2, Setting Up your Raspberry Pi Software and Hardware for Parallel Computing.

The Model B contains two USB ports, which allow us to expand the device's storage capacity (and improve the speed of accessing the data) by using a USB hard drive instead of the SD card.

From the perspective of writing software, the Raspberry Pi can run various versions of the Linux operating system, as well as other operating systems such as FreeBSD, along with the software and tools associated with development on them. This allows us to implement the types of technology found in Beowulf clusters and other parallel systems. We shall provide an overview of these development tools next.

Programming languages and frameworks

A number of programming languages, such as Fortran, C/C++, and Java, are available on the Raspberry Pi, including via the standard repositories. These can be used for writing parallel applications using implementations of MPI, Hadoop, and the other frameworks we discussed earlier in this chapter.

Fortran, C, and C++ have a long history with parallel computing and will all be examined to varying degrees throughout the book. We will also be installing Java in order to write Hadoop-based MapReduce applications.

Fortran, due to its early implementation on supercomputing projects, is still popular today for parallel computing application development, because a large body of code that performs specific scientific calculations already exists. In Chapter 2, Setting Up your Raspberry Pi Software and Hardware for Parallel Computing, we will provide brief instructions on installing it onto your Raspberry Pi, and a further project follows in Chapter 7, Going Further.

In Chapter 3, Parallel Computing - MPI on the Raspberry Pi, we will install MPICH and run an example C application that comes bundled with the library, which will give you the opportunity of using the Message Passing Interface (MPI).

MPI is a language-independent message-passing communication method developed in the early 1990s to aid parallel computing application development. The topic of MPI will be covered in greater depth in Chapter 3, Parallel Computing - MPI on the Raspberry Pi, where we will test an application that calculates π using two Raspberry Pi devices.
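
To give a feel for how a π calculation can be shared between two devices, here is a minimal sketch in plain Python. It does not use MPI itself: the "ranks" are simulated with an ordinary loop, and the function names (partial_pi, estimate_pi) are hypothetical. It assumes the common approach of integrating 4/(1+x²) over the interval from 0 to 1 with the midpoint rule, where each process takes every size-th strip; whether this matches the example bundled with MPICH exactly is an assumption.

```python
import math


def partial_pi(rank, size, n):
    """Midpoint-rule partial sum for the strips owned by one process.

    rank: index of this (simulated) process, starting at 0
    size: total number of processes sharing the work
    n:    total number of strips across all processes
    """
    h = 1.0 / n
    total = 0.0
    # Process `rank` handles strips rank+1, rank+1+size, rank+1+2*size, ...
    for i in range(rank + 1, n + 1, size):
        x = h * (i - 0.5)          # midpoint of strip i
        total += 4.0 / (1.0 + x * x)
    return h * total


def estimate_pi(size=2, n=10000):
    """Combine the partial sums from every simulated process.

    In a real MPI program each rank would run partial_pi on its own
    node and the results would be combined with a reduction; here a
    simple loop stands in for that communication step.
    """
    return sum(partial_pi(rank, size, n) for rank in range(size))


if __name__ == "__main__":
    approx = estimate_pi(size=2, n=10000)
    print("estimate:", approx, "error:", abs(approx - math.pi))
```

The key point is that each partial sum depends only on the rank, the process count, and the strip count, so the two halves of the work can run on separate machines without exchanging anything until the final combination.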

In Chapter 4, Hadoop – Distributed Applications on the Raspberry Pi, we examine the Java programming language and Apache Hadoop in further detail. These form the final two important technologies we will cover in this book.

Apache Hadoop is an open source Java-based MapReduce framework designed for distributed parallel application development.

A MapReduce framework allows an application to take, for example, a number of data sets, divide them up, and mine each data set independently. This can take place on separate devices and then the results are combined into a single data set from which we finally extract a meaningful value.
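
The divide, process, and combine pattern described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Hadoop's API: the helper names (split, map_chunk, reduce_counts, word_count) are hypothetical, word counting is chosen purely as an example task, and everything runs in one process rather than on separate devices.

```python
from collections import Counter
from functools import reduce


def split(records, parts):
    """Divide the input into `parts` chunks, one per worker."""
    return [records[i::parts] for i in range(parts)]


def map_chunk(lines):
    """Map step: count the words in one chunk, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def word_count(lines, parts=2):
    """Run the map step on every chunk, then fold the results together."""
    partials = [map_chunk(chunk) for chunk in split(lines, parts)]
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    data = ["pi pi cluster", "cluster node", "pi"]
    combined = word_count(data, parts=2)
    # Extract a single meaningful value from the combined data set.
    print(combined.most_common(1))
```

Because map_chunk never looks outside its own chunk, each call could run on a different machine; only the small partial results need to travel to wherever the reduce step happens.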

In Chapter 5, MapReduce Applications with Hadoop and Java, we explain MapReduce in detail. The MapReduce model lends itself to being deployed on COTS clusters and cloud services such as EC2. In this book we will demonstrate how to set up Hadoop on two Raspberry Pis in order to mine for data and calculate π using a Monte Carlo simulator.
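
As a rough illustration of how a Monte Carlo estimate of π fits the MapReduce pattern, the following plain-Python sketch throws random points into the unit square and counts how many land inside the quarter circle of radius 1. It is not the Hadoop program built later in the book: the function names (count_hits, monte_carlo_pi) are hypothetical, the per-task seeds are an assumption made so that runs are repeatable, and the tasks run one after another rather than on separate nodes.

```python
import random


def count_hits(samples, seed):
    """Map step: count random points that fall inside the quarter circle.

    Each task gets its own seed so that tasks draw different points
    and the whole run is reproducible.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def monte_carlo_pi(tasks=2, samples_per_task=100000):
    """Reduce step: add up the hit counts and scale to an estimate of pi.

    The quarter circle covers pi/4 of the unit square, so the fraction
    of hits multiplied by 4 approximates pi.
    """
    hits = sum(count_hits(samples_per_task, seed) for seed in range(tasks))
    return 4.0 * hits / (tasks * samples_per_task)


if __name__ == "__main__":
    print("estimate:", monte_carlo_pi(tasks=2, samples_per_task=100000))
```

The estimate improves slowly as samples are added, which is one reason the problem is a popular demonstration: adding more nodes is an easy way to add more samples.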

Finally, the Appendix of this book contains a number of links and resources that the reader may find of interest for Fortran, Java, C, and C++.