Learning Hadoop 2

Kite Data


The Kite SDK (http://www.kitesdk.org) is a collection of classes, command-line tools, and examples that aims to simplify building applications on top of Hadoop.

In this section we will look at how Kite Data, a subproject of Kite, can ease integration with several components of a Hadoop data warehouse. Kite examples can be found at https://github.com/kite-sdk/kite-examples.

On Cloudera's QuickStart VM, Kite JARs can be found at /opt/cloudera/parcels/CDH/lib/kite/.

Kite Data is organized into several subprojects, some of which we'll describe in the following sections.

Data Core

As the name suggests, the core is the building block for all capabilities provided in the Data module. Its principal abstractions are datasets and repositories.
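To make the relationship between the two abstractions concrete, here is a minimal, stdlib-only Java sketch. It is a hypothetical toy model, not Kite's API: the names ToyDataset, ToyRepository, create, load and delete are invented for illustration. The real Kite classes also deal with schemas, storage formats and partitioning, which this sketch ignores. It only shows a repository acting as the single place where named datasets are created, looked up and dropped, with each dataset holding a read-only set of records.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical toy model, NOT the Kite API: a named, read-only collection of records.
final class ToyDataset<E> {
    private final String name;
    private final List<E> records;

    ToyDataset(String name, List<E> records) {
        this.name = name;
        // Unmodifiable copy (Java 10+): callers cannot change the contents afterwards.
        this.records = List.copyOf(records);
    }

    String getName() { return name; }

    List<E> getRecords() { return records; }
}

// Hypothetical toy repository: creates, loads and deletes datasets by name.
final class ToyRepository {
    private final Map<String, ToyDataset<?>> datasets = new HashMap<>();

    <E> ToyDataset<E> create(String name, List<E> records) {
        if (datasets.containsKey(name)) {
            throw new IllegalStateException("Dataset already exists: " + name);
        }
        ToyDataset<E> dataset = new ToyDataset<>(name, records);
        datasets.put(name, dataset);
        return dataset;
    }

    // Empty Optional when no dataset with that name has been created.
    Optional<ToyDataset<?>> load(String name) {
        return Optional.ofNullable(datasets.get(name));
    }

    // Returns true if a dataset was actually removed.
    boolean delete(String name) {
        return datasets.remove(name) != null;
    }
}

public class ToyKiteModel {
    public static void main(String[] args) {
        ToyRepository repo = new ToyRepository();
        ToyDataset<String> users = repo.create("users", List.of("alice", "bob"));
        System.out.println(users.getName() + " has " + users.getRecords().size() + " records");
        try {
            users.getRecords().add("carol");
        } catch (UnsupportedOperationException e) {
            System.out.println("records are read-only");
        }
        System.out.println("loaded: " + repo.load("users").isPresent());
        System.out.println("deleted: " + repo.delete("users"));
    }
}
```

Keeping creation and lookup in the repository means client code refers to datasets only by name and never constructs them directly, which is the role the repository abstraction plays in the description above.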

The org.kitesdk.data.Dataset interface is used to represent an immutable set of data:

@Immutable
public interface Dataset<E> extends RefinableView<E> {
  String getName();
  DatasetDescriptor getDescriptor();
  Dataset<E>...