HP Vertica Essentials

By: Rishabh Agrawal

Overview of this book

With the rise of Massively Parallel Processing (MPP) and NewSQL databases, many users are confused about which MPP technology to opt for. Today, HP Vertica is gaining a lot of traction as a major MPP technology. Vertica's distributed architecture allows fast query processing, and it is a highly fault-tolerant architecture, thus making it one of the most sought-after MPP databases today.

HP Vertica Essentials will help you to learn day-to-day administration activities in a step-by-step format. You will start by learning how to install Vertica, followed by its management and monitoring. You will learn about the different backup and restore techniques, including the concept of projections in Vertica. Finally, you will explore the various techniques to improve performance and bulk loading in Vertica. By the end of this book, you will be able to install, manage, and monitor Vertica efficiently.

Chapter 1. Installing Vertica

Massively Parallel Processing (MPP) databases partition (and optionally replicate) data across multiple nodes. Meta-information about data distribution is stored in master nodes. When a query is issued, it is parsed, a suitable query plan is developed from this meta-information, and the plan is executed on the relevant nodes (the nodes that store the related user data). HP offers one such MPP database, Vertica, to address the challenges of Big Data analytics.
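
The partition-and-route idea described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Vertica's actual placement or planning logic: the node names, the `customer_id` partitioning key, and the hash-based mapping are all hypothetical and chosen only for illustration.

```python
# Toy illustration only: not Vertica's actual placement or planning logic.
import zlib

NODES = ["node01", "node02", "node03"]  # hypothetical node names


def node_for_key(key: str) -> str:
    """Map a partitioning key to a node using a stable hash.

    This function plays the role of the meta-information: it records
    where any given key lives.
    """
    return NODES[zlib.crc32(key.encode("utf-8")) % len(NODES)]


def load(rows):
    """Distribute rows across nodes by customer_id; return per-node storage."""
    storage = {node: [] for node in NODES}
    for row in rows:
        storage[node_for_key(row["customer_id"])].append(row)
    return storage


def query_by_customer(storage, customer_id):
    """Run a point lookup only on the node that holds the key."""
    node = node_for_key(customer_id)
    return node, [r for r in storage[node] if r["customer_id"] == customer_id]


rows = [
    {"customer_id": "c1", "amount": 10},
    {"customer_id": "c2", "amount": 25},
    {"customer_id": "c1", "amount": 5},
]
storage = load(rows)
node, hits = query_by_customer(storage, "c1")
print(node, hits)
```

Because the mapping from key to node is deterministic, the lookup touches a single node instead of scanning the whole cluster, which is the essence of executing a plan only on the relevant nodes.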

Vertica differentiates itself from other MPP databases in many ways. The following are some of the key points:

  • Column-oriented architecture: Unlike traditional databases that store data in a row-oriented format, Vertica stores its data in columnar fashion. Because the values in a column share a type and often repeat, this layout compresses well and frees up a lot of disk space. (More on this is covered in Chapter 5, Performance Improvement.)

  • Design tools: Vertica offers automated design tools that help in arranging your data more effectively and efficiently. The changes recommended by the tool not only ease pressure on the designer, but also help in achieving better query performance. (More on this is covered in Chapter 5, Performance Improvement.)

  • Low hardware costs: Vertica allows you to easily scale out your cluster by adding commodity servers, thus reducing hardware-related costs.
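
To see why a column-oriented layout compresses well, consider the following minimal sketch. It pivots a handful of made-up rows into columns and applies run-length encoding to each column. The sample data is invented, and run-length encoding is used purely as an illustration of the principle; it is not a description of how Vertica stores data internally.

```python
from itertools import groupby

# Hypothetical sample data: (sale_date, country, amount)
rows = [
    ("2014-01-01", "US", 10),
    ("2014-01-01", "US", 12),
    ("2014-01-01", "US", 7),
    ("2014-01-02", "IN", 3),
    ("2014-01-02", "IN", 9),
]


def to_columns(rows):
    """Pivot row-oriented tuples into one list per column."""
    return [list(column) for column in zip(*rows)]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
date_runs = run_length_encode(columns[0])
country_runs = run_length_encode(columns[1])

print(date_runs)     # [('2014-01-01', 3), ('2014-01-02', 2)]
print(country_runs)  # [('US', 3), ('IN', 2)]
```

Five stored values per column shrink to two pairs each. In a row-oriented layout the same values are interleaved with the amounts, so these long runs of repeated values never appear next to each other.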

This chapter will guide you through the installation and creation of a Vertica cluster. It will also cover the installation of Vertica Management Console, which is shipped with the Vertica Enterprise edition only. Note that Vertica can be upgraded to a higher version, but downgrading to a lower version is not possible.

Before installing Vertica, you should bear in mind the following points:

  • Only one database instance can run per Vertica cluster. So, if you have a three-node cluster, all three nodes will be dedicated to a single database.

  • Only one instance of Vertica is allowed to run per node/host.

  • Each node requires at least 1 GB of RAM.

  • Vertica can be deployed on Linux only and has the following requirements:

    • Only the root user or a user with all privileges (sudo) can run the install_vertica script. This script is central to the installation and is used in many places.

    • Only ext3/ext4 filesystems are supported by Vertica.

    • Verify whether rsync is installed.

    • The time should be synchronized across all nodes/servers of a Vertica cluster; hence, check that the NTP daemon is running.
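
The requirements above can be checked on each node before running the installer. The following is a minimal pre-flight sketch in Python, assuming a Linux host with a standard /proc filesystem. The function names, the default /data directory, and the choice to look for a process named ntpd are assumptions made for illustration; they are not part of Vertica or of the install_vertica script.

```python
import os
import shutil

SUPPORTED_FILESYSTEMS = {"ext3", "ext4"}
MIN_RAM_KB = 1024 * 1024  # 1 GB expressed in kB


def total_ram_kb(meminfo_text: str) -> int:
    """Parse the MemTotal value (in kB) from /proc/meminfo content."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def filesystem_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the deepest mount point containing path."""
    target = path.rstrip("/") + "/"
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if target.startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def running_process_names(proc_dir: str = "/proc"):
    """Collect the command names of running processes from /proc/<pid>/comm."""
    names = set()
    for entry in os.listdir(proc_dir):
        if entry.isdigit():
            try:
                with open(os.path.join(proc_dir, entry, "comm")) as handle:
                    names.add(handle.read().strip())
            except OSError:
                continue  # the process exited while we were scanning
    return names


def preflight(data_dir: str = "/data"):
    """Run all checks on the local node; data_dir is a hypothetical data path."""
    with open("/proc/meminfo") as handle:
        ram_kb = total_ram_kb(handle.read())
    with open("/proc/mounts") as handle:
        fs_type = filesystem_for(data_dir, handle.read())
    return {
        "running as root (or via sudo)": os.geteuid() == 0,
        "at least 1 GB of RAM": ram_kb >= MIN_RAM_KB,
        "ext3/ext4 filesystem": fs_type in SUPPORTED_FILESYSTEMS,
        "rsync installed": shutil.which("rsync") is not None,
        # Assumption: the NTP daemon appears as a process named "ntpd".
        "NTP daemon running": "ntpd" in running_process_names(),
    }
```

Calling preflight() on a node returns a dictionary that maps each check to True or False, so any failed prerequisite can be fixed before the installation starts. The parsing helpers take plain text as input, which keeps them easy to test without touching the real system.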