Book Image

Learning Puppet Security

Book Image

Learning Puppet Security

Overview of this book

Table of Contents (17 chapters)
Learning Puppet Security
About the Author
About the Reviewers

What is Puppet?

The Puppet Labs website describes open source Puppet as follows:

Open source Puppet is a configuration management system that allows you to define the state of your IT infrastructure, then automatically enforces the correct state.

What does this mean, though?

Puppet is a configuration management tool. A configuration management tool is a tool that helps the user specify how to put a computer system in a desired state. Other popular tools that are considered as configuration management tools are Chef and CFEngine. There are also a variety of other options that are gaining a user base, such as Bcfg2 and Salt.

Chef is another configuration management tool. It uses pure Ruby Domain-specific Language (DSL) similar to Puppet. We'll cover what a domain-specific language is shortly. This difference allows you to write the desired state of your systems in Ruby. Doing so allows one to use the features of the Ruby language, such as iteration, to solve some problems that can be more difficult to solve in the stricter domain-specific language of Puppet. However, it also requires you to be familiar with Ruby programming. More information on Chef can be found at

CFEngine is the oldest of the three main tools mentioned here. It has grown into a very mature platform as it has expanded. Puppet was created out of some frustrations with CFEngine. One example of this is that the CFEngine community was formally quite closed, that is, they didn't accept user input on design decisions. Additionally, there was a focus in CFEngine on the methods used to configure systems. Puppet aimed to be a more open system that was community-focused. It also aimed to make the resource the primary actor, and relied on the engine to make necessary changes instead of relying on scripts in most cases.


Many of these issues were addressed in CFEngine 3, and it retains a very large user base. More information on CFEngine can be found at

Bcfg2 and Salt are both tools that are gaining a user base. Both written in Python, they provide another option for a user who may be more familiar with Python than other languages. Information on these tools, as well as a list of others that are available, can be found at

Configuration management tools were brought about by a desire to make system administration work repeatable, as well as automate it.

In the early days of system administration, it was very common for an administrator to install the operating system needed as well as install any necessary software packages. When systems were simple and few in number, this was a low effort way of managing them.

As systems grew more complex and greater numbers of them were installed, this became much more difficult. Troubleshooting an application as it began to run on multiple systems also became difficult. The difference in software versions on installed nodes and other configuration differences created inconsistencies in the behavior of multiple systems that were running the same application. Installation manuals, run books, and other forms of documentation were often deployed to try to remedy this, but it was clear that we needed a better way.

As time moved on, system administrators realized that they needed a better way to manage their systems. A variety of methods were born, but many of them were home built. They often used SSH to manage remote hosts. I also built several such systems at various places before coming across Puppet.

Puppet sought to ease the pain and shortcomings of the early days. It was a big change from anything that was present at the time. A large part of this was because of its declarative nature.

Declarative versus imperative approaches

At the core of Puppet is software that allows you to specify the state of the system and let Puppet get the system there. It differs from many of the other products in the configuration management space due to its declarative nature.

In a declarative system, we model the desired state of the resources (things being managed).

Declarative systems have the following properties:

  • Desired state is expressed, not steps used to get there

  • Usually no flow control, such as loops; it may contain conditional statements

  • Actions are normally idempotent

  • Dependency is usually explicitly declared


The concept of actions being idempotent is a very important one in Puppet. It means that actions can be repeated without causing unnecessary side effects. For example, removing a user is idempotent, because removing it when it doesn't exist causes no side effects. Running a script that increments to the next user ID and creates a user may not be idempotent, because the user ID might change.

Imperative systems, on the other hand, use algorithms and steps to express their desired state. Most traditional programming languages, such as C and Java, are considered imperative. Imperative systems have the following properties:

  • They use algorithms to describe the steps to the solution

  • They use flow control to add conditionals and loops

  • Actions may not be idempotent

  • Dependency is normally executed by ordering

In Puppet, which is declarative, the users can describe how they want the system to look in the end, and leave the implementation details of how to get there up to the types and providers within Puppet. Puppet uses types, which represent resources, such as files or packages. Each type can optionally be implemented by one or more provider.

Types provide the core functionality available in Puppet. The type system is extensible, and additional types can be added using pure Ruby code. Later on in this chapter, we'll use the file and package types in our example.

Providers include the code for the type that actually does the low level implementation of a resource. Many types have several providers that implement their functionality in different ways. An example of this is the package type. It has providers for RPM, Yum, dpkg, Windows using MSI, and several others. While it is not a requirement that all types have multiple providers, it is not uncommon to see them, especially for resources that have different implementation details across operating systems.

This system of types and providers isolates the user from having to have specific knowledge of how a given task is done. This allows them to focus on how the system should be configured, and leave specific implementation details, such as how to put it in that state, to Puppet.

A few tools, such as Chef, actually use more of a hybrid approach. They can be used in a declarative state, but also allow the use of loops and other flow control structures that are imperative. Puppet is slowly starting to gain some support for this in their new future parser, however these are experimental and advanced features at this point.

While the declarative approach may have a larger learning curve, especially around dependency management, many sysadmins find it a much better fit with their way of thinking once they learn how it works.

The Puppet client-server model

Puppet uses a client-server model in the most common configurations. In this mode, one or more systems, called Puppet Masters, contain files called manifests. Manifests are code written in the Puppet DSL. A DSL is a language designed to be used for a specific application. In this case, the language is used to describe the desired state of a system. This differs from more general purpose languages, such as C and Ruby, in that it contains specialized constructs for the problem being solved. In this case, the resources in the language are specific to the configuration management domain.

Manifests contain the classes and resources which Puppet uses to describe the state of the system. They also contain declarations of the dependencies between these resources.

Classes are often bundled up into modules which package up classes into reusable chunks that can be managed separately. As your system becomes more complicated, using modules helps you manage each subsystem independently of the others.

The client systems contain the Puppet agent, which is the component that communicates with the master. At specified run intervals (30 minutes by default), the agent will run and the following actions will take place:

  1. Custom plugins, such as facts, types, and providers, are sent to the client, if configured.

  2. The client collects facts and sends them to the master.

  3. The master compiles a catalog and sends it to the client.

  4. The client processes the catalog sent by the master.

  5. The client sends the reporting data to the master, if configured.

The catalog, sent to the client by the master, contains a compiled state of the system resources of the client. The client then applies this information using types and providers to bring the system into the desired state. The following illustration shows how data flows between the components:

It is also possible to run Puppet in a masterless mode. In this mode, the Puppet manifests and other needed components, such as custom facts, types, and providers, are distributed to each system using an out of band method, such as scp or rsync. Puppet is then applied on the local node using cron or some other tool.

cron has the advantage of not requiring the server setup with open ports that the master-based setup has. In some organizations, this makes it easier to get past information security teams. However, many of the reporting and other benefits we will explore in this book are less effective when run in this fashion. The book Puppet 3: Beginners Guide, John Arundel, Packt Publishing, has a good amount of information about such a masterless setup.

Other Puppet components

Puppet has a number of other components that form part of the Puppet ecosystem, which are worth exploring due to their use as security tools. The specific components we are going to explore here include PuppetDB and Hiera.


PuppetDB is an application used to store information on the Puppet infrastructure. Released in 2012, PuppetDB solved performance issues present in the older storeconfigs method that stored information about Puppet runs.

PuppetDB allows you to store facts, catalogs, reports, and resource information (via exported resources). Mining this data, using one of the reporting APIs, is an easy and powerful way to get a view of your infrastructure. More information on PuppetDB will be presented in Chapter 3, Puppet for Compliance, as well as Chapter 4, Security Reporting with Puppet.


Hiera was a new feature introduced in Puppet 3. It is a hierarchal data store, which helps to keep information about your environment. This allows you to separate data about the environment from code that acts on the environment. By doing so, you can apply separate security policies to the code that drives the environment and data about the systems.

Before Hiera, it was not uncommon to see large sections of Puppet code dedicated to maintaining sites or installation of specific information on the systems under management. This area was often difficult to maintain if the ability to override parameters using many different factors was needed.

By adding a hierarchy that can depend on any facts, it becomes much easier to store the data needed for the systems under management. A model of most specific to least specific can then be applied, which makes it much easier to override the default data at a site, environment, or system level.

For example, let's say you had a set of development environments where a certain group of development accounts needed to get created, and SSH access to those accounts was granted. However, these accounts and the access granted should only exist in the development machines, and not in production. Without Hiera, there would likely be site-specific information in the modules to manage the SSH configuration, and perhaps in the user creation module to manage the users. Using Hiera, we can add a fact for the type of system (production or development) and store which users get created there, or have access. This moves the list of users with access to the system out of the code itself, and into a data file.

As our examples get more complicated later in this book, we will explore using Hiera to store some system data.


Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at If you purchased this book elsewhere, you can visit and register to have the files e-mailed directly to you.