Extending Puppet

By: Alessandro Franceschi

Puppet in action

Client-server communication is done using REST-like API calls on an SSL socket; basically, it's all HTTPS traffic from clients to the server's port 8140/TCP.

The first time we execute Puppet on a node, its x509 certificates are created and placed in ssldir, and then the Puppet Master is contacted in order to retrieve the node's catalog.

On the Puppet Master, unless we have autosign enabled, we must manually sign the client's certificates using the cert subcommand:

puppet cert list # List the unsigned clients certificates
puppet cert list --all # List all certificates
puppet cert sign <certname> # Sign the given certificate

Once the node's certificate has been recognized as valid and been signed, a trust relationship is created, and a secure client-server communication can be established.

If we happen to recreate a new machine with an existing certname, we have to remove the certificate of the old client from the server with the following command:

puppet cert clean <certname> # Remove a signed certificate

At times, we may also need to remove the certificates on the client; we can do this with the following command:

mv /var/lib/puppet/ssl /var/lib/puppet/ssl.bak

This is safe enough as the whole directory is recreated with new certificates when Puppet is run again (never do this on the Master as it'll remove all the clients' certificates previously signed, along with the Master's certificate, whose public key has been copied to all clients).

A typical Puppet run is composed of different phases. It's important to know them in order to troubleshoot problems:

  1. Execute Puppet on the client. On a root shell, run puppet agent -t.

  2. If pluginsync = true (default from Puppet 3.0), the client retrieves any extra plugin (facts, types, and providers) present in the modules on the Master's $modulepath. Client output:

    Info: Retrieving plugin
  3. The client runs facter and sends its facts to the server. Client output:

    Info: Loading facts in /var/lib/puppet/lib/facter/... [...]
  4. The Master looks for the client's certname in its nodes' list.

  5. The Master compiles the catalog for the client, also using its facts. Master's logs:

    Compiled catalog for <client> in environment production in 8.22 seconds
  6. If there are syntax errors in the processed Puppet code, they are exposed here, and the process terminates; otherwise, the server sends the catalog to the client in the PSON format.

    Client output:

    Info: Caching catalog for <client>
  7. The client receives the catalog and starts to apply it locally. If there are dependency loops, the catalog can't be applied and the whole run fails.

    Client output:

    Info: Applying configuration version '1355353107'
  8. All changes to the system are shown on stdout or in logs. If there are errors (in red or pink, according to Puppet versions), they are relevant to specific resources but do not block the application of other resources (unless they depend on the failed ones).

  9. At the end of the Puppet run, the client sends to the server a report of what has been changed.

    Client output:

    Finished catalog run in 13.78 seconds
  10. The server sends the report to a report collector if enabled.


When dealing with Puppet's DSL, most of the time we use resources, as they are single units of configuration that express the properties of objects on the system. A resource declaration is always composed of the following parts:

  • type: a package, service, file, user, mount, exec, and so on

  • title: how it is called and referred to in other parts of the code

  • One or more attributes

    type { 'title':
      argument  => value,
      other_arg => value,
    }

Inside a catalog, for a given type, there can be only one title; otherwise, we get an error as follows:

Error: Duplicate declaration: <Type>[<name>] is already declared in file <manifest_file> at line <line_number>; cannot redeclare on node <node_name>.

Resources can be native (written in Ruby), or defined by users in Puppet DSL.

These are examples of common native resources; what they do should be quite obvious:

  file { 'motd':
    path    => '/etc/motd',
    content => "Tomorrow is another day\n",
  }

  package { 'openssh':
    ensure => present,
  }

  service { 'httpd':
    ensure => running, # Service must be running
    enable => true,    # Service must be enabled at boot time
  }

We can write code of this kind in manifests, which are files with a .pp extension that contain valid Puppet code. It's possible to test the effect of this code on the local system with the puppet apply command, which expects the path of a manifest file as the argument:

puppet apply /etc/puppet/manifests/site.pp

We can also directly execute Puppet code with the --execute (-e) option:

puppet apply -e "package { 'openssh': ensure => present }"

In this case, instead of a manifest file, the argument is a fragment of valid Puppet DSL.

For inline documentation about a resource, use the describe subcommand, for example:

puppet describe file


For a complete reference of the native resource types and their arguments, check the type reference in the official Puppet documentation.

The resource abstraction layer

From the previous resource examples, we can deduce that the Puppet DSL allows us to concentrate on the types of objects (resources) to manage, and it doesn't bother us on how these resources may be applied on different operating systems.

This is one of Puppet's strong points: resources are abstracted from the underlying OS. We don't have to care about or specify how, for example, to install a package on Red Hat Linux, Debian, Solaris, or Mac OS; we just have to provide a valid package name.

This is possible thanks to Puppet's Resource Abstraction Layer (RAL), which is engineered around the concept of types and providers.

Types, as we have seen, map to an object on the system.

There are more than 50 native types in Puppet (some of them are applicable only to a specific OS); the most commonly used ones are augeas, cron, exec, file, group, host, mount, package, service, and user.

To have a look at their Ruby code and learn how to create custom types, check this directory:

ls -l $(facter rubysitedir)/puppet/type

For each type, there is at least one provider, which is the component that enables that type on a specific OS. For example, the package type is known for having a large number of providers that manage the packages' installations on many OSes, which are aix, appdmg, apple, aptitude, apt, aptrpm, blastwave, dpkg, fink, freebsd, gem, hpux, macports, msi, nim, openbsd, pacman, pip, pkgdmg, pkg, pkgutil, portage, ports, rpm, rug, sunfreeware, sun, up2date, urpmi, yum, and zypper.

We can find them with the following command:

ls -l $(facter rubysitedir)/puppet/provider/package/

The Puppet executable offers a powerful subcommand to interrogate and operate with the RAL: puppet resource.

For a list of all the users present on the system, type the following:

puppet resource user

For a specific user, type the following:

puppet resource user root

Other examples that might show glimpses of the power of RAL to map a system's resources are as follows:

puppet resource package
puppet resource mount
puppet resource host
puppet resource file /etc/hosts
puppet resource service

The output is in the Puppet DSL format; we can use it in our manifests to reproduce that resource wherever we want.

The puppet resource subcommand can also be used to modify the properties of a resource directly from the command line, and since it uses the Puppet RAL, we don't have to know how to do that on a specific OS, for example, to enable the httpd service:

puppet resource service httpd ensure=running enable=true


We can place the above resources in our first manifest file (/etc/puppet/manifests/site.pp) or in the one included from there, and they would be applied to all our Puppet-managed nodes. This is okay for quick samples out of books, but in real life, things are much different. We have hundreds of different resources to manage and apply, with different logic and properties to (dozens? hundreds? thousands?) different systems.

To help you organize your Puppet code, there are two different language elements; with node, we can confine resources to a given host and apply them only to it; with class, we can group different resources (or other classes) that generally have a common function or task.

Whatever is declared in a node definition is included only in the catalog compiled for that node. The general syntax is as follows:

node $name [inherits $parent_node] {
  [ Puppet code, resources and classes applied to the node ]
}

Here $name is a placeholder for the certname of the client (by default, its FQDN) or a regular expression. It's possible to inherit in a node whatever is defined in the parent node, and inside the curly braces we can place any kind of Puppet code, such as resource declarations, class inclusions, and variable definitions. Here are some examples:

node '' {
  package { 'mysql-server':
    ensure => present,
  }
  service { 'mysql':
    ensure => 'running',
  }
}

However, generally in nodes we just include classes, so a better real-life example would be the following one:

node '' {
  include common
  include mysql
}

The previous include statements do what we might expect; they include all the resources declared in the referred class.
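
As mentioned earlier, the node name can also be a regular expression. The following is a minimal sketch under stated assumptions: the hostnames (web01.example.com, web02.example.com, and so on) are hypothetical and only illustrate the pattern, while common is the class used in the example above:

```puppet
# Applies to every node whose certname matches the pattern,
# for example web01.example.com or web12.example.com (hypothetical names)
node /^web\d+\.example\.com$/ {
  include common
}
```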

Note that there are alternatives to the usage of the node statement; we can use an External Node Classifier (ENC) to define which variables and classes are assigned to nodes, or we can have a nodeless setup, where resources applied to nodes are defined in a case statement based on the hostname or a similar fact that identifies a node.
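
A nodeless setup of this kind might look like the following sketch. The hostname pattern is an assumption made for illustration, and common and mysql are the classes from the previous examples; a real setup would use whatever fact best identifies the role of its nodes:

```puppet
# site.pp without node statements: classes are selected from a fact.
# The hostname pattern below is hypothetical.
case $::hostname {
  /^db\d+$/: {
    include common
    include mysql
  }
  default: {
    include common
  }
}
```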

Classes and defines

A class can be defined (the resources provided by the class are defined for later usage, but are not yet included in the catalog) with this syntax:

class mysql {
  $mysql_service_name = $::osfamily ? {
    'RedHat' => 'mysqld',
    default  => 'mysql',
  }
  package { 'mysql-server':
    ensure => present,
  }
  service { 'mysql':
    name   => $mysql_service_name,
    ensure => 'running',
  }
}

Once defined, a class can be declared (the resources provided by the class are actually included in the catalog) in two ways:

  • Just by including it (we can include the same class many times, but it is evaluated only once):

    include mysql
  • Using the parameterized style (available since Puppet 2.6), where we can optionally pass parameters to the class if available (we can declare a class with this syntax only once for each node in our catalog):

    class { 'mysql': }

A parameterized class has a syntax similar to the following code:

class mysql (
  $root_password,
  $config_file_template = undef,
) {
  # Here all the resources
}

In this code, the expected parameters are defined between parentheses, which may or may not have a default value (parameters without default values, such as the $root_password in this sample, must be set explicitly while declaring the class). The declaration of a parameterized class has exactly the same syntax as that of a normal resource:

class { 'mysql':
  root_password => 's3cr3t',
}

Puppet 3.0 introduced a feature called data binding; if we don't pass a value for a given parameter, as in the above example, before using the default value if present, Puppet performs an automatic lookup to a Hiera variable with the name $class::$parameter. In this example, it would be mysql::root_password.

This is an important feature that radically changes the approach on how to manage data in Puppet architectures. We will come back to this topic in the following chapters.

Besides classes, Puppet also has defines, which can be considered as classes that can be used multiple times on the same host (with a different title). Defines are also called defined types, since they are types that can be defined using Puppet DSL, contrary to the native types that are written in Ruby.

They have a similar syntax:

define mysql::user (
  $password,                # Mandatory parameter, no defaults set
  $host      = 'localhost', # Parameter with a default value
) {
  # Here all the resources
}

They are also used in a similar way:

mysql::user { 'al':
  password => 'secret',
}

Note that defines (also called user-defined types, defined resource types, or definitions), like the one above, even if written in Puppet DSL, have exactly the same usage pattern as native types, which are written in Ruby (packages, services, files, and so on).

In defines, besides the parameters that are explicitly exposed, there are two variables that are automatically set: $title, which is the title used in the declaration, and $name, which defaults to the value of $title but can be set to an alternate value.
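
To see these two variables at work, here is a small sketch based on the mysql::user define shown above. The notify resource is only a stand-in used for illustration and is not how a real define would create a database user:

```puppet
define mysql::user (
  $password,
  $host = 'localhost',
) {
  # $title is the title used in the declaration; $name defaults to it.
  # Using $title in the resource title keeps each instance unique.
  notify { "mysql_user_${title}":
    message => "title=${title} name=${name} host=${host}",
  }
}

mysql::user { 'al':
  password => 'secret',
}
```

With this declaration, both $title and $name resolve to al, and $host falls back to its default value.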

Since a define can be declared more than once inside a catalog (with different titles), it's important to avoid declaring, inside a define, resources with a static title. For example, this is wrong:

define mysql::user ( ...) {
  exec { 'create_mysql_user':
    [ … ]
  }
}

This is because when there are two different mysql::user declarations, it will generate an error like the following:

Duplicate definition: Exec[create_mysql_user] is already defined in file /etc/puppet/modules/mysql/manifests/user.pp at line 2; cannot redefine at /etc/puppet/modules/mysql/manifests/user.pp:2 on node 

A correct version could use the $title variable, which is inherently different each time:

define mysql::user ( ...) {
  exec { "create_mysql_user_${title}":
    [ … ]
  }
}

Class inheritance

We have seen that in Puppet, classes are just containers of resources and have nothing to do with Object-oriented Programming classes, so the use of class inheritance is somewhat limited to a few specific cases.

When using class inheritance, the main class (puppet in the following sample) is always evaluated first, and all the variables and resource defaults that it sets are available in the scope of the child class (puppet::server).

Moreover, the child class can override the arguments of a resource defined in the parent class:

class puppet {
  file { '/etc/puppet/puppet.conf':
    content => template('puppet/client/puppet.conf'),
  }
}
class puppet::server inherits puppet {
  File['/etc/puppet/puppet.conf'] {
    content => template('puppet/server/puppet.conf'),
  }
}

Note the syntax used when declaring a resource; we use a syntax like file { '/etc/puppet/puppet.conf': [...] }. When referring to it, the syntax is File['/etc/puppet/puppet.conf'].

Resource defaults

It is possible to set the default argument values for a resource type in order to reduce code duplication. The general syntax to define a resource default is as follows:

Type {
  argument => default_value,
}

Common examples are as follows:

Exec {
  path => '/sbin:/bin:/usr/sbin:/usr/bin',
}
File {
  mode  => '0644',
  owner => 'root',
  group => 'root',
}

Resource defaults can be overridden when declaring a specific resource of the same type.
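
As an illustration of such an override, consider the following sketch. The second file path and its content are hypothetical and serve only to show a resource that needs a different mode from the default:

```puppet
File {
  owner => 'root',
  group => 'root',
  mode  => '0644',
}

# Takes owner, group, and mode from the default above
file { '/etc/motd':
  content => "Tomorrow is another day\n",
}

# Overrides only mode; owner and group still come from the default
file { '/usr/local/bin/myapp_check':
  mode    => '0755',
  content => "#!/bin/sh\nexit 0\n",
}
```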

It is worth noting that the area of effect of the resource defaults might bring unexpected results. The general suggestion is as follows:

  • Place global resource defaults in /etc/puppet/manifests/site.pp outside any node definition.

  • Place local resource defaults at the beginning of the class that uses them (mostly for clarity's sake, as they are independent of the parse-order).

We cannot expect a resource default that is defined in a class to be working in another class, unless it is a child class with an inheritance relationship.

Resource references

In Puppet, any resource is uniquely identified by its type and its name. We cannot have two resources of the same type with the same name in a node's catalog.

We have seen that we declare resources with a syntax like the following one:

type { 'name':
  arguments => values,
}

When we need to reference them (typically when we define dependencies between resources) in our code, the following is the syntax (note the square brackets and the capital letter):

Type['name']

Some examples are as follows:

file { 'motd': ... }
apache::virtualhost { '': .... }
exec { 'download_myapp': .... }

These examples are referenced, respectively, with the following code:

File['motd']
Apache::Virtualhost['']
Exec['download_myapp']

Variables, facts, and scopes

When writing our manifests, we can set and use variables; they help us in organizing which resources we want to apply, how they are parameterized, and how they change according to our logic, infrastructure, and our needs.

They may have different sources:

  • Facter (variables, called facts, automatically generated on the Puppet client)

  • User-defined variables in Puppet code (variables that are defined using Puppet DSL)

  • User-defined variables from an ENC

  • User-defined variables on Hiera

  • Puppet's built-in variables

System's facts

When we install Puppet on a system, the facter package is installed as a dependency. Facter is executed on the client each time Puppet is run, and it collects a large set of key-value pairs that reflect many properties of the system. They are called facts and provide valuable information such as the system's operatingsystem, operatingsystemrelease, osfamily, ipaddress, hostname, fqdn, and macaddress to name just some of the most used ones.

All the facts gathered on the client are available as variables to the Puppet Master and can be used inside manifests to provide a catalog that fits the client.

We can see all the facts of a node by running the following command locally:

facter -p

(The -p argument is the short version of --puppet and also shows any custom facts that are added to the native ones via our modules.)

User variables in Puppet DSL

Variable definition inside the Puppet DSL follows the general syntax: $variable = value

Let's see some examples. Here, the value is set as string, boolean, array, or hash as shown in the following code:

$redis_package_name = 'redis'
$install_java = true
$dns_servers = [ '' , '' ]
$config_hash = { user => 'joe', group => 'admin' }

Here, the value is the result of a function call (which may take strings, other data types, or other variables as arguments):

$config_file_content = template('motd/motd.erb')

$dns_servers = hiera(name_servers)
$dns_servers_count = inline_template('<%= @dns_servers.length %>')

Here, the value is determined according to the value of another variable (here, the $::osfamily fact is used), using the selector construct:

$mysql_service_name = $::osfamily ? {
  'RedHat' => 'mysqld',
  default  => 'mysql',
}

A special value for a variable is undef (similar to Ruby's nil), which basically removes any value from the variable. This can be useful in resources when we want to disable (and make Puppet ignore) an existing attribute:

$config_file_source = undef
file { '/etc/motd':
  source  => $config_file_source,
  content => $config_file_content,
}

Note that we can't change the value assigned to a variable inside the same class (more precisely, inside the same scope; we will review them later).

$counter = '1'
$counter = $counter + 1

The preceding code will produce the following error:

Cannot reassign variable counter

User variables in an ENC

When an ENC is used for classifying nodes, it returns the classes to include in the requested node, along with variables. All the variables provided by an ENC are at the top scope (we can reference them with $::variablename all over our manifests).

User variables in Hiera

Hiera is another very popular and useful place to store user data (yes, variables); we will review it extensively in Chapter 2, Hiera; here, let's just point out a few basic usage patterns. We can use it to manage any kind of variable whose value can change according to custom logic in a hierarchical way. Inside manifests, we can look up a Hiera variable using the hiera() function. Some examples are as follows:

$dns = hiera(dnsservers)
class { 'resolver':
  dns_server => $dns,
}

The previous code can also be written as:

class { 'resolver':
  dns_server => hiera(dnsservers),
}

In our Hiera YAML files, we would have something like the following:


If our Puppet Master uses Puppet Version 3 or greater, then we can benefit from the Hiera automatic lookup for class parameters, which is the ability to define in Hiera values for any parameter exposed by the class. The above example would become something like the following:

include resolver

and then, in Hiera YAML files:


Puppet's built-in variables

Several other variables are available and can be used in manifests or templates:

Variables set by the client (agent):

  • $clientcert: This is the name of the node (the certname setting in its puppet.conf, by default, is the host's FQDN)

  • $clientversion: This is the Puppet version of the agent

Variables set by the server (Master):

  • $environment: This is a very important special variable, which defines the Puppet's environment of a node (for different environments, the Puppet Master can serve manifests and modules from different paths)

  • $servername, $serverip: Respectively the Master's FQDN and IP address.

  • $serverversion: The Puppet version on the Master (it is always better to have Masters with a Puppet version equal to or newer than that of the clients)

  • $settings::<setting_name>: Any configuration setting of the Puppet Master's puppet.conf

Variables set by the parser during catalog compilation:

  • $module_name: This is the name of the module that contains the current resource's definition

  • $caller_module_name: This is the name of the module that contains the current resource's declaration
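
A quick way to inspect some of these variables is to print them with a notify resource, as in the following sketch. The resource title is arbitrary, confdir is just one example of a setting exposed under $settings, and the server-side values are meaningful only when the catalog is compiled by a Master:

```puppet
notify { 'builtin_variables_demo':
  message => "certname=${::clientcert} environment=${::environment} agent=${::clientversion} master=${::serverversion} confdir=${::settings::confdir}",
}
```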

A variable's scope

One of the parts where Puppet development can be misleading and not so intuitive is how variables are evaluated according to the place in the code where they are used.

Variables have to be declared before they can be used, and this is dependent on the parse-order; so, also for this reason, Puppet language can't be considered completely declarative.

In Puppet, there are different scopes, which are partially isolated areas of code where variables and resource default values can be confined and accessed.

There are four types of scopes; from general to local, they are:

  • Top scope: Any code defined outside nodes and classes, such as what is generally placed in /etc/puppet/manifests/site.pp

  • Node scope: Code defined inside the node's definitions

  • Class scope: Code defined inside a class or define

  • Sub class scope: Code defined in a class that inherits another class

We always write code within a scope, and we can directly access variables (that is, by just specifying their name without using the fully qualified name) defined only in the same scope or in a parent or containing one. So:

  • Top scope variables can be accessed from anywhere

  • Node scope variables can be accessed in classes (used by the node) but not at the Top scope

  • Class (also called local) variables are directly available, with their plain name, only from within the same class or define where they are set, or in a child class

Variable values or resource default arguments that are defined at a more general level can be overridden at a local level (Puppet always uses the most local value).

It's possible to refer to variables outside a scope by specifying their fully qualified name, which contains the name of the class where the variable is defined; for example, $::apache::config_dir is a variable called config_dir that is defined in the apache class.
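
The following sketch puts top scope, class scope, and fully qualified names together. The apache::status class, the variable values, and the notify resource are all hypothetical and exist only to show the lookups:

```puppet
$admin_email = 'root@localhost'  # top scope (hypothetical value)

class apache {
  $config_dir = '/etc/apache2'   # class scope (hypothetical value)
}

class apache::status {
  # Evaluate apache first, so that its variables are set when we read them
  include apache

  notify { 'apache_scope_demo':
    message => "config_dir=${::apache::config_dir} admin=${::admin_email}",
  }
}

include apache::status
```

Inside apache::status, a plain $config_dir would not resolve, because that class does not inherit from apache; the fully qualified name is needed.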

One important change introduced in Puppet 3.x is the forcing of static scoping for variables; this means that the parent scope for a class can only be its parent class.

Earlier Puppet versions had dynamic scoping, where parent scopes were assigned both by inheritance (like in static scoping) and by simple declaration; that is, any class had as a parent the first scope where it was declared. This means that since we can include classes multiple times, the order used by Puppet to parse our manifests could change the parent scope, and therefore, how a variable was evaluated.

This can obviously lead to all kinds of unexpected problems if we are not particularly careful about how classes are declared, with variables evaluated in different parse-order dependent ways. The solution is Puppet 3's static scoping and the need to refer to out-of-scope variables with their fully qualified name.

Meta parameters

Meta parameters are general-purpose parameters available to any resource type even if not explicitly defined. They can be used for different purposes:

  • Manage the ordering of dependencies and resources (more on them in the next section): before, require, subscribe, notify, stage

  • Manage resources' application policies: audit (audit the changes done on the attributes of a resource), noop (do not apply any real change for a resource), schedule (apply the resources only within a given time schedule), and loglevel (manage the log verbosity)

  • Add information to a resource using alias (adds an alias that can be used to reference a resource) and tag (adds a tag that can be used to refer to a group of resources according to custom needs; we will see a use case later in this chapter in the exported resources section)
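
As a short illustration, meta parameters are set like any other attribute. In this sketch, the tag name is hypothetical and the values are chosen only to show the syntax:

```puppet
file { '/etc/motd':
  content  => "Tomorrow is another day\n",
  noop     => true,      # Report what would change, but do not change it
  loglevel => 'warning', # Log events for this resource at the warning level
  tag      => 'banner',  # Hypothetical tag, usable to select or collect resources
}
```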

Managing order and dependencies

Puppet language is declarative and not procedural (*); it defines states. The order in which resources are written in manifests does not affect the order in which they are applied to the desired state.


(*) This is not entirely true; contrary to resources, variable definitions are parse-order dependent, so the order is important when defining variables. As a general rule, just set variables before using them, which sounds logical but is actually procedural.

There are cases where we need to set some kind of ordering among resources; for example, we want to manage a configuration file only after the relevant package has been installed, or have a service automatically restart when its configuration file changes.

Also, we may want to install packages only after we've configured our packaging systems (apt sources, yum repos, and so on), or install our application only after the whole system and the middleware has been configured.

To manage these cases, there are three different methods, which can coexist, as follows:

  1. Use the meta parameters before, require, notify, and subscribe

  2. Use the chaining arrows operators (corresponding, respectively, to the meta parameters above: ->, <-, ~>, and <~)

  3. Use run stages

In a typical package/service/configuration file example, we want the package to be installed first, then the configuration file to be managed and the service started, with the service restarted whenever the configuration file changes.

This can be expressed with meta parameters:

package { 'exim':
  before => File['exim.conf'],
}
file { 'exim.conf':
  notify => Service['exim'],
}
service { 'exim': }

This is equivalent to the following chaining arrows syntax:

package {'exim': } ->
file {'exim.conf': } ~>
service{'exim': }

However, the same ordering can be expressed using the alternate reverse meta parameters:

package { 'exim': }
file { 'exim.conf':
  require => Package['exim'],
}
service { 'exim':
  subscribe => File['exim.conf'],
}

They can also be expressed as follows:

service{'exim': } <~
file{'exim.conf': } <-
package{'exim': }

Run stages

Puppet 2.6 introduced the concept of run stages to help users manage the order of dependencies when applying groups of resources.

Puppet provides a default main stage; we can add any number of stages and manage their ordering with the stage resource type using the normal syntax for resources declaration as we have seen previously:

stage { 'pre':
  before => Stage['main'],
}

This is equivalent to:

stage { 'pre': }
Stage['pre'] -> Stage['main']

We can assign any class to a defined stage with the stage meta parameter:

class { 'yum':
  stage => 'pre',
}

In this way, all the resources provided by the yum class, which is assigned to the pre stage, are applied before all the other resources (in the default main stage).

The idea of stages at the beginning seemed a good solution to better handle large sets of dependencies in Puppet. In reality, some drawbacks and the increased risk of having dependency cycles make them less useful than expected.

As a rule of thumb, it is recommended to use them for simple classes (that don't include other classes) and where really necessary (for example, to set up package management configurations at the beginning of a Puppet run, or deploy our application after all the other resources have been managed).

Reserved names and allowed characters

As with every language, Puppet DSL has some restrictions on the names we can give to its elements and the allowed characters.

As a general rule, for names of resources, variables, parameters, classes, and modules, we can use only lowercase letters, numbers, and the underscore (_). Usage of hyphens (-) should be avoided (in some cases, it is forbidden; in others, it depends on Puppet's version).

We can use uppercase letters in variable names (but not at their beginning), and use any character for resources' titles.

Names are case-sensitive, and there are some reserved words that cannot be used as names for resources, classes or defines, or as unquoted word strings in the code:

and, case, class, default, define, else, elsif, false, if, in, import, inherits, node, or, true, undef, unless, main, settings, $string.


Puppet provides different constructs to manage conditionals inside manifests.

Selectors, as we have seen, let us set the value of a variable or an argument inside a resource declaration according to the value of another variable. Selectors, therefore, just return values and are not used to conditionally manage entire blocks of code.

Here's an example of a selector:

$package_name = $::osfamily ? {
  'RedHat' => 'httpd',
  'Debian' => 'apache2',
  default  => undef,
}

A case statement is used to execute different blocks of code according to the values of a variable. It's recommended to have a default block for unmatched entries. Case statements can't be used inside resource declarations. We can achieve the same result as the previous selector with this case sample:

case $::osfamily {
  'Debian': { $package_name = 'apache2' }
  'RedHat': { $package_name = 'httpd' }
  default: { fail ("OS ${::operatingsystem} not supported") }
}

The if, elsif, and else conditionals, like case, are used to execute different blocks of code, and can't be used inside resources' declarations. We can use any of Puppet's comparison expressions, and we can combine more than one for complex pattern matching.

The previous sample variables assignment can also be expressed in this way:

if $::osfamily == 'Debian' {
  $package_name = 'apache2'
} elsif $::osfamily == 'RedHat' {
  $package_name = 'httpd'
} else {
  fail ("OS ${::operatingsystem} not supported")
}

An unless statement is the opposite of if. It evaluates a Boolean condition, and if it's false, it executes a block of code.
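
For instance, the following sketch runs its block on every node that is not in the Debian family. The notify resource and its title are hypothetical and stand in for whatever code should be skipped on Debian systems:

```puppet
unless $::osfamily == 'Debian' {
  notify { 'not_debian':
    message => 'This node is not in the Debian family',
  }
}
```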

Comparison operators

Puppet supports comparison operators that resolve to true or false. They are as follows:

  • Equal ==, returns true if the operands are equal. Used with numbers, strings, arrays, hashes, and Booleans, as shown in the following example:

    if $::osfamily == 'Debian' { [ ... ] }
  • Not equal != , returns true if the operands are different:

    if $::kernel != 'Linux' { [ ... ] }
  • Less than <, greater than >, less than or equal to <= and greater than or equal to >= can be used to compare numbers:

    if $::uptime_days > 365 { [ ... ] }
    if $::operatingsystemrelease <= 6 { [ ... ] }
  • Regex match =~ compares a string (the left operand) with a regular expression (the right operand). Resolves true if it matches. Regular expressions are enclosed between forward slashes and follow the normal Ruby syntax:

    if $mode =~ /(server|client)/ { [ ... ] }
    if $::ipaddress =~ /^10\./ { [ ... ] }
  • Regex not match !~ , opposite to =~, resolves false if the operands match.
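
The list above gives no sample for !~, so here is a short sketch. The $mode value and the message are hypothetical and only serve to illustrate the operator:

```puppet
# Hypothetical value: anything other than 'server' or 'client'
# makes the !~ comparison resolve to true.
$mode = 'proxy'

if $mode !~ /^(server|client)$/ {
  warning("Unexpected mode: ${mode}")
}
```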

The In operator

The in operator checks if a string is present in another string, an array, or in the keys of a hash; it is case-sensitive:

if '64' in $::architecture { [ ... ] }
if $monitor_tool in [ 'nagios' , 'icinga' , 'sensu' ] { [ ... ] }
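
The hash case is not covered by the samples above. A small sketch, assuming a hypothetical $ports hash, shows that in looks at the keys and not at the values:

```puppet
# Hypothetical hash used only for illustration.
$ports = { 'http' => 80, 'https' => 443 }

# True: 'https' is one of the keys of $ports.
if 'https' in $ports {
  notice('An https port is defined')
}
```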

Expressions combinations

It's possible to combine multiple comparisons with and and or as shown in the following code:

if ($::osfamily == 'RedHat') and ($::operatingsystemrelease == '5') { [ ... ] }
if ($::operatingsystem == 'Ubuntu') or ($::operatingsystem == 'Mint') { [ ... ] }

Exported resources

When we need to provide a host with information about the resources present in another host, things in Puppet become trickier. The only official solution has been, for a long time, to use exported resources; resources are declared in the catalog of a node (based on its facts and variables) but applied (collected) on another node. Some alternative approaches are now possible with PuppetDB; we will review them in Chapter 3, PuppetDB.

Resources are declared with the special @@ notation, which marks them as exported so that they are not applied to the node where they are declared:

@@host { $::fqdn:
  ip  => $::ipaddress,
}
@@concat::fragment { "balance-fe-${::hostname}":
  target  => '/etc/haproxy/haproxy.cfg',
  content => "server ${::hostname} ${::ipaddress} maxconn 5000",
  tag     => "balance-fe",
}

Once a catalog that contains exported resources has been applied on a node and stored by the Puppet Master, the exported resources can be collected with the <<| |>> operator, where it is possible to specify search queries:

Host <<| |>>
Concat::Fragment <<| tag == "balance-fe" |>>
Sshkey <<| |>>
Nagios_service <<| |>>
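
To make the export and collect sides explicit, here is a sketch of how a monitoring check might be exported by every node and collected on a single server. The 'monitoring-prod' tag and the check parameters are assumptions chosen for illustration, not values taken from a real setup:

```puppet
# On every monitored node: export a check, marked with a hypothetical tag.
@@nagios_service { "check_ssh_${::hostname}":
  host_name           => $::fqdn,
  check_command       => 'check_ssh',
  service_description => 'SSH',
  use                 => 'generic-service',
  tag                 => 'monitoring-prod',
}

# On the monitoring server: collect only the checks carrying that tag.
Nagios_service <<| tag == 'monitoring-prod' |>>
```

The search query limits the collection to the resources carrying the given tag, so a test environment can export with a different tag without polluting the production monitoring server.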

In order to use exported resources, we need to enable the storeconfigs option on the Puppet Master and specify the backend to use. For a long time, the only available backend was Rails' Active Record, which typically used MySQL for data persistence. This solution was the best for its time but suffered severe scaling limitations. Luckily, things have changed a lot with the introduction of PuppetDB, which is a fast and reliable storage solution for all the data generated by Puppet, including exported resources.

In order to configure a Puppet Master to enable storeconfigs with PuppetDB, we have to add these lines in the [master] section of puppet.conf (more on this in a later chapter):

storeconfigs = true
storeconfigs_backend = puppetdb

If we want to use the old ActiveRecord backend with SQLite (useful for testing exported resources without installing any other component, but definitely not suitable for production environments), we need to have the sqlite packages and Ruby bindings installed, and the following configuration:

storeconfigs = true
dbadapter = sqlite3

To use ActiveRecord with a MySQL backend, we need these configurations:

storeconfigs = true
dbadapter = mysql
dbuser = puppet
dbpassword = secretpassword
dbserver = localhost
dbsocket = /var/run/mysqld/mysqld.sock # If server is local

Obviously, we will need to grant the relevant credentials on MySQL:

# mysql -u root -p
mysql> create database puppet;
mysql> grant all privileges on puppet.* to puppet@localhost identified by 'secretpassword';

Virtual resources

Virtual resources define a desired state for a resource without adding it to the catalog. Like normal resources, they are applied only on the node where they are declared, but like exported resources, we can apply only a subset of the ones we have declared. They also have a similar usage syntax: we declare them with a single @ prefix (instead of the @@ prefix used for exported resources), and we collect them with <| |> (instead of <<| |>>).

A useful and rather typical example involves user management.

We can declare all our users in a single class, included by all our nodes:

class my_users {
  @user { 'al': […] tag => 'admins' }
  @user { 'matt': […] tag => 'developers' }
  @user { 'joe': […] tag => 'admins' }
  [ … ]
}

These users are actually not created on the system; we can decide which ones we want on a specific node with a syntax like the following:

User <| tag == 'admins' |>

This is equivalent to:

realize(User['al'] , User['joe'])

Note that the realize function needs to address resources by their name.
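
Collector queries can also be combined. The following self-contained sketch uses a hypothetical class name and only the ensure attribute, and realizes all the admins plus one developer selected by title:

```puppet
class my_virtual_users {
  # Declared virtually: nothing is created until the resources are realized.
  @user { 'al':
    ensure => present,
    tag    => 'admins',
  }
  @user { 'matt':
    ensure => present,
    tag    => 'developers',
  }
  @user { 'joe':
    ensure => present,
    tag    => 'admins',
  }
}

include my_virtual_users

# Realizes 'al' and 'joe' (by tag) and 'matt' (by title).
User <| tag == 'admins' or title == 'matt' |>
```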


Modules

Modules are self-contained, distributable, and (ideally) reusable recipes to manage specific applications or system elements.

They are basically just a directory with a predefined, standard structure that favors naming conventions over configuration for the classes, extensions, and files it provides.

The $modulepath configuration entry defines where modules are searched for; this can be a list of colon-separated directories.

The paths of a module and autoloading

Modules have a standard structure, for example, for a MySQL module:

mysql/            # Main module directory

mysql/manifests/  # Manifests directory. Puppet code here.
mysql/lib/        # Plugins directory. Ruby code here
mysql/templates/  # ERB Templates directory
mysql/files/      # Static files directory
mysql/spec/       # Puppet-rspec test directory
mysql/tests/      # Tests / Usage examples directory

mysql/Modulefile  # Module's metadata descriptor

This layout enables useful conventions that are widely used in the Puppet world; we must know these to understand where to look for files and classes.

For example, when we use modules and write the code:

include mysql

Puppet automatically looks for a class called mysql defined in the file $modulepath/mysql/manifests/init.pp.

The init.pp manifest is a special case that applies to classes that have the same name as the module. For subclasses, there's a similar convention that takes the subclass name into consideration:

include mysql::server

It autoloads the file $modulepath/mysql/manifests/server.pp.

A similar scheme is followed also for defines or classes at lower levels:

mysql::conf { ...}

This define is searched in $modulepath/mysql/manifests/conf.pp.

include mysql::server::ha

This class is searched in $modulepath/mysql/manifests/server/ha.pp.

It's generally recommended to follow these naming conventions that allow the autoloading of classes and defines without the need to explicitly import the manifests that contain them.
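
Seen from the other side, the autoloading conventions dictate which class goes in which file. A sketch of two manifests of the same module follows; the file contents and the mysql-server package name are illustrative assumptions, and each class lives in its own file, as indicated in the comments:

```puppet
# File: $modulepath/mysql/manifests/init.pp
# The class named after the module always lives in init.pp.
class mysql {
  package { 'mysql-server':
    ensure => installed,
  }
}

# File: $modulepath/mysql/manifests/server/ha.pp
# Each additional namespace level maps to a subdirectory.
class mysql::server::ha {
  include mysql
}
```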


Even if this is not considered a good practice, we can currently define more than one class or define inside the same manifest; when Puppet parses a manifest, it parses its whole contents.

A module's naming conventions apply also to the files that Puppet provides to clients.

We have seen that the file resource accepts two different and alternative arguments to manage the content of a file: source and content. Both of them have a naming convention when used inside a module.

ERB templates are typically parsed via the template function with a syntax like the following:

content => template('mysql/my.cnf.erb'),

This template is found in $modulepath/mysql/templates/my.cnf.erb.

This also applies for subdirectories, so for example:

content => template('apache/vhost/vhost.conf.erb'),

uses a template located in $modulepath/apache/templates/vhost/vhost.conf.erb.

A similar approach is followed with static files provided via the source argument:

source => 'puppet:///modules/mysql/my.cnf'

serves a file placed in $modulepath/mysql/files/my.cnf.

source => 'puppet:///modules/site/openssh/sshd_config'

serves a file placed in $modulepath/site/files/openssh/sshd_config.
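
Putting the two conventions together, this sketch shows a class that manages one file from a template and one from a static source. The class name, the target paths, and the static.cnf file name are assumptions made for illustration:

```puppet
class mysql::config {
  # Rendered from $modulepath/mysql/templates/my.cnf.erb
  file { '/etc/mysql/my.cnf':
    ensure  => file,
    content => template('mysql/my.cnf.erb'),
  }

  # Copied as is from $modulepath/mysql/files/static.cnf
  file { '/etc/mysql/conf.d/static.cnf':
    ensure => file,
    source => 'puppet:///modules/mysql/static.cnf',
  }
}
```

Note that the files and templates directory names never appear in the references: Puppet adds them according to the argument used.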

Finally, the whole content of the lib subdirectory in a module has a standard scheme. Here, we can place Ruby code that extends Puppet's functionality and is automatically redistributed from the Master to all clients when the pluginsync configuration parameter is set to true (the default since Puppet 3.0, and widely recommended in any setup):

mysql/lib/augeas/lenses/                # Custom Augeas lenses.
mysql/lib/facter/                       # Custom facts.
mysql/lib/puppet/type/                  # Custom types.
mysql/lib/puppet/provider/<type_name>/  # Custom providers.
mysql/lib/puppet/parser/functions/      # Custom functions.

ERB templates

Files provisioned by Puppet can be templates written in Ruby's ERB templating language.

An ERB template can contain whatever text we need, with variable interpolation or Ruby code placed inside <% %> tags. In a template, we can access all the Puppet variables (facts or user-assigned) with the <%= tag:

# File managed by Puppet on <%= @fqdn %>
search <%= @domain %>

It is recommended, and will be mandatory in future Puppet versions, to refer to variables in the current scope using the @ prefix.

To use out of scope variables, we can use the scope.lookupvar method:

path <%= scope.lookupvar('apache::vhost_dir') %>

This uses the variable's fully qualified name. If the variable is at top scope:

path <%= scope.lookupvar('::fqdn') %>

Since Puppet 3, we can use this alternate syntax:

path <%= scope['apache::vhost_dir'] %>

In ERB templates, we can also use more elaborate Ruby code inside a <% opening tag, for example, to iterate over an array:

<% @dns_servers.each do |ns| %>
nameserver <%= ns %>
<% end %>

The <% tag can also be used to place lines of text only if some conditions are met:

<% if scope.lookupvar('puppet::db') == "puppetdb" -%>
  storeconfigs_backend = puppetdb
<% end -%>

Notice the -%> ending tag here: when the dash is present, the newline that follows the tag is suppressed, so no empty line is introduced in the generated file, as would happen if we had written <% end %>.
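
A quick way to experiment with these tags without creating a template file is the inline_template function, which parses an ERB string passed directly in the manifest. The following is only a sketch: the addresses are taken from a documentation range and the target path is hypothetical:

```puppet
# Illustrative values only.
$dns_servers = ['192.0.2.1', '192.0.2.2']

# The -%> closing tags suppress the newline that follows them,
# so the loop lines themselves should leave no empty lines in the output.
$resolv_content = inline_template("<% @dns_servers.each do |ns| -%>\nnameserver <%= ns %>\n<% end -%>\n")

file { '/tmp/resolv.conf.sample':
  ensure  => file,
  content => $resolv_content,
}
```

Replacing -%> with %> in the string and comparing the generated files is a simple way to see the effect of the dash.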

Restoring files from a filebucket

Puppet, by default, makes a local copy of all the files that it changes on a system. This functionality is managed with the filebucket type, which allows storing a copy of the original files either on a central server or locally on the managed system.

When we run Puppet, we see messages like:

info: /Stage[main]/Ntp/File[ntp.conf]: Filebucketed /etc/ntp.conf to puppet with sum 7fda24f62b1c7ae951db0f746dc6e0cc

The checksum of the original file is useful to retrieve it; in fact, files are saved in the directory /var/lib/puppet/clientbucket in a series of subdirectories named according to the same checksum. So, given the above example, we can see the original file content with the command:

cat /var/lib/puppet/clientbucket/7/f/d/a/2/4/f/6/7fda24f62b1c7ae951db0f746dc6e0cc/contents

We can show the original path with the command:

cat /var/lib/puppet/clientbucket/7/f/d/a/2/4/f/6/7fda24f62b1c7ae951db0f746dc6e0cc/paths

A quick way to search for the saved copies of a file, therefore, is to use a command like the following:

grep -R /etc/ntp.conf /var/lib/puppet/clientbucket/

Puppet provides the filebucket subcommand to retrieve saved files. In the above example, we can recover the original file with a (not particularly handy) command, as follows:

puppet filebucket restore -l --bucket /var/lib/puppet/clientbucket /etc/ntp.conf 7fda24f62b1c7ae951db0f746dc6e0cc

It's possible to configure a remote filebucket, typically on the Puppet Master, using the special filebucket type:

filebucket { 'central':
  path   => false,    # This is required for remote filebuckets.
  server => '', # Optional, by default is the puppetmaster
}

Once the filebucket is declared, we can assign it to a file with the backup argument:

file { '/etc/ntp.conf':
  backup => 'central',
}

This is generally done using a resource default defined at top scope (typically in our /etc/puppet/manifests/site.pp ):

File { backup => 'central', }