Troubleshooting Puppet

By : Thomas Uphill
Communication issues


Before you can begin debugging complex catalog problems, you need your nodes to communicate with the master. Communication problems between nodes and the master can either be network-related or certificate-related (SSL).

Network-related problems

When the Puppet agent is started on a node, one of the first things that the agent does is look up the value for the server option. You can either specify this with --server on the command line, or with server=[hostname] in the puppet.conf configuration file. By default, Puppet will look for a server named puppet. If it cannot find one named puppet, it will then try puppet.[your domain].
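As a rough illustration of the lookup described above, the following shell sketch reads the server setting from a puppet.conf-style file and falls back to the default name puppet. The function name puppet_server_name is hypothetical, and the sketch ignores section precedence (it simply takes the last server= line it sees); puppet config print server remains the authoritative way to ask Puppet itself.

```shell
# Hypothetical helper: print the "server" setting from a puppet.conf-style
# file ($1), or the default name "puppet" when no such setting is present.
# Assumption: section precedence is ignored; the last "server =" line wins.
puppet_server_name() {
    awk -F= '
        { sub(/#.*/, "") }                      # drop trailing comments
        $1 ~ /^[ \t]*server[ \t]*$/ {           # matches server, not ca_server
            v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", v)      # trim surrounding whitespace
            val = v
        }
        END { print (val != "" ? val : "puppet") }
    ' "$1"
}
```

For example, puppet_server_name /etc/puppet/puppet.conf prints the configured name, which you can then feed to the name-resolution checks that follow.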

Tip

What Puppet believes to be your domain may be obtained by running facter domain.

When you are debugging initial communication problems, you first need to verify that your nodes can find the Puppet master. On Unix systems, hostnames are resolved through the gethostbyname library call. This call uses the Name Service Switch (NSS) library to find a host in a number of databases. NSS is configured by the /etc/nsswitch.conf file. The line in this file that controls how hosts are found by name is the hosts line. The default configuration on most systems is the following:

hosts:  files dns

This line means that the system will search for hosts by name in the local files first. Then, if the host is not found, it will search in the Internet Domain Name System (DNS). The local file that is consulted first is /etc/hosts. This file contains static host entries. If you inherited your Puppet environment, you should look in this file for statically defined Puppet entries. If the machine puppet or puppet.[domain] is not found in /etc/hosts, the system then queries the DNS to find the host. DNS resolution is configured with the /etc/resolv.conf file on Unix systems.
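To see the lookup order on a particular node without reading the whole file, a small sketch such as the following can print the sources from the hosts line. The function name nss_hosts_sources is made up for illustration, and bracketed action items such as [NOTFOUND=return], if present, are printed as-is rather than interpreted.

```shell
# Hypothetical helper: print the sources listed on the "hosts:" line of an
# nsswitch.conf-style file, one per line, in the order they are consulted.
# Usage: nss_hosts_sources [file]   (default: /etc/nsswitch.conf)
nss_hosts_sources() {
    awk '
        { sub(/#.*/, "") }                       # ignore comments
        $1 == "hosts:" { for (i = 2; i <= NF; i++) print $i }
    ' "${1:-/etc/nsswitch.conf}"
}
```

If files is listed before dns, an entry in /etc/hosts will win over whatever the DNS returns for the same name.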

Tip

When troubleshooting, be aware that the domain fact is calculated using a combination of calls to the utility hostname and looking for a domain line in /etc/resolv.conf.

The /etc/resolv.conf file is known as the resolver configuration file. It's important to verify that you can reach the servers listed in the nameserver lines in this file. Your file may contain a search line. This line lists the domains that will be appended to your search queries. Consider a situation where the search line is as follows:

search example.com external.example.com internal.example.com

When you search for puppet, the system will first search for puppet, then puppet.example.com, then puppet.external.example.com, and finally puppet.internal.example.com.
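The expansion just described can be sketched in shell as follows. The function name search_candidates is hypothetical, and the order it prints simply mirrors the description above; the real resolver's ordering also depends on settings such as the ndots option, so treat the output as a list of names worth testing rather than a trace of what the resolver does.

```shell
# Hypothetical helper: list the names tried for a short hostname ($1), using
# the search line of a resolv.conf-style file ($2, default /etc/resolv.conf).
# If the file has several search lines, only the last one is used.
search_candidates() {
    name=$1
    file=${2:-/etc/resolv.conf}
    printf '%s\n' "$name"                         # the bare name
    awk -v n="$name" '
        $1 == "search" { line = $0 }              # remember the last search line
        END {
            m = split(line, f, " ")
            for (i = 2; i <= m; i++) print n "." f[i]
        }
    ' "$file"
}
```

Each printed name can then be passed to host or dig to see which one, if any, the DNS actually answers for.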

Several utilities exist for testing DNS. Among these utilities, host and dig are the most common. An older utility, nslookup, may also be used. To look up the IP address of the default Puppet Server, use the following:

t@mylaptop ~ $ host puppet
Host puppet not found: 3(NXDOMAIN)

In this example, the host puppet is not found. Yet, I know that this node works as expected. Remember that the system uses the gethostbyname library call when looking up the Puppet Server. The ping utility also uses this call. When we try to ping the Puppet Server, this succeeds, and the output is as follows:

t@mylaptop ~ $ ping -c 1 puppet
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.093 ms
--- localhost ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.093/0.093/0.093/0.000 ms

As you can see, the loopback address (127.0.0.1) is being used for the Puppet Server. We can verify that this information is coming from the /etc/hosts file using grep:

t@mylaptop ~ $ grep puppet /etc/hosts
127.0.0.1       localhost localhost.localdomain mylaptop localhost4 localhost4.localdomain4 mylaptop.example.com puppet.example.com puppet

Remembering the difference between using host or dig and using the gethostbyname library call can quickly help you find problems with your configuration. Adding an entry to /etc/hosts for your Puppet Server also bypasses any DNS problems that you may have in the initial configuration of your nodes.
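Because grep puppet also matches names that merely contain the string puppet, a stricter check can help when /etc/hosts is long. The sketch below, with the made-up name hosts_file_lookup, prints the address of every line that lists the name as a whole field.

```shell
# Hypothetical helper: print the address of each /etc/hosts-style line that
# lists the given name ($1) as a complete hostname or alias.
# Usage: hosts_file_lookup name [file]   (default: /etc/hosts)
hosts_file_lookup() {
    awk -v n="$1" '
        { sub(/#.*/, "") }                        # ignore comments
        {
            for (i = 2; i <= NF; i++)
                if ($i == n) { print $1; break }  # field 1 is the address
        }
    ' "${2:-/etc/hosts}"
}
```

On glibc-based systems, getent hosts puppet performs the lookup through NSS, so its answer should agree with ping rather than with host.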

Netcat

The next step in diagnosing network issues is verifying that you can reach the Puppet Server on the masterport, which is by default TCP port 8140. The masterport number may be changed, though. So, you should first confirm the port number using puppet config print masterport. One of the simplest tests to verify that you can reach the Puppet Server on port 8140 is to use Netcat. Netcat is known as the Swiss Army knife of network tools. You can do many interesting things with Netcat. More information about Netcat is available at http://nmap.org/ncat/.

Tip

There are several versions of Netcat available. The version installed on most recent distributions is Ncat, a rewrite done by the Nmap project (for more information, visit https://nmap.org).

To verify that you can reach port 8140 on your Puppet Server, issue the following command:

# nc -v puppet 8140
Connection to puppet 8140 port [tcp/*] succeeded!

If your Puppet Server is inaccessible, you will see an error message that looks like this:

nc: connect to puppet port 8140 (tcp) failed: Connection refused

If you see a Connection refused error as in the preceding output, this may indicate that there is a host-based firewall on the Puppet Server that is refusing the connection. Connection refusal means that you were able to contact the server, but the server did not permit the communication on the specified port. The first step in troubleshooting this type of problem is to verify that the Puppet Server is listening for connections on the port. The lsof utility can do this for you, as shown in the following code:

[root@puppet ~]# lsof -i :8140
COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    1960 puppet   18r  IPv6  22323      0t0  TCP *:8140 (LISTEN)

My Puppet Server shows java as the process name in the lsof output because puppetserver runs inside a JVM. If you do not see any output here, then your Puppet Server is not listening on port 8140.

If you do see a line with the LISTEN text, then your Puppet Server is listening and a firewall is blocking the communication. Host-based firewalls on Linux are configured with the firewalld system or iptables, depending on your distribution. More information on these two systems can be found at http://en.wikipedia.org/wiki/Iptables and https://fedoraproject.org/wiki/FirewallD.
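The listening check above can be scripted. The following sketch, using the hypothetical name has_listener, reads lsof-style output on standard input and succeeds only if it contains a LISTEN line for the given port. It assumes numeric ports in the NAME column, as in the output shown earlier.

```shell
# Hypothetical helper: succeed if lsof-style output on stdin contains a
# LISTEN line whose NAME column ends in ":<port>" for the port given as $1.
has_listener() {
    awk -v p="$1" '
        NF >= 2 && $NF == "(LISTEN)" {
            port = $(NF - 1)
            sub(/.*:/, "", port)                  # "*:8140" -> "8140"
            if (port == p) found = 1
        }
        END { exit !found }
    '
}
```

A possible invocation on the Puppet Server is lsof -nP -i TCP:8140 | has_listener 8140 && echo listening, where -n and -P ask lsof to print numeric addresses and ports.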

Tip

Ubuntu distributions also include an Uncomplicated Firewall (ufw) utility to configure iptables. BSD-based systems will use the Berkeley Packet Filter (pf) or IPFilter. Knowing how to configure your host-based firewall configuration is a key troubleshooting skill.

If you are familiar with firewall configuration, you can add port 8140 to the allow list and solve the problem. If you are new to firewall configuration, you may choose to temporarily disable the firewall to aid your troubleshooting. Although a perimeter firewall is often a better solution, host-based firewalls should be used wherever possible to avoid accidentally exposing ports on your servers. When you have fixed the problem, turn the host-based firewall back on. On an Enterprise Linux-based distribution, the following will disable the host-based firewall:

[root@puppet state]# service iptables stop
iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Unloading modules:                               [  OK  ]

If removing your host-based firewall does not solve your communication issue and you have verified that the service is listening on the correct port, then you will have to resort to advanced network troubleshooting tools.

Tools that may help in this case are mtr and traceroute. It is important to note that, even if a ping test fails, you may still be able to reach your Puppet Server on the masterport. The ping utility uses ICMP packets, which may be blocked or restricted on your network. If the netcat test still fails after addressing the firewall concerns, then you should try the mtr utility to find the point at which your communication stops before reaching the server. For example, to test connectivity with the Puppet Server, issue the following command:

# mtr puppet

As an example, from my laptop, the following is the mtr output when attempting to reach https://puppetlabs.com/:

[Figure: mtr output for puppetlabs.com]

If you were unable to reach the Puppet Server, the last line in the host list would be ???. The line immediately preceding the ??? line would be the point at which the line of communication between the node and master was broken.

After you have verified that the network communication between the node and master is working as expected, the next issue that you should resolve is certificates.

SSL-related problems

Puppet uses X509 certificates to secure the communication between nodes and the master. As a Puppet administrator, you should know how SSL certificates and a Certificate Authority (CA) work.

Your infrastructure may have a separate server that acts as a CA for your Puppet installation. The CA certificate is used to sign all the certificates that are generated by your master(s). If your CA is a separate server, the ca_server option will be specified in the puppet.conf file.

Although the server may be specified from the command line when running puppet agent, the ca_server option cannot.

By default, the CA certificate is generated on the first run of either the Puppet master or puppetserver. The certificate is stored in /var/lib/puppet/ssl/ca/ca_crt.pem for the Open Source Puppet (OSS) or /etc/puppetlabs/puppet/ssl/ca/ca_crt.pem for Puppet Enterprise (PE). To view the information in the certificate, use OpenSSL's x509 utility, as follows:

# openssl x509 -in ca_crt.pem -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 1 (0x1)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN=Puppet CA: puppet.example.com
        Validity
            Not Before: Feb 28 06:29:29 2015 GMT
            Not After : Feb 28 06:29:29 2020 GMT
        Subject: CN=Puppet CA: puppet.example.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (4096 bit)
                Modulus:
                    00:99:2f:50:c4:5a:9c:e9:3a:4a:f0:1b:9b:9e:d1:
...

If you are new to the openssl command-line utility, try running openssl help (help is not actually an option, but it will cause the openssl command to print helpful information). Each of the subcommands to the openssl utility has its own Unix manual page. The manual page for the x509 subcommand can be found using man x509.

The preceding information shows that the CA certificate was automatically generated and has a five-year expiry. Five years has been the default expiry time for some time now, and many Puppet installations are nearly five years old and require the generation of new CA certificates. If everything suddenly stopped working, you may wish to verify the expiry date of your CA. In addition to the expiry time, we can see the subject of the certificate, puppet.example.com. This is the name that Puppet has given to the CA based on the hostname and domain facts when the master/Puppet Server was started.
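The expiry check suggested above can be sketched as a pair of shell functions. The names openssl_date_key and cert_expired are hypothetical, and the sketch assumes the date is in the form that openssl prints in the Not After field, expressed in GMT.

```shell
# Hypothetical helper: convert an openssl date such as
# "Feb 28 06:29:29 2020 GMT" into a sortable key of the form YYYYMMDDhhmmss.
openssl_date_key() {
    printf '%s\n' "$1" | awk '{
        # Position of the month abbreviation in the string gives its number.
        m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", $1) + 2) / 3
        gsub(/:/, "", $3)                         # 06:29:29 -> 062929
        printf "%04d%02d%02d%s\n", $4, m, $2, $3
    }'
}

# Hypothetical helper: succeed if the expiry date ($1) is earlier than "now".
# $2 is an optional key in the same format; it defaults to the current UTC time.
cert_expired() {
    now=${2:-$(date -u +%Y%m%d%H%M%S)}
    [ "$(openssl_date_key "$1")" -lt "$now" ]
}
```

One way to feed it is cert_expired "$(openssl x509 -noout -enddate -in ca_crt.pem | cut -d= -f2)" && echo "CA certificate has expired". The x509 subcommand also has a -checkend option that performs a similar test directly.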

If you are diagnosing a certificate issue, you can start by downloading the CA certificate. This can be done with the curl or wget utilities. In this example, we will use curl and pass the --insecure option to curl (since we have not downloaded the CA yet and cannot verify the certificate at this point), as follows:

$ curl --insecure https://puppet:8140/production/certificate/ca
-----BEGIN CERTIFICATE-----
MIIFfjCCA2agAwIBAgIBATANBgkqhkiG9w0BAQsFADAoMSYwJAYDVQQDDB1QdXBw
ZXQgQ0E6IHB1cHBldC5leGFtcGxlLmNvbTAeFw0xNTAyMjgwNjI5MjlaFw0yMDAy
...

We can use a pipe (|) to direct the curl output to openssl and verify the certificate, as follows:

$ curl --insecure https://puppet:8140/production/certificate/ca |openssl x509 -text
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1964  100  1964    0     0   6684      0 --:--:-- --:--:-- --:--:--  6680
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 1 (0x1)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN=Puppet CA: puppet.example.com
        Validity
            Not Before: Feb 28 06:29:29 2015 GMT
            Not After : Feb 28 06:29:29 2020 GMT
        Subject: CN=Puppet CA: puppet.example.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (4096 bit)
                Modulus:
                    00:99:2f:50:c4:5a:9c:e9:3a:4a:f0:1b:9b:9e:d1:
...

If the CA certificate verifies correctly, the next step is to attempt to retrieve the certificate for your node. You can do this by first downloading the CA certificate to a local file as follows:

$ curl --insecure https://puppet:8140/production/certificate/ca >ca_crt.pem
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1964  100  1964    0     0   6851      0 --:--:-- --:--:-- --:--:--  6867

In this example, my hostname is mylaptop. I will attempt to download my certificate from the master using curl (verifying the communication with the previously downloaded CA certificate):

$ curl --cacert ca_crt.pem https://puppet:8140/production/certificate/mylaptop
-----BEGIN CERTIFICATE-----
MIIFcTCCA1mgAwIBAgIBBDANBgkqhkiG9w0BAQsFADAoMSYwJAYDVQQDDB1QdXBw
ZXQgQ0E6IHB1cHBldC5leGFtcGxlLmNvbTAeFw0xNTAzMDEwNjMzMDdaFw0yMDAy
...

As you can see, this succeeded. If we pipe the output to OpenSSL, we see that the subject of the certificate is mylaptop and the certificate has not expired:

$ curl --cacert ca_crt.pem https://puppet:8140/production/certificate/mylaptop |openssl x509 -text
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1948  100  1948    0     0   6155      0 --:--:-- --:--:-- --:--:--  6145
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 4 (0x4)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN=Puppet CA: puppet.example.com
        Validity
            Not Before: Mar  1 06:33:07 2015 GMT
            Not After : Feb 29 06:33:07 2020 GMT
        Subject: CN=mylaptop
...

Since we previously downloaded the CA certificate, we can also verify this certificate by using the verify subcommand. To use verify, we will give the path to the CA certificate that was previously downloaded, and the client certificate that we just downloaded (saved to a local file, mylaptop.pem in this example), as follows:

$ openssl verify -CAfile ca_crt.pem mylaptop.pem
mylaptop.pem: OK

If your master failed to return a certificate in the previous step, use puppet cert on the master to find the certificate. For the mylaptop example, issue the following commands:

[root@puppet ~]# puppet cert --list mylaptop
+ "mylaptop" (SHA256) 76:05:4E:C6:25:5F:04:63:A3:B7:5D:45:C9:60:48:DF:24:0D:B7:3E:4D:9F:75:5E:C8:9F:64:1D:56:34:C2:D2

If the certificate is present but unsigned, the + symbol at the beginning of the output will be missing, like this:

[root@puppet ~]# puppet cert --list mylaptop
  "mylaptop" (SHA256) 87:B3:28:31:B6:A4:3D:4A:BE:E0:4B:BD:DE:24:28:74:E1:00:8A:09:91:3C:CD:B5:17:92:73:44:A1:41:C9:9E

If the certificate is not present, the output will look like this:

Error: Could not find a certificate for mylaptop
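The three cases above can be told apart mechanically. The following sketch, with the hypothetical name cert_state, classifies a single line of output using only the patterns shown in this section; any other output is reported as unknown.

```shell
# Hypothetical helper: classify one line of `puppet cert --list <name>`
# output ($1) as signed, unsigned, missing, or unknown.
cert_state() {
    # Strip leading whitespace so an unsigned entry starts with its quote.
    line=$(printf '%s' "$1" | sed 's/^[[:space:]]*//')
    case "$line" in
        "+ "*)                             echo signed ;;
        '"'*)                              echo unsigned ;;
        *"Could not find a certificate"*)  echo missing ;;
        *)                                 echo unknown ;;
    esac
}
```

For example, cert_state "$(puppet cert --list mylaptop 2>&1)" prints one of the four words, which is convenient when checking several nodes in a loop.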

A common problem with certificates is an old certificate or a mismatch between the ca_server/master and the node. The simplest solution to this sort of problem is to remove the certificate from both machines and start again.

To remove the certificate on the ca_server, use puppet cert clean with the appropriate hostname, as follows:

[root@puppet ~]# puppet cert clean mylaptop
Notice: Revoked certificate with serial 6
Notice: Removing file Puppet::SSL::Certificate mylaptop at '/var/lib/puppet/ssl/ca/signed/mylaptop.pem'
Notice: Removing file Puppet::SSL::Certificate mylaptop at '/var/lib/puppet/ssl/certs/mylaptop.pem'

As mentioned in the output, the certificates are stored in the subdirectories of /var/lib/puppet/ssl. If the puppet cert clean command does not remove the certificate, you can remove the files manually from this location.

On the node, remove the private key and certificate from the /var/lib/puppet/ssl directory manually (there is no automatic way to do this). Alternatively, you can choose to remove the entire /var/lib/puppet/ssl directory and have the node download the CA certificate again. This often involves less work than finding all the files that need to be removed.

This location is different for Puppet Enterprise. Puppet Enterprise stores certificates in /etc/puppetlabs/puppet/ssl.

When we ran puppet cert clean on the master, one of the output lines mentioned that the certificate has been revoked. X509 certificates can be revoked. The list of certificates that have been revoked is kept in the Certificate Revocation List (CRL), which is in the ca_crl.pem file in /var/lib/puppet/ssl/ca.

We can use OpenSSL's crl utility to inspect the CRL, as follows:

[root@puppetca]# openssl crl -in ca_crl.pem -text
Certificate Revocation List (CRL):
        Version 2 (0x1)
    Signature Algorithm: sha1WithRSAEncryption
        Issuer: /CN=Puppet CA: puppet.example.com
        Last Update: Mar  5 06:40:51 2015 GMT
        Next Update: Mar  3 06:40:52 2020 GMT
        CRL extensions:
            X509v3 Authority Key Identifier:
                keyid:25:18:D4:0B:37:BD:BA:FE:70:D9:BB:17:8F:D9:84:EC:6D:30:76:71

            X509v3 CRL Number:
                2
Revoked Certificates:
    Serial Number: 06
        Revocation Date: Mar  5 06:40:52 2015 GMT
        CRL entry extensions:
            X509v3 CRL Reason Code:
                Key Compromise

As you can see, the certificate with the serial number 6 has been marked as revoked. The serial number is located within the certificate. When the master verifies a client, it will consult the CRL to verify that the serial number is not in the CRL.
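To check a particular serial number against the CRL without reading the whole listing, a sketch along these lines may help. The function name serial_is_revoked is hypothetical; it expects the text output of openssl crl on standard input and a hexadecimal serial number as its argument, and it ignores leading zeros and an optional 0x prefix.

```shell
# Hypothetical helper: succeed if the hex serial number given as $1 appears
# among the revoked entries in `openssl crl -text` output read from stdin.
serial_is_revoked() {
    awk -v want="$1" '
        # Normalize a hex serial: lowercase, no 0x prefix, no leading zeros.
        function norm(s) {
            s = tolower(s)
            sub(/^0x/, "", s)
            sub(/^0+/, "", s)
            return s
        }
        $1 == "Serial" && $2 == "Number:" && norm($3) == norm(want) { found = 1 }
        END { exit !found }
    '
}
```

For example, openssl crl -in ca_crl.pem -text | serial_is_revoked 6 && echo revoked reports whether the certificate with serial number 6 is on the list.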

More information on X509 certificates can be found at https://www.ietf.org/rfc/rfc2459.txt and http://en.wikipedia.org/wiki/X.509.