Instant Chef Starter

By: John Ewart

Overview of this book

As any systems administrator will tell you, managing one server can be challenging, let alone a dozen or more. With Chef, you can make managing dozens or even hundreds of servers manageable and learn how to configure and deploy new servers.

"Instant Chef Starter" is a hands-on guide to managing your infrastructure. You will learn the benefits of using Chef as well as how to install, configure, and use the set of tools provided. The book will also cover developing recipes for use with Chef to install software and maintain configurations so managing dozens of servers is no more difficult than managing one.

Learn how Chef fits into your infrastructure, install the software, build your own recipes, and provision servers with ease.

This book covers installing your own Chef server to manage your infrastructure and software configurations. Discover where you can find existing templates for managing software packages and operating systems and then learn to write your own. After you have done that, learn how to apply operations, execute scripts, and manage configurations across an entire network with only one command.

Top features you need to know about


Chef is a very feature-rich system for managing your infrastructure. The goal of this section is to introduce you to some of the advanced features of Chef. We will cover understanding and developing our own cookbooks, using knife in more advanced ways, bootstrapping our own servers and using JSON data bags to store complex pieces of information about our infrastructure.

Getting more in-depth with Chef

A small note before beginning this section: As the majority of the work that you will do with Chef involves Ruby programming, this section assumes that you have a reasonably firm grasp of Ruby programming concepts. The section aims to be readable by anyone with a sufficient technical background but some of the Ruby-specific portions may require some knowledge that is outside the scope of this book.

Developing your own recipes and cookbooks

Now that we have gotten our server running and have applied some basic recipes, it's time to venture into the details of writing our own cookbooks and recipes. Here we will discuss how a cookbook is structured, what tools are available to you when writing recipes, and see some examples of how we might approach developing our own.

Cookbook contents

Cookbooks are one of the core components of the Chef ecosystem. They are, as their name suggests, a collection of recipes and other data that, when combined, provide a specific set of functionality to a system administrator. In each cookbook, you will find a collection of directories and files describing the cookbook and its recipes and functionality. Core components that a cookbook contains are as follows:

  • Attributes: These are attributes that the cookbook's recipes rely on. A well-defined cookbook should contain some sane defaults for the recipes such as installation directories, usernames, download URLs, version numbers, and so on. Anything a recipe expects the node to have defined should be given a default value so that the recipe will behave as expected.

  • Recipes: Ruby scripts define the recipes in the cookbook. A cookbook can contain as few as one, or as many recipes as its author would like to put into it. Most package-specific cookbooks only contain a few recipes, while some cookbooks, such as your organization's private cookbook, may have dozens of recipes for internal use.

  • Templates: These are Ruby ERB files that are used to describe any file that needs to have some dynamic data in it; often these are used for startup scripts or configuration files.

  • Resources: These describe a resource that can be used in a recipe; for example, the supervisord cookbook defines a service resource. Resources are Ruby scripts that use Chef's resource DSL (Domain-Specific Language) to describe various actions, attributes, and other properties of the resource.

  • Providers: These describe an implementation of a resource; in the case of the supervisord cookbook, the service provider file outlines the actual implementation-specific logic of the actions that a resource can perform. There are many different types of services that you could have: supervisord, runit, monit, bluepill, and so on.

  • Additionally, cookbooks may include a variety of support files that are not directly a part of the recipes, such as:

    • Definitions: This defines various structures and run-time components. Perhaps you want to define the structure of a user account, a background worker, or a runnable process. Definitions provide a way to programmatically describe what these components look like, and implement any logic that they might need.

    • Ruby libraries: This includes any re-usable code that your recipes need and can be included in the cookbook. Things that go in here are accessible by your recipes and automatically loaded for you.

    • Support Files: These are arbitrary data files that don't fall into any of the other categories.

    • Tests: Recipes, being composed of Ruby code, can include unit tests or cucumber tests to verify that the logic works. Note that these tests are unit tests, not integration tests—they are not designed to ensure that you have configured your nodes properly or that there are no conflicts or other issues when applying these recipes.

Cookbook file organization

These files have their own homes in the hierarchy of files and directories contained within a cookbook. The names of the directories and files are fairly transparent, as we will see here.

As an example, let's take a look at a fairly simple cookbook that we downloaded earlier, the MySQL cookbook. If we were to look at the file contents of that cookbook, we would see directories containing the various components of a cookbook: attributes, auxiliary libraries, recipes, file templates, and resource files.

If you look inside the MySQL cookbook you can see that the directory structure maps closely to the names of the various components of our cookbook. Each component has its own directory structure and expectation of files contained within, which we will discuss in the corresponding sections.

Note that not every cookbook contains all of these components, and there are other components a cookbook may have that the MySQL cookbook does not.

Some of these files are purely informational such as the README.md, CONTRIBUTING, LICENSE, and CHANGELOG files. These files are here to convey information to you about how to participate, license, or otherwise use the cookbook.

Recipes

Recipes, housed in the recipes directory inside the cookbook, are a set of Ruby scripts each of which achieves a specific purpose. Think of a recipe for making chocolate chip cookies; following the recipe yields a very specific result: chocolate chip cookies. It doesn't produce oatmeal raisin cookies.

Similarly, your Chef recipes should be very clear about what objective they will achieve and perform only that task. If you look at the list of files in the MySQL cookbook, you will see that it contains five recipes: client, default, server, ruby, and server_ec2. Each recipe achieves one specific goal: to install the MySQL client, Ruby library, server, or server on an EC2 node. There is also a default recipe in the cookbook, which in this case installs the client.

Each recipe is a script that is run from beginning to finish (assuming that nothing causes it to abort), and can access the node's attribute data, compile templates, create directories, install packages, execute commands, download files, and do just about anything that you can do from a shell as an administrator. And, if you can't accomplish what you need to do using the existing Chef resources, you can always execute a user-defined shell script.

Let's learn about the other components of the cookbook and then re-visit the recipes themselves once we've learned more about what's in the cookbook.

Metadata

Each cookbook contains a metadata.rb file in the root directory of the cookbook that is responsible for declaring information about the cookbook itself. The contents of this script are then used to generate a JSON file that describes the cookbook that is used by the Chef Server for dependency resolution, searching, and importing into run lists.

This is a required file for your cookbook to be uploaded into the Chef Server so it knows what recipes are being provided and what other cookbooks need to be installed in order for your cookbook to be fully operational.

A bare-minimum metadata.rb file in a cookbook for the open source Gearman daemon might look like the following code:

maintainer       "John Ewart"
maintainer_email "[email protected]"
license          "Apache 2.0"
description      "Install the Gearman daemon"
long_description "The gearman daemon is a job-processing queue"
version          "0.1.0"
# Recipes contained, one per recipe method
recipe           "gearman", "Empty recipe, use one of the others"
recipe           "gearman::java_daemon", "Install the Java daemon"
# Platforms it supports
supports         "ubuntu"
# Cookbooks it depends on
depends          "java"

Because these are Ruby scripts, you can also do anything Ruby-like inside of them, allowing you to automate some portions of the metadata file. For example, you might choose to replace the long_description entry with something programmatic such as reading in the contents of your README.md file rather than duplicating all or some of the content:

long_description IO.read(File.join(File.dirname(__FILE__), 'README.md'))

Or, if your cookbook supports a handful of platforms, instead of writing out each platform that is supported on a line of its own, you could produce a list automatically using something similar to the following code snippet:

%W{ ubuntu debian freebsd }.each do |os|
  supports os
end

As long as your Ruby code produces something that is an acceptable argument or block for the configuration method, you can be as clever as you want (just don't be so clever that it doesn't make sense next week!).
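
To see why the platform loop works, note that supports is an ordinary method call. The following is a minimal sketch that uses a stand-in supports method (hypothetical, defined here only so the snippet runs outside Chef; in a real metadata.rb Chef provides it) to show that the loop is equivalent to three separate supports lines:

```ruby
# Stand-in for the metadata DSL method; Chef defines the real one.
SUPPORTED = []

def supports(platform)
  SUPPORTED << platform
end

# The same loop as in the metadata example.
%w{ ubuntu debian freebsd }.each do |os|
  supports os
end

puts SUPPORTED.inspect  # ["ubuntu", "debian", "freebsd"]
```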

Tip

Take advantage of Ruby programming wherever possible to cut down on excessive duplication of configuration descriptions or anywhere that you might want to perform logic checks. Almost everything in your cookbook is a Ruby script, and we're here to automate things!

Attributes

Every node has attributes associated with it. Data from various locations are combined to produce the final hash of attributes, which is computed when a client requests its run list from the server. One of those locations is the cookbook itself, which provides a baseline set of attributes that the recipes inside rely on. Other sources, including the environment, role, and node itself, may override these attributes. When writing your recipes, these attributes are accessed through the node hash, and are computed for you by Chef ahead of time. The order of precedence used when computing this hash is as follows, from the lowest to the highest level:

  • default

  • normal (also set)

  • override

Within each level, the sources of attribute data, in order of increasing precedence, are as follows:

  • attributes file inside of a cookbook

  • environment

  • role

  • node

This means that a node-specific override attribute takes precedence over all others, which in turn is more important than the role, environment and cookbook override attributes, and so down the chain of precedence. As a result, you will see that any default attribute set by your cookbook will be the lowest priority, meaning that you can safely set some sane defaults in your cookbook knowing that they will be only used as a fallback.
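
The ordering described above can be sketched in plain Ruby. This is a simplified model of the precedence rules as listed in this section, not Chef's actual merge code; the resolve method and the sample port values are made up for illustration:

```ruby
# Precedence levels and sources, lowest to highest, as listed above.
LEVELS  = [:default, :normal, :override]
SOURCES = [:cookbook, :environment, :role, :node]

# Each setting is [level, source, value]. The winner is the setting with
# the highest level, with ties broken by the highest-ranked source.
def resolve(settings)
  winner = settings.max_by do |level, source, _value|
    [LEVELS.index(level), SOURCES.index(source)]
  end
  winner && winner.last
end

settings = [
  [:default,  :cookbook,    "80"],
  [:default,  :role,        "8080"],
  [:override, :environment, "8000"],
  [:normal,   :node,        "9000"],
]

puts resolve(settings)  # 8000 (an override beats any normal or default)
```

Note that the cookbook default of "80" only wins when no other source sets the attribute, which is why it is safe to use as a fallback.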

Order of loading

Chef loads attributes files in alphabetical order, and cookbooks tend to contain only one attributes file named attributes/default.rb. If you have a cookbook that has more complex attributes definition files, it might be wise to separate them into recipe-specific attributes files. For example, the MySQL cookbook from the Opscode site has two attribute files: server.rb and client.rb, each of which contains anywhere between fifty and one hundred and fifty lines of Ruby code. Again, like most other configuration files in Chef, these are Ruby scripts and can range from simple attribute-setting statements to complex logic used to determine an appropriate set of default attributes.

For example, a simple default attributes file for HAProxy might look like the following code snippet:

default['haproxy']['incoming_port'] = "80"
default['haproxy']['member_port'] = "8080"
default['haproxy']['enable_admin'] = true
default['haproxy']['app_server_role'] = "webserver"

Notice that the attributes for a cookbook are name-spaced inside of a key, typically the same name as the cookbook (in this case haproxy). If you have multiple recipes inside a cookbook, you would likely have default configurations for every recipe. Consider a simplified MySQL default attributes file:

default['mysql']['client']['use_ssl'] = true
default['mysql']['server']['listen_port'] = "3306"
default['mysql']['server']['log_dir'] = "/var/log/mysql"

However, there are times when a simple static attributes file doesn't make sense, and being able to script these files comes in handy. Consider a recipe where the default group for the root user depends on the platform you are using (wheel on BSD, admin on Ubuntu Linux, root anywhere else). We can use plain old Ruby code, or an optional Chef-provided convenience method such as value_for_platform:

default[:users]['root'][:primary_group] = value_for_platform(
  "openbsd"   => { "default" => "wheel" },
  "freebsd"   => { "default" => "wheel" },
  "ubuntu"    => { "default" => "admin" },
  "default"   => "root"
)
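
For comparison, the plain-Ruby route might look like the following sketch. The method name root_primary_group is hypothetical, and its platform argument stands in for the platform name that Chef would supply for the node:

```ruby
# Plain-Ruby equivalent of the value_for_platform mapping.
# `platform` is a stand-in for the node's platform name (an assumption
# made so that this sketch runs outside Chef).
def root_primary_group(platform)
  case platform
  when "openbsd", "freebsd" then "wheel"
  when "ubuntu"             then "admin"
  else                           "root"
  end
end

puts root_primary_group("freebsd")  # wheel
puts root_primary_group("ubuntu")   # admin
puts root_primary_group("centos")   # root
```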

Additionally, you can load attributes from another cookbook using the include_attribute method. Let's say, for example, you need to load the apache port attribute. You could simply reference node['apache']['port'], but there is no guarantee that it has been overridden or that the apache cookbook has been loaded yet. To address that, we can do the following to load the settings from attributes/default.rb inside our apache cookbook:

include_attribute "apache"
default['mywebapp']['port'] = node['apache']['port']

If you need to load an attributes file other than default.rb, say attributes/client.rb inside the mysql cookbook, you can specify it as follows:

include_attribute "mysql::client"

But make sure that you add cookbooks that you load defaults from as a dependency in your cookbook's metadata.

As you can see, depending on our needs, we can generate attributes for our recipes ranging from simple, straightforward static configuration defaults, to complex platform-dependent defaults.

Using attributes

Once you have defined your attributes, they are accessible in your recipes through the node hash. Chef computes the attributes in the order discussed and produces one large hash to which you have access. (This is also called a "mash" in Chef, as it is a hash with indifferent access: string keys and symbol keys are treated as the same, so node[:key] is the same as node["key"].)

If our node had loaded our mysql and our root group defaults as specified earlier, and did not have any environment, role, or node-level overrides defined, the node hash would contain (among other things):

node = {
  'mysql' => {
    'client' => {
      'use_ssl' => true,
    },
    'server' => {
      'listen_port' => "3306",
      'log_dir'     => "/var/log/mysql",
    }
  },
  'users' => {
    'root' => { 'primary_group' => 'wheel' },
  }
}

This can then be accessed anywhere in our recipe or templates through variables such as node[:mysql][:server][:listen_port], or node[:users][:root][:primary_group].
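
The idea of indifferent access can be illustrated with a tiny wrapper class. This is only a sketch to show the behavior; the IndifferentHash class is hypothetical and far simpler than the hash-like object Chef actually gives you:

```ruby
# Illustrative only: treats symbol keys and string keys as the same key
# by normalizing every key to a string.
class IndifferentHash
  def initialize(data)
    @data = {}
    data.each do |key, value|
      @data[key.to_s] = value.is_a?(Hash) ? IndifferentHash.new(value) : value
    end
  end

  def [](key)
    @data[key.to_s]
  end
end

node = IndifferentHash.new({
  'mysql' => { 'server' => { 'listen_port' => "3306" } }
})

puts node[:mysql][:server][:listen_port]     # 3306
puts node['mysql']['server']['listen_port']  # 3306
```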

Templates

Often, if you are installing a specific piece of software, or writing out any sort of customizable data file to the filesystem, you will need to generate a file with some data inside of it. To do this, you have some options; you could simply take the approach of programmatically writing the contents of your file line by line, just like the following code:

File.open(local_filename, 'w') do |f|
  f.write("<VirtualHost *:#{node['app']['port']}>")
  # ...
  f.write("</VirtualHost>")
end

And, technically, this is a perfectly acceptable approach. But as far as maintainability is concerned (and readability), this is one of those things to avoid. The other choice would be to implement some sort of template language and store the configuration files as templates with placeholders built-in for dynamic data (since most configuration files are boilerplate, most of the file is just plain text).

Template file format

Chef uses ERB, a template language that is provided by the core Ruby library (which makes it incredibly accessible). It would be possible to write your own template provider that used something else if you were so inclined, but it would likely be more effort than it was worth.

As ERB is a Ruby template language, it supports arbitrary Ruby code within it, as well as some ERB specific markup.

A quick ERB primer

As ERB is very well documented and widely used, this section serves only as a quick reference to some of the most commonly used ERB mechanisms. For more information, see the official Ruby documentation at http://ruby-doc.org/stdlib-1.9.3/libdoc/erb/rdoc/ERB.html.

  • Executing Ruby: To execute some arbitrary Ruby code, you use the <% %> container. The <% tag indicates the beginning of the Ruby code, and %> tag indicates the end of the block. The block can span multiple lines or just one single line. For example, the following code:

    <% 
    [1,2,3].each do |index| 
       puts index
    end
    %>
    
    <% users.collect{|u| u.full_name }%>

    You can mix Ruby and non-Ruby code (useful for repeating blocks of non-Ruby text) like so:

    <% [1,2,3].each do |value| %>
    Non-ruby text...
    <% end %>

    The preceding code would yield the following output:

    Non-ruby text...
    Non-ruby text...
    Non-ruby text...

  • Variable replacement: To insert the result of an expression into the output, ERB provides a container similar to the last one, with an equal sign added to the opening tag: <%= %>. Any valid Ruby code is acceptable inside this block, and the result of that code is put into the template in place of the block. For example, the following code:

    <%= @somevariable %>
    <%= hash[:key] + otherhash[:other_key] %>
    <%= array.join(", ") %>

    This can be combined with the preceding example to produce complex output:

    <% [1,2,3].each do |value| %>
    The value currently is <%= value %>
    <% end %>

    The preceding code would yield the following output:

    The value currently is 1
    The value currently is 2
    The value currently is 3
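
If you want to experiment with these tags outside of Chef, the erb library that ships with Ruby can render a template string directly. The following is a minimal sketch; note that plain ERB keeps the newline that follows each <% %> tag, so the raw result contains blank lines that the listings above omit:

```ruby
require 'erb'

template = <<-'EOT'
<% [1,2,3].each do |value| %>
The value currently is <%= value %>
<% end %>
EOT

rendered = ERB.new(template).result(binding)

# Drop the blank lines left behind by the <% %> tags.
lines = rendered.lines.map(&:chomp).reject(&:empty?)
puts lines
```
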

The template resource

Chef provides a template resource for exactly this purpose. The template resource looks for the source file you specify in the templates directory of your cookbook. It requires one argument, the name of the file on the filesystem to write to, followed by a block describing what goes into that file along with any other options such as owner, group, permissions, and so on.

For example, let's say you wanted to write out a file, /etc/apache2/ports.conf, which would contain a listing of all the ports that Apache is supposed to be listening on. That file is supposed to be owned by the same user Apache runs as, with the permission mask of 0600. The template resource description would look like the following code snippet:

template "/etc/apache2/ports.conf" do 
  source "ports.conf.erb"
  owner "apache2"
  mode "0600"
end

In this example, we define a template and tell Chef to process the template file ports.conf.erb, contained within the cookbook's templates directory, and write the result to /etc/apache2/ports.conf on the server when the recipe is run. Chef then changes the owner of the file to the apache2 user and sets its mode to 0600.

A very simple ports.conf.erb file as outlined previously might look like this:

<% node['apache']['ports'].each do |port| %>
Listen <%= port %>
<% end %>

Now combine it with the following node attribute JSON data computed by Chef:

{
  "apache": {
    "ports": [80, 81, 82, 83]
  }
}

This will produce the following contents in the file /etc/apache2/ports.conf:

Listen 80
Listen 81
Listen 82
Listen 83

Template variables

Templates automatically have access to the compiled attributes of the node they are being run on via the node hash. Additionally, you may pass variables into the template resource using the variables attribute inside the resource block. An example of this looks like the following code snippet:

template "/etc/apache2/errbits_vhost.conf" do 
  source "app_vhost.conf.erb"
  mode "0600"
  variables(
    :application_name => "errbits",
    :params => params_hash
  )
end

These now become accessible in the template as instance variables with the same names as declared in the resource, prefixed by the @ character:

<VirtualHost <%= node[:ipaddress] %>:*>
    ServerName <%= @application_name %>.yourcorp.com
    DocumentRoot <%= @params[:document_root] %>
</VirtualHost>
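
The way a variables hash turns into @-prefixed names can be sketched with plain ERB. The TemplateContext class below is a hypothetical stand-in for the context Chef evaluates a template in, and the document root path is an example value only:

```ruby
require 'erb'

# Hypothetical stand-in: each entry in the variables hash becomes an
# instance variable that the template can read.
class TemplateContext
  def initialize(variables)
    variables.each { |name, value| instance_variable_set("@#{name}", value) }
  end

  def render(source)
    ERB.new(source).result(binding)
  end
end

source = "ServerName <%= @application_name %>.yourcorp.com\n" \
         "DocumentRoot <%= @params[:document_root] %>\n"

context = TemplateContext.new({
  :application_name => "errbits",
  :params => { :document_root => "/var/www/errbits" }  # example value only
})

puts context.render(source)
```
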

Where to store templates

The templates directory has its own directory hierarchy inside of it, a mandatory one named default and then any platform-specific or host-specific directories alongside it. Chef has a specific order in which it will search the directories for a template:

  1. Hostname

  2. Distribution-version

  3. Distribution

  4. Default location

Most cookbooks strive to be reusable, but this is not always the case. If you have only internal cookbooks, or a modified version of a public cookbook, it would be perfectly acceptable to have host-specific templates.

As an example, let's consider a scenario in which we applied the recipe with the ports.conf.erb template resource to a node, db1.production.mycorp.com, which is running Debian 6.0. Chef would then look for the following files inside of templates:

  1. host-db1.production.mycorp.com/ports.conf.erb

  2. debian-6.0/ports.conf.erb

  3. debian/ports.conf.erb

  4. default/ports.conf.erb

Again, the search is performed in that order, and the first match wins. If the file requested does not exist in any of those directories, then the template resource will fail.
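
The search order can be sketched as a small Ruby function. The method names here are hypothetical and this is not how Chef implements the lookup internally; it only reproduces the order described above:

```ruby
# Candidate locations inside templates/, most specific first.
def template_candidates(hostname, platform, version, filename)
  [
    "host-#{hostname}",
    "#{platform}-#{version}",
    platform,
    "default",
  ].map { |dir| File.join(dir, filename) }
end

# Return the first candidate that exists under templates_dir, or nil
# when none of them exist (the case where the resource would fail).
def find_template(templates_dir, candidates)
  candidates.find { |path| File.exist?(File.join(templates_dir, path)) }
end

candidates = template_candidates("db1.production.mycorp.com", "debian", "6.0", "ports.conf.erb")
puts candidates
```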

This differentiation of configuration files by host, platform, and even version is very useful. Some cookbooks can be installed on a multitude of platforms and distributions, and those likely have platform-specific configuration settings. In this manner, common configuration data can be stored in the default directory while platform-specific or host-specific configurations can be kept organized and kept away from the common files.

Resources

Chef provides a fairly extensive set of resources, including:

  • Cron jobs

  • Deployments

  • Filesystem components (mount points, files and directories, and so on)

  • Source code repositories (git, svn)

  • Logs

  • PowerShell scripts (Windows targets)

  • Shell scripts

  • Templates

  • Packages

  • Users and groups

And, if you haven't noticed the trend by now, Chef allows you to easily define new resources if there isn't one available that matches what you need. We will learn how resources work and discuss a few of the more commonly used ones that Chef provides.

Resources are composed of a resource name (package name, file path, service name, and so on), an action, and some attributes that describe that resource. Later in this section we look at three of the available resources: package, directory, and script. Together, these three can accomplish a lot.

Using resources

Resources, when used in a recipe, take the following form:

resource_name "name attribute" do 
   attribute "value"
   attribute "value"
end

Where the block being passed to the resource is optional and contains zero or more attribute descriptions, resource_name is replaced with the resource you are describing, and the string value being provided to the resource is also known as the name attribute.

A concrete example might be installing the package tcpdump on your system. To install the default version with no customization, you could use a resource description as short as package "tcpdump".

However, if you wanted to be more verbose with your resource description and install a specific version of the tcpdump package, you could use the following code:

package "tcpdump" do
   action :install
   version "X.Y.Z"
end

In the preceding code, the block passed to the package resource defines some additional attributes; in this case the version, and the action (install, which is the default action for the package resource).

The following tables describe three resources available to you in Chef that we will be using in our examples. Each table lists the resource name at the top, followed by the list of actions (the first action listed is the default) and some of the attributes that you can apply to that resource (not a comprehensive list). The first attribute in the list is the "name" attribute of the resource (in the case of package, it's the name of the package).

One of the most frequently used resources in recipes is the package resource, which provides us with the ability to install software via the local packaging system (apt, yum, and so on).

Package

  • Actions: install (default), upgrade, remove, purge

  • Attributes:

    • package_name (name attribute): The name of the package to install

    • version: Version of package to install

    • source: Local file to install (as opposed to downloading via APT or yum)

Another useful resource is the directory resource. This resource allows us to manipulate directories on the local system (create or delete them). It is very useful for making sure certain directories exist, or cleaning up after a script runs.

Directory

  • Actions: create (default), delete

  • Attributes:

    • path (name attribute): Path to the directory to create/delete

    • group: Group to assign ownership (string or numeric ID)

    • owner: User account to assign ownership to (string or numeric ID)

    • mode: Octal file mode

    • recursive: Delete/create recursively (ownership applies only to the actual directory being created, not intermediate ones)

Being able to run arbitrary shell scripts makes Chef recipes very flexible. If a piece of software doesn't have a package for the system or needs to be compiled, or if you need to perform some complex initialization or configuration, the script resource provides you with that ability.

Script

  • Actions: run (default)

  • Attributes:

    • command (name attribute): Name of the script that you are running

    • code: The actual (quoted) script to run. This can be a single-line or a multiline script using heredoc syntax

    • user: Username to run the script as

    • interpreter: The script interpreter to use

    • cwd: Directory to change to before executing the script

Writing a basic recipe

At their core, recipes put together resources in a certain order to produce an outcome. These resources are either pre-defined by Chef for you, or are developed by you to provide custom functionality that suits your needs.

Putting it all together, we can build our own recipes that range from very simple single-step recipes to multi-step, multi-platform recipes.

Let's take a look at a simple recipe that leverages the script resource to perform the following steps:

  1. Check to see if the script has successfully been executed before (no sense in re-building the same thing a second time).

  2. Use bash as the interpreter.

  3. Assume the role of the root user.

  4. Change the working directory to /tmp.

  5. Execute a bash script that will fetch a file, decompress it, and then build the source.

    script "install_mrsid" do
      not_if { File.exist?('/opt/mrsid-7.0.0.2167/VERSION.txt') }
      interpreter "bash"
      user "root"
      cwd "/tmp"
      code <<-EOH
        wget http://dl.dropbox.com/u/282613/mrsid-7.0.tar
        tar xvf mrsid-7.0.tar -C /opt
        gdal-mrsid-build /opt/mrsid-7.0.0.2167
      EOH
    end

Notice that here we use a multiline script (anything non-trivial will likely be multiple lines long) by using the <<- heredoc syntax. This allows us to write a multiline string without having to use quotes. The syntax <<-EOH tells Ruby to read until it sees the specified characters EOH (EOH has been chosen to represent "end of heredocs" but you can use any text you want as long as it doesn't show up in your script).
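
As a quick illustration of the heredoc syntax on its own, the following plain-Ruby sketch builds a two-line string. The echo commands are placeholders; note that <<- lets the closing EOH be indented but keeps the leading whitespace of each line in the string:

```ruby
script = <<-EOH
  echo "step one"
  echo "step two"
  EOH

puts script.lines.length  # 2
print script
```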

One thing we haven't seen yet is the use of the not_if qualifier. This is exactly what it looks like; if the block supplied to not_if returns a true value, then the resource is not processed. This is very useful for ensuring that you don't clobber important files or repeat expensive operations such as recompiling a software package.
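
The idea behind such a guard can be sketched in a few lines of plain Ruby. The run_unless helper is hypothetical and is not how Chef implements not_if; it only shows why a second run of the same step does no work once the guard returns true:

```ruby
# Skip the work when the guard returns a true value.
def run_unless(guard)
  return :skipped if guard.call
  yield
  :ran
end

installed = false              # stands in for "the VERSION.txt file exists"
guard = lambda { installed }

first  = run_unless(guard) { installed = true }  # does the work
second = run_unless(guard) { installed = true }  # guard is now true

puts first   # ran
puts second  # skipped
```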

Getting to know your knife (every chef's primary tool)

Chef comes with a set of command-line tools that perform a variety of tasks. Among these the most often-used one is knife, which provides you with the ability to manage your infrastructure the same way as you would use the web interface.

Commands

Knife is the command-line interface for the Chef Server, and provides the same level of functionality as the web-based interface does. It allows you to interact with your Chef Server from anywhere that you have access to your shell. It also allows you to automate actions on Chef Servers using scripts (automating automation: what will we think of next?).

There are a dozen or so commands that knife knows about, each of which allows you to interact with a specific facet of the Chef Server or your servers. Depending on what you want to accomplish you will use a different knife command. The ones we will discuss in this book are as follows:

  • bootstrap: This command executes a bootstrap script on a server

  • node, role: These commands manipulate the node or role data, respectively

  • cookbook: This command provides tools for downloading, uploading, listing, and testing cookbooks

  • data bag: This command provides tools for uploading and managing the JSON data bags stored in the server

  • ssh: This runs commands on any number of nodes or servers

Bootstrapping a server

The bootstrap command is the first one that you will run to bring up a new server and register it with your Chef Server. The bootstrap command has a number of options available, but the ones that are more commonly used are:

  • -d distribution_name: This is the distribution name (the name of the template to look for) such as ubuntu-12.10-ruby19 which would look for ubuntu-12.10-ruby19.erb in the distribution directory

  • -N node_name: This is the name of the node to register with Chef (if not provided, the hostname of the node will be used as the default)

  • -x username: This is the SSH username to use when connecting to the server

  • --sudo: This uses sudo to bootstrap the server (as opposed to logging in as root)

  • -i identity_file: This is the SSH key file to use for your identity (useful when bootstrapping EC2 instances)

The bootstrap command has one argument, the hostname or IP of the server to bootstrap. A typical bootstrap command would look something like the following:

knife bootstrap myserver.mycorp.com -x root -d debian-6.0-ruby19

Whereas bootstrapping an EC2 server (which requires you use an SSH key and typically does not allow root logins) might look like this instead:

knife bootstrap ec2hostname -i ~/.ssh/id_ec2 -d ubuntu12.10 --sudo

By using the bootstrap command, we can set up new servers with a single command, which could in turn be folded into a higher-level automation system that leverages knife to automate bringing up new servers.
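
As a sketch of what such a higher-level wrapper might start from, the following Ruby builds (but does not run) bootstrap command lines for a list of hosts, using only the options described above. The bootstrap_command method and the host names are hypothetical:

```ruby
require 'shellwords'

# Build a knife bootstrap command line as a shell-safe string.
def bootstrap_command(host, distro:, user: nil, identity_file: nil, sudo: false)
  args = ["knife", "bootstrap", host]
  args += ["-x", user] if user
  args += ["-i", identity_file] if identity_file
  args += ["-d", distro]
  args << "--sudo" if sudo
  Shellwords.join(args)
end

hosts = ["web01.mycorp.com", "web02.mycorp.com"]  # hypothetical host names
hosts.each do |host|
  puts bootstrap_command(host, distro: "debian-6.0-ruby19", user: "root")
end
```

Printing the commands first makes it easy to review them before handing them to a shell.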

Viewing and manipulating data

All the data in Chef is stored as JSON. This makes it incredibly convenient to edit, view, and manipulate any information in the system using your favorite text editor. Knife will execute whatever editor is specified by the $EDITOR environment variable and load the data requested into it, writing data back to the Chef Server when the buffer is written out.

Note

You must have an EDITOR environment variable set or knife will refuse to edit data. Also note that you need to have an editor that does not run in the background or the file will not update.
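If you wrap knife in your own scripts, a small pre-flight check along these lines can catch a missing editor before knife does. This is only a sketch; the helper name configured_editor is an assumption for illustration.

```ruby
# Return the configured editor, or nil when EDITOR is unset or blank.
# The env parameter defaults to the real environment but accepts any hash.
def configured_editor(env = ENV)
  editor = env["EDITOR"].to_s.strip
  editor.empty? ? nil : editor
end

if configured_editor
  puts "knife will open: #{configured_editor}"
else
  puts "EDITOR is not set; export it before running knife edit commands"
end
```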

Managing nodes and roles

The knife tool easily manages nodes and roles; you can list, edit, remove, show, and create them. In addition to creating them by hand, you can also import them from an existing JSON file (very handy if you are creating multiple roles and you have a template to start from).

Editing existing data

You can easily edit most data stored in Chef. Let's say you wanted to edit an existing node, web01-production. You can simply issue the command knife node edit web01-production and it will fire up vim (or your selected editor) with the JSON data stored for that node. This shows you the editable per-node attributes and configuration, as follows:

{
  "name": "web01-production",
  "chef_environment": "_default",
  "normal": {
    "firewall": {
      "state": "[{\"SSH\"=>{\"protocol\"=>\"tcp\", \"dest_port\"=>\"22\"}}]"
    },  
    "tags": [

    ]   
  },  
  "run_list": [
    "role[base_server]",
    "role[web_server]"
  ]
}

If you compare the preceding data to the attributes listed in the web interface, the results will differ. By default, the edit command does not edit automatic, default, or override attributes. To do so, you must pass the -a or --all flag to the edit command, which will allow you to edit all of the attributes that are currently set on the node.

Once you save the file you are editing, the contents are uploaded to Chef and validated. Assuming that your JSON validates, the changes are applied to the data stored in Chef. They are not automatically pushed to the nodes; you will have to run chef-client on them (manually or using the knife ssh command) in order to update their state.
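Because the node data is plain JSON, it is easy to inspect outside of knife as well. The following sketch parses a trimmed copy of the node JSON shown above and splits each run list entry into its type and name. The helper parse_run_list_entry is hypothetical and only handles the role[...] and recipe[...] forms used in this section.

```ruby
require "json"

# A trimmed copy of the node JSON shown above.
node_json = <<-NODE
{
  "name": "web01-production",
  "chef_environment": "_default",
  "run_list": ["role[base_server]", "role[web_server]"]
}
NODE

# Split a run list entry such as "role[web_server]" into its type and name.
def parse_run_list_entry(entry)
  match = entry.match(/\A(role|recipe)\[(.+)\]\z/)
  raise ArgumentError, "unrecognized run list entry: #{entry}" unless match
  { :type => match[1], :name => match[2] }
end

node = JSON.parse(node_json)
entries = node["run_list"].map { |entry| parse_run_list_entry(entry) }
entries.each { |entry| puts "#{entry[:type]}: #{entry[:name]}" }
```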

Creating new entities

To create new records, the node and role commands have the create subcommand. For example, to create a new role called worker_node, you would use the following command:

knife role create worker_node

The preceding command will bring up your editor with the basic skeleton of a role represented using JSON:

{
  "name": "worker_node",
  "description": "", 
  "json_class": "Chef::Role",
  "default_attributes": {
  },  
  "override_attributes": {
  },  
  "chef_type": "role",
  "run_list": [

  ],  
  "env_run_lists": {
  }
}

Here you can create a new role and assign it metadata, including default attributes, overrides, a run list, and so on. Let's assume that you want this worker node to be able to use the Resque job processing system (which you have recipes for). You could then edit the run_list section to contain a list of recipes to run (remember: order matters). To accomplish this, your run_list might look like the following code:

"run_list": [
   "recipe[redis::server]", 
   "recipe[ruby]", 
   "recipe[corp::worker_scripts]"
 ],

Let's also assume that you want to install Ruby 1.9.3 and Redis 2.6.7 as a part of this role. Assuming that the recipes' expected attributes are structured like this, your override_attributes might look like the following code snippet:

"override_attributes": {
   "ruby": {
       "version": "1.9.3"
   }, 
   "redis": {
       "server": { "version": "2.6.7" }
   }
},

This allows you to update the run list at the same time as the attributes for the role. Compare this to the web interface, which would require visiting a few different screens to accomplish the same task. As you become more comfortable with knife, you may never even visit the web management console again.
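To see how the pieces fit together, here is a sketch that assembles the complete worker_node role as a Ruby hash and prints it as JSON. The description text is an assumption added for illustration; the remaining keys and values come from the skeleton and snippets above.

```ruby
require "json"

role = {
  "name" => "worker_node",
  "description" => "Resque worker", # hypothetical description
  "json_class" => "Chef::Role",
  "default_attributes" => {},
  "override_attributes" => {
    "ruby" => { "version" => "1.9.3" },
    "redis" => { "server" => { "version" => "2.6.7" } }
  },
  "chef_type" => "role",
  # Order matters: recipes run in the order listed.
  "run_list" => [
    "recipe[redis::server]",
    "recipe[ruby]",
    "recipe[corp::worker_scripts]"
  ],
  "env_run_lists" => {}
}

role_json = JSON.pretty_generate(role)
puts role_json
```

Generating the JSON from a hash means mismatched brackets cannot slip into the document by accident.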

Note

If you edit the data and the JSON fails to validate, you will likely lose your changes. As a result, it may be wiser to use the from file subcommand in place of create. This allows you to edit the file ahead of time and upload it. An easy way to do this is to create a new node or role, save the skeleton JSON somewhere locally, edit that copy, and upload it once you have made your changes.
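When working from a local file, you can also check it before uploading. The sketch below is a hypothetical helper, role_problems, that parses the text and reports obvious problems. The list of required keys is an assumption based on the role skeleton shown earlier, not an official schema, and the server may apply further validation of its own.

```ruby
require "json"

# Return a list of problems found in a role document; an empty list means
# nothing obvious is wrong. Required keys are an assumption based on the
# skeleton shown earlier, not an official schema.
def role_problems(text)
  required_keys = %w[name chef_type json_class run_list]
  role = JSON.parse(text, :create_additions => false)
  return ["top-level value must be a JSON object"] unless role.is_a?(Hash)

  problems = required_keys.reject { |key| role.key?(key) }.map { |key| "missing key: #{key}" }
  if role.key?("run_list") && !role["run_list"].is_a?(Array)
    problems << "run_list must be an array"
  end
  problems
rescue JSON::ParserError => e
  ["invalid JSON: #{e.message}"]
end

# A document with a mismatched bracket is reported instead of uploaded.
broken = '{"override_attributes": {"ruby": {"version": "1.9.3"}]}'
puts role_problems(broken)
```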

Deploying to multiple servers with a single bound!

Knife is a very powerful tool, one that can make any system administrator seem like they have super powers. One such feature of knife is the ability to execute commands on multiple servers with only a single command. It accomplishes this by leveraging the built-in search infrastructure of Chef (you will recall that when we set up the Chef Server, we made sure that the chef-solr component was running; this is where it comes in handy).

Chef search queries

To perform a task across multiple servers, the ssh command takes a search query that specifies one or more attributes and the values that you are looking for. These queries can be as simple as searching for nodes whose name is web00-production:

user@host% knife search node "name:web00-production"
1 items found

Node Name:   web00-production
Environment: _default
FQDN:        web00-production.mycorp.com
IP:          192.168.1.1
Run List:    role[web_server]
Roles:       web_server
Recipes:     users::shell_users, users::sysadmins, sudo, zsh
Platform:    ubuntu 12.04

You can use wildcard symbols "?" and "*" to match multiple records, such as the following:

knife search node "name:web??-production"
knife search node "name:*-production"
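To build some intuition for how the two wildcards behave, the following sketch applies the same patterns to a hypothetical list of node names using Ruby's File.fnmatch. This is only a local stand-in: the real matching is done by the search index on the Chef Server, and the node names here are made up for the illustration.

```ruby
# Hypothetical node names, used only for this illustration.
node_names = %w[web01-production web02-production web1-production db00-production web01-staging]

# "?" matches exactly one character, "*" matches any run of characters.
two_digit_web = node_names.select { |name| File.fnmatch("web??-production", name) }
all_production = node_names.select { |name| File.fnmatch("*-production", name) }

puts two_digit_web.inspect
puts all_production.inspect
```

Note that web1-production is not matched by the first pattern, because each "?" must consume exactly one character.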

You can also search on any data field that Chef knows about, such as platform, roles, recipes, IP addresses, FQDN, environment, and so on. You can also use logical expressions combining search queries with AND and OR, and perform range searches using Solr-compatible search queries. Some more examples for completeness are as follows:

knife search node "platform:ubuntu"
knife search node "fqdn:*production*"
knife search node "recipes:apache2 OR recipes:nginx"
knife search node "recipes:gearman AND fqdn:*production*"
knife search firewall "id:[mysql_server TO web_server]"
knife search user_databag "id:{alex TO john}"

As you can see, we can search based on a variety of attributes using several different search techniques. Again, since Chef uses Solr-compatible search syntax, you can find a lot of good resources out there on more advanced search queries. Also note that you are not limited to searching nodes (though the ssh command implicitly searches nodes); you can search any index: nodes, roles, environments, clients, and data bags.
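If you generate queries from scripts, small helpers can keep the syntax consistent. The following is a sketch with hypothetical helper names (any_of, all_of, and inclusive_range); it only builds query strings in the forms shown above and does not escape special characters in values.

```ruby
# Join several values for one field with OR,
# for example: recipes:apache2 OR recipes:nginx
def any_of(field, values)
  values.map { |value| "#{field}:#{value}" }.join(" OR ")
end

# Require every field/value pair to match.
def all_of(pairs)
  pairs.map { |field, value| "#{field}:#{value}" }.join(" AND ")
end

# Inclusive range query: field:[low TO high]
def inclusive_range(field, low, high)
  "#{field}:[#{low} TO #{high}]"
end

puts any_of("recipes", %w[apache2 nginx])
puts all_of("recipes" => "gearman", "fqdn" => "*production*")
puts inclusive_range("id", "mysql_server", "web_server")
```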

Multiple SSH sessions

Let's consider a scenario where we want to deploy an application to a large number of different servers in a batch, but we only want to deploy it to the servers that are located on the east coast, as denoted by their FQDN of servername.east.mycorp.com. Thanks to Chef, we can do that easily with a command like the following one:

knife ssh "fqdn:*.east.mycorp.com" "chef-client" -x app_user 

This will run the command chef-client on each machine whose FQDN matches the wildcard expression "*.east.mycorp.com" and download and execute that node's run list from the Chef Server automatically. Again, using more advanced queries, we could restrict (or expand) the server list by using more specific query filter logic.

Once you have mastered this aspect of using knife, you can learn more about knife's support for executing multiple connections concurrently and for interacting with terminal multiplexers such as screen and tmux.

Advanced data configuration using data bags

Thus far, we have seen storing data about our configuration in the attributes of nodes, roles, and recipes. But what if we have information that is global to our infrastructure such as user accounts, internal firewall rules, or other information that could possibly be used in a variety of different recipes? This is where data bags come in. They are a place where we can store arbitrary data about our configuration that can be searched for, read from, and written to by recipes.

What are data bags

Data bags are recipe-independent and cookbook-independent, globally available JSON data. They can be searched, accessed, modified, and created from recipes or via knife. Think of them as a place to store your infrastructure configuration. Examples of data that would be placed into data bags might include the following:

  • User accounts

  • SSH keys/deployment keys

  • Firewall rules

  • Internal/external IP addresses

  • Site-wide configuration data

  • API keys for various services

  • Configuration settings for servers/services

  • Anything else you can think of that you need to access from multiple nodes, recipes, roles, or knife

Data bags allow us to write recipes that are more generic in nature. Instead of writing a firewall recipe that loads a rigid set of rules, or requires that you place your settings inside of nodes and roles as override or default attributes, you could build a set of firewall rules inside a data bag for just this purpose. This way your infrastructure-wide firewall configurations are contained in one location and in a hierarchical, structured manner.

Structure

Data bags are containers, and inside each data bag are zero or more items, each of which has a name and some arbitrary JSON data. There are no restrictions on how you structure an item's data, so long as it can be represented using JSON.

To take the example of centralized firewall data a little further, we can look at the way the ufw::databag recipe (ufw stands for "uncomplicated firewall", a popular iptables-based firewall rule generation package) makes use of data bags to make the recipe as flexible as possible.

ufw::databag expects that there is a data bag named "firewall" and that inside of it are items that share names with roles or nodes. For example, if we had two roles, web_server and database_server, then our firewall data bag could contain two items named accordingly, each looking something like the following code:

{
  "name": "data_bag_item_firewall_web_server",
  "json_class": "Chef::DataBagItem",
  "chef_type": "data_bag_item",
  "data_bag": "firewall",
  "raw_data": {
    "rules": [
      {   
        "HTTP": {
          "dest_port": "80",
          "protocol": "tcp"
        }, 
        "HTTPS": {
          "dest_port": "443",
          "protocol": "tcp"
        }   
      }   
    ],  
    "id": "web_server"
  }
}

In the previous example the id of the item maps to the name of the role, so that the ufw::databag recipe knows where to fetch the data it needs to build its internal firewall rules.
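To make the structure concrete, here is a sketch that walks the rules of the web_server item and prints one line per rule. It uses plain Ruby and a copy of the item's raw data, and it is not how the ufw::databag recipe itself processes the rules; the output format is made up for illustration.

```ruby
require "json"

# The "raw_data" portion of the firewall item shown above.
item_json = <<-ITEM
{
  "id": "web_server",
  "rules": [
    {
      "HTTP":  { "dest_port": "80",  "protocol": "tcp" },
      "HTTPS": { "dest_port": "443", "protocol": "tcp" }
    }
  ]
}
ITEM

item = JSON.parse(item_json)

# Each element of "rules" maps a rule name to its settings; flatten them
# into one descriptive line per rule.
lines = item["rules"].flat_map do |rule_set|
  rule_set.map do |name, settings|
    "#{name}: allow #{settings['protocol']}/#{settings['dest_port']}"
  end
end

puts lines
```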

Using in recipes

Data bags can be accessed directly from a recipe using a few of Chef's built-in data bag methods. There are two primary methods for fetching data from data bags: data_bag and data_bag_item. The former fetches the list of item names stored in the data bag; the latter fetches a specific item from the data bag. Note that each of these makes an HTTP request to the Chef API at run time, so try to be conservative with how often you make these calls.
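One way to be conservative is to remember items that have already been fetched. The sketch below shows the idea in plain Ruby: caching_fetcher is a hypothetical helper, and slow_fetch stands in for a call such as data_bag_item so that the example runs without a Chef Server.

```ruby
# Wrap a fetcher so that each (bag, id) pair is only fetched once.
def caching_fetcher(fetcher)
  cache = {}
  lambda do |bag, id|
    key = [bag, id]
    cache[key] = fetcher.call(bag, id) unless cache.key?(key)
    cache[key]
  end
end

# Stand-in for the Chef API: counts calls and returns a canned hash
# instead of making an HTTP request.
calls = 0
slow_fetch = lambda do |bag, id|
  calls += 1
  { "id" => id, "data_bag" => bag }
end

fetch_item = caching_fetcher(slow_fetch)
3.times { fetch_item.call("webapps", "wordpress") }
puts "underlying fetches: #{calls}"
```

The cache lives only as long as the lambda, so within a recipe it would last for a single run at most.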

Accessing data

Let's say that you have a collection of web applications that you maintain in your infrastructure, each of which has a name, a port that it listens on, and some database configuration settings. These items are stored in a data bag named "webapps."

Fetching the list of items from our "webapps" data bag would look like this:

data_bag("webapps")

This could yield a list of item names, such as errbit, phpmysql, and wordpress.

To fetch a specific one, say Wordpress, we would use the data_bag_item method which takes two arguments: the name of the data bag, and the item to fetch. The call would look like:

data_bag_item("webapps", "wordpress")

It might, in turn, yield a Ruby hash like this:

{ "id" => "wordpress",
  "db_config" =>
    { "server" => "db00", "user" => "dbuser", "password" => "dbpassword" },
  "port" => 8000,
  "enabled" => true
}

We can then use this data in the same way we would use any other variable in our recipe. It can be passed into a template, picked apart, and used in any other resource, or used as a conditional test or in a loop.
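As a small illustration of picking the data apart, the following sketch works with the item as a plain Ruby hash. The variable names and the strings being built are assumptions for the example; in a recipe the hash would come from data_bag_item instead of a literal.

```ruby
# The item from above, written as a plain Ruby hash.
wordpress = {
  "id" => "wordpress",
  "db_config" => { "server" => "db00", "user" => "dbuser", "password" => "dbpassword" },
  "port" => 8000,
  "enabled" => true
}

# Nested values are reached by chaining string keys.
db = wordpress["db_config"]
listen_line = "Listen #{wordpress['port']}"
db_target = "#{db['user']}@#{db['server']}"

# The "enabled" flag can drive a conditional like any other boolean.
puts listen_line if wordpress["enabled"]
puts db_target
```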

An example recipe

Using our previous example of some web applications, we could have a recipe called mycorp::webapplications that is responsible for deploying our web applications to their respective servers. Its contents might look like the following code:

webapps = data_bag("webapps")
webapps.each do |webapp_name|
  # Fetch the full item for each name returned by data_bag
  webapp = data_bag_item("webapps", webapp_name)

  template "/etc/apache2/sites-available/#{webapp_name}.conf" do
    source "webapp.conf.erb"
    owner "apache2"
    mode "0600"
    variables(:webapp => webapp)
  end

  # Keys are strings, matching the hash shown earlier
  if webapp["enabled"]
    execute "a2ensite #{webapp_name}" do
      command "/usr/sbin/a2ensite #{webapp_name}"
      # Assumes a service "apache2" resource is declared earlier in the run
      notifies :restart, resources(:service => "apache2")
    end
  end
end

As you can see from the preceding recipe, it accesses the "webapps" data bag and, for every item in there, generates an Apache virtual host file using the webapp.conf.erb template. It passes the data bag item to the template and then, if the webapp in question is enabled, enables the site and notifies the Apache2 service that a restart is required.
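The recipe refers to a webapp.conf.erb template without showing it. The sketch below renders one possible version of that template with Ruby's standard ERB library so you can see how the item's data ends up in the file. The template contents and the mycorp.com server name are assumptions for illustration. Inside a Chef template, values passed through variables are available as instance variables such as @webapp, which the context object here imitates.

```ruby
require "erb"

# Hypothetical contents for webapp.conf.erb; the directives are illustrative.
template_source = <<-CONF
Listen <%= @webapp["port"] %>
<VirtualHost *:<%= @webapp["port"] %>>
  ServerName <%= @webapp["id"] %>.mycorp.com
</VirtualHost>
CONF

# Stand-in for the data bag item that the recipe passes to the template.
webapp = { "id" => "wordpress", "port" => 8000, "enabled" => true }

# Imitate the template context: expose the item as the instance
# variable @webapp on a throwaway object, then render against it.
context = Object.new
context.instance_variable_set(:@webapp, webapp)
rendered = ERB.new(template_source).result(context.instance_eval { binding })

puts rendered
```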

Searching data bags

Anywhere that we would use the data_bag method to get a list of all items, we can replace that with a search instead. This allows us to use the search service to filter which items we get back rather than scanning them ourselves. This is advantageous for several reasons:

  • Solr is designed for searching, so let it do what it's good at.

  • Solr has very flexible search criteria; it will save you time to write a query rather than inspect the contents of the items yourself.

  • Each time you fetch an item, you incur the overhead of making an HTTP request to the API. This may not matter for a few items, but if you have hundreds or thousands of items, it adds up.

The method for searching data bags is called search and it takes two arguments: the name of the data bag, and the query to execute. The call search('webapps', '*:*') covers the same set of items as data_bag('webapps'), with one difference: data_bag returns only the item names, whereas search returns the items themselves, so there is no need to call data_bag_item for each result.

In our previous example, if we wanted only the web apps that were enabled, we could modify the recipe to begin with the following:

webapps = search("webapps", "enabled:true")

This would result in the set of web app items that have their key enabled set to true.
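For comparison, this is roughly what the same filter looks like when done by hand in Ruby after fetching every item. The items and the ports for errbit and phpmysql are hypothetical; the point is only that the query "enabled:true" moves this selection to the server.

```ruby
# Hypothetical items, shaped like the "webapps" data bag used above.
webapps = [
  { "id" => "errbit",    "port" => 8001, "enabled" => true },
  { "id" => "phpmysql",  "port" => 8002, "enabled" => false },
  { "id" => "wordpress", "port" => 8000, "enabled" => true }
]

# Local equivalent of the query "enabled:true": keep only the items
# whose "enabled" key is true.
enabled = webapps.select { |webapp| webapp["enabled"] }

enabled.each { |webapp| puts "#{webapp['id']} listens on #{webapp['port']}" }
```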

Wrapping up

As you can see, Chef has the ability to combine complex system resource management such as file templates, services, user accounts, packages, and more, with an incredibly feature-rich language for building recipes as well as first-class tools for interacting with your data. By learning to write recipes for tasks, we can automate setup and deployment of servers and make our infrastructure more cohesive and better organized.