Book Image

Getting Started with Terraform

By : Kirill Shirinkin
Book Image

Getting Started with Terraform

By: Kirill Shirinkin

Overview of this book

Terraform is a tool used to efficiently build, configure, and improve production infrastructure. It can manage existing infrastructure as well as create custom in-house solutions. This book shows you when and how to implement infrastructure as a code practices with Terraform. It covers everything necessary to set up complete management of infrastructure with Terraform, starting with the basics of using providers and resources. This book is a comprehensive guide that begins with very small infrastructure templates and takes you all the way to managing complex systems, all using concrete examples that evolve over the course of the book. It finishes with the complete workflow of managing a production infrastructure as code – this is achieved with the help of version control and continuous integration. At the end of this book, you will be familiar with advanced techniques such as multi-provider support and multiple remote modules.
Table of Contents (15 chapters)
Getting Started with Terraform
About the Author
About the Reviewer
Customer Feedback

Declarative vs Procedural tools for Infrastructure as Code

What is infrastructure code specifically? It highly depends on your particular infrastructure setup.

In the simplest case, it might be just a bunch of shell scripts and component-specific configuration files (Nginx configuration, cron jobs, and so on) stored in source control. Inside these shell scripts, you specify exact steps computer needs to take to achieve the state you need:

  1. Copy this file to that folder.

  2. Replace all occurrences of ADDRESS with

  3. Restart the Nginx service.

  4. Send an e-mail about successful deployment.

This is what we call procedural programming. It's not bad. For example, build steps of Continuous Integration tools such as Jenkins that are a perfect fit for a procedural approach—after all the sequence of command is exactly what you need in this case. 

However, you can only go that far with shell scripts when it comes to configuring servers and higher level pieces. The more common and mature approach these days is to use tools that provide a declarative, rather than a procedural way to define your infrastructure. With declarative definitions, you don't need to think how to do something; you only write what should be there.

Perhaps the main benefit of it is that rerunning a declarative definition will never do the same job twice, whereas executing the same shell script will most likely break something on the second run. Proper configuration management tool will ensure that the server will be in the exactly same state as defined in your code. This property of modern configuration and provisioning tools is named idempotency.

Let's look at an example. Let's say that you have a box in your network that hosts packages repository. For some reason, instead of using DNS server, you want to hardcode the IP address of this box to the  /etc/hosts file with a domain name repository.internal.


In Unix-like systems, the  /etc/hosts file contains a local text database of DNS records. The system tries to resolve DNS name by looking at this file first, and only asking DNS-server only after.

Not a complex task to do, given that you only need to add a new line to the  /etc/hosts file. To achieve this, you could have a script like the following:

echo repository.internal >> /etc/hosts/hosts

Running it once will do the job: required entry will be added to the end of the /etc/hosts file. But what will happen if you execute it again? You guessed it right: exactly the same line will be appended again. And even worse, what if the IP address of repository box will change? Then, if you execute your script, you will end up with two different host entries for the same domain name.

You can ensure idempotency yourself inside the script, with the high usage of conditional checks. But why reinvent the wheel when there is already a tool to do exactly this job? It would be so much better to just define the end result, without composing sequence of commands to achieve this.

And that is exactly what configuration management tools such as Puppet and Chef do by providing you a special Domain Specific Language (DSL) for defining the desired state of the machine. The certain downside is the necessity to learn a new DSL: a special small language focused on solving one particular task. It's not a complete programming language, neither does it to be; in this case, its only job is to describe the state of your server.

Let's look at how the same task could be done with the help of a Puppet manifest:

host { 'repository.internal': 
  ip => '', 

Applying this manifest multiple times will never add extra entries, and changing the IP address in the manifest will be reflected correctly in host files changing the existing entry, and not creating a new one.


There is an additional benefit I should mention: on top of idempotency, you often get platform agnosticism. What it means is that the same definition could be used for completely different operating systems without any change. For example, by using package resource in Puppet, you don't care whether the underlying system uses rpm or deb.

Now you should better understand that when it comes to configuration management tools that provide the declarative way of doing things are preferred.

Modern configuration management tools such as Chef or Puppet completely solved the problem of setting up a single machine. There is an increasing number of high-quality libraries (be it cookbooks or modules) for configuring all kinds of software in an (almost) OS-agnostic way. But configuring what goes inside single server is only part of the picture. The other part that is located a layer above also requires a new tooling.