Mastering Ansible

Once a playbook is parsed and the hosts are determined, Ansible is ready to execute a task. Tasks are made up of a name (optional, but please don't skip it), a module reference, module arguments, and task control keywords. A later chapter will cover task control keywords in detail, so we will only concern ourselves with the module reference and arguments.

Module reference

Every task has a module reference. This tells Ansible which bit of work to do. Ansible is designed to easily allow for custom modules to live alongside a playbook. These custom modules can be a wholly new functionality, or they can replace modules shipped with Ansible itself. When Ansible parses a task and discovers the name of the module to use for a task, it looks into a series of locations in order to find the module requested. Where it looks also depends on where the task lives, whether in a role or not.

If a task is in a role, Ansible will first look for the module within a directory tree named library within the role the task resides in. If the module is not found there, Ansible looks for a directory named library at the same level as the main playbook (the one referenced by the ansible-playbook execution). If the module is not found there, Ansible will finally look in the configured library path, which defaults to /usr/share/ansible/. This library path can be configured in an Ansible config file, or by way of the ANSIBLE_LIBRARY environment variable.

This design, allowing modules to be bundled with roles and playbooks, allows for adding functionality, or quickly repairing problems very easily.

Module arguments

Arguments to a module are not always required; the help output of a module will indicate which models are required and which are not. Module documentation can be accessed with the ansible-doc command:

Note

This command was piped into cat to prevent shell paging from being used.

Arguments can be templated with Jinja2, which will be parsed at module execution time, allowing for data discovered in a previous task to be used in later tasks; this is a very powerful design element.

Arguments can be supplied in a key = value format, or in a complex format that is more native to YAML. Here are two examples of arguments being passed to a module showcasing the two formats:

- name: add a keypair to nova
  nova_keypair: login_password={{ pass }} login_tenant_name=admin            
                name=admin-key

- name: add a keypair to nova
  nova_keypair: login_password: "{{ pass }}" login_tenant_name: admin
                name: admin-key

Both formats will lead to the same result in this example; however, the complex format is required if you wish to pass complex arguments into a module. Some modules expect a list object or a hash of data to be passed in; the complex format allows for this. While both formats are acceptable for many tasks, the complex format is the format used for the majority of examples in this book.

Module transport and execution

Once a module is found, Ansible has to execute it in some way. How the module is transported and executed depends on a few factors, however the common process is to locate the module file on the local filesystem and read it into memory, and then add in the arguments passed to the module. Finally, the boilerplate module code from core Ansible is added to complete the file object in memory. What happens next really depends on the connection method and runtime options (such as leaving the module code on the remote system for review).

The default connection method is smart, which most often resolves to the ssh connection method. With a default configuration, Ansible will open an SSH connection to the remote host, create a temporary directory, and close the connection. Ansible will then open another SSH connection in order to write out the task object from memory (the result of local module file, task module arguments, and Ansible boilerplate code) into a file within the temporary directory that we just created and close the connection.

Finally, Ansible will open a third connection in order to execute the module and delete the temporary directory and all its contents. The module results are captured from stdout in the JSON format, which Ansible will parse and handle appropriately. If a task has an async control, Ansible will close the third connection before the module is complete, and SSH back in to the host to check the status of the task after a prescribed period until the module is complete or a prescribed timeout has been reached.

Task performance

Doing the math from the above description, that's at least three SSH connections per task, per host. In a small fleet with a small number of tasks, this may not be a concern; however, as the task set grows and the fleet size grows, the time required to create and tear down SSH connections increases. Thankfully, there are a couple ways to mitigate this.

The first is an SSH feature, ControlPersist, which provides a mechanism to create persistent sockets when first connecting to a remote host that can be reused in subsequent connections to bypass some of the handshaking required when creating a connection. This can drastically reduce the amount of time Ansible spends on opening new connections. Ansible automatically utilizes this feature if the host platform where Ansible is run from supports it. To check whether your platform supports this feature, check the SSH main page for ControlPersist.

The second performance enhancement that can be utilized is an Ansible feature called pipelining. Pipelining is available to SSH-based connection methods and is configured in the Ansible configuration file within the ssh_connection section:

[ssh_connection]
pipelining=true

This setting changes how modules are transported. Instead of opening an SSH connection to create a directory, another to write out the composed module, and a third to execute and clean up, Ansible will instead open an SSH connection and start the Python interpreter on the remote host. Then, over that live connection, Ansible will pipe in the composed module code for execution. This reduces the connections from three to one, which can really add up. By default, pipelining is disabled.

Utilizing the combination of these two performance tweaks can keep your playbooks nice and fast even as you scale your fleet. However, keep in mind that Ansible will only address as many hosts at once as the number of forks Ansible is configured to run. Forks are the number of processes Ansible will split off as a worker to communicate with remote hosts. The default is five forks, which will address up to five hosts at once. Raise this number to address more hosts as your fleet grows by adjusting the forks= parameter in an Ansible configuration file, or by using the –forks (-f) argument with ansible or ansible-playbook.

Mastering Ansible

Mastering Ansible

Overview of this book

Related Content you might be interested in

Current Title:

Mastering Ansible

Module transport and execution

Module reference

Module arguments

Note

Module transport and execution

Task performance