Good and Bad Parts of Ansible after 2 Years of Usage

I have been using Ansible professionally for 2.5 years now. I read a blog post about Ansible alternatives, which made me re-evaluate why I use Ansible. So I wanted to make a list of pros and cons for future reference, so that I don’t forget why I like and dislike Ansible.


These opinions come from a software engineer who now works as an SRE. Everything you’ll read is opinionated.

How did I use Ansible?

I used Ansible with a Python middleware. This Python code prepares variables, decides which playbook to run, and runs a few playbooks in a row if necessary.
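
To make that concrete, here is a minimal sketch of what such a middleware could look like. It is not my actual code: the action names, the inventories/ directory, and the env_name variable are made up for illustration, and the playbook paths borrow names from the directory layout shown in this post. The only real things are the ansible-playbook command and its -i and --extra-vars options.

```python
import json
import subprocess
from typing import Dict, List

# Hypothetical mapping from a high-level action to the playbooks it needs, in order.
PLAYBOOKS_BY_ACTION: Dict[str, List[str]] = {
    "setup": ["api_setup/setup1.yml", "api_setup/setup2.yml"],
    "deploy": ["api_deploy/deploy_safely.yml"],
}


def build_command(playbook: str, inventory: str, extra_vars: Dict[str, object]) -> List[str]:
    """Build an ansible-playbook command line; variables are passed as one JSON document."""
    return [
        "ansible-playbook",
        "-i", inventory,
        "--extra-vars", json.dumps(extra_vars, sort_keys=True),
        playbook,
    ]


def run_action(action: str, environment: str, dry_run: bool = True) -> List[List[str]]:
    """Prepare variables, pick the playbooks for an action, and run them in a row."""
    extra_vars = {"env_name": environment}      # hypothetical variable name
    inventory = f"inventories/{environment}"    # hypothetical inventory layout
    commands = [build_command(p, inventory, extra_vars) for p in PLAYBOOKS_BY_ACTION[action]]
    if not dry_run:
        for command in commands:
            # check=True stops the sequence at the first failing playbook.
            subprocess.run(command, check=True)
    return commands
```

With dry_run left at its default, run_action("setup", "dev") only returns the command lines it would execute, which is handy for checking what the middleware decided before anything touches a server.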

What did I use Ansible for?

Mostly configure servers, install software
Make deployments in a mutable infrastructure, run shell commands when things get messy
Create/drop databases, create/drop users in these databases, assign roles, etc.
Interact with AWS services such as S3 and EC2

Design principles we had

Same resources, multiple environments

We had a simple resource architecture, but a complicated service architecture. It was a pain to run all services in a local environment. Solutions like Vagrant were not really viable because they reserved almost all the CPU and memory, and maintaining them would have required employing a support engineer just to help team members manage their local environments. So, ideally, we had to provision n development environments for n developers.

Almost production-like environments

Differences between production and development bite everyone in the industry. We aimed to maintain almost production-like environments to decrease the number of surprises. But you can’t make them identical to production: I don’t know any customer who wants to receive emails from your QA environment, or any company that wants to pay 20 times the price of its production resources.

Maintain a small team

We were, and still are, a small team, so we didn’t have the time to tweak an already working wheel to fit our requirements exactly. Therefore we needed to write less code and use good tools that cover most of our use cases.

Ansible: the good parts

I like Ansible for several reasons:

Battle-tested: it is used by many companies in production environments, it is backed by a tech company, it is under regular development.
Declarative language: It is nice not to handle 3-4 cases of a task by hand, and instead just add a parameter like state: present.
Many ready-to-use modules: There are many modules that I can use covering most of my use cases, from OS-specific tasks to cloud service management.
Low barrier to entry: YAML has very simple rules, so it is quite easy to start using Ansible.
Filters, functions, standard tools: Along with Jinja2 support, the standard filters and functions come in very handy.
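
To illustrate the declarative point from the list above: the toy function below is not how any Ansible module is implemented, and ensure_line is a name I made up. It is only a small Python sketch of the idea behind state: present, where you describe the end state and the tool decides whether anything has to change.

```python
from typing import List


def ensure_line(lines: List[str], line: str, state: str = "present") -> bool:
    """Converge `lines` to the desired state in place; return True if anything changed."""
    if state not in ("present", "absent"):
        raise ValueError(f"unknown state: {state}")
    exists = line in lines
    if state == "present" and not exists:
        lines.append(line)
        return True
    if state == "absent" and exists:
        # Remove every occurrence so the result does not depend on duplicates.
        lines[:] = [item for item in lines if item != line]
        return True
    return False  # already in the desired state, nothing to do


config = ["PermitRootLogin no"]
first = ensure_line(config, "PasswordAuthentication no", state="present")
second = ensure_line(config, "PasswordAuthentication no", state="present")
print(first, second, config)
```

The second call reports no change because the line is already there. That is the part I appreciate: I don’t have to write the "does it exist, is it different, should I add it" branches myself.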

Ansible: the bad parts

As is famously known, there are no silver bullets. This is where I complain, so it will be longer than the good parts :) What I don’t like about Ansible:

Ansible has external dependencies on Python modules, both on the machine running the ansible commands and on the hosts the playbooks run against.¹ So first of all, you need to provision host machines before using Ansible on them.
Secondly, you need to deal with Python’s dependency management problems from the start. If you are not doing fancy stuff, that’s fine, but when you start to use software that has strong dependencies on Python 2, you get to add that week’s favorite virtual environment management tool to the mix. You also gotta plan for Python 2’s removal.
Installing third-party libraries gives you some freedom and lets you install bug fixes without waiting for Ansible releases, but then you have to make sure all the libraries play nice together.
In complex software projects, readability is everything, especially if the code is used and maintained for more than 6 months. The signature of a method, function, or class needs to show clearly what it takes, what it does, and what it returns. In Ansible, you only have the second one: what it does. It is probably something like install_nginx. However, to understand which variables or facts a role requires, you have to go to the role definition. Search all the templates it uses. Search all the roles it includes. It is a mess.
Skipping roles or tasks is possible, but not nice. I always want to use the same playbooks for development and production. But I can’t do that; the world is not that nice a place. I want to provide a proper development experience to my colleagues, so I want to relax the strictness of production systems in development environments, for example by offering options like “please recreate the environment's database”. If I use the same playbook for production, I need a nasty when condition there. The first time I did that, I had a small heart attack when I saw recreate_database in a production deployment, and was relieved only after seeing that about 10 tasks were skipped immediately.² Of course, I have the liberty of providing different playbooks for production and development, which is what I do right now, but then I have to maintain two different playbooks, and whenever I change something I have to keep in mind that the other playbook needs to change as well.³
when conditions work for if-else situations, but if you need to store another task’s result and use it in the next task’s when condition, you forget how your own code works after 1 month instead of the usual 3. You need to re-discover your own code more frequently.
I like package isolation in general. If a playbook doesn’t use a role, it shouldn’t have access to it (ideally). What I can do with Ansible is as follows:

common/
  - common_role1/tasks/main.yml
  - common_role2/tasks/main.yml
api_setup/
  - roles/
    - do_this_api/tasks/main.yml
    - do_that_api/tasks/main.yml
  - setup1.yml
  - setup2.yml
api_deploy/
  - roles/
    - do_this_api/tasks/main.yml
    - do_that_api/tasks/main.yml
  - deploy_safely.yml
  - deploy_by_nuking.yml

API is one domain here, and the same structure repeats for about five other domains. Ansible allows me to specify a roles lookup path, which I set to common. But it doesn’t allow me to approach the whole project from a domain-driven perspective. I’d like to have an api package that contains its own common roles folder, api/common_roles, so that each subproject under api has access to these common roles and to the root common roles at the same time. But right now, the api_deploy and web_deploy playbooks share the same common roles path, which is far from ideal for me. This method solves my management problems today, but it creates a potential maintenance problem for the future.
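
One way to approximate this from the outside, assuming playbooks are launched from a Python middleware, is to set the roles search path per run. ANSIBLE_ROLES_PATH is a real Ansible setting that takes a list of directories; the api/common_roles directory and the function names below are hypothetical and only mirror the layout I wish I had. Treat it as a sketch, not a fix for the underlying complaint.

```python
import os
from typing import Dict, List, Optional


def roles_path_for(domain: str, project_root: str = ".") -> str:
    """Domain-level common roles first, then the root-level common roles."""
    search_dirs: List[str] = [
        os.path.join(project_root, domain, "common_roles"),  # hypothetical, e.g. api/common_roles
        os.path.join(project_root, "common"),                # root common roles from the layout above
    ]
    # The setting is a list of directories joined by os.pathsep (":" on POSIX).
    return os.pathsep.join(search_dirs)


def env_for(domain: str, base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and set ANSIBLE_ROLES_PATH for one ansible-playbook run."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANSIBLE_ROLES_PATH"] = roles_path_for(domain)
    return env
```

The result of env_for("api") can be passed as env= to subprocess.run when starting ansible-playbook. The roles/ directory next to each playbook is still searched as well, so this only narrows which shared roles a domain can see; it does not give real package isolation.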

Ansible doesn’t show real-time output. There is a long-standing issue open for that. This is especially painful for long tasks. When I’m running docker_container to build an image, I’m sometimes looking at a frozen log screen for more than 10 minutes. (Yes, I’m building Java, nice guess!)
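
Coming back to the complaint about roles having no visible signature: a rough way to get at least a hint of what a role takes is to scan its files for Jinja2 expressions. The helper below is a sketch I would not fully trust. The function name is made up, it only looks at {{ ... }} expressions in .yml, .yaml, and .j2 files, and it misses bare when conditions, included roles, and variables the role defines for itself.

```python
import re
from pathlib import Path
from typing import Set

# Matches the first identifier inside a Jinja2 expression such as {{ db_name | lower }}.
JINJA_VARIABLE = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")


def referenced_variables(role_dir: str) -> Set[str]:
    """Collect variable names referenced in {{ ... }} expressions under a role directory."""
    names: Set[str] = set()
    for path in Path(role_dir).rglob("*"):
        if path.is_file() and path.suffix in {".yml", ".yaml", ".j2"}:
            names.update(JINJA_VARIABLE.findall(path.read_text(encoding="utf-8")))
    return names
```

Calling sorted(referenced_variables("api_setup/roles/do_this_api")) gives a starting list to compare against what the playbook actually passes in. It is a crutch for the missing signature, not a replacement for one.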

Conclusion

Ansible is a good tool, and it has paid off the investment I made in it. Having said that, I am ready for the next tool and the next concept.

¹ The docker_container module requires Docker and docker-py installed on the host machine; all AWS-related modules require boto3 installed on the local machine.
² Setting display_skipped_hosts = False in ansible.cfg disables printing skipped tasks, but does that make it any less dangerous?
³ I’m aware that tags provide a better way to manage these things. But tags act as build modes, and in some situations the when condition depends on the result of a previous task. In these situations, tags don’t help at all.