Good and Bad Parts of Ansible after 2 Years of Usage
I have been using Ansible professionally for 2.5 years now. I read a blog post about Ansible alternatives, which made me re-evaluate why I use Ansible. So I wanted to make a list of pros and cons for future reference, so that I don’t forget what I like and dislike about Ansible.
These are coming from a software engineer who now works as an SRE. Everything you’ll read is opinionated.
How did I use Ansible?
I used Ansible with a Python middleware. This Python code prepares variables, decides which playbook to run, and runs a few playbooks in a row if necessary.
What did I use Ansible for?
- Mostly configure servers, install software
- Make deployments in a mutable infrastructure, run shell commands when things get messy
- Create/drop databases, create/drop users in these databases, assign roles, etc.
- Interact with AWS services such as S3 and EC2
Design principles we had
- Same resources, multiple environments
  We had a simple resource architecture, but a complicated service architecture. It was a pain to run all services in a local environment. Solutions like Vagrant were not really viable because they consumed almost all of the CPU and memory, and maintaining them would have required a support engineer just to help team members manage their local environments. So, ideally, we had to provision n development environments for n developers.
- Almost production-like environments
  Differences between production and development bite everyone in the industry. We aimed to maintain almost production-like environments to decrease the number of surprises. But you can’t make them identical to production: I don’t know any customer who wants to receive emails from your QA environment, or any company that wants to pay 20 times the price of its production resources.
- Maintain a small team
  We were, and still are, a small team, so we didn’t have the time to tweak a running wheel to fit our exact requirements. Therefore we needed to write less code and use good tools that cover most of our use cases.
Ansible: the good parts
I like Ansible for several reasons:
- Battle-tested: It is used by many companies in production environments, it is backed by a tech company, and it is under regular development.
- Declarative language: It is nice not to handle 3-4 cases of a task by hand, and instead add a parameter like `state: present`.
- Many ready-to-use modules: There are many modules I can use that cover most of my use cases, from OS-specific tasks to cloud service management.
- Low barrier of entry to the language: YAML has very simple rules, so it is quite easy to start using Ansible.
- Filters, functions, standard tools: Along with Jinja2 support, the standard filters and functions come in very handy.
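To make the declarative and filter points concrete, here is a minimal sketch. The host group `web`, the package `nginx`, the user `deploy`, and the variable `motd_text` are placeholders picked for illustration; the modules and the `upper` filter are standard ones.

```yaml
# Minimal sketch: declarative tasks that are safe to run repeatedly.
# "web", "nginx", "deploy" and "motd_text" are illustrative placeholders.
- name: Declarative state example
  hosts: web
  become: true
  vars:
    motd_text: "managed by ansible"
  tasks:
    - name: Ensure nginx is installed (does nothing if it already is)
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Ensure the deploy user exists
      ansible.builtin.user:
        name: deploy
        state: present

    - name: Show a standard Jinja2 filter in action
      ansible.builtin.debug:
        msg: "{{ motd_text | upper }}"
```

Neither of the first two tasks has to check whether the package or user already exists; the module works out what, if anything, needs to change.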
Ansible: the bad parts
As the saying goes, there are no silver bullets. This is where I complain, so it will be longer than the good parts :) What I don’t like about Ansible:
- Ansible has external dependencies on Python modules, both on the machine running the `ansible` commands and on the hosts the playbooks run against.[1] So first of all, you need to provision host machines before using Ansible on them.
- Secondly, you need to deal with Python’s dependency management problems from the start. If you are not doing fancy stuff that’s fine, but once you start to use software that has strong dependencies on Python 2, you get to bring in that week’s favorite virtual environment management tool. Also, you gotta plan for Python 2’s removal.
- Installing third-party libraries gives you some freedom and lets you install bug fixes without waiting for Ansible releases, but then you have to make sure all the libraries play nicely together.
- In complex software projects, readability is everything, especially if the code is used and maintained for more than 6 months. The signature of a method, function, or class needs to show clearly what it takes, what it does, and what it returns. In Ansible, you only have the second one: what it does. It is probably something like `install_nginx`. However, to understand which variables or facts it requires, you have to go to the role definition, search all the templates it uses, and search all the roles it includes. It is a mess.
- Skipping roles or tasks is possible, but not nice. I always want to use the same playbooks for development and production. But I can’t do that; the world is not that nice of a place. I want to provide a proper development experience to my colleagues, so I want to relax the strictness of production systems in development environments, for example by offering options like `please recreate the environment's database`. If I use the same playbook for production, I need a nasty `when` condition there. The first time I did that, I had a small heart attack when I saw `recreate_database` in a production deployment, and was relieved only after seeing that some 10 tasks were `skipped` immediately.[2] Of course, I have the liberty of providing different playbooks for production and development, which I already do right now, but then I have to maintain two different playbooks, and whenever I change something I have to keep in mind that the other playbook needs to change as well.[3]
- `when` conditions work for if-else situations, but if you need to store another task’s result and use it in the next task’s `when` condition, you forget how your code works after 1 month instead of the usual 3. You need to rediscover your own code more frequently.
- I like package isolation in general. If a playbook doesn’t use a role, it shouldn’t have access to it (ideally). What I can do with Ansible is as follows:
    common/
      - common_role1/tasks/main.yml
      - common_role2/tasks/main.yml
    api_setup/
      - roles/
        - do_this_api/tasks/main.yml
        - do_that_api/tasks/main.yml
      - setup1.yml
      - setup2.yml
    api_deploy/
      - roles/
        - do_this_api/tasks/main.yml
        - do_that_api/tasks/main.yml
      - deploy_safely.yml
      - deploy_by_nuking.yml
  API is one domain here; the same layout repeats for about five other domains. Ansible allows me to specify a roles lookup path, which I set to `common`. But it doesn’t allow me to approach the whole project from a domain-driven perspective. I’d like to have an `api` package that contains its own common roles folder, `api/common_roles`, so that each subproject under `api` has access to these common roles and the root common roles at the same time. But right now, the `api_deploy` and `web_deploy` playbooks share the same common roles path, which is far from ideal for me. This method solves my management problems today, but it creates a potential maintenance problem for the future.
- Ansible doesn’t show real-time output. There is a long-standing issue for that. This is especially painful for long tasks. When I’m running `docker_container` to build an image, I’m sometimes looking at a frozen log screen for more than 10 minutes. (Yes, I’m building Java, nice guess!)
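The two `when` patterns complained about above (a guarded destructive task, and a condition built from a previous task’s result) can be sketched in one playbook. This is only an illustration: the host group `db`, the variable `recreate_database`, the script paths, and the `pending` marker in the script output are all hypothetical.

```yaml
# Sketch only: host group, variable names, script paths, and the
# "pending" output marker are hypothetical.
- name: Deploy with an optional destructive step
  hosts: db
  vars:
    # Off unless explicitly overridden, e.g. -e recreate_database=true
    recreate_database: false
  tasks:
    - name: Recreate the environment's database (development only)
      ansible.builtin.command: /opt/app/bin/recreate_db.sh
      when: recreate_database | bool

    - name: Check whether a migration is pending
      ansible.builtin.command: /opt/app/bin/migration_status.sh
      register: migration_status
      changed_when: false   # a read-only check should not report "changed"

    - name: Run migrations only if the previous task reported pending work
      ansible.builtin.command: /opt/app/bin/migrate.sh
      when: "'pending' in migration_status.stdout"
```

In a production run the first task still appears in the output, marked as skipped, which is exactly the moment of doubt described above. The last task is the part that ages badly: its condition only makes sense if you remember what the registered variable contains.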
Conclusion
Ansible is a good tool; it has paid off the investment. Having said that, I am ready for the next tool and the next concept.
[1] The Docker container module requires Docker and `dockerpy` installed on the host machine; all `aws`-related modules require `boto3` installed on the local machine.
[2] The `display_skipped_hosts = False` option in `ansible.cfg` disables printing skipped tasks, but does it make it less dangerous?
[3] Tags provide a better way to manage these things, I’m aware. But tags act as build modes; in some situations, the `when` condition consists of the result of a previous task. In these situations, tags don’t help at all.
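The tag-based approach from the last footnote can be sketched as below, assuming Ansible 2.5 or later for the special `never` tag. The host group, the script path, and the tag name `recreate_database` are hypothetical.

```yaml
# Sketch only: host group, script path, and tag name are hypothetical.
- name: Deploy with a tag-gated destructive step
  hosts: db
  tasks:
    - name: Recreate the environment's database
      ansible.builtin.command: /opt/app/bin/recreate_db.sh
      # The special "never" tag keeps this task out of normal runs;
      # it runs only when requested with --tags recreate_database.
      tags:
        - never
        - recreate_database
```

This works as a build mode chosen on the command line. It cannot express a condition that depends on what an earlier task returned, which still needs `register` and `when`.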