I'm trying to write my first Ansible playbook to set up my Arch Linux workstations and servers. So far everything has worked fine, but now I've run into a problem I can't really wrap my head around.
I have multiple roles that need to change the mkinitcpio.conf file, often on the same line.
At the moment I have a role mkinitcpio which generates the /etc/mkinitcpio.conf file based on a template.
group_vars/all.yml
encryption:
  enabled: true
roles/mkinitcpio/tasks/main.yml
- name: Generate mkinitcpio.conf
  template:
    src: mkinitcpio.conf.j2
    dest: /etc/mkinitcpio.conf
  notify:
    - generate mkinitcpio
roles/mkinitcpio/handlers/main.yml
- name: generate mkinitcpio
  command: mkinitcpio -p linux
roles/mkinitcpio/templates/mkinitcpio.conf.j2
HOOKS=(base systemd autodetect keyboard sd-vconsole modconf block {% if encryption.enabled == true %}sdencrypt {% endif %}sd-lvm2 filesystems fsck)
Now I have a second role, plymouth, that also has to update mkinitcpio.conf, depending on whether it will be installed or not.
It has to add sd-plymouth between systemd and autodetect, and notify generate mkinitcpio after that.
The solutions I came up with so far are:
1.)
Make the plymouth role dependent on the mkinitcpio role.
The plymouth role installs plymouth, then modifies the HOOKS line via the lineinfile module. After that, it notifies generate mkinitcpio if needed.
This solution will get very complicated if more roles are added that also need to update the mkinitcpio.conf file.
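For illustration, a rough sketch of what such a lineinfile task in the plymouth role could look like (untested; the regexp and hook order are assumptions based on the template above):

# roles/plymouth/tasks/main.yml (sketch)
- name: Add sd-plymouth to the HOOKS line
  lineinfile:
    path: /etc/mkinitcpio.conf
    # only touch the line if sd-plymouth is not already present after systemd
    regexp: '^HOOKS=\(base systemd (?!sd-plymouth)(.*)$'
    line: 'HOOKS=(base systemd sd-plymouth \1'
    backrefs: yes
  notify:
    - generate mkinitcpio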
2.) Add another conditional to roles/mkinitcpio/templates/mkinitcpio.conf.j2:
HOOKS=(base systemd {% if plymouth == true %}sd-plymouth {% endif %}autodetect keyboard sd-vconsole modconf block {% if encryption.enabled == true %}sdencrypt {% endif %}sd-lvm2 filesystems fsck)
Make the mkinitcpio role dependent on the plymouth role, so that it installs plymouth first. After that, the mkinitcpio role updates the complete mkinitcpio.conf file and notifies generate mkinitcpio if needed.
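For reference, that reversed dependency would be declared roughly like this (a minimal sketch of roles/mkinitcpio/meta/main.yml; the conditional mirrors the plymouth variable used in the template):

# roles/mkinitcpio/meta/main.yml (sketch)
dependencies:
  - role: plymouth
    when: plymouth == true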
This solution does not seem right to me, because it has a "reversed dependency".
I hope I described my problem understandably and that you can give me some tips on the best way to solve it. Thank you in advance!
Related
I want to combine Ansible with Terraform so that Terraform creates the machines and Ansible provisions them. Using terraform-provisioner-ansible it's possible to bring them together seamlessly. But I noticed a lack of change detection, a problem that doesn't occur when Ansible runs standalone.
TL;DR: How can I apply changes made in Ansible to the Terraform Ansible plugin? Or at least execute the ansible plugin on every update so that Ansible can handle this itself?
Example use case
Consider this playbook which installs some packages
- name: Ansible install package test
  hosts: all
  tasks:
    - name: Install cli tools
      become: yes
      apt:
        name: "{{ tools }}"
        update_cache: yes
      vars:
        tools:
          - nnn
          - htop
which is integrated into Terraform using the plugin
resource "libvirt_domain" "ubuntu18" {
# ...
connection {
type = "ssh"
host = "192.168.2.2"
user = "ubuntu"
private_key = "${file("~/.ssh/id_rsa")}"
}
provisioner "ansible" {
plays {
enabled = true
become_method = "sudo"
playbook = {
file_path = "ansible-test.yml"
}
}
}
}
This works fine on the first run. But later I notice some package is missing:
- name: Ansible install package test
  hosts: all
  tasks:
    - name: Install cli tools
      become: yes
      apt:
        name: "{{ tools }}"
        update_cache: yes
      vars:
        tools:
          - nnn
          - htop
          - vim # This is a new package
When running terraform plan I get No changes. Infrastructure is up-to-date. My new package vim will never get installed! So Ansible didn't run, because if it had run, it would have installed the new package.
The problem seems to be the provisioner itself:
Creation-time provisioners are only run during creation, not during updating or any other lifecycle. They are meant as a means to perform bootstrapping of a system.
But what is the correct way of applying updates? I tried a null_resource with a depends_on link to my VM resource, but Terraform doesn't detect changes on the Ansible side either. This seems to be a lack of change detection in the Terraform plugin.
In the docs I only found destroy-time provisioners, but none for updates. I could destroy and re-create the machine, but that would slow things down a lot. I like the Ansible approach of checking what is present and only applying the changes that aren't already there; this seems like a good way of provisioning.
Isn't it possible to do something similar with Terraform?
With my current experience (more Ansible than Terraform), I don't see any other way than dropping the nice plugin and executing Ansible on my own. But that would also drop the nice integration, so I'd need to generate inventory files on my own or even by hand (which defeats the automation approach, in my view).
source_code_hash may be an option but is inflexible: with multiple plays/roles, I'd need to do this by hand for every single file, which quickly becomes error-prone.
Use a null_resource with a pseudo trigger
The idea from tedsmitt uses a timestamp as the trigger, which seems to be the only way to force a provisioner. However, running ansible-playbook plainly from the CLI would create the overhead of maintaining the inventory by hand. You can't call the Python dynamic inventory script from here, since terraform apply needs to complete first.
In my view, a better approach is to run the ansible provisioner here:
resource "null_resource" "ansible-provisioner" {
triggers {
build_number = "${timestamp()}"
}
depends_on = ["libvirt_domain.ubuntu18"]
connection {
type = "ssh"
host = "192.168.2.2"
user = "ubuntu"
private_key = "${file("~/.ssh/id_rsa")}"
}
provisioner "ansible" {
plays {
enabled = true
become_method = "sudo"
playbook = {
file_path = "ansible-test.yml"
}
}
}
}
The only drawback here is that Terraform will recognize a pseudo change every time:
Terraform will perform the following actions:
-/+ null_resource.ansible-provisioner (new resource required)
id: "3365240528326363062" => <computed> (forces new resource)
triggers.%: "1" => "1"
triggers.build_number: "2019-06-04T09:32:27Z" => "2019-06-04T09:34:17Z" (forces new resource)
Plan: 1 to add, 0 to change, 1 to destroy.
This seems like the best compromise to me, compared with the other workarounds available.
Run Ansible manually with dynamic inventory
Another way I found is the dynamic inventory plugin; a detailed description can be found in this blog entry. It integrates into Terraform and lets you specify resources as inventory hosts, for example:
resource "ansible_host" "k8s" {
inventory_hostname = "192.168.2.2"
groups = ["test"]
vars = {
ansible_user = "ubuntu"
ansible_ssh_private_key_file = "~/.ssh/id_rsa"
}
}
The Python script uses this information to generate a dynamic inventory, which can be used like this:
ansible-playbook -i /etc/ansible/terraform.py ansible-test.yml
A big benefit is that it keeps your configuration DRY: Terraform holds the leading configuration, so there is no need to also maintain separate Ansible inventory files. It also allows variable usage (e.g. the inventory hostname shouldn't be hardcoded for production use, as it is in my example).
In my use case (provisioning a Rancher test cluster) the null_resource approach seems better, since EVERYTHING is built with a single Terraform command and there is no need to additionally execute Ansible. But depending on the requirements, it can be better to keep Ansible as a separate step, so I posted this as an alternative.
Installing the plugin
When trying this solution, remember that you need to install the corresponding Terraform plugin from here:
version=0.0.4
wget https://github.com/nbering/terraform-provider-ansible/releases/download/v${version}/terraform-provider-ansible-linux_amd64.zip -O terraform-provisioner-ansible.zip
unzip terraform-provisioner-ansible.zip
chmod +x linux_amd64/*
mv linux_amd64 ~/.terraform.d/plugins
Also note that the automated provisioner from the solution above needs to be removed first, since it has the same name (it may conflict).
As you mentioned in your question, there is no change detection in the plugin. You could implement a trigger on a null_resource so that it runs on every apply.
resource "null_resource" "ansible-provisioner" {
triggers {
build_number = "${timestamp()}"
}
provisioner "local-exec" {
command = "ansible-playbook ansible-test.yml"
}
}
You can try this; it works for me.
resource "null_resource" "ansible-swarm-setup" {
local_file.ansible_inventory ]
#nhu
triggers= {
instance_ids = join(",",openstack_compute_instance_v2.swarm-cluster-hosts[*].id)
}
connection {
type = "ssh"
user = var.ansible_user
timeout = "3m"
private_key = var.private_ssh_key
host = local.cluster_ips[0]
}
}
When it detects changes in the instance index/ids, it will trigger the Ansible playbook.
I have some continuous integration checks which run a few ansible-playbook commands. Each playbook may be running many plays, including numerous large roles.
Every now and then, somebody introduces some change that causes a warning when ansible-playbook runs, e.g. something like this:
[WARNING]: when statements should not include jinja2 templating delimiters
such as {{ }} or {% %}. Found: "{{ some_variable}}" not in
some_result.stdout
or:
[WARNING]: Consider using unarchive module rather than running tar
or some deprecation warnings like:
[DEPRECATION WARNING]: ec2_facts is kept for backwards compatibility but usage
is discouraged. The module documentation details page may explain more about
this rationale.. This feature will be removed in a future release. Deprecation
warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
and so on. Sometimes these warnings pop up when we upgrade Ansible versions. Regardless of why they happen, I would really like some way to have the ansible-playbook command fail loudly when it produces one of these warnings, instead of quietly proceeding and letting my CI check succeed. Is there any way to do this? I'm using Ansible 2.4.3 currently.
I find lots of discussion about ways to hide these warnings, but haven't found anything about promoting them to hard errors.
I have the exact same problem. My workaround is:
run the playbook with --check and 'tee' to a temporary file
do some grep-magic to filter out 'WARNING]:'
do some grep-sed-magic to filter out results that are not zero (except ok/skipped)
I know it is not ideal, so if you come up with a nice solution please share :-)
There is an option any_errors_fatal in ansible.cfg; what about putting
any_errors_fatal = True
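For what it's worth, any_errors_fatal also exists as a play-level keyword; a minimal sketch (note that it aborts the whole play when any host fails a task, and on its own it does not turn warnings into errors):

- hosts: all
  any_errors_fatal: true
  tasks:
    - name: Some task that must succeed everywhere
      command: /bin/true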
To hide such [DEPRECATION WARNING] messages, as stated, you can have them
"disabled by setting deprecation_warnings=False in ansible.cfg."
But this may not be ideal, as in the future you would have no visibility into what may change or be deprecated.
If you are absolutely sure that you can ignore such a warning, you can use:
args:
  warn: false
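In context, that would look roughly like this (a sketch only; the tar command is illustrative, and the warn argument applies to the command/shell modules on older Ansible versions):

- name: Extract an archive with tar, suppressing the module suggestion warning
  command: tar -xzf /tmp/archive.tgz -C /opt
  args:
    warn: false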
Or modify your code as suggested in the warning message.
Otherwise, if you want to raise an error for any warnings, you can register the result and apply failed_when as in the example below.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Fail if there are any warnings
      shell: touch a.txt
      register: result
      failed_when: result.warnings is defined
What for? Do they interfere with the execution of the playbook? Most likely not. Just fix the playbook code.
As I understand it, removing a warning like this one is either impossible or very difficult (in any case, it did not work for me when I tried):
[WARNING]: when statements should not include jinja2 templating delimiters
such as {{ }} or {% %}. Found: "{{ some_variable}}" not in
some_result.stdout
Just use:
some_variable not in some_result.stdout
In the "when" conditions it is not necessary to set {{}} to get the variable value
Personally, I prefer to search through "find":
some_result.stdout.find(some_variable) == -1
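As a minimal sketch, assuming some_result was registered by an earlier task, the corrected condition would be used like this:

- name: Run only when the value is absent from the output
  command: /usr/bin/true
  when: some_variable not in some_result.stdout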
This warning:
[DEPRECATION WARNING]: ec2_facts is kept for backwards compatibility but usage
is discouraged. The module documentation details page may explain more about
this rationale.. This feature will be removed in a future release. Deprecation
warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
can be removed from output by setting:
deprecation_warnings = false
in the file ansible.cfg
This warning:
[WARNING]: Consider using unarchive module rather than running tar
is easy to remove if you use the Ansible unarchive module instead of running tar via command:
unarchive: src=file.tgz dest=/opt/dest
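Spelled out in YAML block style, a sketch of the replacement task (assuming the archive already sits on the managed host, which is why remote_src is set; drop it if the file should be copied from the controller):

- name: Unpack the archive with the unarchive module
  unarchive:
    src: /opt/file.tgz
    dest: /opt/dest
    remote_src: yes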
I have the following Ansible role which simply does the following:
Create a temporary directory.
Download Goss, a server testing tool, into that temporary directory.
Upload a main Goss YAML file for the tests.
Upload additional directories for additional included tests.
Here are a couple places where I'm using it:
naftulikay.python-dev
naftulikay.ruby-dev
Specifically, these playbooks upload a local file adjacent to the playbook named goss.yml and a directory goss.d again adjacent to the playbook.
Unfortunately, it seems that Ansible logic has changed recently, causing my tests to not work as expected. My role ships with a default goss.yml, and it appears that when I set goss_file: goss.yml within my playbook, it uploads degoss/files/goss.yml instead of the Goss file adjacent to my playbook.
If I'm passing the name of a file to a role, is there a way to specify that Ansible should look up the file in the context of the playbook or the current working directory?
The actual role logic that is no longer working is this:
# deploy test files including the main and additional test files
- name: deploy test files
  copy: src={{ item }} dest={{ degoss_test_root }} mode=0644 directory_mode=0755 setype=user_tmp_t
  with_items: "{{ [goss_file] + goss_addtl_files + goss_addtl_dirs }}"
  changed_when: degoss_changed_when
I am on Ansible 2.3.2.0 and I can reproduce this across distributions (namely CentOS 7, Ubuntu 14.04, and Ubuntu 16.04).
Ansible searches for relative paths in the role's scope first, then in the playbook's scope.
For example, if you want to copy the file test.txt in role r1, the search order is:
/path/to/playbook/roles/r1/files/test.txt
/path/to/playbook/roles/r1/test.txt
/path/to/playbook/roles/r1/tasks/files/test.txt
/path/to/playbook/roles/r1/tasks/test.txt
/path/to/playbook/files/test.txt
/path/to/playbook/test.txt
You can inspect your search_path order by calling ansible with ANSIBLE_DEBUG=1.
To answer your question, you have two options:
Use a filename that doesn't exist within the role's scope, like:
goss_file: local_goss.yml
Supply an absolute path. For example, you can use:
goss_file: '{{ playbook_dir }}/goss.yml'
Ansible doesn't apply search logic if the path is absolute.
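A minimal sketch of the second option as seen from the playbook, assuming the role is named degoss as in the question:

- hosts: all
  roles:
    - role: degoss
      goss_file: "{{ playbook_dir }}/goss.yml"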
Sometimes Ansible doesn't do what you want, and increasing verbosity doesn't help. For example, I'm currently trying to start the coturn server, which comes with an init script, on a systemd OS (Debian Jessie). Ansible considers it running, but it's not. How do I look into what's happening under the hood? Which commands are executed, and with what output and exit code?
Debugging modules
The most basic way is to run ansible/ansible-playbook with an increased verbosity level by adding -vvv to the execution line.
The most thorough way for the modules written in Python (Linux/Unix) is to run ansible/ansible-playbook with an environment variable ANSIBLE_KEEP_REMOTE_FILES set to 1 (on the control machine).
It causes Ansible to leave an exact copy of the Python scripts it executed (successfully or not) on the target machine.
The path to the scripts is printed in the Ansible log and for regular tasks they are stored under the SSH user's home directory: ~/.ansible/tmp/.
The exact logic is embedded in the scripts and depends on each module. Some are using Python with standard or external libraries, some are calling external commands.
Debugging playbooks
Similarly to debugging modules, increasing the verbosity level with the -vvv parameter causes more data to be printed to the Ansible log.
Since Ansible 2.1, the Playbook Debugger allows you to interactively debug failed tasks: check and modify the data, then re-run the task.
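The debugger is enabled per play via the debug strategy; a minimal sketch:

- hosts: all
  strategy: debug
  tasks:
    - name: A task that may fail and drop into the debugger
      command: /bin/false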
Debugging connections
Adding the -vvvv parameter to the ansible/ansible-playbook call causes the log to include debugging information for the connections.
Debugging Ansible tasks can be almost impossible if the tasks are not your own, contrary to what the Ansible website states:
No special coding skills needed
Ansible requires highly specialized programming skills because it is not YAML or Python, it is a messy mix of both.
The idea of using markup languages for programming has been tried before. XML was very popular in Java community at one time. XSLT is also a fine example.
As Ansible projects grow, the complexity grows exponentially as a result. Take, for example, the OpenShift Ansible project, which has the following task:
- name: Create the master server certificate
  command: >
    {{ hostvars[openshift_ca_host]['first_master_client_binary'] }} adm ca create-server-cert
    {% for named_ca_certificate in openshift.master.named_certificates | default([]) | lib_utils_oo_collect('cafile') %}
    --certificate-authority {{ named_ca_certificate }}
    {% endfor %}
    {% for legacy_ca_certificate in g_master_legacy_ca_result.files | default([]) | lib_utils_oo_collect('path') %}
    --certificate-authority {{ legacy_ca_certificate }}
    {% endfor %}
    --hostnames={{ hostvars[item].openshift.common.all_hostnames | join(',') }}
    --cert={{ openshift_generated_configs_dir }}/master-{{ hostvars[item].openshift.common.hostname }}/master.server.crt
    --key={{ openshift_generated_configs_dir }}/master-{{ hostvars[item].openshift.common.hostname }}/master.server.key
    --expire-days={{ openshift_master_cert_expire_days }}
    --signer-cert={{ openshift_ca_cert }}
    --signer-key={{ openshift_ca_key }}
    --signer-serial={{ openshift_ca_serial }}
    --overwrite=false
  when: item != openshift_ca_host
  with_items: "{{ hostvars
                | lib_utils_oo_select_keys(groups['oo_masters_to_config'])
                | lib_utils_oo_collect(attribute='inventory_hostname', filters={'master_certs_missing':True}) }}"
  delegate_to: "{{ openshift_ca_host }}"
  run_once: true
I think we can all agree that this is programming in YAML. Not a very good idea. This specific snippet could fail with a message like
fatal: [master0]: FAILED! => {"msg": "The conditional check 'item !=
openshift_ca_host' failed. The error was: error while evaluating
conditional (item != openshift_ca_host): 'item' is undefined\n\nThe
error appears to have been in
'/home/user/openshift-ansible/roles/openshift_master_certificates/tasks/main.yml':
line 39, column 3, but may\nbe elsewhere in the file depending on the
exact syntax problem.\n\nThe offending line appears to be:\n\n\n-
name: Create the master server certificate\n ^ here\n"}
If you hit a message like that, you are doomed. But we have the debugger, right? Okay, let's take a look at what is going on.
[master0] TASK: openshift_master_certificates : Create the master server certificate (debug)> p task.args
{u'_raw_params': u"{{ hostvars[openshift_ca_host]['first_master_client_binary'] }} adm ca create-server-cert {% for named_ca_certificate in openshift.master.named_certificates | default([]) | lib_utils_oo_collect('cafile') %} --certificate-authority {{ named_ca_certificate }} {% endfor %} {% for legacy_ca_certificate in g_master_legacy_ca_result.files | default([]) | lib_utils_oo_collect('path') %} --certificate-authority {{ legacy_ca_certificate }} {% endfor %} --hostnames={{ hostvars[item].openshift.common.all_hostnames | join(',') }} --cert={{ openshift_generated_configs_dir }}/master-{{ hostvars[item].openshift.common.hostname }}/master.server.crt --key={{ openshift_generated_configs_dir }}/master-{{ hostvars[item].openshift.common.hostname }}/master.server.key --expire-days={{ openshift_master_cert_expire_days }} --signer-cert={{ openshift_ca_cert }} --signer-key={{ openshift_ca_key }} --signer-serial={{ openshift_ca_serial }} --overwrite=false"}
[master0] TASK: openshift_master_certificates : Create the master server certificate (debug)> exit
How does that help? It doesn't.
The point here is that it is an incredibly bad idea to use YAML as a programming language. It is a mess. And the symptoms of the mess we are creating are everywhere.
Some additional facts: the prerequisites provisioning phase of OpenShift Ansible on Azure takes over 50 minutes. The deploy phase takes more than 70 minutes. Each time! First run or subsequent runs. And there is no way to limit provisioning to a single node. This limit problem was part of Ansible in 2012 and it is still part of Ansible today. That fact tells us something.
The point here is that Ansible should be used as it was intended: for simple tasks, without the YAML programming. It is fine for lots of servers, but it should not be used for complex configuration management tasks.
Ansible is not an Infrastructure as Code (IaC) tool.
If you ask how to debug Ansible issues, you are using it in a way it was not intended to be used. Don't use it as an IaC tool.
Here's what I came up with.
Ansible sends modules to the target system and executes them there. Therefore, if you change a module locally, your changes will take effect when running the playbook. On my machine the modules are at /usr/lib/python2.7/site-packages/ansible/modules (ansible-2.1.2.0), and the service module is at core/system/service.py. Ansible modules (instances of the AnsibleModule class declared in module_utils/basic.py) have a log method, which sends messages to the systemd journal if available, or falls back to syslog. So: run journalctl -f on the target system, add debug statements (module.log(msg='test')) to the module locally, and run your playbook. You'll see the debug statements under the ansible-basic.py unit name.
Additionally, when you run ansible-playbook with -vvv, you can see some debug output in the systemd journal: at least invocation messages, and error messages if any.
One more thing: if you try to debug code that's running locally with pdb (import pdb; pdb.set_trace()), you'll most likely run into a BdbQuit exception. That's because Python closes stdin when creating a thread (the Ansible worker). The solution here is to reopen stdin before running pdb.set_trace(), as suggested here:
sys.stdin = open('/dev/tty')
import pdb; pdb.set_trace()
Debugging roles/playbooks
Basically, debugging Ansible automation over a big inventory across large networks is nothing other than debugging a distributed network application. It can be very tedious and delicate, and there are not enough user-friendly tools.
Thus I believe the answer to your question is also a union of all the answers before mine, plus a small addition. So here:
Absolutely mandatory: you have to want to know what's going on, i.e. what you're automating and what you expect to happen. E.g. Ansible failing to detect a service with a systemd unit as running or as stopped usually means a bug in the service unit file or in the service module, so you need to 1. identify the bug, 2. report the bug to the vendor/community, 3. provide your workaround with a TODO and a link to the bug, and 4. delete your workaround once the bug is fixed.
To make your code easier to debug, use modules as much as you can.
Give all tasks and variables meaningful names.
Use static code analysis tools like ansible-lint. This saves you from really silly small mistakes.
Utilize the verbosity flags and the log path.
Use the debug module wisely.
"Know thy facts": sometimes it is useful to dump the target machine's facts into a file and pull it to the Ansible master (see the sketch at the end of this section).
Use strategy: debug. In some cases you can drop into a task debugger on error; you can then evaluate all the parameters the task is using and decide what to do next.
The last resort would be using the Python debugger, attaching it to the local Ansible run and/or to the remote Python executing the modules. This is usually tricky: you need an additional open port on the machine, and what if the code opening the port is the very thing causing the problem?
Also, sometimes it is useful to "look aside": connect to your target hosts and increase their debuggability (more verbose logging).
Of course, log collection makes it easier to track changes happening as a result of Ansible operations.
As you can see, like with any other distributed applications and frameworks, debuggability is still not what we'd wish for.
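To illustrate the "Know thy facts" tip above, a minimal sketch that dumps each host's gathered facts to a file on the control machine (paths are illustrative):

- name: Dump gathered facts to the control machine for inspection
  copy:
    content: "{{ hostvars[inventory_hostname] | to_nice_json }}"
    dest: "/tmp/facts-{{ inventory_hostname }}.json"
  delegate_to: localhost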
Filters/plugins
This is basically Python development; debug it as you would any Python app.
Modules
This depends on the technology, and is complicated by the fact that you need to see both what happens locally and what happens remotely, so you had better choose a language that is easy enough to debug remotely.
You can use the register keyword together with the debug module to print return values. For example, if I want to know the return code of my script "somescript.sh", I will have tasks inside the play such as:
- name: my task
  shell: "bash somescript.sh"
  register: output

- debug:
    msg: "{{ output.rc }}"
For the full set of return values you can access in Ansible, check this page: http://docs.ansible.com/ansible/latest/common_return_values.html
There are multiple levels of debugging that you might need, but the easiest one is to set the ANSIBLE_STRATEGY=debug environment variable, which will enable the debugger on the first error.
1st approach: Debug an Ansible module via the q module, printing debug logs with calls like q('Debug statement'). Check the q module's page to see where in the tmp directory the logs are generated; in most cases they end up at either $TMPDIR/q or /tmp/q, so you can run tail -f $TMPDIR/q to watch the logs once the play using the module runs (ref: q module).
2nd approach: If the play is running on localhost, you can use pdb to debug the play, following the respective doc: https://docs.ansible.com/ansible/latest/dev_guide/debugging.html
3rd approach: Use the Ansible debug module to print the play result and debug the module (ref: debug module).
I am using the zzet.rbenv role in my playbook. It has a files/default-gems file that it copies to the provisioned system.
I need my playbook to check for a myplaybook/files/default-gems and use it if it exists, falling back to zzet.rbenv/files/default-gems otherwise.
How can I do that?
After some research and trial and error, I found out that Ansible is not capable of checking whether files exist across roles. This is due to the way role dependencies (which are roles themselves) get expanded into the role requiring them, making them part of the playbook. There are no tasks that will let you differentiate my_role/files/my_file.txt from required_role/files/my_file.txt.
One approach to the problem (the one I found the easiest and cleanest) was to:
Add a variable to my_role with the path to the file I want to use (overriding the default one).
Add a task (identical to the one that uses the default file) that checks whether the above variable is defined and, if so, runs using it.
Example
required_role
# Existing task
- name: some task
  copy: src=roles_file.txt dest=some/directory/file.txt
  when: my_file_path is not defined

# My custom task
- name: my custom task (an alteration of the above task)
  copy: src={{ my_file_path }} dest=/some/directory/file.txt
  when: my_file_path is defined
my_role
#... existing code
my_file_path: "path/to/my/file"
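A hedged sketch of where that variable might be set, assuming the file should come from the playbook's own directory (the exact path is illustrative; an absolute or playbook_dir-based path avoids the role-scoped file lookup described in the previous answer):

# roles/my_role/defaults/main.yml (sketch)
my_file_path: "{{ playbook_dir }}/files/my_file.txt"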
As mentioned by Ramon de la Fuente: this solution was accepted into the zzet.rbenv repo :)