Ansible: Ignore errors and move to the next node

My Ansible playbook is set up to install Docker on all the nodes in a cluster.
As my input, I parse a list (array) of node IPs and create my inventory file.
In a loop, I run this playbook for each node.
What I observe is that the playbook fails and doesn't proceed to complete the installation on the following nodes if even one of the previous nodes fails (unreachable host).
How can I ignore this error and run the playbook for all the nodes in my list?

You need to add ignore_unreachable: yes to your play. There are two important things to take note of, though:
1. This needs Ansible version >= 2.7.
2. If the task fails for any reason other than "host unreachable", it will still abort the play. If you want to continue in this scenario, you will also need to add ignore_errors: yes.

Here's one way to do this. The playbook continues to the following nodes on both unreachable errors and task errors.
---
- hosts: all
  ignore_unreachable: true
  tasks:
    - ansible.builtin.ping:
      register: ping

    - when: ping.ping is defined
      block:
        - import_tasks: your_main_tasks.yml
      rescue:
        - ansible.builtin.debug:
            msg: failed
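
If you don't need the block/rescue recovery structure, here is a minimal sketch applying both keywords directly to a task; the install task and the docker.io package name are illustrative assumptions, not taken from the question:
---
- hosts: all
  ignore_unreachable: true      # don't abort for hosts that cannot be reached (Ansible >= 2.7)
  tasks:
    - name: Install Docker (package name is an assumption, adjust for your distro)
      ansible.builtin.package:
        name: docker.io
        state: present
      ignore_errors: true       # treat a failed install as non-fatal so the play keeps going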

Related

Ansible host reachability should not determine overall playbook success

A playbook which gathers uptimes from hosts supplied by an inventory plugin reports failure if any hosts are unreachable. I'd like the playbook to stop trying to run tasks against unreachable servers (i.e. the default behaviour) but not fail the playbook as a whole, since with hundreds of servers in the inventory it is likely that some have been torn down before the inventory is updated.
The first task in the playbook is setup (aka gather_facts), which is where I want to put the error handling:
- name: get some info about the host
  setup:
    gather_subset: minimal
  ignore_unreachable: true
...
- name: Do something with the facts
  write_data:
    blah
The intention is that the playbook runs gather_facts, taking note of unreachability, but not allowing that to cause the playbook as a whole to be marked as a failure.
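One possible shape for this, following the same pattern as the ignore_unreachable answer at the top of this page (a sketch only; write_data stands in for whatever module actually consumes the facts, as in the question):
- name: get some info about the host
  setup:
    gather_subset: minimal
  ignore_unreachable: true
  register: setup_result

- name: Do something with the facts
  write_data:
    blah
  when: setup_result.ansible_facts is defined   # only run where fact gathering actually succeeded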

Continuing Ansible on error in a multi-host deployment

In our deployment strategy, our playbooks take on the following structure:
workflow.yml
- hosts: host1
  tasks:
    - name: setup | create virtual machines on host1
      include_tasks: setup.yml

    - name: run | import a playbook that will target new virtual machines
      include: "virtual_machine_playbook/main.yml"

- hosts: host1
  tasks:
    - name: cleanup | destroy virtual machines on host1
      include_tasks: destroy.yml
virtual_machine_playbook/main.yml
- hosts: newCreatedVMs
  roles:
    - install
    - run
This works great most of the time. However, if for some reason virtual_machine_playbook/main.yml errors out, the last hosts block does not run and we have to destroy our VMs manually. I wanted to know if there is a way to mandate that each hosts block runs, regardless of what happens before it.
Other Notes:
We structure our playbooks this way because we would like everything to be as contained as possible. Variables created in each hosts block are rather important to the ones that follow, and splitting them out into separate files and invocations is something we have not had much success with.
We have tried the standard Ansible error-handling approaches, but most of the options only apply at the task level (blocks, ignore_errors, etc.).
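One pattern worth trying is a block with an always section, so the cleanup runs whether or not the earlier steps fail. This is a sketch, not the asker's exact layout: it assumes the VM work can be expressed as task files driven from host1, and configure_vms.yml is a hypothetical stand-in for the separate playbook:
- hosts: host1
  tasks:
    - block:
        - name: setup | create virtual machines on host1
          include_tasks: setup.yml

        - name: run | configure the new virtual machines
          include_tasks: configure_vms.yml   # hypothetical stand-in for virtual_machine_playbook/main.yml
      always:
        - name: cleanup | destroy virtual machines on host1
          include_tasks: destroy.yml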

Delegating a set of tasks to another host based on a conditional

My question is slightly complex. I have some multi-node cluster infrastructure that is orchestrated with Ansible, and nodes can be Kubernetes masters or slaves. Depending on that, I need to delegate a specific set of tasks to the master node if the current task is playing on a slave node.
For example, I have an inventory structure like this:
[k8s_master]
hostname ansible_ssh_host= ... etc.
[k8s_slaves]
hostname ansible_ssh_host= ... etc.
[k8s_cluster:children]
k8s_master
k8s_slaves
I have a task that checks whether the k8s node is a master or a slave and registers some value:
- name: Checking if node is kubernetes master
  stat:
    path: "{{kubeconf}}"
  register: master_conf
and I want to execute some set of tasks depending on the master_conf.stat.exists value (true or false): locally if the node is a k8s master, or delegated to the master if the node is a k8s slave. Problems:
1. I need to delegate a set of tasks or a dynamically included playbook, but delegate_to does not work with block: or include_tasks:.
2. I need to delegate this set of tasks depending on a conditional statement, or play it locally.
3. I need to pass the node hostname to this set of tasks even if they will be playing on a remote node. For example, I can set it like this:
set_fact:
  node_hostname: "{{ansible_hostname}}"
and then I need the variable {{node_hostname}} inside the tasks even if they were delegated. Then I need to register some variables during the play on the master node and use them in tasks on the slave node again.
I still can't find the right solution. I've tried something like:
- name: Including tasks to perform if we are on the master node
  include_tasks: set-of-tasks.yml
  when: master_conf.stat.exists

- name: Including tasks to perform if we are on the slave node
  include_tasks: set-of-tasks.yml
  delegate_to: "{{item}}"
  delegate_facts: true
  with_items: "{{groups.k8s_master}}"
  when: master_conf.stat.exists == false
but this doesn't work.
I resolved this case with a combination of include and an arbitrary variable containing the hostname of the node to which I need to delegate my tasks (include_tasks doesn't support arbitrary vars).
So the syntax is:
- name: Checking if node is kubernetes master
  stat:
    path: "{{kubeconf}}"
  register: master_conf

- name: Including tasks to perform if we are on the master node
  include: set-of-tasks.yml
  when: master_conf.stat.exists

- name: Including tasks to perform if we are on the slave node
  include: set-of-tasks.yml
  delegate_host: "{{item}}"
  with_items: "{{groups.k8s_master}}"
  when: master_conf.stat.exists == false
And then, in set-of-tasks.yml, adding delegate_to to the delegated tasks:
delegate_to: "{{delegate_host}}"
For some reason default(omit) doesn't work for me (Ansible tries to resolve the original node hostname and fails, though it works fine with that name from the inventory file with an IP specified). So I added something like this at the beginning of set-of-tasks.yml:
- name: Setting target kubernetes master node
  set_fact:
    delegate_host: "{{ansible_hostname}}"
  when: delegate_host is undefined
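
As a side note, current Ansible releases let you pass variables to include_tasks with the vars: keyword, so a similar effect should be possible without the long-deprecated include. A sketch along the lines of the resolution above, untested against the asker's setup:
- name: Including tasks to perform if we are on the slave node
  include_tasks: set-of-tasks.yml
  vars:
    delegate_host: "{{ item }}"            # consumed by delegate_to inside set-of-tasks.yml
    node_hostname: "{{ ansible_hostname }}"
  loop: "{{ groups.k8s_master }}"
  when: not master_conf.stat.exists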

Error handling with "failed_when" doesn't work

I wish to create an Ansible playbook to run on multiple servers in order to check for conformity to compliance rules. Therefore the playbook is supposed to check whether several services are disabled and whether they are stopped.
I do know that not all the services will be running or enabled on all machines. Therefore I wish to handle certain return codes within the playbook.
I tried the failed_when statement for this task. It seems like the way to go, as it allows me to set which return codes (RCs) to handle.
- hosts: server_group1
  remote_user: ansible
  tasks:
    - name: Stop services
      command: /bin/systemctl stop "{{ item }}"
      with_items:
        - cups
        - avahi
        - slapd
        - isc-dhcp-server
        - isc-dhcp-server6
        - nfs-server
        - rpcbind
        - bind9
        - vsftpd
        - dovecot
        - smbd
        - snmpd
        - squid
      register: output
      failed_when: "output.rc != 0 and output.rc != 1"
However, the loop only works if I use ignore_errors: True, which is not shown in the example code.
I do know that I wish to catch RCs of 0 and 1 from the command executed by the playbook. But no matter what, failed_when always generates a fatal error and my playbook fails.
I also tried the failed_when line without the quotes, but that doesn't change a thing.
Something is missing, but I don't see what.
Any advice?
"Therefore the playbook is supposed to check whether several services are disabled and whether they are stopped."
Your playbook isn't doing either of these things. If a service exists, whether or not it's running, then systemctl stop <service> will return successfully in most cases. The only time you'll get a non-zero exit code is if either (a) the service does not exist or (b) systemd is unable to stop the service for some reason.
Note that calling systemctl stop has no effect on whether or not a service is disabled; with your current playbook, all those services would start back up the next time the host boots.
If your goal is not simply to check that services are stopped and disabled, but rather to ensure that services are stopped and disabled, you could do something like this:
- name: "Stop services"
service:
name: "{{ item }}"
enabled: false
state: stopped
ignore_errors: true
register: results
loop:
- cups
- avahi
- slapd
- isc-dhcp-server
- isc-dhcp-server6
- nfs-server
- rpcbind
- bind9
- vsftpd
- dovecot
- smbd
- snmpd
- squid
You probably want ignore_errors: true here, because you want to run the task for every item in your list.
A different way of handling errors might be with something like:
failed_when: >-
  results is failed and
  "Could not find the requested service" not in results.msg|default('')
This would only fail the task if stopping a service failed for a reason other than the fact that there was no matching service on the target host.
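Because the loop result is registered, a follow-up task can report which services actually failed. A sketch using the results variable registered above (the msg field depends on what the service module returns on your hosts):
- name: Report services that could not be stopped or disabled
  debug:
    msg: "{{ item.item }}: {{ item.msg | default('failed for an unknown reason') }}"
  loop: "{{ results.results }}"
  when: item is failed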

How to run an ansible task only once regardless of how many targets there are

Consider the following Ansible play:
- name: stop tomcat
  gather_facts: false
  hosts: pod1
  pre_tasks:
    - include_vars:
        dir: "vars/{{ environment }}"
  vars:
    hipchat_message: "stop tomcat pod1 done."
    hipchat_notify: "yes"
  tasks:
    - include: tasks/stopTomcat8AndClearCache.yml
    - include: tasks/stopHttpd.yml
    - include: tasks/hipchatNotification.yml
This stops Tomcat on n servers. What I want it to do is send a HipChat notification when it's done. However, this code sends a separate HipChat message for each server the task happens on, which floods the HipChat window with redundant messages. Is there a way to make the HipChat task happen once after the stop tomcat/stop httpd tasks have been done on all the targets? I want the task to shut down Tomcat on all the servers, then send one HipChat message saying "tomcat stopped on pod 1".
You can conditionally run the hipchat notification task on only one of the pod1 hosts.
- include: tasks/hipChatNotification.yml
  when: inventory_hostname == groups.pod1[0]
Alternately you could only run it on localhost if you don't need any of the variables from the previous play.
- name: Run notification
  gather_facts: false
  hosts: localhost
  tasks:
    - include: tasks/hipchatNotification.yml
You could also use the run_once flag on the task itself.
- name: Do a thing on the first host in a group.
  debug:
    msg: "Yay only prints once"
  run_once: true

- name: Run this block only once per host group
  block:
    - name: Do a thing on the first host in a group.
      debug:
        msg: "Yay only prints once"
      run_once: true
Ansible handlers are made for this type of problem where you want to run a task once at the end of an operation even though it may have been triggered multiple times in the play.
You can define a handlers section in your playbook and notify it from the tasks; the handlers will not run unless notified by a task, and will only run once regardless of how many times they are notified.
handlers:
  - name: hipchat notify
    hipchat:
      room: someroom
      msg: tomcat stopped on pod 1
In your play tasks, just include a notify on the tasks that should trigger the handler; if they report a changed status, the handler will run after all tasks have executed.
- name: Stop service httpd, if started
  service:
    name: httpd
    state: stopped
  notify:
    - hipchat notify
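
If the notification needs to go out before the very end of the play, notified handlers can also be flushed early with the meta module; a short sketch:
- meta: flush_handlers   # run any notified handlers immediately instead of waiting for the end of the play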
