Error handling with "failed_when" doesn't work - ansible

I want to create an Ansible playbook that runs on multiple servers in order to check for conformity with our compliance rules. The playbook is supposed to check whether several services are disabled and whether they are stopped.
I know that not all of the services will be running or enabled on every machine, so I want to handle certain return codes within the playbook.
I tried the failed_when statement for this task. It seems like the way to go, as it lets me choose which return codes to treat as failures.
- hosts: server_group1
  remote_user: ansible
  tasks:
    - name: Stop services
      command: /bin/systemctl stop "{{ item }}"
      with_items:
        - cups
        - avahi
        - slapd
        - isc-dhcp-server
        - isc-dhcp-server6
        - nfs-server
        - rpcbind
        - bind9
        - vsftpd
        - dovecot
        - smbd
        - snmpd
        - squid
      register: output
      failed_when: "output.rc != 0 and output.rc != 1"
However, the loop only completes if I use ignore_errors: True, which is not shown in the example code.
I want to treat return codes of 0 and 1 from the command as acceptable, but no matter what, failed_when always generates a fatal error and my playbook fails.
I also tried the failed_when line without the quotes, but that doesn't change a thing.
Something is missing, but I don't see what. Any advice?

Therefore the playbook is supposed to check whether several services are disabled and whether they are stopped.
Your playbook isn't doing either of these things. If a service exists, whether or not it's running, then systemctl stop <service> will return successfully in most cases. The only time you'll get a non-zero exit code is if either (a) the service does not exist or (b) systemd is unable to stop the service for some reason.
Note that calling systemctl stop has no effect on whether or not a service is disabled; with your current playbook, all those services would start back up the next time the host boots.
If your goal is not simply to check that services are stopped and disabled, but rather to ensure that services are stopped and disabled, you could do something like this:
- name: "Stop services"
service:
name: "{{ item }}"
enabled: false
state: stopped
ignore_errors: true
register: results
loop:
- cups
- avahi
- slapd
- isc-dhcp-server
- isc-dhcp-server6
- nfs-server
- rpcbind
- bind9
- vsftpd
- dovecot
- smbd
- snmpd
- squid
You probably want ignore_errors: true here, because you want the task to run for every item in your list.
A different way of handling errors might be with something like:
failed_when: >-
  results is failed and
  "Could not find the requested service" not in results.msg|default('')
This would only fail the task if stopping the service failed for a reason other than the fact that there was no matching service on the target host.
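Putting those pieces together, a minimal sketch (reusing your service list, shortened here to two items) that relies on failed_when instead of ignore_errors could be:

- name: "Stop and disable services"
  service:
    name: "{{ item }}"
    enabled: false
    state: stopped
  register: results
  # treat "service does not exist" as acceptable, anything else as a real failure
  failed_when: >-
    results is failed and
    "Could not find the requested service" not in results.msg|default('')
  loop:
    - cups
    - avahi

Unlike a blanket ignore_errors: true, this keeps genuine failures (e.g. systemd refusing to stop a unit) fatal while letting missing services pass.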

Related

Restarting a service after looped commands on multiple servers

I poked around a bit here but didn't see anything that quite matched up to what I am trying to accomplish, so here goes.
So I've put together my first Ansible playbook which opens or closes one or more ports on the firewall of one or more hosts, for one or more specified IP addresses. Works great so far. But what I want to do is restart the firewall service after all the tasks for a given host are complete (with no errors, of course).
NOTE: The hostvars/localhost references just hold vars_prompt input from the user in a task list above this one. I store the prompted data in hosts: localhost, build a dynamic host list based on what the user entered, and then have a separate task list to actually do the work.
So:
- name: Execute remote firewall-cmd for each host in "dynamically created host group"
  hosts: dynamically_created_host_list
  gather_facts: no
  tasks:
    - set_fact:
        hostList: "{{ hostvars['localhost']['hostList'] }}"
    - set_fact:
        portList: "{{ hostvars['localhost']['portList'] }}"
    - set_fact:
        portStateRequested: "{{ hostvars['localhost']['portStateRequested'] }}"
    - set_fact:
        portState: "{{ hostvars['localhost']['portState'] }}"
    - set_fact:
        remoteIPs: "{{ hostvars['localhost']['remoteIPs'] }}"
    - name: Invoke firewall-cmd remotely
      firewalld:
        # .. module-specific stuff here ...
      with_nested:
        - "{{ remoteIPs.split(',') }}"
        - "{{ portList.split(',') }}"
      register: requestStatus
In my original version of the script, which only did 1 port for 1 host for 1 IP, I just did:
- name: Reload firewalld
  when: requestStatus.changed
  systemd:
    name: firewalld
    state: reloaded
But I don't think that will work as easily here because of the nesting. For example, let's say I want to open port 9999 for a remote IP address of 1.1.1.1 on 10 different hosts, and the 5th host errors out for some reason. I may not want to restart the firewall service at that point.
Actually, now that I think about it, in that scenario there would be 4 new entries in the firewall config and 6 that didn't take because of the error. So now I'm wondering if I need to track the successes and have a rescue block within the playbook to back out the entries that did go through.
Grrr.... any ideas? Sorry, new to Ansible here. Plus, I hate YAML for things like this. :D
Thanks in advance for any guidance.
It looks to me like what you are looking for is what Ansible calls handlers.
As we’ve mentioned, modules should be idempotent and can relay when they have made a change on the remote system. Playbooks recognize this and have a basic event system that can be used to respond to change.
These ‘notify’ actions are triggered at the end of each block of tasks in a play, and will only be triggered once even if notified by multiple different tasks.
For instance, multiple resources may indicate that apache needs to be restarted because they have changed a config file, but apache will only be bounced once to avoid unnecessary restarts.
Note that handlers are simply a pair of:
- a notify attribute on one or multiple tasks
- a handler, with a name matching the above-mentioned notify attribute
So your playbook should look like:
- name: Execute remote firewall-cmd for each host in "dynamically created host group"
  hosts: dynamically_created_host_list
  gather_facts: no
  tasks:
    # set_fact tasks removed for concision
    - name: Invoke firewall-cmd remotely
      firewalld:
        # .. module-specific stuff here ...
      with_nested:
        - "{{ remoteIPs.split(',') }}"
        - "{{ portList.split(',') }}"
      notify: Reload firewalld
  handlers:
    - name: Reload firewalld
      systemd:
        name: firewalld
        state: reloaded

Ansible: Ignore errors and move to the next node

My Ansible playbook is set up to install Docker on all the nodes in a cluster.
As my input, I parse a list (array) of node IPs and create my inventory file.
In a loop, I run this playbook for each node.
What I observe is that the playbook fails and doesn't proceed to complete the installation on the following nodes if even one of the previous nodes fails (unreachable host).
How can I ignore this error and run the playbook for all the nodes in my list?
You need to add ignore_unreachable: yes to your play. There are 2 important things to take note of, though:
- This needs Ansible version >= 2.7.
- If the task fails for any reason other than "host unreachable", it will still abort the play. If you want to continue in this scenario, you will also need to add ignore_errors: yes.
Here's one way to do this. The playbook continues to the following nodes on both unreachable errors and task errors.
---
- hosts: all
  ignore_unreachable: true
  tasks:
    - ansible.builtin.ping:
      register: ping

    - when: ping.ping is defined
      block:
        - import_tasks: your_main_tasks.yml
      rescue:
        - ansible.builtin.debug:
            msg: failed
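If you don't need the rescue branch, a minimal per-task variant (the package task here is just an illustrative placeholder) combining both settings could look like:

---
- hosts: all
  ignore_unreachable: true   # requires Ansible >= 2.7; unreachable hosts no longer abort the play
  tasks:
    - name: Install docker
      ansible.builtin.package:
        name: docker
        state: present
      ignore_errors: yes     # keep going on this host even if the task itself fails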

How to run an ansible task only once regardless of how many targets there are

Consider the following Ansible task:
- name: stop tomcat
  gather_facts: false
  hosts: pod1
  pre_tasks:
    - include_vars:
        dir: "vars/{{ environment }}"
  vars:
    hipchat_message: "stop tomcat pod1 done."
    hipchat_notify: "yes"
  tasks:
    - include: tasks/stopTomcat8AndClearCache.yml
    - include: tasks/stopHttpd.yml
    - include: tasks/hipchatNotification.yml
This stops tomcat on n servers. What I want it to do is send a HipChat notification when it's done. However, this code sends a separate HipChat message for each server the task runs on, which floods the HipChat window with redundant messages. Is there a way to make the HipChat task happen once, after the stop tomcat/stop httpd tasks have been done on all the targets? I want the play to shut down tomcat on all the servers, then send one HipChat message saying "tomcat stopped on pod 1".
You can conditionally run the hipchat notification task on only one of the pod1 hosts.
- include: tasks/hipChatNotification.yml
  when: inventory_hostname == groups.pod1[0]
Alternately you could only run it on localhost if you don't need any of the variables from the previous play.
- name: Run notification
  gather_facts: false
  hosts: localhost
  tasks:
    - include: tasks/hipchatNotification.yml
You could also use the run_once flag on the task itself.
- name: Do a thing on the first host in a group.
  debug:
    msg: "Yay only prints once"
  run_once: true

- name: Run this block only once per host group
  block:
    - name: Do a thing on the first host in a group.
      debug:
        msg: "Yay only prints once"
  run_once: true
Ansible handlers are made for this type of problem where you want to run a task once at the end of an operation even though it may have been triggered multiple times in the play.
You can define a handlers section in your playbook and notify it from the tasks. Handlers will not run unless notified by a task, and will only run once regardless of how many times they are notified.
handlers:
  - name: hipchat notify
    hipchat:
      room: someroom
      msg: tomcat stopped on pod 1
In your play tasks, just include a notify on the tasks that should trigger the handler; if they report changed, the handler will run after all tasks have executed.
- name: Stop service httpd, if started
  service:
    name: httpd
    state: stopped
  notify:
    - hipchat notify
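Note that notified handlers normally run at the end of the play. If you ever need the notification to fire before later tasks run, you can flush pending handlers early with the built-in meta task:

- name: Run any handlers notified so far, right now
  meta: flush_handlers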

How to view the single service status via Ansible with modules rather than shell?

I need to check a single service's status on multiple systems using an Ansible playbook. Is there a way to do this with the service module rather than the shell module?
With the service module (and the related more specific modules like systemd) you can make sure that a service is in a desired state.
For example, the following task will enable apache to start at boot if not already configured, start apache if it is stopped, and report changed if any change was made or ok if no change was needed.
- name: Enable and start apache
  service:
    name: apache
    enabled: yes
    state: started
Simply checking the service status without any change is not supported by those modules. You will have to use the command line and analyse the output / return status.
Example with systemd:
- name: Check status of my service
  command: systemctl -q is-active my_service
  check_mode: no
  failed_when: false
  changed_when: false
  register: my_service_status

- name: Report status of my service
  debug:
    msg: "my_service is {{ (my_service_status.rc == 0) | ternary('Up', 'Down') }}"
To be noted:
- check_mode: no makes the task run whether or not you use --check on the ansible-playbook command line. Without this, in check mode, the next task would fail with an undefined variable.
- failed_when: false prevents the task from failing when the return code is different from 0 (i.e. when the service is not started). You can be more specific by listing all the return codes expected in normal conditions and failing on any other, as sketched after this list.
- changed_when: false makes the task always report ok instead of changed, which is the default for the command and shell modules.
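For instance, a stricter version of the check task (assuming 0 and 3 are the only codes systemctl is-active returns in normal conditions for your service; adjust the list as needed) might be:

- name: Check status of my service
  command: systemctl -q is-active my_service
  check_mode: no
  register: my_service_status
  changed_when: false
  # 0 = active, 3 = inactive; treat any other return code as a real error
  failed_when: my_service_status.rc not in [0, 3]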

Using Ansible to stop service that might not exist

I am using Ansible 2.6.1.
I am trying to ensure that a certain service is not running on target hosts.
The problem is that the service might not exist at all on some hosts, in which case Ansible fails with an error because of the missing service. Services are run by systemd.
Using service module:
- name: Stop service
  service:
    name: '{{ target_service }}'
    state: stopped
Fails with error Could not find the requested service SERVICE: host
Trying with command module:
- name: Stop service
  command: service {{ target_service }} stop
Gives error: Failed to stop SERVICE.service: Unit SERVICE.service not loaded.
I know I could use ignore_errors: yes, but it might hide real errors too.
Another solution would be having two tasks: one checking for the existence of the service, and another that runs only when the first task found the service. But that feels complex.
Is there a simpler way to ensure that a service is stopped while avoiding errors if the service does not exist?
I'm using the following steps:
- name: Get the list of services
  service_facts:

- name: Stop service
  systemd:
    name: <service_name_here>
    state: stopped
  when: "'<service_name_here>.service' in services"
service_facts could be called once in the gathering facts phase.
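For illustration, here is a minimal sketch (sshd is just an example unit name) of inspecting what service_facts collected:

- name: Get the list of services
  service_facts:

- name: Report the collected state of one service, if present
  debug:
    msg: "{{ ansible_facts.services['sshd.service'].state | default('not installed') }}"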
The following will register the module output in service_stop; if the module execution's standard output does not contain "Could not find the requested service" AND the service fails to stop based on the return code, the module execution will fail. Since you did not include the entire stack trace, I am assuming the error you posted is in the standard output; you may need to adjust slightly based on your error.
- name: Stop service
  register: service_stop
  failed_when:
    - '"Could not find the requested service" not in service_stop.stdout'
    - service_stop.rc != 0
  service:
    name: '{{ target_service }}'
    state: stopped
IMHO there isn't a simpler way to ensure that a service is stopped. The Ansible service module doesn't check for the service's existence; either (1) more than one task, or (2) a command that checks the service's existence is needed. The command would be OS-specific. For example, for FreeBSD (using shell rather than command, since the pipe and && are shell features):
shell: "service -e | grep {{ target_service }} && service {{ target_service }} stop"
Same solution as Vladimir's, but for Ubuntu (systemd) and with better state handling:
- name: restart {{ target_service }} if exists
  shell: if systemctl is-enabled --quiet {{ target_service }}; then systemctl restart {{ target_service }} && echo restarted; fi
  register: output
  changed_when: "'restarted' in output.stdout"
It produces 3 states:
- service is absent or disabled — ok
- service exists and was restarted — changed
- service exists and restart failed — failed
Same solution as #ToughKernel's, but using systemd to manage the service.
- name: disable ntpd service
  systemd:
    name: ntpd
    enabled: no
    state: stopped
  register: stop_service
  failed_when:
    # the order is important: only when failed == true will the
    # result have a 'msg' attribute
    - stop_service.failed == true
    - '"Could not find the requested service" not in stop_service.msg'
When the service module fails, check if the service that needs to be stopped is installed at all. That is similar to this answer, but avoids the rather lengthy gathering of service facts unless necessary.
- name: Stop a service
  block:
    - name: Attempt to stop the service
      service:
        name: < service name >
        state: stopped
  rescue:
    - name: Get the list of services
      service_facts:
    - name: Verify that the service is not installed
      assert:
        that:
          - "'< service name >.service' not in services"
