Ansible does not cancel a job after running into a fatal error (AWX) - ansible

I have the following issue with Ansible (AWX):
When a job fails with fatal: [IP]: FAILED!, Ansible does not cancel the job, and AWX keeps displaying "Running" forever. I have to cancel those jobs manually, which is quite annoying.
The reason why ansible fails does not matter here.
I've tried to solve this problem by adding
- name: Fail task when the command error output prints FAILED
  ansible.builtin.command: /usr/bin/example-command -x -y -z
  register: command_result
  failed_when: "'FAILED' in command_result.stderr"
at the top of the playbook, but it won't work.
If you have any ideas...
Thanks!

Playbooks support asynchronous mode, which you can use if you need to establish a timeout for a task. Ansible waits until the task either completes, fails, or times out. To do this, use the async and poll parameters: async sets the timeout for the task, and poll sets the interval at which Ansible checks the status of the task. Both values are in seconds.
You could try the following:
- name: Fail task when the command error output prints FAILED
  ansible.builtin.command: /usr/bin/example-command -x -y -z
  register: command_result
  async: 60
  poll: 15
For more information:
https://docs.ansible.com/ansible/latest/user_guide/playbooks_async.html#asynchronous-playbook-tasks

Related

Ansible task failing with shutdown command on win_command

I have a playbook that is meant to hibernate several machines at once. When I run it, the command executes on the first host in the list without a problem, but the playbook then hangs on that host.
My question is: how can I simply send those commands without waiting for a response from the nodes?
Here is the task that I am using:
- name: Hibernate
  win_shell: 'shutdown /h'
If you do not want to wait for the return of a command, you can use asynchronous actions and polling:
If you want to run multiple tasks in a playbook concurrently, use async with poll set to 0. When you set poll: 0, Ansible starts the task and immediately moves on to the next task without waiting for a result. Each async task runs until it either completes, fails or times out (runs longer than its async value). The playbook run ends without checking back on async tasks.
Source: https://docs.ansible.com/ansible/latest/user_guide/playbooks_async.html#run-tasks-concurrently-poll-0
So, for your task:
- name: Hibernate
  win_shell: 'shutdown /h'
  async: 45
  poll: 0

How can I slow down ansible?

Is there a way to slow down Ansible by placing a "sleep 5 seconds" between every server run?
For example, I would require something like that:
# --sleep-in-between is a hypothetical flag that would easily explain what I'm looking for
ansible production_servers -a "systemctl restart network" -f 1 --sleep-in-between 5secs
So, if production_servers is a group of servers: server_1, server_2, server_3 then the above command will perform the following:
Output:
server_1: Executing systemctl restart network
sleep 5 seconds
server_2: Executing systemctl restart network
sleep 5 seconds
server_3: Executing systemctl restart network
sleep 5 seconds
I need automation to take place slowly so that I can observe the system for any glitches taking place in the monitoring system while ansible is running.
Q: "Is there a way to slow down ansible by placing a "sleep 5 seconds" between every server run?"
A: Yes, there is. Use the wait_for module and set serial to 1. For example:
shell> cat playbook.yml
- hosts: all
  serial: 1
  tasks:
    - debug:
        msg: systemctl restart network
    - wait_for:
        timeout: "{{ sleep_in_between|default(5) }}"
By default, the play waits 5 seconds on each host before it finishes with that host and moves on to the next one. You can change the delay from the command line. For example, to sleep for 10 seconds:
shell> ansible-playbook -e "sleep_in_between=10" playbook.yml

Ansible Debugger: continue execution when a task fails

I'm aware that the continue command will continue running the playbook, however it will stop running tasks on the failed host. Is there any way to continue running tasks on the host that failed?
https://docs.ansible.com/ansible/latest/user_guide/playbooks_debugger.html
At the task level, if a failure is likely (for example, stopping a server that is already stopped), you can set "ignore_errors: yes" to tell Ansible that it is OK for the task to fail and that it should continue, e.g.:
- name: This task could fail, but it doesn't matter; continue with the next task
  shell: "stop_server.sh"
  ignore_errors: yes
At the host level, you already have the configuration to continue.
Not sure if that's what you're looking for.
A workaround: ignore a task's errors whenever the playbook runs in check mode by adding the following field to the task:
  ignore_errors: "{{ ansible_check_mode }}"

Long running command in Ansible ending in failed status with host unreachable

I have to run a command in Ansible that takes 30+ minutes to complete. The command is nodetool repair in Cassandra, and it has to be executed serially: if we start it in parallel, it hangs the process on all machines, because a Cassandra repair cannot run on all machines at the same time.
So we run it serially. However, the command sometimes takes a long time to complete.
Because the command takes so long, my Ansible playbook dies after waiting for some time, with a message that the node is unreachable:
{"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}
Is there a way to wait for the process to complete?
I am using serial: 1 for hosts and running the task below:
task:
  - name: Execute nodetool repair
    command: "{{ cassandra_installation_dir }}/bin/nodetool repair -j 4"
You should use async for this:
# A value that starts with a Jinja2 expression must be quoted to be valid YAML.
- name: Execute nodetool repair
  command: "{{ cassandra_installation_dir }}/bin/nodetool repair -j 4"
  async: 3600
  poll: 10
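This runs the command in asynchronous mode for at most 3600 seconds (1 h) and checks every 10 seconds whether it has finished. (If you omit poll, the interval falls back to a default that depends on your Ansible version and configuration; in current versions it is governed by the DEFAULT_POLL_INTERVAL setting.) If the command has not finished after 1 h, the task fails.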
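If you would rather start the repair and check on it in a separate task, the documented alternative is poll: 0 combined with the async_status module. The following is only a sketch under assumptions: the variable names repair_job and repair_result are hypothetical, and the retries and delay values are chosen so that the check covers the same 3600 seconds as async.

```yaml
# Start the repair in the background; poll: 0 returns immediately.
- name: Start nodetool repair without blocking
  ansible.builtin.command: "{{ cassandra_installation_dir }}/bin/nodetool repair -j 4"
  async: 3600            # upper time limit for the background job, in seconds
  poll: 0
  register: repair_job   # hypothetical variable name

# Ask the host for the job status until it reports that it finished.
- name: Wait for nodetool repair to finish
  ansible.builtin.async_status:
    jid: "{{ repair_job.ansible_job_id }}"
  register: repair_result   # hypothetical variable name
  until: repair_result.finished
  retries: 360              # 360 checks x 10 s delay = 3600 s
  delay: 10
```

With serial: 1 both tasks still run on one host before the play moves on to the next, so the repairs remain serialized.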

How to run a command that will reset the network interfaces in ansible?

I want to run a command on a remote machine. The command will reset the network interfaces. How do I run this in an Ansible playbook?
- name: Execute config command
  sudo: yes
  shell: "mycommand"
  async: 0
  poll: 0
  ignore_errors: true
The above task does not work consistently. Even when I tried async: 300, I observed the same inconsistency.
You're likely running into a situation similar to the one I describe in this question. Depending on the command you are running (mycommand) the network connection is likely dropping very quickly, causing Ansible to think that the connection was dropped unexpectedly. When this happens it will cause Ansible to treat it as an error.
You likely want to modify mycommand to include a sleep of a few seconds before the reset occurs, and continue using async together with poll: 0. Note that async needs a value greater than 0 for the task to actually be sent to the background; async: 0 makes the task run synchronously. This gives Ansible enough time to launch mycommand in the background and cleanly disconnect from the server without error before the server resets the network connection.
Depending on what your next task is, you may also want to include a wait_for task that runs via local_action to ensure Ansible waits for the network reset to complete before attempting any other tasks.
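Put together, the approach could look roughly like the sketch below. Treat it as an illustration under assumptions rather than a tested recipe: the 5-second sleep, the async value of 60, SSH on port 22, and the wait_for delay and timeout are all example values, become stands in for the older sudo keyword, and delegate_to: localhost is used as the current equivalent of local_action.

```yaml
# Delay the reset a few seconds so Ansible can background the command
# and disconnect cleanly before the network goes down.
- name: Execute config command in the background
  ansible.builtin.shell: "sleep 5 && mycommand"
  become: true
  async: 60     # assumed value; must be > 0 for the task to run in the background
  poll: 0       # do not wait for the result

# Runs on the control machine and probes the managed host's SSH port.
- name: Wait for the host to accept SSH connections again
  ansible.builtin.wait_for:
    host: "{{ ansible_host | default(inventory_hostname) }}"
    port: 22        # assumed SSH port
    delay: 10       # assumed: give the reset time to start before probing
    timeout: 300    # assumed upper bound for the reset
  delegate_to: localhost
```

The second task only confirms that the port is reachable again; if later tasks need a fully working connection, a longer delay may be necessary.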

Resources