Long running command in Ansible ending in failed status with host unreachable - ansible

I have to run a command in Ansible that takes 30+ minutes to complete. The command is nodetool repair in Cassandra, and it has to be executed serially: repair cannot run in parallel on all machines, and if we start it in parallel it hangs the process on all of them.
So we run it serially. However, the command sometimes takes a long time to complete, and my Ansible playbook dies after waiting for some time with a node unreachable message:
{"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}
Is there a way to wait for the process to complete?
I am using serial: 1 for the hosts and running the task below:
tasks:
  - name: Execute nodetool repair
    command: "{{ cassandra_installation_dir }}/bin/nodetool repair -j 4"

You should use async for this:
- name: Execute nodetool repair
  command: "{{ cassandra_installation_dir }}/bin/nodetool repair -j 4"
  async: 3600
  poll: 10
This runs the command in asynchronous mode for a maximum of 3600 seconds (1 h) and checks every 10 seconds whether it has finished (which was the playbook default anyway; current releases take the default from the DEFAULT_POLL_INTERVAL setting). If the command hasn't finished after 1 h, the task fails.
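A variation on the same idea is to start the job with poll: 0 and then check on it with the async_status module, which makes the waiting step explicit. The following is only a minimal sketch: the inventory group name cassandra is a hypothetical placeholder, and the retries and delay values are illustrative, chosen so that 120 × 30 s matches the 3600 s async limit.

```yaml
- hosts: cassandra          # hypothetical group name, adjust to your inventory
  serial: 1                 # one node at a time, as in the question
  tasks:
    - name: Start nodetool repair in the background
      command: "{{ cassandra_installation_dir }}/bin/nodetool repair -j 4"
      async: 3600           # upper limit for the job, in seconds
      poll: 0               # do not wait here
      register: repair_job

    - name: Wait for the repair to finish
      async_status:
        jid: "{{ repair_job.ansible_job_id }}"
      register: repair_result
      until: repair_result.finished
      retries: 120          # illustrative: 120 checks ...
      delay: 30             # ... 30 seconds apart
```

Because serial: 1 applies to the whole play, both tasks complete on one node before the next node starts, so repairs still never overlap.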

Related

Ansible task failing with shutdown command on win_command

I have a playbook for hibernating several machines at once. When I run it, it executes the command on the first node in the list without a problem, but then hangs on that host.
My question is: how can I simply send those commands without waiting for a response from the nodes?
Here is the task that I am using:
- name: Hibernate
  win_shell: 'shutdown /h'
If you do not want to wait for the return of a command, you can use asynchronous actions and polling:
If you want to run multiple tasks in a playbook concurrently, use async with poll set to 0. When you set poll: 0, Ansible starts the task and immediately moves on to the next task without waiting for a result. Each async task runs until it either completes, fails or times out (runs longer than its async value). The playbook run ends without checking back on async tasks.
Source: https://docs.ansible.com/ansible/latest/user_guide/playbooks_async.html#run-tasks-concurrently-poll-0
So, for your task:
- name: Hibernate
  win_shell: 'shutdown /h'
  async: 45
  poll: 0

extend the duration/execution time of ansible command

- name: 'TASK 1: Debug'
  aireos_command:
    commands:
      - 'debug client <mac address>'
  register: debug
I have this command. How can I keep it running for about 5 minutes instead of having it end as soon as the command has been configured? I want to keep it running so we can capture the necessary logs over a 5-minute window.
According to the Cisco Wireless LAN Controller (WLC) Command Reference, there seems to be no sleep or wait command there.
But according to the description of aireos_command, you might be able to use the additional parameters interval, match, and retries to achieve your goal.
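As a rough illustration of how those parameters fit together, the sketch below pairs them with wait_for, the aireos_command option that retries and match act on. The condition text is a hypothetical placeholder, and the numbers are assumptions chosen so that 30 retries × 10 s gives an upper bound of about 5 minutes. Note that the command is re-run on every retry and that the task fails if the condition never matches, so this may or may not suit a debug session.

```yaml
- name: 'TASK 1: Debug'
  aireos_command:
    commands:
      - 'debug client <mac address>'
    wait_for:
      # hypothetical condition: replace with text you expect in the output
      - result[0] contains <expected text>
    match: all      # every wait_for condition must hold
    retries: 30     # illustrative: up to 30 attempts ...
    interval: 10    # ... 10 seconds apart
  register: debug
```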

Ansible does not cancel a job after running into fatal error (awx)

I have the following issue concerning ansible (awx):
When a job fails with fatal: [IP]: FAILED!, Ansible does not cancel the job and AWX keeps displaying "Running" forever. I have to cancel those jobs manually, which is quite annoying.
The reason why ansible fails does not matter here.
I've tried to solve this problem by adding
- name: Fail task when the command error output prints FAILED
  ansible.builtin.command: /usr/bin/example-command -x -y -z
  register: command_result
  failed_when: "'FAILED' in command_result.stderr"
at the top of the playbook, but it won't work.
If you have any ideas...
Thanks!
Playbooks support an asynchronous mode; if you need to establish a timeout for your task, you can use it. Ansible waits until the task either completes, fails, or times out. For this, use the async and poll parameters: async establishes the timeout for the task, and poll establishes how often Ansible checks the status of the task. Both are in seconds.
You could try as below
- name: Fail task when the command error output prints FAILED
  ansible.builtin.command: /usr/bin/example-command -x -y -z
  register: command_result
  async: 60
  poll: 15
For more information:
https://docs.ansible.com/ansible/latest/user_guide/playbooks_async.html#asynchronous-playbook-tasks

Ansible ad-hoc command background not working

It is my understanding that running ansible with -B will put the process in the background and I will get the console back. I don't know if I am using it wrong, or it is not working as expected. What I expect is to have the sleep command initiate on all three computers and then the prompt will be available to me to run another command. What happens is that I do not get access to the console until the command completes (in this case 2 minutes).
Is something wrong, am I misunderstanding what the -B does or am I doing it wrong?
With polling:
Without polling:
There are two parameters to configure async tasks in Ansible: async and poll.
async in playbooks (-B in ad-hoc) – total number of seconds you allow the task to run.
poll in playbooks (-P in ad-hoc) – period in seconds, i.e. how often you want to check for the result.
So if you just need a fire-and-forget ad-hoc command, use -B 3600 -P 0: allow up to 1 hour of execution and don't care about the result.
By default -P 15, so ansible doesn't exit but checks your job every 15 seconds.
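For comparison, the playbook counterpart of -B 3600 -P 0 could look like the sketch below. The sleep 120 command stands in for the two-minute sleep from the question, and the all host pattern is an assumption.

```yaml
- hosts: all                # assumed host pattern
  gather_facts: false
  tasks:
    - name: Start a long sleep and move on immediately
      command: sleep 120
      async: 3600           # same role as -B 3600
      poll: 0               # same role as -P 0: do not wait for the result
```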

How to run a command that will reset the network interfaces in ansible?

I want to run a command on a remote machine. The command will reset the network interfaces. How can I run this in an Ansible playbook?
- name: Execute config command
  sudo: yes
  shell: "mycommand"
  async: 0
  poll: 0
  ignore_errors: true
The above task does not work consistently. Even when I tried async: 300, I observed the same inconsistency.
You're likely running into a situation similar to the one I describe in this question. Depending on the command you are running (mycommand) the network connection is likely dropping very quickly, causing Ansible to think that the connection was dropped unexpectedly. When this happens it will cause Ansible to treat it as an error.
You likely want to modify mycommand to include a sleep for a few seconds before the reset occurs, and continue using async:0 and poll:0. This will give Ansible enough time to launch mycommand into the background and cleanly disconnect from the server without error before the server resets the network connection.
Depending on what your next task is you may also want to include a wait_for task that runs via local_action to ensure Ansible waits for this network reset to complete before attempting any other tasks.
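The wait_for step mentioned above might look like the following sketch. It uses delegate_to: localhost, which has the same effect as local_action. The port (22), the delay, and the timeout are assumptions to adjust for your environment.

```yaml
- name: Wait for SSH to come back after the network reset
  wait_for:
    host: "{{ ansible_host | default(inventory_hostname) }}"
    port: 22        # assumed SSH port
    delay: 10       # illustrative: give the reset time to start
    timeout: 300    # illustrative: give up after 5 minutes
  delegate_to: localhost
```

Running the check from the control machine matters here, because the target itself is unreachable while its interfaces are being reset.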
