Ansible playbook gets stuck on a particular kind of host

There are a few hosts where ssh takes a very long time (or forever) to give a prompt, and in that case my playbook waits forever.
I want to skip or kill the task after waiting at most 1 minute (60 seconds).
This is my playbook:
- name: uname
  # shell: uname -a;hostname
  command: timeout 20 uname -a;hostname #tried but no luck
  async: 60
  poll: 0
  register: yum_sleeper
- name: 'Verify the job status'
  async_status: jid={{ yum_sleeper.ansible_job_id }}
  register: output
  until: output.finished
  retries: 6
  delay: 10
  ignore_errors: true
The strange part is that the "uname" task is never started on the target host, because that host never gives a prompt after ssh connects, so whatever command I put in the task is never run.
The playbook output looks like this:
Using /root/.ansible.cfg as config file
PLAY [all] *******************************************************************************
I also tried the Ansible "ping" module as an ad hoc command first, and that got stuck as well:
ansible -i invetory_file web -m ping
Please suggest a fix.
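One thing that might be worth trying (a sketch, not a confirmed fix): the output stops right after the PLAY line, which suggests the hang may already happen during fact gathering, before the uname task is reached. The sketch below assumes Ansible 2.10 or later, where the task-level timeout keyword is available. It disables fact gathering and caps the task at 60 seconds. The variable name uname_out is made up for illustration, and whether the limit actually interrupts a hung ssh session on these hosts is an assumption that needs testing.

```yaml
- hosts: all
  gather_facts: false      # assumption: the implicit setup task is what hangs first
  tasks:
    - name: uname
      shell: uname -a; hostname
      timeout: 60          # task keyword (Ansible 2.10+): fail the task after 60 seconds
      register: uname_out  # hypothetical variable name
      ignore_errors: true  # let the play carry on for this host after a timeout
```

Note that shell is used here rather than command, because command does not interpret the ";" between the two commands.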

Related

Ansible playbook for installing cyberpanel stops execution

Greetings for the day,
I was trying to install cyberpanel using Ansible by writing a playbook.
This is the playbook:
---
- name: Installing cybepanel
  hosts: ansible_client
  user: ubuntu
  become: yes
  become_user: root
  become_method: sudo
  tasks:
    - name: Installing screen
      apt:
        name: screen
        state: present
    - name: Download the script
      get_url:
        url=https://cyberpanel.net/install.sh
        dest=/root/installer.sh
    - name: Execute the script
      become: yes
      become_method: su
      become_user: root
      become_exe: sudo su -
      expect:
        command:
          screen -S cyberpanel-installation
          sh installer.sh
        echo: yes
        responses:
          (.*) Please enter the number(>*): "1"
          'Full installation \[Y/n\]:': "Y"
          (.*) Remote MySQL(.*): "N"
          (.*)Enter specific version such as:(.*): ""
          (.*)Choose(.*)password(.*): "r"
          'Please select \[Y/n\]:': "Y"
          (.*)Please type Yes or no(.*): "no"
          'Would you like to restart your server now? \[y/N\]:': "y"
      async: 1800
      poll: 5
      register: output
    - name: 'Checking the status'
      async_status:
        jid: "{{ output.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 150
      delay: 60
    - name: debugging
      debug:
        var=output
The playbook doesn't have any errors or conflicts.
The playbook runs, and cyberpanel installs within 20-30 minutes. Because the playbook uses screen, the screen session stays detached on the destination server; after attaching to it (once the playbook has stopped executing), we can see the installation in progress, and it completes successfully within 20-30 minutes.
The issue is that the playbook stops executing after 1 minute with a return code (rc) of 0.
This is the output of the playbook.
As you can see, I am using the async method, and I have tried both poll=0 and poll>0 for the long-running script. It does not work; the playbook still times out.
I also increased the SSH timeout to check whether an SSH timeout was occurring, and there is no SSH timeout either.
I also tried the timeout attribute instead of the async method, and that didn't work for me either.
Any help is appreciated.

Ansible playbook and return values during a failure

I have an Ansible playbook as below. The objective is to run a ping check across a number of hosts, and the playbook should return whether the ping was successful or not per host. It works by running a ping on a Windows host to ascertain connectivity; the status of the connectivity is then passed on to a PowerShell script, which records the result for analysis.
What I now want is for it to report back when connectivity fails for a particular host. At the moment, if host connectivity fails, the entire playbook errors out. The idea is that if I have 10 hosts, and connectivity fails for 2 and works for 8, the failed ping response should be returned to the PowerShell script for the 2 hosts, and the other 8 should send a return value of connection successful to the PowerShell script.
---
- name: Get host facts
  set_fact:
    serverdomain: "{{ansible_domain}}"
    server_ip: "{{ansible_ip_addresses[1]}}"
- name: Host Ping Check
  ignore_errors: yes
  win_ping:
  register: var_ping
- name: Get Host name
  debug: msg="{{the_host_name}}"
- name: Set Execution File and parameters
  set_fact:
    scriptfile: "{{ansible_user_dir}}\\scripts\\host_check.ps1"
    params: "-servername '{{the_host_name}}' -response var_ping.failed"
- name: Execute script
  win_command: powershell.exe "{{scriptfile}}" "{{params}}"
Do not use ignore_errors: true, as it almost always causes problems; use failed_when: false instead.
Use when: to condition execution on what var_ping contains. You will likely want to print it first to see what it contains in each execution path.
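A minimal sketch of that advice, assuming win_ping returns its usual ping: pong value on success. The two report tasks and their messages are hypothetical placeholders for whatever should be passed to the PowerShell script.

```yaml
- name: Host Ping Check
  win_ping:
  register: var_ping
  failed_when: false          # never mark this task as failed

- name: Print the registered result to see what it contains
  debug:
    var: var_ping

- name: Report success (hypothetical placeholder task)
  debug:
    msg: "{{ inventory_hostname }}: connection successful"
  when: var_ping.ping is defined

- name: Report failure (hypothetical placeholder task)
  debug:
    msg: "{{ inventory_hostname }}: ping failed"
  when: var_ping.ping is not defined
```

One caveat: a host that is unreachable, rather than failed, has a separate status that failed_when does not cover, so this sketch may not be enough on its own for hosts that cannot be contacted at all.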

Ansible continue playbook after connection lost

I have a playbook like below,
- name: Executing shell script
  shell: |
    cd "{{ mntout.stdout }}"
    sh config_script -f
  register: installo
  ignore_errors: yes
- name: Formatting output
  shell: echo "{{ installo.stdout }}" | sed -r "s/\x1B\[([0-9]{1,3}(;[0-9]{1,2})?)?[mGK]//g"
  register: trout
  delegate_to: localhost
- name: Show output
  debug:
    msg: "{{ trout.stdout | replace('\r','\n')|replace('\n','\n') | replace('\b','') }}"
  delegate_to: localhost
Once the config script completes, it reboots the target (and I don't want to wait for the target to come back up).
But I want the playbook to continue after the connection is lost and execute the remaining tasks on localhost, as I need to print the output of the script. Any suggestions?
It needs to continue even after the error below:
fatal: [147.234.158.192]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ", "unreachable": true}
TIA
I'm not sure where in your playbook the target machine reboots. The best you can do is to let Ansible reboot the target machine with the reboot module, especially since you're expecting the machine to reboot.
- name: Reboot a slow machine that might have lots of updates to apply
  reboot:
    reboot_timeout: 3600

Ansible reboot remote host wait for 60 sec and reboot next remote host

I'm trying to reboot remote hosts with Ansible. It works for now, but the remote hosts are all rebooted at the same time. I would like to reboot them one by one, with a sleep in between.
I tried to put wait_for in the code below, but it doesn't work. I got an error saying that it conflicts with shell.
Playbook file:
- name: Rebooting ...
  wait_for:
    time_out: 60
  shell: sleep 2 && /sbin/shutdown -r now "Reboot required"
  async: 1
  poll: 0
  ignore_errors: true
  register: rebooting
Error message:
The error appears to have been in '/home/ansible/reboot-hosts.yml': line 20, column 5, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Rebooting ...
^ here
exception type: <class 'ansible.errors.AnsibleParserError'>
exception: conflicting action statements: shell, wait_for
The error appears to have been in '/home/ansible/reboot-hosts.yml': line 20, column 5, but may
be elsewhere in the file depending on the exact syntax problem.
This is the expected procedure:
Reboot host 1
Sleep 60 seconds
Reboot host 2
Sleep 60 seconds
Reboot host 3
As @Peschke said, try the reboot module. But to do it one at a time, you need to set serial: 1 in the play:
- hosts: all
  serial: 1
  become: yes
  tasks:
    - name: Rebooting ...
      reboot:
        reboot_timeout: 60
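If the 60-second sleep between hosts from the expected procedure is also wanted, one possible sketch is to add a pause task after the reboot. With serial: 1, each host runs through all tasks before the next host starts, so the pause falls between reboots. The reboot_timeout value of 600 here is an assumption (the module's default), not something taken from the question.

```yaml
- hosts: all
  serial: 1                  # process one host at a time
  become: yes
  tasks:
    - name: Reboot this host and wait for it to come back
      reboot:
        reboot_timeout: 600  # assumed value; adjust to how long the hosts take to boot

    - name: Sleep 60 seconds before the next host is rebooted
      pause:
        seconds: 60
```

This also pauses once after the last host, which is harmless but adds a minute to the run.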
The issue is that you have two actions in your task: wait_for and shell. A task can run only one module, so wait_for needs to be in its own task.
Try something like this:
- name: Rebooting ...
  shell: sleep 2 && /sbin/shutdown -r now "Reboot required"
  async: 1
  poll: 0
  ignore_errors: true
  register: rebooting
- name: wait for reboot
  wait_for:
    timeout: 60
  delegate_to: localhost
Another option is to use the reboot module. This module will wait for the system to go down and come back up before proceeding. By default, it waits 600 seconds for the system to come back up.
If you only wanted to wait 60 seconds, you could do the following:
- name: Rebooting ...
  reboot:
    reboot_timeout: 60

Ansible task for checking that a host is really offline after shutdown

I am using the following Ansible playbook to shut down a list of remote Ubuntu hosts all at once:
- hosts: my_hosts
  become: yes
  remote_user: my_user
  tasks:
    - name: Confirm shutdown
      pause:
        prompt: >-
          Do you really want to shutdown machine(s) "{{play_hosts}}"? Press
          Enter to continue or Ctrl+C, then A, then Enter to abort ...
    - name: Cancel existing shutdown calls
      command: /sbin/shutdown -c
      ignore_errors: yes
    - name: Shutdown machine
      command: /sbin/shutdown -h now
Two questions on this:
Is there any module available which can handle the shutdown in a more elegant way than having to run two custom commands?
Is there any way to check that the machines are really down? Or is it an anti-pattern to check this from the same playbook?
I tried something with the net_ping module but I am not sure if this is its real purpose:
- name: Check that machine is down
  become: no
  net_ping:
    dest: "{{ ansible_host }}"
    count: 5
    state: absent
This, however, fails with
FAILED! => {"changed": false, "msg": "invalid connection specified, expected connection=local, got ssh"}
In more restricted environments where ping messages are blocked, you can watch the ssh port until it goes down. In my case I have set the timeout to 60 seconds.
- name: Save target host IP
  set_fact:
    target_host: "{{ ansible_host }}"
- name: wait for ssh to stop
  wait_for: "port=22 host={{ target_host }} delay=10 state=stopped timeout=60"
  delegate_to: 127.0.0.1
There is no built-in shutdown module. You can use a single fire-and-forget call:
- name: Shutdown server
  become: yes
  shell: sleep 2 && /sbin/shutdown -c && /sbin/shutdown -h now
  async: 1
  poll: 0
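On newer setups there may be an alternative to the shell call: the community.general collection provides a shutdown module. The sketch below assumes that collection is installed in a release that includes the module; it is not part of the original answer.

```yaml
- name: Shutdown machine
  become: yes
  community.general.shutdown:
    delay: 0   # seconds to wait before shutting down
```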
As for net_ping, it is for network appliances such as switches and routers. If you rely on ICMP messages to test shutdown process, you can use something like this:
- name: Store actual host to be used with local_action
  set_fact:
    original_host: "{{ ansible_host }}"
- name: Wait for ping loss
  local_action: shell ping -q -c 1 -W 1 {{ original_host }}
  register: res
  retries: 5
  until: ('100.0% packet loss' in res.stdout)
  failed_when: ('100.0% packet loss' not in res.stdout)
  changed_when: no
This will wait for 100% packet loss, or fail after 5 retries.
You want to use local_action here because otherwise the commands are executed on the remote host (which is supposed to be down).
And you want to use the trick of storing ansible_host in a temporary fact, because ansible_host is replaced with 127.0.0.1 when the task is delegated to the local host.

Resources