Ansible Reload Cisco Devices Hangs

I have a task that reloads a number of Cisco routers and Cisco firewalls. For some reason the playbook always hangs at this task after it has already reloaded the first device. The reload command is actually sent to the second device and I can see the device restart, but the task eventually fails.
Output:
Task [Reload Host Device]
fatal: [firewall2]: FAILED! => {"changed": false, "msg": "command timeout triggered, timeout value is 30 secs. \nsee the timeout setting options in the Network Debug and Troubleshooting Guide."} ...ignoring
Task:
- name: Reload Host Device
  cli_command:
    command: "reload"
    prompt:
      - "Proceed with reload?"
    answer:
      - "y"
  ignore_errors: yes

According to the given description, task, and error message
command timeout triggered ... see the timeout setting options in the Network Debug and Troubleshooting Guide.
the devices Ansible connects to are restarting faster than Ansible and the cli_command module can maintain their own connection. In other words, the session is closed and disconnected out from under the module.
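One option before changing the reload approach is therefore to raise the command timeout for these hosts. A minimal sketch, assuming the hosts use the network_cli connection plugin (ansible_command_timeout is the setting the quoted error message refers to; the value and file name are just examples):

# group_vars/cisco.yml
ansible_command_timeout: 120   # default is 30 seconds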
According to the Command Reference and Cisco Router: Auto restart in x seconds, you may try scheduling the reload instead:
- name: Schedule Host Device Reload in 1 min
  cli_command:
    command: "reload in 1"
    prompt:
      - "Proceed with reload?"
    answer:
      - "y"
  register: result

- name: Show result
  debug:
    var: result
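If you also want the play to block until the device is reachable again after the scheduled reload, a follow-up task along these lines may help (a sketch, assuming SSH on port 22 and that the control node can reach the device directly; the delay and timeout values are guesses):

- name: Wait for the device to come back after the scheduled reload
  wait_for:
    host: "{{ ansible_host | default(inventory_hostname) }}"
    port: 22
    delay: 90      # give the device time to actually go down first
    timeout: 600
  delegate_to: localhost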
Further Reading
Another approach, which was used in the past for Linux systems:
How to automate system reboots using the Ansible?

Related

How to display message to user when machine is rebooted by ansible

I am trying to reboot a machine using Ansible and want to display a broadcast message to users when the reboot is triggered by Ansible. I am using the lines below in my playbook, but it is not displaying any message to the users, not even the default one.
- name: Machine Reboot
  reboot:
    msg: "Reboot triggered by Ansible"
You can add command: wall reboot from Ansible to your playbook before doing the actual reboot.
For handlers it can look like this:
handlers:
  - name: reboot notify
    become: true
    command: wall reboot with love from ansible
    listen: reboot

  - name: reboot
    become: true
    reboot:
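A task can then trigger both handlers through the shared notification name, for example (the task and file names here are hypothetical):

- name: Update sshd configuration   # hypothetical task that requires a reboot
  template:
    src: sshd_config.j2
    dest: /etc/ssh/sshd_config
  notify: reboot   # fires the handler named "reboot" and the one with "listen: reboot"

The "reboot notify" handler runs first because handlers execute in the order they are defined, so the wall message goes out before the actual reboot.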

Ansible wait_for_connection until the hosts are ready for ansible?

I am using Ansible to configure some VMs.
The problem I am facing right now is that I can't execute Ansible commands right after the VMs have just started; it gives a connection timeout error. This happens when I run Ansible right after the VMs are spun up in GCP.
The commands work fine when I execute the playbook after 60 seconds, but I am looking for a way to do this automatically, without manually waiting 60s before executing, so that I can run it right after the VMs are spun up and Ansible will wait until they are ready. I don't want to add a fixed delay to the Ansible tasks either.
I am looking for a dynamic way where Ansible tries to execute the playbook and, when it fails, doesn't show an error but waits until the VMs are ready.
I used this, but it still doesn't work (as it fails)
---
- hosts: all
  tasks:
    - name: Wait for connection
      wait_for_connection:   # but this still fails, am I doing this wrong?
    - name: Ping all hosts for connectivity check
      ping:
Can someone please help me?
I have the same issue on my side.
I've fixed this with the wait_for task.
The basic approach is to wait for the SSH connection, like this:
- name: Wait 300 seconds for port 22 to become open and contain "OpenSSH"
  wait_for:
    port: 22
    host: '{{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}'
    search_regex: OpenSSH
    delay: 10
  connection: local
I guess your VM launches an application/service, so you can also watch a log file on the VM for the point where that application has started, like this for example (here for a Nexus container):
- name: Wait until the container is started and running
  become: yes
  become_user: "{{ ansible_nexus_user }}"
  wait_for:
    path: "{{ ansible_nexus_directory_data }}/log/nexus.log"
    search_regex: ".*Started Sonatype Nexus.*"
I believe what you are looking for is to postpone gather_facts until the server is up, as that otherwise will time out as you experienced. Your file could work as follows:
---
- hosts: all
  gather_facts: no
  tasks:
    - name: Wait for connection (600s default)
      ansible.builtin.wait_for_connection:
    - name: Gather facts manually
      ansible.builtin.setup:   # the setup module is what gather_facts runs under the hood
I have these under pre_tasks instead of tasks, but it should probably work if they are first in your file.
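For completeness, a pre_tasks version could look like this (a sketch; the ping task just stands in for whatever your play actually does):

---
- hosts: all
  gather_facts: no
  pre_tasks:
    - name: Wait for connection (600s default)
      ansible.builtin.wait_for_connection:
    - name: Gather facts manually
      ansible.builtin.setup:
  tasks:
    - name: Ping all hosts for connectivity check
      ansible.builtin.ping: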

Make ansible wait for server to start, without logging in

When I provision a new server, there is a lag between the time I create it and it becomes available. So I need to wait until it's ready.
I assumed that was the purpose of the wait_for task:
hosts:
[servers]
42.42.42.42
playbook.yml:
---
- hosts: all
  gather_facts: no
  tasks:
    - name: wait until server is up
      wait_for: port=22
This fails with Permission denied. I assume that's because nothing is set up yet.
I expected it to open an SSH connection and wait for the prompt, just to see if the server is up. But what actually happens is that it tries to log in.
Is there some other way to perform a wait that doesn't try to log in?
As you correctly stated, this task executes on the "to be provisioned" host, so Ansible tries to connect to it (via SSH) first and only then waits for the port to be up. This works for other ports/services, but not for port 22 on a given host, since 22 is a prerequisite for executing any task on that host.
What you can do is delegate this task (delegate_to) to the Ansible control host (the one you run the playbook from) and add the host parameter to the wait_for task.
Example:
- name: wait until server is up
  wait_for:
    port: 22
    host: <the IP of the host you are trying to provision>
  delegate_to: localhost
Hope it helps.
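If you would rather not hardcode the address, a variant of the same task (assuming the IP is available as ansible_host in your inventory) is:

- name: wait until server is up
  wait_for:
    port: 22
    host: "{{ ansible_host | default(inventory_hostname) }}"
  delegate_to: localhost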
Q: "Is there some other way to perform a wait that doesn't try to login?"
A: It is possible to wait_for_connection. For example
- hosts: all
  gather_facts: no
  tasks:
    - name: wait until server is up
      wait_for_connection:
        delay: 60
        timeout: 300

Error handling with "failed_when" doesn't work

I wish to create an Ansible playbook to run on multiple servers in order to check for conformity to the compliance rules. The playbook is therefore supposed to check whether several services are disabled and whether they are stopped.
I know that not all the services will be running or enabled on all machines, so I wish to handle certain return codes within the playbook.
I tried the failed_when statement for this task. It seems the way to go, as it allows setting which RCs to handle.
- hosts: server_group1
  remote_user: ansible
  tasks:
    - name: Stop services
      command: /bin/systemctl stop "{{ item }}"
      with_items:
        - cups
        - avahi
        - slapd
        - isc-dhcp-server
        - isc-dhcp-server6
        - nfs-server
        - rpcbind
        - bind9
        - vsftpd
        - dovecot
        - smbd
        - snmpd
        - squid
      register: output
      failed_when: "output.rc != 0 and output.rc != 1"
However, the loop only works if I use ignore_errors: True, which is not shown in the example code.
I want to catch RCs of 0 and 1 from the command executed by the playbook, but no matter what, failed_when always generates a fatal error and my playbook fails.
I also tried the failed_when line without the quotes, but that doesn't change a thing.
Something is missing, but I don't see what.
Any advice?
Therefore the playbook is supposed to check whether several services are disabled and whether they are stopped.
Your playbook isn't doing either of these things. If a service exists, whether or not it's running, then systemctl stop <service> will return successfully in most cases. The only time you'll get a non-zero exit code is if either (a) the service does not exist or (b) systemd is unable to stop the service for some reason.
Note that calling systemctl stop has no effect on whether or not a service is disabled; with your current playbook, all those services would start back up the next time the host boots.
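If you really only need to check and report rather than change anything, one possible sketch uses the service_facts module (an assumption on my part: systemd hosts, where the facts are keyed with a .service suffix; the two services below are just examples):

- name: Collect service state
  ansible.builtin.service_facts:

- name: Show state and enablement of each service
  ansible.builtin.debug:
    msg: "{{ ansible_facts.services[item + '.service'] | default('not installed') }}"
  loop:
    - cups
    - slapd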
If your goal is not simply to check that services are stopped and disabled, but rather to ensure that services are stopped and disabled, you could do something like this:
- name: "Stop services"
service:
name: "{{ item }}"
enabled: false
state: stopped
ignore_errors: true
register: results
loop:
- cups
- avahi
- slapd
- isc-dhcp-server
- isc-dhcp-server6
- nfs-server
- rpcbind
- bind9
- vsftpd
- dovecot
- smbd
- snmpd
- squid
You probably want ignore_errors: true here, because you want to run the task for every item in your list.
A different way of handling errors might be with something like:

failed_when: >-
  results is failed and
  "Could not find the requested service" not in results.msg|default('')

This would only fail the task if stopping the service failed for a reason other than the fact that there was no matching service on the target host.
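Put together, with the list shortened for brevity (and noting that the quoted error text may vary between Ansible versions), the task might look like:

- name: "Stop and disable services"
  service:
    name: "{{ item }}"
    enabled: false
    state: stopped
  register: results
  failed_when: >-
    results is failed and
    "Could not find the requested service" not in results.msg | default('')
  loop:
    - cups
    - avahi
    - slapd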

Ansible How to replay notifications

Currently I am switching from Puppet to Ansible and I am a bit confused with some concepts, or at least with how Ansible works.
Some info on the setup:
I am using the examples from Ansible Best Practices and have structured my project similarly, with several roles (playbooks) and so on.
I am using Vagrant for provisioning and the box is Saucy64 VBox.
Where the confusion comes from:
When I provision and run Ansible, the tasks start to execute, and then the stack of notifications runs.
Example:
Last task:
TASK: [mysql | delete anonymous MySQL server user for localhost] **************
<127.0.0.1> REMOTE_MODULE mysql_user user='' state=absent
changed: [default] => {"changed": true, "item": "", "user": ""}
Then first notification:
NOTIFIED: [timezone | update tzdata] ******************************************
<127.0.0.1> REMOTE_MODULE command /usr/sbin/dpkg-reconfigure --frontend noninteractive tzdata
changed: [default] => {"changed": true, "cmd": ["/usr/sbin/dpkg-reconfigure", "--frontend", "noninteractive", "tzdata"], "delta": "0:00:00.224081", "end": "2014-02-03 22:34:48.508961", "item": "", "rc": 0, "start": "2014-02-03 22:34:48.284880", "stderr": "\nCurrent default time zone: 'Europe/Amsterdam'\nLocal time is now: Mon Feb 3 22:34:48 CET 2014.\nUniversal Time is now: Mon Feb 3 21:34:48 UTC 2014.", "stdout": ""}
Now this is all fine. As the roles increase, more and more notifications stack up.
Now here comes the problem.
When a notification fails the provisioning stops as usual. But then the notification stack is empty!
This means that all notifications that were queued after the faulty one will not be executed!
If that is so, then if you changed a vhost setting for Apache and had a notification for the Apache service to reload, that reload would get lost.
Let's give an example (pseudo lang):
- name: Install Apache Modules
  notify: Restart Apache

- name: Enable Vhosts
  notify: Reload Apache

- name: Install PHP
  command: GGGGGG # throws an error
When the above executes:
Apache modules are installed
Vhosts are enabled
PHP tries to install and fails
Script exits
(Where are the notifications?)
Now at this point all seems logical, but again Ansible tries to be clever (no!*): it stacks the notifications, so the reload and restart of Apache would result in a single Apache restart run at the end of provisioning. That means that all notifications will fail!!!
Now up to here, for some people this is fine as well. They will say: hey, just re-run the provisioning and the notifications will fire up, Apache will finally be reloaded and the site will be up again. This is not the case.
On the second run of the script, after the code for installing PHP is corrected, the notifications will not run, by design. Why?
This is why:
Ansible marks the tasks that executed successfully as "Done/Green" and therefore does not register any notifications for these tasks again. The provisioning will be successful, and in order to trigger the notification and thus the Apache restart you can do one of the following:
Run a direct command to the server via ansible or ssh
Edit the script to trigger the task
Add a separate task for that
Destroy instance of box and reprovision
This is quite frustrating because it requires a total cleanup of the box, or do I not understand something correctly with Ansible?
Is there another way to 'reclaim'/replay/force the notifications to execute?
Clever would be either to mark the task as incomplete and then restart the notifications or keep a separate queue with the notifications as tasks of their own.*
Yeah, that's one of the shortcomings of Ansible compared to, say, Puppet. Puppet is declarative and doesn't error out like Ansible (or Chef, for that matter). It has its positives and negatives; for example, Puppet takes a little while before it starts running because it needs to compile its catalog.
So you are right: if your Ansible script errors out, then your notification updates won't happen. The only way we've gotten around it is by using conditional statements. In your playbook you can do something like this:
- name: My cool playbook
  hosts: all
  vars:
    force_tasks: 0
  tasks:
    - name: Apache install
      action: apt pkg=$item state=latest
      with_items:
        - apache2
        - apache2-mpm-prefork
    - name: Restart apache
      action: service name=apache2 state=restarted
      when: force_tasks
Then when you run your playbook you can pass force_tasks as an extra variable:
ansible-playbook -i my_inventory -e "force_tasks=True" my_ansible_playbook.yml
You can accomplish this in a similar fashion with tags, as sketched below.
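For example, a hypothetical variant using tags (the never tag requires Ansible 2.5 or later, and the task/tag names are made up):

- name: Restart apache
  service:
    name: apache2
    state: restarted
  tags:
    - never            # skipped on normal runs
    - restart-apache   # runs only when requested explicitly

Then trigger it on demand with:
ansible-playbook -i my_inventory --tags restart-apache my_ansible_playbook.yml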
Run ansible-playbook with the --force-handlers flag. This tells Ansible to run any queued handlers even if a task fails and further processing stops. Newer Ansible releases also let you set force_handlers in ansible.cfg (or per play), so it can be switched on globally and forgotten about.
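For example (the playbook name is made up):

ansible-playbook --force-handlers my_ansible_playbook.yml

or, once and for all, in ansible.cfg:

# ansible.cfg
[defaults]
force_handlers = True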