Ansible: clear host errors after handler call

What I Do
For bare-metal deployments, I configure interfaces on CentOS 7 servers with Ansible 2.7.9.
Sometimes the interface definitions change
- name: Copy sysctl and ifcfg- files from templates.
  template: src={{ item.src }} dest={{ item.dest }}
  with_items:
    [...]
    - { src: 'network.j2', dest: '/etc/sysconfig/network' }
  notify:
    - Restart network service
    - Wait for reconnect
    - Reset host errors
which is why I call a handler to restart the network.service when a change happens:
- name: Restart network service
  service:
    name: network
    state: restarted

- name: Reset host errors
  meta: clear_host_errors

- name: Wait for reconnect
  wait_for_connection:
    connect_timeout: 20
    sleep: 5
    delay: 5
    timeout: 600
What I want
I can't get Ansible to stop aborting the run when the Restart network service handler fails. Since the service restart itself works fine on the host, I want either the restart to always exit with RC=0 or the host error to be cleared after the failing handler call. In the list below, is there anything I am missing or doing wrong?
What I tried
ignore_errors: true, failed_when: false and changed_when: false directives, either with the shell/command module or with the service module in the Restart network service handler
meta: clear_host_errors directly below the - name: Copy sysctl and ifcfg- files from templates. block
calling meta: clear_host_errors as a handler
Having the handler Restart network service exit with || true
async/poll variants for Restart network service (see the sketch after this list)
setting Pipelining to false
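For reference, the fire-and-forget flavour of the async/poll attempt looked roughly like this (a sketch, not the exact handler I ran; the sleep, async and poll values are illustrative):

- name: Restart network service
  # poll: 0 detaches the job, so the handler itself does not wait on the dropped SSH connection
  shell: sleep 2 && systemctl restart network
  async: 120
  poll: 0
  ignore_errors: true

The Wait for reconnect handler from above is still expected to pick the host back up afterwards.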
I always end up with:
RUNNING HANDLER [os : Restart network service] *******************************************************************
fatal: [host-redacted]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Shared connection to aa.bb.cc.dd closed.", "unreachable": true}
RUNNING HANDLER [os : Reset host errors] *************************************************************************
fatal: [host-redacted]: FAILED! => {"changed": false, "module_stderr": "Shared connection to aa.bb.cc.dd closed.\r\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 0}
RUNNING HANDLER [os : Wait for reconnect] ************************************************************************
Where is the correct placement for meta: clear_host_errors in this case?
Some additional info
a restart of network.service takes roughly 40 seconds (tried async: 120 and poll: 30)
established, non-Ansible SSH connections recover given a long enough timeout
a re-run of Ansible directly after the first abort works fine
Interestingly enough, the skipping works fine with ignore_errors: true when using Mitogen:
TASK [os : Restart network service] ******************************************************************************************************************************************************************************************************
skipping: [host-redacted]
skipping: [host-redacted]
fatal: [host-redacted]: FAILED! => {"msg": "EOF on stream; last 300 bytes received: 'ssh: connect to host aa.bb.cc.dd port 22: Connection refused\\r\\n'"}
...ignoring
This starts to look like a bug to me.

Related

Ansible: systemd fails. Which sudo permissions are needed?

Ansible 2.9, Linux Ubuntu 18.
I'm getting the following error with Ansible when trying to change the status of a service with 'systemd'.
failed: [host.domain.com] (item=service_1) => {"ansible_loop_var": "item", "changed": false, "item": "service_1", "module_stderr": "Shared connection to host.domain.com closed.\r\n", "module_stdout": "\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
- name: Stop services
  ansible.builtin.systemd:
    name: "{{ serv }}"
    state: stopped
  with_servs:
    - service_1
    - service_2
    - service_3
  become: yes
The code above works fine with an account that has full sudo access (same as root privileges).
It fails as shown above with an account that has limited sudo access (sudo access to specific commands only, such as /bin/systemctl * service_1*, /bin/systemctl * service_2*, /bin/systemctl * service_3*).
Which sudo permissions are needed to run ansible.builtin.systemd? I'm trying to find out what command Ansible sends to the device so I can check whether I gave the account the right permissions, but I have had no success finding that yet (any hints?).
First, change your with_servs to loop; it is much easier to write playbooks with loop. Set become as a global setting for the playbook. As a workaround you can execute the command via the command module, but that is not recommended because it will run the command in every situation, even when the service is already stopped.
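A sketch of the loop-based form suggested above, keeping the serv variable name from the question via loop_control (illustrative, not the exact fix from this thread):

- name: Stop services
  ansible.builtin.systemd:
    name: "{{ serv }}"
    state: stopped
  loop:
    - service_1
    - service_2
    - service_3
  loop_control:
    loop_var: serv
  become: yes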

Calling Netscaler CLI commands from Ansible

I'm trying to use the "cli_command" module from Ansible to configure Netscaler appliances.
For two of them, running version "12.0 - build 60.9.nc", a simple task like this works perfectly:
- name: call NS CLI
  cli_command:
    command: show nsconf
  register: cs_vserver
  delegate_to: netscaler_dmz
Changing the delegate_to to an appliance running version "NS11.1: Build 56.19.nc", I get this error:
The full traceback is: WARNING: The below traceback may not be
related to the actual failure. File
"/tmp/ansible_cli_command_payload_4w503v/ansible_cli_command_payload.zip/ansible/modules/network/cli/cli_command.py",
line 167, in main File
"/tmp/ansible_cli_command_payload_4w503v/ansible_cli_command_payload.zip/ansible/module_utils/connection.py",
line 185, in rpc
raise ConnectionError(to_text(msg, errors='surrogate_then_replace'), code=code) fatal: [localhost ->
172.26.58.112]: FAILED! => {
"changed": false,
"invocation": {
"module_args": {
"answer": null,
"check_all": false,
"command": "show nsconf",
"newline": true,
"prompt": null,
"sendonly": false
}
},
"msg": "command timeout triggered, timeout value is 30 secs.\nSee the timeout setting options in the Network Debug and Troubleshooting
Guide." }
Authentication uses RSA keys for all three devices; the logs show the welcome banner and the connection is fine (a manual connection using ssh works fine too), but soon afterwards the error above occurs.
Strangely, NetScaler is not listed among the available network platforms (https://docs.ansible.com/ansible/latest/network/user_guide/platform_index.html#settings-by-platform), but the following parameters work fine on the two other NetScalers (inventory file):
all:
  hosts:
    localhost:
      ansible_connection: local
    netscaler_dmz_int:    # <= OK
      ansible_host: 192.168.XXX.XXX
      ansible_connection: network_cli
      ansible_network_os: ios
      ansible_user: nsroot
    netscaler_dmz_prod:   # <= OK
      ansible_host: 192.168.XXX.XXX
      ansible_connection: network_cli
      ansible_network_os: ios
      ansible_user: nsroot
    netscaler_dc:         # <= KO
      ansible_host: 172.26.XXX.XXX
      ansible_connection: network_cli
      ansible_network_os: ios
      ansible_user: nsroot
Upgrading the firmware is not feasible in the short term.
Does the problem come from the older version? Are there more appropriate parameters that would make it work on all three devices?
Thanks.
Problem solved thanks to two colleagues: the prompt of the Citrix device, once connected, only showed ">" instead of a more complex one like "user_device_name>", which caused the paramiko module to wait indefinitely and end with a timeout.
Before, the cli_command result was:
2021-08-06 10:37:07,728 p=4783 u=xxxxx n=p=4783 u=xxxxx | paramiko [xxx.xx.xx.xxx] | Authentication (publickey) successful!
2021-08-06 10:37:34,487 p=4646 u=xxxxx n=ansible | persistent connection idle timeout triggered, timeout value is 30 secs.
It's possible to change this prompt for the specific user used for the connection, "nsroot" here.
After that, the connection was successful.
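As a side note (not part of the original fix), the command timeout mentioned in the error can also be raised per host in the inventory while troubleshooting; the 120-second value below is only an example:

    netscaler_dc:
      ansible_host: 172.26.XXX.XXX
      ansible_connection: network_cli
      ansible_network_os: ios
      ansible_user: nsroot
      ansible_command_timeout: 120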

Error when trying to execute show version on a Cisco device

I am trying to learn Ansible but I have some problems. I wrote a simple playbook, my first one, but it didn't work well, even though I am able to connect to my device with user teste and password teste and execute the command manually.
fatal: [ansible_user=teste]: FAILED! => {"changed": false, "msg":
"command timeout triggered, timeout value is 10 secs.\nSee the timeout
setting options in the Network Debug and Troubleshooting Guide."}
fatal: [ansible_password=teste]: FAILED! => {"changed": false, "msg":
"command timeout triggered, timeout value is 10 secs.\nSee the timeout
setting options in the Network Debug and Troubleshooting Guide."}
fatal: [192.168.0.103]: FAILED! => {"changed": false, "msg": "command
timeout triggered, timeout value is 10 secs.\nSee the timeout setting
options in the Network Debug and Troubleshooting Guide."}
This is my playbook:
---
- name: First Play
  hosts: routers
  gather_facts: False
  connection: local
  tasks:
    - name: First Task
      ios_command:
        commands: show version
      register: version
Do you have any idea of what I am doing wrong?
Well, I had to change my hosts file.
This way did not work:
[routers]
192.168.0.103
ansible_user=teste
ansible_password=teste
After checking on the internet, I tried this way and it worked fine:
[routers]
192.168.0.103
[routers:vars]
ansible_user=teste
ansible_password=teste
ansible_connection=network_cli
ansible_network_os=ios
The issue was resolved after adding the credentials and the connection variables under [routers:vars] in the hosts file.
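With the connection details coming from the inventory, a matching playbook no longer needs connection: local (a sketch; connection variables set in the inventory take precedence over the play keyword anyway):

---
- name: First Play
  hosts: routers
  gather_facts: False
  tasks:
    - name: First Task
      ios_command:
        commands: show version
      register: version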

Unable to restart iptables from Ansible (Interactive authentication required)

How do I restart the iptables service from Ansible (in order to reload the config file /etc/sysconfig/iptables)?
I have a handler restart iptables defined as
service: name=iptables enabled=yes state=restarted
But it produces the following error message:
fatal: [xx.xx.xx.xx]: FAILED! => {"changed": false, "failed": true,
"msg": "Failed to stop iptables.service: Interactive authentication
required.\n Failed to start iptables.service: Interactive
authentication required.\n"}
I am working with CentOS Linux release 7.2.1511 (Core)
I was not running my handler command as root. If the handler contains become: yes, the handler works fine.
- name: restart iptables
  become: yes
  service: name=iptables enabled=yes state=restarted
Another way of refreshing the iptables configuration, without restarting the service, is:
- name: reload iptables
  become: yes
  shell: iptables-restore < /etc/sysconfig/iptables
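Either variant can be wired up as a handler and notified from the task that manages the rules file; a minimal sketch (the template name is illustrative):

- name: Deploy iptables rules
  become: yes
  template:
    src: iptables.j2
    dest: /etc/sysconfig/iptables
  notify: reload iptables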

Ansible Service Restart Failed

I've been having some trouble with restarting the SSH daemon with Ansible.
I'm using the latest software as of May 11 2015 (Ansible 1.9.1 / Vagrant 1.7.2 / VirtualBox 4.3.26 / Host: OS X 10.10.1 / Guest: ubuntu/trusty64)
tl;dr: There appears to be something wrong with the way I'm invoking the service syntax.
Problem With Original Use Case (Handler)
Playbook
- hosts: all
  remote_user: vagrant
  tasks:
    ...
    - name: Forbid SSH root login
      sudo: yes
      lineinfile: dest=/etc/ssh/sshd_config regexp="^PermitRootLogin" line="PermitRootLogin no" state=present
      notify:
        - restart ssh
    ...
  handlers:
    - name: restart ssh
      sudo: yes
      service: name=ssh state=restarted
Output
NOTIFIED: [restart ssh]
failed: [default] => {"failed": true}
FATAL: all hosts have already failed -- aborting
The nginx handler completed successfully with nearly identical syntax.
Task Also Fails
Playbook
- name: Restart SSH server
  sudo: yes
  service: name=ssh state=restarted
Same output as the handler use case.
Ad Hoc Command Also Fails
Shell
> ansible all -i ansible_inventory -u vagrant -k -m service -a "name=ssh state=restarted"
Inventory
127.0.0.1:8022
Output
127.0.0.1 | FAILED >> {
"failed": true,
"msg": ""
}
Shell command in box works
When I SSH in and run the usual command, everything works fine.
> vagrant ssh
> sudo service ssh restart
ssh stop/waiting
ssh start/running, process 7899
> echo $?
0
Command task also works
Output
TASK: [Restart SSH server] ****************************************************
changed: [default] => {"changed": true, "cmd": ["service", "ssh", "restart"], "delta": "0:00:00.060220", "end": "2015-05-11 07:59:25.310183", "rc": 0, "start": "2015-05-11 07:59:25.249963", "stderr": "", "stdout": "ssh stop/waiting\nssh start/running, process 8553", "warnings": ["Consider using service module rather than running service"]}
As we can see in the warning, we're supposed to use the service module, but I'm still not sure where the snag is.
As the comments above state, this is an Ansible issue that will apparently be fixed in the 2.0 release.
I just changed my handler to use the command module and moved on:
- name: restart sshd
  command: service ssh restart
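For what it's worth, on Ansible 2.0 and later the original service-module handler should work again (an untested sketch based on the comment referenced above, using become instead of the deprecated sudo):

- name: restart ssh
  become: yes
  service:
    name: ssh
    state: restarted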

Resources