I trigger multiple Tomcat startup scripts and then need to check whether every process is listening on its specific port, across multiple hosts, as quickly as possible.
For the test case, I'm writing 3 scripts that run on a single host and listen on ports 4443, 4445, and 4447 respectively, as below.
/tmp/startapp1.sh
while test 1 # infinite loop
sleep 10
do
nc -l localhost 4443 > /tmp/app1.log
done
/tmp/startapp2.sh
while test 1 # infinite loop
sleep 30
do
nc -l localhost 4445 > /tmp/app2.log
done
/tmp/startapp3.sh
while test 1 # infinite loop
sleep 20
do
nc -l localhost 4447 > /tmp/app3.log
done
Below is my code to trigger the script and check if the telnet is successful:
main.yml
- include_tasks: "internal.yml"
  loop:
    - /tmp/startapp1.sh 4443
    - /tmp/startapp2.sh 4445
    - /tmp/startapp3.sh 4447
internal.yml
- shell: "{{ item.split()[0] }}"
  async: 600
  poll: 0
- name: DEBUG CHECK TELNET
  shell: "telnet {{ item.split()[1] }}"
  delegate_to: localhost
  register: telnetcheck
  until: telnetcheck.rc == 0
  async: 600
  poll: 0
  delay: 6
  retries: 10
- name: Result of TELNET
  async_status:
    jid: "{{ item.ansible_job_id }}"
  register: _jobs
  until: _jobs.finished
  delay: 6
  retries: 10
  with_items: "{{ telnetcheck.results }}"
To run: ansible-playbook main.yml
Requirement: the above three scripts should start along with telnet check in about 30 seconds.
Thus, the basic check that needs to be done here is telnet until: telnetcheck.rc == 0, but because of async the result of the telnet shell task has no rc entry, and hence I get the error below:
"msg": "The conditional check 'telnetcheck.rc == 0' failed. The error was: error while evaluating conditional (telnetcheck.rc == 0): 'dict object' has no attribute 'rc'"
In the above code, where and how can I check whether telnet succeeded, i.e. telnetcheck.rc == 0, and make sure the requirement is met?
Currently I am not aware of a solution with which one could start a shell script and wait for its status in one task. It might be possible to change the shell script itself so that it performs the necessary self-checks and provides exit codes. Or you could implement two or more tasks, whereby one executes the shell script and the others later check for certain conditions.
Regarding your requirement
wait until telnet localhost 8076 is LISTENING (successful).
you may have a look into the module wait_for.
---
- hosts: localhost
  become: false
  gather_facts: false
  tasks:
    - name: "Test connection to local port"
      wait_for:
        host: localhost
        port: 8076
        delay: 0
        timeout: 3
        active_connection_states: SYN_RECV
      check_mode: false # because remote module (wait_for) does not support it
      register: result
    - name: Show result
      debug:
        msg: "{{ result }}"
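The two-task approach mentioned above could be sketched as follows, assuming the script path /tmp/startapp1.sh and port 4443 from the question. The task names and the 60-second timeout are arbitrary choices for illustration.

```yaml
---
- hosts: localhost
  become: false
  gather_facts: false

  tasks:

    # Fire and forget: async with poll: 0 returns immediately
    # and leaves the script running in the background.
    - name: Start the application script in the background
      shell: /tmp/startapp1.sh
      async: 600
      poll: 0

    # Blocks until a TCP connection to the port succeeds
    # or the timeout (in seconds) expires.
    - name: Wait for the application port
      wait_for:
        host: localhost
        port: 4443
        timeout: 60
```

If the port does not open within the timeout, the wait_for task fails, which gives the playbook a clear point at which to stop or to handle the error.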
Further Q&A
How to use Ansible module wait_for together with loop?
Firewall Functional Test
Another approach, testing from the Control Node whether there is a LISTENER on localhost of the Remote Node, could be
---
- hosts: test.example.com
  become: true
  gather_facts: false
  vars:
    PORT: "8076"
  tasks:
    - name: "Check for LISTENER on remote localhost"
      shell:
        cmd: "lsof -Pi TCP:{{ PORT }}"
      changed_when: false
      check_mode: false
      register: result
      failed_when: result.rc != 0 and result.rc != 1
    - name: Report missing LISTENER
      debug:
        msg: "No LISTENER on PORT {{ PORT }}"
      when: result.rc == 1
Using an asynchronous action and an until in the same task makes almost no sense.
As for your requirement to have the answer in the quickest time possible, you will have to think it through again. In your three-port case, if you want all of them to be open before you move on to the next task, it will always be as slow as the slowest port to open, no matter what. Even if the first port we probe is the slowest, the two others will then be probed in no time, so trying to optimise this with async is, from my point of view, an unnecessary optimisation.
Either you use until, and each port probe blocks until the port answers, or you run the probes asynchronously and async_status will catch the return as it should, provided you wrap the telnet in a shell until loop.
In your until loop, the issue is that the return code is not set until the command actually returns, so you just have to check that the rc key of the dictionary is defined.
Mind that for all the examples below, I am manually opening the ports with nc -l -p <port>, which is why they open gradually.
With until:
- shell: "telnet localhost {{ item.split()[1] }}"
  delegate_to: localhost
  register: telnetcheck
  until:
    - telnetcheck.rc is defined
    - telnetcheck.rc == 0
  delay: 6
  retries: 10
This will yield:
TASK [shell] *****************************************************************
FAILED - RETRYING: [localhost]: shell (10 retries left).
changed: [localhost] => (item=/tmp/startapp1.sh 4443)
FAILED - RETRYING: [localhost]: shell (10 retries left).
changed: [localhost] => (item=/tmp/startapp2.sh 4445)
FAILED - RETRYING: [localhost]: shell (10 retries left).
changed: [localhost] => (item=/tmp/startapp3.sh 4447)
With async:
- shell: "until telnet 127.0.0.1 {{ item.split()[1] }}; do sleep 2; done"
  delegate_to: localhost
  register: telnetcheck
  async: 600
  poll: 0
- async_status:
    jid: "{{ item.ansible_job_id }}"
  register: _jobs
  until: _jobs.finished
  delay: 6
  retries: 10
  loop: "{{ telnetcheck.results }}"
  loop_control:
    label: "{{ item.item }}"
This will yield:
TASK [shell] *****************************************************************
changed: [localhost] => (item=/tmp/startapp1.sh 4443)
changed: [localhost] => (item=/tmp/startapp2.sh 4445)
changed: [localhost] => (item=/tmp/startapp3.sh 4447)
TASK [async_status] **********************************************************
FAILED - RETRYING: [localhost]: async_status (10 retries left).
changed: [localhost] => (item=/tmp/startapp1.sh 4443)
FAILED - RETRYING: [localhost]: async_status (10 retries left).
changed: [localhost] => (item=/tmp/startapp2.sh 4445)
FAILED - RETRYING: [localhost]: async_status (10 retries left).
changed: [localhost] => (item=/tmp/startapp3.sh 4447)
That said, you should seriously consider U880D's answer, as it is the more Ansible-native solution:
- wait_for:
    host: localhost
    port: "{{ item.split()[1] }}"
    delay: 6
    timeout: 60
This will yield:
TASK [wait_for] **************************************************************
ok: [localhost] => (item=/tmp/startapp1.sh 4443)
ok: [localhost] => (item=/tmp/startapp2.sh 4445)
ok: [localhost] => (item=/tmp/startapp3.sh 4447)
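For the three test scripts from the question, the same idea can be written as two looped tasks in a single play. This is only a sketch: the variable name apps and its script and port keys are made up for the example, and the timeout should be adjusted to the slowest expected startup.

```yaml
---
- hosts: localhost
  gather_facts: false
  vars:
    # Hypothetical structure: one entry per application
    apps:
      - { script: /tmp/startapp1.sh, port: 4443 }
      - { script: /tmp/startapp2.sh, port: 4445 }
      - { script: /tmp/startapp3.sh, port: 4447 }
  tasks:
    # All scripts are started without waiting for any of them.
    - name: Start all scripts in the background
      shell: "{{ item.script }}"
      async: 600
      poll: 0
      loop: "{{ apps }}"
      loop_control:
        label: "{{ item.script }}"

    # Ports are probed one after the other; since the scripts already run
    # in parallel, the total wait is roughly that of the slowest port.
    - name: Wait for each port
      wait_for:
        host: localhost
        port: "{{ item.port }}"
        timeout: 60
      loop: "{{ apps }}"
      loop_control:
        label: "{{ item.port }}"
```

Because the scripts are started before any probing begins, probing the ports sequentially costs little extra time compared with probing them asynchronously.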
Related
I made something that works, but I am looking for something more 'elegant' ;)
- name: Wait for DNS propagation
  ansible.builtin.shell:
    cmd: host "{{ vm_fqdn }}"
  register: dns_result
  until: dns_result.rc == 0
  retries: 30
  delay: 10
But previously I tried with a lookup and community.general.dig:
- name: Wait for DNS propagation
  ansible.builtin.set_fact:
    dns_result: "{{ lookup('community.general.dig', vm_fqdn) }}"
  until: dns_result == vm_ip
  retries: 30
  delay: 10
Unfortunately, even though I added a register, I couldn't get the dns_result variable updated after multiple tries. It's like the lookup happens only once on the first iteration, but is not 're-triggered' on the next try. So maybe it is the behavior of lookup or something else, but I'm curious to know.
You're using the lookup at the wrong point. It should be directly in the until, so that it is evaluated each time.
- name: Wait for DNS propagation
  debug:
    msg: waiting
  until: lookup('community.general.dig', vm_fqdn) == vm_ip
  retries: 30
  delay: 10
I couldn't get the dns_result variable updated after multiple tries ... but is not 're-triggered' on the next try.
Right, that is the expected behavior for your given example.
It's like the lookup happens only once on the first iteration
But this is caused by the way variables are registered and handled internally. During the retries loop, the variable used in the until condition stays at its initial value, since it is set (registered) at "compile time" and not "re-set during run time".
The following example will show the behavior.
---
- hosts: localhost
  become: false
  gather_facts: false
  vars:
    RETRIES: [1, 2, 3]
  tasks:
    - name: Set Facts in loop
      set_fact:
        RESULT: "{{ ansible_loop.index }}"
      # Inner loop
      until: RESULT == '3'
      retries: 3
      delay: 1
      # Outer loop
      loop: "{{ RETRIES }}"
      loop_control:
        extended: true
        label: "{{ item }}"
As one can see from the output, retries behaves like a loop around the inner object, which here is a single state of the outer loop.
PLAY [localhost] *************************************
FAILED - RETRYING: Set Facts in loop (3 retries left).
FAILED - RETRYING: Set Facts in loop (2 retries left).
FAILED - RETRYING: Set Facts in loop (1 retries left).
TASK [Set Facts in loop] *****************************
failed: [localhost] (item=1) => changed=false
ansible_facts:
RESULT: '1'
...
ansible_loop_var: item
attempts: 3
item: 1
FAILED - RETRYING: Set Facts in loop (3 retries left).
FAILED - RETRYING: Set Facts in loop (2 retries left).
FAILED - RETRYING: Set Facts in loop (1 retries left).
failed: [localhost] (item=2) => changed=false
ansible_facts:
RESULT: '2'
...
ansible_loop_var: item
attempts: 3
item: 2
ok: [localhost] => (item=3)
To summarize, the construct set_fact - lookup('dig') - retries - until can't work unless the initial condition is already true. This is shown in the example run #3.
Documentation
Retrying a task until a condition is met
When you run a task with until and register the result as a variable, the registered variable will include a key called “attempts”, which records the number of the retries for the task.
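A minimal sketch of how the attempts key can be inspected. The command /bin/true and the variable name result are placeholders chosen for illustration.

```yaml
---
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Retry a command until it returns 0
      command: /bin/true
      register: result
      until: result.rc == 0
      retries: 3
      delay: 1

    # result.attempts holds the number of tries that were made
    - name: Show how many attempts were needed
      debug:
        var: result.attempts
```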
Further Q&A
How to specify multiple conditions in a do until loop in Ansible?
Possible Solutions
You may
either stay with your first approach,
or try something with the wait_for module (Waits for a condition before continuing), since according to the documentation of the host parameter it accepts
A resolvable hostname or IP address to wait for.
- name: Wait for host
  wait_for:
    host: test.example.com
    port: 22
    timeout: 5
  register: result
but you will need at least a port to check,
or use the solution proposed here by flowerysong (recommended).
I am new to Ansible and need help here.
There is a file with 500+ remote_host:port line entries like below.
remote_host1:port1
remote_host2:port2
remote_host3:port1
Using Ansible, how can I loop over the lines of the file, split each line into 2 variables, remote_host and port, log in to remote_host and start listening on the port with nc -k -l port, verify connectivity with nc -vz remote_host port from a given host, and then kill the nc command on the remote host?
So far I have used the wait_for module to verify connectivity to a single remote_host and its port.
For example, given the file
shell> cat remote-hosts.txt
test_11:22
test_12:22
test_13:80
Use the module wait_for. The playbook below
- hosts: localhost
  tasks:
    - block:
        - wait_for:
            host: "{{ item.split(':').0 }}"
            port: "{{ item.split(':').1|int }}"
            timeout: 5
          loop: "{{ lookup('file', 'remote-hosts.txt').splitlines() }}"
      rescue:
        - debug:
            msg: "{{ ansible_failed_result.results|selectattr('failed') }}"
gives (abridged)
TASK [wait_for] ******************************************************************************
ok: [localhost] => (item=test_11:22)
ok: [localhost] => (item=test_12:22)
failed: [localhost] (item=test_13:80) => changed=false
ansible_loop_var: item
elapsed: 6
item: test_13:80
msg: Timeout when waiting for test_13:80
TASK [debug] *********************************************************************************
ok: [localhost] =>
msg:
- ansible_loop_var: item
changed: false
elapsed: 6
failed: true
invocation:
module_args:
active_connection_states:
- ESTABLISHED
- FIN_WAIT1
- FIN_WAIT2
- SYN_RECV
- SYN_SENT
- TIME_WAIT
connect_timeout: 5
delay: 0
exclude_hosts: null
host: test_13
msg: null
path: null
port: 80
search_regex: null
sleep: 1
state: started
timeout: 5
item: test_13:80
msg: Timeout when waiting for test_13:80
All,
Example: if I've got 20 hosts for a playbook and run them with serial: 10, the shell command below runs on 10 hosts at a time. Once done, the handler is called, but the task which creates the dict (_dict) doesn't produce a dictionary, so the second task (Find failed hosts) fails with the mentioned error.
- name: Run some shell command
  shell: "echo 2 > /abcd/abcd.txt"
  when: random condition is satisfied
  register: update2
  ignore_errors: yes
  notify: abc_handler
- handler:
  - name: abcd_handler
    set_fact:
      _dict: "{{ dict(ansible_play_hosts|zip(
                ansible_play_hosts|map('extract', hostvars, 'update2'))) }}"
    run_once: true
  - name: Find failed hosts
    set_fact:
      _failed: "{{ _dict|dict2items|json_query('[?value.failed].key') }}"
    run_once: true
Handler First task output:
"changed: false"
"ansible_facts": {
"_dict": "{u'host1': {'stderr_lines': [], u'changed': True,...u'host2':.....u'host10'}"
2nd handler task gives the mentioned error when the dict2items is run for above values.
Thank you.
Q: "List of hosts where a certain task executed, changed something, or got failed."
A: For example, the command makes no changes at test_11, changes the file at test_12, and fails at test_13
- hosts: test_11,test_12,test_13
  tasks:
    - shell:
        cmd: "echo 2 > /tmp/test/abcd.txt"
        creates: /tmp/test/abcd.txt
      register: update1
      ignore_errors: true
TASK [shell] ***********************************************************
changed: [test_12]
fatal: [test_13]: FAILED! => changed=true
cmd: echo 2 > /tmp/test/abcd.txt
delta: '0:00:00.045992'
end: '2021-04-25 23:22:31.623804'
msg: non-zero return code
rc: 2
start: '2021-04-25 23:22:31.577812'
stderr: '/bin/sh: cannot create /tmp/test/abcd.txt: Permission denied'
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
...ignoring
ok: [test_11]
Let's create a dictionary with the data first, e.g.
- set_fact:
    _dict: "{{ dict(ansible_play_hosts|
                    zip(ansible_play_hosts|
                        map('extract', hostvars, 'update1'))) }}"
  run_once: true
gives
_dict:
test_11:
changed: false
cmd: echo 2 > /tmp/test/abcd.txt
failed: false
rc: 0
stdout: skipped, since /tmp/test/abcd.txt exists
stdout_lines:
- skipped, since /tmp/test/abcd.txt exists
test_12:
changed: true
cmd: echo 2 > /tmp/test/abcd.txt
delta: '0:00:00.032474'
end: '2021-04-25 23:14:36.361510'
failed: false
rc: 0
start: '2021-04-25 23:14:36.329036'
stderr: ''
stderr_lines: []
stdout: ''
stdout_lines: []
test_13:
changed: true
cmd: echo 2 > /tmp/test/abcd.txt
delta: '0:00:00.054980'
end: '2021-04-25 23:14:35.565811'
failed: true
msg: non-zero return code
rc: 2
start: '2021-04-25 23:14:35.510831'
stderr: '/bin/sh: cannot create /tmp/test/abcd.txt: Permission denied'
stderr_lines:
- '/bin/sh: cannot create /tmp/test/abcd.txt: Permission denied'
stdout: ''
stdout_lines: []
Note that test_11 is reported ok not skipped despite the registered variable showing "stdout: skipped, since /tmp/test/abcd.txt exists".
The analysis is now trivial, e.g.
- set_fact:
    _failed: "{{ _dict|dict2items|json_query('[?value.failed].key') }}"
  run_once: true
gives the list of the failed hosts
_failed:
- test_13
and the next task
- set_fact:
    _changed: "{{ (_dict|dict2items|json_query('[?value.changed].key'))|
                  difference(_failed) }}"
    _ok: "{{ _dict|dict2items|json_query('[?value.changed == `false`].key') }}"
  run_once: true
gives
_changed:
- test_12
_ok:
- test_11
Note that
The failed hosts need to be subtracted from the changed hosts because failed hosts are also reported as changed.
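If a task is skipped by its when condition, the registered variable still exists but only carries the skipped status (skipped: true), not the command results; a task skipped via tags registers nothing.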
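Whether anything is registered for a skipped task is easy to verify. A minimal sketch (the command /bin/true and the variable name result are placeholders): according to the Ansible documentation on registering variables, a task skipped by its when condition still registers a variable carrying a skipped status, whereas a task skipped through tags does not.

```yaml
---
- hosts: localhost
  gather_facts: false
  tasks:
    # The condition is always false, so the command never runs.
    - name: Task skipped by its when condition
      command: /bin/true
      when: false
      register: result

    # Shows what, if anything, was registered for the skipped task.
    - name: Show the registered variable
      debug:
        var: result
```

When building a dictionary such as _dict from registered results, it is therefore safer to test for the keys you rely on (for example with value.rc is defined) than to assume every host ran the command.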
Serial
Split the playbook into 2 plays if serial is used, e.g.
shell> cat playbook.yml
- hosts: all
  serial: 10
  tasks:
    - shell:
        cmd: "echo 2 > /tmp/test/abcd.txt"
        creates: /tmp/test/abcd.txt
      register: update1
      ignore_errors: true
- hosts: all
  tasks:
    - set_fact:
        _dict: "{{ dict(ansible_play_hosts|zip(
                  ansible_play_hosts|map('extract', hostvars, 'update1'))) }}"
      run_once: true
It seems you would like to get the hosts on which the command task (shown in the question) failed or changed, and then target them with some other tasks.
There are two things required for this:
If the command task fails, execution stops for that host and none of the following tasks will run on it. So we need to add the ignore_errors flag to the task.
Use the add_host module to create a new group containing the hosts on which the task failed or changed.
So finally, tasks like the ones below should do the trick:
- hosts: some_group
  serial: 1
  tasks:
    - name: update file count
      shell: "echo 2 > /home/ec2-user/abcd.txt"
      when:
        - count.stdout == "1"
      register: update1
      ignore_errors: true
    - name: conditionally add the hosts from current play hosts to a new group
      add_host:
        groups:
          - new_group
        host: "{{ ansible_hostname }}"
      # test the variable registered by the task above
      when: >
        update1 is failed or
        update1 is changed
# Then have a play targeting the new group
- hosts: new_group
  tasks:
  # Tasks to be performed
Though the use of serial might make the whole playbook run longer if there are a lot of hosts.
Below are a couple of IP addresses and their telnet responses (output)
telnet 10.9.9.112 22
Trying 10.9.9.112......
telnet 10.9.9.143 22
Trying 10.9.9.143.....
telnet: connect to address 10.9.9.143: Connection refused.
For the first IP, 10.9.9.112, there is no connection because a firewall blocks any connection from source to destination. The output simply says Trying ... and stays that way without printing anything else.
For the second IP, 10.9.9.143, I get Connection refused immediately in the output, and control returns to the prompt.
I wish to catch both scenarios in a when condition and perform different actions in each case.
I tried to use Ansible's telnet module but I don't know how to grab both the different outputs in the registered variable.
In my case it prints the same message for both the IPs.
Ansible output for first ip:
TASK [debug] *************************************
ok: [localhost] => {
"msg": "HERE:{u'msg': u'Timeout when waiting for 10.9.9.112', u'failed': True, 'changed': False, u'elapsed': 4}"
Ansible Output for second ip:
TASK [debug] *************************************
ok: [localhost] => {
"msg": "HERE:{u'msg': u'Timeout when waiting for 10.9.9.143', u'failed': True, 'changed': False, u'elapsed': 3}"
The only difference I see is the value for elapsed.
Here is my playbook.
---
- name: "Play 1"
  hosts: localhost
  tasks:
    - wait_for:
        host: "{{ item }}"
        port: 22
        state: started
        delay: 3
        timeout: 90
      ignore_errors: yes
      register: telnetout
      loop:
        - 10.9.9.112
        - 10.9.9.143
    - debug:
        msg: "HERE: {{ telnetout }}"
Unfortunately, the wait_for module does not record the Connection refused message in its output.
We have to use the raw module instead, like below.
---
- name: "Play 1"
  hosts: localhost
  tasks:
    - raw: "timeout --signal=9 2 telnet {{ item }} 22"
      ignore_errors: yes
      register: telnetout
      loop:
        - 10.9.9.112
        - 10.9.9.143
    - debug:
        msg: "HERE: {{ telnetout }}"
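To act differently on the two cases, the registered results could then be inspected per IP. The tasks below are a sketch to append to the play: they assume the telnetout variable registered above and that telnet prints Connection refused and Connected to as in a typical telnet client. The helper variable output is made up for the example, and it joins stdout and stderr because where raw places the message may depend on the connection type.

```yaml
- name: Report hosts that actively refuse the connection
  debug:
    msg: "{{ item.item }}: connection refused"
  vars:
    # Hypothetical helper: all captured output of one telnet run
    output: "{{ (item.stdout | default('')) ~ (item.stderr | default('')) }}"
  when: "'Connection refused' in output"
  loop: "{{ telnetout.results }}"
  loop_control:
    label: "{{ item.item }}"

- name: Report hosts that gave no answer before the timeout
  debug:
    msg: "{{ item.item }}: no answer (filtered or down)"
  vars:
    output: "{{ (item.stdout | default('')) ~ (item.stderr | default('')) }}"
  when:
    - "'Connection refused' not in output"
    - "'Connected to' not in output"
  loop: "{{ telnetout.results }}"
  loop_control:
    label: "{{ item.item }}"
```

The debug tasks are placeholders; any other module can take their place with the same when conditions.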
I am trying to run a command multiple times and check whether the output contains some string ("hi"). I am purposely simulating a failure and expecting the until loop to fail. Everything is good up to this point.
Now, I need a custom message stating why the until loop or the task failed, for example: "Your command failed to print hi"
So the question is: how can I print a custom message from the until loop if it fails to pass within the retries?
Playbook:
-->cat until.yml
---
- hosts: localhost
  gather_facts: no
  tasks:
    - name: "check command"
      shell: echo hello
      register: var
      until: var.stdout.find('hi') != -1
      retries: 5
      delay: 1
playbook output:
-->ansible-playbook until.yml
PLAY [localhost] *************************************************************************************************************************************************************************************************************************
TASK [check command] ********************************************************************************************************************************************************************************************************
FAILED - RETRYING: who triggered the playbook (5 retries left).
FAILED - RETRYING: who triggered the playbook (4 retries left).
FAILED - RETRYING: who triggered the playbook (3 retries left).
FAILED - RETRYING: who triggered the playbook (2 retries left).
FAILED - RETRYING: who triggered the playbook (1 retries left).
fatal: [localhost]: FAILED! => {
"attempts": 5,
"changed": true,
"cmd": "echo hello",
"delta": "0:00:00.003004",
"end": "2019-12-03 10:04:14.731488",
"rc": 0,
"start": "2019-12-03 10:04:14.728484"
}
STDOUT:
hello
PLAY RECAP *******************************************************************************************************************************************************************************************************************************
localhost : ok=0 changed=0 unreachable=0 failed=1
You can divide your task into two tasks:
The first task polls for the desired output using the until loop. It uses ignore_errors: true, so that a failing until loop does not fail the playbook; we just capture the result.
The second task uses assert to print success_msg in the success case and fail_msg in the failure case.
The following is a tweaked, minimal working example:
---
- hosts: localhost
  gather_facts: no
  tasks:
    - name: "check command"
      shell: echo hello
      register: var
      until: var.stdout.find('hi') != -1
      retries: 5
      delay: 1
      ignore_errors: true
    - name: "Print result"
      assert:
        # a plain boolean test, equivalent to the until condition above
        that: "'hi' in var.stdout"
        fail_msg: "Could not find hi in command output"
        success_msg: "hi is present in command output"
Have a look at block error handling which can be used for this purpose.
Basic overview:
- block:
    - name: A task that may fail.
      debug:
        msg: "I may fail"
      failed_when: true
      register: might_fail_exec
  rescue:
    - name: fail nicely with a msg
      fail:
        msg: "The task that might fail has failed. Here is some info from the task: {{ might_fail_exec }}"
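Applied to the until task from the question, the block approach might look like the sketch below. The check command task is copied from the question, and the wording of the message is the example given there.

```yaml
---
- hosts: localhost
  gather_facts: false
  tasks:
    - block:
        # Fails once all retries are used up without finding "hi"
        - name: check command
          shell: echo hello
          register: var
          until: var.stdout.find('hi') != -1
          retries: 5
          delay: 1
      rescue:
        # Runs only if a task in the block failed
        - name: Fail with a custom message
          fail:
            msg: "Your command failed to print hi"
```

Unlike the ignore_errors variant, the failed attempt is still reported as failed in the play output before the rescue section runs.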