I'm using Ansible to deploy a webapp. I'd like to wait for the application to be running by checking that a given page returns a JSON with a given key/value.
I want the task to be tried a few times before failing, so I'm using the combination of the until/retries/delay keywords.
The issue is that I want the number of retries to be taken from a variable. If I write:
retries: {{apache_test_retries}}
I fall into the usual Yaml Gotcha (http://docs.ansible.com/YAMLSyntax.html#gotchas).
If, instead, I write:
retries: "{{apache_test_retries}}"
I'm told the value is not an integer:
ValueError: invalid literal for int() with base 10: '{{apache_test_retries}}'
Here is my full code:
- name: Wait for the application to be running
  local_action:
    uri
    url=http://{{webapp_url}}/health
    timeout=60
  register: res
  sudo: false
  when: updated.changed and apache_test_url is defined
  until: res.status == 200 and res['json'] is defined and res['json']['status'] == 'UP'
  retries: "{{apache_test_retries}}"
  delay: 1
Any idea on how to work around this issue? Thanks.
I had the same issue and tried a bunch of things that didn't work, so for some time I just worked around it without using a variable. I eventually found the answer, so here it is for everyone who runs into this.
Daniel's solution should indeed work:
retries: "{{ apache_test_retries | int }}"
However, it won't work if you are running a slightly older version of Ansible, so make sure you update Ansible. I tested on 1.8.4, where it works; it does not work on 1.8.2.
This was the original bug report for Ansible:
https://github.com/ansible/ansible/issues/5865
You should be able to convert it to an integer with the int filter:
retries: "{{ apache_test_retries | int }}"
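A minimal sketch of the question's task with the filter applied, assuming a recent enough Ansible release. The variable names (webapp_url, apache_test_retries) come from the question; the module-argument layout and delegate_to: localhost are my own substitution for the original local_action form.

```yaml
# Sketch only: webapp_url and apache_test_retries are the question's variables.
- name: Wait for the application to be running
  uri:
    url: "http://{{ webapp_url }}/health"
    timeout: 60
  delegate_to: localhost
  register: res
  # Retry until the health page answers 200 with status == 'UP'
  until: res.status == 200 and res.json is defined and res.json.status == 'UP'
  # Cast at the point of use so retries receives an integer
  retries: "{{ apache_test_retries | int }}"
  delay: 1
```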
I had the same problem and the solutions suggested here didn't work. I didn't try Tim Diels' suggestion though.
Here's what worked for me:
vars:
  capacity: "{{ param_capacity | default(16) }}"
tasks:
  - name: some task
    ...
    when: item.usage < (capacity | int)
    loop:
      ...
And here's what I was trying to do:
vars:
  capacity: "{{ (param_capacity | default(16)) | int }}"
tasks:
  - name: some task
    ...
    when: item.usage < capacity
    loop:
      ...
I found this issue on GitHub, about this same problem, and actually the intended way to use this filter is applying it where you use the variable, not where you declare it.
I faced a similar issue. In my case I wanted to restart the celeryd service. It sometimes takes a very long time to restart, so I wanted to give it at most 30 seconds for a soft restart and then force-restart it. I used async for this (polling for the restart result every 5 seconds).
celery/handlers/main.yml
- name: restart celeryd
  service:
    name=celeryd
    state=restarted
  register: celeryd_restart_result
  ignore_errors: true
  async: "{{ async_val | default(30) }}"
  poll: 5
- name: check celeryd restart result and force restart if needed
  shell: service celeryd kill && service celeryd start
  when: celeryd_restart_result|failed
I then use the above in the playbook as handlers for a task (restart celeryd is always first in the notify list).
In your case something like the following could possibly work. I haven't checked whether it does, but it might give you an idea for solving the problem in a different way. Also, since you will be ignoring errors in the first task, you need to make sure that things are fine in the second:
- name: Poll to check if the application is running
  local_action:
    uri
    url=http://{{webapp_url}}/health
    timeout=60
  register: res
  sudo: false
  when: updated.changed and apache_test_url is defined
  failed_when: res.status != 200 or res['json'] is not defined or res['json']['status'] != 'UP'
  ignore_errors: true
  async: "{{ apache_test_retries | default(60) }}"
  poll: 1
# Task above will exit as early as possible on success
# Ansible will wait up to 60 secs for it, polling every 1 sec
# You need to make sure it's fine **again** because it has ignore_errors: true
- name: Final UP check
  local_action:
    uri
    url=http://{{webapp_url}}/health
    timeout=60
  register: res
  sudo: false
  when: updated.changed and apache_test_url is defined
  failed_when: res.status != 200 or res['json'] is not defined or res['json']['status'] != 'UP'
Hope it helps you solve the issue with a bug in retries.
Related
I have a users.yaml file with information regarding 400+ users. I need Ansible to create these users during provisioning. I tried the async keyword (if that's the right word to use; tell me if I'm wrong) with poll: 15, but it takes ~10 minutes.
- name: Add FTP users asynchronously
  ansible.builtin.user:
    name: "{{ item.name }}"
    home: "{{ item.home }}"
    shell: /sbin/nologin
    groups: ftp-users
    create_home: yes
    append: no
  loop: "{{ ftp_users }}"
  async: 60
  poll: 15
  tags: users
I also tried using poll:0 but many users aren't created.
Your current use of async suits a single long-running task where you want to minimize the chance of your connection being dropped because of a timeout. You are asking Ansible to start a job, disconnect from the target, and then reconnect every 15 seconds to check whether the job is done (or until you reach the 60-second timeout). Nothing will be launched in parallel: the next iteration of the loop will only start when the current one is done.
What you want to do instead is run those tasks in parallel as fast as possible and then check back later to see whether they are done. In this case, you have to use poll: 0 on your task and later check for completion with the async_status module, as described in the Ansible async guide. Note that you also need to clean up the async job cache, as Ansible will not do it automagically for you in that case.
In your case, this would give:
- name: Add FTP users asynchronously
  ansible.builtin.user:
    name: "{{ item.name }}"
    home: "{{ item.home }}"
    shell: /sbin/nologin
    groups: ftp-users
    create_home: yes
    append: no
  loop: "{{ ftp_users }}"
  async: 60
  poll: 0
  register: add_user
- name: Wait until all commands are done
  async_status:
    jid: "{{ item.ansible_job_id }}"
  register: async_poll_result
  until: async_poll_result.finished
  retries: 60
  delay: 1
  loop: "{{ add_user.results }}"
- name: clean async job cache
  async_status:
    jid: "{{ item.ansible_job_id }}"
    mode: cleanup
  loop: "{{ add_user.results }}"
Although this directly answers how to use async for parallel jobs, I'm not entirely sure it will fix your actual performance problem, which could come from other issues (such as slow DNS, a slow network, pipelining not being enabled where it could be, or the SSH master connection not being configured).
I have an ansible task that fails about 20% of the time. It almost always succeeds if retried a couple of times. I'd like to use until to loop until the task succeeds and store the output of each attempt to a separate log file on the local machine. Is there a good way to achieve this?
For example, my task currently looks like this:
- name: Provision
  register: prov_ret
  until: prov_ret is succeeded
  retries: 2
  command: provision_cmd
I can see how to store the log output from the last retry when it succeeds, but I'd like to store it from each retry. To store from the last attempt to run the command I use:
- name: Write Log
  local_action: copy content={{ prov_ret | to_nice_json }} dest="/tmp/ansible_logs/provision.log"
It's not possible as of 2.9. The until loop doesn't preserve results as loop does. Once a task terminates all variables inside this task will be gone except the register one.
To see what's going on in the loop write a log inside the command at the remote host. For example, the command provision_cmd writes a log to /scratch/provision_cmd.log. Run it in the block and display the log in the rescue section.
- block:
    - name: Provision
      command: provision_cmd
      register: prov_ret
      until: prov_ret is succeeded
      retries: 2
  rescue:
    - name: Display registered variable
      debug:
        var: prov_ret
    - name: Read the log
      slurp:
        src: /scratch/provision_cmd.log
      register: provision_cmd_log
    - name: Display log
      debug:
        msg: "{{ msg.split('\n') }}"
      vars:
        msg: "{{ provision_cmd_log.content|b64decode }}"
I am installing some plugins and then checking the status in a command loop. I want to check the output of the status command and, if the plugins are not installed, install them again using retries.
- name: install plugins
  command: "run {{ item }}"
  with_items:
    - install plugins
    - status
  register: result
  until: result.stdout.find("InstallPlugin1 and InstallPlugin2") != -1
  retries: 5
  delay: 10
I am using register to save the result, and I know that with a loop register stores the per-item results under "results". I want to check for a string in the output of the status command in until, which should be the second entry of results, but I am not able to grab it.
When I use
debug: msg="{{ result['results'][1]['stdout'] }}"
I can see the output of the status command, but I don't know how to use this in until. Whenever I use results there, it gives an error. I want to use something like
until: result['results'][1]['stdout'].find("all systems go") != -1
If both run install plugins and run status return something like
installed: InstallPlugin1, InstallPlugin2
the task below will do the job
- name: install plugins
  command: "run {{ item }}"
  loop:
    - install plugins
    - status
  register: result
  until:
    - result.stdout is search('InstallPlugin1')
    - result.stdout is search('InstallPlugin2')
  retries: 5
  delay: 10
It's not possible to use the loop if only run status returns the confirmation, because the until statement is evaluated in each iteration. An option would be to concatenate the commands. For example
- name: install plugins
  # shell rather than command, because command does not interpret ';'
  shell: "run install plugins; run status"
  register: result
  until:
    - result.stdout is search('InstallPlugin1')
    - result.stdout is search('InstallPlugin2')
  retries: 5
  delay: 10
It's possible to test the registered result in each iteration of the loop. After the loop is done, the variable result will hold the accumulated result.results. It may be worth reviewing it.
- debug:
    var: result
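As a hedged sketch of inspecting the accumulated results after the loop has finished: the task below assumes the loop order from the question, so that index 1 is the status item. It only checks the final outcome and does not retry anything.

```yaml
# Sketch only: assumes result was registered by the looped task above
# and that the second loop item (index 1) is the status command.
- name: Verify that the status output lists both plugins
  assert:
    that:
      - result.results[1].stdout is search('InstallPlugin1')
      - result.results[1].stdout is search('InstallPlugin2')
```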
I think this is what you're looking for:
until: "'all systems go' in item['stdout']"
The variable you register there will be a list of the aggregate results from all iterations of the with_items loop, and what you want to condition on is the item itself. Depending on what you're doing, you might not even need to register that variable.
How can I see realtime output from a shell script run by ansible?
I recently refactored a wait script to use multiprocessing and provide realtime status of the various service wait checks for multiple services.
As a standalone script, it works as expected, providing status for each thread as they wait in parallel for the various services to become stable.
In Ansible, the output pauses until the Python script completes (or terminates) and is only then displayed. While that is OK, I'd rather find a way to display output sooner. I've tried setting PYTHONUNBUFFERED prior to running ansible-playbook via Jenkins withEnv, but that doesn't seem to accomplish the goal either.
- name: Wait up to 30m for service stability
  shell: "{{ venv_dir }}/bin/python3 -u wait_service_state.py"
  args:
    chdir: "{{ script_dir }}"
What's the standard ansible pattern for displaying output for a long running script?
My guess is that I could follow one of these routes
Not use ansible
execute in a docker container and report output via ansible provided this doesn't hit the identical class of problem
Output to a file from the script and have either ansible thread or Jenkins pipeline thread watch and tail the file (both seem kludgy as this blurs the separation of concerns coupling my build server to the deploy scripts a little too tightly)
You can use - https://docs.ansible.com/ansible/latest/user_guide/playbooks_async.html
main.yml
- name: Run items asynchronously in batch of two items
  vars:
    sleep_durations:
      - 1
      - 2
      - 3
      - 4
      - 5
    durations: "{{ item }}"
  include_tasks: execute_batch.yml
  loop: "{{ sleep_durations | batch(2) | list }}"
execute_batch.yml
- name: Async sleeping for batched_items
  command: sleep {{ async_item }}
  async: 45
  poll: 0
  loop: "{{ durations }}"
  loop_control:
    loop_var: "async_item"
  register: async_results
- name: Check sync status
  async_status:
    jid: "{{ async_result_item.ansible_job_id }}"
  loop: "{{ async_results.results }}"
  loop_control:
    loop_var: "async_result_item"
  register: async_poll_results
  until: async_poll_results.finished
  retries: 30
"What's the standard ansible pattern for displaying output for a long running script?"
Standard ansible pattern for displaying output for a long-running script is polling async and loop until async_status finishes. The customization of the until loop's output is limited. See Feature request: until for blocks #16621.
ansible-runner is another route that might be followed.
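Applied to the wait script from the question, the async plus async_status pattern might look like the sketch below. The log file name (wait_service_state.log) and the registered variable names are hypothetical, and the timings are chosen only to match the 30-minute limit in the task name. This shows a retry line for each poll rather than the script's own output, which would still have to be read from the log file.

```yaml
# Sketch only: wait_service_state.log, wait_job and wait_status are
# hypothetical names; venv_dir and script_dir come from the question.
- name: Start the wait script in the background
  shell: "{{ venv_dir }}/bin/python3 -u wait_service_state.py > wait_service_state.log 2>&1"
  args:
    chdir: "{{ script_dir }}"
  async: 1800   # allow up to 30 minutes
  poll: 0       # do not block; check back later
  register: wait_job

- name: Poll until the wait script finishes
  async_status:
    jid: "{{ wait_job.ansible_job_id }}"
  register: wait_status
  until: wait_status.finished
  retries: 180  # 180 checks, 10 seconds apart
  delay: 10
```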
I'm creating a playbook which will be applied to new Docker swarm manager(s). The server(s) are not configured before the playbook runs.
We already have some Swarm managers. I can find all of them (including the new one) with:
- name: 'Search for SwarmManager server IPs'
  ec2_instance_facts:
    region: "{{ ec2_region }}"
    filters:
      vpc-id: "{{ ec2_vpc_id }}"
      "tag:aws:cloudformation:logical-id": "AutoScalingGroupSwarmManager"
  register: swarmmanager_instance_facts_result
Now I can use something like this to get join-token:
- set_fact:
    swarmmanager_ip: "{{ swarmmanager_instance_facts_result.instances[0].private_ip_address }}"
- name: 'Get the docker swarm join-token'
  shell: docker swarm join-token -q manager
  changed_when: False
  register: docker_swarm_token_result
  delegate_to: "{{ swarmmanager_ip }}"
  run_once: true
Successful shell output looks like this, just one line starting with "SWMTKN-1":
SWMTKN-1-11xxxyyyzzz-xxxyyyzzz
But I see some possible problems here with swarmmanager_ip:
it can be a new instance which is still unconfigured,
it can be an instance whose Swarm manager is not working.
So I decided to loop over the results until I get a join-token. But none of the many code variants I've tried work. For example, this one runs over the whole list without breaking:
- name: 'Get the docker swarm join-token'
  shell: docker swarm join-token -q manager
  changed_when: False
  register: docker_swarm_token_result
  delegate_to: "{{ item.private_ip_address }}"
  loop: "{{ swarmmanager_instance_facts_result.instances }}"
  # ignore_errors: true
  # until: docker_swarm_token_result.stdout_lines|length == 1
  when: docker_swarm_token_result is not defined or docker_swarm_token_result.stdout_lines is not defined or docker_swarm_token_result.stdout_lines|length == 1
  run_once: true
  check_mode: false
Do you know how to iterate over a list until the first successful shell output?
I use Ansible 2.6.11; an answer for 2.7 is also OK.
P.S.: I've already read How to break `with_lines` cycle in Ansible?, but it doesn't work for modern Ansible versions.