I do a simple service stop in Ansible for a Windows service:
- name: stop service
  win_service:
    name: "{{ tomcat_srv_name }}"
    state: stopped
Due to a problem on the remote server, the stop fails. When I try this directly on the remote server I get a timeout, but the above Ansible task hangs forever.
Is there a way to catch this? Something like wait_for ...?
Based on kfreezy's note, I have built this block to catch a potential error and react accordingly:
- block:
    # try to stop the service
    - win_service:
        name: "{{ srv_name }}"
        state: stopped
      async: 45
      poll: 5
      register: service_stop_info
    - debug:
        msg: "STOP service {{ srv_name }} results in: {{ service_stop_info.state }}"
  rescue:
    # in case the service can not be stopped, kill its process
    - name: Kill process of service
      win_command: taskkill /f /fi "Services eq {{ srv_name }}"
      register: cmd_result_service_kill
    - debug:
        msg: "KILL process of service {{ srv_name }} results in: {{ cmd_result_service_kill.stdout }}"
  always:
    # restart the service
    - win_service:
        name: "{{ srv_name }}"
        state: started
      register: service_start_info
    - debug:
        msg: "START service {{ srv_name }} results in: {{ service_start_info.state }}"
Async and polling should work (I haven't used it on a Windows machine). You'll probably want to tweak the values a bit depending on how long it normally takes to stop Tomcat.
- name: stop service
  win_service:
    name: "{{ tomcat_srv_name }}"
    state: stopped
  async: 45
  poll: 5
I have a users.yaml file with information regarding 400+ users. I need Ansible to create these users during provisioning. I tried with the async keyword (if that's the right word to use; tell me if I'm wrong) and poll: 15, but it takes ~10 minutes.
- name: Add FTP users asynchronously
  ansible.builtin.user:
    name: "{{ item.name }}"
    home: "{{ item.home }}"
    shell: /sbin/nologin
    groups: ftp-users
    create_home: yes
    append: no
  loop: "{{ ftp_users }}"
  async: 60
  poll: 15
  tags: users
I also tried using poll: 0, but then many users aren't created.
Your current use of async fits a single long-running task where you want to minimize the chance of the connection being dropped because of a timeout. You are asking Ansible to start a job, disconnect from the target and then reconnect every 15 seconds to check whether the job is done (or until you reach the 60-second timeout). Nothing is launched in parallel: the next iteration of the loop only starts when the current one is done.
What you want to do instead is run those tasks in parallel as fast as possible and then check back later whether they are done. In this case, you have to use poll: 0 on your task and later check for completion with the async_status module as described in the Ansible async guide. Note that you also need to clean up the async job cache, as Ansible will not do that automagically for you in this case.
In your case, this would give:
- name: Add FTP users asynchronously
  ansible.builtin.user:
    name: "{{ item.name }}"
    home: "{{ item.home }}"
    shell: /sbin/nologin
    groups: ftp-users
    create_home: yes
    append: no
  loop: "{{ ftp_users }}"
  async: 60
  poll: 0
  register: add_user

- name: Wait until all commands are done
  async_status:
    jid: "{{ item.ansible_job_id }}"
  register: async_poll_result
  until: async_poll_result.finished
  retries: 60
  delay: 1
  loop: "{{ add_user.results }}"

- name: clean async job cache
  async_status:
    jid: "{{ item.ansible_job_id }}"
    mode: cleanup
  loop: "{{ add_user.results }}"
Meanwhile, although this is a direct answer on how to use async for parallel jobs, I'm not entirely sure it will fix your actual performance problem, which could come from other issues (slow DNS, slow network, pipelining not enabled where possible, SSH master connection not configured...).
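If connection overhead is part of the problem, SSH pipelining and connection reuse can be enabled per host or group through standard connection variables; here is a minimal sketch (the file location and the ControlPersist value are just examples, and the same settings can also live in ansible.cfg):
# group_vars/all.yml -- illustrative connection tuning, adjust to your environment
ansible_ssh_pipelining: true        # skip the temporary-file step for each module invocation
ansible_ssh_common_args: "-o ControlMaster=auto -o ControlPersist=60s"   # reuse one SSH connection per host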
I have a 3-node Ubuntu 20.04 LTS KVM Kubernetes cluster, and the KVM host is also Ubuntu 20.04 LTS. I ran the playbooks on the KVM host.
I have the following inventory extract:
nodes:
  hosts:
    sea_r:
      ansible_host: 192.168.122.60
    spring_r:
      ansible_host: 192.168.122.92
    island_r:
      ansible_host: 192.168.122.93
  vars:
    ansible_user: root
and I have been trying a lot with async_status, but it always fails:
- name: root commands
  hosts: nodes
  tasks:
    - name: bash commands
      ansible.builtin.shell: |
        apt update
      args:
        chdir: /root
        executable: /bin/bash
      async: 2000
      poll: 2
      register: output

    - name: check progress
      ansible.builtin.async_status:
        jid: "{{ output.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 200
      delay: 5
with error:
fatal: [sea_r]: FAILED! => {"msg": "The task
includes an option with an undefined variable.
The error was: 'dict object' has no attribute
'ansible_job_id' ...
If I instead try the following,
- name: root commands
  hosts: nodes
  tasks:
    - name: bash commands
      ansible.builtin.shell: |
        apt update
      args:
        chdir: /root
        executable: /bin/bash
      async: 2000
      poll: 2
      register: output

    - debug: msg="{{ output.stdout_lines }}"
    - debug: msg="{{ output.stderr_lines }}"
I get no errors.
I also tried the following variation,
- name: check progress
  ansible.builtin.async_status:
    jid: "{{ item.ansible_job_id }}"
  with_items: "{{ output }}"
  register: job_result
  until: job_result.finished
  retries: 200
  delay: 5
which was suggested as a solution to a similar error. That does not help either; I just get a slightly different error:
fatal: [sea_r]: FAILED! => {"msg": "The task includes
an option with an undefined variable. The error
was: 'ansible.utils.unsafe_proxy.AnsibleUnsafeText
object' has no attribute 'ansible_job_id' ...
At the beginning and the end of the playbook, I resume and pause my 3 KVM server nodes like so:
- name: resume vms
  hosts: local_vm_ctl
  tasks:
    - name: resume vm servers
      shell: |
        virsh resume kub3
        virsh resume kub2
        virsh resume kub1
        virsh list --state-paused --state-running
      args:
        chdir: /home/bi
        executable: /bin/bash
      environment:
        LIBVIRT_DEFAULT_URI: qemu:///system
      register: output

    - debug: msg="{{ output.stdout_lines }}"
    - debug: msg="{{ output.stderr_lines }}"
and like this:
- name: pause vms
  hosts: local_vm_ctl
  tasks:
    - name: suspend vm servers
      shell: |
        virsh suspend kub3
        virsh suspend kub2
        virsh suspend kub1
        virsh list --state-paused --state-running
      args:
        chdir: /home/bi
        executable: /bin/bash
      environment:
        LIBVIRT_DEFAULT_URI: qemu:///system
      register: output

    - debug: msg="{{ output.stdout_lines }}"
    - debug: msg="{{ output.stderr_lines }}"
but I don't see how these plays could have anything to do with said error.
Any help will be much appreciated.
You get an undefined error for your job id because:
You use poll: X on your initial task, so Ansible reconnects every X seconds to check whether the task is finished.
When Ansible exits that task and moves on to your async_status task, the job is already done. And since you used a non-zero poll value, the async status cache has been cleared automatically.
Since the cache was cleared, the job id does not exist anymore.
Your above scenario is meant to avoid timeouts with your target on long-running tasks, not to run tasks concurrently and check on their status later. For this second requirement, you need to run the async task with poll: 0 and clean up the cache yourself.
See the documentation for more explanation on the above concepts:
ansible async guide
ansible async_status module
I made an example with your above task and fixed it to use the dedicated apt module (note that you could add a name option to the module with one or a list of packages, and Ansible would do both the cache update and the install in a single step). Also, retries * delay on the async_status task should be equal to or greater than async on the initial task if you want to make sure you won't miss the end.
- name: Update apt cache
  ansible.builtin.apt:
    update_cache: true
  async: 2000
  poll: 0
  register: output

- name: check progress
  ansible.builtin.async_status:
    jid: "{{ output.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 400
  delay: 5

- name: clean async job cache
  ansible.builtin.async_status:
    jid: "{{ output.ansible_job_id }}"
    mode: cleanup
This is more useful to launch a bunch of long-lasting tasks in parallel. Here is a useless yet functional example:
- name: launch some loooooong tasks
  shell: "{{ item }}"
  loop:
    - sleep 30
    - sleep 20
    - sleep 35
  async: 100
  poll: 0
  register: long_cmd

- name: wait until all commands are done
  async_status:
    jid: "{{ item.ansible_job_id }}"
  register: async_poll_result
  until: async_poll_result.finished
  retries: 50
  delay: 2
  loop: "{{ long_cmd.results }}"

- name: clean async job cache
  async_status:
    jid: "{{ item.ansible_job_id }}"
    mode: cleanup
  loop: "{{ long_cmd.results }}"
You have poll: 2 on your task, which tells Ansible to internally poll the async job every 2 seconds and return the final status in the registered variable. In order to use async_status you should set poll: 0 so that the task does not wait for the job to finish.
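For illustration, here is a minimal sketch of that change applied to the task from the question (the follow-up async_status check is required, and the retry values simply mirror the other answer):
- name: bash commands
  ansible.builtin.shell: |
    apt update
  args:
    chdir: /root
    executable: /bin/bash
  async: 2000
  poll: 0                # do not wait here; the async_status task below polls instead
  register: output

- name: check progress
  ansible.builtin.async_status:
    jid: "{{ output.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 400           # retries * delay should cover the async timeout (400 * 5 = 2000)
  delay: 5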
I'm creating a playbook which will be applied to new Docker Swarm manager(s). The server(s) are not configured before the playbook run.
We already have some Swarm managers. I can find all of them (including the new one) with:
- name: 'Search for SwarmManager server IPs'
  ec2_instance_facts:
    region: "{{ ec2_region }}"
    filters:
      vpc-id: "{{ ec2_vpc_id }}"
      "tag:aws:cloudformation:logical-id": "AutoScalingGroupSwarmManager"
  register: swarmmanager_instance_facts_result
Now I can use something like this to get join-token:
- set_fact:
    swarmmanager_ip: "{{ swarmmanager_instance_facts_result.instances[0].private_ip_address }}"

- name: 'Get the docker swarm join-token'
  shell: docker swarm join-token -q manager
  changed_when: False
  register: docker_swarm_token_result
  delegate_to: "{{ swarmmanager_ip }}"
  run_once: true
A successful shell output looks like this: just one line starting with "SWMTKN-1":
SWMTKN-1-11xxxyyyzzz-xxxyyyzzz
But I see some possible problems here with swarmmanager_ip:
it can be the new instance, which is still unconfigured,
it can be an instance whose Swarm manager is not working.
So I decided to loop over the results until I get a join-token, but none of the code variants I've tried work. For example, this one runs over the whole list without breaking:
- name: 'Get the docker swarm join-token'
  shell: docker swarm join-token -q manager
  changed_when: False
  register: docker_swarm_token_result
  delegate_to: "{{ item.private_ip_address }}"
  loop: "{{ swarmmanager_instance_facts_result.instances }}"
  # ignore_errors: true
  # until: docker_swarm_token_result.stdout_lines|length == 1
  when: docker_swarm_token_result is not defined or docker_swarm_token_result.stdout_lines is not defined or docker_swarm_token_result.stdout_lines|length == 1
  run_once: true
  check_mode: false
Do you know how to iterate over the list until the first successful shell output?
I use Ansible 2.6.11; an answer for 2.7 is also fine.
P.S.: I've already read How to break `with_lines` cycle in Ansible?; it doesn't work for modern Ansible versions.
On my host, it takes some time (about 20 s) to initialize the CLI session before any CLI command can run.
I'm trying to run a command from an Ansible playbook:
---
- name: Run show sub command
  hosts: em
  gather_facts: no
  remote_user: duypn
  tasks:
    - name: wait for SSH to respond on all hosts
      local_action: wait_for host=em port=22 delay=60 state=started

    - name: run show sub command
      raw: show sub id=xxxxx;display=term-type
After 10 minutes, Ansible gives me output which is not the result of the show sub command :(
...
["CLI Session initializing..", "Autocompleter initializing..", "CLI>This session has been IDLE for too long.",
...
I'd be glad to hear your suggestions. Thank you :)
I don't have a copy-paste solution for you, but one thing I learned is to put a sleep after SSH is 'up' to allow the machine to finish its work. This might give you a nudge in the right direction.
- name: Wait for SSH to come up
  local_action: wait_for
    host={{ item.public_ip }}
    port=22
    state=started
  with_items: "{{ ec2.instances }}"

- name: waiting for a few seconds to let the machine start
  pause:
    seconds: 20
So I had the same problem and this is how I solved it:
---
- name: "Get instances info"
  ec2_instance_facts:
    aws_access_key: "{{ aws_access_key }}"
    aws_secret_key: "{{ aws_secret_key }}"
    region: "{{ aws_region }}"
    filters:
      vpc-id: "{{ vpc_id }}"
      private-ip-address: "{{ ansible_ssh_host }}"
  delegate_to: localhost
  register: my_ec2

- name: "Waiting for {{ hostname }} to respond"
  wait_for:
    host: "{{ item.public_ip_address }}"
    state: "{{ state }}"
    sleep: 1
    port: 22
  delegate_to: localhost
  with_items:
    - "{{ my_ec2.instances }}"
Those are the tasks of the role named aws_ec2_status.
The playbook I ran looks like this:
---
# Create an ec2 instance in aws
- hosts: nodes
  gather_facts: false
  serial: 1
  vars:
    state: "present"
  roles:
    - aws_create_ec2

- hosts: nodes
  gather_facts: no
  vars:
    state: "started"
  roles:
    - aws_ec2_status
The reason I split the create and the check into two different plays is that I want the playbook to create the instances without waiting for one to be ready before creating the other.
But if the second instance depends on the first one, you should combine them, as in the sketch below.
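A minimal sketch of such a combined play, assuming both roles accept the state variable shown above (the per-role vars: syntax requires a reasonably recent Ansible):
---
# create each instance and wait for it to answer before moving on to the next host
- hosts: nodes
  gather_facts: false
  serial: 1
  roles:
    - role: aws_create_ec2
      vars:
        state: "present"
    - role: aws_ec2_status
      vars:
        state: "started"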
FYI Let me know if you want to see my aws_create_ec2 playbook.
I want to pass a variable to a notification handler, but can't find anywhere (be it here on SO, the docs, or the issues in the GitHub repo) how to do it. What I'm doing is deploying multiple webapps, and when the code for one of those webapps is changed, it should restart the service for that webapp.
From this SO question, I got this to work, somewhat:
- hosts: localhost
  tasks:
    - name: "task 1"
      shell: "echo {{ item }}"
      register: "task_1_output"
      with_items: [a, b]

    - name: "task 2"
      debug:
        msg: "{{ item.item }}"
      when: item.changed
      with_items: "{{ task_1_output.results }}"
(Put it in test.yml and run it with ansible-playbook test.yml -c local.)
But this registers the result of the first task and conditionally loops over that in the second task. My problem is that it gets messy when you have two or more tasks that need to notify the second task! For example, restart the web service if either the code was updated or the configuration was changed.
AFAICT, there's no way to pass a variable to a handler. That would cleanly fix it for me. I found some issues on GitHub where other people run into the same problem, and some syntaxes are proposed, but none of them actually work.
Including a sub-playbook won't work either, because using with_items together with include was deprecated.
In my playbooks, I have a site.yml that lists the roles of a group, then in the group_vars for that group I define the list of webapps (including the versions) that should be installed. This seems correct to me, because this way I can use the same playbook for staging and production. But maybe the only solution is to define the role multiple times, and duplicate the list of roles for staging and production.
So what is the wisdom here?
Variables in Ansible are global, so there is no reason to pass a variable to a handler. If you are trying to parameterize a handler by using a variable in its name, you won't be able to do that in Ansible.
What you can do easily enough is create a handler that loops over a list; here is a working example that can be tested locally:
- hosts: localhost
  tasks:
    - file: >
        path=/tmp/{{ item }}
        state=directory
      register: files_created
      with_items:
        - one
        - two
      notify: some_handler
  handlers:
    - name: "some_handler"
      shell: "echo {{ item }} has changed!"
      when: item.changed
      with_items: "{{ files_created.results }}"
I finally solved it by splitting the apps out over multiple instances of the same role. This way, the handler in the role can refer to variables that are defined as role variables.
In site.yml:
- hosts: localhost
  roles:
    - role: something
      name: a
    - role: something
      name: b
In roles/something/tasks/main.yml:
- name: do something
  shell: "echo {{ name }}"
  notify: something happened

- name: do something else
  shell: "echo {{ name }}"
  notify: something happened
In roles/something/handlers/main.yml:
- name: something happened
  debug:
    msg: "{{ name }}"
Seems a lot less hackish than the first solution!
To update jarv's answer above: Ansible 2.5 replaces with_items with loop. When looping over the registered results, item by itself will not work; you need to reference a field of each result explicitly, e.g. item.item.
- hosts: localhost
  tasks:
    - file: >
        path=/tmp/{{ item }}
        state=directory
      register: files_created
      loop:
        - one
        - two
      notify: some_handler
  handlers:
    - name: "some_handler"
      shell: "echo {{ item.item }} has changed!"
      when: item.changed
      loop: "{{ files_created.results }}"
I got mine to work like this; I had to add some curly brackets:
tasks:
  - name: Enable security, backport and non-security upgrades
    lineinfile:
      path: /etc/apt/apt.conf.d/50unattended-upgrades
      regexp: '^[^"//"]*"\${distro_id}:\${distro_codename}-{{ item }}";'
      line: ' "${distro_id}:${distro_codename}-{{ item }}";'
      insertafter: "Unattended-Upgrade::Allowed-Origins {"
      state: present
    register: aenderung
    loop:
      - updates
      - security
      - backports
    notify: Remove commented-out lines

handlers:
  - name: Remove commented-out lines
    lineinfile:
      path: /etc/apt/apt.conf.d/50unattended-upgrades
      regexp: '^\/\/.*{{ item.item }}";.*'
      state: absent
    when: item.changed
    loop: "{{ aenderung.results }}"