What's the fastest method to add Linux users with Ansible?

I have a users.yaml file with information on 400+ users. I need Ansible to create these users during provisioning. I tried the async keyword (tell me if that's not the right term) with poll: 15, but it takes ~10 minutes.
- name: Add FTP users asynchronously
  ansible.builtin.user:
    name: "{{ item.name }}"
    home: "{{ item.home }}"
    shell: /sbin/nologin
    groups: ftp-users
    create_home: yes
    append: no
  loop: "{{ ftp_users }}"
  async: 60
  poll: 15
  tags: users
I also tried poll: 0, but then many users aren't created.

Your current use of async suits a single long-running task, where you want to minimize the chance of the connection being dropped by a timeout. You are asking Ansible to start a job, disconnect from the target, and then reconnect every 15 seconds to check whether the job is done (or until the 60-second timeout is reached). Nothing is launched in parallel: the next loop iteration starts only when the current one is done.
What you want instead is to launch those tasks in parallel as fast as possible and check back later whether they are done. For that, use poll: 0 on the task and check for completion later with the async_status module, as described in the Ansible async guide. Note that you also need to clean up the async job cache yourself, as Ansible will not do it automatically in that case.
In your case, this would give:
- name: Add FTP users asynchronously
  ansible.builtin.user:
    name: "{{ item.name }}"
    home: "{{ item.home }}"
    shell: /sbin/nologin
    groups: ftp-users
    create_home: yes
    append: no
  loop: "{{ ftp_users }}"
  async: 60
  poll: 0
  register: add_user

- name: Wait until all commands are done
  async_status:
    jid: "{{ item.ansible_job_id }}"
  register: async_poll_result
  until: async_poll_result.finished
  retries: 60
  delay: 1
  loop: "{{ add_user.results }}"

- name: clean async job cache
  async_status:
    jid: "{{ item.ansible_job_id }}"
    mode: cleanup
  loop: "{{ add_user.results }}"
That said, although this is a direct answer on how to use async for parallel jobs, I'm not entirely sure it will fix your actual performance problem, which could have other causes (slow DNS, a slow network, pipelining not enabled where it could be, SSH master connection not configured, ...).
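For the connection-related causes, here is a minimal sketch of what tuning them at play level could look like. The host group name ftp_servers is an assumption, and the values are illustrative rather than recommendations. As far as I know, Ansible's default SSH arguments already enable ControlMaster and ControlPersist, so the second variable only matters if those defaults were overridden somewhere.

```yaml
# Hypothetical play: "ftp_servers" is an assumed inventory group name.
- name: Provision FTP users with tuned connection settings
  hosts: ftp_servers
  gather_facts: false   # skip fact gathering when the play does not use facts
  vars:
    # Run modules over the already-open SSH session instead of copying
    # them to the target first. With become, this needs "requiretty"
    # to be disabled in sudoers on the target.
    ansible_ssh_pipelining: true
    # Reuse one master SSH connection per host between tasks.
    ansible_ssh_args: "-o ControlMaster=auto -o ControlPersist=60s"
  tasks:
    - name: Add FTP users
      ansible.builtin.user:
        name: "{{ item.name }}"
        home: "{{ item.home }}"
        shell: /sbin/nologin
        groups: ftp-users
        create_home: true
        append: false
      loop: "{{ ftp_users }}"
      tags: users
```

It is worth timing the plain loop with these settings before adding async, since each loop iteration otherwise pays the full connection and module-transfer cost.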

Related

Ansible async_status task - error: ansible_job_id "undefined variable"

I have a 3 node ubuntu 20.04 lts - kvm - kubernetes cluster, and the kvm-host is also ubuntu 20.04 lts. I ran the playbooks on the kvm-host.
I have the following inventory extract:
nodes:
  hosts:
    sea_r:
      ansible_host: 192.168.122.60
    spring_r:
      ansible_host: 192.168.122.92
    island_r:
      ansible_host: 192.168.122.93
  vars:
    ansible_user: root
I have tried a lot of things with async_status, but it always fails:
- name: root commands
  hosts: nodes
  tasks:
    - name: bash commands
      ansible.builtin.shell: |
        apt update
      args:
        chdir: /root
        executable: /bin/bash
      async: 2000
      poll: 2
      register: output
    - name: check progress
      ansible.builtin.async_status:
        jid: "{{ output.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 200
      delay: 5
with error:
fatal: [sea_r]: FAILED! => {"msg": "The task
includes an option with an undefined variable.
The error was: 'dict object' has no attribute
'ansible_job_id' ...
If instead I try with the following,
- name: root commands
  hosts: nodes
  tasks:
    - name: bash commands
      ansible.builtin.shell: |
        apt update
      args:
        chdir: /root
        executable: /bin/bash
      async: 2000
      poll: 2
      register: output
    - debug: msg="{{ output.stdout_lines }}"
    - debug: msg="{{ output.stderr_lines }}"
I get no errors.
I also tried the following variation,
- name: check progress
  ansible.builtin.async_status:
    jid: "{{ item.ansible_job_id }}"
  with_items: "{{ output }}"
  register: job_result
  until: job_result.finished
  retries: 200
  delay: 5
which was suggested as a solution to a similar error. That does not help either; I just get a slightly different error:
fatal: [sea_r]: FAILED! => {"msg": "The task includes
an option with an undefined variable. The error
was: 'ansible.utils.unsafe_proxy.AnsibleUnsafeText
object' has no attribute 'ansible_job_id' ...
At the beginning and the end of the playbook, I resume and pause my 3 kvm server nodes like so:
- name: resume vms
  hosts: local_vm_ctl
  tasks:
    - name: resume vm servers
      shell: |
        virsh resume kub3
        virsh resume kub2
        virsh resume kub1
        virsh list --state-paused --state-running
      args:
        chdir: /home/bi
        executable: /bin/bash
      environment:
        LIBVIRT_DEFAULT_URI: qemu:///system
      register: output
    - debug: msg="{{ output.stdout_lines }}"
    - debug: msg="{{ output.stderr_lines }}"
and so
- name: pause vms
  hosts: local_vm_ctl
  tasks:
    - name: suspend vm servers
      shell: |
        virsh suspend kub3
        virsh suspend kub2
        virsh suspend kub1
        virsh list --state-paused --state-running
      args:
        chdir: /home/bi
        executable: /bin/bash
      environment:
        LIBVIRT_DEFAULT_URI: qemu:///system
      register: output
    - debug: msg="{{ output.stdout_lines }}"
    - debug: msg="{{ output.stderr_lines }}"
but I don't see how these plays could have anything to do with said error.
Any help will be much appreciated.
You get an undefined error for your job id because:
You use poll: X on your initial task, so Ansible reconnects every X seconds to check whether the task is finished.
When Ansible exits that task and enters your next async_status task, the job is already done. And since you used a non-zero poll value, the async status cache is automatically cleared.
Since the cache was cleared, the job id no longer exists.
Your scenario above is meant to avoid timeouts with your target on long-running tasks, not to run tasks concurrently and check on their status at a later point. For this second requirement, you need to run the async task with poll: 0 and clean up the cache yourself.
See the documentation for more explanation on the above concepts:
ansible async guide
ansible async_status module
I made an example from your task above and changed it to use the dedicated apt module (note that you could add a name option to the module with one package or a list of packages, and Ansible would do both the cache update and the install in a single step). Also, retries * delay on the async_status task should be equal to or greater than async on the initial task if you want to make sure you don't miss the end of the job.
- name: Update apt cache
  ansible.builtin.apt:
    update_cache: true
  async: 2000
  poll: 0
  register: output

- name: check progress
  ansible.builtin.async_status:
    jid: "{{ output.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 400
  delay: 5

- name: clean async job cache
  ansible.builtin.async_status:
    jid: "{{ output.ansible_job_id }}"
    mode: cleanup
This pattern is most useful for launching a bunch of long-running tasks in parallel. Here is a useless yet functional example:
- name: launch some loooooong tasks
  shell: "{{ item }}"
  loop:
    - sleep 30
    - sleep 20
    - sleep 35
  async: 100
  poll: 0
  register: long_cmd

- name: wait until all commands are done
  async_status:
    jid: "{{ item.ansible_job_id }}"
  register: async_poll_result
  until: async_poll_result.finished
  retries: 50
  delay: 2
  loop: "{{ long_cmd.results }}"

- name: clean async job cache
  async_status:
    jid: "{{ item.ansible_job_id }}"
    mode: cleanup
  loop: "{{ long_cmd.results }}"
You have poll: 2 on your task, which tells Ansible to internally poll the async job every 2 seconds and return the final status in the registered variable. In order to use async_status you should set poll: 0 so that the task does not wait for the job to finish.

Ansible Downloading large files

I am trying to download backup files from my websites. I have structured my playbook as follows:
site_vars.yml holds my variables:
website_backup_download:
  - name: ftp://username:userpassword@ftp.mysite1.com/backups/mysite1backup.tgz
    path: mysites/backups/www
  - name: ftp://username:userpassword@ftp.mysite2.com/backups/mysite2backup.tgz
    path: mysites/backups/www
  - name: ftp://username:userpassword@ftp.mysite3.com/backups/mysite3backup.tgz
    path: mysites/backups/www
Actual downloader playbook:
# Downloader
tasks:
  - name: Download backups from FTP's
    get_url:
      url: "{{ item.name }}"
      dest: "{{ item.path }}"
      mode: 0750
    no_log: false
    ignore_errors: True
    with_items:
      - "{{ website_backup_download }}"
This actually works very well, but the problem begins with large backup files: the task needs to keep running until the backup file has been downloaded completely.
I can't repeat the task to finish the incomplete file or files. :)
I have tried another solution, which also works well for a single site, but I can't use it for multiple downloads :(
- name: Download backups
  command: wget -c ftp://username:userpassword@ftp.mysite1.com/backups/mysite1backup.tgz
  args:
    chdir: "{{ down_path }}"
    warn: false
  register: task_result
  retries: 10
  delay: 1
  until: task_result.rc == 0
  ignore_errors: True
Thanks for your help.
I have modified the task by adding the timeout parameter to allow for the download's runtime, the until parameter to wait for the download to finish, and the retries and delay parameters to retry until the condition is met.
This works for now :)
Thanks to all of you.
# Downloader
tasks:
  - name: Download backups from FTP's
    get_url:
      url: "{{ item.name }}"
      dest: "{{ item.path }}"
      mode: 0750
      timeout: 1800
    retries: 10
    delay: 3
    register: result
    until: result is succeeded
    no_log: false
    ignore_errors: True
    with_items:
      - "{{ website_backup_download }}"
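If the downloads should also run side by side instead of one after another, the async pattern from the earlier threads could be applied here. The following is only a sketch under assumptions: the one-hour async limit and the 10-second polling interval are made-up values, and download_jobs and download_status are hypothetical variable names.

```yaml
- name: Start all backup downloads in the background
  ansible.builtin.get_url:
    url: "{{ item.name }}"
    dest: "{{ item.path }}"
    mode: "0750"
    timeout: 1800
  loop: "{{ website_backup_download }}"
  async: 3600      # assumed upper bound, in seconds, for a single download
  poll: 0          # do not wait; move on to the next item immediately
  register: download_jobs

- name: Wait for every download to finish
  ansible.builtin.async_status:
    jid: "{{ item.ansible_job_id }}"
  loop: "{{ download_jobs.results }}"
  register: download_status
  until: download_status.finished
  retries: 360     # 360 * 10 s = 3600 s, matching async above
  delay: 10

- name: Clean the async job cache
  ansible.builtin.async_status:
    jid: "{{ item.ansible_job_id }}"
    mode: cleanup
  loop: "{{ download_jobs.results }}"
```

Unlike the retries/until version above, this does not retry a failed download by itself; it only stops the downloads from blocking each other.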

ansible sh module does not report output until shell completes

How can I see realtime output from a shell script run by ansible?
I recently refactored a wait script to use multiprocessing and provide realtime status of the various service wait checks for multiple services.
As a standalone script, it works as expected, providing status for each thread as they wait in parallel for various services to become stable.
In Ansible, the output pauses until the Python script completes (or terminates) and only then is displayed. While that is OK, I'd rather find a way to display output sooner. I've tried setting PYTHONUNBUFFERED before running ansible-playbook via Jenkins withEnv, but that doesn't seem to accomplish the goal either.
- name: Wait up to 30m for service stability
  shell: "{{ venv_dir }}/bin/python3 -u wait_service_state.py"
  args:
    chdir: "{{ script_dir }}"
What's the standard ansible pattern for displaying output for a long running script?
My guess is that I could follow one of these routes:
Not use Ansible.
Execute in a Docker container and report output via Ansible, provided this doesn't hit the same class of problem.
Output to a file from the script and have either an Ansible thread or a Jenkins pipeline thread watch and tail the file (both seem kludgy, as this blurs the separation of concerns and couples my build server to the deploy scripts a little too tightly).
You can use - https://docs.ansible.com/ansible/latest/user_guide/playbooks_async.html
main.yml
- name: Run items asynchronously in batch of two items
  vars:
    sleep_durations:
      - 1
      - 2
      - 3
      - 4
      - 5
    durations: "{{ item }}"
  include_tasks: execute_batch.yml
  loop: "{{ sleep_durations | batch(2) | list }}"
execute_batch.yml
- name: Async sleeping for batched_items
  command: sleep {{ async_item }}
  async: 45
  poll: 0
  loop: "{{ durations }}"
  loop_control:
    loop_var: "async_item"
  register: async_results

- name: Check sync status
  async_status:
    jid: "{{ async_result_item.ansible_job_id }}"
  loop: "{{ async_results.results }}"
  loop_control:
    loop_var: "async_result_item"
  register: async_poll_results
  until: async_poll_results.finished
  retries: 30
"What's the standard ansible pattern for displaying output for a long running script?"
The standard Ansible pattern for displaying output for a long-running script is to start it with async and then loop until async_status reports that it has finished. Customization of the until loop's output is limited. See Feature request: until for blocks #16621.
ansible-runner is another route that might be followed.
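Applied to the wait script from the question, the pattern could look roughly like the sketch below. The 1800-second limit is taken from the "30m" in the task name, while the polling interval and the names wait_job and wait_result are assumptions. Note that this only prints one retry line per poll while the script runs; the script's own output is still reported once the job has finished.

```yaml
- name: Start the wait script in the background
  ansible.builtin.shell: "{{ venv_dir }}/bin/python3 -u wait_service_state.py"
  args:
    chdir: "{{ script_dir }}"
  async: 1800      # 30 minutes, from the task name in the question
  poll: 0          # return immediately instead of blocking on the script
  register: wait_job

- name: Poll until the wait script finishes
  ansible.builtin.async_status:
    jid: "{{ wait_job.ansible_job_id }}"
  register: wait_result
  until: wait_result.finished
  retries: 180     # 180 * 10 s = 1800 s, matching async above
  delay: 10

- name: Show the script output once the job is complete
  ansible.builtin.debug:
    var: wait_result.stdout

- name: Clean the async job cache
  ansible.builtin.async_status:
    jid: "{{ wait_job.ansible_job_id }}"
    mode: cleanup
```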

Ansible: how to loop over ip-addresses until first success shell output?

I'm creating a playbook which will be applied to new Docker Swarm manager(s). The server(s) are not configured before the playbook runs.
We already have some Swarm managers. I can find all of them (including the new one) with:
- name: 'Search for SwarmManager server IPs'
  ec2_instance_facts:
    region: "{{ ec2_region }}"
    filters:
      vpc-id: "{{ ec2_vpc_id }}"
      "tag:aws:cloudformation:logical-id": "AutoScalingGroupSwarmManager"
  register: swarmmanager_instance_facts_result
Now I can use something like this to get the join-token:
- set_fact:
    swarmmanager_ip: "{{ swarmmanager_instance_facts_result.instances[0].private_ip_address }}"
- name: 'Get the docker swarm join-token'
  shell: docker swarm join-token -q manager
  changed_when: False
  register: docker_swarm_token_result
  delegate_to: "{{ swarmmanager_ip }}"
  run_once: true
Successful shell output looks like this: a single line starting with "SWMTKN-1":
SWMTKN-1-11xxxyyyzzz-xxxyyyzzz
But I see some possible problems here with swarmmanager_ip:
it can be a new instance which is still unconfigured,
it can be an instance whose Swarm manager is not working.
So I decided to loop over the results until I get a join-token. But none of the code variants I've tried work. For example, this one runs over the whole list without breaking:
- name: 'Get the docker swarm join-token'
  shell: docker swarm join-token -q manager
  changed_when: False
  register: docker_swarm_token_result
  delegate_to: "{{ item.private_ip_address }}"
  loop: "{{ swarmmanager_instance_facts_result.instances }}"
  # ignore_errors: true
  # until: docker_swarm_token_result.stdout_lines|length == 1
  when: docker_swarm_token_result is not defined or docker_swarm_token_result.stdout_lines is not defined or docker_swarm_token_result.stdout_lines|length == 1
  run_once: true
  check_mode: false
Do you know how to iterate over a list until the first successful shell output?
I use Ansible 2.6.11; an answer for 2.7 is fine too.
P.S.: I've already read How to break `with_lines` cycle in Ansible?, but it doesn't work for modern Ansible versions.

Ansible win_service stop fails - How to assert?

I do a simple service stop in Ansible for a Windows service:
- name: stop service
  win_service:
    name: "{{ tomcat_srv_name }}"
    state: stopped
Due to a problem on the remote server, the stop fails. If I try this directly on the remote server, I get a timeout, but the Ansible task above hangs forever.
Is there a way to catch this? Something like wait_for ...?
Based on @kfreezy's note I have built this block to catch a potential error and react accordingly:
- block:
    # try to stop the service
    - win_service:
        name: "{{ srv_name }}"
        state: stopped
      async: 45
      poll: 5
      register: service_stop_info
    - debug:
        msg: "STOP service {{ srv_name }} results in: {{ service_stop_info.state }}"
  rescue:
    # in case the service can not be stopped, kill its process
    - name: Kill process of service
      win_command: taskkill /f /fi "Services eq {{ srv_name }}"
      register: cmd_result_service_kill
    - debug:
        msg: "KILL process of service {{ srv_name }} results in: {{ cmd_result_service_kill.stdout }}"
  always:
    # restart the service
    - win_service:
        name: "{{ srv_name }}"
        state: started
      register: service_start_info
    - debug:
        msg: "START service {{ srv_name }} results in: {{ service_start_info.state }}"
Async and polling should work (I haven't used them on a Windows machine). You'll probably want to tweak the values a bit depending on how long it normally takes to stop Tomcat.
- name: stop service
  win_service:
    name: "{{ tomcat_srv_name }}"
    state: stopped
  async: 45
  poll: 5

Resources