Ansible - give each node context to split data - amazon-ec2

I'm using Ansible on AWS EC2 to spawn N instances, each attached to a volume containing all the data to process, i.e. p files.
I'm trying to find a way to pass some environment variables to each node so that each one can process its own split of the p files.
For example, I thought I could give each node the total number of nodes and its own index i, so that the processing script could take the ith split, but I can't find a way to do such a thing in Ansible.
I tried using facts combined with with_indexed_items in this manner:
- name: Launch the calculation
  shell: /opt/anaconda/bin/python tests/dicho.py chdir=/root/project/
  async: 10000
  poll: 10
  environment:
    PARTIAL_DATA_SPLITS: "{{ groups.antmachines | length }}"
    PARTIAL_DATA_ALLOC: "{{ item.0 }}"
  with_indexed_items: "{{ groups.antmachines }}"
but the loop runs once per item on every host, so each node executes the task for every index instead of only its own.
Any help would be appreciated.

Following Konstantin Suvorov's advice, the following code does the trick:
- set_fact:
    padded_host_index: "{{ play_hosts.index(inventory_hostname) }}"

- name: Launch the calculation
  shell: /opt/anaconda/bin/python tests/dicho.py chdir=/root/project/
  async: 10000
  poll: 10
  environment:
    PARTIAL_DATA_SPLITS: "{{ groups.antmachines | length }}"
    PARTIAL_DATA_ALLOC: "{{ padded_host_index }}"
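Note: on newer Ansible releases, play_hosts is deprecated in favor of ansible_play_hosts. A minimal sketch of the same idea with the newer variable (host_index is an illustrative name; everything else is kept from the solution above):
- set_fact:
    host_index: "{{ ansible_play_hosts.index(inventory_hostname) }}"

- name: Launch the calculation
  shell: /opt/anaconda/bin/python tests/dicho.py chdir=/root/project/
  environment:
    PARTIAL_DATA_SPLITS: "{{ ansible_play_hosts | length }}"
    PARTIAL_DATA_ALLOC: "{{ host_index }}"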

Related

Ansible: How to find aggregated file size across inventory hosts?

I'm able to find the total size of all three files in the variable totalsize on a single host, as shown below.
cat all.hosts
[destnode]
myhost1
myhost2
myhost3
cat myplay.yml
- name: "Play 1"
  hosts: "destnode"
  gather_facts: false
  tasks:
    - name: Fail if file size is greater than 2GB
      include_tasks: "{{ playbook_dir }}/checkfilesize.yml"
      with_items:
        - "{{ source_file_new.splitlines() }}"
cat checkfilesize.yml
- name: Check file size
  stat:
    path: "{{ item }}"
  register: file_size

- set_fact:
    totalsize: "{{ totalsize | default(0) | int + (file_size.stat.size / 1024 / 1024) | int }}"

- debug:
    msg: "TOTALSIZE: {{ totalsize }}"
To run:
ansible-playbook -i all.hosts myplay.yml -e source_file_new="/tmp/file1.log\n/tmp/file1.log\n/tmp/file1.log"
The above play works fine and gets me the total sum of the sizes of all the files mentioned in the variable source_file_new on each individual host.
My requirement is to get the total size of all the files from all three (or more) hosts mentioned in the destnode group.
So, if each file is 10 MB on each host, the current playbook prints 10+10+10 = 30 MB on host1, and likewise on host2 and host3.
Instead, I wish to get the sum of all the sizes from all the hosts, like below:
host1 (10+10+10) + host2 (10+10+10) + host3 (10+10+10) = 90 MB
Extract the totalsize facts for each node in destnode from hostvars and sum them up.
In a nutshell, at the end of your current checkfilesize.yml task file, replace the debug task:
- name: Show total size for all nodes
  vars:
    overall_size: "{{ groups['destnode'] | map('extract', hostvars, 'totalsize') | map('int') | sum }}"
  debug:
    msg: "Total size for all nodes: {{ overall_size }}"
  run_once: true
If you need to reuse that value later, you can store it once in a fact that will be set to the same value for all hosts:
- name: Set overall size as fact for all hosts
  set_fact:
    overall_size: "{{ groups['destnode'] | map('extract', hostvars, 'totalsize') | map('int') | sum }}"
  run_once: true

- name: Show the overall size (one result with the same value for each host)
  debug:
    msg: "Total size for all nodes: {{ overall_size }} - (from {{ inventory_hostname }})"
As an alternative, you can replace set_fact with a variable declaration at play level.
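For example, a sketch of that alternative: the same expression is moved into the play's vars, where it is evaluated lazily each time overall_size is used, i.e. after the totalsize facts have been set:
- name: "Play 1"
  hosts: "destnode"
  gather_facts: false
  vars:
    overall_size: "{{ groups['destnode'] | map('extract', hostvars, 'totalsize') | map('int') | sum }}"
  tasks:
    - name: Fail if file size is greater than 2GB
      include_tasks: "{{ playbook_dir }}/checkfilesize.yml"
      with_items:
        - "{{ source_file_new.splitlines() }}"

    - name: Show total size for all nodes
      debug:
        msg: "Total size for all nodes: {{ overall_size }}"
      run_once: true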
It seems you are trying to implement (distributed) programming paradigms which aren't really possible, at least not in that way, since Ansible is not a programming language or a distributed computing framework but a Configuration Management Tool in which you declare a state. Therefore such approaches are not recommended and should probably be avoided.
Your use case looks to me like a typical MapReduce setup; I understand from your description that you would like to implement a kind of Reducer in a Distributed Environment in Ansible.
You have already observed that the facts are distributed over the hosts in your environment. To sum them up, they need to be aggregated on one of the hosts, probably the Control Node.
To do so:
It might be possible to use Delegating facts for your set_fact task, to get all the information to sum up onto one host (see the sketch after this list)
Another approach could be to have your task create and add custom facts about the summed-up file size during the run. Those Custom Facts could then be gathered and cached on the Control Node during the next run.
As a third option, since Custom Facts can be simple files, one could probably create a simple cronjob which writes the necessary .fact file with the requested information (file size, etc.) on a schedule.
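A minimal sketch of the first option (the aggregated value then lives on the control node under hostvars['localhost']):
- name: Store the aggregate as a fact on the control node
  set_fact:
    overall_size: "{{ groups['destnode'] | map('extract', hostvars, 'totalsize') | map('int') | sum }}"
  delegate_to: localhost
  delegate_facts: true
  run_once: true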
Further Documentation
facts.d or local facts
Introduction to Ansible facts
Similar Q&A
Ansible: How to define ... a global ... variable?
Summary
My requirement is to get the total size of all the files from all the three (or more) hosts ...
Instead of creating a playbook which generates and calculates values (facts) at execution time, it is recommended to define something on the Target Nodes and create a playbook which just collects the facts in question.
For example:
... add dynamic facts by adding executable scripts to facts.d. For example, you can add a list of all users on a host to your facts by creating and running a script in facts.d.
which could equally be about files and their sizes.
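A sketch of that idea, persisting the computed total as a static local fact (the filename filesize.fact is illustrative):
- name: Persist the computed total as a local custom fact
  copy:
    dest: /etc/ansible/facts.d/filesize.fact
    content: "{{ {'totalsize': totalsize | int} | to_json }}"
  become: true

On the next run with fact gathering enabled, the value is available on each host as ansible_local.filesize.totalsize and can then be summed centrally as shown earlier.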

How can I loop through two tasks in a role using with_sequence?

Problem
I need to loop through two separate tasks with with_sequence, as the {{ item }} value is the name of my stack.
For each integer in with_sequence, it needs to run {{ item }} through task1 once, then through task2 once. When it moves to the next integer, let's say from 0 to 1, it needs to repeat the cycle of task1 once then task2, until it reaches 10.
This is my failed attempt; I'm adding it here only as an example of what I'm trying to do. Is there a better way to do this with a loop?
# /roles/app1/tasks/main.yml
---
- name: Include env vars to pass to tasks
  include_vars:
    file: "./roles/app1/vars/{{ cluster }}_vars.yml"

# tasks file for infra
- name: Run through each task in serial using with_sequence
  include_tasks:
    - ./roles/app1/tasks/task1.yml
    - ./roles/app1/tasks/task2.yml
  with_sequence: "start=0 end=10"
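One pattern that typically works here (a sketch, not from the original thread; the wrapper filename stack_iteration.yml is made up) is to loop over a single wrapper file that includes both tasks in order, since include_tasks accepts only one file:
# /roles/app1/tasks/main.yml
- name: Run task1 then task2 once per stack index
  include_tasks: stack_iteration.yml
  with_sequence: "start=0 end=10"

# /roles/app1/tasks/stack_iteration.yml
# The outer loop's {{ item }} remains in scope inside the included files.
- include_tasks: task1.yml
- include_tasks: task2.yml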

Run Ansible Tasks for several Groups in Parallel

We have many similar hosts that are grouped into specific types.
Every group has several hosts in it, mostly 2 to 8, for scalability within the type.
Now we need to run the same tasks/role on all these hosts:
serialised within each group, but all groups at the same time.
This should run much faster than doing all groups (currently about 10) one after another.
Is this possible today with Ansible?
Maybe. I'm afraid I do not have the ability to test this idea, but here goes....
Let's say you have GroupA and GroupB. To ping each host in a group serially, but have the groups run in parallel, you could try this hideous construct:
---
- hosts: localhost
  tasks:
    - ping:
      delegate_to: "{{ item }}"
      with_items: "{{ groups['groupA'] }}"
      forks: 1
      async: 0
      poll: 0

    - ping:
      delegate_to: "{{ item }}"
      with_items: "{{ groups['groupB'] }}"
      forks: 1
      async: 0
      poll: 0
Ansible is still going to show the task output separately.
When I ran this, files were created in /home/ansible/.ansible_async. Those files show the task start times, and it looked like it worked. To verify, I ran shell: sleep 5 instead of ping:, and saw the start times in those files properly interleaved.
Good luck.

ansible sh module does not report output until shell completes

How can I see realtime output from a shell script run by ansible?
I recently refactored a wait script to use multiprocessing and provide realtime status of the various service wait checks for multiple services.
As a standalone script, it works as expected, providing status for each thread as they wait in parallel for various services to become stable.
Under Ansible, the output pauses until the Python script completes (or terminates) and only then appears. While that's OK, I'd rather find a way to display output sooner. I've tried setting PYTHONUNBUFFERED prior to running ansible-playbook via Jenkins withEnv, but that doesn't seem to accomplish the goal either:
- name: Wait up to 30m for service stability
  shell: "{{ venv_dir }}/bin/python3 -u wait_service_state.py"
  args:
    chdir: "{{ script_dir }}"
What's the standard ansible pattern for displaying output for a long running script?
My guess is that I could follow one of these routes:
Not use Ansible
Execute in a Docker container and report output via Ansible, provided this doesn't hit the identical class of problem
Have the script write to a file and have either an Ansible thread or a Jenkins pipeline thread watch and tail that file (both seem kludgy, as this blurs the separation of concerns and couples my build server to the deploy scripts a little too tightly)
You can use asynchronous actions and polling: https://docs.ansible.com/ansible/latest/user_guide/playbooks_async.html
main.yml
- name: Run items asynchronously in batch of two items
  vars:
    sleep_durations:
      - 1
      - 2
      - 3
      - 4
      - 5
    durations: "{{ item }}"
  include_tasks: execute_batch.yml
  loop: "{{ sleep_durations | batch(2) | list }}"
execute_batch.yml
- name: Async sleeping for batched_items
  command: sleep {{ async_item }}
  async: 45
  poll: 0
  loop: "{{ durations }}"
  loop_control:
    loop_var: "async_item"
  register: async_results

- name: Check sync status
  async_status:
    jid: "{{ async_result_item.ansible_job_id }}"
  loop: "{{ async_results.results }}"
  loop_control:
    loop_var: "async_result_item"
  register: async_poll_results
  until: async_poll_results.finished
  retries: 30
"What's the standard ansible pattern for displaying output for a long running script?"
The standard Ansible pattern for displaying output for a long-running script is an async task polled in a loop until async_status reports it finished. Customization of the until loop's output is limited; see Feature request: until for blocks #16621.
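Applied to the task from the question, a sketch might look like this (the 30-minute async budget and the 10-second polling cadence are assumptions):
- name: Launch the wait script without blocking the play
  shell: "{{ venv_dir }}/bin/python3 -u wait_service_state.py"
  args:
    chdir: "{{ script_dir }}"
  async: 1800  # allow up to 30 minutes
  poll: 0      # fire and forget; poll below instead
  register: wait_job

- name: Poll the job so the play reports progress between checks
  async_status:
    jid: "{{ wait_job.ansible_job_id }}"
  register: wait_result
  until: wait_result.finished
  retries: 180
  delay: 10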
ansible-runner is another route that might be followed.

Ansible integer variables in YAML

I'm using Ansible to deploy a webapp. I'd like to wait for the application to be running by checking that a given page returns a JSON with a given key/value.
I want the task to be tried a few times before failing. I'm therefore using the combination of the until/retries/delay keywords.
The issue is, I want the number of retries to be taken from a variable. If I write:
retries: {{apache_test_retries}}
I fall into the usual Yaml Gotcha (http://docs.ansible.com/YAMLSyntax.html#gotchas).
If, instead, I write:
retries: "{{apache_test_retries}}"
I'm told the value is not an integer.
ValueError: invalid literal for int() with base 10: '{{apache_test_retries}}'
Here is my full code:
- name: Wait for the application to be running
  local_action:
    uri
    url=http://{{ webapp_url }}/health
    timeout=60
  register: res
  sudo: false
  when: updated.changed and apache_test_url is defined
  until: res.status == 200 and res['json'] is defined and res['json']['status'] == 'UP'
  retries: "{{ apache_test_retries }}"
  delay: 1
Any idea on how to work around this issue? Thanks.
I had the same issue and tried a bunch of things that didn't work, so for some time I just worked around it without using a variable. I eventually found the answer, so here it is for everyone who hits this.
Daniel's solution indeed should work:
retries: "{{ apache_test_retries | int }}"
However, if you are running a slightly older version of Ansible it won't work, so make sure you update Ansible: I tested it and it works on 1.8.4 but not on 1.8.2.
This was the original bug on ansible:
https://github.com/ansible/ansible/issues/5865
You should be able to convert it to an integer with the int filter:
retries: "{{ apache_test_retries | int }}"
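Applied to the task from the question (abridged), only the retries line changes:
- name: Wait for the application to be running
  local_action:
    uri
    url=http://{{ webapp_url }}/health
    timeout=60
  register: res
  until: res.status == 200 and res['json'] is defined and res['json']['status'] == 'UP'
  retries: "{{ apache_test_retries | int }}"
  delay: 1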
I had the same problem and the solutions suggested here didn't work. I didn't try Tim Diels' suggestion though.
Here's what worked for me:
vars:
  capacity: "{{ param_capacity | default(16) }}"
tasks:
  - name: some task
    ...
    when: item.usage < (capacity | int)
    loop:
      ...
And here's what I was trying to do:
vars:
  capacity: "{{ (param_capacity | default(16)) | int }}"
tasks:
  - name: some task
    ...
    when: item.usage < capacity
    loop:
      ...
I found this issue on GitHub about this same problem; apparently the intended way to use the filter is to apply it where you use the variable, not where you declare it.
I have faced a similar issue; in my case I wanted to restart the celeryd service. It sometimes takes a very long time to restart, and I wanted to give it at most 30 seconds for a soft restart before force-restarting it. I used async for this (polling for the restart result every 5 seconds).
celery/handlers/main.yml
- name: restart celeryd
  service:
    name=celeryd
    state=restarted
  register: celeryd_restart_result
  ignore_errors: true
  async: "{{ async_val | default(30) }}"
  poll: 5

- name: check celeryd restart result and force restart if needed
  shell: service celeryd kill && service celeryd start
  when: celeryd_restart_result|failed
I then use the above in the playbook as handlers for a task (restart celeryd always comes first in the notify list), as sketched below.
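A sketch of that wiring (the deploy task itself is hypothetical, made up for illustration):
- name: Deploy new application release
  template:
    src: app_config.j2
    dest: /etc/myapp/config.py
  notify:
    - restart celeryd
    - check celeryd restart result and force restart if needed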
In your case, something like the below could possibly work. I haven't checked whether it does, but it might give you an idea for solving this in a different way. Also, since you will be ignoring errors in the first task, you need to make sure that things are fine in the second:
- name: Poll to check if the application is running
  local_action:
    uri
    url=http://{{ webapp_url }}/health
    timeout=60
  register: res
  sudo: false
  when: updated.changed and apache_test_url is defined
  failed_when: res.status != 200 or res['json'] is not defined or res['json']['status'] != 'UP'
  ignore_errors: true
  async: "{{ apache_test_retries | default(60) }}"
  poll: 1

# Task above will exit as early as possible on success.
# It will keep trying for 60 secs, polling every 1 sec.
# You need to make sure it's fine **again** because it has ignore_errors: true.
- name: Final UP check
  local_action:
    uri
    url=http://{{ webapp_url }}/health
    timeout=60
  register: res
  sudo: false
  when: updated.changed and apache_test_url is defined
  failed_when: res.status != 200 or res['json'] is not defined or res['json']['status'] != 'UP'
Hope this helps you work around the retries bug.
