How to run one playbook against the same group in multiple parallel runs (threaded) - Ansible

Our current setup has ~2000 servers, all in one group.
I would like to know whether there is a way to run x.yml on the whole group (all 2k servers) but split across multiple parallel runs (threaded, or something similar), for example:
ansible-playbook -i prod.ini -l my_group[50%] x.yml
ansible-playbook -i prod.ini -l my_group[other 50%] x.yml
Solutions with AWX or Ansible Tower are not relevant.
Using even 500-1000 forks didn't give any improvement.

Try combining forks with the free strategy.
The default behavior of Ansible is:
Ansible runs each task on all hosts affected by a play before starting the next task on any host, using 5 forks.
So even if you increase the number of forks, hosts that finish a task early still wait for every other host to finish it before moving on. The free strategy allows each host to run until the end of the play as fast as it can:
- hosts: all
  strategy: free
  tasks:
    # ...
ansible-playbook -i prod.ini -f 500 -l my_group x.yml

As mentioned above, you should preferably increase forks and set the strategy to free. Increasing forks lets the playbook run on more servers at the same time, and setting the strategy to free lets each server work through its tasks independently, without waiting for the others.
Please refer to the doc below for more clarification.
docs

Resolved by using the patterns my_group[:1000] and my_group[999:].
Forks didn't reduce the run time in my case.
The free strategy also multiplied the run time, which was pretty weird.
Debugging the free strategy's output is also pretty difficult when you have 2k servers and about 50 tasks in the playbook.
thanks everyone for sharing
much appreciated
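
Note that index 999 falls inside both of the slices quoted in the resolution above, so that host would be targeted by both runs. A minimal sketch of generating non-overlapping slices for any number of parallel runs follows. The function name `slice_patterns` is hypothetical, and the sketch assumes that the end index of an Ansible slice is inclusive, which is what the `nodes[0:$(($half-1))]` command further down this page relies on. It may be worth confirming with `ansible -i prod.ini 'my_group[0:999]' --list-hosts` before trusting it.

```shell
#!/bin/sh
# slice_patterns GROUP TOTAL CHUNKS
# Prints one "GROUP[start:end]" limit pattern per chunk (at most CHUNKS lines).
# Assumption: slice end indexes are inclusive.
slice_patterns() {
    group=$1
    total=$2
    chunks=$3
    size=$(( (total + chunks - 1) / chunks ))   # integer ceil(total / chunks)
    start=0
    while [ "$start" -lt "$total" ]; do
        end=$(( start + size - 1 ))
        if [ "$end" -ge "$total" ]; then
            end=$(( total - 1 ))
        fi
        echo "${group}[${start}:${end}]"
        start=$(( end + 1 ))
    done
}

slice_patterns my_group 2000 2

# Possible use (not run here): one background ansible-playbook per slice.
#   for p in $(slice_patterns my_group 2000 4); do
#       ansible-playbook -i prod.ini -l "$p" x.yml &
#   done
#   wait
```

Each background run is a separate ansible-playbook process with its own fork pool and its own recap, so the output of the runs will interleave unless each one is redirected to its own log file.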

Related

How do I check for a successful retry with a separate command in Ansible?

Ansible 2.9.6
There is a standard way to use retries in Ansible:
- name: run my command
  register: result
  retries: 5
  delay: 60
  until: result.rc == 0
  shell:
    cmd: >
      mycommand.sh
until is a passive check here.
How can I do the check with a separate command? Something like "retry command A several times until command B returns 0".
Of course I could put both commands inside the shell execution ("commandA ; commandB") and get the exit status of the second one in result.rc. But is there an Ansible way to do this?
The ansible way would point towards retries over a block or include that contains a command task for each script. However, that's not supported. You can use a modified version of the workaround described here, though, but it starts to get complicated. Therefore, you may prefer to take Zeitounator's suggestion.
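
If the loop is moved out of Ansible and into a script (a different technique from `retries`/`until`, named here plainly), the "run A, then judge by B" logic from the question can be sketched as below. The function name `retry_until` and its argument order are hypothetical, and the commands at the bottom are placeholders.

```shell
#!/bin/sh
# retry_until RETRIES DELAY ACTION CHECK
# Runs ACTION, then CHECK; repeats until CHECK exits 0 or RETRIES is used up.
# ACTION's own exit status is ignored on purpose: only CHECK decides success.
retry_until() {
    retries=$1
    delay=$2
    action=$3
    check=$4
    attempt=1
    while [ "$attempt" -le "$retries" ]; do
        sh -c "$action" || true
        if sh -c "$check"; then
            return 0
        fi
        if [ "$attempt" -lt "$retries" ]; then
            sleep "$delay"
        fi
        attempt=$(( attempt + 1 ))
    done
    return 1
}

# Placeholder commands; substitute the real command A and command B.
retry_until 5 0 "true" "true" && echo "check passed"
```

With this approach Ansible sees a single task whose return code reflects command B, so the task needs no `retries` of its own. The trade-off is that the individual attempts no longer show up in Ansible's output.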

Use ansible for manual staged rollout using `serial` and unknown inventory size

Consider an Ansible inventory with an unknown number of servers in a nodes key.
The script I'm writing should be usable with different inventories that should be as simple as possible and are out of my control, so I don't know the number of nodes ahead of time.
My command to run the playbook is pretty vanilla and I can freely change it. There could be two separate commands for both rollout stages.
ansible-playbook -i $INVENTORY_PATH playbooks/example.yml
And the playbook is pretty standard as well and can be adjusted:
- hosts: nodes
  vars:
    ...
  remote_user: '{{ sudo_user }}'
  gather_facts: no
  tasks:
    ...
How would I go about implementing a staged execution without changing the inventory?
I'd like to run one command to execute the playbook for 50% of the inventory first. Here the result needs to be checked manually by a human. Then I'd like to use another command to execute the playbook for the other half. The author of the inventory should not have to worry about this. All machines below the nodes key are the same.
I've looked into the serial keyword, but it doesn't seem like I could automatically end execution after one batch and then later come back to continue with the second half.
Maybe something creative could be done with variables passed to ansible-playbook? I'm just wondering, shouldn't this be a common use-case? Are all staged rollouts supposed to be fully automated?
Even without using serial, here is a possible, very simple scenario.
First calculate $half of the inventory by inspecting the inventory itself. The following enables the json callback plugin for the ad hoc command and makes sure it is the only plugin enabled. It also uses jq to parse the result. You can adapt it to any other JSON parser (or even use the yaml callback with a YAML parser if you prefer). Anyway, adapt it to your own needs.
half=$( \
  ANSIBLE_LOAD_CALLBACK_PLUGINS=1 \
  ANSIBLE_STDOUT_CALLBACK=json \
  ANSIBLE_CALLBACK_WHITELIST=json \
  ansible localhost -i yourinventory.yml -m debug -a "msg={{ (groups['nodes'] | length / 2) | round(0, 'ceil') | int }}" \
  | jq -r ".plays[0].tasks[0].hosts.localhost.msg" \
)
Then launch your playbook limited to the first $half nodes with whatever vars are needed for the human check, and launch it again for the remaining nodes without the check. The patterns are quoted so that the shell does not try to expand the square brackets as a glob.
ansible-playbook -i yourinventory.yml example_playbook.yml -l "nodes[0:$(($half-1))]" -e human_check=true
ansible-playbook -i yourinventory.yml example_playbook.yml -l "nodes[$half:]" -e human_check=false
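
The index arithmetic behind the two commands above can be checked without running Ansible. This is a minimal sketch, assuming the host count is already known. The function name `half_limits` is hypothetical, and it assumes inclusive slice ends, as the first command above does.

```shell
#!/bin/sh
# half_limits GROUP COUNT
# Prints the two --limit patterns for a 50/50 staged rollout.
# Assumption: slice end indexes are inclusive.
half_limits() {
    group=$1
    count=$2
    if [ "$count" -lt 2 ]; then
        echo "need at least 2 hosts to split" >&2
        return 1
    fi
    half=$(( (count + 1) / 2 ))          # integer ceil(count / 2)
    echo "${group}[0:$(( half - 1 ))]"   # first stage: indexes 0 .. half-1
    echo "${group}[${half}:]"            # second stage: index half .. end
}

half_limits nodes 5
```

For an odd count such as 5, the first stage covers three hosts (indexes 0 to 2) and the second stage the remaining two, which matches the 'ceil' rounding used in the ad hoc command.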

Why does the playbook take wrong values for group variables?

I have a problem with group variables.
Example: I have two inventory groups, group_A and group_B, and files with the same names in group_vars:
inventories/
  hosts.inv
    [group_A]
    server1
    server2
    [group_B]
    server3
    server4
  group_vars/
    group_A - file
      var_port: 9001
    group_B - file
      var_port: 9002
The problem appears when I execute:
ansible-playbook playbooks/playbook.yml -i inventories/hosts.inv -l group_B
The playbook is executed on the proper set of servers (server3, server4), but it takes its variables from the group variables file of group_A.
Expected result: var_port: 9002
Actual result: var_port: 9001
ansible 2.4.2.0
BR Oleg
I enabled ANSIBLE_DEBUG, and this is what I found:
2018-05-03 15:23:23,663 p=129458 u=user | 129458 1525353803.66336: Loading data from /ansible/inventories/prod/group_vars/group_B.yml
2018-05-03 15:23:23,663 p=129458 u=user | 129661 1525353803.66060: in run() - task 00505680-eccc-d94e-2b1b-0000000000f4
2018-05-03 15:23:23,664 p=129458 u=user | 129661 1525353803.66458: calling self._execute()
2018-05-03 15:23:23,665 p=129458 u=user | 129458 1525353803.66589: Loading data from /ansible/inventories/prod/group_vars/group_A.yml
On playbook execution, Ansible scans all the variable files in the group_vars folder that define the variable "var_port", and the last one loaded wins.
As you can find in another topic:
Ansible servers/groups in development/production
And from the documentation:
http://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#variable-precedence-where-should-i-put-a-variable
Note
Within any section, redefining a var will overwrite the previous instance. If multiple groups have the same variable, **the last one loaded wins**. If you define a variable twice in a play’s vars: section, the **2nd one wins**.
It is now NOT clear to me how to manage configuration files. In this case I would have to use unique variable names for each group, but that is not possible with roles. Or should I use include_vars when I call the playbook?
A great example of how to manage variable files in a multistage environment, from DigitalOcean:
How to Manage Multistage Environments with Ansible
I believe that the problem here, while not explicitly stated in the original question, is that Server{1,2} and Server{3,4} are actually the same servers in 2 different groups at the same level.
I ran into this problem myself, which made me do some digging. I don't agree with it, but it works as designed. A fix with full compatibility was even proposed, and the pull request was rejected.
Discussion
Pull Request
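
One way to test the hypothesis above (the same servers listed in two groups at the same level) is to scan the inventory for hosts that appear under more than one group header. This is a minimal sketch for a simple INI inventory. The function name `hosts_in_multiple_groups` and the sample inventory are hypothetical, and the parsing ignores host ranges and other advanced inventory syntax.

```shell
#!/bin/sh
# hosts_in_multiple_groups INVENTORY
# Lists every host that appears under more than one [group] header of an
# INI inventory, together with the groups it was found in.
hosts_in_multiple_groups() {
    awk '
        NF == 0 || $1 ~ /^[#;]/ { next }          # blank lines and comments
        /^\[/ {
            name = substr($0, 2, index($0, "]") - 2)
            skip = (name ~ /:/)                   # [x:vars] and [x:children]
            next
        }
        skip { next }
        {
            count[$1]++
            members[$1] = (count[$1] > 1) ? members[$1] "," name : name
        }
        END {
            for (h in count)
                if (count[h] > 1) print h ": " members[h]
        }
    ' "$1" | sort
}

# Hypothetical sample inventory in which server2 sits in both groups.
inv=$(mktemp)
cat > "$inv" <<'EOF'
[group_A]
server1
server2
[group_B]
server2
server3
EOF
hosts_in_multiple_groups "$inv"
rm -f "$inv"
```

Any host this reports has one merged set of group variables, so limiting a run to group_B does not stop values from group_A from applying to it.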

Run Ansible playbook on UNIQUE user/host combination

I've been trying to implement Ansible in our team to manage different kinds of application things such as configuration files for products and applications, the distribution of maintenance scripts, ...
We don't like to work with "hostnames" in our team because we have 300+ of them with meaningless names. Therefore, I started out creating aliases for them in the Ansible hosts file like:
[bpm-i]
bpm-app1-i1 ansible_user=bpmadmin ansible_host=el1001.bc
bpm-app1-i2 ansible_user=bpmadmin ansible_host=el1003.bc
[bpm-u]
bpm-app1-u1 ansible_user=bpmadmin ansible_host=el2001.bc
bpm-app1-u2 ansible_user=bpmadmin ansible_host=el2003.bc
[bpm-all:children]
bpm-i
bpm-u
Meaning we have a BPM application named "app1" and it's deployed on two hosts in integration-testing and on two hosts in user-acceptance-testing. So far so good. Now I can run an Ansible playbook to (for example) set up the SSH accesses (authorized_keys) for team members or push a maintenance script. I can run those PBs on each host separately, on all hosts in ITT or UAT, or even everywhere.
But, typically, we'll have to install the same application app1 again on an existing host but with a different purpose, say a "training" environment. My reflex would be to do this:
[bpm-i]
bpm-app1-i1 ansible_user=bpmadmin ansible_host=el1001.bc
bpm-app1-i2 ansible_user=bpmadmin ansible_host=el1003.bc
[bpm-u]
bpm-app1-u1 ansible_user=bpmadmin ansible_host=el2001.bc
bpm-app1-u2 ansible_user=bpmadmin ansible_host=el2003.bc
[bpm-t]
bpm-app1-t1 ansible_user=bpmadmin ansible_host=el2001.bc
bpm-app1-t2 ansible_user=bpmadmin ansible_host=el2003.bc
[bpm-all:children]
bpm-i
bpm-u
bpm-t
But ... running PBs becomes a mess now and causes errors. Logically I have two alias names to reach the same user/host combination: bpm-app1-u1 and bpm-app1-t1. I don't mind, that's perfectly logical, but if I were to test a new maintenance script, I would first push it to bpm-app1-i1 for testing and, when that is OK, I would probably run the PB against bpm-all. But because of the non-unique user/host combinations for some aliases, the PB would run multiple times on the same user/host. Depending on the actions in the PB this may work coincidentally, but it may also fail horribly.
Is there no way to tell Ansible "Run on ALL - UNIQUE user/host combinations" ?
Since most tasks change something on the remote host, you could use Conditionals to check for that change on the host before running.
For example, if your playbook has a task to run a script that creates a file on the remote host, you can add a when clause to "skip the task if file exists" and check for the existence of that file with a stat task before that one.
- name: Check whether the script has run in a previous instance by looking for the file
  stat:
    path: /path/to/something
  register: something
- name: Run script when the file above does not exist
  command: bash myscript.sh
  when: not something.stat.exists
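
A different approach from the conditional above is to build the limit list outside Ansible, keeping only the first alias seen for each ansible_user/ansible_host pair. This is a minimal sketch for an INI inventory written like the ones in the question. The function name `unique_targets` is hypothetical, and the parsing assumes one host per line with `key=value` variables on the same line.

```shell
#!/bin/sh
# unique_targets INVENTORY
# Prints a comma-separated list of aliases, keeping only the first alias for
# each ansible_user/ansible_host combination.
unique_targets() {
    awk '
        NF == 0 || $1 ~ /^[#;]/ { next }           # blank lines and comments
        /^\[/ { skip = ($0 ~ /:/); next }          # skip [x:children], [x:vars]
        skip { next }
        {
            user = ""; host = $1                   # default: alias is the host
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^ansible_user=/) user = substr($i, 14)
                if ($i ~ /^ansible_host=/) host = substr($i, 14)
            }
            key = user "@" host
            if (!(key in seen)) { seen[key] = 1; print $1 }
        }
    ' "$1" | paste -s -d, -
}

# Possible use (not run here; "hosts" and "pb.yml" are placeholder names):
#   ansible-playbook -i hosts pb.yml -l "$(unique_targets hosts)"
```

Which alias survives depends on the order of the file, and that matters if the aliases carry different group variables. With the second inventory from the question, bpm-app1-u1 and bpm-app1-u2 would be kept and the bpm-t aliases dropped.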

How to detect non-busy machines over a LAN automatically?

I'm writing an MPI program to be run over a local area network. These machines can be ssh'd to by any student at any time.
Although I always test my program at night, the performance has been very inconsistent. My guess is that some nodes were busy when I ran the program.
So my question is: can I write a script to detect non-busy machines and update the machine file? What's an easy way to write it?
Thanks a lot.
SSH into each machine, then read the /proc/loadavg file or determine the "busyness" in some other way.
I think the easiest way would be to install the check_load[1] script from Nagios on every node you want to check and call it via ssh with some sensible parameters:
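
A minimal sketch of the /proc/loadavg approach: the first field of that file is the 1-minute load average, and dividing it by the core count lets machines of different sizes be compared. The function name `is_idle`, the default threshold of 0.5 per core, and the file names in the commented loop are all assumptions to adjust.

```shell
#!/bin/sh
# is_idle LOADAVG_LINE CORES [THRESHOLD]
# Succeeds (exit 0) when the 1-minute load per core is below THRESHOLD.
# The default threshold of 0.5 is an arbitrary assumption.
is_idle() {
    echo "$1" | awk -v cores="$2" -v limit="${3:-0.5}" \
        '{ exit !($1 / cores < limit) }'
}

# Example with a captured /proc/loadavg line from a 4-core machine.
if is_idle "0.20 0.43 0.50 1/123 4567" 4; then
    echo "idle"
else
    echo "busy"
fi

# Possible use (not run here; all_hosts.txt and machinefile are placeholders):
#   while read -r host; do
#       line=$(ssh -n "$host" cat /proc/loadavg) || continue
#       cores=$(ssh -n "$host" nproc) || continue
#       is_idle "$line" "$cores" && echo "$host"
#   done < all_hosts.txt > machinefile
```

The load can change between the check and the start of the MPI job, so this only reduces the chance of landing on a busy node.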
# /usr/lib64/nagios/plugins/check_load -w 1,2,3 -c 3,4,5
OK - load average: 0.20, 0.43, 0.50|load1=0.200;1.000;3.000;0; load5=0.430;2.000;4.000;0; load15=0.500;3.000;5.000;0;
# /usr/lib64/nagios/plugins/check_load -w 0.1,2,3 -c 3,4,5
WARNING - load average: 0.18, 0.43, 0.50|load1=0.180;0.100;3.000;0; load5=0.430;2.000;4.000;0; load15=0.500;3.000;5.000;0;
# /usr/lib64/nagios/plugins/check_load -w 0.01,2,3 -c 0.1,4,5
CRITICAL - load average: 0.41, 0.46, 0.51|load1=0.410;0.010;0.100;0; load5=0.460;2.000;4.000;0; load15=0.510;3.000;5.000;0;
CRITICAL would mean "really busy", WARNING could be "is kinda busy" and OK would mean "the machine is idle".
You have to pay attention to the thresholds you give for the 1/5/15-minute load averages (warning and critical). For instance, a load of 3 on a machine with 16 cores is perfectly OK, while a load of 3 on a single-core machine means it's really, really busy.
Good luck!
Alex.
[1] http://nagiosplugins.org/man/check_load
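
A script does not need to parse the text output shown above, because Nagios plugins also report their state through the exit status (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). This is a minimal sketch that maps the status to the labels used in this answer. The function name `load_state` is hypothetical.

```shell
#!/bin/sh
# load_state EXIT_STATUS
# Maps a Nagios plugin exit status to the labels used in the answer above.
load_state() {
    case "$1" in
        0) echo "idle" ;;          # OK
        1) echo "kinda busy" ;;    # WARNING
        2) echo "really busy" ;;   # CRITICAL
        *) echo "unknown" ;;       # UNKNOWN (3), or ssh itself failed (255)
    esac
}

load_state 0

# Possible use (not run here):
#   ssh -n "$host" /usr/lib64/nagios/plugins/check_load -w 1,2,3 -c 3,4,5 >/dev/null
#   load_state "$?"
```

Only hosts that come back as "idle" would then be written to the machine file.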
