I'm using Ansible with a dynamic inventory plugin to build some virtual machines in libvirt. After creating the machines, I need to wait for them to acquire an IP address. I can't simply do this:
- wait_for_connection:
because immediately after the virtual machines are created, they won't have an IP address yet. What I need to do is this:
- name: wait until node has acquired an address
  meta: refresh_inventory
  until: ansible_host|ipaddr
  retries: 30
  delay: 1

- name: wait until node has finished booting
  wait_for_connection:
That is, I need to wait until the inventory information for each host includes an address in ansible_host. Unfortunately, the above task doesn't work: it simply executes once and continues.
I could just hardcode a delay:
- pause:
    seconds: 30
But I would love to have a more active check here to avoid unnecessary delays (and problems if something takes longer than expected).
After fiddling around with it a bit, this is what I ended up doing:
- hosts: ovn
  gather_facts: false
  tasks:
    - name: wait for nodes to acquire addresses
      delegate_to: localhost
      command: >-
        ansible-inventory --host {{ inventory_hostname }}
      register: nodecheck
      changed_when: false
      until: >-
        (nodecheck.stdout|from_json).ansible_host|default('')|ipaddr
      retries: 30
      delay: 1

    - meta: refresh_inventory
This repeatedly calls ansible-inventory --host <host>, which outputs the inventory information for <host> in JSON format. We parse that, look for ansible_host, and verify that it contains an IP address.
Once we know that the inventory source is able to report an IP address for all the nodes, we call refresh_inventory.
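With the inventory refreshed, the wait_for_connection step from the original attempt can then follow unchanged:

- name: wait until node has finished booting
  wait_for_connection: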
Related
I've written a small playbook to run the sudo /usr/sbin/dmidecode -t1 | grep -i vmware | grep -i product command and write the output to a result file, using the following code as a .yml:
# Check if server is vmware
---
- name: Check if server is vmware
  hosts: all
  become: yes
  #ignore_errors: yes
  gather_facts: False
  serial: 50
  #become_flags: -i
  tasks:
    - name: Run uptime command
      #become: yes
      shell: "sudo /usr/sbin/dmidecode -t1 | grep -i vmware | grep -i product"
      register: upcmd

    - debug:
        msg: "{{ upcmd.stdout }}"

    - name: write to file
      lineinfile:
        path: /home/myuser/ansible/mine/vmware.out
        create: yes
        line: "{{ inventory_hostname }};{{ upcmd.stdout }}"
      delegate_to: localhost
      #when: upcmd.stdout != ""
When running the playbook against a list of hosts I get inconsistent results: even though the debug task shows the correct output, when I check /home/myuser/ansible/mine/vmware.out only some of the hosts are present. Even stranger, if I run the playbook a second time the list is populated completely, but only after running it twice. I have repeated this several times with minor tweaks but never get the expected result. Running with -v or -vv shows nothing unusual.
You are writing to the same file in parallel on localhost. I suspect you're hitting a write concurrency issue. Try the following and see if it fixes your problem:
- name: write to file
  lineinfile:
    path: /home/myuser/ansible/mine/vmware.out
    create: yes
    line: "{{ host }};{{ hostvars[host].upcmd.stdout }}"
  delegate_to: localhost
  run_once: true
  loop: "{{ ansible_play_hosts }}"
  loop_control:
    loop_var: host
From the case you describe, I understand that you'd like to find out how to check whether a server is virtual.
That information is already collected by the setup module.
---
- hosts: linux_host
  become: false
  gather_facts: true
  tasks:
    - name: Show Gathered Facts
      debug:
        msg: "{{ ansible_facts }}"
For a Linux system virtualized under MS Hyper-V, the output could contain
...
bios_version: Hyper-V UEFI Release v1.0
...
system_vendor: Microsoft Corporation
uptime_seconds: 2908494
...
userspace_architecture: x86_64
userspace_bits: '64'
virtualization_role: guest
virtualization_type: VirtualPC
and it already includes the uptime in seconds, which matches the output of
uptime
... up 33 days ...
For a virtualization-only check, one could restrict gather_subset

  gather_subset:
    - '!all'
    - '!min'
    - virtual

resulting in a full output of only

  module_setup: true
  virtualization_role: guest
  virtualization_type: VirtualPC
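As a minimal sketch (the host pattern and the debug message are only illustrative), that subset could be gathered explicitly via the setup module:

- hosts: linux_host
  gather_facts: false
  tasks:
    - name: Gather only virtualization facts
      setup:
        gather_subset:
          - '!all'
          - '!min'
          - virtual

    - debug:
        msg: "{{ ansible_facts.virtualization_type }} ({{ ansible_facts.virtualization_role }})"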
By Caching facts
... you have access to variables and information about all hosts even when you are only managing a small number of servers
on your Ansible Control Node. In ansible.cfg you can configure where and how they are stored and for how long.
fact_caching = yaml
fact_caching_connection = /tmp/ansible/facts_cache
fact_caching_timeout = 86400 # seconds
This would be a minimal and simple solution without re-implementing functionality which is already there.
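As a minimal sketch, assuming the cache was populated by an earlier run against the same hosts, a later play could then read the facts without gathering them again (the host pattern and the default values are only illustrative):

- hosts: linux_host
  gather_facts: false
  tasks:
    - name: Report virtualization type from cached facts
      debug:
        msg: "{{ ansible_virtualization_type | default('unknown') }} / {{ ansible_virtualization_role | default('unknown') }}"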
Further Documentation and Q&A
Ansible facts
What is the exact list of Ansible setup min?
I'm trying to write an Ansible playbook to check if a set of machines are up and running.
Let's say I have 5 machines to test. I'm trying to understand whether I can have a playbook that captures the status (up or down) of all 5 machines, checking them one by one sequentially, without failing the play if one of the machines is down.
It's possible to use wait_for_connection in a block. For example:
- hosts: all
  gather_facts: false
  tasks:
    - block:
        - wait_for_connection:
            sleep: 1
            timeout: 10
      rescue:
        - debug:
            msg: "{{ inventory_hostname }} not connected. End of host."
        - meta: clear_host_errors
        - meta: end_host

    - debug:
        msg: "{{ inventory_hostname }} is running"

    - setup:
I want to run some Ansible tasks on 4 servers one by one, i.e. in a serial manner, with a pause in between. I have added the pause as the last task in the playbook, but I want it to be skipped on the last server, otherwise it will wait for no reason. Please let me know how to implement this.
---
- hosts: server1,server2,server3,server4
  serial: 1
  vars_files:
    - ./vars.yml
  tasks:
    - name: Variable test
      pause:
        minutes: 1
Really interesting problem which forced me to look for an actual solution. Here is the quickest one I came up with.
The Ansible special variables documentation defines the ansible_play_hosts_all variable as follows:
List of all the hosts that were targeted by the play
The hosts in that variable are listed in the order they were found in the inventory.
Provided you use the default inventory order for your play, you can set a test that will trigger the task unless the current host is the last one in that list:
when: inventory_hostname != ansible_play_hosts_all[-1]
As reported by @Vladimir in the comments below, if you change the play's order parameter from the default, this approach will break.
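Applied to the pause task from the question, the condition would look like this:

- name: Variable test
  pause:
    minutes: 1
  when: inventory_hostname != ansible_play_hosts_all[-1]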
The playbook below does the job
- hosts: all
  serial: 1
  vars:
    completed: false
  tasks:
    - set_fact:
        completed: true

    - block:
        - debug:
            msg: All completed. End of play.
        - meta: end_play
      when: "groups['all']|
             map('extract', hostvars, 'completed')|
             list is all"

    - name: Variable test
      pause:
        minutes: 1
Notes
see any/all
see Extracting values from containers
see hostvars
I am using the following Ansible playbook to shut down a list of remote Ubuntu hosts all at once:
- hosts: my_hosts
  become: yes
  remote_user: my_user
  tasks:
    - name: Confirm shutdown
      pause:
        prompt: >-
          Do you really want to shutdown machine(s) "{{ play_hosts }}"? Press
          Enter to continue or Ctrl+C, then A, then Enter to abort ...

    - name: Cancel existing shutdown calls
      command: /sbin/shutdown -c
      ignore_errors: yes

    - name: Shutdown machine
      command: /sbin/shutdown -h now
Two questions on this:
Is there any module available which can handle the shutdown in a more elegant way than having to run two custom commands?
Is there any way to check that the machines are really down? Or is it an anti-pattern to check this from the same playbook?
I tried something with the net_ping module but I am not sure if this is its real purpose:
- name: Check that machine is down
  become: no
  net_ping:
    dest: "{{ ansible_host }}"
    count: 5
    state: absent
This, however, fails with
FAILED! => {"changed": false, "msg": "invalid connection specified, expected connection=local, got ssh"}
In more restricted environments where ping messages are blocked, you can instead wait for the SSH port to go down. In my case I have set the timeout to 60 seconds.
- name: Save target host IP
  set_fact:
    target_host: "{{ ansible_host }}"

- name: wait for ssh to stop
  wait_for: "port=22 host={{ target_host }} delay=10 state=stopped timeout=60"
  delegate_to: 127.0.0.1
There is no shutdown module. You can use a single fire-and-forget call:
- name: Shutdown server
  become: yes
  shell: sleep 2 && /sbin/shutdown -c && /sbin/shutdown -h now
  async: 1
  poll: 0
As for net_ping, it is for network appliances such as switches and routers. If you rely on ICMP messages to test the shutdown process, you can use something like this:
- name: Store actual host to be used with local_action
  set_fact:
    original_host: "{{ ansible_host }}"

- name: Wait for ping loss
  local_action: shell ping -q -c 1 -W 1 {{ original_host }}
  register: res
  retries: 5
  until: ('100.0% packet loss' in res.stdout)
  failed_when: ('100.0% packet loss' not in res.stdout)
  changed_when: no
This will wait for 100% packet loss or fail after 5 retries.
Here you want to use local_action because otherwise the commands would be executed on the remote host (which is supposed to be down).
And you want the trick of storing ansible_host in a temporary fact, because ansible_host is replaced with 127.0.0.1 when the task is delegated to localhost.
I'm provisioning a new server via Terraform and using Ansible as the provisioner on my local system.
Terraform provisions a system on EC2, and then it runs the Ansible playbook providing the IP of the newly built system as the inventory.
I want to use Ansible to wait for the system to finish booting and prevent further tasks from being attempted up until a connection can be established. Up until this point I have been using a manual pause which is inconvenient and imprecise.
Ansible doesn't seem to do what the documentation says it will (unless I'm wrong, a very possible scenario). Here's my code:
- name: waiting for server to be alive
  wait_for:
    state: started
    port: 22
    host: "{{ ansible_ssh_host | default(inventory_hostname) }}"
    delay: 10
    timeout: 300
    connect_timeout: 300
    search_regex: OpenSSH
  delegate_to: localhost
What happens in this step is that the task doesn't wait more than 10 seconds for the connection, and then it fails. If the server has already booted and I run the playbook again, it works fine and performs as expected.
I've also tried do/until-style loops, which never seem to work. All the examples in the documentation use shell output, and I don't see how that would work with non-shell modules.
I also can't seem to get any debug information if I try to register a result and print it out using the debug module.
Anyone have any suggestions as to what I'm doing wrong?
When you use delegate_to or the local_action module, {{ ansible_ssh_host }} resolves to localhost, so your task is always running with the following parameter:
host: localhost
It waits 10 seconds, checks the SSH connection to localhost, and proceeds (because most likely it is open).
If you use gather_facts: false (which I believe you do), you can add a set_fact task beforehand to store the target host name in a variable:
- set_fact:
    host_to_wait_for: "{{ ansible_ssh_host | default(inventory_hostname) }}"

and change the line to:

    host: "{{ host_to_wait_for }}"
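Put together, a corrected version could look roughly like this (same parameters as in the question, only the host value changed):

- set_fact:
    host_to_wait_for: "{{ ansible_ssh_host | default(inventory_hostname) }}"

- name: waiting for server to be alive
  wait_for:
    state: started
    port: 22
    host: "{{ host_to_wait_for }}"
    delay: 10
    timeout: 300
    connect_timeout: 300
    search_regex: OpenSSH
  delegate_to: localhost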
You can proof-test the variables with the following playbook:
---
- hosts: all
  gather_facts: false
  tasks:
    - set_fact:
        host_to_wait_for: "{{ ansible_ssh_host | default(inventory_hostname) }}"

    - debug:
        msg: "ansible_ssh_host={{ ansible_ssh_host }}, inventory_hostname={{ inventory_hostname }}, host_to_wait_for={{ host_to_wait_for }}"
      delegate_to: localhost
Alternatively, you can find a way to provide the IP address of the EC2 instance to Ansible as a variable and use it as the value of the host: parameter. For example, if you run Ansible from the CLI, pass ${aws_instance.example.public_ip} via the --extra-vars argument.
As techraf indicates, your inventory lookup is actually grabbing the localhost address because of the delegation, so it's not running against the correct machine.
I think your best solution might be to have Terraform pass a variable containing the instance's IP address to the playbook. Example:
Terraform passes -e "new_ec2_host=<IP_ADDR>"
Ansible task:
- name: waiting for server to be alive
  wait_for:
    state: started
    port: 22
    host: "{{ new_ec2_host }}"
    delay: 10
    timeout: 300
    connect_timeout: 300
    search_regex: OpenSSH
  delegate_to: localhost