I have been trying to restart the network service in fire-and-forget mode, because restarting the network in a VM will obviously drop my SSH connection, so I wanted to run the task asynchronously with a timeout.
In order to do that I did this inside of my restart networking tasks:
- name: Restart network
  become: true
  service: name=network state=restarted
  async: 1000
  poll: 0
When Ansible gets to this point I get this error:
fatal: [build]: FAILED! => {"failed": true, "msg": "async mode is not supported with the service module"}
I found that this is an Ansible bug whose fix has not been released yet. The fix is only in the development branch, which I don't want to use because that could also mean more bugs in Ansible.
So, in my opinion, I have two options: either wait for a new Ansible release that includes the fix, or change to async: 0 and poll: 0 and wait for the service to finish (which it never will), pressing CTRL+C at that point to stop it manually.
I don't want to take either of those routes because they are not very efficient for me, so I was wondering whether there is a better solution at this point.
Try this as a temporary workaround:
- name: Restart network
  become: yes
  shell: sleep 2 && service network restart
  async: 1
  poll: 0
And don't forget to wait_for port 22 after this task to avoid host unreachable error.
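A minimal sketch of that follow-up task, assuming the check runs from the control machine and that SSH listens on port 22. The delay and timeout values are illustrative assumptions, not values from the question:

```yaml
# Runs on the controller (delegate_to), because the target is unreachable
# while its network restarts.
- name: Wait for SSH to come back after the network restart
  wait_for:
    host: "{{ ansible_host | default(inventory_hostname) }}"
    port: 22
    state: started
    delay: 5       # assumed: give the restart a few seconds to begin
    timeout: 300   # assumed: give up after five minutes
  delegate_to: localhost
  become: false
```

Place it directly after the restart task so later tasks only run once the port accepts connections again.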
I'm deploying helm charts using community.kubernetes.helm with ease but I've run into conditions where the connection is refused and it's not clear how best to configure a retries/wait/until. I've run into a case where every now and then, helm can't communicate with the cluster, here's an example (dns/ip faked) showing that the issue is as simple as not being able to connect to the cluster:
fatal: [localhost]: FAILED! => {"changed": false, "command":
"/usr/local/bin/helm --kubeconfig /var/opt/kubeconfig
--namespace=gpu-operator list --output=yaml --filter gpu-operator", "msg": "Failure when executing Helm command. Exited 1.\nstdout:
\nstderr: Error: Kubernetes cluster unreachable: Get
"https://ec2-host/k8s/clusters/c-xrsqn/version?timeout=32s": dial
tcp 192.168.1.1:443: connect: connection refused\n", "stderr": "Error:
Kubernetes cluster unreachable: Get
"https://ec2-host/k8s/clusters/c-xrsqn/version?timeout=32s": dial
tcp 192.168.1.1:443: connect: connection refused\n", "stderr_lines":
["Error: Kubernetes cluster unreachable: Get
"https://ec2-host/k8s/clusters/c-xrsqn/version?timeout=32s": dial
tcp 192.168.1.1:443: connect: connection refused"], "stdout": "",
"stdout_lines": []}
In my experience, I have seen that try/retry will work. I agree that it would be ideal to figure out why I can't connect to the service, but it would be even more ideal to work around this by taking advantage of a catch all "until" block that tries this block until it works or gives up after N tries while taking a break of N seconds.
Here's an example of the ansible block:
- name: deploy Nvidia GPU Operator
  block:
    - name: deploy gpu operator
      community.kubernetes.helm:
        name: gpu-operator
        chart_ref: "{{ CHARTS_DIR }}/gpu-operator"
        create_namespace: yes
        release_namespace: gpu-operator
        kubeconfig: "{{ STATE_DIR }}/{{ INSTANCE_NAME }}-kubeconfig"
      until: ???
      retries: 5
      delay: 3
  when: GPU_NODE is defined
I would really appreciate any suggestions/pointers.
I discovered that registering the output and then retrying until a field in it is defined gets Ansible to rerun the task. The key is learning what a successful output looks like. For helm, it says it will define a status when it works correctly. So, this is what you need to add:
register: _gpu_result
until: _gpu_result.status is defined
ignore_errors: true
retries: 5
delay: 3
retries/delay is up to you
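Put together with the task from the question, the result might look like the sketch below. The variable names (CHARTS_DIR, STATE_DIR, INSTANCE_NAME) come from the question; the position of the retry keywords at task level, next to the module name, is how Ansible expects them:

```yaml
- name: deploy gpu operator
  community.kubernetes.helm:
    name: gpu-operator
    chart_ref: "{{ CHARTS_DIR }}/gpu-operator"
    create_namespace: yes
    release_namespace: gpu-operator
    kubeconfig: "{{ STATE_DIR }}/{{ INSTANCE_NAME }}-kubeconfig"
  register: _gpu_result
  # Retry until the module reports a release status, i.e. helm reached the cluster.
  until: _gpu_result.status is defined
  retries: 5
  delay: 3
  ignore_errors: true
```

With ignore_errors: true the play continues even if all retries fail, so drop that line if a failed deployment should stop the run.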
I am unable to run ping commands from an Ansible host (using localhost, see below).
I built a simple playbook to run ping using the command module:
---
#
- name: GET INFO
  hosts: localhost
  tasks:
    - name: return motd to registered var
      command: "/usr/bin/ping 10.39.120.129"
      register: mymotd
    - name: debug output
      debug: var=mymotd
However, I get this error: "ping: socket: Operation not permitted"
Seems like there is a permissions issue. However, looking at the /usr/bin directory, it looks like ping would be executable to me:
"-rwxr-xr-x. 1 root root 66176 Aug 4 2017 ping",
I cannot become or use sudo, it seems like tower is locked down for that and I don't have the authority to change it either.
Anyone have any suggestions? What brought me to this, is that I am trying to run ping in a custom module and getting a similar issue.
Thanks
The ping binary needs to have the SETUID bit set to be fully runnable as a normal user, which is not the case on your server.
You need to run as root:
chmod u+s $(which ping)
If you don't have root access and cannot have this done by an admin, I'm afraid you're stuck... unless the server you are trying to ping is a machine you can manage with Ansible.
In this latter case, there is a ping module you can use. As the documentation notes, it is not an ICMP ping. See if this can be used in your situation.
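A minimal sketch of using that module, assuming the target is already in your inventory. The host pattern "all" and the task names are placeholders:

```yaml
- name: Check that managed hosts are reachable through Ansible
  hosts: all            # assumed pattern; narrow it to the host you care about
  gather_facts: false
  tasks:
    - name: Verify the connection and a usable Python (not an ICMP ping)
      ping:
```

The module only confirms that Ansible can log in and run Python on the target, so it needs no raw-socket privileges on the control machine.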
One of the numerous references I could find about ping permissions: https://ubuntuforums.org/showthread.php?t=927709
I created a playbook to reboot my remote servers. I use wait_for to wait for the remote servers to come back up before I continue. So I have the following code:
---
- hosts: hostName
  tasks:
    - name: reboot
      shell: reboot
      async: 1
      poll: 0
    - name: wait for server to come up
      local_action: wait_for
      args:
        host: hostName
        port: 22
        state: started
        delay: 10
        timeout: 600
My target server was up about 5 minutes after the reboot was initiated. However, the playbook got stuck at this task until it timed out and raised an error.
My questions are:
1. How does wait_for work here? Does it send an SSH connection request to the target host and time out if it cannot connect to the target host after 600 seconds? Or does it keep pinging the target host until it times out?
2. What could be the problem I am having?
You'll be better off using wait_for_connection in this case. For example, given that the play is running with - hosts: hostName:
- name: Wait 600 seconds, but only start checking after 10 seconds
  wait_for_connection:
    delay: 10
    timeout: 600
Q: How does wait_for work here?
A: wait_for is waiting for a port to become available.
Q: Does it send the ssh connection request to the target host and time out if it cannot connect to the target host after 600 seconds?
A: No. It's testing the port.
Q: Does it keep pinging the target host till it times out?
A: No. It tries to create a socket. See wait_for.py
s = socket.create_connection((host, port), connect_timeout)
Q: What could be the problem I am having?
A: It's not clear from the data available. Do not run wait_for as local_action. Make sure the host rebooted successfully.
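For reference, the whole play with wait_for_connection in place of the local wait_for might look like the sketch below. The "sleep 2 &&" prefix and become: true are assumptions added here so that the async task can return before the connection drops; they are not part of the original playbook:

```yaml
---
- hosts: hostName
  gather_facts: false
  tasks:
    - name: reboot
      shell: sleep 2 && reboot   # assumed delay so Ansible can detach first
      async: 1
      poll: 0
      become: true               # assumed: reboot needs root
    - name: Wait up to 600 seconds, but only start checking after 10 seconds
      wait_for_connection:
        delay: 10
        timeout: 600
```

wait_for_connection runs against the target through Ansible's own connection plugin, so it succeeds only once Ansible can actually log in again, not merely when port 22 is open. On Ansible 2.7 or later, the reboot module combines both steps.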
I am running an Ansible playbook to restart some of our servers, and we need to sleep for 40 minutes between each server restart. If I sleep for 40 minutes in my playbook, it sleeps for a while, but then my session on the Ubuntu box in prod gets terminated and the whole script stops as well. Is there anything I can add to the Ansible playbook to keep my session alive while the whole playbook is running?
# This will restart servers
---
- hosts: tester
  serial: "{{ num_serial }}"
  tasks:
    - name: copy files
      copy: src=conf.prod dest=/opt/process/config/conf.prod owner=goldy group=goldy
    - name: stop server
      command: sudo systemctl stop server_one.service
    - name: start server
      command: sudo systemctl start server_one.service
    - name: sleep for 40 minutes
      pause: minutes=40
I want to sleep for 40 minutes without my Linux session being terminated and then move on to restarting the next set of servers.
I am running Ansible version 2.6.3.
You can run your ansible script inside screen in order to keep the session alive even after disconnection.
Basically what you want to do is ssh into the production server, run screen, then execute the playbook inside the newly created session.
If you ever get disconnected, you can connect back to the server, then run screen -r to get back into your saved session.
I have a set of web server processes that I wish to restart one at a time. I want to wait for process N to be ready to service HTTP requests before restarting process N+1.
The following works:
- name: restart server 9990
  supervisorctl: name='server_9990' state=restarted
- wait_for: port=9990 delay=1
- name: restart server 9991
  supervisorctl: name='server_9991' state=restarted
- wait_for: port=9991 delay=1
etc.
But I'd really like to do this in a loop. It seems that Ansible doesn't allow multiple tasks inside a loop (in this case, I need two tasks: supervisorctl and wait_for)
Am I missing a way to do this or is replicating these tasks for each instance of the server really the way to go?
I believe that's not possible with Ansible's default functionality. I think your best bet would be to create your own module. I have seen that modules can call other modules, so you might be able to write a slim module which simply calls the supervisorctl module and then waits for the port to be ready. This module could then be called with with_items.
Another idea is to not use the supervisorctl module in the first place. You could run a shell or script task which calls supervisorctl manually and then waits for the port to open.
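As a further option on newer Ansible releases, a loop over an included task file should give the same effect without a custom module. This is a sketch that assumes Ansible 2.5 or later (include_tasks together with the loop keyword). The file name restart_one.yml and the variable server_port are hypothetical names chosen for illustration:

```yaml
# In the main task list: include the two-task file once per port.
- name: restart servers one at a time
  include_tasks: restart_one.yml
  loop:
    - 9990
    - 9991
  loop_control:
    loop_var: server_port
```

```yaml
# restart_one.yml (hypothetical file): both tasks run for each port in turn.
- name: "restart server {{ server_port }}"
  supervisorctl:
    name: "server_{{ server_port }}"
    state: restarted

- name: "wait for port {{ server_port }} to accept connections"
  wait_for:
    port: "{{ server_port }}"
    delay: 1
```

Because the include is evaluated once per loop item, server N+1 is only restarted after the wait_for for server N has succeeded.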