I am currently facing a strange issue and cannot figure out what causes it.
I wrote small Ansible playbooks to test whether the Kafka schema registry and connectors are running by calling their APIs.
I can run those playbooks on my local machine successfully. However, when they run in the GitLab CI pipeline (I'm using the same local machine as the GitLab runner), connect_test always breaks with the following error:
fatal: [xx.xxx.x.x]: FAILED! => {"changed": false, "elapsed": 0, "msg": "Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://localhost:8083"}
The strange thing is that the failed job succeeds when I click the retry button in the CI pipeline.
Does anyone have an idea what causes this? I would appreciate your help.
schema_test.yml
---
- name: Test schema-registry
  hosts: SCHEMA
  become: yes
  become_user: root
  tasks:
    - name: list schemas
      uri:
        url: http://localhost:8081/subjects
      register: schema
    - debug:
        msg: "{{ schema }}"
connect_test.yml
---
- name: Test connect
  hosts: CONNECT
  become: yes
  become_user: root
  tasks:
    - name: check connect
      uri:
        url: http://localhost:8083
      register: connect
    - debug:
        msg: "{{ connect }}"
.gitlab-ci.yml
test-connect:
  stage: test
  script:
    - ansible-playbook connect_test.yml
  tags:
    - gitlab-runner

test-schema:
  stage: test
  script:
    - ansible-playbook schema_test.yml
  tags:
    - gitlab-runner
Update:
I replaced the uri module with the shell module. As a result, I see the same behaviour: the initial run of the pipeline fails and retrying the job fixes it.
Maybe you are restarting the services in a previous job. Keep in mind that Kafka Connect generally needs more time to become available after a restart. Try pausing Ansible for a minute or so after you restart the service.
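For example, instead of a fixed pause, the connect check itself could retry until Kafka Connect answers. A minimal sketch using the uri task from the question (the retry count and delay are arbitrary values; adjust them to your environment):

    - name: check connect
      uri:
        url: http://localhost:8083
      register: connect
      until: connect.status == 200
      retries: 12
      delay: 10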
Related
In my job there is a playbook, developed in the following way, that is executed by Ansible Tower.
This is the file that Ansible Tower executes; it calls the fusion role:
report.yaml:
- hosts: localhost
  gather_facts: false
  connection: local
  tasks:
    - name: "Execute"
      include_role:
        name: 'fusion'
main.yaml from fusion role:
- name: "hc fusion"
include_tasks: "hc_fusion.yaml"
hc_fusion.yaml from fusion role:
- name: "FUSION"
shell: ansible-playbook roles/fusion/tasks/fusion.yaml --extra-vars 'fusion_ip_ha={{item.ip}} fusion_user={{item.username}} fusion_pass={{item.password}} fecha="{{fecha.stdout}}" fusion_ansible_become_user={{item.ansible_become_user}} fusion_ansible_become_pass={{item.ansible_become_pass}}'
fusion.yaml from fusion role:
- hosts: localhost
  vars:
    ansible_become_user: "{{fusion_ansible_become_user}}"
    ansible_become_pass: "{{fusion_ansible_become_pass}}"
  tasks:
    - name: Validate
      ignore_unreachable: yes
      shell: service had status
      delegate_to: "{{fusion_user}}#{{fusion_ip_ha}}"
      become: True
      become_method: su
This is a summary of the entire run. It worked previously, but now it throws the following error:
stdout: PLAY [localhost] \nTASK [Validate] fatal: [localhost -> gandalf#10.66.173.14]: UNREACHABLE! => {\"changed\": false, \"msg\": \"Failed to connect to the host via ssh: Warning: Permanently added '10.66.173.14' (RSA) to the list of known hosts.\ngandalf#10.66.173.14: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password), \"skip_reason\": \"Host localhost is unreachable\"
When I execute ansible-playbook roles/fusion/tasks/fusion.yaml --extra-vars XXXXXXXX from the command line as the user awx, it works.
I also validated the connection with ssh from the server where Ansible Tower is running to the host I want to reach, and it lets me connect as the user awx without asking for a password.
fusion.yaml does not explicitly specify a connection plugin, so the default ssh type is used. For localhost this approach usually brings a number of related problems (SSH keys, known_hosts, loopback interfaces, etc.). If you need to run tasks on localhost, you should set the connection plugin to local, just as in your report.yaml playbook.
Additionally, as Zeitounator mentioned, running one Ansible playbook from another with the shell module is really bad practice; please avoid it. Ansible has a number of mechanisms for code reuse (includes, imports, roles, etc.).
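As a minimal sketch of the first point, fusion.yaml would only need the connection keyword added to its play header; everything else is kept exactly as in the question:

- hosts: localhost
  connection: local
  vars:
    ansible_become_user: "{{fusion_ansible_become_user}}"
    ansible_become_pass: "{{fusion_ansible_become_pass}}"
  tasks:
    - name: Validate
      ignore_unreachable: yes
      shell: service had status
      delegate_to: "{{fusion_user}}#{{fusion_ip_ha}}"
      become: True
      become_method: su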
I am trying to execute this code but I keep running into the error shown further down.
Code:
---
- hosts: localhost
  connection: local
  collections:
    - community.general.terraform
  tasks:
    - name: Execute Terraform Template
      project_path: '/Users/<username>/Desktop/<repository>/<file>'
      state: present
      force_init: true
The offending line appears to be:
  tasks:
    - name: Execute Terraform Template
      ^ here
I have been trying to figure this one out. I am using macOS and have already installed Ansible locally.
Thank you in advance!!
I am trying to execute the code above but cannot get it to succeed.
You have an indentation problem in your playbook. It should work like this:
- name: Run Terraform
  gather_facts: No
  hosts: localhost
  tasks:
    - name: Execute Terraform Template
      terraform:
        project_path: '/Users/<username>/Desktop/<repository>/<file>'
        state: present
        force_init: true
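If you would rather keep the collections keyword from your original playbook, note that it lists collection names, not module names, and the task still needs a module invocation. A sketch, assuming the terraform module is provided by the community.general collection in your installed version:

- hosts: localhost
  connection: local
  collections:
    - community.general
  tasks:
    - name: Execute Terraform Template
      terraform:
        project_path: '/Users/<username>/Desktop/<repository>/<file>'
        state: present
        force_init: true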
I am using Ansible AWX to issue a command that restarts an apache2 service on a host. The restart command is contained in a playbook.
---
- name: Manage Linux Services
  hosts: all
  tasks:
    - name: Restart a linux service
      command: systemctl restart '{{ service_name }}'
      register: result
      ignore_errors: yes
    - name: Show result of task
      debug:
        var: result
OR
---
- name: Manage Linux Services
  hosts: all
  tasks:
    - name: Restart a linux service
      ansible.builtin.service:
        name: '{{ service_name }}'
        state: restarted
      register: result
      ignore_errors: yes
    - name: Show result of task
      debug:
        var: result
However, when I run the command, I get the error below:
"Failed to restart apache2.service: Connection timed out",
"See system logs and 'systemctl status apache2.service' for details."
I have tried to figure out the issue, but no luck yet.
I later figured out the cause of the issue.
Here's how I fixed it:
The restart command requires sudo access to run, which was missing from my task.
All I had to do was add the become: true directive so that the command runs with root privileges.
My playbook looked like this afterwards:
---
- name: Manage Linux Services
  hosts: all
  tasks:
    - name: Restart a linux service
      command: systemctl restart '{{ service_name }}'
      become: true
      register: result
      ignore_errors: yes
    - name: Show result of task
      debug:
        var: result
OR
---
- name: Manage Linux Services
  hosts: all
  tasks:
    - name: Restart a linux service
      ansible.builtin.service:
        name: '{{ service_name }}'
        state: restarted
      become: true
      register: result
      ignore_errors: yes
    - name: Show result of task
      debug:
        var: result
Another way to achieve this in Ansible AWX is to tick the Privilege Escalation option in the job template.
If enabled, this runs the selected playbook in the job template as an administrator.
That's all.
I hope this helps
Restarting a service requires sudo privileges. Besides adding the 'become' directive, if you would like to be prompted for the password, you can do so by passing the -K flag (note: uppercase K):
$ ansible-playbook myplay.yml -i hosts -u myname --ask-pass -K
Is there a way to wait for SSH to become available on a host before installing a role? There's wait_for_connection, but I have only figured out how to use it with tasks.
This particular playbook spins up servers on a cloud provider before attempting to install roles, but it fails since the SSH service on the hosts isn't available yet.
How should I fix this?
---
- hosts: localhost
  connection: local
  tasks:
    - name: Deploy vultr servers
      include_tasks: create_vultr_server.yml
      loop: "{{ groups['vultr_servers'] }}"

- hosts: all
  gather_facts: no
  become: true
  tasks:
    - name: wait_for_connection # This one works
      wait_for_connection:
        delay: 5
        timeout: 600
    - name: Gather facts for first time
      setup:
    - name: Install curl
      package:
        name: "curl"
        state: present
  roles: # How to NOT install roles UNLESS the current host is available ?
    - role: apache2
      vars:
        doc_root: /var/www/example
        message: 'Hello world!'
    - common-tools
Ansible play actions start with pre_tasks, then roles, followed by tasks and finally post_tasks. Move your wait_for_connection task to be the first entry in pre_tasks and it will block everything until the connection is available:
- hosts: all
  gather_facts: no
  become: true
  pre_tasks:
    - name: wait_for_connection # This one works
      wait_for_connection:
        delay: 5
        timeout: 600
  roles: ...
  tasks: ...
For more info on execution order, see this section of the roles documentation (the paragraph just above the notes).
Note: you probably want to move all your current example tasks into that section too, so that facts are gathered and curl is installed before anything else runs, as shown in the sketch below.
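Applied to the playbook in the question, a sketch of the reordered second play could look like this (all module parameters are copied unchanged from the original):

- hosts: all
  gather_facts: no
  become: true
  pre_tasks:
    - name: wait_for_connection
      wait_for_connection:
        delay: 5
        timeout: 600
    - name: Gather facts for first time
      setup:
    - name: Install curl
      package:
        name: "curl"
        state: present
  roles:
    - role: apache2
      vars:
        doc_root: /var/www/example
        message: 'Hello world!'
    - common-tools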
I'm trying to create a playbook that creates EC2 snapshots of some AWS Windows servers. I'm having a lot of trouble understanding how to get the correct directives in place to make sure things are running as they should. What I need to do is:
run the AWS commands locally, i.e. on the AWX host (as this is preferable to having to configure credentials on every server)
run the commands against each host in the inventory
I've done this in the past with a different group of Linux servers with no issue. But the fact that I'm having these issues makes me think that's not working as I think it is (I'm pretty new to all things Ansible/AWX).
The first step is to identify instances that are usually turned off and turn them on, then take snapshots, then turn them off again if they are usually turned off. So this is my main.yml:
---
- name: start the instance if the default state is stopped
  import_playbook: start-instance.yml
  when: default_state is defined and default_state == 'stopped'

- name: run the snapshot script
  import_playbook: instance-snapshot.yml

- name: stop the instance if the default state is stopped
  import_playbook: stop-instance.yml
  when: default_state is defined and default_state == 'stopped'
And this is start-instance.yml
---
- name: make sure instances are running
  hosts: all
  gather_facts: false
  connection: local
  tasks:
    - name: start the instances
      ec2:
        instance_id: "{{ instance_id }}"
        region: "{{ aws_region }}"
        state: running
        wait: true
      register: ec2
    - name: pause for 120 seconds to allow the instance to start
      pause: seconds=120
When I run this, I get the following error:
fatal: [myhost,mydomain.com]: UNREACHABLE! => {
"changed": false,
"msg": "ssl: HTTPSConnectionPool(host='myhost,mydomain.com', port=5986): Max retries exceeded with url: /wsman (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f99045e6bd0>, 'Connection to myhost,mydomain.com timed out. (connect timeout=30)'))",
"unreachable": true
I've learnt enough to know that this means it is, indeed, trying to connect to each Windows host, which is not what I want. However, I thought having connection: local resolved this, as it seemed to with another template I have which uses an identical playbook.
So if I change start-instance.yml to instead say "connection: localhost", the play runs, but it skips the steps because it determines that no hosts meet the condition (i.e. default_state is defined and default_state == 'stopped'). This tells me that the play is using localhost to run the play, but is also running it against localhost instead of the list of hosts in the AWX inventory.
My hosts have variables defined in AWX such as instance ID, region, default state. I know in normal Ansible use we would have the instance IDs in the playbook and that's how Ansible would know which AWS instances to start up, but in this case I need it to get this information from AWX.
Is there any way to do this? So, run the tasks (start the instances, pause for 120 seconds) against all hosts in my AWX inventory, using localhost?
In the end I used delegate_to for this. I changed the code to this:
---
- name: make sure instances are running
  hosts: all
  gather_facts: false
  tasks:
    - name: start the instances
      delegate_to: localhost
      ec2:
        instance_id: "{{ instance_id }}"
        region: "{{ aws_region }}"
        state: running
        wait: true
      register: ec2
    - name: pause for 120 seconds to allow the instance to start
      pause: seconds=120
And AWX correctly used localhost to run the tasks I added delegation to.
Worth noting: I got stuck for a while before realising my template needed two sets of credentials, one IAM user with the correct permissions to run the AWS commands, and one machine credential for the Windows instances.
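If more controller-side tasks get added later, one option is to group them in a block so that delegate_to is written only once; a sketch reusing the parameters from the answer above:

- name: make sure instances are running
  hosts: all
  gather_facts: false
  tasks:
    - name: run the AWS calls from the controller
      delegate_to: localhost
      block:
        - name: start the instances
          ec2:
            instance_id: "{{ instance_id }}"
            region: "{{ aws_region }}"
            state: running
            wait: true
          register: ec2
        - name: pause for 120 seconds to allow the instance to start
          pause: seconds=120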