How do I solve the "Waiting for calico node service" error message - ibm-cloud-private

Need assistance with ICP install. I have encountered the following error:
TASK [addon : Waiting for calico node service] *********************************
failed: [localhost -> 129.40.227.142] (item=129.40.227.142) => {"elapsed": 600, "failed": true, "item": "129.40.227.142", "msg": "Timeout when waiting for 129.40.227.142:9099"}
PLAY RECAP *********************************************************************
129.40.227.142 : ok=172 changed=66 unreachable=0 failed=0
129.40.227.143 : ok=157 changed=55 unreachable=0 failed=0
129.40.227.144 : ok=116 changed=24 unreachable=0 failed=0
localhost : ok=118 changed=52 unreachable=0 failed=1

Try explicitly setting calico_ip_autodetection_method to interface=<your interface name> in your config.yaml file.
Here's the documentation for everything you can configure in that file:
https://www.ibm.com/support/knowledgecenter/SSBS6K_2.1.0/installing/config_yaml.html#network_setting.
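For example, assuming your cluster nodes use eth0 (substitute the actual interface name on your nodes), the line in config.yaml would look something like this:
calico_ip_autodetection_method: interface=eth0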
Kudos to Harry P.

We had a similar issue installing.
The solution was to run the following command on each machine (use Ansible to make your life easier; see the sketch below):
sysctl -w net.ipv4.conf.all.rp_filter=1
Make sure to also add it to /etc/sysctl.conf so it persists across restarts:
echo "net.ipv4.conf.all.rp_filter=1" | tee -a /etc/sysctl.conf
Hope this helps others.

As for me, I had a similar issue, caused by running out of disk space in /var. Check whether there are any error messages in /var/log/containers stating that some pods could not be started, or similar.
The documentation states you will need 40 GB of space. If you set up Docker to place the containers on your data drive, the install will use roughly 500 MB in /opt and about 300 MB in /var.
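A quick way to check is with standard tools (nothing ICP-specific), for example:
df -h /var /opt
ls /var/log/containers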

Related

ansible-playbook ignores '--check' parameter?

While testing a simple Ansible playbook
---
- hosts: mikrotiks
  connection: network_cli
  gather_facts: no
  vars:
    ansible_network_os: routeros
    ansible_user: admin
  tasks:
    - name: Add Basic FW Rules
      routeros_command:
        commands:
          - /ip firewall nat add chain=srcnat out-interface=ether1 action=masquerade
on my MikroTik router, I ran the playbook with the --check argument
ansible-playbook -i hosts mikrotik.yml --check
but it seems that the tasks actually got executed.
PLAY [mikrotiks] **************************************************************************************************************************************
TASK [Add Basic FW Rules] **************************************************************************************************************************************
changed: [192.168.1.82]
PLAY RECAP **************************************************************************************************************************************
192.168.1.82 : ok=1 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
The ansible.cfg file is the default configuration after a fresh install.
According to the documentation (command module – Run commands on remote devices running MikroTik RouterOS):
The module always indicates a (changed) status. You can use the changed_when task property to determine whether a command task actually resulted in a change or not.
Since the module is part of the community.routeros collection, I had a short look into the source and found that it supports check_mode:
module = AnsibleModule(argument_spec=argument_spec,
                       supports_check_mode=True)
So you will need to follow up by defining changed_when yourself.
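A minimal sketch of what that could look like (illustrative only; the register name nat_rule and the changed_when condition are assumptions, and the right condition depends on what your command actually returns):
- name: Add Basic FW Rules
  routeros_command:
    commands:
      - /ip firewall nat add chain=srcnat out-interface=ether1 action=masquerade
  register: nat_rule
  # Report "changed" only when the command produced output; if there is no
  # reliable signal, you could also hard-code changed_when: false.
  changed_when: nat_rule.stdout is defined and nat_rule.stdout | length > 0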

Ansible failed to complete successfully

When running vagrant up I get the following error:
RUNNING HANDLER [mariadb : restart mysql server] *******************************
DEBUG subprocess: stdout: changed: [default]
INFO interface: detail: changed: [default]
changed: [default]
DEBUG subprocess: stdout:
PLAY RECAP *********************************************************************
INFO interface: detail:
PLAY RECAP *********************************************************************
PLAY RECAP *********************************************************************
DEBUG subprocess: stdout: default : ok=120 changed=83 unreachable=0 failed=1 skipped=32 rescued=0 ignored=0
INFO interface: detail: default : ok=120 changed=83 unreachable=0 failed=1 skipped=32 rescued=0 ignored=0
default : ok=120 changed=83 unreachable=0 failed=1 skipped=32 rescued=0 ignored=0
DEBUG subprocess: Waiting for process to exit. Remaining to timeout: 31722
DEBUG subprocess: Exit status: 2
ERROR warden: Error occurred: Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.
Maybe this part of the output is also of interest:
ERROR vagrant: /opt/vagrant/embedded/gems/2.2.6/gems/vagrant-2.2.6/plugins/provisioners/ansible/provisioner/host.rb:104:in `execute_command_from_host'
/opt/vagrant/embedded/gems/2.2.6/gems/vagrant-2.2.6/plugins/provisioners/ansible/provisioner/host.rb:179:in `execute_ansible_playbook_from_host'
My version of Ansible is 2.9.2 (the newest, I guess). The Vagrantfile is large and I do not know which part is causing the error. How can I debug this?

Ansible Tower fetching file job returns OK but no file present at local machine

I have a lab that consists of an Ansible Tower system and an Ubuntu Desktop client. I've successfully created and executed some playbooks to update and install packages, and everything was OK. Now I want to fetch /var/log/syslog from the remote Ubuntu desktop to my Ansible Tower system. My playbook is:
---
- hosts: Ubuntu_18.04_Desktops
  tasks:
    - name: Get /var/log/syslog
      fetch:
        src: /var/log/syslog
        dest: /tmp
Running this playbook shows the result:
PLAY [Ubuntu_18.04_Desktops] ***************************************************
TASK [Gathering Facts] *********************************************************
ok: [192.168.1.165]
TASK [Get /var/log/syslog] *****************************************************
changed: [192.168.1.165]
PLAY RECAP *********************************************************************
192.168.1.165 : ok=2 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
But no file is present in the /tmp directory of the Tower server.
I've tried to use the 'flat' directive and to save the file to my home folder, but with no success.
I found the problem: Ansible Tower (AWX in my case) stores fetched files in the ansible/awx_task container's filesystem.
Ansible Tower's Job Isolation system hides certain paths from you and redirects them to a safe location.
If you do want to use the system's /tmp, you can open Tower Settings -> Jobs and add /tmp to the paths to expose to isolated jobs.
Note that if you rely for security on /tmp not being exposed to all Tower jobs, you should not do this.
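For reference, once /tmp is exposed, a fetch task like the following (just a sketch; flat and the hostname-based destination are one way to get a predictable file name) should leave the file where you expect it on the Tower node:
- name: Get /var/log/syslog
  fetch:
    src: /var/log/syslog
    dest: "/tmp/syslog-{{ inventory_hostname }}"
    flat: yes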

Unable to access hosts besides localhost in Ansible

I'm teaching myself Ansible and have the file structure below, with my customized ansible.cfg and inventory file:
ansible (Folder)                 <--- Working level
├── ansible.cfg (Customized cfg)
├── dev (Inventory File)
└── playbooks (Folder)           <-- Issue level
    └── hostname.yml
ansible.cfg
[defaults]
inventory=./dev
dev
[loadbalancer]
lb01
[webserver]
app01
app02
[database]
db01
[control]
control ansible_connection=local
hostname.yml
---
- hosts: all
  tasks:
    - name: get server hostname
      command: hostname
My question is: when I run ansible-playbook playbooks/hostname.yml from the ansible (Folder) level, everything works fine:
PLAY [all] *********************************************************************
TASK [setup] *******************************************************************
ok: [control]
ok: [app01]
ok: [db01]
ok: [lb01]
ok: [app02]
TASK [get server hostname] *****************************************************
changed: [db01]
changed: [app01]
changed: [app02]
changed: [lb01]
changed: [control]
PLAY RECAP *********************************************************************
app01 : ok=2 changed=1 unreachable=0 failed=0
app02 : ok=2 changed=1 unreachable=0 failed=0
control : ok=2 changed=1 unreachable=0 failed=0
db01 : ok=2 changed=1 unreachable=0 failed=0
lb01 : ok=2 changed=1 unreachable=0 failed=0
However, when I go into the playbooks folder and run ansible-playbook hostname.yml, it gives me a warning saying:
[WARNING]: provided hosts list is empty, only localhost is available
PLAY RECAP *********************************************************************
Is there anything that prevents the playbook from accessing the inventory file?
Ansible does not traverse parent directories in search of the configuration file ansible.cfg. If you want to use a custom configuration file, you need to place it in the current directory and run Ansible executables from there, or define an environment variable ANSIBLE_CONFIG pointing to the file.
In your case, changing the directory to ansible and executing from there is the proper way:
ansible-playbook playbooks/hostname.yml
I don't know where you got the idea that Ansible would check for the configuration file in a parent directory; the documentation is unambiguous:
Changes can be made and used in a configuration file which will be processed in the following order:
ANSIBLE_CONFIG (an environment variable)
ansible.cfg (in the current directory)
.ansible.cfg (in the home directory)
/etc/ansible/ansible.cfg
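If you really do want to run from inside the playbooks directory, one option (a sketch; note that relative paths in the config may then resolve against your new working directory) is to point ANSIBLE_CONFIG at the existing file:
ANSIBLE_CONFIG=../ansible.cfg ansible-playbook hostname.yml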

Ansible multi-play playbook silently ignores plays

Hope you can help work out why my playbook isn't completing as expected.
ENVIRONMENT
OSX El Capitan
ansible 2.1.0.0
CONFIGURATION
Nothing exciting:
[defaults]
roles_path=./roles
host_key_checking = False
ssh_args= -t -t
allow_world_readable_tmpfiles = True
PLAYBOOK
I have a reasonably involved setup with a number of plays in one playbook.
The playbook is run against different target systems: the production site and a dev rig. (Please don't suggest I combine them... it's an IoT system and complex enough as it is.)
Here's my somewhat redacted playbook:
- hosts: all
  roles:
    - ...
- hosts: xmpp_server
  roles:
    - ...
- hosts: audit_server
  roles:
    - ...
- hosts: elk_server
  roles:
    - ...
- hosts: all
  roles:
    - ...
Now, please bear in mind that I have an IoT setup with various redundancies, replication and distribution going on, so although there are other ways of skinning the cat, the above decomposition into multiple plays is really neat for my setup and I'd like to keep it.
Also important: I have no audit_server or elk_server hosts on my dev rig. Those groups are currently empty as I'm working on an orthogonal issue and don't need them consuming limited dev resources. I do have those in production, just not in dev.
EXPECTED BEHAVIOUR
On the production site I expect all the plays to trigger and run.
On the dev rig I expect the first play (all) and the xmpp_server play to run, the audit_server and elk_server plays to skip and the last (all) play to run after that.
ACTUAL BEHAVIOUR
The production site works exactly as expected. All plays run.
The dev rig completes the xmpp_server play as dev-piA is a member of the xmpp_server group. And then it silently stops. No error, no information, nothing. Just straight to the play recap. Here's the output:
...
TASK [xmppserver : include] ****************************************************
included: /Users/al/Studio/Projects/smc/ansible/roles/xmppserver/tasks/./openfire.yml for dev-piA
TASK [xmppserver : Get openfire deb file] **************************************
ok: [dev-piA]
TASK [xmppserver : Install openfire deb file] **********************************
ok: [dev-piA]
TASK [xmppserver : Check if schema has been uploaded previously] ***************
ok: [dev-piA]
TASK [xmppserver : Install openfire schema to postgres db] *********************
skipping: [dev-piA]
to retry, use: --limit @fel.retry
PLAY RECAP *********************************************************************
dev-vagrant1 : ok=0 changed=0 unreachable=1 failed=0
dev-piA : ok=106 changed=3 unreachable=0 failed=0
dev-piB : ok=77 changed=3 unreachable=0 failed=0
dev-piC : ok=77 changed=3 unreachable=0 failed=0
...
So, I ran it with -vvvvv and got nothing more useful:
...
TASK [xmppserver : Install openfire schema to postgres db] *********************
task path: /Users/al/Studio/Projects/smc/ansible/roles/xmppserver/tasks/openfire.yml:14
skipping: [dev-piA] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}
to retry, use: --limit @fel.retry
PLAY RECAP *********************************************************************
dev-vagrant1 : ok=0 changed=0 unreachable=1 failed=0
dev-piA : ok=106 changed=2 unreachable=0 failed=0
dev-piB : ok=77 changed=3 unreachable=0 failed=0
dev-piC : ok=77 changed=3 unreachable=0 failed=0
...
HELP NEEDED
So, my question is: why does the playbook just stop there? What's going on?!
It doesn't actually explicitly say that there are no more hosts left for the audit_server play; that's my best guess. It just stops as if it hit an EOF.
I'm completely stumped.
Edit: NB: The retry file only contains a reference to the vagrant machine, which is currently off. But if the existence of that is the problem, then Ansible's logic is very flawed. I'll check now just in case anyway.
Edit: OMFG it actually IS the missing vagrant box, which has nothing to do with a goddamn thing. That's shocking and I'll raise it as an issue with Ansible. But... I'll leave this here in case anyone ever has the same problem and googles it.
Edit: For clarity, the vagrant machine is not in the host lists for any of the plays, except the special 'all' case.
Ansible aborts execution if every host in the play is unhealthy.
If dev-vagrant1 is the only member of the audit_server group, this is the expected behavior (as we can see, dev-vagrant1 is marked as unreachable).
Nevertheless, there should be a line PLAY [audit_server] ******** just before the "to retry, use..." message.
Ansible folk got back to me and confirmed that they'd been working on a number of issues in this area for the 2.1.1 release.
I updated to 2.1.1.0 and it worked fine.
