How to knife bootstrap EC2 private instances using a NAT gateway

I am trying to bootstrap my private-IP instance through its NAT gateway using the command below, but the bootstrap fails.
$ knife bootstrap x.x.x.x --ssh-gateway x.x.x.x --ssh-user ec2-user --sudo --ssh-identity-file mypemfile.pem -N <nodename>
...
ERROR: Train::Transports::SSHFailed: SSH command failed (command timed out: ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o IdentitiesOnly=yes -o BatchMode=yes -o LogLevel=ERROR -o ForwardAgent=no -i mypemfile.pem root@x.x.x.x -p 22 -W x.x.x.x:22)

Tracing to see where Ansible hangs

My Ansible task hangs. I ran it with -vvvv, but I still can't see any useful information.
<coffee-and-sugar.club> ESTABLISH SSH CONNECTION FOR USER: root
<coffee-and-sugar.club> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/home/guettli/.ansible/cp/544631aae4 -tt coffee-and-sugar.club '/bin/sh -c '"'"'/usr/bin/python3 /root/.ansible/tmp/ansible-tmp-1608394831.3465264-205483640933119/AnsiballZ_pip.py && sleep 0'"'"''
What can I do to see what is going on?
Is there a way to enable tracing (like set -x in a shell script)?
You can run the Python script on the remote server by hand. In my case this revealed the root cause (an interactive host-key prompt).
Example:
ssh root@remote
# /usr/bin/python3 /root/.ansible/tmp/ansible-tmp-1608394831.3465264-205483640933119/AnsiballZ_pip.py
The authenticity of host 'github.com (140.82.121.3)' can't be established.
RSA key fingerprint is SHA256:nThbg6kXUpJWGl7E1IGOCspRomTxdCARLviKw6E5SY8.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
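The same by-hand check can be scripted. The sketch below is a minimal example under stated assumptions: it runs an arbitrary command with a timeout and with stdin closed, then reports whether the command finished, failed, or timed out, together with whatever it printed. The helper name `run_with_timeout` and the 30-second default are illustrative and not part of Ansible. A prompt that reads from the terminal rather than stdin would still show up as a timeout, with the prompt text in the captured output if it was written to stdout or stderr.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout=30.0):
    """Run cmd (a list of arguments) and report how it ended.

    Returns a (status, output) tuple where status is "ok", "failed" or "timeout".
    stdin is closed so a program reading input from stdin cannot block forever.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so prompts and errors are captured
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        # exc.output holds whatever was captured before the timeout (may be None)
        partial = exc.output or b""
        return "timeout", partial.decode(errors="replace")
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout.decode(errors="replace")


if __name__ == "__main__":
    # Stand-in command; on the remote host this would be the AnsiballZ_*.py call.
    status, output = run_with_timeout([sys.executable, "-c", "print('hello')"], timeout=10)
    print(status, output.strip())
```

On the remote host, the argument list would be the interpreter and script path taken from the -vvvv output, for example `["/usr/bin/python3", "<path to AnsiballZ_pip.py>"]`.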

How do I fix the Broken pipe error during an Ansible play

I am getting the error below while running my Ansible play. It was working perfectly fine until a couple of days ago, then suddenly started failing for this particular host. I don't know whether some configuration changed on this server. Any idea what could be wrong?
The same play works fine for other environments such as Prod.
Command
ansible-playbook -i my-inventory my-main.yml --tags=copyRepo -e my_release_version=5.0.0-4 -e target_env=preprod --ask-become-pass
I am able to SSH to the host as well
server1 | success >> {
"changed": false,
"ping": "pong"
}
Error
<server1> ESTABLISH CONNECTION FOR USER: user1
<server1> REMOTE_MODULE file state=directory path=/opt/tomcat/releases/Release5.0.0-4/advancederrorsearch/app
<server1> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/user1/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 server1 /bin/sh -c 'mkdir -p /tmp/ansible-tmp-1586344775.71-121508053477718 && chmod a+rx /tmp/ansible-tmp-1586344775.71-121508053477718 && echo /tmp/ansible-tmp-1586344775.71-121508053477718'
<server1> PUT /tmp/tmp1enuT2 TO /tmp/ansible-tmp-1586344775.71-121508053477718/file
<server1> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/user1/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 server1 /bin/sh -c 'chmod a+r /tmp/ansible-tmp-1586344775.71-121508053477718/file'
<server1> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/user1/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 server1 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=othmymdswpmqvimfnuimdtsuqtdboprm] password: " -u tomcat /bin/sh -c '"'"'echo BECOME-SUCCESS-othmymdswpmqvimfnuimdtsuqtdboprm; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /tmp/ansible-tmp-1586344775.71-121508053477718/file'"'"''
<server1> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/user1/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 server1 /bin/sh -c 'rm -rf /tmp/ansible-tmp-1586344775.71-121508053477718/ >/dev/null 2>&1'
failed: [server1] => (item=advancederrorsearch) => {"failed": true, "item": "advancederrorsearch", "parsed": false}
BECOME-SUCCESS-othmymdswpmqvimfnuimdtsuqtdboprm
couldn't set locale correctly
couldn't set locale correctly
debug1: mux_client_request_session: master session id: 2
debug3: mux_client_read_packet: read header failed: Broken pipe
debug2: Received exit status from master 0
Shared connection to server1 closed.

Q: IBM Cloud Private CE - fatal: [9.29.100.159] => The Etcd component failed to start

First install of ICP CE 2.1.0 on an Ubuntu 16.04.03 VM running on ESXi 5.5. The VM has 4 vCPUs with 16 GB RAM and 170 GB (small, I know). The install runs for 10 minutes and fails. I ran the install with -vvv and it doesn't really provide any significant insight.
TASK [master : Waiting for Etcd to start] **************************************
task path: /installer/playbook/roles/master/tasks/kube-service.yaml:6
Using module file /installer/playbook/library/cfc_wait_for.py
<9.29.100.159> ESTABLISH SSH CONNECTION FOR USER: root
<9.29.100.159> SSH: EXEC ssh -C -o CheckHostIP=no -o LogLevel=ERROR -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o 'IdentityFile="cluster/ssh_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 9.29.100.159 '/bin/bash -c '"'"'echo ~ && sleep 0'"'"''
<9.29.100.159> (0, '/root\n', '')
<9.29.100.159> ESTABLISH SSH CONNECTION FOR USER: root
<9.29.100.159> SSH: EXEC ssh -C -o CheckHostIP=no -o LogLevel=ERROR -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o 'IdentityFile="cluster/ssh_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 9.29.100.159 '/bin/bash -c '"'"'( umask 77 && mkdir -p "` echo /root/.ansible/tmp/ansible-tmp-1511385912.24-67181235419067 `" && echo ansible-tmp-1511385912.24-67181235419067="` echo /root/.ansible/tmp/ansible-tmp-1511385912.24-67181235419067 `" ) && sleep 0'"'"''
<9.29.100.159> (0, 'ansible-tmp-1511385912.24-67181235419067=/root/.ansible/tmp/ansible-tmp-1511385912.24-67181235419067\n', '')
<9.29.100.159> PUT /tmp/tmp_LQQz6 TO /root/.ansible/tmp/ansible-tmp-1511385912.24-67181235419067/cfc_wait_for.py
<9.29.100.159> SSH: EXEC sftp -b - -C -o CheckHostIP=no -o LogLevel=ERROR -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o 'IdentityFile="cluster/ssh_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 '[9.29.100.159]'
<9.29.100.159> (0, 'sftp> put /tmp/tmp_LQQz6 /root/.ansible/tmp/ansible-tmp-1511385912.24-67181235419067/cfc_wait_for.py\n', '')
<9.29.100.159> ESTABLISH SSH CONNECTION FOR USER: root
<9.29.100.159> SSH: EXEC ssh -C -o CheckHostIP=no -o LogLevel=ERROR -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o 'IdentityFile="cluster/ssh_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 9.29.100.159 '/bin/bash -c '"'"'chmod u+x /root/.ansible/tmp/ansible-tmp-1511385912.24-67181235419067/ /root/.ansible/tmp/ansible-tmp-1511385912.24-67181235419067/cfc_wait_for.py && sleep 0'"'"''
<9.29.100.159> (0, '', '')
<9.29.100.159> ESTABLISH SSH CONNECTION FOR USER: root
<9.29.100.159> SSH: EXEC ssh -C -o CheckHostIP=no -o LogLevel=ERROR -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o 'IdentityFile="cluster/ssh_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -tt 9.29.100.159 '/bin/bash -c '"'"'/usr/bin/python /root/.ansible/tmp/ansible-tmp-1511385912.24-67181235419067/cfc_wait_for.py; rm -rf "/root/.ansible/tmp/ansible-tmp-1511385912.24-67181235419067/" > /dev/null 2>&1 && sleep 0'"'"''
<9.29.100.159> (0, '\r\n{"msg": "The Etcd component failed to start. For more details, see https://ibm.biz/etcd-fails.", "failed": true, "elapsed": 1965, "invocation": {"module_args": {"active_connection_states": ["ESTABLISHED", "SYN_SENT", "SYN_RECV", "FIN_WAIT1", "FIN_WAIT2", "TIME_WAIT"], "state": "started", "port": 4001, "delay": 0, "msg": "The Etcd component failed to start. For more details, see https://ibm.biz/etcd-fails.", "host": "9.29.100.159", "sleep": 1, "timeout": 600, "exclude_hosts": null, "search_regex": null, "path": null, "connect_timeout": 5}}}\r\n', 'Connection to 9.29.100.159 closed.\r\n')
fatal: [9.29.100.159] => The Etcd component failed to start. For more details, see https://ibm.biz/etcd-fails.
The link https://ibm.biz/etcd-fails takes you to a 1.2.0 Knowledge Center entry about flannel failing to start on a worker node.
What's odd is that docker ps shows that etcd is running:
root@sysicpce:~# docker ps
CONTAINER ID   IMAGE               COMMAND                  CREATED        STATUS        PORTS   NAMES
652aab0c1cee   ibmcom/mariadb      "start.sh docker-e..."   17 hours ago   Up 17 hours           k8s_mariadb_k8s-mariadb-9.29.100.159_kube-system_3b21d2ed8c3e2047c0e457af0e948b97_0
80201425a077   ibmcom/etcd         "etcd --name=etcd0..."   17 hours ago   Up 17 hours           k8s_etcd_k8s-etcd-9.29.100.159_kube-system_b674f0dc7c07780868387aaea0ba7acc_0
a5be8a1e0c25   ibmcom/pause:3.0    "/pause"                 17 hours ago   Up 17 hours           k8s_POD_k8s-mariadb-9.29.100.159_kube-system_3b21d2ed8c3e2047c0e457af0e948b97_0
d82b0c6e5fa0   ibmcom/pause:3.0    "/pause"                 17 hours ago   Up 17 hours           k8s_POD_k8s-etcd-9.29.100.159_kube-system_b674f0dc7c07780868387aaea0ba7acc_0
6574c3760499   ibmcom/kubernetes   "/hyperkube proxy ..."   18 hours ago   Up 18 hours           k8s_proxy_k8s-proxy-9.29.100.159_kube-system_708dfdafb2a5d66e99356e10e609f6b1_0
3b4621d57fef   ibmcom/pause:3.0    "/pause"                 18 hours ago   Up 18 hours           k8s_POD_k8s-proxy-9.29.100.159_kube-system_708dfdafb2a5d66e99356e10e609f6b1_0
root@sysicpce:~#
How can I resolve this? Where can/should I look next?
Based on the installation requirements, if all management services run on your single-host cluster, you need at least 8 CPU cores. With fewer than that, you can disable some management services, e.g. metering and monitoring, by setting disabled_management_services: ["metering", "monitoring"] in the config.yaml file. As you have a 4-core CPU, you can disable these services in config.yaml and try the installation again.
I had the same problem, with ICP CE 2.1.0 on Ubuntu 16.04, KVM/OpenStack. Same message: "The Etcd component failed to start"
The problem went away when I added a rule to allow access from 127.0.0.1 to port 4001 on the ICP machine.
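The failing task polls TCP port 4001 on the node (see `port` and `host` in the module_args above), so a quick way to test the firewall theory is to check whether that port accepts connections. The following is a minimal sketch that only attempts a TCP connect; it is not the installer's cfc_wait_for module, and the helper name `port_is_open` is illustrative.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 4001 is the port the installer task waits on in the log above.
    state = "open" if port_is_open("127.0.0.1", 4001) else "unreachable"
    print(f"127.0.0.1:4001 is {state}")
```

Running it on the ICP machine against both 127.0.0.1 and the node's own address (9.29.100.159 in the question) would show whether a local firewall rule is blocking one of them even though the etcd container is up.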

Ansible unable to connect to CentOS

When I try connecting to the CentOS server, I get the following error:
boby@hon-pc-01:~/www/ansible $ ansible centos -vvv -i hosts -a "uname -a"
Using /home/boby/www/ansible/ansible.cfg as config file
<root@209.236.74.192:3333> ESTABLISH SSH CONNECTION FOR USER: root
<root@209.236.74.192:3333> SSH: EXEC ssh -C -q -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/home/boby/.ansible/cp/ansible-ssh-%h-%p-%r -tt root@209.236.74.192:3333 'mkdir -p "$( echo $HOME/.ansible/tmp/ansible-tmp-1484629049.5-55764328572466 )" && echo "$( echo $HOME/.ansible/tmp/ansible-tmp-1484629049.5-55764328572466 )"'
root@209.236.74.192:3333 | UNREACHABLE! => {
"changed": false,
"msg": "ERROR! SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue",
"unreachable": true
}
boby@hon-pc-01:~/www/ansible $
I am able to connect to the Debian server without any issue:
boby@hon-pc-01:~/www/ansible $ ansible ubuntu -vvv -i hosts -a "uname -a"
Using /home/boby/www/ansible/ansible.cfg as config file
<vm705n> ESTABLISH SSH CONNECTION FOR USER: root
<vm705n> SSH: EXEC ssh -C -q -o ControlMaster=auto -o ControlPersist=60s -o Port=3333 -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/home/boby/.ansible/cp/ansible-ssh-%h-%p-%r -tt vm705n 'mkdir -p "$( echo $HOME/.ansible/tmp/ansible-tmp-1484629067.62-202068262196976 )" && echo "$( echo $HOME/.ansible/tmp/ansible-tmp-1484629067.62-202068262196976 )"'
<vm705n> PUT /tmp/tmpWzw_nH TO /root/.ansible/tmp/ansible-tmp-1484629067.62-202068262196976/command
<vm705n> SSH: EXEC sftp -b - -C -o ControlMaster=auto -o ControlPersist=60s -o Port=3333 -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/home/boby/.ansible/cp/ansible-ssh-%h-%p-%r '[vm705n]'
<vm705n> ESTABLISH SSH CONNECTION FOR USER: root
<vm705n> SSH: EXEC ssh -C -q -o ControlMaster=auto -o ControlPersist=60s -o Port=3333 -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/home/boby/.ansible/cp/ansible-ssh-%h-%p-%r -tt vm705n 'LANG=en_IN LC_ALL=en_IN LC_MESSAGES=en_IN /usr/bin/python /root/.ansible/tmp/ansible-tmp-1484629067.62-202068262196976/command; rm -rf "/root/.ansible/tmp/ansible-tmp-1484629067.62-202068262196976/" > /dev/null 2>&1'
vm705n | SUCCESS | rc=0 >>
Linux hon-vpn 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux
boby@hon-pc-01:~/www/ansible $
Here is my hosts file:
boby@hon-pc-01:~/www/ansible $ cat hosts
[ubuntu]
vm705n:3333
[centos]
root@209.236.74.192:3333
boby@hon-pc-01:~/www/ansible $
Any idea why it is not working for the CentOS 6 server?
EDIT
I got it fixed. The problem was the root@ prefix in the hosts file. For some reason, the SSH command did not pick up port 3333 when root@ was present in the hosts file.
The problem was in the hosts file.
boby@hon-pc-01:~/www/ansible $ cat hosts
[ubuntu]
vm705n:3333
[centos]
root@209.236.74.192:3333
boby@hon-pc-01:~/www/ansible $
I replaced root@209.236.74.192:3333 with 209.236.74.192:3333 and it started working.
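If the login user does need to be set per host, it can go in inventory variables instead of a user@ prefix. The sketch below is a hypothetical helper (not part of Ansible) that rewrites a `user@host:port` entry into a host name followed by the `ansible_port` and `ansible_user` inventory variables, assuming Ansible 2.0 or later, where those variable names are available. It does not handle IPv6 literals.

```python
def split_inventory_host(entry):
    """Split 'user@host:port' into (user, host, port); user and port may be None."""
    user = None
    port = None
    if "@" in entry:
        user, entry = entry.split("@", 1)
    if ":" in entry:
        entry, port_text = entry.rsplit(":", 1)
        port = int(port_text)
    return user, entry, port


def to_inventory_line(entry):
    """Render an inventory line that keeps user and port out of the host name."""
    user, host, port = split_inventory_host(entry)
    parts = [host]
    if port is not None:
        parts.append(f"ansible_port={port}")
    if user is not None:
        parts.append(f"ansible_user={user}")
    return " ".join(parts)


if __name__ == "__main__":
    print(to_inventory_line("root@209.236.74.192:3333"))
```

For the entry from the question this prints `209.236.74.192 ansible_port=3333 ansible_user=root`, which can be placed under the [centos] group.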

Ansible not picking up proxy settings

I am trying to run an Ansible job on a remote host. But for that to happen, I need to go through a proxy.
Proxy server is: 142.133.134.161
Proxy port is: 1088
My playbook is simple for now:
---
- hosts: LAB1
  tasks:
    - name: Copy file
      template: src=/tmp/file1 dest=/tmp/file1
My environment file is:
[LAB1]
10.169.99.189
10.169.99.190
My ansible.cfg file is:
Host 10.169.99.*
ProxyCommand nc -x 142.133.134.161:1088 %h %p
But when I run a job, it says "Connection timed out":
[root@vm1 ANSIBLE]# ansible -i /root/ANSIBLE/env/target LAB1 -m ping
10.169.99.190 | FAILED => SSH Error: ssh: connect to host 10.169.99.190 port 22: Connection timed out
while connecting to 10.169.99.190:22
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
10.169.99.189 | FAILED => SSH Error: ssh: connect to host 10.169.99.189 port 22: Connection timed out
while connecting to 10.169.99.189:22
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
When I run this in debug mode:
[root@vm1 ANSIBLE]# ansible -i /root/ANSIBLE/env/target LAB1 -m ping -vvvvv
<10.169.99.190> ESTABLISH CONNECTION FOR USER: msdp
<10.169.99.190> REMOTE_MODULE ping
<10.169.99.189> ESTABLISH CONNECTION FOR USER: msdp
<10.169.99.189> REMOTE_MODULE ping
<10.169.99.190> EXEC sshpass -d8 ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o StrictHostKeyChecking=no -o GSSAPIAuthentication=no -o PubkeyAuthentication=no -o User=msdp -o ConnectTimeout=10 10.169.99.190 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1473612082.62-116308097993503 && echo $HOME/.ansible/tmp/ansible-tmp-1473612082.62-116308097993503'
<10.169.99.189> EXEC sshpass -d9 ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o StrictHostKeyChecking=no -o GSSAPIAuthentication=no -o PubkeyAuthentication=no -o User=msdp -o ConnectTimeout=10 10.169.99.189 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1473612082.63-269107268980760 && echo $HOME/.ansible/tmp/ansible-tmp-1473612082.63-269107268980760'
10.169.99.189 | FAILED => SSH Error: ssh: connect to host 10.169.99.189 port 22: Connection timed out
while connecting to 10.169.99.189:22
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
10.169.99.190 | FAILED => SSH Error: ssh: connect to host 10.169.99.190 port 22: Connection timed out
while connecting to 10.169.99.190:22
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
This does not indicate that it is using the Proxy. Is that the issue here?
Assuming your ProxyCommand syntax is correct and you want to set it in ansible.cfg, add it as an option to ssh_args in the [ssh_connection] section of the file:
[ssh_connection]
ssh_args = -o ForwardAgent=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ControlMaster=auto -o ControlPersist=60s -o ProxyCommand="nc -x 142.133.134.161:1088 %h %p"
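For generating such a section from a script, the following is a minimal sketch using Python's standard configparser module. The function name `build_ansible_cfg` is illustrative, and the option list is shortened to the ControlMaster settings plus the ProxyCommand from the question. Interpolation is turned off so the `%h` and `%p` tokens are written out literally rather than being treated as configparser interpolation syntax.

```python
import configparser
import io


def build_ansible_cfg(proxy_command):
    """Return the text of an ansible.cfg with ssh_args carrying a ProxyCommand."""
    # interpolation=None: store values verbatim, including the % in %h and %p
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["ssh_connection"] = {
        "ssh_args": (
            "-o ControlMaster=auto -o ControlPersist=60s "
            f'-o ProxyCommand="{proxy_command}"'
        )
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Proxy address and port are the ones given in the question.
    print(build_ansible_cfg("nc -x 142.133.134.161:1088 %h %p"))
```

The returned text can be written to ansible.cfg; re-running the ping with -vvvv should then show the ProxyCommand option in the EXEC line if the setting is being picked up.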
