Ansible ssh connection fails with "Failed to login: Connection refused"

I have one ESXi host. To collect some data I need Linux running on the same host, so I reboot it into a RHEL 7.4 live boot, perform some operations, and then reboot it again from the local disk.
The problem is that after the second reboot I cannot run any tasks; the SSH connection fails as follows:
"stderr_lines": [
"Failed to login: Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections."
]
The login credentials are the same for both operating systems.
If I skip the intermediate reboot into Linux, no error occurs.
After each reboot I run a check task like this:
- name: Wait for system to boot up
  local_action: wait_for host="{{ host_name }}" port=22 state=started delay=25 timeout=3600
  become: False
This is my ansible.cfg file:
[defaults]
host_key_checking=False
[paramiko-connection]
record_host_keys=False
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null
Am I missing something?

You are using SSH master sessions...
-o ControlMaster=auto -o ControlPersist=60s
...but you are connecting to the same host (same name and same IP) after a reboot that changes the underlying OS, most probably with a change of host keys between the two. This is a perfect situation for an existing master session to hang.
I would first disable SSH master sessions and see whether that solves your issue. If it does and you still want master sessions for performance, you can re-enable them and clean up the master session on localhost right after you initiate the reboot: simply remove the socket file in the directory declared as the ssh ControlPath.
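The cleanup step could be scripted along these lines. This is only a sketch: the function name remove_control_sockets is made up for illustration, and ~/.ansible/cp is assumed to be the control path directory (Ansible's default when no ControlPath is set explicitly); adjust it if your configuration points elsewhere.

```python
import stat
from pathlib import Path


def remove_control_sockets(control_dir):
    """Delete every Unix socket found directly inside control_dir.

    Returns the list of removed paths. Regular files and
    subdirectories are left untouched.
    """
    removed = []
    directory = Path(control_dir).expanduser()
    if not directory.is_dir():
        # Nothing to clean up if the directory does not exist.
        return removed
    for entry in directory.iterdir():
        # lstat() so that a symlink is never followed.
        if stat.S_ISSOCK(entry.lstat().st_mode):
            entry.unlink()
            removed.append(entry)
    return removed
```

Calling remove_control_sockets("~/.ansible/cp") on the control machine right after the reboot task should force the next task to open a fresh connection instead of reusing the stale master session.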

Related

Vscode SSH Jump Failed with macOS

I can connect to the first server with a key, but I cannot connect to the second server by jumping through the first one. I doubt it is a bug in macOS, because I can jump to the second server from the command line. Does anyone know what is happening here?
Here is the config:
Host comp
    HostName xx.xx.xxx.xxx
    User xxxx
    Port 22
    IdentityFile ***************
Host local
    HostName 127.0.0.1
    Port ****
    User xxxx
    ProxyCommand ssh -q -x -W %h:%p comp
    IdentityFile ***************
Here is the error information:
[19:55:48.660] Log Level: 2
[19:55:48.662] remote-ssh#0.55.0
[19:55:48.662] darwin x64
[19:55:48.663] SSH Resolver called for "ssh-remote+localhost", attempt 1
[19:55:48.663] SSH Resolver called for host: localhost
[19:55:48.663] Setting up SSH remote "localhost"
[19:55:48.669] Acquiring local install lock: /var/folders/5q/****************_tr0000gn/T/vscode-remote-ssh-localhost-install.lock
[19:55:48.713] Looking for existing server data file at /Users/gy/Library/Application Support/Code/User/globalStorage/ms-vscode-remote.remote-ssh/vscode-ssh-host-localhost-************************************-0.55.0/data.json
[19:55:48.742] Using commit id "***********************************" and quality "stable" for server
[19:55:48.743] Install and start server if needed
[19:55:48.779] Checking ssh with "ssh -V"
[19:55:48.854] > OpenSSH_8.1p1, LibreSSL 2.7.3
[19:55:48.860] Using SSH config file "/Users/gy/.ssh/config/vscodeconfig"
[19:55:48.861] askpass server listening on /var/folders/5q/******************_tr0000gn/T/vscode-ssh-askpass-**********************************.sock
[19:55:48.862] Spawning local server with {"ipcHandlePath":"/var/folders/5q/**************_tr0000gn/T/vscode-ssh-askpass-********************************.sock","sshCommand":"ssh","sshArgs":["-v","-T","-D","54815","-o","ConnectTimeout=15","-F","/Users/gy/.ssh/config/vscodeconfig","localhost"],"dataFilePath":"/Users/gy/Library/Application Support/Code/User/globalStorage/ms-vscode-remote.remote-ssh/vscode-ssh-host-localhost-*********************************-0.55.0/data.json"}
[19:55:48.862] Local server env: {"DISPLAY":"1","ELECTRON_RUN_AS_NODE":"1","SSH_ASKPASS":"/Users/gy/.vscode/extensions/ms-vscode-remote.remote-ssh-0.55.0/out/local-server/askpass.sh","VSCODE_SSH_ASKPASS_NODE":"/Applications/Visual Studio Code.app/Contents/Frameworks/Code Helper (Renderer).app/Contents/MacOS/Code Helper (Renderer)","VSCODE_SSH_ASKPASS_MAIN":"/Users/gy/.vscode/extensions/ms-vscode-remote.remote-ssh-0.55.0/out/askpass-main.js","VSCODE_SSH_ASKPASS_HANDLE":"/var/folders/5q/********************_tr0000gn/T/vscode-ssh-askpass-**********************************.sock"}
[19:55:48.871] Spawned 34492
[19:55:48.987] > local-server> Spawned ssh: 34493
[19:55:49.008] stderr> OpenSSH_8.1p1, LibreSSL 2.7.3
[19:55:50.129] stderr> kex_exchange_identification: Connection closed by remote host
[19:55:50.131] > local-server> ssh child died, shutting down
[19:55:50.136] Local server exit: 0
[19:55:50.136] Received install output: OpenSSH_8.1p1, LibreSSL 2.7.3
kex_exchange_identification: Connection closed by remote host
[19:55:50.137] Stopped parsing output early. Remaining text: OpenSSH_8.1p1, LibreSSL 2.7.3kex_exchange_identification: Connection closed by remote host
[19:55:50.137] Failed to parse remote port from server output
[19:55:50.141] Resolver error: Error:
at Function.Create (/Users/gy/.vscode/extensions/ms-vscode-remote.remote-ssh-0.55.0/out/extension.js:1:130564)
at Object.t.handleInstallOutput (/Users/gy/.vscode/extensions/ms-vscode-remote.remote-ssh-0.55.0/out/extension.js:1:127671)
at Object.t.tryInstallWithLocalServer (/Users/gy/.vscode/extensions/ms-vscode-remote.remote-ssh-0.55.0/out/extension.js:127:102339)
at processTicksAndRejections (internal/process/task_queues.js:94:5)
at async /Users/gy/.vscode/extensions/ms-vscode-remote.remote-ssh-0.55.0/out/extension.js:127:104310
at async Object.t.withShowDetailsEvent (/Users/gy/.vscode/extensions/ms-vscode-remote.remote-ssh-0.55.0/out/extension.js:127:109845)
at async /Users/gy/.vscode/extensions/ms-vscode-remote.remote-ssh-0.55.0/out/extension.js:127:100912
at async R (/Users/gy/.vscode/extensions/ms-vscode-remote.remote-ssh-0.55.0/out/extension.js:127:97702)
at async Object.t.resolveWithLocalServer (/Users/gy/.vscode/extensions/ms-vscode-remote.remote-ssh-0.55.0/out/extension.js:127:100561)
at async Object.t.resolve (/Users/gy/.vscode/extensions/ms-vscode-remote.remote-ssh-0.55.0/out/extension.js:127:107840)
at async /Users/gy/.vscode/extensions/ms-vscode-remote.remote-ssh-0.55.0/out/extension.js:127:141955
[19:55:50.143] ------
Could someone enlighten me on the reason for the problem or on a possible solution? Thanks!
I had a similar problem.
It disappeared after I turned off the agent server.
Maybe you can try that.

Ansible: connect to a remote host via proxy

I'm working on a playbook that needs to connect to a couple of different servers through a proxy.
I was able to test the connection using PuTTY and the proxy.
Basically, I went to Connection --> Proxy, selected HTTP, and added the proxy host.
But I was not able to reproduce it with SSH from the Ansible server.
I tried different ssh commands:
ssh -L jumphost.example.org:80 fred@server.example.org -p 443
ssh -J jumphost.example.org:80 fred@server.example.org
ssh -o ProxyCommand="ssh -W %h:%p jumphost.example.org" server.example.org
ssh -tt jumphost.example.org ssh -tt server.example.org
I know there are different options that use nc, but I didn't try them because it's not installed on the server.
Is there any way to connect to the remote host in Ansible using the proxy?
Thanks
Can you try this?
- hosts: all
  remote_user: root
  tasks:
    - name: Install cobbler
      package:
        name: cobbler
        state: present
      environment:
        http_proxy: http://proxy.example.com:8080
(from the Ansible documentation page playbooks_environment)
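Note that the environment keyword only sets variables for the task once it runs on the remote host; it does not change how Ansible opens the SSH connection itself. If the intermediate machine accepts SSH (an assumption; the question describes an HTTP proxy, for which this would not work), the ProxyCommand attempt from the question can be handed to Ansible through the ansible_ssh_common_args variable. A minimal sketch that builds the value, where jump_host_ssh_args is a hypothetical helper name:

```python
import shlex


def jump_host_ssh_args(jump_host, jump_user=None):
    """Build an ssh option string that tunnels through an SSH jump host.

    The result is intended as the value of ansible_ssh_common_args.
    %h and %p are expanded by ssh to the target host and port.
    """
    target = f"{jump_user}@{jump_host}" if jump_user else jump_host
    proxy_command = f"ssh -W %h:%p -q {shlex.quote(target)}"
    # Quote the whole command so ssh receives it as a single option value.
    return f"-o ProxyCommand={shlex.quote(proxy_command)}"


print(jump_host_ssh_args("jumphost.example.org"))
```

The returned string would be set as ansible_ssh_common_args for the hosts that sit behind the jump host, for example in their group variables.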

How to connect to a Windows node using OpenSSH and Ansible?

I am trying to connect to my windows computer using OpenSSH and Ansible.
I am able to connect using regular ssh, but when I try to connect using Ansible, I get pretty much the same error every time I change something.
I've also tried running Ansible as root, and still nothing:
fatal: [IVU]: UNREACHABLE! => {"changed": false, "msg": "Authentication or permission failure. In some cases, you may have been able to authenticate and did not have permissions on the remote directory. Consider changing the remote temp path in ansible.cfg to a path rooted in \"/tmp\". Failed command was: ( umask 77 && mkdir -p \"` echo /tmp/ansible-tmp-1502794936.2073953-164132649383245 `\" && echo ansible-tmp-1502794936.2073953-164132649383245=\"` echo /tmp/ansible-tmp-1502794936.2073953-164132649383245 `\" ), exited with result 1", "unreachable": true}
I've tried changing ssh_args in ansible.cfg to ssh_args = -o ControlMaster=no, with no change in the output.
I've tried changing the executable in ansible.cfg to C:/Windows/System32/cmd.exe and got the same error.
I've tried setting remote_dir=/tmp/ and still nothing.
My ansible inventory is:
[IVU]
IVU ansible_host=**IP**
[IVU:vars]
ansible_port=22
ansible_user=**user**
ansible_ssh_pass=**pass**
ansible_ssh_private_key_file=** Keyfile **
It seems like it's failing before even running any tasks, but from the openssh logs on the windows computer I see no difference when ansible connects to it and when I ssh into it.
3724 09:27:38:720 error: Couldn't create pid file "C:\\Program Files\\OpenSSH\\sshd.pid": Permission denied
3724 09:27:41:376 Accepted publickey for **User** from **IP** port 42700 ssh2: RSA SHA256:clNmiKxygl/TLEb5Ob4lZs6JqztoQyxOsjMoHQ2HYgo
3724 09:27:58:533 Received disconnect from **IP** port 42700:11: disconnected by user
3724 09:27:58:533 Disconnected from user **User** **IP** port 42700
3360 09:28:41:398 error: Couldn't create pid file "C:\\Program Files\\OpenSSH\\sshd.pid": Permission denied
3360 09:28:41:616 Accepted publickey for **User** from **IP** port 42704 ssh2: RSA SHA256:clNmiKxygl/TLEb5Ob4lZs6JqztoQyxOsjMoHQ2HYgo
3360 09:28:41:741 Received disconnect from **IP** port 42704:11: disconnected by user
3360 09:28:41:741 Disconnected from user **User** **IP** port 42704
The 9:27 entries are from when I connect using ssh, and the 9:28 entries are from when Ansible connects.
Is there something I'm missing that I need to change in order for Ansible to work with OpenSSH on Windows?
I figured out a solution using a reverse SSH tunnel.
I abandoned the idea of using Ansible's ssh connection with Windows, since Windows simply doesn't play nicely with it unless you have the Windows 10 update. I decided to use the winrm connection instead.
I connected the Windows computer to the computer running Ansible by opening a reverse SSH tunnel with the command:
ssh -p5983 -R 5982:localhost:5986 **my_user**@**my_ip**
For my purposes I had to port forward because my computer was on a separate VLAN from the Windows computer.
Then in Ansible I specified the host as localhost on port 5982.
This is about as good a solution as I could find for working with OpenSSH and Windows, at least until Ansible supports OpenSSH on Windows.
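Before running Ansible against localhost:5982, it may help to confirm that the forwarded end of the tunnel is actually listening. A minimal sketch; port_is_open is a hypothetical helper, and 5982 is the forwarded port from the tunnel command in this answer:

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

On the Ansible machine, port_is_open("localhost", 5982) returning False would suggest the reverse tunnel is down, which is worth ruling out before debugging WinRM itself. The check only shows that something accepts TCP connections on that port, not that WinRM is answering behind it.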

How can docker-machine create an EC2 instance in a private subnet?

I have a bastion host in the public subnet through which I usually access the hosts in the private subnet. When I create a docker machine in the private subnet with the command below, it does not complete.
export server_name=tomcat-5
docker-machine create \
--driver amazonec2 \
--amazonec2-region us-west-2 \
--amazonec2-vpc-id vpc-8e5488ea \
--amazonec2-ami ami-6f69a25f \
--amazonec2-instance-type m3.medium \
--amazonec2-zone b \
--amazonec2-subnet-id subnet-52f5dd54 \
--amazonec2-security-group tomcat-sg-SecurityGroup-JHHNDKKL4LO1 \
--amazonec2-tags Name,${server_name} \
--amazonec2-root-size 10 \
--amazonec2-ssh-user ec2-user \
--amazonec2-ssh-keypath ~/.ssh/id_rsa \
--amazonec2-private-address-only \
${server_name}
It says
Running pre-create checks...
Creating machine...
(tomcat-5) Launching instance...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
and after that it just hangs forever. Obviously it does not know how to reach the server via the bastion. And I cannot name the server so that docker-machine can leverage .ssh/config (if it would even do that).
It is hard to imagine that others have not run into it. I ultimately plan to bring up these servers using docker compose. So if I can do that without docker-machine, that is fine too.
What am I missing?
I was able to get more information about the problem by turning on debug output:
docker-machine --debug....
This showed that docker-machine was trying to ssh to ec2-user@10.x.y.z with these parameters:
{[-F /dev/null -o PasswordAuthentication=no -o StrictHostKeyChecking=no
-o UserKnownHostsFile=/dev/null -o LogLevel=quiet -o ConnectionAttempts=3
-o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none ec2-user@10.x.y.z
-o IdentitiesOnly=yes -i /Users/jvarg/.docker/machine/machines/tomcat-2/id_rsa
-p 22] /usr/bin/ssh <nil>}
Getting closer. I was able to ssh directly into the machine after adding a proxy for my subnet to my ~/.ssh/config. But docker-machine does not use that config file; as you can see, it passes "-F /dev/null".
I looked in the docker machine code and this seems to be hardcoded. https://github.com/docker/machine/blob/df2d3811ca8bc9ddf6896b4a4154b9277826b441/libmachine/ssh/client.go#L69
I have created an issue on the github to follow up. https://github.com/docker/machine/issues/3794
Update: While we wait for that PR to be accepted (it is now stuck with merge conflicts), here is a lame workaround. Log in to the bastion and use it as the orchestrator instead of your laptop. Install docker and docker-machine on the bastion. Also make sure you create a new key pair there so as not to compromise your own. I did not install an ssh agent, so if you go the same way, make sure the key pair has no passphrase. You will finally be able to ssh into the machine, but it will then fail with
notifying bugsnag: [Error creating machine: Error running provisioning: exit status 1]
While it is not obvious from that error, a detailed analysis of the debug output showed that it was failing because the new machine was not able to get out to the internet. This is usually not an issue for most of you. In my company's case we use an http_proxy. But I resolved this by setting up a NAT gateway.
The next error was because the bastion could not communicate on port 2376 with the new machine. Normally docker-machine creates a security group with 2376 open to the world. My company frowns on open-to-the-world ports. So I had updated my SG to allow access from the bastion. But I guess I need to tweak it.
I installed an OpenVPN server on the bastion (actually a NAT gateway). Then my client connects to the OpenVPN server. The OpenVPN server pushes a route allowing the client to access the private subnet of the VPC.
This works transparently. I can use docker-machine to create nodes on the private network. I can even start docker on my client and join an existing swarm.

Setup passphraseless ssh to localhost on OS X

I'm trying to get Hadoop's Pseudo-Distributed Operation example (http://hadoop.apache.org/common/docs/stable/single_node_setup.html) to work on OS X Lion, but am having trouble getting the ssh to work without a passphrase.
The instructions say the following:
Setup passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
I'm getting connection refused:
archos:hadoop-0.20.203.0 travis$ ssh localhost
ssh: connect to host localhost port 22: Connection refused
If you cannot ssh to localhost without a passphrase, execute the
following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
After this step I am still getting connection refused. Any ideas?
Sounds like you don't have SSH enabled. It should be in the network settings control panel somewhere.
Go to "System Preferences > Sharing > Remote Login", where there is a list of authorized users. Change it to "All users".
That solves the problem.
Check the permissions on your .ssh directory. Some ssh implementations require that the directory be chmod 700. Otherwise, they just ignore it.
Also, check the output of
ssh -v localhost
to see how the ssh client is trying to connect. The output is very detailed, and will help you decide if it's an authentication problem.
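The permission check on the .ssh directory can be sketched as follows. The helper name ensure_private_dir is hypothetical, and mode 700 is the value this answer suggests:

```python
import os
import stat


def ensure_private_dir(path):
    """Make sure path is accessible only by its owner (mode 700).

    Returns True if the mode had to be changed, False if it was already 700.
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current == 0o700:
        return False
    os.chmod(path, 0o700)
    return True
```

For the case in this thread it would be called as ensure_private_dir(os.path.expanduser("~/.ssh")). A return value of True means the directory was too permissive (or otherwise not 700) and has been tightened.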
I had the same issue.
Please check whether the ssh server is running.
If it is, open the /etc/ssh/ssh_config and /etc/ssh/sshd_config files (their usual location on Linux). The issue may be that the server is listening on one port while the client is pointing to a different one.
Before this, please ensure that the openssh server and client are installed.
I had the same problem and solved it in the following manner:
SSH is activated.
ssh -v localhost (as stated by Herko)
In the output, I saw that authentication with DSA keys is not supported.
debug1: Skipping ssh-dss key /Users/john/.ssh/id_dsa - not in PubkeyAcceptedKeyTypes
I simply generated an ECDSA key pair and removed the DSA key pair.
After generating the keys, the procedure given in the Hadoop documentation works.
Therefore, it is important to check whether the authentication method is supported by the OpenSSH configuration.
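A quick way to spot this situation is to look at the key type recorded in the public key file. The sketch below assumes the usual one-line .pub layout of "type base64-data comment"; the function names are made up for illustration, and ssh-dss is the type name shown in the debug line above.

```python
def public_key_type(pub_line):
    """Return the key type (first field) of a one-line OpenSSH public key."""
    fields = pub_line.split()
    if len(fields) < 2:
        raise ValueError("not a one-line OpenSSH public key")
    return fields[0]


def is_dsa_key(pub_line):
    """Return True if the public key is a DSA (ssh-dss) key."""
    return public_key_type(pub_line) == "ssh-dss"
```

Reading ~/.ssh/id_dsa.pub and passing its contents to is_dsa_key would flag the key generated by the Hadoop instructions, which is the one the ssh client skipped in this answer.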
