Could not connect to VM on port 22 in Google Cloud - Hadoop

I installed Hadoop (HDP) on a Google Cloud VM instance. Some time later, when I tried to connect to the machine again, it showed this error:
"We are unable to connect to VM on port 22".

To get additional debug logs, try SSH with the verbose flag using the following command:
$ gcloud compute ssh INSTANCE_NAME --zone ZONE --ssh-flag="-vvv"
If the above step doesn't help, connect to the affected instance through its serial console and check whether the issue is related to port 22 being open, as Abhinav mentioned.
You can find additional SSH troubleshooting information in the Help Center article.
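Before digging into verbose SSH logs, it can help to confirm whether port 22 is reachable at all. The following is a minimal sketch, not part of the original answer: the function name port_is_open is hypothetical, and the host you pass in is whatever external IP your VM has.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake; no SSH traffic is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False
```

Calling port_is_open("<external_ip>", 22) from your own machine and getting False points toward a firewall rule or a stopped sshd rather than a key problem.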

Unable to create new docker instances with docker-machine

I am using AWS with docker-machine to create and provision my instances. I would use this command to create a new instance:
docker-machine create --driver amazonec2 --amazonec2-instance-type "t2.micro" --amazonec2-security-group zhxw-production-sg zhxw-production-3
About a month ago, that worked fine. I just went to create a fresh machine, and I can no longer connect to it. When I run the above command, it gets stuck on "waiting for SSH to be available..."
Running pre-create checks...
Creating machine...
(zhxw-production-3) Launching instance...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
It just hangs at that point. If I cancel the command and check the AWS EC2 console, it shows the instance as running.
When I run docker-machine ls, it also shows the machine as running, but with errors:
$-> docker-machine ls
NAME                ACTIVE   DRIVER      STATE     URL                         SWARM   DOCKER      ERRORS
zhxw-production-2   -        amazonec2   Running   tcp://3.86.xxx.xxx:2376             v19.03.12
zhxw-production-3   -        amazonec2   Running   tcp://54.167.xxx.xxx:2376           Unknown     Unable to query docker version: Cannot connect to the docker engine endpoint
I'm able to connect to the zhxw-production-2 machine (which has been running for a month), just not to the new zhxw-production-3 machine I just launched.
$-> docker-machine env zhxw-production-3
Error checking TLS connection: Error checking and/or regenerating the certs: There was an error validating certificates for host "54.167.123.108:2376": dial tcp 54.167.123.108:2376: connect: connection refused
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which might stop running containers.
The regenerate-certs command doesn't help either. I'm not really sure where to start debugging, because as far as I can tell, the docker-machine create command is the very beginning.
It turned out to be a problem with SSH access to my AWS environment. I had my public IP address whitelisted, but it had changed.
I came across a problem like this and found out that the AWS EC2 AMI did not have SSH installed, so I had to use a different AMI, e.g. Ubuntu.
I went through the same problem recently and found that the cause was the public IP changing when I enabled an Elastic IP on the machine. I don't know if this is your case, but maybe my solution will help you or others. It goes as follows:
The config file is usually under: /Users/<your_user_name>/.docker/machine/machines/<name_of_problem_machine>
Edit the value of the parameter: "IPAddress"
After making the change, run the command: docker-machine regenerate-certs <name_instance_ec2>
With these steps, my problem was solved. I hope it helps!
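The manual edit described above can also be scripted. This is a rough sketch under stated assumptions: it assumes the machine's config.json keeps the address under a top-level "Driver" object with an "IPAddress" key, and the function name set_machine_ip is hypothetical. Check your own file's layout before relying on it.

```python
import json
from pathlib import Path


def set_machine_ip(config_path: Path, new_ip: str) -> str:
    """Rewrite Driver.IPAddress in a docker-machine config.json.

    Returns the previous value so the change can be logged or reverted.
    """
    original = config_path.read_text()
    config = json.loads(original)
    driver = config["Driver"]  # assumed location of the IPAddress field
    old_ip = driver.get("IPAddress", "")
    driver["IPAddress"] = new_ip
    # Keep a copy of the untouched file next to it before overwriting.
    backup = config_path.with_name(config_path.name + ".bak")
    backup.write_text(original)
    config_path.write_text(json.dumps(config, indent=4))
    return old_ip
```

After updating the file, docker-machine regenerate-certs still has to be run as in the answer, since the certificates are tied to the old address.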

SSH timeout error on Azure DevOps CD pipeline

I am getting a timeout error when trying to deploy to a VM instance hosted on AWS. Manually, I can log in using
ssh -i myKeyFile.pem myuser@IP
Once I am on the remote machine, I can execute some docker commands and everything works fine. But now that I need to automate this in the CD pipeline, I am getting the following error:
2020-06-02T21:37:12.6877276Z Trying to establish an SSH connection to ***@IP:port
2020-06-02T21:38:52.4629461Z ##[error]Failed to connect to remote machine. Verify the SSH service connection details. Error: Error: Timed out while waiting for handshake.
2020-06-02T21:38:52.4685976Z ##[section]Finishing: Run shell commands on remote machine
The steps I follow to make the SSH connection are:
I created an SSH service connection in the project settings in Azure DevOps
I created the CD pipeline
I added an SSH task and set its parameters
When I trigger the release manually to test it, it starts fine, but after about 1:43 minutes I get the error.
Then, when I review the logs, it is the same error I pasted at the beginning:
[error]Failed to connect to remote machine. Verify the SSH service connection details. Error: Error: Timed out while waiting for handshake
I've increased the handshake timeout setting from the default (20000) to 90000, but no luck.
Has anyone faced this problem before?
It seems there is an ongoing issue with the default agent pools from Azure DevOps. A lot of people have reported it, and the Azure DevOps team is working on it at the time of writing (I couldn't find the post where all of this is detailed; I will add it later).
The workaround is:
Create a self-hosted agent.
After it has been created, re-create your CD pipeline using the new self-hosted agent.
The rest of the SSH task configuration depends on your needs. But if you want to test that the SSH connection works, just print something:
echo "I'm connected"
After this, your CD pipeline should be working fine.
See the documentation for more details on how to create a self-hosted agent on Windows. There are also links for Linux and Mac.
I had a similar issue with a VM in Azure. It turned out I had set the security group to only allow SSH in from my local network, while Azure DevOps agents run in a Microsoft network and were coming from a different IP address range. The solution was to open up SSH to all source IP addresses. You can get the list of IP address ranges that DevOps agents use, but they appear to change every week, which isn't very helpful.
See https://learn.microsoft.com/en-us/azure/devops/organizations/security/allow-list-ip-url?view=azure-devops#microsoft-hosted-agents
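To reason about whether a given agent address would pass a source-IP restriction, a small check with Python's ipaddress module can help. This is only an illustrative sketch: is_allowed is a hypothetical helper, and the CIDR ranges in the usage note are documentation placeholders, not real Azure DevOps ranges.

```python
import ipaddress


def is_allowed(source_ip: str, allowed_cidrs: list[str]) -> bool:
    """Return True if source_ip falls inside any of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(source_ip)
    # strict=False tolerates ranges written with host bits set, e.g. 10.0.0.5/24.
    return any(
        addr in ipaddress.ip_network(cidr, strict=False)
        for cidr in allowed_cidrs
    )
```

For example, is_allowed("203.0.113.5", ["198.51.100.0/24"]) is False, which mirrors the situation above: the rule only admits the local network, so a hosted agent connecting from elsewhere times out.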

Connecting with JMX using Docker for Mac

I'm struggling with setting up a JMX connection to Tomcat running in a Docker container using Docker for Mac.
I think I understand the basics, and have a setenv.sh in the tomcat/bin directory looking like this:
CATALINA_OPTS="-Dcom.sun.management.jmxremote=true \
-Dcom.sun.management.jmxremote.local.only=false \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Djava.rmi.server.hostname=185.83.15.228 \
-Dcom.sun.management.jmxremote.port=9999 \
-Dcom.sun.management.jmxremote.rmi.port=9999"
I think the problematic part might be the java.rmi.server.hostname property. I've set this to the IP of the host machine, but I've also tried other obvious things. I believe this should be the IP of the machine on which jconsole or jvisualvm will be running, but this is not working for me.
I start the container like this:
docker run -d -v /Users/timbo/tomcat-jmx.sh:/usr/local/tomcat/bin/setenv.sh -p 8080:8080 -p 9999:9999 tomcat:8.0
so port 9999 is exposed.
When I try to connect using jvisualvm connecting to localhost:9999 (which Docker for Mac will route to the container which is actually on 172.17.0.2) I get the error:
Cannot connect to localhost:9999 using service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi
Any hints on what is wrong?
OK, I think I managed to find it eventually. Setting the value of java.rmi.server.hostname to the hostname of the host (e.g. mymac.local, or whatever is returned by hostname) seems to get it working. All other settings were OK.
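As a rough illustration of that fix, the sketch below assembles the same JMX flags from the question around a given hostname, and builds the service URL that jvisualvm reports. The helper names catalina_opts and jmx_service_url are hypothetical; the flags themselves are the ones from the question.

```python
def catalina_opts(rmi_hostname: str, port: int = 9999) -> str:
    """Build the CATALINA_OPTS line for setenv.sh with one shared JMX/RMI port."""
    flags = [
        "-Dcom.sun.management.jmxremote=true",
        "-Dcom.sun.management.jmxremote.local.only=false",
        "-Dcom.sun.management.jmxremote.authenticate=false",
        "-Dcom.sun.management.jmxremote.ssl=false",
        f"-Djava.rmi.server.hostname={rmi_hostname}",
        f"-Dcom.sun.management.jmxremote.port={port}",
        f"-Dcom.sun.management.jmxremote.rmi.port={port}",
    ]
    # Space-separated so each -D flag reaches the JVM as its own argument.
    return 'CATALINA_OPTS="' + " ".join(flags) + '"'


def jmx_service_url(host: str, port: int = 9999) -> str:
    """Return the JMX-over-RMI service URL a client would connect to."""
    return f"service:jmx:rmi:///jndi/rmi://{host}:{port}/jmxrmi"
```

Passing socket.gethostname() from the standard library as rmi_hostname gives the same value the hostname command prints on the Mac.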
Docker for Mac works a bit differently. The port you map actually gets mapped to the Linux VM that it runs in the background. This VM usually has the IP 192.168.99.100, so you should try connecting to 192.168.99.100:9999.
To verify the IP of your VM, open the Docker CLI terminal and run the following:
echo $DOCKER_HOST
tcp://192.168.99.100:2376

Unable to get Mesos to run from tutorial: Setting up a Single Node Mesosphere Cluster

I have been trying to set up a single-node Mesosphere cluster by following their official tutorial:
http://mesosphere.com/docs/getting-started/developer/single-node-install/
I followed all the commands without any issues, and I also added ports 5050 and 8080 to my security group. When I try to access the console for Mesos/Marathon, I get an "Internet Explorer cannot display the webpage" message.
They also recommend checking it the following way:
MASTER=$(mesos-resolve `cat /etc/mesos/zk`)
mesos-execute --master=$MASTER --name="cluster-test" --command="sleep 5"
But that comes up with an error:
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0106 17:03:08.126703 20993 process.cpp:1561] Failed to initialize, gethostbyname2: Unknown host
*** Check failure stack trace: ***
I am not really sure how to troubleshoot this either, and there are not many tutorials I could find on how to install Mesos on Ubuntu.
I checked the contents of the zk file; it seems to be the default value.
$ cat /etc/mesos/zk
zk://localhost:2181/mesos
I would really appreciate any clues on how to go about this one.
Edit: The process is definitely running too - just an fyi:
root 31545 8.5 5.9 187464 35604 ? Ssl 17:28 0:00 /usr/local/sbin/mesos-slave --master=zk://localhost:2181/mesos --log_dir=/var/log/mesos
root 31563 28.5 2.1 116304 12856 ? Rs 17:28 0:00 /usr/local/sbin/mesos-master --zk=zk://localhost:2181/mesos --port=5050 --log_dir=/var/log/mesos --quorum=1 --wo
Mesos uses gethostbyname2 to resolve hostnames to IPs. The first thing I would recommend is to try "ping localhost" and "ping hostname", and verify that there are no strange settings in /etc/hosts. If you're running a multi-node cluster, I'd recommend that hostname map to the public IP address (not 127.0.x.1).
If that doesn't help, you can try setting the --ip and --hostname flags when starting mesos-master and mesos-slave, to bypass the gethostbyname2 resolution. These can also be set by writing to the file-based parameters, e.g. /etc/mesos/mesos-master/ip
For additional troubleshooting, try running wget http://localhost:5050 (or curl -L) from the Mesos master, to verify that it is locally visible. Also try wget http://<public_ip>:5050 to verify that the web server is up and serving on the public IP. Depending on how your (EC2?) node is set up, you may need to expose/forward the port, or connect to a VPN.
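The hostname checks suggested above can be approximated from Python. This is a hedged sketch, not Mesos code: zk_hosts and resolve are hypothetical helpers, zk_hosts ignores any credentials in the URL and assumes port 2181 when none is given, and resolve uses the standard resolver rather than gethostbyname2 itself.

```python
import socket
from typing import Optional


def zk_hosts(zk_url: str) -> list[tuple[str, int]]:
    """Split zk://host1:port1,host2:port2/path into (host, port) pairs."""
    prefix = "zk://"
    if not zk_url.startswith(prefix):
        raise ValueError(f"not a zk:// URL: {zk_url!r}")
    # Everything between the scheme and the first "/" is the host list.
    authority = zk_url[len(prefix):].split("/", 1)[0]
    pairs = []
    for entry in authority.split(","):
        host, _, port = entry.partition(":")
        pairs.append((host, int(port) if port else 2181))
    return pairs


def resolve(host: str) -> Optional[str]:
    """Return the IPv4 address for host, or None if it cannot be resolved."""
    try:
        return socket.gethostbyname(host)
    except socket.gaierror:
        return None
```

Running resolve on each host from zk_hosts("zk://localhost:2181/mesos") and on socket.gethostname() shows which name, if any, fails to resolve in the way the "Unknown host" error suggests.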
Thanks Adam. I ran the wget and curl commands, and nothing was actually listening on port 8080 or 5050. I did open those ports in EC2. A simple reboot did the trick, however: once I SSH'ed into the EC2 instance after the reboot, both Mesos and Marathon were running, and both ports now show up when I run netstat -ntln.

SSH connection from Mac to Amazon EC2 not working

I am trying to connect to Amazon EC2 via:
ssh -i ~/.ssh/YOUR_KEYPAIR_FILE.pem ec2-user@YOUR_IP_ADDRESS
The terminal takes 1 or 2 mins and then prints:
ssh: connect to host XXX port 22: Operation timed out
Any ideas?
Log in to AWS
Go to the Instances section
Click on the security group associated with your EC2 instance
At the bottom, click the Inbound tab and then click Edit
Create this rule:
TYPE         SSH
PROTOCOL     TCP
PORT RANGE   22
SOURCE       Anywhere
You should now be able to connect to the instance on port 22 via ssh with your key.
You need to open port 22 in your security group. All ports are closed by default.
Can you try changing the permissions on YOUR_KEYPAIR_FILE.pem like this:
chmod 600 YOUR_KEYPAIR_FILE.pem
Then run the command:
ssh -i YOUR_KEYPAIR_FILE.pem ec2-user@YOUR_IP_ADDRESS
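To confirm the chmod took effect, the file mode can be inspected directly. A minimal sketch, assuming a POSIX system; key_permissions_ok is a hypothetical helper name.

```python
import os
import stat


def key_permissions_ok(path: str) -> bool:
    """True if the file grants no access to group or others (e.g. 600 or 400)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # 0o077 masks the group and other permission bits.
    return mode & 0o077 == 0
```

A key file left at mode 644 returns False here. Note that overly open key permissions normally produce a permissions warning from ssh rather than a timeout, so this check rules out one cause but not a blocked port.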
I had a similar problem. I checked all my networking time and time again, from the EC2 instance all the way through the VPC and out to the internet. Security groups were allowing all sources through ports 22 and 80. My NACL was allowing the right traffic. I knew AWS was all OK, yet every time I tried to SSH into an instance I would still get an operation timeout, which suggested that the problem was with my local machine instead.
First, to check that the SSH port was open, I ran the following:
ssh localhost
This worked fine!
After doing some research on the net, in my case it all boiled down to Java: my terminal did not recognise that Java was installed on my machine.
Supporting Document:
AWS Documentation
As I understood it, no Java means that your .pem will not be recognised.
Start by running the following:
java -version
If the command is not found, install the relevant Java SDK for your OS and, once installed, run:
which java
You should get something like this:
/usr/bin/java
Now try connecting to an instance again, and hopefully you will have success this time!
ssh -v -i ~/Downloads/labamikey.pem ec2-user@ec2-34-200-217-2.compute-
       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|
[ec2-user@ip-10-0-0-54 ~]$
