Rancher: creating hosts on AWS EC2 - docker-machine

I'm trying to add an EC2 host to my Rancher setup. I have seen this tutorial, however I wanted to use docker-machine instead.
To that end, I have done the following:
MAC:~ user1$ docker-machine create -d amazonec2 --amazonec2-vpc-id vpc-84fd6de0 --amazonec2-region eu-west-1 --amazonec2-ami ami-c5f1beb6 Rancher-node-aws-01
Running pre-create checks...
Creating machine...
(Rancher-node-aws-01) Launching instance...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Error creating machine: Error detecting OS: Too many retries waiting for SSH to be available. Last error: Maximum number of retries (60) exceeded
Note: the AMI ID corresponds to rancheros-v0.7.0-hvm-1.
As you can see, I cannot SSH into the RancherOS instance (the SSH port is open on AWS). Any ideas why this is?

The trick is to use the SSH user 'rancher': RancherOS images log in as rancher, while the amazonec2 driver defaults to ubuntu. So the full command will be:
docker-machine create -d amazonec2 --amazonec2-vpc-id vpc-84fd6de0 --amazonec2-region eu-west-1 --amazonec2-ami ami-c5f1beb6 --amazonec2-ssh-user rancher Rancher-node-aws-01
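The same mismatch can occur with other images, since each image family ships its own default login. A minimal sketch of choosing the value for --amazonec2-ssh-user follows; `default_ssh_user` and the family labels are hypothetical names made up for this example, and the mapping covers only a few commonly documented defaults.

```shell
# Hypothetical helper: map an image family label to its usual default SSH login.
# The labels are names chosen for this sketch, not AWS identifiers.
default_ssh_user() {
  case "$1" in
    rancheros)    echo "rancher" ;;
    ubuntu)       echo "ubuntu" ;;
    amazon-linux) echo "ec2-user" ;;
    *)            echo "unknown image family: $1" >&2; return 1 ;;
  esac
}

# Example (not run here): feed the result to docker-machine.
# docker-machine create -d amazonec2 \
#   --amazonec2-ami ami-c5f1beb6 \
#   --amazonec2-ssh-user "$(default_ssh_user rancheros)" \
#   Rancher-node-aws-01
```

If the login for an image is not known, the image's own documentation is the place to check rather than guessing.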

Related

Can only connect one time to AWS EC2 instance

I launched a new AWS EC2 Ubuntu Server t2.micro instance via the AWS console. I was able to successfully connect to the instance a single time using ssh on macOS Sierra 10.12.3:
$ ssh -i ./ubuntu-server-2-17-2017.pem ubuntu@ec2-55-555-555-555.compute-1.amazonaws.com
However, when I try to connect a second time, I get a timeout error:
$ ssh -i ./ubuntu-server-2-17-2017.pem ubuntu@ec2-55-555-555-555.compute-1.amazonaws.com
ssh: connect to host ec2-55-555-555-555.compute-1.amazonaws.com port 22: Operation timed out
How can I resolve this issue?
The first thing to check is that the public IP address associated with the instance is still the same; unless an Elastic IP is attached, it changes when the instance is stopped and started.
The other thing to look at, then, is the security group, to see if your own IP address (which may have changed) is still allowed.
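Both checks can be scripted with the AWS CLI. The following is a sketch only, assuming the AWS CLI is installed and configured; the function names are hypothetical, and the instance and security group IDs are placeholders you would supply.

```shell
# Print the instance's current public IP address.
# Argument: 1 = instance ID.
instance_public_ip() {
  aws ec2 describe-instances \
    --instance-ids "$1" \
    --query 'Reservations[].Instances[].PublicIpAddress' \
    --output text
}

# Print the inbound rules of a security group as JSON, so the allowed
# source ranges for port 22 can be compared with your own address.
# Argument: 1 = security group ID.
security_group_inbound_rules() {
  aws ec2 describe-security-groups \
    --group-ids "$1" \
    --query 'SecurityGroups[].IpPermissions[]' \
    --output json
}

# Example (placeholders, not run here):
# instance_public_ip i-0123456789abcdef0
# security_group_inbound_rules sg-0123456789abcdef0
```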

How can docker-machine create an EC2 instance in a private subnet?

I have a bastion host in the public subnet through which I usually access the hosts in the private subnet. When I create a docker machine in the private subnet with the command below, it does not complete.
export server_name=tomcat-5
docker-machine create \
--driver amazonec2 \
--amazonec2-region us-west-2 \
--amazonec2-vpc-id vpc-8e5488ea \
--amazonec2-ami ami-6f69a25f \
--amazonec2-instance-type m3.medium \
--amazonec2-zone b \
--amazonec2-subnet-id subnet-52f5dd54 \
--amazonec2-security-group tomcat-sg-SecurityGroup-JHHNDKKL4LO1 \
--amazonec2-tags Name,${server_name} \
--amazonec2-root-size 10 \
--amazonec2-ssh-user ec2-user \
--amazonec2-ssh-keypath ~/.ssh/id_rsa \
--amazonec2-private-address-only \
${server_name}
It says
Running pre-create checks...
Creating machine...
(tomcat-5) Launching instance...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
and after that it just hangs forever. Obviously it does not know how to get to the server via the bastion. And I cannot name the server so that docker can leverage the .ssh/config (if it will do that).
It is hard to imagine that others have not run into it. I ultimately plan to bring up these servers using docker compose. So if I can do that without docker-machine, that is fine too.
What am I missing?
I was able to get more information about the problem by turning on debug. Essentially
docker-machine --debug....
This allowed me to see that docker-machine was trying to ssh to ec2-user@10.x.y.z with these parameters:
{[-F /dev/null -o PasswordAuthentication=no -o StrictHostKeyChecking=no
-o UserKnownHostsFile=/dev/null -o LogLevel=quiet -o ConnectionAttempts=3
-o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none ec2-user@10.x.y.z
-o IdentitiesOnly=yes -i /Users/jvarg/.docker/machine/machines/tomcat-2/id_rsa
-p 22] /usr/bin/ssh <nil>}
Getting closer. I was able to ssh directly into the machine by adding a proxy entry for my subnet to my ~/.ssh/config. But docker-machine does not use that config file. As you can see, it uses /dev/null instead ("-F /dev/null").
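For reference, a proxy entry of that kind might look like the output of the sketch below. This is an assumption about the shape of the entry, not the exact one used here: the host pattern, the bastion address and the function name `bastion_ssh_config` are all hypothetical. It helps plain ssh only, since docker-machine ignores the file.

```shell
# Print an ssh_config stanza that routes a private address pattern via a bastion.
# Arguments: 1 = host pattern, 2 = bastion address, 3 = login user on private hosts.
bastion_ssh_config() {
  cat <<EOF
Host $1
    User $3
    ProxyCommand ssh -W %h:%p $2
EOF
}

# Example (review the output before appending it to ~/.ssh/config):
# bastion_ssh_config '10.*' bastion.example.com ec2-user
```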
I looked in the docker machine code and this seems to be hardcoded. https://github.com/docker/machine/blob/df2d3811ca8bc9ddf6896b4a4154b9277826b441/libmachine/ssh/client.go#L69
I have created an issue on GitHub to follow up. https://github.com/docker/machine/issues/3794
Update: While we wait for that PR to be accepted (it is now stuck with merge conflicts), here is a lame workaround. Log in to the bastion and use it as the orchestrator instead of your laptop. On the bastion, install docker and docker-machine. Also make sure you create a new key pair there so as not to compromise your own. I did not install an SSH agent, so if you go the same way, make sure the key pair has no passphrase. docker-machine will finally be able to SSH into the new machine, but it will then fail with
notifying bugsnag: [Error creating machine: Error running provisioning: exit status 1]
While it is not obvious from that error, a detailed analysis of the debug output showed that provisioning failed because the new machine could not reach the internet. This is not an issue for most setups, but my company uses an http_proxy. I resolved this by setting up a NAT gateway.
The next error was because the bastion could not communicate on port 2376 with the new machine. Normally docker-machine creates a security group with 2376 open to the world. My company frowns on open-to-the-world ports. So I had updated my SG to allow access from the bastion. But I guess I need to tweak it.
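One way to make that tweak without opening the port to the world is a rule whose source is the bastion's security group. This is a sketch under assumptions: it presumes the AWS CLI is configured, the function name is hypothetical, and both group IDs are placeholders.

```shell
# Allow TCP 2376 (the Docker daemon TLS port) into the node's security group,
# but only from members of the bastion's security group.
# Arguments: 1 = node security group ID, 2 = bastion security group ID.
allow_docker_from_group() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" \
    --protocol tcp \
    --port 2376 \
    --source-group "$2"
}

# Example (placeholders, not run here):
# allow_docker_from_group sg-0aaaaaaaaaaaaaaaa sg-0bbbbbbbbbbbbbbbb
```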
I installed an OpenVPN server on the bastion (actually a NAT gateway). Then my client connects to the OpenVPN server. The OpenVPN server pushes a route allowing the client to access the private subnetwork in the VPC.
This works transparently. I can use docker-machine to create nodes on the private network. I can even start docker on my client and join an existing swarm.
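The pushed route is a single directive in the OpenVPN server configuration. The sketch below builds that line from a subnet in CIDR notation; the subnet value in the example is an assumption (the actual range is not given here), and both function names are hypothetical.

```shell
# Convert a prefix length (0-32) to a dotted netmask, e.g. 24 -> 255.255.255.0.
prefix_to_netmask() {
  bits=$1
  mask=""
  for i in 1 2 3 4; do
    if [ "$bits" -ge 8 ]; then
      octet=255
      bits=$((bits - 8))
    else
      octet=$((256 - (1 << (8 - bits))))
      bits=0
    fi
    mask="${mask}${mask:+.}${octet}"
  done
  echo "$mask"
}

# Print the OpenVPN server directive that pushes a route for the given CIDR.
# Argument: 1 = subnet in CIDR notation, e.g. 10.0.1.0/24.
openvpn_push_route() {
  network=${1%/*}
  prefix=${1#*/}
  printf 'push "route %s %s"\n' "$network" "$(prefix_to_netmask "$prefix")"
}

# Example with an assumed subnet:
# openvpn_push_route 10.0.1.0/24
```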

Error setting up docker on Windows

I am trying to set up docker-machine on Windows and this problem has annoyed me for a few days.
I downloaded and installed DockerToolbox-1.9.1a on my Windows machine, so it came with VirtualBox version 5.0.10. After that I ran this command to create my virtual machine:
docker-machine create --driver virtualbox --engine-insecure-registry docker.pre-prod.ss.local:5000 --virtualbox-hostonly-cidr 192.168.99.100/24 mymachine
Here is what I got:
Waiting for machine to be running, this may take a few minutes...
Machine is running, waiting for SSH to be available...
Detecting operating system of created instance...
Detecting the provisioner...
Provisioning created instance...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
WARNING >>>
This machine has been allocated an IP address, but Docker Machine could not reach it successfully.
SSH for the machine should still work, but connecting to exposed ports, such as the Docker daemon port (usually :2376), may not work properly.
You may need to add the route manually, or use another related workaround.
This could be due to a VPN, proxy, or host file configuration issue.
You also might want to clear any VirtualBox host only interfaces you are not using.
The machine was created successfully. So I ran the docker-machine env command:
docker-machine env --shell=powershell mymachine | Invoke-Expression
and I got:
Error running connection boilerplate: Error checking and/or regenerating the certs: There was an error validating certificates for host "192.168.99.100:2376": dial tcp 192.168.99.100:2376: connectex: No connection could be made because the target machine actively refused it.
You can attempt to regenerate them using 'docker-machine regenerate-certs name'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.
Running docker version returned
Client:
 Version: 1.9.1
 API version: 1.21
 Go version: go1.4.3
 Git commit: a34a1d5
 Built: Fri Nov 20 17:56:04 UTC 2015
 OS/Arch: windows/amd64
An error occurred trying to connect: Get http://localhost:2375/v1.21/version: dial tcp connection could be made because the target machine actively refused it.
Can someone help to point out the direction to fix this issue? It is really troublesome to set up docker on Windows. Thank you very much.
I use docker 1.9.1 on Windows (7, 8 and even 10), but without docker registry, and without using --virtualbox-hostonly-cidr.
If you do use that last option, check "Set a specific address ip when i create a docker container", where I mention issue 1709, which uses a CIDR ending in .1, not .100 (while still getting a .100 IP address as a result):
docker-machine create -d virtualbox --virtualbox-hostonly-cidr "192.168.99.1/24" m99
If there's no other machine with the same CIDR (Classless Inter-Domain Routing) range, the machine should always get the .100 IP upon start.

SSH freeze when connecting to AWS

I am connecting to an Ubuntu 14.04 server on an AWS gx2.2 instance (the huge GPU one) from an Ubuntu 14.04 system with the following command:
ssh -i ~/.ssh/key.pem ubuntu@12.121.12.321
Normally it would just connect, but now it times out with this error:
ssh: connect to host 54.171.53.164 port 22: Connection timed out
I can Ctrl+C out of the freeze though.
I have tried to restart.
I have tried to sudo apt-get update.
Recheck your AWS parameters...
1) Check the public IP associated with the EC2 instance and confirm that it is the same as the address (12.121.12.321) you are connecting to.
2) Check the inbound rules of the security group associated with the EC2 instance. Ensure that they allow SSH (port 22) from the IP address of the machine you are connecting from.
3) Ensure that the .pem file you are using is the key pair the instance was launched with.
Hope it helps...
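A "Connection timed out" on port 22 usually points at the first two items rather than the key, and a quick reachability probe can confirm that before touching anything else. This sketch assumes bash and the coreutils `timeout` command are available (stock macOS lacks `timeout`); `port_open` is a hypothetical helper name.

```shell
# Return 0 if a TCP connection to host:port succeeds within the timeout.
# Arguments: 1 = host, 2 = port, 3 = timeout in seconds (default 5).
# Relies on bash's /dev/tcp pseudo-device, so bash is invoked explicitly.
port_open() {
  timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example (not run here):
# if port_open 12.121.12.321 22; then echo "port 22 reachable"; else echo "blocked or wrong IP"; fi
```

If the probe fails, the address or the security group is the likely culprit; if it succeeds and ssh is still rejected, look at the key and the login user.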

Getting some sort of authentication issue when deploying EC2 instances with Knife

I'm having some kind of authentication issue when trying to launch server instances in EC2 with the knife command.
I'm using a command like:
knife ec2 server create --availability-zone us-east-1d --node-name ES-test --flavor t1.micro --image ami-fd20ad94 --identity-file something-dev.pem --ssh-user ubuntu -r 'recipe[something-elasticsearch::default]'
And there are 2 points of failure. The first comes relatively early on.
Waiting for instance...........................
Subnet ID: subnet-61dfa849
Private IP Address: 10.0.0.43
done
Bootstrapping Chef on 10.0.0.43
Failed to authenticate ubuntu - trying password auth
Enter your password:
I should be able to authenticate as Ubuntu with no password here. In fact, if I allow the provisioning to continue and try to ssh to the generated instance with something like:
ssh -i something-dev.pem ubuntu@10.0.0.43
...it will work. So why is the knife command itself failing to authenticate?
I had the same problem as above and tried the ssh-add as suggested by Rico above. Although I still got the prompt for a password, hitting enter on a blank password then allowed the process to continue.
Failing that, the -V verbose output option may give you more insight.
I found this to work well for me.
bundle exec knife ec2 server create -r "role[websphere]" -I ami-cb94868e --flavor m1.small -G default --ssh-user ubuntu -N server01 -S whatever --identity-file .chef/whatever.pem
Also consider that when you download the .pem from AWS, you need to chmod 400 whatever.pem
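The permission requirement can be checked before bootstrapping. This is a small sketch with a hypothetical helper name, `key_mode_ok`; it accepts modes 400 and 600 on the assumption that the key only needs to be inaccessible to group and others, and it tries GNU stat first, then the BSD/macOS form.

```shell
# Return 0 if the key file is accessible by its owner only (mode 400 or 600).
# Argument: 1 = path to the private key file.
key_mode_ok() {
  mode=$(stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1" 2>/dev/null)
  [ "$mode" = "400" ] || [ "$mode" = "600" ]
}

# Example (not run here):
# key_mode_ok .chef/whatever.pem || chmod 400 .chef/whatever.pem
```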

Resources