Unable to create new docker instances with docker-machine - docker-machine

I am using AWS with docker-machine to create and provision my instances. I would use this command to create a new instance:
docker-machine create --driver amazonec2 --amazonec2-instance-type "t2.micro" --amazonec2-security-group zhxw-production-sg zhxw-production-3
About a month ago, that worked fine. I just went to create a fresh machine, and I can no longer connect to it. When I run the above command, it gets stuck on "waiting for SSH to be available..."
Running pre-create checks...
Creating machine...
(zhxw-production-3) Launching instance...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
It just hangs at that point. If I cancel the command, and check the AWS EC2 console, it suggests that it's running:
When I run docker-machine ls, it also suggests that it's running, but with errors:
$-> docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM DOCKER ERRORS
zhxw-production-2 - amazonec2 Running tcp://3.86.xxx.xxx:2376 v19.03.12
zhxw-production-3 - amazonec2 Running tcp://54.167.xxx.xxx:2376 Unknown Unable to query docker version: Cannot connect to the docker engine endpoint
I'm able to connect to the zhxw-production-2 machine (which has been running for a month). Just not the new one zhxw-production-3 one I just launched.
$-> docker-machine env zhxw-production-3
Error checking TLS connection: Error checking and/or regenerating the certs: There was an error validating certificates for host "54.167.123.108:2376": dial tcp 54.167.123.108:2376: connect: connection refused
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which might stop running containers.
The regenerate-certs command doesn't help either. I'm not really sure where to start debugging, because as far as I can tell, the docker-machine create command is the very beginning.

Turned out to be a problem with SSH to my AWS environment. I had my public IP address whitelisted, but it had changed.

I came across a problem like this and I found out that the AWS EC2 AMI did not have SSH installed, so I had to use different AMI, eg. Ubuntu.

I went through the same problem recently and found that the cause was the public ip change when I enabled elastic ip on the machine. I don't know if this is your case. Maybe my solution will help you or help others. He follows:
usually the file path is: /User/<name_your_user>/.docker/machine/<name_machine_ploblem>
edit parameter value: "IPAdress"
After making the change, run the command: docker-machine regenerate-certs <name_instance_ec2>
With these procedures, my problem was solved. I hope it helps! hug to everyone.

Related

Cannot pull image from AWS ECR repository using docker with VirtualBox or Colima

The Situation
As you may all know, Docker has changed its license for Docker desktop to limit free usage for limited use cases.
As a result, I have resorted to alternatives such as Colima and use of virtual box as a means to continue using docker CLI while respecting Docker's new changes.
While it works fine for pulling images from Docker Hub, I've noticed that I can no longer pull images from my company's AWS ECR repo. The reason is due to unknown certificate authority issues.
My understanding of how docker runs is limited, but the gist I got from this stackoverflow post is that docker CLI acts as the client for the developer to send commands to the Docker Daemon that runs on a virtual machine. So this issue is most likely related to the VM that the docker daemon is running on.
The Error Message
Pulling from myrepo/myapp
5ad559c5ae16: Pulling fs layer
d7a7f7e76287: Pulling fs layer
3eb3e996f0d7: Pulling fs layer
d8f3fbab0eaf: Waiting
d310dd0da683: Waiting
6f542466a6be: Waiting
8851a2099770: Waiting
f1dd90cdff4b: Waiting
4a852bd6c6f1: Waiting
538106d55e7d: Waiting
dbc972867db8: Waiting
2bc8828e78a2: Waiting
1a653b47f557: Waiting
877c2f613a70: Waiting
09eac264496b: Waiting
66dd8ce5c695: Waiting
ccde39d6cfef: Waiting
4351b359c9e4: Waiting
52e095209afc: Waiting
c6ad9f161855: Waiting
233f3e28c5a3: Waiting
error pulling image configuration: Get https://prod-ca-central-1-starport-layer-bucket.s3.ca-central-1.amazonaws.com/<a-very-long-hash>/<another-very-long-hash>?X-Amz-Security-Token=<AWS-security-token>&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220210T215140Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3600&X-Amz-Credential=<my-credential>&X-Amz-Signature=<amazon-signature>: x509: certificate signed by unknown authority
My hypothesis for why I'm getting this error message
This is purely a guess. Please feel free to correct me.
I know that with Docker Desktop, I do not get this certificate error and my guess is that with the integration of hyperkit, it the VM can run via localhost, which will allow Docker Daemon to tap into macOS' trusted certificate authority certs.
The problem now arises because the VM that I've obtained from the Internet now no longer has access to those trusted certs.
What I've tried
Ensure I've logged into ECR using AWS command aws ecr get-login-password --region ca-central-1 | docker login --username AWS --password-stdin <my-aws-account-id>.dkr.ecr.ca-central-1.amazonaws.com
reinstall both Colima and the virtual box hypervisor
Isolate the issue by experimenting solely on virtual box setup.
I noticed that the folder /etc/docker is present on the VM. From Docker's documentation, the default directory for certificates for docker is in /etc/docker/certs.d to which I noticed it is absent in my Virtual Machine installation.
I think I'm close to a solution, but I'm quite new to how certificates work and I'm not sure where I can obtain the certificates I need to put them into that path to test.
Does anyone know how this can be done?
I got into same issue, I did this and it worked
Remove the line "credsStore": "xxx" from ~/.docker/config.json.

Docker stuck on "Waiting for SSH to be available..."

I'm using a docker with Windows and Hyper-v to create containers. I've added a docker machine vmachine to my docker configuration. First time the machine is created, it gets an IP (although I cannot manage nginx to access it - ERR_CONNECTION_REFUSED) and finishes the bootup.
When I turn off the machine and then try to boot it, i get stuck in this message
Waiting for SSH to be available...
And it doesn't evolve from there. The machine is booted, however, I get an IPv6 when I input the command docker-machine ip vmachine like - fe80::215:5dff:fe21:10b insted of a IPv4
What am I doing wrong?
Problem here is by default docker uses DockerNAT network switch. You should create a new external network switch instead. This issue is covered here and here. You can create an external network switch using the below command
docker-machine create -d hyperv --hyperv-virtual-switch external-switch tempbox1
or you can create one through the UI
Be sure to reboot the device after creating the external switch.
I had a similar issue and non of the solutions worked. Turns out that according to this answer, docker launches SSH with Unix specific elements. This is said to have been fixed in the releases that followed, but I still encountered the 'Waiting for SSH' issue. I resolved this by simply using GIT bash to run all docker related SSH commands.
Use the switch --native-ssh
for example docker-machine --native-ssh .... Get more details from here
docker-machine.exe -debug create --driver hyperv --hyperv-virtual-switch "External Virtual Switch" --hyperv-cpu-count "1" --hyperv-memory "1024" --hyperv-disk-size "20000" mydockervm
make sure to have additional VirtualSwitch configure , with external network driver selected , Uninstall virtualbox
Use the debug switch to see the exact error , for me it was not able to allocate memory.
Here's what's solved it for me.
Turns out Windows 10 starting version 1709 has a built in SSH client at C:\Windows\System32\OpenSSH. Here's an article discussing it.
Looks like docker is using that SSH implementation and it's not compatible. I didn't look for a proper way to remove the built-in SSH implementatino in Windows 10, and simply renamed the folder. That was enough to fix it for me.
After doing what is mentioned in the above suggestions and if you are running docker on a windows machine try to login using cli. This has worked for me.
If you are using Command Promt Docker will stuck at Waiting for SSH to be available..., So change to use GIT BASH as #Dave Howson said it will work.
If you're using oracle VM you must ensure first that your new cloud vm is running.
Before:
After:

Types of errors docker in Windows?

When I began to start docker it stuck and there are so many bad problem with it
System: Windows 8.1 64-bit
I replace virtualbox 5.0.6 to virtualbox 5.0.7 Still, there's an error
Error One:
When Docker Quickstart Terminal want to begin Starting VM... is hanging
Error two
I test my installation with docker run hello-world I get the following:
Post http://127.0.0.1:2375/v1.20/containers/create: dial tcp 127.0.0.1:2375: ConnectEx tcp: No connection could be made because the target machine actively refused it..
* Are you trying to connect to a TLS-enabled daemon without TLS?
* Is your docker daemon up and running?
Error three
Kinematic doesn't work
It work until 99% BUT suddenly it doesn't work
Machine IP could not be fetched. Please retry the setup. If this fails please file a ticket on our GitHub repo.
Best solution I've found after googling for a while:
First check if your Virtualbox version is the latest or at least above 5, then
Run on CMD docker-machine rm -f ##dockerContainerName## (Generally: default)
Delete .docker folder from your user folder
Run docker-machine create --driver virtualbox ##dockerContainerName##
Profit!!!

Docker unreachable after computer sleep

I have just installed docker using docker-toolbox 1.8.2 on Windows 10.
Due to due to this issue I had to recreate the docker image using these commands
docker-machine rm default
docker-machine --native-ssh create -d virtualbox default
After that it has been working fine, except for one problem:
When the PC has gone to sleep and then wakes again, the docker commands can no longer connect. Example:
> docker images
An error occurred trying to connect: Get https://192.168.99.100:2376/v1.20/images/json:
dial tcp 192.168.99.100:2376: ConnectEx tcp: A connection attempt failed because the
connected party did not properly respond after a period of time, or established connection
failed because connected host has failed to respond.
However the docker-machine lists the machine as running:
> docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM
default * virtualbox Running tcp://192.168.99.100:2376
I can also confirm in VirtualBox that the VM screen seems to be active.
I have tried starting and stopping the machine, but that does not help
C:\x> docker-machine stop default
C:\x> docker-machine start default
Starting VM...
Started machines may have new IP addresses. You may need to re-run the `docker-machine env` command.
C:\x> docker-machine env default --shell=powershell
Ironically, the last command hangs, so I never get any environment settings.
The only thing that helps is to restart the whole PC. But that should be unnecessary?
I have also posted this as an issue on the docker github repository,but that was closed. A related issue seems to be this one, but no workaround or solution has been posted for Windows.
After hous of fighting with VirtualBox + Docker Toolbox, I finally found the way, how to make Docker working again (even without restarting all the containers):
Wake up PC from sleep
Try docker images (won`t work)
Open VirtualBox -> Close VM with saving state (CTRL+V)
Run your VM again
Try docker images again (now should work)
Please note: All steps are in VirtualBox only! Running docker-machine restart default will create another host-only adapter, which is something you do not want. If you did it anyway, delete all additionally created adapters (File->Preferences->Network on VirtualBox), then follow steps 1-5.
I have experienced the exact same symptoms on Windows 8.1... The thing is that it's not really a docker-specific issue, but more how Windows manages the VirtualBox network adapters after sleep (I think...). The culprit in my case is that the network adapter's addresses were becoming private after sleep (they became 169.* addresses).
Credits to this guy who gave me the idea: http://lyngtinh.blogspot.ca/2011/12/how-to-disable-autoconfiguration-ipv4.html
Fix:
Start a command prompt as Administrator
Find out the "useful" network adapters: ipconfig /all. The useful ones in my case were the ones labeled "VirtualBox Host-Only Ethernet Adapter" that didn't have private ips (not starting with 169.*).
Run this command and note the "Idx" of the useful VirtualBox network adapters: netsh interface ipv4 show inter.
Run this command to disable the IP auto configuration: netsh interface ipv4 set interface <idx> dadtransmits=0 store=persistent. Replace <idx> with each index found in the previous step.
Restart Windows
Afterwards, I was able to docker-machine start default, then docker-machine env default --shell cmd, put the PC to sleep, wake up and run docker-machine env default --shell cmd again.
I found that removing 'host only adapter' (File->Preferences->Network on VirtualBox), and restart the docker-machine helps.
Not a real solution. But probably better over restart the computer.
Having tried all the other answers here, and having varying but not consistent success, the following seems to reliably bring it back for me after this problem occurs.
Open a powershell/command window (I have most success if I run all docker-machine commands in a powershell window opened as administrator, I don't know if that's important or not) then run (where "dev" is the name of your docker machine instance):
docker-machine ssh dev
Then on the terminal that is opened, run:
sudo shutdown -r now
When the machine restarts, it seems to refresh the network and work correctly. Note, however, that simply running docker-machine restart dev did not have the same effect for me.
Your machine needs to be running before you can do the ssh, so if it's not running, execute docker-machine start dev before trying to SSH.
Had the same problem on Windows 8.1 and docker toolbox 1.12.0
None of the above solutions worked for me, too.
[edited]
Found another way to make docker work after system wake up:
In the docker Quickstart Terminal window, stop docker process Ctrl-C (if it is still running)
Run command docker-compose down
Shut down docker with docker-machine stop default
Exit terminal window Ctrl-D
Run Quickstart Terminal again and do all subsequent steps you need.
This worked for me, on Windows host machine.
Configure your network adapter to
1) Allow the network adapter to wake the computer,
2) Allow a magic packet to wake the computer,
3) Allow IPV6
http://www.worldstart.com/dropped-internet-connection-in-sleep-mode/
Also, on virtual box network settings, go to advanced, and allow promiscuous mode to VM machines, or allow all

FreeNX(nomachine) unable to connect after cloning of a working ubuntu EC2 instance

I have previously setup a EC2 instance on Ubuntu 10.04 and setup the necessary binaries to allow ssh and more importantly FreeNX(no machine) to work on my MacOS-10.6 machine.
As this was done on a micro instance, i was keen to try it on small instance today so i created a AMI image from the aws management console(browser) and launch a new small instance using the image with the exact same keypair and security setting.
Expecting the instance to work exactly the same(except much faster) i tried to connect to it using SSH and FreeNX again.
Result:
SSH is working fine and my env look exactly the same.
NX is unable to connect.
it complain username/password is incorrect.
I wonder why this is happen since i did an exact clone of the EC2 instance and i can connect fine using NX with the previous instance?
I had the same issue, and after a lot of searching fixed it. It seems freenx lost the usernames and passwords. I fixed it by doing the following:
log in with putty as ubuntu user then
cd /etc/nxserver
sudo vim node.conf
set ENABLE_PASSDB_AUTHENTICATION="1" and save the file
then
sudo nxserver --adduser xxxxxx
sudo nxserver --passwd yyyyyy
sudo nxserver --restart
after that I was able to log in using nomachine with the username and password I just set.

Resources