fleetctl load hello.service hangs - amazon-ec2

When loading the hello.service from the coreos quickstart.
Am running on amazon ec2.
The command hangs.
Last login: Fri Sep 12 18:47:20 2014 from 10.0.11.90
CoreOS (alpha)
core#ip-10-32-252-148 ~ $ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=435.0.0
VERSION_ID=435.0.0
BUILD_ID=
PRETTY_NAME="CoreOS 435.0.0"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
core#ip-10-32-252-148 ~ $ fleetctl list-machines
MACHINE IP METADATA
2be4e7f1... 10.32.252.144 -
73477fd7... 10.32.252.147 -
87abdeb3... 10.32.252.145 -
adc91c39... 10.32.252.148 -
core#ip-10-32-252-148 ~ $ fleetctl list-units
UNIT MACHINE ACTIVE SUB
core#ip-10-32-252-148 ~ $ fleetctl load hello.service
Here's the user-data that I used for amazon ec2.

Related

Hashicorp - vault service fails to start

I have setup Hashicorp - vault (Vault v1.5.4) on Ubuntu 18.04. My backend is Consul (single node running on same server as vault) - consul service is up.
My vault service fails to start
systemctl list-units --type=service | grep "vault"
vault.service loaded failed failed vault service
journalctl -xe -u vault
Oct 03 00:21:33 ubuntu2 systemd[1]: vault.service: Scheduled restart job, restart counter is at 5.
-- Subject: Automatic restarting of a unit has been scheduled
- Unit vault.service has finished shutting down.
Oct 03 00:21:33 ubuntu2 systemd[1]: vault.service: Start request repeated too quickly.
Oct 03 00:21:33 ubuntu2 systemd[1]: vault.service: Failed with result 'exit-code'.
Oct 03 00:21:33 ubuntu2 systemd[1]: Failed to start vault service.
-- Subject: Unit vault.service has failed
vault config.json
"api_addr": "http://<my-ip>:8200",
storage "consul" {
address = "127.0.0.1:8500"
path = "vault"
},
Service config
StandardOutput=/opt/vault/logs/output.log
StandardError=/opt/vault/logs/error.log
cat /opt/vault/logs/error.log
cat: /opt/vault/logs/error.log: No such file or directory
cat /opt/vault/logs/output.log
cat: /opt/vault/logs/output.log: No such file or directory
sudo tail -f /opt/vault/logs/error.log
tail: cannot open '/opt/vault/logs/error.log' for reading: No such file or
directory
:/opt/vault/logs$ ls -al
total 8
drwxrwxr-x 2 vault vault 4096 Oct 2 13:38 .
drwxrwxr-x 5 vault vault 4096 Oct 2 13:38 ..
After much debugging, the issue was silly goofup mixing .hcl and .json (they are so similar - but different) - cut-n-paste between stuff the storage (as posted) needs to be in json format. The problem is of course compounded when the error message saying nothing and there is nothing in the logs.
"storage": {
"consul": {
"address": "127.0.0.1:8500",
"path" : "vault"
}
},
There were a couple of other additional issues to sort out to get it going- disable_mlock : true, opening the firewall for 8200: sudo ufw allow 8200/tcp.
Finally got done (rather started).

How to connect Minikube on Hyperkit on MacBook

I'm trying to connect to Hyperkit to check containers running on this VM.
All I'm getting now is [screen is terminating]
Here is what I do:
MacBook-Pro-Karol: ~
β†’ minikube start --driver=hyperkit
πŸ˜„ minikube v1.12.3 na Darwin 10.15.6
✨ Using the hyperkit driver based on user configuration
πŸ‘ Starting control plane node minikube in cluster minikube
πŸ”₯ Creating hyperkit VM (CPUs=2, Memory=4000MB, Disk=20000MB) ...
🐳 preparing Kubernetes v1.18.3 on Docker 19.03.12...
πŸ”Ž Verifying Kubernetes components...
🌟 Enabled addons: default-storageclass, storage-provisioner
πŸ„ Ready! kubectl is configured to be used with "minikube".
MacBook-Pro-Karol: ~
β†’ sudo screen /Users/karol/.minikube/machines/minikube/tty
Password:
[screen is terminating]
MacBook-Pro-Karol: ~
β†’ screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
[screen is terminating]
Cannot exec '/Users/karol/Library/Containers/com.docker.docker/Data/vms/0/tty': Permission denied
β†’ sudo screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
Password:
[screen is terminating]
Cannot exec '/Users/karol/Library/Containers/com.docker.docker/Data/vms/0/tty': Operation not permitted
Any help would be appreciated.
You can use minikube ssh to login in to VM that minikube runs in:
Log into or run a command on a machine with SSH; similar to
β€˜docker-machine ssh’.
minikube ssh [flags]
and then use docker ps to check the running containers inside this VM:
$ docker ps | grep kube-api
f53aebd26287 7e28efa976bd "kube-apiserver --ad…" 16 minutes ago k8s_kube-apiserver_kube-apiserver-minikube_kube-system_8009646ba816631d0677c2668886baad_1
12188a523d12 k8s.gcr.io/pause:3.2 "/pause" 16 minutes ago k8s_POD_kube-apiserver-minikube_kube-system_8009646ba816631d0677c2668886baad_1

Windows Docker Desktop Linux mode - docker container time skew

Question: How can I map docker container time to my local PC time to sync the time inside the docker container?
From my windows 10 PC, I am running Linux mode docker desktop version 2.2.0.4 (43472), docker Engine 19.03.8.
All the docker containers created are showing massive time skew from that of the host:
From centos 8 docker container:
[root# /]# date
Thu May 7 01:18:16 UTC 2020
From docker host running Window Doker desktop on Windows 10 PC:
PS> date
14 May 2020 14:42:17
I tried to create a new container with -v option as below:
docker container run -it -v c:\docker_volumes\docker1:/storage -v /etc/localtime:/etc/localtime:ro --name centos7-squid centos:7.7.1908 /bin/bash
I get the error below
Unable to find image 'centos:7.7.1908' locally
7.7.1908: Pulling from library/centos
f34b00c7da20: Pull complete Digest: sha256:50752af5182c6cd5518e3e91d48f7ff0cba93d5d760a67ac140e2d63c4dd9efc
Status: Downloaded newer image for centos:7.7.1908
C:\Program Files\Docker\Docker\resources\bin\docker.exe: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\"/etc/localtime\\" to rootfs \\"/var/lib/docker/overlay2/c7e86cffdc46c354f19b25fa97146ce8f2caee653793219719b043c97040d1b7/merged\\" at \\"/var/lib/docker/overlay2/c7e86cffdc46c354f19b25fa97146ce8f2caee653793219719b043c97040d1b7/merged/usr/share/zoneinfo/UTC\\" caused \\"not a directory\\"\"": unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type.
I fixed it by setting the hardware clock of the virtual machine running docker:
docker run --rm --privileged alpine hwclock -s
credit:
https://blog.jverkamp.com/2017/11/15/clock-drift-in-docker-containers/

Can't kill processes (originating in a docker container)

I run a docker cluster with a few thousand containers and a few times per day randomly I have a process that gets "stuck" blocking a container from stopping. Below is an example container with its corresponding process and all things I have tried to kill the container / process.
The container:
# docker ps | grep 950677e2317f
950677e2317f 7e553d1d9f6f "/bin/sh -c /minecraf" 2 days ago Up 2 days 0.0.0.0:22661->22661/tcp, 0.0.0.0:22661->22661/udp, 0.0.0.0:37681->37681/tcp, 0.0.0.0:37681->37681/udp gloomy_jennings
Try to stop container using docker daemon (it tries forever without result):
# time docker stop --time=1 950677e2317f
^C
real 0m13.508s
user 0m0.036s
sys 0m0.008s
Daemon log while trying to stop:
# journalctl -fu docker.service
-- Logs begin at Fri 2015-12-11 15:40:55 CET. --
Dec 31 23:30:33 m3561.contabo.host docker[9988]: time="2015-12-31T23:30:33.164731953+01:00" level=info msg="POST /v1.21/containers/950677e2317f/stop?t=1"
Dec 31 23:30:34 m3561.contabo.host docker[9988]: time="2015-12-31T23:30:34.165531990+01:00" level=info msg="Container 950677e2317fcd2403ef5b5ffad37204e880136e91f76b0a8682e04a93e80942 failed to exit within 1 seconds of SIGTERM - using the force"
Dec 31 23:30:44 m3561.contabo.host docker[9988]: time="2015-12-31T23:30:44.165954266+01:00" level=info msg="Container 950677e2317f failed to exit within 10 seconds of kill - trying direct SIGKILL"
Looking into the processes running on the machine reveals the zombie process (pid 11991 on host machine):
# ps aux | grep [1]1991
root 11991 84.3 0.0 5836 132 ? R Dec30 1300:19 bash -c (echo stop > /tmp/minecraft &)
# top -b | grep [1]1991
11991 root 20 0 5836 132 20 R 89.5 0.0 1300:29 bash
And it is indeed a process running within our container (check container id):
# cat /proc/11991/mountinfo
...
/var/lib/docker/containers/950677e2317fcd2403ef5b5ffad37204e880136e91f76b0a8682e04a93e80942/resolv.conf /etc/resolv.conf rw,relatime - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
Trying to kill the process yields nothing:
# kill -9 11991
# ps aux | grep [1]1991
root 11991 84.3 0.0 5836 132 ? R Dec30 1303:58 bash -c (echo stop > /tmp/minecraft &)
Some overview data:
# docker version
Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 13:20:08 UTC 2015
OS/Arch: linux/amd64
Server:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 13:20:08 UTC 2015
OS/Arch: linux/amd64
# docker info
Containers: 189
Images: 322
Server Version: 1.9.1
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 700
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.0-19-generic
Operating System: Ubuntu 15.10
CPUs: 24
Total Memory: 125.8 GiB
Name: m3561.contabo.host
ID: ZM2Q:RA6Q:E4NM:5Q2Q:R7E4:BFPQ:EEVK:7MEO:YRH6:SVS6:RIHA:3I2K
# uname -a
Linux m3561.contabo.host 4.2.0-19-generic #23-Ubuntu SMP Wed Nov 11 11:39:30 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
If stopping the docker daemon the process still lives. The only way to get rid of the process is to restart the host machine. As this happens fairly frequently (requires every node to restart every 3-7 days) it has a serious impact on the uptime of the overall cluster.
Any ideas on what to do here?
Okay, I think I found the root cause of this. The folks over at Docker helped me out, check out this thread on GitHub.
It turns out this most likely is a bug in the Linux Kernel 4.19+. I'll be rolling back to an older version until it is fixed.
UPDATE: I've been running 3.* only in my cluster for several days now without any issues. This was most certainly a kernel bug.
I had a similar problem and switching to use overlay2 storage driver made the problem go away. Changing the storage driver will loose all docker state (images & containers). It seems that the aufs storage driver has some problems that might be a source of lock ups.

Docker-Machine and Swarm behind proxy

I'm traing to set up docker swarm over my virtual cluster. First, I try to install the swarm-master on the localhost with docker-machine.
The problem is that the machine need to use a proxy to access the discovery token.
First I ask a token with swarm create. To do that, I created this file :
$cat /etc/systemd/system/docker.service.d/http_proxy.conf
[Service]
Environment="HTTP_PROXY=http://**.**.**.**:3128/" "HTTPS_PROXY=http://**.**.**.**:3128/" "NO_PROXY=localhost,127.0.0.1,192.168.2.100,192.168.2.101,192.168.2.102,192.168.2.103,192.168.2.104,192.168.2.105,192.168.2.106,192.168.2.107,192.168.2.108,192.168.2.194,192.168.2.110"
I restarted the daemon and I can pull the swarm image :
$docker run -e "http_proxy=http://**.**.**.**:3128/" -e "https_proxy=http://**.**.**.**:3128/" swarm create
b54d8665e72939d2c611d8f9e99521b4
After I want to create the swarm master :
$docker-machine create -d generic --generic-ip-address localhost \
--engine-env HTTP_PROXY=http://192.168.254.10:3128/ \
--engine-env HTTPS_PROXY=http://192.168.254.10:3128/ \
--engine-env NO_PROXY=localhost,192.168.2.102,192.168.2.100 \
--swarm --swarm-master --swarm-discovery \
token://b54d8665e72939d2c611d8f9e99521b4 swarm-master
Result :
Running pre-create checks...
Creating machine...
Waiting for machine to be running, this may take a few minutes...
Machine is running, waiting for SSH to be available...
Detecting operating system of created instance...
Provisioning created instance...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Configuring swarm...
To see how to connect Docker to this machine, run: docker-machine env swarm-master
And I have errors in the logs of the join and manage container (I think the error come because the containers don't take care of the proxy) :
$docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6fbf967cdb60 swarm:latest "/swarm join --advert" 53 seconds ago Up 52 seconds 2375/tcp swarm-agent
8b176116989e swarm:latest "/swarm manage --tlsv" 54 seconds ago Up 53 seconds 2375/tcp, 0.0.0.0:3376->3376/tcp swarm-agent-master
$docker logs 6fbf967cdb60
time="2015-11-17T19:37:21Z" level=info msg="Registering on the discovery service every 20s..." addr="localhost:2376" discovery="token://b54d8665e72939d2c611d8f9e99521b4"
time="2015-11-17T19:37:41Z" level=error msg="Post https://discovery.hub.docker.com/v1/clusters/b54d8665e72939d2c611d8f9e99521b4?ttl=60: dial tcp: lookup discovery.hub.docker.com on 8.8.4.4:53: read udp 172.17.0.3:46576->8.8.4.4:53: i/o timeout"
$docker logs 8b176116989e
time="2015-11-17T19:37:20Z" level=info msg="Listening for HTTP" addr="0.0.0.0:3376" proto=tcp
time="2015-11-17T19:37:40Z" level=error msg="Discovery error: Get https://discovery.hub.docker.com/v1/clusters/b54d8665e72939d2c611d8f9e99521b4: dial tcp: lookup discovery.hub.docker.com on 8.8.4.4:53: read udp 172.17.0.2:44241->8.8.4.4:53: i/o timeout"
Is it a bug of the generic driver ?
Some others informations :
# docker version
Client:
Version: 1.9.0
API version: 1.21
Go version: go1.4.2
Git commit: 76d6bc9
Built: Tue Nov 3 17:29:38 UTC 2015
OS/Arch: linux/amd64
Server:
Version: 1.9.0
API version: 1.21
Go version: go1.4.2
Git commit: 76d6bc9
Built: Tue Nov 3 17:29:38 UTC 2015
OS/Arch: linux/amd64
# docker info
Containers: 2
Images: 8
Server Version: 1.9.0
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 12
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
CPUs: 2
Total Memory: 1000 MiB
Name: swarm-master
ID: 6SDE:CQRA:NM6W:TY2H:4DPB:O4YO:IGRT:33AA:OKQP:M6UK:EMSR:H4WR
WARNING: No memory limit support
WARNING: No swap limit support
Labels:
provider=generic
Thank you :)
The problem was that it's not possible to use docker machine to create the swarm-master on the same machine. So I created two VM, one with docker-machine (and mh-keystore) and one other for swarm-master.
Creating the mh-keystore on localhost :
$docker-machine create -d generic --generic-ip-address localhost mh-keystore
$docker $(docker-machine config mh-keystore) run -d \
-p "8500:8500" \
-h "consul" \
progrium/consul -server -bootstrap
$docker ps
Installation of swarm-master to the other machine
$ docker-machine create \
-d generic --generic-ip-address 192.168.2.100 \
--swarm --swarm-image="swarm" --swarm-master \
--swarm-discovery="consul://192.168.2.103:8500" \
swarm-master
Creation of agent :
$ docker-machine create \
-d generic --generic-ip-address 192.168.2.102 \
--swarm \
--swarm-discovery="consul://192.168.2.103:8500" \
swarm-agent-00

Resources