I use the OpenShift client on macOS to get an OpenShift cluster up and running. I can log in on both the command line and the web console. But when I try to install an application, e.g. my own simple Spring Boot application or the openshift/jenkins template, the deployment process gets stuck and I get a couple of errors:
Failed Sync: Error syncing pod
Failed mount: MountVolume.SetUp failed for volume "kubernetes.io/secret/972d63f8-bc0e-11e7-b3e2-025000000001-deployer-token-n8bdv" (spec.Name: "deployer-token-n8bdv") pod "972d63f8-bc0e-11e7-b3e2-025000000001" (UID: "972d63f8-bc0e-11e7-b3e2-025000000001") with: exit status 1
Failed mount: Unable to mount volumes for pod "jenkins-1-deploy_myproject(972d63f8-bc0e-11e7-b3e2-025000000001)": timeout expired waiting for volumes to attach/mount for pod "myproject"/"jenkins-1-deploy". list of unattached/unmounted volumes=[deployer-token-n8bdv]
Any hints? What does the exit status 1 mean? Or can someone point me to the source code where I can look up the MountVolume.SetUp method?
$ oc version
oc v3.6.0+c4dd4cf
kubernetes v1.6.1+5115d708d7
features: Basic-Auth
Server https://127.0.0.1:8443
openshift v3.6.0+c4dd4cf
kubernetes v1.6.1+5115d708d7
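The "Unable to mount volumes" event above names both the pod and the volume that never got mounted. As a minimal sketch (the function name and the regular expressions are my own and not part of oc or Kubernetes), the following pulls those pieces out of an event message so it is clear which secret to inspect:

```python
import re

def parse_unmounted(event):
    """Return (namespace, pod, [volume names]) from an 'Unable to mount volumes' event, or None."""
    pod = re.search(r'for pod "([^"]+)"/"([^"]+)"', event)
    vols = re.search(r'unattached/unmounted volumes=\[([^\]]*)\]', event)
    if not pod or not vols:
        return None
    # The list is assumed to be whitespace-separated inside the brackets.
    return pod.group(1), pod.group(2), vols.group(1).split()

# Event text copied from the question above.
event = (
    'Unable to mount volumes for pod '
    '"jenkins-1-deploy_myproject(972d63f8-bc0e-11e7-b3e2-025000000001)": '
    'timeout expired waiting for volumes to attach/mount for pod '
    '"myproject"/"jenkins-1-deploy". '
    'list of unattached/unmounted volumes=[deployer-token-n8bdv]'
)

if __name__ == "__main__":
    print(parse_unmounted(event))
```

With the namespace, pod, and volume name in hand, `oc describe pod jenkins-1-deploy -n myproject` and `oc get secret deployer-token-n8bdv -n myproject` are reasonable next things to look at.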
I set up a master node on an Ubuntu 18.04 machine and followed the guide https://kubernetes.io/ja/docs/setup/production-environment/windows/user-guide-windows-nodes/ to register a Windows Server 2019 node with the cluster successfully.
Now the kubelet has been started in PowerShell and both nodes are Ready.
On the Windows machine, I ran "kubectl create deployment --image=XXX webadmin-app" (where XXX is a Windows Server 2019 image) to create a deployment on the Windows node.
When creating the pod, kubelet reports the following log messages:
W0821 17:37:03.003768 99524 pod_container_deletor.go:77] Container "dee4daa76a9e60e0e68af75597092aa5cff517c7021a6ef7579f77f662f2a163" not found in pod's containers
W0821 17:37:03.071774 99524 helpers.go:289] Unable to create pod sandbox due to conflict. Attempting to remove sandbox "dee4daa76a9e60e0e68af75597092aa5cff517c7021a6ef7579f77f662f2a163"
E0821 17:37:03.108764 99524 remote_runtime.go:200] CreateContainer in sandbox "62ff282461eba2fae24a66b7d38ccca43b224c74320dbb5a0a4659b4c4446eb7" from runtime service failed: rpc error: code = Unknown desc = Error response from daemon: Conflict. The container name "/k8s_webadmin-site_webadmin-app-757c7455cf-nms75_default_7ac60567-f9e2-4c04-aead-c6957200c961_0" is already in use by container "dee4daa76a9e60e0e68af75597092aa5cff517c7021a6ef7579f77f662f2a163". You have to remove (or rename) that container to be able to reuse that name.
E0821 17:37:03.109762 99524 kuberuntime_manager.go:801] container start failed: CreateContainerError: Error response from daemon: Conflict. The container name "/k8s_webadmin-site_webadmin-app-757c7455cf-nms75_default_7ac60567-f9e2-4c04-aead-c6957200c961_0" is already in use by container "dee4daa76a9e60e0e68af75597092aa5cff517c7021a6ef7579f77f662f2a163". You have to remove (or rename) that container to be able to reuse that name.
E0821 17:37:03.113766 99524 pod_workers.go:191] Error syncing pod 7ac60567-f9e2-4c04-aead-c6957200c961 ("webadmin-app-757c7455cf-nms75_default(7ac60567-f9e2-4c04-aead-c6957200c961)"), skipping: failed to "StartContainer" for "webadmin-site" with CreateContainerError: "Error response from daemon: Conflict. The container name "/k8s_webadmin-site_webadmin-app-757c7455cf-nms75_default_7ac60567-f9e2-4c04-aead-c6957200c961_0" is already in use by container "dee4daa76a9e60e0e68af75597092aa5cff517c7021a6ef7579f77f662f2a163". You have to remove (or rename) that container to be able to reuse that name."
These failure messages keep being generated while the pod is being created.
So I listed the Docker containers with "docker ps" while the pod was being created. It seems that the kubelet keeps creating and removing containers (based on the specified image XXX).
How can I resolve this failure to create a deployment on the Windows node?
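To see which stale container is blocking the name, the conflict messages can be scanned for the container ID they mention. This is only a sketch: the regular expression is written against the log lines quoted above, and the function name is hypothetical.

```python
import re

# Matches the Docker daemon's name-conflict message as it appears in the kubelet log.
CONFLICT = re.compile(
    r'The container name "(?P<name>[^"]+)" is already in use by container "(?P<id>[0-9a-f]+)"'
)

def conflicting_containers(log_lines):
    """Map each blocking container ID to the container name it holds."""
    found = {}
    for line in log_lines:
        match = CONFLICT.search(line)
        if match:
            found[match.group("id")] = match.group("name")
    return found

# Message text copied from the kubelet output above.
sample = [
    'container start failed: CreateContainerError: Error response from daemon: Conflict. '
    'The container name "/k8s_webadmin-site_webadmin-app-757c7455cf-nms75_default_'
    '7ac60567-f9e2-4c04-aead-c6957200c961_0" is already in use by container '
    '"dee4daa76a9e60e0e68af75597092aa5cff517c7021a6ef7579f77f662f2a163". '
    'You have to remove (or rename) that container to be able to reuse that name.'
]

if __name__ == "__main__":
    for container_id, name in conflicting_containers(sample).items():
        print(container_id, name)
```

Each ID it reports can be passed to `docker rm -f <id>`. Whether that clears the loop for good depends on why the container keeps failing in the first place, so treat it as a diagnostic step rather than a fix.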
Update:
I can create a deployment if --image is set to mcr.microsoft.com/k8s/core/pause:1.2.0.
However, if I use another image that is based on Windows Nano Server, the deployment fails.
I captured some error logs from the kubelet output:
W0826 17:58:14.903662 127340 cri_stats_provider_windows.go:89] Failed to get HNS endpoint "" with error 'json: cannot unmarshal array into Go value of type hns.HNSEndpoint', continue to get stats for other endpoints
E0826 17:58:14.940665 127340 cni.go:364] Error adding default_webadmin-app-757c7455cf-5c5nf/9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5 to network flannel/vxlan0: error while ProvisionEndpoint(9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5_vxlan0,39EA65C2-C0F9-4870-8B7A-E2A1DBF5CD9D,9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5): The virtual machine or container was forcefully exited.
E0826 17:58:14.942663 127340 cni_windows.go:59] error while adding to cni network: error while ProvisionEndpoint(9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5_vxlan0,39EA65C2-C0F9-4870-8B7A-E2A1DBF5CD9D,9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5): The virtual machine or container was forcefully exited.
W0826 17:58:14.943663 127340 docker_sandbox.go:400] failed to read pod IP from plugin/docker: networkPlugin cni failed on the status hook for pod "webadmin-app-757c7455cf-5c5nf_default": error while ProvisionEndpoint(9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5_vxlan0,39EA65C2-C0F9-4870-8B7A-E2A1DBF5CD9D,9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5): The virtual machine or container was forcefully exited.
From the error messages, it seems that the container image is what causes the pod to fail. Did I miss anything when setting up my own image? Why does the network plugin fail for my image but work for mcr.microsoft.com/k8s/core/pause:1.2.0?
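When the kubelet output gets this noisy, it can help to group the error lines by the source file that emitted them. The sketch below assumes the usual klog header layout (severity letter, mmdd, time, process/thread ID, file:line, then the message), which matches the lines quoted above; the function names are hypothetical.

```python
import re
from collections import Counter

# klog header: Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg
KLOG = re.compile(
    r'^(?P<sev>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) '
    r'(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<pid>\d+) '
    r'(?P<src>[^:\s]+:\d+)\] (?P<msg>.*)$'
)

def parse_klog(line):
    """Split one klog line into its header fields and message, or return None."""
    match = KLOG.match(line)
    return match.groupdict() if match else None

def errors_by_source(lines):
    """Count error (E) and fatal (F) lines per source file:line."""
    counts = Counter()
    for line in lines:
        record = parse_klog(line)
        if record and record["sev"] in ("E", "F"):
            counts[record["src"]] += 1
    return counts

# Headers copied from the kubelet output above; messages truncated for brevity.
sample = [
    'W0826 17:58:14.903662 127340 cri_stats_provider_windows.go:89] Failed to get HNS endpoint ""',
    'E0826 17:58:14.940665 127340 cni.go:364] Error adding default_webadmin-app-757c7455cf-5c5nf',
    'E0826 17:58:14.942663 127340 cni_windows.go:59] error while adding to cni network',
]

if __name__ == "__main__":
    for source, count in errors_by_source(sample).items():
        print(source, count)
```

In this excerpt both error lines come from the CNI code path, which points at endpoint provisioning rather than at the image pull.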
I started using a Mac and use Docker to develop my sites. Until 12 ago my projects were running properly.
The last thing I did on the machine was install MySQL with brew, which I have since removed. After I restarted my machine, I noticed that I can no longer start any Docker container or view a project.
I cd into my project folder and run
docker-compose up
The following error shows up:
Sydney-de-Sousas-Mac:scottishleader.co.za sidney$ docker-compose up
Starting scottishleader_mariadb ... error
ERROR: for scottishleader_mariadb Cannot start service mariadb: driver failed programming external connectivity on endpoint scottishleader_mariadb (30a2e258468e8840ca5e67a535835603770afadab58b3e460e624ae3ba587293): Error starting userland proxy: /forwards/expose/port returned unexpected status: 500
ERROR: for mariadb Cannot start service mariadb: driver failed programming external connectivity on endpoint scottishleader_mariadb (30a2e258468e8840ca5e67a535835603770afadab58b3e460e624ae3ba587293): Error starting userland proxy: /forwards/expose/port returned unexpected status: 500
ERROR: Encountered errors while bringing up the project.
Sydney-de-Sousas-Mac:scottishleader.co.za sidney$
I run docker ps and no container is listed:
Sydney-de-Sousas-Mac:scottishleader.co.za sidney$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
Why can I suddenly not start the container?
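One possible cause, given that a brew-installed MySQL was on the machine shortly before, is that something on the host still holds the port the mariadb service publishes. The sketch below checks whether a TCP port can be bound locally. The port 3306 is an assumption (the MySQL/MariaDB default); the compose file is not shown, so substitute whatever host port it actually maps.

```python
import socket

def port_is_free(port, host="127.0.0.1"):
    """Return True if this process can bind host:port, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use"; binding a port below 1024
            # without privileges also ends up here.
            return False
    return True

if __name__ == "__main__":
    port = 3306  # assumption: default MySQL/MariaDB port, not taken from the question
    print(port, "is free" if port_is_free(port) else "is in use")
```

If the port turns out to be in use, `lsof -i :3306` on the Mac shows which process owns it. If it is free, the problem is more likely inside Docker itself, and restarting Docker is the next thing to try.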
I get the following error when trying to get ddev running with NFS mounts on Windows 10:
ERROR: for web Cannot start service web: error while
mounting volume '/var/lib/docker/volumes/ddev-drupal8_nfsmount/_data': error while
mounting volume with options:
type='nfs' device=':/D/seu/source/drupal8prototyp_' o='addr=host.docker.internal,hard,
nolock,rw': connection refused
I have checked the Windows Defender FW and added the following lines to the nfs_exports.txt file
Any hints on that topic?
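Since the mount fails with "connection refused", a first check is whether anything answers on the NFS port at all. This is a rough sketch, assuming the NFS server listens on the standard port 2049 and that host.docker.internal resolves from wherever the script is run; neither is confirmed by the error output above.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

if __name__ == "__main__":
    # Assumptions: standard NFS port, host name taken from the mount options above.
    print(can_connect("host.docker.internal", 2049))
```

A result of False suggests the NFS server is not running or is blocked by the firewall, rather than a problem with the export entries themselves.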
I have ICP V2.1 installed in a RHEL VMware image. After rebooting the image, ICP fails to start in what appears to be the first known issue in the documentation (Kubernetes controller manager fails to start after a master or cluster restart). However, the prescribed resolution does not get my system going.
Here is the running pod list:
NAME READY STATUS RESTARTS AGE
calico-node-amd64-dtl47 2/2 Running 14 20h
filebeat-ds-amd64-mvcsj 1/1 Running 8 20h
k8s-etcd-192.168.232.131 1/1 Running 7 20h
k8s-mariadb-192.168.232.131 1/1 Running 7 20h
k8s-master-192.168.232.131 2/3 CrashLoopBackOff 15 17m
k8s-proxy-192.168.232.131 1/1 Running 7 20h
metering-reader-amd64-gkwt4 1/1 Running 7 20h
monitoring-prometheus-nodeexporter-amd64-sghrv 1/1 Running 7 20h
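For longer pod lists, the unhealthy entries can be filtered out mechanically. A small sketch, assuming the column layout shown above (NAME, READY, STATUS, RESTARTS, AGE); the function name is hypothetical.

```python
def unhealthy_pods(table):
    """Return (name, ready, status, restarts) for pods that are not fully ready or not Running."""
    rows = []
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        name, ready, status, restarts = fields[:4]
        have, want = ready.split("/")
        if have != want or status != "Running":
            rows.append((name, ready, status, int(restarts)))
    return rows

# Pod list copied from the question above.
pod_list = """
NAME READY STATUS RESTARTS AGE
calico-node-amd64-dtl47 2/2 Running 14 20h
filebeat-ds-amd64-mvcsj 1/1 Running 8 20h
k8s-etcd-192.168.232.131 1/1 Running 7 20h
k8s-mariadb-192.168.232.131 1/1 Running 7 20h
k8s-master-192.168.232.131 2/3 CrashLoopBackOff 15 17m
k8s-proxy-192.168.232.131 1/1 Running 7 20h
metering-reader-amd64-gkwt4 1/1 Running 7 20h
monitoring-prometheus-nodeexporter-amd64-sghrv 1/1 Running 7 20h
"""

if __name__ == "__main__":
    for row in unhealthy_pods(pod_list):
        print(row)
```

Note that this simple filter would also flag pods that finished normally (status Completed), so it is a triage aid rather than a health check.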
Removing the k8s-master-192.168.232.131 pod and allowing it to restart only puts it back into the CrashLoopBackOff state. Here is the last line of the controller manager log:
F1029 23:55:07.345341 1 controllermanager.go:176] error building controller context: failed to get supported resources from server: unable to retrieve the complete list of server APIs: servicecatalog.k8s.io/v1alpha1: an error on the server ("Error: 'dial tcp 10.0.0.145:443: getsockopt: connection refused'\nTrying to reach: 'https://10.0.0.145:443/apis/servicecatalog.k8s.io/v1alpha1'") has prevented the request from succeeding
Removing the pod or removing the failed controller master docker container directly has no effect. It seems like another service hasn't started yet, or failed to start. I've waited several hours to see if the issue resolves itself, but to no avail.
Thanks...
Before the fix in https://github.com/kubernetes/kubernetes/pull/49495, the Kubernetes controller manager failed to start if a registered extension apiserver was not ready. In ICP, the service catalog is implemented as an extension apiserver.
Usually, after the ICP master is restarted, the kubelet first starts the k8s management services as static pods. After that, it gets the pod, node, and service information from the Kubernetes API server and then starts all the pods, including the catalog API service. In that case, the whole cluster recovers.
In your case, however, there is a race condition: when the kubelet gets the pod information from the Kubernetes API server and starts all the pods, it has not yet received the node information from the API server. As a result, the kubelet fails to start the catalog API service because its nodeSelector is not met, and the whole cluster fails to recover.
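To illustrate why missing node information is enough to block the pod: a nodeSelector matches only if every key/value pair it lists is present in the node's labels. The sketch below shows just that matching rule; the label key and value are hypothetical and not taken from ICP, and the kubelet's real admission logic involves more than this.

```python
def node_selector_matches(node_selector, node_labels):
    """True if every key/value pair in the selector appears in the node's labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())

# Hypothetical selector and labels, for illustration only.
selector = {"role": "master"}
labels_after_sync = {"role": "master", "kubernetes.io/hostname": "192.168.232.131"}
labels_before_sync = {}  # node information has not arrived yet

if __name__ == "__main__":
    print(node_selector_matches(selector, labels_after_sync))
    print(node_selector_matches(selector, labels_before_sync))
```

With an empty label set, any non-empty selector fails to match, which is consistent with the catalog API service being rejected when the pods are processed before the node information is known.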
In the next release, ICP 2.1.0.1, Kubernetes will be upgraded to 1.8.2, which includes the fix from https://github.com/kubernetes/kubernetes/pull/49495. That will resolve the issue completely.
Until then, you can try the following workaround.
Use the -s form of each kubectl command if your token has expired after the restart and you no longer have access to the GUI to re-establish it.
1. Delete the apiservice v1alpha1.servicecatalog.k8s.io:
kubectl delete apiservices v1alpha1.servicecatalog.k8s.io
kubectl -s 127.0.0.1:8888 delete apiservices v1alpha1.servicecatalog.k8s.io
2. Delete the dead controller manager container:
docker rm <k8s controller manager>
3. Wait until the service catalog has started.
4. Recover the service catalog by re-registering the apiservice v1alpha1.servicecatalog.k8s.io:
kubectl apply -f cluster/cfc-components/service-catalog/apiregistration.yaml
kubectl -s 127.0.0.1:8888 apply -f cluster/cfc-components/service-catalog/apiregistration.yaml
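The order of the workaround commands matters, so it may help to have them in one place. The sketch below only builds and prints the commands from the workaround above, in order, optionally with the -s form. It does not execute anything, and the container placeholder is left exactly as in the answer because the real name has to be looked up on the master.

```python
def workaround_commands(server=None):
    """Return the workaround commands in order, as argument lists. Nothing is executed."""
    kubectl = ["kubectl"] + (["-s", server] if server else [])
    return [
        kubectl + ["delete", "apiservices", "v1alpha1.servicecatalog.k8s.io"],
        # Placeholder kept from the answer; replace with the actual container name or ID.
        ["docker", "rm", "<k8s controller manager>"],
        # Wait for the service catalog to start before running the last command.
        kubectl + ["apply", "-f", "cluster/cfc-components/service-catalog/apiregistration.yaml"],
    ]

if __name__ == "__main__":
    for command in workaround_commands(server="127.0.0.1:8888"):
        print(" ".join(command))
```

Calling `workaround_commands()` without a server gives the plain kubectl form for the case where the token is still valid.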
When I start a Docker container on mesos-slave,
the mesos-slave log shows:
I1223 15:38:40.822557 258486272 docker.cpp:761] Starting container 'ea1ed2fa-c2e3-469a-bcc4-142e0a6c624d' for task '2-1.2fb839ea-a948-11e5-9c42-2e7bf2aa25a6' (and executor '2-1.2fb839ea-a948-11e5-9c42-2e7bf2aa25a6') of framework '13165a00-8e58-4d80-b84d-fe4652022a3e-0000'
E1223 15:38:41.219044 254730240 slave.cpp:3342] Container 'ea1ed2fa-c2e3-469a-bcc4-142e0a6c624d' for executor '2-1.2fb839ea-a948-11e5-9c42-2e7bf2aa25a6' of framework '13165a00-8e58-4d80-b84d-fe4652022a3e-0000' failed to start: Failed to 'docker -H unix:///var/run/docker.sock pull python:3': exit status = exited with status 1 stderr = An error occurred trying to connect: Post https:///var/run/docker.sock/v1.19/images/create?fromImage=python%3A3: dial unix /var/run/docker.sock: no such file or directory
From that, I can see that mesos-slave executes docker -H unix:///var/run/docker.sock pull python:3 to download the image.
But I use Mac OS X, and Docker runs inside the VM,
so there is no docker.sock file on my slave, only in the VM.
How could I solve this issue on Mac OS X? Please help, thanks!
I had this issue too. I was trying to run zookeeper, mesos, marathon, and friends all in one docker-compose file on my Mac.
Trying to run things in marathon would cause the exact error that you got. I discovered that the docker daemon was not running on the slave. All I had to do was start it: docker exec <mesos-slave-container-name> sudo service docker start.
From then on, I was able to run docker containers using marathon.
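To confirm this is the same situation before starting the daemon, you can check from inside the slave whether the socket the log complains about exists. A minimal sketch, assuming Python is available in the slave container; the function name is hypothetical and the default path is the one from the error message.

```python
import os
import stat

def docker_socket_present(path="/var/run/docker.sock"):
    """Return True if path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing file, missing parent directory, or no permission to stat it.
        return False

if __name__ == "__main__":
    print(docker_socket_present())
```

A result of False matches the "no such file or directory" error in the question. A result of True only shows that the socket file exists; a stale socket can remain after the daemon has stopped, so `docker info` is still worth running afterwards.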