Failed to create a Kubernetes pod on a Windows Server 2019 node

I set up a master node on an Ubuntu 18.04 machine and followed the guide: https://kubernetes.io/ja/docs/setup/production-environment/windows/user-guide-windows-nodes/ to register a Windows Server 2019 node to the cluster successfully.
The kubelet has been started in PowerShell and both nodes are Ready.
On the Windows machine, I ran the command "kubectl create deployment --image=XXX(a Windows Server 2019 based image) webadmin-app" to create a deployment on the Windows node.
When creating the pod, the kubelet reports the following log messages:
W0821 17:37:03.003768 99524 pod_container_deletor.go:77] Container "dee4daa76a9e60e0e68af75597092aa5cff517c7021a6ef7579f77f662f2a163" not found in pod's containers
W0821 17:37:03.071774 99524 helpers.go:289] Unable to create pod sandbox due to conflict. Attempting to remove sandbox "dee4daa76a9e60e0e68af75597092aa5cff517c7021a6ef7579f77f662f2a163"
E0821 17:37:03.108764 99524 remote_runtime.go:200] CreateContainer in sandbox "62ff282461eba2fae24a66b7d38ccca43b224c74320dbb5a0a4659b4c4446eb7" from runtime service failed: rpc error: code = Unknown desc = Error response from daemon: Conflict. The container name "/k8s_webadmin-site_webadmin-app-757c7455cf-nms75_default_7ac60567-f9e2-4c04-aead-c6957200c961_0" is already in use by container "dee4daa76a9e60e0e68af75597092aa5cff517c7021a6ef7579f77f662f2a163". You have to remove (or rename) that container to be able to reuse that name.
E0821 17:37:03.109762 99524 kuberuntime_manager.go:801] container start failed: CreateContainerError: Error response from daemon: Conflict. The container name "/k8s_webadmin-site_webadmin-app-757c7455cf-nms75_default_7ac60567-f9e2-4c04-aead-c6957200c961_0" is already in use by container "dee4daa76a9e60e0e68af75597092aa5cff517c7021a6ef7579f77f662f2a163". You have to remove (or rename) that container to be able to reuse that name.
E0821 17:37:03.113766 99524 pod_workers.go:191] Error syncing pod 7ac60567-f9e2-4c04-aead-c6957200c961 ("webadmin-app-757c7455cf-nms75_default(7ac60567-f9e2-4c04-aead-c6957200c961)"), skipping: failed to "StartContainer" for "webadmin-site" with CreateContainerError: "Error response from daemon: Conflict. The container name "/k8s_webadmin-site_webadmin-app-757c7455cf-nms75_default_7ac60567-f9e2-4c04-aead-c6957200c961_0" is already in use by container "dee4daa76a9e60e0e68af75597092aa5cff517c7021a6ef7579f77f662f2a163". You have to remove (or rename) that container to be able to reuse that name."
These failure messages keep repeating while the pod is being created.
So I listed the Docker containers using "docker ps" during the pod creation period. It seems that the kubelet keeps creating and removing containers (based on the specified image XXX).
How can I resolve this failure so I can create a deployment on the Windows node?
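For reference, the conflicting sandbox can be inspected and removed by hand on the Windows node between retries (a sketch only; this clears the immediate name conflict but the kubelet recreates the container right away):
docker ps -a --filter "name=k8s_webadmin"    # list the sandbox/app containers the kubelet keeps recreating
docker rm -f <conflicting-container-id>      # remove the container currently holding the name, using the ID from the log above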
Update:
I can create a deployment if --image is set to mcr.microsoft.com/k8s/core/pause:1.2.0.
However, if I use another image based on Windows Nano Server, the deployment fails.
I captured the following error logs from the kubelet output:
W0826 17:58:14.903662 127340 cri_stats_provider_windows.go:89] Failed to get HNS endpoint "" with error 'json: cannot unmarshal array into Go value of type hns.HNSEndpoint', continue to get stats for other endpoints
E0826 17:58:14.940665 127340 cni.go:364] Error adding default_webadmin-app-757c7455cf-5c5nf/9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5 to network flannel/vxlan0: error while ProvisionEndpoint(9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5_vxlan0,39EA65C2-C0F9-4870-8B7A-E2A1DBF5CD9D,9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5): The virtual machine or container was forcefully exited.
E0826 17:58:14.942663 127340 cni_windows.go:59] error while adding to cni network: error while ProvisionEndpoint(9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5_vxlan0,39EA65C2-C0F9-4870-8B7A-E2A1DBF5CD9D,9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5): The virtual machine or container was forcefully exited.
W0826 17:58:14.943663 127340 docker_sandbox.go:400] failed to read pod IP from plugin/docker: networkPlugin cni failed on the status hook for pod "webadmin-app-757c7455cf-5c5nf_default": error while ProvisionEndpoint(9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5_vxlan0,39EA65C2-C0F9-4870-8B7A-E2A1DBF5CD9D,9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5): The virtual machine or container was forcefully exited.
From the error messages, it seems that the container image is what causes the pod to fail. Did I miss anything when building my own image? Why does the network plugin fail for my image but work for mcr.microsoft.com/k8s/core/pause:1.2.0?
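On the Windows node I can also gather HNS/CNI diagnostics along these lines (a sketch; the hns.psm1 module and the c:\k path are assumptions based on the helper scripts that the linked guide downloads):
Import-Module c:\k\hns.psm1                           # helper module from the guide's setup scripts (path assumed)
Get-HnsNetwork                                        # the flannel vxlan0 network should be listed here
Get-HnsEndpoint                                       # orphaned endpoints from the failed ProvisionEndpoint calls show up here
kubectl describe pod webadmin-app-757c7455cf-5c5nf    # surfaces the FailedCreatePodSandBox / CNI events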

Related

Failed to shutdown container in Docker

When I try to build an image with the docker build command, I get the error below.
I am using Docker Desktop for Windows.
The command '/bin/sh -c yarn install --production' returned a non-zero code: 4294967295: failed to shutdown container: container 1842716825b498d7be9ed514a4839dfc528caafe8f906018c93842517f60635e encountered an error during hcsshim::System::waitBackground: failure in a Windows system call: The virtual machine or container with the specified identifier is not running. (0xc0370110): subsequent terminate failed container 1842716825b498d7be9ed514a4839dfc528caafe8f906018c93842517f60635e encountered an error during hcsshim::System::waitBackground: failure in a Windows system call: The virtual machine or container with the specified identifier is not running. (0xc0370110)
How can this error be resolved?
This is likely due to a failure on the Windows host in the Docker call that tries to shut down the container.
Try restarting the Docker daemon. On a Linux host:
sudo service docker restart
On Docker Desktop for Windows, restart Docker Desktop itself (see the PowerShell sketch at the end of this answer).
Check the status of the Docker containers:
docker ps -a
Stop and remove the problematic container:
docker stop <container-id>
docker rm <container-id>
Then try building the image again. If the problem still persists, consider upgrading to the latest version of Docker or clearing the Docker cache with:
docker system prune
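Since the question is about Docker Desktop for Windows rather than a Linux host, a hedged PowerShell alternative for restarting the daemon (run from an elevated prompt; com.docker.service is the backend service Docker Desktop normally registers):
Restart-Service com.docker.service
# or simply quit and restart Docker Desktop from the system tray icon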

Error syncing pod on starting Beam - Dataflow pipeline from docker

We are constantly getting an error while starting our Beam Golang SDK pipeline (driver program) from a Docker image, although it works when started from a local machine or a VM instance. We are using the Dataflow runner for our pipeline and Kubernetes to deploy.
LOCAL SETUP:
We have the GOOGLE_APPLICATION_CREDENTIALS variable set to the service account for our GCP cluster. When running the job locally, it gets submitted to Dataflow and completes successfully.
DOCKER SETUP:
The build image used is FROM golang:1.14-alpine. When we package the same program with a Dockerfile and try to run it, it fails with the error:
User program exited: fork/exec /bin/worker: no such file or directory
On checking Stackdriver logs for more details, we see this:
Error syncing pod 00014c7112b5049966a4242e323b7850 ("dataflow-go-job-1-1611314272307727-
01220317-27at-harness-jv3l_default(00014c7112b5049966a4242e323b7850)"),
skipping: failed to "StartContainer" for "sdk" with CrashLoopBackOff:
"back-off 2m40s restarting failed container=sdk pod=dataflow-go-job-1-
1611314272307727-01220317-27at-harness-jv3l_default(00014c7112b5049966a4242e323b7850)"
We found a reference to this error in the Dataflow common errors doc, but it is too generic to figure out what is failing. After multiple retries, we were able to rule out any permission or access related issues with the pods. We are not sure what else could be the problem here.
After multiple attempts, we decided to start the job manually from a new Debian 10 based VM instance, and it worked. This made us realize that we were using an Alpine-based golang image in Docker, which may not have all the dependencies required to start the job.
On the golang Docker Hub page we found golang:1.14-buster, where buster is the codename for Debian 10. Using that image for the docker build solved the issue. Self-answering here to help anyone else facing the same problem.
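A minimal sketch of the change plus a local smoke test (the image tag and credential paths below are placeholders, not our real values):
# In the Dockerfile, change the base image line from
#   FROM golang:1.14-alpine
# to
#   FROM golang:1.14-buster
docker build -t beam-go-driver .
docker run --rm \
  -e GOOGLE_APPLICATION_CREDENTIALS=/secrets/sa.json \
  -v /path/to/sa.json:/secrets/sa.json \
  beam-go-driver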

Docker Compose failed to build - Filesharing has been cancelled

I've run into an issue with Docker Desktop; I'm currently running the Edge version, as suggested by a user on Stack Overflow. Before this I got the "drive not shared for unknown reason" error, which was "solved" by installing the Edge version: Docker for Windows: Drive sharing failed for an unknown reason
Now that this is installed I'm getting a new error which prevents some containers from being created. These containers have all been tested and work on several other systems. Currently 3 out of 4 containers fail to be created, and they all produce the same error as below:
ERROR: for db Cannot create container for service db: status code not OK but 500: {"Message":"Unhandled exception: Filesharing has been cancelled"}
Encountered errors while bringing up the project.
full error:
Creating imt2291-part2_www_1 ...
Creating imt2291-part2_phpmyadmin_1 ... done
Creating imt2291-part2_db_1 ...
Creating imt2291-part2_test_1 ... error
Creating imt2291-part2_www_1 ... error
ERROR: for imt2291-part2_test_1 Cannot create container for service test: status code not OK but 500: {"Message":"Unhandled exception: Filesharing has been cancelled"}
ERROR: for imt2291-part2_www_1 Cannot create container for service www: status code not OK but 500: {"Message":"Unhandled exception: Filesharing has been cancelled"}
Creating imt2291-part2_db_1 ... error
ERROR: for imt2291-part2_db_1 Cannot create container for service db: status code not OK but 500: {"Message":"Unhandled exception: Filesharing has been cancelled"}
ERROR: for test Cannot create container for service test: status code not OK but 500: {"Message":"Unhandled exception: Filesharing has been cancelled"}
ERROR: for www Cannot create container for service www: status code not OK but 500: {"Message":"Unhandled exception: Filesharing has been cancelled"}
ERROR: for db Cannot create container for service db: status code not OK but 500: {"Message":"Unhandled exception: Filesharing has been cancelled"}
Encountered errors while bringing up the project.
Has anyone encountered this issue before and found a fix?
You need to update the File Sharing configuration in your Docker for Windows app (there is new security hardening in 2.2.0.0 which has aggressive defaults). Add all the folders you need and then restart Docker for Windows.
After changing "File Sharing" to C Drive its start working in my windows machine. I am using docker desktop 2.3.0.3
I am using Docker in Windows 10 and had the same problem.
The solution suggested by Oleg Nenashev and Rejoanul Alam helped me.
Adding the project directory where the Dockerfile lives (or C:/) to the Docker shared folders solves the problem.
Step 6 from Getting started states:
Shared folders, volumes, and bind mounts
If your project is outside of the Users directory (cd ~), then you need to share the drive or location of the Dockerfile and volume you are using.
If you get runtime errors indicating an application file is not found, a volume mount is denied, or a service cannot start, try enabling file or drive sharing.
Volume mounting requires shared drives for projects that live outside of C:\Users (Windows) or /Users (Mac), and is required for any project on Docker Desktop for Windows that uses Linux containers.
For more information, see File sharing on Docker for Mac, and the general examples on how to Manage data in containers.
If you are using Oracle VirtualBox on an older Windows OS, you might encounter an issue with shared folders as described in this VB trouble ticket. Newer Windows systems meet the requirements for Docker Desktop for Windows and do not need VirtualBox.
I have met this problem and my environment is Windows. When the issue first happened, I had set the file sharing path to C:, but my project path is on G:, so the docker-compose up command failed with this message:
docker: Error response from daemon: status code not OK but 500 {"Message":"Unhandled exception: Filesharing has been cancelled"}
I think the file sharing setting must include your project path: when I set the file sharing path to some other location it did not work, but once I changed it to the path where my project lives, docker-compose up succeeded.
I had the same error, and the answers from Oleg Nenashev and Rejoanul Alam helped me solve it.
My task was to share volumes between containers.
I have a website folder with an index.html file inside.
Open Docker Desktop Settings and add the path of the folder you are working with under File Sharing.
After this, open the folder in your command prompt and run the Docker command you need (an example sketch follows below).
If the command is correct and you followed all the steps carefully, it should work fine.
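A hypothetical example of the kind of command meant here, run from the shared website folder in a Windows command prompt (the nginx image and port 8080 are placeholders I picked, not from the original post):
docker run --rm -p 8080:80 -v "%cd%:/usr/share/nginx/html:ro" nginx
After that, opening http://localhost:8080 should serve index.html from the bind-mounted folder; if File Sharing does not cover the folder, the same "Filesharing has been cancelled" error appears instead.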

My docker-compose up is no longer working and cannot start mariadb

I started using a Mac and use Docker to develop my sites. Until 12 ago my projects were running properly.
The last thing I did on the machine was to install mysql via brew, which I have already removed. After I restarted my machine, I noticed that I can no longer start any Docker container to view a project.
I cd into my project folder and run
docker-compose up
The following error shows up:
Sydney-de-Sousas-Mac:scottishleader.co.za sidney$ docker-compose up
Starting scottishleader_mariadb ... error
ERROR: for scottishleader_mariadb Cannot start service mariadb: driver failed programming external connectivity on endpoint scottishleader_mariadb (30a2e258468e8840ca5e67a535835603770afadab58b3e460e624ae3ba587293): Error starting userland proxy: /forwards/expose/port returned unexpected status: 500
ERROR: for mariadb Cannot start service mariadb: driver failed programming external connectivity on endpoint scottishleader_mariadb (30a2e258468e8840ca5e67a535835603770afadab58b3e460e624ae3ba587293): Error starting userland proxy: /forwards/expose/port returned unexpected status: 500
ERROR: Encountered errors while bringing up the project.
Sydney-de-Sousas-Mac:scottishleader.co.za sidney$
I run docker ps and no container is listed
Sydney-de-Sousas-Mac:scottishleader.co.za sidney$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
Why can I suddenly not start the container?
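A diagnostic sketch I plan to try next (port 3306 is an assumption based on MariaDB's default port mapping; lsof and brew need to be available on the Mac):
sudo lsof -iTCP:3306 -sTCP:LISTEN    # shows any process, e.g. a leftover mysqld, still listening on the port
brew services list                   # confirms no mysql service is still registered with launchd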

ICP fails to start after machine reboot

I have ICP V2.1 installed in a RHEL VMware image. After rebooting the image, ICP fails to start, in what appears to be the first known issue in the documentation (Kubernetes controller manager fails to start after a master or cluster restart). However, the prescribed resolution does not get my system going.
Here is the running pod list:
NAME READY STATUS RESTARTS AGE
calico-node-amd64-dtl47 2/2 Running 14 20h
filebeat-ds-amd64-mvcsj 1/1 Running 8 20h
k8s-etcd-192.168.232.131 1/1 Running 7 20h
k8s-mariadb-192.168.232.131 1/1 Running 7 20h
k8s-master-192.168.232.131 2/3 CrashLoopBackOff 15 17m
k8s-proxy-192.168.232.131 1/1 Running 7 20h
metering-reader-amd64-gkwt4 1/1 Running 7 20h
monitoring-prometheus-nodeexporter-amd64-sghrv 1/1 Running 7 20h
Removing the k8s-master-192.168.232.131 pod and allowing it to restart only puts it back into the CrashLoopBackOff state. Here is the last line of the controller manager log:
F1029 23:55:07.345341 1 controllermanager.go:176] error building controller context: failed to get supported resources from server: unable to retrieve the complete list of server APIs: servicecatalog.k8s.io/v1alpha1: an error on the server ("Error: 'dial tcp 10.0.0.145:443: getsockopt: connection refused'\nTrying to reach: 'https://10.0.0.145:443/apis/servicecatalog.k8s.io/v1alpha1'") has prevented the request from succeeding
Removing the pod or removing the failed controller master docker container directly has no effect. It seems like another service hasn't started yet, or failed to start. I've waited several hours to see if the issue resolves itself, but to no avail.
Thanks...
Before the fix of https://github.com/kubernetes/kubernetes/pull/49495, the Kubernetes controller manager failed to start if a registered extension-apiserver was not ready. In ICP, the service catalog is implemented as an extension-apiserver.
Usually, after the ICP master is restarted, the kubelet starts the k8s management services first as static pods. After that, it gets pod/node/service information from the Kubernetes API server and then starts all the pods, including the catalog API service. In that case, the whole cluster recovers.
However, in your case there is a race condition: when the kubelet got the pod information from the Kubernetes API server and started all the pods, it had not yet gotten the node information from the API server. As a result, the kubelet failed to start the catalog API service because its nodeSelector was not met, and the whole cluster failed to recover.
In the next release, ICP 2.1.0.1, Kubernetes will be upgraded to 1.8.2 with the fix of https://github.com/kubernetes/kubernetes/pull/49495, and the issue will be resolved completely.
Before that, you can try the following workaround.
Use the -s flag form of each kubectl command below if your token has expired after the restart and you no longer have access to the GUI to re-establish it.
Delete the apiservices of v1alpha1.servicecatalog.k8s.io:
kubectl delete apiservices v1alpha1.servicecatalog.k8s.io
kubectl -s 127.0.0.1:8888 delete apiservices v1alpha1.servicecatalog.k8s.io
Delete the dead controller manager container:
docker rm <k8s controller manager>
Wait until the service catalog has started (one way to check this is the sketch after these steps).
Recover the service catalog apiservices by re-registering the apiservice of v1alpha1.servicecatalog.k8s.io:
kubectl apply -f cluster/cfc-components/service-catalog/apiregistration.yaml
kubectl -s 127.0.0.1:8888 apply -f cluster/cfc-components/service-catalog/apiregistration.yaml
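A hedged way to check the "wait until the service catalog has started" step before re-registering the apiservice (the grep filter is a guess; adjust it to however the catalog pods are named in your ICP install):
kubectl -s 127.0.0.1:8888 get pods --all-namespaces | grep -i catalog
Repeat this (or wrap it in watch) until the catalog apiserver pod reports Running.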
