In IBM Cloud Private, when I stop a Docker container, it automatically restarts. How can I stop it?
Here's a bit more information:
When you work with containers on IBM Cloud Private, you're actually deploying individual Pods or, more likely, Deployments.
When a Pod is managed by a ReplicaSet, DaemonSet, or StatefulSet, the controller's semantics recreate the pod if it fails unexpectedly. Deleting a Pod isn't distinguished from other failures within a pod (application crashes or worker node failure), so the pod comes back.
You should be using kubectl to work with pods. You can configure kubectl from User > Configure Client in the top right corner of the web UI. Copy and paste the commands for your environment into your console. Validate that the IP or network address is resolvable from your client machine (control this value in the install cluster/config.yaml with cluster_access_ip).
Example kubectl configure steps (Copy from User > Configure Client in the web UI):
kubectl config set-cluster mycluster.icp --server=https://[NETWORK_ADDRESS]:8001 --insecure-skip-tls-verify=true
kubectl config set-context mycluster.icp-context --cluster=mycluster.icp
kubectl config set-credentials mycluster.icp-user --token=[TOKEN]
kubectl config set-context mycluster.icp-context --user=mycluster.icp-user --namespace=default
kubectl config use-context mycluster.icp-context
Then view running pods:
kubectl get pods [--namespace default]
These pods represent the basic unit of deployment: containers + volumes + labels + links to ConfigMaps and Secrets.
These pods are generally deployed from other management "sets":
kubectl get deployments [--namespace default]
kubectl get daemonsets [--namespace default]
kubectl get statefulsets [--namespace default]
These collections represent policy + pods; behaviors about how to recover are built into each construct.
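To see which construct is recreating a given pod, you can read the pod's owner references. This is only a sketch: the function name pod_owner is made up for illustration, and it assumes kubectl is already configured as above. Note that a pod created by a Deployment lists a ReplicaSet as its owner; the ReplicaSet is in turn owned by the Deployment.

```shell
# pod_owner is a hypothetical helper name, not part of kubectl.
# $1 = pod name, $2 = namespace (defaults to "default").
# Prints one "Kind/name" line per owner reference on the pod.
pod_owner() {
  kubectl get pod "$1" --namespace "${2:-default}" \
    -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}/{.name}{"\n"}{end}'
}
```

Call it as: pod_owner my-pod default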
You probably have a deployment, so to remove the container, first list the deployments:
kubectl get deployments -o wide [--namespace default]
Find the deployment of interest, and delete it:
kubectl delete deployments my-deployment [--namespace default]
Now the deployment will be removed, along with all associated pods.
You need to stop the kubelet first, otherwise it will automatically start up exited containers. You can run “systemctl stop kubelet”.
Kubernetes restarts failed containers (pods). You should scale the deployment to 0 instances or delete the deployment; both can be done with kubectl (kubectl scale --replicas=0 ...) or from the ICP console.
You should change the number of replicas to zero.
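As a rough sketch of the scale-to-zero approach: the helper name scale_to_zero and the deployment name my-deployment are placeholders, and the namespace defaults to "default" only for illustration.

```shell
# scale_to_zero is a hypothetical helper name, not part of kubectl.
# $1 = deployment name, $2 = namespace (defaults to "default").
# Scaling to 0 replicas stops the pods but keeps the deployment definition.
scale_to_zero() {
  kubectl scale deployment "$1" --replicas=0 --namespace "${2:-default}"
}
```

Call it as: scale_to_zero my-deployment. Unlike deleting the deployment, this keeps its definition in the cluster, so it can be brought back later by scaling the replicas up again.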
Related
I set up a master node on an Ubuntu 18.04 machine and followed the guide: https://kubernetes.io/ja/docs/setup/production-environment/windows/user-guide-windows-nodes/ to register a Windows Server 2019 node to the cluster successfully.
Now the kubelet has been started in PowerShell and the two nodes are ready.
On the Windows machine, I run the command: "kubectl create deployment --image=XXX(A windows server 2019 image) webadmin-app" to create a deployment on the Windows node.
When creating the pod, kubelet reports the following log messages:
W0821 17:37:03.003768 99524 pod_container_deletor.go:77] Container "dee4daa76a9e60e0e68af75597092aa5cff517c7021a6ef7579f77f662f2a163" not found in pod's containers
W0821 17:37:03.071774 99524 helpers.go:289] Unable to create pod sandbox due to conflict. Attempting to remove sandbox "dee4daa76a9e60e0e68af75597092aa5cff517c7021a6ef7579f77f662f2a163"
E0821 17:37:03.108764 99524 remote_runtime.go:200] CreateContainer in sandbox "62ff282461eba2fae24a66b7d38ccca43b224c74320dbb5a0a4659b4c4446eb7" from runtime service failed: rpc error: code = Unknown desc = Error response from daemon: Conflict. The container name "/k8s_webadmin-site_webadmin-app-757c7455cf-nms75_default_7ac60567-f9e2-4c04-aead-c6957200c961_0" is already in use by container "dee4daa76a9e60e0e68af75597092aa5cff517c7021a6ef7579f77f662f2a163". You have to remove (or rename) that container to be able to reuse that name.
E0821 17:37:03.109762 99524 kuberuntime_manager.go:801] container start failed: CreateContainerError: Error response from daemon: Conflict. The container name "/k8s_webadmin-site_webadmin-app-757c7455cf-nms75_default_7ac60567-f9e2-4c04-aead-c6957200c961_0" is already in use by container "dee4daa76a9e60e0e68af75597092aa5cff517c7021a6ef7579f77f662f2a163". You have to remove (or rename) that container to be able to reuse that name.
E0821 17:37:03.113766 99524 pod_workers.go:191] Error syncing pod 7ac60567-f9e2-4c04-aead-c6957200c961 ("webadmin-app-757c7455cf-nms75_default(7ac60567-f9e2-4c04-aead-c6957200c961)"), skipping: failed to "StartContainer" for "webadmin-site" with CreateContainerError: "Error response from daemon: Conflict. The container name "/k8s_webadmin-site_webadmin-app-757c7455cf-nms75_default_7ac60567-f9e2-4c04-aead-c6957200c961_0" is already in use by container "dee4daa76a9e60e0e68af75597092aa5cff517c7021a6ef7579f77f662f2a163". You have to remove (or rename) that container to be able to reuse that name."
These failure messages keep being generated while the pod is being created.
So I listed the Docker containers using "docker ps" during the pod creation period. It seems that the kubelet keeps creating and removing containers (which are based on the specified image XXX).
How can I resolve this failure to create a deployment on the Windows node?
Update:
I can create a deployment if --image is set to mcr.microsoft.com/k8s/core/pause:1.2.0 .
However, if I use another image which is based on Windows Nano Server, the deployment fails.
I captured some error logs from the kubelet output:
W0826 17:58:14.903662 127340 cri_stats_provider_windows.go:89] Failed to get HNS endpoint "" with error 'json: cannot unmarshal array into Go value of type hns.HNSEndpoint', continue to get stats for other endpoints
E0826 17:58:14.940665 127340 cni.go:364] Error adding default_webadmin-app-757c7455cf-5c5nf/9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5 to network flannel/vxlan0: error while ProvisionEndpoint(9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5_vxlan0,39EA65C2-C0F9-4870-8B7A-E2A1DBF5CD9D,9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5): The virtual machine or container was forcefully exited.
E0826 17:58:14.942663 127340 cni_windows.go:59] error while adding to cni network: error while ProvisionEndpoint(9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5_vxlan0,39EA65C2-C0F9-4870-8B7A-E2A1DBF5CD9D,9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5): The virtual machine or container was forcefully exited.
W0826 17:58:14.943663 127340 docker_sandbox.go:400] failed to read pod IP from plugin/docker: networkPlugin cni failed on the status hook for pod "webadmin-app-757c7455cf-5c5nf_default": error while ProvisionEndpoint(9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5_vxlan0,39EA65C2-C0F9-4870-8B7A-E2A1DBF5CD9D,9a37dab965d1b006c4c2ec70619b3a4bb42dd39dad8c0ce1baf39eab634da3d5): The virtual machine or container was forcefully exited.
From the error messages, it seems that it is the container's image that caused the pod's failure. Did I miss anything when I was setting up my own image? Why does the network plugin fail for my own image but work for the image mcr.microsoft.com/k8s/core/pause:1.2.0 ?
I am deploying an Elasticsearch cluster in Kubernetes, and the cluster's pods have restarted many times. To keep the Elasticsearch cluster stable, I want to find out why they restart. When I check the log of a restarted pod, it only shows the output written after the restart, and that output contains no errors. I tried to stop the pods from restarting automatically so that I could see the error output on failure, but that is rejected with:
StatefulSet.apps "es-cluster" is invalid: spec.template.spec.restartPolicy: Unsupported value: "Never": supported values: "Always"
So what is the best way to find out why the pods restart?
To get a detailed view of the log snippet or the immediate reasons for startup failures of a pod, run
kubectl describe pod <pod> -n <namespace>
Ideally you should run it as soon as a pod restarts (or you can force it by deleting the pod). This can/should be done in addition to the above comment that suggests tailing the logs, up to the moment when the pods fail to start. Be mindful that if there are multiple containers in a pod you may also need to
kubectl logs -f <pod> -n <namespace> -c <container name> --previous
Cheers
To get the log of the pod from before it restarted, run
kubectl logs -f <pod> -n <namespace> --previous
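Besides the previous logs, the pod status itself records why each container last terminated. A minimal sketch, assuming kubectl access to the namespace; last_exit is a made-up helper name. It prints, per container, the name, the restart count, and the reason and exit code of the last termination (for example a reason of OOMKilled points at a memory limit rather than an application error).

```shell
# last_exit is a hypothetical helper name, not part of kubectl.
# $1 = pod name, $2 = namespace (defaults to "default").
# Prints one tab-separated line per container:
#   name, restartCount, last termination reason, last exit code
last_exit() {
  kubectl get pod "$1" --namespace "${2:-default}" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

Call it as: last_exit <pod> <namespace>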
I have been trying to set up Spring Cloud Data Flow Server for Kubernetes locally using minikube. I have followed the installation instructions in the link here: SCDF Installation Reference
I've been getting the below error for the SCDF server:
11:32:52.095 [main] DEBUG io.fabric8.kubernetes.client.Config - Trying to configure client namespace from Kubernetes service account namespace path...
11:32:52.096 [main] DEBUG io.fabric8.kubernetes.client.Config - Found service account namespace at: [/var/run/secrets/kubernetes.io/serviceaccount/namespace].
2018-04-24 11:33:14.348 WARN 1 --- [ main] o.s.cloud.kubernetes.StandardPodUtils : Failed to get pod with name:[scdf-server-869d56967c-97lsd]. You should look into this if things aren't working as you expect. Are you missing serviceaccount permissions?
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/default/pods/scdf-server-869d56967c-97lsd. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "scdf-server-869d56967c-97lsd" is forbidden: User "system:serviceaccount:default:default" cannot get pods in the namespace "default".
Below are the version details:
Spring Cloud Data Flow Server : 1.4.0.RELEASE
Kubernetes Local Deployment using minikube
Kubernetes Version : 1.10
The latest release of minikube enabled RBAC by default.
For RBAC enabled clusters, we have added a note in the installation section on this matter.
"The latest releases of kubernetes have enabled RBAC on the api-server. If your target platform has RBAC enabled you must ask a cluster-admin to create the roles and role-bindings for you before deploying the dataflow server. They associate the dataflow service account with the roles it needs to be run with."
For minikube, however, you can run the following command and retry installing.
kubectl create clusterrolebinding add-on-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default
Alternatively, if you're using the helm-chart, you can disable RBAC and install the chart with the following on minikube.
helm init
helm repo add incubator https://kubernetes-charts-incubator.storage.googleapis.com
helm repo update
helm install --name my-release --set server.service.type=NodePort --set rbac.create=false incubator/spring-cloud-data-flow
From the installation guide, step 7: https://docs.spring.io/spring-cloud-dataflow-server-kubernetes/docs/1.4.0.RELEASE/reference/htmlsingle/#_deploying_using_kubectl
The latest releases of kubernetes have enabled RBAC on the api-server. If your target platform has RBAC enabled you must ask a cluster-admin to create the roles and role-bindings for you before deploying the dataflow server. They associate the dataflow service account with the roles it needs to be run with.
$ kubectl create -f src/kubernetes/server/server-roles.yaml
$ kubectl create -f src/kubernetes/server/server-rolebinding.yaml
Did you perform those steps?
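One way to check whether the roles took effect is to ask the API server what the service account is allowed to do. This is a sketch only: sa_can is a made-up helper name, and it assumes your own kubectl user is permitted to impersonate service accounts (a cluster-admin normally is). The service account in the error above is system:serviceaccount:default:default.

```shell
# sa_can is a hypothetical helper name, not part of kubectl.
# $1 = verb, $2 = resource, $3 = namespace, $4 = service account name
# (namespace and account both default to "default").
# kubectl answers "yes" or "no" for the impersonated service account.
sa_can() {
  kubectl auth can-i "$1" "$2" \
    --namespace "${3:-default}" \
    --as "system:serviceaccount:${3:-default}:${4:-default}"
}
```

Call it as: sa_can get pods default default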
I have developed Spring Boot applications. I have set up admin and RabbitMQ as well as Spring Cloud Bus. When I call the refresh endpoints of the applications, the properties of the applications are refreshed.
Can anyone please help me set up RabbitMQ in Kubernetes now? I did some research and found in a few articles that it needs to be deployed as a "StatefulSet" rather than a "Deployment" https://notallaboutcode.blogspot.de/2017/09/rabbitmq-on-kubernetes-container.html. I could not work out exactly why this needs to be done. Also, any useful link on deploying RabbitMQ in Kubernetes would help.
It depends on what you're looking to do and what tools you have available. I guess your current setup is much like that described in http://www.baeldung.com/spring-cloud-bus. One approach to porting that to kubernetes might be to try to get your setup working with docker-compose first and then you could port that docker-compose to kubernetes deployment descriptors.
A simple way to deploy rabbitmq in k8s would be to set up a Deployment using a rabbitmq docker image. An example of this is https://github.com/Activiti/activiti-cloud-examples/blob/fe732096b5a19de0ad44879a399053f6ae02b095/kubernetes/kubectl/infrastructure.yml#L17. (Notice that file isn't radically different from a docker-compose file so you could port from one to the other.) But that won't be persisting data outside of the Pods so if the cluster were to go down or the Pod/s were to go down then you'd lose message data. The persistence is ephemeral.
So to have non-ephemeral persistence you could instead use a StatefulSet as in the example you point to. Another example is https://wesmorgan.svbtle.com/rabbitmq-cluster-on-kubernetes-with-statefulsets
If you are using helm (or can use helm) then you could use the rabbitmq helm chart, which uses a StatefulSet.
But if your only reason for needing the bus is to trigger refreshes when property changes happen then there are alternative paths available with Kubernetes. I'm guessing you need the hot reloads so you could look at using https://github.com/fabric8io/spring-cloud-kubernetes#propertysource-reload Or if you need the config to come from git specifically then you could look at http://fabric8.io/guide/develop/configuration.html (If you didn't need the hot reloads or git then you could consider versioning your configmaps and upgrading them with your application upgrades like in https://dzone.com/articles/configuring-java-apps-with-kubernetes-configmaps-a )
If you have installed helm in your cluster
helm install stable/rabbitmq
This will install the RabbitMQ server on your cluster. The following commands obtain the password and the Erlang cookie; replace prodding-wombat-rabbitmq with whatever name your installation was given.
kubectl get secret --namespace default prodding-wombat-rabbitmq -o jsonpath="{.data.rabbitmq-password}" | base64 --decode
kubectl get secret --namespace default prodding-wombat-rabbitmq -o jsonpath="{.data.rabbitmq-erlang-cookie}" | base64 --decode
To connect to the pod:
export POD_NAME=$(kubectl get pods --namespace default -l "app=prodding-wombat-rabbitmq" -o jsonpath="{.items[0].metadata.name}")
Then proxy to localhost so you can connect in your browser
kubectl port-forward $POD_NAME 5672:5672 15672:15672
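The two secret lookups above follow the same pattern: read one key from the secret's data and base64-decode it. A small sketch that wraps the pattern; secret_field is a made-up helper name, and the secret and key names are the ones from the commands above.

```shell
# secret_field is a hypothetical helper name, not part of kubectl.
# $1 = secret name, $2 = key under .data, $3 = namespace (defaults to "default").
# Values under .data are stored base64-encoded, so decode before printing.
secret_field() {
  kubectl get secret "$1" --namespace "${3:-default}" \
    -o jsonpath="{.data.$2}" | base64 --decode
}
```

Call it as: secret_field prodding-wombat-rabbitmq rabbitmq-password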
I have ICP V2.1 installed into a RHEL VMWare image. After rebooting the image, ICP fails to start in what appears to be the first known issue in the documentation (Kubernetes controller manager fails to start after a master or cluster restart). However, the prescribed resolution does not get my system going.
Here is the running pod list:
NAME READY STATUS RESTARTS AGE
calico-node-amd64-dtl47 2/2 Running 14 20h
filebeat-ds-amd64-mvcsj 1/1 Running 8 20h
k8s-etcd-192.168.232.131 1/1 Running 7 20h
k8s-mariadb-192.168.232.131 1/1 Running 7 20h
k8s-master-192.168.232.131 2/3 CrashLoopBackOff 15 17m
k8s-proxy-192.168.232.131 1/1 Running 7 20h
metering-reader-amd64-gkwt4 1/1 Running 7 20h
monitoring-prometheus-nodeexporter-amd64-sghrv 1/1 Running 7 20h
Removing the k8s-master-192.168.232.131 pod and allowing it to restart only puts it back into the CrashLoopBackOff state. Here is how the last line in controller manager log looks:
F1029 23:55:07.345341 1 controllermanager.go:176] error building controller context: failed to get supported resources from server: unable to retrieve the complete list of server APIs: servicecatalog.k8s.io/v1alpha1: an error on the server ("Error: 'dial tcp 10.0.0.145:443: getsockopt: connection refused'\nTrying to reach: 'https://10.0.0.145:443/apis/servicecatalog.k8s.io/v1alpha1'") has prevented the request from succeeding
Removing the pod or removing the failed controller master docker container directly has no effect. It seems like another service hasn't started yet, or failed to start. I've waited several hours to see if the issue resolves itself, but to no avail.
Thanks...
Before the fix in https://github.com/kubernetes/kubernetes/pull/49495, the Kubernetes controller manager failed to start if a registered extension-apiserver was not ready. In ICP, the service catalog is implemented as an extension-apiserver.
Usually, after the ICP master is restarted, the kubelet starts the k8s management services first as static pods. After that, it gets the pod, node, and service information from the Kubernetes API server, and then starts all the pods, including the catalog API service. In that case, the whole cluster recovers.
In your case, however, there is a race condition: when the kubelet gets the pod information from the Kubernetes API server and starts all the pods, it has not yet received the node information. As a result, the kubelet fails to start the catalog API service because its nodeSelector is not met, and the whole cluster fails to recover.
In the next release of ICP, 2.1.0.1, Kubernetes will be upgraded to 1.8.2, which includes the fix in https://github.com/kubernetes/kubernetes/pull/49495. The issue will then be resolved completely.
Before that you could try the following workaround method.
Use the -s flag form of the kubectl command if your token has expired after restart and you no longer have access to the GUI to re-establish it.
Delete apiservices of v1alpha1.servicecatalog.k8s.io
kubectl delete apiservices v1alpha1.servicecatalog.k8s.io
kubectl -s 127.0.0.1:8888 delete apiservices v1alpha1.servicecatalog.k8s.io
Delete the dead controller manager
docker rm <k8s controller manager>
Wait until the service catalog has started
Recover the service catalog apiservices by re-registering the apiservice of v1alpha1.servicecatalog.k8s.io
kubectl apply -f cluster/cfc-components/service-catalog/apiregistration.yaml
kubectl -s 127.0.0.1:8888 apply -f cluster/cfc-components/service-catalog/apiregistration.yaml