Minikube CoreDNS is in CrashLoopBackOff status on AWS EC2

I am new to Minikube. We deployed Minikube v1.0.1 on AWS EC2, and the CoreDNS pods are in CrashLoopBackOff.
kubectl get -n kube-system pods
NAME READY STATUS RESTARTS AGE
coredns-6765558d84-6bpff 0/1 CrashLoopBackOff 531 44h
coredns-6765558d84-9mqz6 0/1 CrashLoopBackOff 531 44h
The logs of these pods are showing:
2019-05-22T06:29:40.959Z [FATAL] plugin/loop: Loop (127.0.0.1:53726 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 1771143215983809104.2668792180170228628."
I did try removing the loop directive from the CoreDNS config, as suggested in another Stack Overflow post. CoreDNS started working, but the proxy stopped working!
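Rather than deleting the loop plugin (which only hides the problem), the loop troubleshooting page linked in the error suggests pointing CoreDNS's forward plugin at a real upstream resolver instead of the node's /etc/resolv.conf. A minimal sketch, assuming the default Corefile layout and using 8.8.8.8 as a stand-in upstream (substitute the VPC resolver for your EC2 subnet):

kubectl -n kube-system edit configmap coredns

# In the Corefile, change the forward line from the node's resolv.conf ...
#     forward . /etc/resolv.conf
# ... to an explicit upstream that does not loop back to CoreDNS itself:
forward . 8.8.8.8

# Then recreate the CoreDNS pods so they pick up the change
kubectl -n kube-system delete pod -l k8s-app=kube-dns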

Related

Kubernetes windows worker node addition: "failed to create containerd task: hcsshim::CreateComputeSystem kube-proxy: The directory name is invalid"

I am using Kubernetes (v1.23.13) with containerd and the Flannel CNI. The Kubernetes cluster was created on an Ubuntu 18 VM (VMware ESXi), and Windows Server is running on another VM. I followed the link below to add the Windows (Windows Server 2019) node to the cluster. The Windows node joined the cluster, but the Windows kube-proxy DaemonSet pod deployment fails.
Link https://web.archive.org/web/20220530090758/https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/adding-windows-nodes/
Error: Normal Created (x5 over ) kubelet Created container kube-proxy
Normal Pulled (x5 over ) kubelet Container image "sigwindowstools/kube-proxy:v1.23.13-nanoserver" already present on machine
Warning Failed kubelet Error: failed to create containerd task: hcsshim::CreateComputeSystem kube-proxy: The directory name is invalid.
(extra info: {"Owner":"containerd-shim-runhcs-v1.exe","SchemaVersion":{"Major":2,"Minor":1},"Container":{"GuestOs":{"HostName":"kube-proxy-windows-hq7bb"},"Storage":{"Layers":[{"Id":"e30f10e1-6696-5df6-af3f-156a372bce4e","Path":"C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots\19"},{"Id":"8aa59a8b-78d3-5efe-a3d9-660bd52fd6ce","Path":"C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots\18"},{"Id":"f222f973-9869-5b65-a546-cb8ae78a32b9","Path":"C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots\17"},{"Id":"133385ae-6df6-509b-b342-bc46338b3df4","Path":"C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots\16"},{"Id":"f6f9524c-e3f0-5be2-978d-7e09e0b21299","Path":"C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots\15"},{"Id":"0d9d58e6-47b6-5091-a552-7cc2027ca06f","Path":"C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots\14"},{"Id":"6715ca06-295b-5fba-9224-795ca5af71b9","Path":"C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots\13"},{"Id":"75e64a3b-69a5-52cf-b39f-ee05718eb1e2","Path":"C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots\12"},{"Id":"8698c4b4-b092-57c6-b1eb-0a7ca14fcf4e","Path":"C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots\11"},{"Id":"7c9a6fb7-2ca8-5ef7-bbfe-cabbff23cfa4","Path":"C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots\10"},{"Id":"a10d4ad8-f2b1-5fd6-993f-7aa642762865","Path":"C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots\9"}],"Path":"\\?\Volume{64336318-a64f-436e-869c-55f9f8e4ea62}\"},"MappedDirectories":[{"HostPath":"c:\","ContainerPath":"c:\host"},{"HostPath":"c:\var\lib\kubelet\pods\1cd0c333-3cd0-4c90-9d22-884ea73e8b69\containers\kube-proxy\0e58a001","ContainerPath":"c:\dev\termination-log"},{"HostPath":"c:\var\lib\kubelet\pods\1cd0c333-3cd0-4c90-9d22-884ea73e8b69\volumes\kubernetes.io~configmap\kube-proxy","ContainerPath":"c:\var\lib\kube-proxy","ReadOnly":true},{"HostPath":"c:\var\lib\kubelet\pods\1cd0c333-3cd0-4c90-9d22-884ea73e8b69\volumes\kubernetes.io~configmap\kube-proxy-windows","ContainerPath":"c:\var\lib\kube-proxy-windows","ReadOnly":true},{"HostPath":"c:\var\lib\kubelet\pods\1cd0c333-3cd0-4c90-9d22-884ea73e8b69\volumes\kubernetes.io~projected\kube-api-access-4zs46","ContainerPath":"c:\var\run\secrets\kubernetes.io\serviceaccount","ReadOnly":true},{"HostPath":"c:\var\lib\kubelet\pods\1cd0c333-3cd0-4c90-9d22-884ea73e8b69\etc-hosts","ContainerPath":"C:\Windows\System32\drivers\etc\hosts"}],"MappedPipes":[{"ContainerPipeName":"rancher_wins","HostPath":"\\.\pipe\rancher_wins"}],"Networking":{"Namespace":"4a4d0354-251a-4750-8251-51ae42707db2"}},"ShouldTerminateOnLastHandleClosed":true}): unknown
Warning BackOff (x23 over ) kubelet Back-off restarting failed container
kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-64897985d-2mkd5 1/1 Running 0 19h
kube-system coredns-64897985d-qhhbz 1/1 Running 0 19h
kube-system etcd-scspa2658542001 1/1 Running 2 19h
kube-system kube-apiserver-scspa2658542001 1/1 Running 8 (3h4m ago) 19h
kube-system kube-controller-manager-scspa2658542001 1/1 Running 54 (126m ago) 19h
kube-system kube-flannel-ds-hjw8s 1/1 Running 14 (18h ago) 19h
kube-system kube-flannel-ds-windows-amd64-xfhjl 0/1 ImagePullBackOff 0 29m
kube-system kube-proxy-windows-hq7bb 0/1 CrashLoopBackOff 10 (<invalid> ago) 29m
kube-system kube-proxy-wx2x9 1/1 Running 0 19h
kube-system kube-scheduler-scspa2658542001 1/1 Running 92 (153m ago) 19h
From this issue, it seems Windows nodes with Flannel have known problems that have been solved with different workarounds.
As mentioned in the issue, a guide was put together to get Windows nodes working properly. Follow that doc for the installation guide and requirements.
Attaching a troubleshooting blog and issue for the CrashLoopBackOff.
I had a similar error (failed to create containerd task: hcsshim::CreateComputeSystem) with Flannel on k8s v1.24. The cause was that Windows OS patches had not been applied. The patch related to KB4489899 must be applied.
https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/guides/guide-for-adding-windows-node.md#before-you-begin
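Before hunting for the patch, it can help to confirm which Windows build the node is actually running; the node listing already shows this (standard kubectl output, nothing specific to this cluster):

kubectl get nodes -o wide
# The OS-IMAGE and KERNEL-VERSION columns show the Windows Server build,
# which can be compared against the builds that ship KB4489899.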

Crashloop error in Google Cloud Shell

I am taking a Google Cloud course on launching a Kubernetes Engine cluster. I ran into this both times I went through the lab.
What is the fix for CrashLoopBackOff? I have not been able to locate one.
(venv) student_04_9b8cb56b5006@cloudshell:~/cloud-vision/python/awwvision/cloud-vision/python/awwvision (qwiklabs-gcp-00-128898864713)$ kubectl get pods
W0619 19:58:53.278025 3544 gcp.go:120] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.25+; use gcloud instead.
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
NAME READY STATUS RESTARTS AGE
awwvision-webapp-55f5dbb8c7-mdtnq 0/1 CrashLoopBackOff 9 (2m45s ago) 24m
awwvision-worker-79c846b86d-f9mvp 0/1 CrashLoopBackOff 9 (2m4s ago) 23m
awwvision-worker-79c846b86d-lhnt8 0/1 CrashLoopBackOff 9 (2m25s ago) 23m
awwvision-worker-79c846b86d-t79zc 0/1 CrashLoopBackOff 9 (2m45s ago) 23m
redis-master-6c59fc54c-ldk8t 1/1 Running 0 25m
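There is no single fix for CrashLoopBackOff; it only means the container keeps exiting shortly after starting. A first diagnostic pass, assuming the pod names from the listing above, is to read the crashing container's logs and its events:

# Logs of the current attempt and of the previous, crashed attempt
kubectl logs awwvision-webapp-55f5dbb8c7-mdtnq
kubectl logs awwvision-webapp-55f5dbb8c7-mdtnq --previous

# Events (failed probes, OOM kills, bad commands) appear at the bottom
kubectl describe pod awwvision-webapp-55f5dbb8c7-mdtnq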

Gitea: dial tcp: lookup gitea-postgresql.default.svc.cluster.local

I see this error when trying to use Gitea with microk8s on Ubuntu 21.10:
$ k logs gitea-0 -c configure-gitea
Wait for database to become avialable...
gitea-postgresql (10.152.183.227:5432) open
...
2021/11/20 05:49:40 ...om/urfave/cli/app.go:277:Run() [I] PING DATABASE postgres
2021/11/20 05:49:45 cmd/migrate.go:38:runMigrate() [F] Failed to initialize ORM engine: dial tcp: lookup gitea-postgresql.default.svc.cluster.local: Try again
I am looking for some clues as to how to debug this please.
The other pods seem to be running as expected:
$ k get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system hostpath-provisioner-5c65fbdb4f-nfx7d 1/1 Running 0 11h
kube-system calico-node-h8tpk 1/1 Running 0 11h
kube-system calico-kube-controllers-f7868dd95-dpp8n 1/1 Running 0 11h
kube-system coredns-7f9c69c78c-cnpkj 1/1 Running 0 11h
default gitea-memcached-584956987c-zb8kp 1/1 Running 0 20s
default gitea-postgresql-0 1/1 Running 0 20s
default gitea-0 0/1 Init:1/2 1 20s
The services are not as expected, since gitea-0 is not starting:
$ k get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 11h
kube-system kube-dns ClusterIP 10.152.183.10 <none> 53/UDP,53/TCP,9153/TCP 11h
default gitea-postgresql-headless ClusterIP None <none> 5432/TCP 3m25s
default gitea-ssh ClusterIP None <none> 22/TCP 3m25s
default gitea-http ClusterIP None <none> 3000/TCP 3m25s
default gitea-memcached ClusterIP 10.152.183.15 <none> 11211/TCP 3m25s
default gitea-postgresql ClusterIP 10.152.183.227 <none> 5432/TCP 3m25s
Also see:
https://github.com/ubuntu/microk8s/issues/2741
https://gitea.com/gitea/helm-chart/issues/249
I worked through to the point where I had the logs below, specifically:
cmd/migrate.go:38:runMigrate() [F] Failed to initialize ORM engine: dial tcp: lookup gitea-postgresql.default.svc.cluster.local: Try again
Using k cluster-info dump I saw:
[ERROR] plugin/errors: 2 gitea-postgresql.default.svc.cluster.local.cisco.com. A: read udp 10.1.147.194:56647->8.8.8.8:53: i/o timeout
That led me to test DNS with dig against 8.8.8.8. That test didn't reveal any errors; DNS seemed to work. Even so, DNS seemed suspect.
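One caveat with that test: dig only exercises the resolver it is pointed at, while the failing lookup happens inside the pod network, through CoreDNS. A quick in-cluster check with a throwaway busybox pod (the pod name dns-test is arbitrary):

kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- \
  nslookup gitea-postgresql.default.svc.cluster.local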
So then I tried microk8s enable storage dns:<IP address of DNS in lab>, whereas I had previously only run microk8s enable storage dns. The storage part enables the persistent volumes that the database needs.
The key piece here is the lab DNS server IP address argument when enabling DNS with microk8s.
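For reference, a sketch of the command with a placeholder lab DNS address, plus one way to confirm the forwarder actually landed in the CoreDNS config (10.20.30.40 is a made-up address; use your lab's resolver):

microk8s enable storage dns:10.20.30.40

# Confirm the forward directive now points at the lab DNS server
microk8s kubectl -n kube-system get configmap coredns -o yaml | grep forward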

How can I diagnose why a k8s pod keeps restarting?

I deployed Elasticsearch to minikube with the configuration file below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch
spec:
  replicas: 1
  selector:
    matchLabels:
      name: elasticsearch
  template:
    metadata:
      labels:
        name: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: elasticsearch:7.10.1
          ports:
            - containerPort: 9200
            - containerPort: 9300
I run the command kubectl apply -f es.yml to deploy the elasticsearch cluster.
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
elasticsearch-fb9b44948-bchh2 1/1 Running 5 6m23s
The Elasticsearch pod keeps restarting every few minutes. When I run the kubectl describe pod command, I can see these events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m11s default-scheduler Successfully assigned default/elasticsearch-fb9b44948-bchh2 to minikube
Normal Pulled 3m18s (x5 over 7m11s) kubelet Container image "elasticsearch:7.10.1" already present on machine
Normal Created 3m18s (x5 over 7m11s) kubelet Created container elasticsearch
Normal Started 3m18s (x5 over 7m10s) kubelet Started container elasticsearch
Warning BackOff 103s (x11 over 5m56s) kubelet Back-off restarting failed container
The last event is Back-off restarting failed container, but I don't know why the pod is being restarted. Is there any way I can check why it keeps restarting?
The first step (kubectl describe pod) you've already done. As a next step I suggest checking the container logs: kubectl logs <pod_name>. 99% of the time you'll get the reason from the logs in this case (I bet on a bootstrap check failure).
When neither describe pod nor logs say anything about the error, I get into the container with exec: kubectl exec -it <pod_name> -c <container_name> sh. With this you'll get a shell inside the container (of course, only if there is a shell binary in it), and you can use it to investigate the problem manually. Note that to keep a failing container alive you may need to change command and args to something like this:
command:
- /bin/sh
- -c
args:
- cat /dev/stdout
Be sure to disable probes when doing this. A container may be restarted if a liveness probe fails; you will see that in kubectl describe pod if it happens. Since your snippet doesn't have any probes specified, you can skip this.
Checking the logs of the pod using kubectl logs podname gives a clue about what went wrong:
ERROR: [2] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/docker-cluster.log
Check this post for a solution
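For a single-node dev setup like this one, a commonly used way to satisfy both bootstrap checks is to raise vm.max_map_count inside the minikube VM and run Elasticsearch in single-node discovery mode; a sketch (the env block goes under the elasticsearch container in the Deployment above):

# Raise the kernel limit inside the minikube VM (reverts when the VM restarts)
minikube ssh 'sudo sysctl -w vm.max_map_count=262144'

          env:
            - name: discovery.type
              value: single-node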

minikube error trying to reach 172.17.0.4:8080 on osx

I'm doing the Kubernetes tutorial locally with minikube on OS X. In step 3 of https://kubernetes.io/docs/tutorials/kubernetes-basics/deploy-app/deploy-interactive/, I get this error:
% curl http://localhost:8001/api/v1/namespaces/default/pods/$POD_NAME/proxy/
Error: 'dial tcp 172.17.0.4:8080: getsockopt: connection refused'
Trying to reach: 'http://172.17.0.4:8080/'%
Any idea why this doesn't work locally? The simpler request does work:
% curl http://localhost:8001/version
{
"major": "1",
"minor": "10",
"gitVersion": "v1.10.0",
"gitCommit": "fc32d2f3698e36b93322a3465f63a14e9f0eaead",
"gitTreeState": "clean",
"buildDate": "2018-03-26T16:44:10Z",
"goVersion": "go1.9.3",
"compiler": "gc",
"platform": "linux/amd64"
}
info
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
kubernetes-bootcamp-74f58d6b87-ntn5r 0/1 ImagePullBackOff 0 21h
logs
$ kubectl logs $POD_NAME
Error from server (BadRequest): container "kubernetes-bootcamp" in pod "kubernetes-bootcamp-74f58d6b87-w4zh8" is waiting to start: trying and failing to pull image
So the run command starts the deployment, but the pod fails to start? Why?
$ kubectl run kubernetes-bootcamp --image=gcr.io/google-samples/kubernetes-bootcamp:v1 --port=8080
I can pull the image without a problem
$ docker pull gcr.io/google-samples/kubernetes-bootcamp:v1
v1: Pulling from google-samples/kubernetes-bootcamp
5c90d4a2d1a8: Pull complete
ab30c63719b1: Pull complete
29d0bc1e8c52: Pull complete
d4fe0dc68927: Pull complete
dfa9e924f957: Pull complete
Digest: sha256:0d6b8ee63bb57c5f5b6156f446b3bc3b3c143d233037f3a2f00e279c8fcc64af
Status: Downloaded newer image for gcr.io/google-samples/kubernetes-bootcamp:v1
describe
$ kubectl describe pods
Name: kubernetes-bootcamp-74f58d6b87-w4zh8
Namespace: default
Node: minikube/10.0.2.15
Start Time: Tue, 24 Jul 2018 15:05:00 -0400
Labels: pod-template-hash=3091482643
run=kubernetes-bootcamp
Annotations: <none>
Status: Pending
IP: 172.17.0.3
Controlled By: ReplicaSet/kubernetes-bootcamp-74f58d6b87
Containers:
kubernetes-bootcamp:
Container ID:
Image: gci.io/google-samples/kubernetes-bootcamp:v1
Image ID:
Port: 8080/TCP
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-wp28q (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-wp28q:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-wp28q
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal BackOff 23m (x281 over 1h) kubelet, minikube Back-off pulling image "gci.io/google-samples/kubernetes-bootcamp:v1"
Warning Failed 4m (x366 over 1h) kubelet, minikube Error: ImagePullBackOff
Minikube is a tool that makes it easy to run Kubernetes locally. Minikube runs a single-node Kubernetes cluster inside a VM on your laptop, for users looking to try out Kubernetes or develop with it day-to-day.
Back to your issue: have you checked whether you provided enough resources to run the Minikube environment? You may try to start minikube with more memory allocated:
minikube start --memory 4096
For further analysis, please provide information about the resources dedicated to this installation and the type of hypervisor you use.
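One way to gather that information from the running cluster (minikube is the default node name):

kubectl describe node minikube
# The Capacity and Allocatable sections show how much CPU and memory
# the Minikube VM was given.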
Sounds like a networking issue: your VM is unable to pull images from gcr.io:443.
Here's what your kubectl describe pods kubernetes-bootcamp-xxx should look like:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m default-scheduler Successfully assigned kubernetes-bootcamp-5c69669756-xbbmn to minikube
Normal SuccessfulMountVolume 5m kubelet, minikube MountVolume.SetUp succeeded for volume "default-token-cfq65"
Normal Pulling 5m kubelet, minikube pulling image "gcr.io/google-samples/kubernetes-bootcamp:v1"
Normal Pulled 5m kubelet, minikube Successfully pulled image "gcr.io/google-samples/kubernetes-bootcamp:v1"
Normal Created 5m kubelet, minikube Created container
Normal Started 5m kubelet, minikube Started container
Normal SuccessfulMountVolume 1m kubelet, minikube MountVolume.SetUp succeeded for volume "default-token-cfq65"
Normal SandboxChanged 1m kubelet, minikube Pod sandbox changed, it will be killed and re-created.
Normal Pulled 1m kubelet, minikube Container image "gcr.io/google-samples/kubernetes-bootcamp:v1" already present on machine
Normal Created 1m kubelet, minikube Created container
Normal Started 1m kubelet, minikube Started container
Try this from your host, to narrow down if it's a networking issue with your VM or your host machine:
$ docker pull gcr.io/google-samples/kubernetes-bootcamp:v1
v1: Pulling from google-samples/kubernetes-bootcamp
5c90d4a2d1a8: Pull complete
ab30c63719b1: Pull complete
29d0bc1e8c52: Pull complete
d4fe0dc68927: Pull complete
dfa9e924f957: Pull complete
Digest: sha256:0d6b8ee63bb57c5f5b6156f446b3bc3b3c143d233037f3a2f00e279c8fcc64af
Status: Downloaded newer image for gcr.io/google-samples/kubernetes-bootcamp:v1
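The complementary check is to pull from inside the Minikube VM, since that is where kubelet actually pulls images; for example:

minikube ssh
docker pull gcr.io/google-samples/kubernetes-bootcamp:v1

Separately, note that the describe output above lists the image as gci.io/google-samples/kubernetes-bootcamp:v1 while the run command used gcr.io; if the deployment really references gci.io, the pull will fail regardless of networking.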
