How can I troubleshoot/fix an issue interacting with a running Kubernetes pod (timeout error)?

How can I troubleshoot/fix an issue interacting with a running Kubernetes pod (timeout error)? - amazon-ec2

I have two EC2 instances, one running a Kubernetes Master node and the other running the Worker node. I can successfully create a pod from a deployment file that pulls a docker image and it starts with a status of "Running". However when I try to interact with it I get a timeout error.
Ex: kubectl logs <pod-name> -v6
Output:
Config loaded from file /home/ec2-user/.kube/config
GET https://<master-node-ip>:6443/api/v1/namespaces/default/pods/<pod-name> 200 OK in 11 milliseconds
GET https://<master-node-ip>:6443/api/v1/namespaces/default/pods/<pod-name>/log 500 Internal Server Error in 30002 milliseconds
Server response object: [{"status": "Failure", "message": "Get https://<worker-node-ip>:10250/containerLogs/default/<pod-name>/<container-name>: dial tcp <worker-node-ip>:10250: i/o timeout", "code": 500 }]
I can get information about the pod by running kubectl describe pod <pod-name> and confirm the status as Running. Any ideas on how to identify exactly what is causing this error and/or how to fix it?

Probably, you didn't install any network add-on to your Kubernetes cluster. It's not included in kubeadm installation, but it's required to communicate between pods scheduled on different nodes. The most popular are Calico and Flannel. As you already have a cluster, you may want to chose the network add-on that uses the same subnet as you stated with kubeadm init --pod-network-cidr=xx.xx.xx.xx/xx during cluster initialization.
192.168.0.0/16 is default for Calico network addon
10.244.0.0/16 is default for Flannel network addon
You can change it by downloading corresponded YAML file and by replacing the default subnet with the subnet you want. Then just apply it with kubectl apply -f filename.yaml

Related

How to set metricbeat on amazon elasticsearch

I have two servers one for production and one for test, I've been trying to install metricbeat on both servers.
I did on my test server and set it to send logs to my amazon elasticsearch service and now I can see on kibana all data from that server but I did the same process to my production server and when I use command sudo metricbeat -e it starts but I get
ERROR pipeline/output.go:74 Failed to connect: Get https://amazon-endpoint request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) but the config for both is the same

ibm-cloud-private DNS or Internet issues from inside the pods

I've been experimenting with an ICP instance (ICP 2.1.0.2): 1 master node and 2 worker nodes.
I noticed that the pods in my ICP Kubernetes cluster don't have outbound Internet connectivity (or are having DNS lookup issues)
For example, If I start up a busybox pod in my cluster, and try to do "nslookup github.com" or "ping google.com" .. it fails..
kubectl run curl --image=radial/busyboxplus:curl -i --tty
root#curl-545bbf5f9c-gssbg:/ ]$ nslookup github.com
Server: 10.0.0.10
Address 1: 10.0.0.10
nslookup: can't resolve 'github.com'
I checked and saw that "kube-dns" (service, pod, daemonset.extensions, daemonset.apps) does appear to be running.
When I'm logged into (eg. SSH) to the ICP master and the worker nodes machines, I am able to ping these external sites successfully.
Any suggestions for how to troubleshoot this problem? Thanks!

We had kind of the reverse problem - where we could look up anything on internet or other domains, but not the domain in which the cluster was deployed.
That turned out to be the vague documentation around what cluster_domain and cluster_CA_domain mean in the config.yaml. But as a plus we got to learn a bit more about those and about configuring kube-dns.
Basically cluster_domain should be a private virtual domain to the cluster for which kube-dns will be authoritative. Anything else it should use the host's resolve.conf nameservers as upstream servers. If you suspect that your DNS servers are not being utilised for public DNS then you can update the kube-dns configMap to specify the upstream servers that it should use.
https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/
This is assuming you have configure cluster_domain, cluster_CA_domain correctly of course.
They should look something like
cluster_domain = mycluster.icp <----- could be "Mickey-mouse" for all it matters
cluster_CA_domain = icp.mycompany.com <----- the endpoint that portal/registry/api etc are accessible to users on

IBM Cloud Private monitoring gets 502 bad gateway

The following containers are not starting after installing IBM Cloud Private. I had previously installed ICP without a Management node and was doing a new install after having done and 'uninstall' and did restart the Docker service on all nodes.
Installed a second time with a Management node defined, Master/Proxy on a single node, and two Worker nodes.
Selecting menu option Platform / Monitoring gets 502 Bad Gateway
Event messages from deployed containers
Deployment - monitoring-prometheus
TYPE SOURCE COUNT REASON MESSAGE
Warning default-scheduler 2113 FailedScheduling
No nodes are available that match all of the following predicates:: MatchNodeSelector (3), NoVolumeNodeConflict (4).
Deployment - monitoring-grafana
TYPE SOURCE COUNT REASON MESSAGE
Warning default-scheduler 2097 FailedScheduling
No nodes are available that match all of the following predicates:: MatchNodeSelector (3), NoVolumeNodeConflict (4).
Deployment - rootkit-annotator
TYPE SOURCE COUNT REASON MESSAGE
Normal kubelet 169.53.226.142 125 Pulled
Container image "ibmcom/rootkit-annotator:20171011" already present on machine
Normal kubelet 169.53.226.142 125 Created
Created container
Normal kubelet 169.53.226.142 125 Started
Started container
Warning kubelet 169.53.226.142 2770 BackOff
Back-off restarting failed container
Warning kubelet 169.53.226.142 2770 FailedSync
Error syncing pod

The management console sometimes displays a 502 Bad Gateway Error after installation or rebooting the master node. If you recently installed IBM Cloud Private, wait a few minutes and reload the page.
If you rebooted the master node, take the following steps:
Configure the kubectl command line interface. See Accessing your IBM Cloud Private cluster by using the kubectl CLI.
Obtain the IP addresses of the icp-ds pods. Run the following command:
kubectl get pods -o wide -n kube-system | grep "icp-ds"
The output resembles the following text:
icp-ds-0 1/1 Running 0 1d 10.1.231.171 10.10.25.134
In this example, 10.1.231.171 is the IP address of the pod.
In high availability (HA) environments, an icp-ds pod exists for each master node.
From the master node, ping the icp-ds pods. Check the IP address for each icp-ds pod by running the following command for each IP address:
ping 10.1.231.171
If the output resembles the following text, you must delete the pod:
connect: Invalid argument
Delete each pod that you cannot reach:
kubectl delete pods icp-ds-0 -n kube-system
In this example, icp-ds-0 is the name of the unresponsive pod.
In HA installations, you might have to delete the pod for each master node.
Obtain the IP address of the replacement pod or pods. Run the following command:
kubectl get pods -o wide -n kube-system | grep "icp-ds"
The output resembles the following text:
icp-ds-0 1/1 Running 0 1d 10.1.231.172 10.10.2
Ping the pods again. Check the IP address for each icp-ds pod by running the following command for each IP address:
ping 10.1.231.172
If you can reach all icp-ds pods, you can access the IBM Cloud Private management console when that pod enters the available state.

kubernetes dashboard unavailable post cluster deploy

i am new to kubernetes. i have just followed this guide and have a vagrant/kubernetes cluster: https://coreos.com/kubernetes/docs/latest/kubernetes-on-vagrant.html
i was interested viewing the ui, so i followed the instructions here: http://kubernetes.io/docs/user-guide/ui/#deploying-the-dashboard-ui
$ kubectl proxy
Starting to serve on 127.0.0.1:8001
upon browsing to the above IP:PORT, <h3>Unauthorized</h3> is served. so, i suffix /ui to the URI, and we get:
// 127.0.0.1:8001/ui redirected to http://localhost:8001/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "no endpoints available for service \"kubernetes-dashboard\"",
"reason": "ServiceUnavailable",
"code": 503
}
perhaps relevant is:
$ kubectl cluster-info
Kubernetes master is running at https://172.17.4.101:443
Heapster is running at https://172.17.4.101:443/api/v1/proxy/namespaces/kube-system/services/heapster
KubeDNS is running at https://172.17.4.101:443/api/v1/proxy/namespaces/kube-system/services/kube-dns
kubernetes-dashboard is running at https://172.17.4.101:443/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard
$ kubectl get services
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes 10.3.0.1 <none> 443/TCP 36m
i saw another SO thread, Kubernetes dashboard keeps pending with message: no endpoints available for service "kubernetes-dashboard", and discovered get pods and describe pod <pod-name> --namespace=kube-system.
so, i ran kubectl describe pod kubernetes-dashboard-3543765157-94gj9 --namespace="kube-system" which yielded: https://gist.github.com/cdaringe/b972bf5a95c9f2a7cb8386ef6bf2252b

ultimately, my cluster had no nodes, so the UI service had no place to land and run! the API still attempts to proxy to it, which is why it reported "no endpoints"--there was no host endpoint serving the content. still haven't figured out why my vagrant cluster deployed no nodes. im going to guess that the workers never downloaded the kubelet and joined.

Kubernetes proxy connection

I am trying to play around with kubernetes and specifically the REST API. The steps to connect with the cluster API are listed here. However Im stuck in the first step i.e. running kubectl proxy
I try running this:
kubectl --context='vagrant' proxy --port=8080 &
which returns error: couldn't read version from server: Get https://172.17.4.99:443/api: dial tcp 172.17.4.99:443: i/o timeout
What does this mean? How do overcome it connect to the API?

Check that your docker, proxy, kube-apiserver, kube-control-manager services are running without error. Check their status using systemclt status your-service-name. If the service is loaded but not running then restart the service by using systemctl restart your-service-name.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio