OpenShift container health probe connection refused - spring-boot

Hi OpenShift community,
I am currently migrating an app to OpenShift and have encountered failing health probes due to connection refused. What I find strange is that if I SSH into the pod and use
curl localhost:10080/xxx-service/info
it returns HTTP 200, but if I use the pod's IP address then it fails with connection refused.
Here are the details:
Pod status
Logs in OpenShift saying Spring Boot started successfully
OpenShift events saying probes failed due to connection refused
Tried SSH into the pod to check; using localhost works
Not sure why the IP address is not working at the pod level. Does anyone know the answer or has anyone encountered this?

It is hard to say what exactly the issue is in your case, as it is environment-specific.
In general, you should avoid using IP addresses when working with Kubernetes, as these will change whenever a Pod is restarted (which may be the root cause for the issue you are seeing).
When defining readiness and liveness probes for your container, I would recommend that you always use the following syntax to define your checks (note that it does not specify the host):
...
readinessProbe:
  httpGet:
    path: /xxx-service/info
    port: 10080
  initialDelaySeconds: 15
  timeoutSeconds: 1
...
See also the Kubernetes or OpenShift documentation for more information:
https://docs.openshift.com/container-platform/3.11/dev_guide/application_health.html
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-http-request

I found the root cause; it turned out to be Spring-related.
It was a Spring Boot app packaged as a WAR file and deployed to a Tomcat server. The application.properties contained this property:
server.address=127.0.0.1
Removing it fixed the issue: with that setting, the server binds only to the loopback interface, so requests to the pod IP (which is what the probes use) are refused while localhost keeps working.
https://docs.spring.io/spring-boot/docs/current/reference/html/appendix-application-properties.html#common-application-properties-server
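To verify this kind of binding issue from inside the pod, you can inspect the listening sockets; a quick sketch, assuming the image ships ss and that 10080 is the app port:
ss -tlnp | grep 10080
A loopback-only bind shows up as 127.0.0.1:10080; after removing server.address it should show 0.0.0.0:10080, i.e. listening on all interfaces, which is what the probes need.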

Related

Spring application unable to access Kafka running in Kubernetes minikube

I used bitnami/kafka to deploy Kafka on minikube. A describe of the pod kafka-0 says that the server address is:
KAFKA_CFG_ADVERTISED_LISTENERS:INTERNAL://$(MY_POD_NAME).kafka-headless.default.svc.cluster.local:9093,CLIENT://$(MY_POD_NAME).kafka-headless.default.svc.cluster.local:9092
My kafka address is set like so in Spring config properties:
spring.kafka.bootstrap-servers=["kafka-0.kafka-headless.default.svc.cluster.local:9092"]
But when I try to send a message I get the following error:
Failed to construct kafka producer] with root cause:
org.apache.kafka.common.config.ConfigException:
Invalid url in bootstrap.servers: ["kafka-0.kafka-headless.default.svc.cluster.local:9092"]
Note that this works when I run kafka locally and set the bootstrap-servers address to localhost:9092
How do I fix this error? What is the correct Kafka URL to use, and where do I find it? Thanks.
The minikube network is different from the host network; you need a bridge.
The advertised listener is in the minikube realm and is not resolvable from the host.
You could set up a Service and an Ingress in minikube pointing to your Kafka, then point your hosts file at the IP address of the Ingress for the advertised hostname, as sketched below.
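A minimal sketch of the Service half of that bridge, with hypothetical names and labels (the selector must match whatever labels the bitnami chart actually puts on the Kafka pods):
apiVersion: v1
kind: Service
metadata:
  name: kafka-external
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: kafka
  ports:
    - port: 9092
      targetPort: 9092
The Ingress (or other TCP forwarding) part depends on your controller, since Kafka speaks plain TCP rather than HTTP.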
spring.kafka.bootstrap-servers needs valid server hostnames along with port numbers, comma-separated:
hostname-1:port,hostname-2:port
["kafka-0.kafka-headless.default.svc.cluster.local:9092"] does not look like one! The square brackets and quotes are taken literally as part of the value, which is why Kafka reports an invalid URL.

Kubernetes - Windows 10 - connectex: No connection could be made because the target machine actively refused it

I've installed Kubernetes on Windows 10 Pro. I ran into a problem where the UI wasn't accepting the access token I had generated, for some reason.
So I went into Docker and reset the cluster so I could start over.
But now when I try to apply my configuration again I get an error:
kubectl apply -f .\recommended.yaml
Unable to connect to the server: dial tcp 127.0.0.1:61634: connectex: No connection could be made because the target machine actively refused it.
I have my KUBECONFIG variable set:
$env:KUBECONFIG
C:\Users\bluet\.kube\config
And I have let Kubernetes know about the config with this command:
[Environment]::SetEnvironmentVariable("KUBECONFIG", $HOME + "\.kube\config", [EnvironmentVariableTarget]::Machine)
Yet, the issue remains! How can I resolve this? Docker seems fine.
This Stack Overflow answer resolved my question.
This is what it says:
If you have kubectl already installed and pointing to some other environment, such as minikube or a GKE cluster, be sure to change
context so that kubectl is pointing to docker-desktop:
kubectl config get-contexts
kubectl config use-context docker-desktop
Apparently I had installed minikube, which is what messed things up. Switching back to the docker-desktop context is what saved the day.
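For anyone verifying the same thing, a quick way to confirm which cluster kubectl is talking to after switching:
kubectl config current-context
This should print docker-desktop once the context has been switched.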

Why does DNS name resolution fail for elasticsearch-master-headless on Kubernetes 1.16?

I'm trying to get Elasticsearch running on Kubernetes 1.16 with Helm 3 on GKE. I'm aware that neither 1.16 nor Helm 3 is supported yet. I want to prepare a PR to make the charts compatible. I'm using the Helm charts from https://github.com/elastic/helm-charts.
If I use the original chart 7.6.1 the pod creation fails due to create Pod elasticsearch-master-0 in StatefulSet elasticsearch-master failed error: pods "elasticsearch-master-0" is forbidden: unable to validate against any pod security policy: [spec.volumes[1]: Invalid value: "projected": projected volumes are not allowed to be used]. Therefore I created the following patch:
diff --git a/elasticsearch/values.yaml b/elasticsearch/values.yaml
index 053c020..fd9c37b 100755
--- a/elasticsearch/values.yaml
+++ b/elasticsearch/values.yaml
@@ -107,6 +107,7 @@ podSecurityPolicy:
       - secret
       - configMap
       - persistentVolumeClaim
+      - projected
 persistence:
   enabled: true
With this patch on master/d9ccb5a and tag 7.6.1 (I tried both), the pods quickly get into an unhealthy state due to failed to resolve host [elasticsearch-master-headless], caused by a java.net.UnknownHostException: elasticsearch-master-headless.
I don't understand why the name resolution doesn't work, as there's no change introduced in 1.16 that affects name resolution of Kubernetes names afaik. If I try to ping elasticsearch-master-headless from a shell in the pod started with kubectl exec, I can't reach it either.
I tried to contact the nameserver in /etc/resolv.conf with telnet because it allows specifying a specific port:
[elasticsearch@elasticsearch-master-1 ~]$ cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local us-central1-a.c.myproject.internal c.myproject.internal google.internal
nameserver 10.23.240.10
options ndots:5
[elasticsearch@elasticsearch-master-1 ~]$ telnet 10.23.240.10
Trying 10.23.240.10...
^C
[elasticsearch@elasticsearch-master-1 ~]$ telnet 10.23.240.10 53
Trying 10.23.240.10...
telnet: connect to address 10.23.240.10: Connection refused
I obfuscated the project ID with myproject.
The patch is already proposed to be merged upstream together with other changes at https://github.com/elastic/helm-charts/pull/496.
This is caused by the pod kube-dns crashing due to
F0315 20:01:02.464839 1 server.go:61] Failed to create a kubernetes client: open /var/run/secrets/kubernetes.io/serviceaccount/token: permission denied
Since Kubernetes 1.16 is only available in the rapid channel of GKE and it's a system pod, I consider this a bug.
I'll update this answer if I find the energy to file a bug.
Chances are that there is a firewall (firewalld) blocking 53/udp,tcp, or an issue with the CoreDNS pod in the cluster where you are performing the test.
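Either way, checking the cluster DNS pods directly usually narrows it down; a sketch, assuming the usual kube-system label (it can differ per distribution):
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system logs -l k8s-app=kube-dns --all-containers
A crash-looping pod or a permission error in the logs (as in the answer above) points at the DNS deployment rather than your chart.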

How can I troubleshoot/fix an issue interacting with a running Kubernetes pod (timeout error)?

I have two EC2 instances, one running a Kubernetes master node and the other running the worker node. I can successfully create a pod from a deployment file that pulls a Docker image, and it starts with a status of "Running". However, when I try to interact with it I get a timeout error.
Ex: kubectl logs <pod-name> -v6
Output:
Config loaded from file /home/ec2-user/.kube/config
GET https://<master-node-ip>:6443/api/v1/namespaces/default/pods/<pod-name> 200 OK in 11 milliseconds
GET https://<master-node-ip>:6443/api/v1/namespaces/default/pods/<pod-name>/log 500 Internal Server Error in 30002 milliseconds
Server response object: [{"status": "Failure", "message": "Get https://<worker-node-ip>:10250/containerLogs/default/<pod-name>/<container-name>: dial tcp <worker-node-ip>:10250: i/o timeout", "code": 500 }]
I can get information about the pod by running kubectl describe pod <pod-name> and confirm the status as Running. Any ideas on how to identify exactly what is causing this error and/or how to fix it?
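One way to narrow this down is to check from the master node whether the worker's kubelet port is reachable at all (a sketch; 10250 is the default kubelet port, and nc must be installed):
nc -vz <worker-node-ip> 10250
If this times out as well, the problem is node-to-node connectivity rather than the pod itself.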
Probably, you didn't install any network add-on in your Kubernetes cluster. It's not included in the kubeadm installation, but it's required for communication between pods scheduled on different nodes. The most popular are Calico and Flannel. As you already have a cluster, you may want to choose the network add-on that uses the same subnet as you stated with kubeadm init --pod-network-cidr=xx.xx.xx.xx/xx during cluster initialization.
192.168.0.0/16 is default for Calico network addon
10.244.0.0/16 is default for Flannel network addon
You can change it by downloading the corresponding YAML file and replacing the default subnet with the subnet you want. Then just apply it with kubectl apply -f filename.yaml, as shown below.
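For example, with the stock manifests (a sketch; the manifest URLs move between releases, so check each project's docs for the current one):
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
or, for Flannel:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yaml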

Kubernetes proxy connection

I am trying to play around with Kubernetes, and specifically the REST API. The steps to connect with the cluster API are listed here. However, I'm stuck at the first step, i.e. running kubectl proxy.
I try running this:
kubectl --context='vagrant' proxy --port=8080 &
which returns error: couldn't read version from server: Get https://172.17.4.99:443/api: dial tcp 172.17.4.99:443: i/o timeout
What does this mean? How do I overcome it and connect to the API?
Check that your docker, proxy, kube-apiserver, and kube-controller-manager services are running without error. Check their status using systemctl status your-service-name. If a service is loaded but not running, restart it using systemctl restart your-service-name.
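For example, on a systemd-based setup the checks could look like this (a sketch; unit names vary by installation method, and on kubeadm clusters the control-plane components run as static pods, so inspect those with kubectl -n kube-system get pods instead):
systemctl status kubelet
systemctl restart kubelet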
