k8s spring boot pod failing readiness and liveness probe - spring-boot

I have configured a Spring Boot pod with liveness and readiness probes.
When I start the pod, kubectl describe shows the events below.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 92s default-scheduler Successfully assigned pradeep-ns/order-microservice-rs-8tqrv to pool-h4jq5h014-ukl3l
Normal Pulled 43s (x2 over 91s) kubelet Container image "classpathio/order-microservice:latest" already present on machine
Normal Created 43s (x2 over 91s) kubelet Created container order-microservice
Normal Started 43s (x2 over 91s) kubelet Started container order-microservice
Warning Unhealthy 12s (x6 over 72s) kubelet Liveness probe failed: Get "http://10.244.0.206:8222/actuator/health/liveness": dial tcp 10.244.0.206:8222: connect: connection refused
Normal Killing 12s (x2 over 52s) kubelet Container order-microservice failed liveness probe, will be restarted
Warning Unhealthy 2s (x8 over 72s) kubelet Readiness probe failed: Get "http://10.244.0.206:8222/actuator/health/readiness": dial tcp 10.244.0.206:8222: connect: connection refused
The ReplicaSet definition is as follows:
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: order-microservice-rs
  labels:
    app: order-microservice
spec:
  replicas: 1
  selector:
    matchLabels:
      app: order-microservice
  template:
    metadata:
      name: order-microservice
      labels:
        app: order-microservice
    spec:
      containers:
      - name: order-microservice
        image: classpathio/order-microservice:latest
        imagePullPolicy: IfNotPresent
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: dev
        - name: SPRING_DATASOURCE_USERNAME
          valueFrom:
            secretKeyRef:
              key: username
              name: db-credentials
        - name: SPRING_DATASOURCE_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: db-credentials
        volumeMounts:
        - name: app-config
          mountPath: /app/config
        - name: app-logs
          mountPath: /var/log
        livenessProbe:
          httpGet:
            port: 8222
            path: /actuator/health/liveness
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:
          httpGet:
            port: 8222
            path: /actuator/health/readiness
          initialDelaySeconds: 10
          periodSeconds: 10
        resources:
          requests:
            memory: "550Mi"
            cpu: "500m"
          limits:
            memory: "550Mi"
            cpu: "750m"
      volumes:
      - name: app-config
        configMap:
          name: order-microservice-config
      - name: app-logs
        emptyDir: {}
      restartPolicy: Always
If I disable the liveness and readiness probes in the ReplicaSet manifest and exec into the pod, I get a valid response when invoking the http://localhost:8222/actuator/health/liveness and http://localhost:8222/actuator/health/readiness endpoints.
Why is my pod restarting and failing the readiness and liveness probes in Kubernetes? Where am I going wrong?
Update
If I remove the resources section, the pods run fine, but as soon as I add the resource parameters back, the probes fail.

When you limit the container (and therefore the Spring application) to 0.5 cores (500 millicores), startup probably takes longer than the configured liveness probe thresholds allow.
You can either increase those thresholds, or use a startupProbe with more relaxed settings (e.g. failureThreshold: 10). With a startup probe in place you can also shorten the liveness probe's period and get faster feedback once a successful container start has been detected.
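A sketch of that combination, reusing the port and paths from the manifest above; the threshold and period values are illustrative, not prescriptive:
startupProbe:
  httpGet:
    port: 8222
    path: /actuator/health/liveness
  periodSeconds: 10
  failureThreshold: 10   # up to 10 x 10s = 100s for the application to start
livenessProbe:
  httpGet:
    port: 8222
    path: /actuator/health/liveness
  periodSeconds: 5       # liveness only starts once the startup probe has succeeded
readinessProbe:
  httpGet:
    port: 8222
    path: /actuator/health/readiness
  periodSeconds: 10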

Your Pod config only gives the container 0.5 CPU cores, and your probe timings are too short. Spring Boot startup can take well over 10 seconds, depending on your server's CPU performance. This probe config from one of my Spring Boot pods may give you a pointer.
"livenessProbe": {
"httpGet": {
"path": "/actuator/liveness",
"port": 11032,
"scheme": "HTTP"
},
"initialDelaySeconds": 90,
"timeoutSeconds": 30,
"periodSeconds": 30,
"successThreshold": 1,
"failureThreshold": 3
},
"readinessProbe": {
"httpGet": {
"path": "/actuator/health",
"port": 11032,
"scheme": "HTTP"
},
"initialDelaySeconds": 60,
"timeoutSeconds": 30,
"periodSeconds": 30,
"successThreshold": 1,
"failureThreshold": 3
},
I did not limit the CPU and memory resources; if you limit the CPU, startup will take more time. Hope this helps.

When a request works against localhost, that is no guarantee it will also work on the other network interfaces. The kubelet is a node agent, so its probe requests go to your pod's eth0 (or equivalent) interface, not to localhost.
You can check this by making the request from another pod to your pod's IP address, or to the Service backing it.
You are probably making your application serve on localhost only, while it has to serve on 0.0.0.0 (or the eth0 address).
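For Spring Boot this is usually not the default behaviour (embedded Tomcat binds to all interfaces unless server.address is overridden), but if that property has been set somewhere, a snippet like the following restores binding on all interfaces. This is an illustrative example, not taken from the question:
# application.yaml (illustrative)
server:
  port: 8222
  address: 0.0.0.0   # bind on all interfaces; 127.0.0.1 here would make
                     # kubelet probes fail with "connection refused"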

Related

Setting resource limits for Elasticsearch in Kubernetes creates a bug

I have a web app that I am trying to deploy with Kubernetes. It's working correctly, but when I try to add resource limits, Elasticsearch will not deploy.
elasticsearch-deployment.yaml:
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-service
spec:
  type: NodePort
  selector:
    app: elasticsearch
  ports:
  - port: 9200
    targetPort: 9200
    name: serving
  - port: 9300
    targetPort: 9300
    name: node-to-node
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch-deployment
  labels:
    app: elasticsearch
spec:
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: elasticsearch:7.9.0
        ports:
        - containerPort: 9200
        - containerPort: 9300
        env:
        - name: discovery.type
          value: single-node
        # resources:
        #   limits:
        #     memory: 8Gi
        #     cpu: "4"
        #   requests:
        #     memory: 4Gi
        #     cpu: "2"
If I uncomment the resources section of the file, the pod is stuck in pending:
> kubectl get pods
NAME READY STATUS RESTARTS AGE
backend-deployment-bd4f98697-rxsz8 1/1 Running 1 (6m9s ago) 6m40s
elasticsearch-deployment-644475545b-t75pp 0/1 Pending 0 6m40s
frontend-deployment-8bc989f89-4g6v7 1/1 Running 0 6m40s
mysql-0 1/1 Running 0 6m40s
If I check the events:
> kubectl get events
...
Warning FailedScheduling pod/elasticsearch-deployment-54d9cdd879-k69js 0/1 nodes are available: 1 Insufficient cpu.
Warning FailedScheduling pod/elasticsearch-deployment-54d9cdd879-rjj24 0/1 nodes are available: 1 Insufficient cpu.
...
The events say that the pod has insufficient CPU, but I tried changing the resource limits to:
resources:
  limits:
    memory: 8Gi
    cpu: "18"
  requests:
    memory: 4Gi
    cpu: "18"
It still doesn't work; the only way to make it work is to remove the resource limits. But why?
It is because of the request, not the limit. Your node doesn't have enough allocatable CPU to schedule a pod that requests 2 CPUs. You need to set the request to a lower value (e.g. 500m).
You can check your node's allocatable CPU; the sum of all Pods' CPU requests must stay below it.
# kubectl describe nodes
...
Allocatable:
cpu: 28
...
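Applied to the Elasticsearch Deployment above, a corrected resources block might look like this. It is only a sketch; the request values are illustrative and just need to fit within the node's allocatable CPU and memory:
resources:
  requests:
    memory: 4Gi
    cpu: "500m"   # only what the scheduler must guarantee on one node
  limits:
    memory: 8Gi
    cpu: "4"      # the container may still burst up to this limit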
In addition to Daigo's answer:
Requests and limits are set on a per-container basis. Each container in the Pod gets its own individual limit and request, and adding the limits and requests of every container together gives you the aggregate value for the Pod.
For example, two containers that each request 250m CPU and 64Mi memory, with limits of 500m CPU and 128Mi memory, give Pod requests of 500m CPU and 128Mi memory and Pod limits of 1 CPU and 256Mi memory (see the sketch at the end of this answer).
Requests are what the container is guaranteed to get. If a container requests a resource, k8s will only schedule it on a node that can give it that resource.
Limits, on the other hand, make sure a container won't go above a certain value or limit.
Without requests and limits defined, the scheduler might place the Pod on a node that has less than 1 GiB memory available.
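A hypothetical two-container Pod spec illustrates the aggregation; the container names and numbers are made up so that they add up to the Pod-level values mentioned above:
containers:
- name: app        # hypothetical
  resources:
    requests:
      cpu: 250m
      memory: 64Mi
    limits:
      cpu: 500m
      memory: 128Mi
- name: sidecar    # hypothetical
  resources:
    requests:
      cpu: 250m
      memory: 64Mi
    limits:
      cpu: 500m
      memory: 128Mi
# Pod requests: 500m CPU / 128Mi memory; Pod limits: 1 CPU / 256Mi memory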

LivenessProbe is failing but port-forward is working on the same port

I have the following deployment yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gofirst
  labels:
    app: gofirst
spec:
  selector:
    matchLabels:
      app: gofirst
  template:
    metadata:
      labels:
        app: gofirst
    spec:
      restartPolicy: Always
      containers:
      - name: gofirst
        image: lbvenkatesh/gofirst:0.0.5
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - name: http
          containerPort: 8080
        livenessProbe:
          httpGet:
            path: /health
            port: http
            httpHeaders:
            - name: "X-Health-Check"
              value: "1"
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: http
            httpHeaders:
            - name: "X-Health-Check"
              value: "1"
          initialDelaySeconds: 30
          periodSeconds: 10
and my service yaml is this:
apiVersion: v1
kind: Service
metadata:
  name: gofirst
  labels:
    app: gofirst
spec:
  publishNotReadyAddresses: true
  type: NodePort
  selector:
    app: gofirst
  ports:
  - port: 8080
    targetPort: http
    name: http
"gofirst" is a simple web application written in Golang Gin.
Here is the dockerFile of the same:
FROM golang:latest
LABEL MAINTAINER='Venkatesh Laguduva <lbvenkatesh@gmail.com>'
RUN mkdir /app
ADD . /app/
RUN apt -y update && apt -y install git
RUN go get github.com/gin-gonic/gin
RUN go get -u github.com/RaMin0/gin-health-check
WORKDIR /app
RUN go build -o main .
ARG verArg="0.0.1"
ENV VERSION=$verArg
ENV PORT=8080
ENV GIN_MODE=release
EXPOSE 8080
CMD ["/app/main"]
I have deployed this application in Minikube, and when I describe the pod I see these events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 10m (x2 over 10m) default-scheduler 0/1 nodes are available: 1 Insufficient cpu.
Normal Scheduled 10m default-scheduler Successfully assigned default/gofirst-95fc8668c-6r4qc to m01
Normal Pulling 10m kubelet, m01 Pulling image "lbvenkatesh/gofirst:0.0.5"
Normal Pulled 10m kubelet, m01 Successfully pulled image "lbvenkatesh/gofirst:0.0.5"
Normal Killing 8m13s (x2 over 9m13s) kubelet, m01 Container gofirst failed liveness probe, will be restarted
Normal Pulled 8m13s (x2 over 9m12s) kubelet, m01 Container image "lbvenkatesh/gofirst:0.0.5" already present on machine
Normal Created 8m12s (x3 over 10m) kubelet, m01 Created container gofirst
Normal Started 8m12s (x3 over 10m) kubelet, m01 Started container gofirst
Warning Unhealthy 7m33s (x7 over 9m33s) kubelet, m01 Liveness probe failed: Get http://172.17.0.4:8080/health: dial tcp 172.17.0.4:8080: connect: connection refused
Warning Unhealthy 5m35s (x12 over 9m25s) kubelet, m01 Readiness probe failed: Get http://172.17.0.4:8080/health: dial tcp 172.17.0.4:8080: connect: connection refused
Warning BackOff 31s (x17 over 4m13s) kubelet, m01 Back-off restarting failed container
I tried the sample container "hello-world" and it worked well when I ran "minikube service hello-world", but when I tried the same with "minikube service gofirst", I got a connection error in the browser.
I must be doing something relatively simple wrong but am unable to locate the error. Please go through my YAML and Dockerfile and let me know if I am making any mistake.
I've reproduced your scenario and faced the same issues you have, so I decided to remove the liveness and readiness probes in order to exec into the pod and investigate.
Here is the yaml I used:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gofirst
  labels:
    app: gofirst
spec:
  selector:
    matchLabels:
      app: gofirst
  template:
    metadata:
      labels:
        app: gofirst
    spec:
      restartPolicy: Always
      containers:
      - name: gofirst
        image: lbvenkatesh/gofirst:0.0.5
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - name: http
          containerPort: 8080
I exec'd into the pod to check whether the application is listening on the port you are trying to probe:
kubectl exec -ti gofirst-65cfc7556-bbdcg -- bash
Then I installed netstat:
# apt update
# apt install net-tools
Checked if the application is running:
# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 10:06 ? 00:00:00 /app/main
root 9 0 0 10:06 pts/0 00:00:00 sh
root 15 9 0 10:07 pts/0 00:00:00 ps -ef
And finally checked if port 8080 is listening:
# netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.1:8080 0.0.0.0:* LISTEN
tcp 0 0 10.28.0.9:56106 151.101.184.204:80 TIME_WAIT
tcp 0 0 10.28.0.9:56130 151.101.184.204:80 TIME_WAIT
tcp 0 0 10.28.0.9:56104 151.101.184.204:80 TIME_WAIT
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
As we can see, the application is listening for localhost connections only, not on all interfaces; the expected local address would be 0.0.0.0:8080.
Hope this helps you solve the problem.

Access Elasticsearch from minikube/kubernetes

I have a Spring Boot application deployed in Kubernetes on my local Windows machine using Minikube. I also have Elasticsearch running on my local machine (http://localhost:9200).
I want to call the Elasticsearch REST endpoints from this Spring Boot app.
I tried solving this by creating a Service without a selector, but I am not sure what I am missing.
When accessing the Spring Boot app using http://#minikube_ip#:#Node_Port#, I get the error "No route to host".
I also tried minikube ssh and running the curl command from there; I get the same error. Clearly I am missing something here.
application.yaml
elasticsearch:
  hosts:
    - http://my-es:80
  connectTimeout: 10000
  connectionRequestTimeout: 10000
  socketTimeout: 10000
  maxRetryTimeoutMillis: 60000
deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-es-app
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      run: kube-es-app
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        run: kube-es-app
    spec:
      containers:
      - image: elastic-search-app:latest
        imagePullPolicy: Never
        name: kube-es-app
        ports:
        - containerPort: 8080
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
---
kind: Service
apiVersion: v1
metadata:
  name: my-es
spec:
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9200
---
kind: Endpoints
apiVersion: v1
metadata:
  name: my-es
subsets:
  - addresses:
      - ip: <MY_LOCAL_MACHINE_IP>
    ports:
      - port: 9200
Commands I executed
docker build -t elastic-search-app .
kubectl create -f deployment.yaml
kubectl expose deployment/kube-es-app --type="NodePort" --port 8080
Can anyone help, please? I am stuck.
If I've got the description right, the Windows machine should have a VirtualBox network adapter connected to the same Host-only network as the Minikube VM.
Minikube can reach the host machine directly because both are on that network, and Minikube takes care of NAT-ting packets from Pods to the outside.
What you need is to make Elasticsearch listen on the VirtualBox interface (or all interfaces) and to open its port in the Windows firewall. Elasticsearch should then be reachable via the Windows machine's IP address on the Host-only network.
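On the Elasticsearch side that usually comes down to its network.host setting. A minimal elasticsearch.yml sketch, assuming a standalone single-node setup:
# elasticsearch.yml (illustrative)
network.host: 0.0.0.0        # or the address of the Host-only (VirtualBox) interface
discovery.type: single-node  # avoids the bootstrap checks that kick in once ES binds to a non-loopback address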
Apart from that, you might create a Service (if you need to go by name instead of IP) as discussed here:
Connect to local database from inside minikube cluster,
Minikube:Exposing mysql as a service on localhost.

Unable to create the fluentd containers in my kubernetes cluster on ubuntu

I am trying to do log monitoring of my Kubernetes cluster using Elasticsearch, Fluentd, and Kibana. Here is the link I followed for this task. I labeled the nodes with beta.kubernetes.io/fluentd-ds-ready: "true". Initially, I created the StatefulSet for Elasticsearch.
After that, I created fluentd-es-configmap.yaml and fluentd-es-ds.yaml and checked the pod status using kubectl get pods -n kube-system. The Fluentd pods are stuck in ContainerCreating. Checking the logs of the Fluentd container shows:
Error from server (BadRequest): container "fluentd-es" in pod "fluentd-es-v2.0.1-csx96" is waiting to start: ContainerCreating
Here is the Fluentd pod description:
Name: fluentd-es-v2.0.1-csx96
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: ldap/192.168.1.191
Start Time: Wed, 10 Oct 2018 15:08:17 -0400
Labels: controller-revision-hash=5754d85c97
k8s-app=fluentd-es
kubernetes.io/cluster-service=true
pod-template-generation=1
version=v2.0.1
Annotations: scheduler.alpha.kubernetes.io/critical-pod:
Status: Pending
IP:
Controlled By: DaemonSet/fluentd-es-v2.0.1
Containers:
fluentd-es:
Container ID:
Image: gcr.io/google-containers/fluentd-elasticsearch:v2.0.1
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 500Mi
Requests:
cpu: 100m
memory: 200Mi
Environment:
FLUENTD_ARGS: --no-supervisor -q
Mounts:
/etc/fluent/config.d from config-volume (rw)
/host/lib from libsystemddir (ro)
/var/lib/docker/containers from varlibdockercontainers (ro)
/var/log from varlog (rw)
/var/run/secrets/kubernetes.io/serviceaccount from fluentd-es-token-l2b2m (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
varlog:
Type: HostPath (bare host directory volume)
Path: /var/log
HostPathType:
varlibdockercontainers:
Type: HostPath (bare host directory volume)
Path: /var/lib/docker/containers
HostPathType:
libsystemddir:
Type: HostPath (bare host directory volume)
Path: /usr/lib64
HostPathType:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: fluentd-es-config-v0.1.0
Optional: false
fluentd-es-token-l2b2m:
Type: Secret (a volume populated by a Secret)
SecretName: fluentd-es-token-l2b2m
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/fluentd-ds-ready=true
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 14m (x42 over 107m) kubelet, ldap Unable to mount volumes for pod "fluentd-es-v2.0.1-csx96_kube-system(d80d9c78-ccbf-11e8-b7b5-525400e4ff36)": timeout expired waiting for volumes to attach or mount for pod "kube-system"/"fluentd-es-v2.0.1-csx96". list of unmounted volumes=[config-volume]. list of unattached volumes=[varlog varlibdockercontainers libsystemddir config-volume fluentd-es-token-l2b2m]
Warning FailedMount 3m23s (x60 over 109m) kubelet, ldap MountVolume.SetUp failed for volume "config-volume" : configmap "fluentd-es-config-v0.1.0" not found
Could anybody suggest how to resolve this issue?
Thanks in advance.
The problem seems to be a mismatch in the name of the ConfigMap: the DaemonSet is looking for a ConfigMap named fluentd-es-config-v0.1.0, but no such ConfigMap exists.
In the repository the ConfigMap is named fluentd-es-config-v0.1.5 in both fluentd-es-ds.yaml and fluentd-es-configmap.yaml, so it should work if you just use those files as they are.
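In other words, the volume in fluentd-es-ds.yaml must reference a ConfigMap name that actually exists in the cluster; with the files from the repository that reference looks like:
volumes:
- name: config-volume
  configMap:
    name: fluentd-es-config-v0.1.5   # must match the name in fluentd-es-configmap.yaml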

How do I access this Kubernetes service via kubectl proxy?

I want to access my Grafana Kubernetes service via the kubectl proxy server, but for some reason it won't work even though I can make it work for other services. Given the below service definition, why is it not available on http://localhost:8001/api/v1/proxy/namespaces/monitoring/services/grafana?
grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: monitoring
  name: grafana
  labels:
    app: grafana
spec:
  type: NodePort
  ports:
  - name: web
    port: 3000
    protocol: TCP
    nodePort: 30902
  selector:
    app: grafana
grafana-deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  namespace: monitoring
  name: grafana
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:4.1.1
        env:
        - name: GF_AUTH_BASIC_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_SECURITY_ADMIN_USER
          valueFrom:
            secretKeyRef:
              name: grafana-credentials
              key: user
        - name: GF_SECURITY_ADMIN_PASSWORD
          valueFrom:
            secretKeyRef:
              name: grafana-credentials
              key: password
        volumeMounts:
        - name: grafana-storage
          mountPath: /var/grafana-storage
        ports:
        - name: web
          containerPort: 3000
        resources:
          requests:
            memory: 100Mi
            cpu: 100m
          limits:
            memory: 200Mi
            cpu: 200m
      - name: grafana-watcher
        image: quay.io/coreos/grafana-watcher:v0.0.5
        args:
        - '--watch-dir=/var/grafana-dashboards'
        - '--grafana-url=http://localhost:3000'
        env:
        - name: GRAFANA_USER
          valueFrom:
            secretKeyRef:
              name: grafana-credentials
              key: user
        - name: GRAFANA_PASSWORD
          valueFrom:
            secretKeyRef:
              name: grafana-credentials
              key: password
        resources:
          requests:
            memory: "16Mi"
            cpu: "50m"
          limits:
            memory: "32Mi"
            cpu: "100m"
        volumeMounts:
        - name: grafana-dashboards
          mountPath: /var/grafana-dashboards
      volumes:
      - name: grafana-storage
        emptyDir: {}
      - name: grafana-dashboards
        configMap:
          name: grafana-dashboards
The error I'm seeing when accessing the above URL is "no endpoints available for service "grafana"", error code 503.
With Kubernetes 1.10 the proxy URL should be slightly different, like this:
http://localhost:8080/api/v1/namespaces/default/services/SERVICE-NAME:PORT-NAME/proxy/
Ref: https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#manually-constructing-apiserver-proxy-urls
As Michael says, quite possibly your labels or namespaces are mismatching. However in addition to that, keep in mind that even when you fix the endpoint, the url you're after (http://localhost:8001/api/v1/proxy/namespaces/monitoring/services/grafana) might not work correctly.
Depending on your root_url and/or static_root_path grafana configuration settings, when trying to login you might get grafana trying to POST to http://localhost:8001/login and get a 404.
Try using kubectl port-forward instead:
kubectl -n monitoring port-forward [grafana-pod-name] 3000
then access grafana via http://localhost:3000/
https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/
The issue is that Grafana's port is named web, and as a result one needs to append :web to the kubectl proxy URL: http://localhost:8001/api/v1/proxy/namespaces/monitoring/services/grafana:web.
An alternative is to not name the Grafana port at all, because then you don't have to append :web to the kubectl proxy URL for the service: http://localhost:8001/api/v1/proxy/namespaces/monitoring/services/grafana. I went with this option in the end since it's easier.
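A minimal sketch of that variant, i.e. the Service from the question with the port left unnamed and everything else unchanged:
apiVersion: v1
kind: Service
metadata:
  namespace: monitoring
  name: grafana
  labels:
    app: grafana
spec:
  type: NodePort
  ports:
  - port: 3000        # unnamed, so the proxy URL needs no :web suffix
    protocol: TCP
    nodePort: 30902
  selector:
    app: grafana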
There are a few factors that might be causing this issue.
The Service expects to find one or more supporting endpoints, which it discovers through matching rules on the labels. If the labels don't align, the Service won't find any endpoints, and the gateway function it performs will return a 503.
The port declared by the Pod (and the port the process inside the container actually listens on) is misaligned with the targetPort expected by the Service.
Either one of these can generate the error. Let's take a closer look.
First, kubectl describe the service:
$ kubectl describe svc grafana01-grafana-3000
Name: grafana01-grafana-3000
Namespace: default
Labels: app=grafana01-grafana
chart=grafana-0.3.7
component=grafana
heritage=Tiller
release=grafana01
Annotations: <none>
Selector: app=grafana01-grafana,component=grafana,release=grafana01
Type: NodePort
IP: 10.0.0.197
Port: <unset> 3000/TCP
NodePort: <unset> 30905/TCP
Endpoints: 10.1.45.69:3000
Session Affinity: None
Events: <none>
Notice that my grafana service has 1 endpoint listed (there could be multiple). The error above in your example indicates that you won't have endpoints listed here.
Endpoints: 10.1.45.69:3000
Let's take a look next at the selectors. In the example above, you can see I have 3 selector labels on my service:
Selector: app=grafana01-grafana,component=grafana,release=grafana01
I'll kubectl describe my pods next:
$ kubectl describe pod grafana
Name: grafana01-grafana-1843344063-vp30d
Namespace: default
Node: 10.10.25.220/10.10.25.220
Start Time: Fri, 14 Jul 2017 03:25:11 +0000
Labels: app=grafana01-grafana
component=grafana
pod-template-hash=1843344063
release=grafana01
...
Notice that the labels on the pod align correctly, hence my service finds pods which provide endpoints which are load balanced against by the service. Verify that this part of the chain isn't broken in your environment.
If you do find that the labels are correct, you may still have a disconnect in that the grafana process running within the container within the pod is running on a different port than you expect.
$ kubectl describe pod grafana
Name: grafana01-grafana-1843344063-vp30d
...
Containers:
grafana:
Container ID: docker://69f11b7828c01c5c3b395c008d88e8640c5606f4d865107bf4b433628cc36c76
Image: grafana/grafana:latest
Image ID: docker-pullable://grafana/grafana@sha256:11690015c430f2b08955e28c0e8ce7ce1c5883edfc521b68f3fb288e85578d26
Port: 3000/TCP
State: Running
Started: Fri, 14 Jul 2017 03:25:26 +0000
If for some reason, your port under the container listed a different value, then the service is effectively load balancing against an invalid endpoint.
For example, if it listed port 80:
Port: 80/TCP
Or was an empty value
Port:
Then even if your label selectors were correct, the service would never find a valid response from the pod and would remove the endpoint from the rotation.
I suspect your issue is the first problem above (mismatched label selectors).
If both the label selectors and ports align, then you might have a problem with the MTU setting between nodes. In some cases, if the MTU used by your networking layer (like calico) is larger than the MTU of the supporting network, then you'll never get a valid response from the endpoint. Typically, this last potential issue will manifest itself as a timeout rather than a 503 though.
Your Deployment may not have a label app: grafana, or be in another namespace. Could you also post the Deployment definition?
