Kubernetes Kibana operator failures and Nginx ingress timeouts - elasticsearch

I just started implementing a Kubernetes cluster on an Azure Linux VM. I'm very new to all this. The cluster is running on a small VM (2 cores, 16 GB). I set up the ECK stack using their tutorial online, and an Nginx Ingress controller to expose it.
Most of the day, everything runs fine. I can access the Kibana dashboard, run Elastic queries, Nginx is working. But about once each day, something happens that causes the Kibana Endpoint matching the Kibana Service to not have any IP address. As a result, the Service can't route correctly to the container. When this happens, the Kibana pod has a status of Running, but says that 0/1 are running. It never triggers any restarts, and as a result, the Kibana dashboard becomes inaccessible. I've tried reproducing this by shutting down the Docker container, force killing the pod, but can't reliably reproduce it.
Looking at the logs on the Kibana pod, there are a bunch of errors due to timeouts. The Nginx logs say that it can't find the Endpoint for the Service. It looks like this could potentially be the source. Has anyone encountered this? Does anyone know a reliable way to prevent this?
This should probably be a separate question, but the other issue this causes is completely blocking all Nginx Ingress. Any new requests are not seen in the logs, and the logs completely stop after the message about not finding an endpoint. As a result, all URLs that Ingress is normally responsible for time out, and the whole cluster becomes externally unusable. This is fixed by deleting the Nginx controller pod, but the pod doesn't restart itself. Can someone explain why an issue like this would completely block Nginx? And why the Nginx pod can't detect this and restart?
Edit:
The Nginx logs end with this:
W1126 16:20:31.517113 6 controller.go:950] Service "default/gwam-kb-http" does not have any active Endpoint.
W1126 16:20:34.848942 6 controller.go:950] Service "default/gwam-kb-http" does not have any active Endpoint.
W1126 16:21:52.555873 6 controller.go:950] Service "default/gwam-kb-http" does not have any active Endpoint.
Any further requests time out and do not appear in the logs.
I don't have logs for the Kibana pod, but they were just consistent timeouts to the Kibana service default/gwam-kb-http (same as in the Nginx logs above). This caused the readiness probe to fail and the pod to show 0/1 Running, but did not trigger a restart of the pod.
Kibana Endpoints when everything is normal:
Name:         gwam-kb-http
Namespace:    default
Labels:       common.k8s.elastic.co/type=kibana
              kibana.k8s.elastic.co/name=gwam
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2020-11-26T16:27:20Z
Subsets:
  Addresses:          10.244.0.6
  NotReadyAddresses:  <none>
  Ports:
    Name   Port  Protocol
    ----   ----  --------
    https  5601  TCP
Events:  <none>
When I run into this issue, Addresses is empty, and the pod IP is under NotReadyAddresses
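A rough way to confirm this state from the command line, rather than reading the describe output by eye, is to query the Endpoints object directly. This is only a sketch: the function names are made up for illustration, and it assumes kubectl is pointed at the cluster and that the Service lives in the default namespace unless a second argument is given.

```shell
#!/bin/sh
# Sketch: report whether a Service's Endpoints object has any ready addresses.
# Function names are illustrative; only the kubectl calls are real.

# Print the ready pod IPs (space separated) behind a Service.
ready_addresses() {
  kubectl get endpoints "$1" -n "${2:-default}" \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
}

# Print the pod IPs that exist but are currently failing readiness.
not_ready_addresses() {
  kubectl get endpoints "$1" -n "${2:-default}" \
    -o jsonpath='{.subsets[*].notReadyAddresses[*].ip}'
}

# Summarize the state in one line: "ready: <ips>" or "not ready: <ips>".
endpoint_state() {
  ns=${2:-default}
  ready=$(ready_addresses "$1" "$ns")
  if [ -n "$ready" ]; then
    echo "ready: $ready"
  else
    echo "not ready: $(not_ready_addresses "$1" "$ns")"
  fi
}
```

Calling endpoint_state gwam-kb-http while the problem is happening should print the pod IP after "not ready:", which matches what the describe output shows under NotReadyAddresses.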
I'm using the very basic YAML from the ECK setup tutorial:
Elastic (no problems here)
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: gwam
spec:
  version: 7.10.0
  nodeSets:
  - name: default
    count: 3
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 2Gi
        storageClassName: elasticsearch
Kibana:
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: gwam
spec:
  version: 7.10.0
  count: 1
  elasticsearchRef:
    name: gwam
Ingress for the Kibana service:
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: nginx-ingress-secure-backend-no-rewrite
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.org/proxy-connect-timeout: "30s"
    nginx.org/proxy-read-timeout: "20s"
    nginx.org/proxy-send-timeout: "60s"
    nginx.org/client-max-body-size: "4m"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  tls:
  - hosts:
    - <internal company site>
    secretName: gwam-tls-secret
  rules:
  - host: <internal company site>
    http:
      paths:
      - path: /
        backend:
          serviceName: gwam-kb-http
          servicePort: 5601
Some more environment details:
Kubernetes version: 1.19.3
OS: Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-1031-azure x86_64)
Edit 2:
It seems like I'm getting some kind of network error here. None of my pods can do a DNS lookup for kubernetes.default. All the networking pods are running, but after adding logs to CoreDNS, I'm seeing the following:
[ERROR] plugin/errors: 2 1699910358767628111.9001703618875455268. HINFO: read udp 10.244.0.69:35222->10.234.44.20:53: i/o timeout
I'm using Flannel for my network. Thinking of trying to reset and switch to Calico and increasing nf_conntrack_max as some answers suggest.

This ended up being a very simple mistake on my part. I thought it was a pod or DNS issue, but it was just a general network issue: IP forwarding was turned off on the host. I turned it on with:
sysctl -w net.ipv4.ip_forward=1
And added net.ipv4.ip_forward=1 to /etc/sysctl.conf
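For anyone who wants to check this before changing anything, here is a small sketch of the two steps above as shell functions. The function names and the optional file arguments are my own additions for illustration (they make the functions easy to try against a scratch file); the paths they default to are the real ones.

```shell
#!/bin/sh
# Sketch: check whether IPv4 forwarding is on, and persist the setting.
# Function names and the optional path arguments are illustrative.

# Print "enabled" or "disabled" based on the kernel flag file.
# $1 is the flag file; it defaults to the real procfs path.
ip_forward_state() {
  flag_file=${1:-/proc/sys/net/ipv4/ip_forward}
  if [ "$(cat "$flag_file")" = "1" ]; then
    echo "enabled"
  else
    echo "disabled"
  fi
}

# Append net.ipv4.ip_forward=1 to a sysctl config file unless an
# equivalent line is already present. $1 defaults to /etc/sysctl.conf.
persist_ip_forward() {
  conf_file=${1:-/etc/sysctl.conf}
  if ! grep -Eq '^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=[[:space:]]*1' "$conf_file" 2>/dev/null; then
    echo 'net.ipv4.ip_forward=1' >> "$conf_file"
  fi
}
```

Running ip_forward_state on the node shows the current state. persist_ip_forward has to run as root to write /etc/sysctl.conf, and it does not look for a conflicting net.ipv4.ip_forward=0 line, so check the file by hand if the setting does not survive a reboot.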

Related

How to Connect to kafka on localhost (host machine) from app inside kubernetes (minikube)

I am trying to connect my Spring Boot app (running inside minikube) to Kafka on my localhost (i.e., my laptop).
I have tried many things, including headless services, services without selectors, and updating minikube's /etc/hosts, but nothing works yet.
I get an error from Spring Boot saying No resolvable bootstrap urls given in bootstrap.servers
Can someone please point me to what I am doing wrong?
My Headless Service
apiVersion: v1
kind: Service
metadata:
  name: es-local-kafka
  namespace: demo
spec:
  clusterIP: None
---
apiVersion: v1
kind: Endpoints
metadata:
  name: es-local-kafka
subsets:
  - addresses:
      - ip: "10.0.2.2"
    ports:
      - name: "kafkabroker1"
        port: 9191
      - name: "kafkabroker2"
        port: 9192
      - name: "kafkabroker3"
        port: 9193
My application properties for kafka:
kafka.bootstrap-servers=${LOCALHOST}:9191,${LOCALHOST}:9192,${LOCALHOST}:9193
My Config Map:
apiVersion: v1
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: rr-config
  namespace: demo
data:
  LOCALHOST: es-local-kafka.demo.svc
It's not clear from the question whether your application runs on minikube or on the local system, and where Kafka runs, so here are both cases.
If your application is running on the local system and Kafka on minikube
You can connect the application to the Kafka cluster using the IP of minikube.
Here is a good example: https://github.com/d1egoaz/minikube-kafka-cluster
git clone https://github.com/d1egoaz/minikube-kafka-cluster
cd minikube-kafka-cluster
kubectl apply -f 00-namespace/
kubectl apply -f 01-zookeeper/
kubectl apply -f 02-kafka/
kubectl apply -f 03-yahoo-kafka-manager/
kubectl get svc -n kafka-ca1
Note the Kafka port in the output (31445 in this example).
Get the IP of minikube:
minikube ip
From your local system you can now connect to Kafka on minikube at http://minikube-ip:port, and you will see the Kafka Manager UI in the browser.
If you are running the Spring Boot application on minikube
If both services are running in the same namespace, you only need the service name to connect.
Use just the service name in Spring Boot; if a port is required you can pass it as well:
es-local-kafka
Also try the fully qualified service name:
<servicename>.<namespace>.svc.cluster.local
A headless service is meant for a different purpose, and a service without a selector is unusual here: in that case your service won't be able to connect to Pods.
I eventually got a fix, and it doesn't need all the crazy stuff I was referring to in my question:
You need to make sure your Kafka broker is bound to 0.0.0.0 instead of 127.0.0.1 (localhost). By default, in the single-node Kafka broker setup, this is what is used. I went with this due to both time constraints and the fact that this was just for a POC on my local machine (prod will have a specific DNS-resolvable Kafka URL anyway, and no such localhost shenanigans will be needed).
In the Kafka URL in your application properties file, instead of localhost, you need to give the minikube IP. This is the same IP that you get from the command minikube ip :)
Read more about how this works here: https://minikube.sigs.k8s.io/docs/handbook/host-access/
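To double-check the first point, a quick sketch like the one below can tell you whether a broker's server.properties still binds a listener to loopback. It assumes the binding is set through the listeners key in server.properties (for example listeners=PLAINTEXT://0.0.0.0:9191); the function names are hypothetical, and the path to the file depends on how Kafka was installed.

```shell
#!/bin/sh
# Sketch: detect a Kafka listener bound to localhost/loopback.
# Function names are illustrative; pass the path to your own server.properties.

# Print the value of the "listeners" key (not "advertised.listeners").
kafka_listeners() {
  sed -n 's/^[[:space:]]*listeners[[:space:]]*=[[:space:]]*//p' "$1"
}

# Return 0 if any listener is bound to localhost or a 127.x.x.x address.
binds_loopback() {
  kafka_listeners "$1" | grep -Eq '://(localhost|127\.[0-9]+\.[0-9]+\.[0-9]+):'
}
```

For example, binds_loopback config/server.properties && echo "broker only reachable from the host itself" prints the warning when the listener needs to be changed to 0.0.0.0. A listener written without a host (PLAINTEXT://:9191) is not flagged by this check.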

Packetbeat does not add Kubernetes metadata

I've started a minikube (using Kubernetes 1.18.3) to test out ECK and specifically packetbeat. The minikube profile is called "packetbeat" (important, as that's the hostname for the Virtualbox VM as well) and I followed the ECK quickstart to get it up and running. Elasticsearch (single node) and Kibana are running fine and packetbeat is gathering flows as well; however, I'm unable to make it add the Kubernetes metadata to the fields.
I'm working in the default namespace and created a ClusterRoleBinding that grants the view role to the default ServiceAccount in the namespace. This is working well; if I do not do that, packetbeat reports that it is unable to list the Pods on the API server.
This is the Beat config I'm using to make ECK deploy packetbeat:
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: packetbeat
spec:
  type: packetbeat
  version: 7.9.0
  elasticsearchRef:
    name: quickstart
  kibanaRef:
    name: kibana
  config:
    packetbeat.interfaces.device: any
    packetbeat.protocols:
    - type: http
      ports: [80, 8000, 8080, 9200]
    - type: tls
      ports: [443]
    packetbeat.flows:
      timeout: 30s
      period: 10s
    processors:
    - add_kubernetes_metadata: {}
  daemonSet:
    podTemplate:
      spec:
        terminationGracePeriodSeconds: 30
        hostNetwork: true
        automountServiceAccountToken: true # some older Beat versions are depending on this settings presence in k8s context
        dnsPolicy: ClusterFirstWithHostNet
        containers:
        - name: packetbeat
          securityContext:
            runAsUser: 0
            capabilities:
              add:
              - NET_ADMIN
(This is mostly a slightly modified example from the ECK example page.) However, this is not working at all. I tried it with "add_kubernetes_metadata: {}" first, but that will error with the message:
2020-08-19T14:23:38.550Z ERROR [kubernetes] kubernetes/util.go:117 kubernetes: Querying for pod failed with error: pods "packetbeat" not found {"libbeat.processor": "add_kubernetes_metadata"}
This message goes away when I add the "host: packetbeat". I'm no longer getting an error now, but I'm not getting the Kubernetes metadata either. I'm mostly interested in the namespace tag, but I'm not getting any. I do not see any additional errors in the log and it just reports monitoring details every 30 seconds at the moment.
What am I doing wrong? Any more information I can provide to help me debug this?
So the docs are just unclear. Although they do not explicitly state it, you do need to add indexers and matchers. My understanding was that there are "default" ones (as you can disable those), but that does not seem to be the case. Adding the indexers and matchers as per the example in the docs makes the Kubernetes metadata part of the data.

upstream connect error or disconnect/reset before headers. reset reason: connection failure. Spring Boot and java 11

I'm having a problem migrating my pure Kubernetes app to an Istio-managed one. I'm using Google Cloud Platform (GCP), Istio 1.4, Google Kubernetes Engine (GKE), Spring Boot and Java 11.
I had the containers running in a pure GKE environment without a problem. Now I started the migration of my Kubernetes cluster to use Istio. Since then I'm getting the following message when I try to access the exposed service.
upstream connect error or disconnect/reset before headers. reset reason: connection failure
This error message looks really generic. I found a lot of different problems with the same error message, but none of them was related to my problem.
Below is the Istio version:
client version: 1.4.10
control plane version: 1.4.10-gke.5
data plane version: 1.4.10-gke.5 (2 proxies)
Below are my yaml files:
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    account: tree-guest
  name: tree-guest-service-account
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: tree-guest
    service: tree-guest
  name: tree-guest
spec:
  ports:
  - name: http
    port: 8080
    targetPort: 8080
  selector:
    app: tree-guest
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: tree-guest
    version: v1
  name: tree-guest-v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tree-guest
      version: v1
  template:
    metadata:
      labels:
        app: tree-guestaz
        version: v1
    spec:
      containers:
      - image: registry.hub.docker.com/victorsens/tree-quest:circle_ci_build_00923285-3c44-4955-8de1-ed578e23c5cf
        imagePullPolicy: IfNotPresent
        name: tree-guest
        ports:
        - containerPort: 8080
      serviceAccount: tree-guest-service-account
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: tree-guest-gateway
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: tree-guest-virtual-service
spec:
  hosts:
  - "*"
  gateways:
  - tree-guest-gateway
  http:
  - match:
    - uri:
        prefix: /v1
    route:
    - destination:
        host: tree-guest
        port:
          number: 8080
To apply the yaml file I used the following command:
kubectl apply -f <(istioctl kube-inject -f ./tree-guest.yaml)
Below is the Istio proxy status after deploying the application:
istio-ingressgateway-6674cc989b-vwzqg.istio-system   SYNCED   SYNCED   SYNCED   SYNCED   istio-pilot-ff4489db8-2hx5f   1.4.10-gke.5
tree-guest-v1-774bf84ddd-jkhsh.default               SYNCED   SYNCED   SYNCED   SYNCED   istio-pilot-ff4489db8-2hx5f   1.4.10-gke.5
If anyone has a tip about what is going wrong, please let me know. I've been stuck on this problem for a couple of days.
Thanks.
As @Victor mentioned, the problem here was a wrong yaml file.
I solve it. In my case the yaml file was wrong. I reviewed it and the problem now is solved. Thank you guys. – Victor
If you're looking for yaml samples, I would suggest taking a look at the Istio GitHub samples.
Since 503 upstream connect error or disconnect/reset before headers. reset reason: connection failure occurs very often, I put together a short troubleshooting answer. It lists other questions about the 503 error that I have come across over several months, together with their answers, useful information from the Istio documentation, and things I would check.
Examples with 503 error:
Istio 503:s between (Public) Gateway and Service
IstIO egress gateway gives HTTP 503 error
Istio Ingress Gateway with TLS termination returning 503 service unavailable
how to terminate ssl at ingress-gateway in istio?
Accessing service using istio ingress gives 503 error when mTLS is enabled
Common causes of 503 errors from the Istio documentation:
https://istio.io/docs/ops/best-practices/traffic-management/#avoid-503-errors-while-reconfiguring-service-routes
https://istio.io/docs/ops/common-problems/network-issues/#503-errors-after-setting-destination-rule
https://istio.io/latest/docs/concepts/traffic-management/#working-with-your-applications
A few things I would check first:
Check the Service port names; Istio can route traffic correctly only if it knows the protocol. The name should be <protocol>[-<suffix>], as mentioned in the Istio documentation.
Check mTLS; problems caused by mTLS usually result in error 503.
Check whether Istio works; I would recommend applying the bookinfo application example and checking that it works as expected.
Check whether your namespace is injected with kubectl get namespace -L istio-injection
If the VirtualService using the subsets arrives before the DestinationRule where the subsets are defined, the Envoy configuration generated by Pilot would refer to non-existent upstream pools. This results in HTTP 503 errors until all configuration objects are available to Pilot.
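The port-name check from the list above is easy to script. Below is a minimal sketch; the function names are made up, and the protocol list is an assumption based on the Istio protocol-selection docs, so verify it against the documentation for your Istio version before relying on it.

```shell
#!/bin/sh
# Sketch: check Service port names against the <protocol>[-<suffix>] convention.
# PROTOCOLS is an assumed list; confirm it for your Istio version.
PROTOCOLS="grpc-web grpc http2 https http mongo mysql redis tcp tls udp"

# Return 0 if the name is exactly a protocol, or a protocol followed by "-<suffix>".
is_istio_port_name() {
  for proto in $PROTOCOLS; do
    case "$1" in
      "$proto"|"$proto"-?*) return 0 ;;
    esac
  done
  return 1
}

# Print each port name of a Service, one per line.
service_port_names() {
  kubectl get service "$1" -n "${2:-default}" \
    -o jsonpath='{range .spec.ports[*]}{.name}{"\n"}{end}'
}

# Example use (needs kubectl access to the cluster):
#   service_port_names tree-guest | while read -r name; do
#     is_istio_port_name "$name" || echo "unrecognized port name: $name"
#   done
```

With the Service from the question, whose only port is named http, the loop would print nothing; a port named something like web or api would be reported.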
I landed here with exactly the same symptoms.
But in my case I had to switch the pod's listen address from 127.0.0.1 to 0.0.0.0, which solved my issue.

Changing Kubernetes' node-proxy tcp keepalive time

How do I properly change the TCP keepalive time for node-proxy?
I am running Kubernetes in Google Container Engine and have set up an ingress backed by HTTP(S) Google Load Balancer. When I continuously make POST requests to the ingress, I get a 502 error exactly once every 80 seconds or so. Cloud Logging shows a backend_connection_closed_before_data_sent_to_client error, which is because GLB's TCP keepalive (600 seconds) is larger than node-proxy's keepalive (no clue what it is).
The logged error is detailed in https://cloud.google.com/compute/docs/load-balancing/http/.
Thanks!
You can use the BackendConfig custom resource, which exists on every GKE cluster, to configure timeouts and other parameters such as CDN; see the documentation.
An example from the documentation shows how to configure it for the ingress.
This is the BackendConfig definition:
apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
  name: my-bsc-backendconfig
spec:
  timeoutSec: 40
  connectionDraining:
    drainingTimeoutSec: 60
And this is how to reference it through an annotation on the Service behind the ingress:
apiVersion: v1
kind: Service
metadata:
  name: my-bsc-service
  labels:
    purpose: bsc-config-demo
  annotations:
    beta.cloud.google.com/backend-config: '{"ports": {"80":"my-bsc-backendconfig"}}'
spec:
  type: NodePort
  selector:
    purpose: bsc-config-demo
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
Just for the sake of understanding: when you use Google's solution to load-balance and manage your Kubernetes Ingress, you will have GLBC pods running in the kube-system namespace.
You can check it out with:
kubectl -n kube-system get po
These pods are intended to route the incoming traffic from the actual Google Load Balancer.
I think the timeouts should be configured there, on GLBC. You should check which annotations or ConfigMap settings GLBC accepts, if any.
You can find details here:
https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/cluster-loadbalancing/glbc
https://github.com/kubernetes/ingress/blob/master/controllers/gce/README.md
https://github.com/kubernetes/ingress/blob/master/controllers/gce/rc.yaml#L64
Personally I prefer to use the Nginx Ingress Controller for now, and it has necessary annotations and ConfigMap support.
See :
https://github.com/kubernetes/ingress/blob/master/controllers/nginx/README.md

Kubernetes Ingress Controller on Vagrant

Is there anything special about running ingress controllers on Kubernetes CoreOS Vagrant Multi-Machine? I followed the example but when I run kubectl -f I do not get an address.
Example:
http://kubernetes.io/v1.1/docs/user-guide/ingress.html#single-service-ingress
Setup:
https://coreos.com/kubernetes/docs/latest/kubernetes-on-vagrant.html
I looked at networking in kubernetes. Everything looks like it should run without further configuration.
My goal is to create a local testing environment before I build out a production platform. I'm thinking there's something about how they setup their virtualbox networking. I'm about to dive into the CoreOS cloud config but thought I would ask first.
UPDATE
Yes I'm running an ingress controller.
https://github.com/kubernetes/contrib/blob/master/Ingress/controllers/nginx-alpha/rc.yaml
It runs without giving an error. It's just that when I run kubectl -f I do not get an address. I'm thinking it's one of two things:
I have to do something extra in networking for CoreOS-Kubernetes vagrant multi-node.
It's running right, but I'm pointing my localhost to the wrong IP. I'm using a 172.17.4.x IP, and I also have 10.0.0.x. I can access services through 172.17.4.x using a NodePort, but I can't get to my Ingress.
Here is the code:
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx-ingress
  labels:
    app: nginx-ingress
spec:
  replicas: 1
  selector:
    app: nginx-ingress
  template:
    metadata:
      labels:
        app: nginx-ingress
    spec:
      containers:
      - image: gcr.io/google_containers/nginx-ingress:0.1
        imagePullPolicy: Always
        name: nginx
        ports:
        - containerPort: 80
          hostPort: 80
Update 2
Output of commands:
kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
echoheaders-kkja7     1/1     Running   0          24m
nginx-ingress-2wwnk   1/1     Running   0          25m
kubectl logs nginx-ingress-2wwnk --previous
Pod "nginx-ingress-2wwnk" in namespace "default": previous terminated container "nginx" not found
kubectl exec nginx-ingress-2wwnk -- cat /etc/nginx/nginx.conf
events {
worker_connections 1024;
}
http {
}%
I'm running an echoheaders service on NodePort. When I type the node IP and port on my browser, I get that just fine.
I restarted all nodes in virtualbox too.
With a lot of help from the Kubernetes IRC and Slack, I fixed this a while back. If I remember correctly, I had the ingress service listening on a port that was already being used, I think by Vagrant. These commands really help:
kubectl get pod <nginx-ingress pod> -o json
kubectl exec <nginx-ingress pod> -- cat /etc/nginx/nginx.conf
kubectl get pods -o wide
kubectl logs <nginx-ingress pod> --previous
