How to show custom application metrics in Prometheus captured using the golang client library from all pods running in Kubernetes - go

I am trying to get some custom application metrics captured in golang using the prometheus client library to show up in Prometheus.
I have the following working:
I have a go application which is exposing metrics on localhost:8080/metrics as described in this article:
https://godoc.org/github.com/prometheus/client_golang/prometheus
I have a Kubernetes minikube running with Prometheus, Grafana and Alertmanager deployed using the operator from this article:
https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus
I created a Docker image for my Go app; when I run it and go to localhost:8080/metrics, I can see the Prometheus metrics in a browser.
I use the following pod.yaml to deploy my Docker image to a pod in k8s:
apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod
  labels:
    zone: prod
    version: v1
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8080'
spec:
  containers:
  - name: my-container
    image: name/my-app:latest
    imagePullPolicy: IfNotPresent
    ports:
    - containerPort: 8080
If I connect to my pod using:
kubectl exec -it my-app-pod -- /bin/bash
and then run wget against "localhost:8080/metrics", I can see my metrics.
So far so good; here is where I am hitting a wall. I could have multiple pods running this same image. I want to expose all of these pods to Prometheus as targets. How do I configure my pods so that they show up in Prometheus so I can report on my custom metrics?
Thanks for any help offered!

The kubernetes_sd_config directive can be used to discover all pods with a given tag. Your Prometheus.yml config file should have something like so:
- job_name: 'some-app'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_app]
    regex: python-app
    action: keep
The source label __meta_kubernetes_pod_label_app uses the Kubernetes API to look at pods that have an app label, and the keep action retains only those whose label value matches the regex on the line below (in this case, python-app).
Once you've done this Prometheus will automatically discover the pods you want and start scraping the metrics from your app.
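Since your pod manifest already carries the prometheus.io/scrape and prometheus.io/port annotations, a variant of this job keyed on those annotations instead of a label might look roughly like the following (a sketch based on the commonly used Kubernetes example configuration, not tested against your setup):

- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # keep only pods annotated with prometheus.io/scrape: 'true'
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  # scrape the port given in the prometheus.io/port annotation
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__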
Hope that helps. You can follow the blog post here for more detail.
Note: it is worth mentioning that at the time of writing, kubernetes_sd_config is still in beta. Thus breaking changes to configuration may occur in future releases.

You need 2 things:
a ServiceMonitor for the Prometheus Operator, which specifies which services will be scraped for metrics
a Service which matches the ServiceMonitor and points to your pods
There is an example in the docs over here: https://coreos.com/operators/prometheus/docs/latest/user-guides/running-exporters.html
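For illustration, a minimal pair might look roughly like this (a sketch; my-app and its labels are placeholders for your deployment, and the ServiceMonitor must carry a label/namespace combination that your Prometheus resource's serviceMonitorSelector picks up):

apiVersion: v1
kind: Service
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  selector:
    app: my-app
  ports:
  - name: metrics
    port: 8080
    targetPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: metrics

Note that endpoints.port refers to the named port on the Service, so the name (metrics here) has to match.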

Can you share the Prometheus config you are using to scrape the metrics? The config controls which sources the metrics are scraped from. Here are a few links that you can refer to: https://groups.google.com/forum/#!searchin/prometheus-users/Application$20metrics$20monitoring$20of$20Kubernetes$20Pods%7Csort:relevance/prometheus-users/uNPl4nJX9yk/cSKEBqJlBwAJ

Related

Job that executes command inside a pod

I'd like to ask whether it is possible to create a Kubernetes Job that runs a bash command inside another Pod.
apiVersion: batch/v1
kind: Job
metadata:
  namespace: dev
  name: run-cmd
spec:
  ttlSecondsAfterFinished: 180
  template:
    spec:
      containers:
      - name: run-cmd
        image: <IMG>
        command: ["/bin/bash", "-c"]
        args:
        - <CMD> $POD_NAME
      restartPolicy: Never
  backoffLimit: 4
I considered using:
an environment variable to define the pod name
the Kubernetes SDK to automate this
But if you have better ideas, I am open to them!
The Job manifest you shared is a valid approach.
However, you need to take the points below into consideration:
Running a command inside another pod (strictly, inside one of its containers) requires interacting with the Kubernetes API server, so you need a Kubernetes client (e.g. kubectl) to do it. This, in turn, requires the client to be installed in the Job's container image.
The Job's pod's service account has to have permission on the pods/exec resource (a sketch follows below). See the docs and this answer.
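A minimal Role and RoleBinding granting that permission could look roughly like this (an illustrative sketch assuming the Job runs in the dev namespace under the default service account; adjust names to your setup):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: pod-exec
rules:
# exec needs create on pods/exec, plus read access to the target pods
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: dev
  name: pod-exec-binding
subjects:
- kind: ServiceAccount
  name: default
  namespace: dev
roleRef:
  kind: Role
  name: pod-exec
  apiGroup: rbac.authorization.k8s.io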

Record Kubernetes container resource utilization data

I'm doing a perf test for a web server deployed on an EKS cluster. I'm invoking the server using JMeter under different conditions (varying thread count, payload size, etc.).
I want to record the Kubernetes perf data with timestamps so that I can analyze it alongside my JMeter output (JTL).
I have been digging through the internet to find a way to record Kubernetes perf data, but I was unable to find a proper way to do it.
Can anyone suggest a standard way to do this?
Note: I have a multi-container pod also.
In line with Jonas's comment, this is the quickest way of installing Prometheus in your K8s cluster. I've added the details as an answer since it was impossible to put the commands in a readable format in a comment.
Add the Bitnami Helm repo:
helm repo add bitnami https://charts.bitnami.com/bitnami
Install the Helm chart for Prometheus:
helm install my-release bitnami/kube-prometheus
The installation output will be:
C:\Users\ameena\Desktop\shine\Article\K8\promethus>helm install my-release bitnami/kube-prometheus
NAME: my-release
LAST DEPLOYED: Mon Apr 12 12:44:13 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
** Please be patient while the chart is being deployed **
Watch the Prometheus Operator Deployment status using the command:
kubectl get deploy -w --namespace default -l app.kubernetes.io/name=kube-prometheus-operator,app.kubernetes.io/instance=my-release
Watch the Prometheus StatefulSet status using the command:
kubectl get sts -w --namespace default -l app.kubernetes.io/name=kube-prometheus-prometheus,app.kubernetes.io/instance=my-release
Prometheus can be accessed via port "9090" on the following DNS name from within your cluster:
my-release-kube-prometheus-prometheus.default.svc.cluster.local
To access Prometheus from outside the cluster execute the following commands:
echo "Prometheus URL: http://127.0.0.1:9090/"
kubectl port-forward --namespace default svc/my-release-kube-prometheus-prometheus 9090:9090
Watch the Alertmanager StatefulSet status using the command:
kubectl get sts -w --namespace default -l app.kubernetes.io/name=kube-prometheus-alertmanager,app.kubernetes.io/instance=my-release
Alertmanager can be accessed via port "9093" on the following DNS name from within your cluster:
my-release-kube-prometheus-alertmanager.default.svc.cluster.local
To access Alertmanager from outside the cluster execute the following commands:
echo "Alertmanager URL: http://127.0.0.1:9093/"
kubectl port-forward --namespace default svc/my-release-kube-prometheus-alertmanager 9093:9093
Follow the commands to forward the UI to localhost.
echo "Prometheus URL: http://127.0.0.1:9090/"
kubectl port-forward --namespace default svc/my-release-kube-prometheus-prometheus 9090:9090
Open the UI in browser: http://127.0.0.1:9090/classic/graph
Annotate the pods so that their metrics are picked up:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 4 # Update the replicas from 2 to 4
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9102'
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
In the UI, apply appropriate filters and start observing the crucial parameters such as memory and CPU (for example, the cAdvisor metrics container_memory_working_set_bytes and container_cpu_usage_seconds_total). The UI supports autocomplete, so it will not be difficult to figure things out.
Regards

Equivalent of kind load or minikube cache in kubeadm

I have a 3-node Kubernetes cluster managed with kubeadm. Previously I used kind and minikube. When I wanted to make a deployment based on a Docker image, I just needed to run:
kind load docker-image in kind, or
minikube cache add in minikube.
Now, when I want to make a deployment with kubeadm, I obviously get ImagePullBackOff.
Question: Is there an equivalent command to add an image to a kubeadm cluster that I just can't find, or is there an entirely different way to solve this problem?
EDIT
Maybe the question is not clear enough, so instead of deleting it I'll add more details.
I have three nodes (one control plane and two workers) with docker, kubeadm, kubelet and kubectl installed on each. One deployment in my future cluster is a machine learning module, so I need TensorFlow:
docker pull tensorflow/tensorflow
Using this image I build my own:
docker build -t mlimage:cluster -f ml.Dockerfile .
Next I prepare deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mldeployment
spec:
  selector:
    matchLabels:
      app: mldeployment
      name: mldeployment
  replicas: 1
  template:
    metadata:
      labels:
        app: mldeployment
        name: mldeployment
    spec:
      containers:
      - name: mlcontainer
        image: mlimage:cluster
        imagePullPolicy: Never
        ports:
        - name: http
          containerPort: 6060
and create it:
kubectl create -f mldeployment.yaml
Now, when I type
kubectl describe pod
I see these events for mldeployment:
In the case of minikube or kind it was enough to simply add the image to the cluster by typing
minikube cache add ...
and
kind load docker-image ...
respectively.
The question is how to add an image from my machine to the cluster when it is managed with kubeadm. I assume there is a similar way to do that as with minikube or kind (without any connection to Docker Hub, because everything is local).
You are getting ImagePullBackOff because a kubeadm-managed cluster will go and check the registry for the image, whereas both commands, minikube cache and kind load, are for loading local images into the cluster.
As I now understand it, images for a cluster managed via kubeadm should be stored in a trusted registry like Docker Hub or a cloud registry. But if you want a quick solution for an isolated network, there is a possibility: a self-hosted Docker registry.
There are also some ready-to-use tools, e.g. Trow, or a simpler solution.
I used the second approach, and it works (the code is a bit old, so it may need some changes; these links may be helpful: change apiVersion, add label).
After those changes, first create the Deployment and DaemonSet:
kubectl create -f docker-private-registry.json
kubectl create -f docker-private-registry-proxy.json
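For orientation, the registry these manifests run is essentially a pod with the stock registry:2 image listening on port 5000; a simplified, illustrative equivalent (not the exact manifests from the linked post) could look like:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: docker-private-registry
spec:
  replicas: 1
  selector:
    matchLabels:
      app: docker-private-registry
  template:
    metadata:
      labels:
        app: docker-private-registry
    spec:
      containers:
      - name: registry
        image: registry:2
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: docker-private-registry
spec:
  selector:
    app: docker-private-registry
  ports:
  - port: 5000
    targetPort: 5000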
Tag the image with the localhost registry address:
docker tag image:tag 127.0.0.1:5000/image:tag
Check the full name of the docker private registry pod and forward the port (replace the x's with the exact pod name):
kubectl get pod
kubectl port-forward docker-private-registry-deployment-xxxxxxxxx-xxxxx 5000:5000 -n default
Open a new terminal window and push the image to the private registry:
docker push 127.0.0.1:5000/image:tag
Finally, change the container image in the deployment.yaml file (add the 127.0.0.1:5000/ prefix) and create the deployment.
This solution is insecure, so use it wisely, and only in isolated networks for test and dev purposes.

Kubernetes Kibana operator failures and Nginx ingress timeouts

I just started implementing a Kubernetes cluster on an Azure Linux VM. I'm very new to all this. The cluster is running on a small VM (2 cores, 16 GB). I set up the ECK stack using their tutorial online, and an Nginx Ingress controller to expose it.
Most of the day, everything runs fine. I can access the Kibana dashboard, run Elastic queries, Nginx is working. But about once each day, something happens that causes the Kibana Endpoint matching the Kibana Service to not have any IP address. As a result, the Service can't route correctly to the container. When this happens, the Kibana pod has a status of Running, but says that 0/1 are running. It never triggers any restarts, and as a result, the Kibana dashboard becomes inaccessible. I've tried reproducing this by shutting down the Docker container, force killing the pod, but can't reliably reproduce it.
Looking at the logs on the Kibana pod, there are a bunch of errors due to timeouts. The Nginx logs say that it can't find the Endpoint for the Service. It looks like this could potentially be the source. Has anyone encountered this? Does anyone know a reliable way to prevent this?
This should probably be a separate question, but the other issue this causes is completely blocking all Nginx Ingress. Any new requests are not seen in the logs, and the logs completely stop after the message about not finding an endpoint. As a result, all URLs that Ingress is normally responsible for time out, and the whole cluster becomes externally unusable. This is fixed by deleting the Nginx controller pod, but the pod doesn't restart itself. Can someone explain why an issue like this would completely block Nginx? And why the Nginx pod can't detect this and restart?
Edit:
The Nginx logs end with this:
W1126 16:20:31.517113 6 controller.go:950] Service "default/gwam-kb-http" does not have any active Endpoint.
W1126 16:20:34.848942 6 controller.go:950] Service "default/gwam-kb-http" does not have any active Endpoint.
W1126 16:21:52.555873 6 controller.go:950] Service "default/gwam-kb-http" does not have any active Endpoint.
Any further requests timeout and do not appear in the logs.
I don't have logs for the kibana pod, but they were just consistent timeouts to the kibana service default/gwam-kb-http (same as in Nginx logs above). This caused the readiness probe to fail, and show 0/1 Running, but did not trigger a restart of the pod.
Kibana Endpoints when everything is normal
Name: gwam-kb-http
Namespace: default
Labels: common.k8s.elastic.co/type=kibana
kibana.k8s.elastic.co/name=gwam
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2020-11-26T16:27:20Z
Subsets:
Addresses: 10.244.0.6
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
https 5601 TCP
Events: <none>
When I run into this issue, Addresses is empty, and the pod IP is under NotReadyAddresses
I'm using the very basic YAML from the ECK setup tutorial:
Elastic (no problems here)
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: gwam
spec:
  version: 7.10.0
  nodeSets:
  - name: default
    count: 3
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 2Gi
        storageClassName: elasticsearch
Kibana:
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: gwam
spec:
  version: 7.10.0
  count: 1
  elasticsearchRef:
    name: gwam
Ingress for the Kibana service:
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: nginx-ingress-secure-backend-no-rewrite
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.org/proxy-connect-timeout: "30s"
    nginx.org/proxy-read-timeout: "20s"
    nginx.org/proxy-send-timeout: "60s"
    nginx.org/client-max-body-size: "4m"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  tls:
  - hosts:
    - <internal company site>
    secretName: gwam-tls-secret
  rules:
  - host: <internal company site>
    http:
      paths:
      - path: /
        backend:
          serviceName: gwam-kb-http
          servicePort: 5601
Some more environment details:
Kubernetes version: 1.19.3
OS: Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-1031-azure x86_64)
Edit 2:
It seems like I'm getting some kind of network error here. None of my pods can do a DNS lookup for kubernetes.default. All the networking pods are running, but after adding logs to CoreDNS, I'm seeing the following:
[ERROR] plugin/errors: 2 1699910358767628111.9001703618875455268. HINFO: read udp 10.244.0.69:35222->10.234.44.20:53: i/o timeout
I'm using Flannel for my network. Thinking of trying to reset and switch to Calico and increasing nf_conntrack_max as some answers suggest.
This ended up being a very simple mistake on my part. I thought it was a pod or DNS issue, but it was just a general network issue: my IP forwarding was turned off. I turned it on with:
sysctl -w net.ipv4.ip_forward=1
And added net.ipv4.ip_forward=1 to /etc/sysctl.conf

Packetbeat does not add Kubernetes metadata

I've started a minikube (using Kubernetes 1.18.3) to test out ECK and specifically packetbeat. The minikube profile is called "packetbeat" (important, as that's the hostname for the Virtualbox VM as well) and I followed the ECK quickstart to get it up and running. ElasticSearch (single node) and Kibana are running fine and packetbeat is gathering flows as well, however, I'm unable to make it add the Kubernetes metadata to the fields.
I'm working in the default namespace and created a ClusterRoleBinding to the view ClusterRole for the default ServiceAccount in that namespace. This is working well; if I do not do that, packetbeat reports that it is unable to list the Pods on the API server.
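For reference, that binding is roughly equivalent to the following manifest (the name default-view is illustrative, since it isn't given above):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: default-view
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
- kind: ServiceAccount
  name: default
  namespace: default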
This is the Beat config I'm using to make ECK deploy packetbeat:
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: packetbeat
spec:
  type: packetbeat
  version: 7.9.0
  elasticsearchRef:
    name: quickstart
  kibanaRef:
    name: kibana
  config:
    packetbeat.interfaces.device: any
    packetbeat.protocols:
    - type: http
      ports: [80, 8000, 8080, 9200]
    - type: tls
      ports: [443]
    packetbeat.flows:
      timeout: 30s
      period: 10s
    processors:
    - add_kubernetes_metadata: {}
  daemonSet:
    podTemplate:
      spec:
        terminationGracePeriodSeconds: 30
        hostNetwork: true
        automountServiceAccountToken: true # some older Beat versions depend on this setting being present in a k8s context
        dnsPolicy: ClusterFirstWithHostNet
        containers:
        - name: packetbeat
          securityContext:
            runAsUser: 0
            capabilities:
              add:
              - NET_ADMIN
(This is mostly a slightly modified example from the ECK example page.) However, this is not working at all. I tried it with "add_kubernetes_metadata: {}" first, but that will error with the message:
2020-08-19T14:23:38.550Z ERROR [kubernetes] kubernetes/util.go:117
kubernetes: Querying for pod failed with error: pods "packetbeat" not
found {"libbeat.processor": "add_kubernetes_metadata"}
This message goes away when I add the "host: packetbeat". I'm no longer getting an error now, but I'm not getting the Kubernetes metadata either. I'm mostly interested in the namespace tag, but I'm not getting any. I do not see any additional errors in the log and it just reports monitoring details every 30 seconds at the moment.
What am I doing wrong? Any more information I can provide to help me debug this?
So the docs are just unclear. Although they do not explicitly state it, you do need to add indexers and matchers. My understanding was that there are "default" ones (since you can disable those), but that does not seem to be the case. Adding the indexers and matchers as per the example in the docs (see the sketch below) makes the Kubernetes metadata part of the data.
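For flow data, a configuration along these lines might work (a sketch assuming the ip_port indexer and a field_format matcher keyed on the destination IP and port; adjust the format to whichever event fields carry the pod address in your data):

processors:
- add_kubernetes_metadata:
    host: packetbeat
    indexers:
    - ip_port:
    matchers:
    - field_format:
        format: '%{[destination.ip]}:%{[destination.port]}'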
