Why are Prometheus pods pending after installing with Helm in a Kubernetes cluster on Rancher server? - vagrant

I installed Rancher server and 2 Rancher agents in Vagrant, then switched to the K8S environment from the Rancher server.
On the Rancher server host, I installed kubectl and helm, then installed Prometheus with Helm:
helm install stable/prometheus
Now, checking the status from the Kubernetes dashboard, there are 2 pods pending:
It says the PersistentVolumeClaim is not bound. Aren't the K8S components installed by default with Rancher server?
Edit
> kubectl get pvc
NAME                                   STATUS    VOLUME   CAPACITY   ACCESSMODES   STORAGECLASS   AGE
voting-prawn-prometheus-alertmanager   Pending                                                    6h
voting-prawn-prometheus-server         Pending                                                    6h
> kubectl get pv
No resources found.
Edit 2
$ kubectl describe pvc voting-prawn-prometheus-alertmanager
Name: voting-prawn-prometheus-alertmanager
Namespace: default
StorageClass:
Status: Pending
Volume:
Labels: app=prometheus
chart=prometheus-4.6.9
component=alertmanager
heritage=Tiller
release=voting-prawn
Annotations: <none>
Capacity:
Access Modes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailedBinding 12s (x10 over 2m) persistentvolume-controller no persistent volumes available for this claim and no storage class is set
$ kubectl describe pvc voting-prawn-prometheus-server
Name: voting-prawn-prometheus-server
Namespace: default
StorageClass:
Status: Pending
Volume:
Labels: app=prometheus
chart=prometheus-4.6.9
component=server
heritage=Tiller
release=voting-prawn
Annotations: <none>
Capacity:
Access Modes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailedBinding 12s (x14 over 3m) persistentvolume-controller no persistent volumes available for this claim and no storage class is set

I had the same issue. I found two ways to solve it:
Edit values.yaml and set persistentVolume.enabled=false; this lets the chart fall back to an emptyDir (this applies to both the Prometheus server and Alertmanager). A sketch of the equivalent --set flags follows below.
If you can't change values.yaml, you will have to create the PV before deploying the chart so that the pod can bind to the volume; otherwise it will stay in the Pending state forever.
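For the first option, a minimal sketch of the command-line overrides (assuming the value names used by the stable/prometheus chart; verify them against your chart's values.yaml):
helm install stable/prometheus \
  --set server.persistentVolume.enabled=false \
  --set alertmanager.persistentVolume.enabled=false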

PVs are cluster-scoped and PVCs are namespace-scoped.
If your application is running in one namespace and the PVC lives in another namespace, that can be the issue.
If so, use RBAC to grant the proper permissions, or put the app and the PVC in the same namespace.
Also, make sure the StorageClass that should create the PV is the default StorageClass of the cluster. The commands below show how to check and set it.
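To check which StorageClass is the default and to mark one as default, something like this should work (the class name is a placeholder):
kubectl get storageclass
# the default class is shown with "(default)" next to its name
kubectl patch storageclass <your-class-name> \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'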

I found that I was missing a storage class and storage volumes. I fixed a similar problem on my cluster by first creating a storage class.
kubectl apply -f storageclass.yaml
storageclass.yaml:
{
  "kind": "StorageClass",
  "apiVersion": "storage.k8s.io/v1",
  "metadata": {
    "name": "local-storage",
    "annotations": {
      "storageclass.kubernetes.io/is-default-class": "true"
    }
  },
  "provisioner": "kubernetes.io/no-provisioner",
  "reclaimPolicy": "Delete"
}
and then using the storage class when installing Prometheus with Helm:
helm install stable/prometheus --set server.storageClass=local-storage
and I was also forced to create a volume for Prometheus to bind to:
kubectl apply -f prometheusVolume.yaml
prometheusVolume.yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-volume
spec:
  storageClassName: local-storage
  capacity:
    storage: 2Gi # size of the volume
  accessModes:
    - ReadWriteOnce # type of access
  hostPath:
    path: "/mnt/data" # host location
You could use other storage classes; I found there are a lot to choose between, but there might be other steps involved to get them working.
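To confirm the result, something like the following should now show the PVCs as Bound and the Prometheus pods leaving the Pending state (the label selector assumes the chart's usual app=prometheus label; names vary with the release name):
kubectl get pv,pvc
kubectl get pods -l app=prometheus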

Related

ElasticSearch CrashLoopBackoff when deploying with ECK in Kubernetes OKD 4.11

I am running Kubernetes using OKD 4.11 (running on vSphere) and have validated the basic functionality (including dynamic volume provisioning) using applications (like nginx).
I also applied
oc adm policy add-scc-to-group anyuid system:authenticated
to allow authenticated users to use anyuid (which seems to have been required to deploy the nginx example I was testing with).
Then I installed ECK using this quickstart with kubectl to install the CRD and RBAC manifests. This seems to have worked.
Then I deployed the most basic ElasticSearch quickstart example with kubectl apply -f quickstart.yaml using this manifest:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.4.2
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
The deployment proceeds as expected, pulling the image and starting the container, but ends in a CrashLoopBackoff with the following error from ElasticSearch at the end of the log:
"elasticsearch.cluster.name":"quickstart",
"error.type":"java.lang.IllegalStateException",
"error.message":"failed to obtain node locks, tried
[/usr/share/elasticsearch/data]; maybe these locations
are not writable or multiple nodes were started on the same data path?"
Looking into the storage, the PV and PVC are created successfully, the output of kubectl get pv,pvc,sc -A -n my-namespace is:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-9d7b57db-8afd-40f7-8b3d-6334bdc07241 1Gi RWO Delete Bound my-namespace/elasticsearch-data-quickstart-es-default-0 thin 41m
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
my-namespace persistentvolumeclaim/elasticsearch-data-quickstart-es-default-0 Bound pvc-9d7b57db-8afd-40f7-8b3d-6334bdc07241 1Gi RWO thin 41m
NAMESPACE NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
storageclass.storage.k8s.io/thin (default) kubernetes.io/vsphere-volume Delete Immediate false 19d
storageclass.storage.k8s.io/thin-csi csi.vsphere.vmware.com Delete WaitForFirstConsumer true 19d
Looking at the pod YAML, it appears that the volume is correctly attached:
volumes:
  - name: elasticsearch-data
    persistentVolumeClaim:
      claimName: elasticsearch-data-quickstart-es-default-0
  - name: downward-api
    downwardAPI:
      items:
      - path: labels
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.labels
      defaultMode: 420
....
volumeMounts:
...
  - name: elasticsearch-data
    mountPath: /usr/share/elasticsearch/data
I cannot understand why the volume would be read-only or rather why ES cannot create the lock.
I did find this similar issue, but I am not sure how to apply the UID permissions (in general I am fairly naive about the way permissions work in OKD) when working with ECK.
Does anyone with deeper K8s / OKD or ECK/ElasticSearch knowledge have an idea how to better isolate and/or resolve this issue?
Update: I believe this has something to do with this issue and am researching the options related to OKD.
For posterity: ECK starts an init container that should take care of the chown on the data volume, but it can only do so if it is running as root.
The resolution for me was documented here:
https://repo1.dso.mil/dsop/elastic/elasticsearch/elasticsearch/-/issues/7
The manifest now looks like this:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.4.2
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
    # run the init container as root to chown the volume to uid 1000
    podTemplate:
      spec:
        securityContext:
          runAsUser: 1000
          runAsGroup: 0
        initContainers:
        - name: elastic-internal-init-filesystem
          securityContext:
            runAsUser: 0
            runAsGroup: 0
And the pod starts up and can write to the volume as uid 1000.
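To double-check, something along these lines should show the pod running and the data directory owned by uid 1000 (the label selector and pod name follow the usual ECK naming for the quickstart cluster, so adjust them if yours differ):
kubectl get pods --selector='elasticsearch.k8s.elastic.co/cluster-name=quickstart'
kubectl exec quickstart-es-default-0 -- ls -ld /usr/share/elasticsearch/data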

Record Kubernetes container resource utilization data

I'm doing a perf test for a web server which is deployed on an EKS cluster. I'm invoking the server using JMeter with different conditions (like varying thread count, payload size, etc.).
So I want to record Kubernetes perf data with timestamps so that I can analyze it together with my JMeter output (JTL).
I have been digging through the internet to find a way to record Kubernetes perf data, but I was unable to find a proper way to do that.
Can experts please suggest a standard way to do this?
Note: I also have a multi-container pod.
In line with @Jonas' comment:
This is the quickest way of installing Prometheus in your K8s cluster. I added the details in an answer as it was impossible to put the commands in a readable format in a comment.
Add the Bitnami Helm repo:
helm repo add bitnami https://charts.bitnami.com/bitnami
Install the Helm chart for Prometheus:
helm install my-release bitnami/kube-prometheus
The installation output will be:
C:\Users\ameena\Desktop\shine\Article\K8\promethus>helm install my-release bitnami/kube-prometheus
NAME: my-release
LAST DEPLOYED: Mon Apr 12 12:44:13 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
** Please be patient while the chart is being deployed **
Watch the Prometheus Operator Deployment status using the command:
kubectl get deploy -w --namespace default -l app.kubernetes.io/name=kube-prometheus-operator,app.kubernetes.io/instance=my-release
Watch the Prometheus StatefulSet status using the command:
kubectl get sts -w --namespace default -l app.kubernetes.io/name=kube-prometheus-prometheus,app.kubernetes.io/instance=my-release
Prometheus can be accessed via port "9090" on the following DNS name from within your cluster:
my-release-kube-prometheus-prometheus.default.svc.cluster.local
To access Prometheus from outside the cluster execute the following commands:
echo "Prometheus URL: http://127.0.0.1:9090/"
kubectl port-forward --namespace default svc/my-release-kube-prometheus-prometheus 9090:9090
Watch the Alertmanager StatefulSet status using the command:
kubectl get sts -w --namespace default -l app.kubernetes.io/name=kube-prometheus-alertmanager,app.kubernetes.io/instance=my-release
Alertmanager can be accessed via port "9093" on the following DNS name from within your cluster:
my-release-kube-prometheus-alertmanager.default.svc.cluster.local
To access Alertmanager from outside the cluster execute the following commands:
echo "Alertmanager URL: http://127.0.0.1:9093/"
kubectl port-forward --namespace default svc/my-release-kube-prometheus-alertmanager 9093:9093
Follow the commands to forward the UI to localhost.
echo "Prometheus URL: http://127.0.0.1:9090/"
kubectl port-forward --namespace default svc/my-release-kube-prometheus-prometheus 9090:9090
Open the UI in browser: http://127.0.0.1:9090/classic/graph
Annotate the pods so that their metrics are scraped:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 4 # Update the replicas from 2 to 4
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9102'
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
In the UI, apply appropriate filters and start observing the crucial parameters such as memory, CPU, etc. The UI supports autocomplete, so it will not be that difficult to figure things out. A couple of example queries follow below.
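For example, queries along these lines (using the standard cAdvisor metric names, which kube-prometheus scrapes by default; label names may differ slightly between Kubernetes versions) give per-container CPU and memory over time that you can correlate with the JMeter timestamps:
# per-container CPU usage in cores, averaged over 5 minutes
rate(container_cpu_usage_seconds_total{namespace="default", container!=""}[5m])
# per-container working-set memory in bytes
container_memory_working_set_bytes{namespace="default", container!=""}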
Regards

Digital Ocean managed Kubernetes volume in pending state

It's not really DigitalOcean specific; it would be nice to verify whether this is expected behavior or not.
I'm trying to set up an ElasticSearch cluster on a DO managed Kubernetes cluster with the Helm chart from Elastic itself.
They say that I need to specify a storageClassName in a volumeClaimTemplate in order to use the volume provided by the managed Kubernetes service. For DO it's do-block-storage according to their docs. It also seems it's not necessary to define a PVC; the Helm chart should do it itself.
Here's the config I'm using:
# Specify node pool
nodeSelector:
  doks.digitalocean.com/node-pool: elasticsearch
# Shrink default JVM heap.
esJavaOpts: "-Xmx128m -Xms128m"
# Allocate smaller chunks of memory per pod.
resources:
  requests:
    cpu: "100m"
    memory: "512M"
  limits:
    cpu: "1000m"
    memory: "512M"
# Specify Digital Ocean storage
# Request smaller persistent volumes.
volumeClaimTemplate:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: do-block-storage
  resources:
    requests:
      storage: 10Gi
extraInitContainers: |
  - name: create
    image: busybox:1.28
    command: ['mkdir', '/usr/share/elasticsearch/data/nodes/']
    volumeMounts:
    - mountPath: /usr/share/elasticsearch/data
      name: elasticsearch-master
  - name: file-permissions
    image: busybox:1.28
    command: ['chown', '-R', '1000:1000', '/usr/share/elasticsearch/']
    volumeMounts:
    - mountPath: /usr/share/elasticsearch/data
      name: elasticsearch-master
I'm installing the Helm chart with Terraform, but it doesn't really matter which way you do it:
resource "helm_release" "elasticsearch" {
  name      = "elasticsearch"
  chart     = "elastic/elasticsearch"
  namespace = "elasticsearch"
  values = [
    file("charts/elasticsearch.yaml")
  ]
}
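For reference, a roughly equivalent plain Helm invocation would be (Helm 3 syntax; the elastic repo alias is an assumption here):
helm repo add elastic https://helm.elastic.co
helm install elasticsearch elastic/elasticsearch \
  --namespace elasticsearch --create-namespace \
  -f charts/elasticsearch.yaml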
Here's what I got when checking the PVC events:
51s Normal Provisioning persistentvolumeclaim/elasticsearch-master-elasticsearch-master-2 External provisioner is provisioning volume for claim "elasticsearch/elasticsearch-master-elasticsearch-master-2"
2m28s Normal ExternalProvisioning persistentvolumeclaim/elasticsearch-master-elasticsearch-master-2 waiting for a volume to be created, either by external provisioner "dobs.csi.digitalocean.com" or manually created by system administrator
I'm pretty sure the problem is the volume; it should've been automatically provisioned by Kubernetes. Describing the persistent volume claim gives this:
holms#debian ~/D/c/s/b/t/s/post-infra> kubectl describe pvc elasticsearch-master-elasticsearch-master-0 --namespace elasticsearch
Name: elasticsearch-master-elasticsearch-master-0
Namespace: elasticsearch
StorageClass: do-block-storage
Status: Pending
Volume:
Labels: app=elasticsearch-master
Annotations: volume.beta.kubernetes.io/storage-provisioner: dobs.csi.digitalocean.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Mounted By: elasticsearch-master-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Provisioning 4m57s (x176 over 14h) dobs.csi.digitalocean.com_master-setupad-eu_04e43747-fafb-11e9-b7dd-e6fd8fbff586 External provisioner is provisioning volume for claim "elasticsearch/elasticsearch-master-elasticsearch-master-0"
Normal ExternalProvisioning 93s (x441 over 111m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "dobs.csi.digitalocean.com" or manually created by system administrator
I've googled everything already; everything seems to be correct, and the volume should come up on the DO side with no problems, but it hangs in the Pending state. Is this expected behavior, or should I ask DO support to check what's going on on their side?
Yes, this is expected behavior. This chart might not be compatible with Digital Ocean Kubernetes service.
Digital Ocean documentation has the following information in Known Issues section:
Support for resizing DigitalOcean Block Storage Volumes in Kubernetes has not yet been implemented.
In the DigitalOcean Control Panel, cluster resources (worker nodes, load balancers, and block storage volumes) are listed outside of the Kubernetes page. If you rename or otherwise modify these resources in the control panel, you may render them unusable to the cluster or cause the reconciler to provision replacement resources. To avoid this, manage your cluster resources exclusively with kubectl or from the control panel’s Kubernetes page.
In charts/stable/elasticsearch there are specific requirements mentioned:
Prerequisites Details
Kubernetes 1.10+
PV dynamic provisioning support on the underlying infrastructure
You can ask Digital Ocean support for help or try to deploy ElasticSearch without helm chart.
It is even mentioned on github that:
Automated testing of this chart is currently only run against GKE (Google Kubernetes Engine).
Update:
The same issue is present on my kubeadm HA cluster.
However, I managed to get it working by manually creating PersistentVolumes for my StorageClass.
My storageclass definition: storageclass.yaml:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ssd
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: pd-ssd
$ kubectl apply -f storageclass.yaml
$ kubectl get sc
NAME PROVISIONER AGE
ssd local 50m
My PersistentVolume definition: pv.yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: ssd
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - <name of the node>
kubectl apply -f pv.yaml
After that I ran helm chart:
helm install stable/elasticsearch --name my-release --set data.persistence.storageClass=ssd,data.storage=30Gi --set data.persistence.storageClass=ssd,master.storage=30Gi
PVC finally got bound.
$ kubectl get pvc -A
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
default data-my-release-elasticsearch-data-0 Bound task-pv-volume2 30Gi RWO ssd 17m
default data-my-release-elasticsearch-master-0 Pending 17m
Note that I manually satisfied only a single PVC, and manual volume provisioning for ElasticSearch may be very inefficient.
I suggest contacting DO support for an automated volume provisioning solution; the checks below may help narrow things down first.
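If you want to look at the provisioner side yourself first, generic checks like these can help (the exact names of the DO CSI driver pods may differ):
# is a CSI driver registered in the cluster?
kubectl get csidrivers
# recent events on the stuck claim
kubectl describe pvc elasticsearch-master-elasticsearch-master-0 -n elasticsearch
# any errors from the CSI/provisioner pods
kubectl get pods -n kube-system | grep -i csi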
What a strange situation: after I changed 10Gi to 10G it started to work. Maybe it has something to do with the storage class itself, but it started to work.

Connecting jaeger with elasticsearch backend storage on kubernetes cluster

I have a Kubernetes cluster on Google Cloud Platform, and on it I have a Jaeger deployment via the development setup of the jaeger-kubernetes templates.
Because my purpose is to set up Elasticsearch as the backend storage, I followed the jaeger-kubernetes GitHub documentation with the following actions.
I've created the services via the production setup options.
Here the URLs to access the Elasticsearch server, the username, password, and ports are configured:
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-kubernetes/master/production-elasticsearch/configmap.yml
And here the download of the Docker images of the Elasticsearch service and their volume mounts are configured:
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-kubernetes/master/production-elasticsearch/elasticsearch.yml
At this moment we have an Elasticsearch service running on ports 9200 and 9300:
kubectl get service elasticsearch [a89fbe2]
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch ClusterIP None <none> 9200/TCP,9300/TCP 1h
I followed up with the creation of the Jaeger components using the jaeger-kubernetes production templates this way:
λ bgarcial [~] → kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-kubernetes/master/jaeger-production-template.yml
deployment.extensions/jaeger-collector created
service/jaeger-collector created
service/zipkin created
deployment.extensions/jaeger-query created
service/jaeger-query created
daemonset.extensions/jaeger-agent created
λ bgarcial [~/workspace/jaeger-elastic] at master ?
According to the Jaeger architecture, the jaeger-collector and jaeger-query services require access to backend storage.
And so, these are my services running on my kubernetes cluster:
λ bgarcial [~/workspace/jaeger-elastic] at  master ?
→ kubectl get services [baefdf9]
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch ClusterIP None <none> 9200/TCP,9300/TCP 3h
jaeger-collector ClusterIP 10.55.253.240 <none> 14267/TCP,14268/TCP,9411/TCP 3h
jaeger-query LoadBalancer 10.55.248.243 35.228.179.167 80:30398/TCP 3h
kubernetes ClusterIP 10.55.240.1 <none> 443/TCP 3h
zipkin ClusterIP 10.55.240.60 <none> 9411/TCP 3h
λ bgarcial [~/workspace/jaeger-elastic] at  master ?
I went to the configmap.yml Elasticsearch file with the kubectl edit configmap jaeger-configuration command in order to edit it in relation to the Elasticsearch URL endpoints (maybe? At this moment I am supposing that this is the next step ...).
I execute it:
λ bgarcial [~] → kubectl edit configmap jaeger-configuration
And I get the following edit entry:
apiVersion: v1
data:
  agent: |
    collector:
      host-port: "jaeger-collector:14267"
  collector: |
    es:
      server-urls: http://elasticsearch:9200
      username: elastic
      password: changeme
    collector:
      zipkin:
        http-port: 9411
  query: |
    es:
      server-urls: http://elasticsearch:9200
      username: elastic
      password: changeme
  span-storage-type: elasticsearch
kind: ConfigMap
metadata:
  creationTimestamp: "2018-12-27T13:24:11Z"
  labels:
    app: jaeger
    jaeger-infra: configuration
  name: jaeger-configuration
  namespace: default
  resourceVersion: "1387"
  selfLink: /api/v1/namespaces/default/configmaps/jaeger-configuration
  uid: b28eb5f4-09da-11e9-9f1e-42010aa60002
Here ... do I need to set up our own URLs for the collector and query services, which will be connected with the Elasticsearch backend service?
How can I set up the Elasticsearch IP address or URLs here?
In the Jaeger components, the query and collector need access to storage, but I don't know what the Elasticsearch endpoint is ...
Is server-urls: http://elasticsearch:9200 a correct endpoint?
I am starting out in the Kubernetes and DevOps world, and I would appreciate it if someone could help me with the concepts and point me in the right direction in order to set up Jaeger with Elasticsearch as the backend storage.
When you are accessing the service from a pod in the same namespace, you can use just the service name.
Example:
http://elasticsearch:9200
If you are accessing the service from a pod in a different namespace, you should also specify the namespace.
Example:
http://elasticsearch.mynamespace:9200
http://elasticsearch.mynamespace.svc.cluster.local:9200
To check in what namespace the service is located, use the following command:
kubectl get svc --all-namespaces -o wide
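To quickly verify that pods can actually resolve and reach Elasticsearch, a throwaway pod like this can be used (busybox chosen here as an example image; adjust the namespace in the DNS name if needed):
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- \
  nslookup elasticsearch.default.svc.cluster.local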
Note: Changing a ConfigMap does not apply it to the deployment instantly. Usually, you need to restart all pods in the deployment to apply the new ConfigMap values. There is no rolling-restart functionality at the moment, but you can use the following command as a workaround:
(replace deployment name and pod name with the real ones)
kubectl patch deployment mydeployment -p '{"spec":{"template":{"spec":{"containers":[{"name":"my-pod-name","env":[{"name":"START_TIME","value":"'$(date +%s)'"}]}]}}}}'

Fail to connect to kubectl from client-go - /serviceaccount/token: no such file

I am using the Golang library client-go to connect to a locally running Kubernetes. To start with, I took the code from the example out-of-cluster-client-configuration.
Running code like this:
$ KUBERNETES_SERVICE_HOST=localhost KUBERNETES_SERVICE_PORT=6443 go run ./main.go results in the following error:
panic: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
goroutine 1 [running]:
/var/run/secrets/kubernetes.io/serviceaccount/
I am not quite sure which part of the configuration I am missing. I've researched the following links:
https://kubernetes.io/docs/reference/access-authn-authz/authentication/#client-go-credential-plugins
https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/
But with no luck.
I guess I need to either let client-go know which token/serviceAccount to use, or configure kubectl in a way that everyone can connect to its API.
Here's the status of my kubectl, via some command results:
$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    insecure-skip-tls-verify: true
    server: https://localhost:6443
  name: docker-for-desktop-cluster
contexts:
- context:
    cluster: docker-for-desktop-cluster
    user: docker-for-desktop
  name: docker-for-desktop
current-context: docker-for-desktop
kind: Config
preferences: {}
users:
- name: docker-for-desktop
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED
$ kubectl get serviceAccounts
NAME SECRETS AGE
default 1 3d
test-user 1 1d
$ kubectl describe serviceaccount test-user
Name: test-user
Namespace: default
Labels: <none>
Annotations: <none>
Image pull secrets: <none>
Mountable secrets: test-user-token-hxcsk
Tokens: test-user-token-hxcsk
Events: <none>
$ kubectl get secret test-user-token-hxcsk -o yaml
apiVersion: v1
data:
  ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0......=
  namespace: ZGVmYXVsdA==
  token: ZXlKaGJHY2lPaUpTVXpJMU5pSX......=
kind: Secret
metadata:
  annotations:
    kubernetes.io/service-account.name: test-user
    kubernetes.io/service-account.uid: 984b359a-6bd3-11e8-8600-XXXXXXX
  creationTimestamp: 2018-06-09T10:55:17Z
  name: test-user-token-hxcsk
  namespace: default
  resourceVersion: "110618"
  selfLink: /api/v1/namespaces/default/secrets/test-user-token-hxcsk
  uid: 98550de5-6bd3-11e8-8600-XXXXXX
type: kubernetes.io/service-account-token
This answer could be a little outdated but I will try to give more perspective/baseline for future readers that encounter the same/similar problem.
TL;DR
The following error:
panic: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
is most likely connected with the lack of token in the /var/run/secrets/kubernetes.io/serviceaccount location when using in-cluster-client-configuration. Also, it could be related to the fact of using in-cluster-client-configuration code outside of the cluster (for example running this code directly on a laptop or in pure Docker container).
You can check following commands to troubleshoot your issue further (assuming this code is running inside a Pod):
$ kubectl get serviceaccount X -o yaml:
look for: automountServiceAccountToken: false
$ kubectl describe pod XYZ
look for: containers.mounts and volumeMounts where Secret is mounted
Citing the official documentation:
Authenticating inside the cluster
This example shows you how to configure a client with client-go to authenticate to the Kubernetes API from an application running inside the Kubernetes cluster.
client-go uses the Service Account token mounted inside the Pod at the /var/run/secrets/kubernetes.io/serviceaccount path when the rest.InClusterConfig() is used.
-- Github.com: Kubernetes: client-go: Examples: in cluster client configuration
If you are authenticating to the Kubernetes API with ~/.kube/config you should be using the out-of-cluster-client-configuration.
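If you are running the out-of-cluster example from your workstation, a run roughly like this avoids the in-cluster token path entirely (assuming the example's optional -kubeconfig flag, which defaults to $HOME/.kube/config in that example):
# no KUBERNETES_SERVICE_* variables needed; point the client at your kubeconfig instead
go run ./main.go -kubeconfig=$HOME/.kube/config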
Additional information:
I've added additional information for more reference on further troubleshooting when the code is run inside of a Pod.
automountServiceAccountToken: false
In version 1.6+, you can opt out of automounting API credentials for a service account by setting automountServiceAccountToken: false on the service account:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: go-serviceaccount
automountServiceAccountToken: false
In version 1.6+, you can also opt out of automounting API credentials for a particular pod:
apiVersion: v1
kind: Pod
metadata:
  name: sdk
spec:
  serviceAccountName: go-serviceaccount
  automountServiceAccountToken: false
-- Kubernetes.io: Docs: Tasks: Configure pod container: Configure service account
$ kubectl describe pod XYZ:
When the serviceAccount token is mounted, the Pod definition should look like this:
<-- OMITTED -->
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from go-serviceaccount-token-4rst8 (ro)
<-- OMITTED -->
Volumes:
  go-serviceaccount-token-4rst8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  go-serviceaccount-token-4rst8
    Optional:    false
If it's not:
<-- OMITTED -->
    Mounts: <none>
<-- OMITTED -->
Volumes: <none>
Additional resources:
Kubernetes.io: Docs: Reference: Access authn authz: Authentication
Just to make it clear, in case it helps you further debug it: the problem has nothing to do with Go or your code, and everything to do with the Kubernetes node not being able to get a token from the Kubernetes master.
In kubectl config view, clusters.cluster.server should probably point at an IP address that the node can reach.
It needs to access the CA, i.e., the master, in order to provide that token, and I'm guessing it fails to for that reason.
kubectl describe pod <your_pod_name> would probably tell you what the problem was when acquiring the token.
Since you assumed the problem was Go/your code and focused on that, you neglected to provide more information about your Kubernetes setup, which makes it more difficult for me to give you a better answer than my guess above ;-)
But I hope it helps!
