MinIO tenant stuck with 'Waiting for MinIO TLS Certificate' - minio

I have a problem with a MinIO installation.
MinIO tenants get stuck in the state 'Waiting for MinIO TLS Certificate'. It doesn't matter whether the tenant is created by the Helm chart or by an additional YAML manifest.
Even if I create the tenant from the MinIO web console, the result is the same.
I use the chart installation from: https://github.com/minio/operator
helm repo remove minio
helm repo add minio https://operator.min.io/
helm install --namespace minio-operator --create-namespace --generate-name minio/minio-operator
kubectl apply -f https://github.com/minio/operator/blob/master/examples/tenant.yaml
The Operator installs fine. After creation, the tenant is stuck with the 'Waiting for MinIO TLS Certificate' message.
Logs from operator:
E0729 11:06:17.788400 1 operator.go:137] Unexpected error during the creation of the csr/operator-minio-csr: timeout during certificate fetching of csr/operator-minio-csr
I0729 11:06:17.788419 1 main-controller.go:627] Waiting for the operator certificates to be issued timeout during certificate fetching of csr/operator-minio-csr
I0729 11:06:27.795784 1 main-controller.go:625] operator TLS secret not found%!(EXTRA string=secrets "operator-tls" not found)
I0729 11:06:27.817912 1 csr.go:145] Start polling for certificate of csr/operator-minio-csr, every 5s, timeout after 20m0s
E0729 11:26:07.973014 1 minio.go:213] Unexpected error during the creation of the csr/minio-minio-csr: timeout during certificate fetching of csr/minio-minio-csr
E0729 11:26:07.973050 1 main-controller.go:754] error syncing 'minio/minio': timeout during certificate fetching of csr/minio-minio-csr
E0729 11:26:27.823681 1 operator.go:137] Unexpected error during the creation of the csr/operator-minio-csr: timeout during certificate fetching of csr/operator-minio-csr
I0729 11:26:27.823700 1 main-controller.go:627] Waiting for the operator certificates to be issued timeout during certificate fetching of csr/operator-minio-csr
I0729 11:26:37.831111 1 main-controller.go:625] operator TLS secret not found%!(EXTRA string=secrets "operator-tls" not found)
I0729 11:26:37.845819 1 csr.go:145] Start polling for certificate of csr/operator-minio-csr, every 5s, timeout after 20m0s
E0729 11:27:08.019483 1 main-controller.go:754] error syncing 'minio/minio': secrets "operator-tls" not found
I0729 11:28:08.036307 1 minio.go:141] Generating private key
I0729 11:28:08.036396 1 minio.go:154] Generating CSR with CN=minio
I0729 11:28:08.054702 1 csr.go:145] Start polling for certificate of csr/minio-minio-csr, every 5s, timeout after 20m0s
CSR requests exist:
minio-minio-csr 15m kubernetes.io/kubelet-serving system:serviceaccount:minio:minio-operator Approved
operator-minio-csr 163m kubernetes.io/kubelet-serving system:serviceaccount:minio:minio-operator Approved
Tenant exists:
minio minio Waiting for MinIO TLS Certificate 37s
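A quick way to see whether an approved CSR was ever actually signed (this is what the operator keeps polling for) is to check whether its .status.certificate field is populated; a minimal sketch:
# If this prints nothing, the CSR was approved but never signed by the
# controller manager, which matches the operator's polling timeout above.
kubectl get csr operator-minio-csr -o jsonpath='{.status.certificate}'
kubectl get csr minio-minio-csr -o jsonpath='{.status.certificate}'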
Example of tenant.yaml:
---
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: minio
  namespace: minio
  labels:
    app: minio
  annotations:
    prometheus.io/path: /minio/v2/metrics/cluster
    prometheus.io/port: "9000"
    prometheus.io/scrape: "true"
spec:
  image: minio/minio:RELEASE.2021-06-17T00-10-46Z
  imagePullPolicy: IfNotPresent
  credsSecret:
    name: minio-creds-secret
  pools:
    - servers: 4
      name: pool-0
      volumesPerServer: 4
      volumeClaimTemplate:
        metadata:
          name: data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
        fsGroup: 1000
  mountPath: /export
  requestAutoCert: true
  s3:
    bucketDNS: false
  certConfig:
    commonName: "minio"
    organizationName: []
    dnsNames: []
  podManagementPolicy: Parallel
  serviceMetadata:
    minioServiceLabels:
      label: minio-svc
    minioServiceAnnotations:
      v2.min.io: minio-svc
    consoleServiceLabels:
      label: console-svc
    consoleServiceAnnotations:
      v2.min.io: console-svc
  console:
    image: minio/console:v0.7.5
    replicas: 2
    consoleSecret:
      name: console-secret
    securityContext:
      runAsUser: 1000
      runAsGroup: 2000
      runAsNonRoot: true
      fsGroup: 2000

I remember there were some settings that needed to be in place in k8s to enable requestAutoCert: true.
Something like:
kube-controller:
  extra_args:
    cluster-signing-cert-file: "/etc/kubernetes/ssl/kube-ca.pem"
    cluster-signing-key-file: "/etc/kubernetes/ssl/kube-ca-key.pem"

You must restart the console and operator pods after the cluster has been updated with the section below.
kube-controller:
  extra_args:
    cluster-signing-cert-file: "/etc/kubernetes/ssl/kube-ca.pem"
    cluster-signing-key-file: "/etc/kubernetes/ssl/kube-ca-key.pem"
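A hedged way to check whether your controller manager already runs with these flags (pod labels and names vary by distribution; on kubeadm-style clusters the controller manager is a static pod in kube-system, while RKE runs it as a plain Docker container):
# Look for --cluster-signing-cert-file / --cluster-signing-key-file in the
# controller manager's command line; if they are missing, CSRs get approved
# but never issued, which is exactly the symptom above.
kubectl -n kube-system get pods -l component=kube-controller-manager -o yaml | grep cluster-signing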

One way I found to install MinIO is like this:
File: kind-config.yaml
Content:
# five node (four workers) cluster config
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker
- role: worker
Start by deleting any previous cluster:
kind delete cluster
It should look like:
$ kind delete cluster
Deleting cluster "kind" ...
Create the cluster:
kind create cluster --config kind-config.yaml
It should look like:
$ kind create cluster --config kind-config.yaml
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.24.0) đŸ–ŧ
✓ Preparing nodes đŸ“Ļ đŸ“Ļ đŸ“Ļ đŸ“Ļ đŸ“Ļ
✓ Writing configuration 📜
✓ Starting control-plane 🕹ī¸
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Not sure what to do next? 😅 Check out https://kind.sigs.k8s.io/docs/user/quick-start/
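Optionally, sanity-check that all five nodes joined before installing anything (not required, just a quick verification):
kubectl get nodes --context kind-kind
# Expect one control-plane node and four workers, all in Ready state.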
Remove and add the chart:
helm repo remove minio
helm repo add minio https://operator.min.io/
It should look like:
$ helm repo remove minio
"minio" has been removed from your repositories
$ helm repo add minio https://operator.min.io/
"minio" has been added to your repositories
Download the repository locally:
cd ~/
git clone git@github.com:minio/operator.git
It should look like:
$ cd ~/
$ git clone git@github.com:minio/operator.git
Cloning into 'operator'...
remote: Enumerating objects: 13159, done.
remote: Counting objects: 100% (881/881), done.
remote: Compressing objects: 100% (196/196), done.
remote: Total 13159 (delta 674), reused 822 (delta 659), pack-reused 12278
Receiving objects: 100% (13159/13159), 8.65 MiB | 3.60 MiB/s, done.
Resolving deltas: 100% (8259/8259), done.
Install the Operator:
cd ~/operator
helm install \
--namespace minio-operator \
--create-namespace minio-operator \
minio/operator
It should look like:
$ cd ~/operator
helm install \
--namespace minio-operator \
--create-namespace minio-operator \
minio/operator
NAME: minio-operator
LAST DEPLOYED: Fri Jun 24 17:50:19 2022
NAMESPACE: minio-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the JWT for logging in to the console:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: console-sa-secret
  namespace: minio-operator
  annotations:
    kubernetes.io/service-account.name: console-sa
type: kubernetes.io/service-account-token
EOF
kubectl -n minio-operator get secret console-sa-secret -o jsonpath="{.data.token}" | base64 --decode
2. Get the Operator Console URL by running these commands:
kubectl --namespace minio-operator port-forward svc/console 9090:9090
echo "Visit the Operator Console at http://127.0.0.1:9090"
In case your cluster has no access to the internet, you can load the image into the cluster:
kind load docker-image minio/console:v0.19.0
It should look like:
$ kind load docker-image minio/console:v0.19.0
Image: "minio/console:v0.19.0" with ID "sha256:739e933b5d9ddb22f690f3773cbcf4c7409113d6739d905e31e480cfa5c0a21d" not yet present on node "kind-worker2", loading...
Image: "minio/console:v0.19.0" with ID "sha256:739e933b5d9ddb22f690f3773cbcf4c7409113d6739d905e31e480cfa5c0a21d" not yet present on node "kind-worker4", loading...
Image: "minio/console:v0.19.0" with ID "sha256:739e933b5d9ddb22f690f3773cbcf4c7409113d6739d905e31e480cfa5c0a21d" not yet present on node "kind-worker", loading...
Image: "minio/console:v0.19.0" with ID "sha256:739e933b5d9ddb22f690f3773cbcf4c7409113d6739d905e31e480cfa5c0a21d" not yet present on node "kind-worker3", loading...
Image: "minio/console:v0.19.0" with ID "sha256:739e933b5d9ddb22f690f3773cbcf4c7409113d6739d905e31e480cfa5c0a21d" not yet present on node "kind-control-plane", loading...
Install the tenant with Helm:
helm install --namespace tenant-ns \
--create-namespace tenant minio/tenant
It should look like:
$ helm install --namespace tenant-ns \
--create-namespace tenant minio/tenant
NAME: tenant
LAST DEPLOYED: Fri Jun 24 17:52:55 2022
NAMESPACE: tenant-ns
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
To connect to the minio1 tenant if it doesn't have a service exposed, you can port-forward to it by running:
kubectl --namespace tenant-ns port-forward svc/minio1-console 9443:9443
Then visit the MinIO Console at https://127.0.0.1:9443
Also load the MinIO image into the cluster:
kind load docker-image quay.io/minio/minio:RELEASE.2022-05-26T05-48-41Z
It should look like:
$ kind load docker-image quay.io/minio/minio:RELEASE.2022-05-26T05-48-41Z
Image: "quay.io/minio/minio:RELEASE.2022-05-26T05-48-41Z" with ID "sha256:ee8072647d5aed0c6fd23090acdcc26da93787d329b091fbdeeb33d64409a28a" not yet present on node "kind-worker2", loading...
Image: "quay.io/minio/minio:RELEASE.2022-05-26T05-48-41Z" with ID "sha256:ee8072647d5aed0c6fd23090acdcc26da93787d329b091fbdeeb33d64409a28a" not yet present on node "kind-worker4", loading...
Image: "quay.io/minio/minio:RELEASE.2022-05-26T05-48-41Z" with ID "sha256:ee8072647d5aed0c6fd23090acdcc26da93787d329b091fbdeeb33d64409a28a" not yet present on node "kind-worker", loading...
Image: "quay.io/minio/minio:RELEASE.2022-05-26T05-48-41Z" with ID "sha256:ee8072647d5aed0c6fd23090acdcc26da93787d329b091fbdeeb33d64409a28a" not yet present on node "kind-worker3", loading...
Image: "quay.io/minio/minio:RELEASE.2022-05-26T05-48-41Z" with ID "sha256:ee8072647d5aed0c6fd23090acdcc26da93787d329b091fbdeeb33d64409a28a" not yet present on node "kind-control-plane", loading...
Then look at the MinIO pods:
$ k get pods -n tenant-ns -l app=minio
NAME READY STATUS RESTARTS AGE
minio1-pool-0-0 1/1 Running 0 12m
minio1-pool-0-1 1/1 Running 0 12m
minio1-pool-0-2 1/1 Running 0 12m
minio1-pool-0-3 1/1 Running 0 12m
The 'Waiting for MinIO TLS Certificate' message can also depend on the type of cluster; I have seen a similar problem with an OpenShift cluster, but please specify your cluster so we can get an idea of what else it could be. Also try the latest versions of MinIO and the Operator to get proper functionality.

Related

ElasticSearch CrashLoopBackoff when deploying with ECK in Kubernetes OKD 4.11

I am running Kubernetes using OKD 4.11 (running on vSphere) and have validated the basic functionality (including dyn. volume provisioning) using applications (like nginx).
I also applied
oc adm policy add-scc-to-group anyuid system:authenticated
to allow authenticated users to use anyuid (which seems to have been required to deploy the nginx example I was testing with).
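If you want to double-check which groups are currently allowed to use the anyuid SCC, something like this should work (the groups field is part of the SCC object):
oc get scc anyuid -o jsonpath='{.groups}'
# Should include system:authenticated after the add-scc-to-group command above.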
Then I installed ECK using this quickstart with kubectl to install the CRD and RBAC manifests. This seems to have worked.
Then I deployed the most basic ElasticSearch quickstart example with kubectl apply -f quickstart.yaml using this manifest:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.4.2
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
The deployment proceeds as expected, pulling the image and starting the container, but ends in a CrashLoopBackOff with the following error from Elasticsearch at the end of the log:
"elasticsearch.cluster.name":"quickstart",
"error.type":"java.lang.IllegalStateException",
"error.message":"failed to obtain node locks, tried
[/usr/share/elasticsearch/data]; maybe these locations
are not writable or multiple nodes were started on the same data path?"
Looking into the storage, the PV and PVC are created successfully, the output of kubectl get pv,pvc,sc -A -n my-namespace is:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-9d7b57db-8afd-40f7-8b3d-6334bdc07241 1Gi RWO Delete Bound my-namespace/elasticsearch-data-quickstart-es-default-0 thin 41m
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
my-namespace persistentvolumeclaim/elasticsearch-data-quickstart-es-default-0 Bound pvc-9d7b57db-8afd-40f7-8b3d-6334bdc07241 1Gi RWO thin 41m
NAMESPACE NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
storageclass.storage.k8s.io/thin (default) kubernetes.io/vsphere-volume Delete Immediate false 19d
storageclass.storage.k8s.io/thin-csi csi.vsphere.vmware.com Delete WaitForFirstConsumer true 19d
Looking at the pod YAML, it appears that the volume is correctly attached:
volumes:
  - name: elasticsearch-data
    persistentVolumeClaim:
      claimName: elasticsearch-data-quickstart-es-default-0
  - name: downward-api
    downwardAPI:
      items:
      - path: labels
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.labels
      defaultMode: 420
....
volumeMounts:
...
  - name: elasticsearch-data
    mountPath: /usr/share/elasticsearch/data
I cannot understand why the volume would be read-only or rather why ES cannot create the lock.
I did find this similar issue, but I am not sure how to apply the UID permissions (in general I am fairly naive about the way permissions work in OKD) when working with ECK.
Does anyone with deeper K8s / OKD or ECK/ElasticSearch knowledge have an idea how to better isolate and/or resolve this issue?
Update: I believe this has something to do with this issue and am researching the options related to OKD.
For posterity: ECK starts an init container that should take care of the chown on the data volume, but it can only do so if it is running as root.
The resolution for me was documented here:
https://repo1.dso.mil/dsop/elastic/elasticsearch/elasticsearch/-/issues/7
The manifest now looks like this:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.4.2
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
    # run init container as root to chown the volume to uid 1000
    podTemplate:
      spec:
        securityContext:
          runAsUser: 1000
          runAsGroup: 0
        initContainers:
        - name: elastic-internal-init-filesystem
          securityContext:
            runAsUser: 0
            runAsGroup: 0
And the pod starts up and can write to the volume as uid 1000.
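To confirm the override took effect, you can inspect the rendered init container's security context; the pod name below follows ECK's <cluster>-es-<nodeSet>-<ordinal> naming convention and may differ in your setup:
kubectl get pod quickstart-es-default-0 \
  -o jsonpath='{.spec.initContainers[?(@.name=="elastic-internal-init-filesystem")].securityContext}'
# Expect runAsUser: 0 here, so the init container can chown the data volume.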

Record Kubernetes container resource utilization data

I'm doing a performance test of a web server deployed on an EKS cluster. I'm invoking the server using JMeter under different conditions (varying thread count, payload size, etc.).
So I want to record Kubernetes performance data with timestamps so that I can analyze it alongside my JMeter output (JTL).
I have been digging through the internet to find a way to record Kubernetes performance data, but I was unable to find a proper way to do it.
Can someone please point me to a standard way to do this?
Note: I also have a multi-container pod.
In line with @Jonas' comment:
This is the quickest way of installing Prometheus in your K8s cluster. I added the details in an answer because it was impossible to put the commands in a readable format in a comment.
Add the Bitnami Helm repo:
helm repo add bitnami https://charts.bitnami.com/bitnami
Install the Helm chart for Prometheus:
helm install my-release bitnami/kube-prometheus
Installation output would be:
C:\Users\ameena\Desktop\shine\Article\K8\promethus>helm install my-release bitnami/kube-prometheus
NAME: my-release
LAST DEPLOYED: Mon Apr 12 12:44:13 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
** Please be patient while the chart is being deployed **
Watch the Prometheus Operator Deployment status using the command:
kubectl get deploy -w --namespace default -l app.kubernetes.io/name=kube-prometheus-operator,app.kubernetes.io/instance=my-release
Watch the Prometheus StatefulSet status using the command:
kubectl get sts -w --namespace default -l app.kubernetes.io/name=kube-prometheus-prometheus,app.kubernetes.io/instance=my-release
Prometheus can be accessed via port "9090" on the following DNS name from within your cluster:
my-release-kube-prometheus-prometheus.default.svc.cluster.local
To access Prometheus from outside the cluster execute the following commands:
echo "Prometheus URL: http://127.0.0.1:9090/"
kubectl port-forward --namespace default svc/my-release-kube-prometheus-prometheus 9090:9090
Watch the Alertmanager StatefulSet status using the command:
kubectl get sts -w --namespace default -l app.kubernetes.io/name=kube-prometheus-alertmanager,app.kubernetes.io/instance=my-release
Alertmanager can be accessed via port "9093" on the following DNS name from within your cluster:
my-release-kube-prometheus-alertmanager.default.svc.cluster.local
To access Alertmanager from outside the cluster execute the following commands:
echo "Alertmanager URL: http://127.0.0.1:9093/"
kubectl port-forward --namespace default svc/my-release-kube-prometheus-alertmanager 9093:9093
Follow the commands to forward the UI to localhost.
echo "Prometheus URL: http://127.0.0.1:9090/"
kubectl port-forward --namespace default svc/my-release-kube-prometheus-prometheus 9090:9090
Open the UI in browser: http://127.0.0.1:9090/classic/graph
Annotate the pods so that Prometheus scrapes their metrics:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 4 # Update the replicas from 2 to 4
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9102'
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
In the UI, apply the appropriate filters and start observing the crucial parameters such as memory, CPU, etc. The UI supports autocomplete, so it will not be difficult to figure things out.
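For the JMeter correlation use case, a couple of example queries against the standard cAdvisor metrics might look like the sketch below; the pod regex is just an illustration and label names can vary slightly between cluster versions:
# Per-container CPU usage (cores), averaged over 5 minutes:
sum(rate(container_cpu_usage_seconds_total{namespace="default", pod=~"nginx-deployment.*"}[5m])) by (pod, container)
# Per-container working-set memory in bytes:
sum(container_memory_working_set_bytes{namespace="default", pod=~"nginx-deployment.*"}) by (pod, container)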
Regards

Fail to connect to kubectl from client-go - /serviceaccount/token: no such file

I am using the Golang library client-go to connect to a running local Kubernetes cluster. To start with, I took code from the example out-of-cluster-client-configuration.
Running the code like this:
$ KUBERNETES_SERVICE_HOST=localhost KUBERNETES_SERVICE_PORT=6443 go run ./main.go
results in the following error:
panic: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
goroutine 1 [running]:
/var/run/secrets/kubernetes.io/serviceaccount/
I am not quite sure which part of the configuration I am missing. I've researched the following links:
https://kubernetes.io/docs/reference/access-authn-authz/authentication/#client-go-credential-plugins
https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/
But with no luck.
I guess I need to either let client-go know which token/serviceAccount to use, or configure kubectl in a way that everyone can connect to its API.
Here's the status of my kubectl setup, shown through some command results:
$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    insecure-skip-tls-verify: true
    server: https://localhost:6443
  name: docker-for-desktop-cluster
contexts:
- context:
    cluster: docker-for-desktop-cluster
    user: docker-for-desktop
  name: docker-for-desktop
current-context: docker-for-desktop
kind: Config
preferences: {}
users:
- name: docker-for-desktop
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED
$ kubectl get serviceAccounts
NAME SECRETS AGE
default 1 3d
test-user 1 1d
$ kubectl describe serviceaccount test-user
Name: test-user
Namespace: default
Labels: <none>
Annotations: <none>
Image pull secrets: <none>
Mountable secrets: test-user-token-hxcsk
Tokens: test-user-token-hxcsk
Events: <none>
$ kubectl get secret test-user-token-hxcsk -o yaml
apiVersion: v1
data:
  ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0......=
  namespace: ZGVmYXVsdA==
  token: ZXlKaGJHY2lPaUpTVXpJMU5pSX......=
kind: Secret
metadata:
  annotations:
    kubernetes.io/service-account.name: test-user
    kubernetes.io/service-account.uid: 984b359a-6bd3-11e8-8600-XXXXXXX
  creationTimestamp: 2018-06-09T10:55:17Z
  name: test-user-token-hxcsk
  namespace: default
  resourceVersion: "110618"
  selfLink: /api/v1/namespaces/default/secrets/test-user-token-hxcsk
  uid: 98550de5-6bd3-11e8-8600-XXXXXX
type: kubernetes.io/service-account-token
This answer could be a little outdated, but I will try to give more perspective/baseline for future readers who encounter the same or a similar problem.
TL;DR
The following error:
panic: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
is most likely connected with the lack of a token in the /var/run/secrets/kubernetes.io/serviceaccount location when using in-cluster-client-configuration. It could also be related to using the in-cluster-client-configuration code outside of the cluster (for example, running this code directly on a laptop or in a pure Docker container).
You can check the following commands to troubleshoot your issue further (assuming this code is running inside a Pod); a quick mount check follows this list:
$ kubectl get serviceaccount X -o yaml:
look for: automountServiceAccountToken: false
$ kubectl describe pod XYZ
look for: containers.mounts and volumeMounts where Secret is mounted
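If the code runs inside a Pod, a quick way to check whether the token actually got mounted (the pod name is a placeholder):
kubectl exec <your-pod> -- ls /var/run/secrets/kubernetes.io/serviceaccount
# Should list ca.crt, namespace and token when the service account token is auto-mounted.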
Citing the official documentation:
Authenticating inside the cluster
This example shows you how to configure a client with client-go to authenticate to the Kubernetes API from an application running inside the Kubernetes cluster.
client-go uses the Service Account token mounted inside the Pod at the /var/run/secrets/kubernetes.io/serviceaccount path when the rest.InClusterConfig() is used.
-- Github.com: Kubernetes: client-go: Examples: in cluster client configuration
If you are authenticating to the Kubernetes API with ~/.kube/config you should be using the out-of-cluster-client-configuration.
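Concretely, the out-of-cluster example in the client-go repository accepts a -kubeconfig flag (defaulting to ~/.kube/config), so against a local Docker-for-Desktop cluster running it would look roughly like this instead of exporting KUBERNETES_SERVICE_HOST/PORT:
# Run the out-of-cluster example against the local kubeconfig
cd client-go/examples/out-of-cluster-client-configuration
go run main.go -kubeconfig=$HOME/.kube/config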
Additional information:
I've added additional information for more reference on further troubleshooting when the code is run inside of a Pod.
automountServiceAccountToken: false
In version 1.6+, you can opt out of automounting API credentials for a service account by setting automountServiceAccountToken: false on the service account:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: go-serviceaccount
automountServiceAccountToken: false
In version 1.6+, you can also opt out of automounting API credentials for a particular pod:
apiVersion: v1
kind: Pod
metadata:
  name: sdk
spec:
  serviceAccountName: go-serviceaccount
  automountServiceAccountToken: false
-- Kubernetes.io: Docs: Tasks: Configure pod container: Configure service account
$ kubectl describe pod XYZ:
When the serviceAccount token is mounted, the Pod definition should look like this:
<-- OMITTED -->
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from go-serviceaccount-token-4rst8 (ro)
<-- OMITTED -->
Volumes:
go-serviceaccount-token-4rst8:
Type: Secret (a volume populated by a Secret)
SecretName: go-serviceaccount-token-4rst8
Optional: false
If it's not:
<-- OMITTED -->
Mounts: <none>
<-- OMITTED -->
Volumes: <none>
Additional resources:
Kubernetes.io: Docs: Reference: Access authn authz: Authentication
Just to make it clear, in case it helps you further debug it: the problem has nothing to do with Go or your code, and everything to do with the Kubernetes node not being able to get a token from the Kubernetes master.
In kubectl config view, clusters.cluster.server should probably point at an IP address that the node can reach.
It needs to access the CA, i.e., the master, in order to provide that token, and I'm guessing it fails to for that reason.
kubectl describe pod <your_pod_name> would probably tell you what the problem was with acquiring the token.
Since you assumed the problem was Go/your code and focused on that, you neglected to provide more information about your Kubernetes setup, which makes it more difficult for me to give you a better answer than my guess above ;-)
But I hope it helps!

Running elasticsearch on Google Cloud Kubernetes ends in CrashLoopBackOff

I'm trying to run the elasticsearch6 container on a Google Cloud instance. Unfortunately, the container always ends up in CrashLoopBackOff.
This is what I did:
install gcloud and kubectl
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb http://packages.cloud.google.com/apt cloud-sdk-$(lsb_release -c -s) main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
sudo apt-get update && sudo apt-get install google-cloud-sdk kubectl
configure gcloud
gcloud init
gcloud config set compute/zone europe-west3-a # For Frankfurt
create kubernetes cluster
gcloud container clusters create elasticsearch-cluster --machine-type=f1-micro --num-nodes=3
Activate pod
kubectl create -f pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: test-elasticsearch
  labels:
    name: test-elasticsearch
spec:
  containers:
  - image: launcher.gcr.io/google/elasticsearch6
    name: elasticsearch
After this I get the status:
kubectl get pods
NAME READY STATUS RESTARTS AGE
test-elasticsearch 0/1 CrashLoopBackOff 10 31m
A kubectl logs test-elasticsearch does not show any output.
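If the current container produces no output, the previous (crashed) container's logs and the pod events are usually more informative, for example:
kubectl logs test-elasticsearch --previous
kubectl get events --field-selector involvedObject.name=test-elasticsearch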
And here is the output of kubectl describe po test-elasticsearch with some info XXXed out.
Name: test-elasticsearch
Namespace: default
Node: gke-elasticsearch-cluste-default-pool-XXXXXXXX-wtbv/XX.XXX.X.X
Start Time: Sat, 12 May 2018 14:54:36 +0200
Labels: name=test-elasticsearch
Annotations: kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container elasticsearch
Status: Running
IP: XX.XX.X.X
Containers:
elasticsearch:
Container ID: docker://bb9d093df792df072a762973066d504a4e7d73b0e87d0236a94c3e8b972d9c41
Image: launcher.gcr.io/google/elasticsearch6
Image ID: docker-pullable://launcher.gcr.io/google/elasticsearch6@sha256:1ddafd5293dbec8fb73eabffa29614916e4933bb057db50231084d89f4a0b3fa
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Sat, 12 May 2018 14:55:06 +0200
Finished: Sat, 12 May 2018 14:55:09 +0200
Ready: False
Restart Count: 2
Requests:
cpu: 100m
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-XXXXX (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-XXXXX:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-XXXXX
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 51s default-scheduler Successfully assigned test-elasticsearch to gke-elasticsearch-cluste-def
Normal SuccessfulMountVolume 51s kubelet, gke-elasticsearch-cluste-default-pool-XXXXXXXX-wtbv MountVolume.SetUp succeeded for volume "default-token-XXXXX"
Normal Pulling 22s (x3 over 49s) kubelet, gke-elasticsearch-cluste-default-pool-XXXXXXXX-wtbv pulling image "launcher.gcr.io/google/elasticsearch6"
Normal Pulled 22s (x3 over 49s) kubelet, gke-elasticsearch-cluste-default-pool-XXXXXXXX-wtbv Successfully pulled image "launcher.gcr.io/google/elasticsearch6"
Normal Created 22s (x3 over 48s) kubelet, gke-elasticsearch-cluste-default-pool-XXXXXXXX-wtbv Created container
Normal Started 21s (x3 over 48s) kubelet, gke-elasticsearch-cluste-default-pool-XXXXXXXX-wtbv Started container
Warning BackOff 4s (x3 over 36s) kubelet, gke-elasticsearch-cluste-default-pool-XXXXXXXX-wtbv Back-off restarting failed container
Warning FailedSync 4s (x3 over 36s) kubelet, gke-elasticsearch-cluste-default-pool-XXXXXXXX-wtbv Error syncing pod
The problem was the f1-micro instance. It doesn't have enough memory to run Elasticsearch. It only works after upgrading to an instance with 4 GB of memory. Unfortunately, this is way too expensive for me, so I have to look for something else.
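For reference, recreating the cluster with a machine type that has enough memory would look roughly like this (the machine type is just an example; any type with a few GB of RAM should do for a single small Elasticsearch node):
gcloud container clusters create elasticsearch-cluster --machine-type=n1-standard-1 --num-nodes=3
# n1-standard-1 has 3.75 GB of RAM per node; f1-micro only has about 0.6 GB.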

StackDriver Monitoring for Elasticsearch in GKE

I need to monitor Elasticsearch (2.4) installed on top of a k8s cluster. I have 2 clients, 3 masters, and several data nodes running in pods. Following the Stackdriver how-to and the post "Can I run Google Monitoring Agent inside a Kubernetes Pod?", I deployed an agent in its own Pod. After all that, I have no Elasticsearch metrics in Stackdriver, only zeros.
Any suggestions are more than welcome.
This is my configuration:
Elastic service:
$kubectl describe svc elasticsearch
Name: elasticsearch
Namespace: default
Labels: component=elasticsearch
role=client
Selector: component=elasticsearch,role=client
Type: NodePort
IP: <IP>
Port: http 9200/TCP
NodePort: http <PORT>/TCP
Endpoints: <IP>:9200,<IP>:9200
Session Affinity: None
No events.
Stackdriver deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: stackagent
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        component: monitoring
        role: stackdriver-agent
    spec:
      containers:
      - name: hslab-data-agent
        image: StackDriverAgent:version1
Dockerfile for StackDriverAgent:version1:
FROM ubuntu
WORKDIR /stackdriver
RUN apt-get update
RUN apt-get install curl lsb-release libyajl2 -y
RUN apt-get clean
COPY ./stackdriver/run.sh run.sh
COPY ./stackdriver/elasticsearch.conf elasticsearch.conf
RUN chmod 755 ./run.sh
CMD ["./run.sh"]
run.sh:
#!/bin/bash
curl -O https://repo.stackdriver.com/stack-install.sh
chmod 755 stack-install.sh
bash stack-install.sh --write-gcm
cp ./elasticsearch.conf /opt/stackdriver/collectd/etc/collectd.d/
service stackdriver-agent restart
while true; do
  sleep 60
  agent_pid=$(cat /var/run/stackdriver-agent.pid 2>/dev/null)
  ps -p $agent_pid > /dev/null 2>&1
  if [ $? != 0 ]; then
    echo "Stackdriver agent pid not found!"
    break
  fi
done
elasticsearch.conf:
Taken from https://raw.githubusercontent.com/Stackdriver/stackdriver-agent-service-configs/master/etc/collectd.d/elasticsearch.conf
# This is the monitoring configuration for Elasticsearch 1.0.x and later.
# Look for ELASTICSEARCH_HOST and ELASTICSEARCH_PORT to adjust your configuration file.
LoadPlugin curl_json
<Plugin "curl_json">
  # When using non-standard Elasticsearch configurations, replace the below with
  # <URL "http://ELASTICSEARCH_HOST:ELASTICSEARCH_PORT/_nodes/_local/stats/">
  # PREVIOUS LINE
  # <URL "http://localhost:9200/_nodes/_local/stats/">
  <URL "http://elasticsearch:9200/_nodes/_local/stats/">
    Instance "elasticsearch"
    ....
Running state:
NAME READY STATUS RESTARTS AGE
esclient-4231471109-bd4tb 1/1 Running 0 23h
esclient-4231471109-k5pnw 1/1 Running 0 23h
esdata-1-2524076994-898r0 1/1 Running 0 23h
esdata-2-2426789420-zhz7j 1/1 Running 0 23h
esmaster-1-4205943399-zj2pn 1/1 Running 0 23h
esmaster-2-4248445829-pwq46 1/1 Running 0 23h
esmaster-3-3967126695-w0tp2 1/1 Running 0 23h
stackagent-3122989159-15vj1 1/1 Running 0 18h
The problem is with the API URL in the plugin configuration. The URL http://elasticsearch:9200/_nodes/_local/stats/ returns info only for the _local node, which is a client that has no documents.
In addition, the Stackdriver data will be collected under the k8s cluster node and not under the pod name.
The partial solution is to set up a sidecar on each data node and patch the elasticsearch.conf query with the corresponding ES node name (see the sketch after this list):
get curl [elasticsearch]:9200/_nodes/stats
find the ES node name matching $(hostname)
patch the configuration to <URL "http://elasticsearch:9200/_nodes/<esnode_name>/stats/">
This will collect the information for each ES data node under the corresponding k8s node name.
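A hedged sketch of that patch step, as it might run in the sidecar's startup script (it assumes jq is available in the image; names are illustrative):
# Resolve this pod's Elasticsearch node ID from _nodes/stats by matching the
# node name against the pod hostname, then point the collectd plugin at it.
ES_NODE=$(curl -s "http://elasticsearch:9200/_nodes/stats" \
  | jq -r --arg h "$(hostname)" '.nodes | to_entries[] | select(.value.name == $h) | .key')
sed -i "s|_nodes/_local/stats|_nodes/${ES_NODE}/stats|" \
  /opt/stackdriver/collectd/etc/collectd.d/elasticsearch.conf
service stackdriver-agent restart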
