I'm configuring the Elastic Cloud agent on Azure AKS with a system and a user node pool. On the system pool I configured the CriticalAddonsOnly=true:NoSchedule taint to prevent application pods from running there. I installed the Elastic Cloud agent, but I'm noticing that its DaemonSet keeps trying to schedule pods on that system pool without success. I tried to add a toleration for CriticalAddonsOnly=true:NoSchedule in the agent's YAML config, but I get the same errors. Is there a way to force the deployment onto the system pool, or to exclude the Elastic Cloud pods from being scheduled on that pool?
Here is how the YAML is set up:
tolerations:
- key: node-role.kubernetes.io/control-plane
  effect: NoSchedule
- key: node-role.kubernetes.io/master
  effect: NoSchedule
- key: CriticalAddonsOnly
  operator: "Exists"
  effect: NoSchedule
Regards
node-role.kubernetes.io/control-plane and node-role.kubernetes.io/master are not taints on AKS nodes; they are node labels. So please remove them from the toleration spec.
Furthermore, specifying a toleration does not guarantee scheduling onto the tolerated nodes. It only marks the pod as acceptable on nodes carrying that taint; it does not keep it off other nodes. Since your second node pool does not appear to be tainted, the scheduler simply places your pods there.
You could now add taints to your other node pools or, more simply, specify a node selector:
nodeSelector:
  kubernetes.azure.com/mode: system
tolerations:
- key: "CriticalAddonsOnly"
  operator: "Exists"
  effect: "NoSchedule"
The same can also be achieved with node affinity. Check your Helm chart or deployment option to see whether nodeSelector or nodeAffinity is exposed.
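For illustration, a minimal sketch of the equivalent node affinity, assuming the same kubernetes.azure.com/mode=system label as above (where exactly this snippet belongs depends on your chart's values):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.azure.com/mode
          operator: In
          values:
          - system
tolerations:
- key: "CriticalAddonsOnly"
  operator: "Exists"
  effect: "NoSchedule"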
I want to mark the pod as ready only when enough connections have been created and the pod can handle requests. The connections are created at the startup of my Spring Boot application. How can I make sure that the pod is marked ready only after the connections are created?
You can write a small Python (or other) script that checks whether the connections are created and ready to receive requests.
Then add it to the pod's deployment YAML as an initContainer:
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
initContainers:
- name: my-connection-validator
  image: path-to-your-image
  env:
  - name: POD_HOST
    value: "localhost-or-ip"
  - name: POD_PORT
    value: "12345"
I deployed an Elasticsearch cluster to EKS; below is the spec:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elk
spec:
  version: 7.15.2
  serviceAccountName: docker-sa
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - name: node
    count: 3
    config:
      ...
I can see it has been deployed correctly and all pods are running.
$ kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
elk-es-node-0   1/1     Running   0          19h
elk-es-node-1   1/1     Running   0          19h
elk-es-node-2   1/1     Running   0          11h
But I can't restart the Elasticsearch resource:
$ kubectl rollout restart Elasticsearch elk-es-node
Error from server (NotFound): elasticsearches.elasticsearch.k8s.elastic.co "elk-es-node" not found
Elasticsearch is backed by a StatefulSet, so I tried to restart that instead:
$ kubectl rollout restart statefulset elk-es-node
statefulset.apps/elk-es-node restarted
The above command says it restarted, but the actual pods do not restart.
What is the right way to restart a custom kind in Kubernetes?
Use kubectl get all to identify whether the resource created is a Deployment or a StatefulSet. Add -n <namespace> to the command if you are working in a specific namespace.
Assuming you are using a StatefulSet, issue the command below to inspect how it is configured:
kubectl get statefulset <statefulset-name> -o yaml > statefulsetContent.yaml
This creates a YAML file named statefulsetContent.yaml in the current directory; you can use it to explore the different options configured in the StatefulSet.
Check .spec.updateStrategy in that YAML file; this tells you which update strategy the StatefulSet uses.
Below is from the official documentation:
There are two possible values:
OnDelete
When a StatefulSet's .spec.updateStrategy.type is set to OnDelete, the StatefulSet controller will not automatically update the Pods in a StatefulSet. Users must manually delete Pods to cause the controller to create new Pods that reflect modifications made to a StatefulSet's .spec.template.
RollingUpdate
The RollingUpdate update strategy implements automated, rolling update for the Pods in a StatefulSet. This is the default update strategy.
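For reference, the fragment to look for in statefulsetContent.yaml is just the following (ECK-managed StatefulSets typically use OnDelete, which would explain why kubectl rollout restart reports success while no pods actually restart):

spec:
  updateStrategy:
    type: OnDelete   # or RollingUpdate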
As a workaround, you can try scaling the StatefulSet down and back up:
kubectl scale sts <statefulset-name> --replicas=<count>
With ECK as the operator, you do not need to use rollout restart. Apply your updated Elasticsearch spec and the operator will perform a rolling update for you. If for any reason you need to restart a pod, delete it with kubectl delete pod <es pod> -n <your es namespace> and the operator will spin up a new one for you.
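If you want to force a restart through the operator rather than deleting pods by hand, one approach (a sketch, not from the original answer; the restart-trigger annotation name is arbitrary) is to change something in the nodeSet's podTemplate, e.g. bump an annotation, and let ECK roll the pods:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elk
spec:
  version: 7.15.2
  nodeSets:
  - name: node
    count: 3
    podTemplate:
      metadata:
        annotations:
          restart-trigger: "2"   # bump this value to trigger a rolling restart
    # keep the rest of your nodeSet spec (config, volumeClaimTemplates, ...) unchanged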
I have deployed the Bitnami Helm chart of Elasticsearch on my Kubernetes environment.
https://github.com/bitnami/charts/tree/master/bitnami/elasticsearch
Unfortunately, I am getting the following error for the coordinating-only pod. Note that the cluster is restricted:
Pods "elasticsearch-elasticsearch-coordinating-only-5b57786cf6-" is forbidden: unable to validate against any pod security policy:
[spec.initContainers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed]; Deployment does not have minimum availability.
Is there anything I need to adapt or add in the default values.yaml?
Any suggestions on how to get rid of this error?
Thanks.
You can't pass validation if your cluster is restricted by a pod security policy. In your situation someone (presumably an administrator) has blocked the option to run privileged containers.
Here's an example of how a pod security policy blocks privileged containers:
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example
spec:
  privileged: false  # Don't allow privileged pods!
  seLinux:
    rule: RunAsAny
What you need is an appropriate Role referencing a PodSecurityPolicy resource, plus a RoleBinding, that together allow you to run privileged containers.
This is explained very well in the Kubernetes documentation under "Enabling pod security policy".
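As a rough sketch of that RBAC wiring (the policy name privileged-psp, the namespace elastic, and the service account default below are placeholders, not values from the question):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: psp-privileged
  namespace: elastic
rules:
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  resourceNames: ["privileged-psp"]  # a PSP that sets privileged: true
  verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: psp-privileged
  namespace: elastic
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: psp-privileged
subjects:
- kind: ServiceAccount
  name: default
  namespace: elastic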
So the solution was to set the following parameters in the values.yaml file and then simply deploy.
There is no need to create any Role or pod security policy.
sysctlImage:
  enabled: false
curator:
  enabled: true
  rbac:
    # Specifies whether RBAC should be enabled
    enabled: true
  psp:
    # Specifies whether a podsecuritypolicy should be created
    create: true
Also run this command on each node:
sysctl -w vm.max_map_count=262144 && sysctl -w fs.file-max=65536
I deployed an Elasticsearch cluster with the official Helm chart (https://github.com/elastic/helm-charts/tree/master/elasticsearch).
There are 3 Helm releases:
master (3 nodes)
client (1 node)
data (2 nodes)
The cluster was running fine. I did a crash test by removing the master release and re-creating it.
After that, the master nodes are OK, but the data nodes complain:
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid xeQ6IVkDQ2es1CO2yZ_7rw than local cluster uuid 9P9ZGqSuQmy7iRDGcit5fg, rejecting
which is normal, because the master nodes are new.
How can I fix the data nodes' cluster state without removing the data folder?
Edit:
I know why it is broken, and I know a basic solution is to remove the data folder and restart the node (as I can see on the Elastic forum, there are lots of similar questions without answers). But I am looking for a production-aware solution, maybe with the https://www.elastic.co/guide/en/elasticsearch/reference/current/node-tool.html tool?
Using the elasticsearch-node utility, it's possible to reset the cluster state so that the fresh node can join another cluster.
The tricky part is using this utility binary with Docker, because the Elasticsearch server must be stopped!
Solution with Kubernetes:
Stop the pods by scaling the StatefulSet to 0: kubectl scale sts data-nodes --replicas=0
Create a Kubernetes Job that resets the cluster state, with the data volume attached
Apply the job for each PVC
Rescale the StatefulSet and enjoy!
job.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: test-fix-cluster-m[0-3]
spec:
  template:
    spec:
      containers:
      - args:
        - -c
        - yes | elasticsearch-node detach-cluster; yes | elasticsearch-node remove-customs '*'
        # uncomment for at least 1 PVC
        #- yes | elasticsearch-node unsafe-bootstrap -v
        command:
        - /bin/sh
        image: docker.elastic.co/elasticsearch/elasticsearch:7.10.1
        name: elasticsearch
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: es-data
      restartPolicy: Never
      volumes:
      - name: es-data
        persistentVolumeClaim:
          claimName: es-test-master-es-test-master-[0-3]
If you are interested, here is the code behind unsafe-bootstrap: https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/cluster/coordination/UnsafeBootstrapMasterCommand.java#L83
I have written a small story about this at https://medium.com/@thomasdecaux/fix-broken-elasticsearch-cluster-405ad67ee17c.
I've started a minikube (using Kubernetes 1.18.3) to test out ECK and specifically packetbeat. The minikube profile is called "packetbeat" (important, as that's the hostname for the Virtualbox VM as well) and I followed the ECK quickstart to get it up and running. ElasticSearch (single node) and Kibana are running fine and packetbeat is gathering flows as well, however, I'm unable to make it add the Kubernetes metadata to the fields.
I'm working in the default namespace and created a ClusterRoleBinding to the view ClusterRole for the default ServiceAccount in that namespace. This is working well; if I do not do that, packetbeat reports that it is unable to list the Pods on the API server.
This is the Beat config I'm using to make ECK deploy packetbeat:
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: packetbeat
spec:
  type: packetbeat
  version: 7.9.0
  elasticsearchRef:
    name: quickstart
  kibanaRef:
    name: kibana
  config:
    packetbeat.interfaces.device: any
    packetbeat.protocols:
    - type: http
      ports: [80, 8000, 8080, 9200]
    - type: tls
      ports: [443]
    packetbeat.flows:
      timeout: 30s
      period: 10s
    processors:
    - add_kubernetes_metadata: {}
  daemonSet:
    podTemplate:
      spec:
        terminationGracePeriodSeconds: 30
        hostNetwork: true
        automountServiceAccountToken: true # some older Beat versions are depending on this settings presence in k8s context
        dnsPolicy: ClusterFirstWithHostNet
        containers:
        - name: packetbeat
          securityContext:
            runAsUser: 0
            capabilities:
              add:
              - NET_ADMIN
(This is mostly a slightly modified example from the ECK example page.) However, this is not working at all. I tried it with add_kubernetes_metadata: {} first, but that errors with the message:
2020-08-19T14:23:38.550Z ERROR [kubernetes] kubernetes/util.go:117
kubernetes: Querying for pod failed with error: pods "packetbeat" not
found {"libbeat.processor": "add_kubernetes_metadata"}
This message goes away when I add the "host: packetbeat". I'm no longer getting an error now, but I'm not getting the Kubernetes metadata either. I'm mostly interested in the namespace tag, but I'm not getting any. I do not see any additional errors in the log and it just reports monitoring details every 30 seconds at the moment.
What am I doing wrong? Any more information I can provide to help me debug this?
So the docs are just unclear. Although they do not explicitly state it, you do need to add indexers and matchers. My understanding was that there are "default" ones (since you can disable them), but that does not seem to be the case. Adding the indexers and matchers as per the example in the docs makes the Kubernetes metadata part of the data.
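For completeness, this is roughly what that looks like in the Beat config above (a sketch following the add_kubernetes_metadata example in the Beats documentation; NODE_NAME is assumed to be injected via the downward API, and the exact lookup_fields depend on your Beat type and version):

config:
  ...
  processors:
  - add_kubernetes_metadata:
      host: ${NODE_NAME}
      indexers:
      - ip_port:
      matchers:
      - fields:
          lookup_fields: ["destination.ip", "server.ip"]
daemonSet:
  podTemplate:
    spec:
      containers:
      - name: packetbeat
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName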