I have the following cronjob which deletes pods in a specific namespace.
I run the job as-is, but it seems that the job doesn't run every 20 minutes; it runs every few (2-3) minutes.
What I need is that every 20 minutes the job starts deleting the pods in the specified namespace and then terminates. Any idea what could be wrong here?
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
spec:
  schedule: "*/20 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          serviceAccountName: sa
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.22.3
              command:
                - /bin/sh
                - -c
                - kubectl get pods -o name | while read -r POD; do kubectl delete "$POD"; sleep 30; done
I'm really not sure why this happens...
Maybe the pod deletions collide?
Update
I tried the following, but no pods were deleted. Any idea?
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        metadata:
          labels:
            name: restart
        spec:
          serviceAccountName: pod-exterminator
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.22.3
              command:
                - /bin/sh
                - -c
                - kubectl get pods -o name --selector name!=restart | while read -r POD; do kubectl delete "$POD"; sleep 10; done
This cronjob's pod will delete itself at some point during execution, causing the job to fail and additionally resetting its back-off count.
The docs say:
The back-off count is reset when a Job's Pod is deleted or successful without any other Pods for the Job failing around that time.
You need to apply an appropriate filter. Also note that you can delete all pods with a single command.
Add a label to spec.jobTemplate.spec.template.metadata that you can use for filtering.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
spec:
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            name: restart # label the pod
Then use this label to delete all pods that are not the cronjob pod.
kubectl delete pod --selector name!=restart
Since you state in the comments that you need a loop, a full working example may look like this.
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
  namespace: sandbox
spec:
  schedule: "*/20 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        metadata:
          labels:
            name: restart
        spec:
          serviceAccountName: restart
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.22.3
              command:
                - /bin/sh
                - -c
                - |
                  kubectl get pods -o name --selector "name!=restart" |
                    while read -r POD; do
                      kubectl delete "$POD"
                      sleep 30
                    done
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: restart
  namespace: sandbox
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-management
  namespace: sandbox
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "watch", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: restart-pod-management
  namespace: sandbox
subjects:
  - kind: ServiceAccount
    name: restart
    namespace: sandbox
roleRef:
  kind: Role
  name: pod-management
  apiGroup: rbac.authorization.k8s.io
kubectl create namespace sandbox
kubectl config set-context --current --namespace sandbox
kubectl run pod1 --image busybox -- sleep infinity
kubectl run pod2 --image busybox -- sleep infinity
kubectl apply -f restart.yaml # the above file
Here you can see how the first pod is getting terminated.
$ kubectl get all
NAME READY STATUS RESTARTS AGE
pod/pod1 1/1 Terminating 0 43s
pod/pod2 1/1 Running 0 39s
pod/restart-27432801-rrtvm 1/1 Running 0 16s
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
cronjob.batch/restart */1 * * * * False 1 17s 36s
NAME COMPLETIONS DURATION AGE
job.batch/restart-27432801 0/1 17s 17s
Note that this is actually slightly buggy: between the time you read the pod list and the time you delete an individual pod from it, that pod may no longer exist. You can use the following to ignore those cases, since pods that are already gone don't need to be deleted.
kubectl delete "$POD" || true
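Folded into the job above, the loop (only the container command changes) might look like this:
kubectl get pods -o name --selector "name!=restart" |
  while read -r POD; do
    kubectl delete "$POD" || true
    sleep 30
  done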
That said, since you name your job restart, I assume the purpose of this is to restart the pods of some deployments. You could actually use a proper restart, leveraging Kubernetes update strategies.
kubectl rollout restart $(kubectl get deploy -o name)
With the default update strategy, this will lead to new pods being created first and making sure they are ready before terminating the old ones.
$ kubectl rollout restart $(kubectl get deploy -o name)
NAME READY STATUS RESTARTS AGE
pod/app1-56f87fc665-mf9th 0/1 ContainerCreating 0 2s
pod/app1-5cbc776547-fh96w 1/1 Running 0 2m9s
pod/app2-7b9779f767-48kpd 0/1 ContainerCreating 0 2s
pod/app2-8d6454757-xj4zc 1/1 Running 0 2m9s
This also works with DaemonSets.
$ kubectl rollout restart -h
Restart a resource.
Resource rollout will be restarted.
Examples:
# Restart a deployment
kubectl rollout restart deployment/nginx
# Restart a daemon set
kubectl rollout restart daemonset/abc
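If you would rather drive this from the CronJob above instead of deleting pods, the container command could be swapped for something along these lines (a sketch; it assumes the Role is extended so the service account may also get, list, and patch deployments, which rollout restart requires):
- /bin/sh
- -c
- kubectl rollout restart $(kubectl get deploy -o name)
with an extra rule in the Role such as:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "patch"]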
Related
I am running Kubernetes using OKD 4.11 (running on vSphere) and have validated the basic functionality (including dynamic volume provisioning) using applications (like nginx).
I also applied
oc adm policy add-scc-to-group anyuid system:authenticated
to allow authenticated users to use anyuid (which seems to have been required to deploy the nginx example I was testing with).
Then I installed ECK using this quickstart with kubectl to install the CRD and RBAC manifests. This seems to have worked.
Then I deployed the most basic ElasticSearch quickstart example with kubectl apply -f quickstart.yaml using this manifest:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.4.2
  nodeSets:
    - name: default
      count: 1
      config:
        node.store.allow_mmap: false
The deployment proceeds as expected, pulling the image and starting the container, but ends in a CrashLoopBackOff with the following error from Elasticsearch at the end of the log:
"elasticsearch.cluster.name":"quickstart",
"error.type":"java.lang.IllegalStateException",
"error.message":"failed to obtain node locks, tried
[/usr/share/elasticsearch/data]; maybe these locations
are not writable or multiple nodes were started on the same data path?"
Looking into the storage, the PV and PVC are created successfully, the output of kubectl get pv,pvc,sc -A -n my-namespace is:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-9d7b57db-8afd-40f7-8b3d-6334bdc07241 1Gi RWO Delete Bound my-namespace/elasticsearch-data-quickstart-es-default-0 thin 41m
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
my-namespace persistentvolumeclaim/elasticsearch-data-quickstart-es-default-0 Bound pvc-9d7b57db-8afd-40f7-8b3d-6334bdc07241 1Gi RWO thin 41m
NAMESPACE NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
storageclass.storage.k8s.io/thin (default) kubernetes.io/vsphere-volume Delete Immediate false 19d
storageclass.storage.k8s.io/thin-csi csi.vsphere.vmware.com Delete WaitForFirstConsumer true 19d
Looking at the pod YAML, it appears that the volume is correctly attached:
volumes:
  - name: elasticsearch-data
    persistentVolumeClaim:
      claimName: elasticsearch-data-quickstart-es-default-0
  - name: downward-api
    downwardAPI:
      items:
        - path: labels
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.labels
      defaultMode: 420
....
volumeMounts:
  ...
  - name: elasticsearch-data
    mountPath: /usr/share/elasticsearch/data
I cannot understand why the volume would be read-only or rather why ES cannot create the lock.
I did find this similar issue, but I am not sure how to apply the UID permissions (in general I am fairly naive about how permissions work in OKD) when working with ECK.
Does anyone with deeper K8s / OKD or ECK/ElasticSearch knowledge have an idea how to better isolate and/or resolve this issue?
Update: I believe this has something to do with this issue and am researching the options related to OKD.
For posterity: ECK starts an init container that should take care of the chown on the data volume, but it can only do so if it runs as root.
The resolution for me was documented here:
https://repo1.dso.mil/dsop/elastic/elasticsearch/elasticsearch/-/issues/7
The manifest now looks like this:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.4.2
  nodeSets:
    - name: default
      count: 1
      config:
        node.store.allow_mmap: false
      # run init container as root to chown the volume to uid 1000
      podTemplate:
        spec:
          securityContext:
            runAsUser: 1000
            runAsGroup: 0
          initContainers:
            - name: elastic-internal-init-filesystem
              securityContext:
                runAsUser: 0
                runAsGroup: 0
And the pod starts up and can write to the volume as uid 1000.
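A quick way to confirm the ownership fix (the pod name follows ECK's <cluster>-es-<nodeSet>-<ordinal> convention, so adjust if yours differs):
kubectl exec quickstart-es-default-0 -- ls -ld /usr/share/elasticsearch/data
kubectl exec quickstart-es-default-0 -- id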
I am trying to rename a nodeSet in my ECK cluster. Below is my Elasticsearch cluster YAML:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elastic-test
spec:
  version: 7.11.1
  auth:
    roles:
      - secretName: elastic-roles-secret
    fileRealm:
      - secretName: elastic-filerealm-secret
  nodeSets:
    - name: default
      count: 1
      config:
        node.store.allow_mmap: false
      volumeClaimTemplates:
        - metadata:
            name: azure-pvc
          spec:
            storageClassName: ""
            accessModes:
              - ReadWriteMany
            resources:
              requests:
                storage: 25Gi
            volumeName: elasticsearch-azure-pv
      podTemplate:
        spec:
          initContainers:
            - name: install-plugins
              command:
                - sh
                - -c
                - |
                  bin/elasticsearch-plugin install --batch ingest-attachment
I want to change the nodeset name from default to default2.
However, the new pod created is stuck on Pending.
kubectl describe the new pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 4m12s default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling 4m12s default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Because the old PVC was not deleted, the new PVC cannot bind to the same PV. AFAIK, for the intended behaviour, the old pod and PVC should be deleted so that the new pod and PVC can bind to the PV.
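(For reference, one way to see this is to check which claim the PV is still bound to, using the volume name from the spec above:)
kubectl get pv elasticsearch-azure-pv -o jsonpath='{.status.phase}{"\t"}{.spec.claimRef.name}{"\n"}'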
To provide some context, my deployment environment only allows me to apply yaml files (no running kubectl delete), and the goal is to add the ingest-attachment plugin. So I am trying to restart the existing pod by renaming it.
I'm playing with the Elasticsearch operator on Kubernetes and created two StatefulSets (see https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-orchestration.html):
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.12.1
  nodeSets:
    - name: master-nodes
      count: 3
      config:
        node.roles: ["master"]
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi
            storageClassName: standard
    - name: data-nodes
      count: 3
      config:
        node.roles: ["data"]
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi
            storageClassName: standard
The problem is that I cannot delete the stateful sets. After deletion, they're recreated automatically:
my-PC:~$ kubectl get sts
NAME READY AGE
quickstart-es-data-nodes 0/0 14m
quickstart-es-master-nodes 0/0 18m
my-PC:~$ kubectl delete sts quickstart-es-data-nodes --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
statefulset.apps "quickstart-es-data-nodes" force deleted
my-PC:~$ kubectl get sts
NAME READY AGE
quickstart-es-data-nodes 0/3 3s
quickstart-es-master-nodes 0/0 18m
Before deletion I had already scaled the StatefulSet down to 0 to ensure that all pods were terminated. But after deletion, the StatefulSet is recreated (see quickstart-es-data-nodes).
So, does anyone have an idea how I can delete the StatefulSets without them being recreated?
It's due to the operator you are using for Elasticsearch. The operator manages the StatefulSets and will recreate them if you delete them.
The documentation (https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-orchestration.html#k8s-statefulsets) says:
Behind the scenes, ECK translates each NodeSet specified in the Elasticsearch resource into a StatefulSet in Kubernetes.
See also: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#on-delete
You have to delete the custom object. The operator owns those StatefulSets and will continually update them to match its expected content.
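You can verify this ownership by looking at the StatefulSet's ownerReferences (a quick check, using the names from the question):
kubectl get sts quickstart-es-data-nodes -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}{"\n"}'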
I finally got the answer... I need to run the following command for deletion:
kubectl delete elasticsearch quickstart
This finally removed the quickstart examples.
I'm trying to deploy the ELK stack in a Kubernetes cluster with Helm, using this chart. When I launch
helm install elk-stack stable/elastic-stack
I receive the following message:
NAME: elk-stack
LAST DEPLOYED: Mon Aug 24 07:30:31 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
The elasticsearch cluster and associated extras have been installed.
Kibana can be accessed:
* Within your cluster, at the following DNS name at port 9200:
elk-stack-elastic-stack.default.svc.cluster.local
* From outside the cluster, run these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace default -l "app=elastic-stack,release=elk-stack" -o jsonpath="{.items[0].metadata.name}")
echo "Visit http://127.0.0.1:5601 to use Kibana"
kubectl port-forward --namespace default $POD_NAME 5601:5601
But when I run
kubectl get pods
the result is:
NAME READY STATUS RESTARTS AGE
elk-stack-elasticsearch-client-7fcfc7b858-5f7fw 0/1 Running 0 12m
elk-stack-elasticsearch-client-7fcfc7b858-zdkwd 0/1 Running 1 12m
elk-stack-elasticsearch-data-0 0/1 Pending 0 12m
elk-stack-elasticsearch-master-0 0/1 Pending 0 12m
elk-stack-kibana-cb7d9ccbf-msw95 1/1 Running 0 12m
elk-stack-logstash-0 0/1 Pending 0 12m
Using the kubectl describe pods command, I see that for the Elasticsearch pods the problem is:
Warning FailedScheduling 6m29s default-scheduler running "VolumeBinding" filter plugin for pod "elk-stack-elasticsearch-data-0": pod has unbound immediate PersistentVolumeClaims
and for logstash pods:
Warning FailedScheduling 7m53s default-scheduler running "VolumeBinding" filter plugin for pod "elk-stack-logstash-0": pod has unbound immediate PersistentVolumeClaims
Output of kubectl get pv,pvc,sc -A:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/elasticsearch-data 10Gi RWO Retain Bound default/elasticsearch-data manual 16d
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
default persistentvolumeclaim/claim1 Pending slow 64m
default persistentvolumeclaim/data-elk-stack-elasticsearch-data-0 Pending 120m
default persistentvolumeclaim/data-elk-stack-elasticsearch-master-0 Pending 120m
default persistentvolumeclaim/data-elk-stack-logstash-0 Pending 120m
default persistentvolumeclaim/elasticsearch-data Bound elasticsearch-data 10Gi RWO manual 16d
default persistentvolumeclaim/elasticsearch-data-elasticsearch-data-0 Pending 17d
default persistentvolumeclaim/elasticsearch-data-elasticsearch-data-1 Pending 17d
default persistentvolumeclaim/elasticsearch-data-quickstart-es-default-0 Pending 16d
default persistentvolumeclaim/elasticsearch-master-elasticsearch-master-0 Pending 17d
default persistentvolumeclaim/elasticsearch-master-elasticsearch-master-1 Pending 17d
default persistentvolumeclaim/elasticsearch-master-elasticsearch-master-2 Pending 16d
NAMESPACE NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
storageclass.storage.k8s.io/slow (default) kubernetes.io/gce-pd Delete Immediate false 66m
Storage class slow and persistent volume claim claim1 are my experiments; I created them using kubectl create and a YAML file. The others were created automatically by Helm (I think).
Output of kubectl get pvc data-elk-stack-elasticsearch-master-0 -o yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  creationTimestamp: "2020-08-24T07:30:38Z"
  finalizers:
    - kubernetes.io/pvc-protection
  labels:
    app: elasticsearch
    release: elk-stack
  managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:labels:
            .: {}
            f:app: {}
            f:release: {}
        f:spec:
          f:accessModes: {}
          f:resources:
            f:requests:
              .: {}
              f:storage: {}
          f:volumeMode: {}
        f:status:
          f:phase: {}
      manager: kube-controller-manager
      operation: Update
      time: "2020-08-24T07:30:38Z"
  name: data-elk-stack-elasticsearch-master-0
  namespace: default
  resourceVersion: "201123"
  selfLink: /api/v1/namespaces/default/persistentvolumeclaims/data-elk-stack-elasticsearch-master-0
  uid: de58f769-f9a7-41ad-a449-ef16d4b72bc6
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  volumeMode: Filesystem
status:
  phase: Pending
Can somebody please help me to fix this problem? Thanks in advance.
The reason the pods are pending is that the PVCs below are pending, because the corresponding PVs have not been created.
data-elk-stack-elasticsearch-master-0
data-elk-stack-logstash-0
data-elk-stack-elasticsearch-data-0
Since you mentioned this is for local development, you can use a hostPath volume for the PVs. Create a PV for each of the pending PVCs using the samples below, so you will create 3 PVs in total (an apply/verify example follows the manifests).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: elk-master
  labels:
    type: local
spec:
  capacity:
    storage: 4Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: elk-logstash
  labels:
    type: local
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: elk-data
  labels:
    type: local
spec:
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
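Apply the manifests and re-check the claims; they should move from Pending to Bound (the file name here is just an example):
kubectl apply -f elk-pvs.yaml
kubectl get pvc data-elk-stack-elasticsearch-master-0 data-elk-stack-logstash-0 data-elk-stack-elasticsearch-data-0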
I created registry credentials, and when I apply them on a pod like this:
apiVersion: v1
kind: Pod
metadata:
  name: private-reg
spec:
  containers:
    - name: private-reg-container
      image: registry.io.io/simple-node
  imagePullSecrets:
    - name: regcred
it successfully pulls the image.
But if I try to do this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node123
  namespace: node123
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
  selector:
    matchLabels:
      name: node123
  template:
    metadata:
      labels:
        name: node123
    spec:
      containers:
        - name: node123
          image: registry.io.io/simple-node
          ports:
            - containerPort: 3000
      imagePullSecrets:
        - name: regcred
the pods get an ImagePullBackOff error.
When I describe one, I get:
Failed to pull image "registry.io.io/simple-node": rpc error: code =
Unknown desc = Error response from daemon: Get
https://registry.io.io/v2/simple-node/manifests/latest: no basic auth
credentials
Anyone know how to solve this issue?
We always run images from a private registry, and this checklist might help you:
Put your params in environment variables in your terminal to have a single source of truth:
export DOCKER_HOST=registry.io.io
export DOCKER_USER=<your-user>
export DOCKER_PASS=<your-pass>
Make sure that you can authenticate and that the image really exists:
echo $DOCKER_PASS | docker login -u$DOCKER_USER --password-stdin $DOCKER_HOST
docker pull ${DOCKER_HOST}/simple-node
Make sure that you created the Docker config secret in the same namespace as the pod/deployment:
namespace=mynamespace # default
kubectl -n ${namespace} create secret docker-registry regcred \
  --docker-server=${DOCKER_HOST} \
  --docker-username=${DOCKER_USER} \
  --docker-password=${DOCKER_PASS} \
  --docker-email=anything@will.work.com
Patch the service account used by the Pod with the secret
namespace=mynamespace
kubectl -n ${namespace} patch serviceaccount default \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'
# if the pod uses another service account,
# replace "default" with the relevant service account
or
Add imagePullSecrets to the pod:
imagePullSecrets:
  - name: regcred
containers:
  - ....
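After patching the service account (or adding imagePullSecrets to the deployment spec), recreating the pods will retry the pull with the credentials; for example (names taken from the question):
kubectl -n node123 rollout restart deployment/node123
kubectl -n node123 get pods -w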