Missing queues from RabbitMQ Metricbeat - Elasticsearch
It looks like only a fraction of the queues on my RabbitMQ cluster are making it into Elasticsearch via Metricbeat.
When I query RabbitMQ's /api/overview, I see 887 queues reported:
object_totals: {
    consumers: 517,
    queues: 887,
    exchanges: 197,
    connections: 305,
    channels: 622
},
When I query RabbitMQ's /api/queues (which is what Metricbeat hits), I count 887 queues there as well.
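For reference, a quick way to reproduce that count against the management API (the host, credentials, and default management port 15672 are placeholders here, matching the redacted values below):

curl -s -u xxx:xxx 'http://xxx:15672/api/queues' | jq 'length'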
When I get a unique count of the field rabbitmq.queue.name in Elasticsearch, I see only 309 queues.
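(A cardinality aggregation along these lines reproduces that unique count; the metricbeat-* index pattern and Elasticsearch host are assumptions, so adjust them to wherever the Logstash pipeline actually indexes the events.)

curl -s 'http://xxx:9200/metricbeat-*/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "aggs": { "queues": { "cardinality": { "field": "rabbitmq.queue.name" } } }
}'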
I don't see anything in the debug output that stands out to me. It's just the usual INFO level startup messages, followed by the publish information:
root@rabbitmq:/etc/metricbeat# metricbeat -e
2019-06-24T21:13:33.692Z INFO instance/beat.go:571 Home path: [/usr/share/metricbeat] Config path: [/etc/metricbeat] Data path: [/var/lib/metricbeat] Logs path: [/var/log/metricbeat]
2019-06-24T21:13:33.692Z INFO instance/beat.go:579 Beat ID: xxx
2019-06-24T21:13:33.692Z INFO [index-management.ilm] ilm/ilm.go:129 Policy name: metricbeat-7.1.1
2019-06-24T21:13:33.692Z INFO [seccomp] seccomp/seccomp.go:116 Syscall filter successfully installed
2019-06-24T21:13:33.692Z INFO [beat] instance/beat.go:827 Beat info {"system_info": {"beat": {"path": {"config": "/etc/metricbeat", "data": "/var/lib/metricbeat", "home": "/usr/share/metricbeat", "logs": "/var/log/metricbeat"}, "type": "metricbeat", "uuid": "xxx"}}}
2019-06-24T21:13:33.692Z INFO [beat] instance/beat.go:836 Build info {"system_info": {"build": {"commit": "3358d9a5a09e3c6709a2d3aaafde628ea34e8419", "libbeat": "7.1.1", "time": "2019-05-23T13:23:10.000Z", "version": "7.1.1"}}}
2019-06-24T21:13:33.692Z INFO [beat] instance/beat.go:839 Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":4,"version":"go1.11.5"}}}
[...]
2019-06-24T21:13:33.694Z INFO [beat] instance/beat.go:872 Process info {"system_info": {"process": {"capabilities": {"inheritable":null,"permitted":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read"],"effective":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read"],"bounding":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read"],"ambient":null}, "cwd": "/etc/metricbeat", "exe": "/usr/share/metricbeat/bin/metricbeat", "name": "metricbeat", "pid": 30898, "ppid": 30405, "seccomp": {"mode":"filter","no_new_privs":true}, "start_time": "2019-06-24T21:13:33.100Z"}}}
2019-06-24T21:13:33.694Z INFO instance/beat.go:280 Setup Beat: metricbeat; Version: 7.1.1
2019-06-24T21:13:33.694Z INFO [publisher] pipeline/module.go:97 Beat name: metricbeat
2019-06-24T21:13:33.694Z INFO instance/beat.go:391 metricbeat start running.
2019-06-24T21:13:33.694Z INFO cfgfile/reload.go:150 Config reloader started
2019-06-24T21:13:33.694Z INFO [monitoring] log/log.go:117 Starting metrics logging every 30s
[...]
2019-06-24T21:13:43.696Z INFO filesystem/filesystem.go:57 Ignoring filesystem types: sysfs, rootfs, ramfs, bdev, proc, cpuset, cgroup, cgroup2, tmpfs, devtmpfs, configfs, debugfs, tracefs, securityfs, sockfs, dax, bpf, pipefs, hugetlbfs, devpts, ecryptfs, fuse, fusectl, pstore, mqueue, autofs
2019-06-24T21:13:43.696Z INFO fsstat/fsstat.go:59 Ignoring filesystem types: sysfs, rootfs, ramfs, bdev, proc, cpuset, cgroup, cgroup2, tmpfs, devtmpfs, configfs, debugfs, tracefs, securityfs, sockfs, dax, bpf, pipefs, hugetlbfs, devpts, ecryptfs, fuse, fusectl, pstore, mqueue, autofs
2019-06-24T21:13:44.696Z INFO pipeline/output.go:95 Connecting to backoff(async(tcp://xxx))
2019-06-24T21:13:44.711Z INFO pipeline/output.go:105 Connection to backoff(async(tcp://xxx)) established
2019-06-24T21:14:03.696Z INFO [monitoring] log/log.go:144 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":130,"time":{"ms":131}},"total":{"ticks":1960,"time":{"ms":1965},"value":1960},"user":{"ticks":1830,"time":{"ms":1834}}},"handles":{"limit":{"hard":1048576,"soft":1024},"open":12},"info":{"ephemeral_id":"xxx","uptime":{"ms":30030}},"memstats":{"gc_next":30689808,"memory_alloc":21580680,"memory_total":428076400,"rss":79917056}},"libbeat":{"config":{"module":{"running":0},"reloads":2},"output":{"events":{"acked":7825,"batches":11,"total":7825},"read":{"bytes":66},"type":"logstash","write":{"bytes":870352}},"pipeline":{"clients":4,"events":{"active":313,"published":8138,"retry":523,"total":8138},"queue":{"acked":7825}}},"metricbeat":{"rabbitmq":{"connection":{"events":2987,"failures":10,"success":2977},"exchange":{"events":1970,"success":1970},"node":{"events":10,"success":10},"queue":{"events":3130,"failures":10,"success":3120}},"system":{"cpu":{"events":2,"success":2},"filesystem":{"events":7,"success":7},"fsstat":{"events":1,"success":1},"load":{"events":2,"success":2},"memory":{"events":2,"success":2},"network":{"events":4,"success":4},"process":{"events":18,"success":18},"process_summary":{"events":2,"success":2},"socket_summary":{"events":2,"success":2},"uptime":{"events":1,"success":1}}},"system":{"cpu":{"cores":4},"load":{"1":0.48,"15":0.28,"5":0.15,"norm":{"1":0.12,"15":0.07,"5":0.0375}}}}}}
If there were a problem fetching the queues, I would expect to see an error in the logs above, per https://github.com/elastic/beats/blob/master/metricbeat/module/rabbitmq/queue/data.go#L94-L104
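One way to double-check would be to enable all debug selectors and filter for the module, since per-fetch problems may not surface at INFO level (a suggestion, not output from the run above):

metricbeat -e -d "*" 2>&1 | grep -i rabbitmq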
Here's the metricbeat.yml:
metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true
  reload.period: 10s

setup.template.settings:
  index.number_of_shards: 1
  index.codec: best_compression

name: metricbeat

fields:
  environment: development

processors:
  - add_cloud_metadata: ~

output.logstash:
  hosts: ["xxx"]
Here's the modules.d/rabbitmq.yml:
- module: rabbitmq
  metricsets: ["node", "queue", "connection", "exchange"]
  enabled: true
  period: 2s
  hosts: ["xxx"]
  username: xxx
  password: xxx
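(As an aside, since the modules are loaded through the config reloader, Metricbeat's built-in checks can rule out a module configuration problem:

metricbeat test config
metricbeat test modules rabbitmq
)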
I solved it by upgrading Elastic Stack from 7.1.1 to 7.2.0.
Related
Error in Filebeat logs - not able to view data in Kibana
Recently upgraded to Filebeat 7.17.7, using Elasticsearch, Kibana and Filebeat, all 7.17.7. However, I am not able to see the logs in Kibana, as Filebeat is not sending the logs to Elasticsearch and Kibana. In the Filebeat logs I saw this error:

ERROR [publisher_pipeline_output] pipeline/output.go:154 Failed to connect to backoff(elasticsearch(http://localhost:9200)): Connection marked as failed because the onConnect callback failed: resource 'filebeat-7.17.7' exists, but it is not an alias

Can someone help figure out the cause of and solution for this error? I restarted Filebeat, but that didn't help.

Filebeat config:

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/www/vhosts/rshop/current/var/log/*.log
    multiline.pattern: ^\[[0-9]{4}-[0-9]{2}-[0-9]{2}
    multiline.negate: true
    multiline.match: after

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.template.settings:
  index.number_of_shards: 3

setup.ilm.enabled: false

setup.kibana:

output.elasticsearch:
  hosts: ["localhost:9200"]
  indices:
    - index: "r-logs-%{[agent.version]}-%{+yyyy.MM.dd}"
      when.regexp:
        log.file.path: '^.+\/var\/log\/recalculation\.log$'
  pipelines:
    - pipeline: "filebeat-6.8.7-monolog-pipeline"
      when.or:
        - regexp:
            log.file.path: '^.+\/var\/log\/recalculation\.log$'

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0755
@NKumar Most likely it's an upgrade issue from legacy to new index templates, which will happen if you don't mark them for overwriting. Can you please share which version of the stack you upgraded to 7.17 from? The quick solution would be to add an alias to your Filebeat index:

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "filebeat-7.17.7",
        "alias": "filebeat-7.17.7_1",
        "is_write_index": true
      }
    }
  ]
}

A more persistent solution would be to add the following settings in Filebeat:

setup.template.enabled: true
setup.template.overwrite: true
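It can also help to confirm what kind of resource the name actually resolves to. A sketch using the resolve index API (available in recent 7.x releases; the host matches the config above): if filebeat-7.17.7 comes back under "indices" rather than "aliases", the onConnect/ILM setup cannot create an alias with that name, which matches the error message.

curl -s 'http://localhost:9200/_resolve/index/filebeat-7.17.7?pretty'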
Trying to verify ELK installation, Kibana dashboard not showing Filebeat logs in Discover tab
I used Helm to load the ELK stack on Kubernetes. I ran the following commands:

minikube start --cpus 4 --memory 8192
minikube addons enable ingress
helm repo add elastic https://helm.elastic.co
helm repo update

Then deployed Elasticsearch with values-02.yml:

replicas: 1
minimumMasterNodes: 1
ingress:
  enabled: true
  hosts:
    - host: es-elk.s9.devopscloud.link # Change the hostname to the one you need
      paths:
        - path: /
volumeClaimTemplate:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi

Applied it:

helm install elk-elasticsearch elastic/elasticsearch -f values-02.yml

Then deployed Kibana with values-03.yml:

elasticsearchHosts: "http://elasticsearch-master:9200"
ingress:
  enabled: true
  className: "nginx"
  hosts:
    - host:
      paths:
        - path: /

Applied it:

helm install elk-kibana elastic/kibana -f values-03.yml

Then deployed Logstash with values-04.yaml:

persistence:
  enabled: true
logstashConfig:
  logstash.yml: |
    http.host: 0.0.0.0
    xpack.monitoring.enabled: false
logstashPipeline:
  logstash.conf: |
    input {
      beats {
        port => 5044
      }
    }
    output {
      elasticsearch {
        hosts => "http://elasticsearch-master.logging.svc.cluster.local:9200"
        manage_template => false
        index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
        document_type => "%{[@metadata][type]}"
      }
    }
service:
  type: ClusterIP
  ports:
    - name: beats
      port: 5044
      protocol: TCP
      targetPort: 5044
    - name: http
      port: 8080
      protocol: TCP
      targetPort: 8080

Applied it:

helm install elk-logstash elastic/logstash -f values-04.yaml

Then deployed Filebeat with values-05.yaml:

daemonset:
  filebeatConfig:
    filebeat.yml: |
      filebeat.inputs:
        - type: container
          paths:
            - /var/log/containers/*.log
          processors:
            - add_kubernetes_metadata:
                host: ${NODE_NAME}
                matchers:
                  - logs_path:
                      logs_path: "/var/log/containers/"
      output.logstash:
        hosts: ["elk-logstash-logstash:5044"]

Then applied it:

helm install elk-filebeat elastic/filebeat -f values-05.yaml

All pods are up and running:

kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
elasticsearch-master-0               1/1     Running   0          61m
elk-filebeat-filebeat-ggjhc          1/1     Running   0          45m
elk-kibana-kibana-6d658894bf-grb8x   1/1     Running   0          52m
elk-logstash-logstash-0              1/1     Running   0          47m

But when I go to the Discover page (http://172.21.95.140/app/management/kibana/indexPatterns?bannerMessage=To%20visualize%20and%20explore%20data%20in%20Kibana,%20you%20must%20create%20an%20index%20pattern%20to%20retrieve%20data%20from%20Elasticsearch.) it does not show anything for Filebeat. Instead I get a "Ready to try Kibana? First, you need data" message. I was following this tutorial: https://blog.knoldus.com/how-to-deploy-elk-stack-on-kubernetes/#deploy-elastic-search
I followed this tutorial and ran the default Kibana and Filebeat YAML files.
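When Discover shows the "you need data" message, a quick sanity check is whether any Filebeat-derived indices exist in Elasticsearch at all. A sketch, assuming the elasticsearch-master service name used in the values above:

kubectl port-forward svc/elasticsearch-master 9200:9200 &
curl -s 'http://localhost:9200/_cat/indices?v'

If no filebeat-* indices appear there, the problem is in the Filebeat-to-Logstash-to-Elasticsearch path rather than in Kibana's index pattern.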
Greenplum Operator on Kubernetes zapr error
I am trying to deploy the Greenplum Operator on Kubernetes and I get the following error.

kubectl describe pod greenplum-operator-87d989b4d-ldft6:

Name:         greenplum-operator-87d989b4d-ldft6
Namespace:    greenplum
Priority:     0
Node:         node-1/some-ip
Start Time:   Mon, 23 May 2022 14:07:26 +0200
Labels:       app=greenplum-operator
              pod-template-hash=87d989b4d
Annotations:  cni.projectcalico.org/podIP: some-ip
              cni.projectcalico.org/podIPs: some-ip
Status:       Running
IP:           some-ip
IPs:
  IP:  some-ip
Controlled By:  ReplicaSet/greenplum-operator-87d989b4d
Containers:
  greenplum-operator:
    Container ID:  docker://364997050b1f337ff61b8ce40534697bbc13aae29f7b9ae5255245375acce03f
    Image:         greenplum-operator:v2.3.0
    Image ID:      docker-pullable://greenplum-operator:v2.3.0
    Port:          <none>
    Host Port:     <none>
    Command:
      greenplum-operator
      --logLevel
      debug
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 23 May 2022 15:29:59 +0200
      Finished:     Mon, 23 May 2022 15:30:32 +0200
    Ready:          False
    Restart Count:  19
    Environment:
      GREENPLUM_IMAGE_REPO:  greenplum-operator:v2.3.0
      GREENPLUM_IMAGE_TAG:   v2.3.0
      OPERATOR_IMAGE_REPO:   greenplum-operator:v2.3.0
      OPERATOR_IMAGE_TAG:    v2.3.0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from greenplum-system-operator-token-xcz4q (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  greenplum-system-operator-token-xcz4q:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  greenplum-system-operator-token-xcz4q
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                  From     Message
  ----     ------   ----                 ----     -------
  Warning  BackOff  32s (x340 over 84m)  kubelet  Back-off restarting failed container

kubectl logs greenplum-operator-87d989b4d-ldft6:

{"level":"INFO","ts":"2022-05-23T13:35:38.735Z","logger":"setup","msg":"Go Info","Version":"go1.14.10","GOOS":"linux","GOARCH":"amd64"}
{"level":"INFO","ts":"2022-05-23T13:35:41.242Z","logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"INFO","ts":"2022-05-23T13:35:41.262Z","logger":"setup","msg":"starting manager"}
{"level":"INFO","ts":"2022-05-23T13:35:41.262Z","logger":"admission","msg":"starting greenplum validating admission webhook server"}
{"level":"INFO","ts":"2022-05-23T13:35:41.262Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumpxfservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.264Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumplservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.264Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumcluster","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.262Z","logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"INFO","ts":"2022-05-23T13:35:41.265Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumtextservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.361Z","logger":"admission","msg":"CertificateSigningRequest: created"}
{"level":"INFO","ts":"2022-05-23T13:35:41.363Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumpxfservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.364Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumplservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.364Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumcluster","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.366Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumtextservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.464Z","logger":"controller-runtime.controller","msg":"Starting Controller","controller":"greenplumpxfservice"}
{"level":"INFO","ts":"2022-05-23T13:35:41.464Z","logger":"controller-runtime.controller","msg":"Starting Controller","controller":"greenplumplservice"}
{"level":"INFO","ts":"2022-05-23T13:35:41.465Z","logger":"controller-runtime.controller","msg":"Starting workers","controller":"greenplumplservice","worker count":1}
{"level":"INFO","ts":"2022-05-23T13:35:41.465Z","logger":"controller-runtime.controller","msg":"Starting Controller","controller":"greenplumcluster"}
{"level":"INFO","ts":"2022-05-23T13:35:41.465Z","logger":"controller-runtime.controller","msg":"Starting workers","controller":"greenplumpxfservice","worker count":1}
{"level":"INFO","ts":"2022-05-23T13:35:41.465Z","logger":"controller-runtime.controller","msg":"Starting workers","controller":"greenplumcluster","worker count":1}
{"level":"INFO","ts":"2022-05-23T13:35:41.466Z","logger":"controller-runtime.controller","msg":"Starting Controller","controller":"greenplumtextservice"}
{"level":"INFO","ts":"2022-05-23T13:35:41.466Z","logger":"controller-runtime.controller","msg":"Starting workers","controller":"greenplumtextservice","worker count":1}
{"level":"ERROR","ts":"2022-05-23T13:36:11.368Z","logger":"setup","msg":"error","error":"getting certificate for webhook: failure while waiting for approval: timed out waiting for the condition","errorCauses":[{"error":"getting certificate for webhook: failure while waiting for approval: timed out waiting for the condition"}],"stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128\nmain.main\n\t/greenplum-for-kubernetes/greenplum-operator/cmd/greenplumOperator/main.go:35\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}

I tried to redeploy cert-manager and check its logs but couldn't find anything. The greenplum-for-kubernetes documentation doesn't mention anything about this, and I have read the whole troubleshooting document on the Pivotal website as well.
K8s Elasticsearch with Filebeat stays 'not ready' after rebooting
I'm going through a situation I can't quite understand.

Environment: two dedicated Azure CentOS 8.2 nodes (2 vCPU, 16 GB RAM), not AKS; 1 master node, 1 worker node; Kubernetes v1.19.3; Helm v2.16.12.

Helm charts: Elastic (https://github.com/elastic/helm-charts/tree/7.9.3)

At first it works fine with the installation below:

## elasticsearch, filebeat
# kubectl apply -f pv.yaml
# helm install -f values.yaml --name elasticsearch elastic/elasticsearch
# helm install --name filebeat --version 7.9.3 elastic/filebeat

curl elasticsearchip:9200 and curl elasticsearchip:9200/_cat/indices show the right values. But after rebooting a worker node, the pods just stay at READY 0/1 and don't work:

NAME                      READY   STATUS    RESTARTS   AGE
elasticsearch-master-0    0/1     Running   10         71m
filebeat-filebeat-67qm2   0/1     Running   4          40m

In this situation, after removing /mnt/data/nodes and rebooting again, it works fine. The elasticsearch pod has nothing special in it as far as I can tell.

#describe
{"type": "server", "timestamp": "2020-10-26T07:49:49,708Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-0", "message": "Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[filebeat-7.9.3-2020.10.26-000001][0]]]).", "cluster.uuid": "sWUAXJG9QaKyZDe0BLqwSw", "node.id": "ztb35hToRf-2Ahr7olympw" }

#logs
Normal   SandboxChanged  4m4s (x3 over 4m9s)   kubelet  Pod sandbox changed, it will be killed and re-created.
Normal   Pulled          4m3s                  kubelet  Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
Normal   Created         4m1s                  kubelet  Created container configure-sysctl
Normal   Started         4m1s                  kubelet  Started container configure-sysctl
Normal   Pulled          3m58s                 kubelet  Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
Normal   Created         3m58s                 kubelet  Created container elasticsearch
Normal   Started         3m57s                 kubelet  Started container elasticsearch
Warning  Unhealthy       91s (x14 over 3m42s)  kubelet  Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" ) Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )

#events
6m1s   Normal   Pulled     pod/elasticsearch-master-0    Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
6m1s   Normal   Pulled     pod/filebeat-filebeat-67qm2   Container image "docker.elastic.co/beats/filebeat:7.9.3" already present on machine
5m59s  Normal   Started    pod/elasticsearch-master-0    Started container configure-sysctl
5m59s  Normal   Created    pod/elasticsearch-master-0    Created container configure-sysctl
5m59s  Normal   Created    pod/filebeat-filebeat-67qm2   Created container filebeat
5m58s  Normal   Started    pod/filebeat-filebeat-67qm2   Started container filebeat
5m56s  Normal   Created    pod/elasticsearch-master-0    Created container elasticsearch
5m56s  Normal   Pulled     pod/elasticsearch-master-0    Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
5m55s  Normal   Started    pod/elasticsearch-master-0    Started container elasticsearch
61s    Warning  Unhealthy  pod/filebeat-filebeat-67qm2   Readiness probe failed: elasticsearch: http://elasticsearch-master:9200... parse url... OK connection... parse host... OK dns lookup... OK addresses: 10.97.133.135 dial up...
ERROR dial tcp 10.97.133.135:9200: connect: connection refused
59s    Warning  Unhealthy  pod/elasticsearch-master-0    Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" ) Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )

The /mnt/data path has chown 1000:1000, and in the case of only Elasticsearch without Filebeat, rebooting causes no problem. I can't figure this out at all. :( What am I missing?

pv.yaml

kind: PersistentVolume
apiVersion: v1
metadata:
  name: elastic-pv
  labels:
    type: local
    app: elastic
spec:
  storageClassName: local-storage
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  claimRef:
    namespace: default
    name: elasticsearch-master-elasticsearch-master-0
  hostPath:
    path: "/mnt/data"

values.yaml

---
clusterName: "elasticsearch"
nodeGroup: "master"

# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: ""

# Elasticsearch roles that will be applied to this nodeGroup
# These will be set as environment variables. E.g. node.master=true
roles:
  master: "true"
  ingest: "true"
  data: "true"

replicas: 1
minimumMasterNodes: 1

esMajorVersion: ""

# Allows you to add any config files in /usr/share/elasticsearch/config/
# such as elasticsearch.yml and log4j2.properties
esConfig: {}
#  elasticsearch.yml: |
#    key:
#      nestedkey: value
#  log4j2.properties: |
#    key = value

# Extra environment variables to append to this nodeGroup
# This will be appended to the current 'env:' key. You can use any of the kubernetes env
# syntax here
extraEnvs: []
#  - name: MY_ENVIRONMENT_VAR
#    value: the_value_goes_here

# Allows you to load environment variables from kubernetes secret or config map
envFrom: []
#  - secretRef:
#      name: env-secret
#  - configMapRef:
#      name: config-map

# A list of secrets and their paths to mount inside the pod
# This is useful for mounting certificates for security and for mounting
# the X-Pack license
secretMounts: []
#  - name: elastic-certificates
#    secretName: elastic-certificates
#    path: /usr/share/elasticsearch/config/certs
#    defaultMode: 0755

image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.9.3"
imagePullPolicy: "IfNotPresent"

podAnnotations: {}
#  iam.amazonaws.com/role: es-cluster

# additionals labels
labels: {}

esJavaOpts: "-Xmx1g -Xms1g"

resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "1000m"
    memory: "2Gi"

initResources: {}
#  limits:
#    cpu: "25m"
#    # memory: "128Mi"
#  requests:
#    cpu: "25m"
#    memory: "128Mi"

sidecarResources: {}
#  limits:
#    cpu: "25m"
#    # memory: "128Mi"
#  requests:
#    cpu: "25m"
#    memory: "128Mi"

networkHost: "0.0.0.0"

volumeClaimTemplate:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: local-storage
  resources:
    requests:
      storage: 5Gi

rbac:
  create: false
  serviceAccountAnnotations: {}
  serviceAccountName: ""

podSecurityPolicy:
  create: false
  name: ""
  spec:
    privileged: true
    fsGroup:
      rule: RunAsAny
    runAsUser:
      rule: RunAsAny
    seLinux:
      rule: RunAsAny
    supplementalGroups:
      rule: RunAsAny
    volumes:
      - secret
      - configMap
      - persistentVolumeClaim

persistence:
  enabled: true
  name: elastic-vc
  labels:
    # Add default labels for the volumeClaimTemplate fo the StatefulSet
    app: elastic
  annotations: {}

extraVolumes: []
#  - name: extras
#    emptyDir: {}

extraVolumeMounts: []
#  - name: extras
#    mountPath: /usr/share/extras
#    readOnly: true

extraContainers: []
#  - name: do-something
#    image: busybox
#    command: ['do', 'something']

extraInitContainers: []
#  - name: do-something
#    image: busybox
#    command: ['do', 'something']

# This is the PriorityClass settings as defined in
# https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""

# By default this will make sure two pods don't end up on the same node
# Changing this to a region would allow you to spread pods across regions
antiAffinityTopologyKey: "kubernetes.io/hostname"

# Hard means that by default pods will only be scheduled if there are enough nodes for them
# and that they will never end up on the same node. Setting this to soft will do this "best effort"
antiAffinity: "hard"

# This is the node affinity settings as defined in
# https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
nodeAffinity: {}

# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"

# The environment variables injected by service links are not used, but can lead to slow Elasticsearch boot times when
# there are many services in the current namespace.
# If you experience slow pod startups you probably want to set this to `false`.
enableServiceLinks: true

protocol: http
httpPort: 9200
transportPort: 9300

service:
  labels: {}
  labelsHeadless: {}
  type: ClusterIP
  nodePort: ""
  annotations: {}
  httpPortName: http
  transportPortName: transport
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  externalTrafficPolicy: ""

updateStrategy: RollingUpdate

# This is the max unavailable setting for the pod disruption budget
# The default value of 1 will make sure that kubernetes won't allow more than 1
# of your pods to be unavailable during maintenance
maxUnavailable: 1

podSecurityContext:
  fsGroup: 1000
  runAsUser: 1000

securityContext:
  capabilities:
    drop:
      - ALL
  # readOnlyRootFilesystem: false
  runAsNonRoot: true
  runAsUser: 1000

# How long to wait for elasticsearch to stop gracefully
terminationGracePeriod: 120

sysctlVmMaxMapCount: 262144

readinessProbe:
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 3
  timeoutSeconds: 5

# https://www.elastic.co/guide/en/elasticsearch/reference/7.9/cluster-health.html#request-params wait_for_status
clusterHealthCheckParams: "wait_for_status=green&timeout=1s"

## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""

imagePullSecrets: []
nodeSelector: {}
tolerations: []
#  - effect: NoSchedule
#    key: node-role.kubernetes.io/master

# Enabling this will publically expose your Elasticsearch instance.
# Only enable this if you have security enabled on your cluster
ingress:
  enabled: false
  annotations: {}
  # kubernetes.io/ingress.class: nginx
  # kubernetes.io/tls-acme: "true"
  path: /
  hosts:
    - chart-example.local
  tls: []
  #  - secretName: chart-example-tls
  #    hosts:
  #      - chart-example.local

nameOverride: ""
fullnameOverride: ""

# https://github.com/elastic/helm-charts/issues/63
masterTerminationFix: false

lifecycle: {}
#  preStop:
#    exec:
#      command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
#  postStart:
#    exec:
#      command:
#        - bash
#        - -c
#        - |
#          #!/bin/bash
#          # Add a template to adjust number of shards/replicas
#          TEMPLATE_NAME=my_template
#          INDEX_PATTERN="logstash-*"
#          SHARD_COUNT=8
#          REPLICA_COUNT=1
#          ES_URL=http://localhost:9200
#          while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
#          curl -XPUT "$ES_URL/_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN"\"'],"settings":{"number_of_shards":'$SHARD_COUNT',"number_of_replicas":'$REPLICA_COUNT'}}'

sysctlInitContainer:
  enabled: true

keystore: []

# Deprecated
# please use the above podSecurityContext.fsGroup instead
fsGroup: ""
Issue

There is an issue with the Elasticsearch readiness probe when running a single-replica cluster:

Warning  Unhealthy  91s (x14 over 3m42s)  kubelet  Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" ) Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )

Solution

As mentioned here by @adinhodovic, if you are running a single-replica cluster, add the following Helm value:

clusterHealthCheckParams: "wait_for_status=yellow&timeout=1s"

Your status will never go green with a single-replica cluster. The following values should work:

replicas: 1
minimumMasterNodes: 1
clusterHealthCheckParams: 'wait_for_status=yellow&timeout=1s'
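To roll the change out to the existing release, something like the following should work (the release and chart names follow the install commands in the question; treat them as illustrative):

helm upgrade elasticsearch elastic/elasticsearch -f values.yaml
# Then confirm the probe's own check now passes:
curl 'http://elasticsearchip:9200/_cluster/health?wait_for_status=yellow&timeout=1s&pretty'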
ElastAlert filter not working as expected
I have installed ElastAlert. Below are my config file and rule file configurations.

Config file:

rules_folder: rules
run_every:
  minutes: 15
buffer_time:
  minutes: 15
es_host: ip_address(#####)
es_port: 9200
writeback_index: elastalert_status
writeback_alias: elastalert_alerts
alert_time_limit:
  days: 2
logging:
  version: 1
  incremental: false
  disable_existing_loggers: false
  formatters:
    logline:
      format: '%(asctime)s %(levelname)+8s %(name)+20s %(message)s'
  handlers:
    console:
      class: logging.StreamHandler
      formatter: logline
      level: DEBUG
      stream: ext://sys.stderr
    file:
      class: logging.FileHandler
      formatter: logline
      level: DEBUG
      filename: elastalert.log
  loggers:
    elastalert:
      level: WARN
      handlers: []
      propagate: true
    elasticsearch:
      level: WARN
      handlers: []
      propagate: true

example_frequency.yaml file:

es_host: ip_address(####)
es_port: 9200
name: FaultExceptions
type: frequency
index: logstash_*
num_events: 5
timeframe:
  minutes: 15
filter:
- query:
    query_string:
      query: "ErrorGroup: Fault Exception"
alert:
- "email"
email:
- "abc@gmail.com"

I am getting the mail every 15 minutes, but the data does not match the filter, where the ErrorGroup name should be Fault Exception. Please help me understand this, as I have been working on it for the last 4 days. Thanks in advance.
Hope it's not too late, but yes, use the --es_debug_trace command line option. It helps to see the exact query being sent, as curl:

python3 -m elastalert.elastalert --verbose --rule your_rule_to_test.yaml --es_debug_trace /tmp/elastalert_curl.log

The curl command in /tmp/elastalert_curl.log can then be run in a terminal to see the output, or tweaked to see what went wrong. You can also use Kibana Dev Tools to check the command and test it. Also confirm that ErrorGroup is at the root level of the documents in the index, and try ErrorGroup.keyword.
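One thing worth testing directly (a sketch; the host placeholder and index pattern come from the rule above): without quotes, the Lucene query string "ErrorGroup: Fault Exception" parses as ErrorGroup:Fault plus a default-field match on Exception, which can match far more documents than intended. Quoting the phrase narrows it:

curl -s 'http://<es_host>:9200/logstash_*/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "query": { "query_string": { "query": "ErrorGroup: \"Fault Exception\"" } }
}'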