Missing queues from RabbitMQ Metricbeat - elasticsearch

It looks like only a fraction of the queues on my RabbitMQ cluster are making it into Elasticsearch via Metricbeat.
When I query RabbitMQ's /api/overview, I see 887 queues reported:
object_totals: {
    consumers: 517,
    queues: 887,
    exchanges: 197,
    connections: 305,
    channels: 622
},
When I query RabbitMQ's /api/queues (which is what Metricbeat hits), I count 887 queues there as well.
When I get a unique count of the field rabbitmq.queue.name in Elasticsearch, I am seeing only 309 queues.
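For reference, a rough way to compare the two counts from the command line (a sketch, not from the original post; hosts, port and credentials are placeholders, and the cardinality aggregation is approximate):

# Queues as reported by the same endpoint Metricbeat scrapes
curl -s -u user:pass http://rabbitmq-host:15672/api/queues | jq 'length'

# Approximate unique count of rabbitmq.queue.name over recent Metricbeat data
curl -s -H 'Content-Type: application/json' \
  'http://elasticsearch-host:9200/metricbeat-*/_search?size=0' -d '
{
  "query": { "range": { "@timestamp": { "gte": "now-15m" } } },
  "aggs": { "queues": { "cardinality": { "field": "rabbitmq.queue.name" } } }
}'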
I don't see anything in the debug output that stands out to me. It's just the usual INFO level startup messages, followed by the publish information:
root@rabbitmq:/etc/metricbeat# metricbeat -e
2019-06-24T21:13:33.692Z INFO instance/beat.go:571 Home path: [/usr/share/metricbeat] Config path: [/etc/metricbeat] Data path: [/var/lib/metricbeat] Logs path: [/var/log/metricbeat]
2019-06-24T21:13:33.692Z INFO instance/beat.go:579 Beat ID: xxx
2019-06-24T21:13:33.692Z INFO [index-management.ilm] ilm/ilm.go:129 Policy name: metricbeat-7.1.1
2019-06-24T21:13:33.692Z INFO [seccomp] seccomp/seccomp.go:116 Syscall filter successfully installed
2019-06-24T21:13:33.692Z INFO [beat] instance/beat.go:827 Beat info {"system_info": {"beat": {"path": {"config": "/etc/metricbeat", "data": "/var/lib/metricbeat", "home": "/usr/share/metricbeat", "logs": "/var/log/metricbeat"}, "type": "metricbeat", "uuid": "xxx"}}}
2019-06-24T21:13:33.692Z INFO [beat] instance/beat.go:836 Build info {"system_info": {"build": {"commit": "3358d9a5a09e3c6709a2d3aaafde628ea34e8419", "libbeat": "7.1.1", "time": "2019-05-23T13:23:10.000Z", "version": "7.1.1"}}}
2019-06-24T21:13:33.692Z INFO [beat] instance/beat.go:839 Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":4,"version":"go1.11.5"}}}
[...]
2019-06-24T21:13:33.694Z INFO [beat] instance/beat.go:872 Process info {"system_info": {"process": {"capabilities": {"inheritable":null,"permitted":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read"],"effective":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read"],"bounding":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read"],"ambient":null}, "cwd": "/etc/metricbeat", "exe": "/usr/share/metricbeat/bin/metricbeat", "name": "metricbeat", "pid": 30898, "ppid": 30405, "seccomp": {"mode":"filter","no_new_privs":true}, "start_time": "2019-06-24T21:13:33.100Z"}}}
2019-06-24T21:13:33.694Z INFO instance/beat.go:280 Setup Beat: metricbeat; Version: 7.1.1
2019-06-24T21:13:33.694Z INFO [publisher] pipeline/module.go:97 Beat name: metricbeat
2019-06-24T21:13:33.694Z INFO instance/beat.go:391 metricbeat start running.
2019-06-24T21:13:33.694Z INFO cfgfile/reload.go:150 Config reloader started
2019-06-24T21:13:33.694Z INFO [monitoring] log/log.go:117 Starting metrics logging every 30s
[...]
2019-06-24T21:13:43.696Z INFO filesystem/filesystem.go:57 Ignoring filesystem types: sysfs, rootfs, ramfs, bdev, proc, cpuset, cgroup, cgroup2, tmpfs, devtmpfs, configfs, debugfs, tracefs, securityfs, sockfs, dax, bpf, pipefs, hugetlbfs, devpts, ecryptfs, fuse, fusectl, pstore, mqueue, autofs
2019-06-24T21:13:43.696Z INFO fsstat/fsstat.go:59 Ignoring filesystem types: sysfs, rootfs, ramfs, bdev, proc, cpuset, cgroup, cgroup2, tmpfs, devtmpfs, configfs, debugfs, tracefs, securityfs, sockfs, dax, bpf, pipefs, hugetlbfs, devpts, ecryptfs, fuse, fusectl, pstore, mqueue, autofs
2019-06-24T21:13:44.696Z INFO pipeline/output.go:95 Connecting to backoff(async(tcp://xxx))
2019-06-24T21:13:44.711Z INFO pipeline/output.go:105 Connection to backoff(async(tcp://xxx)) established
2019-06-24T21:14:03.696Z INFO [monitoring] log/log.go:144 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":130,"time":{"ms":131}},"total":{"ticks":1960,"time":{"ms":1965},"value":1960},"user":{"ticks":1830,"time":{"ms":1834}}},"handles":{"limit":{"hard":1048576,"soft":1024},"open":12},"info":{"ephemeral_id":"xxx","uptime":{"ms":30030}},"memstats":{"gc_next":30689808,"memory_alloc":21580680,"memory_total":428076400,"rss":79917056}},"libbeat":{"config":{"module":{"running":0},"reloads":2},"output":{"events":{"acked":7825,"batches":11,"total":7825},"read":{"bytes":66},"type":"logstash","write":{"bytes":870352}},"pipeline":{"clients":4,"events":{"active":313,"published":8138,"retry":523,"total":8138},"queue":{"acked":7825}}},"metricbeat":{"rabbitmq":{"connection":{"events":2987,"failures":10,"success":2977},"exchange":{"events":1970,"success":1970},"node":{"events":10,"success":10},"queue":{"events":3130,"failures":10,"success":3120}},"system":{"cpu":{"events":2,"success":2},"filesystem":{"events":7,"success":7},"fsstat":{"events":1,"success":1},"load":{"events":2,"success":2},"memory":{"events":2,"success":2},"network":{"events":4,"success":4},"process":{"events":18,"success":18},"process_summary":{"events":2,"success":2},"socket_summary":{"events":2,"success":2},"uptime":{"events":1,"success":1}}},"system":{"cpu":{"cores":4},"load":{"1":0.48,"15":0.28,"5":0.15,"norm":{"1":0.12,"15":0.07,"5":0.0375}}}}}}
I think if there were a problem fetching the queues, I would see an error in the logs above, as per https://github.com/elastic/beats/blob/master/metricbeat/module/rabbitmq/queue/data.go#L94-L104
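One possible way to narrow down whether the gap happens before or after the output (a rough sketch, not from the original post: it temporarily swaps the Logstash output for the console output, runs for a bounded time, and assumes jq is installed):

# Count the distinct queue names Metricbeat itself publishes in one 30s run
timeout 30 metricbeat -e \
  -E output.logstash.enabled=false \
  -E output.console.enabled=true \
  | jq -r 'select(.rabbitmq.queue.name != null) | .rabbitmq.queue.name' \
  | sort -u | wc -l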
Here's the metricbeat.yml:
metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true
  reload.period: 10s

setup.template.settings:
  index.number_of_shards: 1
  index.codec: best_compression

name: metricbeat

fields:
  environment: development

processors:
  - add_cloud_metadata: ~

output.logstash:
  hosts: ["xxx"]
Here's the modules.d/rabbitmq.yml:
- module: rabbitmq
  metricsets: ["node", "queue", "connection", "exchange"]
  enabled: true
  period: 2s
  hosts: ["xxx"]
  username: xxx
  password: xxx

I solved it by upgrading Elastic Stack from 7.1.1 to 7.2.0.

Related

Error in Filebeat logs - not able to view data in kibana

I recently upgraded Filebeat to 7.17.7. I am using Elasticsearch, Kibana and Filebeat, all on 7.17.7. However, I am not able to see the logs in Kibana, because Filebeat is not sending the logs to Elasticsearch. In the Filebeat log I saw this error:
ERROR [publisher_pipeline_output] pipeline/output.go:154 Failed to connect to backoff(elasticsearch(http://localhost:9200)): Connection marked as failed because the onConnect callback failed: resource 'filebeat-7.17.7' exists, but it is not an alias
Can someone help figure out the cause of and solution to this error?
I restarted Filebeat, but it didn't help.
Filebeat config -
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/www/vhosts/rshop/current/var/log/*.log
    multiline.pattern: ^\[[0-9]{4}-[0-9]{2}-[0-9]{2}
    multiline.negate: true
    multiline.match: after

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.template.settings:
  index.number_of_shards: 3

setup.ilm.enabled: false

setup.kibana:

output.elasticsearch:
  hosts: ["localhost:9200"]
  indices:
    - index: "r-logs-%{[agent.version]}-%{+yyyy.MM.dd}"
      when.regexp:
        log.file.path: '^.+\/var\/log\/recalculation\.log$'
  pipelines:
    - pipeline: "filebeat-6.8.7-monolog-pipeline"
      when.or:
        - regexp:
            log.file.path: '^.+\/var\/log\/recalculation\.log$'

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0755
@NKumar, most likely it's an upgrade issue from legacy to new index templates, which will happen if you don't mark them as true for overwriting.
Can you share which version of the stack you upgraded to 7.17 from?
Also, the quick solution would be to just add an alias to your filebeat index as:
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "filebeat-7.17.7",
        "alias": "filebeat-7.17.7_1",
        "is_write_index": true
      }
    }
  ]
}
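It may also help to confirm what actually exists under that name before and after the change (a sketch; the host is a placeholder):

# Is filebeat-7.17.7 a concrete index, and which aliases point at filebeat indices?
curl -s 'http://localhost:9200/_cat/indices/filebeat-7.17.7*?v'
curl -s 'http://localhost:9200/_cat/aliases/filebeat-*?v'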
Or, a more persistent solution would be to add the following settings in Filebeat:
setup.template.settings:
setup.template.enabled: true
setup.template.overwrite: true
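With those settings in place, re-running Filebeat's setup phase should push the template and ILM write alias again before new data arrives (a sketch; the -E host override is only needed if Elasticsearch is not at the address in the config):

# Re-apply index templates, ILM policy and write alias
filebeat setup --index-management -E 'output.elasticsearch.hosts=["localhost:9200"]'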

Trying to verify ELK installation, Kibana dashboard not showing Filebeat logs in Discover tab

I used helm to load the ELK stack on kubernetes.
I ran the following commands
minikube start --cpus 4 --memory 8192
minikube addons enable ingress
helm repo add elastic https://helm.elastic.co
helm repo update
Then deployed elasticsearch
values-02.yml
replicas: 1
minimumMasterNodes: 1

ingress:
  enabled: true
  hosts:
    - host: es-elk.s9.devopscloud.link # Change the hostname to the one you need
      paths:
        - path: /

volumeClaimTemplate:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
Applied it
helm install elk-elasticsearch elastic/elasticsearch -f values-02.yml
Then deployed kibana values-03.yml
elasticsearchHosts: "http://elasticsearch-master:9200"

ingress:
  enabled: true
  className: "nginx"
  hosts:
    - host:
      paths:
        - path: /
Applied it
helm install elk-kibana elastic/kibana -f values-03.yml
Then deployed logstash
persistence:
  enabled: true

logstashConfig:
  logstash.yml: |
    http.host: 0.0.0.0
    xpack.monitoring.enabled: false

logstashPipeline:
  logstash.conf: |
    input {
      beats {
        port => 5044
      }
    }
    output {
      elasticsearch {
        hosts => "http://elasticsearch-master.logging.svc.cluster.local:9200"
        manage_template => false
        index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
        document_type => "%{[@metadata][type]}"
      }
    }

service:
  type: ClusterIP
  ports:
    - name: beats
      port: 5044
      protocol: TCP
      targetPort: 5044
    - name: http
      port: 8080
      protocol: TCP
      targetPort: 8080
Applied it
helm install elk-logstash elastic/logstash -f values-04.yaml
Then deployed filebeat values-05.yaml
daemonset:
  filebeatConfig:
    filebeat.yml: |
      filebeat.inputs:
        - type: container
          paths:
            - /var/log/containers/*.log
          processors:
            - add_kubernetes_metadata:
                host: ${NODE_NAME}
                matchers:
                  - logs_path:
                      logs_path: "/var/log/containers/"
      output.logstash:
        hosts: ["elk-logstash-logstash:5044"]
Then applied it
helm install elk-filebeat elastic/filebeat -f values-05.yaml
All up and running
kubectl get pods
NAME READY STATUS RESTARTS AGE
elasticsearch-master-0 1/1 Running 0 61m
elk-filebeat-filebeat-ggjhc 1/1 Running 0 45m
elk-kibana-kibana-6d658894bf-grb8x 1/1 Running 0 52m
elk-logstash-logstash-0 1/1 Running 0 47m
But when I go onto the discover page
http://172.21.95.140/app/management/kibana/indexPatterns?bannerMessage=To%20visualize%20and%20explore%20data%20in%20Kibana,%20you%20must%20create%20an%20index%20pattern%20to%20retrieve%20data%20from%20Elasticsearch.
It does not show anything for Filebeat.
Instead, I get a "Ready to try Kibana? First, you need data." message.
I was following this tutorial
https://blog.knoldus.com/how-to-deploy-elk-stack-on-kubernetes/#deploy-elastic-search
I followed this tutorial and ran the default kibana and filebeat yaml files.
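If Discover stays empty, it can also help to check each hop of the pipeline directly before touching Kibana (a hedged sketch; pod and release names follow the commands above):

# Did any indices get created in Elasticsearch at all?
kubectl exec elasticsearch-master-0 -- curl -s 'http://localhost:9200/_cat/indices?v'

# Is Filebeat shipping, and is Logstash receiving?
kubectl logs elk-filebeat-filebeat-ggjhc --tail=50
kubectl logs elk-logstash-logstash-0 --tail=50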

Greenplum Operator on kubernetes zapr error

I am trying to deploy Greenplum Operator on kubernetes and I get the following error:
kubectl describe pod greenplum-operator-87d989b4d-ldft6:

Name:         greenplum-operator-87d989b4d-ldft6
Namespace:    greenplum
Priority:     0
Node:         node-1/some-ip
Start Time:   Mon, 23 May 2022 14:07:26 +0200
Labels:       app=greenplum-operator
              pod-template-hash=87d989b4d
Annotations:  cni.projectcalico.org/podIP: some-ip
              cni.projectcalico.org/podIPs: some-ip
Status:       Running
IP:           some-ip
IPs:
  IP:  some-ip
Controlled By:  ReplicaSet/greenplum-operator-87d989b4d
Containers:
  greenplum-operator:
    Container ID:  docker://364997050b1f337ff61b8ce40534697bbc13aae29f7b9ae5255245375acce03f
    Image:         greenplum-operator:v2.3.0
    Image ID:      docker-pullable://greenplum-operator:v2.3.0
    Port:          <none>
    Host Port:     <none>
    Command:
      greenplum-operator
      --logLevel
      debug
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 23 May 2022 15:29:59 +0200
      Finished:     Mon, 23 May 2022 15:30:32 +0200
    Ready:          False
    Restart Count:  19
    Environment:
      GREENPLUM_IMAGE_REPO:  greenplum-operator:v2.3.0
      GREENPLUM_IMAGE_TAG:   v2.3.0
      OPERATOR_IMAGE_REPO:   greenplum-operator:v2.3.0
      OPERATOR_IMAGE_TAG:    v2.3.0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from greenplum-system-operator-token-xcz4q (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  greenplum-system-operator-token-xcz4q:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  greenplum-system-operator-token-xcz4q
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                  From     Message
  ----     ------   ----                 ----     -------
  Warning  BackOff  32s (x340 over 84m)  kubelet  Back-off restarting failed container
kubectl logs greenplum-operator-87d989b4d-ldft6
{"level":"INFO","ts":"2022-05-23T13:35:38.735Z","logger":"setup","msg":"Go Info","Version":"go1.14.10","GOOS":"linux","GOARCH":"amd64"}
{"level":"INFO","ts":"2022-05-23T13:35:41.242Z","logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"INFO","ts":"2022-05-23T13:35:41.262Z","logger":"setup","msg":"starting manager"}
{"level":"INFO","ts":"2022-05-23T13:35:41.262Z","logger":"admission","msg":"starting greenplum validating admission webhook server"}
{"level":"INFO","ts":"2022-05-23T13:35:41.262Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumpxfservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.264Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumplservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.264Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumcluster","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.262Z","logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"INFO","ts":"2022-05-23T13:35:41.265Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumtextservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.361Z","logger":"admission","msg":"CertificateSigningRequest: created"}
{"level":"INFO","ts":"2022-05-23T13:35:41.363Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumpxfservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.364Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumplservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.364Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumcluster","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.366Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumtextservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.464Z","logger":"controller-runtime.controller","msg":"Starting Controller","controller":"greenplumpxfservice"}
{"level":"INFO","ts":"2022-05-23T13:35:41.464Z","logger":"controller-runtime.controller","msg":"Starting Controller","controller":"greenplumplservice"}
{"level":"INFO","ts":"2022-05-23T13:35:41.465Z","logger":"controller-runtime.controller","msg":"Starting workers","controller":"greenplumplservice","worker count":1}
{"level":"INFO","ts":"2022-05-23T13:35:41.465Z","logger":"controller-runtime.controller","msg":"Starting Controller","controller":"greenplumcluster"}
{"level":"INFO","ts":"2022-05-23T13:35:41.465Z","logger":"controller-runtime.controller","msg":"Starting workers","controller":"greenplumpxfservice","worker count":1}
{"level":"INFO","ts":"2022-05-23T13:35:41.465Z","logger":"controller-runtime.controller","msg":"Starting workers","controller":"greenplumcluster","worker count":1}
{"level":"INFO","ts":"2022-05-23T13:35:41.466Z","logger":"controller-runtime.controller","msg":"Starting Controller","controller":"greenplumtextservice"}
{"level":"INFO","ts":"2022-05-23T13:35:41.466Z","logger":"controller-runtime.controller","msg":"Starting workers","controller":"greenplumtextservice","worker count":1}
{"level":"ERROR","ts":"2022-05-23T13:36:11.368Z","logger":"setup","msg":"error","error":"getting certificate for webhook: failure while waiting for approval: timed out waiting for the condition","errorCauses":[{"error":"getting certificate for webhook: failure while waiting for approval: timed out waiting for the condition"}],"stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr#v0.1.0/zapr.go:128\nmain.main\n\t/greenplum-for-kubernetes/greenplum-operator/cmd/greenplumOperator/main.go:35\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
I tried redeploying cert-manager and checking its logs, but couldn't find anything. The greenplum-for-kubernetes documentation doesn't mention anything about this, and I also read the whole troubleshooting document on the Pivotal website.
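Since the error is the operator timing out while waiting for its webhook certificate to be approved, one thing worth checking (a sketch, not from the original post) is whether a CertificateSigningRequest is stuck in Pending and, if so, approving it by hand:

# Look for a pending CSR created by the operator's webhook setup
kubectl get csr

# Approve it manually (the CSR name below is a placeholder)
kubectl certificate approve <csr-name>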

K8s Elasticsearch with Filebeat stays 'not ready' after rebooting

I'm running into a situation I can't make sense of.
Environment
Two dedicated nodes with azure centos 8.2 (2vcpu, 16G ram), not AKS
1 master node, 1 worker node.
kubernetes v1.19.3
helm v2.16.12
Helm charts Elastic (https://github.com/elastic/helm-charts/tree/7.9.3)
At first, it works fine with the installation below.
## elasticsearch, filebeat
# kubectl apply -f pv.yaml
# helm install -f values.yaml --name elasticsearch elastic/elasticsearch
# helm install --name filebeat --version 7.9.3 elastic/filebeat
curl elasitcsearchip:9200 and curl elasitcsearchip:9200/_cat/indices
show right values.
but after rebooting the worker node, it just stays at READY 0/1 and does not work.
NAME READY STATUS RESTARTS AGE
elasticsearch-master-0 0/1 Running 10 71m
filebeat-filebeat-67qm2 0/1 Running 4 40m
In this situation, after removing /mnt/data/nodes and rebooting again, it works fine.
The elasticsearch pod shows nothing special, I think.
#describe
{"type": "server", "timestamp": "2020-10-26T07:49:49,708Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-0", "message": "Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[filebeat-7.9.3-2020.10.26-000001][0]]]).", "cluster.uuid": "sWUAXJG9QaKyZDe0BLqwSw", "node.id": "ztb35hToRf-2Ahr7olympw" }
#logs
Normal SandboxChanged 4m4s (x3 over 4m9s) kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 4m3s kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
Normal Created 4m1s kubelet Created container configure-sysctl
Normal Started 4m1s kubelet Started container configure-sysctl
Normal Pulled 3m58s kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
Normal Created 3m58s kubelet Created container elasticsearch
Normal Started 3m57s kubelet Started container elasticsearch
Warning Unhealthy 91s (x14 over 3m42s) kubelet Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
#events
6m1s Normal Pulled pod/elasticsearch-master-0 Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
6m1s Normal Pulled pod/filebeat-filebeat-67qm2 Container image "docker.elastic.co/beats/filebeat:7.9.3" already present on machine
5m59s Normal Started pod/elasticsearch-master-0 Started container configure-sysctl
5m59s Normal Created pod/elasticsearch-master-0 Created container configure-sysctl
5m59s Normal Created pod/filebeat-filebeat-67qm2 Created container filebeat
5m58s Normal Started pod/filebeat-filebeat-67qm2 Started container filebeat
5m56s Normal Created pod/elasticsearch-master-0 Created container elasticsearch
5m56s Normal Pulled pod/elasticsearch-master-0 Container image "docker.elastic.co/elasticsearch/elasticsearch:7.9.3" already present on machine
5m55s Normal Started pod/elasticsearch-master-0 Started container elasticsearch
61s Warning Unhealthy pod/filebeat-filebeat-67qm2 Readiness probe failed: elasticsearch: http://elasticsearch-master:9200...
parse url... OK
connection...
parse host... OK
dns lookup... OK
addresses: 10.97.133.135
dial up... ERROR dial tcp 10.97.133.135:9200: connect: connection refused
59s Warning Unhealthy pod/elasticsearch-master-0 Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
The /mnt/data path is chowned 1000:1000,
and in the case of Elasticsearch only, without Filebeat, rebooting causes no problem.
I can't figure this out at all. :(
What am I missing?
pv.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: elastic-pv
  labels:
    type: local
    app: elastic
spec:
  storageClassName: local-storage
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  claimRef:
    namespace: default
    name: elasticsearch-master-elasticsearch-master-0
  hostPath:
    path: "/mnt/data"
values.yaml
---
clusterName: "elasticsearch"
nodeGroup: "master"
# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: ""
# Elasticsearch roles that will be applied to this nodeGroup
# These will be set as environment variables. E.g. node.master=true
roles:
  master: "true"
  ingest: "true"
  data: "true"
replicas: 1
minimumMasterNodes: 1
esMajorVersion: ""
# Allows you to add any config files in /usr/share/elasticsearch/config/
# such as elasticsearch.yml and log4j2.properties
esConfig: {}
# elasticsearch.yml: |
# key:
# nestedkey: value
# log4j2.properties: |
# key = value
# Extra environment variables to append to this nodeGroup
# This will be appended to the current 'env:' key. You can use any of the kubernetes env
# syntax here
extraEnvs: []
# - name: MY_ENVIRONMENT_VAR
# value: the_value_goes_here
# Allows you to load environment variables from kubernetes secret or config map
envFrom: []
# - secretRef:
# name: env-secret
# - configMapRef:
# name: config-map
# A list of secrets and their paths to mount inside the pod
# This is useful for mounting certificates for security and for mounting
# the X-Pack license
secretMounts: []
# - name: elastic-certificates
# secretName: elastic-certificates
# path: /usr/share/elasticsearch/config/certs
# defaultMode: 0755
image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.9.3"
imagePullPolicy: "IfNotPresent"
podAnnotations: {}
# iam.amazonaws.com/role: es-cluster
# additionals labels
labels: {}
esJavaOpts: "-Xmx1g -Xms1g"
resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "1000m"
    memory: "2Gi"
initResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
sidecarResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
networkHost: "0.0.0.0"
volumeClaimTemplate:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: local-storage
  resources:
    requests:
      storage: 5Gi
rbac:
  create: false
  serviceAccountAnnotations: {}
  serviceAccountName: ""

podSecurityPolicy:
  create: false
  name: ""
  spec:
    privileged: true
    fsGroup:
      rule: RunAsAny
    runAsUser:
      rule: RunAsAny
    seLinux:
      rule: RunAsAny
    supplementalGroups:
      rule: RunAsAny
    volumes:
      - secret
      - configMap
      - persistentVolumeClaim
persistence:
  enabled: true
  name: elastic-vc
  labels:
    # Add default labels for the volumeClaimTemplate of the StatefulSet
    app: elastic
  annotations: {}
extraVolumes: []
# - name: extras
# emptyDir: {}
extraVolumeMounts: []
# - name: extras
# mountPath: /usr/share/extras
# readOnly: true
extraContainers: []
# - name: do-something
# image: busybox
# command: ['do', 'something']
extraInitContainers: []
# - name: do-something
# image: busybox
# command: ['do', 'something']
# This is the PriorityClass settings as defined in
# https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""
# By default this will make sure two pods don't end up on the same node
# Changing this to a region would allow you to spread pods across regions
antiAffinityTopologyKey: "kubernetes.io/hostname"
# Hard means that by default pods will only be scheduled if there are enough nodes for them
# and that they will never end up on the same node. Setting this to soft will do this "best effort"
antiAffinity: "hard"
# This is the node affinity settings as defined in
# https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
nodeAffinity: {}
# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"
# The environment variables injected by service links are not used, but can lead to slow Elasticsearch boot times when
# there are many services in the current namespace.
# If you experience slow pod startups you probably want to set this to `false`.
enableServiceLinks: true
protocol: http
httpPort: 9200
transportPort: 9300
service:
  labels: {}
  labelsHeadless: {}
  type: ClusterIP
  nodePort: ""
  annotations: {}
  httpPortName: http
  transportPortName: transport
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  externalTrafficPolicy: ""
updateStrategy: RollingUpdate
# This is the max unavailable setting for the pod disruption budget
# The default value of 1 will make sure that kubernetes won't allow more than 1
# of your pods to be unavailable during maintenance
maxUnavailable: 1
podSecurityContext:
  fsGroup: 1000
  runAsUser: 1000

securityContext:
  capabilities:
    drop:
      - ALL
  # readOnlyRootFilesystem: false
  runAsNonRoot: true
  runAsUser: 1000
# How long to wait for elasticsearch to stop gracefully
terminationGracePeriod: 120
sysctlVmMaxMapCount: 262144
readinessProbe:
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 3
  timeoutSeconds: 5
# https://www.elastic.co/guide/en/elasticsearch/reference/7.9/cluster-health.html#request-params wait_for_status
clusterHealthCheckParams: "wait_for_status=green&timeout=1s"
## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""
imagePullSecrets: []
nodeSelector: {}
tolerations: []
# - effect: NoSchedule
# key: node-role.kubernetes.io/master
# Enabling this will publically expose your Elasticsearch instance.
# Only enable this if you have security enabled on your cluster
ingress:
  enabled: false
  annotations: {}
  # kubernetes.io/ingress.class: nginx
  # kubernetes.io/tls-acme: "true"
  path: /
  hosts:
    - chart-example.local
  tls: []
  # - secretName: chart-example-tls
  #   hosts:
  #     - chart-example.local
nameOverride: ""
fullnameOverride: ""
# https://github.com/elastic/helm-charts/issues/63
masterTerminationFix: false
lifecycle: {}
# preStop:
# exec:
# command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
# postStart:
# exec:
# command:
# - bash
# - -c
# - |
# #!/bin/bash
# # Add a template to adjust number of shards/replicas
# TEMPLATE_NAME=my_template
# INDEX_PATTERN="logstash-*"
# SHARD_COUNT=8
# REPLICA_COUNT=1
# ES_URL=http://localhost:9200
# while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
# curl -XPUT "$ES_URL/_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN"\"'],"settings":{"number_of_shards":'$SHARD_COUNT',"number_of_replicas":'$REPLICA_COUNT'}}'
sysctlInitContainer:
  enabled: true
keystore: []
# Deprecated
# please use the above podSecurityContext.fsGroup instead
fsGroup: ""
Issue
There is an issue with the Elasticsearch readiness probe when running a single-replica cluster.
Warning Unhealthy 91s (x14 over 3m42s) kubelet Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
Solution
As mentioned here by @adinhodovic:
If you're running a single-replica cluster, add the following Helm value:
clusterHealthCheckParams: "wait_for_status=yellow&timeout=1s"
Your status will never go green with a single replica cluster.
The following values should work:
replicas: 1
minimumMasterNodes: 1
clusterHealthCheckParams: 'wait_for_status=yellow&timeout=1s'
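A minimal way to apply the change to the existing release and confirm the cluster now reports at least yellow (a sketch; the release name matches the install command above, and the host is a placeholder):

# Upgrade the release with the adjusted values, then check cluster health
helm upgrade elasticsearch elastic/elasticsearch -f values.yaml
curl -s 'http://<elasticsearch-ip>:9200/_cluster/health?pretty'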

ElastAlert filter not working as expected

I have installed ElastAlert.
Below are my config file and rule YAML file.
Config file:
rules_folder: rules
run_every:
  minutes: 15
buffer_time:
  minutes: 15
es_host: ip_address(#####)
es_port: 9200
writeback_index: elastalert_status
writeback_alias: elastalert_alerts
alert_time_limit:
  days: 2

logging:
  version: 1
  incremental: false
  disable_existing_loggers: false
  formatters:
    logline:
      format: '%(asctime)s %(levelname)+8s %(name)+20s %(message)s'
  handlers:
    console:
      class: logging.StreamHandler
      formatter: logline
      level: DEBUG
      stream: ext://sys.stderr
    file:
      class: logging.FileHandler
      formatter: logline
      level: DEBUG
      filename: elastalert.log
  loggers:
    elastalert:
      level: WARN
      handlers: []
      propagate: true
    elasticsearch:
      level: WARN
      handlers: []
      propagate: true
example_frequency.yaml file:
es_host: ip_address(####)
es_port: 9200
name: FaultExceptions
type: frequency
index: logstash_*
num_events: 5
timeframe:
  minutes: 15
filter:
  - query:
      query_string:
        query: "ErrorGroup: Fault Exception"
alert:
  - "email"
email:
  - "abc@gmail.com"
I am getting the mail every 15 minutes, but the data does not match the filter, where the ErrorGroup name should be Fault Exception.
Please help me understand this, as I have been working on it for the last 4 days. Thanks in advance.
Hope it's not too late, but yes, use the --es_debug_trace command line option. It lets you see the exact query being sent, as a curl command:
python3 -m elastalert.elastalert --verbose --rule your_rule_to_test.yaml --es_debug_trace /tmp/elastalert_curl.log
The curl command written to /tmp/elastalert_curl.log can then be run in a terminal to see the output, or tweaked to see what went wrong. You can also paste the query into Kibana Dev Tools to test it. Also confirm that ErrorGroup is at the root level of the documents in the index, and try ErrorGroup.keyword.
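For example, the same filter can be replayed by hand against the index the rule searches, once as written and once against the keyword sub-field (a sketch; the host is a placeholder and the field mapping is an assumption):

# The rule's query_string as written
curl -s -H 'Content-Type: application/json' 'http://<es_host>:9200/logstash_*/_search?size=1' -d '
{ "query": { "query_string": { "query": "ErrorGroup: Fault Exception" } } }'

# Exact match on the keyword sub-field, if ErrorGroup is an analyzed text field
curl -s -H 'Content-Type: application/json' 'http://<es_host>:9200/logstash_*/_search?size=1' -d '
{ "query": { "term": { "ErrorGroup.keyword": "Fault Exception" } } }'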
