Metricbeat - Not creating any logfile - elasticsearch

I am trying to set up Metricbeat on my CentOS 7 host. I have explicitly set the log file location for Metricbeat and the logging level to debug, but I don't see a log file being created. I can see the logs in journalctl. Please let me know why the log file is not being created. The same settings work with Filebeat, and its log file gets created.
Metricbeat version:
root@example.domain.com:/usr/share/metricbeat# metricbeat version
metricbeat version 7.2.0 (amd64), libbeat 7.2.0 [9ba65d864ca37cd32c25b980dbb4020975288fc0 built 2019-06-20 15:07:31 +0000 UTC]
Metricbeat config file:
/etc/metricbeat/metricbeat.yml
metricbeat:
  config:
    modules:
      path: /etc/metricbeat/modules.d/*.yml
      reload.enabled: true
      reload.period: 10s
output.logstash:
  hosts: ['logstash.domain.com:5158']
  worker: 1
  compression_level: 3
  loadbalance: true
  ssl:
    certificate: /usr/share/metricbeat/metricbeat.crt
    key: /usr/share/metricbeat/metricbeat.key
    verification_mode: none
logging:
  level: debug
  to_files: true
  files:
    path: /var/myapp/log/metricbeat
    name: metricbeat.log
    rotateeverybytes: 10485760
    keepfiles: 7
Ideally it should create metricbeat.log under /var/myapp/log/metricbeat, but I don't see any files being created.
Journalctl output:
* metricbeat.service - Metricbeat is a lightweight shipper for metrics.
Loaded: loaded (/usr/lib/systemd/system/metricbeat.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2022-01-24 08:51:13 PST; 39min ago
Docs: https://www.elastic.co/products/beats/metricbeat
Main PID: 13520 (metricbeat)
CGroup: /system.slice/metricbeat.service
`-13520 /usr/share/metricbeat/bin/metricbeat -e -c /etc/metricbeat/metricbeat.yml -path.home /usr/share/metricbeat -path.config /etc/metricbeat -path.data /var/lib/metricbeat -path.logs /var/log/metricbeat
Jan 24 09:30:14 example.domain.com metricbeat[13520]: "/var/lib/metricbeat",
Jan 24 09:30:14 example.domain.com metricbeat[13520]: "-path.logs",
Jan 24 09:30:14 example.domain.com metricbeat[13520]: "/var/log/metricbeat"
Jan 24 09:30:14 example.domain.com metricbeat[13520]: ]
Jan 24 09:30:14 example.domain.com metricbeat[13520]: },
Jan 24 09:30:14 example.domain.com metricbeat[13520]: "user": {
Jan 24 09:30:14 example.domain.com metricbeat[13520]: "name": "root"
Jan 24 09:30:14 example.domain.com metricbeat[13520]: },
Jan 24 09:30:14 example.domain.com metricbeat[13520]: "event": {
Jan 24 09:30:14 example.domain.com metricbeat[13520]: "module": "system",
I don't see anything in the /var/log/metricbeat directory either.
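One thing worth checking (a suggestion, not part of the original post): the systemd unit above starts Metricbeat with the -e flag, which sends logs to stderr (which is why they show up in journalctl) and, in some Beats versions, takes precedence over the file output configured under logging. Running the same command in the foreground without -e shows whether the logging section itself is honoured; make sure the target directory exists first:
sudo mkdir -p /var/myapp/log/metricbeat
# confirm the config parses cleanly
sudo metricbeat test config -c /etc/metricbeat/metricbeat.yml
# run in the foreground WITHOUT -e so stderr logging does not take over
sudo /usr/share/metricbeat/bin/metricbeat -c /etc/metricbeat/metricbeat.yml \
  -path.home /usr/share/metricbeat -path.config /etc/metricbeat \
  -path.data /var/lib/metricbeat -path.logs /var/log/metricbeat
# in a second shell, check whether the file appears
ls -l /var/myapp/log/metricbeat/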
UPDATE: I tried versions 6.3 and 7.16 and both work fine, so this looks like an issue specific to 7.2.

Related

Sonarqube plugin installation using netrc creds

We are installing Sonarqube as a self-managed service via the Helm charts at https://SonarSource.github.io/helm-chart-sonarqube. The Sonarqube instance was working fine, but we made a change to use netrc-type credentials to download plugins from JFrog Artifactory, after which our pods started failing.
Log details are below:
bash-3.2$ kubectl logs sonarqube-sonarqube-0 install-plugins -n sonarqube
sh: /opt/sonarqube/extensions/downloads/sonar-pmd-plugin-3.3.1.jar: unknown operand
curl: (22) The requested URL returned error: 403
bash-3.2$ kubectl exec sonarqube-sonarqube-0 -n sonarqube -- ls /opt/sonarqube/extensions/download
Defaulted container "sonarqube" out of: sonarqube, init-sysctl (init), concat-properties (init), inject-prometheus-exporter (init), init-fs (init), install-plugins (init)
error: unable to upgrade connection: container not found ("sonarqube")
NAME READY STATUS RESTARTS AGE
sonarqube-sonarqube-0 0/1 Init:CrashLoopBackOff 525 44h
Name: sonarqube-sonarqube-0
Namespace: sonarqube
Priority: 0
Node: ip-10-110-198-195.eu-west-1.compute.internal/10.110.198.195
Start Time: Sat, 10 Sep 2022 13:57:31 +0200
Labels: app=sonarqube
controller-revision-hash=sonarqube-sonarqube-6d6c785f6f
release=sonarqube
statefulset.kubernetes.io/pod-name=sonarqube-sonarqube-0
Annotations: checksum/config: 823d389fbc2ce9b41133d9542232fb023520659597f5473b44f9c0a870c2c6a7
checksum/init-fs: ad6cbc139b1960af56d3e813d56eb450949be388fa84686c48265d32e68cb895
checksum/init-sysctl: 3fc2c9dee4de70eed6b8b0b7112095ccbf69694166ee05c3e59ccfc7571461aa
checksum/plugins: 649c5fdb8f1b2f07b1999a8d5f7e56f9ae65d05e25d537fcdfc7e1c5ff6c9103
checksum/prometheus-ce-config: b2643e1c7fd0d26ede75ee98c7e646dfcb9255b1f73d1c51616dc3972499bb08
checksum/prometheus-config: 3f1303040aa8c859addcf37c7b82e376b3d90adcdc0b209fa251ca72ec9bee7e
checksum/secret: 7b9cfd0db7ecd7dc34ee86567e5bc93601ccca66047d3452801b6222fd44df84
kubernetes.io/psp: eks.privileged
Status: Pending
IP: 10.110.202.249
IPs:
IP: 10.110.202.249
Controlled By: StatefulSet/sonarqube-sonarqube
Init Containers:
init-sysctl:
Container ID: docker://3e66f63924be5c251a46cf054107951f5056f23a096b2f6c8c31b77842e0f29d
Image: leaseplan.jfrog.io/docker-hub/busybox:latest
Image ID: docker-pullable://leaseplan.jfrog.io/docker-hub/busybox@sha256:20142e89dab967c01765b0aea3be4cec3a5957cc330f061e5503ef6168ae6613
Port: <none>
Host Port: <none>
Command:
sh
-e
/tmp/scripts/init_sysctl.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 10 Sep 2022 13:57:42 +0200
Finished: Sat, 10 Sep 2022 13:57:42 +0200
Ready: True
Restart Count: 0
Limits:
cpu: 50m
memory: 128Mi
Requests:
cpu: 20m
memory: 64Mi
Environment: <none>
Mounts:
/tmp/scripts/ from init-sysctl (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n89wf (ro)
concat-properties:
Container ID: docker://b04f51eaa84bf4198437c7a782e0d186ea93337ac91cc6dae862b836fc6ef6a9
Image: leaseplan.jfrog.io/docker-hub/busybox:latest
Image ID: docker-pullable://leaseplan.jfrog.io/docker-hub/busybox@sha256:20142e89dab967c01765b0aea3be4cec3a5957cc330f061e5503ef6168ae6613
Port: <none>
Host Port: <none>
Command:
sh
-c
#!/bin/sh
if [ -f /tmp/props/sonar.properties ]; then
cat /tmp/props/sonar.properties > /tmp/result/sonar.properties
fi
if [ -f /tmp/props/secret.properties ]; then
cat /tmp/props/secret.properties > /tmp/result/sonar.properties
fi
if [ -f /tmp/props/sonar.properties -a -f /tmp/props/secret.properties ]; then
awk 1 /tmp/props/sonar.properties /tmp/props/secret.properties > /tmp/result/sonar.properties
fi
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 10 Sep 2022 13:57:43 +0200
Finished: Sat, 10 Sep 2022 13:57:43 +0200
Ready: True
Restart Count: 0
Limits:
cpu: 50m
memory: 128Mi
Requests:
cpu: 20m
memory: 64Mi
Environment: <none>
Mounts:
/tmp/props/sonar.properties from config (rw,path="sonar.properties")
/tmp/result from concat-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n89wf (ro)
inject-prometheus-exporter:
Container ID: docker://22d8f7458c95d1d7ad096f2f804cac5fef64b889895274558739f691820786e0
Image: leaseplan.jfrog.io/docker-hub/curlimages/curl:7.76.1
Image ID: docker-pullable://leaseplan.jfrog.io/docker-hub/curlimages/curl@sha256:fa32ef426092b88ee0b569d6f81ab0203ee527692a94ec2e6ceb2fd0b6b2755c
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
Args:
curl -s 'https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.16.0/jmx_prometheus_javaagent-0.16.0.jar' --output /data/jmx_prometheus_javaagent.jar -v
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 10 Sep 2022 13:57:43 +0200
Finished: Sat, 10 Sep 2022 13:57:44 +0200
Ready: True
Restart Count: 0
Limits:
cpu: 50m
memory: 128Mi
Requests:
cpu: 20m
memory: 64Mi
Environment:
http_proxy:
https_proxy:
no_proxy:
Mounts:
/data from sonarqube (rw,path="data")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n89wf (ro)
init-fs:
Container ID: docker://2005fe2dbe2ca4c5150d91955563c9df948864ea65fca9d9bfa397b6f8699410
Image: leaseplan.jfrog.io/docker-hub/busybox:latest
Image ID: docker-pullable://leaseplan.jfrog.io/docker-hub/busybox@sha256:20142e89dab967c01765b0aea3be4cec3a5957cc330f061e5503ef6168ae6613
Port: <none>
Host Port: <none>
Command:
sh
-e
/tmp/scripts/init_fs.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 10 Sep 2022 13:57:44 +0200
Finished: Sat, 10 Sep 2022 13:57:44 +0200
Ready: True
Restart Count: 0
Limits:
cpu: 50m
memory: 128Mi
Requests:
cpu: 20m
memory: 64Mi
Environment: <none>
Mounts:
/opt/sonarqube/data from sonarqube (rw,path="data")
/opt/sonarqube/extensions from sonarqube (rw,path="extensions")
/opt/sonarqube/logs from sonarqube (rw,path="logs")
/opt/sonarqube/temp from sonarqube (rw,path="temp")
/tmp from tmp-dir (rw)
/tmp/scripts/ from init-fs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n89wf (ro)
install-plugins:
Container ID: docker://58a6bed99749e3da7c4818a6f0e0061ac5bced70563020ccc55b4b63ab721125
Image: leaseplan.jfrog.io/docker-hub/curlimages/curl:7.76.1
Image ID: docker-pullable://leaseplan.jfrog.io/docker-hub/curlimages/curl@sha256:fa32ef426092b88ee0b569d6f81ab0203ee527692a94ec2e6ceb2fd0b6b2755c
Port: <none>
Host Port: <none>
Command:
sh
-e
/tmp/scripts/install_plugins.sh
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 22
Started: Mon, 12 Sep 2022 10:53:52 +0200
Finished: Mon, 12 Sep 2022 10:53:56 +0200
Ready: False
Restart Count: 525
Limits:
cpu: 50m
memory: 128Mi
Requests:
cpu: 20m
memory: 64Mi
Environment:
http_proxy:
https_proxy:
no_proxy:
Mounts:
/opt/sonarqube/extensions/downloads from sonarqube (rw,path="extensions/downloads")
/opt/sonarqube/lib/common from sonarqube (rw,path="lib/common")
/root from plugins-netrc-file (rw)
/tmp/scripts/ from install-plugins (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n89wf (ro)
Containers:
sonarqube:
Container ID:
Image: leaseplan.jfrog.io/docker-hub/sonarqube:9.5.0-developer
Image ID:
Ports: 9000/TCP, 8000/TCP, 8001/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Limits:
cpu: 4
memory: 6Gi
Requests:
cpu: 1
memory: 4Gi
Liveness: http-get http://:http/api/system/liveness delay=60s timeout=1s period=30s #success=1 #failure=6
Readiness: exec [sh -c #!/bin/bash
# A Sonarqube container is considered ready if the status is UP, DB_MIGRATION_NEEDED or DB_MIGRATION_RUNNING
# status about migration are added to prevent the node to be kill while sonarqube is upgrading the database.
host="$(hostname -i || echo '127.0.0.1')"
if wget --proxy off -qO- http://${host}:9000/api/system/status | grep -q -e '"status":"UP"' -e '"status":"DB_MIGRATION_NEEDED"' -e '"status":"DB_MIGRATION_RUNNING"'; then
exit 0
fi
exit 1
] delay=60s timeout=1s period=30s #success=1 #failure=6
Startup: http-get http://:http/api/system/status delay=30s timeout=1s period=10s #success=1 #failure=24
Environment Variables from:
sonarqube-sonarqube-jdbc-config ConfigMap Optional: false
Environment:
SONAR_WEB_JAVAOPTS: -javaagent:/opt/sonarqube/data/jmx_prometheus_javaagent.jar=8000:/opt/sonarqube/conf/prometheus-config.yaml
SONAR_CE_JAVAOPTS: -javaagent:/opt/sonarqube/data/jmx_prometheus_javaagent.jar=8001:/opt/sonarqube/conf/prometheus-ce-config.yaml
SONAR_JDBC_PASSWORD: <set to the key 'password' in secret 'sonarqube-database'> Optional: false
SONAR_WEB_SYSTEMPASSCODE: <set to the key 'SONAR_WEB_SYSTEMPASSCODE' in secret 'sonarqube-sonarqube-monitoring-passcode'> Optional: false
Mounts:
/opt/sonarqube/conf/ from concat-dir (rw)
/opt/sonarqube/conf/prometheus-ce-config.yaml from prometheus-ce-config (rw,path="prometheus-ce-config.yaml")
/opt/sonarqube/conf/prometheus-config.yaml from prometheus-config (rw,path="prometheus-config.yaml")
/opt/sonarqube/data from sonarqube (rw,path="data")
/opt/sonarqube/extensions from sonarqube (rw,path="extensions")
/opt/sonarqube/logs from sonarqube (rw,path="logs")
/opt/sonarqube/temp from sonarqube (rw,path="temp")
/tmp from tmp-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n89wf (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: sonarqube-sonarqube-config
Optional: false
plugins-netrc-file:
Type: Secret (a volume populated by a Secret)
SecretName: eks-prv-0001-maven-local-default
Optional: false
init-sysctl:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: sonarqube-sonarqube-init-sysctl
Optional: false
init-fs:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: sonarqube-sonarqube-init-fs
Optional: false
install-plugins:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: sonarqube-sonarqube-install-plugins
Optional: false
prometheus-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: sonarqube-sonarqube-prometheus-config
Optional: false
prometheus-ce-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: sonarqube-sonarqube-prometheus-ce-config
Optional: false
sonarqube:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: sonarqube-sonarqube
ReadOnly: false
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
concat-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-n89wf:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 29m (x521 over 44h) kubelet Container image "leaseplan.jfrog.io/docker-hub/curlimages/curl:7.76.1" already present on machine
Warning BackOff 4m39s (x12224 over 44h) kubelet Back-off restarting failed container
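The exit code 22 matches curl --fail hitting the 403, so the netrc credentials are most likely not being presented to (or not accepted by) Artifactory. A manual check along these lines can help isolate it (a sketch; the repository path and credentials are placeholders, only the leaseplan.jfrog.io host comes from the output above). Note that curl --netrc reads $HOME/.netrc, so the secret key must be named .netrc and $HOME must point at the mount (/root here), which is worth verifying given that the curlimages/curl image does not run as root by default:
# recreate the expected netrc layout (placeholders, not real credentials)
cat > /root/.netrc <<'EOF'
machine leaseplan.jfrog.io
login <artifactory-user>
password <artifactory-api-key-or-token>
EOF
chmod 600 /root/.netrc
# reproduce the plugin download by hand with the same file
HOME=/root curl --fail --location --netrc \
  --output /tmp/sonar-pmd-plugin-3.3.1.jar \
  'https://leaseplan.jfrog.io/artifactory/<repo>/sonar-pmd-plugin-3.3.1.jar'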

Greenplum Operator on kubernetes zapr error

I am trying to deploy Greenplum Operator on kubernetes and I get the following error:
kubectl describe pod greenplum-operator-87d989b4d-ldft6:
Name: greenplum-operator-87d989b4d-ldft6
Namespace: greenplum
Priority: 0
Node: node-1/some-ip
Start Time: Mon, 23 May 2022 14:07:26 +0200
Labels: app=greenplum-operator
pod-template-hash=87d989b4d
Annotations: cni.projectcalico.org/podIP: some-ip
cni.projectcalico.org/podIPs: some-ip
Status: Running
IP: some-ip
IPs:
IP: some-ip
Controlled By: ReplicaSet/greenplum-operator-87d989b4d
Containers:
greenplum-operator:
Container ID: docker://364997050b1f337ff61b8ce40534697bbc13aae29f7b9ae5255245375acce03f
Image: greenplum-operator:v2.3.0
Image ID: docker-pullable://greenplum-operator:v2.3.0
Port: <none>
Host Port: <none>
Command:
greenplum-operator
--logLevel
debug
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 23 May 2022 15:29:59 +0200
Finished: Mon, 23 May 2022 15:30:32 +0200
Ready: False
Restart Count: 19
Environment:
GREENPLUM_IMAGE_REPO: greenplum-operator:v2.3.0
GREENPLUM_IMAGE_TAG: v2.3.0
OPERATOR_IMAGE_REPO: greenplum-operator:v2.3.0
OPERATOR_IMAGE_TAG: v2.3.0
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from greenplum-system-operator-token-xcz4q (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
greenplum-system-operator-token-xcz4q:
Type: Secret (a volume populated by a Secret)
SecretName: greenplum-system-operator-token-xcz4q
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 32s (x340 over 84m) kubelet Back-off restarting failed container
kubectl logs greenplum-operator-87d989b4d-ldft6
{"level":"INFO","ts":"2022-05-23T13:35:38.735Z","logger":"setup","msg":"Go Info","Version":"go1.14.10","GOOS":"linux","GOARCH":"amd64"}
{"level":"INFO","ts":"2022-05-23T13:35:41.242Z","logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"INFO","ts":"2022-05-23T13:35:41.262Z","logger":"setup","msg":"starting manager"}
{"level":"INFO","ts":"2022-05-23T13:35:41.262Z","logger":"admission","msg":"starting greenplum validating admission webhook server"}
{"level":"INFO","ts":"2022-05-23T13:35:41.262Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumpxfservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.264Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumplservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.264Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumcluster","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.262Z","logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"INFO","ts":"2022-05-23T13:35:41.265Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumtextservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.361Z","logger":"admission","msg":"CertificateSigningRequest: created"}
{"level":"INFO","ts":"2022-05-23T13:35:41.363Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumpxfservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.364Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumplservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.364Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumcluster","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.366Z","logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"greenplumtextservice","source":"kind source: /, Kind="}
{"level":"INFO","ts":"2022-05-23T13:35:41.464Z","logger":"controller-runtime.controller","msg":"Starting Controller","controller":"greenplumpxfservice"}
{"level":"INFO","ts":"2022-05-23T13:35:41.464Z","logger":"controller-runtime.controller","msg":"Starting Controller","controller":"greenplumplservice"}
{"level":"INFO","ts":"2022-05-23T13:35:41.465Z","logger":"controller-runtime.controller","msg":"Starting workers","controller":"greenplumplservice","worker count":1}
{"level":"INFO","ts":"2022-05-23T13:35:41.465Z","logger":"controller-runtime.controller","msg":"Starting Controller","controller":"greenplumcluster"}
{"level":"INFO","ts":"2022-05-23T13:35:41.465Z","logger":"controller-runtime.controller","msg":"Starting workers","controller":"greenplumpxfservice","worker count":1}
{"level":"INFO","ts":"2022-05-23T13:35:41.465Z","logger":"controller-runtime.controller","msg":"Starting workers","controller":"greenplumcluster","worker count":1}
{"level":"INFO","ts":"2022-05-23T13:35:41.466Z","logger":"controller-runtime.controller","msg":"Starting Controller","controller":"greenplumtextservice"}
{"level":"INFO","ts":"2022-05-23T13:35:41.466Z","logger":"controller-runtime.controller","msg":"Starting workers","controller":"greenplumtextservice","worker count":1}
{"level":"ERROR","ts":"2022-05-23T13:36:11.368Z","logger":"setup","msg":"error","error":"getting certificate for webhook: failure while waiting for approval: timed out waiting for the condition","errorCauses":[{"error":"getting certificate for webhook: failure while waiting for approval: timed out waiting for the condition"}],"stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr#v0.1.0/zapr.go:128\nmain.main\n\t/greenplum-for-kubernetes/greenplum-operator/cmd/greenplumOperator/main.go:35\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
I tried to redeploy cert-manager and check its logs, but couldn't find anything. The documentation for greenplum-for-kubernetes doesn't mention anything about this, and I have read the whole troubleshooting document on the Pivotal website too.
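The error itself points at the admission webhook's CertificateSigningRequest never being approved ("failure while waiting for approval: timed out waiting for the condition"). A reasonable next step (a sketch, not from the original post; the CSR name is a placeholder) is to check whether a CSR from the operator is stuck in Pending and approve it manually, and to look at the controller that is supposed to sign approved CSRs if the certificate still never appears:
# list CSRs and their condition; look for Pending ones created by the operator
kubectl get csr
# inspect a pending request
kubectl describe csr <csr-name>
# approve it manually to unblock the webhook certificate
kubectl certificate approve <csr-name>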

ElastAlert filter not working as expected

I have installed ElastAlert.
Below are my config file and rule YAML configuration:
Config file:
rules_folder: rules
run_every:
  minutes: 15
buffer_time:
  minutes: 15
es_host: ip_address(#####)
es_port: 9200
writeback_index: elastalert_status
writeback_alias: elastalert_alerts
alert_time_limit:
  days: 2
logging:
  version: 1
  incremental: false
  disable_existing_loggers: false
  formatters:
    logline:
      format: '%(asctime)s %(levelname)+8s %(name)+20s %(message)s'
  handlers:
    console:
      class: logging.StreamHandler
      formatter: logline
      level: DEBUG
      stream: ext://sys.stderr
    file:
      class: logging.FileHandler
      formatter: logline
      level: DEBUG
      filename: elastalert.log
  loggers:
    elastalert:
      level: WARN
      handlers: []
      propagate: true
    elasticsearch:
      level: WARN
      handlers: []
      propagate: true
Example_frequency.yaml file:
es_host: ip_address(####)
es_port: 9200
name: FaultExceptions
type: frequency
index: logstash_*
num_events: 5
timeframe:
  minutes: 15
filter:
- query:
    query_string:
      query: "ErrorGroup: Fault Exception"
alert:
- "email"
email:
- "abc@gmail.com"
I am getting the mail every 15 minutes, but the data does not match the filter, where the ErrorGroup name should be Fault Exception.
Please help me understand this; I have been working on it for the last 4 days. Thanks in advance.
Hope this is not too late, but yes, use the --es_debug_trace command-line option. It lets you see the exact query being sent, as a curl command:
python3 -m elastalert.elastalert --verbose --rule your_rule_to_test.yaml --es_debug_trace /tmp/elastalert_curl.log
The curl command written to /tmp/elastalert_curl.log can then be run in a terminal to see the output, or tweaked to see what went wrong. You can also paste the query into Kibana Dev Tools to check and test it. Also confirm that ErrorGroup is at the root level of the indexed document, and try ErrorGroup.keyword.
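For reference, with query_string the unquoted value in ErrorGroup: Fault Exception is parsed as ErrorGroup:Fault plus a separate search for Exception against the default field, which would match far more documents than intended. Quoting the phrase, or using a term filter on the keyword sub-field, is stricter (a sketch assuming ErrorGroup is mapped with a .keyword sub-field; adjust to your mapping):
filter:
- query:
    query_string:
      query: 'ErrorGroup: "Fault Exception"'
# or, bypassing query-string parsing entirely:
filter:
- term:
    ErrorGroup.keyword: "Fault Exception"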

OpenShift Ansible-based operator hangs inconsistently

I have an Ansible-based operator running within an OpenShift 4.2 cluster.
Most of the time, when I apply the relevant CR, the operator runs perfectly.
Occasionally, though, the operator hangs without reporting any further logs.
The step at which this happens is always the same, but it happens inconsistently, with no other obvious factors involved, and I am not sure how to diagnose it.
Restarting the operator always resolves the issue, but is there anything I could do to diagnose it and prevent it from happening altogether?
- name: allow Pods to reference images in myproject project
  k8s:
    definition:
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: "system:image-puller-{{ meta.name }}"
        namespace: myproject
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: system:image-puller
      subjects:
      - apiGroup: rbac.authorization.k8s.io
        kind: Group
        name: "system:serviceaccounts:{{ meta.name }}"
The operator's logs simply hang right after the above step and right before the following step:
- name: fetch some-secret
  set_fact:
    some_secret: "{{ lookup('k8s', kind='Secret', namespace='myproject', resource_name='some-secret') }}"
The oc describe output is as follows:
oc describe -n openshift-operators pod my-ansible-operator-849b44d6cc-nr5st
Name: my-ansible-operator-849b44d6cc-nr5st
Namespace: openshift-operators
Priority: 0
PriorityClassName: <none>
Node: worker1.openshift.mycompany.com/10.0.8.21
Start Time: Wed, 10 Jun 2020 22:35:45 +0100
Labels: name=my-ansible-operator
pod-template-hash=849b44d6cc
Annotations: k8s.v1.cni.cncf.io/networks-status:
[{
"name": "openshift-sdn",
"interface": "eth0",
"ips": [
"10.254.20.128"
],
"default": true,
"dns": {}
}]
Status: Running
IP: 10.254.20.128
Controlled By: ReplicaSet/my-ansible-operator-849b44d6cc
Containers:
ansible:
Container ID: cri-o://63b86ddef4055be4bcd661a3fcd70d525f9788cb96b7af8dd383ac08ea670047
Image: image-registry.openshift-image-registry.svc:5000/openshift-operators/my-ansible-operator:v0.0.1
Image ID: image-registry.openshift-image-registry.svc:5000/openshift-operators/my-ansible-operator@sha256:fda68898e6fe0c61760fe8c50fd0a55de392e63635c5c8da47fdb081cd126b5a
Port: <none>
Host Port: <none>
Command:
/usr/local/bin/ao-logs
/tmp/ansible-operator/runner
stdout
State: Running
Started: Wed, 10 Jun 2020 22:35:56 +0100
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/tmp/ansible-operator/runner from runner (ro)
/var/run/secrets/kubernetes.io/serviceaccount from my-ansible-operator-token-vbwlr (ro)
operator:
Container ID: cri-o://365077a3c1d83b97428d27eebf2f0735c9d670d364b16fad83fff5bb02b479fe
Image: image-registry.openshift-image-registry.svc:5000/openshift-operators/my-ansible-operator:v0.0.1
Image ID: image-registry.openshift-image-registry.svc:5000/openshift-operators/my-ansible-operator@sha256:fda68898e6fe0c61760fe8c50fd0a55de392e63635c5c8da47fdb081cd126b5a
Port: <none>
Host Port: <none>
State: Running
Started: Wed, 10 Jun 2020 22:35:57 +0100
Ready: True
Restart Count: 0
Environment:
WATCH_NAMESPACE: openshift-operators (v1:metadata.namespace)
POD_NAME: my-ansible-operator-849b44d6cc-nr5st (v1:metadata.name)
OPERATOR_NAME: my-ansible-operator
ANSIBLE_GATHERING: explicit
Mounts:
/tmp/ansible-operator/runner from runner (rw)
/var/run/secrets/kubernetes.io/serviceaccount from my-ansible-operator-token-vbwlr (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
runner:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
my-ansible-operator-token-vbwlr:
Type: Secret (a volume populated by a Secret)
SecretName: my-ansible-operator-token-vbwlr
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
Is there anything else I could do to diagnose the problem further or prevent the operator from hanging occasionally?
I found a very similar issue in the operator-sdk repository, linking to the root cause in the Ansible k8s module:
Ansible 2.7 stuck on Python 3.7 in docker-ce
From the discussion in the issue it seems that the problem is related to tasks that do not time out and the current workaround seems to be:
For now we just override ansible local connection and normal action plugins, so:
- all communicate() calls have 60 second timeout
- all raised TimeoutExpired exceptions are retried a few times
Can you check whether this resolves your issue? As the issue is still open, you might want to follow up on it there as well.
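As a more local guard against the hang, the blocking lookup can be replaced with a task whose runtime is bounded. The sketch below is only an illustration under assumptions, not the fix from the linked issue: it presumes the k8s_info module (k8s_facts on older Ansible) is available in the operator image and that async tasks work in your runner environment.
- name: fetch some-secret with a bounded runtime
  k8s_info:
    kind: Secret
    namespace: myproject
    name: some-secret
  register: some_secret_result
  # give up after 60 seconds instead of hanging indefinitely
  async: 60
  poll: 5

- name: expose the secret as a fact
  set_fact:
    some_secret: "{{ some_secret_result.resources[0] | default({}) }}"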

Missing queues from RabbitMQ Metricbeat

It looks like only a fraction of the queues on my RabbitMQ cluster are making it into Elasticsearch via Metricbeat.
When I query RabbitMQ's /api/overview, I see 887 queues reported:
object_totals: {
  consumers: 517,
  queues: 887,
  exchanges: 197,
  connections: 305,
  channels: 622
},
When I query RabbitMQ's /api/queues (which is what Metricbeat hits), I count 887 queues there as well.
When I get a unique count of the field rabbitmq.queue.name in Elasticsearch, I am seeing only 309 queues.
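As a side note, a unique count in Kibana is backed by the cardinality aggregation, which is approximate only above its precision threshold (default 3000); with roughly 887 expected values the count should be exact, so 309 really does point at missing documents rather than a counting artifact. The equivalent raw query looks roughly like this (a sketch; the Elasticsearch host and index pattern are assumptions):
curl -s 'http://localhost:9200/metricbeat-*/_search' \
  -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "query": { "range": { "@timestamp": { "gte": "now-15m" } } },
  "aggs": {
    "unique_queues": {
      "cardinality": { "field": "rabbitmq.queue.name" }
    }
  }
}'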
I don't see anything in the debug output that stands out to me. It's just the usual INFO level startup messages, followed by the publish information:
root@rabbitmq:/etc/metricbeat# metricbeat -e
2019-06-24T21:13:33.692Z INFO instance/beat.go:571 Home path: [/usr/share/metricbeat] Config path: [/etc/metricbeat] Data path: [/var/lib/metricbeat] Logs path: [/var/log/metricbeat]
2019-06-24T21:13:33.692Z INFO instance/beat.go:579 Beat ID: xxx
2019-06-24T21:13:33.692Z INFO [index-management.ilm] ilm/ilm.go:129 Policy name: metricbeat-7.1.1
2019-06-24T21:13:33.692Z INFO [seccomp] seccomp/seccomp.go:116 Syscall filter successfully installed
2019-06-24T21:13:33.692Z INFO [beat] instance/beat.go:827 Beat info {"system_info": {"beat": {"path": {"config": "/etc/metricbeat", "data": "/var/lib/metricbeat", "home": "/usr/share/metricbeat", "logs": "/var/log/metricbeat"}, "type": "metricbeat", "uuid": "xxx"}}}
2019-06-24T21:13:33.692Z INFO [beat] instance/beat.go:836 Build info {"system_info": {"build": {"commit": "3358d9a5a09e3c6709a2d3aaafde628ea34e8419", "libbeat": "7.1.1", "time": "2019-05-23T13:23:10.000Z", "version": "7.1.1"}}}
2019-06-24T21:13:33.692Z INFO [beat] instance/beat.go:839 Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":4,"version":"go1.11.5"}}}
[...]
2019-06-24T21:13:33.694Z INFO [beat] instance/beat.go:872 Process info {"system_info": {"process": {"capabilities": {"inheritable":null,"permitted":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read"],"effective":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read"],"bounding":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read"],"ambient":null}, "cwd": "/etc/metricbeat", "exe": "/usr/share/metricbeat/bin/metricbeat", "name": "metricbeat", "pid": 30898, "ppid": 30405, "seccomp": {"mode":"filter","no_new_privs":true}, "start_time": "2019-06-24T21:13:33.100Z"}}}
2019-06-24T21:13:33.694Z INFO instance/beat.go:280 Setup Beat: metricbeat; Version: 7.1.1
2019-06-24T21:13:33.694Z INFO [publisher] pipeline/module.go:97 Beat name: metricbeat
2019-06-24T21:13:33.694Z INFO instance/beat.go:391 metricbeat start running.
2019-06-24T21:13:33.694Z INFO cfgfile/reload.go:150 Config reloader started
2019-06-24T21:13:33.694Z INFO [monitoring] log/log.go:117 Starting metrics logging every 30s
[...]
2019-06-24T21:13:43.696Z INFO filesystem/filesystem.go:57 Ignoring filesystem types: sysfs, rootfs, ramfs, bdev, proc, cpuset, cgroup, cgroup2, tmpfs, devtmpfs, configfs, debugfs, tracefs, securityfs, sockfs, dax, bpf, pipefs, hugetlbfs, devpts, ecryptfs, fuse, fusectl, pstore, mqueue, autofs
2019-06-24T21:13:43.696Z INFO fsstat/fsstat.go:59 Ignoring filesystem types: sysfs, rootfs, ramfs, bdev, proc, cpuset, cgroup, cgroup2, tmpfs, devtmpfs, configfs, debugfs, tracefs, securityfs, sockfs, dax, bpf, pipefs, hugetlbfs, devpts, ecryptfs, fuse, fusectl, pstore, mqueue, autofs
2019-06-24T21:13:44.696Z INFO pipeline/output.go:95 Connecting to backoff(async(tcp://xxx))
2019-06-24T21:13:44.711Z INFO pipeline/output.go:105 Connection to backoff(async(tcp://xxx)) established
2019-06-24T21:14:03.696Z INFO [monitoring] log/log.go:144 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":130,"time":{"ms":131}},"total":{"ticks":1960,"time":{"ms":1965},"value":1960},"user":{"ticks":1830,"time":{"ms":1834}}},"handles":{"limit":{"hard":1048576,"soft":1024},"open":12},"info":{"ephemeral_id":"xxx","uptime":{"ms":30030}},"memstats":{"gc_next":30689808,"memory_alloc":21580680,"memory_total":428076400,"rss":79917056}},"libbeat":{"config":{"module":{"running":0},"reloads":2},"output":{"events":{"acked":7825,"batches":11,"total":7825},"read":{"bytes":66},"type":"logstash","write":{"bytes":870352}},"pipeline":{"clients":4,"events":{"active":313,"published":8138,"retry":523,"total":8138},"queue":{"acked":7825}}},"metricbeat":{"rabbitmq":{"connection":{"events":2987,"failures":10,"success":2977},"exchange":{"events":1970,"success":1970},"node":{"events":10,"success":10},"queue":{"events":3130,"failures":10,"success":3120}},"system":{"cpu":{"events":2,"success":2},"filesystem":{"events":7,"success":7},"fsstat":{"events":1,"success":1},"load":{"events":2,"success":2},"memory":{"events":2,"success":2},"network":{"events":4,"success":4},"process":{"events":18,"success":18},"process_summary":{"events":2,"success":2},"socket_summary":{"events":2,"success":2},"uptime":{"events":1,"success":1}}},"system":{"cpu":{"cores":4},"load":{"1":0.48,"15":0.28,"5":0.15,"norm":{"1":0.12,"15":0.07,"5":0.0375}}}}}}
I think that if there were a problem getting the queues, I would see an error in the logs above, as per https://github.com/elastic/beats/blob/master/metricbeat/module/rabbitmq/queue/data.go#L94-L104
Here's the metricbeat.yml:
metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true
  reload.period: 10s
setup.template.settings:
  index.number_of_shards: 1
  index.codec: best_compression
name: metricbeat
fields:
  environment: development
processors:
  - add_cloud_metadata: ~
output.logstash:
  hosts: ["xxx"]
Here's the modules.d/rabbitmq.yml:
- module: rabbitmq
  metricsets: ["node", "queue", "connection", "exchange"]
  enabled: true
  period: 2s
  hosts: ["xxx"]
  username: xxx
  password: xxx
I solved it by upgrading Elastic Stack from 7.1.1 to 7.2.0.
