MetricBeat - Kafka's consumergroup metricset doesn't send any data? - elasticsearch

I have running ZooKeeper and single Kafka broker and I want to get metrics with MetricBeat, index it with ElasticSearch and display with Kibana.
However, MetricBeat can only get data from partition metricset and nothing comes from consumergroup metricset.
Since kafka module is defined as periodical in metricbeat.yml, it should send some data on it's own, not just waiting for users interaction (f.exam. - write to topic) ?
To ensure myself, I tried to create consumer group, write and consume from topic, but still no data was collected by consumergroup metricset.
consumergroup is defined in both metricbeat.template.json and metricbeat.template-es2x.json.
While metricbeat.full.yml is completely commented off, this is my metricbeat.yml kafka module definition :
- module: kafka
metricsets: ["partition", "consumergroup"]
enabled: true
period: 10s
hosts: ["localhost:9092"]
client_id: metricbeat1
retries: 3
backoff: 250ms
topics: []
In /logs directory of MetricBeat, lines like this show up :
INFO Non-zero metrics in the last 30s:
libbeat.es.published_and_acked_events=109
libbeat.es.publish.write_bytes=88050
libbeat.publisher.messages_in_worker_queues=109
libbeat.es.call_count.PublishEvents=5
fetches.kafka-partition.events=106
fetches.kafka-consumergroup.success=2
libbeat.publisher.published_events=109
libbeat.es.publish.read_bytes=2701
fetches.kafka-partition.success=2
fetches.zookeeper-mntr.events=3
fetches.zookeeper-mntr.success=3
With ZooKeeper's mntr and Kafka's partition, I can see events= and success= values, but for consumergroup there is only success. It looks like no events are fired.
partition and mntr data are properly visible in Kibana, while consumergroup is missing.
Data stored in ElasticSearch are not readable with human eye, there are some internal strings used for directory names and logs do not contain any useful information.
Can anybody help me to understand what is going on and fix it(probably MetricBeat) to send data to ElasticSearch ? Thanks :)

You need to have an active consumer consuming out of the topics, to be able to generate events for consumergroup metricset.

Related

Messages are dropping because too many are queued in AlertManager

I have single instance cluster for AlertManager and I see warning in AlertManager container level=warn ts=2021-11-03T08:50:44.528Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4125 limit=4096
Alert Manager Version information:
Version Information
Branch: HEAD
BuildDate: 20190708-14:31:49
BuildUser: root#868685ed3ed0
GoVersion: go1.12.6
Revision: 1ace0f76b7101cccc149d7298022df36039858ca
Version: 0.18.0
AlertManager metrics
# HELP alertmanager_cluster_members Number indicating current number of members in cluster.
# TYPE alertmanager_cluster_members gauge
alertmanager_cluster_members 1
# HELP alertmanager_cluster_messages_pruned_total Total number of cluster messages pruned.
# TYPE alertmanager_cluster_messages_pruned_total counter
alertmanager_cluster_messages_pruned_total 23020
# HELP alertmanager_cluster_messages_queued Number of cluster messages which are queued.
# TYPE alertmanager_cluster_messages_queued gauge
alertmanager_cluster_messages_queued 4125
How do we see those queued messages in AlertManager?
Do we lose alerts when messages are dropped because of too many
queued ?
Why are messages queued even though there is logic to prune messages
on regular interval i.e 15 minutes ?
Do we lose alerts when AlertManager pruned messages on regular interval?
I am new to alerting. Could you please answer for the above questions?

Using multiple pipelines in Logstash with beats input

As per an earlier discussion (Defining multiple outputs in Logstash whilst handling potential unavailability of an Elasticsearch instance) I'm now using pipelines in Logstash in order to send data input (from Beats on TCP 5044) to multiple Elasticsearch hosts. The relevant extract from pipelines.yml is shown below.
- pipeline.id: beats
queue.type: persisted
config.string: |
input {
beats {
port => 5044
ssl => true
ssl_certificate_authorities => '/etc/logstash/config/certs/ca.crt'
ssl_key => '/etc/logstash/config/certs/forwarder-001.pkcs8.key'
ssl_certificate => '/etc/logstash/config/certs/forwarder-001.crt'
ssl_verify_mode => "force_peer"
}
}
output { pipeline { send_to => [es100, es101] } }
- pipeline.id: es100
path.config: "/etc/logstash/pipelines/es100.conf"
- pipeline.id: es101
path.config: "/etc/logstash/pipelines/es101.conf"
In each of the pipeline .conf files I have the related virtual address i.e. the file /etc/logstash/pipelines/es101.conf includes the following:
input {
pipeline {
address => es101
}
}
This configuration seems to work well i.e. data is received by each of the Elasticsearch hosts es100 and es101.
I need to ensure that if one of these hosts is unavailable, the other still receives data and thanks to a previous tip, I'm now using pipelines which I understand allow for this. However I'm obviously missing something key in this configuration as the data isn't received by a host if the other is unavailable. Any suggestions are gratefully welcomed.
Firstly, you should configure persistent queues on the downstream pipelines (es100, es101), and size them to contain all the data that arrives during an outage. But even with persistent queues logstash has an at-least-once delivery model. If the persistent queue fills up then back-pressure will cause the beats input to stop accepting data. As the documentation on the output isolator pattern says "If any of the persistent queues of the downstream pipelines ... become full, both outputs will stop". If you really want to make sure an output is never blocked because another output is unavailable then you will need to introduce some software with a different delivery model. For example, configure filebeat to write to kafka, then have two pipelines that read from kafka and write to elasticsearch. If kafka is configured with an at-most-once delivery model (the default) then it will lose data if it cannot deliver it.

Kafka elasticsearch connector - 'Flush timeout expired with unflushed records:'

I have a strange problem with kafka -> elasticsearch connector. First time when I started it all was great, I received a new data in elasticsearch and checked it through kibana dashboard, but when I produced new data in to kafka using the same producer application and tried to start connector one more time, I didn't get any new data in elasticsearch.
Now I'm getting such errors:
[2018-02-04 21:38:04,987] ERROR WorkerSinkTask{id=log-platform-elastic-0} Commit of offsets threw an unexpected exception for sequence number 14: null (org.apache.kafka.connect.runtime.WorkerSinkTask:233)
org.apache.kafka.connect.errors.ConnectException: Flush timeout expired with unflushed records: 15805
I'm using next command to run connector:
/usr/bin/connect-standalone /etc/schema-registry/connect-avro-standalone.properties log-platform-elastic.properties
connect-avro-standalone.properties:
bootstrap.servers=kafka-0.kafka-hs:9093,kafka-1.kafka-hs:9093,kafka-2.kafka-hs:9093
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
# producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor
# consumer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
#rest.host.name=
rest.port=8084
#rest.advertised.host.name=
#rest.advertised.port=
plugin.path=/usr/share/java
and log-platform-elastic.properties:
name=log-platform-elastic
key.converter=org.apache.kafka.connect.storage.StringConverter
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
topics=member_sync_log, order_history_sync_log # ... and many others
key.ignore=true
connection.url=http://elasticsearch:9200
type.name=log
I checked connection to kafka brokers, elasticsearch and schema-registry(schema-registry and connector are on the same host at this moment) and all is fine. Kafka brokers are running on port 9093 and I'm able to read data from topics using kafka-avro-console-consumer.
I'll be gratefull for any help on this!
Just update flush.timeout.ms to bigger than 10000 (10 seconds which is the default)
According to documentation:
flush.timeout.ms
The timeout in milliseconds to use for periodic
flushing, and when waiting for buffer space to be made available by
completed requests as records are added. If this timeout is exceeded
the task will fail.
Type: long Default: 10000 Importance: low
See documentation
We can optimized Elastic search configuration to solve issue. Please refer below link for configuration parameter
https://docs.confluent.io/current/connect/kafka-connect-elasticsearch/configuration_options.html
Below are key parameter which can control message rate flow to eventually help to solve issue:
flush.timeout.ms: Increase might help to give more breath on flush time
The timeout in milliseconds to use for periodic flushing, and when
waiting for buffer space to be made available by completed requests as
records are added. If this timeout is exceeded the task will fail.
max.buffered.records: Try reducing buffer record limit
The maximum number of records each task will buffer before blocking
acceptance of more records. This config can be used to limit the
memory usage for each task
batch.size: Try reducing batch size
The number of records to process as a batch when writing to
Elasticsearch
tasks.max: Number of parallel thread(consumer instance) Reduce or Increase. This will control Elastic Search if bandwidth not able to handle reduce task may help.
It worked my issue by tuning above parameters

Strange 'emitted' numbers behavior / zero stat numbers in Topology stats (Storm 1.0.3)

This is what my storm UI stat looks like.
The problem is that I have no idea where those numbers (of emitted tuples are coming from).
My topology is pretty simple: kafka spout -> bolt (persisting data into hbase)
topology works - when I put data into kafka topic, I get them processed by bolt and persisted in hbase, which I then verify with scan operator in hbase shell (so new records are being inserted)
however each time I submit new message into kafka and when it’s persisted by bolt - my topology doesn’t increase number of emitted by ‘1’.
periodically I get all numbers increased by 20 - without sending any new messages into kafka. I.e. my kafka topic gets no messages for hours, but the number of tuples emitted always get increased in chunks of 20 over time. I still get the same number of records in hbase.
I get no exceptions/errors anywhere in apache storm logs.
I’m not doing ack() or fail() any of my tuples in my bolt implementation (which is BasicBolt type doing ack automatically)
my capacity or latency in bolt metrics is always staying zero even when I load a lot of messages in Kafka
my kafka offset log ($KAFKA/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker) shows all the messages are processed and Kafka Lag for given topic/group is 0.
So my question:
what are those ‘stealth’ tuples that increase ‘emitted’ in both Spout and Bolt over time by 20s?
is it possible to enable ‘debugging’ in storm UI to see what those tuples are?
why capacity/latency in bolt metrics is always zero while bolt is confirmed to persist data?
Environment details
I’m using Java 8 + Apache Storm 1.0.3
[devops#storm-wk1-prod]~/storm/supervisor/stormdist% storm version
Running: /usr/lib/jvm/jre-1.8.0-openjdk/bin/java -client -Ddaemon.name= -Dstorm.options= -Dstorm.home=/opt/apache-storm-1.0.3 -Dstorm.log.dir=/opt/apache-storm-1.0.3/logs -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dstorm.conf.file= -cp /opt/apache-storm-1.0.3/lib/storm-core-1.0.3.jar:/opt/apache-storm-1.0.3/lib/kryo-3.0.3.jar:/opt/apache-storm-1.0.3/lib/reflectasm-1.10.1.jar:/opt/apache-storm-1.0.3/lib/asm-5.0.3.jar:/opt/apache-storm-1.0.3/lib/minlog-1.3.0.jar:/opt/apache-storm-1.0.3/lib/objenesis-2.1.jar:/opt/apache-storm-1.0.3/lib/clojure-1.7.0.jar:/opt/apache-storm-1.0.3/lib/disruptor-3.3.2.jar:/opt/apache-storm-1.0.3/lib/log4j-api-2.1.jar:/opt/apache-storm-1.0.3/lib/log4j-core-2.1.jar:/opt/apache-storm-1.0.3/lib/log4j-slf4j-impl-2.1.jar:/opt/apache-storm-1.0.3/lib/slf4j-api-1.7.7.jar:/opt/apache-storm-1.0.3/lib/log4j-over-slf4j-1.6.6.jar:/opt/apache-storm-1.0.3/lib/servlet-api-2.5.jar:/opt/apache-storm-1.0.3/lib/storm-rename-hack-1.0.3.jar:/opt/storm/conf org.apache.storm.utils.VersionInfo
Storm 1.0.3
URL https://git-wip-us.apache.org/repos/asf/storm.git -r eac433b0beb3798c4723deb39b3c4fad446378f4
Branch (no branch)
Compiled by ptgoetz on 2017-02-07T20:22Z
From source with checksum c78e52de4b8a22d99551d45dfe9c1a4b
My storm.yaml:
I'm running 2 instances with storm supervisor, each having the following config:
storm.zookeeper.servers:
- "10.138.0.8"
- "10.138.0.9"
- "10.138.0.16"
storm.zookeeper.port: 2181
nimbus.seeds: ["10.138.0.10"]
storm.local.dir: "/var/log/storm"
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
worker.childopts: "-Xmx768m"
nimbus.childopts: "-Xmx512m"
supervisor.childopts: "-Xmx256m"
toplogy.yaml
nimbus.host: "10.138.0.10"
# In Storm 0.7.x, this is necessary in order to give workers time to
# initialize. In Storm 0.8.0 and later, it may not be necessary because Storm
# has added a separate, longer timeout for the initial launch of a worker.
supervisor.worker.timeout.secs: 60
topology.workers: 1
topology
import tbolts
import tspouts
def create(builder):
"""Create toplogy through Petrel library
"""
# spout getting data from kafka instance
# we run 2 tasks of kafka spout
builder.setSpout("kafka",
tspouts.KafkaSpout(), 2)
# persistence bolt
# we run 4 tasks of persistence bolt
builder.setBolt("persistence",
tbolts.PersistenceBolt(), 4).shuffleGrouping("kafka")
The reason your emit count jumps up by 20 is due to the fact that Storm only samples every 20th tuple buy default to update its metrics. This sampling rate is controlled by the topology.stats.sample.rate config variable and can be changed per topology. So you could set this to be 1.0 (it is 0.05 by default) and you would get an accurate emit count, however this would introduce a significant processing overhead and may cause your Acker and/or metrics consumer instances to become overloaded. Use with caution.

metricbeat output kafka configuration

I am trying to get the system metrics using metricbeat (metricbeat 5.1.1 and output the data to kafka topic)
output.kafka:
# Boolean flag to enable or disable the output module.
enabled: true
# The list of Kafka broker addresses from where to fetch the cluster metadata.
# The cluster metadata contain the actual Kafka brokers events are published
# to.
hosts: ["XX.XXX.XXX.XX:9092","XX.XXX.XXX.XX:9092","XX.XXX.XXX.XX:9092"]
# The Kafka topic used for produced events. The setting can be a format string
# using any event field. To set the topic from document type use `%{[type]}`.
topic: ab-mb-raw, cd-mb-raw
Is it possible to push the data to more than one topic in kafka?
When I ran the above configuration, I am not able to see the data in the kafka topic persistent
Can anyone help me whether my config is correct?
Not directly as a static string, no. But you can read the comment there...
The setting can be a format string using any event field
Thus, if you can insert a field into the payload of which topic to be sent to, you can dynamically route data that way
Yes you can do that. Basically you will need to define in your prospectors configs.
For example:
Main filebeat.yml:
output.kafka:
hosts: ["kafka:9092"]
topic: "%{[type]}" <----- that is what you need.
compression: snappy
# Prospector configs
filebeat.config_dir: /opt/filebeat/etc/conf.d
Your prospectors from /opt/filebeat/etc/conf.d can looks something like these to files:
filebeat.prospectors:
- input_type: log
paths:
- "test.log"
document_type: "topic_name" <--------- topic per prospector

Resources