Do any monitoring metrics/events exist for ClickHouse Keeper? - clickhouse

I am considering using ClickHouse Keeper to replace ZooKeeper for data replication. ZooKeeper has lots of useful metrics for monitoring and convenient triage, but I checked the ClickHouse documentation and the CurrentMetrics/ProfileEvents files and found no monitoring data similar to ZooKeeper's (https://zookeeper.apache.org/doc/r3.7.0/zookeeperMonitor.html).
Please point me in the right direction, thanks!

ClickHouse Keeper already supports the four-letter commands 'ruok' and 'mntr':
# echo 'mntr' | nc localhost 9181
zk_version v22.2.1.2764-testing-4fab6bec4ec53b66246a055919a4ed4c0610f650
zk_avg_latency 0
zk_max_latency 33
zk_min_latency 0
zk_packets_received 15430936
zk_packets_sent 15430936
zk_num_alive_connections 1
zk_outstanding_requests 0
zk_server_state standalone
zk_znode_count 4272
zk_watch_count 235
zk_ephemerals_count 111
zk_approximate_data_size 781777
zk_open_file_descriptor_count 203
zk_max_file_descriptor_count 18446744073709551615
zk_followers 0
zk_synced_followers 0
# echo 'ruok' | nc localhost 9181
imok
It is possible to export those in Prometheus format using external tools such as https://github.com/dabealu/zookeeper-exporter (see the scrape sketch below).
Future versions will have an embedded Prometheus exporter.
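A minimal Prometheus scrape configuration for such an exporter could look like the sketch below; the job name and the target host/port are assumptions for illustration (check the exporter's README for its actual listen address and for how to point it at Keeper's client port, 9181 above):
scrape_configs:
  - job_name: 'clickhouse-keeper'                  # illustrative job name
    static_configs:
      - targets: ['zookeeper-exporter-host:9141']  # assumed exporter address and port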

They are not implemented yet. There are plans to expose Keeper metrics through a Prometheus endpoint.

ClickHouse Keeper now supports a Prometheus endpoint: https://github.com/ClickHouse/ClickHouse/pull/43087
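A minimal sketch of enabling it, assuming the Keeper configuration accepts the same <prometheus> section as the ClickHouse server (the port, endpoint, and flags below are illustrative; check the documentation for your version):
<clickhouse>
    <prometheus>
        <endpoint>/metrics</endpoint>
        <port>9363</port>
        <metrics>true</metrics>
        <events>true</events>
        <asynchronous_metrics>true</asynchronous_metrics>
    </prometheus>
</clickhouse>
After restarting Keeper you should be able to scrape it directly:
# curl -s http://localhost:9363/metrics | head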

Related

Debezium content-based routing configuration

I'm using Confluent, so I've installed the Debezium connectors according to the Confluent docs using confluent-hub. In connect.properties I have the entry:
plugin.path=/usr/share/java,/opt/confluent-6.0.0/share/confluent-hub-components
I need to use io.debezium.transforms.ContentBasedRouter https://debezium.io/documentation/reference/1.3/configuration/content-based-routing.html
So, according to the Debezium docs, I've downloaded debezium-scripting-1.3.1.Final.jar, put it into
/opt/confluent-6.0.0/share/confluent-hub-components/ and copied it into the
/opt/confluent-6.0.0/share/confluent-hub-components/debezium-debezium-connector-sqlserver/lib directory.
Here are the entries in my mysql_src.json connector:
"transforms": "unwrap,route",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.add.fields": "source.snapshot",
"transforms.route.type": "io.debezium.transforms.ContentBasedRouter",
"transforms.route.language": "jsr223.groovy",
"transforms.route.topic.expression": "value.__source_snapshot == 'false' ? 'test'"
When I try to configure/load this connector I get the following error message:
[2020-12-15 22:18:45,351] ERROR [Worker clientId=connect-1, groupId=connect-cluster] Failed to reconfigure connector's tasks, retrying after backoff: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1369)
java.lang.NoClassDefFoundError: io/debezium/DebeziumException
Any suggestions on how to fix this problem?
According to the docs, you need to additionally obtain a JSR-223 script engine implementation and add its contents to the Debezium plug-in directories of your Kafka Connect environment, since:
Debezium does not come with any implementations of the JSR 223 API. To use an expression language with Debezium, you must download the JSR 223 script engine implementation for the language, and add it to your Debezium connector plug-in directories, along with any other JAR files used by the language implementation.
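For Groovy, a minimal sketch of what that means in practice, using the paths from the question and illustrative version numbers (the exact JAR set and versions are assumptions; check the Debezium scripting docs for your release):
# place the Groovy runtime and its JSR-223 bridge next to debezium-scripting
cp groovy-3.0.x.jar groovy-jsr223-3.0.x.jar groovy-json-3.0.x.jar \
   /opt/confluent-6.0.0/share/confluent-hub-components/debezium-debezium-connector-sqlserver/lib/
# restart the Kafka Connect worker so it rescans plugin.path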
I am not sure that the configuration is correct, but I passed the first configuration problem (I hope). I'm now facing another problem, which I will describe in a different question.
I am not sure what was wrong; I did the following (see the sketch after this list):
Cleaned up the ZooKeeper directories
Cleaned up the Kafka directories
Ran Kafka Connect in distributed mode using the command-line start/stop scripts (not the Confluent CLI)
This solved the java.lang.NoClassDefFoundError: io/debezium/DebeziumException error.
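For reference, a sketch of what starting Connect from the command-line scripts looks like with a Confluent 6.0.0 layout (paths are assumptions; plain Apache Kafka uses bin/connect-distributed.sh and a config under config/):
# start the Connect worker in distributed mode, bypassing the Confluent CLI
/opt/confluent-6.0.0/bin/connect-distributed /opt/confluent-6.0.0/etc/kafka/connect-distributed.properties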

How to deal with "failed to connect to Kubernetes master IP"

I started learning Kubernetes, so I am following this guide: https://kubernetes.io/docs/setup/turnkey/gce/
I got an error after I ran "cluster/kube-up.sh":
Waiting up to 300 seconds for cluster initialization.
This will continually check to see if the API for kubernetes is reachable.
This may time out if there was some uncaught error during start up.
.........................................................................................................................................Cluster failed to initialize within 300 seconds.
Last output from querying API server follows:
% Total % Received % Xferd Average Speed Time Time Time current
             Dload Upload Total Spent Left speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to "[kubernetes-master (external IP)]"
So I searched for this error and found one solution. It said to go to "cluster/gce/config-default.sh":
secret: $(dd if=/dev/urandom iflag=fullblock bs=32 count=1 2>/dev/null | base64 | tr -d '\r\n')
and change 'dd' to 'gdd'. So I modified config-default.sh, but it doesn't work.
I don't know how to fix it. Is there any solution for this error? Also, there are no Kubernetes logs in /var/log. Where are the logs saved?
My Mac version: macOS Mojave 10.14.3
It looks like gdd (GNU dd) is only available on macOS if you install GNU coreutils, e.g. via Homebrew; the stock BSD dd doesn't support iflag=fullblock (see the sketch below). As for the Kubernetes logs, you can have a look at this other article. If you had a Google Kubernetes Engine cluster, it would be easier to check the logs in Stackdriver.
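If the underlying issue is indeed that the stock macOS dd rejects iflag=fullblock, a sketch of the usual workaround, assuming Homebrew is installed:
brew install coreutils   # installs GNU dd as 'gdd'
# then, in cluster/gce/config-default.sh, use gdd instead of dd:
secret: $(gdd if=/dev/urandom iflag=fullblock bs=32 count=1 2>/dev/null | base64 | tr -d '\r\n')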

HashiCorp Consul is not publishing all the metrics

Consul isn't publishing all the metrics defined in its documentation (https://www.consul.io/docs/agent/telemetry.html#transaction-timing); it shows only the raft metrics but not txn or kvs. Has anyone observed this problem?
Command to enable Prometheus-style metrics:
consul agent -dev -hcl 'telemetry{prometheus_retention_time="24h" disable_hostname=true}'
Watch the metrics:
watch -n 1 -d "curl -s localhost:8500/v1/agent/metrics?format=prometheus|grep -v ^# | grep -E 'kvs|txn|raft'"
Metrics will be exported only if they are available; i.e., if there have been no transactions or KV store operations, you will not see these metrics in the output.
I have managed to see the kvs metrics with the example you have provided. While running the Consul agent via the command in the question, open http://127.0.0.1:8500/ in a browser and click the Key/Value option in the top list (you should end up at http://127.0.0.1:8500/ui/dc1/kv). Click Create to add a new key/value pair. After clicking Save you should see something like this in the terminal running the watch command:
consul_fsm_kvs{op="set",quantile="0.5"} 0.3572689890861511
consul_fsm_kvs{op="set",quantile="0.9"} 0.3572689890861511
consul_fsm_kvs{op="set",quantile="0.99"} 0.3572689890861511
consul_fsm_kvs_sum{op="set"} 0.3572689890861511
consul_fsm_kvs_count{op="set"} 1
consul_kvs_apply{quantile="0.5"} 2.6777150630950928
consul_kvs_apply{quantile="0.9"} 2.6777150630950928
consul_kvs_apply{quantile="0.99"} 2.6777150630950928
consul_kvs_apply_sum 2.6777150630950928
consul_kvs_apply_count 1
If there are no more transactions, some of these values will be set to NaN, depending on the Prometheus metric type.
Similarly, to see the txn metrics, you need to create a Consul transaction, for example via the transactions HTTP API:
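A minimal sketch using the /v1/txn endpoint (the key name and the base64-encoded value "YmFy", i.e. "bar", are just examples):
# single-operation transaction: set key "foo" to "bar"
curl -s -X PUT localhost:8500/v1/txn -d '[{"KV": {"Verb": "set", "Key": "foo", "Value": "YmFy"}}]'
# consul_txn_apply metrics should now appear in the watch output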
Hope that helps you set up monitoring.

Running two SonarQube instances in Windows Server 2008 R2

I've already run one SonarQube instance on port 9000 and am able to access it at localhost:9000.
Now I would like to run another SonarQube instance for my new project on port 10000. I've changed these settings in the sonar.properties file:
sonar.web.port: 10000
sonar.web.context: /
However, when I run C:\SonarMAP\bin\windows-x86-64\StartSonar.bat, I get this ERROR message:
wrapper | ERROR: Another instance of the SonarQube application is already running.
Press any key to continue . . .
I did some research to solve this problem but couldn't find any helpful information.
Any suggestions? Thanks!
UPDATE
The instance 1 configuration:
sonar.jdbc.username=username
sonar.jdbc.password=password
sonar.jdbc.url=jdbc:postgresql://server15/sonarQube
sonar.jdbc.driverClassName: org.postgresql.Driver
sonar.jdbc.validationQuery: select 1
sonar.jdbc.maxActive=20
sonar.jdbc.maxIdle=5
sonar.jdbc.minIdle=2
sonar.jdbc.maxWait=5000
sonar.jdbc.minEvictableIdleTimeMillis=600000
sonar.jdbc.timeBetweenEvictionRunsMillis=30000
The instance 2 configuration:
sonar.jdbc.username=username
sonar.jdbc.password=password
sonar.jdbc.url: jdbc:postgresql://localhost/sonarMAP
sonar.jdbc.driverClassName: org.postgresql.Driver
sonar.jdbc.validationQuery: select 1
sonar.jdbc.maxActive: 20
sonar.jdbc.maxIdle: 5
sonar.jdbc.minIdle: 2
sonar.jdbc.maxWait: 5000
sonar.jdbc.minEvictableIdleTimeMillis: 600000
sonar.jdbc.timeBetweenEvictionRunsMillis: 30000
sonar.web.port: 9100
sonar.web.context: /
sonar.search.port=9101
sonar.notifications.delay: 60
Apparently you can't run multiple instances on Windows because of wrapper.single_invocation=true in conf/wrapper.conf.
Setting it to false seems to allow this (you'll still have to use different ports, as Fabrice explained in his answer), but this is getting into a grey zone: a non-recommended and untested setup.
You need to change other settings inside the conf/sonar.properties file (a sketch follows this list), namely:
sonar.search.port: the port of Elasticsearch
sonar.search.httpPort: if you enabled it in the first instance, you've got to change it as well
and obviously you can't connect to the same schema of the same DB
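Putting it together, a sketch of what the second instance's files could contain (values are illustrative; every port and the database schema must differ from the first instance):
# conf/wrapper.conf
wrapper.single_invocation=false
# conf/sonar.properties
sonar.web.port=10000
sonar.search.port=10001
sonar.jdbc.url=jdbc:postgresql://localhost/sonarMAP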

How to diagnose Kafka topics failing globally to be found

I have Kafka 0.8.1.2.2 running on an HDP 2.2.4 cluster (3 brokers on 3 ZK nodes - ZK 3.4.6.2.2). All worked well for a couple of days, and now my topics seem to have become unreachable by producers and consumers. I am new to Kafka and am seeking a way to determine what has gone wrong and how to fix it, as "just re-installing" will not be an option once in production.
Previously, messages were successfully received by my topics and could then be consumed. Now even the most basic operations fail immediately. If I ssh to a broker node and create a new topic:
[root@dev-hdp-0 kafka]# bin/kafka-topics.sh --create --zookeeper 10.0.0.39:2181 --replication-factor 3 --partitions 3 --topic test4
Created topic "test4".
So far so good. Now, we check the description:
[root@dev-hdp-0 kafka]# bin/kafka-topics.sh --describe --zookeeper 10.0.0.39:2181 --topic test4
Topic:test4 PartitionCount:3 ReplicationFactor:3 Configs:
Topic: test4 Partition: 0 Leader: 2 Replicas: 2,1,0 Isr: 2
Topic: test4 Partition: 1 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
Topic: test4 Partition: 2 Leader: 1 Replicas: 1,0,2 Isr: 1,0,2
OK - now if I create a consumer:
[2015-06-09 08:34:27,458] WARN [console-consumer-45097_dev-hdp-0.cloud.stp-1.sparfu.com-1433856803464-12b54195-leader-finder-thread], Failed to add leader for partitions [test4,0],[test4,2],[test4,1]; will retry (kafka.consumer.ConsumerFetcherManager$LeaderFinderThread)
java.net.ConnectException: Connection timed out
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:465)
at sun.nio.ch.Net.connect(Net.java:457)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
at kafka.consumer.SimpleConsumer.connect(SimpleConsumer.scala:44)
at kafka.consumer.SimpleConsumer.getOrMakeConnection(SimpleConsumer.scala:142)
at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:69)
at kafka.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:124)
at kafka.consumer.SimpleConsumer.earliestOrLatestOffset(SimpleConsumer.scala:157)
at kafka.consumer.ConsumerFetcherThread.handleOffsetOutOfRange(ConsumerFetcherThread.scala:60)
at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:179)
at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:174)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at kafka.server.AbstractFetcherThread.addPartitions(AbstractFetcherThread.scala:174)
at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:86)
at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:76)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.immutable.Map$Map3.foreach(Map.scala:154)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:76)
at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:95)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
[2015-06-09 08:35:30,709] WARN [console-consumer-45097_dev-hdp-0.cloud.stp-1.sparfu.com-1433856803464-12b54195-leader-finder-thread], Failed to add leader for partitions [test4,0],[test4,2],[test4,1]; will retry (kafka.consumer.ConsumerFetcherManager$LeaderFinderThread)
I have been poking around for anything related to "Failed to add leader for partitions", as that seems to be key, but I have yet to find anything specific that is helpful.
So, if I try using the simple consumer shell for a known partition:
[root@dev-hdp-0 kafka]# bin/kafka-simple-consumer-shell.sh --broker-list 10.0.0.39:6667,10.0.0.45:6667,10.0.0.48:6667 --skip-message-on-error --offset -1 --print-offsets --topic test4 --partition 0
Error: partition 0 does not exist for topic test4
This is despite the --describe operation clearly showing that partition 0 does exist.
I have a simple Spark application that publishes a small number of messages to a topic, but this too fails to publish (on brand-new topics and on older, previously working ones). An excerpt from the console here also alludes to leader issues:
15/06/08 15:05:35 WARN BrokerPartitionInfo: Error while fetching metadata [{TopicMetadata for topic test8 ->
No partition metadata for topic test8 due to kafka.common.LeaderNotAvailableException}] for topic [test8]: class kafka.common.LeaderNotAvailableException
15/06/08 15:05:35 ERROR DefaultEventHandler: Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: test8
15/06/08 15:05:35 WARN BrokerPartitionInfo: Error while fetching metadata [{TopicMetadata for topic test8 ->
No partition metadata for topic test8 due to kafka.common.LeaderNotAvailableException}] for topic [test8]: class kafka.common.LeaderNotAvailableException
Additionally, if we try the console producer:
[root@dev-hdp-0 kafka]# bin/kafka-console-producer.sh --broker-list 10.0.0.39:6667,10.0.0.45:6667,10.0.0.48:6667 --topic test4
foo
[2015-06-09 08:58:36,456] WARN Error while fetching metadata [{TopicMetadata for topic test4 ->
No partition metadata for topic test4 due to kafka.common.LeaderNotAvailableException}] for topic [test4]: class kafka.common.LeaderNotAvailableException (kafka.producer.BrokerPartitionInfo)
I have scanned the logs under /var/log/kafka and nothing more descriptive than the console output presents itself. Searching on the various exceptions has yielded little more than other people with similarly mysterious issues.
That all said, is there a way to properly diagnose why my broker set suddenly stopped working when there have been no changes to the environment or configs? Has anyone encountered a similar scenario and found a corrective set of actions?
Some other details:
All nodes are CentOS 6.6 on an OpenStack private cloud
HDP Cluster 2.2.4.2-2 installed and configured using Ambari 2.0.0
Kafka service has been restarted (a few times now...)
Not sure what else might be helpful - let me know if there are other details that could help to shed light on the problem.
Thank you.
Looks like forcibly stopping (kill -9) and restarting Kafka did the trick; a sketch of what that amounts to is below.
Graceful shutdown didn't work.
Looking at the boot scripts, Kafka and ZooKeeper were coming up at the same time (S20kafka, S20zookeeper), so perhaps that was the initial problem. For now... not going to reboot this thing.
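A sketch of the forced restart, where the start-script path is an assumption for an HDP layout (restarting the Kafka service from Ambari achieves the same):
# kafka.Kafka is the broker's main class; kill the broker hard
kill -9 $(pgrep -f kafka.Kafka)
# then start the broker again, from Ambari or via the start script
/usr/hdp/current/kafka-broker/bin/kafka-server-start.sh -daemon /etc/kafka/conf/server.properties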
