debezium content based routing configuration - apache-kafka-connect

I'm using confluent so I've installed dibezium connectors according to confluent docs using confluent-hub in connect.properties I do have entry
plugin.path=/usr/share/java,/opt/confluent-6.0.0/share/confluent-hub-components
I need to use io.debezium.transforms.ContentBasedRouter https://debezium.io/documentation/reference/1.3/configuration/content-based-routing.html
so according to debezium doc I've downloaded debezium-scripting-1.3.1.Final.jar
and put it into
/opt/confluent-6.0.0/share/confluent-hub-components/ and copied it into
/opt/confluent-6.0.0/share/confluent-hub-components/debezium-debezium-connector-sqlserver/lib directories
here the entries in my mysql_src.json connector
"transforms": "unwrap,route",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.add.fields": "source.snapshot",
"transforms.route.type": "io.debezium.transforms.ContentBasedRouter",
"transforms.route.language": "jsr223.groovy",
"transforms.route.topic.expression": "value.__source_snapshot == 'false' ? 'test'"
when I'm trying to configure/load this connector I'm getting following error message
[2020-12-15 22:18:45,351] ERROR [Worker clientId=connect-1, groupId=connect-cluster] Failed to reconfigure connector's tasks, retrying after backoff: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1369)
java.lang.NoClassDefFoundError: io/debezium/DebeziumException
Any suggestions how to fix this problem ?

According the docs, you need to additionally obtain a JSR-223 script engine implementation and add its contents to the Debezium plug-in directories of your Kafka Connect environment, since:
Debezium does not come with any implementations of the JSR 223 API. To use an expression language with Debezium, you must download the JSR 223 script engine implementation for the language, and add to your Debezium connector plug-in directories, along any other JAR files used by the language implementation.

I am not sure that configuration is correct but I passed first configuration problem (I hope) I'm facing another problem now which I will describe in different question.
I am not sure what was wrong, I did following
Clean up zookeeper directories
Clean up kafka directories
Run kafka in distributed mode using command line start/stop scripts (not using confluent cli)
this solved java.lang.NoClassDefFoundError: io/debezium/DebeziumException
error

Related

Is it possible to read a file using SparkSession object of Scala language on Windows?

I've been trying to read from a .csv file on many ways, utilizing SparkContext object. I found it possible through scala.io.Source.fromFile function, but I want to use spark object. Everytime I run function textfile for org.apache.spark.SparkContext I get the same error:
scala> sparkSession.read.csv("file://C:\\Users\\184229\\Desktop\\bigdata.csv")
21/12/29 16:47:32 WARN streaming.FileStreamSink: Error while looking for metadata directory.
java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
.....
As it's mentioned in the title I run the code on Windows in IntelliJ
[Edit]
In build.sbt have no redundant or overlapped dependencies. I use hadoop-tools, spark-sql and hadoop-xz.
Have you tried to run your spark-shell using local mode?
spark-shell --master=local
Also pay attention to not use both Hadoop-code and Hadoop-commons as a dependencies since you may have conflicting jars issues.
I've found the solution, precisely one of my colleague did that.
In dependencies build.sbt I changed hadoop-tools to hadoop-commons and it worked out.

Failed to set ConnectorClientConfigOverridePolicy to All in Debezium mysql connector in docker

Trying to set ConnectorClientConfigOverridePolicy, by adding CONNECTOR_CLIENT_CONFIG_OVERRIDE_POLICY=All. During start up debezium connector fails with matches all=All. Seems, CONNECTOR_CLIENT_CONFIG_OVERRIDE_POLICY duplicates value, instead "All", value is "all=All"
Stopping due to error [org.apache.kafka.connect.cli.ConnectDistributed]
org.apache.kafka.connect.errors.ConnectException: Failed to find any class that implements interface org.apache.kafka.connect.connector.policy.ConnectorClientConfigOverridePolicy and which name matches all=All
Is it bug or I am doing something wrong?
Using debezium docker 1.5
this is partly bug and partly misconfiguration.
The env var should be named CONNECT_CONNECTOR_CLIENT_CONFIG_OVERRIDE_POLICY=All
But at the same time the start script that processes all env vars named CONNECT_ does not check for the underscore so CONNECTOR... matches too which breaks the further logic.

Kafka Connect Elasticsearch - NoSuchMethodError

I am trying to run the kafka-connect-elasticsearch plugin from Confluent in order to stream topics from Kafka (V0.11.0.1) directly into Elasticsearch (without putting Logstash in between).
I build the connector using Maven -
$ cd kafka-connect-elasticsearch
$ mvn clean package
I then created the require configuration file -
name=es-cluster-lab
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
topics=filebeats-test
topic.index.map=filebeats-test:kafka_test_index
key.ignore=true
schema-ignore=true
connection.url=http://elastic:9200
type.name=log
As per the new Kafka Classpath Isolation spec, I also added the following line to my connect-standalone.properties file -
plugin.path=/home/kafka/kafka-connect-elasticsearch-3.3.0/target/kafka-connect-elasticsearch-3.3.0-development/share/java/kafka-connect-elasticsearch/
I go to run the script ...
bin/connect-standalone.sh config/connect-standalone.properties config/elasticsearch-connect.properties
... and receive the below error.
[2017-09-14 16:08:26,599] INFO Loading plugin from: /home/kafka/kafka-connect-elasticsearch-3.3.0/target/kafka-connect-elasticsearch-3.3.0-development/share/java/kafka-connect-elasticsearch/slf4j-api-1.7.25.jar (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:176)
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.collect.Sets$SetView.iterator()Lcom/google/common/collect/UnmodifiableIterator;
at org.reflections.Reflections.expandSuperTypes(Reflections.java:380)
at org.reflections.Reflections.<init>(Reflections.java:126)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanPluginPath(DelegatingClassLoader.java:221)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanUrlsAndAddPlugins(DelegatingClassLoader.java:198)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.registerPlugin(DelegatingClassLoader.java:190)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initLoaders(DelegatingClassLoader.java:150)
at org.apache.kafka.connect.runtime.isolation.Plugins.<init>(Plugins.java:47)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:68)
I also tried to move the JAR files into the /app/kafka/libs directory (default CLASSPATH) and even tried to create a subdirectory /app/kafka/libs/connect_libs and add that manually to my CLASSPATH environment variable.
Not sure what my next step is besides putting Logstash between Kafka and Elastic.
try to change the guava version to 20 before you build it
I think you are missing the star '*' at the end of the path of the plugin path.
plugin.path=/home/kafka/kafka-connect-elasticsearch-3.3.0/target/kafka-connect-elasticsearch-3.3.0-development/share/java/kafka-connect-elasticsearch/*

configuring springXD Horthonworks

I try to Configuring Spring XD to use Hadoop (horthonworks) but when I excute this line In a terminal "./xd-singlenode --hadoopDistro hadoop11" ,I get error 'hadoop11' is not a valid value for option --hadoopDistro Possible values are [cdh5, hdp22, phd21, hadoop27, phd30]
The error message is pretty clear, the Spring Xd version your are trying to use is wrong. For horthonworks configuration you should use hdp22.
Regards

HPCC/HDFS Connector

Does anyone know about HPCC/HDFS connector.we are using both HPCC and HADOOP.There is one utility(HPCC/HDFS connector) developed by HPCC which allows HPCC cluster to acess HDFS data
i have installed the connector but when i run the program to acess data from hdfs it gives error as libhdfs.so.0 doesn't exist.
I tried to build libhdfs.so using command
ant compile-libhdfs -Dlibhdfs=1
its giving me error as
target "compile-libhdfs" does not exist in the project "hadoop"
i used one more command
ant compile-c++-libhdfs -Dlibhdfs=1
its giving error as
ivy-download:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
[get] To: /home/hadoop/hadoop-0.20.203.0/ivy/ivy-2.1.0.jar
[get] Error getting http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
to /home/hadoop/hadoop-0.20.203.0/ivy/ivy-2.1.0.jar
BUILD FAILED java.net.ConnectException: Connection timed out
any suggestion will be a great help
Chhaya, you might not need to build libhdfs.so, depending on how you installed hadoop, you might already have it.
Check in HADOOP_LOCATION/c++/Linux-<arch>/lib/libhdfs.so, where HADOOP_LOCATION is your hadoop install location, and arch is the machine’s architecture (i386-32 or amd64-64).
Once you locate the lib, make sure the H2H connector is configured correctly (see page 4 here).
It's just a matter of updating the HADOOP_LOCATION var in the config file:
/opt/HPCCSystems/hdfsconnector.conf
good luck.

Resources