Kafka Connect Elasticsearch - NoSuchMethodError

I am trying to run the kafka-connect-elasticsearch plugin from Confluent in order to stream topics from Kafka (V0.11.0.1) directly into Elasticsearch (without putting Logstash in between).
I built the connector using Maven:
$ cd kafka-connect-elasticsearch
$ mvn clean package
I then created the required configuration file:
name=es-cluster-lab
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
topics=filebeats-test
topic.index.map=filebeats-test:kafka_test_index
key.ignore=true
schema.ignore=true
connection.url=http://elastic:9200
type.name=log
As per the new Kafka Classpath Isolation spec, I also added the following line to my connect-standalone.properties file:
plugin.path=/home/kafka/kafka-connect-elasticsearch-3.3.0/target/kafka-connect-elasticsearch-3.3.0-development/share/java/kafka-connect-elasticsearch/
I go to run the script ...
bin/connect-standalone.sh config/connect-standalone.properties config/elasticsearch-connect.properties
... and receive the below error.
[2017-09-14 16:08:26,599] INFO Loading plugin from: /home/kafka/kafka-connect-elasticsearch-3.3.0/target/kafka-connect-elasticsearch-3.3.0-development/share/java/kafka-connect-elasticsearch/slf4j-api-1.7.25.jar (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:176)
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.collect.Sets$SetView.iterator()Lcom/google/common/collect/UnmodifiableIterator;
at org.reflections.Reflections.expandSuperTypes(Reflections.java:380)
at org.reflections.Reflections.<init>(Reflections.java:126)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanPluginPath(DelegatingClassLoader.java:221)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanUrlsAndAddPlugins(DelegatingClassLoader.java:198)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.registerPlugin(DelegatingClassLoader.java:190)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initLoaders(DelegatingClassLoader.java:150)
at org.apache.kafka.connect.runtime.isolation.Plugins.<init>(Plugins.java:47)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:68)
I also tried to move the JAR files into the /app/kafka/libs directory (default CLASSPATH) and even tried to create a subdirectory /app/kafka/libs/connect_libs and add that manually to my CLASSPATH environment variable.
Not sure what my next step is besides putting Logstash between Kafka and Elastic.

Try changing the Guava version to 20 before you build it.
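One way to do that is a dependencyManagement override in the connector's pom.xml before running mvn clean package; a minimal sketch (the kafka-connect-elasticsearch pom may already expose a version property for this instead):
<dependencyManagement>
  <dependencies>
    <!-- pin Guava 20 so org.reflections finds the Sets$SetView.iterator() signature from the stack trace -->
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>20.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>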

I think you are missing the star '*' at the end of the plugin path.
plugin.path=/home/kafka/kafka-connect-elasticsearch-3.3.0/target/kafka-connect-elasticsearch-3.3.0-development/share/java/kafka-connect-elasticsearch/*

Related

Debezium content-based routing configuration

I'm using Confluent, so I've installed the Debezium connectors according to the Confluent docs using confluent-hub. In connect.properties I have the entry
plugin.path=/usr/share/java,/opt/confluent-6.0.0/share/confluent-hub-components
I need to use io.debezium.transforms.ContentBasedRouter (https://debezium.io/documentation/reference/1.3/configuration/content-based-routing.html), so according to the Debezium docs I downloaded debezium-scripting-1.3.1.Final.jar, put it into /opt/confluent-6.0.0/share/confluent-hub-components/, and also copied it into the /opt/confluent-6.0.0/share/confluent-hub-components/debezium-debezium-connector-sqlserver/lib directory.
Here are the entries in my mysql_src.json connector config:
"transforms": "unwrap,route",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.add.fields": "source.snapshot",
"transforms.route.type": "io.debezium.transforms.ContentBasedRouter",
"transforms.route.language": "jsr223.groovy",
"transforms.route.topic.expression": "value.__source_snapshot == 'false' ? 'test'"
When I try to configure/load this connector I get the following error message:
[2020-12-15 22:18:45,351] ERROR [Worker clientId=connect-1, groupId=connect-cluster] Failed to reconfigure connector's tasks, retrying after backoff: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1369)
java.lang.NoClassDefFoundError: io/debezium/DebeziumException
Any suggestions on how to fix this problem?
According to the docs, you additionally need to obtain a JSR-223 script engine implementation and add its contents to the Debezium plug-in directories of your Kafka Connect environment, since:
Debezium does not come with any implementations of the JSR 223 API. To use an expression language with Debezium, you must download the JSR 223 script engine implementation for the language, and add it to your Debezium connector plug-in directories, along with any other JAR files used by the language implementation.
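For Groovy (the language used in the connector config above), that means dropping the Groovy runtime and its JSR-223 engine jars into the same plug-in directory as debezium-scripting; a minimal sketch, assuming Groovy 3.0.7 from Maven Central (the version and target directory are illustrative):
cd /opt/confluent-6.0.0/share/confluent-hub-components/debezium-debezium-connector-sqlserver/lib
# Groovy runtime plus the JSR-223 script engine binding
curl -O https://repo1.maven.org/maven2/org/codehaus/groovy/groovy/3.0.7/groovy-3.0.7.jar
curl -O https://repo1.maven.org/maven2/org/codehaus/groovy/groovy-jsr223/3.0.7/groovy-jsr223-3.0.7.jar
curl -O https://repo1.maven.org/maven2/org/codehaus/groovy/groovy-json/3.0.7/groovy-json-3.0.7.jar
# restart the Connect worker afterwards so the new jars are picked up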
I am not sure that the configuration is correct, but I got past the first configuration problem (I hope); I'm facing another problem now which I will describe in a different question.
I am not sure what was wrong; I did the following:
Clean up the ZooKeeper directories
Clean up the Kafka directories
Run Kafka in distributed mode using the command-line start/stop scripts (not the confluent CLI)
This solved the java.lang.NoClassDefFoundError: io/debezium/DebeziumException error.
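A rough sketch of those steps, assuming the default data directories and the standard Confluent script names (all paths and file names here are illustrative):
rm -rf /tmp/zookeeper /tmp/kafka-logs    # clean up the ZooKeeper and Kafka data directories
bin/zookeeper-server-start etc/kafka/zookeeper.properties &
bin/kafka-server-start etc/kafka/server.properties &
bin/connect-distributed connect.properties    # start Connect in distributed mode from the command line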

No suitable driver found for jdbc:mysql in Kafka Connect

connect-standalone.properties
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
bootstrap.servers=10.33.62.20:9092,10.33.62.110:9092,10.33.62.200:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/grid/1/mukul/confluent-5.0.0/share/java
source-sqlite.properties
name=test-source-sqlite-jdbc-autoincrement
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=5
connection.url=jdbc:mysql://10.32.177.178:3306/test&user=xxxx&password=xxxxx
table.whitelist=banner_hourly_statistics_v2
group.id=test-mysql-kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
config.storage.topic=demo-1-distributed-config
offset.storage.topic=demo-1-distributed-offset
status.storage.topic=demo-1-distributed-status
bootstrap.servers=10.33.62.20:9092,10.33.62.110:9092,10.33.62.200:9092
mode=bulk
#incrementing.column.name=id
topic.prefix=test-sqlite-jdbc-
CMD: connect-standalone /grid/1/mukul/confluent-5.0.0/etc/kafka/connect-standalone.properties /grid/1/mukul/confluent-5.0.0/etc/kafka-connect-jdbc/source-quickstart-sqlite.properties
The startup logs clearly show the JDBC connectors being loaded:
[2018-08-09 06:59:30,072] INFO Loading plugin from: /grid/1/mukul/confluent-5.0.0/share/java/kafka-connect-jdbc (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:218)
[2018-08-09 06:59:30,133] INFO Registered loader: PluginClassLoader{pluginLocation=file:/grid/1/mukul/confluent-5.0.0/share/java/kafka-connect-jdbc/} (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:241)
[2018-08-09 06:59:30,133] INFO Added plugin 'io.confluent.connect.jdbc.JdbcSinkConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:170)
[2018-08-09 06:59:30,133] INFO Added plugin 'io.confluent.connect.jdbc.JdbcSourceConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:170)
But it fails with the following exception:
Invalid value java.sql.SQLException: No suitable driver found for jdbc:mysql://10.32.177.178:3306/test&user=xxxx&password=xxxx for configuration Couldn't open connection to jdbc:mysql://10.32.177.178:3306/test&user=xxxx&password=xxx
Invalid value java.sql.SQLException: No suitable driver found for jdbc:mysql://10.32.177.178:3306/test&user=xxxx&password=xxxx for configuration Couldn't open connection to jdbc:mysql://10.32.177.178:3306/test&user=xxxx&password=xxxx
You can also find the above list of errors at the endpoint `/{connectorType}/config/validate`
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:79)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:66)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:110)
I tried changing the plugin directories too, but it didn't work. I also tried moving the Confluent share/* to /usr/share/java, but that didn't work either.
Download the JAR from the URL: https://dev.mysql.com/downloads/connector/j/5.1.html
Place it inside the plugin dir
Run Connect
It will start pulling data from MySQL.
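A minimal sketch of those steps for the layout shown in the question (the driver jar name and version are illustrative):
cp mysql-connector-java-5.1.46.jar /grid/1/mukul/confluent-5.0.0/share/java/kafka-connect-jdbc/
# restart the worker so the isolated plugin class loader picks up the driver
connect-standalone /grid/1/mukul/confluent-5.0.0/etc/kafka/connect-standalone.properties /grid/1/mukul/confluent-5.0.0/etc/kafka-connect-jdbc/source-quickstart-sqlite.properties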
Maybe a little late. I had the same "No suitable driver found" issue when connecting to DB2 using the Kafka JDBC connector.
1st Possible Solution:
I resolved it by placing the DB2 driver at the exact location where the JDBC connector is.
Within Kafka Connect:
find / -name kafka-connect-jdbc\*.jar
Once you have found the location from the above command, copy the DB2 jar to that location:
cp {your DB2 jar location}/db2.jar {copy the location from 'find' command}
Example
cp /Download/db2.jar /Users/share/java/kafka-connect-java/
Restart Kafka Connect and it will pick up the DB2 driver.
2nd Possible Solution:
Download the jt400 jar (JDK 8) and put it next to the other JDBC drivers (DB2, SQL, etc.).
Happy coding :)

Apache Zeppelin shows java.lang.ClassNotFoundException: com.mysql.jdbc.Driver error

name value
common.max_count 1000
default.driver org.mysql.jdbc.Driver
default.password ****
default.url jdbc:mysql://localhost:3306/
default.user root
zeppelin.interpreter.localRepo /usr/local/zeppelin/local-repo/2DCVRUUK8
zeppelin.interpreter.output.limit 102400
zeppelin.jdbc.auth.type
zeppelin.jdbc.concurrent.max_connection 10
zeppelin.jdbc.concurrent.use true
Dependencies
artifact exclude
/usr/local/zeppelin/interpreter/jdbc/mysql-connector-java-5.1.46-bin.jar
These are my interpreter settings. I have loaded mysql-connector-java-5.1.46-bin.jar with the correct path, but I am still unable to run this.
On a side note, if anyone is trying to access a MySQL table using Spark as below
val tempDF = spark.read.jdbc(<JdbcConnectionURL>, "table_name", <ConnectionProperties>)
tempDF.createOrReplaceTempView("tempdf")
tempDF.show(10,false)
and encounters the com.mysql.jdbc.Driver error due to the missing dependency, then the mysql-connector can be added as a dependency on the Spark interpreter as mentioned by cricket_007,
or by calling the deprecated command below in the first paragraph of the Zeppelin notebook:
%dep
z.load("mysql:mysql-connector-java:8.0.11")
Zeppelin can just use Maven coordinates.
Add this
mysql:mysql-connector-java:5.1.46
Restart the interpreter
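In the interpreter's Dependencies section that looks like this (replacing the local jar path shown in the question):
artifact exclude
mysql:mysql-connector-java:5.1.46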

Spring-xd strange too many open files error

I upgraded from spring-xd 1.2.1 to 1.3.0, and have both under /opt on my system. After starting xd in single node (but configured to use Zookeeper), I tried to create another stream (e.g. "time | log"), and spring-xd throws the following exception:
java.io.FileNotFoundException: /opt/spring-xd-1.2.1.RELEASE/xd/config/modules/modules.yml (Too many open files)
I changed ulimit -n to 60000, but it didn't solve the problem. The strange thing is that it still points to spring-xd-1.2.1.RELEASE; I have started both xd-singlenode and xd-shell under /opt/spring-xd-1.3.1.RELEASE.
EDIT: add xd-singlenode running process output just to show it's pointing to 1.3.1:
/usr/java/default/bin/java -Dspring.application.name=admin
-Dlogging.config=file:/opt/spring-xd-1.3.0.RELEASE/xd/config//xd-singlenode-logback.groovy
-Dxd.home=/opt/spring-xd-1.3.0.RELEASE/xd
-Dspring.config.location=file:/opt/spring-xd-1.3.0.RELEASE/xd/config//
-Dxd.config.home=file:/opt/spring-xd-1.3.0.RELEASE/xd/config//
-Dspring.config.name=servers,application
-Dxd.module.config.location=file:/opt/spring-xd-1.3.0.RELEASE/xd/config//modules/
-Dxd.module.config.name=modules
-classpath /opt/spring-xd-1.3.0.RELEASE/xd/modules/processor/scripts:/opt/spring-xd-1.3.0.RELEASE/xd/config:/opt/spring-xd-1.3.0.RELEASE/xd/lib/activation-
...
Have you updated your environment variables? Specifically XD_CONFIG_LOCATION, based on the error shown above.
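For example, something along these lines (the exact value is an assumption based on the -Dspring.config.location shown in the process output above):
# sketch: point the config location at the new release before starting xd-singlenode
export XD_CONFIG_LOCATION=file:/opt/spring-xd-1.3.0.RELEASE/xd/config/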

Setting elasticsearch properties in spark-submit

I'm trying to launch Spark jobs that use Elasticsearch input via the command line using spark-submit, as described in http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/spark.html
I'm setting the properties in a file, but when launching spark-submit it gives the following warnings:
~/spark-1.0.1-bin-hadoop1/bin/spark-submit --class Main --properties-file spark.conf SparkES.jar
Warning: Ignoring non-spark config property: es.resource=myresource
Warning: Ignoring non-spark config property: es.nodes=mynode
Warning: Ignoring non-spark config property: es.query=myquery
...
Exception in thread "main" org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed
My config file looks like this (with the correct values):
es.nodes nodeip:port
es.resource index/type
es.query query
Setting the properties in the Configuration object in the code works, but I need to avoid this workaround.
Is there a way to set those properties via command line?
I don't know if you resolved your issue (if so, how?), but I found this solution:
import org.elasticsearch.spark.rdd.EsSpark
EsSpark.saveToEs(rdd, "spark/docs", Map("es.nodes" -> "10.0.5.151"))
Bye
When you pass a config file to spark-submit, it only loads configs that start with 'spark.'
So, in my config I simply use
spark.es.nodes <es-ip>
and in the code itself I have to do
val conf = new SparkConf()
conf.set("es.nodes", conf.get("spark.es.nodes"))
