I am using Kafka Connect to sink data into Elasticsearch. Usually, we ignore empty fields when persisting to Elasticsearch. Can we do the same using Kafka Connect?
Sample input
{"field1":"1","field2":""}
In the Elasticsearch index
{"field1":"1"}
In Kafka Connect there is a concept called an SMT (Single Message Transform). There are several built-in transformations, but none of them does what you are asking for. You can, however, write your own SMT that performs this action:
● Create your JAR file.
● Install the JAR file. Copy your custom SMT JAR file (and any non-Kafka JAR files required by the transformation) into a directory that is under one of the directories listed in the plugin.path property in the Connect worker configuration.
For further instructions, refer to:
https://docs.confluent.io/platform/current/connect/transforms/custom.html
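To make the idea concrete, below is a minimal sketch of what such a custom SMT could look like for schemaless (JSON) record values. The package and class names are made up, and records that carry a Connect schema (Struct values) would need additional handling that is omitted here.

// Minimal sketch of a custom SMT that drops empty-string fields from
// schemaless record values. Names are illustrative, not part of Kafka Connect.
package com.example.smt;

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

public class DropEmptyFields<R extends ConnectRecord<R>> implements Transformation<R> {

    @Override
    public R apply(R record) {
        // Only handle schemaless values, which arrive as a Map.
        if (!(record.value() instanceof Map)) {
            return record;
        }
        @SuppressWarnings("unchecked")
        Map<String, Object> value = (Map<String, Object>) record.value();

        Map<String, Object> cleaned = new HashMap<>();
        for (Map.Entry<String, Object> e : value.entrySet()) {
            // Keep the field unless its value is an empty string.
            if (!"".equals(e.getValue())) {
                cleaned.put(e.getKey(), e.getValue());
            }
        }
        return record.newRecord(record.topic(), record.kafkaPartition(),
                record.keySchema(), record.key(),
                record.valueSchema(), cleaned, record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef(); // no configuration options in this sketch
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // nothing to configure
    }

    @Override
    public void close() {
        // nothing to clean up
    }
}

Once the JAR is on the plugin path, the transform would be enabled in the sink connector configuration with something like transforms=dropEmpty and transforms.dropEmpty.type=com.example.smt.DropEmptyFields (again, illustrative names).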
I am trying to produce messages using kafka-console-producer from the Apache Kafka binaries and consume them with a consumer set up in Spring Boot. The consumer uses an Avro schema.
When a message is produced in JSON format, my consumer throws an exception - "not able to serialize".
I found a solution for this: use "Confluent Platform 7.1", which has kafka-avro-console-producer. It supports Avro, but it is an enterprise edition.
Is there a way to produce/consume messages with an Avro schema using Apache Kafka itself with kafka-console-producer?
kafka-console-producer only accepts UTF-8 strings by default; internally, it defaults to using StringSerializer.
The kafka-avro-console-producer wraps the other, and it is source-available, not Enterprise. You would need to download at least the Confluent kafka-avro-serializer JAR file(s) and their dependencies to even produce Avro data with it, and this will also require you to use the Schema Registry.
If you simply want to produce Avro binary data with no Registry, you can use the Avro BinaryEncoder class; however, this will require you to write your own deserializer in any consumer, rather than using the ones provided by Confluent (again, free, not Enterprise).
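As an illustration of that registry-less approach, here is a minimal sketch that serializes one record with BinaryEncoder and publishes the raw bytes with the plain kafka-clients producer. The topic name, schema, and bootstrap address are made-up placeholders, and any consumer must know the same schema to decode the bytes.

// Sketch: produce raw Avro binary (no Schema Registry framing) to Kafka.
import java.io.ByteArrayOutputStream;
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PlainAvroProducer {
    public static void main(String[] args) throws Exception {
        // Example schema with a single string field.
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Example\","
              + "\"fields\":[{\"name\":\"field1\",\"type\":\"string\"}]}");

        GenericRecord value = new GenericData.Record(schema);
        value.put("field1", "1");

        // Serialize the record to raw Avro binary.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(value, encoder);
        encoder.flush();

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("example-topic", out.toByteArray()));
        }
    }
}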
I want to use Apache NiFi to write some data encoded in Avro to Apache Kafka.
Therefore I use the ConvertRecord processor to convert from JSON to Avro. For Avro, the AvroRecordSetWriter with ConfluentSchemaRegistry is used. The schema URL is set to http://<hostname>:<port>/apis/ccompat/v6 (hostname/port are not important for this question). As a free alternative to the Confluent Schema Registry, I deployed an Apicurio Schema Registry. Its ccompat API should be compatible with Confluent.
But when I run the NiFi pipeline I get the following error, saying that the schema with the given name is not found:
Could not retrieve schema with the given name [...] from the configured Schema Registry
But I definitely created the Avro schema with this name in the web UI of the Apicurio Registry.
Can someone please help me? Is there anybody who is using NiFi for Avro encoding in Kafka with the Apicurio Schema Registry?
Update:
Here are some screenshots of my pipeline and its configuration.
Set schema name via UpdateAttribute
Use ConvertRecord with JsonTreeReader, ConfluentSchemaRegistry and AvroRecordSetWriter
Update 2:
This artifact id has to be set:
I am reading the configuration reference for JanusGraph (https://docs.janusgraph.org/0.2.0/config-ref.html) and I wonder: in which file should I write those values? Under the conf directory of JanusGraph there is no single common file into which I can write those options. I am so confused, and the documentation does not specify it!
These configuration options should be put into a .properties configuration file. The JanusGraph distribution zip archive already contains example configuration files for different backends that can be used as a basis, for example for Cassandra with Elasticsearch: conf/janusgraph-cql-es.properties.
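For illustration, a minimal sketch of such a properties file for the Cassandra-with-Elasticsearch combination could look like the following (hostnames are placeholders; the bundled conf/janusgraph-cql-es.properties contains a fully commented version):

# Minimal sketch of a JanusGraph properties file (hostnames are placeholders)
storage.backend=cql
storage.hostname=127.0.0.1
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1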
How these configuration files are provided to JanusGraph depends on whether you want to use JanusGraph embedded or via JanusGraph Server (which is the recommended approach). If you want to use JanusGraph Server, then you need a section that specifies this configuration file for JanusGraph in the configuration file of the server, which is conf/gremlin-server/gremlin-server.yaml by default:
graphs: {
graph: conf/janusgraph-cql-es.properties
}
The chapter Using Configuration of the JanusGraph docs contains more information about how the configuration can be applied.
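If you instead embed JanusGraph in your own application, the same properties file is simply passed to JanusGraphFactory. A minimal sketch, with the file path as an example:

// Sketch: open JanusGraph embedded from a properties file and run one query.
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class EmbeddedJanusGraphExample {
    public static void main(String[] args) throws Exception {
        JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-cql-es.properties");
        GraphTraversalSource g = graph.traversal();
        long vertexCount = g.V().count().next(); // simple sanity query
        System.out.println("vertices: " + vertexCount);
        graph.close();
    }
}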
I'm new to Apache Kafka and I'm trying to create multiple tasks, each one with a separate purpose. But the source connector method taskClass() returns only one task class. So how do I create more?
Deploy multiple connector instances using the plugin you require. You have gone too low-level by looking at taskClass() etc. If you can't get the data from multiple sources with a single connector configuration, just create additional connector configurations. Each is just a config file. One connector = one task (or more, if scaled out).
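For example, two independent connector instances are just two configurations; with the Connect REST API each one is a JSON payload like the sketch below (the file-based example connector shipped with Apache Kafka is only an illustration, substitute whichever plugin you actually use):

{
  "name": "file-source-a",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input-a.txt",
    "topic": "topic-a"
  }
}

A second instance would use a different name (for example file-source-b) and its own file/topic settings; Connect runs each one with its own task(s).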
When I try to connect the Elasticsearch JDBC river plugin with Postgres or an H2 DB to get the data into the Elasticsearch engine, it behaves properly.
But in the case of Informix it always gives this kind of error:
java.sql.SQLException: No suitable driver found for jdbc:informix-sqli:
even after I put the JAR file into the plugin/jdbc folder.
Does anybody have any idea about that?
The issue was with the JAR. I had all 6 JARs, but the thing was that the Elasticsearch engine accepts a JAR in a specific way, which means the JAR should contain META-INF/services/java.sql.Driver. That was not there, so I had to explicitly mention the driver name in the Elasticsearch configuration, which is:
set JAVA_OPTS=%JAVA_OPTS% -Djdbc.drivers=com.informix.jdbc.IfxDriver
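For what it's worth, a quick way to see which JDBC drivers the JVM has actually registered (and therefore whether the -Djdbc.drivers workaround took effect) is to enumerate them via DriverManager; this is only a diagnostic sketch:

// Sketch: print every JDBC driver currently registered with DriverManager.
// "No suitable driver found" means the Informix driver is missing from this list.
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.Enumeration;

public class ListJdbcDrivers {
    public static void main(String[] args) {
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            System.out.println(drivers.nextElement().getClass().getName());
        }
    }
}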