Kafka Console Producer using avro - spring-boot

I am trying to produce messages with kafka-console-producer from the Apache Kafka binary distribution and consume them with a consumer set up in Spring Boot. The consumer uses an Avro schema.
When a message is produced in JSON format, my consumer throws an exception: “not able to serialize”.
One solution I found is to use “Confluent Platform 7.1”, which ships kafka-avro-console-producer. It supports Avro, but it is an enterprise edition.
Is there a way to produce/consume messages with an Avro schema using Apache Kafka itself, i.e. with kafka-console-producer?

kafka-console-producer only accepts UTF-8 strings by default; internally, it defaults to StringSerializer.
The kafka-avro-console-producer wraps the plain console producer, and it's source-available, not Enterprise. You'd need to download at least Confluent's kafka-avro-serializer JAR file(s) and their dependencies to even produce Avro data with it, and this will also require you to run the Schema Registry.
If you simply want to produce Avro binary data with no Registry, you can use Avro's BinaryEncoder class; however, this means you will also need your own deserializer in any consumer, rather than the ones provided by Confluent (again, free, not Enterprise).
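As a rough illustration of that last approach, here is a minimal sketch that serializes one record with Avro's BinaryEncoder and sends the raw bytes using the plain Java producer; the schema, record values, topic name, and broker address are placeholders, not taken from the question:

import java.io.ByteArrayOutputStream;
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PlainAvroProducer {
    public static void main(String[] args) throws Exception {
        // Placeholder schema; replace with your own
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}");

        GenericRecord record = new GenericData.Record(schema);
        record.put("name", "alice");
        record.put("age", 30);

        // Raw Avro binary, with no Schema Registry framing bytes
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        // Any consumer of this topic must know the writer schema to decode the bytes
        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("users", "alice", out.toByteArray()));
        }
    }
}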

Related

Local verification of avro schema validity and compatibility

We're using Avro for (de)serialization of messages that flow through a message broker. For storing the Avro schema files, a schema registry (Apicurio) is used. This provides two benefits: schema validation and compatibility validation. However, I'm wondering if there is a way to bypass the schema registry and achieve the same locally, using a script/plugin. Validating whether an Avro file is syntactically/semantically valid should be possible. The same applies to compatibility validation: checking whether a new schema version is backward/forward compatible against a list of other schemas (the previous versions) also sounds doable locally.
Is there a library that does that? Ideally a Gradle plugin, but a Java/Python library would do as well, since it can easily be called from a Gradle task.
I believe this is Confluent's Java class for checking schema compatibility within its Schema Registry:
https://github.com/confluentinc/schema-registry/blob/master/core/src/test/java/io/confluent/kafka/schemaregistry/avro/AvroCompatibilityTest.java
You can use it to validate schemas locally.
Expedia has used it as a basis to create their own compatibility tool:
https://github.com/ExpediaGroup/avro-compatibility
I could not find a plugin that does exactly what you are asking for. Plugins seem to focus on generating classes from the schema files (e.g. https://github.com/davidmc24/gradle-avro-plugin). Without getting into why you want to do this, I think you could use a simple approach (see "How do I call a static Java method from Gradle") to hook your custom code into Gradle and check for schema validity and compatibility.
Refer to the following Avro Java API classes:
https://avro.apache.org/docs/current/api/java/org/apache/avro/SchemaCompatibility.html
https://avro.apache.org/docs/current/api/java/org/apache/avro/SchemaValidatorBuilder.html
Checking this particular class can also be helpful for executing validation against a schema:
https://github.com/apache/avro/blob/master/lang/java/tools/src/main/java/org/apache/avro/tool/DataFileReadTool.java
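A minimal sketch of a local check using those two Avro classes (the schema definitions below are made-up placeholders, not your actual schemas):

import java.util.Collections;

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaValidationException;
import org.apache.avro.SchemaValidator;
import org.apache.avro.SchemaValidatorBuilder;

public class LocalSchemaCheck {
    public static void main(String[] args) throws SchemaValidationException {
        // Hypothetical previous and new versions of the same record schema
        Schema v1 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"}]}");
        Schema v2 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\",\"default\":0}]}");

        // Pairwise check: can a reader using v2 read data written with v1?
        SchemaCompatibility.SchemaPairCompatibility result =
            SchemaCompatibility.checkReaderWriterCompatibility(v2, v1);
        System.out.println("Reader/writer compatibility: " + result.getType());

        // Validate the new schema against the list of previous versions;
        // throws SchemaValidationException if the chosen strategy is violated.
        SchemaValidator validator =
            new SchemaValidatorBuilder().canReadStrategy().validateAll();
        validator.validate(v2, Collections.singletonList(v1));
    }
}

Either check could be wrapped in a static method and invoked from a Gradle task, as suggested in the other answer.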

Using Apache NiFi's ConfluentSchemaRegistry with Apicurio Schema Registry

I want to use Apache NiFi for writing some data encoded in Avro to Apache Kafka.
Therefore I use the ConvertRecord processor to convert from JSON to Avro. For Avro, the AvroRecordSetWriter with ConfluentSchemaRegistry is used. The schema URL is set to http://<hostname>:<port>/apis/ccompat/v6 (hostname/port are not important for this question). As a free alternative to the Confluent Schema Registry, I deployed an Apicurio Schema Registry; its ccompat API should be compatible with Confluent.
But when I run the NiFi pipeline, I get the following error, saying that the schema with the given name is not found:
Could not retrieve schema with the given name [...] from the configured Schema Registry
But I definitely created the Avro schema with this name in the web UI of the Apicurio Registry.
Can someone please help me? Is there anybody using NiFi for Avro encoding in Kafka with the Apicurio Schema Registry?
Update:
Here are some screenshots of my pipeline and its configuration: the schema name is set via UpdateAttribute, and ConvertRecord is used with JsonTreeReader, ConfluentSchemaRegistry, and AvroRecordSetWriter.
Update 2:
This artifact id has to be set.
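As a general illustration of where that artifact/subject id comes into play: NiFi's ConfluentSchemaRegistry looks the schema up by name through the Confluent-compatible API, so the schema needs to be registered under a matching subject. Below is a rough sketch of registering a schema through Apicurio's ccompat endpoint using Java's built-in HTTP client; the host, port, subject name, and schema are placeholders:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterSchemaViaCcompat {
    public static void main(String[] args) throws Exception {
        // Placeholders: registry address and the subject name NiFi will look up
        String registry = "http://localhost:8080/apis/ccompat/v6";
        String subject = "users";

        // Confluent-compatible payload: {"schema": "<schema as an escaped JSON string>"}
        String body = "{\"schema\": \"{\\\"type\\\":\\\"record\\\",\\\"name\\\":\\\"User\\\","
                    + "\\\"fields\\\":[{\\\"name\\\":\\\"name\\\",\\\"type\\\":\\\"string\\\"}]}\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(registry + "/subjects/" + subject + "/versions"))
                .header("Content-Type", "application/vnd.schemaregistry.v1+json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}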

Spring boot Kafka - Confusion over Avro object serialisation and use cases

I think I'm not grasping some basic concepts of Kafka here, so I'm hoping Stack Overflow may be able to help.
I've been trying to learn Kafka with Spring Boot by following this Git repo here:
I understand how, without Avro, to take a Java class from one microservice, send it to Kafka, and consume/deserialise it on another microservice. However, I dislike that idea, as it means I must have an identical class on the other microservice in terms of package location/name, etc.
So overall I have two questions here, I guess:
I want to understand how I can share message across my spring boot microservices and map them to classes without copying said classes from one service to the other
I want to be able to consume from my Spring Kafka listeners messages created from another language say C#
Where I'm currently at: I have the Avro example from the repo above up and running, along with my local Kafka and Schema Registry instances.
However, if I create a duplicate class, call it UserTest (for example), and make it identical to the User class consumed here, I get stack traces like the following:
Caused by: org.springframework.messaging.converter.MessageConversionException: Cannot convert from [io.confluent.developer.User] to [io.confluent.developer.kafkaworkshop.streams.User] for GenericMessage [payload={"name": "vik", "age": 33}, headers={kafka_offset=6, kafka_consumer=org.apache.kafka.clients.consumer.KafkaConsumer#54200a0e, kafka_timestampType=CREATE_TIME, kafka_receivedMessageKey=vik, kafka_receivedPartitionId=1, kafka_receivedTopic=users12, kafka_receivedTimestamp=1611278455840, kafka_groupId=simple-consumer}]
Am I missing something exceptionally basic here? I thought that once the message was sent in Avro format, it could be consumed and mapped to any other object that had the same fields... that way, if the object was created in C#, the Spring service would be able to interpret it, no?
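To illustrate the field-based mapping I have in mind, here is a minimal sketch of consuming the same record as a GenericRecord instead of a specific generated class; it assumes Spring Kafka with Confluent's KafkaAvroDeserializer and specific.avro.reader left at false (the property names are Confluent's; the topic and group ids are the ones from the stack trace above):

import org.apache.avro.generic.GenericRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class GenericUserListener {

    // Assumed consumer configuration (application properties):
    //   spring.kafka.consumer.value-deserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer
    //   spring.kafka.properties.schema.registry.url=http://localhost:8081
    //   spring.kafka.properties.specific.avro.reader=false   (the default)

    @KafkaListener(topics = "users12", groupId = "simple-consumer")
    public void listen(GenericRecord user) {
        // Fields are accessed by name, independent of any generated Java class
        String name = user.get("name").toString();
        int age = (int) user.get("age");
        // ...map into whatever local class this service defines
        System.out.println(name + " is " + age);
    }
}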
If anyone can help me that would be great....
Thanks!

How do I share the SpecificRecord POJOs auto-generated by avro-maven-plugin between applications

I'm new to Kafka and Avro. I have been reading about how Avro provides a nice Maven utility, avro-maven-plugin, for generating Avro POJO record classes. I also understand that using the schema registry is the way to have a centralized place for sharing schema evolutions across multiple applications, mostly consumers.
All the examples that I have been looking at have the consumer and producer in the same Java application, using the same generated SpecificRecord. That is fine for examples; however, I'm planning to have multiple consumers whose source code lives in different repositories.
How do you maintain and share the auto-generated SpecificRecord POJOs?
Is sharing SpecificRecord POJOs a thing, or should each application deal with deserialization using only the Avro definition from the schema registry?
Do you create a Java package dependency, or what would be a good practice?

settings needed for kafka-connect framework to achieve exactly once semantics

I am under the impression that in Kafka Connect you can specify some parameters in the .properties files to turn on exactly-once semantics.
I have been unable to find these settings, but I have found other ways of achieving this, such as
https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
and even older,
https://cwiki.apache.org/confluence/display/KAFKA/FAQ
Is it possible to achieve exactly-once semantics by changing settings in Kafka Connect?
Kafka Connect does not support exactly-once semantics at the framework level. There are individual connectors (for instance, the HDFS connector that Confluent provides) that offer exactly-once semantics, but it's not a framework-level configuration at this time.
