Kafka cluster unavailable and how to listen for reconnection - spring-boot

In one of our Spring Boot applications I'm developing a retry mechanism. Below is a summary of what this new feature needs to do:
If the Kafka cluster is unavailable for whatever reason, the application should keep running, because records should still be inserted into the database.
When the Kafka cluster is available again, a producer should send all newly inserted records to a topic.
What I'm looking for is an event that tells me Kafka is back up and running. So far I have been unable to find anything like this. What I did find are forum posts saying this is not supported. I was wondering whether somebody has experience implementing this.
If you have any questions for me to clarify, please let me know.
We are using spring-kafka 2.8.2
Kind regards,
Josip
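One way to approximate such an event, given that (per the forum posts mentioned above) no such event is published out of the box: poll the cluster with Kafka's AdminClient and flush the buffered rows once it responds again. A minimal sketch, assuming @EnableScheduling is active, an AdminClient bean is available, and a hypothetical UnsentRecordRepository holds the rows written while Kafka was down:

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.admin.AdminClient;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class KafkaRecoveryPoller {

    /** Hypothetical repository for rows inserted while Kafka was down. */
    public interface UnsentRecordRepository {
        List<UnsentRecord> findUnsent();
        void markSent(UnsentRecord record);
    }

    /** Hypothetical row type. */
    public record UnsentRecord(long id, String payload) {}

    private final AdminClient adminClient;
    private final KafkaTemplate<String, String> kafkaTemplate;
    private final UnsentRecordRepository repository;

    public KafkaRecoveryPoller(AdminClient adminClient,
                               KafkaTemplate<String, String> kafkaTemplate,
                               UnsentRecordRepository repository) {
        this.adminClient = adminClient;
        this.kafkaTemplate = kafkaTemplate;
        this.repository = repository;
    }

    @Scheduled(fixedDelay = 10_000) // poll every 10 seconds
    public void resendWhenKafkaIsBack() {
        try {
            // Throws if no broker answers within the timeout: cluster still down.
            adminClient.describeCluster().nodes().get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            return; // Kafka still unavailable; keep buffering in the database.
        }
        // Cluster answered: send everything inserted while Kafka was down.
        // A real implementation would mark rows as sent only after the broker
        // acknowledges the send; this sketch keeps it simple.
        for (UnsentRecord record : repository.findUnsent()) {
            kafkaTemplate.send("records", record.payload()); // topic name assumed
            repository.markSent(record);
        }
    }
}
```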

Related

A topic lost its subscription node during run time

Spring Boot version: 2.5.0
Spring Cloud version: 2020.0.3
I use spring-cloud-stream-binder-kafka and spring-cloud-stream-binder-kafka-streams for Kafka production and consumption in the project.
In one project, I subscribed to N topics.
Two service nodes were started behind a load balancer.
During run time, we suddenly discovered that one of the topics had no subscribing nodes.
This resulted in messages being backlogged and lost.
I had to restart these service nodes before I could subscribe to this topic again.
What is the cause of this, or is there any way to help find some clues?
And is there a way to check at run time so that topics that have lost their subscriptions can be re-subscribed?
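One way to look for clues at run time (a sketch, not a fix for the root cause): ask the broker which topic-partitions the consumer group currently has assigned, and flag expected topics that are missing from that set. A flagged topic could then be re-subscribed, for example by restarting the affected binding or listener container. The group id and topic set come from the caller; nothing here is specific to this thread.

```java
import java.util.Set;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ConsumerGroupDescription;

public class SubscriptionChecker {

    /** Returns the expected topics that no live group member is assigned to. */
    public static Set<String> topicsWithoutSubscribers(
            AdminClient admin, String groupId, Set<String> expectedTopics)
            throws Exception {
        ConsumerGroupDescription group = admin
                .describeConsumerGroups(Set.of(groupId))
                .describedGroups()
                .get(groupId)
                .get();
        // Every topic that at least one group member is currently assigned to.
        Set<String> assigned = group.members().stream()
                .flatMap(member -> member.assignment().topicPartitions().stream())
                .map(tp -> tp.topic())
                .collect(Collectors.toSet());
        // Expected topics with no assignment have lost their subscribers.
        return expectedTopics.stream()
                .filter(topic -> !assigned.contains(topic))
                .collect(Collectors.toSet());
    }
}
```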

Duplicate consumption of messages with Spring Cloud Stream Kafka binder

We have several micro-services using Spring Boot and Spring Cloud Stream Kafka binder to communicate between them.
Occasionally, we observe bursts of duplicate messages received by a consumer - often several days after the message was first consumed and (successfully) processed.
While I understand that Kafka does not guarantee exactly-once delivery, this still looks very strange, given that there were no rebalancing events or any 'suspicious' activity in the logs of either the brokers or the services. Since the consumer interacts with external APIs, it is a bit difficult to make it idempotent.
Any hints what might be the cause of duplication? What should I be looking for to figure this out?
We are using Kafka broker 1.0.0, and this particular consumer uses Spring Cloud Stream Binder Kafka 2.0.0, which is based on kafka-client 1.0.2 (version of the other services might be a bit different).
You should show your configuration when asking questions like this.
Best guess is the broker's offsets.retention.minutes.
With modern broker versions (since 2.0), it defaults to 1 week; with older versions it was only one day.
Once a group's committed offsets expire, a restarted or rebalanced consumer falls back to auto.offset.reset and can re-read records it already processed - which would explain duplicates appearing days later.
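For illustration, a minimal consumer-configuration sketch showing the client-side setting that interacts with the broker's offsets.retention.minutes; the bootstrap server and group id are placeholders:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OffsetResetConfig {

    public static Map<String, Object> consumerProps() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");       // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // If the group's committed offsets have expired on the broker, this
        // setting decides where the consumer resumes: "earliest" replays old
        // records (the duplicates described above); "latest" skips them at the
        // risk of missing records.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        return props;
    }
}
```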

Is it possible to send WebSocket messages to a Kafka topic?

I am trying to find a way to consume messages that are being sent over a WebSocket and add them to a Kafka topic (the messages are sent by the WebSocket to the address 'ws://address:port/topic_name', and I want to add all of those messages to a Kafka topic).
I read about Kafka Connect and tried to find a way to do it with that, but it doesn't seem to work...
thanks in advance :)
There is no Kafka Connector to a socket in Confluent Platform.
I work in a team that uses Kafka in production and our source is a socket, so your options are to use a platform that supports this socket-to-Kafka producing, or to write one yourself.
As for possible platforms, I think most of them will be overkill, though you can use them for this problem. Some options are:
1. NiFi, or MiNiFi for smaller loads, with the PublishKafka processor
2. StreamSets with a Kafka Producer destination
3. Apache Flume - not really recommended, as the project has stopped evolving
If you wish to write your own producer, you basically have to create a listener on this port and produce the incoming messages to Kafka; if this is a WebSocket, just take the payload of each request and produce it to Kafka (a minimal sketch follows the list of projects below).
Example Kafka producer code can be copied from the tutorialspoint simple producer example.
Here are some open-source projects examples:
1. https://github.com/DataReply/kafka-connect-socket-source
2. https://github.com/kafka-socket/miniature_engine
3. https://github.com/dhanuka84/kafka-connect-tcp
4. https://github.com/krux/tcp-stream-kafka-producer
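As referenced above, a minimal sketch of the write-it-yourself option, under the assumption that the source is a plain TCP socket on port 9999 and that each line is one message; the topic name "socket-events" and broker address are made up for illustration:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SocketToKafka {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             ServerSocket server = new ServerSocket(9999);   // assumed port
             Socket client = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                // One line read from the socket becomes one Kafka record.
                producer.send(new ProducerRecord<>("socket-events", line));
            }
        }
    }
}
```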
The idea of Kafka Connect is that you have some sort of external integration that serves as storage. This can be SAP, Salesforce, an RDBMS, MQ, or anything else that has state. Your WebSocket endpoint does not hold data; you cannot poll it - someone else is invoking it, and that is how the data is transferred. Now, if you know who is actually holding the data, then you can potentially build a connector using this guide: https://docs.confluent.io/current/connect/devguide.html
For your particular case, the best you can do is either to use the Kafka Producer API https://docs.confluent.io/current/clients/producer.html
and have your WebSocket endpoint use this producer to post messages to the topic, or, even better, if you are using Spring, you can use a higher-level abstraction, namely KafkaTemplate: https://docs.spring.io/spring-kafka/reference/html/#sending-messages.
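A minimal sketch of the KafkaTemplate approach: a Spring WebSocket handler that forwards every incoming text frame to a Kafka topic. The topic name "topic_name" mirrors the address in the question; the handler registration and KafkaTemplate wiring are assumed to exist elsewhere in the application.

```java
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;
import org.springframework.web.socket.TextMessage;
import org.springframework.web.socket.WebSocketSession;
import org.springframework.web.socket.handler.TextWebSocketHandler;

@Component
public class KafkaForwardingHandler extends TextWebSocketHandler {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public KafkaForwardingHandler(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    @Override
    protected void handleTextMessage(WebSocketSession session, TextMessage message) {
        // Forward the WebSocket payload to Kafka as-is.
        kafkaTemplate.send("topic_name", message.getPayload());
    }
}
```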
Full disclosure: I work for MigratoryData.
You can check out MigratoryData's solution for Kafka. MigratoryData is a scalable WebSocket server. The MigratoryData Source/Sink Connector for Kafka makes use of the Kafka Connect API and can be used to stream data in real time from Kafka to WebSocket clients and vice versa. The main advantage of the solution is that it extends Kafka messaging to WebSocket clients while preserving Kafka's key features like guaranteed delivery, message ordering, etc.

For a Spring enterprise web application with multiple instances, what is the way to retrieve the offset value from Kafka and store it?

I'm working on an enterprise web application that has a requirement to read from a Kafka system and then trigger events. Can anyone suggest a way to get the offset, and also an ideal way to store it (the ideal way should handle access by multiple instances of the application)?
Note:
I'm using spring-kafka and am open to any further suggestions.
Thanks in advance.
With recent versions of Kafka, the offset is stored in a Kafka topic. Kafka keeps track of the consumer offset for each partition in a compacted topic named __consumer_offsets; in other words, Kafka itself keeps track of the offset for each consumer group.
With Spring for Apache Kafka, several options are provided for when the offset is committed.
In earlier versions of Kafka, offsets were often stored externally; it's now a lot simpler.
There may still be use cases for that, but such scenarios are all supported by Spring Kafka, especially with the upcoming 2.0 release.
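A minimal sketch of one of those commit options: with manual acknowledgment, the offset is committed to __consumer_offsets only when the listener acknowledges, so every instance of the application shares the same consumer-group offset tracking. The topic and group names are assumptions, and the container factory must be configured with ContainerProperties.AckMode.MANUAL (or MANUAL_IMMEDIATE) for the Acknowledgment parameter to be injected.

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
public class EventTriggeringListener {

    // Assumes the listener container is configured with AckMode.MANUAL.
    @KafkaListener(topics = "events", groupId = "web-app") // names assumed
    public void onMessage(String record, Acknowledgment ack) {
        // ... trigger the application event here ...
        ack.acknowledge(); // commit the offset only after successful processing
    }
}
```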

How to get Kafka bootstrap configuration setting from Kafka connector

Question for Kafka experts:
Does anyone know how to get the worker's config setting 'bootstrap.servers' from either a SinkConnector or a SinkTask? Or how to get Kafka cluster information from a connector?
Thank you
AFAIK, the Connect API doesn't provide this information right now.
If you need this functionality, perhaps your best bet is to open a JIRA on the Apache Kafka project and explain the use case (i.e., what you are planning on doing with this information).
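Since the Connect API doesn't expose the worker config, a common workaround (sketched below; this is not an official Connect API feature) is to repeat the setting in the connector's own configuration so the task can read it from the properties passed to start():

```java
import java.util.Map;
import org.apache.kafka.connect.sink.SinkTask;

public abstract class BootstrapAwareSinkTask extends SinkTask {

    private String bootstrapServers;

    @Override
    public void start(Map<String, String> props) {
        // Workaround: the connector configuration must repeat the worker's
        // setting, e.g. "bootstrap.servers": "broker1:9092,broker2:9092",
        // so it shows up in the properties handed to the task.
        bootstrapServers = props.get("bootstrap.servers");
    }

    /** Cluster address as duplicated in the connector config. */
    protected String bootstrapServers() {
        return bootstrapServers;
    }
}
```

The obvious drawback is that the value has to be kept in sync with the worker config by hand, which is why the answer above suggests requesting proper API support via a JIRA.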
