StreamSets pipeline doesn't finish if the Kafka consumer has 0 messages in the topic - apache-kafka-streams

I have developed a StreamSets pipeline which uses a Kafka consumer as the origin. My pipeline works fine if the Kafka consumer has messages to read, but if the topic has 0 messages the pipeline goes into a loop, runs continuously, and never finishes.
I need the pipeline to finish if the Kafka consumer has zero messages in its topic.

By default it is a streaming application's nature to keep checking for messages, but a similar approach to what you want to achieve has been answered here.

Related

How to change the offset of a topic during runtime?

I have a producer which keeps pushing messages to a Kafka topic, and another service reading these messages from that topic.
I have a business use-case where the consumer sometimes needs to ignore all the messages that are already in the queue and start processing only new incoming messages. Can this be achieved without stopping and restarting the Kafka server?
I am working in Go, so if Kafka supports this requirement, is there any way I can change the consumer configuration to start consuming from the latest message using the Sarama Go client?
Thank you in advance.
You could use a random UUID for the consumer group id, and/or disable auto commits; then you can start at the latest offset with
config := sarama.NewConfig()
// Initial is only used when the group has no committed offset yet
config.Consumer.Offsets.Initial = sarama.OffsetNewest
(adapted from the Sarama example code)
Otherwise, the Kafka consumer API has a seekToEnd function, but in Sarama this seems to be exposed as getting the high watermark from the consumer for every partition and then calling ResetOffset on a ConsumerGroup instance. Note: the group should be paused before doing that.
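For comparison, the plain Java consumer client does expose seekToEnd directly. A minimal sketch, where the bootstrap server, topic and group id are placeholder values:
import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SkipToLatest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // jump past everything already in the topic as soon as partitions are assigned
                consumer.seekToEnd(partitions);
            }
        });
        while (true) {
            consumer.poll(Duration.ofSeconds(1))
                    .forEach(record -> System.out.println(record.value()));
        }
    }
}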

Does Spring Kafka producer guarantee delivery by default?

I wonder whether the Spring Kafka producer within Spring Boot guarantees delivery or not.
Does anybody know what happens if some random listener fails to receive a message? Would Spring Kafka retry sending it?
There are a few concepts here:
The producer produces events and sends them to the Kafka server. On the producer side you must take care of retries and similar concerns yourself if Kafka has downtime or hits other error scenarios specific to your context.
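As a hedged sketch of the usual at-least-once settings with the plain Java producer client (not the Spring KafkaTemplate specifically); the broker address and topic name are placeholders:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ReliableProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.ACKS_CONFIG, "all");   // wait for all in-sync replicas to acknowledge
        props.put(ProducerConfig.RETRIES_CONFIG, "5");  // retry transient broker errors
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"), (metadata, exception) -> {
                if (exception != null) {
                    // the broker never acknowledged this record: log, alert or re-send here
                    exception.printStackTrace();
                } else {
                    System.out.printf("written to %s-%d@%d%n",
                            metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
        }
    }
}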
Consumers are assigned partitions by Kafka; each partition delivers events, and each event has an offset. Consumers poll Kafka for data (they request data; Kafka does not push data to consumers). Every event that Kafka delivers successfully to a consumer produces an acknowledgment, and Kafka commits the offset of that event, so the next event, with a higher offset, is delivered to the consumer. If a consumer goes down, its partitions are reassigned to other consumers, so you won't lose your data. If you have only one consumer, the data is stored in Kafka, and when the consumer comes back it will request data from the latest/earliest offset.
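A rough illustration of that poll/acknowledge cycle with the plain Java consumer, committing the offset only after a record has been processed; topic, group and broker values are placeholders:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");   // commit only after processing
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    process(record);            // if this throws, the offset below is never committed
                }
                consumer.commitSync();          // acknowledge everything returned by this poll
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("offset %d: %s%n", record.offset(), record.value());
    }
}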

Spring Integration - Kafka Message Driven Channel - Auto Acknowledge

I have used the sample configuration listed in the spring.io docs and it is working fine.
<int-kafka:message-driven-channel-adapter
id="kafkaListener"
listener-container="container1"
auto-startup="false"
phase="100"
send-timeout="5000"
channel="nullChannel"
message-converter="messageConverter"
error-channel="errorChannel" />
However, I was testing it with a downstream application where I consume from Kafka and publish to the downstream system. If the downstream system is down, the messages still get consumed and are not replayed.
Or let's say that after consuming from the Kafka topic I find some exception in a service activator; I want to throw an exception that rolls back the transaction so that the Kafka messages can be replayed.
In brief, if the consuming application has a problem, I want to roll back the transaction so that the messages are not automatically acknowledged and are replayed again and again until they are successfully processed.
That's not how Apache Kafka works. There are no transaction semantics similar to JMS; the offset in a Kafka topic has nothing to do with rollback or redelivery.
I suggest you study Apache Kafka more closely, starting from its official documentation.
Spring Kafka adds nothing on top of the regular Apache Kafka protocol here; however, you can consider using the retry capabilities in Spring Kafka to redeliver the same record locally: http://docs.spring.io/spring-kafka/docs/1.2.2.RELEASE/reference/html/_reference.html#_retrying_deliveries
And yes, the ack mode must be MANUAL; do not commit the offset to Kafka automatically after consuming.
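A hedged sketch of what MANUAL ack mode can look like, assuming a recent spring-kafka where ContainerProperties.AckMode and the Acknowledgment listener argument are available (in the 1.2.x line linked above the AckMode enum lived on AbstractMessageListenerContainer instead); the topic/group names and the DownstreamForwarder class are placeholders:
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Configuration
class ManualAckConfig {
    @Bean
    ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // commit the offset only when the listener explicitly calls acknowledge()
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
        return factory;
    }
}

@Component
class DownstreamForwarder {
    @KafkaListener(topics = "my-topic", groupId = "my-group")
    public void listen(String payload, Acknowledgment ack) {
        // call the downstream system here; if it throws, acknowledge() is never reached,
        // the offset is not committed, and the record can be picked up again later
        ack.acknowledge();
    }
}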

Messages published to all consumers with the same consumer-group in a spring-data-stream project

I have Zookeeper and 3 Kafka brokers running locally.
I started one producer and one consumer, and I can see the consumer consuming messages.
I then started three consumers with the same consumer group name (on different ports, since it's a Spring Boot project). What I found is that all the consumers are now consuming (receiving) the messages, but I expected the messages to be load-balanced so that no message is repeated across the consumers. I don't know what the problem is.
Here is my property file
spring.cloud.stream.bindings.input.destination=timerTopicLocal
spring.cloud.stream.kafka.binder.zkNodes=localhost
spring.cloud.stream.kafka.binder.brokers=localhost
spring.cloud.stream.bindings.input.group=timerGroup
Here the group is timerGroup.
consumer code : https://github.com/codecentric/edmp-sample-stream-sink
producer code : https://github.com/codecentric/edmp-sample-stream-source
Can you please update the dependencies to Camden.RELEASE (and start using Kafka 0.9+)? In Brixton.RELEASE the Kafka consumers were 0.8-based and required passing instanceIndex/instanceCount as properties in order to distribute partitions correctly.
In Camden.RELEASE we use the Kafka 0.9+ consumer client, which does the load-balancing the way you expect (we also support static partition allocation via instanceIndex/instanceCount, but I suspect that is not what you want). I could go into more detail on how to configure this with Brixton, but I guess an upgrade would be a much easier path.
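For reference, the static partition allocation mentioned above is driven by properties roughly like the following (a sketch only; the values are placeholders, and instanceIndex would be 0, 1 and 2 on the three instances respectively):
spring.cloud.stream.instanceCount=3
spring.cloud.stream.instanceIndex=0
spring.cloud.stream.bindings.input.consumer.partitioned=true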

Kafka consumers able to process tons of messages

In the general case a Kafka consumer could be anything that connects to Kafka and gets messages.
I'm interested in established Kafka consumers for two purposes:
1) process messages and save the result in a DB (Oracle)
2) process messages and save the result in files
What established Kafka consumers can you suggest?
Thanks.
You can use the Camus consumer for Kafka->HDFS. It is a MapReduce job that does distributed data loads out of Kafka.
