Kafka consumer able to process tons of messages - performance

In the general case, a Kafka consumer can be anything that connects to Kafka and gets messages.
I'm interested in established Kafka consumers for two purposes:
1) process messages and save the result in a DB (Oracle)
2) process messages and save the result in files
What established Kafka consumers can you suggest?
Thanks.

You can use the Camus consumer for Kafka->HDFS. It is a MapReduce job that does distributed data loads out of Kafka.
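If Camus is heavier than you need, a plain consumer loop that drains a topic into a file also works for the second use case. Below is a minimal sketch using the standard Java consumer client; the broker address, group id, topic name, and output path are placeholders:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.io.BufferedWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class FileSinkConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "file-sink");               // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             BufferedWriter out = Files.newBufferedWriter(Paths.get("out.txt"))) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                // poll for a batch of records and append each value as a line
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    out.write(record.value());
                    out.newLine();
                }
                out.flush(); // flush after each batch so data reaches disk
            }
        }
    }
}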

Related

StreamSets pipeline doesn't finish if the Kafka consumer has 0 messages in the topic

I have developed a StreamSets pipeline that uses a Kafka consumer as its origin. The pipeline works fine if the Kafka consumer has messages in it, but if the consumer has 0 messages, the pipeline goes into a loop, runs continuously, and never finishes.
I need the pipeline to finish if the Kafka consumer has zero messages in its topic.
By default, it is a streaming application's nature to keep checking for messages, but a similar approach to what you want to achieve has been answered here.

Is it possible to send websocket messages to a Kafka topic?

I am trying to find a way to consume messages that are being sent by a websocket to a Kafka topic (the messages are sent by the websocket to the address 'ws://address:port/topic_name', and I want to add all of those messages to a Kafka topic).
I read about Kafka Connect and tried to find a way to do it with that, but it doesn't seem to work...
Thanks in advance :)
There is no Kafka connector for a socket in Confluent Platform.
I work on a team that uses Kafka in production, and our source is a socket, so your options are to use a platform that supports socket->Kafka producing, or to write one yourself.
As for possible platforms, I think most of them would be overkill, though you can use them for this problem; some options are:
1. NiFi or MiNiFi for smaller loads, using the PublishKafka processor
2. StreamSets with the Kafka Producer destination
3. Apache Flume - not really recommended; the project has stopped evolving
If you wish to write your own producer, you basically have to create a listener on the port and produce the incoming messages to Kafka; if it is a websocket, just take the payload of each message and produce it to Kafka.
Example Kafka producer code can be copied from the tutorialspoint simple producer example.
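As an illustration of such a bridge, here is a minimal sketch using the standard Java producer client and the JDK's built-in WebSocket client; the class name, broker address, websocket URL, and topic are placeholders:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.Properties;
import java.util.concurrent.CompletionStage;

public class WsToKafkaBridge {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        WebSocket.Listener listener = new WebSocket.Listener() {
            @Override
            public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
                // forward each websocket frame's payload to the Kafka topic
                producer.send(new ProducerRecord<>("topic_name", data.toString()));
                ws.request(1); // ask for the next frame
                return null;
            }
        };

        HttpClient.newHttpClient()
                .newWebSocketBuilder()
                .buildAsync(URI.create("ws://localhost:8080/topic_name"), listener) // placeholder URL
                .join();

        Thread.currentThread().join(); // keep the bridge running until interrupted
    }
}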
Here are some open-source project examples:
1. https://github.com/DataReply/kafka-connect-socket-source
2. https://github.com/kafka-socket/miniature_engine
3. https://github.com/dhanuka84/kafka-connect-tcp
4. https://github.com/krux/tcp-stream-kafka-producer
The idea of Kafka Connect is that you have some sort of external integration that serves as storage. This can be SAP, Salesforce, an RDBMS, MQ, or anything else that has state. Your websocket endpoint does not hold data; you cannot poll it, since someone else invokes it and that is how the data gets transferred. If you do know who actually holds the data, then you can potentially build a connector using this guide: https://docs.confluent.io/current/connect/devguide.html
For your particular case, the best you can do is either use the Kafka producer API https://docs.confluent.io/current/clients/producer.html
and, from your websocket endpoint, use that producer to post messages to the topic, or, even better, if you are using Spring, use a higher-level abstraction: KafkaTemplate https://docs.spring.io/spring-kafka/reference/html/#sending-messages.
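As a sketch of that higher-level route, assuming a Spring Boot application with spring-kafka on the classpath (which auto-configures a KafkaTemplate); the service and topic names here are made up:

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class WsForwardingService {
    private final KafkaTemplate<String, String> kafkaTemplate;

    public WsForwardingService(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Call this from your websocket handler with each received payload.
    public void forward(String payload) {
        kafkaTemplate.send("topic_name", payload); // placeholder topic name
    }
}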
Full disclosure: I work for MigratoryData.
You can check out MigratoryData's solution for Kafka. MigratoryData is a scalable WebSocket server. The MigratoryData Source/Sink Connector for Kafka uses the Kafka Connect API and can be used to stream data in real time from Kafka to WebSocket clients and vice versa. The main advantage of the solution is that it extends Kafka messaging to WebSocket clients while preserving Kafka's key features such as guaranteed delivery and message ordering.

Does Spring Kafka producer guarantee delivery by default?

I wonder whether the Spring Kafka producer within Spring Boot guarantees delivery or not.
Does anybody know what happens if some random listener fails to receive a message? Will Spring Kafka retry sending it?
There are a few concepts here:
The producer produces events and sends them to the Kafka server. On the producer side, you must take care of retries and similar concerns yourself if Kafka has downtime or other error scenarios specific to your context.
Consumers have partitions assigned to them by Kafka; each partition delivers events, and each event has an offset. Consumers poll Kafka for data (they request data; Kafka does not push data to consumers). Every event that Kafka delivers successfully to a consumer produces an acknowledgment, and Kafka commits the offset of that event, so the next event, with a higher offset, is delivered to the consumer. If a consumer goes down, its partitions are reassigned to the other consumers, so you won't lose data. If you have only one consumer, the data is retained in Kafka, and when the consumer comes back it will request data from the latest/earliest offset.
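To make the producer-side part concrete, here is a minimal sketch with the plain Java producer client showing the delivery-related settings (broker address and topic are placeholders; with Spring Kafka the same properties go into your application configuration):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class ReliableProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("acks", "all"); // wait for all in-sync replicas to acknowledge each send
        props.put("retries", 3);  // retry transient send failures a few times
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("topic_name", "payload"), (metadata, exception) -> {
                if (exception != null) {
                    // the send failed even after retries; log or handle it here
                    exception.printStackTrace();
                }
            });
        }
    }
}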

Messages published to all consumers with the same consumer group in a spring-cloud-stream project

I have ZooKeeper and 3 Kafka brokers running locally.
I started one producer and one consumer, and I can see the consumer consuming messages.
I then started three consumers with the same consumer group name (on different ports, since it's a Spring Boot project), but what I found is that all of the consumers are now consuming (receiving) the messages. I expected the messages to be load-balanced, so that no message is repeated across consumers. I don't know what the problem is.
Here is my property file:
spring.cloud.stream.bindings.input.destination=timerTopicLocal
spring.cloud.stream.kafka.binder.zkNodes=localhost
spring.cloud.stream.kafka.binder.brokers=localhost
spring.cloud.stream.bindings.input.group=timerGroup
Here the group is timerGroup.
Consumer code: https://github.com/codecentric/edmp-sample-stream-sink
Producer code: https://github.com/codecentric/edmp-sample-stream-source
Can you please update your dependencies to Camden.RELEASE (and start using Kafka 0.9+)? In Brixton.RELEASE, the Kafka consumers were 0.8-based and required passing instanceIndex/instanceCount as properties in order to distribute partitions correctly.
In Camden.RELEASE we use the Kafka 0.9+ consumer client, which does load-balancing the way you are expecting (we also support static partition allocation via instanceIndex/instanceCount, but I suspect that is not what you want). I can go into more detail on how to configure this with Brixton, but I think an upgrade would be a much easier path.
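For reference, the Brixton-style static partition allocation mentioned above is configured with properties along these lines (illustrative values; instanceIndex must differ per instance, 0 through 2 for three instances):

spring.cloud.stream.instanceCount=3
spring.cloud.stream.instanceIndex=0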

Queue consumer clusters with ActiveMQ

How do you configure a cluster of consumers in ActiveMQ?
I created a simple embedded ActiveMQ application with two consumers of one queue; the consumers work in separate threads. But when I send a message to the queue, JMS delivers it to the first consumer, no matter how long that consumer sleeps after receiving.
It sounds like you're saying that the first consumer is receiving all the messages. There is a FAQ entry for this type of problem available here:
http://activemq.apache.org/i-do-not-receive-messages-in-my-second-consumer.html
Bruce
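The FAQ above comes down to ActiveMQ's consumer prefetch: by default a queue consumer prefetches up to 1000 messages, so the first consumer can buffer everything before the second one gets a chance. A minimal sketch of lowering the prefetch with the standard ActiveMQ client (the broker URL is a placeholder):

import org.apache.activemq.ActiveMQConnectionFactory;

public class PrefetchExample {
    public static void main(String[] args) {
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616"); // placeholder broker URL
        // Lowering the queue prefetch lets messages spread across consumers
        // instead of piling up in the first consumer's buffer.
        factory.getPrefetchPolicy().setQueuePrefetch(1);
        // Create connections, sessions, and consumers from this factory as usual.
    }
}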
