GetKafka not getting messages in Apache NiFi - hortonworks-data-platform

I'm trying to get messages using a GetKafka processor, but I'm not receiving anything. I tested consuming messages with the Kafka command-line consumer and it works. I was also able to use the PutKafka processor successfully to put messages in the topic. Attached are my settings, where I set the Zookeeper Connection String and Topic Name. When I run the flow, I don't see any errors on the processors.
GetKafka Processor
I see an exception in nifi-app.log:
2016-08-03 09:34:33,722 WARN [70e1df87-6097-4ed0-9a40-7e36f9be6921_mydomain.com-1470231250839-1fbd0cfe-leader-finder-thread] kafka.client.ClientUtils$ Fetching topic metadata with correlation id 0 for topics [Set(test)] from broker [id:0,host:DataSlave1.CSE-RD.com,port:9092] failed
java.nio.channels.ClosedByInterruptException: null
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) ~[na:1.8.0_101]
at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:957) ~[na:1.8.0_101]
Kafka (0.8): 2.10-0.8.2.1, NiFi: 0.7.0
Am I missing anything? Thanks.

The exception went away after a restart. GetKafka now receives messages as they are sent by the producer, but it does not pick up the messages that were already in the topic (the equivalent of --from-beginning in the Kafka console consumer). I don't see a setting for that in the processor.
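For what it's worth, the behaviour being asked about maps to the Kafka 0.8 consumer property auto.offset.reset=smallest, which is only honoured when the consumer group has no committed offset yet. Whether GetKafka exposes this directly I can't confirm, but a minimal standalone sketch of a 0.8 high-level consumer reading a topic from the beginning (assuming the same ZooKeeper string and the "test" topic from the log above, with a fresh group id) would look like this:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class FromBeginningConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // same ZooKeeper string as the processor
        props.put("group.id", "from-beginning-test");     // a fresh group, so no offsets are committed yet
        props.put("auto.offset.reset", "smallest");       // start from the earliest available offset

        ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                consumer.createMessageStreams(Collections.singletonMap("test", 1));

        ConsumerIterator<byte[], byte[]> it = streams.get("test").get(0).iterator();
        while (it.hasNext()) {
            System.out.println(new String(it.next().message()));
        }
    }
}

Note that reusing an existing group id will not re-read old messages, because a committed offset in ZooKeeper takes precedence over auto.offset.reset.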

Related

Dynamically set topic name and Kafka Brokers in PublishKafka2_0 processor in Nifi

I am using the PublishKafka processor in NiFi to publish JSON data to Kafka. I want to set the Kafka server and topic name details from attribute values, but I am not able to do this. It gives the error below:
PublishKafka_2_0[id=2cb333e9-5fd6-1254-a10c-d1084b230d03] Processing
halted: yielding [1 sec]: org.apache.kafka.common.KafkaException:
Failed to construct kafka producer
Caused by: org.apache.kafka.common.config.ConfigException: No resolvable bootstrap urls given in bootstrap.servers
Only hardcoded values for the Kafka server and the topic name work.
Am I missing something? Any help would be much appreciated!
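As a point of reference (not a statement about how PublishKafka_2_0 evaluates its properties), the error itself comes from the Kafka producer client: whatever string reaches bootstrap.servers must be a resolvable host:port list, and an unresolved Expression Language placeholder is not. A minimal sketch that reproduces the same construction failure (the ${kafka.brokers} value is purely illustrative):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;

public class UnresolvedBrokersExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // A literal, unevaluated placeholder instead of a real host:port list
        props.put("bootstrap.servers", "${kafka.brokers}");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Throws org.apache.kafka.common.KafkaException: Failed to construct kafka producer,
        // caused by a ConfigException about bootstrap.servers
        new KafkaProducer<String, String>(props).close();
    }
}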

Spring Integration - Kafka Message Driven Channel - Auto Acknowledge

I used the sample configuration listed in the spring.io docs and it is working fine.
<int-kafka:message-driven-channel-adapter
id="kafkaListener"
listener-container="container1"
auto-startup="false"
phase="100"
send-timeout="5000"
channel="nullChannel"
message-converter="messageConverter"
error-channel="errorChannel" />
However, when I tested it with a downstream application (I consume from Kafka and publish to the downstream system) and the downstream system was down, the messages were still consumed and were not replayed.
Likewise, if I hit an exception in a service activator after consuming from the Kafka topic, I want to be able to throw an exception that rolls back the transaction so that the Kafka messages can be replayed.
In brief, if the consuming application has a problem, I want to roll back the transaction so that messages are not automatically acknowledged and are redelivered again and again until they are successfully processed.
That's not how Apache Kafka works. There are no transaction semantics similar to JMS; the offset in a Kafka topic has nothing to do with rollback or redelivery.
I suggest you study Apache Kafka more closely via its official documentation.
Spring Kafka adds nothing on top of the regular Apache Kafka protocol; however, you can consider using the retry capabilities in Spring Kafka to redeliver the same record locally: http://docs.spring.io/spring-kafka/docs/1.2.2.RELEASE/reference/html/_reference.html#_retrying_deliveries
And yes, the ack mode must be MANUAL; do not commit the offset to Kafka automatically after consuming.
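To make that concrete, here is a minimal sketch of a listener container configured for manual acknowledgment, assuming Spring Kafka 1.2.x (the broker address, topic and group names are placeholders); the offset is committed only after the listener finishes without throwing:

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.AbstractMessageListenerContainer;
import org.springframework.kafka.listener.AcknowledgingMessageListener;
import org.springframework.kafka.listener.KafkaMessageListenerContainer;
import org.springframework.kafka.listener.config.ContainerProperties;

public class ManualAckContainerFactory {

    public static KafkaMessageListenerContainer<String, String> container() {
        Map<String, Object> consumerProps = new HashMap<>();
        consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "myGroup");
        consumerProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false); // no auto-commit
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        ContainerProperties containerProps = new ContainerProperties("myTopic");
        containerProps.setAckMode(AbstractMessageListenerContainer.AckMode.MANUAL);
        containerProps.setMessageListener((AcknowledgingMessageListener<String, String>) (record, ack) -> {
            process(record.value()); // business logic / service activator call
            ack.acknowledge();       // commit the offset only after success
        });

        return new KafkaMessageListenerContainer<>(
                new DefaultKafkaConsumerFactory<>(consumerProps), containerProps);
    }

    private static void process(String payload) {
        // placeholder for downstream processing
    }
}

If process() throws, the offset is never committed, so the record is fetched again after a restart or rebalance; combine this with the retry support linked above to avoid a tight redelivery loop.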

Messages published to all consumers with the same consumer group in a spring-cloud-stream project

I have ZooKeeper and 3 Kafka brokers running locally.
I started one producer and one consumer, and I can see the consumer consuming messages.
I then started three consumers with the same consumer group name (on different ports, since it's a Spring Boot project). What I found is that all of the consumers are now receiving every message. I expected the messages to be load-balanced across the group, so that no message is delivered to more than one consumer. I don't know what the problem is.
Here is my property file:
spring.cloud.stream.bindings.input.destination=timerTopicLocal
spring.cloud.stream.kafka.binder.zkNodes=localhost
spring.cloud.stream.kafka.binder.brokers=localhost
spring.cloud.stream.bindings.input.group=timerGroup
Here the group is timerGroup.
consumer code : https://github.com/codecentric/edmp-sample-stream-sink
producer code : https://github.com/codecentric/edmp-sample-stream-source
Can you please update the dependencies to Camden.RELEASE (and start using Kafka 0.9+)? In Brixton.RELEASE, Kafka consumers were 0.8-based and required passing instanceIndex/instanceCount as properties in order to distribute partitions correctly.
In Camden.RELEASE we use the Kafka 0.9+ consumer client, which does load balancing the way you are expecting (we also support static partition allocation via instanceIndex/instanceCount, but I suspect that is not what you want). I can go into more detail on how to configure this with Brixton, but I would guess an upgrade is a much easier path.
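For background, the load balancing the answer describes is standard Kafka 0.9+ consumer-group behaviour: consumers that share a group.id divide the topic's partitions among themselves, so each record goes to only one member of the group. A minimal sketch with the plain Kafka client (the broker address is a placeholder; the group and topic names mirror the properties above):

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "timerGroup"); // all instances share this group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("timerTopicLocal"));

        // Run several copies of this process: the broker assigns each partition
        // to exactly one consumer in the group, so records are not duplicated.
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}

Note that the balancing happens at partition granularity, so the topic needs at least as many partitions as consumers for every instance to receive work.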

Storm-Kafka Hortonworks tutorials for real-time data streaming

Any ideas are welcome after reading the problem statement.
Background:
Publishing messages using Apache Kafka:
The Kafka broker is running. Kafka producers are the applications that create messages and publish them to the Kafka broker for further consumption. Therefore, for the Kafka consumer to consume data, the Kafka topic needs to be created before the producer and consumer start publishing and consuming messages.
Kafka was tested successfully: the Kafka consumer was able to consume data from the Kafka topic and display the results.
Before starting the Storm topology, the Kafka consumer is stopped so that the Storm spout can work on the source of data streams from the Kafka topics.
Real-time processing of the data using Apache Storm:
With the Storm topology created, the Storm spout works on the source of the data streams, i.e. the spout reads data from the Kafka topics. At the other end, the spout passes the streams of data to the Storm bolts, which process the data and write it into HDFS (file format) and HBase (DB format) for storage.
Problem 1: The ZooKeeper znode is missing the last child znode.
From the log file:
2015-05-20 04:22:43 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/truckevent/partitions
ZooKeeper is the coordination service for distributed applications. From the ZooKeeper client we can always see /brokers/topics/truckevent, but the last znode is always missing when running Storm. I managed to solve this issue once by creating the znode manually; however, the same method no longer works for subsequent tests.
Problem 2: Storm (TruckHBaseBolt is the Java class) failed to get a connection to the HBase tables.
From the log file:
2015-05-20 04:22:51 c.h.t.t.TruckHBaseBolt [ERROR] Error retrievinging connection and access to HBase Tables
I had manually created the HBase table to match the data format in HBase. However, retrieving the connection to HBase still failed.
Problem 3: Storm (the HdfsBolt Java class) reported permission denied when the storm user writes data into HDFS.
From the log file:
2015-05-20 04:22:43 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: Error preparing HdfsBolt: Permission denied: user=storm, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x
Can anyone help with this?
Suggestion for problem 1:
Stop the Storm topology. Manually delete the znodes related to the topic in the ZooKeeper instance that Storm uses, then restart the Storm topology.
This will create new znodes.
Suggestion for problem 2:
First check, using Java code, whether you can connect to HBase at all (see the sketch below). Then test that same logic in the Storm topology.
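A minimal connectivity check might look like the following, assuming the HBase 1.x client API; the ZooKeeper quorum host and table name are placeholders and should be replaced with the cluster's actual values:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class HBaseConnectivityCheck {
    public static void main(String[] args) throws Exception {
        // Point the client at the same ZooKeeper quorum HBase uses
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "sandbox.hortonworks.com"); // placeholder host
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        // If this fails, the problem is in the client configuration or the cluster,
        // not in the Storm bolt itself
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("truck_events"))) { // placeholder table
            System.out.println("Connected to table: " + table.getName());
        }
    }
}

If this standalone check passes but the bolt still fails, compare the HBase client configuration (hbase-site.xml) that is on the topology's classpath with the one used by the standalone test.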
Answer for problem 3:
As per your logs, user=storm, but the directory you are writing into is owned by hdfs. Change the ownership of that directory to make storm the owner, using the chown command.

Does Spring XD re-process the same message when one of its containers goes down while processing the message?

Application Data Flow:
JSON messages --> ActiveMQ --> Spring XD -- business logic (transform JSON to Java object) --> save data to target DB --> DB.
Question:
Spring XD is running in cluster mode, configured with Redis.
Spring XD picks up the message from the ActiveMQ queue (AMQ), so the message is no longer in AMQ. Now suppose that one of the containers where this message is being processed with some business logic suddenly goes down. In this scenario:
Will the Spring XD framework automatically re-process that particular message? What is the mechanism behind that?
Thanks,
Abhi
Not with a Redis transport; Redis has no infrastructure to support such a requirement ("transactional" reads). You would need to use a rabbit or kafka transport.
EDIT:
See Application Configuration (scroll down to RabbitMQ) and Message Bus Configuration.
Specifically, the default ackMode is AUTO, which means messages are acknowledged only on success.
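To illustrate what that means at the broker level, here is a sketch in plain Spring AMQP, the library underneath the rabbit message bus (the queue name is a placeholder, not the name Spring XD would actually generate). With AUTO acknowledge mode the container sends the ack only after the listener returns normally, so a message being processed when the container dies stays unacknowledged and RabbitMQ redelivers it:

import org.springframework.amqp.core.AcknowledgeMode;
import org.springframework.amqp.core.Message;
import org.springframework.amqp.core.MessageListener;
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;

public class AutoAckListener {
    public static void main(String[] args) {
        SimpleMessageListenerContainer container =
                new SimpleMessageListenerContainer(new CachingConnectionFactory("localhost"));
        container.setQueueNames("xdbus.mystream.0"); // placeholder queue name
        container.setAcknowledgeMode(AcknowledgeMode.AUTO);
        container.setMessageListener((MessageListener) message -> {
            // The ack is sent only after this method returns without throwing;
            // if the process dies here, the broker requeues the message.
            process(new String(message.getBody()));
        });
        container.start();
    }

    private static void process(String payload) {
        // placeholder for the stream's business logic
    }
}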
