Spring Integration: File Copy functionality with delay

I am using the following configuration to copy files from one directory to another:
@Bean
public MessageChannel fileInputChannel() {
    return new DirectChannel();
}

@Bean
@InboundChannelAdapter(value = "fileInputChannel", poller = @Poller(fixedDelay = "10000"))
public MessageSource<File> fileReadingMessageSource() {
    FileReadingMessageSource source = new FileReadingMessageSource();
    source.setDirectory(new File("C:/input_dir"));
    source.setFilter(new RegexPatternFileListFilter(".*"));
    return source;
}

@Bean
@ServiceActivator(inputChannel = "fileInputChannel")
public FileWritingMessageHandler handle() {
    FileWritingMessageHandler handler = new FileWritingMessageHandler(new File("C:/Output_dir"));
    handler.setDeleteSourceFiles(false);
    handler.setExpectReply(false);
    handler.setPreserveTimestamp(true);
    handler.setAsync(true);
    return handler;
}
I am expecting that:
- if any source file changes, OR
- if a new file is created in the source directory,
the updated or newly created file will be updated/created in the destination folder within 10 seconds. However, it is taking more than 1 minute, even though the files are only a few KB and the source and destination directories are on the same machine.
I cannot identify why it takes more than 1 minute when I have set the @Poller delay to 10 seconds.
logs -
[2017-01-31 10:33:04,943612]INFO [task-scheduler-3] (FileReadingMessageSource.java:367) - Created message: [GenericMessage [payload=C:/input_dir/file.170126.19, headers={timestamp=1485876784308, id=662aaf51-91e5-6f78-a2a6-997fc01b8b79}]]
[2017-01-31 10:33:04,943612]DEBUG[task-scheduler-3] (AbstractPollingEndpoint.java:267) - Poll resulted in Message: GenericMessage [payload=C:/input_dir/file.170126.19, headers={timestamp=1485876784308, id=662aaf51-91e5-6f78-a2a6-997fc01b8b79}]
[2017-01-31 10:33:04,943612]DEBUG[task-scheduler-3] (AbstractMessageChannel.java:411) - preSend on channel 'fileInputChannel', message: GenericMessage [payload=C:/input_dir/file.170126.19, headers={timestamp=1485876784308, id=662aaf51-91e5-6f78-a2a6-997fc01b8b79}]
[2017-01-31 10:33:04,943612]DEBUG[task-scheduler-3] (AbstractMessageHandler.java:115) - handle received message: GenericMessage [payload=C:/input_dir/file.170126.19, headers={timestamp=1485876784308, id=662aaf51-91e5-6f78-a2a6-997fc01b8b79}]
[2017-01-31 10:33:04,943618]DEBUG[task-scheduler-3] (AbstractMessageChannel.java:430) - postSend (sent=true) on channel 'fileInputChannel', message: GenericMessage [payload=C:/input_dir/file.170126.19, headers={timestamp=1485876784308, id=662aaf51-91e5-6f78-a2a6-997fc01b8b79}]
[2017-01-31 10:33:14,953620]INFO [task-scheduler-4] (FileReadingMessageSource.java:367) - Created message: [GenericMessage [payload=C:/input_dir/file.170127.19, headers={timestamp=1485876794316, id=c05deaec-f863-fd7f-0b08-dd3534be81d7}]]
[2017-01-31 10:33:14,953620]DEBUG[task-scheduler-4] (AbstractPollingEndpoint.java:267) - Poll resulted in Message: GenericMessage [payload=C:/input_dir/file.170127.19, headers={timestamp=1485876794316, id=c05deaec-f863-fd7f-0b08-dd3534be81d7}]
[2017-01-31 10:33:14,953620]DEBUG[task-scheduler-4] (AbstractMessageChannel.java:411) - preSend on channel 'fileInputChannel', message: GenericMessage [payload=C:/input_dir/file.170127.19, headers={timestamp=1485876794316, id=c05deaec-f863-fd7f-0b08-dd3534be81d7}]
[2017-01-31 10:33:14,953620]DEBUG[task-scheduler-4] (AbstractMessageHandler.java:115) - handle received message: GenericMessage [payload=C:/input_dir/file.170127.19, headers={timestamp=1485876794316, id=c05deaec-f863-fd7f-0b08-dd3534be81d7}]
[2017-01-31 10:33:14,953626]DEBUG[task-scheduler-4] (AbstractMessageChannel.java:430) - postSend (sent=true) on channel 'fileInputChannel', message: GenericMessage [payload=C:/input_dir/file.170127.19, headers={timestamp=1485876794316, id=c05deaec-f863-fd7f-0b08-dd3534be81d7}]

OK, I see your concern!
Look, @Poller has this property:
/**
 * @return The maximum number of messages to receive for each poll.
 * Can be specified as 'property placeholder', e.g. {@code ${poller.maxMessagesPerPoll}}.
 * Defaults to -1 (infinity) for polling consumers and 1 for polling inbound channel adapters.
 */
String maxMessagesPerPoll() default "";
Pay attention to how it defaults to 1 for polling inbound channel adapters: each 10-second poll picks up only one file, so a backlog of files drains slowly.
So, to poll all your files in one polling task, you should set this property to infinity:
@Poller(maxMessagesPerPoll = "-1")
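For completeness, here is a minimal sketch of the corrected adapter (the rest of the original configuration stays as-is):

@Bean
@InboundChannelAdapter(value = "fileInputChannel",
        poller = @Poller(fixedDelay = "10000", maxMessagesPerPoll = "-1"))
public MessageSource<File> fileReadingMessageSource() {
    FileReadingMessageSource source = new FileReadingMessageSource();
    source.setDirectory(new File("C:/input_dir"));
    source.setFilter(new RegexPatternFileListFilter(".*"));
    // With maxMessagesPerPoll = -1, a single poll drains the whole directory
    // instead of copying one file per 10-second cycle.
    return source;
}

With this change a backlog of N files is copied within one poll, rather than over N x 10 seconds.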

Related

Spring WebFlux WebSocket closed when receiving messages too fast

public Mono<Void> handle(@Nonnull WebSocketSession session) {
    final WebSocketContext webSocketContext = new WebSocketContext(session);
    Mono<Void> output = session.send(Flux.create(webSocketContext::setSink));
    Mono<Void> input = session.receive()
            .timeout(Duration.ofSeconds(adapterProperties.getSessionTimeout()))
            .doOnSubscribe(subscription -> subscription.request(64))
            .doOnNext(WebSocketMessage::retain)
            .publishOn(Schedulers.boundedElastic())
            .concatMap(msg -> {
                // ...blocking operation
                return Flux.empty();
            })
            .then();
    return Mono.zip(input, output).then();
}
When I use a WebSocket client to send messages too fast, the connection is closed after about 2000 pieces of data have been received, with no exception message. After I slow down the sending speed on the client side, there is no problem. How can I solve this?
Below is the Flux log output:
2022-07-14 17:19:40.295 adapter-iat [boundedElastic-5] INFO reactor.Flux.PublishOn.5 - | onNext(WebSocket TEXT message (13765 bytes))
2022-07-14 17:19:40.296 adapter-iat [boundedElastic-5] INFO reactor.Flux.PublishOn.5 - | onNext(WebSocket TEXT message (13765 bytes))
2022-07-14 17:19:40.300 adapter-iat [boundedElastic-5] INFO reactor.Flux.PublishOn.5 - | onComplete()   <-- why?
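The thread has no accepted fix, but one plausible direction (an assumption, not from the original question) is that demand from the slow blocking concatMap stage cannot keep up with the client, so frames back up in the transport until the connection drops. One thing to try is buffering inside the application and letting Reactor manage demand, rather than the one-shot manual request(64); a sketch reusing the session and adapterProperties names from the question:

Mono<Void> input = session.receive()
        .timeout(Duration.ofSeconds(adapterProperties.getSessionTimeout()))
        .doOnNext(WebSocketMessage::retain)
        .onBackpressureBuffer()   // absorb bursts from a fast client in an app-level buffer
        .limitRate(64)            // renewable demand of 64 at a time instead of a one-shot request(64)
        .publishOn(Schedulers.boundedElastic())
        .concatMap(msg -> {
            // ...blocking operation
            return Flux.empty();
        })
        .then();

Whether this fits depends on how much memory you can spend buffering: an unbounded onBackpressureBuffer trades the disconnect for heap usage under sustained overload.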

Conversion by Spring Cloud Stream of a Message with a POJO on the consumer side returns an object with null fields

Here are code snippets to demonstrate the problem.
On producer side:
ProductModelDto dto = new ProductModelDto(1L, "name", "desc", 100.5);
streamBridge.send(PRODUCTS_OUT, MessageBuilder.withPayload(dto).build());
On consumer side:
@Bean
public Consumer<Message<ProductModelDto>> productCreated() {
    return message -> {
        log.info("payload = {}", message.getPayload());
        log.info("payload class = {}", message.getPayload().getClass().getName());
        log.info("sourceData = {}", message.getHeaders().get("sourceData"));
    };
}
Output:
payload = ProductModelDto(id=null, name=null, description=null, price=null)
payload class = ru.security.common.model.product.ProductModelDto
sourceData = (Body:'{"id":1,"name":"name","description":"desc","price":100.5}' MessageProperties [headers={}, timestamp=Tue Sep 07 11:13:02 MSK 2021, messageId=3840075f-1142-f94d-be37-7be950d73f54, contentType=application/json, contentLength=0, receivedDeliveryMode=PERSISTENT, priority=0, redelivered=false, receivedExchange=product-service.product-created-or-changed, receivedRoutingKey=product-service.product-created-or-changed, deliveryTag=1, consumerTag=amq.ctag-3i5ECkRTPs5_O5ZW5af-MA, consumerQueue=product-service.product-created-or-changed.some-group4])
I expect to receive the payload that was sent by the producer, but the resulting payload is ProductModelDto(id=null, name=null, description=null, price=null).
I know that Spring automatically converts the message to a POJO if I use
Consumer<ProductModelDto> productCreated()
but I need
Consumer<Message<ProductModelDto>> productCreated()
to get the headers from the message. Any suggestion as to what configuration I am missing?
I've created a sample project and found that this issue reproduces in version 3.0.11.RELEASE of spring-cloud-starter-stream-rabbit; in versions 3.1.4 and 3.0.10.RELEASE I didn't see the problem. https://github.com/serjteplov/demo-gitter-scs1.git
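Given those version findings, upgrading the binder (or pinning 3.0.10.RELEASE) is the obvious route. As a stopgap on the affected version, here is a hedged sketch of doing the conversion manually, assuming Jackson is on the classpath and the wire format is the JSON visible in the sourceData header:

@Bean
public Consumer<Message<byte[]>> productCreated(ObjectMapper objectMapper) {
    return message -> {
        try {
            // Bypass the broken automatic conversion and deserialize the raw body ourselves.
            ProductModelDto dto = objectMapper.readValue(message.getPayload(), ProductModelDto.class);
            log.info("payload = {}", dto);
            log.info("sourceData = {}", message.getHeaders().get("sourceData"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    };
}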

Spring Data Redis Streams: cannot figure out what is happening to my unacknowledged messages

I am using the following code to consume a Redis stream using a Spring Data Redis consumer group, but even though I have commented out the acknowledge command, my messages are not re-read after a server restart.
I would expect that if I didn't acknowledge the message, it should be re-read when the server gets killed and restarted. What am I missing here?
@Bean
@Autowired
public StreamMessageListenerContainer eventStreamPersistenceListenerContainerTwo(RedisConnectionFactory streamRedisConnectionFactory, RedisTemplate streamRedisTemplate) {
    StreamMessageListenerContainer.StreamMessageListenerContainerOptions<String, MapRecord<String, String, String>> containerOptions = StreamMessageListenerContainer.StreamMessageListenerContainerOptions
            .builder().pollTimeout(Duration.ofMillis(100)).build();
    StreamMessageListenerContainer<String, MapRecord<String, String, String>> container = StreamMessageListenerContainer.create(streamRedisConnectionFactory,
            containerOptions);
    container.receive(Consumer.from("my-group", "my-consumer"),
            StreamOffset.create("event-stream", ReadOffset.latest()),
            message -> {
                System.out.println("MessageId: " + message.getId());
                System.out.println("Stream: " + message.getStream());
                System.out.println("Body: " + message.getValue());
                //streamRedisTemplate.opsForStream().acknowledge("my-group", message);
            });
    container.start();
    return container;
}
After reading the Redis documentation on how streams work, I came up with the following to automatically process any unacknowledged but previously delivered messages for the consumer:
// Check for any previously unacknowledged messages that were delivered to this consumer.
log.info("STREAM - Checking for previously unacknowledged messages for " + this.getClass().getSimpleName() + " event stream listener.");
String offset = "0";
while ((offset = processUnacknowledgedMessage(offset)) != null) {
    log.info("STREAM - Finished processing one unacknowledged message for " + this.getClass().getSimpleName() + " event stream listener: " + offset);
}
log.info("STREAM - Finished checking for previously unacknowledged messages for " + this.getClass().getSimpleName() + " event stream listener.");
And the method that processes the messages:
/**
 * Processes and acknowledges the next previously delivered message, beginning
 * at the given message id offset.
 *
 * @param offset The last read message id offset.
 * @return The id of the message that was just processed, or null if there are no more messages.
 */
public String processUnacknowledgedMessage(String offset) {
    List<MapRecord> messages = streamRedisTemplate.opsForStream().read(Consumer.from(groupName(), consumerName()),
            StreamReadOptions.empty().noack().count(1),
            StreamOffset.create(streamKey(), ReadOffset.from(offset)));
    String lastMessageId = null;
    for (MapRecord message : messages) {
        if (log.isDebugEnabled()) log.debug(String.format("STREAM - Processing event(%s) from stream(%s) during startup: %s", message.getId(), message.getStream(), message.getValue()));
        processRecord(message);
        if (log.isDebugEnabled()) log.debug(String.format("STREAM - Finished processing event(%s) from stream(%s) during startup.", message.getId(), message.getStream()));
        streamRedisTemplate.opsForStream().acknowledge(groupName(), message);
        lastMessageId = message.getId().getValue();
    }
    return lastMessageId;
}
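One way to make the replay run at the right moment (an assumption on my part, not from the original post) is to invoke the loop from an initialization callback, before the container starts delivering new messages:

@PostConstruct
public void replayPendingMessages() {
    // Drain this consumer's pending entries list (PEL) before live consumption begins,
    // so messages that were delivered but never acknowledged before a crash are not lost.
    String offset = "0";
    while ((offset = processUnacknowledgedMessage(offset)) != null) {
        log.info("STREAM - Replayed unacknowledged message: " + offset);
    }
}

Note that reading with a consumer group from an explicit id (rather than ">") returns entries from the consumer's pending entries list, which is exactly where delivered-but-unacknowledged messages live.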

Does Spring Cloud Stream Kafka support embedded headers?

According to this topic (Kafka Spring Integration: Headers not coming for Kafka consumer), there is no header support for Kafka.
But the documentation says:
spring.cloud.stream.kafka.binder.headers
The list of custom headers that will be transported by the binder.
Default: empty.
I can't get it working with spring-cloud-stream-binder-kafka 1.2.0.RELEASE.
SENDING LOG:
MESSAGE (e23885fd-ffd9-42dc-ebe3-5a78467fee1f) SENT :
GenericMessage [payload=...,
headers={
content-type=application/json,
correlationId=51dd90b1-76e6-4b8d-b667-da25f214f383,
id=e23885fd-ffd9-42dc-ebe3-5a78467fee1f,
contentType=application/json,
timestamp=1497535771673
}]
RECEIVING LOG:
MESSAGE (448175f5-2b21-9a44-26b9-85f093b33f6b) RECEIVED BY HANDLER 1:
GenericMessage [payload=...,
headers={
kafka_offset=36,
id=448175f5-2b21-9a44-26b9-85f093b33f6b,
kafka_receivedPartitionId=0,
contentType=application/json;charset=UTF-8,
kafka_receivedTopic=new_patient, timestamp=1497535771715
}]
MESSAGE (448175f5-2b21-9a44-26b9-85f093b33f6b) RECEIVED BY HANDLER 2 :
GenericMessage [payload=...,
headers={
kafka_offset=36,
id=448175f5-2b21-9a44-26b9-85f093b33f6b,
kafka_receivedPartitionId=0,
contentType=application/json;charset=UTF-8,
kafka_receivedTopic=new_patient, timestamp=1497535771715
}]
I expect to see the same message id and to get the correlationId on the receiving side.
application.properties:
spring.cloud.stream.kafka.binder.headers=correlationId
spring.cloud.stream.bindings.newTest.destination=new_test
spring.cloud.stream.bindings.newTestCreated.destination=new_test
spring.cloud.stream.default.consumer.headerMode=embeddedHeaders
spring.cloud.stream.default.producer.headerMode=embeddedHeaders
SENDING MESSAGE:
@Publisher(channel = "testChannel")
public Object newTest(Object param) {
    ...
    return myObject;
}
Yes, it does: http://docs.spring.io/spring-cloud-stream/docs/Chelsea.SR2/reference/htmlsingle/index.html#_consumer_properties
headerMode
When set to raw, disables header parsing on input. Effective only for messaging middleware that does not support message headers natively and requires header embedding. Useful when inbound data is coming from outside Spring Cloud Stream applications.
Default: embeddedHeaders
But that is already a Spring Cloud Stream story, not Spring Kafka per se.
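As a hedged illustration (assuming the binder version, property names, and a channel binding matching the question; not from the original answer), once correlationId is listed in spring.cloud.stream.kafka.binder.headers and both sides use embeddedHeaders mode, the header should survive the trip and be readable in the consuming handler:

@StreamListener("newTestCreated")
public void handle(Message<?> message) {
    // With headers=correlationId and embeddedHeaders on both sides, the binder
    // embeds the header into the Kafka record and restores it here.
    Object correlationId = message.getHeaders().get("correlationId");
    log.info("correlationId = {}", correlationId);
}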

Spark Kafka receiver is not picking up data from all partitions

I have created a Kafka topic with 5 partitions, and I am using the createStream receiver API as follows. But somehow only one receiver is getting the input data; the rest of the receivers are not processing anything. Can you please help?
JavaPairDStream<String, String> messages = null;
if (sparkStreamCount > 0) {
    // We create an input DStream for each partition of the topic, unify those streams, and then repartition the unified stream.
    List<JavaPairDStream<String, String>> kafkaStreams = new ArrayList<JavaPairDStream<String, String>>(sparkStreamCount);
    for (int i = 0; i < sparkStreamCount; i++) {
        kafkaStreams.add(KafkaUtils.createStream(jssc, contextVal.getString(KAFKA_ZOOKEEPER), contextVal.getString(KAFKA_GROUP_ID), kafkaTopicMap));
    }
    messages = jssc.union(kafkaStreams.get(0), kafkaStreams.subList(1, kafkaStreams.size()));
} else {
    messages = KafkaUtils.createStream(jssc, contextVal.getString(KAFKA_ZOOKEEPER), contextVal.getString(KAFKA_GROUP_ID), kafkaTopicMap);
}
After applying the changes, I am getting the following exceptions:
INFO : org.apache.spark.streaming.kafka.KafkaReceiver - Connected to localhost:2181
INFO : org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Stopping receiver with message: Error starting receiver 0: java.lang.AssertionError: assertion failed
INFO : org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Called receiver onStop
INFO : org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Deregistering receiver 0
ERROR: org.apache.spark.streaming.scheduler.ReceiverTracker - Deregistered receiver for stream 0: Error starting receiver 0 - java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:165)
at kafka.consumer.TopicCount$$anonfun$makeConsumerThreadIdsPerTopic$2.apply(TopicCount.scala:36)
at kafka.consumer.TopicCount$$anonfun$makeConsumerThreadIdsPerTopic$2.apply(TopicCount.scala:34)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at kafka.consumer.TopicCount$class.makeConsumerThreadIdsPerTopic(TopicCount.scala:34)
at kafka.consumer.StaticTopicCount.makeConsumerThreadIdsPerTopic(TopicCount.scala:100)
at kafka.consumer.StaticTopicCount.getConsumerThreadIdsPerTopic(TopicCount.scala:104)
at kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:198)
at kafka.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:138)
at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:111)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:542)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:532)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1986)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1986)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
INFO : org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Stopped receiver 0
INFO : org.apache.spark.streaming.receiver.BlockGenerator - Stopping BlockGenerator
INFO : org.apache.spark.streaming.util.RecurringTimer - Stopped timer for BlockGenerator after time 1473964037200
INFO : org.apache.spark.streaming.receiver.BlockGenerator - Waiting for block pushing thread to terminate
INFO : org.apache.spark.streaming.receiver.BlockGenerator - Pushing out the last 0 blocks
INFO : org.apache.spark.streaming.receiver.BlockGenerator - Stopped block pushing thread
INFO : org.apache.spark.streaming.receiver.BlockGenerator - Stopped BlockGenerator
INFO : org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Waiting for receiver to be stopped
ERROR: org.apache.spark.streaming.receiver.ReceiverSupervisorImpl - Stopped receiver with error: java.lang.AssertionError: assertion failed
ERROR: org.apache.spark.executor.Executor - Exception in task 0.0 in stage 29.0
There is one issue with the above code: the kafkaTopicMap parameter of the KafkaUtils.createStream method specifies a Map of (topic_name -> numPartitions) to consume, and each partition is consumed in its own thread.
Try the code below:
JavaPairDStream<String, String> messages = null;
int sparkStreamCount = 5;
Map<String, Integer> kafkaTopicMap = new HashMap<String, Integer>();
if (sparkStreamCount > 0) {
    // One consumer thread per stream; the sparkStreamCount streams together cover all 5 partitions.
    kafkaTopicMap.put(topic, 1);
    List<JavaPairDStream<String, String>> kafkaStreams = new ArrayList<JavaPairDStream<String, String>>(sparkStreamCount);
    for (int i = 0; i < sparkStreamCount; i++) {
        kafkaStreams.add(KafkaUtils.createStream(streamingContext, contextVal.getString(KAFKA_ZOOKEEPER), contextVal.getString(KAFKA_GROUP_ID), kafkaTopicMap));
    }
    messages = streamingContext.union(kafkaStreams.get(0), kafkaStreams.subList(1, kafkaStreams.size()));
} else {
    messages = KafkaUtils.createStream(streamingContext, contextVal.getString(KAFKA_ZOOKEEPER), contextVal.getString(KAFKA_GROUP_ID), kafkaTopicMap);
}
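Not part of the original answer, but worth noting as an alternative: the receiver-less direct API reads every partition of the topic without juggling multiple receivers, sidestepping this class of problem. A sketch against the Spark 1.x / Kafka 0.8 spark-streaming-kafka API (the broker address is hypothetical):

Map<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", "localhost:9092"); // hypothetical broker address

// One Spark partition per Kafka partition, no receivers involved.
JavaPairInputDStream<String, String> directMessages = KafkaUtils.createDirectStream(
        streamingContext,
        String.class, String.class,
        kafka.serializer.StringDecoder.class, kafka.serializer.StringDecoder.class,
        kafkaParams,
        java.util.Collections.singleton(topic));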
