Spring Integration with AWS S3 (Retry Strategy) - spring-boot

I am creating a simple integration service with AWS S3. I am facing some difficulties when an exception occurs.
My requirement is to poll an S3 bucket periodically and to apply some transformation whenever a file is newly placed into the bucket. The code snippet below works fine, but when an exception occurs it keeps retrying again and again. I do not want that to happen. Can someone help me here?
The IntegrationFlow is defined as below.
@Configuration
public class S3Routes {

    @Bean
    public IntegrationFlow downloadFlow(MessageSource<InputStream> s3InboundStreamingMessageSource) {
        return IntegrationFlows.from(s3InboundStreamingMessageSource)
                .channel("s3Channel")
                .handle("QueryServiceImpl", "processFile")
                .get();
    }
}
The configuration file is as below.
@Service
public class S3AppConfiguration {

    @Bean
    @InboundChannelAdapter(value = "s3Channel")
    public MessageSource<InputStream> s3InboundStreamingMessageSource(S3RemoteFileTemplate template) {
        S3StreamingMessageSource messageSource = new S3StreamingMessageSource(template);
        messageSource.setRemoteDirectory("my-bucket-name");
        messageSource.setFilter(new S3PersistentAcceptOnceFileListFilter(new SimpleMetadataStore(),
                "streaming"));
        return messageSource;
    }

    @Bean
    public PollableChannel s3Channel() {
        return new QueueChannel();
    }

    @Bean
    public S3RemoteFileTemplate template(AmazonS3 amazonS3) {
        return new S3RemoteFileTemplate(new S3SessionFactory(amazonS3));
    }

    @Bean(name = "amazonS3")
    public AmazonS3 nonProdAmazonS3(BasicAWSCredentials basicAWSCredentials) {
        ClientConfiguration config = new ClientConfiguration();
        config.setProxyHost("localhost");
        config.setProxyPort(3128);
        return AmazonS3ClientBuilder.standard().withRegion(Regions.fromName("ap-southeast-1"))
                .withCredentials(new AWSStaticCredentialsProvider(basicAWSCredentials))
                .withClientConfiguration(config)
                .build();
    }

    @Bean
    public BasicAWSCredentials basicAWSCredentials() {
        return new BasicAWSCredentials("access_key", "secret_key");
    }

    @Bean(name = PollerMetadata.DEFAULT_POLLER)
    public PollerMetadata nonProdPoller() {
        return Pollers.cron("* */2 * * * *")
                .get();
    }
}
The AcceptOnceFileList filter used here prevents the same file from being handled on continuous retries. But I do not want to use the AcceptOnceFileList filter, because when a file is not processed on the 1st attempt, I wish to retry it on the next poll (which usually happens every 1 hour in the prod region). I tried calling the filter.remove() method whenever processing fails (in case of any exception), but that again results in continuous retries.
I am not sure how to disable the continuous retries on failure. Where should I configure it?
I took a look at Spring Integration (Retry Strategy). It is the same scenario, but a different integration, and I am not sure how to set this up for my IntegrationFlow. Can someone help here? Thanks in advance.

That story is different: it talks about a listener container for AMQP. You use a source polling channel adapter, so the approach may be different.
You create two source polling channel adapters: one via that @InboundChannelAdapter, another via IntegrationFlows.from(s3InboundStreamingMessageSource). Both of them produce data to the same channel. Not sure if that is really intentional.
It is not clear what that retry is in your case, unless you really do that manual filter.remove() call. In that case it is indeed going to retry, but it is a single, uncontrolled retry: it will retry again only if you call filter.remove() again. So, if you do everything yourself, what exactly is the question?
Consider using a RequestHandlerRetryAdvice configured on your handle() instead: https://docs.spring.io/spring-integration/docs/current/reference/html/messaging-endpoints.html#message-handler-advice-chain. This way you really pull the remote file only once, and the retry is managed by the Spring Retry API.
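For reference, a minimal sketch of what that could look like in your flow; the advice and retry classes come from org.springframework.integration.handler.advice and org.springframework.retry, and the retry policy values are only illustrative:

@Bean
public IntegrationFlow downloadFlow(MessageSource<InputStream> s3InboundStreamingMessageSource) {
    return IntegrationFlows.from(s3InboundStreamingMessageSource)
            .channel("s3Channel")
            // the advice wraps only this handler, so the remote file is pulled once
            // and only the processing step is retried
            .handle("QueryServiceImpl", "processFile", e -> e.advice(retryAdvice()))
            .get();
}

@Bean
public RequestHandlerRetryAdvice retryAdvice() {
    RequestHandlerRetryAdvice advice = new RequestHandlerRetryAdvice();
    RetryTemplate retryTemplate = new RetryTemplate();
    retryTemplate.setRetryPolicy(new SimpleRetryPolicy(3)); // at most 3 attempts
    FixedBackOffPolicy backOff = new FixedBackOffPolicy();
    backOff.setBackOffPeriod(5000);                         // 5 seconds between attempts
    retryTemplate.setBackOffPolicy(backOff);
    advice.setRetryTemplate(retryTemplate);
    return advice;
}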
UPDATE
So, after some cron expression learning, I realized that yours is wrong:
* */2 * * * * - means every second of every even minute
It must be like this:
0 */2 * * * * - at the beginning of every even minute
Perhaps something similar is going on with your hourly cron expression in prod...
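With that fix, the default poller bean from the question would look like this (same bean, only the expression changed):

@Bean(name = PollerMetadata.DEFAULT_POLLER)
public PollerMetadata nonProdPoller() {
    // fires at second 0 of every even minute instead of every second of it
    return Pollers.cron("0 */2 * * * *")
            .get();
}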

Related

JobRunr Spring Boot: how to get notified if a recurring job - including retries - has failed

I'm using JobRunr 5.1.4 in my Spring Boot application. I have a simple service declaring a recurring job which allows for some retries. A single failing job run is not that relevant for me. Instead, I'm interested in getting notified after all jobs, i.e. the initial job including all its retries, have failed.
I thought JobRunr's JobServerFilter would be a good idea. But the onProcessed() method never gets triggered in case of an exception, only in case of a successful job run. And the ApplyStateFilter gets triggered on every state change, far too often for my requirement, leaving me clueless as to whether a change to a FAILED state was the last in a series of jobs belonging together (initial job + allowed retries).
A simple example would look like this:
@Service
public class JobScheduler {

    @Job(name = "My Recurring Job", retries = 2, jobFilters = ExceptionFilter.class)
    @Recurring(id = "my-recurring-job", cron = "*/10 * * * *")
    public void recurringJob() {
        throw new RuntimeException("foo");
    }
}
A basic implementation of my JobFilter looks like this:
@Component
public class ExceptionFilter implements JobServerFilter, ApplyStateFilter {

    @Override
    public void onProcessing(Job job) {
        log.info("onProcessing: {}", job.getJobName());
        log.info(job.getJobState().getName().name());
    }

    @Override
    public void onProcessed(Job job) {
        log.info("onProcessed: {}", job.getJobName());
        log.info(job.getJobState().getName().name());
    }

    @Override
    public void onStateApplied(Job job, JobState jobState1, JobState jobState2) {
        log.info("onStateApplied: {}", job.getJobName());
        log.info("jobState1: {}", jobState1.getName().name());
        log.info("jobState2: {}", jobState2.getName().name());
    }
}
Is this use case even possible with JobRunr? Or does anyone have an idea how to solve this issue in a different way?
Thank you very much in advance for your support.
I think you're on the right track with onStateApplied from ApplyStateFilter.
You can use the following approach:
@Override
public void onStateApplied(Job job, JobState oldState, JobState newState) {
    if (isFailed(newState) && maxAmountOfRetriesReached(job)) {
        // your logic here
    }
}
onProcessed() is not triggered because your job was not successfully processed (due to the failure).
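For completeness, a rough sketch of how those two helper methods could be implemented; it assumes the allowed retry count is known to the filter (matching the retries value on the @Job annotation) and that the job's state history is accessible via Job#getJobStates():

private static final int MAX_RETRIES = 2; // must match retries on the @Job annotation

private boolean isFailed(JobState state) {
    return StateName.FAILED == state.getName();
}

private boolean maxAmountOfRetriesReached(Job job) {
    // count how many FAILED states the job has accumulated so far
    long failedCount = job.getJobStates().stream()
            .filter(state -> StateName.FAILED == state.getName())
            .count();
    return failedCount > MAX_RETRIES; // the initial attempt plus all allowed retries failed
}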

Spring Apache Kafka onFailure Callback of KafkaTemplate not fired on connection error

I'm experimenting a lot with Apache Kafka in a Spring Boot App at the moment.
My current goal is to write a REST endpoint that takes in some message payload, which will use a KafkaTemplate to send the data to my local Kafka running on port 9092.
This is my producer config:
@Bean
public Map<String, Object> producerConfig() {
    // config settings for creating producers
    Map<String, Object> configProps = new HashMap<>();
    configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, this.bootstrapServers);
    configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    configProps.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 5000);
    configProps.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 4000);
    configProps.put(ProducerConfig.RETRIES_CONFIG, 0);
    return configProps;
}

@Bean
public ProducerFactory<String, String> producerFactory() {
    // creates a kafka producer
    return new DefaultKafkaProducerFactory<>(producerConfig());
}

@Bean("kafkaTemplate")
public KafkaTemplate<String, String> kafkaTemplate() {
    // template which abstracts sending data to kafka
    return new KafkaTemplate<>(producerFactory());
}
My REST endpoint forwards to a service, which looks like this:
@Service
public class KafkaSenderService {

    @Qualifier("kafkaTemplate")
    private final KafkaTemplate<String, String> kafkaTemplate;

    @Autowired
    public KafkaSenderService(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void sendMessageWithCallback(String message, String topicName) {
        // possibility to add callbacks to define what shall happen in success/error case
        ListenableFuture<SendResult<String, String>> future = kafkaTemplate.send(topicName, message);
        future.addCallback(new KafkaSendCallback<String, String>() {

            @Override
            public void onFailure(KafkaProducerException ex) {
                logger.warn("Message could not be delivered. " + ex.getMessage());
            }

            @Override
            public void onSuccess(SendResult<String, String> result) {
                logger.info("Your message was delivered with following offset: " + result.getRecordMetadata().offset());
            }
        });
    }
}
The thing now is: I'm expecting the onFailure() method to get called when the message could not be sent. But this does not seem to work. When I change the bootstrapServers variable in the producer config to localhost:9091 (which is the wrong port, so no connection should be possible), the producer tries to connect to the broker. It makes several connection attempts, and after 5 seconds a TimeoutException occurs. But the onFailure() method doesn't get called. Is there a way to get the onFailure() method called even if the connection cannot be established?
And by the way, I set the retries count to zero, but the producer still does a second connection attempt after the first one. This is the log output:
EDIT: it seems like the Kafka producer / KafkaTemplate goes into an infinite loop when the broker is not available. Is that really the intended behaviour?
The KafkaTemplate really does nothing fancy about connection and publishing. Everything is delegated to the KafkaProducer. What you describe here would happen in exactly the same way if you used just the plain Kafka client.
See the KafkaProducer.send() JavaDocs:
* @throws TimeoutException If the record could not be appended to the send buffer due to memory unavailable
* or missing metadata within {@code max.block.ms}.
This happens because of the blocking logic in that producer:
/**
 * Wait for cluster metadata including partitions for the given topic to be available.
 * @param topic The topic we want metadata for
 * @param partition A specific partition expected to exist in metadata, or null if there's no preference
 * @param nowMs The current time in ms
 * @param maxWaitMs The maximum time in ms for waiting on the metadata
 * @return The cluster containing topic metadata and the amount of time we waited in ms
 * @throws TimeoutException if metadata could not be refreshed within {@code max.block.ms}
 * @throws KafkaException for all Kafka-related exceptions, including the case where this method is called after producer close
 */
private ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long nowMs, long maxWaitMs) throws InterruptedException {
Unfortunately this is not explained in the send() JavaDocs, which claim the call is fully asynchronous, but apparently it is not, at least for this metadata part, which has to be available before we enqueue the record for publishing.
That is something we cannot control, and it is not reflected on the returned Future:
try {
    clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), nowMs, maxBlockTimeMs);
} catch (KafkaException e) {
    if (metadata.isClosed())
        throw new KafkaException("Producer closed while send in progress", e);
    throw e;
}
See more info in the Apache Kafka docs on how to adjust the KafkaProducer for this matter: https://kafka.apache.org/documentation/#theproducer
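If the timeout really is raised synchronously out of send(), as described above, it can be caught around the call itself rather than in the callback. A sketch of the service method from the question with such a guard added (just one option, and only a sketch):

public void sendMessageWithCallback(String message, String topicName) {
    try {
        ListenableFuture<SendResult<String, String>> future = kafkaTemplate.send(topicName, message);
        future.addCallback(new KafkaSendCallback<String, String>() {

            @Override
            public void onFailure(KafkaProducerException ex) {
                logger.warn("Message could not be delivered. " + ex.getMessage());
            }

            @Override
            public void onSuccess(SendResult<String, String> result) {
                logger.info("Message delivered with offset: " + result.getRecordMetadata().offset());
            }
        });
    } catch (KafkaException ex) {
        // thrown synchronously when metadata cannot be fetched within max.block.ms,
        // e.g. when no broker is reachable at all
        logger.warn("Send failed before the record was enqueued: " + ex.getMessage());
    }
}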
Question answered inside the discussion on https://github.com/spring-projects/spring-kafka/discussions/2250# for anyone else stumbling across this thread. In short, kafkaTemplate.getProducerFactory().reset(); does the trick.

How to stop consuming messages from Kafka when an error occurs, and restart consuming again after some time, in Spring Boot

This is the first time I am using Kafka. I have a Spring Boot application, and I am consuming messages from Kafka topics and storing them in a DB. I have a requirement to handle DB failover: if the DB is down, the message should not be committed and message consumption should be suspended for some time; after some time the listener can start consuming messages again. What is the better approach to do this?
I am using spring-kafka 2.2.8.RELEASE, which internally uses Kafka 2.0.1.
Configure a ContainerStoppingErrorHandler and throw an exception from your listener.
https://docs.spring.io/spring-kafka/docs/2.2.13.RELEASE/reference/html/#container-stopping-error-handlers
You can restart the container later when you have detected that your DB is back online.
https://docs.spring.io/spring-kafka/docs/2.2.13.RELEASE/reference/html/#kafkalistener-lifecycle
EDIT
@SpringBootApplication
public class So62125817Application {

    public static void main(String[] args) {
        SpringApplication.run(So62125817Application.class, args);
    }

    @Bean
    TaskScheduler scheduler() {
        return new ThreadPoolTaskScheduler();
    }

    @Bean
    public NewTopic topic() {
        return TopicBuilder.name("so62125817").partitions(1).replicas(1).build();
    }
}

@Component
class Listener {

    private final TaskScheduler scheduler;

    private final KafkaListenerEndpointRegistry registry;

    public Listener(TaskScheduler scheduler, KafkaListenerEndpointRegistry registry,
            AbstractKafkaListenerContainerFactory<?, ?, ?> factory) {
        this.scheduler = scheduler;
        this.registry = registry;
        factory.setErrorHandler(new ContainerStoppingErrorHandler());
    }

    @KafkaListener(id = "so62125817.id", topics = "so62125817")
    public void listen(String in) {
        System.out.println(in);
        // run this code if you want to stop the container and restart it in 60 seconds
        this.scheduler.schedule(() -> {
            this.registry.getListenerContainer("so62125817.id").start();
        }, new Date(System.currentTimeMillis() + 60_000));
        throw new RuntimeException("test restart");
    }
}
There are two approaches I can think of for doing this:
First approach: leave the auto-commit option for consuming messages as true. The configuration for this is enable.auto.commit; by default it is true, so you do not need to change anything. Whenever your DB operation fails, you can put the message on a different topic, say one named failed_events. When you do this, you can have the same application (which populates the DB) run, say, once daily to consume the messages from the failed_events topic and populate the DB again. This way you can keep track of how many times the DB write has failed. One small thing to note: what if the DB is also down during that run? You can decide what to do in this case, probably discard the message if that is OK to do, or retry a certain number of times. A bare-bones sketch of this approach is shown below.
Second approach: if it is fairly deterministic how long the DB will be down, and that period is very small, then it is better to sleep when the DB write fails. Say the application sleeps for 10 minutes before it retries again. You will not have to create a separate topic in this case.
The advantage of this approach is that you don't have to run a separate instance of the same application to fetch from a different topic. You can do all of it in one single application, and maintaining it becomes relatively easier.
The disadvantage is that if the DB is down for a very long period, say 1 day, then you will end up losing the message.
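A bare-bones sketch of the first approach; the topic and listener id names are made up, kafkaTemplate is assumed to be an injected KafkaTemplate<String, String>, and saveToDb() stands in for the real DB write:

@KafkaListener(id = "db-writer", topics = "incoming-events")
public void listen(String message) {
    try {
        saveToDb(message); // your real DB write goes here
    } catch (Exception ex) {
        // park the record on a separate topic; a scheduled run of the same
        // application can consume failed_events later and retry the write
        kafkaTemplate.send("failed_events", message);
    }
}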

Send and receive files from FTP in Spring Boot

I'm new to the Spring Framework and, indeed, I'm learning and using Spring Boot. Recently, in the app I'm developing, I got Quartz Scheduler working, and now I want to make Spring Integration work there: an FTP connection to a server to write and read files from.
What I want is really simple (I've been able to do it in a previous Java application). I've got two Quartz jobs scheduled to fire at different times daily: one of them reads a file from an FTP server and the other writes a file to an FTP server.
I'll detail what I've developed so far.
@SpringBootApplication
@ImportResource("classpath:ws-config.xml")
@EnableIntegration
@EnableScheduling
public class MyApp extends SpringBootServletInitializer {

    @Autowired
    private Configuration configuration;

    //...

    @Bean
    public DefaultFtpsSessionFactory myFtpsSessionFactory() {
        DefaultFtpsSessionFactory sess = new DefaultFtpsSessionFactory();
        Ftp ftp = configuration.getFtp();
        sess.setHost(ftp.getServer());
        sess.setPort(ftp.getPort());
        sess.setUsername(ftp.getUsername());
        sess.setPassword(ftp.getPassword());
        return sess;
    }
}
The following class is what I've named FtpGateway:
@Component
public class FtpGateway {

    @Autowired
    private DefaultFtpsSessionFactory sess;

    public void sendFile() {
        // todo
    }

    public void readFile() {
        // todo
    }
}
I'm reading this documentation to learn how to do it. Spring Integration's FTP support seems to be event driven, so I don't know how I can execute either sendFile() or readFile() from my jobs when the trigger is fired at an exact time.
The documentation tells me something about using an Inbound Channel Adapter (to read files from an FTP server?), an Outbound Channel Adapter (to write files to an FTP server?) and an Outbound Gateway (to do what?):
Spring Integration supports sending and receiving files over FTP/FTPS by providing three client side endpoints: Inbound Channel Adapter, Outbound Channel Adapter, and Outbound Gateway. It also provides convenient namespace-based configuration options for defining these client components.
So, I'm not clear on how to proceed.
Please, could anybody give me a hint?
Thank you!
EDIT:
Thank you @M. Deinum. First, I'll try a simple task: read a file from the FTP server; the poller will run every 5 seconds. This is what I've added:
@Bean
public FtpInboundFileSynchronizer ftpInboundFileSynchronizer() {
    FtpInboundFileSynchronizer fileSynchronizer = new FtpInboundFileSynchronizer(myFtpsSessionFactory());
    fileSynchronizer.setDeleteRemoteFiles(false);
    fileSynchronizer.setPreserveTimestamp(true);
    fileSynchronizer.setRemoteDirectory("/Entrada");
    fileSynchronizer.setFilter(new FtpSimplePatternFileListFilter("*.csv"));
    return fileSynchronizer;
}

@Bean
@InboundChannelAdapter(channel = "ftpChannel", poller = @Poller(fixedDelay = "5000"))
public MessageSource<File> ftpMessageSource() {
    FtpInboundFileSynchronizingMessageSource source = new FtpInboundFileSynchronizingMessageSource(ftpInboundFileSynchronizer());
    source.setLocalDirectory(new File(configuracion.getDirFicherosDescargados()));
    source.setAutoCreateLocalDirectory(true);
    source.setLocalFilter(new AcceptOnceFileListFilter<File>());
    return source;
}

@Bean
@ServiceActivator(inputChannel = "ftpChannel")
public MessageHandler handler() {
    return new MessageHandler() {

        @Override
        public void handleMessage(Message<?> message) throws MessagingException {
            Object payload = message.getPayload();
            if (payload instanceof File) {
                File f = (File) payload;
                System.out.println(f.getName());
            } else {
                System.out.println(message.getPayload());
            }
        }
    };
}
Then, when the app is running, I put a new CSV file into the "Entrada" remote folder, but the handler() method isn't run after 5 seconds... Am I doing something wrong?
Please add @Scheduled(fixedDelay = 5000) over your poller method.
You should use Spring Batch with a tasklet. It is far easier to configure the bean, cron time, and input source with the existing interfaces provided by Spring.
https://www.baeldung.com/introduction-to-spring-batch
The example above is both annotation- and XML-based; you can use either.
Another benefit is that you can make use of listeners and parallel steps. This framework can also be used in a Reader - Processor - Writer manner.
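For illustration only, a tasklet-style step along those lines might look roughly like this (bean and step names are made up, and the FTP work itself still has to be implemented inside the tasklet, e.g. via the DefaultFtpsSessionFactory shown earlier):

@Bean
public Step ftpDownloadStep(StepBuilderFactory stepBuilderFactory) {
    return stepBuilderFactory.get("ftpDownloadStep")
            .tasklet((contribution, chunkContext) -> {
                // fetch or upload the remote file here
                return RepeatStatus.FINISHED;
            })
            .build();
}

@Bean
public Job ftpDownloadJob(JobBuilderFactory jobBuilderFactory, Step ftpDownloadStep) {
    return jobBuilderFactory.get("ftpDownloadJob")
            .start(ftpDownloadStep)
            .build();
}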

How to set a Message Handler programmatically in Spring Cloud AWS SQS?

Maybe someone has an idea for my following problem:
I am currently on a project where I want to use AWS SQS with the Spring Cloud integration. For the receiver part I want to provide an API where a user can register a "message handler" on a queue; the handler is an interface and will contain the user's business logic, e.g.
MyAwsSqsReceiver receiver = new MyAwsSqsReceiver();
receiver.register("a-queue-name", new MessageHandler() {

    @Override
    public void handle(String message) {
        //... business logic for the received message
    }
});
I found examples, e.g.
https://codemason.me/2016/03/12/amazon-aws-sqs-with-spring-cloud/
and read the documentation
http://cloud.spring.io/spring-cloud-aws/spring-cloud-aws.html#_sqs_support
But the only thing I found there to "connect" functionality for processing an incoming message is an annotation on a method, e.g. @SqsListener or @MessageMapping.
These annotations are fixed to a certain queue name, though. So now I am at a loss as to how to dynamically "connect" my provided MessageHandler (from my API) to the incoming message for the specified queue name.
In the example's config there is a SimpleMessageListenerContainer, which gets a QueueMessageHandler set, but this QueueMessageHandler does not seem to be the right place to set my handler, or to override its methods and provide my own subclass of QueueMessageHandler.
I already did something like this with the Spring AMQP integration and RabbitMQ and thought that it would be similar here with AWS SQS.
Does anyone have an idea how to accomplish this?
thx + bye,
Ximon
EDIT:
I found that Spring JMS could actually do that, e.g. www.javacodegeeks.com/2016/02/aws-sqs-spring-jms-integration.html. Does anybody know what consequences, good or bad, using the JMS protocol has here?
I am facing the same issue.
I am trying an unusual way: I set up an AWS client bean at build time and then, instead of using the @SqsListener annotation to consume from a specific queue, I use the @Scheduled annotation, with which I can programmatically poll (every 10 seconds in my case) whichever queue I want to consume from.
I built an example that iterates over queues defined in properties and then consumes from each one.
Client Bean:
@Bean
@Primary
public AmazonSQSAsync awsSqsClient() {
    return AmazonSQSAsyncClientBuilder
            .standard()
            .withRegion(Regions.EU_WEST_1.getName())
            .build();
}
Consumer:
// injected in the constructor
private final AmazonSQSAsync awsSqsClient;

@Scheduled(fixedDelay = 10000)
public void pool() {
    properties.getSqsQueues()
            .forEach(queue -> {
                val receiveMessageRequest = new ReceiveMessageRequest(queue)
                        .withWaitTimeSeconds(10)
                        .withMaxNumberOfMessages(10);

                // reading the messages
                val result = awsSqsClient.receiveMessage(receiveMessageRequest);
                val sqsMessages = result.getMessages();
                log.info("Received Message on queue {}: message = {}", queue, sqsMessages.toString());

                // deleting the messages
                sqsMessages.forEach(message -> {
                    val deleteMessageRequest = new DeleteMessageRequest(queue, message.getReceiptHandle());
                    awsSqsClient.deleteMessage(deleteMessageRequest);
                });
            });
}
Just to clarify: in my case I need multiple queues, one for each tenant, with the queue URL for each one passed in a property file. Of course, in your case, you could get the queue names from another source, maybe a ThreadLocal holding the queues you have created at runtime.
If you wish, you can also try the JMS approach, where you create message consumers and add a listener to each one you wish (see the AWS JMS documentation).
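If the goal is still the register(queue, handler) API from the question, the same scheduled polling loop can simply dispatch to a map of registered handlers. A sketch along those lines, reusing the MessageHandler interface from the question and the awsSqsClient bean above (method and field names are illustrative):

private final Map<String, MessageHandler> handlers = new ConcurrentHashMap<>();

public void register(String queueUrl, MessageHandler handler) {
    handlers.put(queueUrl, handler);
}

@Scheduled(fixedDelay = 10000)
public void pollRegisteredQueues() {
    handlers.forEach((queue, handler) -> {
        ReceiveMessageRequest request = new ReceiveMessageRequest(queue)
                .withWaitTimeSeconds(10)
                .withMaxNumberOfMessages(10);
        awsSqsClient.receiveMessage(request).getMessages().forEach(message -> {
            // hand the raw body to the user-supplied business logic
            handler.handle(message.getBody());
            awsSqsClient.deleteMessage(new DeleteMessageRequest(queue, message.getReceiptHandle()));
        });
    });
}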
When we do Spring and SQS, we use spring-cloud-starter-aws-messaging.
Then just create a listener class:
@Component
public class MyListener {

    @SqsListener(value = "myqueue")
    public void listen(MyMessageType message) {
        // process the message
    }
}
