Spring boot applications high availability - spring-boot

We have a microservice which is developed using spring boot. couple of the functionalities it implements is
1) A scheduler that triggers, at a specified time, a file download using webhdfs and process it and once the data is processed, it will send an email to users with the data process summary.
2) Read messages from kafka and once the data is read, send an email to users.
We are now planning to make this application high available either in Active-Active or Active-passive set up. The problem we are facing now is if both the instances of the application are running then both of them will try to download the file/read the data from kafka, process it and send emails. How can this be avoided? I mean to ensure that only one instance triggers the download and process it ?
Please let me know if there is known solution for this kind of scenarios as this seems to be a common scenario in most of the projects? Is master-slave/leader election approach a correct solution?
Thanks

Let the service download that file, extract the information and publish them via kafka.
Check beforehand if the information was already processed by querying kafka or a local DB.
You also could publish an DataProcessed-Event that triggers the EmailService, that sends the corresponding E-Mail.

Related

Using Spring or Lambda for bulk event trigger

Looking for some help on an application design. I am using spring framework and hosting application in AWS.
I am working on an enterprise Java Web application that is suppose to handle events when their trigger time is reached. For example, consumers can set an event to begin on 12/20/22 at 07:35 AM, and system is suppose to send a notification when that time is reached.
I can store these events in a database along with their trigger time and setup a Spring scheduler (#Scheduler) to run every minute and process events whose trigger time is reached. My only concern with this approach is, there could be hundreds/thousands of event to trigger at any minute, and it cannot be processed within one minute.
Is there any alternate way to design this? I don't know if Spring offers a feature where I could create these Event, and Frameworks trigger these events when trigger time is reached. In that way, I can stay away from managing Scheduling and Triggering part.
I am using AWS to host this applications, so another option I'm thinking towards is creating an AWS lambda for every such Event, and let AWS manage the triggering part. In that way, I can stay away from managing the triggers.
Let me know your views? Or If you came across similar problems and how you resolved that?
You can consider using spring-cloud-dataflow to manage this as tasks and streams.
You create a custom batch application that will use #Scheduled to check the your database when events are dure and then send events to a stream. You can use Spring Integration APIs to interact with RabbitMQ or Kafka topics.
The event should contain enough information needed to process the event.
You then have a stream application that produces the content and send via email or pass it on to a separate stream app that sends the email.
https://dataflow.spring.io/docs/stream-developer-guides/programming-models/
The flow will look something like:
:mail_events | message-processor | message-sender
You will configure property for mail_events to match the topic created and configured for you mail-event-batch application.
You can use Spring Cloud Data Flow to manage the mail-event-batch application as well.
You can scale each application https://dataflow.spring.io/docs/recipes/scaling/

Read data directly from REST Api with Kafka

Is such a situation even possible ? :
There is an application "XYZ" (in which there is no Kafka) that exposes a REST api. It is a SpringBoot application with which Angular application communicates.
A new application (SpringBoot) is created which wants to use Kafka and needs to fetch data from "XYZ" application. And it wants to do this using Kafka.
The "XYZ" application has an example endpoint [GET] api/message/all which displays all messages.
Is there a way to "connect" Kafka directly to this endpoint and read data from it ? In short, the idea is for Kafka to consume data directly from the EP. Communication between two microservices, where one microservice does not have a kafka.
What suggestions do you have for solving this situation. Because I guess this option is not possible. Is it necessary to add a publisher in application XYZ which will send data to the queue and only then will they be available for consumption by a new application ??
Getting them via the REST-Interface might not be a very good idea.
Simply put, in the messaging world, message delivery guarantees are a big topic and the standard ways to solve that with Kafka are usually
Producing messages from your service using the Producer-API to a Kafka topic.
Using Kafka-Connect to read from an outbox-table.
Since you most likely have a database already attached to your API-Service, there might arise the problem of dual writes if you choose to produce the messages directly to a topic. What this means, is that writes to a database might fail while it might be successfully written to Kafka/vice-versa. So you can end up with inconsistent states. Depending on your use case this might be a problem or not.
Nevertheless, to overcome that, the outbox pattern can come in handy.
Via the outbox pattern, you'd basically write your messages to a table, a so-called outbox-table, and then you'd use Kafka-Connect to poll this table of the database. Kafka Connect is basically a cluster of workers that consume this database table and forward the entries of the table to a Kafka topic. You might want to look at confluent cloud, they offer a fully managed Kafka-Connect service. Like this you don't have to manage the cluster of workers yourself. Once you have the messages in a Kafka topic, you can consume them with the standard Kafka Consumer-API/ Stream-API.
What you're looking for is a Source-Connector.
A source connector for a specific database. E.g. MongoDB
E.g. https://www.confluent.io/hub/mongodb/kafka-connect-mongodb
For now, most source-connectors produce in an at-least-once fashion. This means that the topic you configure the connector to write to might contain a message twice. So make sure that if you need them to be consumed exactly once, you think about deduplicating these messages.

Microservice failure Scenario

I am working on Microservice architecture. One of my service is exposed to source system which is used to post the data. This microservice published the data to redis. I am using redis pub/sub. Which is further consumed by couple of microservices.
Now if the other microservice is down and not able to process the data from redis pub/sub than I have to retry with the published data when microservice comes up. Source can not push the data again. As source can not repush the data and manual intervention is not possible so I tohught of 3 approaches.
Additionally Using redis data for storing and retrieving.
Using database for storing before publishing. I have many source and target microservices which use redis pub/sub. Now If I use this approach everytime i have to insert the request in DB first than its response status. Now I have to use shared database, this approach itself adding couple of more exception handling cases and doesnt look very efficient to me.
Use kafka inplace if redis pub/sub. As traffic is low so I used Redis pub/sub and not feasible to change.
In both of the above cases, I have to use scheduler and I have a duration before which I have to retry else subsequent request will fail.
Is there any other way to handle above cases.
For the point 2,
- Store the data in DB.
- Create a daemon process which will process the data from the table.
- This Daemon process can be configured well as per our needs.
- Daemon process will poll the DB and publish the data, if any. Also, it will delete the data once published.
Not in micro service architecture, But I have seen this approach working efficiently while communicating 3rd party services.
At the very outset, as you mentioned, we do indeed seem to have only three possibilities
This is one of those situations where you want to get a handshake from the service after pushing and after processing. In order to accomplish the same, using a middleware queuing system would be a right shot.
Although a bit more complex to accomplish, what you can do is use Kafka for streaming this. Configuring producer and consumer groups properly can help you do the job smoothly.
Using a DB to store would be a overkill, considering the situation where you "this data is to be processed and to be persisted"
BUT, alternatively, storing data to Redis and reading it in a cron-job/scheduled job would make your job much simpler. Once the job is run successfully, you may remove the data from cache and thus save Redis Memory.
If you can comment further more on the architecture and the implementation, I can go ahead and update my answer accordingly. :)

Sending scheduled emails in a spring application?

I need to achieve the following : -
Sending emails to around 6000 users around 30 times in a year. Sometimes sending emails at specific time of day else at midnight.
I need to provide retry functionality in my application, so if by some reason my application failed to send email to some of the user it should retry to send 3 times (till 3 days) before finally marking it as failure.
i need to send emails using predefined templates but having dynamic data in it.
My application tech stack - java, spring boot 1.4, oracle database, CA autosys job scheduler, activiti bpm (not using Activiti as of now but can use it if it is the best solution)
My current solution :-
Use autosys scheduler to define these jobs.
calling my Rest exposed services (spring + java + oracle tech stack), that perform all the application logic and them Apache commons email to send the email using my smtp server.
My question - What is the recommended way to send email in this case? As i have to maintain various tables to achieve retry functionality. should i use activiti instead of autosys scheduler? Or spring framework itself for this email scheduling?
I don't see any business processes to be managed in your problem. as far as no business people are involved in any task (such as filling a form, make a decision based on input provided), you should avoid activiti. Activiti is a BPM engine, there is no use of it unless you are managing a process. for schedulers you should definitely go ahead with the spring framework. Do let me know if i've missed any point.

scheduling jobs using spring batch or just Quartz scheduler

I am looking for best solution to create a java web application to generate reports in excel/PDf format. some thing similar to Google Adwords, where user can create schedule reports and download it when the report is generated at a later time.
I am thinking to develop and java application where User logs, selects a pre defined report and provides the input parameters (like report date etc), This request will be queued up or saved as Quarts Job(prefer persistent Queue). A Job will be monitoring the queue/job and execute the job, generate the report(output excel /pdf) and stored in disk.
When the user refresh the screen or logs back at a later time, the report should be available for down load.
Using Spring batch and Quartz scheduler can I do this ? I also expecting like Spring admin , where I can see number of request in Queue(jobs queued up), and stop the queue processing etc.
You would use spring-batch if you wanted to process all report requests at the same time, perhaps at night when your servers are not otherwise occupied processing real-time user requests (or even during the day during slow periods).
You would use a quartz job if you wanted to check for new jobs every few seconds/minutes/hours/etc, and process one/many of them at that specified time interval.
So, quartz is a scheduler and batch is a process. You could use quartz to schedule batch jobs to run at specific times. They aren't competing technologies, they are complimentary.
About your question:
Given that you talk about queues and their persistence however it sounds a lot like your problem would fit into a simple jms model. You would need some messaging software. If you want to make it easy on yourself I'd recommend using spring-jms as a wrapper around the basic Java EE JMS api -- the spring wrappers are simply simpler than basic jms. For a messaging service I'd look at RabbitMQ, because again it's pretty simple.
With the jms architecture you'd post user requests to the queue, which you'd configured to be persistent. You'd have a custom listener on the queue, passing requests to a report generator whenever it runs. You can assign one or more threads to the listener, meaning that you should find it easy to tune the performance of the report generator.
There is a pretty useful DZone article about using rabbitmq via spring-integration (a set of prebuilt pattern implementations that help with connecting things to each other).

Resources