Spring Cloud Dataflow - Retaining Order of Messages


Let's say I have a stream with 3 applications - a source, processor, and sink.
I need to retain the order of the messages I receive from my source. When I receive messages A,B,C,D, I have to send them to the sink as A,B,C,D. (I can't send them as B,A,C,D.)
If I have just 1 instance of each application, everything will run sequentially and the order will be retained.
If I have 10 instances of each application, the messages A,B,C,D might get processed at the same time in different instances. I don't know what order these messages will wind up in.
So is there any way I can ensure that I retain the order of my messages when using multiple instances?

No; when you scale out (either by concurrency in the binder or by deploying multiple instances), you lose order. This is true for any multi-threaded application, not just spring-cloud-stream.
You can use partitioning so that each instance gets a partition of the data, but ordering is only retained within each partition.
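To illustrate why ordering survives only within a partition, here is a minimal plain-Java sketch. It assumes each message carries a key (the "device-1"/"device-2" keys and the class name are made up for the example) and it is not how the binder is actually configured; in Spring Cloud Stream, partitioning is set up through binding properties instead.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionSketch {
    // Equal keys always map to the same partition, so their relative order is kept.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(String key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        int partitionCount = 3; // hypothetical: one partition per consumer instance
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
        // {key, payload} pairs in arrival order
        String[][] messages = {
            {"device-1", "A"}, {"device-2", "B"}, {"device-1", "C"}, {"device-2", "D"}
        };
        for (String[] m : messages) {
            partitions.get(partitionFor(m[0], partitionCount)).add(m[1]);
        }
        // A stays ahead of C and B stays ahead of D, but nothing orders A relative to B.
        System.out.println(partitions);
    }
}
```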
If you have sequence information in your messages, you can add a custom module using a Spring Integration Resequencer to reassemble your messages back into the same sequence - but you'll need a single instance of the resequencer before a single sink instance.
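The resequencing idea can be sketched in plain Java as below. This is only an illustration of the technique, not Spring Integration's Resequencer API; it assumes every message carries a gap-free sequence number, and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ResequencerSketch {
    private final Map<Long, String> pending = new HashMap<>();
    private long nextSequence;

    ResequencerSketch(long firstSequence) {
        this.nextSequence = firstSequence;
    }

    // Buffers the message and returns every payload that can now be released in order.
    synchronized List<String> accept(long sequence, String payload) {
        pending.put(sequence, payload);
        List<String> released = new ArrayList<>();
        while (pending.containsKey(nextSequence)) {
            released.add(pending.remove(nextSequence));
            nextSequence++;
        }
        return released;
    }

    public static void main(String[] args) {
        ResequencerSketch resequencer = new ResequencerSketch(1);
        List<String> out = new ArrayList<>();
        out.addAll(resequencer.accept(2, "B")); // held back, 1 has not arrived yet
        out.addAll(resequencer.accept(1, "A")); // releases A, then B
        out.addAll(resequencer.accept(4, "D")); // held back, 3 has not arrived yet
        out.addAll(resequencer.accept(3, "C")); // releases C, then D
        System.out.println(out); // prints [A, B, C, D]
    }
}
```

Because all state lives in one object, this only works with a single resequencer instance, which matches the restriction mentioned above.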

Related

Best way to track/trace a JSON Object (a time series data) as it flows through a system of microservices on an IoT platform

We are working on an IoT platform which ingests many device parameter
values (time series) every second from many devices. Each ingested JSON
document is a batch of multiple parameter values captured at a particular
instant. What is the best way to track each JSON document as it flows
through many downstream microservices in an event-driven way?
We use Spring Boot predominantly and all the services are
containerised.
E.g. Option 1 - Is associating a UUID with each object and then updating
its state idempotently in Redis as each microservice processes it
ideal? The problem is that each microservice will now be tied to Redis,
and we have seen Redis performance go down as the number of API calls to
Redis increases, as it is single-threaded (we can scale this out though).
Option 2 - Zipkin?
Note: We use Kafka/RabbitMQ to process the messages in a distributed
way. My question is about a strategy to track each of these messages
and its status (to enable replay if needed to attain only-once
delivery). Let's say message1 is processed by Service A, Service B, and
Service C. We are having trouble tracking whether the message failed at
Service B or Service C, as we get a lot of messages.

A better approach would be to use Kafka instead of Redis.
Create a topic for every microservice and keep moving the packet from
one topic to the next after processing.
topic(raw-data) - |MS One| - topic(processed-data-1) - |MS Two| - topic(processed-data-2) ... etc
Keep appending the results to the same object and keep moving it down the line until every microservice has processed it.
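A rough plain-Java sketch of the "keep appending to the same object" idea follows. It does not use a Kafka client; the envelope type, its fields, and the service names ms-one/ms-two are assumptions made for illustration. The point is that the trail carried by the message shows which services have already handled it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class TopicPipelineSketch {
    // Hypothetical envelope that travels from topic to topic.
    static final class Envelope {
        final String id;
        final String payload;
        final List<String> processedBy;

        Envelope(String id, String payload, List<String> processedBy) {
            this.id = id;
            this.payload = payload;
            this.processedBy = List.copyOf(processedBy);
        }
    }

    // One microservice step: do the work, append the service name to the trail,
    // and return the envelope that would be published to the next topic.
    static Envelope process(String serviceName, Envelope in, UnaryOperator<String> work) {
        List<String> trail = new ArrayList<>(in.processedBy);
        trail.add(serviceName);
        return new Envelope(in.id, work.apply(in.payload), trail);
    }

    public static void main(String[] args) {
        Envelope raw = new Envelope("msg-1", " 21.5c ", List.of());
        Envelope afterOne = process("ms-one", raw, String::trim);
        Envelope afterTwo = process("ms-two", afterOne, String::toUpperCase);
        // The trail tells you how far this message got through the pipeline.
        System.out.println(afterTwo.id + " " + afterTwo.payload + " " + afterTwo.processedBy);
    }
}
```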

How to create unique messages to rabbitmq queue - spring-amqp

I am putting a message containing string data on a RabbitMQ queue.
Message publishing is called as part of a service, and the service can be called with the same data (which goes to the queue) multiple times, so duplicated data in the queue is very likely.
This causes problems because the consumer code inserts this data into a table where the data is the primary key. The consumer runs on 4 different nodes simultaneously, so different consumers can end up consuming the same data (from different messages).
I want to know whether RabbitMQ publishing has any way to avoid message duplication.
I read that "define a property "x-unique-message-code" to compare them is an easy and simple way", but I don't know how to do it.
I am using spring-amqp.
Any help is highly appreciated.
Thank you

There is a good article from RabbitMQ about reliability: https://www.rabbitmq.com/reliability.html
There is a note like:
In the event of network failure (or a node crashing), messages can be duplicated, and consumers must be prepared to handle them. If possible, the simplest way to handle this is to ensure that your consumers handle messages in an idempotent way rather than explicitly deal with deduplication.
For this purpose, the producer can set a messageId property on each message, so that consumers can recognize a message they have already handled.
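A minimal plain-Java sketch of an idempotent consumer keyed on a message id is shown below. It makes several assumptions: the id is derived from the data so that identical data yields the same id, the in-memory set stands in for a store shared by all consumer nodes (with 4 nodes a local set is not enough), and the list stands in for the table. The class and method names are invented; with spring-amqp the id would travel in the message properties.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

final class IdempotentConsumerSketch {
    private final Set<String> seenMessageIds = new HashSet<>(); // stand-in for a shared store
    private final List<String> insertedRows = new ArrayList<>(); // stand-in for the table

    // Publisher side (hypothetical helper): the same data always produces the same id.
    static String messageIdFor(String data) {
        return UUID.nameUUIDFromBytes(data.getBytes(StandardCharsets.UTF_8)).toString();
    }

    // Consumer side: returns true if the data was inserted, false if it was a duplicate.
    synchronized boolean handle(String messageId, String data) {
        if (!seenMessageIds.add(messageId)) {
            return false; // already handled, skip the insert
        }
        insertedRows.add(data);
        return true;
    }

    synchronized List<String> insertedRows() {
        return new ArrayList<>(insertedRows);
    }

    public static void main(String[] args) {
        IdempotentConsumerSketch consumer = new IdempotentConsumerSketch();
        for (String data : new String[] {"order-42", "order-42", "order-43"}) {
            consumer.handle(messageIdFor(data), data);
        }
        System.out.println(consumer.insertedRows()); // prints [order-42, order-43]
    }
}
```

An alternative with the same effect is to let the primary-key constraint reject the second insert and treat that error as "already processed".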

Spring integration service activator with multiple messages

I would like to process multiple messages at a time, e.g. get 10 messages from the channel at a time and write them to a log file at once.
Given this scenario, can I write a service activator which receives messages in batches of a predefined size, i.e. 5 or 10 messages, and processes each batch? If this is not possible, how can I achieve this using Spring Integration?

That is exactly what you can get with the Aggregator. You can collect several messages into a group using a simple release expression like size() == 10. When the group is complete, the DefaultAggregatingMessageGroupProcessor emits a single message with the list of payloads of the messages in the group. You can send that result to the service-activator to handle the batch at once.
UPDATE
Something like this:
    .aggregate(aggregator -> aggregator
            .correlationStrategy(message -> 1)
            .releaseStrategy(group -> group.size() == 10)
            .outputProcessor(g -> new GenericMessage<Collection<Message<?>>>(g.getMessages()))
            .expireGroupsUponCompletion(true))
So, we correlate messages (group or buffer them) by the static key 1.
The group (buffer) size is 10; when we reach it, we emit a single message which contains all the messages from the group. After emitting the result, we remove the group from the store so that a new one can form for a fresh sequence of messages.
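For readers who want to see the buffering logic on its own, here is a plain-Java sketch of the same collect-then-release behaviour. It is not the Spring Integration API; the class name and the handler callback are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchingSketch<T> {
    private final int batchSize;
    private final Consumer<List<T>> batchHandler;
    private List<T> group = new ArrayList<>();

    BatchingSketch(int batchSize, Consumer<List<T>> batchHandler) {
        this.batchSize = batchSize;
        this.batchHandler = batchHandler;
    }

    // Adds one payload; when the group reaches batchSize it is handed over
    // as a whole and a fresh group is started.
    synchronized void onMessage(T payload) {
        group.add(payload);
        if (group.size() == batchSize) {
            List<T> complete = group;
            group = new ArrayList<>();
            batchHandler.accept(complete);
        }
    }

    synchronized int pendingCount() {
        return group.size();
    }

    public static void main(String[] args) {
        List<List<Integer>> batches = new ArrayList<>();
        BatchingSketch<Integer> batcher = new BatchingSketch<>(10, batches::add);
        for (int i = 1; i <= 25; i++) {
            batcher.onMessage(i);
        }
        System.out.println(batches.size());         // prints 2
        System.out.println(batcher.pendingCount()); // prints 5
    }
}
```

Note that the last 5 messages stay buffered until the group fills up; a real flow would also need some kind of timeout to flush an incomplete group.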

It depends on what is creating the messages in the first place; if a message-driven channel adapter, the concurrency in that adapter is the key.
For other message sources, you can use an ExecutorChannel as the input channel to the service activator, with an executor with a pool size of 10.
Depending on what is sending messages, you need to be careful about losing messages in the event of a server failure.
It's difficult to provide a general answer without more information about your application.

ActiveMQ converting existing Queue to CompositeQueue

I'll try to explain this the best I can.
As I store my data that I receive from my ActiveMQ queue in several distinct locations, I have decided to build a composite Queue so I can process the data for each location individually.
The issue I am running into is that I currently have the Queue in a production environment. It seems that changing a queue named A to a composite Queue also called A having virtual destinations named B and C causes me to lose all the data on the existing Queue. It does not on start-up forward the previous messages. Currently, I am creating a new CompositeQueue with a different name, say D, which forwards data to B and C. Then I have some clunky code that prevents all connections until I have both a) updated all the producers to send to D and b) pulled the data from A using a consumer and sent it to D with a producer.
It feels rather messy. Is there any way around this? Ideally I would be able to keep the same Queue name, have all its current data sent to the composite sub-queues, and have the Queue forward only in the end.

From the description given, the desired behavior is not possible: message routing on a composite queue happens while messages are in flight, not at some later time when the queue has already stored messages and the broker configuration is changed. You need to consume the past messages from the initial Queue (A, I guess) and send them on to the desired destinations.

Clustering the Batch Job & distributing the data load

I have a batch processing project that I want to cluster on 5 machines.
Suppose my input source is a database with 1000 records.
I want to split these records equally, i.e. 200 records per instance of the batch job.
How can we distribute the workload?

The workflow below is one you may want to follow.
Assumptions:
- You have the necessary domain objects corresponding to the DB table.
- You have a batch flow configured with a reader/writer/tasklet mechanism.
- You have a messaging system (message queues are a great way to make distributed applications talk to each other).
- The input object is an object placed on the queue that contains the set of input records, split to the required size.
- The result object is an object placed on the queue that contains the processed records or the result value (if scalar).
- The chunkSize is configured in a property file. Here it is 200.
Design:
In the application,
- Configure a queueReader to read from a queue.
- Configure a queueWriter to write to a queue.
- If using the task/tasklet mechanism, configure different queues to carry the input/result objects.
- Configure a DB reader which reads from the DB.
Logic in the DBReader
- Read records from the DB one by one, maintaining a count of the records read. When count % chunkSize == 0, write the accumulated records to the inputMessage object and write the object to the queue.
Logic in the queueReader
- Read the messages one by one.
- For each message present, do the necessary processing.
- Create a resultObject.
Logic in the queueWriter
- Read the resultObject (batch frameworks usually provide a way to ensure that writers are able to read the output from readers).
- If any applicable processing or downstream interaction is needed, add it here.
- Write the result object to the outputQueue.
Deployment
Package once, deploy multiple instances. For better performance, keep the chunkSize small to enable fast processing. The queues are managed by the messaging system (the systems available in the market provide ways to monitor the queues), where you will be able to see the message flow.
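The DBReader chunking step described above can be sketched in plain Java as follows. This is only an illustration under assumptions: the records are already loaded into a list, and the class and method names are invented. Unlike a bare count % chunkSize check, it also emits the final partial chunk when the record count is not a multiple of chunkSize.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkSplitSketch {
    // Splits the records into consecutive chunks of at most chunkSize elements.
    // Each chunk would become one input message on the queue.
    static <T> List<List<T>> split(List<T> records, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < records.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, records.size());
            chunks.add(new ArrayList<>(records.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            records.add(i);
        }
        List<List<Integer>> chunks = split(records, 200);
        System.out.println(chunks.size());        // prints 5
        System.out.println(chunks.get(0).size()); // prints 200
    }
}
```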
