I'd like to process a list of custom Java objects by splitting it with the Camel splitter and processing the parts in parallel threads. The challenge I am facing is that the list of custom objects is ordered by Id, and it has to be written to a file in that order.
As soon as I use parallel processing, the sequence is disturbed. I went through a few articles which suggested using a "resequencer" or a single thread.
But with a single thread it takes a huge amount of time to process 5k records.
Any leads would be highly helpful.
Thanks
Nitin
I had a similar kind of problem while splitting an XML request based on a tag "XXX", processing the split requests, and aggregating them into a response.
The order of the aggregated response was not the same as the request.
Fix: the issue was resolved by using the aggregation "strategyRef" in the splitter EIP.
Sample Code:
<route>
    <from id="_from1" uri="activemq:DocumentGenerationQueue" />
    <split parallelProcessing="true" streaming="false" strategyRef="AggregateTask">
        <tokenize token="XXX" xml="true" />
        <to id="_to71" uri="bean:ProcessorBean" />
        <to id="_to72" uri="activemq:SplittedResponseQueue" />
    </split>
    <to uri="activemq:AggregatedResponseQueue" />
</route>
You can create an instance of AggregationStrategy that compares the results of newExchange and oldExchange and creates a result exchange containing the list of custom Java objects sorted by id.
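The sort-on-aggregate idea can be sketched in plain Java. This is not Camel's AggregationStrategy interface itself, just the core of what its aggregate(oldExchange, newExchange) callback would do with the exchange bodies; Item and its id field are hypothetical stand-ins for your custom objects.

```java
import java.util.*;

// A hypothetical custom object, ordered by id.
class Item {
    final int id;
    final String payload;
    Item(int id, String payload) { this.id = id; this.payload = payload; }
}

class SortedMerge {
    // oldBody is the list aggregated so far (null on the very first callback),
    // newBody is the result produced by one parallel branch. Merging and
    // re-sorting restores the Id order regardless of which thread finished first.
    static List<Item> aggregate(List<Item> oldBody, Item newBody) {
        List<Item> merged = (oldBody == null) ? new ArrayList<>() : oldBody;
        merged.add(newBody);
        merged.sort(Comparator.comparingInt(i -> i.id)); // restore Id order
        return merged;
    }
}
```

In the real strategy you would pull these values out of oldExchange/newExchange with getIn().getBody() and set the merged list back on the exchange you return.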
But using single thread, it takes huge time to process 5k records.
You have to be careful: you may not want to spin up 5k parallel threads. Instead, create your own thread pool and attach it to the split with executorServiceRef. That way you can control the number of threads and decide what happens when the work queue is full.
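In the XML DSL, such a pool could be wired like this (a sketch only; the pool id, sizes, and endpoints are illustrative, not taken from the question):

```xml
<camelContext xmlns="http://camel.apache.org/schema/spring">
    <!-- Bounded pool: at most 20 threads and 500 queued tasks;
         what happens when the queue fills is controlled via rejectedPolicy -->
    <threadPool id="splitPool" poolSize="10" maxPoolSize="20"
                maxQueueSize="500" threadName="SplitWorker"/>
    <route>
        <from uri="activemq:DocumentGenerationQueue"/>
        <split parallelProcessing="true" executorServiceRef="splitPool"
               strategyRef="AggregateTask">
            <tokenize token="XXX" xml="true"/>
            <to uri="bean:ProcessorBean"/>
        </split>
    </route>
</camelContext>
```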
I have a “listener-container” defined like this:
<listener-container concurrency="1" connection-factory="connectionFactory" prefetch="10"
message-converter="jsonMessageConverter"
error-handler="clientErrorHandler"
mismatched-queues-fatal="true"
xmlns="http://www.springframework.org/schema/rabbit">
<listener ref="clientHandler" method="handleMessage" queue-names="#{marketDataBroadcastQueue.name}" />
</listener-container>
I want to process the messages in sequential order, so I need to set concurrency to 1.
But the bean "clientHandler" has more than one "handleMessage" method (with different Java classes as parameters). I can see in the application logs that messages are not processed one by one; several messages are processed in parallel. Could this be due to having multiple methods with the same name processing those messages?
Thanks!
I am developing a clustered web application with different WARs deployed, so I need session sharing (and not only that). I've started using Ignite as a platform for a clustered (replicated) cache server.
The issue I reached is this:
My cache key is a String and the value is a HashMap:
CacheConfiguration<String, Map<String, String>> cfg = new CacheConfiguration<>("my_cache");
I am using this cache for web sessions. The issue is that one servlet gets the map, puts some session-specific values into it, and puts it back into Ignite. If a second servlet gets the map after the first one but finishes after it, the second put overwrites the first one's changes.
So my exact question is: what is the pattern for solving this concurrent map access issue in an efficient way (without locking the whole object)?
Regards
It sounds a bit weird to me, because this scenario should only be possible when there are two concurrent requests working with the same session. How is that possible?
But in any case, you can use a TRANSACTIONAL cache for web session data. This guarantees that the two requests will be processed under a lock and the data will be updated atomically.
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="web-sessions-cache"/>
    <property name="atomicityMode" value="TRANSACTIONAL"/>
</bean>
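The race described in the question is a classic read-modify-write lost update. The framework-free sketch below reproduces it with a plain ConcurrentHashMap standing in for the Ignite cache, then shows the atomic per-entry alternative (compute), which is the same idea the TRANSACTIONAL cache applies cluster-wide; all names here are illustrative.

```java
import java.util.*;
import java.util.concurrent.*;

class LostUpdateDemo {
    static final ConcurrentMap<String, Map<String, String>> cache = new ConcurrentHashMap<>();

    // Simulate two servlets that both read the session before either writes back.
    static boolean lostUpdateHappens() {
        cache.put("s1", new HashMap<>());
        Map<String, String> readByA = new HashMap<>(cache.get("s1")); // servlet A reads
        Map<String, String> readByB = new HashMap<>(cache.get("s1")); // servlet B reads
        readByA.put("cart", "book");  cache.put("s1", readByA);       // A writes
        readByB.put("theme", "dark"); cache.put("s1", readByB);       // B overwrites A
        return !cache.get("s1").containsKey("cart");                  // A's change is gone
    }

    // Atomic per-entry update: the whole read-modify-write happens as one step.
    static void atomicPut(String sessionId, String key, String value) {
        cache.compute(sessionId, (id, session) -> {
            Map<String, String> copy = (session == null) ? new HashMap<>() : new HashMap<>(session);
            copy.put(key, value);
            return copy;
        });
    }
}
```

With Ignite, wrapping the get/modify/put in a transaction on a TRANSACTIONAL cache plays the role that compute plays here for a single in-process map.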
I have a huge XML payload coming into my Spring Integration flow, so I am using a claim-check-in transformer instead of a header enricher to retain the payload. I am using an in-memory message store.
Later in my flow I have a splitter that splits the payload across multiple threads, and each thread invokes a different channel based on an attribute of the payload; I use a router to achieve this. Each flow (each thread) uses a claim-check-out transformer to retrieve the initial payload and then uses it to build the required response. Each thread produces a response and I don't have to aggregate them, so multiple responses come out of the flow and are dropped onto a queue.
I cannot remove the message during check-out because the other threads will also try to check out the same message. What is the best way to remove the message from the message store?
Sample configuration
<int:chain input-channel="myInputChannel" output-channel="myOutputchannel">
    <int:claim-check-in />
    <int:header-enricher>
        <int:header name="myClaimCheckID" expression="payload"/>
    </int:header-enricher>
</int:chain>
(All the other components in the flow are invoked before the splitter.)
<int:splitter input-channel="mySplitterChannel" output-channel="myRouterChannel" expression="mySplitExpression">
</int:splitter>
<int:router input-channel="myRouterChannel" expression="routerExpression"
        resolution-required="true">
    <int:mapping value="A" channel="aChannel" />
    <int:mapping value="B" channel="bChannel" />
    <int:mapping value="C" channel="cChannel" />
</int:router>
Each channel has a claim check out transformer for the initial payload. So how do I make sure the message is removed after all the threads have been processed?
When you know you are done with the message you can simply invoke the message store's removeMessage() method. You could use a service activator with
... expression="@store.removeMessage(headers['myClaimCheckID'])" ...
However, if you are using an in-memory message store there is really no point in using the claim check pattern.
If you simply promote the payload to a header, it will use no more memory than putting it in a store.
Even if it ends up in multiple messages on multiple threads, it makes no difference since they'll all be pointing to the same object on the heap.
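For illustration, such a cleanup step might look like this (the channel name is hypothetical, and note that Spring Integration SpEL expressions reference beans with @):

```xml
<bean id="store" class="org.springframework.integration.store.SimpleMessageStore"/>

<int:service-activator input-channel="cleanupChannel"
        expression="@store.removeMessage(headers['myClaimCheckID'])"/>
```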
We have a Spring Integration project which uses the following:
<int-file:inbound-channel-adapter
directory="file:#{'${poller.landingzonepath}'.toLowerCase()}" channel="createMessageChannel"
filename-regex="${ingestion.filenameRegex}" queue-size="10000"
id="directoryPoller" scanner="leafScanner">
<!-- <int:poller fixed-rate="${ingestion.filepoller.interval:10000}" max-messages-per-poll="100" /> -->
<int:poller fixed-rate="10000" max-messages-per-poll="1000" />
</int-file:inbound-channel-adapter>
We also have a leafScanner which extends the default RecursiveLeafOnlyDirectoryScanner; our leafScanner doesn't do too much, it just checks a directory against a regex property.
The issue we're seeing is one where there are 250,000 .landed files (the ones we care about), which means about 500k actual files in the directory we are polling. This is a redesign of an older system; the goal was to make the application more scalable while staying agnostic of the directory names inside the polled parent directory. We wanted to get away from one poller per specific directory, but unless we're doing something wrong, it seems we'll have to go back to that.
If anyone has any possible solutions or configuration items we could try, please let me know. On my local machine with 66k .landed files, it takes about 16 minutes before the first file is presented to our transformer to do something.
As the JavaDocs indicate, the RecursiveLeafOnlyDirectoryScanner will not scale well with large directories or deep trees.
You could make your leafScanner stateful: instead of subclassing RecursiveLeafOnlyDirectoryScanner, subclass DefaultDirectoryScanner and implement listEligibleFiles to return once you have 1000 files, after saving off where you are; on the next poll, continue from where you left off, and when you reach the end, start again at the beginning.
You could maintain state in a field (which would mean you'd start over after a JVM restart) or use some persistence.
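The batch-and-resume idea can be sketched without the framework. In a real implementation this cursor logic would sit inside an overridden DefaultDirectoryScanner.listEligibleFiles(); the class and names below are hypothetical, and the position field illustrates the in-memory-state option (it starts over after a JVM restart).

```java
import java.util.*;

// Framework-free sketch of a stateful batching scanner: hand out at most
// BATCH files per poll, remember where we stopped, and wrap around at the end.
class BatchingCursor {
    static final int BATCH = 1000;
    private int position;   // in-memory state; lost (reset) on JVM restart

    List<String> nextBatch(List<String> allFiles) {
        if (allFiles.isEmpty()) {
            return Collections.emptyList();
        }
        if (position >= allFiles.size()) {
            position = 0;                        // reached the end: start again
        }
        int end = Math.min(position + BATCH, allFiles.size());
        List<String> batch = new ArrayList<>(allFiles.subList(position, end));
        position = end;                          // continue here on the next poll
        return batch;
    }
}
```

Swapping the position field for a persisted value (a file, a database row) would survive restarts, as the answer notes.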
Just an update. The reason our implementation was so slow was because of locking (to prevent duplicates); that locking is automatically disabled by adding a filter.
The max-messages-per-poll setting is also very important if you want to add a thread pool. Without it you will see no performance improvement.
I am using Spring Integration and have a large XML file containing a collection of child items. I want to split the file into a set of messages, where the payload of each message is one of the child XML fragments.
Using a splitter is the obvious approach, but it requires returning a collection of messages, which will exhaust the memory. I need to split the file into individual messages but process them one at a time (or, more likely, with a multi-threaded task executor).
Is there a standard way to do this without writing a custom component that writes the sub-messages to a channel programmatically?
I have been looking for a similar solution and have not found any standard way of doing this either.
Here is a rather dirty fix, if anyone needs this behavior:
Split the files manually using a Service Activator or a Splitter with a custom bean.
<int:splitter input-channel="rawChannel" output-channel="splitChannel" id="splitter">
    <bean class="com.a.b.c.MYSplitter" />
</int:splitter>
Your custom bean should implement ApplicationContextAware so the application context can be injected by Spring.
Manually retrieve the output channel and send each sub-message:
MessageChannel splitChannel = (MessageChannel) applicationContext.getBean("splitChannel");
Message<String> message = new GenericMessage<String>(payload);
splitChannel.send(message);
For people coming across this very old question. Splitters can now handle results of type Iterable, Iterator, Stream, and Flux (project reactor). If any of these types are returned, messages are emitted one-at-a-time.
Iterator/Iterable since 4.0.4; Stream/Flux since 5.0.0.
There is also now a FileSplitter which emits file contents a line at a time via an Iterator - since 4.1.2.
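For example, with the file namespace the line-by-line splitter can be declared like this (channel names are hypothetical):

```xml
<int-file:splitter input-channel="filesIn" output-channel="linesOut"/>
```

Because it iterates over the file rather than building a full collection of messages up front, memory use stays flat even for very large files.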