Spring Integration Splitter: Map Keys to different channels

I have a transformer which returns a Map as a result. This result is then put onto the output-channel. What I want is to go to a different channel for each KEY in the map. How can I configure this in Spring Integration?
e.g.
Transformer -- produces --> Map
Map contains {(Key1, "some data"), (Key2, "some data")}
So for Key1 --> go to channel 1
So for Key2 --> go to channel 2
etc..
Code examples would be helpful.
Thanks in advance
GM

Your processing should consist of two steps:
Splitting the message into separate parts that will be processed independently,
Routing the separate messages (the result of the split) into appropriate channels.
For the first task you have to use a splitter, and for the second one a router (a header value router fits best here).
Please find a sample Spring Integration configuration below. You may want to use an aggregator at the end of the chain in order to combine the messages - I leave that at your discretion.
<channel id="inputChannel">
<!-- splitting message into separate parts -->
<splitter id="messageSplitter" input-channel="inputChannel" method="split"
output-channel="routingChannel">
<beans:bean class="com.stackoverflow.MapSplitter"/>
</spliter>
<channel id="routingChannel">
<!-- routing messages into appropriate channels basis on header value -->
<header-value-router input-channel="routingChannel" header-name="routingHeader">
<mapping value="someHeaderValue1" channel="someChannel1" />
<mapping value="someHeaderValue2" channel="someChannel2" />
</header-value-router>
<channel id="someChannel1" />
<channel id="someChannel2" />
And the splitter:
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import org.springframework.integration.support.MessageBuilder;
import org.springframework.messaging.Message; // org.springframework.integration.Message on SI 3.x

public final class MapSplitter {

    public static final String ROUTING_HEADER_NAME = "routingHeader";

    // Key and SomeData stand in for your domain types
    public List<Message<SomeData>> split(final Message<Map<Key, SomeData>> message) {
        final List<Message<SomeData>> result = new LinkedList<>();
        for (Entry<Key, SomeData> entry : message.getPayload().entrySet()) {
            // one message per map entry; the key travels in a header for the router
            result.add(MessageBuilder
                    .withPayload(entry.getValue())
                    .setHeader(ROUTING_HEADER_NAME, entry.getKey())
                    .build());
        }
        return result;
    }
}

Related

Spring Cloud Function - Separate routing-expression for different Consumer

I have a service which receives differently structured messages from different message queues. Using @StreamListener conditions, we can choose for every message type how that message should be handled. As an example:
We receive two different types of messages, which have different header fields and values, e.g.
Incoming from "order" queue:
Order1: { Header: {catalog:groceries} }
Order2: { Header: {catalog:tools} }
Incoming from "shipment" queue:
Shipment1: { Header: {region:Europe} }
Shipment2: { Header: {region:America} }
There is a binding for each queue, and with the corresponding @StreamListener I can process the messages by catalog and region differently,
e.g.
@StreamListener(target = OrderSink.ORDER_CHANNEL, condition = "headers['catalog'] == 'groceries'")
public void onGroceriesOrder(GroceryOrder order) {
    ...
}
So the question is, how to achieve this with the new Spring Cloud Function approach?
The documentation at https://cloud.spring.io/spring-cloud-static/spring-cloud-stream/3.0.2.RELEASE/reference/html/spring-cloud-stream.html#_event_routing mentions:
Also, for SpEL, the root object of the evaluation context is Message so you can do evaluation on individual headers (or message) as well ...routing-expression=headers['type']
Is it possible to add the routing-expression to the binding (in application.yml), like
onGroceriesOrder-in-0:
  destination: order
  routing-expression: "headers['catalog']==groceries"
?
EDIT after first answer
If the above expression at this location is not possible, which is what the first answer implies, then my question goes as follows:
As far as I understand, an expression like routing-expression: headers['catalog'] must be set globally, because the result maps to certain (consumer) functions.
How can I control that the 2 different messages on each queue will be forwarded to their own consumer function, e.g.
Order1 --> MyOrderService.onGroceriesOrder()
Order2 --> MyOrderService.onToolsOrder()
Shipment1 --> MyShipmentService.onEuropeShipment()
Shipment2 --> MyShipmentService.onAmericaShipment()
That was easy with @StreamListener, because each method got its own @StreamListener annotation with different conditions. How can this be achieved with the new routing-expression setting?
Aside from the fact that the above is not a valid expression, I think you meant headers['catalog']=='groceries'. If so, what would you expect to happen from evaluating it, as the only two possible outcomes are true/false? Anyway, these questions are rhetorical, but they help to understand the problem and how to fix it.
The expression must result in the name of a function to route TO. So...
routing-expression: headers['catalog'] - assumes that the actual value of the catalog header is the name of the function to invoke
routing-expression: headers['catalog']=='groceries' ? 'processGroceries' : 'processOther' - maps the value 'groceries' to the 'processGroceries' function.
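For example, a global routing setup in application.yml might look like this (a sketch against Spring Cloud Stream 3.0.x; processGroceries and processOther are assumed function bean names, and everything from the order destination is routed through the built-in functionRouter):

spring:
  cloud:
    stream:
      function:
        routing:
          enabled: true   # route incoming messages through the RoutingFunction
      bindings:
        functionRouter-in-0:
          destination: order
    function:
      routing-expression: "headers['catalog']=='groceries' ? 'processGroceries' : 'processOther'"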
For more specific routing, you can use the MessageRoutingCallback strategy:
MessageRoutingCallback
The MessageRoutingCallback is a strategy to assist with determining the name of the route-to function definition.

public interface MessageRoutingCallback {
    FunctionRoutingResult routingResult(Message<?> message);
    . . .
}

All you need to do is implement it and register it as a bean to be picked up by the RoutingFunction. For example:
@Bean
public MessageRoutingCallback customRouter() {
    return new MessageRoutingCallback() {
        @Override
        public FunctionRoutingResult routingResult(Message<?> message) {
            return new FunctionRoutingResult((String) message.getHeaders().get("func_name"));
        }
    };
}

Spring Batch: processing multiple records at once

I am using Spring Batch, and as usual I have a reader, a processor and a writer.
I have 2 questions:
1>
The reader queries all 200 records (the total record count in the table is 200, and I have set pageSize=200) and thus gets me all 200 records; in the processor we want the list of all these records, because we have to compare each record with the other 199 to group them into different tiers.
Thus I am thinking that if we can get that list in the processing step, I can manipulate them. How should I approach this?
2>
In the processing stage I need some master data from the database, based on which all input records will be processed. I am thinking of injecting a data source into the processing bean and fetching all master table data before processing the records. Is this a good approach, or please suggest otherwise.
<job id="sampleJob">
<step id="step1">
<tasklet>
<chunk reader="itemReader" processor="processor" writer="itemWriter" commit-interval="20"/>
</tasklet>
</step>
</job>
And the processor is
@Override
public User process(Object item) throws Exception {
    // transform the item into a user
    return user;
}
And I want something like
public List<User> process(List<Object> items) throws Exception {
    // transform each item into a user
    return users;
}
I found some posts here, but they say to get the list in the writer. I don't want to process anything in the writer, because that defeats the definition of the writer and the processor. Is there any configuration to get the list inside this process method?
Thank you
Since the ItemProcessor receives whatever you return from the ItemReader, you need your ItemReader to return the List. That List is really the "item" you're processing. There is an example of this in the Spring Batch samples: the AggregateItemReader reads all the items from a delegate ItemReader and returns them as a single list. You can take a look at it on GitHub here: https://github.com/spring-projects/spring-batch/blob/master/spring-batch-samples/src/main/java/org/springframework/batch/sample/domain/multiline/AggregateItemReader.java
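If it helps, here is a minimal sketch of that idea (the class and field names are illustrative, not the actual AggregateItemReader): a reader that drains a delegate and returns everything as one List, so the processor sees the whole table as a single item.

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemReader;

public class ListWrappingItemReader<T> implements ItemReader<List<T>> {

    private final ItemReader<T> delegate;
    private boolean done = false;

    public ListWrappingItemReader(ItemReader<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<T> read() throws Exception {
        if (done) {
            return null; // null signals end of input to the step
        }
        List<T> items = new ArrayList<>();
        T item;
        // drain the delegate (e.g. your paging reader) into a single list
        while ((item = delegate.read()) != null) {
            items.add(item);
        }
        done = true;
        return items.isEmpty() ? null : items;
    }
}

The processor can then be typed as ItemProcessor<List<Object>, List<User>>, which is exactly the signature you asked for.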

Spring Integration aggregator time expiry issue

The code below accepts 2 messages before proceeding to the outbound channel.
<bean id="timeout"
class="org.springframework.integration.aggregator.TimeoutCountSequenceSizeReleaseStrategy">
<constructor-arg name="threshold" value="2" />
<constructor-arg name="timeout" value="7000" />
</bean>
<int:aggregator ref="updateCreate" input-channel="filteredAIPOutput"
method="handleMessage" release-strategy="releaseStrategyBean" release-strategy-method="timeout">
</int:aggregator>
My use case is to collate all the messages for 10 minutes and then send them to the outbound channel, not to release based on the count of messages as shown above.
To implement this time-based functionality, I used the code below:
<int:aggregator ref="updateCreate" input-channel="filteredAIPOutput"
method="handleMessage"
output-channel="outputappendFilenameinHeader" >
</int:aggregator>
<bean id="updateCreate" class="helper.UpdateCreateHelper"/>
I passed 10 messages, and the PojoDateStrategyHelper canRelease method was invoked 10 times.
I tried to implement PojoDateStrategyHelper with time-difference logic, and that part works as expected. After 10 minutes the UpdateCreateHelper class is called, but it receives only 1 message (the last message); the remaining 9 messages are not seen anywhere. Am I doing anything wrong here? The messages are not collating.
I suspect there should be something built into SI which can achieve this: if I pass 10 minutes as a parameter, once the 10 minutes expire, it should pass all the messages on to the outbound channel.
This is my UpdateCreateHelper.java code :
public Message<?> handleMessage(List<Message<?>> flights) {
    LOGGER.debug("orderItems list ::" + flights.size()); // this always prints 1
    MessageBuilder<?> messageWithHeader = MessageBuilder.withPayload(flights.get(0).getPayload().toString());
    messageWithHeader.setHeader("ftp_filename", "");
    return messageWithHeader.build();
}

@CorrelationStrategy
public String correlateBy(@Header("id") String id) {
    return id;
}

@ReleaseStrategy
public boolean canRelease(List<Message<?>> flights) {
    LOGGER.debug("inside canRelease ::" + flights.size()); // this is called for each and every message
    return compareTime(date.getTime(), new Date().getTime());
}
I am new to SI (v3.x); I searched a lot for a time-based aggregator but couldn't find any useful source. Please suggest.
Thanks!
Turn on DEBUG logging to see why you only see one message.
I suspect there should be something inbuilt with in SI, which can achieve this, ...
Prior to version 4.0 (and, by default, after), the aggregator is a completely passive component; the release strategy is only consulted when a new message arrives.
4.0 added group timeout capabilities whereby partial groups can be released (or discarded) after a timeout.
However, with any version, you can configure a MessageGroupStoreReaper to release partially complete groups after some timeout. See the documentation.
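A sketch of the reaper approach in XML (the bean names and the 10-minute timeout are assumptions based on your use case; send-partial-result-on-expiry makes the expired group go to the output channel instead of being discarded, and the task namespace must be declared):

<bean id="messageStore" class="org.springframework.integration.store.SimpleMessageStore" />

<bean id="reaper" class="org.springframework.integration.store.MessageGroupStoreReaper">
    <property name="messageGroupStore" ref="messageStore" />
    <!-- expire groups older than 10 minutes -->
    <property name="timeout" value="600000" />
</bean>

<!-- run the reaper periodically -->
<task:scheduled-tasks>
    <task:scheduled ref="reaper" method="run" fixed-rate="10000" />
</task:scheduled-tasks>

<int:aggregator ref="updateCreate" input-channel="filteredAIPOutput"
    method="handleMessage" output-channel="outputappendFilenameinHeader"
    message-store="messageStore" send-partial-result-on-expiry="true" />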
@CorrelationStrategy
public String correlateBy(Message<?> message) {
    // Return the same correlation ID for every message (e.g. the timestamp the
    // current window started), so that all messages end up in the same group
    return "same";
}
Earlier I was returning the header id, which differs from message to message. I hope this solution can help someone; I wasted almost 2 days by ignoring such a small concept.

Send the output of two bolts to a single bolt in Storm?

What is the easiest way to send the output of BoltA and BoltB to BoltC? Do I have to use joins, or is there a simpler solution? A and B have the same fields (ts, metric_name, metric_count).
// KafkaSpout --> LogDecoder
builder.setBolt(LOGDECODER_BOLT_ID, logdecoderBolt, 10).shuffleGrouping(KAFKA_SPOUT_ID);
// LogDecoder --> CountBolt
builder.setBolt(COUNT_BOLT_ID, countBolt, 10).shuffleGrouping(LOGDECODER_BOLT_ID);
// LogDecoder --> HttpResCodeCountBolt
builder.setBolt(HTTP_RES_CODE_COUNT_BOLT_ID, http_res_code_count_bolt, 10).shuffleGrouping(LOGDECODER_BOLT_ID);
// And now I want to send the CountBolt and HttpResCodeCountBolt output to the Aggregator Bolt.
// CountBolt --> AggregateBolt
builder.setBolt(AGGREGATE_BOLT_ID, aggregateBolt, 5).fieldsGrouping((COUNT_BOLT_ID), new Fields("ts"));
// HttpResCodeCountBolt --> AggregateBolt
builder.setBolt(AGGREGATE_BOLT_ID, aggregateBolt, 5).fieldsGrouping((HTTP_RES_CODE_COUNT_BOLT_ID), new Fields("ts"));
Is this possible ?
Yes. Just add a stream-id ("stream1" and "stream2" below) to the fieldsGrouping call:
BoltDeclarer bd = builder.setBolt(AGGREGATE_BOLT_ID, aggregateBolt, 5);
bd.fieldsGrouping((COUNT_BOLT_ID), "stream1", new Fields("ts"));
bd.fieldsGrouping((HTTP_RES_CODE_COUNT_BOLT_ID), "stream2", new Fields("ts"));
and then in the execute() method for BoltC you can test to see which stream the tuple came from:
public void execute(Tuple tuple) {
    if ("stream1".equals(tuple.getSourceStreamId())) {
        // this came from stream1
    } else if ("stream2".equals(tuple.getSourceStreamId())) {
        // this came from stream2
    }
}
Since you know which stream the tuple came from, you don't need the same shape of tuple on the two streams; you just de-marshall the tuple according to the stream-id.
You can also check which component the tuple came from (as I type this, I think that might be more appropriate to your case), as well as the instance of the component (the task) that emitted the tuple.
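For that to work, the upstream bolts must declare and emit on those named streams. A minimal sketch for CountBolt (a pass-through for illustration only; use "stream2" analogously in HttpResCodeCountBolt, and backtype.storm.* imports on pre-1.0 Storm):

import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class CountBolt extends BaseRichBolt {

    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        // emit on the named stream that the aggregator subscribes to
        collector.emit("stream1", new Values(
                tuple.getValueByField("ts"),
                tuple.getValueByField("metric_name"),
                tuple.getValueByField("metric_count")));
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declareStream("stream1", new Fields("ts", "metric_name", "metric_count"));
    }
}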
As @Chris said, you can use streams. But you can also simply get the source component from the tuple.
@Override
public final void execute(final Tuple tuple) {
    final String sourceComponent = tuple.getSourceComponent();
    ....
}
The source component is the name you gave to the bolt at the topology's initialization, for instance COUNT_BOLT_ID.

How to map a set of text as a whole to a node?

Suppose I have a plain text file with the following data:
DataSetOne
content
content
content
DataSetTwo
content
content
content
content
...and so on...
What I want to do is count how many content lines are in each data set. For example, the result should be:
<DataSetOne, 3>, <DataSetTwo, 4>
I am a beginner with Hadoop, and I wonder if there is a way to map a chunk of data as a whole to a node, for example sending all of DataSetOne to node 1 and all of DataSetTwo to node 2.
Can anyone give me an idea of how to achieve this?
I think the simple way will be to implement the logic in the mapper, where you remember what the current dataSet is and emit pairs like this:
(DataSetOne, content)
(DataSetOne, content)
(DataSetOne, content)
(DataSetTwo, content)
(DataSetTwo, content)
And then you count the groups in the reduce stage.
If performance becomes an issue, I would suggest considering a combiner.
First of all, your datasets are split across multiple mappers if they are in separate files or if they exceed the configured block size. So if you have one dataset of 128MB and your block size is 64MB, Hadoop will store this file in 2 blocks and set up 2 mappers for it.
This is like the word-count example in the Hadoop tutorials. As David says, you'll need to map the key/value pairs into HDFS and then reduce on them.
I would implement it roughly like this (as a sketch; real code would use concrete Writable types):
// field in the mapper class: remembers which dataset we are currently in
private K groupId = null;

@Override
protected void map(K key, V value, Context context)
        throws IOException, InterruptedException {
    if (!key.equals(groupId)) {
        groupId = key;
    }
    context.write(groupId, value);
}

@Override
protected void reduce(K key, Iterable<V> values, Context context)
        throws IOException, InterruptedException {
    int size = 0;
    for (V v : values) {
        size++;
    }
    context.write(key, size);
}
As David also said, you could use a combiner. Combiners are simple reducers and are used to save resources between the map and reduce phases. They can be set on the job configuration.
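Wiring a combiner into the job might look like this (a sketch; MyMapper and MyReducer are hypothetical class names, and a reducer is only usable as a combiner when its output types match its input types):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "dataset-count"); // use "new Job(conf)" on very old Hadoop versions
job.setMapperClass(MyMapper.class);
job.setCombinerClass(MyReducer.class); // runs the reducer logic map-side to shrink shuffle traffic
job.setReducerClass(MyReducer.class);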
You can extend the FileInputFormat class and implement the RecordReader interface (or, if you're using the newer API, extend the RecordReader abstract class) to define how you split your data. Here is a link that gives you an example of how to implement these classes using the older API:
http://www.questionhub.com/StackOverflow/4235318
