I am using a Collector node in IIB to collect a set of messages. Can someone help with sample ESQL after the Collector node to process a message collection? Since I am new to ESQL I am struggling to figure it out, and the IBM Info Center is not very helpful on ESQL for message collections.
The code would depend on what you want to do with the collection. If you want to loop through the collected messages, you could do something like this:
-- Reference the first collected message; the CollectionName attribute is the first element in the array
DECLARE ref REFERENCE TO InputRoot.Collection.[2];
WHILE LASTMOVE(ref) DO
    -- Reference the data as normal; 'domain' stands for the parser folder (XMLNSC, DFDL, JSON, ...)
    SET Environment.Variables.data = ref.domain.data;
    -- example: ref.XMLNSC.HTML.Body.h1
    -- do any other work on the message here
    MOVE ref NEXTSIBLING;
END WHILE;
This loop will run until it reaches the end of the collection. When MOVE NEXTSIBLING fails because there is no next sibling, LASTMOVE returns FALSE and the loop exits.
Use a Trace node before the ESQL with the pattern ${Root} to see what the message structure looks like. This is the best place to start to develop the ESQL you need to process the data.
I am using Lambdas and an SQS queue to delete data from DynamoDB. While developing this I found that the only way to delete data from DynamoDB is to gather the items you want to delete and delete them in batches.
At my current organization, most of the infrastructure is serverless, so I decided to build this piece following a serverless and event-driven architecture as well.
In a nutshell, I post a message on the SQS queue to delete items under a particular partition. Once this message invokes my Lambda, I perform a listing call to DynamoDB for 1000 items and do the following:
Grab the cursor from this listing call and post another message to fetch the next 1000 items from this cursor.
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
const dbClient = new DynamoDBClient(config);
const records = dbClient.query(...fetchFirst1000ItemsForPrimaryKey);
postMessageToFetchNextItems();
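For reference, the "cursor" here is DynamoDB's LastEvaluatedKey: a Query response returns it, and the next call passes it back as ExclusiveStartKey. A minimal sketch of that pagination with the v3 SDK, using placeholder table and key names:

import {
  DynamoDBClient,
  QueryCommand,
  QueryCommandOutput,
} from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});

// Fetch one page of up to 1000 items for a partition key, starting from an
// optional cursor (the LastEvaluatedKey returned by the previous page).
async function fetchPage(
  partitionKey: string,
  cursor?: QueryCommandOutput['LastEvaluatedKey']
): Promise<QueryCommandOutput> {
  return client.send(
    new QueryCommand({
      TableName: 'my-table',              // placeholder
      KeyConditionExpression: 'pk = :pk', // placeholder key schema
      ExpressionAttributeValues: { ':pk': { S: partitionKey } },
      Limit: 1000,
      ExclusiveStartKey: cursor,
    })
  );
}

// An undefined LastEvaluatedKey on the response means there are no more pages;
// otherwise it is posted back onto the queue as the cursor for the next message.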
From the fetched 1000 items, I create batches of 20 items and issue a set of messages for another Lambda to delete those items. Batches of 20 items are posted for deletion until all 1000 have been posted.
for (let i = 0; i < 1000; i += 20) {
    const itemsToDelete = records.slice(i, i + 20);
    postItemsForDeletion(itemsToDelete);
}
Another lambda gets these items and just deletes them:
dbClient.send(new BatchWriteItemCommand([itemsForDeletion]))
The listing Lambda receives the call to read items from the next cursor and the above steps get repeated.
This all happens in parallel: get items, post a message to grab the next 1000 items, and post messages for deletion of items.
While this looks good on paper, it doesn't seem to delete all records from DynamoDB. There is no set pattern; there are always some items that remain in DynamoDB. I am not entirely sure what could be happening, but my theory is that the parallel deletion and listing could be causing the issue.
I was unable to find any documentation to verify my theory, hence this question.
A BatchWriteItem call can return a list of unprocessed items. You should check for that and retry them.
Look at the docs at https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-dynamodb/classes/batchwriteitemcommand.html and search for UnprocessedItems.
Fundamentally, a batch write items call is not a transactional write. It's possible for some item writes to succeed while others fail. It's on you to check for failures and retry them. I'm sorry I don't have a link for good sample code.
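For illustration, here is a minimal sketch of that check-and-retry loop against the v3 SDK; the function name, retry limit and back-off values are placeholders, not part of the answer above:

import {
  DynamoDBClient,
  BatchWriteItemCommand,
  BatchWriteItemCommandInput,
} from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});

// Keep re-submitting whatever comes back in UnprocessedItems, with a small
// exponential back-off, until everything is written or we give up.
async function batchWriteWithRetry(
  requestItems: BatchWriteItemCommandInput['RequestItems'],
  maxAttempts = 5
): Promise<void> {
  let pending = requestItems;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (!pending || Object.keys(pending).length === 0) return;
    const response = await client.send(
      new BatchWriteItemCommand({ RequestItems: pending })
    );
    pending = response.UnprocessedItems;
    if (pending && Object.keys(pending).length > 0) {
      await new Promise((resolve) => setTimeout(resolve, 100 * 2 ** attempt));
    }
  }
  if (pending && Object.keys(pending).length > 0) {
    throw new Error('BatchWriteItem left unprocessed items after retries');
  }
}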
I am exploring Chronicle Queue to save events generated in one of my applications. I would like to publish the saved events to a different system, in their original order of occurrence, after some processing. I have multiple instances of my application, and each instance could run a single-threaded appender to append events to the Chronicle Queue. Although ordering across instances is a necessity, I would like to understand these two questions.
1) How is the read index for my events saved, so that I don't end up reading and publishing the same message from the Chronicle Queue multiple times?
In the code below (picked from the example on GitHub), the index is saved until we reach the end of the queue when we restart the application. The moment we reach the end of the queue, we end up reading all the messages again from the start. I want to make sure that, for a particular consumer identified by a tailer id, the messages are read only once. Do I need to save the read index in another queue and use that to achieve what I need here?
String file = "myPath";
try (ChronicleQueue cq = SingleChronicleQueueBuilder.binary(file).build()) {
for(int i = 0 ;i<10;i++){
cq.acquireAppender().writeText("test"+i);
}
}
try (ChronicleQueue cq = SingleChronicleQueueBuilder.binary(file).build()) {
ExcerptTailer atailer = cq.createTailer("a");
System.out.println(atailer.readText());
System.out.println(atailer.readText());
System.out.println(atailer.readText());
}
2) I also need suggestions on whether there is a way to preserve the ordering of events across instances.
Using a named tailer should ensure that the tailer only reads a message once. If you have an example where this doesn't happen, can you create a test to reproduce it?
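For illustration, a minimal sketch of the expected behaviour, reusing the queue path from the question (the class name is just for the example): the idea is that a named tailer's read position is persisted with the queue, so re-opening the queue and creating a tailer with the same name resumes where the previous run stopped instead of re-reading from the start.

import net.openhft.chronicle.queue.ChronicleQueue;
import net.openhft.chronicle.queue.ExcerptTailer;
import net.openhft.chronicle.queue.impl.single.SingleChronicleQueueBuilder;

public class NamedTailerResume {
    public static void main(String[] args) {
        String file = "myPath";
        // First "run": the named tailer "a" reads two messages.
        try (ChronicleQueue cq = SingleChronicleQueueBuilder.binary(file).build()) {
            ExcerptTailer tailer = cq.createTailer("a");
            System.out.println(tailer.readText());
            System.out.println(tailer.readText());
        }
        // Simulated restart: re-open the queue and create the tailer with the
        // same name; it should continue from the third message rather than
        // starting over. readText() returns null at the end of the queue.
        try (ChronicleQueue cq = SingleChronicleQueueBuilder.binary(file).build()) {
            ExcerptTailer tailer = cq.createTailer("a");
            System.out.println(tailer.readText());
        }
    }
}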
The order of entries in a queue is fixed at write time, and all tailers see the same messages in the same order; there isn't any option to change it.
I'm using IBM Integration Bus Version 10.0.0.15 and I'm looking for an option to initialize shared variables during the startup of a message flow, for example when it is started with the command mqsistartmsgflow. Is there a special procedure or function one can implement with ESQL which is guaranteed to be executed during startup?
In the ESQL documentation it is stated that shared variables are initialized when the first message is routed through the flow, which means you have to wait for the first message.
Actually you need to initialise them yourself, and this typically looks something like:
-- Shared row variable for caching config data. Declared at Global scope.
DECLARE S_ConfigSharedRow SHARED ROW;

CREATE COMPUTE MODULE TheFirstComputeNode
    CREATE FUNCTION Main() RETURNS BOOLEAN
    BEGIN
        CFGDATA_CACHE_LOCK: BEGIN ATOMIC
            -- If the configuration data is not available in the cache then load it from the CONFIG table
            DECLARE CfgDataRef REFERENCE TO S_ConfigSharedRow.CfgDataCache;
            IF NOT LASTMOVE(CfgDataRef) THEN
                -- Select all the relevant content from the actual database in one go.
                DECLARE DBResults ROW;
                DECLARE RetryCount INTEGER 5;
                SET DBResults.Row[] = PASSTHRU('SELECT * FROM CONFIG');
                -- Typically you would post process the content from the DB into a more amenable
                -- structure but the following will get the data into the shared variable
                CREATE LASTCHILD OF S_ConfigSharedRow.CfgDataCache FROM DBResults;
            END IF;
        END CFGDATA_CACHE_LOCK;
        -- Config data is now available for use
        RETURN TRUE;
    END;
END MODULE;
I think the best way is to have a dedicated flow for initializing shared variables.
That flow should have an input queue separate from the normal input queue, just for sending in messages to trigger the initialization.
Then you should make a startup script, which sends a message to this init flow after starting up the main processing flow.
And use only that script for startup.
If you want another option, you could include a JavaCompute node and add some once-only initialisation into its class using a static initialization block. You would only be able to initialize Java data structures this way, though.
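A minimal sketch of that idea, assuming the standard JavaCompute node skeleton; the class name and the config values are illustrative only:

import com.ibm.broker.javacompute.MbJavaComputeNode;
import com.ibm.broker.plugin.MbException;
import com.ibm.broker.plugin.MbMessageAssembly;
import com.ibm.broker.plugin.MbOutputTerminal;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StartupInit_JavaCompute extends MbJavaComputeNode {

    // Populated once when the class is loaded into the integration server's
    // JVM, independently of any message arriving.
    private static final Map<String, String> CONFIG = new ConcurrentHashMap<>();
    static {
        CONFIG.put("someKey", "someValue"); // placeholder initialisation
    }

    @Override
    public void evaluate(MbMessageAssembly assembly) throws MbException {
        // Normal per-message processing; CONFIG is already populated.
        MbOutputTerminal out = getOutputTerminal("out");
        out.propagate(assembly);
    }
}

The static block runs once when the class is loaded, so the initialisation happens at most once per class load rather than per message.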
I have an Alpakka Elasticsearch Sink that I'm keeping around between requests. When I get a request, I create a Source from an HTTP request and turn that into a Source of Elasticsearch WriteMessages, then run that with mySource.runWith(theElasticsearchSink).
How do I get notified when the source has completed? Nothing useful seems to be materialized.
Will completion of the source be passed to the sink, meaning I have to create a new one each time?
If yes to the above, would decoupling them somehow with Flow.fromSourceAndSink help?
My goal is to know when the HTTP download has completed (including the vias it goes through) and to be able to reuse the sink.
You can pass around the individual parts of a flow as you wish; you can even pass around the whole executable graph (these are immutable). The run() call materializes the flow, but does not change your graph or its parts.
1)
Since you want to know when the HTTP download has passed through the flow, why not use the full graph's Future[Done]? Assuming your call to Elasticsearch is asynchronous, this should be equivalent, since your sink just fires the call and does not wait.
You could also use Source.queue (https://doc.akka.io/docs/akka/2.5/stream/operators/Source/queue.html) and just add your messages to the queue, which then reuses the defined graph so you can add new messages whenever processing is needed. This one also materializes a SourceQueueWithComplete, allowing you to stop the stream.
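A minimal sketch of the Source.queue approach, with a plain println sink standing in for the Elasticsearch flow (the object name and buffer size are illustrative):

import akka.Done
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, OverflowStrategy}
import akka.stream.scaladsl.{Keep, Sink, Source, SourceQueueWithComplete}

import scala.concurrent.Future

object QueueExample extends App {
  implicit val system: ActorSystem = ActorSystem("queue-example")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  // Materialize the graph once; the queue lets you push elements into the
  // already-running stream instead of building a new source per request.
  val (queue: SourceQueueWithComplete[String], done: Future[Done]) =
    Source.queue[String](bufferSize = 100, OverflowStrategy.backpressure)
      .toMat(Sink.foreach(println))(Keep.both)
      .run()

  queue.offer("first message")
  queue.offer("second message")
  queue.complete() // completing the queue eventually completes `done`
}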
Apart from this, reuse the sink wherever needed without needing to wait for another stream using it.
2) As described above: no, you do not need to instantiate a sink multiple times.
Best Regards,
Andi
It turns out that Alpakka's Elasticsearch library also supports flow shapes, so I can have my source go via that and run it via any sink that materializes a future. Sink.foreach works fine here for testing purposes, for example, as in https://github.com/danellis/akka-es-test.
Flow fromFunction { product: Product =>
    WriteMessage.createUpsertMessage(product.id, product.attributes)
} via ElasticsearchFlow.create[Map[String, String]](index, "_doc")
to define es.flow and then
val graph = response.entity.withSizeLimit(MaxFeedSize).dataBytes
    .via(scanner)
    .via(CsvToMap.toMap(Utf8))
    .map(attrs => Product(attrs("id").decodeString(Utf8), attrs.mapValues(_.decodeString(Utf8))))
    .via(es.flow)

val futureDone = graph.runWith(Sink.foreach(println))

futureDone onComplete {
    case Success(_) => println("Done")
    case Failure(e) => println(e)
}
So I have this requirement that takes in one document and from that needs to create one or more documents in the output.
During the course of this, it needs to determine whether the document is already there, because different operations apply for the create and update scenarios.
In straight code, this would be simple (conceptually):
InputData in = <something>;
if (getItemFromExternalSystem(in.key1) == null) {
    createItemSpecificToKey1InExternalSystem(in.key1);
}
if (getItemFromExternalSystem(in.key2) == null) {
    createItemSpecificToKey2InExternalSystem(in.key1, in.key2);
}
createItemFromInput(in.key1, in.key2, in.moreData);
In effect a kind of "ensure this data is present".
However, how would I go about achieving this in IIB? If I used a subflow for the get/create cycle, whatever the result of the last operation is would be returned from the subflow as the new "message" of the flow, but really, I don't care about the value from the "ensure data present" subflow. I need instead to keep working on my original message, but still wait for the different subflows to finish before I can run my final "createItem".
You can use Aggregation nodes: for example, use 3 flows:
the first would propagate your original message to the third;
the second would invoke the operations createItemSpecificToKey1InExternalSystem and createItemSpecificToKey2InExternalSystem;
the third would aggregate the results of the first and second and invoke createItemFromInput.
Have you considered using the Collector node? It will collect your records into N 'collections', and then you can iterate over the collections and output one document per collection.