Save consumer/tailer read offset for ChronicleQueue - chronicle

I am exploring Chronicle Queue to save events generated in one of my applications. I would like to publish the saved events to a different system, in their original order of occurrence, after some processing. I have multiple instances of my application, and each instance could run a single-threaded appender that appends events to the Chronicle Queue. Ordering across instances is a necessity, but I would like to understand these two questions.
1) How is the read index for my events saved so that I don't end up reading and publishing the same message from the Chronicle Queue multiple times?
In the code below (taken from the example on GitHub), the index is preserved across restarts of the application until we reach the end of the queue; the moment we reach the end of the queue, we end up reading all the messages again from the start. I want to make sure that, for a particular consumer identified by a tailer ID, each message is read only once. Do I need to save the read index in another queue and use that to achieve what I need here?
String file = "myPath";
try (ChronicleQueue cq = SingleChronicleQueueBuilder.binary(file).build()) {
    for (int i = 0; i < 10; i++) {
        cq.acquireAppender().writeText("test" + i);
    }
}
try (ChronicleQueue cq = SingleChronicleQueueBuilder.binary(file).build()) {
    ExcerptTailer atailer = cq.createTailer("a");
    System.out.println(atailer.readText());
    System.out.println(atailer.readText());
    System.out.println(atailer.readText());
}
2) I would also appreciate suggestions on whether there is a way to preserve the ordering of events across instances.

Using a named tailer should ensure that the tailer only reads a message once. If you have an example where this doesn't happen, can you create a test to reproduce it?
The order of entries in a queue is fixed when writing, and all tailers see the same messages in the same order; there isn't any option to change this.
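For reference, here is a minimal sketch (using the same classes as the snippet above, with an illustrative tailer id "consumer-a") of how a named tailer behaves across restarts: its read position is persisted alongside the queue files, so reopening the queue and asking for the same named tailer resumes from the first unread entry instead of replaying from the start.
String path = "myPath"; // same directory the appender writes to
try (ChronicleQueue cq = SingleChronicleQueueBuilder.binary(path).build()) {
    ExcerptTailer tailer = cq.createTailer("consumer-a"); // named tailer
    String msg;
    while ((msg = tailer.readText()) != null) {   // readText() returns null once caught up
        System.out.println("processed: " + msg);
    }
}
// Later, e.g. after an application restart: the same named tailer carries on
// from where it stopped; already-read entries are not delivered again.
try (ChronicleQueue cq = SingleChronicleQueueBuilder.binary(path).build()) {
    ExcerptTailer tailer = cq.createTailer("consumer-a");
    System.out.println(tailer.readText()); // null unless new entries were appended
}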

Related

Listing and Deleting Data from DynamoDB in parallel

I am using Lambdas and an SQS queue to delete data from DynamoDB. When I was developing this, I found that the only way to delete data from DynamoDB is to gather the items you want to delete and delete them in batches.
At my current organization, most of the infrastructure is serverless, so I decided to build this piece following a serverless, event-driven architecture as well.
In a nutshell, I post a message on the SQS queue to delete items under a particular partition. Once this message invokes my Lambda, I make a listing call to DynamoDB for 1000 items and do the following:
Grab the cursor from this listing call and post another message to fetch the next 1000 items from this cursor.
import { DynamoDBClient, QueryCommand } from '@aws-sdk/client-dynamodb';
const dbClient = new DynamoDBClient(config);
const records = await dbClient.send(new QueryCommand(fetchFirst1000ItemsForPrimaryKey));
postMessageToFetchNextItems();
From the fetched 1000 items:
I create batches of 20 items and issue a set of messages for another Lambda to delete these items. Batches of 20 items are posted for deletion until all 1000 have been posted.
for (let i = 0; i < 1000; i += 20) {
  const itemsToDelete = records.slice(i, 20);
  postItemsForDeletion(itemsToDelete);
}
Another lambda gets these items and just deletes them:
dbClient.send(new BatchWriteItemCommand([itemsForDeletion]))
The listing Lambda receives a call to read items from the next cursor, and the above steps get repeated.
This all happens in parallel: get items, post a message to grab the next 1000 items, post messages for deletion of items.
While it looks good on paper, this doesn't seem to delete all records from DynamoDB. There is no set pattern; there are always some items left in DynamoDB. I am not entirely sure what is happening, but my theory is that the parallel deletion and listing could be causing the issue.
I was unable to find any documentation to verify my theory, hence this question.
A batch write items call will return a list of unprocessed items. You should check for that and retry them.
Look at the docs at https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-dynamodb/classes/batchwriteitemcommand.html and search for UnprocessedItems.
Fundamentally, a batch write items call is not a transactional write. It's possible for some item writes to succeed while others fail. It's on you to check for failures and retry them. I'm sorry I don't have a link for good sample code.
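To make the retry concrete, here is a hedged sketch using the AWS SDK for Java v2 (the question uses the JavaScript SDK, but the UnprocessedItems retry pattern is identical; table names and key details are left to the caller):
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.BatchWriteItemRequest;
import software.amazon.awssdk.services.dynamodb.model.BatchWriteItemResponse;
import software.amazon.awssdk.services.dynamodb.model.WriteRequest;

import java.util.List;
import java.util.Map;

public class BatchDeleter {
    // Re-send whatever DynamoDB reports as unprocessed until nothing is left.
    // A production version should add exponential backoff between retries.
    static void batchWriteWithRetry(DynamoDbClient db, Map<String, List<WriteRequest>> requestItems) {
        Map<String, List<WriteRequest>> pending = requestItems;
        while (pending != null && !pending.isEmpty()) {
            BatchWriteItemResponse response = db.batchWriteItem(
                    BatchWriteItemRequest.builder().requestItems(pending).build());
            pending = response.unprocessedItems();
        }
    }
}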

Is it possible to lock some entries in MongoDB and run a query that does not take the locked records into account?

I have a MongoDB database that contains a list of tasks, and two instances of executors. These 2 executors have to read a task from the DB, set it to the state IN_EXECUTION, and execute the task. Of course I do not want my 2 executors to execute the same task, and this is my problem.
I use a transactional query. This way, when an executor tries to change the state of a task, it gets a write exception and has to start again and read a new task. The problem with this approach is that sometimes an executor gets a lot of errors before it can save the change of task state correctly and execute a new task. So it is as if I had only one executor.
Note:
- I do not want to lock my entire DB on read/write, because that would slow down the entire process.
- I think it is necessary to save the state of the task because it could be a long task.
I asked whether it is possible to lock only certain records and execute a query on the not-locked records, but any advice that solves my problem would be really appreciated.
Thanks in advance.
EDIT1:
Sorry, I simplified the concept in the question above. Actually, I extract n messages that I have to send. I have to send these messages in blocks of 100, so my executors split the extracted messages into blocks of 100 and pass them to other executors.
Each executor extracts the messages and then updates them with the new state. I hope this is clearer now.
@Transactional(readOnly = false, propagation = Propagation.REQUIRED)
public List<PushMessageDB> assignPendingMessages(int limitQuery, boolean sortByClientPriority,
                                                 LocalDateTime now, String senderId) {
    final List<PushMessageDB> messages = repositoryMessage.findByNotSendendAndSpecificError(limitQuery, sortByClientPriority, now);
    long count = repositoryMessage.updateStateAndSenderId(messages, senderId, MessageState.IN_EXECUTION);
    return messages;
}
DB update:
public long updateStateAndSenderId(List<String> ids, String senderId, MessageState messageState) {
    Query query = new Query(Criteria.where(INTERNAL_ID).in(ids));
    Update update = new Update().set(MESSAGE_STATE, messageState).set(SENDER_ID, senderId);
    return mongoTemplate.updateMulti(query, update, PushMessageDB.class).getModifiedCount();
}
You will have to do the locking one by one.
Trying to lock 100 records at once while a second process also locks 100 records (without any coordination between the two) will almost certainly result in an overlapping set unless you have a huge selection of available records.
Depending on your application, having all work done by one thread (and the other being just a "hot standby") may also be acceptable as long as that single worker does not get overloaded.
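One common way to do that one-by-one claim with the classes already used in the question is an atomic findAndModify through MongoTemplate: the match and the state change happen as a single operation on one document, so two executors can never claim the same message. This is only a sketch; MessageState.PENDING and the createdAt sort field are assumptions, not part of the original code.
import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.FindAndModifyOptions;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.data.mongodb.core.query.Update;

public PushMessageDB claimNextMessage(String senderId) {
    // Match one not-yet-claimed message (MessageState.PENDING is assumed here).
    Query query = new Query(Criteria.where(MESSAGE_STATE).is(MessageState.PENDING))
            .with(Sort.by(Sort.Direction.ASC, "createdAt")); // assumed ordering field
    Update update = new Update()
            .set(MESSAGE_STATE, MessageState.IN_EXECUTION)
            .set(SENDER_ID, senderId);
    // findAndModify is atomic per document; returnNew(true) hands back the claimed
    // message, or null when nothing is pending. Call it in a loop to build a block of 100.
    return mongoTemplate.findAndModify(query, update,
            FindAndModifyOptions.options().returnNew(true), PushMessageDB.class);
}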

Regarding sub-topics in chronicle queue

I'm looking to write messages to a single queue. I'd like to use the sub-topics functionality so that tailers can either read all of the sub-topics under one topic or pick specific sub-topics to read from.
The documentation mentions that sub-topics are supported in a directory under the main topic, so in order to read from a subtopic, do we just create a new queue and point it to the sub-topic path?
SingleChronicleQueue queue = SingleChronicleQueueBuilder.binary("Topic").build();
SingleChronicleQueue queue2 = SingleChronicleQueueBuilder.binary("Topic/SubTopic").build();
ExcerptAppender appender = queue.acquireAppender();
ExcerptAppender appender2 = queue2.acquireAppender();
appender.writeText("aaa");
appender2.writeText("bbb");
This will just output aaa, but I want it to output both aaa and bbb.
There is no real concept of hierarchy in Chronicle-Queue; there is a one-to-one mapping between file-system directory and queue.
If you wish to filter certain records, you will need to do that when reading the records out of the queue. It will be up to your application to decide how to detect messages that should be filtered.
The documentation you refer to appears to have been copied from concepts that exist in Chronicle-Engine.
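Since there is no sub-topic support in the queue itself, one option (a sketch only; the "SubTopicA|" prefix convention is an application-level choice, not a Chronicle Queue feature) is to tag each message with its sub-topic on write and let each tailer skip what it does not want:
try (SingleChronicleQueue queue = SingleChronicleQueueBuilder.binary("Topic").build()) {
    ExcerptAppender appender = queue.acquireAppender();
    appender.writeText("SubTopicA|aaa");
    appender.writeText("SubTopicB|bbb");

    ExcerptTailer tailer = queue.createTailer();
    String msg;
    while ((msg = tailer.readText()) != null) {
        if (msg.startsWith("SubTopicA|")) {                           // application-level filter
            System.out.println(msg.substring("SubTopicA|".length())); // prints aaa
        }
    }
}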

Can multiple Chronicle/ExcerptAppenders write to the same queue?

Using Chronicle with vertx.io...
I create a new Chronicle per verticle, i.e. one instance per thread.
chronicle = ChronicleQueueBuilder.indexed("samePath").build(); // created per thread, pointing at the same queue
Now, for each web HTTP POST request, I do the following. Each POST is handled by exactly one thread at a time.
String message = request.toString();
ExcerptAppender appender = chronicle.createAppender();
// Configure the appender to write up to 100 bytes
appender.startExcerpt(message.length()+100);
// Copy the content of the Object as binary
appender.writeObject(message);
// Commit
appender.finish();
This seems to work. But is it ok?
This is not OK for IndexedChronicle, whereas it is for VanillaChronicle.
If you can, it is best to share the same VanillaChronicle instance among verticles (in the same process, of course) and create an appender on demand.
Note that you can use writeUTF* instead of writeObject to serialize strings, as it is much more efficient.
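A minimal sketch of that suggestion, reusing the classes from the question (Chronicle 3.x API) and the writeUTF* hint from the answer; treat the vanilla builder and the exact write method name as assumptions:
// One shared VanillaChronicle per process, created once and reused by all verticles.
Chronicle chronicle = ChronicleQueueBuilder.vanilla("samePath").build();

// Inside each verticle, per POST request:
String message = request.toString();
ExcerptAppender appender = chronicle.createAppender(); // appenders are created on demand, per thread
appender.startExcerpt(message.length() + 100);
appender.writeUTF(message);                            // serialize the string directly, per the answer
appender.finish();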

How to insert a batch of records into Redis

In a Twitter-like application, one of the things they do when someone posts a tweet is iterate over all followers and create a copy of the tweet in each follower's timeline. I need something similar. What is the best way to insert a tweet ID into the timelines of, say, 10/100/1000 followers, assuming I have a list of follower IDs?
I am doing this within Azure WebJobs using Azure Redis. A WebJob is automatically created for every tweet received in the queue, so I may have around 16 simultaneous jobs running at the same time, where each one goes through the followers and inserts tweets. I'm thinking that if 99% of the inserts happen, they should not stop because one or a few have failed; I need to continue but log the failures.
Question: Should I use CreateBatch like below? If I need to retrieve the latest tweets first, in reverse chronological order, is the below fine and performant?
var tasks = new List<Task>();
var batch = _cache.CreateBatch();
//loop start
tasks.Add(batch.ListRightPushAsync("follower_id", "tweet_id"));
//loop end
batch.Execute();
await Task.WhenAll(tasks.ToArray());
a) But how do I catch it if something fails? try/catch?
b) How do I check, within the batch, the total number of items in each list and pop one out if it reaches a certain number? I want to do a LeftPop if the list length is > 800. I'm not sure how to do it all inside the batch.
Please point me to a sample or share a snippet here. I'm struggling to find a good way. Thank you so much.
UPDATE
Does this look right based on @marc's comments?
var tasks = new List<Task>();
followers.ForEach(f =>
{
    var key = f.FollowerId;
    var task = _cache.ListRightPushAsync(key, value);
    task.ContinueWith(t =>
    {
        if (t.Result > 800) _cache.ListLeftPopAsync(key).Wait();
    });
    tasks.Add(task);
});
Task.WaitAll(tasks.ToArray());
CreateBatch probably doesn't do what you think it does. What it does is defer a set of operations and ensure they get sent contiguously relative to a single connection. There are some occasions when this is useful, but it's not all that common; I'd probably just send them individually if it were me. There is also CreateTransaction (MULTI/EXEC), but I don't think that would be a good choice here.
That depends on whether you care about the data you're popping. If not, I'd send an LTRIM/[L|R]PUSH pair to trim the list to (max - 1) before adding. Another option would be Lua, but it seems overkill. If you do care about the old data, you'll need to do a range query too.
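To illustrate the trim-then-push idea, here is a hedged sketch using the Jedis client for Java (the question uses StackExchange.Redis, but the Redis commands LTRIM and RPUSH are the same; the 800 cap comes from the question, while the "timeline:" key prefix is just illustrative):
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;

public class TimelineFanout {
    private static final int MAX_TIMELINE = 800;

    // Trim the list to (max - 1) entries, then append the new tweet id, so the
    // list never grows beyond MAX_TIMELINE. Both commands go out in one round trip.
    public static void pushToTimeline(Jedis jedis, String followerId, String tweetId) {
        String key = "timeline:" + followerId;     // assumed key layout
        Pipeline pipe = jedis.pipelined();
        pipe.ltrim(key, -(MAX_TIMELINE - 1), -1);  // keep only the newest (max - 1) entries
        pipe.rpush(key, tweetId);                  // newest entries live at the right-hand end
        pipe.sync();
    }
}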
