Kafka job assignment & completion - apache-kafka-streams

I have a requirement where I receive a record in a Kafka stream and divide this single record into n records based on some logic. These n records are pushed to another stream and processed in parallel. But I need to know when all n jobs are complete so that I can send a response back. Can you please let me know how I can achieve this?

I think the simplest way to do it is to set up a consumer that subscribes to that stream and processes it.

If I understand correctly, you want to know that, in fact, "N messages were processed". AFAIK, there's no universal solution for this, but here's what I would do.
(Note: My solution assumes that you have a single consumer consuming the second stream.)
To know that N records have been processed, you have to know what N is. So when you divide the single record into N records, store that N in the records themselves (e.g. if your records are JSON objects, add an "n": 5 key-value pair). Then, in the consumer(s) that process the 2nd stream (the one that contains N records), do the following with each message you consume:
consume and process the message
increment a counter (let's call it K) - this could be an in-memory or a persistent storage-backed counter
compare K to N
if equal, all messages have been processed (at this point you might want to exit)
otherwise, there are more messages to be processed; continue consuming
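As a minimal sketch of the consumer side of this idea (assuming the split records are JSON strings carrying the "n" count plus a hypothetical "parentId" field that ties the N children back to the original record; the topic name and helper methods are invented for illustration):

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CompletionTracker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "completion-tracker");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // K per parent record: how many of its N child records have been processed so far.
        Map<String, Integer> processed = new HashMap<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("child-records")); // hypothetical 2nd topic
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(100))) {
                    String parentId = field(rec.value(), "parentId");
                    int n = Integer.parseInt(field(rec.value(), "n"));
                    process(rec);                                      // your per-record logic
                    int k = processed.merge(parentId, 1, Integer::sum); // increment the counter K
                    if (k == n) {                                      // all N children are done
                        sendResponse(parentId);                        // reply to the original caller
                        processed.remove(parentId);
                    }
                }
            }
        }
    }

    // Stubs standing in for your own JSON parsing, processing and response logic.
    static String field(String json, String name) { return "0"; }
    static void process(ConsumerRecord<String, String> rec) { }
    static void sendResponse(String parentId) { }
}

With more than one consumer on that topic, the counter would have to live in shared storage (e.g. a persistent store) rather than an in-memory map, as noted above.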

Related

Schedule sending messages to consumers at different rates

I'm looking for the best algorithm for message scheduling. What I mean by message scheduling is a way to send messages on the bus when we have many consumers at different rates.
Example :
Suppose that we have data D1 to Dn:
. D1 is sent to many consumers: C1 every 5 ms, C2 every 19 ms, C3 every 30 ms, ..., Cn every Rn ms
. Dn is sent to C1 every 10 ms, C2 every 31 ms, ..., Cn every 50 ms
What is the best algorithm to schedule these actions with the best performance (CPU, memory, I/O)?
Regards
I can think of quite a few options, each with their own costs and benefits. It really comes down to exactly what your needs are -- what really defines "best" for you. I've pseudocoded a couple possibilities below to hopefully help you get started.
Option 1: Execute the following every time unit (in your example, millisecond)
func callEachMs():
    time = getCurrentTime()
    for each datum:
        for each customer in datum.customers:
            if time % customer.rate == 0:
                sendMsg(datum, customer)
This has the advantage of requiring no consistently stored memory -- you just check at each time unit whether you should be sending a message. This can also deal with messages that weren't first sent at time == 0 -- just store the time the message was initially sent modulo the rate, and replace the conditional with if time % customer.rate == customer.firstMsgTimeMod.
A downside to this method is that it is completely reliant on being called every millisecond. If there's lag caused by another process on the CPU and it misses a cycle, you may miss sending a message altogether (as opposed to sending it a little late).
Option 2: Maintain a list of lists of tuples, where each entry represents the tasks that need to be done at that millisecond. Make your list at least as long as the longest rate divided by the time unit (if your longest rate is 50 ms and you're going by ms, your list must be at least 50 long). When you start your program, place the first time a message will be sent into the queue. And then each time you send a message, update the next time you'll send it in that list.
func buildList(&list):
    for each datum:
        for each customer in datum.customers:
            if list.size <= customer.rate:
                list.resize(customer.rate + 1)
            list[customer.rate].push_back(tuple(datum.name, customer.name, customer.rate))

func callEachMs(&list):
    for each (datumName, customerName, rate) in list[0]:
        sendMsg(datumName, customerName)
        list[rate].push_back(tuple(datumName, customerName, rate))
    list.pop_front()
    list.push_back(empty list)
This has the advantage of avoiding the many unnecessary modulus calculations that option 1 required. However, that comes at the cost of increased memory usage. This implementation would also not be efficient if there's a large disparity in the rates of your various messages (although you could modify it to handle longer rates more efficiently). And it still has to be called every millisecond.
Finally, you'll have to think very carefully about what data structure you use, as this will make a huge difference in its efficiency. Because you pop from the front and push from the back at every iteration, and the list is a fixed size, you may want to implement a circular buffer to avoid unneeded moving of values. For the lists of tuples, since they're only ever iterated over (random access isn't needed), and there are frequent additions, a singly-linked list may be your best solution.
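To make that concrete, here is a small Java sketch of option 2 (not part of the original pseudocode) that keeps the slots in a fixed-size ring indexed by a moving head, which behaves like the circular buffer described above; the Task type and the println "send" are invented for the example:

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ScheduleWheel {
    // A (datum, consumer) pair and how often it must be sent, in ms. Invented for the example.
    record Task(String datum, String consumer, int rateMs) {}

    private final List<List<Task>> ring;  // fixed-size circular buffer of per-millisecond slots
    private int head = 0;                 // index of the slot that is due right now

    ScheduleWheel(List<Task> tasks) {
        int slots = tasks.stream().mapToInt(Task::rateMs).max().orElse(1) + 1;
        ring = new ArrayList<>(slots);
        for (int i = 0; i < slots; i++) ring.add(new LinkedList<>());  // linked list of tuples per slot
        for (Task t : tasks) slotFor(t).add(t);        // first send is one full period from now
    }

    // Call this once per millisecond.
    void tick() {
        List<Task> due = ring.get(head);
        for (Task t : due) {
            System.out.println("send " + t.datum() + " to " + t.consumer());
            slotFor(t).add(t);                         // reschedule rateMs from now
        }
        due.clear();                                   // this slot becomes the "empty list" pushed to the back
        head = (head + 1) % ring.size();
    }

    private List<Task> slotFor(Task t) {
        return ring.get((head + t.rateMs()) % ring.size());
    }
}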
Obviously, there are many more ways that you could do this, but hopefully these ideas can get you started. Also, keep in mind that the nature of the system you're running this on could have a strong effect on which method works better, or whether you want to do something else entirely. For example, both methods require that they can be reliably called at a certain rate. I also haven't described parallelized implementations, which may be the best option if your application supports them.
As Helium_1s2 described, there is a second way, based on what I call a schedule table; this is what I use now, but this solution has its limits.
Suppose that we have one datum to send and two consumers, C1 and C2.
We must extract our schedule table and identify the repeating transmission cycle and the value of the IDLE MINIMUM PERIOD. In fact, it is wasteful to loop on the smallest unit of time (e.g. 1 ms, 1 ns, 1 min or 1 h, depending on the case), since that is not always the best period, and we can optimize this loop as follows.
For example (C1 every 6 and C2 every 9), we can see that there is a cycle which repeats from 0 to 18, with a minimal gap between two consecutive send events equal to 3.
so :
HCF(6,9) = 3 = IDLE MINIMUM PERIOD
LCM(6,9) = 18 = transmission cycle length
LCM/HCF = 6 = size of our schedule table
And the schedule table (one slot per IDLE MINIMUM PERIOD of 3, covering one 18-unit transmission cycle) is:
slot 1 (t=3): -
slot 2 (t=6): C1
slot 3 (t=9): C2
slot 4 (t=12): C1
slot 5 (t=15): -
slot 6 (t=18): C1 and C2
and the sending loop looks like :
int i = 0;                              // index into the schedule table, starts at the first slot
while (1) {
    sleep(IDLE_MINIMUM_PERIOD);         // free CPU for the idle minimum period
    send(ScheduleTable[i]);             // send whatever is due in this slot
    i = (i + 1) % SCHEDULE_TABLE_SIZE;  // wrap around at the end of the transmission cycle
}
The problem with this method is that the array grows as the LCM grows, which happens with bad combinations of rates (e.g. rates that are prime numbers, etc.).
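For illustration, a small Java sketch of computing the HCF, LCM and schedule table described above; the buildTable helper and the printed output are invented, and the rates are the example's values (C1 every 6, C2 every 9):

import java.util.ArrayList;
import java.util.List;

public class ScheduleTableBuilder {
    static int hcf(int a, int b) { return b == 0 ? a : hcf(b, a % b); }
    static int lcm(int a, int b) { return a / hcf(a, b) * b; }

    // Builds one slot per IDLE MINIMUM PERIOD covering one full transmission cycle.
    static List<List<String>> buildTable(String[] consumers, int[] rates) {
        int idle = rates[0], cycle = rates[0];
        for (int r : rates) { idle = hcf(idle, r); cycle = lcm(cycle, r); }
        int slots = cycle / idle;                       // size of the schedule table
        List<List<String>> table = new ArrayList<>();
        for (int s = 1; s <= slots; s++) {              // slot s corresponds to time s * idle
            List<String> due = new ArrayList<>();
            for (int c = 0; c < consumers.length; c++) {
                if ((s * idle) % rates[c] == 0) due.add(consumers[c]);
            }
            table.add(due);
        }
        return table;
    }

    public static void main(String[] args) {
        // C1 every 6, C2 every 9: HCF = 3, LCM = 18, table size = 6.
        List<List<String>> table = buildTable(new String[]{"C1", "C2"}, new int[]{6, 9});
        for (int i = 0; i < table.size(); i++) {
            System.out.println("slot " + (i + 1) + ": " + table.get(i));
        }
    }
}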

perf_event_open: handling last recorded sample

When counting events based on a specific sampling period, how should the last recorded sample be handled when the last counter value of the group leader is less than the sampling period?
Update:
I have checked the value of type, which is a member of struct perf_event_header. For the last recorded sample this value is zero, and according to the perf_event.h header file, the value zero does not seem to have a corresponding sample record type!
To put my question in other words: How does perf_event API deal with the case when the workload finishes execution but the group leader counter value is less than the value of the sampling period? Is the data discarded in this case?
How does perf_event API deal with the case when the workload finishes execution but the group leader counter value is less than the value of the sampling period?
Nothing happens. If the event count is not reached yet, no sample is written.
You should consider that samples are typically statistical information.
If you really need to know, you could possibly use some form of ptrace and manually read the counter value before the thread terminates.
If you read a perf_event_header with a type == 0, I would be concerned. I don't think that should ever happen.
Edit:
As per the manpage, I believe you cannot read the remaining value from that particular event because sampling and counting events are exclusive.
Events come in two flavors: counting and sampled. A counting event is one that is used for counting the aggregate number of events that occur. In general, counting event results are gathered with a read(2) call. A sampling event periodically writes measurements to a buffer that can then be accessed via mmap(2).

PBFT view-change: What happens to committed operations after the valid snapshot?

PBFT says that if the timer of backup i expires in view v then it starts a view change for v+1 by multicasting <view-change, v+1, n, C, P, i> where n is the sequence number of the last stable checkpoint s and P is a set containing a set Pm for each request m that prepared at i with a sequence number higher than n.
Now, checkpoints are taken periodically, so there can be messages prepared at i with a sequence number higher than n which are already committed. We don't want these to be included in P, as they are already committed.
So, how does PBFT handle that?
I think those messages are executed again. When a view change happens, all the nodes will be at the same checkpoint.
When the "new" primary for view v + 1 receives 2⨍ valid view-change messages, it multicasts a new-view message. The message that is sent contains:
V: the set of received, valid view-change messages.
P: the set of pre-prepare messages for requests that have not yet been processed. These are calculated as follows:
From the last stable checkpoint, you get the sequence number of the last executed request. This value will correspond to the min-s.
Take the largest sequence number of all prepared messages you have received. This value will correspond to the max-s.
As a result, it generates as many pre-prepare messages as needed, one for each sequence number between min-s and max-s.
Each node keeps a log of the messages that are not yet covered by a checkpoint, so they don't need to be processed again.
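For illustration only, a rough sketch of that min-s / max-s computation with invented message types (ViewChange, Prepared, PrePrepare); it is not taken from any PBFT implementation:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NewViewBuilder {
    // Invented minimal representations of the protocol messages.
    record Prepared(long seq, byte[] requestDigest) {}
    record ViewChange(long lastStableCheckpointSeq, List<Prepared> prepared) {}
    record PrePrepare(long view, long seq, byte[] requestDigest) {}

    // Computes the pre-prepares the new primary re-issues for the new view.
    static List<PrePrepare> buildPrePrepares(long newView, List<ViewChange> viewChanges) {
        long minS = viewChanges.stream()
                .mapToLong(ViewChange::lastStableCheckpointSeq).max().orElse(0); // latest stable checkpoint
        long maxS = viewChanges.stream()
                .flatMap(vc -> vc.prepared().stream())
                .mapToLong(Prepared::seq).max().orElse(minS);                    // highest prepared seq

        // Pick, for each sequence number in (minS, maxS], a request that prepared there (if any).
        Map<Long, byte[]> bySeq = new TreeMap<>();
        for (ViewChange vc : viewChanges) {
            for (Prepared p : vc.prepared()) {
                if (p.seq() > minS) bySeq.putIfAbsent(p.seq(), p.requestDigest());
            }
        }
        List<PrePrepare> result = new ArrayList<>();
        for (long s = minS + 1; s <= maxS; s++) {
            // Sequence numbers with no prepared request get a no-op pre-prepare in full PBFT; omitted here.
            byte[] digest = bySeq.get(s);
            if (digest != null) result.add(new PrePrepare(newView, s, digest));
        }
        return result;
    }
}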

How to queue database row updates to occur every n seconds using Redis

I have a NoSQL database with rows of two types:
Rows that are essentially counters with a high number of updates per second. It doesn't matter if these updates are done in a batch once every n seconds (where n is say 2 seconds).
Rows that contain tree-like structures, and each time the row is updated the tree structure has to be updated. Updating the tree structure each time is expensive, it would be better to do it as a batch job once every n seconds.
This is my plan; afterwards I will explain the part I am struggling to execute, and whether I need to move to something like RabbitMQ.
Each row has a unique id which I use as the key in Redis. Redis can easily handle loads of counter increments, no problem. As for the tree structure, each update for the row can use the string APPEND command to append JSON instructions describing how to modify the existing tree in the database.
This is the tricky part
I want to ensure each row gets updated every n seconds. There will be a large amount of redis keys getting updated.
This was my plan. Have three queues: pre-processing, processing, dead
By default, every key is placed in the pre-processing queue when the command for a database update comes in. After exactly n seconds, each key/value that has been there for n seconds is moved to the processing queue (I don't know how to do this efficiently and concurrently). Once n seconds have passed, it doesn't matter which order the processing queue is handled in, and I can have any number of consumers racing through it. I will also have a dead queue in case tasks keep failing for some reason.
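A minimal sketch of this plan using the Jedis client (key names and the n = 2 second window are only examples; using a sorted set scored by enqueue time is one way to implement the pre-processing queue):

import redis.clients.jedis.Jedis;

public class BatchedRowUpdates {
    static final String PENDING = "queue:preprocessing";  // sorted set: member = row key, score = enqueue time (ms)
    static final String PROCESSING = "queue:processing";  // plain list that any number of consumers race through
    static final long WINDOW_MS = 2000;                   // the "n seconds" batching window (n = 2 here)

    // Called for every incoming update: apply it in Redis and remember that the row needs flushing.
    static void onUpdate(Jedis jedis, String rowKey, long counterDelta, String treeInstructionJson) {
        if (counterDelta != 0) jedis.incrBy("counter:" + rowKey, counterDelta);
        if (treeInstructionJson != null) jedis.append("tree-ops:" + rowKey, treeInstructionJson);
        // Only record the enqueue time the first time, so the key keeps its original age in the queue.
        if (jedis.zscore(PENDING, rowKey) == null) {
            jedis.zadd(PENDING, System.currentTimeMillis(), rowKey);
        }
    }

    // Run periodically: move every key that has been pending for at least WINDOW_MS to the processing queue.
    static void promoteDueKeys(Jedis jedis) {
        long cutoff = System.currentTimeMillis() - WINDOW_MS;
        for (String rowKey : jedis.zrangeByScore(PENDING, 0, cutoff)) {
            jedis.rpush(PROCESSING, rowKey);
            jedis.zrem(PENDING, rowKey);
        }
    }
}

The promote step would need to be made atomic (e.g. with a Lua script or MULTI/EXEC) before several movers or consumers run concurrently, which is exactly the part I'm unsure about.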
Is there a better way to do this? Is what I am thinking of possible?

How to write a trident topology without aggregations?

I would like to process tuples in batches, for which I am thinking of using the Trident API. However, there are no operations that I perform in batches here; every tuple is processed individually. All I need is exactly-once semantics, so that every tuple is processed only once, and this is the only reason to use Trident.
I want to store information about which tuples have been processed, so that when a batch is replayed, a tuple that has already been processed will not be executed again.
The topology offers a persistentAggregate() method, but it takes an aggregation operation, and I don't have any aggregation to perform on a set of tuples since every tuple is processed individually.
Here, the functions that a tuple undergoes are too small to be worth executing one at a time, so I am looking to process tuples in batches in order to save computing resources and time.
Now, how do I write a topology which consumes tuples as batches but still doesn't perform any batch operations (like word count)?
Looks like what you need is partitionPersist.
It should be provided with a state (or a state factory), fields to persist and an updater.
For development purposes check MemoryMapState - it's basically an in-memory hashmap.
For production you can use, say, Cassandra - check out the examples at https://github.com/hmsonline/storm-cassandra-cql
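A minimal sketch of that wiring, with an invented tuple field "id" and an invented MyProcessingUpdater; package names differ between Storm versions (newer releases use org.apache.storm.trident rather than storm.trident), so treat this as an outline, not a drop-in topology:

import java.util.Arrays;
import java.util.List;

import backtype.storm.generated.StormTopology;
import backtype.storm.tuple.Fields;
import storm.trident.TridentTopology;
import storm.trident.operation.BaseStateUpdater;
import storm.trident.operation.TridentCollector;
import storm.trident.spout.IBatchSpout;
import storm.trident.testing.MemoryMapState;
import storm.trident.tuple.TridentTuple;

public class ExactlyOnceTopology {

    // Processes each tuple individually but records which ids have been handled in the state,
    // so a replayed batch skips tuples that were already processed.
    public static class MyProcessingUpdater extends BaseStateUpdater<MemoryMapState<Boolean>> {
        @Override
        public void updateState(MemoryMapState<Boolean> state,
                                List<TridentTuple> tuples,
                                TridentCollector collector) {
            for (TridentTuple t : tuples) {
                List<List<Object>> key = Arrays.asList(Arrays.<Object>asList(t.getValueByField("id")));
                List<Boolean> seen = state.multiGet(key);
                if (seen.get(0) == null || !seen.get(0)) {
                    process(t);                                      // your individual per-tuple logic
                    state.multiPut(key, Arrays.asList(Boolean.TRUE));
                }
            }
        }
        private void process(TridentTuple t) { /* ... */ }
    }

    public static StormTopology build(IBatchSpout spout) {
        TridentTopology topology = new TridentTopology();
        topology.newStream("tuple-stream", spout)
                .partitionPersist(new MemoryMapState.Factory(), new Fields("id"),
                                  new MyProcessingUpdater());
        return topology.build();
    }
}

MemoryMapState keeps everything in memory, so as said above it is only suitable for development; a persistent state backed by, say, Cassandra would replace it in production.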
