Algorithm to time-sort N data streams - algorithm

So I've got N asynchronous, timestamped data streams. Each stream has a fixed-ish rate. I want to process all of the data, but the catch is that I must process the data in order as close to the time that the data arrived as possible (it is a real-time streaming application).
So far, my implementation has been to create a fixed window of K messages which I sort by timestamp using a priority queue. I then process the entirety of this queue in order before moving on to the next window. This is okay, but its less than ideal because it creates lag proportional to the size of the buffer, and also will sometimes lead to dropped messages if a message arrives just after the end of the buffer has been processed. It looks something like this:
// Priority queue keeping track of the data in timestamp order.
ThreadSafeProrityQueue<Data> q;
// Fixed buffer size
int K = 10;
// The last successfully processed data timestamp
time_t lastTimestamp = -1;
// Called for each of the N data streams asyncronously
void receiveAsyncData(const Data& dat) {
q.push(dat.timestamp, dat);
if (q.size() > K) {
processQueue();
}
}
// Process all the data in the queue.
void processQueue() {
while (!q.empty()) {
const auto& data = q.top();
// If the data is too old, drop it.
if (data.timestamp < lastTimestamp) {
LOG("Dropping message. Too old.");
q.pop();
continue;
}
// Otherwise, process it.
processData(data);
lastTimestamp = data.timestamp;
q.pop();
}
}
Information about the data: they're guaranteed to be sorted within their own stream. Their rates are between 5 and 30 hz. They consist of images and other bits of data.
Some examples of why this is harder than it appears. Suppose I have two streams, A and B both running at 1 Hz and I get the data in the following order:
(stream, time)
(A, 2)
(B, 1.5)
(A, 3)
(B, 2.5)
(A, 4)
(B, 3.5)
(A, 5)
See how if I processed the data in order of when I received them, B would always get dropped? that's what I wanted to avoid.Now in my algorithm, B would get dropped every 10th frame, and I would process the data with a lag of 10 frames into the past.

I would suggest a producer/consumer structure. Have each stream put data into the queue, and a separate thread reading the queue. That is:
// your asynchronous update:
void receiveAsyncData(const Data& dat) {
q.push(dat.timestamp, dat);
}
// separate thread that processes the queue
void processQueue()
{
while (!stopRequested)
{
data = q.pop();
if (data.timestamp >= lastTimestamp)
{
processData(data);
lastTimestamp = data.timestamp;
}
}
}
This prevents the "lag" that you see in your current implementation when you're processing a batch.
The processQueue function is running in a separate, persistent thread. stopRequested is a flag that the program sets when it wants to shut down--forcing the thread to exit. Some people would use a volatile flag for this. I prefer to use something like a manual reset event.
To make this work, you'll need a priority queue implementation that allows concurrent updates, or you'll need to wrap your queue with a synchronization lock. In particular, you want to make sure that q.pop() waits for the next item when the queue is empty. Or that you never call q.pop() when the queue is empty. I don't know the specifics of your ThreadSafePriorityQueue, so I can't really say exactly how you'd write that.
The timestamp check is still necessary because it's possible for a later item to be processed before an earlier item. For example:
Event received from data stream 1, but thread is swapped out before it can be added to the queue.
Event received from data stream 2, and is added to the queue.
Event from data stream 2 is removed from the queue by the processQueue function.
Thread from step 1 above gets another time slice and item is added to the queue.
This isn't unusual, just infrequent. And the time difference will typically be on the order of microseconds.
If you regularly get updates out of order, then you can introduce an artificial delay. For example, in your updated question you show messages coming in out of order by 500 milliseconds. Let's assume that 500 milliseconds is the maximum tolerance you want to support. That is, if a message comes in more than 500 ms late, then it will get dropped.
What you do is add 500 ms to the timestamp when you add the thing to the priority queue. That is:
q.push(AddMs(dat.timestamp, 500), dat);
And in the loop that processes things, you don't dequeue something before its timestamp. Something like:
while (true)
{
if (q.peek().timestamp <= currentTime)
{
data = q.pop();
if (data.timestamp >= lastTimestamp)
{
processData(data);
lastTimestamp = data.timestamp;
}
}
}
This introduces a 500 ms delay in the processing of all items, but it prevents dropping "late" updates that fall within the 500 ms threshold. You have to balance your desire for "real time" updates with your desire to prevent dropping updates.

There's always be a lag and that lag will be determined by how long you'll be willing to wait for your slowest "fixed-ish rate" stream.
Suggestion:
keep the buffer
keep an array of bool flags with the meaning:"if position ix is true, in the buffer there is at least a sample originated from stream ix"
sort/process as soon as you have all flag to true
Not full-proof (each buffer will be sorted, but from one buffer to another you may have timestamp inversion), but perhaps good enough?
Playing around with the count of "satisfied" flags to trigger the processing (at step 3) may be used to make the lag smaller, but with the risk of more inter-buffer timestamp inversions. In extreme, accepting the processing with only one satisfied flag means "push a frame as soon as you receive it, timestamp sorting be damned".
I mentioned this to support my feeling that lag/timestamp inversions balance is inherent to your problem - except for absolutely equal framerates, there will be perfect solution in which one of the sides is not sacrificed.
Since a "solution" will be an act of balancing, any solution will require gathering/using extra information to help decisions (e.g. that "array of flags"). If what I suggested sounds silly for your case (may well be, the details you chose to share aren't too many), start thinking what metrics will be relevant for your targeted level of "quality of experience" and use additional data structures to help gathering/processing/using those metrics.

Related

An algorithm to increase / decrease load in an application based on the number of exceptions

I have a tonne of messages coming from a queue. Now, I want to dynamically vary the % of messages that is being read and processed by my application ( let's call it traffic %)
The parameters upon which i vary my traffic % is the number of messages failed to be processed ( errors ) by my application ( consumer of the queue )
If I hardcode something like, ' x errors in y mins (y can be fixed), reduce the traffic to z% '. Now after that, the traffic becomes low, the errors also become low. Need an algorithm, that takes into account the current traffic %, the number of errors and determines the new traffic %. Traffic % range being 25% - 100%
You take the inverse of the percent of errored messages to total messages within a time frame then you fit that percentage to your traffic range. This way if you get all errors your traffic percent would be 25% and if you get no errors your traffic percent would be 100%.
// traffic% 25%
minTraffic = 0.25
// traffic% 100%
maxTraffic = 1.00
// 25% -> 100% is a usable range of 75%
deltaTraffic = maxTraffic - minTraffic
// use Max(total, 1) to avoid divide by zero
error = (erroredMessagesPerTimeFrame / Math.max(totalMessagesPerTimeFrame, 1))
// inverse: error=1.00 becomes 0, error=0.00 becomes 1
invError = 1 - pcError
// linear clamp invError to [minTraffic, maxTraffic]
traffic = minTraffic + (deltaTraffic * invError)
This is the simplest implementation using a linear fit.
An alternate version might fit your "invError" value to the "deltaTraffic" using a curve instead, this would weigh higher and lower values closer (or further) to your "minTraffic" and "maxTraffic" depending on what type of curve you use.
Another alternative would be to just use a step function
If "invError" < 50% Then "minTraffic"
Else If "invError" < 75% Then "minTraffic" + (("maxTraffic" - "minTraffic") / 2)
Else "maxTraffic"
What you're asking for is called the Circuit Breaker design pattern. You can find good information all over; some top search results are here, here and here.
In essence, you're implementing a little state machine that may limit the number of requests depending on errors. You can have two or three states depending on if you want also want just cut off the flow or also want to throttle the flow rate for a small period.
You may also want to look at single-rate or dual-rate leaky buckets, which have been in use in the networking controllers for ages.
Here is the Microsoft implementation of the state machine. They (and the other sources)
suggest you make a generic adaptor to wrap your code and separate the concerns.
...
if (IsOpen)
{
// The circuit breaker is Open. Check if the Open timeout has expired.
// If it has, set the state to HalfOpen. Another approach might be to
// check for the HalfOpen state that had be set by some other operation.
if (stateStore.LastStateChangedDateUtc + OpenToHalfOpenWaitTime < DateTime.UtcNow)
{
// The Open timeout has expired. Allow one operation to execute. Note that, in
// this example, the circuit breaker is set to HalfOpen after being
// in the Open state for some period of time. An alternative would be to set
// this using some other approach such as a timer, test method, manually, and
// so on, and check the state here to determine how to handle execution
// of the action.
// Limit the number of threads to be executed when the breaker is HalfOpen.
// An alternative would be to use a more complex approach to determine which
// threads or how many are allowed to execute, or to execute a simple test
// method instead.
bool lockTaken = false;
try
{
Monitor.TryEnter(halfOpenSyncObject, ref lockTaken);
if (lockTaken)
{
// Set the circuit breaker state to HalfOpen.
stateStore.HalfOpen();
// Attempt the operation.
action();
// If this action succeeds, reset the state and allow other operations.
// In reality, instead of immediately returning to the Closed state, a counter
// here would record the number of successful operations and return the
// circuit breaker to the Closed state only after a specified number succeed.
this.stateStore.Reset();
return;
}
}
catch (Exception ex)
{
// If there's still an exception, trip the breaker again immediately.
this.stateStore.Trip(ex);
// Throw the exception so that the caller knows which exception occurred.
throw;
}
finally
{
if (lockTaken)
{
Monitor.Exit(halfOpenSyncObject);
}
}
}
// The Open timeout hasn't yet expired. Throw a CircuitBreakerOpen exception to
// inform the caller that the call was not actually attempted,
// and return the most recent exception received.
throw new CircuitBreakerOpenException(stateStore.LastException);
}
...

How to simulate limited RSU capacity in veins?

I have to simulate a scenario with a RSU that has limited processing capacity; it can only process a limited number of messages in a time unit (say 1 second).
I tried to set a counter in the RSU application. the counter is incremented each time the RSU receives a message and decremented after processing it. here is what I have done:
void RSUApp::onBSM(BasicSafetyMessage* bsm)
{
if(msgCount >= capacity)
{
//drop msg
this->getParentModule()->bubble("capacity limit");
return;
}
msgCount++;
//process message here
msgCount--;
}
it seems useless, I tested it using capacity limit=1 and I have 2 vehicles sending messages at the same time. the RSU process both although it should process one and drop the other.
can anyone help me with this?
In the beginning of the onBSM method the counter is incremented, your logic gets executed and finally the counter gets decremented. All those steps happen at once, meaning in one step of the simulation.
This is the reason why you don't see an effect.
What you probably want is a certain amount of "messages" to be processed in a certain time interval (e.g. 500 ms). It could somehow look like this (untested):
if (simTime() <= intervalEnd && msgCount >= capacity)
{
this->getParentModule()->bubble("capacity limit");
return;
} else if (simTime() > intervalEnd) {
intervalEnd = simTime() + YOURINTERVAL;
msgCount = 0;
}
......
The variable YOURINTERVAL would be time amount of time you like to consider as the interval for your capacity.
You can use self messaging with scheduleAt(simTime()+delay, yourmessage);
the delay will simulate the required processing time.

Algorithm for fragmenting data into packets

Lets just say I want to fragment some data units into packets (max size per packet is lets say 1024 bytes). Each data unit can be of variable size, say:
a = 20 bytes
b = 1000 bytes
c = 10 bytes
d = 800 bytes
Can anyone please suggest any efficient algorithm to create packets with such random data efficiently utilizing the bandwidth? I cannot split the individual data units into bytes...they go whole inside a packet.
EDIT: The ordering of data units is of no concern!
There are several different ways, depending on your requirements and how much time you want to spend on it. The general problem, as #amit mentioned in comments, is NP-Hard. But you can get some improvement with some simple changes.
Before we go there, are you sure you really need to do this? Most networking layers have a packet-sized (or larger) buffer. When you write to the network, it puts your data in that buffer. If you don't fill the buffer completely, the code will delay briefly before sending. If you add more data during that delay, the new data is added to the buffer. The buffer is sent once it fills, or after the delay timeout expires.
So if you have a loop that writes one byte at a time to the network, it's not like you'll be creating a large number of one-byte packets.
On the receiving side, the lowest level networking layer receives an entire packet, but there's no guarantee that your call to receive the data will get the entire packet. That is, the sender might send an 800 byte packet, but on the receiving end the first call to read might only return 50 or 273 bytes.
This depends, of course, at what level you're reading the data. If you're talking about something like Java or .NET, where your interface to the network stack is through a socket, you almost certainly can't guarantee that a call to socket.Read() will return an entire packet.
Now, if you can guarantee that every call to read returns an entire packet, then the easiest way to pack things would be to serialize everything into one big buffer and then send it out in multiple 1,024-byte packets. You'll want to create a header at the front of the first packet that says how many total bytes will be sent, so the receiver knows what to expect. The result will be a bunch of 1,024-byte packets, potentially followed by a final packet that is somewhat smaller.
If you want to make sure that a data object is fully contained within a single packet, then you have to do something like:
add a to buffer
if remaining buffer < size of b
send buffer
clear buffer
add b to buffer
if remaining buffer < size of c
send buffer
clear buffer
add c to buffer
... etc ...
Here's some simple JavaScript pseudo code. The packets will stay ordered and the bandwidth will be used optimally.
packets = [];
PACKET_SIZE = 1024;
currentPacket = [];
function write(data) {
var len = currentPacket.length + data.length;
if(len < PACKET_SIZE) {
currentPacket = currentPacket.concat(data);
} else if(len === PACKET_SIZE) {
packets.push(currentPacket.concat(data));
currentPacket = [];
} else { // if(len > PACKET_SIZE) {
packets.push(currentPacket);
currentPacket = data;
}
}
function flush() {
if(currentPacket.length > 0) {
packets.push(currentPacket);
currentPacket = [];
}
}
write(data20bytes);
write(data1000bytes);
write(data10bytes);
write(data800bytes);
flush();
EDIT Since you have all of the data chunks and you want to optimally package them out of order (bin packing) then you left with trying every permutation of the chunks for an exact answer or compromising with an best guess/first fit type algorithm.

Atomic Broadcast Exercice

I'm trying to solve the exercice 5.10 of the book
"Foundations of Multithreaded, Parallel, and Distributed Programming".
The exercice is
"Assume one producer process and N consumer processes share a bounded buffer having B slots. The producer deposits messages in the buffer; consumers fetch them. Every message deposited by the producer is to be received by all N consumers. Futthermore, each consumer is to receive the messages in the order theu were deposited. However, consumers can receive messages at different times. For example, one consumer could receive up to B more messages than another if the second consumer is slow.
Develop a monitor that implements this kind of communication. Use Signal and Continue discipline."
Can someone help me, please?
Thank you very much!
--
EDIT:
I'm commenting now what I already made (cause I thought that the question was very big if I wrote everything that).
/* creating a buffer of B positions. */
global buffer[B];
Monitor {
cond ok_write;
cond ok_read;
int stamp_buffer[B] = [0, 0, .., 0]
request_write (int pos){
if (stamp_buffer[pos] > 0)
wait(ok_write);
write_message (buufer[pos]);
stamp_buffer[pos] = N;
signalAll (ok_read);
}
request_read (int pos){
if (stamp_buffer[pos] == 0)
wait (ok_read);
stamp_buffer[pos] --;
}
release_read (int pos){
if (stamp_buffer[pos]==0)
signal(ok_write);
}
}
So, I think that I still have that problem: "A reader can read the same message two times."
The basic idea of my algorithm is:
The writer write in a position "pos" and set the value of stamp[pos] to N.
Then, when each reader read the position pos, it do stamp[pos] - 1.
So, if stamp[pos] is zero, the message buffer[pos] was already readed N times and the writer can write in this position again.
But, if some reader read a message two times (or more), the writer can wirte a new message in the position pos and some reader will not read the old message.

Algorithm for solving producer/consumer issue

I want to know how to solve the producer/consumer problem but using 2 differents consumers and I also need to know how to solve it using strict alternation.
I made the following algorithm for 1 producer and 1 consumer
producer()
{
while(true)
{
if i == N //full buffer
turn = 1
while turn <> 0
{
// nothing
}
produceitem(&item)//produce the item
insertitem(item, buffer)//insert the item in the buffer
turn = 1
//process zone
}
}
consumer()
{
while(true)
{
if i == 0 //empty buffer
turn = 0
while turn <> 1
{
// nothing
}
consumeitem(&item)
deleteitem(buffer)//erase the item from buffer
turn = 0
//process zone
}
}
Using that kind of "pseudo-code" I want to know to to solve the same problem(if the last was OK) with 2 consumers.
You can use the router pattern on small scale in both cases:
(source: enterpriseintegrationpatterns.com)
Basically after the queue you place special artifical consumer (router, just one). If you have two competing consumers, just put each received message randomly either in outQueue1 or outQueue2.
In case of strict alternation, the router remember which queue was used last time and sends to the second one.
If you don't want to introduce extra step, you need some sort of synchronization. In first case both consumers are competing for the same lock that they obtain randomly. The latter case is more complicated and will require more advanced synchronization so that both consumers are woken up alternatively.

Resources