Scheduling events using shortest duration as a greedy algorithm

There is a list of events, and I want to schedule the maximum number of events in a day so that they don't conflict with each other, with short-duration events being considered first. My code is attached below and shows what I am trying to do. I know I am sorting the events correctly by their duration, but beyond that I am not able to schedule them without conflicts. Can someone please help me with this?
def shortest_duration(events):
    events.sort(key=lambda x: x[1] - x[0])
    print(events)
    finish = 0
    ans = []
    for event in events:
        if finish <= event[0]:
            finish = event[1]
            ans.append(event)
    return ans
events = [(420, 480), (420, 510), (450, 550), (480, 570), (510, 540), (540, 570), (540, 630), (570, 630)]
print(events)
print(shortest_duration(events))
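For reference, here is a minimal sketch (not from the original post) of one way to make the shortest-duration-first greedy conflict-free: after sorting by duration, compare each candidate against every event already selected instead of only tracking the last finish time.

def schedule_shortest_first(events):
    # Consider shorter events first.
    candidates = sorted(events, key=lambda e: e[1] - e[0])
    ans = []
    for start, end in candidates:
        # Keep the event only if it overlaps none of the already scheduled ones
        # (touching endpoints are allowed, matching the original finish <= start check).
        if all(end <= s or start >= e for s, e in ans):
            ans.append((start, end))
    # Return the schedule in chronological order.
    return sorted(ans)

print(schedule_shortest_first(events))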

Related

An algorithm to increase / decrease load in an application based on the number of exceptions

I have a ton of messages coming from a queue. I want to dynamically vary the % of messages that are read and processed by my application (let's call it the traffic %).
The parameter upon which I vary my traffic % is the number of messages that my application (the consumer of the queue) fails to process (errors).
I could hardcode something like 'on x errors in y minutes (y can be fixed), reduce the traffic to z%', but then, once the traffic becomes low, the errors also become low. I need an algorithm that takes into account the current traffic % and the number of errors and determines the new traffic %, with the traffic % ranging from 25% to 100%.
You take the inverse of the ratio of errored messages to total messages within a time frame, then fit that percentage to your traffic range. This way, if every message errors your traffic percent would be 25%, and if no message errors your traffic percent would be 100%.
// traffic% 25%
minTraffic = 0.25
// traffic% 100%
maxTraffic = 1.00
// 25% -> 100% is a usable range of 75%
deltaTraffic = maxTraffic - minTraffic
// use Max(total, 1) to avoid divide by zero
pcError = (erroredMessagesPerTimeFrame / Math.max(totalMessagesPerTimeFrame, 1))
// inverse: pcError=1.00 becomes 0, pcError=0.00 becomes 1
invError = 1 - pcError
// linearly map invError onto [minTraffic, maxTraffic]
traffic = minTraffic + (deltaTraffic * invError)
This is the simplest implementation using a linear fit.
An alternate version might fit your "invError" value to the "deltaTraffic" using a curve instead; this would weigh higher and lower values closer to (or further from) your "minTraffic" and "maxTraffic", depending on what type of curve you use.
Another alternative would be to just use a step function:
If "invError" < 50% Then "minTraffic"
Else If "invError" < 75% Then "minTraffic" + (("maxTraffic" - "minTraffic") / 2)
Else "maxTraffic"
What you're asking for is called the Circuit Breaker design pattern; there is plenty of good information about it in the top search results for that name.
In essence, you're implementing a little state machine that may limit the number of requests depending on errors. You can have two or three states depending on whether you just want to cut off the flow or also want to throttle the flow rate for a short period.
You may also want to look at single-rate or dual-rate leaky buckets, which have been used in networking controllers for ages.
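For illustration only, a minimal single-rate leaky bucket used as a meter could look like the following Python sketch; the class and its parameters are invented for this example and not taken from any particular library.

import time

class LeakyBucket:
    """Single-rate leaky bucket used as a meter: conforming requests pass, excess is rejected."""
    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec      # drain rate
        self.capacity = capacity      # burst tolerance
        self.level = 0.0
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Drain the bucket according to the elapsed time.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1           # account for this request
            return True
        return False                  # over rate: drop, delay, or re-queue the message

bucket = LeakyBucket(rate_per_sec=100, capacity=20)
if not bucket.allow():
    pass  # message exceeds the sustained rate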
Here is the Microsoft implementation of the state machine. They (and the other sources) suggest you make a generic adaptor to wrap your code and separate the concerns.
...
if (IsOpen)
{
    // The circuit breaker is Open. Check if the Open timeout has expired.
    // If it has, set the state to HalfOpen. Another approach might be to
    // check for the HalfOpen state that had been set by some other operation.
    if (stateStore.LastStateChangedDateUtc + OpenToHalfOpenWaitTime < DateTime.UtcNow)
    {
        // The Open timeout has expired. Allow one operation to execute. Note that, in
        // this example, the circuit breaker is set to HalfOpen after being
        // in the Open state for some period of time. An alternative would be to set
        // this using some other approach such as a timer, test method, manually, and
        // so on, and check the state here to determine how to handle execution
        // of the action.
        // Limit the number of threads to be executed when the breaker is HalfOpen.
        // An alternative would be to use a more complex approach to determine which
        // threads or how many are allowed to execute, or to execute a simple test
        // method instead.
        bool lockTaken = false;
        try
        {
            Monitor.TryEnter(halfOpenSyncObject, ref lockTaken);
            if (lockTaken)
            {
                // Set the circuit breaker state to HalfOpen.
                stateStore.HalfOpen();

                // Attempt the operation.
                action();

                // If this action succeeds, reset the state and allow other operations.
                // In reality, instead of immediately returning to the Closed state, a counter
                // here would record the number of successful operations and return the
                // circuit breaker to the Closed state only after a specified number succeed.
                this.stateStore.Reset();
                return;
            }
        }
        catch (Exception ex)
        {
            // If there's still an exception, trip the breaker again immediately.
            this.stateStore.Trip(ex);

            // Throw the exception so that the caller knows which exception occurred.
            throw;
        }
        finally
        {
            if (lockTaken)
            {
                Monitor.Exit(halfOpenSyncObject);
            }
        }
    }

    // The Open timeout hasn't yet expired. Throw a CircuitBreakerOpen exception to
    // inform the caller that the call was not actually attempted,
    // and return the most recent exception received.
    throw new CircuitBreakerOpenException(stateStore.LastException);
}
...
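To show the same Closed/Open/HalfOpen idea more compactly, here is a hedged Python sketch; the thresholds, names, and reset policy are arbitrary examples and it is not thread-safe, unlike the sample above.

import time

class CircuitBreaker:
    """Tiny Closed/Open/HalfOpen breaker; thresholds are arbitrary examples."""
    def __init__(self, failure_threshold=5, open_seconds=30.0):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.failures = 0
        self.opened_at = None
        self.state = "closed"

    def call(self, action):
        if self.state == "open":
            if time.monotonic() - self.opened_at < self.open_seconds:
                raise RuntimeError("circuit open: call not attempted")
            self.state = "half-open"          # timeout expired: allow one trial call
        try:
            result = action()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"           # trip (again) immediately
                self.opened_at = time.monotonic()
            raise
        else:
            self.failures = 0
            self.state = "closed"             # success resets the breaker
            return result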

How come using an asyncScheduler argument in the range operator postpones the subscription?

When running the following source code in an Angular application, the 50 Normal: logger statements are displayed first, and only then are the 50 Scheduler: logger statements displayed.
range(1, 50, asyncScheduler).subscribe((value: number) => console.log('Scheduler: ' + value));
range(1, 50).subscribe((value: number) => console.log('Normal: ' + value));
Of course, removing the asyncScheduler argument removes this reversed display.
Why is the display order reversed?
Using asyncScheduler doesn't postpone subscription. It only makes all emissions happen asynchronously (just like wrapping each next() call with setTimeout()).
RxJS is strictly synchronous unless you work with time or use an asynchronous scheduler yourself. So when you use range(1, 50) it will emit all values synchronously, in the same event-loop turn, before any emission from asyncScheduler can reach its observer. All 50 scheduled next emissions are stacked in the event queue, waiting until the current event ends.

How to simulate limited RSU capacity in Veins?

I have to simulate a scenario with an RSU that has limited processing capacity; it can only process a limited number of messages per time unit (say 1 second).
I tried to set a counter in the RSU application. The counter is incremented each time the RSU receives a message and decremented after processing it. Here is what I have done:
void RSUApp::onBSM(BasicSafetyMessage* bsm)
{
    if (msgCount >= capacity)
    {
        // drop msg
        this->getParentModule()->bubble("capacity limit");
        return;
    }
    msgCount++;
    // process message here
    msgCount--;
}
It seems useless: I tested it using a capacity limit of 1 with 2 vehicles sending messages at the same time, and the RSU processes both, although it should process one and drop the other.
Can anyone help me with this?
At the beginning of the onBSM method the counter is incremented, your logic gets executed, and finally the counter gets decremented. All of those steps happen at once, meaning in one step of the simulation.
This is the reason why you don't see an effect.
What you probably want is a certain number of messages to be processed in a certain time interval (e.g. 500 ms). It could look something like this (untested):
if (simTime() <= intervalEnd && msgCount >= capacity)
{
    this->getParentModule()->bubble("capacity limit");
    return;
}
else if (simTime() > intervalEnd)
{
    intervalEnd = simTime() + YOURINTERVAL;
    msgCount = 0;
}
......
The variable YOURINTERVAL would be the amount of time you would like to consider as the interval for your capacity.
You can use self-messaging with scheduleAt(simTime() + delay, yourmessage); the delay will simulate the required processing time.

Algorithm to time-sort N data streams

So I've got N asynchronous, timestamped data streams. Each stream has a fixed-ish rate. I want to process all of the data, but the catch is that I must process the data in order as close to the time that the data arrived as possible (it is a real-time streaming application).
So far, my implementation has been to create a fixed window of K messages which I sort by timestamp using a priority queue. I then process the entirety of this queue in order before moving on to the next window. This is okay, but it's less than ideal because it creates lag proportional to the size of the buffer, and it will sometimes lead to dropped messages if a message arrives just after the end of the buffer has been processed. It looks something like this:
// Priority queue keeping track of the data in timestamp order.
ThreadSafePriorityQueue<Data> q;

// Fixed buffer size
int K = 10;

// The last successfully processed data timestamp
time_t lastTimestamp = -1;

// Called for each of the N data streams asynchronously
void receiveAsyncData(const Data& dat) {
    q.push(dat.timestamp, dat);
    if (q.size() > K) {
        processQueue();
    }
}
// Process all the data in the queue.
void processQueue() {
    while (!q.empty()) {
        const auto& data = q.top();
        // If the data is too old, drop it.
        if (data.timestamp < lastTimestamp) {
            LOG("Dropping message. Too old.");
            q.pop();
            continue;
        }
        // Otherwise, process it.
        processData(data);
        lastTimestamp = data.timestamp;
        q.pop();
    }
}
Information about the data: they're guaranteed to be sorted within their own stream. Their rates are between 5 and 30 Hz. They consist of images and other bits of data.
Some examples of why this is harder than it appears. Suppose I have two streams, A and B both running at 1 Hz and I get the data in the following order:
(stream, time)
(A, 2)
(B, 1.5)
(A, 3)
(B, 2.5)
(A, 4)
(B, 3.5)
(A, 5)
See how, if I processed the data in the order I received it, B would always get dropped? That's what I wanted to avoid. Now, in my algorithm, B would get dropped every 10th frame, and I would process the data with a lag of 10 frames into the past.
I would suggest a producer/consumer structure. Have each stream put data into the queue, and a separate thread reading the queue. That is:
// your asynchronous update:
void receiveAsyncData(const Data& dat) {
    q.push(dat.timestamp, dat);
}

// separate thread that processes the queue
void processQueue()
{
    while (!stopRequested)
    {
        data = q.pop();
        if (data.timestamp >= lastTimestamp)
        {
            processData(data);
            lastTimestamp = data.timestamp;
        }
    }
}
This prevents the "lag" that you see in your current implementation when you're processing a batch.
The processQueue function is running in a separate, persistent thread. stopRequested is a flag that the program sets when it wants to shut down--forcing the thread to exit. Some people would use a volatile flag for this. I prefer to use something like a manual reset event.
To make this work, you'll need a priority queue implementation that allows concurrent updates, or you'll need to wrap your queue with a synchronization lock. In particular, you want to make sure that q.pop() waits for the next item when the queue is empty. Or that you never call q.pop() when the queue is empty. I don't know the specifics of your ThreadSafePriorityQueue, so I can't really say exactly how you'd write that.
The timestamp check is still necessary because it's possible for a later item to be processed before an earlier item. For example:
1. Event received from data stream 1, but thread is swapped out before it can be added to the queue.
2. Event received from data stream 2, and is added to the queue.
3. Event from data stream 2 is removed from the queue by the processQueue function.
4. Thread from step 1 above gets another time slice and item is added to the queue.
This isn't unusual, just infrequent. And the time difference will typically be on the order of microseconds.
If you regularly get updates out of order, then you can introduce an artificial delay. For example, in your updated question you show messages coming in out of order by 500 milliseconds. Let's assume that 500 milliseconds is the maximum tolerance you want to support. That is, if a message comes in more than 500 ms late, then it will get dropped.
What you do is add 500 ms to the timestamp when you add the thing to the priority queue. That is:
q.push(AddMs(dat.timestamp, 500), dat);
And in the loop that processes things, you don't dequeue something before its timestamp. Something like:
while (true)
{
    if (q.peek().timestamp <= currentTime)
    {
        data = q.pop();
        if (data.timestamp >= lastTimestamp)
        {
            processData(data);
            lastTimestamp = data.timestamp;
        }
    }
}
This introduces a 500 ms delay in the processing of all items, but it prevents dropping "late" updates that fall within the 500 ms threshold. You have to balance your desire for "real time" updates with your desire to prevent dropping updates.
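For illustration, a minimal Python sketch of this delayed-dequeue idea using only the standard library is shown below. The names (receive_async_data, process_data, and so on) are placeholders, and it assumes message timestamps come from the same clock as time.monotonic().

import heapq
import itertools
import threading
import time

DELAY = 0.5                        # tolerance for late messages, in seconds
_heap = []                         # entries: (release_time, timestamp, seq, payload)
_lock = threading.Condition()
_seq = itertools.count()           # tie-breaker so payloads never get compared
_last_timestamp = float("-inf")

def receive_async_data(timestamp, payload):
    """Producer side: queue the item, eligible for processing DELAY seconds after its timestamp."""
    with _lock:
        heapq.heappush(_heap, (timestamp + DELAY, timestamp, next(_seq), payload))
        _lock.notify()

def process_queue(process_data, stop_event):
    """Consumer thread: pop items in timestamp order once their release time has passed."""
    global _last_timestamp
    while not stop_event.is_set():
        with _lock:
            while not _heap or _heap[0][0] > time.monotonic():
                _lock.wait(timeout=0.05)
                if stop_event.is_set():
                    return
            _, timestamp, _, payload = heapq.heappop(_heap)
        if timestamp >= _last_timestamp:   # drop anything older than what was already processed
            process_data(payload)
            _last_timestamp = timestamp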
There will always be a lag, and that lag will be determined by how long you're willing to wait for your slowest "fixed-ish rate" stream.
Suggestion:
1. keep the buffer;
2. keep an array of bool flags with the meaning "if position ix is true, the buffer holds at least one sample originating from stream ix";
3. sort/process the buffer as soon as all flags are true.
Not foolproof (each buffer will be sorted, but from one buffer to another you may have timestamp inversions), but perhaps good enough?
Playing around with the count of "satisfied" flags that triggers the processing (at step 3) may be used to make the lag smaller, but at the risk of more inter-buffer timestamp inversions. In the extreme, accepting the processing with only one satisfied flag means "push a frame as soon as you receive it, timestamp sorting be damned".
I mention this to support my feeling that the lag/timestamp-inversion balance is inherent to your problem: except for absolutely equal frame rates, there will be no perfect solution in which neither side is sacrificed.
Since a "solution" will be an act of balancing, any solution will require gathering/using extra information to help decisions (e.g. that "array of flags"). If what I suggested sounds silly for your case (may well be, the details you chose to share aren't too many), start thinking what metrics will be relevant for your targeted level of "quality of experience" and use additional data structures to help gathering/processing/using those metrics.

QProgressBar causing bad performance in Qt5?

I'm developing a program which parses a file (365000 lines), trying to match some keywords after reading each line. This computation, along with the update of my QProgressBar, is done in another thread using QThread. Everything works fine except for the performance, especially when I update the QProgressBar. I timed the parsing and the result is just STUNNING. When I emit a signal to update the QProgressBar, the program takes around 45 seconds, but when I do not emit the signal for the QProgressBar update, the program takes around 0.40 sec =/
from PyQt5 import QtCore, QtWidgets, QtGui
import sys
import time

liste = ["failed", "exception"]

class ParseFileAsync(QtCore.QThread):
    match = QtCore.pyqtSignal(str)
    PBupdate = QtCore.pyqtSignal(int)
    PBMax = QtCore.pyqtSignal(int)

    def run(self):
        cpt = 0
        with open("test.txt", "r") as fichier:
            fileLines = fichier.readlines()
            lineNumber = len(fileLines)
            self.PBMax.emit(lineNumber)
            t0 = time.time()
            for line in fileLines:
                cpt += 1
                self.PBupdate.emit(cpt)
                for element in liste:
                    if element in line:
                        self.match.emit(line)
            finalTime = time.time() - t0
            print("over :", finalTime)

class Ui_MainWindow(QtWidgets.QMainWindow):
    def __init__(self):
        super().__init__()
        self.setupUi(self)
        self.thread = ParseFileAsync()
        self.thread.match.connect(self.printError)
        self.thread.PBupdate.connect(self.updateProgressBar)
        self.thread.PBMax.connect(self.setMaximumProgressBar)
        self.pushButton_GO.clicked.connect(self.startThread)

    def printError(self, line):
        self.textEdit.append(line)

    def updateProgressBar(self, value):
        self.progressBar.setValue(value)

    def setMaximumProgressBar(self, value):
        self.progressBar.setMaximum(value)

    def startThread(self):
        self.thread.start()
Console output:
over : 44.49321101765038 //QProgressBar updated
over : 0.3695987798147516 //QProgressBar not updated
Am I missing something or is that expected ?
EDIT:
I followed jpo38's and Matteo's very good advice and now update the QProgressBar less frequently. The progression is still smooth and the performance is very good (around one second with this implementation). PSB:
class ParseFileAsync(QtCore.QThread):
    match = QtCore.pyqtSignal(str)
    PBupdate = QtCore.pyqtSignal(int)
    PBMax = QtCore.pyqtSignal(int)

    def run(self):
        with open("test_long.log", "r") as fichier:
            fileLines = fichier.readlines()
            self.lineNumber = len(fileLines)
            self.PBMax.emit(self.lineNumber)
            if (self.lineNumber < 30):
                self.parseFile(fileLines, False)
            else:
                self.parseFile(fileLines, True)

    def parseFile(self, fileLines, isBig):
        cpt = 0
        if (isBig):
            for line in fileLines:
                cpt += 1
                if (cpt % (int(self.lineNumber / 30)) == 0):
                    self.PBupdate.emit(cpt)
                for element in liste:
                    if element in line:
                        self.match.emit(line)
            self.PBupdate.emit(self.lineNumber)  # To avoid QProgressBar stopping at 99%
        else:
            for line in fileLines:
                cpt += 1
                self.PBupdate.emit(cpt)
                for element in liste:
                    if element in line:
                        self.match.emit(line)
Updating a QProgressBar too often will definitely lead to performance issues. You should update the progress bar less often; you don't want/need to do that for every iteration, i.e. 365000 times. When you read one line out of 365000, you have progressed by roughly 0.0002%; there is no need to update the GUI for that.
Showing progression to the user always has a cost, and we accept it because the user prefers to wait a bit more and have progress information. However, showing progression must not multiply the processing time by 100, as you experienced.
You can either emit the signal to update the progress bar only when the progression has changed significantly (for instance, every time the percentage value cast to an int changes; you can store the progression as an int value to check that, or test if (line % (fileLines/100) == 0), for instance; this will significantly decrease the cost of the progress bar updates).
Or you could start a QTimer to update the progress bar every 100 ms, for instance. Then you don't emit any signal from the for loop and just save the progression value to be used when the timer times out.
If the file size is always 365000 lines, you could also decide to emit the signal every 1000 lines, for instance (if line % 1000 == 0). But the two earlier solutions are preferable because they will fix your performance issues whatever the file size is.
That's a classic problem; every experienced developer I know has a story about a supposedly long process where most of the time was actually taken by the progress bar update (most of these stories end with removing the progress bar completely).
The point is that, very often, the "unit of work" you process (in your case the parsing of a line) is way smaller than the cost of a progress bar update - GUIs are fast compared to the user's reflexes, but they are still quite heavyweight when compared to, say, parsing a single line (especially if cross-thread machinery is involved).
In my experience, there are three common solutions:
if you notice that your process in general is "fast", you just drop the progress bar (or replace it with one of those unhelpful "forward and backwards" progress bars, just to show that your program isn't hung if it is sometimes fed files way bigger than usual);
you update it less frequently; you can emit your signal every 1/100 of your total progress; the advancement will still be smooth and you should not have performance problems (100 updates aren't going to take much time, although I guess they'll still dominate the time taken by your process if it normally takes 0.40 seconds);
you decouple the progress bar update completely from the code that actually does the work. Instead of emitting a signal, update an integer class member with the current progress (which should be quite cheap); in the GUI thread, use a timer to update the progress bar according to this member every - say - 0.5 seconds (see the sketch below). You can even be smarter and avoid showing the progress bar at all if the process finishes before the first timer tick.
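A minimal sketch of that third option with PyQt5 follows; the widget and attribute names are invented for illustration and the parsing loop is stubbed out.

from PyQt5 import QtCore, QtWidgets
import sys

class Worker(QtCore.QThread):
    def __init__(self):
        super().__init__()
        self.progress = 0                      # plain int member, cheap to update

    def run(self):
        for i in range(365000):
            # ... parse one line here ...
            self.progress = i + 1              # no signal emitted per line

class Window(QtWidgets.QWidget):
    def __init__(self):
        super().__init__()
        self.bar = QtWidgets.QProgressBar(self)
        self.bar.setMaximum(365000)
        self.worker = Worker()
        # Poll the worker's progress twice a second from the GUI thread.
        self.timer = QtCore.QTimer(self)
        self.timer.timeout.connect(self.refresh)
        self.worker.finished.connect(self.refresh)     # show the final value
        self.worker.finished.connect(self.timer.stop)
        self.worker.start()
        self.timer.start(500)

    def refresh(self):
        self.bar.setValue(self.worker.progress)

if __name__ == "__main__":
    app = QtWidgets.QApplication(sys.argv)
    w = Window()
    w.show()
    sys.exit(app.exec_())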
