I am currently working on a project and I am looking for a technique that will solve this scenario:
There are people waiting in a room to take one of many tests. There can be multiple tests assigned to each person. Each test may be given at one or more locations at a given time, but only one person can take the test at a given location at a time.
It is relatively simple to just assign people to tests randomly, and eventually all the tests get done, but what kind of system could I use so that people wait a relatively equal amount of time? If I assign them randomly, a person who only has to take one test could end up behind people who have to take five.
I have thought about letting people with fewer tests go first, but I have not tested that yet and it seems like it would still be unfair. To add complexity, I am also adding a feature that allows the priority to be changed.
To be clear, this is not a homework assignment. This project is still in the logical development phase, so I haven't really started programming to compare different techniques. The closest thing I have thought of is a system that acts somewhat like a thread pool, but I have not found a detailed description of the techniques behind a thread pool, and using one directly seems like it would add a good deal of overhead and still run into problems. I have also looked into the C# Queue class, but I haven't thought of a way to extend its capability.
Anyone have any ideas or suggestions?
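Most languages, C# included, have a priority queue you could use (.NET 6 and later ship System.Collections.Generic.PriorityQueue<TElement, TPriority>; it is not thread-safe, so guard it with a lock if several threads share it). Place the test takers on the queue, and remove one (and assign one test to them) whenever a location frees up; if the test taker has more tests left to take, put them back on the queue.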
One way to balance your execution times is to assign a random priority to your "test-takers," e.g.
testTaker.setPriority(random.Next(CONSTANT * testTaker.numberOfRemainingTests))
Then recompute the test taker's priority whenever they complete a test. Assuming higher values are dequeued first, this favors assigning tests to test takers with more tests left, while the random element approximates fairness. CONSTANT ought to be greater than the number of test takers to ensure sufficient randomness.
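A minimal single-threaded sketch of the requeue-with-random-priority idea, using Python's heapq in place of a C# priority queue. Assumptions, all hypothetical: CONSTANT is 100, exactly one location frees up at a time, and each person's tests are taken in list order; the names assign_tests and random_priority are illustrative only.

```python
import heapq
import random

# Hypothetical value; the answer suggests it should exceed the number of test takers.
CONSTANT = 100


def random_priority(rng, remaining_tests):
    # Uniform in [0, CONSTANT * remaining): more remaining tests -> higher expected priority.
    return rng.randrange(CONSTANT * remaining_tests)


def assign_tests(takers, rng):
    """Return the order in which (taker, test) pairs are handed out.

    takers maps a name to the list of tests that person still has to take.
    Simplification: one location frees up at a time, tests taken in list order.
    """
    remaining = {name: list(tests) for name, tests in takers.items()}
    heap = []
    seq = 0  # tie-breaker so equal priorities never compare names
    for name, tests in remaining.items():
        if tests:
            # heapq is a min-heap, so negate to pop the highest priority first.
            heapq.heappush(heap, (-random_priority(rng, len(tests)), seq, name))
            seq += 1

    order = []
    while heap:  # each iteration = "a location frees up"
        _, _, name = heapq.heappop(heap)
        order.append((name, remaining[name].pop(0)))
        if remaining[name]:
            # Re-draw the priority after each completed test and requeue.
            heapq.heappush(heap, (-random_priority(rng, len(remaining[name])), seq, name))
            seq += 1
    return order


order = assign_tests({"ann": ["math"], "bob": ["math", "art", "bio"]}, random.Random(1))
print(order)
```

Passing a seeded random.Random makes a run reproducible; a real system would also need to check which test locations are actually free before picking a test.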
I am not sure how to express my scenario using activity diagrams:
What I am trying to visualise is the fact that:
A message is received
Two independent and concurrent actions take place: logging of the message and processing the message
Logging always takes less time than processing
The first activity in the diagram is correct in the sense that the actions are independent, but it does not convey the fact that logging is guaranteed to take less time than processing.
The second activity in the diagram is not correct because, even if logging completes before processing, it looks as though processing depends on logging finishing first, and that does not represent reality.
Here is a non-computer related example:
You are a novice in birdwatching, trying to make your first notes in your notebook about birds passing by
A flock of birds approaches, you try to recognise as many details as possible
You want to write the details down in your notebook, but you begin to realise that your theoretical background does not work in practice: what should be a quick scribble amounts to nothing in the end because you did not recognise anything
In the meantime, the birds majestically fly away without waiting for you, and the activity is over
Or maybe you did actually write it down; it took you only a moment and the birds are still nearby, slowly flying away, ending the activity after some time
Or maybe you were so awestruck that you just kept watching them without taking any notes; they fly away, disappearing over the horizon, ending the activity
After a few hours, you have enough notes and you come home very happy - maybe you did not capture everything but this was enough to make you smile anyway
I could always add a comment to the diagram to express all this somehow, but is there a more structured way to express what I described in an activity diagram? If not an activity diagram, what kind of diagram would be better suited, in your opinion? Thank you.
Your first diagram assumes that the duration of logging is always shorter than processing:
If this assumption is correct, the upper flow reaches the flow-final node, and the remaining flows continue until one of them reaches the activity-final node. Here, the processing continues and the activity ends when the processing ends. This is exactly what you want.
But if execution ever deviates from this assumption and logging is delayed for any reason, the end of the processing reaches the activity-final node first, which immediately interrupts all other ongoing actions in the activity. So logging would not complete. Maybe that's not a problem for you, but in most cases auditors expect logs to be complete.
A safer way is to add a join node:
The advantage is that the activity does not depend on any assumptions. It will always work:
whenever logging is faster, the token on that flow waits at the join node; as soon as processing is finished, the join fires and the outgoing token (safely) reaches the end. This is exactly what you currently expect.
if the logging is exceptionally slower, no problem: the processing will be over, but the activity will wait for the logging to be completed.
This robust notation makes logging like Schrödinger's cat in its box: we don't have to know which action takes longer. At the end of the activity, both actions are completed.
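In code, the fork/join pattern corresponds roughly to starting both actions concurrently and then waiting for both before finishing. A minimal Python sketch with concurrent.futures; log_message, process_message and the LOG list are hypothetical stand-ins for the real logging and processing actions.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

LOG = []  # hypothetical log sink


def log_message(msg):
    LOG.append(msg)


def process_message(msg):
    time.sleep(0.05)  # stand-in for the (usually longer) processing work
    return msg.upper()


def handle(msg):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Fork: both actions start concurrently and independently.
        log_future = pool.submit(log_message, msg)
        proc_future = pool.submit(process_message, msg)
        # Join: wait for BOTH, whichever is slower, before the "activity" ends.
        wait([log_future, proc_future])
    log_future.result()  # re-raise a logging error instead of silently dropping it
    return proc_future.result()


print(handle("hello"))  # HELLO
```

Nothing here depends on logging being faster; if it happens to be slower, handle simply returns later, which mirrors the join node.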
Time in activity diagrams?
Activity diagrams are not really meant to express timing and duration. It's about the flow of control and the synchronization.
However, if time is important to you, you could:
visually make one action shorter than the other. This is super-ambiguous and absolutely meaningless from a formal UML point of view, but it's intuitive when readers see the parallel flows (a kind of subliminal communication ;-) ).
add a comment note to express your assumption in plain English. This has the advantage of being very clear and unambiguous.
use UML duration constraints. These are often used in timing diagrams, sometimes in sequence diagrams, but generally not in activity diagrams (personally I have never seen it, but the UML spec doesn't exclude it either).
Time is a very general concept in the UML spec, and it is defined independently of any particular diagram. For example:
8.4.4.2: A Duration is a value of relative time given in an implementation specific textual format. Often a Duration is a non-negative integer expression representing the number of “time ticks” which may elapse during this duration.
8.5.1: An Interval is a range between two values, primarily for use in Constraints that assert that some other Element has a value in the given range. Intervals can be defined for any type of value, but they are especially useful for time and duration values as part of corresponding TimeConstraints and DurationConstraints.
In your case you have a duration observation for the processing (e.g. d), and a duration constraint for the logging (e.g. 0..d).
8.5.4.2: An IntervalConstraint is shown as an annotation of its constrainedElement. The general notation for Constraints may be used for an IntervalConstraint, with the specification Interval denoted textually (...).
Unfortunately, the spec says little more. The only graphical examples are for messages in sequence diagrams (Fig 8.5 and 17.5) and for timing diagrams (Fig 17.28 to 17.30). The notation could nevertheless be extrapolated to activity diagrams, but it would be so unusual that I'd rather recommend the comment note.
How is a priority queue a queue data structure? Since it doesn't follow FIFO, shouldn't it be called a priority array or priority linked list, mainly because priority queues don't follow FIFO order like a regular queue?
In a priority queue, an element with high priority is served before an element with low priority.
'If two elements have the same priority, they are served according to their order in the queue'
I think this answers your question.
If you look at the most widely used implementations, priority queues are essentially heaps: they are arranged in a particular fashion based on a priority defined by the programmer, in a simple example ascending or descending order of integers.
Think of a priority queue as a queue where, rather than retrieving elements in the order you added them, you retrieve them based on how they compare with each other. In textbook examples this comparison is simply ascending or descending order. You can understand the ADT through an analogy from another Stack Overflow answer:
You're running a hospital and patients are coming in. There's only one
doctor on staff. The first man walks in - and he's served immediately.
Next, a man with a cold comes in and requires assistance. You add him
to the queue and he waits in line for the doctor to become available.
Next, a man with an axe in his head comes through the door. He is
assigned a higher priority because he is a higher medical liability.
So the man with the cold is bumped down in line. Next, someone comes
in with breathing problems. So, once again, the man with the cold is
bumped down in priority. This is called triage in the real world -
but in this case it's a medical line.
Implementing this in code would use a priority queue and a worker
thread (the doctor) to perform work on the consumable / units of work
(the patients).
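The hospital analogy maps directly onto a binary heap. A minimal Python sketch using heapq; the TriageQueue name and the numeric priorities (lower number = more urgent) are hypothetical. An arrival counter breaks ties, so patients with equal priority are served in arrival order, as in the "same priority" rule quoted above.

```python
import heapq
import itertools


class TriageQueue:
    """Priority queue on a binary heap.

    Lower number = more urgent. The arrival counter breaks ties, so patients
    with equal priority are served in the order they arrived (FIFO).
    """

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def add(self, patient, priority):
        heapq.heappush(self._heap, (priority, next(self._arrival), patient))

    def next_patient(self):
        _, _, patient = heapq.heappop(self._heap)
        return patient

    def __len__(self):
        return len(self._heap)


q = TriageQueue()
q.add("man with a cold", 3)
q.add("man with an axe in his head", 1)
q.add("person with breathing problems", 2)
q.add("second man with a cold", 3)
while q:
    print(q.next_patient())
```

The doctor is the loop at the bottom: it always takes the most urgent patient next, and the man with the cold keeps getting bumped down until no one more urgent is waiting.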
In a real system, instead of patients, you might have processes waiting to be scheduled on the CPU.
Read:
When would I use a priority queue?
In a queue, the natural ordering, given by how long an element has waited in line, can be considered the fairest: when you join a line waiting for something, first come, first served.
Sometimes, however, there is something special about some elements that might suggest they should be served sooner than others that waited longer. For example, we don’t always read our emails in the order we received them; we often skip newsletters or “funny” jokes from friends to read work-related messages first.
Likewise, when you develop or test an app, bugs are prioritized and teams work on them according to their severity. New bugs are discovered all the time, so new items keep being added to the list. Say a nasty authentication bug is found: you’d need it solved by yesterday! Moreover, the priority of bugs can change over time. For instance, your CEO might decide that you are going after the market share that mostly uses browser X, and you have a big feature launch next Friday, so you really need to solve that bug at the bottom of the list within a couple of days.
Priority queues are especially useful when we need to consume elements in a certain order from a dynamically changing list (such as the list of tasks to run on a CPU), so that at any time we can get the next element (according to a certain criterion), remove it from the list, and (usually) stop worrying about fixing anything for the other elements.
That’s the idea behind priority queues: they behave like regular, plain queues, except that the front of the queue is dynamically determined by some kind of priority. Introducing priority changes the implementation so profoundly that it deserves a special kind of data structure.
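The bug-tracking example needs priorities that can change after insertion. A minimal Python sketch using the lazy-deletion pattern from the Python heapq documentation: changing a priority marks the old heap entry as removed and pushes a new one. BugQueue, its method names and the bug names are hypothetical; lower severity number means "fix first".

```python
import heapq
import itertools

_REMOVED = object()  # sentinel marking a stale heap entry


class BugQueue:
    """Priority queue whose priorities can change (lower severity number = fix first)."""

    def __init__(self):
        self._heap = []
        self._entries = {}  # bug -> its live heap entry
        self._counter = itertools.count()  # FIFO tie-breaker, never compares bug names

    def add_or_update(self, bug, severity):
        if bug in self._entries:
            self._entries.pop(bug)[-1] = _REMOVED  # invalidate the old entry in place
        entry = [severity, next(self._counter), bug]
        self._entries[bug] = entry
        heapq.heappush(self._heap, entry)

    def pop(self):
        while self._heap:
            _, _, bug = heapq.heappop(self._heap)
            if bug is not _REMOVED:  # skip stale entries
                del self._entries[bug]
                return bug
        raise KeyError("pop from an empty BugQueue")


q = BugQueue()
q.add_or_update("typo on settings page", 3)
q.add_or_update("authentication bypass", 1)
q.add_or_update("layout glitch in browser X", 5)
q.add_or_update("layout glitch in browser X", 0)  # the CEO changed priorities
print([q.pop() for _ in range(3)])
```

Stale entries are only discarded when they reach the top of the heap, so an update stays O(log n) instead of requiring a search through the heap.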
I am designing a project management app for a factory. The app is expected to produce draft project plans. To schedule a task, the app should check three conditions:
task dependency - do not start before,
machine availability, and
shift work hours
I keep track of machine engagement in machine_allocations table:
machine_allocations
+------------+--------------+-----------------+---------------+
| machine_id | operation_id | start_timestamp | end_timestamp |
+------------+--------------+-----------------+---------------+
Shift hours follow a pattern.
Now, to find the earliest possible date-time for an operation I am thinking of a function:
function earliest_slot($machine_id, $for_duration, $no_sooner_than) {
// pseudo code
1. get records for the machine in question for after $no_sooner_than
2. put start and end timestamps into $unavailable array
3. add non-working times as new elements to the array
4. in a loop find timeslots which are not in the array
5. if a timeslot is found which is equal to or bigger than $for_duration, return that
}
My question is, is this a good approach? Are there simpler ways to do this?
Finding the earliest date-time for one operation at a time may not give you the best result. Consider the example where operation A uses machine 1 for a long time, operation B uses machine 1 for a short time and operation C uses machine 2 for a short time, but operation C must be done after B.
In this case, it is better to schedule B before A on machine 1, but your approach would not achieve this. Of course, writing and using software to manage this would be more difficult than what you have suggested, so you need to decide whether the benefit is worth the extra effort.
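To make the A/B/C example concrete, here is a small greedy placer that schedules operations one at a time in a given order. The durations (10, 1 and 1 hours) are hypothetical, and the schedule function is illustrative: each operation starts when its machine is free and its predecessor has finished, which is roughly what one-at-a-time earliest-slot placement does here.

```python
def schedule(order, ops):
    """Place operations one at a time, in the given order, greedily.

    ops maps an operation name to (machine, duration, predecessor or None).
    Each operation starts as soon as its machine is free (after the last
    operation already placed on it) and its predecessor has finished.
    Predecessors must appear earlier in `order`. Returns finish times.
    """
    machine_free = {}
    finish = {}
    for name in order:
        machine, duration, pred = ops[name]
        ready = finish[pred] if pred is not None else 0
        start = max(machine_free.get(machine, 0), ready)
        finish[name] = start + duration
        machine_free[machine] = finish[name]
    return finish


# Hypothetical durations (hours) for the example above.
ops = {
    "A": ("machine 1", 10, None),
    "B": ("machine 1", 1, None),
    "C": ("machine 2", 1, "B"),  # C must run after B
}
print(schedule(["A", "B", "C"], ops))  # {'A': 10, 'B': 11, 'C': 12}
print(schedule(["B", "A", "C"], ops))  # {'B': 1, 'A': 11, 'C': 2}
```

Scheduling B first lets C finish at hour 2 instead of 12 and shortens the whole plan from 12 to 11 hours, which the one-operation-at-a-time approach misses if A happens to be placed first.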
Have a look at Scheduling, Job Shop Scheduling and Scheduling algorithm.
First you need to think about what sort of information you can collect about tasks (such as dependencies, priorities, deadlines) and then decide how best to put it together.
You may find that an approach like the one you propose is good enough in your case. My addition to your proposed algorithm would be to sort the list of existing machine operations to make searching through them faster: you can stop as soon as you find a time where your operation fits, because it is guaranteed to be the earliest time.
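A minimal Python sketch of the question's earliest_slot with the sorted scan suggested above. Assumption: steps 1 to 3 of the pseudocode (querying machine_allocations and adding non-working shift hours) are done by the caller, which passes in a busy list of (start, end) datetimes; the at helper and the date are hypothetical.

```python
from datetime import datetime, timedelta


def earliest_slot(busy, for_duration, no_sooner_than):
    """Earliest start >= no_sooner_than with a free gap of at least for_duration.

    busy: (start, end) pairs when the machine is unavailable, i.e. its rows
    from machine_allocations plus non-working shift hours (assumed to be
    built by the caller). Intervals may overlap.
    """
    candidate = no_sooner_than
    for start, end in sorted(busy):  # sorted by start time
        if end <= candidate:
            continue  # interval lies entirely before the candidate start
        if start - candidate >= for_duration:
            return candidate  # the gap before this interval is big enough
        candidate = end  # gap too small (end > candidate here): try after it
    return candidate  # free after the last busy interval


def at(hour):
    return datetime(2024, 1, 1, hour)  # hypothetical date


busy = [(at(11), at(12)), (at(8), at(10))]  # unsorted on purpose
print(earliest_slot(busy, timedelta(hours=1), at(8)))  # 2024-01-01 10:00:00
print(earliest_slot(busy, timedelta(hours=2), at(8)))  # 2024-01-01 12:00:00
```

Because the intervals are scanned in start order, the first gap that fits is the earliest one, so the loop can return immediately.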
A relatively simple extension would be a priority system that allows you to bump lower-priority tasks forward (which may require the adjustment of their dependencies as well), but more complicated algorithms would consider multiple tasks at once and try to optimise the outcome. In the end it comes down to what's appropriate for your specific problem.
That depends on when you want to plan the work. If it is planned before the machines start working, then maybe a branch-and-bound algorithm or something similar (maybe dynamic programming). If work has to be planned while the machines are running and you cannot tell in advance which jobs will be performed, then I can't think of a way to compute an optimal solution. Maybe put the next job on the machine with the smallest maximum time? Maybe a dynamic version of the Bellman-Ford algorithm (if you have a couple of production layers)? Hard to say.
I would try a couple of approaches and determine which is best. Then you can write an article about it :)
Let's say I am going to run process X and see how long it takes.
I am going to save into a database a date I ran this process, and the time it took. I want to know what to put into the DB.
Process X almost always runs under 1500ms, so this is a short process. It usually runs between 500 and 1500ms, quite a range (3x difference).
My question is, how many "runs" should be saved into the DB as a single run?
Every run saved into the DB as its own row?
5 runs, averaged, then save that time?
10 runs, averaged?
20 runs, remove anything more than 2 standard deviations away, and save everything inside that range?
Does anyone have good information to back up any of these options?
Save the data for every run in its own row. Then later you can use and analyze the data however you like, i.e., all the other options you listed can be performed after the fact. It's not really possible for someone else to draw meaningful conclusions about how to average or analyze the data without knowing more about what's going on.
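A minimal sketch of "one row per run" using Python's built-in sqlite3 with an in-memory database. The runs table, its columns, and process_x are hypothetical stand-ins for your real schema and process.

```python
import sqlite3
import time
from datetime import datetime, timezone


def process_x():
    sum(range(100_000))  # hypothetical stand-in for the real "process X"


def record_runs(conn, n_runs):
    """Time process_x n_runs times and store every run as its own row."""
    conn.execute("CREATE TABLE IF NOT EXISTS runs (ran_at TEXT, duration_ms REAL)")
    for _ in range(n_runs):
        start = time.perf_counter()
        process_x()
        elapsed_ms = (time.perf_counter() - start) * 1000
        conn.execute(
            "INSERT INTO runs (ran_at, duration_ms) VALUES (?, ?)",
            (datetime.now(timezone.utc).isoformat(), elapsed_ms),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
record_runs(conn, 30)
# Any of the other options can be computed after the fact, for example:
print(conn.execute(
    "SELECT COUNT(*), MIN(duration_ms), AVG(duration_ms), MAX(duration_ms) FROM runs"
).fetchone())
```

Averages, trimmed ranges, or per-day summaries then become queries over the raw rows rather than decisions baked in at write time.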
The fastest run is the one that most accurately times only your code.
All slower runs are slower because of noise introduced by the operating system scheduler.
The variance you experience is going to differ from machine to machine, and even on identical machines, the set of runnable processes will introduce noise.
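Python's timeit module supports this "take the fastest run" approach directly. A minimal sketch; the statement being timed is a hypothetical workload, and NUMBER is an arbitrary choice of executions per timing.

```python
import timeit

# Hypothetical workload; replace with a call to the code you are measuring.
STMT = "sorted(range(10_000, 0, -1))"
NUMBER = 100  # executions per timing

timings = timeit.repeat(STMT, repeat=5, number=NUMBER)  # 5 totals, in seconds
best_ms = min(timings) / NUMBER * 1000  # fastest run = least scheduler noise
print(f"best: {best_ms:.4f} ms per call")
```

Only the minimum is reported; the other four totals are assumed to include extra noise from the OS and other processes.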
None of the above. Bran is close, though. You should save every measurement, but don't average them. The average (arithmetic mean) can be very misleading in this type of analysis, because some of your measurements will be much longer than the others. This happens because things can interfere with your process, even on 'clean' test systems. It can also happen because your process may not be as deterministic as you might think.
Some people think that simply taking more samples (running more iterations) and averaging the measurements will give them better data. It doesn't. The more you run, the more likely it is that you will encounter a perturbing event, making the average overly high.
A better way is to take as many measurements as you can (time permitting). 100 is not a bad number, but 30-ish can be enough.
Then sort them by magnitude and graph them. Note that this is generally not a normal distribution. Compute some simple statistics: mean, median, min, max, lower quartile, upper quartile.
Contrary to some guidance, do not throw away outlying values or 'outliers'. These are often the most interesting measurements. For example, you may establish a nice baseline, then look for departures from it. Understanding these departures will help you fully understand how your process works, how the system affects your process, and what can interfere with it. It will often readily expose bugs.
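A minimal sketch of these summary statistics with Python's statistics module (quantiles requires Python 3.8+). The sample durations are hypothetical, chosen to include one perturbed run.

```python
import statistics


def summarize(samples_ms):
    """Summary statistics for a list of run durations (milliseconds)."""
    data = sorted(samples_ms)
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "min": data[0],
        "q1": q1,
        "median": statistics.median(data),
        "q3": q3,
        "max": data[-1],
        "mean": statistics.mean(data),
    }


# Hypothetical samples with one perturbed run.
samples = [510, 520, 530, 540, 550, 560, 570, 580, 590, 1500]
print(summarize(samples))
```

With these numbers the single 1500 ms run pulls the mean up to 645 ms while the median stays at 555 ms, which is the distortion described in the answer.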
Depends what kind of data you want. I'd say one line per run initially, then analyze the data, go from there. Maybe store a min/max/average of X runs if you want to consolidate it.
http://en.wikipedia.org/wiki/Sample_size
Bryan is right: you need to investigate more. If your code has that much variance even "most" of the time, you might have a lot of fluctuation in your test environment because of other processes, OS paging, or other factors. If not, it seems that your code paths do wildly varying amounts of work, and a single number per run will not tell you much about the performance of such a multi-modal system. So I'd say isolate your setup as much as possible, run at least 30 trials, and get a feel for what your performance curve looks like. Once you have that, you can use that Wikipedia page to work out how many trials you need to run per code change to see whether performance has increased or decreased with some level of statistical significance.
While saying, "Save every run," is nice, it might not be practical in your case. However, I do think that storing only the average eliminates too much data. I like storing the average of ten runs, but instead of storing just the average, I'd also store the max and min values, so that I can get a feel for the spread of the data in addition to its center.
The max and min information in particular will tell you how often corner cases arise. Is the 1500ms case a one-in-1000 outlier? Or is it something that recurs on a regular basis?
While reading ahead in some new material, I came across the term "Philosophers Synchronization Algorithm", but I could not understand it. Can anyone help me understand what it is?
Thanks
It's just one of the many examples used to describe what can happen in a concurrent world in which you have many entities that can perform actions on shared objects without caring about each other.
The problem is simple: you have X philosophers seated around a round table (each with a fictional dish of spaghetti to eat) and X forks, one between every pair of philosophers.
The rules of the game impose that a philosopher needs two forks to be able to consume his spaghetti and the example shows how simply allowing any of them to try to eat without caring about anyone else can lead to
deadlocks: every philosopher takes his left fork and then they all wait for another fork, but no selfish philosopher will drop his, so they wait forever
starvation: there's no guarantee that every philosopher will eventually be able to eat (check the Wikipedia page for an exact explanation of why)
livelocks: another classic example. If a rule requires philosophers to try to get the second fork for at most 5 minutes after getting the first one, and then release the one they already hold, all of them can end up exactly in sync, repeatedly taking one fork and releasing it when the time expires
In your question you clearly refer to an algorithm related to this problem (so I suppose an algorithm meant to solve the problems just described); Wikipedia offers 4 of them.
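As one concrete example, here is a minimal Python sketch of the resource-hierarchy approach: every philosopher picks up the lower-numbered fork first, so the circular wait behind the deadlock cannot form. The dine function and its parameters are illustrative, it assumes at least two philosophers, and it prevents deadlock without by itself guaranteeing fairness.

```python
import threading


def dine(num_philosophers=5, meals_each=3):
    """Each philosopher eats meals_each times; returns meals eaten per philosopher.

    Assumes num_philosophers >= 2 (one philosopher would need the same fork twice).
    """
    forks = [threading.Lock() for _ in range(num_philosophers)]
    meals = [0] * num_philosophers

    def philosopher(i):
        left, right = i, (i + 1) % num_philosophers
        # Resource hierarchy: always pick up the lower-numbered fork first.
        # The last philosopher therefore reaches for fork 0 before fork n-1,
        # which breaks the circular wait that causes the deadlock.
        first, second = min(left, right), max(left, right)
        for _ in range(meals_each):
            with forks[first]:
                with forks[second]:
                    meals[i] += 1  # eating while holding both forks

    threads = [threading.Thread(target=philosopher, args=(i,))
               for i in range(num_philosophers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return meals


print(dine())  # [3, 3, 3, 3, 3]
```

If every philosopher instead grabbed the left fork first, all five could each hold one fork and wait forever, which is exactly the deadlock case above.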