Algorithm to process jobs with same priority

I am solving exercise problems from a book called Algorithms by Papadimitriou and Vazirani.
The following is the question:
A server has n customers waiting to be served. The service time required by each customer is known in advance: it is ti minutes for customer i. So if, for example, the customers are served in order of increasing i, then the ith customer has to wait for Sum(j = 1 to i) tj minutes.
We wish to minimize the total waiting time. Give an efficient algorithm for the same.
My Attempt:
I thought of a couple of approaches but couldn't decide which is best, or whether some other approach beats both of mine.
Approach 1:
Serve them in round-robin fashion with a time slice of 5. However, I need to be careful when deciding the time slice: it shouldn't be too high or too low. So I thought of selecting the time slice as the average of the serving times.
Approach 2:
Assume jobs are sorted according to the time they take and are stored in an array A[1...n]
First serve A[1] then A[n] then A[2] then A[n-1] and so on.
I can't really decide which of the two is the better solution for this problem. Am I missing something?
Thanks,
Chander

You can solve this problem by adding a sorting step and improving on your round-robin approach.
First sort the customers by service time.
Now, instead of just giving each customer a time slice t in round-robin order, also check whether the customer has less than t/2 of service time remaining; if so, complete their service immediately.
So:
for each customer in the sorted list, starting from the first:
    serve the customer for time t
    if their remaining time is < t/2, complete their service now
    else move on to the next customer

Let me assume "total waiting time" means the sum of the time each customer waits until the server finishes serving him/her. Assume the customers are served in order of increasing i, so customer C1 waits t1 minutes, customer C2 waits t1+t2 minutes, customer C3 waits t1+t2+t3 minutes, and so on up to customer Cn, who waits t1+t2+...+tn minutes.
or:
C1 waits: t1
C2 waits: t1+t2
C3 waits: t1+t2+t3
...
Cn waits: t1+t2+t3+...+tn
The total waiting time adds up to n*t1 + (n-1)*t2 + ... + 1*tn
Again, this is based on the assumption that the customers are served in order of increasing i.
Now, which customer do you want to serve first?
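Spelling out where that hint leads (my addition, not part of the original answer): t1 is multiplied by n and tn by 1, so the sum is minimized by giving the smallest service times the largest coefficients, i.e. serving customers in increasing order of service time (shortest job first). A minimal Python sketch:

def min_total_waiting_time(t):
    # serve in increasing order of service time (shortest job first)
    total, elapsed = 0, 0
    for ti in sorted(t):
        elapsed += ti      # this customer finishes at time `elapsed`
        total += elapsed   # and has waited `elapsed` minutes in total
    return total

print(min_total_waiting_time([3, 1, 2]))  # serve 1, 2, 3 -> 1 + 3 + 6 = 10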

Operations on Two Streams of Data - Design Algorithm

I have seen this algorithm question or variants of it several times but have not found or been able to determine an optimal solution to the problem. The question is:
You are given two queues where each queue contains {timestamp, price}
pair. You have to print "price 1, price 2" pair for all those
timestamps where abs(ts1-ts2) <= 1 second where ts1 and price1 are
from the first queue and ts2 and price2 are from the second queue.
How would you design a system to handle these requirements?
Then a follow-up to this question: what if one of the queues is slower than the other (its data is delayed)? How would you handle this?
You can do this in a similar fashion to the merging algorithm from merge sort, only doubled.
I'm going to describe an algorithm in which I choose queue #1 to be my "main queue." This will only provide a partial solution; I'll explain how to complete it afterwards.
At any time you keep one entry from each queue in memory. Whenever the two entries you have uphold your condition of being less than one second apart, print out their prices. Whether or not you did, discard the one with the lower time stamp and fetch the next entry from its queue. If at any point the time stamp from queue #1 is lower than that from queue #2, discard entries from queue #1 until that is no longer the case. If both entries carry the same time stamp, print the pair and advance the one from queue #1. Repeat until done.
This will print out all the pairs of "price1, price2" whose corresponding ts1 and ts2 uphold that 0 <= ts1 - ts2 <= 1.
Now, for the other half, do the same only this time choose queue #2 as your "main queue" (i.e. do everything I just said with the numbers 1 and 2 reversed) - except don't print out pairs with equal time stamps, since you've already printed those in the first part.
This will print out all the pairs of "price1, price2" whose corresponding ts1 and ts2 uphold that 0 < ts2 - ts1 <= 1, which is like saying 0 > ts1 - ts2 >= -1.
Together you get the printout for all the cases in which -1 <= ts1 - ts2 <= 1, i.e. in which abs(ts1 - ts2) <= 1.
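A minimal Python sketch of one such pass, with queue #1 as the "main queue" (names are mine; queues are modeled as lists of (timestamp, price) tuples in ascending timestamp order):

def one_pass(q1, q2, include_equal=True):
    i, j = 0, 0
    while i < len(q1) and j < len(q2):
        ts1, p1 = q1[i]
        ts2, p2 = q2[j]
        if ts1 < ts2:
            i += 1                # queue #1 is behind: discard its entry
        elif ts1 - ts2 <= 1:      # 0 <= ts1 - ts2 <= 1 holds
            if ts1 > ts2 or include_equal:
                print(p1, p2)
            if ts1 == ts2:
                i += 1            # equal stamps: advance queue #1
            else:
                j += 1            # discard the lower time stamp
        else:
            j += 1                # queue #2 entry can no longer match

one_pass(q1, q2) is the first half; one_pass(q2, q1, include_equal=False) is the second half (note it prints the prices in the order price2, price1).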
In addition to the queues, use two hashmaps (each exclusive to one queue):
1. As soon as a new item arrives, strip the seconds out of its timestamp and use the result as the key into the corresponding hashmap.
2. Using the very same key, retrieve all the items in the other hashmap.
3. One by one, check whether each retrieved item is within 1 second of the item from step 1.
Note that this will fail to detect items that are within one second but in different minutes: 10:00:59 and 10:01:00 will not be matched.
To solve this:
for items like XX:XX:59 you will need to hit the other hashmap twice, using keys XX:XX and XX:XX+1.
for items like XX:XX:00 you will need to hit the other hashmap twice, using keys XX:XX and XX:XX-1.
Note: do a date addition (not a plain arithmetic one), since it will automatically deal with things like 01:59:59 + 1 = 02:00:00, or Monday 1 23:59:59 becoming Tuesday 2 00:00:00.
BTW, this algorithm also deals with the delay issue.
The speed of the queues does not matter at all if the algorithm is based on the comparison of timestamps alone. If one queue is empty and you cannot proceed, just check periodically until you can continue.
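A minimal Python sketch of this scheme (names are mine; items are (datetime, price) pairs, and each arrival is passed its own map first):

import datetime as dt

def handle(item, own_map, other_map):
    ts, price = item
    key = ts.replace(second=0, microsecond=0)    # strip the seconds out
    own_map.setdefault(key, []).append(item)
    keys = {key}
    if ts.second == 59:                          # minute-boundary cases
        keys.add(key + dt.timedelta(minutes=1))  # date addition, not arithmetic
    if ts.second == 0:
        keys.add(key - dt.timedelta(minutes=1))
    for k in keys:
        for other_ts, other_price in other_map.get(k, []):
            if abs((ts - other_ts).total_seconds()) <= 1:
                print(price, other_price)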
You can solve this by managing a list for one of the queues. In the algorithm below the first queue was chosen, so the list is called l1. It works like a sliding window.
1. Dequeue the 2nd queue: d2.
2. While the timestamp of the head of l1 is smaller than that of d2 and the difference is greater than 1: remove the head from l1.
3. Go through the list and print all the pairs l1[i].price, d2.price as long as the difference of the timestamps is at most 1. If you don't reach the end of the list, continue with step 1.
4. Get the next element from the first queue and add it to the list. If the difference of the timestamps is at most 1, print the prices and repeat; if not, continue with step 1.
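A Python sketch of this sliding-window loop (my transcription; q1 is a collections.deque of (timestamp, price) pairs, q2 any iterable of the same):

from collections import deque

def match(q1, q2):
    l1 = deque()                           # the sliding window over queue #1
    for ts2, p2 in q2:                     # step 1: dequeue the 2nd queue
        while l1 and ts2 - l1[0][0] > 1:   # step 2: drop heads too far behind
            l1.popleft()
        for ts1, p1 in l1:                 # step 3: pairs inside the window
            if abs(ts1 - ts2) <= 1:
                print(p1, p2)
        while q1 and q1[0][0] - ts2 <= 1:  # step 4: pull new first-queue items
            ts1, p1 = q1.popleft()
            l1.append((ts1, p1))
            if abs(ts1 - ts2) <= 1:
                print(p1, p2)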
Here is my solution; you need the following services.
Design a service to read messages from Queue1 and push the data to a DB.
Design another service to read messages from Queue2 and push the data to the same DB.
Design another service to read the data from the DB and print the result at whatever frequency the result is needed.
Edit
The system above was designed keeping the points below in mind:
Scalability: if the load on the system increases, the number of services can be scaled up.
Slowness: as already mentioned, if one queue is slower than the other, chances are the first queue receives more messages than the second and on its own cannot produce the desired output; buffering everything in the DB absorbs the delay.
Output frequency: if the requirement changes in the future and we want to show a 1-hour difference instead of a 1-second difference, that is also very much possible.
Get the first element from both queues.
Compare the timestamps. If within one second, output the pair.
From the queue that gave the earlier timestamp, get the next element.
Repeat.
EDIT:
After @maraca's comment, I had to rethink my algorithm. And yes, if there are multiple events within a second on both queues, it will not produce all combinations.
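For reference, a direct Python transcription of those four steps (names mine), with the caveat from the edit that it can miss combinations when several events share the same second:

from collections import deque

def print_pairs(q1, q2):
    # q1, q2: deques of (timestamp, price) in ascending timestamp order
    e1, e2 = q1.popleft(), q2.popleft()   # first element from both queues
    while True:
        if abs(e1[0] - e2[0]) <= 1:       # within one second: output the pair
            print(e1[1], e2[1])
        if e1[0] <= e2[0]:                # advance the earlier timestamp
            if not q1:
                break
            e1 = q1.popleft()
        else:
            if not q2:
                break
            e2 = q2.popleft()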

Algorithm needed - Benelux contest 2007

This question (last one) appeared in Benelux Algorithm Programming Contest-2007
http://www.cs.duke.edu/courses/cps149s/spring08/problems/bapc07/allprobs.pdf
Problem Statement in short:
A company needs to figure out a strategy for when to buy, sell, or do nothing on a given input so as to maximise profit. The input is in the form:
6
4 4 2
2 9 3
....
....
It means input is given for 6 days.
Day 1: You get 4 shares, each with price 4$, and you can sell at most 2 of them
Day 2: You get 2 shares, each with price 9$, and you can sell at most 3 of them
.
We need to output the maximum profit which can be achieved.
I'm thinking about how to approach this problem. It seems to me that applying brute force will take too much time. Can this be converted to some DP problem, like 0-1 knapsack? Some help would be highly appreciated.
It can be solved by DP.
Suppose there are n days, and the total number of stock shares is m.
Let f[i][j] mean: at the end of the ith day, with j shares remaining, the maximum profit is f[i][j].
Then f[i][j] = max(f[i-1][j+k] + k*price_per_day[i]) over 0 <= k <= maximum_shares_sell_per_day[i].
This can be further optimized: since f[i][...] only depends on f[i-1][...], a rolling array can be used, so you only need to allocate f[2][m] to save space.
The total time complexity is O(n*m*maximum_shares_sell_per_day).
Perhaps it can be further optimized to save time; any feedback is welcome.
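A Python sketch of this DP with the rolling array (names are mine; it also credits the shares received each day, which the recurrence above leaves implicit):

def max_profit(days):
    # days: list of (received, price, max_sell) tuples, one per day
    m = sum(received for received, _, _ in days)  # total shares overall
    NEG = float("-inf")
    prev = [NEG] * (m + 1)       # prev[j]: best profit so far holding j shares
    prev[0] = 0                  # before day 1 we hold nothing
    for received, price, max_sell in days:
        cur = [NEG] * (m + 1)
        for held, best in enumerate(prev):
            if best == NEG:
                continue
            have = held + received                 # shares in hand today
            for k in range(min(max_sell, have) + 1):
                j = have - k                       # remaining after selling k
                cur[j] = max(cur[j], best + k * price)
        prev = cur               # rolling array: only two rows ever exist
    return max(v for v in prev if v != NEG)

print(max_profit([(4, 4, 2), (2, 9, 3)]))  # the sample's first two days -> 35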
Your description does not quite match the last problem in the PDF: in the PDF you receive the number of shares specified in the first column (or are forced to buy them; since there is no decision to make, it does not matter) and can only decide how many shares to sell. Since it does not say otherwise, I presume that short selling is not allowed (otherwise ignore everything except the price and go make so much money on the derivatives market that you can afford to both bribe the SEC or Congress and retire :-)).
This looks like a dynamic program, where the state at each point in time is the total number of shares you have in hand. So at time n you have an array with one element for each possible number of shares you might hold at that time, and that element stores the maximum amount of money you can have made up to then while ending up with that number of shares. From this you can work out the same information for time n+1. When you reach the end, all remaining shares are worthless, so the best answer is the one associated with the maximum amount of money.
We can't do better than selling the maximum number of shares we can on the day with the highest price, so I was thinking along the following lines (this may be somewhat difficult to implement efficiently):
It may be a good idea to precompute, for each day, the total number of shares received so far, to improve the efficiency of the algorithm.
Process the days in decreasing order of price.
For a day, sell amount = min(daily sell limit, shares available); for the max-price day (the first processed day), shares available = shares received to date.
For all subsequent days, shares available -= sell amount. For preceding days, binary search for (shares available - shares sold), and all entries between that point and the day just processed become 0.
We might not need to physically set the values (at least not at every step); they can be calculated on the fly from the history thus far (I'm thinking of an interval tree or something similar). A sketch follows below.
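A simple O(n^2) Python sketch of this greedy (my transcription, without the binary-search/interval-tree optimization; names are mine):

def max_profit_greedy(days):
    # days: list of (received, price, max_sell) tuples, one per day
    avail, total = [], 0
    for received, _, _ in days:
        total += received
        avail.append(total)      # shares received to date, minus sales so far
    profit = 0
    n = len(days)
    for d in sorted(range(n), key=lambda d: -days[d][1]):  # highest price first
        # selling on day d consumes availability on day d and every later day
        sell = min(days[d][2], min(avail[d:]))
        profit += sell * days[d][1]
        for e in range(d, n):
            avail[e] -= sell
    return profit

print(max_profit_greedy([(4, 4, 2), (2, 9, 3)]))  # 35, matching the DP above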

load balancing algorithms - special example

Let's pretend I have two buildings in which I can build different units.
A building can only build one unit at a time, but it has a FIFO queue of at most 5 units, which will be built in sequence.
Every unit has a build time.
I need to know the fastest way to get my units built, considering the units already in the build queues of my buildings.
"Famous" algorithms like round robin don't work here, I think.
Are there any algorithms which can solve this problem?
This reminds me a bit of StarCraft :D
I would just add an integer to each building's queue representing the time for which it is busy.
Of course you have to update this variable once per time unit. (Time units are "s" here, for seconds.)
So let's say we have a building to which we submit 3 units, each taking 5s to complete, which sums to 15s total. We are at time = 0.
Then we have another building to which we submit 2 units that each need 6s to complete.
So we can have a table like this:
Time 0
Building 1, 3 units, 15s to complete.
Building 2, 2 units, 12s to complete.
Time 1
Building 1, 3 units, 14s to complete.
Building 2, 2 units, 11s to complete.
Now say we want to add another unit that takes 2s. We can simply loop through the buildings and pick the one with the lowest time to complete.
In this case that is building 2, which leads to time 2...
Time 2
Building 1, 3 units, 13s to complete
Building 2, 3 units, 10s+2s=12s to complete
...
Time 5
Building 1, 2 units, 10s to complete (5s are over, the first unit pops out)
Building 2, 3 units, 9s to complete
And so on.
Of course you have to respect the upper bounds of your production facilities: if a building already has 5 units queued, don't assign anything to it and pick the next building with the lowest time to complete.
I don't know if you can implement this easily in your engine, or if it even supports some kind of time units.
This just amounts to updating all production facilities once per time unit, O(n) where n is the number of buildings that can produce something. Submitting a unit takes O(1), assuming you keep the buildings in sorted order, lowest time first, so it is just a first-element lookup. In that case you have to re-sort the list after manipulating the units, e.g. cancelling or adding one.
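A minimal Python sketch of this bookkeeping, with a heap giving the "lowest time to complete first" order (names are mine; units popping out of a queue are left out for brevity):

import heapq

# Each building is a list [time_to_complete, units_queued, name].
def submit(buildings, build_time, max_queue=5):
    skipped = []
    while buildings:
        b = heapq.heappop(buildings)   # the building that finishes soonest
        if b[1] < max_queue:           # it still has room in its FIFO queue
            b[0] += build_time
            b[1] += 1
            heapq.heappush(buildings, b)
            break
        skipped.append(b)              # queue full: try the next building
    for b in skipped:                  # put full buildings back
        heapq.heappush(buildings, b)

def tick(buildings, dt=1):             # call once per time unit
    for b in buildings:
        b[0] = max(0, b[0] - dt)
    heapq.heapify(buildings)

buildings = [[15, 3, "B1"], [12, 2, "B2"]]
heapq.heapify(buildings)
submit(buildings, 2)                   # lands on B2, as in the table above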
Otherwise amit's answer seems to be possible, too.
This is an NP-complete problem (proof at the end of this answer), so your best hope of finding the ideal solution is to try all possibilities (2^n of them, where n is the number of tasks).
A possible heuristic was suggested in a comment (and improved in the comments by AShelly): sort the tasks from biggest to smallest and put them in one queue; whenever a building finishes, it takes the next element from the queue. A sketch follows below.
This is of course not always optimal, but I think it will give good results in most cases.
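A Python sketch of that heuristic (biggest task first; the queue that frees up first takes the next task; names are mine):

import heapq

def biggest_first(times, queues=2):
    finish = [(0, q) for q in range(queues)]   # (busy-until, queue id)
    heapq.heapify(finish)
    assignment = [[] for _ in range(queues)]
    for t in sorted(times, reverse=True):      # biggest to smallest
        busy, q = heapq.heappop(finish)        # queue that is free soonest
        assignment[q].append(t)
        heapq.heappush(finish, (busy + t, q))
    return assignment

print(biggest_first([5, 5, 4, 3, 3]))  # [[5, 4], [5, 3, 3]]: makespan 11,
# while the perfect split [5, 5] / [4, 3, 3] finishes both queues at 10 --
# exactly the "not always optimal" caveat above.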
Proof that the problem is NP-complete:
Let S = {u | u is a unit that needs to be produced} (S is the set containing all 'tasks').
Claim: if a perfect split is possible (both queues finish at the same time), it is optimal. Call this finishing time HalfTime.
This holds because the two queues together must do all the work, so in any schedule at least one queue finishes at some t >= HalfTime; a split that finishes both queues at exactly HalfTime therefore cannot be beaten.
Proof:
Assume we had an algorithm A that produces the best solution in polynomial time. Then we could solve the partition problem in polynomial time with the following algorithm:
1. run A on the input
2. if the 2 queues finish exactly at HalfTime, return True
3. else, return False
This solves the partition problem because of the claim: if a perfect partition exists, A will return it, since it is optimal. Steps 1, 2 and 3 all run in polynomial time (1 by assumption, 2 and 3 trivially). So the suggested algorithm solves the partition problem in polynomial time, and thus our problem is NP-complete.
Q.E.D.
Here's a simple scheme:
Let U be the list of units you want to build, and F the set of factories that can build them. For each factory, track the total time-till-complete, i.e. how long until its queue is completely empty.
Sort U by decreasing time-to-build, and maintain the sort order when inserting new items.
At the start, or at the end of any time tick in which a factory completes a unit or runs out of work:
Make a ready list of all the factories with space in their queue.
Sort the ready list by increasing time-till-complete.
Get the factory that will be done soonest.
Take the first item from U and add it to that factory.
Repeat until U is empty or all the queues are full.
Googling "minimum makespan" may give you some leads into other solutions. This CMU lecture has a nice overview.
It turns out that if you know the set of work ahead of time, this problem is exactly Multiprocessor_scheduling, which is NP-Complete. Apparently the algorithm I suggested is called "Longest Processing Time", and it will always give a result no longer than 4/3 of the optimal time.
If you don't know the jobs ahead of time, it is a case of online Job-Shop Scheduling.
The paper "The Power of Reordering for Online Minimum Makespan Scheduling" says
for many problems, including minimum makespan scheduling, it is reasonable to not only provide a lookahead to a certain number of future jobs, but additionally to allow the algorithm to choose one of these jobs for processing next and, therefore, to reorder the input sequence.
Because you have a FIFO on each of your factories, you essentially do have the ability to buffer the incoming jobs: you can hold them until a factory is completely idle instead of trying to keep all the FIFOs full at all times.
If I understand the paper correctly, the upshot of the scheme is to:
Keep a fixed-size buffer of incoming jobs. In general, the bigger the buffer, the closer to ideal scheduling you get.
Assign a weight w to each factory according to a given formula, which depends on the buffer size. In the case where buffer size = number of factories + 1, use weights of (2/3, 1/3) for 2 factories and (5/11, 4/11, 2/11) for 3.
Once the buffer is full, whenever a new job arrives, remove the job with the least time to build and assign it to a factory whose time-till-complete < w*T, where T is the total time-till-complete of all factories.
If there are no more incoming jobs, schedule the remainder of the jobs in U using the first algorithm I gave.
The main problem in applying this to your situation is that you don't know when (if ever) there will be no more incoming jobs. But perhaps just replacing that condition with "if any factory is completely idle", and then restarting, will give decent results.

Algorithm for the allocation of work with dynamic programming

The problem is this:
You need to perform n jobs, each characterized by a gain {v1, v2, . . . , vn}, a time required for its execution {t1, t2, . . . , tn} and a deadline {d1, d2, . . . , dn}, with d1 <= d2 <= ... <= dn. The gain is obtained only if the job is completed by its deadline, and you have a single machine. Describe an algorithm that computes the maximum gain that can be obtained.
I thought of a recurrence with two parameters, one indicating the i-th job and the other the moment at which we currently are: OPT(i, d). If d + t_i <= d_i, then we add the gain v_i (a variant of multiway choice, i.e. taking the max over 1 <= i <= n).
My main problem is: how can I keep track of the jobs already carried out? Should I use a supporting data structure?
How would you have written the recurrence equation?
Thank you!
My main problem is: how can I keep track of the jobs already carried out? Should I use a supporting data structure?
The trick is, you don't need to know which jobs have already been completed, because you can execute them in order of increasing deadline.
Let's say some optimal solution (yielding maximum profit) requires you to complete job A (deadline 10) and then job B (deadline 3). But in this case you can safely swap A and B: both will still be completed in time, and the new arrangement yields the same total profit.
End of proof.
How would you have written the recurrence equation?
You already have the general idea, but you don't need a loop (the max over 1 <= i <= n).
In Python (assuming the n jobs are sorted by increasing deadline, with their times, deadlines and gains in the global lists T, D and V):

def max_profit(current_job, start_time):
    if current_job == n:                 # no jobs left to consider
        return 0
    # skip this job
    result1 = max_profit(current_job + 1, start_time)
    # start doing this job now
    result2 = 0
    finish_time = start_time + T[current_job]
    if finish_time <= D[current_job]:
        # only if we can finish it before the deadline
        result2 = max_profit(current_job + 1, finish_time) + V[current_job]
    return max(result1, result2)
Converting it to DP should be trivial.
If you don't want O(n*max_deadline) complexity (e.g. when the d and t values are big), you can resort to recursion with memoization and store results in a hash table instead of a two-dimensional array.
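For instance, a sketch of the memoized variant: Python's functools.lru_cache keeps the (current_job, start_time) -> result pairs in exactly such a hash table.

from functools import lru_cache

max_profit = lru_cache(maxsize=None)(max_profit)  # memoize the function above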
edit
If all jobs must be performed but not all will be paid for, the problem stays the same: just push the jobs you don't have time for (the ones you can't finish before their deadline) to the end. That's all.
First of all I would pick the items with the biggest yield, meaning the jobs with the biggest value/time ratio that can still meet their deadline (if now + t1 exceeds d1 then the job is bogus). Afterwards I check the time between now + job_time and each deadline, obtaining a "chance to finish" for each job. The jobs that come first are those with the biggest yield and the lowest chance to finish. The idea is to squeeze in the most valuable jobs.
CASES:
If a job with a yield of 5 needs 10 seconds to finish and its deadline comes in 600 seconds, while a job with the same yield needs 20 seconds to finish and its deadline comes in 22 seconds, then I run the second one.
If a job with a yield of 10 needs 10 seconds to finish and its deadline comes in 100 seconds, while another job with a yield of 5 needs 10 seconds to finish and its deadline comes in 100 seconds, I'll run the first one.
If their yields are identical and they take the same time to finish, while their deadlines come in 100 and 101 seconds respectively, I'll run the first one, as it wins more time.
.. so on and so forth..
Recursion in this case refers only to reordering the jobs by these two parameters, "yield" and "chance to finish".
Remember to always increase "now" (by job_time) after inserting a job into the order.
Hope this answers it.
I read the comments above and understood that you are not looking for efficiency but for completeness, which takes yield out of the picture and leaves us with just ordering by deadline. That is the classic problem solved by divide-and-conquer ("divide et impera") quicksort:
http://en.wikipedia.org/wiki/Quicksort

Determining the chances of an event occurring when it hasn't occurred yet

A user visits my website at time t. They may or may not click on a particular link I care about; if they do, I record the fact that they clicked the link, and also the duration since t at which they clicked it. Call this duration d.
I need an algorithm that allows me to create a class like this:
interface ClickProbabilityEstimate {
    void reportImpression(long id);            // a new impression was shown
    void reportClick(long id);                 // that impression got a click
    double estimateClickProbability(long id);  // P(click) given no click yet
}
Every impression gets a unique id, and this is used when reporting a click to indicate which impression the click belongs to.
I need an algorithm that returns a probability, based on how much time has passed since an impression was reported, that the impression will still receive a click, given how long previous clicks took. Clearly one would expect this probability to decrease over time if there is still no click.
If necessary, we can set an upper bound beyond which we consider the click probability to be 0 (e.g. if it's been an hour since the impression occurred, we can be pretty sure there won't be a click).
The algorithm should be both space and time efficient, and hopefully make as few assumptions as possible, while being elegant. Ease of implementation would also be nice. Any ideas?
Assuming you keep data on past impressions and clicks, it's easy: let's say that you have an impression, and a time d' has passed since that impression. You can divide your data into three groups:
Impressions which received a click in less than d'
Impressions which received a click after more than d'
Impressions which never received a click
Clearly the current impression is not in group (1), so eliminate that. You want the probability it is in group (2), which is then
P = N2 / (N2 + N3)
where N2 is the number of impressions in group 2, and similarly for N3.
As far as actual implementation, my first thought would be to keep an ordered list of the times d for past impressions which did receive clicks, along with a count of the number of impressions which never received a click, and just do a binary search for d' in that list. The position you find will give you N1, and then N2 is the length of the list minus N1.
If you don't need perfect granularity, you can store the past times as a histogram instead, i.e. a list that contains, in each element list[n], the number of impressions that received a click after at least n but less than n+1 minutes. (Or seconds, or whatever time interval you like) In that case you'd probably want to keep the total number of clicks as a separate variable so you can easily compute N2.
(By the way, I just made this up, I don't know if there are standard algorithms for this sort of thing that may be better)
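A minimal Python sketch of that implementation (method and field names are mine; it assumes some outside caller reports an impression as expired once the upper bound passes):

import bisect
import time

class ClickProbabilityEstimate:
    def __init__(self, cutoff=3600.0):
        self.cutoff = cutoff     # past this many seconds, probability is 0
        self.pending = {}        # impression id -> time it was reported
        self.delays = []         # ordered click delays d of past impressions
        self.never_clicked = 0   # impressions that expired without a click

    def report_impression(self, id):
        self.pending[id] = time.time()

    def report_click(self, id):
        d = time.time() - self.pending.pop(id)
        bisect.insort(self.delays, d)                  # keep the list ordered

    def report_expired(self, id):                      # assumed helper
        self.pending.pop(id, None)
        self.never_clicked += 1

    def estimate_click_probability(self, id):
        d_prime = time.time() - self.pending[id]
        if d_prime >= self.cutoff:
            return 0.0
        n1 = bisect.bisect_left(self.delays, d_prime)  # clicked before d'
        n2 = len(self.delays) - n1                     # clicked after d'
        n3 = self.never_clicked
        return n2 / (n2 + n3) if n2 + n3 else 0.0      # P = N2 / (N2 + N3)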
I would suggest hypothesizing an arrival process (clicks per minute) and trying to fit a distribution to that arrival process using your existing data. I'll bet the result is negative binomial, which is what you get from a Poisson arrival process with a non-stationary mean when the mean has a gamma distribution. The inverse (minutes per click) gives you the distribution of the interarrival process. I don't know of a distribution named for that, but you can create an empirical one.
Hope this helps.
