A company is planning a party for its employees. A fun rating is assigned to every employee.
The employees are organized into a strict hierarchy, i.e. a tree rooted at the president. There is one
restriction, though, on the guest list to the party: an employee and his/her immediate supervisor
(parent in the tree) cannot both attend the party. You wish to prepare a guest list for the party that
maximizes the sum of fun ratings of the guests. Show that greedily choosing guests according to fun
rating will not work. Then, formulate a dynamic programming solution.
I could not understand some of the conditions: is the fun rating of the president higher than that of his descendants, and how many employees report to each supervisor? Can someone help me proceed with this?
From the phrasing of the problem, the fun rating assigned to someone in the hierarchy tree is not necessarily greater than that of their descendants in the hierarchy tree.
However, even if this were the case, to see that it is not optimal to pick the best employee, consider a tree of height 2 with a root of fun=10 and 20 children of fun=1. Then the optimal solution is to skip the greedy choice (the root) and choose the 20 children.
In any case, with dynamic programming you can find the best solution even if parents can have lower fun than their descendants. For a node v in the tree, let F(v) be the maximum fun that can be attained within the subtree rooted at v. Either you choose v, in which case its children are skipped and you add fun(v) to the sum of F over all subtrees rooted at children of children; or you skip v, in which case the maximum fun is the sum of F over all subtrees rooted at children of v. F(v) is the larger of the two. This gives a linear-time dynamic programming algorithm.
For the greedy part, show a simple counterexample.
   1
   |
1--3--1
   |
   1
Choosing greedily, i.e. selecting 3 first, prevents us from selecting any other employee. But if we don't select 3, then the maximum is 4 (all the 1's). So the greedy approach won't work.
For dynamic programming, we can formulate the problem in terms of whether the root of a given subtree is selected. If we select a node, then none of its children can be selected. However, if we don't select the node, then each child may or may not be selected.
For all v, initialize
weight(v,true) = fun(v)
weight(v,false) = 0
Then use the following recursion to solve the problem, where the ci are the children of v:
weight(v,true) = fun(v) + sum over all children ci of weight(ci,false)
weight(v,false) = sum over all children ci of max(weight(ci,false), weight(ci,true))
The answer will be max(weight(v,true), weight(v,false)) for the root node v.
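The recursion above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `max_fun` and the input format (a dict of fun ratings plus a dict mapping each employee to a list of direct reports) are assumptions made for the example.

```python
def max_fun(fun, children, root):
    """Best total fun for the hierarchy rooted at `root`.

    fun: dict employee -> fun rating (assumed input format)
    children: dict employee -> list of direct reports (missing key = no reports)
    """
    take, skip = {}, {}  # take[v]: v attends; skip[v]: v stays home

    # Collect nodes so that every parent appears before its children,
    # then process in reverse so children are solved first.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))

    for v in reversed(order):
        kids = children.get(v, [])
        take[v] = fun[v] + sum(skip[c] for c in kids)
        skip[v] = sum(max(take[c], skip[c]) for c in kids)

    return max(take[root], skip[root])


# The star-shaped counterexample: centre 3 with four reports of fun 1.
star_fun = {"boss": 3, "a": 1, "b": 1, "c": 1, "d": 1}
star_tree = {"boss": ["a", "b", "c", "d"]}
print(max_fun(star_fun, star_tree, "boss"))
```

Each node is visited once and each edge contributes a constant amount of work, which matches the linear running time claimed above.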
I'm working on a problem from "Algorithm Design" by Kleinberg, specifically problem 4.15. I'm not currently enrolled in the class that this relates to -- I'm taking a crack at the problem set before the new quarter starts to see if I'd be able to do it. The question is as follows:
The manager of a large student union on campus comes to you with the
following problem. She’s in charge of a group of n students, each of whom
is scheduled to work one shift during the week. There are different jobs
associated with these shifts (tending the main desk, helping with package
delivery, rebooting cranky information kiosks, etc.), but we can view each
shift as a single contiguous interval of time. There can be multiple shifts
going on at once.
She’s trying to choose a subset of these n students to form a supervising
committee that she can meet with once a week. She considers such
a committee to be complete if, for every student not on the committee,
that student’s shift overlaps (at least partially) the shift of some student
who is on the committee. In this way, each student’s performance can be
observed by at least one person who’s serving on the committee.
Give an efficient algorithm that takes the schedule of n shifts and
produces a complete supervising committee containing as few students
as possible.
Example. Suppose n = 3, and the shifts are
Monday 4 P.M.-Monday 8 P.M.,
Monday 6 P.M.-Monday 10 P.M.,
Monday 9 P.M.-Monday 11 P.M.
Then the smallest complete supervising committee would consist of just
the second student, since the second shift overlaps both the first and the
third.
My attempt (I can't find this problem in my solution manual, so I'm asking here):
Construct a graph G with vertices S1, S2, ..., Sn for each student.
Let there be an edge between Si and Sj iff students i and j have an overlapping
shift. Let C represent the set of students in the supervising committee.
[O(n + 2m) to build an adjacency list, where m is the number of shifts?
Since we have to add at least each student to the adjacency list, and add an
additional m entries for each shift, with two entries added per shift since
our graph is undirected.]
Sort the vertices by degree into a list S [O(n log n)].
While S[0] has degree > 0:
(1) Add Si to C. [O(1)]
(2) Delete Si and all of the nodes that it was connected to, update the
adjacency list.
(3) Update S so that it is once again sorted.
Add any remaining vertices of degree 0 to C.
I'm not sure how to quantify the runtime of (2) and (3). Since the degree of any node is bounded by n, it seems that (2) is bounded by O(n). But the degree of the node removed in (1) also affects the number of iterations performed inside of the while loop, so I suspect that it's possible to say something about the upper bound of the whole while loop -- something to the effect of "Any sequence of deletions will involve deleting at most n nodes in linear time and resorting at most n nodes in linear time, resulting in an upper bound of O(n log n) for the while loop, and therefore of the algorithm as a whole."
You don't want to convert this to a general graph problem, as then it's simply the NP-hard dominating set problem (every student off the committee must be adjacent to someone on it). However, on interval graphs in particular, there is in fact a linear-time greedy algorithm, as described in this paper (which is actually for a more general problem, but works fine here). From a quick read of it, here's how it applies to your problem:
Sort the students by the time at which their shift ends, from earliest to latest. Number them 1 through n.
Initialize a counter k = 1 which represents the earliest student in the ordering not yet covered by the committee.
Starting from k, find the first student in the order whose shift does not intersect student k's shift. Suppose this is student i. Add student i-1 to the committee, and update k to be the new earliest student not covered by the committee.
Repeat the previous step until all students are covered.
(This feels correct, but like I said I only had a quick read, so please say if I missed something)
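A hedged sketch of a greedy along these lines follows. It uses a slightly different selection rule from the step above, because the shifts that intersect student k's shift need not be contiguous in end-time order: it takes the uncovered shift that ends earliest and adds whichever overlapping shift ends latest. The name `smallest_committee` and the `(start, end)` tuple format are assumptions for illustration, and this simple version is quadratic rather than linear.

```python
def smallest_committee(shifts):
    """shifts: list of (start, end) pairs. Returns indices of committee members."""

    def overlaps(a, b):
        # Two shifts overlap if each starts before the other ends.
        return shifts[a][0] < shifts[b][1] and shifts[b][0] < shifts[a][1]

    by_end = sorted(range(len(shifts)), key=lambda i: shifts[i][1])
    covered = set()
    committee = []
    for k in by_end:
        if k in covered:
            continue
        # k is the earliest-ending student not yet covered. Anyone who can
        # observe k overlaps k (or is k); prefer the one who ends latest.
        candidates = [j for j in by_end if j == k or overlaps(j, k)]
        best = max(candidates, key=lambda j: shifts[j][1])
        committee.append(best)
        covered.update(j for j in by_end if j == best or overlaps(j, best))
    return committee


# The example from the question, in 24-hour times.
print(smallest_committee([(16, 20), (18, 22), (21, 23)]))
```

The exchange argument behind the rule: every uncovered shift ends no earlier than k's, so anything a competing choice could cover is also overlapped by the candidate that ends latest.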
I have the following problem. A user has a cart with N items in it. There is a quantity Q of each item. Further, there are P warehouses, and each of them has a certain stock level for each product (which may be 0). Distances between each warehouse and customer are also known. I need to find a set of warehouses that can accommodate the orders and satisfies the following constraints (ordered by decreasing priority):
It should contain a minimal number of warehouses
All warehouses should be as close to the customer as possible.
Any ideas are highly appreciated. Thanks!
UPD:
If one warehouse can't fulfill some line item completely, then it can be delivered by several different warehouses. E.g. we need 10 apples and we have 2 warehouses that have stock levels of 7 and 3. Then apples will be provided by these two warehouses (to provide 10 in total).
UPD 2
Number of available warehouses is nearly 15. So brute force won't help here.
This is solvable by integer programming.
Let items be indexed by i and warehouses be indexed by j. Let Qi be the quantity of item i in the cart and Sij be the quantity of item i at warehouse j and Dj be the distance from the customer to the warehouse j.
First find the minimum warehouse count k. Let binary variable xj be 1 if and only if warehouse j is involved in the order. k is the value of this program.
minimize sum over j of xj
subject to
for all i, (sum over j of min(Sij, Qi) * xj) >= Qi
for all j, xj in {0, 1}
Second find the closest warehouses. I'm going to assume that we want to minimize the sum of the distances.
minimize sum over j of Dj * xj
subject to
for all i, (sum over j of min(Sij, Qi) * xj) >= Qi
(sum over j of xj) <= k
for all j, xj in {0, 1}
There are many different libraries to solve integer programs, some free/open source. They typically accept programs in a format similar to but more restricted than the one I've presented here. You'll have to write some code yourself to expand the sums and universal quantifiers ("for all").
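For small instances, the two programs above can be cross-checked without a solver by enumerating warehouse subsets in order of increasing size; with around 15 warehouses that is at most 2^15 = 32,768 subsets. The sketch below is such a check, not an integer programming solver. The function name `choose_warehouses` and the list-based inputs `Q`, `S`, `D` (mirroring Qi, Sij, Dj) are assumptions for illustration.

```python
from itertools import combinations


def choose_warehouses(Q, S, D):
    """Q[i]: quantity of item i, S[i][j]: stock of item i at warehouse j,
    D[j]: distance to warehouse j. Returns a tuple of warehouse indices or None."""
    items = range(len(Q))
    warehouses = range(len(D))

    def feasible(subset):
        # Same constraint as in the programs above.
        return all(sum(min(S[i][j], Q[i]) for j in subset) >= Q[i] for i in items)

    # Stage 1: the smallest k with a feasible subset.
    # Stage 2: among feasible subsets of that size, the least total distance.
    for k in range(len(D) + 1):
        options = [c for c in combinations(warehouses, k) if feasible(c)]
        if options:
            return min(options, key=lambda c: sum(D[j] for j in c))
    return None


# 10 apples; warehouses hold 7, 3 and 5 at distances 1, 2 and 3.
print(choose_warehouses([10], [[7, 3, 5]], [1, 2, 3]))
```

Because k is the minimum feasible count, restricting the second stage to subsets of exactly k warehouses is equivalent to the constraint "sum of xj <= k".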
I would recommend to go with David Eisenstat's solution. If you'd like to understand more about the topic or need to implement an algorithm for solving integer programs yourself, I can recommend the following reference:
Chapter 9 from an MIT lecture on Applied Mathematical Programming gives a nice introduction to integer programming. On the third page, you will find the warehouse location problem as an example of a problem solvable by integer programming. Note that the problem described there is slightly more general than the problem you described in your question: for your case, warehouses can be assumed to be always open (yi = 1), and the fixed operating cost fi of a warehouse is always fi = 0.
The rest of this chapter goes into the details of integer programming and also highlights various approaches to solve integer programs.
You may or may not like this, but I have warehousing and order fulfillment processing experience. My personal real life experience didn't require an algo but a series of warehouse and customer service back tools (hopefully this will be food for thought to you and others struggling in the warehousing operations development world):
If you have 10 items on the order.
You have 9 in stock
You have 5 in one location and 4 in the other.
You split the order. The 1 product that can't be fulfilled becomes a 'back order'. It can be cancelled because you don't know when, or if, your supplier is going to deliver. Make sure you hang on to your credit card authorization references.
The 9 left over (fulfill-able products) in stock will be queried against your warehousing virtual inventory for the best combinations.
In our case we do three things:
Can the fulfillment staff at a warehouse X transfer in the item from another warehouse easily? Yes/No
If so which products can transfer.
This might require human interaction based on warehouse load and capabilities.
If you are strictly going on automation and virtual inventory that fluctuates day in and day out, then you give it your best guess against warehouse inventories.
Next, split the order to two, with references to the main order for paper trails.
You then print to your destinations and hope they can fulfill, if they can't, then hopefully they can partially fulfill the order and generate back order that can be cancelled at the customer's request.
So basically here is what you have to code for.
Order
First glance back order split and reference to main order.
Inventory warehouse feeler function.
Weighted split order based on virtual inventory with reference to main order based on warehouse capabilities to retrieve products from other warehouses.
Print pick page (warehouse function)
Back order or partial fulfillment manual functions (customer service tools)
Collect the money on only the stuff you fulfilled when marked as shipped.
Considerations:
Make sure the main order references the resulting back orders and splits.
Make sure the split and partial fulfillment orders reference any additional back orders and splits.
Fulfill what you can.
Mark as shipped.
Collect $$$ on the products that shipped.
Hope this helps and good luck!!!
Suppose there are N groups of people and M tables. We know the size of each group and the capacity of each table. How do we match the people to the tables such that no two persons of the same group sit at the same table?
Does a greedy approach work for this problem ? (The greedy approach works as follows: for each table try to "fill" it with people from different groups).
Assuming the groups and tables can be of unequal size, I don't think the greedy approach as described works (at least not without additional specifications). Suppose you have a table of 2 T1 and a table of 3 T2, and 3 groups {A1}, {B1,B2} and {C1,C2}. If I follow your algorithm, T1 will receive {A1,B1} and now you are left with T2 and {B2,C1,C2} which doesn't work. Yet there is a solution T1 {B1,C1}, T2 {A1,B2,C2}.
I suspect the following greedy approach works: starting with the largest group, take each group and allocate one person of that group per table, picking first tables with the most free seats.
Mathias:
I suspect the following greedy approach works: starting with the largest group, take each group and allocate one person of that group per table, picking first tables with the most free seats.
Indeed. And a small variation of tkleczek's argument proves it.
Suppose there is a solution. We have to prove that the algorithm finds a solution in this case.
This is vacuously true if the number of groups is 0.
For the induction step, we have to show that if there is any solution, there is one where one member of the largest group sits at each of the (size of largest group) largest tables.
Condition L: For all pairs (T1,T2) of tables, if T1 < T2 and a member of the largest group sits at T1, then another member of the largest group sits at T2.
Let S1 be a solution. If S1 fulfills L we're done. Otherwise there is a pair (T1,T2) of tables with T1 < T2 such that a member of the largest group sits at T1 but no member of the largest group sits at T2.
Since T2 > T1, there is a group which has a member sitting at T2, but none at T1 (or there is a free place at T2). So these two can swap seats (or the member of the largest group can move to the free place at T2) and we obtain a solution S2 with fewer pairs of tables violating L. Since there's only a finite number of tables, after finitely many steps we have found a solution Sk satisfying L.
Induction hypothesis: For all constellations of N groups and all numbers M of tables, if there is a solution, the algorithm will find a solution.
Now consider a constellation of (N+1) groups and M tables where a solution exists. By the above, there is also a solution where the members of the largest group are placed according to the algorithm. Place them so. This reduces the problem to a solvable constellation of N groups and M' tables, which is solved by the algorithm per the induction hypothesis.
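The greedy rule argued for above (largest group first, one member per table, fullest-free tables first) can be sketched as below. The function name `seat_groups` and the return format are assumptions for illustration; the sketch returns `None` when the greedy rule cannot seat a group.

```python
def seat_groups(group_sizes, table_sizes):
    """Return seats, where seats[t] lists the group indices seated at table t,
    or None if some group cannot be spread over distinct tables."""
    free = list(table_sizes)
    seats = [[] for _ in table_sizes]

    # Largest group first.
    for g in sorted(range(len(group_sizes)), key=lambda g: -group_sizes[g]):
        need = group_sizes[g]
        # Tables with the most free seats first; one member per table.
        tables = sorted(range(len(free)), key=lambda t: -free[t])[:need]
        if len(tables) < need or any(free[t] == 0 for t in tables):
            return None
        for t in tables:
            free[t] -= 1
            seats[t].append(g)
    return seats


# The earlier example: tables of 2 and 3, groups of sizes 1, 2 and 2.
print(seat_groups([1, 2, 2], [2, 3]))
```

Re-sorting the tables for every group keeps the sketch short; a heap or counting structure would be the natural choice if the number of tables is large.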
The following greedy approach works:
Repeat the following steps until there is no seat left:
Pick the largest group and the largest table
Match one person from the chosen group to the chosen table
Reduce group size and table size by 1.
Proof:
We just have to prove that after performing one step we can still reach an optimal solution.
Let's call any member of the largest group a cool guy.
Suppose that there is a different optimal solution in which no cool guy sits at the largest table. Let's pick any person sitting at the largest table in this solution and call him a lame guy.
He must belong to a group no larger than the cool group. So there is another table at which a cool guy sits but no member of the lame guy's group. We can then safely swap the seats of the lame guy and the cool guy, which also results in an optimal solution.
I'm working on putting together a problem set for an intro-level CS course and came up with a question that, on the surface, seems very simple:
You are given a list of people with the names of their parents, their birth dates, and their death dates. You are interested in finding out who, at some point in their lifetime, was a parent, a grandparent, a great-grandparent, etc. Devise an algorithm to label each person with this information as an integer (0 means the person never had a child, 1 means that the person was a parent, 2 means that the person was a grandparent, etc.)
For simplicity, you can assume that the family graph is a DAG whose undirected version is a tree.
The interesting challenge here is that you can't just look at the shape of the tree to determine this information. For example, I have 8 great-great-grandparents, but since none of them were alive when I was born, in their lifetimes none of them were great-great-grandparents.
The best algorithm I can come up with for this problem runs in time O(n^2), where n is the number of people. The idea is simple - start a DFS from each person, finding the furthest descendant down in the family tree that was born before that person's death date. However, I'm pretty sure that this is not the optimal solution to the problem. For example, if the graph is just two parents and their n children, then the problem can be solved trivially in O(n). What I'm hoping for is some algorithm that either beats O(n^2) or whose runtime is parameterized over the shape of the graph, so that it is fast for wide graphs with a graceful degradation to O(n^2) in the worst case.
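For reference, the quadratic baseline described above might look like the following sketch. The input format (a dict mapping each name to a birth year, a death year and a list of parent names) and the name `label_by_dfs` are assumptions for illustration; dates are plain integers to keep the example short.

```python
from collections import defaultdict


def label_by_dfs(people):
    """people: dict name -> (birth, death, [parent names]). Returns name -> label."""
    # Invert the parent links so we can walk downwards.
    children = defaultdict(list)
    for name, (_, _, parents) in people.items():
        for parent in parents:
            children[parent].append(name)

    def depth(name, deadline):
        # Deepest descendant generation born before `deadline`. A child born
        # too late can be pruned, since its own descendants are born later still.
        best = 0
        for child in children[name]:
            if people[child][0] < deadline:
                best = max(best, 1 + depth(child, deadline))
        return best

    return {name: depth(name, death) for name, (_, death, _) in people.items()}


family = {
    "A": (1900, 1970, []),
    "B": (1925, 1990, ["A"]),
    "C": (1950, 2020, ["B"]),
    "D": (1975, 2060, ["C"]),
}
print(label_by_dfs(family))
```

Here A dies before great-grandchild D is born, so A is labelled 2 rather than 3, which is exactly the situation the question points out.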
Update: This is not the best solution I have come up with, but I've left it because there are so many comments relating to it.
You have a set of events (birth/death), parental state (no descendants, parent, grandparent, etc) and life state (alive, dead).
I would store my data in structures with the following fields:
mother
father
generations
is_alive
may_have_living_ancestor
Sort your events by date, and then for each event take one of the following two courses of logic:
Birth:
    Create new person with a mother, father, 0 generations, who is alive and may
    have a living ancestor.
    For each parent:
        If generations increased, then recursively increase generations for
        all living ancestors whose generations increased. While doing that,
        set the may_have_living_ancestor flag to false for anyone for whom it is
        discovered that they have no living ancestors. (You only iterate into
        a person's ancestors if you increased their generations, and if they
        still could have living ancestors.)
Death:
    Emit the person's name and generations.
    Set their is_alive flag to false.
The worst case is O(n*n) if everyone has a lot of living ancestors. However in general you've got the sorting preprocessing step which is O(n log(n)) and then you're O(n * avg no of living ancestors) which means that the total time tends to be O(n log(n)) in most populations. (I hadn't counted the sorting prestep properly, thanks to @Alexey Kukanov for the correction.)
I thought of this this morning, then found that @Alexey Kukanov had similar thoughts. But mine is more fleshed out and has some more optimization, so I'll post it anyway.
This algorithm is O(n * (1 + generations)), and will work for any dataset. For realistic data this is O(n).
Run through all records and generate objects representing people which include date of birth, links to parents, and links to children, and several more uninitialized fields. (Time of last death between self and ancestors, and an array of dates that they had 0, 1, 2, ... surviving generations.)
Go through all people and recursively find and store the time of last death. If you call the person again, return the memoized record. For each person you can encounter the person (needing to calculate it), and can generate 2 more calls to each parent the first time you calculate it. This gives a total of O(n) work to initialize this data.
Go through all people and recursively generate a record of when they first added a generation. These records only need go to the maximum of when the person or their last ancestor died. It is O(1) to calculate when you had 0 generations. Then for each recursive call to a child you need to do O(generations) work to merge that child's data in to yours. Each person gets called when you encounter them in the data structure, and can be called once from each parent for O(n) calls and total expense O(n * (generations + 1)).
Go through all people and figure out how many generations were alive at their death. This is again O(n * (generations + 1)) if implemented with a linear scan.
The sum total of all of these operations is O(n * (generations + 1)).
For realistic data sets, this will be O(n) with a fairly small constant.
My suggestion:
additionally to the values described in the problem statement, each personal record will have two fields: child counter and a dynamically growing vector (in C++/STL sense) which will keep the earliest birthday in each generation of a person's descendants.
use a hash table to store the data, with the person name being the key. The time to build it is linear (assuming a good hash function, the map has amortized constant time for inserts and finds).
for each person, detect and save the number of children. It's also done in linear time: for each personal record, find the record for its parents and increment their counters. This step can be combined with the previous one: if a record for a parent is not found, it is created and added, while details (dates etc) will be added when found in the input.
traverse the map, and put references to all personal records with no children into a queue. Still O(N).
for each element taken out of the queue:
add the birthday of this person into descendant_birthday[0] for both parents (grow that vector if necessary). If this field is already set, change it only if the new date is earlier.
For all descendant_birthday[i] dates available in the vector of the current record, follow the same rule as above to update descendant_birthday[i+1] in parents' records.
decrement parents' child counters; if it reaches 0, add the corresponding parent's record into the queue.
the cost of this step is O(C*N), with C being the biggest value of "family depth" for the given input (i.e. the size of the longest descendant_birthday vector). For realistic data it can be capped by some reasonable constant without correctness loss (as others already pointed out), and so does not depend on N.
traverse the map one more time, and "label each person" with the biggest i for which descendant_birthday[i] is still earlier than the death date; also O(C*N).
Thus for realistic data the solution for the problem can be found in linear time. For contrived data like that suggested in @btilly's comment, though, C can be big, and even of the order of N in degenerate cases. This can be resolved either by putting a cap on the vector size or by extending the algorithm with step 2 of @btilly's solution.
A hash table is a key part of the solution if the parent-child relations in the input data are provided through names (as written in the problem statement). Without hashes, it would require O(N log N) to build a relation graph. Most other suggested solutions seem to assume that the relationship graph already exists.
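The queue-based steps above can be sketched roughly as follows. It is an illustration under assumptions: the input is a dict from name to (birth, death, list of parent names) with integer dates, parents missing from the dict are simply ignored rather than created on the fly, and the names `label_generations` and `desc_birth` are made up for the example.

```python
from collections import deque


def label_generations(people):
    """people: dict name -> (birth, death, [parent names]). Returns name -> label."""
    child_count = {name: 0 for name in people}
    # desc_birth[p][i]: earliest birthday among p's descendants i+1 generations down.
    desc_birth = {name: [] for name in people}

    for _, _, parents in people.values():
        for parent in parents:
            if parent in people:
                child_count[parent] += 1

    # Start from people with no children and work upwards.
    queue = deque(name for name, count in child_count.items() if count == 0)
    while queue:
        name = queue.popleft()
        birth, _, parents = people[name]
        upward = [birth] + desc_birth[name]  # shifted one generation for the parent
        for parent in parents:
            if parent not in people:
                continue
            vec = desc_birth[parent]
            for i, date in enumerate(upward):
                if i == len(vec):
                    vec.append(date)
                elif date < vec[i]:
                    vec[i] = date
            child_count[parent] -= 1
            if child_count[parent] == 0:
                queue.append(parent)

    labels = {}
    for name, (_, death, _) in people.items():
        label = 0
        for i, date in enumerate(desc_birth[name]):
            if date < death:
                label = i + 1
        labels[name] = label
    return labels


family = {
    "A": (1900, 1970, []),
    "B": (1925, 1990, ["A"]),
    "C": (1950, 2020, ["B"]),
    "D": (1975, 2060, ["C"]),
}
print(label_generations(family))
```

The Python dict plays the role of the hash table, and the inner loop over `upward` is where the O(C*N) term comes from.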
Create a list of people, sorted by birth_date. Create another list of people, sorted by death_date. You can travel logically through time, popping people from these lists, in order to get a list of the events as they happened.
For each Person, define an is_alive field. This'll be FALSE for everyone at first. As people are born and die, update this record accordingly.
Define another field for each person, called has_a_living_ancestor, initialized to FALSE for everyone at first. At birth, x.has_a_living_ancestor will be set to x.mother.is_alive || x.mother.has_a_living_ancestor || x.father.is_alive || x.father.has_a_living_ancestor. So, for most people (but not everyone), this will be set to TRUE at birth.
The challenge is to identify occasions when has_a_living_ancestor can be set to FALSE. Each time a person is born, we do a DFS up through the ancestors, but only those ancestors for which ancestor.has_a_living_ancestor || ancestor.is_alive is true.
During that DFS, if we find an ancestor that has no living ancestors, and is now dead, then we can set has_a_living_ancestor to FALSE. This does mean, I think, that sometimes has_a_living_ancestor will be out of date, but it will hopefully be caught quickly.
The following is an O(n log n) algorithm that works for graphs in which each child has at most one parent (EDIT: this algorithm does not extend to the two-parent case with O(n log n) performance). It is worth noting that I believe the performance can be improved to O(n log(max level label)) with extra work.
One parent case:
For each node x, in reverse topological order, create a binary search tree T_x that is strictly increasing both in date of birth and in number of generations removed from x. (T_x contains the first born child c1 in the subgraph of the ancestry graph rooted at x, along with the next earliest born child c2 in this subgraph such that c2's 'great grandparent level' is strictly greater than that of c1, along with the next earliest born child c3 in this subgraph such that c3's level is strictly greater than that of c2, etc.) To create T_x, we merge the previously-constructed trees T_w where w is a child of x (they are previously-constructed because we are iterating in reverse topological order).
If we are careful with how we perform the merges, we can show that the total cost of such merges is O(n log n) for the entire ancestry graph. The key idea is to note that after each merge, at most one node of each level survives in the merged tree. We associate with each tree T_w a potential of h(w) log n, where h(w) is equal to the length of the longest path from w to a leaf.
When we merge the child trees T_w to create T_x, we 'destroy' all of the trees T_w, releasing all of the potential that they store for use in building the tree T_x; and we create a new tree T_x with (log n)(h(x)) potential. Thus, our goal is to spend at most O((log n)(sum_w(h(w)) - h(x) + constant)) time to create T_x from the trees T_w so that the amortized cost of the merge will be only O(log n). This can be achieved by choosing the tree T_w such that h(w) is maximal as a starting point for T_x and then modifying T_w to create T_x. After such a choice is made for T_x, we merge each of the other trees, one by one, into T_x with an algorithm that is similar to the standard algorithm for merging two binary search trees.
Essentially, the merging is accomplished by iterating over each node y in T_w, searching for y's predecessor z by birth date, and then inserting y into T_x if it is more levels removed from x than z; then, if z was inserted into T_x, we search for the node in T_x of the lowest level that is strictly greater than z's level, and splice out the intervening nodes to maintain the invariant that T_x is ordered strictly both by birth date and level. This costs O(log n) for each node in T_w, and there are at most O(h(w)) nodes in T_w, so the total cost of merging all trees is O((log n)(sum_w(h(w)))), summing over all children w except for the child w' such that h(w') is maximal.
We store the level associated with each element of T_x in an auxiliary field of each node in the tree. We need this value so that we can figure out the actual level of x once we've constructed T_x. (As a technical detail, we actually store the difference of each node's level with that of its parent in T_x so that we can quickly increment the values for all nodes in the tree. This is a standard BST trick.)
That's it. We simply note that the initial potential is 0 and the final potential is positive so the sum of the amortized bounds is an upper bound on the total cost of all merges across the entire tree. We find the label of each node x once we create the BST T_x by binary searching for the latest element in T_x that was born before x died at cost O(log n).
To improve the bound to O(n log(max level label)), you can lazily merge the trees, only merging the first few elements of the tree as necessary to provide the solution for the current node. If you use a BST that exploits locality of reference, such as a splay tree, then you can achieve the above bound.
Hopefully, the above algorithm and analysis is at least clear enough to follow. Just comment if you need any clarification.
I have a hunch that obtaining for each person a mapping (generation -> date the first descendant in that generation is born) would help.
Since the dates must be strictly increasing, we would be able to use binary search (or a neat data structure) to find the most distant living descendant in O(log n) time.
The problem is that merging these lists (at least naively) is O(number of generations) so this could get to be O(n^2) in the worst case (consider A and B are parents of C and D, who are parents of E and F...).
I still have to work out how the best case works and try to identify the worst cases better (and see if there is a workaround for them)
We recently implemented a relationship module in one of our projects, in which we had everything in a database, and yes, I think the best the algorithm got was 2nO(m) (m is the max branching factor). I multiplied the operations by two because in the first round we create the relationship graph and in the second round we visit every Person. We store a bidirectional relationship between every two nodes. While navigating, we only travel in one direction. But we have two sets of operations: one traverses only children, the other traverses only parents.
Person{
    String Name;
    // all relations where
    // this is FromPerson
    Relation[] FromRelations;
    // all relations where
    // this is ToPerson
    Relation[] ToRelations;
    DateTime birthDate;
    DateTime? deathDate;
}
Relation
{
    Person FromPerson;
    Person ToPerson;
    RelationType Type;
}
enum RelationType
{
    Father,
    Son,
    Daughter,
    Mother
}
This looks somewhat like a bidirectional graph. In this case, you first build the list of all Persons, and then you can build the list of relations and set up FromRelations and ToRelations for each node. Then, for every Person, you only have to navigate the ToRelations of type Son or Daughter. And since you have the dates, you can calculate everything.
I don't have time to check the correctness of the code, but this will give you an idea of how to do it.
void LabelPerson(Person p){
    int n = GetLevelOfChildren(p, p.birthDate, p.deathDate);
    // label based on n...
}
// Returns the depth of the deepest descendant of p born no later than ed
// (a null ed means the ancestor is still alive, so every descendant counts).
int GetLevelOfChildren(Person p, DateTime bd, DateTime? ed){
    List<int> depths = new List<int>();
    foreach(Relation r in p.ToRelations.Where(
        x=>x.Type == RelationType.Son || x.Type == RelationType.Daughter))
    {
        Person child = r.ToPerson;
        // Skip children born after the ancestor died; their own
        // descendants were born later still.
        if(ed == null || child.birthDate <= ed.Value){
            depths.Add( 1 + GetLevelOfChildren( child, bd, ed));
        }
    }
    if(depths.Count==0)
        return 0;
    return depths.Max();
}
Here's my stab:
class Person
{
    Person [] Parents;
    string Name;
    DateTime DOB;
    DateTime DOD;
    int Generations = 0;
    public void Increase(DateTime dob, int generations)
    {
        // current person was alive when caller was born
        if (dob < DOD)
            Generations = Math.Max(Generations, generations);
        // keep walking up: an older ancestor may have outlived this person
        foreach (Person p in Parents)
            p.Increase(dob, generations + 1);
    }
    public void Calculate()
    {
        foreach (Person p in Parents)
            p.Increase(DOB, 1);
    }
}
// run for everyone
Person [] people = InitializeList(); // create objects from information
foreach (Person p in people)
    p.Calculate();
There's a relatively straightforward O(n log n) algorithm that sweeps the events chronologically with the help of a suitable top tree.
You really shouldn't assign homework that you can't solve yourself.