Greedy Maximum Flow - algorithm

The Dining Problem:
Several families go out to dinner together. In order to increase their social interaction, they would like to sit at tables so that no two members of the same family are at the same table. Assume that the dinner contingent has p families and that the ith family has a(i) members. Also, assume there are q tables available and that the jth table has a seating capacity of b(j).
The question is:
What is the maximum number of people we can seat at the tables?
EDIT:
This problem can be solved by building a flow network and running a maximum-flow algorithm. But with 2*10^3 vertices, Dinic's algorithm gives a global complexity of O(10^6*10^6) = O(10^12).
If instead we always seat the larger families first, in a greedy manner, the complexity is O(10^6).
So my questions are:
1) Does the greedy approach in this problem work?
2) What is the best algorithm to solve this problem?

Yes, greedily seating the largest families first is a correct solution. We just need to prove that, after we seat the next largest family, there is a way to seat the remaining families correctly.
Suppose that an instance is solvable. We prove by induction that there exists a solution after the greedy algorithm seats the k largest families. The basis k = 0 is obvious, since the hypothesis to be proved is that there exists a solution. Inductively, suppose that there exists a solution that extends greedy's partial assignment for the first k - 1 families. Now greedy extends its partial assignment by seating the kth family. We edit the known solution to restore the inductive hypothesis.
While we still can, find a table T1 where greedy has seated a kth family member but the known solution has not. If there is space at T1 in the known solution, move a kth family member there from a table where greedy has seated no member of the kth family. Otherwise, the known solution seats at T1 a member of some family outside the k largest. Since that family is no larger than the kth largest, there is a table T2 where a kth family member sits but that smaller family has no one. Swap these two members.

It is easy to come up with examples where such a seating is simply impossible, so here is pseudocode for solving the problem, assuming that it is solvable (a runnable sketch follows the complexity analysis below):
Sort the families by a(i) in decreasing order
Add each table j to a max-heap with b(j) as the key
For each family i in the sorted order:
    Pop a(i) tables from the max-heap
    Seat one member of family i at each popped table
    Push each popped table j back onto the max-heap with key b(j) = b(j) - 1, but only if the new b(j) > 0
Let n = a(1) + a(2) + ... + a(p) (i.e. total number of people)
Assuming a binary heap is used for the max-heap, the time complexities are:
Sorting families: O(p log p)
Initializing the max-heap of tables: O(q log q)
All pops and pushes to/from the max-heap: O(n log q)
Giving the total time complexity of O(p log p + q log q + n log q), where O(n log q) will likely dominate.
Since the capacities are integers, if we use a 1D bucket structure in place of the max-heap, where c is the maximum b(j), then we end up with just O(n + c) (assuming the max-heap operations dominate), which may be quicker.
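For concreteness, here is a minimal Python sketch of the pseudocode above (the function name, the list-based input format, and the explicit feasibility check are my own assumptions). It returns one valid seating, or None when some family cannot be split across enough non-full tables:

import heapq

def seat_families(families, tables):
    # families[i] = size of family i, tables[j] = capacity of table j.
    # Returns assignment[j] = list of family indices seated at table j,
    # or None if some family cannot be split across enough non-full tables.
    order = sorted(range(len(families)), key=lambda i: families[i], reverse=True)
    # Max-heap of (remaining capacity, table index); keys are negated for heapq.
    heap = [(-cap, j) for j, cap in enumerate(tables) if cap > 0]
    heapq.heapify(heap)
    assignment = [[] for _ in tables]
    for i in order:
        if families[i] > len(heap):
            return None                      # greedy fails, so no seating exists
        popped = [heapq.heappop(heap) for _ in range(families[i])]
        for neg_cap, j in popped:
            assignment[j].append(i)
            if neg_cap + 1 < 0:              # table j still has a free seat
                heapq.heappush(heap, (neg_cap + 1, j))
    return assignment

# Families of sizes 1, 2, 2 and tables of capacities 2 and 3.
print(seat_families([1, 2, 2], [2, 3]))      # [[1, 2], [1, 2, 0]]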
Finally, please up-vote David's answer as the proof was required and is awesome.

Related

Efficient divide-and-conquer algorithm

At a political event, introducing 2 people reveals whether or not they represent the same party.
Suppose more than half of the n attendees represent the same party. I'm trying to find an efficient algorithm that will identify the representatives of this party using as few introductions as possible.
A brute-force solution would be to maintain two pointers over the array of attendees, introducing each of the n attendees to the n-1 others in O(n^2) time. I can't figure out how to improve on this.
Edit: Formally,
You are given an integer n. There is a hidden array A of size n, such that more than half of the values in A are the same. This array represents the party affiliation of each person.
You are allowed queries of the form introduce(i, j), where i ≠ j and 1 <= i, j <= n, which return a boolean value: 1 if A[i] = A[j], and 0 otherwise.
Output: B ⊆ {1, 2, ..., n} where |B| > n/2 and the A-value of every element in B is the same.
Hopefully this explains the problem better.
This can be done in O(n) introductions using the Boyer–Moore majority vote algorithm.
Consider some arbitrary ordering of the attendees: A_1, A_2, ..., A_n. The algorithm maintains a 'stored attendee', denoted by m, together with a counter. For each new attendee x: if the counter is zero, store x and set the counter to 1; otherwise introduce x to m, incrementing the counter if they are from the same party and decrementing it otherwise. Because more than half of the attendees belong to one party, the stored attendee at the end is a member of the majority party. Then you can do another pass over the other n - 1 people, introducing each of them to this known member, and hence find all the members of the majority party.
Thus, the total number of introductions is O(n).
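Here is a minimal sketch of that two-pass scheme, assuming the only access to affiliations is an introduce(i, j) oracle (the oracle, the toy array A, and the function name are illustrative, not part of the original problem):

def find_majority_party(n, introduce):
    # Boyer-Moore majority vote, using only pairwise introductions.
    candidate, count = 0, 1                  # store attendee 0 with a count of 1
    for x in range(1, n):
        if count == 0:
            candidate, count = x, 1          # adopt x as the new stored attendee
        elif introduce(candidate, x):
            count += 1
        else:
            count -= 1
    # candidate now belongs to the majority party; gather everyone who matches.
    # At most 2(n - 1) introductions are used in total.
    return [candidate] + [x for x in range(n)
                          if x != candidate and introduce(candidate, x)]

# Toy oracle over a hidden affiliation array (for testing only).
A = [1, 2, 1, 1, 3, 1, 1]
print(sorted(find_majority_party(len(A), lambda i, j: A[i] == A[j])))  # [0, 2, 3, 5, 6]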

Knapsack with unique elements

I'm trying to solve the following:
The knapsack problem is as follows: given a set of integers S={s1,s2,…,sn}, and a given target number T, find a subset of S that adds up exactly to T. For example, within S={1,2,5,9,10} there is a subset that adds up to T=22 but not T=23. Give a correct programming algorithm for knapsack that runs in O(nT) time.
but the only algorithm I could come up with is generating all combinations of 1 to n elements and trying out their sums (exponential time).
I can't devise a dynamic programming solution, since the fact that I can't reuse an element makes this problem different from the coin change problem and from the general knapsack problem.
Can somebody help me out with this or at least give me a hint?
The O(nT) running time gives you the hint: do dynamic programming on two axes. That is, let f(a,b) denote the maximum sum <= b which can be achieved with the first a integers.
f satisfies the recurrence
f(a,b) = max( f(a-1,b), f(a-1,b-s_a)+s_a )   (the second option only when s_a <= b)
since the first value is the maximum without using s_a and the second is the maximum including s_a. From here the DP algorithm should be straightforward, as should outputting the correct subset of S.
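A short sketch of that DP in 0-indexed Python (the function name and the table-walking reconstruction are my own), using the example set from the question:

def subset_sum(S, T):
    # f[a][b] = maximum sum <= b achievable using the first a integers of S.
    n = len(S)
    f = [[0] * (T + 1) for _ in range(n + 1)]
    for a in range(1, n + 1):
        s = S[a - 1]
        for b in range(T + 1):
            f[a][b] = f[a - 1][b]                            # skip s
            if s <= b:
                f[a][b] = max(f[a][b], f[a - 1][b - s] + s)  # take s
    if f[n][T] != T:
        return None                                          # no subset adds up to T
    subset, b = [], T
    for a in range(n, 0, -1):                                # walk back to recover a subset
        if f[a][b] != f[a - 1][b]:
            subset.append(S[a - 1])
            b -= S[a - 1]
    return subset

print(subset_sum([1, 2, 5, 9, 10], 22))   # [10, 9, 2, 1]
print(subset_sum([1, 2, 5, 9, 10], 23))   # None

Both the table fill and the walk back are O(nT), matching the required bound.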
I did find a solution, but with O(T·n^2) time complexity, by filling the table from the bottom up: sort the array, start with the greatest number available, and build a table whose columns are the target values and whose rows are the given numbers. For each entry we have to consider all the ways of making the remaining value i - cost[j], which takes O(n^2) time, and this is then multiplied by the target.

Variant Scheduling Algorithm

I'm working on a problem from "Algorithm Design" by Kleinberg, specifically problem 4.15. I'm not currently enrolled in the class that this relates to -- I'm taking a crack at the problem set before the new quarter starts to see if I'd be able to do it. The question is as follows:
The manager of a large student union on campus comes to you with the following problem. She’s in charge of a group of n students, each of whom is scheduled to work one shift during the week. There are different jobs associated with these shifts (tending the main desk, helping with package delivery, rebooting cranky information kiosks, etc.), but we can view each shift as a single contiguous interval of time. There can be multiple shifts going on at once.
She’s trying to choose a subset of these n students to form a supervising committee that she can meet with once a week. She considers such a committee to be complete if, for every student not on the committee, that student’s shift overlaps (at least partially) the shift of some student who is on the committee. In this way, each student’s performance can be observed by at least one person who’s serving on the committee.
Give an efficient algorithm that takes the schedule of n shifts and produces a complete supervising committee containing as few students as possible.
Example. Suppose n = 3, and the shifts are
Monday 4 P.M.-Monday 8 P.M.,
Monday 6 P.M.-Monday 10 P.M.,
Monday 9 P.M.-Monday 11 P.M.
Then the smallest complete supervising committee would consist of just the second student, since the second shift overlaps both the first and the third.
My attempt (I can't find this problem in my solution manual, so I'm asking here):
Construct a graph G with vertices S1, S2, ..., Sn for each student.
Let there be an edge between Si and Sj iff students i and j have an overlapping
shift. Let C represent the set of students in the supervising committee.
[O(n + 2m) to build an adjacency list, where m is the number of overlapping pairs of shifts? Since we have to add an entry for each student to the adjacency list, and then two entries per overlapping pair, because our graph is undirected.]
Sort the vertices by degree, in decreasing order, into a list S [O(n log n)].
While S[0] has degree > 0:
    (1) Add S[0] to C. [O(1)]
    (2) Delete S[0] and all of the nodes it was connected to, and update the adjacency list.
    (3) Update S so that it is once again sorted.
Add any remaining vertices of degree 0 to C.
I'm not sure how to quantify the runtime of (2) and (3). Since the degree of any node is bounded by n, it seems that (2) is bounded by O(n). But the degree of the node removed in (1) also affects the number of iterations performed inside of the while loop, so I suspect that it's possible to say something about the upper bound of the whole while loop -- something to the effect of "Any sequence of deletions will involve deleting at most n nodes in linear time and resorting at most n nodes in linear time, resulting in an upper bound of O(n log n) for the while loop, and therefore of the algorithm as a whole."
You don't want to convert this to a general graph problem, as then it's essentially the NP-hard minimum dominating set problem (every student not on the committee must be adjacent to a committee member). However, on interval graphs in particular, there is in fact a linear-time greedy algorithm, as described in this paper (which is actually for a more general problem, but works fine here). From a quick read of it, here's how it applies to your problem (a code sketch follows the steps):
Sort the students by the time at which their shift ends, from earliest to latest. Number them 1 through n.
Initialize a counter k = 1 which represents the earliest student in the ordering not in the committee.
Starting from k, find the first student in the order whose shift does not intersect student k's shift. Suppose this is student i. Add student i-1 to the committee (if every later student's shift intersects student k's, add the last student, n, instead), and update k to be the new earliest student not covered by the committee.
Repeat the previous step until all students are covered.
(This feels correct, but like I said I only had a quick read, so please say if I missed something)
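Here is a direct, unoptimized rendering of those steps as code (O(n^2) as written, unlike the paper's linear-time version); the representation of shifts as (start, end) pairs and all names are my own assumptions:

def choose_committee(shifts):
    # shifts: one (start, end) interval per student.
    # Returns indices (into shifts) of a complete supervising committee.
    order = sorted(range(len(shifts)), key=lambda s: shifts[s][1])   # by end time

    def overlaps(a, b):
        return shifts[a][0] <= shifts[b][1] and shifts[b][0] <= shifts[a][1]

    committee = []
    covered = [False] * len(shifts)
    k = 0                        # position in `order` of the earliest uncovered student
    while k < len(order):
        # First student after k (by end time) whose shift misses student k's shift.
        i = k + 1
        while i < len(order) and overlaps(order[i], order[k]):
            i += 1
        pick = order[i - 1]      # the latest-ending shift that still overlaps student k
        committee.append(pick)
        for j in range(k, len(order)):           # mark everyone pick observes
            if overlaps(order[j], pick):
                covered[order[j]] = True
        while k < len(order) and covered[order[k]]:
            k += 1
    return committee

# The textbook example: shifts 4-8 p.m., 6-10 p.m., 9-11 p.m. (hours on a 24h clock).
print(choose_committee([(16, 20), (18, 22), (21, 23)]))   # [1], i.e. the second student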

Does a greedy approach work here?

Suppose there are N groups of people and M tables. We know the size of each group and the capacity of each table. How do we match the people to the tables such that no two persons of the same group sit at the same table?
Does a greedy approach work for this problem? (The greedy approach works as follows: for each table, try to "fill" it with people from different groups.)
Assuming the groups and tables can be of unequal size, I don't think the greedy approach as described works (at least not without additional specifications). Suppose you have a table T1 of capacity 2 and a table T2 of capacity 3, and 3 groups {A1}, {B1,B2} and {C1,C2}. If I follow your algorithm, T1 will receive {A1,B1} and you are then left with T2 and {B2,C1,C2}, which doesn't work. Yet there is a solution: T1 = {B1,C1}, T2 = {A1,B2,C2}.
I suspect the following greedy approach works: starting with the largest group, take each group and allocate one person of that group per table, picking first tables with the most free seats.
Mathias:
I suspect the following greedy approach works: starting with the largest group, take each group and allocate one person of that group per table, picking first tables with the most free seats.
Indeed. And a small variation of tkleczek's argument proves it.
Suppose there is a solution. We have to prove that the algorithm finds a solution in this case.
This is vacuously true if the number of groups is 0.
For the induction step, we have to show that if there is any solution, there is one where one member of the largest group sits at each of the (size of largest group) largest tables.
Condition L: For all pairs (T1,T2) of tables, if T1 < T2 and a member of the largest group sits at T1, then another member of the largest group sits at T2.
Let S1 be a solution. If S1 fulfills L we're done. Otherwise there is a pair (T1,T2) of tables with T1 < T2 such that a member of the largest group sits at T1 but no member of the largest group sits at T2.
Since T2 > T1, there is a group which has a member sitting at T2, but none at T1 (or there is a free place at T2). So these two can swap seats (or the member of the largest group can move to the free place at T2) and we obtain a solution S2 with fewer pairs of tables violating L. Since there's only a finite number of tables, after finitely many steps we have found a solution Sk satisfying L.
Induction hypothesis: For all constellations of N groups and all numbers M of tables, if there is a solution, the algorithm will find a solution.
Now consider a constellation of (N+1) groups and M tables where a solution exists. By the above, there is also a solution where the members of the largest group are placed according to the algorithm. Place them so. This reduces the problem to a solvable constellation of N groups and M' tables, which is solved by the algorithm per the induction hypothesis.
The following greedy approach works:
Repeat the following steps until everyone is seated (or no free seat is left):
Pick the largest group and the largest table
Match one person from the chosen group to the chosen table
Reduce group size and table size by 1.
Proof:
We just have to prove that after performing one step we still can reach optimal solution.
Let's call any member of the largest group a cool guy.
Suppose that there is a different optimal solution in which no cool guy sits at the largest table. Pick any person sitting at the largest table in this solution and call him the lame guy.
He must belong to a group no larger than the cool group. So there is another table at which a cool guy sits but no member of the lame guy's group does. We can then safely swap the seats of that cool guy and the lame guy, which again yields an optimal solution.

Revisit: 2D Array Sorted Along X and Y Axis

So, this is a common interview question. There's already a topic up, which I have read, but it's dead, and no answer was ever accepted. On top of that, my interests lie in a slightly more constrained form of the question, with a couple practical applications.
Given a two dimensional array such that:
Elements are unique.
Elements are sorted along the x-axis and the y-axis.
Neither sort predominates, so neither sort is a secondary sorting parameter.
As a result, the diagonal is also sorted.
All of the sorts can be thought of as moving in the same direction. That is to say that they are all ascending, or that they are all descending.
Technically, I think as long as you have a >/=/< comparator, any total ordering should work.
Elements are numeric types, with a single-cycle comparator.
Thus, memory operations are the dominating factor in a big-O analysis.
How do you find an element? Only worst case analysis matters.
Solutions I am aware of:
A variety of approaches that are:
O(n log n), where you approach each row separately.
O(n log n) with strong best and average performance.
One that is O(n+m) (sketched in code after this list):
Start in a non-extreme corner, which we will assume is the bottom right.
Let the target be J, and let M be the value at the current position.
If M is greater than J, move left.
If M is less than J, move up.
If you can do neither, you are done, and J is not present.
If M is equal to J, you are done.
Originally found elsewhere, most recently stolen from here.
And I believe I've seen one with a worst case of O(n+m) but a best case of nearly O(log(n)).
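For reference, here is a quick sketch of that O(n+m) corner walk (the names and the example grid are mine). It starts from the top-right corner of a matrix that ascends to the right and downward, which plays the same role as the "non-extreme" corner described above:

def staircase_search(matrix, target):
    # Rows ascend left-to-right, columns ascend top-to-bottom.
    # Walk from the top-right corner: too big -> step left, too small -> step down.
    if not matrix or not matrix[0]:
        return None
    i, j = 0, len(matrix[0]) - 1
    while i < len(matrix) and j >= 0:
        if matrix[i][j] == target:
            return (i, j)                # found: (row, column)
        if matrix[i][j] > target:
            j -= 1                       # everything below in this column is larger still
        else:
            i += 1                       # everything to the left in this row is smaller still
    return None                          # walked off the grid: target is absent

grid = [[ 1,  2,  4,  5,  6],
        [ 3,  7,  8,  9, 10],
        [11, 12, 13, 14, 15]]
print(staircase_search(grid, 9))    # (1, 3)
print(staircase_search(grid, 20))   # None

Each step discards either a row or a column, so the walk makes at most n + m - 1 probes, and it detects absence as soon as it falls off the grid.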
What I am curious about:
Right now, I have proved to my satisfaction that the naive partitioning attack always devolves to O(n log n). Partitioning attacks in general appear to have an optimal worst case of O(n+m), and most do not terminate early in cases of absence. I was also wondering, as a result, whether an interpolation probe might not be better than a binary probe, and thus it occurred to me that one might think of this as a set-intersection problem with a weak interaction between the sets. My mind cast immediately towards Baeza-Yates intersection, but I haven't had time to draft an adaptation of that approach. However, given my suspicion that the optimality of an O(n+m) worst case is provable, I thought I'd just go ahead and ask here, to see if anyone could bash together a counter-argument, or pull together a recurrence relation for interpolation search.
Here's a proof that it has to be at least Omega(min(n,m)). Let n >= m. Then consider the matrix which has all 0s at (i,j) where i+j < m, all 2s where i+j >= m, except for a single (i,j) with i+j = m which has a 1. This is a valid input matrix, and there are m possible placements for the 1. No query into the array (other than the actual location of the 1) can distinguish among those m possible placements. So you'll have to check all m locations in the worst case, and at least m/2 expected locations for any randomized algorithm.
One of your assumptions was that matrix elements have to be unique, and I didn't do that. It is easy to fix, however, because you just pick a big number X=n*m, replace all 0s with unique numbers less than X, all 2s with unique numbers greater than X, and 1 with X.
And because it is also Omega(lg n) (counting argument), it is Omega(m + lg n) where n>=m.
An optimal O(m+n) solution is to start at the top-left corner, which has the minimal value. Move diagonally down and to the right until you hit an element whose value >= the value of the given element. If that element's value equals the given element's value, return found as true.
Otherwise, from here we can proceed in two ways.
Strategy 1:
Move up in the column and search for the given element until we reach the end. If found, return found as true
Move left in the row and search for the given element until we reach the end. If found, return found as true
return found as false
Strategy 2:
Let i denote the row index and j denote the column index of the diagonal element we have stopped at. (Here, we have i = j, BTW). Let k = 1.
Repeat the steps below while i-k >= 0:
Search if a[i-k][j] is equal to the given element. if yes, return found as true.
Search if a[i][j-k] is equal to the given element. if yes, return found as true.
Increment k
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11
