Two sets of items. Each element of set A has a unique match in set B. Match each item of set A to an item in set B in O(n log n) time - algorithm

So, to clarify the question:
Set A and Set B
every element in set A has a partner in set B
you cannot sort either set by comparing members of the same set, i.e., each element b of B is indistinguishable from any other element of B (likewise for A).
when Ai is matched with Bi you can tell whether Bi > Ai, Bi < Ai, or Bi = Ai.
design an algorithm with O(nlogn) running time.
The obvious quadratic-time answer is trivial and not helpful -- although it's the best I've come up with so far. The log n factor makes me think I should be using recursion or a binary tree of some sort, but I'm not sure how I could build a binary tree without being able to compare elements from the same set. I'm also not sure how a recursive call would achieve anything more than nested for loops do. Any tips would be greatly appreciated.

You haven't stated it very clearly, but your question looks suspiciously like the Matching Nuts and Bolts problem.
The idea there is to pick a random nut a and find its matching bolt b. Partition the bolts using nut a, partition the nuts using bolt b, and then recurse, as quicksort does.
(Of course, we are talking about the average case being O(n log n), rather than the worst case.)
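For concreteness, here is a minimal Python sketch of that idea, with integers standing in for sizes; note that only cross-set (nut-vs-bolt) comparisons are ever made:

import random

def match_pairs(nuts, bolts):
    # Pair each nut with its bolt in O(n log n) expected comparisons,
    # using only nut-vs-bolt comparisons, never nut-vs-nut.
    if not nuts:
        return []
    pivot_nut = random.choice(nuts)
    # Partition the bolts around the pivot nut; exactly one bolt fits it.
    smaller_bolts = [b for b in bolts if b < pivot_nut]
    larger_bolts = [b for b in bolts if b > pivot_nut]
    pivot_bolt = next(b for b in bolts if b == pivot_nut)
    # Partition the remaining nuts around the matching bolt.
    smaller_nuts = [a for a in nuts if a < pivot_bolt]
    larger_nuts = [a for a in nuts if a > pivot_bolt]
    return (match_pairs(smaller_nuts, smaller_bolts)
            + [(pivot_nut, pivot_bolt)]
            + match_pairs(larger_nuts, larger_bolts))

print(match_pairs([3, 1, 2], [2, 3, 1]))  # [(1, 1), (2, 2), (3, 3)]

As with quicksort, an unlucky run of pivots can degrade this to quadratic time, but the expected number of comparisons is O(n log n).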


Select N items such that their properties are balanced

Let's say I have N objects, each with two associated values A and B. This could be represented as a list of tuples like:
[(3,10), (8,4), (0,0), (20,7),...]
where each tuple is an object and the two values are A and B.
What I want to do is select M of these objects (where M < N) such that the sums of A and B over the selected subset are as balanced as possible. M here is a parameter of the problem; I don't want to find the optimal M. I want to be able to say "give me 100 objects, and make them as balanced as possible".
Any idea if there is an efficient algorithm that can solve this problem (not necessarily completely optimally)? I think this might be related to bin packing, but I'm not really sure.
This is a disguised variant of subset-sum. Replace each (A,B) by A-B, and then the absolute value of the sum of all selected A-B values is the "unbalancedness" of the sums. So you really have to select M of those scalars and try to have a sum as close to 0 as possible.
The "variant" bit is because you have to select exactly M items. (I think this is why your mind went to bin-packing rather than subset-sum.) If you have a black-box subset-sum solver you can account for this too: if the maximum single-pair absolute difference is D, replace each (A,B) by (A-B+D) and have the target sum be M*D. (Don't do that if you're doing a dynamic programming approach, of course, since it increases the magnitude of the numbers you're working with.)
Presuming that you're fine with an approximation (and if you're not, you're gonna have a real bad day), I would tend to use Simulated Annealing or Late Acceptance Hill Climbing as a basic approach, starting with a greedy initial solution (iteratively add whichever object keeps the running difference smallest), and then, in each round, considering randomly replacing one selected object with one not currently selected.
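A bare-bones Python sketch of that approach, using plain hill climbing (only improving swaps are accepted) rather than full annealing; all names here are illustrative:

import random

def balanced_selection(pairs, m, rounds=10000):
    # Greedy start, then random single-swap hill climbing.
    diffs = [a - b for a, b in pairs]
    chosen, rest = [], list(range(len(pairs)))
    total = 0
    for _ in range(m):  # greedy: add whichever item keeps the sum closest to 0
        i = min(rest, key=lambda j: abs(total + diffs[j]))
        rest.remove(i)
        chosen.append(i)
        total += diffs[i]
    for _ in range(rounds):  # local search: try swapping one item in, one out
        out, inn = random.choice(chosen), random.choice(rest)
        if abs(total - diffs[out] + diffs[inn]) < abs(total):
            chosen.remove(out); rest.remove(inn)
            chosen.append(inn); rest.append(out)
            total += diffs[inn] - diffs[out]
    return [pairs[i] for i in chosen]

Switching to simulated annealing would only change the acceptance test: some worsening swaps get accepted with a temperature-dependent probability.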

Extended version of the set cover problem

I don't generally ask questions on SO, so if this question seems inappropriate for SO, just tell me (help would still be appreciated of course).
I'm still a student and I'm currently taking a class in Algorithms. We recently learned about the Branch-and-Bound paradigm and since I didn't fully understand it, I tried to do some exercises in our course book. I came across this particular instance of the Set Cover problem with a special twist:
Let U be a set of elements and S = {S1, S2, ..., Sn} a set of subsets of U, where the union of all sets Si equals U. Outline a Branch-and-Bound algorithm that finds a minimum-size subset Q of S such that every element u in U is contained in at least two sets of Q. In particular, explain how to split the problem into subproblems and how to compute upper and lower bounds.
My first thought was to sort the sets Si in descending order by how many of their elements aren't yet covered at least twice by the currently chosen subsets of S (our current Q). I would then solve this recursively: take the first set Si in the sorted order and make two recursive calls, one where I include Si and one where I don't (meaning that from those calls onward the subset is no longer considered). When including a set Si, I would first increase a counter for each of its elements, so that I eventually know when an element is already covered by two or more chosen subsets. Since I re-sort the not-yet-chosen sets for each recursive call, I would (in my mind at least) always be making the best possible choice at that moment. And since this builds a binary tree of recursive calls, one branch including the current best subset and one excluding it, it eventually covers all 2^n possibilities, so it will find the optimal solution.
My problem is that I don't understand how to implement heuristics for the upper and lower bounds, so the algorithm can discard paths in the binary tree that will never be better than the current best Q. I would appreciate any help I could get.
Here's a simple lower bound heuristic: Find the set containing the largest number of not-yet-twice-covered elements. (It doesn't matter which set you pick if there are multiple sets with the same, largest possible number of these elements.) Suppose there are u of these elements in total, and this set contains k <= u of them. Then, you need to add at least ⌈u/k⌉ further sets before you have a solution. (Why? See if you can prove this.)
This lower bound works for the regular set cover problem too. As always with branch and bound, using it may or may not result in better overall performance on a given instance than simply using the "heuristic" that always returns 0.
First, some advice: don't re-sort S every time you recurse/loop. Sorting is an expensive operation (O(N log N)) so putting it in a loop or a recursion usually costs more than you gain from it. Generally you want to sort once at the beginning, and then leverage that sort throughout your algorithm.
The sort you've chosen, descending by the size of the subsets, is a good "greedy" ordering, so I'd do that once upfront and not re-sort after that. You don't get to skip over non-ideal subsets within your recursion, but checking a redundant or non-ideal subset is still faster than re-sorting every time.
Now what upper/lower bounds can you use? Well generally, you want your bounds and bounds-checking to be as simple and efficient as possible because you are going to be checking them a lot.
With this in mind, an upper bound is easy: use the size of the shortest solution you've found so far. Initialize it as bestQlength = int.MaxValue, or any value greater than n, the number of subsets in S. Then in every recursion, check whether currentQ.length > bestQlength; if so, this branch is over the upper bound and you "prune" it. Obviously, when you find a new solution, you also need to check whether it is better (shorter) than your current bestQ, and if so update bestQ and bestQlength together.
A good lower bound is a bit trickier. The simplest I can think of for this problem: before you add a new subset Si to your currentQ, check whether Si has any elements that are not already covered two or more times by currentQ. If it does not, then Si cannot contribute in any way to the solution you are trying to build, so just skip it and move on to the next subset in S.
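Putting the pieces together (the one-time greedy sort, the incumbent-based upper bound, the skip rule, and the ⌈u/k⌉ lower bound from the other answer), a rough Python skeleton could look like the following. It assumes sets is a list of Python sets that, taken all together, cover every element of universe at least twice (otherwise no solution exists); it's a sketch, not tuned code:

import math

def double_cover(universe, sets):
    order = sorted(range(len(sets)), key=lambda i: -len(sets[i]))  # sort once, upfront
    best = list(range(len(sets)))  # trivial incumbent: take every set

    def uncovered(count):
        return [u for u in universe if count[u] < 2]

    def lower_bound(need, remaining):
        # At least ceil(u/k) more sets are required, where k is the most
        # still-needed elements any single remaining set contains.
        k = max((len(set(need) & sets[i]) for i in remaining), default=0)
        return math.inf if k == 0 else math.ceil(len(need) / k)

    def search(idx, chosen, count):
        nonlocal best
        need = uncovered(count)
        if not need:
            if len(chosen) < len(best):
                best = chosen[:]
            return
        if idx == len(order) or len(chosen) + lower_bound(need, order[idx:]) >= len(best):
            return  # prune: this branch cannot beat the incumbent
        i = order[idx]
        if any(count[u] < 2 for u in sets[i]):  # skip rule: only branch on useful sets
            for u in sets[i]:
                count[u] += 1
            search(idx + 1, chosen + [i], count)
            for u in sets[i]:
                count[u] -= 1
        search(idx + 1, chosen, count)

    search(0, [], {u: 0 for u in universe})
    return [sets[i] for i in best]

print(double_cover({1, 2}, [{1, 2}, {1, 2}, {1}]))  # [{1, 2}, {1, 2}]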

Optimal selection election algorithm

Given a bunch of sets of people (similar to):
[p1,p2,p3]
[p2,p3]
[p1]
[p1]
Select 1 from each set, trying to minimize the maximum number of times any one person is selected.
For the sets above, the max number of times a given person MUST be selected is 2.
I'm struggling to come up with an algorithm for this. I don't think it can be done with a greedy algorithm; I'm thinking more along the lines of a dynamic programming solution.
Any hints on how to go about this? Or do any of you know any good websites about this stuff that I could have a look at?
This is neither dynamic nor greedy. Let's look at a different problem first -- can it be done by selecting every person at most once?
You have P people and S sets. Create a graph with S+P vertices, representing the sets and the people, with an edge between person pi and set si iff pi is an element of si. This graph is bipartite, and the decision version of your problem is equivalent to testing whether the maximum cardinality matching in that graph has size S.
That problem can be solved with a maximum flow algorithm (note: if you don't know what I'm talking about, take your time to read up on it now, as you won't understand the rest otherwise): create a super-source and add an edge from it to every person with capacity 1 (representing that each person may be used at most once), then create a super-sink and add an edge from every set to that sink with capacity 1 (representing that each set must be matched once), and run a suitable max-flow algorithm between source and sink.
Now, let's consider a slightly different problem: can it be done by selecting every person at most k times?
If you paid attention to the remarks in the last paragraph, you should know the answer: just change the capacity of the edges leaving the super-source to k, since each person may now be used up to k times.
Therefore, you now have an algorithm to solve the decision problem in which people are selected at most k times. It's easy to see that if it can be done with k, it can also be done with any value greater than k; that is, feasibility is monotone in k. Therefore, you can run a binary search on the decision version of the problem, looking for the smallest k that still works.
Note: You could also get rid of the binary search by testing each value of k sequentially, and augmenting the residual network obtained in the last run instead of starting from scratch. However, I decided to explain the binary search version as it's conceptually simpler.
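A compact sketch of the whole scheme in Python, assuming networkx is available to supply the max-flow computation (the node labels and helper name are mine):

import networkx as nx  # assumed available for max-flow

def min_max_selections(sets, people):
    # Smallest k such that one pick per set uses nobody more than k times.
    def feasible(k):
        g = nx.DiGraph()
        for p in people:
            g.add_edge("source", ("person", p), capacity=k)  # person used <= k times
        for i, s in enumerate(sets):
            g.add_edge(("set", i), "sink", capacity=1)       # each set picks one person
            for p in s:
                g.add_edge(("person", p), ("set", i), capacity=1)
        flow, _ = nx.maximum_flow(g, "source", "sink")
        return flow == len(sets)

    lo, hi = 1, len(sets)  # k = number of sets always works if no set is empty
    while lo < hi:         # binary search over the monotone decision problem
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

sets = [{"p1", "p2", "p3"}, {"p2", "p3"}, {"p1"}, {"p1"}]
print(min_max_selections(sets, {"p1", "p2", "p3"}))  # 2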

Finding matches in two bags

This was one of the interview questions, and I was wondering if I did it right, and whether my solution is the most efficient one: O(n log n).
Given a bag of nuts and a bag of bolts, each having a different size within a bag but exactly one match in the other bag, give a fast algorithm to find all matches.
Since every component has a match in the other bag, I said there must be the same number of nuts as bolts. Let that number be n.
I would first sort the components in each bag by weight, which is possible because they all have different sizes within a bag. Using merge sort, this takes O(n log n) time.
Next, it is just an easy process of matching the components of the two bags from lightest to heaviest, as sketched below.
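In code, the idea is just two sorts and a zip; a sketch assuming each component's weight is directly measurable as a comparable number:

def match_by_weight(nuts, bolts):
    # Sort each bag by weight, then pair them up position by position:
    # two O(n log n) sorts plus an O(n) zip.
    return list(zip(sorted(nuts), sorted(bolts)))

print(match_by_weight([5, 1, 3], [3, 5, 1]))  # [(1, 1), (3, 3), (5, 5)]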
I want to know if this is the right solution, and also if there is any other interesting way to solve this problem.
There is an O(n log n) solution to this problem that works even if you can't compare two nuts or two bolts directly -- that is, without using a scale, or when the differences in weight are too small to observe.
It works by applying a small variation of randomised quicksort as follows:
Pick a random bolt b.
Compare bolt b with all the nuts, partitioning the nuts into those of size less than b and greater than b.
Now we must partition the bolts into two halves as well, and we can't compare bolt to bolt. But we now know the nut n that matches b, so we compare n with all the bolts, partitioning them into those of size less than n and those greater than n.
The rest of the problem follows directly from randomised quicksort: apply the same steps to the less-than and greater-than partitions of the nuts and bolts.
The standard solution is to put the bag of nuts into a hash table (or HashMap, or dictionary, or whatever your language of choice calls it), and then walk the other bag, using hash lookups to find the matches.
Average performance of this algorithm is O(n), with better constants than your sort/binary-search variation.
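For example, a minimal Python sketch, with integers standing in for exact, hashable size measurements (an assumption the next answer takes issue with):

def match_with_hash(nuts, bolts):
    # O(n) on average: index one bag by size, then look up the other.
    nut_by_size = {nut: nut for nut in nuts}  # in practice, map measured size -> nut
    return [(nut_by_size[bolt], bolt) for bolt in bolts]

print(match_with_hash([2, 3, 1], [1, 2, 3]))  # [(1, 1), (2, 2), (3, 3)]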
Assuming you have no exact measurements of size and can only tell whether something is too large, too small, or exactly right by trying it, you need the extra step of specifying how you sort. That is probably done by taking a random nut, comparing it with each bolt and sorting the bolts into smaller and bigger piles, then taking a bolt and sorting the nuts with it (a modified quicksort; keep in mind you'll have to deal with how to choose a nut or bolt to compare against as the piles get smaller, to make sure you're using a reasonable pivot).
Since the problem doesn't tell you that the nuts and bolts are labeled with sizes, or that you have some way of measuring the nuts and bolts' sizes, it would be difficult to use a hash table to solve this, or even a regular comparison sort.
That looks like the best solution to me, complexity-wise.
An alternative would be to sort only one of the bags, then use binary search to match each bolt to a nut (removing each pair from the sorted list once matched). Still O(n log n); the second part is only slightly optimized, since even with removal the worst case is log n + log(n-1) + ... + log 1 = log(n!) = Θ(n log n).
Take two extra bags, say a left bag and a right bag. Take a nut and compare it with each bolt: if the bolt is smaller, place it in the left bag; if larger, in the right bag; if it fits, set the pair aside. Now take the next nut and compare it with the bolt you just matched: if it's smaller, you only need to check the bolts in the left bag; otherwise, only those in the right bag. Continue in this fashion.

Solution to Recursive Relations with Arrays

I'm from Mexico. I almost never ask questions or open new threads, because between this forum and the rest of the web you can usually find plenty of information on any given topic, but this time I feel truly defeated.
I have two recursion exercises.
Define the following recursive algorithms.
a. Calculate the next n integers.
At first it wasn't clear to me whether this algorithm should return a sum or a set of numbers. Furthermore, although I can design the algorithm in principle, the exercise asks to express it as a recurrence relation, and this is where I am more than lost: I don't know how to express this as a recurrence relation, nor how to solve it.
b. Calculate the minimum of a set of integers
The other exercise asks for the minimum of a set of integers. I can solve that, but expressing it as a recurrence relation has left me completely stuck.
I'd appreciate any help, thanks.
Answering (b):
You have a set of integers. You pick one element, and you know that the minimum is either the element you picked or the minimum of the remaining set. You call the function recursively until you have picked every element from the set, taking the minimum of an empty set to be infinity. The recursion then unwinds, updating the minimal value as it goes.
minimum(S) = min(any element of S, minimum(rest of S))
minimum(empty set) = infinity
This isn't an implementation in any particular language, since that surely depends on how the set is represented.
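For concreteness, though, here is a minimal Python sketch over a list representation:

import math

def minimum(s):
    # minimum(S) = min(first element, minimum(rest of S)); minimum([]) = infinity
    if not s:
        return math.inf
    return min(s[0], minimum(s[1:]))

print(minimum([7, 2, 9]))  # 2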
P.S. Why do this recursively?
