Problem of assigning some values to a set from multiple options algo - algorithm

I have a problem statement where I have some sets, and each set has a list of options. A fixed number of those options needs to be assigned to each set.
Some options can be common to multiple sets, but no option can be assigned to more than one set. I need an algorithm to achieve this. A rough example is
Set1 has options [100,101,102,103] - 2 needs to be selected,
Set2 has options [101,102,103,104] - 2 needs to be selected,
Set3 has options [99,100,101] - 2 needs to be selected,
so the possible solution is
Set1 gets 100,102
Set2 gets 103,104
Set3 gets 99,101
Can anyone suggest an approach for a generic solution to this problem?

This can be modelled as an instance of the bipartite graph matching problem.
Let A be the numbers which appear in the lists (combined into one set with no duplicates), and let B be the lists themselves, each repeated according to how many elements need to be selected from them. There is an edge from a number a ∈ A to a list b ∈ B whenever the number a is in the list b.
Therefore this problem can be approached using an algorithm which finds a matching that covers every vertex of B (a "perfect matching", i.e. a matching which includes all vertices, when A and B have the same size as in the example). Wikipedia lists several algorithms which can be used to find matchings as large as possible, including the Ford–Fulkerson algorithm and the Hopcroft–Karp algorithm.
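As a rough illustration of the modelling above, here is a minimal sketch using the simple augmenting-path method (often called Kuhn's algorithm) rather than Hopcroft–Karp. The function name `assign_options` and the `sets`/`need` dictionaries are assumptions made for this example, not anything from the question.

```python
def assign_options(sets, need):
    """sets: name -> list of options; need: name -> how many to pick.

    Returns name -> sorted list of chosen options, or None if no valid
    assignment exists. Hypothetical helper for illustration only.
    """
    # One "slot" per required pick, e.g. ("Set1", 0) and ("Set1", 1).
    slots = [(name, i) for name in sets for i in range(need[name])]
    owner = {}  # option -> slot currently holding it

    def try_assign(slot, seen):
        # Look for an augmenting path starting at this slot.
        for opt in sets[slot[0]]:
            if opt in seen:
                continue
            seen.add(opt)
            # Take a free option, or evict the holder if it can move elsewhere.
            if opt not in owner or try_assign(owner[opt], seen):
                owner[opt] = slot
                return True
        return False

    for slot in slots:
        if not try_assign(slot, set()):
            return None  # some slot cannot be filled

    result = {name: [] for name in sets}
    for opt, (name, _) in owner.items():
        result[name].append(opt)
    return {name: sorted(opts) for name, opts in result.items()}


example = {
    "Set1": [100, 101, 102, 103],
    "Set2": [101, 102, 103, 104],
    "Set3": [99, 100, 101],
}
print(assign_options(example, {"Set1": 2, "Set2": 2, "Set3": 2}))
```

Which of the valid assignments is printed depends on the order in which options are tried; any result gives every set two distinct options with no option shared.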

Thanks #kaya3, I could not recall the exact algorithm, and the reminder that this is a bipartite graph matching problem was really helpful.
But it wasn't giving me the exact solution when I needed n options for each set, so I followed this approach:
A = [99,100,101,102,103,104]
B = [a1, a2, b1, b2, c1, c2]
# Each set is repeated because I need 2 instances of each.
# The graph then looks like this:
99 => [c1, c2]
100 => [a1, a2, c1, c2]
101 => [a1, a2, b1, b2, c1, c2]
102 => [a1, a2, b1, b2]
103 => [a1, a2, b1, b2]
104 => [b1, b2]
Now it gives the correct solution every time. I tried it with multiple use cases. Repeating

Related

Find variations of numbers

I'm sorry I don't know what's the proper title for this because I don't know what topic this question falls into.
So for example there are 5 people. They want to stay in a hotel. This hotel allows at least 1 and at most 2 lodgers per room. That means there are a few possible variations for this.
1-1-1-1-1 (1 room for each person)
1-2-2 (1 person stays alone, the other 4 are divided into 2 rooms)
1-1-1-2 (... and so on)
What is the algorithm to find these variations?
This is a combinatorial question, and the abstract version is typically called balls & bins. A key question is whether the balls are distinguishable. Ditto for the bins.
In your example, the balls are people and the bins are rooms. If the rooms are distinguishable, you'll also need the total number available.
Let's say neither is distinguishable. Then the only question is how many pairs we have, with the options being 0, 1, or 2, so there are 3 solutions.
If people are distinguishable but not rooms (balls but not bins), then we care who is in the pairs. In this case 1-1-1-1-1 has a single solution, 1-1-1-2 has choose(5,2) = 10 solutions (all the ways we can choose who is in the lone pair), and 1-2-2 has choose(5,2) * choose(3,2) / 2 = 10 * 3 / 2 = 15 solutions (choose who is in the first pair, then the second, then divide by 2 to avoid double-counting where the order is reversed). Total solutions: 1 + 10 + 15 = 26.
If people and rooms are both distinguishable, then for each of the solutions above we care which room each person or pair goes in. This will depend on the total number of rooms available. If there are R rooms available, then a solution which uses r rooms where rooms aren't distinguishable will need to be multiplied by R!/(R-r)!.
E.g. 1-1-1-2 has 10 solutions where rooms are indistinguishable. If the hotel has 5 rooms then we multiply that by 5!/(5-4)! = 120 to get 1200 solutions.
If the people are a,b,c,d,e and there are 5 rooms numbered 1,2,3,4,5, then the solutions where b+d are paired up, a is in room 1, and c is in room 2 are:
a1, c2, e3, bd4
a1, c2, e3, bd5
a1, c2, e4, bd3
a1, c2, e4, bd5
a1, c2, e5, bd3
a1, c2, e5, bd4
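The counting rules described in this answer can be checked with a few lines of Python. This is only a sketch: the helper names `groupings` and `with_rooms` are made up for the example, and it assumes rooms hold one or two people as in the question.

```python
from math import comb, factorial, perm


def groupings(people, pairs):
    """Ways to split distinguishable people into `pairs` unordered pairs
    plus singles, when the rooms themselves are indistinguishable."""
    ways = 1
    remaining = people
    for _ in range(pairs):
        ways *= comb(remaining, 2)  # choose who forms the next pair
        remaining -= 2
    return ways // factorial(pairs)  # the pairs have no order among themselves


def with_rooms(people, pairs, total_rooms):
    """Same count when the hotel's rooms are distinguishable."""
    rooms_used = people - pairs  # every pair saves one room
    return groupings(people, pairs) * perm(total_rooms, rooms_used)


print([groupings(5, p) for p in range(3)])  # zero, one and two pairs
print(with_rooms(5, 1, 5))                  # the 1-1-1-2 case with 5 rooms
```

Here `perm(R, r)` is R!/(R-r)!, the multiplier mentioned above for placing r groups into R distinguishable rooms.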
You can consider the above problem similar to the coin change problem, where you are given a sum and a set of coins and have to find the number of ways to make the sum using those coins.
Here:
coins = {1,2}
sum = Number of people
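A minimal sketch of that coin change count, assuming the order of rooms does not matter (so 1-2-2 and 2-1-2 are the same variation). The function name `count_ways` is chosen just for this example.

```python
def count_ways(total, coins=(1, 2)):
    """Number of unordered ways to write `total` as a sum of the coins."""
    ways = [1] + [0] * total  # ways[0] = 1: the empty selection
    for coin in coins:
        # Handling one coin value at a time keeps combinations unordered.
        for amount in range(coin, total + 1):
            ways[amount] += ways[amount - coin]
    return ways[total]


print(count_ways(5))  # 3: 1-1-1-1-1, 1-1-1-2 and 1-2-2
```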

Interview question: minimum number of swaps to make couples sit together

This is an interview question, and the problem description is as follows:
There are n couples sitting in a row with 2n seats. Find the minimum number of swaps to make everyone sit next to his/her partner. For example, 0 and 1 are a couple, and 2 and 3 are a couple. Originally they are sitting in a row in this order: [2, 0, 1, 3]. The minimum number of swaps is 1, for example swapping 2 with 1.
I know there is a greedy solution for this problem. You just need to scan the array from left to right. Every time you see an unmatched pair, you swap the first person of the pair to his/her correct position. For example, in the above example for pair [2, 0], you will directly swap 2 with 1. There is no need to try swapping 0 with 3.
But I don't really understand why this works. One of the proofs I saw was something like this:
Consider a simple example: 7 1 4 6 2 3 0 5. At first step we have two choices to match the first couple: swap 7 with 0, or swap 1 with 6. Then we get 0 1 4 6 2 3 7 5 or 7 6 4 1 2 3 0 5. Pay attention that the first couple doesn't count any more. For the later part it is composed of 4 X 2 3 Y 5 (X=6 Y=7 or X=1 Y=0). Since different couples are unrelated, we don't care X Y is 6 7 pair or 0 1 pair. They are equivalent! Thus it means our choice doesn't count.
I feel that this is very reasonable but not compelling enough. In my opinion we have to prove that X and Y are a couple in all possible cases, and I don't know how. Can anyone give a hint? Thanks!
I've split the problem into 3 examples. The A's are a pair and so are the B's in all examples. Note that throughout the examples a match requires that the elements are adjacent and that the first element occupies an index that satisfies index%2 = 0. An array looking like this [X A1 A2 ...] does not satisfy this condition, however this does [X Y A1 A2 ...]. The examples also do not look to the left at all, because looking to the left of A2 below is the same as looking to the right of A1.
First example
There's an even number of elements between two unmatched pairs:
A1 B1 ..2k.. A2 B2 .. for any number k in {0, 1, 2, ..}, meaning A1 B1 A2 B2 .. is just another case.
Both can be matched in one swap:
A1 A2 ..2k.. B1 B2 .. or B2 B1 ..2k.. A2 A1 ..
Order is not important, so it doesn't matter which pair is first. Once the pairs are matched, there will be no more swapping involving either pair. Finding A2 based on A1 will result in the same amount of swaps as finding B2 based on B1.
Second example
There's an odd number of elements between two pairs (2k + the element C):
A1 B1 ..2k.. C A2 B2 D .. (A1 B1 ..2k.. C B2 A2 D .. is identical)
Both cannot be matched in one swap, but like before it doesn't matter which pair is first nor if the matched pair is in the beginning or in the middle part of the array, so all these possible swaps are equally valid, and none of them creates more swaps later on:
A1 A2 ..2k .. C B1 B2 D .. or B2 B1 ..2k.. C A2 A1 D .. Note that the last pair is not matched
C B1 ..2k.. A1 A2 B2 D .. or A1 D ..2k.. C A2 B2 B1 .. Here we're not matching the first pair.
The important thing about this is that in each case, only one pair is matched and none of the elements of that pair will need to be swapped again. The remaining unmatched pair ends up in one of these arrangements:
..2k.. C B1 B2 D ..
..2k.. C A2 A1 D ..
C B1 ..2k.. B2 D ..
A1 D ..2k.. C A2 ..
They are clearly equivalent in terms of swaps needed to match the remaining A's or B's.
Third example
This is logically identical to the second. Both B1/A2 and A2/B2 can have any number of elements between them. No matter how elements are swapped, only one pair can be matched. m1 and m2 are arbitrary number of elements. Note that elements X and Y are just the elements surrounding B2, and they're only used to illustrate the example:
A1 B1 ..m1.. A2 ..m2.. X B2 Y .. (A1 B1 ..m1.. B2 ..m2.. X A2 Y .. is identical)
Again both pairs cannot be matched in one swap, but it's not important which pair is matched, or where the matched pair position is:
A1 A2 ..m1.. B1 ..m2.. X B2 Y .. or B2 B1 ..m1.. A2 ..m2.. X A1 Y .. Note that the last pair is not matched
A1 X ..m1.. A2 ..m2-1.. B1 B2 Y .. or A1 Y ..m1.. A2 ..m2.. X B2 B1.. depending on position of B2. Here we're not matching the first pair.
Matching the pair around A2 is equivalent, but omitted.
As in the second example, one swap can also be matching a pair in the beginning or in the middle of the array, but either choice doesn't change that only one pair is matched. Nor does it change the remaining amount of unmatched pairs.
A little analysis
Keeping in mind that matched pairs drop out of the list of unmatched/problem pairs, that list shrinks by either one or two pairs with each swap. Since it's not important which pair drops out of the problem, it might as well be the first. In that case we can assume that pairs to the left of the cursor/current index are all matched, and that we only need to match the first pair, unless it's already matched by coincidence, in which case the cursor simply moves on.
It becomes even clearer if the above examples are looked at with the cursor at the second unmatched pair instead of the first. It still doesn't matter which pairs are swapped for the total number of swaps needed. So there's no need to try to match pairs in the middle. The resulting number of swaps is the same.
The only time two pairs can be matched with only one swap is in the first example. There is no way to match two pairs in one swap in any other setup. Looking at the result of the swap in the second and third examples, it also becomes clear that none of the results has any advantage over the others and that each result becomes a new problem that can be described as one of the three cases (two cases really, because the second and third are equivalent in terms of match-able pairs).
Optimal swapping
There is no way to modify the array to prepare it for more optimal swapping later on. Either a swap will match one or two pairs, or it will count as a swap with no matches:
Looking at this: A1 B1 ..2k.. C B2 ... A2 ...
Swap to prepare for optimal swap:
A1 B1 ..2k.. A2 B2 ... C ... no matches
A1 A2 ..2k.. B1 B2 ... C ... two in one
Greedy swap:
B2 B1 ..2k.. C A1 ... A2 ... one
B2 B1 ..2k.. A2 A1 ... C ... one
Un-matching pairs
Pairs already matched will not become unmatched because that would require that:
For A1 B1 ..2k.. C A2 B2 D ..
C is identical to A1 or
D is identical to B1
either of which is impossible.
Likewise with A1 B1 ..m1.. (Z) A2 (V) ..m2.. X B2 Y ..
Or it would require that matched pairs are shifted by one (or any odd number of) index inside the array. That's also not possible, because we only ever swap, so the array elements aren't being shifted at all.
[Edited for clarity 4-Mar-2020.]
There is no point doing a swap which does not put (at least) one couple together. To do so would add 1 to the swap count and leave us with the same number of unpaired couples.
So, each time we do a swap, we put one couple together leaving at most n-1 couples. Repeating the process we end up with 1 pair, who must by then be a couple. So, the worst case must be n-1 swaps.
Clearly, we can ignore couples who are already together.
Clearly, where we have two pairs a:B b:A, one swap will create the two couples a:A b:B.
And if we have m pairs a:Q b:A c:B ... q:P -- where the m pairs are a "disjoint subset" (or cycle) of couples, m-1 swaps will put them into couples.
So: the minimum number of swaps is going to be n - s where s is the number of "disjoint subsets" (and s >= 1). [A subset may, of course, contain just one couple.]
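The n - s formula can be sketched with a small union-find over couples. This assumes couples are numbered (0,1), (2,3), ... as in the question, so person p belongs to couple p // 2; the function name `min_swaps_by_cycles` is made up for the example.

```python
def min_swaps_by_cycles(row):
    """Return n - s, where s is the number of disjoint groups of couples."""
    n = len(row) // 2
    parent = list(range(n))  # one union-find node per couple

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Two couples sharing a pair of adjacent seats belong to the same group.
    for i in range(0, len(row), 2):
        a, b = find(row[i] // 2), find(row[i + 1] // 2)
        if a != b:
            parent[a] = b

    groups = sum(1 for c in range(n) if find(c) == c)
    return n - groups


print(min_swaps_by_cycles([2, 0, 1, 3]))              # 1
print(min_swaps_by_cycles([7, 1, 4, 6, 2, 3, 0, 5]))  # 2
```

In the second row, couple (2,3) is already together and forms a group of its own, while the other three couples form one cycle needing two swaps.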
Interestingly, there is nothing clever you can do to reduce the number of swaps. Provided every swap creates a couple you will do the minimum number.
If you wanted to arrange each couple in height order as well, things may or may not be more interesting.
FWIW: having shown that you cannot do better than n-1 swaps for each disjoint set of n couples, the trick then is to avoid the O(n^2) search for each swap. That can be done relatively straightforwardly by keeping a vector with one entry per person, giving where they are currently sat. Then in one scan you pick up each person and if you know where their partner is sat, swap down to make a pair, and update the location of the person swapped up.
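A minimal sketch of that single scan, again assuming couples are (0,1), (2,3), ... so that a person's partner is `person ^ 1`. The `seat` dictionary plays the role of the vector of current positions; the function name is an assumption for the example.

```python
def min_swaps_greedy(row):
    """Greedy left-to-right scan; every swap completes one couple."""
    row = list(row)  # work on a copy
    seat = {person: i for i, person in enumerate(row)}  # where each person sits
    swaps = 0
    for i in range(0, len(row), 2):
        partner = row[i] ^ 1
        if row[i + 1] != partner:
            j = seat[partner]   # partner's current seat, found in O(1)
            other = row[i + 1]  # person who has to move away
            row[i + 1], row[j] = partner, other
            seat[partner], seat[other] = i + 1, j
            swaps += 1
    return swaps


print(min_swaps_greedy([2, 0, 1, 3]))              # 1
print(min_swaps_greedy([7, 1, 4, 6, 2, 3, 0, 5]))  # 2
```

With the position lookup the whole scan is O(n) instead of searching the rest of the row for every swap.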
I will swap every even-positioned member
if he/she doesn't sit beside his/her partner.
Even-positioned here counts seats from 1, i.e. array indices 1, 3, 5 and so on.
The couples are [even, odd] pairs. For example [0, 1], [2, 3], [4, 5] and so on.
The loop will look like this:
for(i=1; i<n*2; i+=2) // where n = # of couples.
Now, we check the members at index i and index i-1. If they are not a couple, then we look for the partner of the (i-1)-th member and, once we have found it, swap it with the i-th member.
For example, say at i=1 we have 6. If the (i-1)-th element is 7 then they form a couple and we don't need any swap (if the (i-1)-th element is 5 then [5, 6] is not a couple). Otherwise we look for the partner of the (i-1)-th element and swap it with the i-th element, so that the (i-1)-th and i-th elements form a couple.
This ensures that we need to check only half of the total members, that is, n.
And, for any non-matched couple, we need a linear search from the i-th position through the rest of the array, which is O(2n), i.e. O(n).
So, the overall complexity of the technique is O(n^2).
In the worst case, the minimum number of swaps is n-1 (that is also the maximum ever needed).
Very straightforward. If you need help to code, let us know.

Hadoop data cross calculation

I have a huge file which contains a list of numbers, for example A0, A1, ..., An. I want to calculate values for all combinations of Ai and Aj using a complex operation (op). So I want the values
op(A1, A2), op(A1, A3), ....., op(An, An-2), op(An, An-1)
or even more: op(A1, A2, A3), op(A1, A2, A4), ....
My question: the huge file containing all the numbers is divided into segments across nodes. How can I get numbers Ai and Aj which are not on the same node?
Thanks

Lexicographical order in Apriori algorithm

I've been working with the Apriori algorithm for a while and I'm wondering about a step in the candidate generation of frequent itemsets.
If I want to join two frequent 3-itemsets into a (candidate) 4-itemset, 2 items of the joined itemsets must be the same and the remaining one different.
For example I can join
{Married: Yes, Age:20, Cars:1} and {Married: Yes, Age:20, Unemployed: No}
to
{Married: Yes, Age:20, Cars:1, Unemployed: No}
But sometimes I read about this step in Apriori algorithm:
I can join two frequent itemsets from L_{k-1} when, in lexicographical order, their first k-2 items are the same and the last ones are different.
But when I order my itemsets from above lexicographically, the first k-2 items are not the same, so I may not join them?!?
{Age:20, Cars:1, Married: Yes} and {Age:20, Married: Yes, Unemployed: No}
I hope that I could explain my problem clearly to you!
Thanks for your help!!
Yes, you should not join them.
Let's take an example.
Let's say that at LEVEL 3, you have the frequent itemsets:
{ A, B, C}
{ A, B, D}
{ A, C, D}
{ B, C, D}
{ B, F, G}
Now let's say that you want to generate candidate itemsets of size 4.
Obviously, you just want to combine itemsets that have 1 different item. Otherwise the result may include itemsets with a size larger than 4. For example, if you could combine BCD and BFG the result would be BCDFG an itemset of size 5, which we don't want. So that is the reason why we only combine itemsets having a single item that is different.
Now, let me explain why we only combine itemsets whose first k-1 items are identical. Here k is the size of the itemsets being joined, so k = 3 and the first 2 items must match; in the question's notation these are the first k-2 items of itemsets from L_{k-1}. The reason is that we don't want to generate the same candidate twice.
For example, if we could combine BCD and ACD we would get ABCD. If we also combine ABC and ABD we would get ABCD again. This is not good because we would generate the same candidate twice! We don't want that! Thus, by ordering itemsets according to the lexicographical order and only combining them if the first k-1 items are the same, we avoid this problem. We would only combine ABC and ABD but we would not combine BCD and ACD. You can find the proof that it works in the Apriori paper.
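To make the join rule concrete, here is a small sketch of the join step only (the pruning step of Apriori, which discards candidates having an infrequent subset, is left out). The function name `generate_candidates` is an assumption for this example.

```python
def generate_candidates(frequent):
    """Join frequent itemsets of equal size that share all but the last item
    (in lexicographical order); returns candidates one item larger."""
    # Sort items inside each itemset, drop duplicates, then sort the itemsets.
    ordered = sorted({tuple(sorted(s)) for s in frequent})
    candidates = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if a[:-1] != b[:-1]:
                break  # sorted order: no later itemset shares this prefix
            candidates.append(a + (b[-1],))
    return candidates


level3 = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "C", "D"},
          {"B", "C", "D"}, {"B", "F", "G"}]
print(generate_candidates(level3))  # [('A', 'B', 'C', 'D')]
```

Because itemsets with the same prefix are adjacent after sorting, ABCD is produced exactly once, from ABC and ABD.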
Hope this helps.

algorithm issue - find the least common subset

The a's are objects with multiple "categories", the b's. For instance, a1 has three categories: b1, b2, b3.
The problem is to reduce the number of categories (which can grow rather large) into groups that always occur together. A "largest common subset" thing.
So for instance, given the following data set:
a1{ b1,b2,b3 }
a2{ b2,b3 }
a3{ b1,b4 }
We can find that b2 and b3 always come together..
b23 = {b2,b3}
..and we can reduce the category set to this:
a1{ b1, b23 }
a2{ b23 }
a3{ b1,b4 }
So, my issue is to find some algorithm to solve this problem.
I have started to look at the Longest Common Subsequence problem, and it might be a solution, i.e. something like repeatedly grouping categories like this b' = LCS(set_of_As) until all categories have been traversed. However, this is not complete. I have to limit the input domain in some way to make this possible.
Am I missing something obvious? Any hints of a problem domain you can point me to? Does anyone recognize any other approach to such a problem?
Transform your sets to have sets of b's that include a's:
b1 { a1, a3 }
b2 { a1, a2 }
b3 { a1, a2 }
b4 { a3 }
Make sure the contents of the new b sets are sorted.
Sort your b sets by their contents.
Any two adjacent sets with the same elements are b's that occur in the same a sets.
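A small sketch of this inversion in Python. Note that it swaps the "sort and compare neighbours" step for a dictionary keyed by the set of a's, which finds the same groups; the function name `group_categories` is made up for the example.

```python
from collections import defaultdict


def group_categories(objects):
    """objects: object name -> set of categories.
    Returns the groups of categories that occur in exactly the same objects."""
    owners = defaultdict(set)  # category -> objects that contain it
    for obj, cats in objects.items():
        for cat in cats:
            owners[cat].add(obj)

    groups = defaultdict(list)  # set of objects -> categories with that set
    for cat, objs in owners.items():
        groups[frozenset(objs)].append(cat)

    return sorted(sorted(g) for g in groups.values())


data = {"a1": {"b1", "b2", "b3"}, "a2": {"b2", "b3"}, "a3": {"b1", "b4"}}
print(group_categories(data))  # [['b1'], ['b2', 'b3'], ['b4']]
```

Any group with more than one category, here b2 and b3, can be replaced by a single merged category such as b23.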
I think you're on the right track with the LCS if you can impose an ordering on the categories (if not, then the LCS algorithm can't recognize that {b3, b4} and {b4, b3} are the same). If you can impose an ordering and sort them, then I think something like this could work:
As = {a1={b1, b2},a2={b3},...}
while ((newgroup = LCS(As)) != empty) {
    for (a in As) {
        replace newgroup in a
    }
}
