Interview question: minimum number of swaps to make couples sit together - algorithm

This is an interview question, and the problem description is as follows:
There are n couples sitting in a row with 2n seats. Find the minimum number of swaps needed so that everyone sits next to his/her partner. For example, 0 and 1 are a couple, and 2 and 3 are a couple. Originally they are sitting in a row in this order: [2, 0, 1, 3]. The minimum number of swaps is 1, for example swapping 2 with 1.
I know there is a greedy solution for this problem. You just need to scan the array from left to right. Every time you see an unmatched pair, you swap the first person of the pair with the second person's partner, so the pair becomes a couple. For example, in the above example, for the pair [2, 0] you directly swap 2 with 1. There is no need to try swapping 0 with 3.
But I don't really understand why this works. One of the proofs I saw was something like this:
Consider a simple example: 7 1 4 6 2 3 0 5. At the first step we have two choices to match the first couple: swap 7 with 0, or swap 1 with 6. Then we get 0 1 4 6 2 3 7 5 or 7 6 4 1 2 3 0 5. Note that the first couple no longer counts. The latter part is composed of 4 X 2 3 Y 5 (X=6, Y=7 or X=1, Y=0). Since different couples are unrelated, we don't care whether X Y is the 6-7 pair or the 0-1 pair. They are equivalent! Thus our choice doesn't matter.
I feel that this is very reasonable but not compelling enough. In my opinion we have to prove that X and Y are a couple in all possible cases, and I don't know how. Can anyone give a hint? Thanks!

I've split the problem into 3 examples. A's are a pair and so are B's in all examples. Note that throughout the examples a match requires that the elements are adjacent and the first element occupies an index that satisfies index%2 = 0. An array looking like this [X A1 A2 ...] does not satisfy this condition; however, this does: [X Y A1 A2 ...]. The examples also do not look to the left at all, because looking to the left of A2 below is the same as looking to the right of A1.
First example
There's an even number of elements between two unmatched pairs:
A1 B1 ..2k.. A2 B2 .. for any number k in {0, 1, 2, ..}, meaning A1 B1 A2 B2 .. is just another case.
Both can be matched in one swap:
A1 A2 ..2k.. B1 B2 .. or B2 B1 ..2k.. A2 A1 ..
Order is not important, so it doesn't matter which pair is first. Once the pairs are matched, there will be no more swapping involving either pair. Finding A2 based on A1 will result in the same amount of swaps as finding B2 based on B1.
Second example
There's an odd number of elements between two pairs (2k + the element C):
A1 B1 ..2k.. C A2 B2 D .. (A1 B1 ..2k.. C B2 A2 D .. is identical)
Both cannot be matched in one swap, but as before it doesn't matter which pair is matched first, nor whether the matched pair ends up at the beginning or in the middle part of the array. All these possible swaps are equally valid, and none of them creates more swaps later on:
A1 A2 ..2k.. C B1 B2 D .. or B2 B1 ..2k.. C A2 A1 D .. Note that the last pair is not matched.
C B1 ..2k.. A1 A2 B2 D .. or A1 D ..2k.. C A2 B2 B1 .. Here we're not matching the first pair.
The important thing is that in each case, only one pair is matched and none of the elements of that pair will need to be swapped again. The remaining unmatched pair ends up in one of these configurations:
..2k.. C B1 B2 D ..
..2k.. C A2 A1 D ..
C B1 ..2k.. B2 D ..
A1 D ..2k.. C A2 ..
They are clearly equivalent in terms of swaps needed to match the remaining A's or B's.
Third example
This is logically identical to the second. Both B1/A2 and A2/B2 can have any number of elements between them. No matter how the elements are swapped, only one pair can be matched. m1 and m2 are arbitrary numbers of elements. Note that the elements X and Y are just the elements surrounding B2; they're only used to illustrate the example:
A1 B1 ..m1.. A2 ..m2.. X B2 Y .. (A1 B1 ..m1.. B2 ..m2.. X A2 Y .. is identical)
Again both pairs cannot be matched in one swap, but it's not important which pair is matched, or where the matched pair position is:
A1 A2 ..m1.. B1 ..m2.. X B2 Y .. or B2 B1 ..m1.. A2 ..m2.. X A1 Y .. Note that the last pair is not matched
A1 X ..m1.. A2 ..m2-1.. B1 B2 Y .. or A1 Y ..m1.. A2 ..m2.. X B2 B1 .., depending on the position of B2. Here we're not matching the first pair.
Matching the pair around A2 is equivalent, but omitted.
As in the second example, one swap can also match a pair at the beginning or in the middle of the array, but either choice doesn't change the fact that only one pair is matched, nor does it change the remaining number of unmatched pairs.
A little analysis
Keeping in mind that matched pairs drop out of the list of unmatched/problem pairs, the list of unmatched pairs shrinks by one or two pairs with each swap. Since it's not important which pair drops out of the problem, it might as well be the first. In that case we can assume that pairs to the left of the cursor/current index are all matched, and that we only need to match the first pair, unless it's already matched by coincidence, in which case the cursor simply moves on.
It becomes even clearer if the above examples are looked at with the cursor at the second unmatched pair instead of the first. It still doesn't matter which pairs are swapped; the total number of swaps needed is the same. So there's no need to try to match pairs in the middle.
The only configuration where two pairs can be matched with a single swap is the one in the first example; there is no way to match two pairs in one swap in any other setup. Looking at the result of the swap in the second and third examples, it also becomes clear that none of the results has any advantage over the others, and each result becomes a new problem that can be described as one of the three cases (two cases really, because the second and third are equivalent in terms of matchable pairs).
Optimal swapping
There is no way to modify the array to prepare it for more optimal swapping later on. Either a swap will match one or two pairs, or it will count as a swap with no matches:
Looking at this: A1 B1 ..2k.. C B2 ... A2 ...
Swap to prepare for optimal swap:
A1 B1 ..2k.. A2 B2 ... C ... no matches
A1 A2 ..2k.. B1 B2 ... C ... two in one
Greedy swap:
B2 B1 ..2k.. C A1 ... A2 ... one
B2 B1 ..2k.. A2 A1 ... C ... one
Un-matching pairs
Pairs already matched will not become unmatched because that would require that:
For A1 B1 ..2k.. C A2 B2 D ..
C is identical to A1 or
D is identical to B1
either of which is impossible.
Likewise with A1 B1 ..m1.. (Z) A2 (V) ..m2.. X B2 Y ..
Or it would require that matched pairs are shifted one (or any odd number of) index inside the array. That's also not possible, because we only ever swap, so the array elements aren't being shifted at all.

[Edited for clarity 4-Mar-2020.]
There is no point doing a swap which does not put (at least) one couple together. To do so would add 1 to the swap count and leave us with the same number of unpaired couples.
So, each time we do a swap, we put one couple together leaving at most n-1 couples. Repeating the process we end up with 1 pair, who must by then be a couple. So, the worst case must be n-1 swaps.
Clearly, we can ignore couples who are already together.
Clearly, where we have two pairs a:B b:A, one swap will create the two couples a:A b:B.
And if we have m pairs a:Q b:A c:B ... q:P -- where the m pairs are a "disjoint subset" (or cycle) of couples, m-1 swaps will put them into couples.
So: the minimum number of swaps is going to be n - s where s is the number of "disjoint subsets" (and s >= 1). [A subset may, of course, contain just one couple.]
Interestingly, there is nothing clever you can do to reduce the number of swaps. Provided every swap creates a couple you will do the minimum number.
If you wanted to arrange each couple in height order as well, things may or may not be more interesting.
FWIW: having shown that you cannot do better than n-1 swaps for each disjoint set of n couples, the trick is then to avoid the O(n^2) search for each swap. That can be done relatively straightforwardly by keeping a vector with one entry per person, recording where they are currently sitting. Then in one scan you pick up each person, look up where their partner is sitting, swap the partner down to make a pair, and update the location of the person swapped up.
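For illustration, here is a minimal Python sketch of that single scan, assuming (as in the question) that persons 2k and 2k+1 are partners:

def min_swaps(seats):
    # pos[p] = seat index where person p currently sits
    pos = [0] * len(seats)
    for i, p in enumerate(seats):
        pos[p] = i
    swaps = 0
    for i in range(0, len(seats), 2):
        partner = seats[i] ^ 1            # partner of 2k is 2k+1, and vice versa
        if seats[i + 1] != partner:
            j = pos[partner]              # where the partner currently sits
            seats[i + 1], seats[j] = seats[j], seats[i + 1]
            pos[seats[j]] = j             # update the person swapped up
            pos[partner] = i + 1
            swaps += 1
    return swaps

min_swaps([2, 0, 1, 3])  # 1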

I will swap every even-positioned member if he/she doesn't sit beside his/her partner.
Even-positioned means array indices 1, 3, 5 and so on (the second seat of each pair).
The couples are [even, odd] pairs, for example [0, 1], [2, 3], [4, 5] and so on.
The loop will look like this:
for (i = 1; i < n*2; i += 2) // where n = number of couples
Now we check the (i-1)-th and i-th members. If they are not a couple, we look for the partner of the (i-1)-th member, and once we have found it, we swap it with the i-th member.
For example, say at i=1 we got 6. If the (i-1)-th element is 7, then they form a couple (if the (i-1)-th element is 5, then [5, 6] is not a couple) and we don't need any swap; otherwise we look for the partner of the (i-1)-th element and swap it with the i-th element. So the (i-1)-th and i-th elements will form a couple.
This ensures that we need to check only half of the total members, that is, n.
And for any non-matched couple, we need a linear search from the i-th position through the rest of the array, which is O(2n), effectively O(n).
So the overall complexity will be O(n^2).
In the worst case, the minimum number of swaps will be n-1 (this is the maximum as well).
Very straightforward. If you need help to code, let us know.
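Since the offer stands, here is a minimal Python sketch of the loop described above (couples are the [even, odd] pairs, and p ^ 1 flips between the two members of a couple):

def min_swaps(seats):
    swaps = 0
    for i in range(1, len(seats), 2):
        partner = seats[i - 1] ^ 1            # partner of the (i-1)-th member
        if seats[i] != partner:
            j = seats.index(partner, i + 1)   # linear search to the right: O(n)
            seats[i], seats[j] = seats[j], seats[i]
            swaps += 1
    return swaps

min_swaps([2, 0, 1, 3])  # 1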

Related

Problem of assigning some values to a set from multiple options algo

I have a problem statement where I have some sets, and each set has a list of options; a specified number of options from that list needs to be assigned to each set.
Some options can be common to multiple sets, but no option can be assigned to more than one set. I need an algorithm to achieve this. A rough example is:
Set1 has options [100,101,102,103] - 2 needs to be selected,
Set2 has options [101,102,103,104] - 2 needs to be selected,
Set3 has options [99,100,101] - 2 needs to be selected,
so the possible solution is
Set1 gets 100,102
Set2 gets 103,104
Set3 gets 99,101
Can anyone suggest an approach for a generic solution to this problem?
This can be modelled as an instance of the bipartite graph matching problem.
Let A be the numbers which appear in the lists (combined into one set with no duplicates), and let B be the lists themselves, each repeated according to how many elements need to be selected from them. There is an edge from a number a ∈ A to a list b ∈ B whenever the number a is in the list b.
Therefore this problem can be approached using an algorithm which finds a "perfect matching", i.e. a matching which includes all vertices. Wikipedia lists several algorithms which can be used to find matchings as large as possible, including the Ford–Fulkerson algorithm and the Hopcroft–Karp algorithm.
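A sketch in Python of that construction plus a simple augmenting-path matcher (Kuhn's algorithm, a standard way to find such matchings; the function and variable names here are mine):

def assign(option_lists, quotas):
    # One right-hand vertex (list_index, slot) per required selection.
    slots = [(i, k) for i, q in enumerate(quotas) for k in range(q)]
    numbers = sorted({x for opts in option_lists for x in opts})
    match = {}                      # slot -> number currently assigned to it

    def augment(x, seen):
        for s in slots:
            if x in option_lists[s[0]] and s not in seen:
                seen.add(s)
                # Use a free slot, or evict and re-place the current occupant.
                if s not in match or augment(match[s], seen):
                    match[s] = x
                    return True
        return False

    for x in numbers:
        augment(x, set())
    return match

sets = [[100, 101, 102, 103], [101, 102, 103, 104], [99, 100, 101]]
print(assign(sets, [2, 2, 2]))
# all six slots get filled: Set1 gets {101, 103}, Set2 {102, 104}, Set3 {99, 100}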
Thanks @kaya3, I couldn't remember the exact algorithm, and the reminder that this is a bipartite graph matching problem was really helpful.
But it wasn't giving me the exact solution when I needed n options for each set, so I followed this approach:
A = [99,100,101,102,103,104]
B = [a1, a2, b1, b2, c1, c2]
# I repeated the instances because I need 2 instances for each and created my
# graph like this
# Now my graph will be like
99 => [c1, c2]
100 => [a1, a2, c1, c2]
101 => [a1, a2, b1, b2, c1, c2]
102 => [a1, a2, b1, b2]
103 => [a1, a2, b1, b2]
104 => [b1, b2]
Now it is giving the correct solution every time; I tried it with multiple use cases. Repeating the vertices for each required selection was the key.

Need assistance understanding the Sardinas-Patterson algorithm (algorithm and example provided)

I am having difficulty understanding the Sardinas-Patterson algorithm from the slide below:
How do we get C1 and C2?
I also got this information from the internet:
The algorithm is finite because all dangling suffixes added to the list are suffixes of a finite set of codewords, and a dangling suffix can be added at most once.
{0, 01, 11}: the codeword 0 is a prefix of 01, so add the dangling suffix 1, giving {0, 01, 11, 1}. Now the codeword 0 is a prefix of 01, but the dangling suffix 1 is already in the list; the codeword 1 is a prefix of 11, but the dangling suffix 1 is already in the list. There are no other dangling suffixes, so conclude that the set is uniquely decodable.
{0, 01, 10}: the codeword 0 is a prefix of 01, so add the dangling suffix 1 to the list, giving {0, 01, 10, 1}. The codeword 1 is a prefix of 10, and the dangling suffix 0 is a codeword. So conclude that the code is not uniquely decodable.
The wiki article is a great explanation.
The C sets in your slide correspond to the Si from the wiki article.
Here is a description from me:
The important operation is taking two strings from C; if one of them is a prefix of the other, you want to record the suffix that is left when the prefix is removed.
This is how C1 is obtained.
For the following C2, C3, etc.:
You again want to look for words from C which are prefixes of words from Ci and take the remaining suffix, but you also want to look at the words from Ci and remove any words from C which are their prefixes. C(i+1) is the union of those two sets.
As soon as some Ci contains a word from C, you know the code is not uniquely decodable.
So:
C = {1, 011, 01110, 1110, 10011}
C1 = {110 (because (1)110 is in C), 0011 (because (1)0011 is in C), 10 (because (011)10 is in C)}
C2 = {10 (because (1)10 is in C1), 0 (because (1)0 is in C1)} union {011 (because (10)011 is in C)}
C1 is found by seeing if any codeword in C is a prefix of any other codeword in C; if it is, then the suffix is added to the set C1. E.g. 1 is a prefix of 1110, and hence you get the suffix 110, which is added to C1.
For C2, first you check whether any codeword in C is a prefix of any word in C1; if so, collect all the "dangling suffixes". You then check whether any word in C1 is a prefix of any codeword in C; if so, again collect all the "dangling suffixes". Then you take the union of those two sets, which gives C2.
Hopefully that kinda made sense.
The sets C1 and C2 correspond to S1 and S2 in this Wikipedia article.
Specifically, C1 is the set of words that can remain after we take a single word from C and remove some prefix of it that is also in C.
For C2 we have the words that can remain after we take a word from C and remove a prefix from C1, or after we take a word from C1 and remove a prefix from C.
If we wanted to compute C3, we would take the words that can remain after we take a word from C and remove some prefix of it that is in C2, together with the words that can remain after we take a word from C2 and remove some prefix of it that is in C.
Thus, C3 would be: {[empty word], 0, 011, 10, 11, 1110}. It contains the empty word, so the algorithm halts and determines that C is not uniquely decodable.
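Putting it all together, here is a small Python sketch of the test (halting when some Ci contains a codeword or the empty word, and declaring the code decodable when the sets start repeating):

def uniquely_decodable(code):
    code = set(code)

    def dangling(xs, ys):
        # suffixes left when a word of xs is a proper prefix of a word of ys
        return {y[len(x):] for x in xs for y in ys if y != x and y.startswith(x)}

    c_i = dangling(code, code)                          # this is C1
    seen = set()
    while c_i:
        if "" in c_i or c_i & code:                     # a codeword (or empty word) appeared
            return False
        if frozenset(c_i) in seen:                      # cycle: nothing new can appear
            return True
        seen.add(frozenset(c_i))
        c_i = dangling(code, c_i) | dangling(c_i, code) # C(i+1)
    return True

print(uniquely_decodable({"0", "01", "11"}))                        # True
print(uniquely_decodable({"1", "011", "01110", "1110", "10011"}))   # False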

How can I make this vector enumeration code faster?

I have three large sets of vectors: A, B1 and B2. These sets are stored in files on disk. For each vector a from A I need to check whether it may be presented as a = b1 + b2, where b1 is from B1 and b2 is from B2. Vectors have 20 components, and all components are non-negative numbers.
How I'm solving this problem now (pseudocode):
foreach a in A
    foreach b1 in B1
        for i = 1 to 20
            bt[i] = a[i] - b1[i]
            if bt[i] < 0 then try next b1
        next i
        foreach b2 in B2
            for i = 1 to 20
                if bt[i] != b2[i] then try next b2
            next i
            num_of_expansions++
        next b2
    next b1
next a
My questions:
1. Any ideas on how to make it faster?
2. How to make it in parallel?
3. Questions 1, 2 for the case when I have B1, B2, ..., Bk, k > 2?
You can sort B1 and B2 by norm. If a = b1 + b2, then ||a|| = ||b1 + b2|| <= ||b1|| + ||b2||, so for any a and b1, you can efficiently eliminate all elements of B2 that have norm < ||a|| - ||b1||. There may also be some way to use the distribution of norms in B1 and B2 to decide whether to switch the roles of the two sets in this. (I don't see off-hand how to do it, but it seems to me that something like this should hold if the distributions of norms in B1 and B2 are significantly different.)
As for making it parallel, it seems that each loop can be turned into a parallel computation, since all computations of one inner iteration are independent of all other iterations.
EDIT
Continuing the analysis: since b2 = a - b1, we also have ||b2|| <= ||a|| + ||b1||. So for any given a and b1, you can restrict the search in B2 to those elements with norms in the range ||a|| ± ||b1||. This suggests that for B1 you should select the set with the smallest average norm.
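As a sketch of that pruning in Python (using Euclidean norms; the bisect window is the ||a|| ± ||b1|| range described above):

import bisect
from math import sqrt

def norm(v):
    return sqrt(sum(x * x for x in v))

def count_expansions(A, B1, B2):
    b2 = sorted(B2, key=norm)                 # sort B2 by norm once
    b2_norms = [norm(v) for v in b2]
    eps = 1e-9                                # slack for floating-point noise
    count = 0
    for a in A:
        na = norm(a)
        for b1 in B1:
            target = [x - y for x, y in zip(a, b1)]
            if any(t < 0 for t in target):    # all components are non-negative
                continue
            nb1 = norm(b1)
            # triangle inequality: | ||a|| - ||b1|| | <= ||b2|| <= ||a|| + ||b1||
            lo = bisect.bisect_left(b2_norms, abs(na - nb1) - eps)
            hi = bisect.bisect_right(b2_norms, na + nb1 + eps)
            count += sum(1 for v in b2[lo:hi] if v == target)
    return count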

Reordering of array elements

Given an array
[a1 a2 a3 ... an b1 b2 b3 ... bn c1 c2 c3 ...cn]
without using extra memory how do you reorder into an array
[a1 b1 c1 a2 b2 c2 a3 b3 c3 ... an bn cn]
Your question can also be rephrased as 'How to do an in-place matrix transposition?'. To see why, imagine adding a newline after each subsequence in both of your arrays. This will turn the first array into an NxM matrix, and the second array into an MxN matrix.
Still, it is not trivial for non-square matrices. Please refer to the Wikipedia page on In-place matrix transposition for a comprehensive description of the problem and its solutions.
Assuming you mean O(1) memory (or depending on the model O(log n)) rather than no extra memory, a linear time in-place algorithm exists.
This paper: http://arxiv.org/abs/0805.1598 has an algorithm for the case when you have
a1 ... an b1 ... bn and want to convert to
b1 a1 b2 a2 ... bn an.
The paper also mentions that you can generalize this to other k-way shuffles. In your case, k = 3.
The algorithm in the paper will give the following:
Start with a1 a2 ... an b1 b2 ... bn c1 c2 ... cn and convert to
c1 b1 a1 c2 b2 a2 ... cn bn an
Another pass through this, and you can easily get a1 b1 c1 a2 b2 c2 ... an bn cn.
Now to generalize the algorithm in the paper, we need to pick a prime p, such that k is a primitive root of p^2.
For k = 3, p = 5 will do.
Now to apply the algorithm, first you need to find the largest m < n such that 3m+1 is a power of 5.
Note: this will only happen when 3m+1 is an even power of 5, so you can actually work with powers of 25 when trying to find m (5^odd - 1 is not divisible by 3).
Once you find m, you shuffle the array to be
[a1 a2 ... am b1 b2 ... bm c1 c2 ... cm] [a(m+1) ... an b(m+1) ... bn c(m+1) ... cn]
and then use the follow-the-cycle method (refer to the paper) for the first 3m elements, using the powers of 5 (including 1 = 5^0) as the starting points of the different cycles, and do a tail recursion for the rest.
Now to convert
a1 a2 ... an b1 b2 ... bn c1 c2 ... cn
to
[a1 a2 ... am b1 b2 ... bm c1 c2 ... cm] [a(m+1) ... an b(m+1) ... bn c(m+1) ... cn]
you first do a cyclic shift to get
a1 a2 ... am [b1 b2 .. bm a(m+1) .. an] b(m+1) .. bn c1 c2 ... cn
(the elements in the square brackets are the ones that were shifted)
Then do a cyclic shift to get
a1 a2 ... am b1 b2 .. bm a(m+1) .. an [c1 c2 .. cm b(m+1) .. bn] c(m+1) ... cn
And then a final shift to
a1 a2 ... am b1 b2 .. bm [c1 c2 .. cm a(m+1) .. an] b(m+1) .. bn c(m+1) ... cn
Note that cyclic shift can be done in O(n) time and O(1) space.
So the whole algorithm is O(n) time and O(1) space.
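For reference, a cyclic shift in O(n) time and O(1) space can be done with the classic three-reversals trick; a quick sketch:

def reverse(a, i, j):
    # reverse a[i..j] in place
    while i < j:
        a[i], a[j] = a[j], a[i]
        i, j = i + 1, j - 1

def rotate_left(a, lo, hi, k):
    # left-rotate the slice a[lo:hi] by k positions, in place
    k %= hi - lo
    reverse(a, lo, lo + k - 1)
    reverse(a, lo + k, hi - 1)
    reverse(a, lo, hi - 1)

a = [1, 2, 3, 4, 5]
rotate_left(a, 0, 5, 2)   # a is now [3, 4, 5, 1, 2]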
You can calculate each item's target position based on its index.
groupSize = N/3
group = i/groupSize
rank = i - group * groupSize
dest = rank * 3 + group
You can use this calculation with a cycle sort to put each element in its proper place in linear time. The only issue is tracking which items are already in place. All you need for that is N bits. With certain types of data, you can "steal" a bit from the data item itself. For instance you can use the high bit of ASCII data, or the low bits of word-aligned pointers.
Alternately, you can do it without any extra bits at the expense of going to polynomial time. Reverse the calculation, so you can find the original source index of each item in the final array (the final index i holds group i % 3 at rank i/3):
source = (i % 3) * groupSize + i/3 ; // integer division
Now walk forward through the array, swapping every item with the one from its source. The trick is that any time the source index is less than the current position (meaning it has already been swapped out), you need to follow the trail until you find its current location:
getSource(i):
    s = (i % 3) * groupSize + i/3
    while (s < i)
        s = (s % 3) * groupSize + s/3
    return s
shuffle:
    for i in (0..N-1)
        swap(a[i], a[getSource(i)])
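A runnable Python version of the above, assuming len(a) is a multiple of 3:

def shuffle3(a):
    group_size = len(a) // 3

    def get_source(i):
        s = (i % 3) * group_size + i // 3
        while s < i:                          # already swapped away: follow the trail
            s = (s % 3) * group_size + s // 3
        return s

    for i in range(len(a)):
        s = get_source(i)
        a[i], a[s] = a[s], a[i]
    return a

shuffle3(["a1", "a2", "a3", "b1", "b2", "b3", "c1", "c2", "c3"])
# -> ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'a3', 'b3', 'c3']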
You can try this out concretely - just take the cards ace, 2, ..., 5 in 3 suits and put them in order.
First you take out the a2 card and put it aside.
Then you move the b1 to the a2 position and shift all the cards up.
Then you put back the a2 card into the vacated position.
Then you take out the a3 card and put it aside.
Move the c1 to the a3 position and shift all the cards up.
Then put back the a3 card in the emptied position.
Repeat until done.
The actual calculation of the indices is tricky but I believe a previous poster has done this.

Good algorithm for combining items from N lists into one with balanced distribution?

Let's say I have the three following lists
A1
A2
A3
B1
B2
C1
C2
C3
C4
C5
I'd like to combine them into a single list, with the items from each list as evenly distributed as possible sorta like this:
C1
A1
C2
B1
C3
A2
C4
B2
A3
C5
I'm using .NET 3.5/C# but I'm looking more for how to approach it then specific code.
EDIT: I need to keep the order of elements from the original lists.
1. Take a copy of the list with the most members. This will be the destination list.
2. Then take the list with the next largest number of members.
3. Divide the destination list length by the smaller length to give a fractional value greater than one.
4. For each item in the second list, maintain a float counter. Add the value calculated in the previous step, and mathematically round it to the nearest integer (keep the original float counter intact).
5. Insert the item at this position in the destination list and increment the counter by 1 to account for it. Repeat steps 4-5 for all members of the second list.
6. Repeat steps 2-5 for all lists.
EDIT: This has the advantage of being O(n) as well, which is always nice :)
Implementation of Andrew Rollings' answer:
public List<String> equimix(List<List<String>> input) {
    // sort biggest list to smallest list
    Collections.sort(input, new Comparator<List<String>>() {
        public int compare(List<String> a1, List<String> a2) {
            return a2.size() - a1.size();
        }
    });
    List<String> output = input.get(0);
    for (int i = 1; i < input.size(); i++) {
        output = equimix(output, input.get(i));
    }
    return output;
}

public List<String> equimix(List<String> listA, List<String> listB) {
    if (listB.size() > listA.size()) {
        List<String> temp;
        temp = listB;
        listB = listA;
        listA = temp;
    }
    List<String> output = listA;
    double shiftCoeff = (double) listA.size() / listB.size();
    double floatCounter = shiftCoeff;
    for (String item : listB) {
        int insertionIndex = (int) Math.round(floatCounter);
        output.add(insertionIndex, item);
        floatCounter += (1 + shiftCoeff);
    }
    return output;
}
First, this answer is more of a train of thought than a concrete solution.
OK, so you have a list of 3 items (A1, A2, A3), where you want A1 to be somewhere in the first 1/3 of the target list, A2 in the second 1/3 of the target list, and A3 in the third 1/3. Likewise you want B1 to be in the first 1/2, etc...
So you allocate your list of 10 as an array, then start with the list with the most items, in this case C. Calculate the spot where C1 should fall (1.5). Drop C1 into the closest spot (in this case, either 1 or 2), then calculate where C2 should fall (3.5) and continue the process until there are no more Cs.
Then go with the list with the second-to-most number of items. In this case, A. Calculate where A1 goes (1.66), so try 2 first. If you already put C1 there, try 1. Do the same for A2 (4.66) and A3 (7.66). Finally, we do list B. B1 should go at 2.5, so try 2 or 3. If both are taken, try 1 and 4 and keep moving radially out until you find an empty spot. Do the same for B2.
You'll end up with something like this if you pick the lower number:
C1 A1 C2 A2 C3 B1 C4 A3 C5 B2
or this if you pick the upper number:
A1 C1 B1 C2 A2 C3 A3 C4 B2 C5
This seems to work pretty well for your sample lists, but I don't know how well it will scale to many lists with many items. Try it and let me know how it goes.
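Here's a rough Python sketch of that train of thought (the radial probing picks whichever free slot is closest to the ideal spot; the function names are mine):

def balanced_merge(*input_lists):
    total = sum(len(l) for l in input_lists)
    slots = [None] * total

    def nearest_free(ideal):
        # probe radially outward from the ideal slot
        for offset in range(total):
            for j in (ideal - offset, ideal + offset):
                if 0 <= j < total and slots[j] is None:
                    return j

    for lst in sorted(input_lists, key=len, reverse=True):   # most items first
        step = total / len(lst)
        for k, item in enumerate(lst):
            slots[nearest_free(int((k + 0.5) * step))] = item
    return slots

balanced_merge(["A1", "A2", "A3"], ["B1", "B2"], ["C1", "C2", "C3", "C4", "C5"])
# -> ['A1', 'C1', 'B1', 'C2', 'A2', 'C3', 'B2', 'C4', 'A3', 'C5']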
Make a hash table of lists.
For each list, store the nth element in the list under the key (/ n (+ (length list) 1))
Optionally, shuffle the lists under each key in the hash table, or sort them in some way
Concatenate the lists in the hash by sorted key
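A sketch of that recipe in Python (the key n / (length + 1) spreads each list's items evenly over the interval (0, 1)):

from collections import defaultdict

def merge_by_key(*input_lists):
    buckets = defaultdict(list)
    for lst in input_lists:
        for n, item in enumerate(lst, start=1):
            buckets[n / (len(lst) + 1)].append(item)   # key = n / (length + 1)
    return [item for key in sorted(buckets) for item in buckets[key]]

merge_by_key(["A1", "A2", "A3"], ["B1", "B2"], ["C1", "C2", "C3", "C4", "C5"])
# -> ['C1', 'A1', 'B1', 'C2', 'A2', 'C3', 'B2', 'C4', 'A3', 'C5']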
I'm thinking of a divide and conquer approach, in each iteration of which you split all the lists with more than one element in half and recurse. When you get to the point where all the lists except one have one element, you can randomly combine them, pop up a level and randomly combine the lists removed from that frame where the length was one... et cetera.
Something like the following is what I'm thinking:
- filter lists into three categories
- lists of length 1
- first half of the elements of lists with > 1 elements
- second half of the elements of lists with > 1 elements
- recurse on the first and second half of the lists if they have > 1 element
- combine results of above computation in order
- randomly combine the list of singletons into returned list
You could simply combine the three lists into a single list and then UNSORT that list. An unsorted list should achieve your requirement of 'evenly-distributed' without too much effort.
Here's an implementation of unsort: http://www.vanheusden.com/unsort/.
A quick suggestion, in Python:
def merge_evenly(*input_lists):
    lists = [list(l) for l in input_lists if l]
    lists.sort(key=len, reverse=True)       # longest list first
    merged = []
    while lists:
        l = lists.pop(0)
        merged.append(l.pop(0))
        if l:
            # guard: the original sketch assumed another list always remains
            nxt = lists.pop(0) if lists else None
            lists.append(l)
            lists.sort(key=len, reverse=True)
            if nxt is not None:
                lists.insert(0, nxt)        # keep the next list at the front
    return merged
This should distribute elements from shorter lists more evenly than the other suggestions here.
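For the example lists at the top of the question, this reproduces the desired interleaving exactly:

merge_evenly(["A1", "A2", "A3"], ["B1", "B2"], ["C1", "C2", "C3", "C4", "C5"])
# -> ['C1', 'A1', 'C2', 'B1', 'C3', 'A2', 'C4', 'B2', 'A3', 'C5']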
