Algorithm to sort pairs of numbers - algorithm

I am stuck with a problem and I need some help from bright minds of SO.
I have N pairs of unsigned integerers. I need to sort them. The ending vector of pairs should be sorted nondecreasingly by the first number in each pair and nonincreasingly by the second in each pair. Each pair can have the first and second elements swapped with each other. Sometimes there is no solution, so I need to throw an exception then.
Example:
in pairs:
1 5
7 1
3 8
5 6
out pairs:
1 7 <-- swapped
1 5
6 5 <-- swapped
8 3 <-- swapped
^^ Without swapping pairs it is impossible to build the solution. So we swap pairs (7, 1), (3, 8) and (5, 6) and build the result.
or
in pairs:
1 5
6 9
out:
not possible
One more example that shows how 'sorting pairs' first isn't the solution.
in pairs:
1 4
2 5
out pairs:
1 4
5 2
Thanks

O( n log n ) solution

Let S(n) equals all the valid sort orderings, where n corresponds to pairs included [0,n].
S(n) = []
for each order in S(n-1)
for each combination of n-th pair
if pair can be inserted in order, add the order after insertion to S(n)
else don't include the order in S(n)
A pair can be inserted into an order in maximum of two ways(normal pair and reversed pair).
Maximum orderings = O(2^n)
I'm not very sure about this amortized orderings, but hear me out.
For an order and pair we have four ways of getting sorted orders after insertions
(two orders, one(normal),one(reversed), zero)
No of orderings (Amortized) = (1/4)*2 + (1/4)*1 + (1/4)*1 + (1/4)*0 = 1
Amortized orderings = O(1)
Similarly time complexity will be O(n^2), Again not sure.
Following program finds orderings using a variant of Insertion sort.
debug = False
(LEFT, RIGHT, ERROR) = range(3)
def position(first, second):
""" Returns the position of first pair when compared to second """
x,y = first
a,b = second
if x <= a and b <= y:
return LEFT
if x >= a and b >= y:
return RIGHT
else:
return ERROR
def insert(pair, order):
""" A pair can be inserted in normal order or reversed order
For each order of insertion we will get one solution or none"""
solutions = []
paircombinations = [pair]
if pair[0] != pair[1]: # reverse and normal order are distinct
paircombinations.append(pair[::-1])
for _pair in paircombinations:
insertat = 0
if debug: print "Inserting", _pair,
for i,p in enumerate(order):
pos = position(_pair, p)
if pos == LEFT:
break
elif pos == RIGHT:
insertat += 1
else:
if debug: print "into", order,"is not possible"
insertat = None
break
if insertat != None:
if debug: print "at",insertat,"in", order
solutions.append(order[0:insertat] + [_pair] + order[insertat:])
return solutions
def swapsort(pairs):
"""
Finds all the solutions of pairs such that ending vector
of pairs are be sorted non decreasingly by the first number in
each pair and non increasingly by the second in each pair.
"""
solutions = [ pairs[0:1] ] # Solution first pair
for pair in pairs[1:]:
# Pair that needs to be inserted into solutions
newsolutions = []
for solution in solutions:
sols = insert(pair, solution) # solutions after inserting pair
if sols:
newsolutions.extend(sols)
if newsolutions:
solutions = newsolutions
else:
return None
return solutions
if __name__ == "__main__":
groups = [ [(1,5), (7,1), (3,8), (5,6)],
[(1,5), (2,3), (3,3), (3,4), (2,4)],
[(3,5), (6,6), (7,4)],
[(1,4), (2,5)] ]
for pairs in groups:
print "Solutions for",pairs,":"
solutions = swapsort(pairs)
if solutions:
for sol in solutions:
print sol
else:
print "not possible"
Output:
Solutions for [(1, 5), (7, 1), (3, 8), (5, 6)] :
[(1, 7), (1, 5), (6, 5), (8, 3)]
Solutions for [(1, 5), (2, 3), (3, 3), (3, 4), (2, 4)] :
[(1, 5), (2, 4), (2, 3), (3, 3), (4, 3)]
[(1, 5), (2, 3), (3, 3), (4, 3), (4, 2)]
[(1, 5), (2, 4), (3, 4), (3, 3), (3, 2)]
[(1, 5), (3, 4), (3, 3), (3, 2), (4, 2)]
Solutions for [(3, 5), (6, 6), (7, 4)] :
not possible
Solutions for [(1, 4), (2, 5)] :
[(1, 4), (5, 2)]

This is a fun problem. I came up with Tom's solution independently, here's my Python code:
class UnableToAddPair:
pass
def rcmp(i,j):
c = cmp(i[0],j[0])
if c == 0:
return -cmp(i[1],j[1])
return c
def order(pairs):
pairs = [list(x) for x in pairs]
for x in pairs:
x.sort()
pairs.sort(rcmp)
top, bottom = [], []
for p in pairs:
if len(top) == 0 or p[1] <= top[-1][1]:
top += [p]
elif len(bottom) == 0 or p[1] <= bottom[-1][1]:
bottom += [p]
else:
raise UnableToAddPair
bottom = [[x[1],x[0]] for x in bottom]
bottom.reverse()
print top + bottom
One important point not mentioned in Tom's solution is that in the sorting stage, if the lesser values of any two pairs are the same, you have to sort by decreasing value of the greater element.
It took me a long time to figure out why a failure must indicate that there's no solution; my original code had backtracking.

Below is a simple recursive depth-first search algorithm in Python:
import sys
def try_sort(seq, minx, maxy, partial):
if len(seq) == 0: return partial
for i, (x, y) in enumerate(seq):
if x >= minx and y <= maxy:
ret = try_sort(seq[:i] + seq[i+1:], x, y, partial + [(x, y)])
if ret is not None: return ret
if y >= minx and x <= maxy:
ret = try_sort(seq[:i] + seq[i+1:], y, x, partial + [(y, x)])
if ret is not None: return ret
return None
def do_sort(seq):
ret = try_sort(seq, -sys.maxint-1, sys.maxint, [])
print ret if ret is not None else "not possible"
do_sort([(1,5), (7,1), (3,8), (5,6)])
do_sort([(1,5), (2,9)])
do_sort([(3,5), (6,6), (7,4)])
It maintains a sorted subsequence (partial) and tries to append every remaining pair to it both in the original and in the reversed order, without violating the conditions of the sort.
If desired, the algorithm can be easily changed to find all valid sort orders.
Edit: I suspect that the algorithm can be substantially improved by maintaining two partially-sorted sequences (a prefix and a suffix). I think that this would allow the next element can be chosen deterministically instead of trying all possible elements. Unfortunately, I don't have time right now to think this through.

Update: this answer is no longer valid since question was changed
Split vector of pairs into buckets by first number. Do descending sort on each bucket. Merge buckets in ascending order of first numbers and keep track of second number of last pair. If it's greater than current one there is no solution. Otherwise you will get solution after merge is done.
If you have stable sorting algorithm you can do descending sort by second number and then ascending sort by first number. After that check if second numbers are still in descending order.

The swapping in your case is just a sort of a 2-element array.
so you can
tuple[] = (4,6),(1,5),(7,1),(8,6), ...
for each tuple -> sort internal list
=> (4,6),(1,5),(1,7),(6,8)
sort tuple by 1st asc
=> (1,5),(1,7),(4,6),(6,8)
sort tuple by 1nd desc
=> (1,7),(1,5),(4,6),(6,8)

The first thing I notice is that there is no solution if both values in one tuple are larger than both values in any other tuple.
The next thing I notice is that tuples with a small difference become sorted towards the middle, and tupples with large differences become sorted towards the ends.
With these two pieces of information you should be able to figure out a reasonable solution.
Phase 1: Sort each tuple moving the smaller value first.
Phase 2: Sort the list of tuples; first in descending order of the difference between the two values of each tuple, then sort each grouping of equal difference in ascending order of the first member of each tuple. (Eg. (1,6),(2,7),(3,8),(4,4),(5,5).)
Phase 3: Check for exceptions. 1: Look for a pair of tuples where both elements of one tuple are larger than both elements of the other tuple. (Eg. (4,4),(5,5).) 2: If there are four or more tuples, then look within each group of tuples with the same difference for three or more variations (Eg. (1,6),(2,7),(3,8).)
Phase 4: Rearrange tuples. Starting at the back end (tuples with smallest difference), the second variation within each grouping of tuples with equal difference must have their elements swapped and the tuples appended to the back of the list. (Eg. (1,6),(2,7),(5,5) => (2,7),(5,5),(6,1).)
I think this should cover it.

This is a very interesting question. Here is my solution to it in VB.NET.
Module Module1
Sub Main()
Dim input = {Tuple.Create(1, 5),
Tuple.Create(2, 3),
Tuple.Create(3, 3),
Tuple.Create(3, 4),
Tuple.Create(2, 4)}.ToList
Console.WriteLine(Solve(input))
Console.ReadLine()
End Sub
Private Function Solve(ByVal input As List(Of Tuple(Of Integer, Integer))) As String
Dim splitItems As New List(Of Tuple(Of Integer, Integer))
Dim removedSplits As New List(Of Tuple(Of Integer, Integer))
Dim output As New List(Of Tuple(Of Integer, Integer))
Dim otherPair = Function(indexToFind As Integer, startPos As Integer) splitItems.FindIndex(startPos, Function(x) x.Item2 = indexToFind)
Dim otherPairBackwards = Function(indexToFind As Integer, endPos As Integer) splitItems.FindLastIndex(endPos, Function(x) x.Item2 = indexToFind)
'split the input while preserving their indices in the Item2 property
For i = 0 To input.Count - 1
splitItems.Add(Tuple.Create(input(i).Item1, i))
splitItems.Add(Tuple.Create(input(i).Item2, i))
Next
'then sort the split input ascending order
splitItems.Sort(Function(x, y) x.Item1.CompareTo(y.Item1))
'find the distinct values in the input (which is pre-sorted)
Dim distincts = splitItems.Select(Function(x) x.Item1).Distinct
Dim dIndex = 0
Dim lastX = -1, lastY = -1
'go through the distinct values one by one
Do While dIndex < distincts.Count
Dim d = distincts(dIndex)
'temporary list to store the output for the current distinct number
Dim temOutput As New List(Of Tuple(Of Integer, Integer))
'go through each of the split items and look for the current distinct number
Dim curIndex = 0, endIndex = splitItems.Count - 1
Do While curIndex <= endIndex
If splitItems(curIndex).Item1 = d Then
'find the pair of the item
Dim pairIndex = otherPair(splitItems(curIndex).Item2, curIndex + 1)
If pairIndex = -1 Then pairIndex = otherPairBackwards(splitItems(curIndex).Item2, curIndex - 1)
'create a pair and add it to the temporary output list
temOutput.Add(Tuple.Create(splitItems(curIndex).Item1, splitItems(pairIndex).Item1))
'push the items onto the temporary storage and remove it from the split list
removedSplits.Add(splitItems(curIndex))
removedSplits.Add(splitItems(pairIndex))
If curIndex > pairIndex Then
splitItems.RemoveAt(curIndex)
splitItems.RemoveAt(pairIndex)
Else
splitItems.RemoveAt(pairIndex)
splitItems.RemoveAt(curIndex)
End If
endIndex -= 2
Else
'increment the index or exit the iteration as appropriate
If splitItems(curIndex).Item1 <= d Then curIndex += 1 Else Exit Do
End If
Loop
'sort temporary output by the second item and add to the main output
output.AddRange(From r In temOutput Order By r.Item2 Descending)
'ensure that the entire list is properly ordered
'start at the first item that was added from the temporary output
For i = output.Count - temOutput.Count To output.Count - 1
Dim r = output(i)
If lastX = -1 Then
lastX = r.Item1
ElseIf lastX > r.Item1 Then
'!+ It appears this section of the if statement is unnecessary
'sorting on the first column is out of order so remove the temporary list
'and send the items in the temporary list back to the split items list
output.RemoveRange(output.Count - temOutput.Count, temOutput.Count)
splitItems.AddRange(removedSplits)
splitItems.Sort(Function(x, y) x.Item1.CompareTo(y.Item1))
dIndex += 1
Exit For
End If
If lastY = -1 Then
lastY = r.Item2
ElseIf lastY < r.Item2 Then
'sorting on the second column is out of order so remove the temporary list
'and send the items in the temporary list back to the split items list
output.RemoveRange(output.Count - temOutput.Count, temOutput.Count)
splitItems.AddRange(removedSplits)
splitItems.Sort(Function(x, y) x.Item1.CompareTo(y.Item1))
dIndex += 1
Exit For
End If
Next
removedSplits.Clear()
Loop
If splitItems.Count = 0 Then
Dim result As New Text.StringBuilder()
For Each r In output
result.AppendLine(r.Item1 & " " & r.Item2)
Next
Return result.ToString
Else
Return "Not Possible"
End If
End Function
<DebuggerStepThrough()> _
Public Class Tuple(Of T1, T2)
Implements IEqualityComparer(Of Tuple(Of T1, T2))
Public Property Item1() As T1
Get
Return _first
End Get
Private Set(ByVal value As T1)
_first = value
End Set
End Property
Private _first As T1
Public Property Item2() As T2
Get
Return _second
End Get
Private Set(ByVal value As T2)
_second = value
End Set
End Property
Private _second As T2
Public Sub New(ByVal item1 As T1, ByVal item2 As T2)
_first = item1
_second = item2
End Sub
Public Overloads Function Equals(ByVal x As Tuple(Of T1, T2), ByVal y As Tuple(Of T1, T2)) As Boolean Implements IEqualityComparer(Of Tuple(Of T1, T2)).Equals
Return EqualityComparer(Of T1).[Default].Equals(x.Item1, y.Item1) AndAlso EqualityComparer(Of T2).[Default].Equals(x.Item2, y.Item2)
End Function
Public Overrides Function Equals(ByVal obj As Object) As Boolean
Return TypeOf obj Is Tuple(Of T1, T2) AndAlso Equals(Me, DirectCast(obj, Tuple(Of T1, T2)))
End Function
Public Overloads Function GetHashCode(ByVal obj As Tuple(Of T1, T2)) As Integer Implements IEqualityComparer(Of Tuple(Of T1, T2)).GetHashCode
Return EqualityComparer(Of T1).[Default].GetHashCode(Item1) Xor EqualityComparer(Of T2).[Default].GetHashCode(Item2)
End Function
End Class
Public MustInherit Class Tuple
<DebuggerStepThrough()> _
Public Shared Function Create(Of T1, T2)(ByVal first As T1, ByVal second As T2) As Tuple(Of T1, T2)
Return New Tuple(Of T1, T2)(first, second)
End Function
End Class
End Module
The input
1 5
2 3
3 3
3 4
2 4
Produces the output
1 5
2 4
2 3
3 4
3 3
And
3 5
6 6
7 4
Outputs
Not Nossible
Comments
I found this problem quite challenging. It took me some 15 minutes to come up with with a solution and an hour or so to write and debug it. The code is littered with comments so that anyone can follow it.

Related

Stack permutations sorted in lexicographical order

A stack permutation of number N is defined as the number of sequences which you can print by doing the following
Keep two stacks say A and B.
Push numbers from 1 to N in reverse order in B. (so the top of B is 1 and the last element in B is N)
Do the following operations
Choose the top element from A or B and print it and delete it (pop it). This can be done on a non-empty stack only.
Move the top element from B to A (if B is non-empty)
If both stacks are empty then stop
All possible sequences obtained by doing these operations in some order are called stack permutations.
eg: N = 2
stack permutations are (1, 2) and (2, 1)
eg: N = 3
stack permutations are (1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1) and (3, 2, 1)
The number of stack permutations for N numbers is C(N), where C(N) is the Nth Catalan Number.
Suppose we generate all stack permutations for a given N and then print them in lexicographical order (dictionary order), how can we determine the kth permutation, without actually generating all the permutations and then sorting them?
I want some algorithmic approaches that are programmable.
You didn't say whether k should be 0 based or 1 based. I chose 0. Switching back is easy.
The approach is to first write a function to be able to count how many stack permutations there are from a given decision point. Use memoization to make it fast. And then proceed down the decision tree by skipping over decisions that lead to permutations which are lexicographically smaller. That will lead to the list of decisions that are the one you want.
def count_stack_permutations (on_b, on_a=0, can_take_from_a=True, cache={}):
key = (on_b, on_a, can_take_from_a)
if on_a < 0:
return 0 # can't go negative.
elif on_b == 0:
if can_take_from_a:
return 1 # Just drain a
else:
return 0 # Got nothing.
elif key not in cache:
# Drain b
answer = count_stack_permutations(on_b-1, on_a, True)
# Drain a?
if can_take_from_a:
answer = answer + count_stack_permutations(on_b, on_a-1, True)
# Move from b to a.
answer = answer + count_stack_permutations(on_b-1, on_a+1, False)
cache[key] = answer
return cache[key]
def find_kth_permutation (n, k):
# The end of the array is the top
a = []
b = list(range(n, 0, -1))
can_take_from_a = True # We obviously won't first. :-)
answer = []
while 0 < max(len(a), len(b)):
action = None
on_a = len(a)
on_b = len(b)
# If I can take from a, that is always smallest.
if can_take_from_a:
if count_stack_permutations(on_b, on_a - 1, True) <= k:
k = k - count_stack_permutations(on_b, on_a - 1, True)
else:
action = 'a'
# Taking from b is smaller than digging into b so I can take deeper.
if action is None:
if count_stack_permutations(on_b-1, on_a, True) <= k:
k = k - count_stack_permutations(on_b-1, on_a, True)
else:
action = 'b'
# Otherwise I will move.
if action is None:
if count_stack_permutations(on_b-1, on_a, False) < k:
return None # Should never happen
else:
action = 'm'
if action == 'a':
answer.append(a.pop())
can_take_from_a = True
elif action == 'b':
answer.append(b.pop())
can_take_from_a = True
else:
a.append(b.pop())
can_take_from_a = False
return answer
# And demonstrate it in action.
for k in range(0, 6):
print((k, find_kth_permutation(3, k)))
This is possible using factoradic(https://en.wikipedia.org/wiki/Factorial_number_system)
If you need quick solution in Java use JNumberTools
JNumberTools.permutationsOf("A","B","C")
.uniqueNth(4) //next 4th permutation
.forEach(System.out::println);
This API will generate the next nth permutation directly in lexicographic order. So you can even generate next billionth permutation of 100 items.
for generating next nth permutation of given size use:
JNumberTools.permutationsOf("A","B","C")
.kNth(2,4) //next 4th permutation of size 2
.forEach(System.out::println);
maven dependency for JNumberTools is:
<dependency>
<groupId>io.github.deepeshpatel</groupId>
<artifactId>jnumbertools</artifactId>
<version>1.0.0</version>
</dependency>

Longest Substring with no Y Divide and Conquer

Before I carry on to the problem, I should note that I know there are much easier ways to solve this problem without using divide and conquer; however, the point of solving this problem under this restriction is that I actually want to learn how to tackle problems with divide and conquer. I am good at recognizing correct solutions, but implementing my own D&C strategy is not a skill I currently have.
The problem is this: given a string, find the longest substring that does not contain the letter 'y'. For example, longestNoY("abydefyhi") should return "def".
My first approach to tackle this problem was to determine the base cases. If we had a string of length 2, we would want to return the non-y components (or empty string if both characters were 'y'). If we had a string of length 1, we would return it if it is not a 'y'.
So the first part should look like this:
def longestNoY(string, start, end):
#Conquer
if start == end:
if string == 'y': return ''
return string
if start + 1 == end:
if string == "yy": return ''
if string[0] == 'y': return string[1]
return string[0]
....
Next, I knew that I would need to recursively call the function for each half of the parent string. I also knew that I wanted the function to return the longer of the two children, except if the sum of the lengths of the two children was equal to the length of the parent, then the function should return the parent because there were no 'y's in the children.
#Divide and Partial Implementation of Rejoin
....
middle = (start + end) // 2
leftString = longestNoY(string, start, middle)
rightString = longestNoY(string, middle, end)
if len(leftString) + len(rightString) == len(string): return string
....
The part I am having trouble with now would best be explained with an example:
0 1 2 3 4 5 6 7 8
a b y d e | f y h i
a b y | d e | f y | h i
a b | y | d e | f y | h i
The longest substring in the left side is either "ab" or "de", but we know that "de" is adjacent to an 'f' which would make "def" the longest. I don't know exactly how to carry on with this problem. Please do not give me a program to solve this problem.
This can be easily solved by just traversing through the string. But I know you want to learn Divide Conquer.
To me, this is not a good problem to solve using Divide Conquer.
What #WillemVanOnsem suggested by recursion has essentially the same effect as when you traverse linearly.
But if you do want to do it in Divide & Conquer fashion, you need to consider the substring that crosses the mid point i.e. start <= i <= mid < j <= end - but that would be overkill.
It is possible. But then you each time need to return four values: the longest subsequence that starts at the left end of the "slice" (this can be zero), the longest subsequence "in the middle", the longest subsequence that ends at the right end of the "slice" (this can be zero as well), and if the string is just a sequence of non-Y characters (a boolean). The fourth element can in fact just be derived by checking if one of the elements in the first three is equal to the length, but this is probably easier to implement.
Why is this important? Because a sequence of non-ys can pass "through" a divide. For example:
abcdeYfghi jklYmnopqr
here if we split it in the middle (or any other way that is not "constant" and "rest").
So here recursively we have several cases:
the empty string returns (0, 0, 0, True),
the non-empty string other than Y, we return (1, 1, 1, True);
for the singleton string Y we return (0, 0, 0, False);
the recursive case that divides the string in two, and applies "merge" logic afterwards on the results.
The "merge logic" is rather complex, especially since it is possible that both "subslices" only contain non-Y strings. After slicing we thus obtain two triples (a0, a1, a2, a3) and (b0, b1, b2, b3), and we produce a 3-tuple (c0, c1, c2, c3).
If a3 = True and b3 = True, then of course that means that the current slice contains no Y's as well. So we can derive that:
c3 = a3 and b3
given a3 holds, then it holds that c0 = a0 + b0 since then a0 has no Y's and thus the left "sequence" is the same as the entire length of the subsequence plus the left subsequence of the right part. If a3 does not hold, c0 is just a0.
Given b3 holds, then it holds that c2 = a2 + b2 for the same reasoning as the one above, if not, then a2 = b2.
Now the element in the middle is the maximum of three elements:
the element in the middle of the left slice a1;
the element in the middle of the right slice b1; and
the sum of a2 and b0 since there can be overlap and then this is the sum of the two.
We thus return the maximum of the tree.
So in Python, this looks like:
def longestNoY(string, start, end):
if start == end:
return (0, 0, 0, True)
elif start+1 == end:
if string[start] == 'y':
return (0, 0, 0, False)
else:
return (1, 1, 1, True)
else:
mid = (start + end)//2
a0, a1, a2, a3 = longestNoY(string, start, mid)
b0, b1, b2, b3 = longestNoY(string, mid, end)
c3 = a3 and b3
c0 = a0 + a3 * b0
c2 = b2 + b3 * a2
c1 = max(a1, b1, a2 + b0)
return (c0, c1, c2, c3)
The final result is the maximum of the first three items in the tuple.
For the given sample string, we thus obtain:
(1, 1, 1, True) a
(1, 1, 1, True) b
(2, 2, 2, True) ab
(0, 0, 0, False) y
(1, 1, 1, True) d
(0, 1, 1, False) yd
(2, 2, 1, False) abyd
(1, 1, 1, True) e
(1, 1, 1, True) f
(2, 2, 2, True) ef
(0, 0, 0, False) y
(1, 1, 1, True) h
(1, 1, 1, True) i
(2, 2, 2, True) hi
(0, 2, 2, False) yhi
(2, 2, 2, False) efyhi
(2, 3, 2, False) abydefyhi
(2, 3, 2, False)
but that being said, it looks to me as an unnecessary complicated procedure to construct something that, in terms of time complexity, is the same as traversal, but typically more expensive (function calls, constructing new objects, etc.). Especially since linear traversal is just:
def longestNoY(string):
mx = 0
cur = 0
for c in string:
if c == 'y':
mx = max(mx, cur)
cur = 0
else:
cur += 1
return mx
There is however an advantage here is that the above described algorithm can be used for parallelization. If for example the string is huge, the above can be used such that every core can count this. In that case it is however likely beneficial to use an iterative level on the "core" level, and only use the above to "distribute" work and "collect" results.
I think the best way to put the problem is to find the positions just before and just after y, not being y. This way you will find left and right ends of intervals. I do not give you the code, since you specifically asked as not to solve the problem for you, just point to the right direction, so:
In trivial cases (length of interval is 0) determine whether the item you have is a valid left end or right and of an interval
In non-trivial cases always halve the set to left and right (no problem if the number of items is odd, just put the middle somewhere) and issue the divide and conquer for them as well
In non-trivial cases always consider the best interval the left and right sub-problem gives you
In non-trivial cases make sure that if an interval happens to start in the left and end in the right, you take that into account
from such intervals, the one which has a greater length is better
These are the ideas you need to employ in order to implement the divide and conquer you desire. Happy coding!
Now that I actually had time to study, I decided to come back to this problem and came up with a very readable solution. Here it is:
def across(string, start, end, middle):
startL = middle
bestL = ''
while(startL >= start and string[startL] != 'y'):
bestL = string[startL] + bestL
startL -= 1
startR = middle + 1
bestR = ''
while(startR <= end and string[startR] != 'y'):
bestR = bestR + string[startR]
startR += 1
return bestL + bestR
def longestNoY(string, start, end):
if(start > end):
return ''
if(start == end):
if(string[start] == 'y'):
return ''
return string[start]
middle = (start + end) // 2
leftString = longestNoY(string, start, middle)
rightString = longestNoY(string, middle + 1, end)
acrossString = across(string, start, end, middle)
return max(leftString, rightString, acrossString, key=len)

Algorithm for combining different age groups together based on their values

Let's say we have an array of age groups and an array of the number of people in each age group
For example:
Ages = ("1-13", "14-20", "21-30", "31-40", "41-50", "51+")
People = (1, 10, 21, 3, 2, 1)
I want to have an algorithm that combines these age groups with the following logic if there are fewer than 5 people in each group. The algorithm that I have so far does the following:
Start from the last element (e.g., "51+") can you combine it with the next group? (here "41-50") if yes add the numbers 1+2 and combine their labels. So we get the following
Ages = ("1-13", "14-20", "21-30", "31-40", "41+")
People = (1, 10, 21, 3, 3)
Take the last one again (here is "41+"). Can you combine it with the next group (31-40)? the answer is yes so we get:
Ages = ("1-13", "14-20", "21-30", "31+")
People = (1, 10, 21, 6)
since the group 31+ now has 6 members we cannot collapse it into the next group.
we cannot collapse "21-30" into the next one "14-20" either
"14-20" also has 10 people (>5) so we don't do anything on this either
for the first one ("1-13") since we have only one person and it is the last group we combine it with the next group "14-20" and get the following
Ages = ("1-20", "21-30", "31+")
People = (11, 21, 6)
I have an implementation of this algorithm that uses many flags to keep track of whether or not any data is changed and it makes a number of passes on the two arrays to finish this task.
My question is if you know any efficient way of doing the same thing? any data structure that can help? any algorithm that can help me do the same thing without doing too much bookkeeping would be great.
Update:
A radical example would be (5,1,5)
in the first pass it becomes (5,6) [collapsing the one on the right into the one in the middle]
then we have (5,6). We cannot touch 6 since it is larger than our threshold:5. so we go to the next one (which is element on the very left 5) since it is less than or equal to 5 and since it is the last one on the left we group it with the one on its right. so we finally get (11)
Here is an OCaml solution of a left-to-right merge algorithm:
let close_group acc cur_count cur_names =
(List.rev cur_names, cur_count) :: acc
let merge_small_groups mini l =
let acc, cur_count, cur_names =
List.fold_left (
fun (acc, cur_count, cur_names) (name, count) ->
if cur_count <= mini || count <= mini then
(acc, cur_count + count, name :: cur_names)
else
(close_group acc cur_count cur_names, count, [name])
) ([], 0, []) l
in
List.rev (close_group acc cur_count cur_names)
let input = [
"1-13", 1;
"14-20", 10;
"21-30", 21;
"31-40", 3;
"41-50", 2;
"51+", 1
]
let output = merge_small_groups 5 input
(* output = [(["1-13"; "14-20"], 11); (["21-30"; "31-40"; "41-50"; "51+"], 27)] *)
As you can see, the result of merging from left to right may not be what you want.
Depending on the goal, it may make more sense to merge the pair of consecutive elements whose sum is smallest and iterate until all counts are above the minimum of 5.
Here is my scala approach.
We start with two lists:
val people = List (1, 10, 21, 3, 2, 1)
val ages = List ("1-13", "14-20", "21-30", "31-40", "41-50", "51+")
and combine them to a kind of mapping:
val agegroup = ages.zip (people)
define a method to merge two Strings, describing an (open ended) interval. The first parameter is, if any, the one with the + in "51+".
/**
combine age-strings
a+ b-c => b+
a-b c-d => c-b
*/
def merge (xs: String, ys: String) = {
val xab = xs.split ("[+-]")
val yab = ys.split ("-")
if (xs.contains ("+")) yab(0) + "+" else
yab (0) + "-" + xab (1)
}
Here is the real work:
/**
reverse the list, combine groups < threshold.
*/
def remap (map: List [(String, Int)], threshold : Int) = {
def remap (mappings: List [(String, Int)]) : List [(String, Int)] = mappings match {
case Nil => Nil
case x :: Nil => x :: Nil
case x :: y :: xs => if (x._2 > threshold) x :: remap (y :: xs) else
remap ((merge (x._1, y._1), x._2 + y._2) :: xs) }
val nearly = (remap (map.reverse)).reverse
// check for first element
if (! nearly.isEmpty && nearly.length > 1 && nearly (0)._2 < threshold) {
val a = nearly (0)
val b = nearly (1)
val rest = nearly.tail.tail
(merge (b._1, a._1), a._2 + b._2) :: rest
} else nearly
}
and invocation
println (remap (agegroup, 5))
with result:
scala> println (remap (agegroup, 5))
List((1-20,11), (21-30,21), (31+,6))
The result is a list of pairs, age-group and membercount.
I guess the main part is easy to understand: There are 3 basic cases: an empty list, which can't be grouped, a list of one group, which is the solution itself, and more than one element.
If the first element (I reverse the list in the beginning, to start with the end) is bigger than 5 (6, whatever), yield it, and procede with the rest - if not, combine it with the second, and take this combined element and call it with the rest in a recursive way.
If 2 elements get combined, the merge-method for the strings is called.
The map is remapped, after reverting it, and the result reverted again. Now the first element has to be inspected and eventually combined.
We're done.
I think a good data structure would be a linked list of pairs, where each pair contains the age span and the count. Using that, you can easily walk the list, and join two pairs in O(1).

How to generate cross product of sets in specific order

Given some sets (or lists) of numbers, I would like to iterate through the cross product of these sets in the order determined by the sum of the returned numbers. For example, if the given sets are { 1,2,3 }, { 2,4 }, { 5 }, then I would like to retrieve the cross-products in the order
<3,4,5>,
<2,4,5>,
<3,2,5> or <1,4,5>,
<2,2,5>,
<1,2,5>
I can't compute all the cross-products first and then sort them, because there are way too many. Is there any clever way to achieve this with an iterator?
(I'm using Perl for this, in case there are modules that would help.)
For two sets A and B, we can use a min heap as follows.
Sort A.
Sort B.
Push (0, 0) into a min heap H with priority function (i, j) |-> A[i] + B[j]. Break ties preferring small i and j.
While H is not empty, pop (i, j), output (A[i], B[j]), insert (i + 1, j) and (i, j + 1) if they exist and don't already belong to H.
For more than two sets, use the naive algorithm and sort to get down to two sets. In the best case (which happens when each set is relatively small), this requires storage for O(√#tuples) tuples instead of Ω(#tuples).
Here's some Python to do this. It should transliterate reasonably straightforwardly to Perl. You'll need a heap library from CPAN and to convert my tuples to strings so that they can be keys in a Perl hash. The set can be stored as a hash as well.
from heapq import heappop, heappush
def largest_to_smallest(lists):
"""
>>> print list(largest_to_smallest([[1, 2, 3], [2, 4], [5]]))
[(3, 4, 5), (2, 4, 5), (3, 2, 5), (1, 4, 5), (2, 2, 5), (1, 2, 5)]
"""
for lst in lists:
lst.sort(reverse=True)
num_lists = len(lists)
index_tuples_in_heap = set()
min_heap = []
def insert(index_tuple):
if index_tuple in index_tuples_in_heap:
return
index_tuples_in_heap.add(index_tuple)
minus_sum = 0 # compute -sum because it's a min heap, not a max heap
for i in xrange(num_lists): # 0, ..., num_lists - 1
if index_tuple[i] >= len(lists[i]):
return
minus_sum -= lists[i][index_tuple[i]]
heappush(min_heap, (minus_sum, index_tuple))
insert((0,) * num_lists)
while min_heap:
minus_sum, index_tuple = heappop(min_heap)
elements = []
for i in xrange(num_lists):
elements.append(lists[i][index_tuple[i]])
yield tuple(elements) # this is where the tuple is returned
for i in xrange(num_lists):
neighbor = []
for j in xrange(num_lists):
if i == j:
neighbor.append(index_tuple[j] + 1)
else:
neighbor.append(index_tuple[j])
insert(tuple(neighbor))

How to generate a list of subsets with restrictions?

I am trying to figure out an efficient algorithm to take a list of items and generate all unique subsets that result from splitting the list into exactly 2 sublists. I'm sure there is a general purpose way to do this, but I'm interested in a specific case. My list will be sorted, and there can be duplicate items.
Some examples:
Input
{1,2,3}
Output
{{1},{2,3}}
{{2},{1,3}}
{{3},{1,2}}
Input
{1,2,3,4}
Output
{{1},{2,3,4}}
{{2},{1,3,4}}
{{3},{1,2,4}}
{{4},{1,2,3}}
{{1,2},{3,4}}
{{1,3},{2,4}}
{{1,4},{2,3}}
Input
{1,2,2,3}
Output
{{1},{2,2,3}}
{{2},{1,2,3}}
{{3},{1,2,2}}
{{1,2},{2,3}}
{{1,3},{2,2}}
I can do this on paper, but I'm struggling to figure out a simple way to do it programmatically. I'm only looking for a quick pseudocode description of how to do this, not any specific code examples.
Any help is appreciated. Thanks.
If you were generating all subsets you would end up generating 2n subsets for a list of length n. A common way to do this is to iterate through all the numbers i from 0 to 2n-1 and use the bits that are set in i to determine which items are in the ith subset. This works because any item either is or is not present in any particular subset, so by iterating through all the combinations of n bits you iterate through the 2n subsets.
For example, to generate the subsets of (1, 2, 3) you would iterate through the numbers 0 to 7:
0 = 000b → ()
1 = 001b → (1)
2 = 010b → (2)
3 = 011b → (1, 2)
4 = 100b → (3)
5 = 101b → (1, 3)
6 = 110b → (2, 3)
7 = 111b → (1, 2, 3)
In your problem you can generate each subset and its complement to get your pair of mutually exclusive subsets. Each pair would be repeated when you do this so you only need to iterate up to 2n-1 - 1 and then stop.
1 = 001b → (1) + (2, 3)
2 = 010b → (2) + (1, 3)
3 = 011b → (1, 2) + (3)
To deal with duplicate items you could generate subsets of list indices instead of subsets of list items. Like with the list (1, 2, 2, 3) generate subsets of the list (0, 1, 2, 3) instead and then use those numbers as indices into the (1, 2, 2, 3) list. Add a level of indirection, basically.
Here's some Python code putting this all together.
#!/usr/bin/env python
def split_subsets(items):
subsets = set()
for n in xrange(1, 2 ** len(items) / 2):
# Use ith index if ith bit of n is set.
l_indices = [i for i in xrange(0, len(items)) if n & (1 << i) != 0]
# Use the indices NOT present in l_indices.
r_indices = [i for i in xrange(0, len(items)) if i not in l_indices]
# Get the items corresponding to the indices above.
l = tuple(items[i] for i in l_indices)
r = tuple(items[i] for i in r_indices)
# Swap l and r if they are reversed.
if (len(l), l) > (len(r), r):
l, r = r, l
subsets.add((l, r))
# Sort the subset pairs so the left items are in ascending order.
return sorted(subsets, key = lambda (l, r): (len(l), l))
for l, r in split_subsets([1, 2, 2, 3]):
print l, r
Output:
(1,) (2, 2, 3)
(2,) (1, 2, 3)
(3,) (1, 2, 2)
(1, 2) (2, 3)
(1, 3) (2, 2)
The following C++ function does exactly what you need, but the order differs from the one in examples:
// input contains all input number with duplicates allowed
void generate(std::vector<int> input) {
typedef std::map<int,int> Map;
std::map<int,int> mp;
for (size_t i = 0; i < input.size(); ++i) {
mp[input[i]]++;
}
std::vector<int> numbers;
std::vector<int> mult;
for (Map::iterator it = mp.begin(); it != mp.end(); ++it) {
numbers.push_back(it->first);
mult.push_back(it->second);
}
std::vector<int> cur(mult.size());
for (;;) {
size_t i = 0;
while (i < cur.size() && cur[i] == mult[i]) cur[i++] = 0;
if (i == cur.size()) break;
cur[i]++;
std::vector<int> list1, list2;
for (size_t i = 0; i < cur.size(); ++i) {
list1.insert(list1.end(), cur[i], numbers[i]);
list2.insert(list2.end(), mult[i] - cur[i], numbers[i]);
}
if (list1.size() == 0 || list2.size() == 0) continue;
if (list1 > list2) continue;
std::cout << "{{";
for (size_t i = 0; i < list1.size(); ++i) {
if (i > 0) std::cout << ",";
std::cout << list1[i];
}
std::cout << "},{";
for (size_t i = 0; i < list2.size(); ++i) {
if (i > 0) std::cout << ",";
std::cout << list2[i];
}
std::cout << "}\n";
}
}
A bit of Erlang code, the problem is that it generates duplicates when you have duplicate elements, so the result list still needs to be filtered...
do([E,F]) -> [{[E], [F]}];
do([H|T]) -> lists:flatten([{[H], T}] ++
[[{[H|L1],L2},{L1, [H|L2]}] || {L1,L2} <- all(T)]).
filtered(L) ->
lists:usort([case length(L1) < length(L2) of true -> {L1,L2};
false -> {L2,L1} end
|| {L1,L2} <- do(L)]).
in pseudocode this means that:
for a two long list {E,F} the result is {{E},{F}}
for longer lists take the first element H and the rest of the list T and return
{{H},{T}} (the first element as a single element list, and the remaining list)
also run the algorithm recursively for T, and for each {L1,L2} element in the resulting list return {{H,L1},{L2}} and {{L1},{H,L2}}
My suggestion is...
First, count how many of each value you have, possibly in a hashtable. Then calculate the total number of combinations to consider - the product of the counts.
Iterate through that number of combinations.
At each combination, copy your loop count (as x), then start an inner loop through your hashtable items.
For each hashtable item, use (x modulo count) as your number of instances of the hashtable key in the first list. Divide x by the count before repeating the inner loop.
If you are worried that the number of combinations might overflow your integer type, the issue is avoidable. Use an array with each item (one for every hashmap key) starting from zero, and 'count' through the combinations treating each array item as a digit (so the whole array represents the combination number), but with each 'digit' having a different base (the corresponding count). That is, to 'increment' the array, first increment item 0. If it overflows (becomes equal to its count), set it to zero and increment the next array item. Repeat the overflow checks until If overflows continue past the end of the array, you have finished.
I think sergdev is using a very similar approach to this second one, but using std::map rather than a hashtable (std::unordered_map should work). A hashtable should be faster for large numbers of items, but won't give you the values in any particular order. The ordering for each loop through the keys in a hashtable should be consistent, though, unless you add/remove keys.

Resources