Divide and conquer on sorted input with Haskell - algorithm

For a part of a divide and conquer algorithm, I have the following question where the data structure is not fixed, so set is not to be taken literally:
Given a set X sorted wrt. some ordering of elements and subsets A and B together consisting of all elements in X, can sorted versions A' and B' of A and B be constructed in time linear in the number of elements in X ?
At the moment I am doing a standard sort at each recursive step giving the recursion
T(n) = 2*T(n/2) + O(n*log n)
for the complexity rather than
T(n) = 2*T(n/2) + O(n)
like in the procedural version, where one can utilize a structure with constant-time lookup on A and B to form A' and B' in linear time.
The added log n factor carries over to the overall complexity, giving O(n* (log n)^2) instead of O(n* log n).
EDIT:
Perhaps I am understanding the term lookup incorrectly. The creation of A' and B' in linear time is easy to do if membership of A and B can be checked in constant time.
I didn't succeed in my attempt at making things clearer by abstracting
away the specifics, so here is the actual problem:
I am implementing the algorithm for the closest pair problem. Given a
finite collection P of points in the plane it finds a pair of points
in P with the minimal distance. It works roughly as follows:
If P
has at least 4 points, form Px and
Py, the points in P sorted by x- and y-coordinate. By
splitting Px form L and R, the left- and right-most
halves of points. Recursively compute the closest pair distance in L and
R, let d be the minimum of the two. Now the minimum distance in P is
either d or the distance from a point in L to a point in R. If the
minimal distance is between points from separate halves, it will appear
between a pair of points lying in the strip of width 2*d centered around
the line x = x0, where x0 is the x-coordinate of
a right-most point in L. It turns out that to find a potential minimal distance pair in
the strip, it is enough to compute for every point in the the strip its
distance to the seven following points if the strip points are in a
collection sorted by y-coordinate.
It is in the steps with forming the sorted collections to pass into the recursion and sorting the strip points by y-coordinate where I don't see how to, in
Haskell, utilize having sorted P at the beginning of the recursion.

The following function may interest you:
partition :: (a -> Bool) -> [a] -> ([a], [a])
partition f xs = (filter f xs, filter (not . f) xs)
If you can compute set-membership in constant time, that is, there is a predicate of type a -> Bool that runs in constant time, then partition will run in time linear in the length of its input list. Furthermore, partition is stable, so that if its input list is sorted, then so are both output lists.
I would also like to point out that the above definition is meant to be give the semantics of partition only; the real implementation in GHC only walks its input list once, even if the entire output is forced.
Of course, the real crux of the question is providing a constant-time predicate. The way you phrased the question leaves sets A and B quite unstructured -- you demand that we can handle any particular partitioning. In that case, I don't know of any particularly Haskell-y way of doing constant-time lookup in arbitrary sets. However, often these problems are a bit more structured: often, rather than set-membership, you are actually interested in whether some easily-computable property holds or not. In this case, the above is just what the doctor ordered.

I know very very little about Haskell but here's a shot anyway.
Given that (A+B) == X can;t you just iterate through X (in the sorted order) and add each element to A' or B' if it exists in A or B? Give linear time lookup of element x in the Sets A and B that would be linear.

Related

What is a sorting algorithm that is robust to a faulty comparison?

I want to sort a list of n items with a comparison sort. However, one of the comparisons made by the algorithm will be flipped from what it's supposed to be. Specifically, there is one pair of items for which the comparator function consistently gives the wrong result.
What is a efficient n*log(n) sorting algorithm that will be robust to this faulty comparison? By robust, I mean that every item is off by at most k spots from its true position, for some reasonably small k.
If possible, I'd like it to be robust in the worst case (faulty comparison chosen adversarially), but I'll settle for robust in the average case.
An example robust algorithm (that's not efficient), would be to make all n*(n-1)/2 pairwise comparisons, and place each item by how many of the comparisons they won. Then, no matter what comparison the adversary makes, each items index will be off by no more than k=1.
An example of a NON-robust algorithm is quicksort, because the adversary could just choose the largest item to be on the wrong side of the first pivot, making it on average n/2 spots off from its correct index.
TL;DR: It's possible to modify quicksort to get the following guarantee: in (expected) time O(n log n), we can do one of the following, depending on which comparison is flipped.
Perfectly sort the array.
Perfectly sort the array, except that an adjacent pair of items somewhere in the array is swapped.
Perfectly sort the array, except that three consecutive items in the array, which can be identified, are permuted.
This guarantees a maximum displacement of 2, which is as good as is theoretically possible.
I mulled over this problem for a couple of hours and everything I'm doing connects back to tournaments.
I'd like to begin by trying to reframe the question as follows. If you have a set of n items and you know the "true" results of the comparisons between them, you can represent that result as a directed graph with one node per item and edges indicating when one item compares less than another. This type of digraph is called a "tournament," since you can think of it as encoding the result of a round-robin tournament where each player plays each other player.
In the case of an honest comparator, our tournament will be acyclic, and in particular it will have the following key property: there's exactly one node of each outdegree 0, 1, 2, ..., n - 1. The idea here is that the smallest element will have outdegree n - 1 (it's smaller than everything else), while the largest element will have outdegree 0 (it's bigger than everything else). And in fact, there's a theorem that a tournament is acyclic if and only if each node in the tournament has a different outdegree. Another useful fact: in an acyclic tournament, there's an edge from U to V if and only if outdeg(U) > outdeg(V).
In the case of a "dishonest comparator," we essentially start with an acyclic tournament, then flip a single edge. Your question asked about doing approximate sorting based on this comparator, but I'd like to step back and ask a different question, which I think can then be used to answer yours more precisely. In what cases can you figure out which edge was flipped? If we can do that, then we can do even better than approximate sorting - we can "unflip" the edge and sort perfectly. On the other hand, in which cases can you not figure out which edge was flipped, and when that happens, how far from sorted will we end up? That corresponds to having to do an approximate sort because we can't recover the original ordering.
Here's a useful fact:
Theorem: Begin with an acyclic tournament and flip a single edge. Then it's possible to determine which edge was flipped if and only if the outdegrees of the two endpoints of the flipped edge originally differ by at least three.
To prove this, we'll show both directions of implication.
First, suppose that we flip an edge between two nodes X and Y whose outdegrees differ by one. When we're done, we're left with a tournament where all nodes have different outdegrees (all other nodes have their outdegrees unchanged, and if we flipped the edge (X, Y), then X and Y swap outdegrees because one goes up by one and one goes down by one). We're now left with another acyclic tournament. And in particular, we can't tell which edge we flipped, because we could have just as well flipped any edge between any pair of nodes whose outdegrees differ by one.
Next, suppose we flip an edge between nodes X and Y where the outdeg(X) = k+1 and outdeg(Y) = k-1. We now have outdeg(X) = k = outdeg(Y), and somewhere else to begin with there must have been some node Z with outdegree k as well. So at this point, we have three nodes of outdegree k (namely, X, Y, and Z), and we know that we must have flipped one of the three edges between them. But we can't tell which one it was. Specifically, flipping the XY edge, or the XZ edge, or the YZ edge would all give back acyclic tournaments. So in that case, there's no way to undo the transform. That means that any sorted ordering we get from this comparator will have those two items out of place, so we'd have a maximum distance of at least 1.
An important note for this particular case: this corresponds to the comparator creating a tournament with exactly one cycle containing the nodes X, Y, and Z. Specifically, it'll take on the form X, Z, Y, X. The problem is we can't tell whether the original ordering was (X, Z, Y), or (Z, Y, X), or (Y, X, Z), and so we'd have a maximum distance of at least 2.
And finally, suppose that we have two nodes X and Y and flip the edge XY in the case where outdeg(X) = k, outdeg(Y) = m, and k ≥ m + 3. We're now left with a tournament in which two nodes have outdegree k - 1 and two nodes have outdegree m + 1. But of those four nodes, it's guaranteed that there's exactly one pair of them that can be flipped back to produce an acyclic tournament. One way to see this: take the four nodes that now have repeated outdegrees; call them X and Y (as above) and also W and Z, and suppose we have the cycle X, W, Z, Y, X, where the only flipped edge from the original is (Y, X). What will this cycle look like? Well, since (X, W), (W, Z), and (Z, Y) are edges in the tournament that weren't flipped, back in the original tournament we have outdeg(X) > outdeg(W) > outdeg(Z) > outdeg(Y). That means that we have to have X and W having outdegree k - 1 in the new graph and Z and Y having outdegree m + 1 in the new graph. Therefore, only flipping the edge from Y to X will increase the degree of one of the degree-(k-1) nodes back up to k while also decreasing the degree of one of the degree-(m+1) nodes down to m.
Summarizing:
Theorem: The faulty comparator will either
Behave as a real comparator, in which case we swapped two adjacent elements in the original sequence and we will never know which.
Have exactly one cycle of length three of elements whose original ordering can never be known, or
Have a cycle of length four, in which case we can identify which comparison is reversed.
With this in mind, it seems reasonable to reframe your problem in the following way:
Goal: Design an algorithm that, in time O(n log n), does one of the following things to a list of n elements given a faulty comparator that returns the wrong result when comparing two fixed elements X and Y against one another:
Perfectly sort the list.
Perfectly sort the list, except with two adjacent items swapped.
Perfectly sort the list, except with three adjacent items permuted.
Here's one possible algorithm that does this in expected O(n log n) time that's based on quicksort. The basic idea is the following: we run more or less a regular quicksort, at each point in time checking to see whether we found a triangle. If not, then either we're in case (1) or case (2). If we do find a triangle, we see whether we can identify which comparison got reversed. If we can, then we rerun quicksort, except that we "fix" the comparator in this broken case. If we can't, then we're in case (3) and just finish quicksort as usual.
The specific technique we'll use to detect a triangle works like this. Begin with a regular, vanilla quicksort: pick a pivot, partition the array into things less than the pivot and things bigger than the pivot, then recursively sort the two smaller subarrays. However, after doing so, we do one additional step: assuming the subarray we're sorting has three or more elements in it, look at the pivot p and the element just before and just after it (call those s, p, g for "smaller," "pivot," and "greater"). Then if the comparator says s < p < g < s, we've found a triangle. And in fact, we have something stronger.
Suppose that at some point in quicksort comparator does indeed compare X and Y, the mismatched items. We're assuming X < Y, but that the comparator incorrectly reports that Y < X. The only way that two items can be compared in quicksort is if one of them is a pivot element at a time when the other is in the current subarray. Without loss of generality, let's assume that X was the pivot, and that Y was compared against it.
What should happen here, assuming the comparator was honest, is that Y would be found to be larger than X, and therefore would be placed into the "bigger" subarray. But because the comparator is a lying liar who lies, instead Y gets placed into the "smaller" subarray. If we then recursively sort the "smaller" subarray and the "bigger" subarray, think about where Y will end up. It's in the "smaller" subarray but is actually bigger than X, which means it'll compare larger than everything in that "smaller" subarray. Consequently, Y will appear just before X. Now, look at the items in the "bigger" subarray. There are two options. The first is that in the "real" ordering, there's at least one value between X and Y. That value would then appear in the "bigger" subarray because it's larger than X, and in particular the first element of the "bigger" subarray would compare smaller than Y. That would mean that Y, then X, then the item immediately after X after sorting would form a triangle. The other option is that X and Y are adjacent in the true sorted ordering, which case we'd never find out (as mentioned above). This, combined with the above insight, means that
Theorem: Suppose we run quicksort, and after recursively sorting the left and right subarrays we look at the three items consisting of the pivot, the item just before it, and the item just after it to see if they form a triangle. Then if this algorithm detects a triangle, a triangle exists. Moreover, if this algorithm does not detect a triangle, then either (1) no triangle exists or (2) a triangle does exist, but the comparator was never applied to the bad pair (X, Y) and so the sorted order is correct.
With all this said and done, we can state the full algorithm that, in expected O(n log n) time, sorts the array as best as is possible.
function modifiedQuicksort(array, comparator):
if array has length 0 or 1, return.
pick a random pivot element from the array.
use the comparator to form subarrays smaller and greater based on
how elements compare against the pivot.
recursively apply modifiedQuicksort to those two arrays.
if the comparator finds a triangle formed from the last element of
smaller, the pivot, and the first element of greater, report those
three items as a triangle.
return smaller, pivot, greater.
function sortAsBestWeCan(array, comparator):
run modifiedQuicksort(array, comparator)
if it didn't report a triangle, return the result of the call.
otherwise, it reported a triangle A, B, C.
for each other item D:
if comparator(A, D) and comparator(D, B) or
comparator(B, D) and comparator(D, C) or
comparator(C, D) and comparator(D, A):
you have found a 4-cycle from A, B, C, and D.
detect which comparison is reversed.
use that knowledge plus the comparator and your favorite
O(n log n)-time sorting algorithm to perfectly sort
the input array.
otherwise, those three items are the only triangle, and the
array is sorted as well as it can be. return it.
I think I've thought up a solution.
First, do a first pass with any decent sorting algorithm you want (like quicksort), which should, at worst, result in only one item that's significantly far from where it should be.
Then, choose a width h that's at least 5.
for i from 0 to n-h, we look at the group of h items at i, i+1, ..., i+h-1. We make all h*(h-1)/2 pairwise comparisons in that group, and rearrange them by who won the most comparisons. We then increment i and move onto the next group.
Afterwards, we do the same thing, but going backwards from i=n-h to i=0.
These two extra passes will bubble up/bubble down the displaced item to be in the correct area, and uses the extra comparisons in a group of h to override the faulty single comparison.
The final number of comparisons will be O(n*log(n)) + n*h*(h-1)/2. Not sure how much better you can do.
This method also works (I think) for more than one faulty comparison. All you need to do is make sure that h is large enough to override those faulty comparisons.

Search in locality sensitive hashing

I'm trying to understand the section 5. of this paper about LSH, in particular how to bucket the generated hashes. Quoting the linked paper:
Given bit vectors consisting of d bits each, we choose N = O(n 1/(1+epsilon)
) random permutations of the bits. For each random permutation σ, we
maintain a sorted order O σ of the bit vectors, in lexicographic order
of the bits permuted by σ. Given a query bit vector q, we find the
approximate nearest neighbor by doing the following: For each permu-
tation σ, we perform a binary search on O σ to locate the two bit
vectors closest to q (in the lexicographic order ob- tained by bits
permuted by σ). We now search in each of the sorted orders O σ
examining elements above and below the position returned by the binary
search in order of the length of the longest prefix that matches q.
This can be done by maintaining two pointers for each sorted order O σ
(one moves up and the other down). At each step we move one of the
pointers up or down corresponding to the element with the longest
matching prefix. (Here the length of the longest matching prefix in O
σ is computed relative to q with its bits permuted by σ). We examine
2N = O(n 1/(1+epsilon) ) bit vec- tors in this way. Of all the bit vectors
examined, we return the one that has the smallest Hamming distance to
q.
I'm confused by this algorithm and I don't think that I understood how it works.
I found already this question on the topic, but I didn't understand the answer in the comments. Also in this question in point 2 the same algorithm is described but again, I don't understand how its works.
Can you please try to explain to me how it works step by step trying to be more simple as possible?
I even tried to make a list of things that I don't understand, but in practice is so bad written that I don't understand most of the sentences!
EDIT after gsamaras answer:
I mostly understood the answer, but I still have some doubts:
Is it correct to say that the total cost of performing the N permutations is O(Nnlogn), since we have to sort each one of them ?
The permutation+sorting process described above is performed only once during the pre-processing or for every query q? It seems already pretty expensive O(Nnlogn) even in pre-processing, if we have to do this at query time it's a disaster :D
At the last point, where we compare v0 and v4 to q, we compare their permuted version or the original one (before their permutation)?
This question is somehow broad, so I am just going to give a minimal (abstract) example here:
We have 6 (= n) vectors in our dataset, with d bits each. Let's assume that we do 2 (= N) random permutation.
Let the 1st random permutation begin! Remember that we permute the bits, not the order of the vectors. After permuting the bits, they maintain an order, for example:
v1
v5
v0
v3
v2
v4
Now the query vector, q, arrives, but it's (almost) unlikely that is going to be the same with a vector in our dataset (after the permutation), thus we won't find it by performing binary search.
However, we are going to end up between two vectors. So now we can imagine the scenario to be like this (for example q lies between v0 and v3:
v1
v5
v0 <-- up pointer
<-- q lies here
v3 <-- down pointer
v2
v4
Now we move either up or down pointer, seeking for the vi vector that will match at the most bits with q. Let's say it was v0.
Similarly, we do the second permutation and we find the vector vi, let's say v4. we now compare v0 from the first permutation and v4, to see which one is closest to q, i.e. which one has the most bits equal with q.
Edit:
Is it correct to say that the total cost of performing the N permutations is O(Nnlogn), since we have to sort each one of them?
If they actually sort every permutation from scratch, then yes, but it's not clear for me how they do it.
The permutation+sorting process described above is performed only once during the pre-processing or for every query q?
ONCE.
At the last point, where we compare v0 and v4 to q, we compare their permuted version or the original one (before their permutation)?
I think they do it with the permuted version (see the parentheses before 2N in the paper). But that wouldn't make any difference, since they permute q too with the same permute (σ).
This quora answer may shed some light too.

Ordered lattice point enumeration

Setup: Let ei be an orthogonal basis for n-dimensional Euclidean space, but suppose that ei has irrational (L1) norm. Let L be the set of points obtained by taking linear combinations of the ei with coefficients in the natural numbers (including zero). Now order the points in L first by their L1-norm and then lexicographically.
Question: Is there an efficient algorithm for producing the points in L in increasing order up to some pre-defined bound? Note that I do not want to produce the points and then sort them, rather I want to walk the lattice in order.
Observation: This is easy to do if the ei are an orthonormal basis. For instance, this problem is solved here. In principle something similar would work here, however determining the radii to iterate over is almost as hard as solving the enumeration problem, so it isn't very useful.
How about this:
Let L₁ and L₂ be lists of vectors, where L₁ is the list of visited/processed lattice vectors and L₂ is a list of lists of vectors that will be visited next.
Set L₁={ } and L₂ = {[0]}, where 0 is the zero-vector.
Let v be the smallest vector of the first list in L₂.
Visit/process the vector v.
Add the list L={v+e₁,...,v+en} to L₂, such that the lists are sorted by their smallest element. Only generate v+ei as long as its norm is smaller than your predefined bound.
Insert v at the end of L₁ and remove it from the front of the first list L₂.
If the first list is now empty, remove it from L₂. If not, move it to the correct place.
If L₂ is not empty, goto 2.
This algorithm requires the ei to be sorted by their norm from small to big.
This algorithm adds at most n vectors to L₂ per round. Let B your predefined upper bound, then there are at most nk-1 vectors you are going to visit, where k = 1+B/||e₁||. The first ca. nk' rounds, the list will be of size n, where k' = B/||en||. So in total you have to store less than N = nk' + (nk-1)/(nk'+1) lists. you can generate a new list in O(n) and add place it in L₂ in O(log N) (binary search the correct place and link insert it there).
So the overall complexity would be something like O(N⋅n⋅log N), but notice that N is about the number of vectors you are looking for.
Notice: most likely there is a faster algorithm, but this is something you can try.

Mapping functions to reduce Time Complexity? PhD Qual Item

This was on my last comp stat qual. I gave an answer I thought was pretty good. We just get our score on the exam, not whether we got specific questions right. Hoping the community can give guidance on this one, I am not interested in the answer so much as what is being tested and where I can go read more about it and get some practice before the next exam.
At first glance it looks like a time complexity question, but when it starts talking about mapping-functions and pre-sorting data, I am not sure how to handle.
So how would you answer?
Here it is:
Given a set of items X = {x1, x2, ..., xn} drawn from some domain Z, your task is to find if a query item q in Z occurs in the set. For simplicity you may assume each item occurs exactly once in X and that it takes O(l) amount of time to compare any two items in Z.
(a) Write pseudo-code for an algorithm which checks if q in X. What is the worst case time complexity of your algorithm?
(b) If l is very large (e.g. if each element of X is a long video) then one needs efficient algorithms to check if q \in X. Suppose you are given access to k functions h_i: Z -> {1, 2, ..., m} which uniformly map an element of Z to a number between 1 and m, and let k << l and m > n.
Write pseudo-code for an algorithm which uses the function h_1...h_k to check if q \in X. Note that you are allowed to preprocess the data. What is the worst case time complexity of your algorithm?
Be explicit about the inputs, outputs, and assumptions in your pseudocode.
The first seems to be a simple linear scan. The time complexity is O(n * l), the worst case is to compare all elements. Note - it cannot be sub-linear with n, since there is no information if the data is sorted.
The second (b) is actually a variation of bloom-filter, which is a probabalistic way to represent a set. Using bloom filters - you might have false positives (say something is in the set while it is not), but never false negative (say something is not int the set, while it is).

Generate all subset sums within a range faster than O((k+N) * 2^(N/2))?

Is there a way to generate all of the subset sums s1, s2, ..., sk that fall in a range [A,B] faster than O((k+N)*2N/2), where k is the number of sums there are in [A,B]? Note that k is only known after we have enumerated all subset sums within [A,B].
I'm currently using a modified Horowitz-Sahni algorithm. For example, I first call it to for the smallest sum greater than or equal to A, giving me s1. Then I call it again for the next smallest sum greater than s1, giving me s2. Repeat this until we find a sum sk+1 greater than B. There is a lot of computation repeated between each iteration, even without rebuilding the initial two 2N/2 lists, so is there a way to do better?
In my problem, N is about 15, and the magnitude of the numbers is on the order of millions, so I haven't considered the dynamic programming route.
Check the subset sum on Wikipedia. As far as I know, it's the fastest known algorithm, which operates in O(2^(N/2)) time.
Edit:
If you're looking for multiple possible sums, instead of just 0, you can save the end arrays and just iterate through them again (which is roughly an O(2^(n/2) operation) and save re-computing them. The value of all the possible subsets is doesn't change with the target.
Edit again:
I'm not wholly sure what you want. Are we running K searches for one independent value each, or looking for any subset that has a value in a specific range that is K wide? Or are you trying to approximate the second by using the first?
Edit in response:
Yes, you do get a lot of duplicate work even without rebuilding the list. But if you don't rebuild the list, that's not O(k * N * 2^(N/2)). Building the list is O(N * 2^(N/2)).
If you know A and B right now, you could begin iteration, and then simply not stop when you find the right answer (the bottom bound), but keep going until it goes out of range. That should be roughly the same as solving subset sum for just one solution, involving only +k more ops, and when you're done, you can ditch the list.
More edit:
You have a range of sums, from A to B. First, you solve subset sum problem for A. Then, you just keep iterating and storing the results, until you find the solution for B, at which point you stop. Now you have every sum between A and B in a single run, and it will only cost you one subset sum problem solve plus K operations for K values in the range A to B, which is linear and nice and fast.
s = *i + *j; if s > B then ++i; else if s < A then ++j; else { print s; ... what_goes_here? ... }
No, no, no. I get the source of your confusion now (I misread something), but it's still not as complex as what you had originally. If you want to find ALL combinations within the range, instead of one, you will just have to iterate over all combinations of both lists, which isn't too bad.
Excuse my use of auto. C++0x compiler.
std::vector<int> sums;
std::vector<int> firstlist;
std::vector<int> secondlist;
// Fill in first/secondlist.
std::sort(firstlist.begin(), firstlist.end());
std::sort(secondlist.begin(), secondlist.end());
auto firstit = firstlist.begin();
auto secondit = secondlist.begin();
// Since we want all in a range, rather than just the first, we need to check all combinations. Horowitz/Sahni is only designed to find one.
for(; firstit != firstlist.end(); firstit++) {
for(; secondit = secondlist.end(); secondit++) {
int sum = *firstit + *secondit;
if (sum > A && sum < B)
sums.push_back(sum);
}
}
It's still not great. But it could be optimized if you know in advance that N is very large, for example, mapping or hashmapping sums to iterators, so that any given firstit can find any suitable partners in secondit, reducing the running time.
It is possible to do this in O(N*2^(N/2)), using ideas similar to Horowitz Sahni, but we try and do some optimizations to reduce the constants in the BigOh.
We do the following
Step 1: Split into sets of N/2, and generate all possible 2^(N/2) sets for each split. Call them S1 and S2. This we can do in O(2^(N/2)) (note: the N factor is missing here, due to an optimization we can do).
Step 2: Next sort the larger of S1 and S2 (say S1) in O(N*2^(N/2)) time (we optimize here by not sorting both).
Step 3: Find Subset sums in range [A,B] in S1 using binary search (as it is sorted).
Step 4: Next, for each sum in S2, find using binary search the sets in S1 whose union with this gives sum in range [A,B]. This is O(N*2^(N/2)). At the same time, find if that corresponding set in S2 is in the range [A,B]. The optimization here is to combine loops. Note: This gives you a representation of the sets (in terms of two indexes in S2), not the sets themselves. If you want all the sets, this becomes O(K + N*2^(N/2)), where K is the number of sets.
Further optimizations might be possible, for instance when sum from S2, is negative, we don't consider sums < A etc.
Since Steps 2,3,4 should be pretty clear, I will elaborate further on how to get Step 1 done in O(2^(N/2)) time.
For this, we use the concept of Gray Codes. Gray codes are a sequence of binary bit patterns in which each pattern differs from the previous pattern in exactly one bit.
Example: 00 -> 01 -> 11 -> 10 is a gray code with 2 bits.
There are gray codes which go through all possible N/2 bit numbers and these can be generated iteratively (see the wiki page I linked to), in O(1) time for each step (total O(2^(N/2)) steps), given the previous bit pattern, i.e. given current bit pattern, we can generate the next bit pattern in O(1) time.
This enables us to form all the subset sums, by using the previous sum and changing that by just adding or subtracting one number (corresponding to the differing bit position) to get the next sum.
If you modify the Horowitz-Sahni algorithm in the right way, then it's hardly slower than original Horowitz-Sahni. Recall that Horowitz-Sahni works two lists of subset sums: Sums of subsets in the left half of the original list, and sums of subsets in the right half. Call these two lists of sums L and R. To obtain subsets that sum to some fixed value A, you can sort R, and then look up a number in R that matches each number in L using a binary search. However, the algorithm is asymmetric only to save a constant factor in space and time. It's a good idea for this problem to sort both L and R.
In my code below I also reverse L. Then you can keep two pointers into R, updated for each entry in L: A pointer to the last entry in R that's too low, and a pointer to the first entry in R that's too high. When you advance to the next entry in L, each pointer might either move forward or stay put, but they won't have to move backwards. Thus, the second stage of the Horowitz-Sahni algorithm only takes linear time in the data generated in the first stage, plus linear time in the length of the output. Up to a constant factor, you can't do better than that (once you have committed to this meet-in-the-middle algorithm).
Here is a Python code with example input:
# Input
terms = [29371, 108810, 124019, 267363, 298330, 368607,
438140, 453243, 515250, 575143, 695146, 840979, 868052, 999760]
(A,B) = (500000,600000)
# Subset iterator stolen from Sage
def subsets(X):
yield []; pairs = []
for x in X:
pairs.append((2**len(pairs),x))
for w in xrange(2**(len(pairs)-1), 2**(len(pairs))):
yield [x for m, x in pairs if m & w]
# Modified Horowitz-Sahni with toolow and toohigh indices
L = sorted([(sum(S),S) for S in subsets(terms[:len(terms)/2])])
R = sorted([(sum(S),S) for S in subsets(terms[len(terms)/2:])])
(toolow,toohigh) = (-1,0)
for (Lsum,S) in reversed(L):
while R[toolow+1][0] < A-Lsum and toolow < len(R)-1: toolow += 1
while R[toohigh][0] <= B-Lsum and toohigh < len(R): toohigh += 1
for n in xrange(toolow+1,toohigh):
print '+'.join(map(str,S+R[n][1])),'=',sum(S+R[n][1])
"Moron" (I think he should change his user name) raises the reasonable issue of optimizing the algorithm a little further by skipping one of the sorts. Actually, because each list L and R is a list of sizes of subsets, you can do a combined generate and sort of each one in linear time! (That is, linear in the lengths of the lists.) L is the union of two lists of sums, those that include the first term, term[0], and those that don't. So actually you should just make one of these halves in sorted form, add a constant, and then do a merge of the two sorted lists. If you apply this idea recursively, you save a logarithmic factor in the time to make a sorted L, i.e., a factor of N in the original variable of the problem. This gives a good reason to sort both lists as you generate them. If you only sort one list, you have some binary searches that could reintroduce that factor of N; at best you have to optimize them somehow.
At first glance, a factor of O(N) could still be there for a different reason: If you want not just the subset sum, but the subset that makes the sum, then it looks like O(N) time and space to store each subset in L and in R. However, there is a data-sharing trick that also gets rid of that factor of O(N). The first step of the trick is to store each subset of the left or right half as a linked list of bits (1 if a term is included, 0 if it is not included). Then, when the list L is doubled in size as in the previous paragraph, the two linked lists for a subset and its partner can be shared, except at the head:
0
|
v
1 -> 1 -> 0 -> ...
Actually, this linked list trick is an artifact of the cost model and never truly helpful. Because, in order to have pointers in a RAM architecture with O(1) cost, you have to define data words with O(log(memory)) bits. But if you have data words of this size, you might as well store each word as a single bit vector rather than with this pointer structure. I.e., if you need less than a gigaword of memory, then you can store each subset in a 32-bit word. If you need more than a gigaword, then you have a 64-bit architecture or an emulation of it (or maybe 48 bits), and you can still store each subset in one word. If you patch the RAM cost model to take account of word size, then this factor of N was never really there anyway.
So, interestingly, the time complexity for the original Horowitz-Sahni algorithm isn't O(N*2^(N/2)), it's O(2^(N/2)). Likewise the time complexity for this problem is O(K+2^(N/2)), where K is the length of the output.

Resources