Searching in vector of pairs - algorithm

I have a vector of pairs (datatype=double), where each pair is (a,b) with a less than b. For a number x, I want to find the number of pairs in the vector where a<=x<=b.
Consider the vector size about 10^6.
My Approach
Sort the vector pair and perform a lower_bound operation for x over "a" in pair then iterate from start till my lower bound value and check for values of "b" which satisfies condition of x<=b.
Time Complexity
O(N log N), where N is the vector size.
Issue
I have to perform this over a large number of queries, where this approach becomes inefficient. So is there any better solution to decrease the time complexity?
Sorry for my poor English and question formatting.

In addition to the previous answer, here's a suggestion how to prepare the ranges to optimize the subsequent lookup. The idea boils down to precomputing the result for all significantly different input values, but being smart about when values don't differ significantly.
To illustrate what I mean, let's consider this sequence of ranges:
1, 3
1, 8
2, 4
2, 6
The prepared output structure then looks like this:
1, 2 -> 2
2, 3 -> 4
3, 4 -> 3
4, 6 -> 2
6, 8 -> 1
For any number in the range 1, 2, there are two matching ranges in the initial sequence. For any number in the range 2, 3, there are four matches, etc. Note that there are five ranges here now, because some of the input ranges partially overlapped. Since for every range here the end value is also the start value of the next range, the end value can be optimized out. The result then looks like a simple map:
1 -> 2
2 -> 4
3 -> 3
4 -> 2
6 -> 1
8 -> 0
Note here that the last range didn't have one following, so the explicit zero becomes necessary. For the values before the first, that is implied. In order to find the result for a value, just find the greatest key that is less than or equal to that value. This is a simple O(log n) lookup.
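A minimal Python sketch of this preparation step, in case it helps. The names build_count_map and count_at are made up for illustration. Note one assumption: like the map above, it treats each range as half-open, so a value that lands exactly on an end point is counted as already outside that range.

```python
from bisect import bisect_right
from collections import Counter

def build_count_map(ranges):
    """Return (keys, counts): counts[i] applies from keys[i] up to the next key."""
    delta = Counter()
    for a, b in ranges:
        delta[a] += 1   # one more range becomes active at its start
        delta[b] -= 1   # and stops being active at its end
    keys = sorted(delta)
    counts = []
    running = 0
    for key in keys:
        running += delta[key]
        counts.append(running)
    return keys, counts

def count_at(keys, counts, x):
    """Look up the greatest key <= x; values before the first key match nothing."""
    i = bisect_right(keys, x) - 1
    return counts[i] if i >= 0 else 0

keys, counts = build_count_map([(1, 3), (1, 8), (2, 4), (2, 6)])
```

The preparation costs O(n log n) once; each count_at call is a single binary search.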

Firstly, if you just did a simple scan over the pairs, you would have O(n) complexity! The O(n log n) comes from sorting and for a one-off operation this is just overhead. This might even be the best way to do it, if you don't reuse the results and even if you just perform a few queries, it might still be better than sorting. Make sure you allow yourself to switch out the algorithm.
Anyhow, let's consider that you need to make many queries. Then, one relatively obvious step to improve things is to not iterate step-by-step after sorting. Instead, you can do a binary search for the lower bound. Simply partition the sequence into halves. The lower bound can be found in either half, which you can determine by looking at the middle element between the partitions. Recurse until you find the first element that cannot possibly contain the value you search for, because its start value is already greater.
Concerning the other direction, things are not that easy. Just because you sorted the ranges by the start value doesn't imply that the end values are sorted, too. Also, ranges that match and ranges that don't can be mixed in the sequence, so here you will have to perform a linear scan.
Lastly, some notes:
You could parallelize this algorithm using multithreading.
Depending on your number of searches M in your outer loop, you could also switch the outer loop with the inner one. That means that for every pair of the input vector, you check each of the M search values whether they fall within the range. This might be better, in particular when the M searches fit into the CPU cache.

This is a very typical problem for segment trees, binary indexed trees, or interval trees.
You have two operations on an array arr:
1. Range update: Add(a, b): for(int i = a; i <= b; ++i) arr[i]++
2. Point query : Query(x): return arr[x]
Alternately, you could formulate your problem slightly cleverly.
1. Point Update: Add(a, b): arr[a]++; arr[b+1]--;
2. Range Query: Query(x): return sum(arr[0], arr[1] ..... arr[x]);
In each of the cases above, you have one O(n) operation and one O(1) operation.
For the second case, the query is essentially a prefix sum calculation. Binary Indexed Trees are especially efficient at this task.
Tutorial for Binary Indexed Trees
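For reference, a small sketch of the second formulation with a binary indexed tree. It assumes the end points are already integers in 1..size (see the compression idea below); the class and function names are only illustrative.

```python
class FenwickTree:
    """Binary indexed tree over positions 1..size."""

    def __init__(self, size):
        self.size = size
        self.tree = [0] * (size + 1)

    def add(self, index, delta):
        # Point update: arr[index] += delta
        while index <= self.size:
            self.tree[index] += delta
            index += index & -index

    def prefix_sum(self, index):
        # Range query: arr[1] + ... + arr[index]
        total = 0
        while index > 0:
            total += self.tree[index]
            index -= index & -index
        return total

def add_interval(tree, a, b):
    """Add(a, b) from the second formulation: arr[a]++; arr[b+1]--."""
    tree.add(a, 1)
    if b + 1 <= tree.size:
        tree.add(b + 1, -1)

tree = FenwickTree(10)
add_interval(tree, 2, 5)
add_interval(tree, 4, 8)
```

With this, Query(x) is tree.prefix_sum(x), and both operations take O(log size).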
IMPORTANT IDEA: ARRAY COMPRESSION
You did mention that the vector size is about 10^6, so there is a chance that you may not be able to create an array that big. If you are able to create a set that consists of all the as and bs and xs beforehand, then you can translate them into numbers from 1 to size of set.
SUPER CLEVER IDEA: MO's ALGORITHM
This only applies if you are allowed to solve the problem offline. That means you can take all the query points x as input, solve them in any order you like and store the solutions, and then print the solutions in the correct order.
Please mention if this is your situation, and only then will I elaborate further on this. But Binary Indexed Trees are going to be more efficient than Mo's algorithm.
EDIT:
Because your interval values are of type double, you must convert them to integers before you use my solution. Let me give an example,
Intervals = (1.1 to 1.9), (1.4 to 2.1)
Query Points = 1.5, 2.0
Here all the points that are of interest are not all the possible doubles, but just the above numbers = {1.1, 1.4, 1.5, 1.9, 2.0, 2.1}
If we map them into positive integers:
1.1 --> 1
1.4 --> 2
1.5 --> 3
1.9 --> 4
2.0 --> 5
2.1 --> 6
Then you could use segment trees/binary indexed trees.
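The mapping step could look roughly like this in Python. The function name compress is an assumption for illustration; it only relies on sorting the distinct values.

```python
def compress(values):
    """Map each distinct value to its 1-based rank in sorted order."""
    ordered = sorted(set(values))
    return {value: rank for rank, value in enumerate(ordered, start=1)}

intervals = [(1.1, 1.9), (1.4, 2.1)]
queries = [1.5, 2.0]

# Collect every point of interest: all interval end points plus all query points.
points = [p for interval in intervals for p in interval] + queries
rank = compress(points)
```

Since the ranks preserve the order of the original doubles, a<=x<=b holds exactly when rank[a] <= rank[x] <= rank[b].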

For each pair a,b you can decompose it so that a counts as +1 and b as -1 towards the number of ranges valid for a particular value. Then it becomes a simple O(log n) lookup to see how many ranges encompass the search value.
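One possible way to realize this, as a sketch: keep the start values and the end values in two separately sorted lists. The number of ranges containing x is then the number of starts that are <= x minus the number of ends that are < x. The function name is made up; the closed-interval condition a<=x<=b from the question is assumed.

```python
from bisect import bisect_left, bisect_right

def count_containing(starts, ends, x):
    """starts and ends must each be sorted; counts pairs with a <= x <= b."""
    opened = bisect_right(starts, x)  # ranges with a <= x
    closed = bisect_left(ends, x)     # ranges with b < x
    return opened - closed

pairs = [(1.0, 3.0), (1.0, 8.0), (2.0, 4.0), (2.0, 6.0)]
starts = sorted(a for a, _ in pairs)
ends = sorted(b for _, b in pairs)
```

Sorting is done once in O(n log n); every query afterwards is two binary searches.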

Related

Efficiently search for pairs of numbers in various rows

Imagine you have N distinct people and that you have a record of where these people are, exactly M of these records to be exact.
For example
1,50,299
1,2,3,4,5,50,287
1,50,299
So you can see that 'person 1' is at the same place with 'person 50' three times. Here M = 3 obviously since there's only 3 lines. My question is given M of these lines, and a threshold value (i.e person A and B have been at the same place more than threshold times), what do you suggest the most efficient way of returning these co-occurrences?
So far I've built an N by N table and looped through each row, incrementing table(A,B) every time person A co-occurs with person B in a row. Obviously this is an awful approach and takes O(n^2) to O(n^3) depending on how you implement it. Any tips would be appreciated!
There is no need to create the table. Just create a hash/dictionary/whatever your language calls it. Then in pseudocode:
answer = []
for S in sets:
    for (i, j) in pairs from S:
        count[(i,j)]++
        if threshold == count[(i,j)]:
            answer.append((i,j))
If you have M sets of size K, the running time will be O(M*K^2).
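A runnable Python version of the pseudocode above might look like this. As in the pseudocode, a pair is reported the moment its count reaches the threshold; if you need strictly more than the threshold, compare against threshold + 1 instead. The function name is only illustrative.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(rows, threshold):
    """Return every pair of people that shares at least `threshold` rows."""
    count = Counter()
    answer = []
    for row in rows:
        # Sort so that (i, j) and (j, i) are counted as the same pair.
        for pair in combinations(sorted(set(row)), 2):
            count[pair] += 1
            if count[pair] == threshold:
                answer.append(pair)
    return answer

rows = [
    [1, 50, 299],
    [1, 2, 3, 4, 5, 50, 287],
    [1, 50, 299],
]
```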
If you want you can actually keep the list of intersecting sets in a data structure parallel to count without changing the big-O.
Furthermore the same algorithm can be readily implemented in a distributed way using a map-reduce. For the count you just have to emit a key of (i, j) and a value of 1. In the reduce you count them. Actually generating the list of sets is similar.
The known concept for your case is market basket analysis. Several algorithms exist in this context. For example, the Apriori algorithm can be used for your case, restricted to itemsets of size 2.
Moreover, association rules with a given minimum support (which for your case is the threshold value) can also be found using LSH and min-hash.
You could use probability to speed it up, e.g. only check each pair with 1/50 probability. That will give you a 50x speed up. Then double check any pairs that make it close enough to 1/50th of M.
To double check any pairs, you can either go through the whole list again, or you could double check more efficiently if you do some clever kind of reverse indexing as you go. E.g. if you encode each person's row indices into 64 bit integers, you could use binary search / merge sort type techniques to see which 64 bit integers to compare, and use bit operations to compare 64 bit integers for matches. Other things to look up could be reverse indexing, binary indexed range trees / Fenwick trees.

Constant time binning of values

Say I have a vector of values that represent the upper boundaries of classes to classify (bin) values in. So e.g. vector { 1, 3, 5, 10 } represents bins [0, 1[, [1, 3[, [3, 5[ and [5,10[. How do I implement classification of a random value V in one of these classes (0,1,2,3) in constant time? It's trivial to walk the list of boundaries and stop once V surpasses the bin's upper boundary; but that's O(n) wrt the number of bins; I'm looking to do this in constant time.
I thought it was trivial before I was actually typing the code, by setting up a lookup table, dividing each V by a certain value depending on the class bounds and then using the (rounded) result of the division to find the bin number in the lookup table. But I'm finding it a lot harder than I thought to make this work in a generic way that minimizes the size of the lookup table while still being accurate, regardless of the proportional distance between bin boundaries, and in a way that works for all real values. With Googling I only find algorithms that determine the boundaries of the bins, at least using the terms I did.
I doubt there's a way to do this in strictly constant time (and not requiring infinite space) without taking advantage of some property of the given numbers.
A lookup table is a decent idea, but floating point values make this difficult. If the number of digits is finite, you can consider having the lookup table represented as essentially a trie (a tree where each level represents a digit).
So for {1, 2.5, 5, 9}, your tree would look something like this:
                root
/   /   /   /   |   \   \   \   \   \
0   1   2   3   4   5   6   7   8   9
      / | \
2.0 ... 2.5 ... 2.9
Each leaf node would contain a value indicating which interval it belongs to, so
0 will be set to 0,
1, 2.0 - 2.4 will all be set to 1,
2.5 - 2.9, 3 - 4 will be set to 2,
5 - 9 will be set to 3
A query would just involve starting from the root and repeatedly going to the child node corresponding to the next digit in the number we're looking up (if you look up 2.65 in the above tree, you first go to 2, then 2.6, then, since it's a leaf, you stop and return its value, which is 2).
The time complexity for a query would be O(d), where d is the number of significant digits in your vector, and the space complexity is O(nd).
That might not sound particularly efficient, but keep in mind that d is the number of digits - for example, that would be d = log m with m being the maximum possible value if we're talking about positive integers.
O(log n) is fairly trivial if you just set up a binary search tree (BST) containing all the values in the vector mapped to their original indices.
A lookup would look very similar to how you'd search a BST - start from the root and go either left or right until you find the value, except in this case you note every node you visit and return the mapped index of the closest value that's not bigger. Some API's have methods that basically do this for you (such as std::map in C++).
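In Python the same O(log n) lookup can be sketched with the standard bisect module on the sorted boundary vector, with no explicit tree needed. The function name bin_index is made up; a return value equal to the number of boundaries means the value lies past the last bin.

```python
from bisect import bisect_right

def bin_index(boundaries, v):
    """Number of upper boundaries <= v, which is the index of v's bin."""
    return bisect_right(boundaries, v)

boundaries = [1, 3, 5, 10]  # bins [0,1[, [1,3[, [3,5[ and [5,10[
```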
I think the only way to get O(1) is to create a lookup table so that you can look up all the values directly.
This is only feasible if the boundaries are behaving nicely:
The expected numbers are integers or the boundaries are integers or have limited precision. This allows you to round down (floor) the number before checking against the lookup table and drastically reduces the required entries for the table.
The difference between the max and min boundary cannot be too big. Let's say we know that the precision of the boundaries is 0.5 and the min is 1 and the max is 10, then the lookup table requires (10-1)/0.5 = 18 entries.
The checks for the first and last group (smaller than min and greater than max) is done with simple if checks which doesn't affect the complexity.
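A rough sketch of such a table, under the assumption that every boundary is a multiple of a known precision step. The names build_table and classify are hypothetical, and the boundaries {1, 2.5, 5, 10} are just sample values chosen to give the 18 entries from the calculation above.

```python
from bisect import bisect_right

def build_table(boundaries, step):
    """One entry per cell of width `step` between the min and max boundary."""
    lo, hi = boundaries[0], boundaries[-1]
    cells = round((hi - lo) / step)
    # The bin of a cell is the bin of the cell's left edge.
    return [bisect_right(boundaries, lo + i * step) for i in range(cells)]

def classify(v, boundaries, step, table):
    lo, hi = boundaries[0], boundaries[-1]
    if v < lo:                # first group: below the smallest boundary
        return 0
    if v >= hi:               # last group: at or above the largest boundary
        return len(boundaries)
    return table[int((v - lo) / step)]  # constant-time lookup

boundaries = [1, 2.5, 5, 10]
step = 0.5
table = build_table(boundaries, step)
```

Building the table still uses a binary search per cell, but that happens once; each classify call afterwards is a subtraction, a division and an index.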

Adding, Removing and First missing positive integer

Given an empty list. There are three types of queries 1, 2, 3.
Query 1 x where x is a positive integer indicates adding the number x into the list.
Query 2 x indicates removing x from the list.
Query 3 indicates printing the smallest positive integer not present in the array.
Here x can be from 1 up to 10^9 and the number of queries up to 10^5. For the large range of x I can't keep a boolean array marking visited integers. How should I approach this?
There are too many unknowns about your data to give a definitive answer here. The approach differs a lot between at least these different cases:
Few values.
A lot of values but with large gaps.
A lot of values with only small gaps.
Almost all values.
It also depends on which ones of the mentioned operations that you will do the most.
It is less than 1 GB of data so it is possible to keep it as a bit array in memory on most machines. But if the data set is sparse (case 1 and 2 above) you might want to consider sparse arrays instead, or for very sparse sets (case 1) perhaps a binary search tree or a min-heap. The heap is probably not a good idea if you are going to use operation 2 a lot.
For case 1, 2 and 4 you might consider a range tree. The upside to this solution is that you can do operation 3 in logarithmic time just by going leftwards down the tree and look at the first range.
It might also be possible to page out your datastructure to disk if you are not going to do a lot of random insertions.
You might also consider speeding up the search with a Bloom filter, depending on what type of datastructure you choose in the end.
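To make the heap option a bit more concrete, here is one possible sketch. It is not the only way to do it, and it relies on two assumptions that are not in the question: the list behaves like a set (no duplicates), and the total number of queries is known up front. With at most Q insertions the answer can never exceed Q + 1, so larger values can simply be ignored. The class name is made up.

```python
import heapq

class FirstMissing:
    def __init__(self, max_queries):
        # With at most max_queries insertions, the answer is <= max_queries + 1.
        self.limit = max_queries + 1
        self.present = set()
        # Min-heap of candidates that may be absent (a sorted list is a valid heap).
        self.absent_heap = list(range(1, self.limit + 1))

    def add(self, x):
        if x <= self.limit:       # larger values can never be the answer
            self.present.add(x)

    def remove(self, x):
        if x in self.present:
            self.present.remove(x)
            heapq.heappush(self.absent_heap, x)  # x is a candidate again

    def first_missing(self):
        # Lazily discard candidates that have been added in the meantime.
        while self.absent_heap[0] in self.present:
            heapq.heappop(self.absent_heap)
        return self.absent_heap[0]

fm = FirstMissing(10)
results = [fm.first_missing()]
fm.add(1); fm.add(2); fm.add(10**9)
results.append(fm.first_missing())
fm.add(3)
results.append(fm.first_missing())
fm.remove(2)
results.append(fm.first_missing())
fm.add(2)
results.append(fm.first_missing())
```

Stale heap entries are only removed during operation 3, so many removals do grow the heap, which is the drawback mentioned above.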

Find random numbers in a given range with certain possible numbers excluded

Suppose you are given a range and a few numbers in the range (exceptions). Now you need to generate a random number in the range except the given exceptions.
For example, if range = [1..5] and exceptions = {1, 3, 5} you should generate either 2 or 4 with equal probability.
What logic should I use to solve this problem?
If you have no constraints at all, I guess this is the easiest way: create an array containing the valid values, a[0]...a[m]. Return a[rand(0,...,m)].
If you don't want to create an auxiliary array, but you can count the number of exceptions e and of elements n in the original range, you can simply generate a random number r = rand(1 ... n-e), and then find the valid element by walking the range with a counter that doesn't tick on exceptions and stops when it equals r.
Depends on the specifics of the case. For your specific example, I'd return a 2 if a Uniform(0,1) was below 1/2, 4 otherwise. Similarly, if I saw a pattern such as "the exceptions are odd numbers", I'd generate values for half the range and double. In general, though, I'd generate numbers in the range, check if they're in the exception set, and reject and re-try if they were - a technique known as acceptance/rejection for obvious reasons. There are a variety of techniques to make the exception-list check efficient, depending on how big it is and what patterns it may have.
Let's assume, to keep things simple, that arrays are indexed starting at 1, and your range runs from 1 to k. Of course, you can always shift the result by a constant if this is not the case. We'll call the array of exceptions ex_array, and let's say we have c exceptions. These need to be sorted, which shall turn out to be pretty important in a while.
Now, you only have k-c useful numbers to work with, so it'll be meaningful to find a random number in the range 1 to k-c. Say we end up with the number r. Now, we just need to find the r-th valid number in your array. Simple? Not so much. Remember, you can never simply walk over any of your arrays in a linear fashion, because that can really slow down your implementation when you have a lot of numbers. You have to do some sort of binary search, say, to come up with a fast enough algorithm.
So let's try something better. The r-th number would nominally have lain at index r in your original array had you had no exceptions. The number at index r is r, of course, since your range and your array indices start from 1. But, you have a bunch of invalid numbers between 1 and r, and you want to somehow get to the r-th valid number. So, let's do a binary search on the array of exceptions, ex_array, to find how many invalid numbers are equal to or less than r, because we have that many invalid numbers lying between 1 and r. If this number is 0, we're all done, but if it isn't, we have a bit more work to do.
Assume you found there were n invalid numbers between 1 and r after the binary search. Let's advance n indices in your array to the index r+n, and find the number of invalid numbers lying between 1 and r+n, using a binary search to find how many elements in ex_array are less than or equal to r+n. If this number is exactly n, no more invalid numbers were encountered, and you've hit upon your r-th valid number. Otherwise, repeat again, this time for the index r+n', where n' is the number of invalid numbers that lay between 1 and r+n.
Repeat till you get to a stage where no excess exceptions are found. The important thing here is that you never once have to walk over any of the arrays in a linear fashion. You should optimize the binary searches so they don't always start at index 0. Say you know there are n invalid numbers between 1 and r. Instead of starting your next binary search from 1, you could start it from one index after the index corresponding to n in ex_array.
In the worst case, you'll be doing binary searches for each element in ex_array, which means you'll do c binary searches, the first starting from index 1, the next from index 2, and so on, which gives you a time complexity of O(log(c!)). Now, Stirling's approximation tells us that O(ln(x!)) = O(x ln(x)), so using the algorithm above only makes sense if c is small enough that O(c ln(c)) < O(k), since you can achieve O(k) complexity using the trivial method of extracting valid elements from your array first.
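The procedure described above could be sketched in Python roughly as follows. The function names are made up for illustration; ex_sorted plays the role of ex_array and must be sorted, and the range is assumed to be 1..k.

```python
import random
from bisect import bisect_right

def rth_valid(ex_sorted, r):
    """Return the r-th (1-based) number of 1, 2, 3, ... that is not an exception."""
    n = 0      # exceptions known to be <= pos
    pos = r    # current guess for the r-th valid number
    while True:
        # Resume the search after the exceptions that were already counted.
        n = bisect_right(ex_sorted, pos, n)
        if r + n == pos:      # no new exceptions found: pos is the answer
            return pos
        pos = r + n           # skip ahead by the number of exceptions seen so far

def random_excluding(k, ex_sorted):
    """Uniform random number in 1..k that is not in ex_sorted."""
    r = random.randint(1, k - len(ex_sorted))
    return rth_valid(ex_sorted, r)

valid = [rth_valid([2, 5, 8, 9], r) for r in range(1, 7)]
```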
In Python the solution is very simple (given your example):
import random
rng = set(range(1, 6))
ex = {1, 3, 5}
random.choice(list(rng-ex))
To optimize the solution, one needs to know how long is the range and how many exceptions there are. If the number of exceptions is very low, it's possible to generate a number from the range and just check if it's not an exception. If the number of exceptions is dominant, it probably makes sense to gather the remaining numbers into an array and generate random index for fetching non-exception.
In this answer I assume that it is known how to get an integer random number from a range.
Here's another approach...just keep on generating random numbers until you get one that isn't excluded.
Suppose your desired range was [0,100) excluding 25,50, and 75.
Put the excluded values in a hashtable or bitarray for fast lookup.
int randNum = rand(0,100);
while( excludedValues.contains(randNum) )
{
randNum = rand(0,100);
}
The complexity analysis is more difficult, since potentially rand(0,100) could return 25, 50, or 75 every time. However that is quite unlikely (assuming a random number generator), even if half of the range is excluded.
In the above case, we re-generate a random value for only 3/100 of the original values.
So 3% of the time you regenerate once. Of those 3%, only 3% will need to be regenerated, etc.
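The same loop as a small Python sketch, using a set for the fast lookup. The function name and the guard against excluding the whole range are additions for illustration only.

```python
import random

def random_excluding(low, high, excluded):
    """Uniform random integer in [low, high) that is not in `excluded`."""
    excluded = set(excluded)
    blocked = sum(1 for v in excluded if low <= v < high)
    if blocked >= high - low:
        raise ValueError("every value in the range is excluded")
    while True:
        candidate = random.randrange(low, high)
        if candidate not in excluded:   # otherwise reject and draw again
            return candidate

samples = [random_excluding(0, 100, {25, 50, 75}) for _ in range(1000)]
```

If a fraction p of the range is excluded, the number of draws follows a geometric distribution with mean 1/(1-p), so even with half the range excluded you need two draws on average.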
Suppose the initial range is [1,n] and the exclusion set's size is x. First generate a map from [1, n-x] to the numbers in [1,n] that are not in the exclusion set. This mapping is 1-1 since there are equal numbers on both sides. In the example given in the question the mapping will be as follows: {1->2,2->4}.
Another example: suppose the range is [1,10] and the exclusion list is [2,5,8,9]; then the mapping is {1->1, 2->3, 3->4, 4->6, 5->7, 6->10}. This map can be created in a worst case time complexity of O(nlogn).
Now generate a random number in [1, n-x] and map it to the corresponding number using the mapping. Map lookups can be done in O(logn).
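A short Python sketch of this mapping approach. Note that it uses a dict (a hash map) rather than a tree-based map, so lookups are O(1) on average instead of O(logn); the function names are made up.

```python
import random

def build_mapping(n, excluded):
    """Map 1..n-x onto the numbers of 1..n that are not excluded, in order."""
    excluded = set(excluded)
    valid = [v for v in range(1, n + 1) if v not in excluded]
    return {i: v for i, v in enumerate(valid, start=1)}

def pick(mapping):
    """Draw a key uniformly from 1..n-x and translate it."""
    return mapping[random.randint(1, len(mapping))]

mapping = build_mapping(10, [2, 5, 8, 9])
```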
You can do it in a versatile way if you have enumerators or set operations. For example using Linq:
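void Main()
{
    var exceptions = new[] { 1, 3, 5 };
    // Draw values lazily, drop the excluded ones, and print the first 10 that remain.
    var valid = RandomSequence(1, 5).Where(x => !exceptions.Contains(x)).Take(10);
    foreach (var n in valid)
        Console.WriteLine(n);
}

static Random r = new Random();

// Endless stream of uniformly distributed integers in [min, max].
static IEnumerable<int> RandomSequence(int min, int max)
{
    while (true)
        yield return r.Next(min, max + 1);
}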
I would like to acknowledge some comments that are now deleted:
It's possible that this program never ends (only theoretically) because there could be a sequence that never contains valid values. Fair point. I think this is something that could be explained to the interviewer, however I believe my example is good enough for the context.
The distribution is fair because each of the elements has the same chance of coming up.
The advantage of answering this way is that you show understanding of modern "functional-style" programming, which may be interesting to the interviewer.
The other answers are also correct. This is a different take on the problem.

Generate all subset sums within a range faster than O((k+N) * 2^(N/2))?

Is there a way to generate all of the subset sums s1, s2, ..., sk that fall in a range [A,B] faster than O((k+N)*2^(N/2)), where k is the number of sums there are in [A,B]? Note that k is only known after we have enumerated all subset sums within [A,B].
I'm currently using a modified Horowitz-Sahni algorithm. For example, I first call it to find the smallest sum greater than or equal to A, giving me s1. Then I call it again for the next smallest sum greater than s1, giving me s2. Repeat this until we find a sum s(k+1) greater than B. There is a lot of computation repeated between each iteration, even without rebuilding the initial two 2^(N/2) lists, so is there a way to do better?
In my problem, N is about 15, and the magnitude of the numbers is on the order of millions, so I haven't considered the dynamic programming route.
Check the subset sum on Wikipedia. As far as I know, it's the fastest known algorithm, which operates in O(2^(N/2)) time.
Edit:
If you're looking for multiple possible sums, instead of just 0, you can save the end arrays and just iterate through them again (which is roughly an O(2^(n/2)) operation) and save re-computing them. The value of all the possible subsets doesn't change with the target.
Edit again:
I'm not wholly sure what you want. Are we running K searches for one independent value each, or looking for any subset that has a value in a specific range that is K wide? Or are you trying to approximate the second by using the first?
Edit in response:
Yes, you do get a lot of duplicate work even without rebuilding the list. But if you don't rebuild the list, that's not O(k * N * 2^(N/2)). Building the list is O(N * 2^(N/2)).
If you know A and B right now, you could begin iteration, and then simply not stop when you find the right answer (the bottom bound), but keep going until it goes out of range. That should be roughly the same as solving subset sum for just one solution, involving only +k more ops, and when you're done, you can ditch the list.
More edit:
You have a range of sums, from A to B. First, you solve subset sum problem for A. Then, you just keep iterating and storing the results, until you find the solution for B, at which point you stop. Now you have every sum between A and B in a single run, and it will only cost you one subset sum problem solve plus K operations for K values in the range A to B, which is linear and nice and fast.
s = *i + *j; if s > B then ++i; else if s < A then ++j; else { print s; ... what_goes_here? ... }
No, no, no. I get the source of your confusion now (I misread something), but it's still not as complex as what you had originally. If you want to find ALL combinations within the range, instead of one, you will just have to iterate over all combinations of both lists, which isn't too bad.
Excuse my use of auto. C++0x compiler.
#include <algorithm>
#include <vector>

std::vector<int> sums;
std::vector<int> firstlist;
std::vector<int> secondlist;
// Fill in first/secondlist with the subset sums of each half.
std::sort(firstlist.begin(), firstlist.end());
std::sort(secondlist.begin(), secondlist.end());
// Since we want all in a range, rather than just the first, we need to check all combinations. Horowitz/Sahni is only designed to find one.
for (auto firstit = firstlist.begin(); firstit != firstlist.end(); ++firstit) {
    // Restart the inner iterator for every element of the first list.
    for (auto secondit = secondlist.begin(); secondit != secondlist.end(); ++secondit) {
        int sum = *firstit + *secondit;
        if (sum >= A && sum <= B)  // the range [A,B] is inclusive
            sums.push_back(sum);
    }
}
It's still not great. But it could be optimized if you know in advance that N is very large, for example, mapping or hashmapping sums to iterators, so that any given firstit can find any suitable partners in secondit, reducing the running time.
It is possible to do this in O(N*2^(N/2)), using ideas similar to Horowitz Sahni, but we try and do some optimizations to reduce the constants in the BigOh.
We do the following
Step 1: Split into sets of N/2, and generate all possible 2^(N/2) sets for each split. Call them S1 and S2. This we can do in O(2^(N/2)) (note: the N factor is missing here, due to an optimization we can do).
Step 2: Next sort the larger of S1 and S2 (say S1) in O(N*2^(N/2)) time (we optimize here by not sorting both).
Step 3: Find Subset sums in range [A,B] in S1 using binary search (as it is sorted).
Step 4: Next, for each sum in S2, find using binary search the sets in S1 whose union with this gives a sum in the range [A,B]. This is O(N*2^(N/2)). At the same time, find if that corresponding set in S2 is in the range [A,B]. The optimization here is to combine loops. Note: This gives you a representation of the sets (in terms of two indexes in S1), not the sets themselves. If you want all the sets, this becomes O(K + N*2^(N/2)), where K is the number of sets.
Further optimizations might be possible, for instance when sum from S2, is negative, we don't consider sums < A etc.
Since Steps 2,3,4 should be pretty clear, I will elaborate further on how to get Step 1 done in O(2^(N/2)) time.
For this, we use the concept of Gray Codes. Gray codes are a sequence of binary bit patterns in which each pattern differs from the previous pattern in exactly one bit.
Example: 00 -> 01 -> 11 -> 10 is a gray code with 2 bits.
There are gray codes which go through all possible N/2 bit numbers and these can be generated iteratively (see the wiki page I linked to), in O(1) time for each step (total O(2^(N/2)) steps), given the previous bit pattern, i.e. given current bit pattern, we can generate the next bit pattern in O(1) time.
This enables us to form all the subset sums, by using the previous sum and changing that by just adding or subtracting one number (corresponding to the differing bit position) to get the next sum.
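A small Python sketch of Step 1 using the standard reflected Gray code i ^ (i >> 1). The function name is made up. Each new sum is obtained from the previous one by adding or subtracting a single term.

```python
def subset_sums_gray(terms):
    """All 2^n subset sums of `terms`, generated in Gray code order."""
    sums = [0]          # the empty subset
    current = 0
    gray = 0
    for i in range(1, 1 << len(terms)):
        new_gray = i ^ (i >> 1)           # i-th reflected Gray code
        changed = gray ^ new_gray         # exactly one bit differs
        position = changed.bit_length() - 1
        if new_gray & changed:            # bit switched on: term joins the subset
            current += terms[position]
        else:                             # bit switched off: term leaves the subset
            current -= terms[position]
        gray = new_gray
        sums.append(current)
    return sums

gray_sums = subset_sums_gray([1, 2, 4])
```

The result is not sorted, so the sort in Step 2 is still needed for whichever list you binary search in.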
If you modify the Horowitz-Sahni algorithm in the right way, then it's hardly slower than original Horowitz-Sahni. Recall that Horowitz-Sahni works with two lists of subset sums: sums of subsets in the left half of the original list, and sums of subsets in the right half. Call these two lists of sums L and R. To obtain subsets that sum to some fixed value A, you can sort R, and then look up a number in R that matches each number in L using a binary search. However, the algorithm is asymmetric only to save a constant factor in space and time. It's a good idea for this problem to sort both L and R.
In my code below I also reverse L. Then you can keep two pointers into R, updated for each entry in L: A pointer to the last entry in R that's too low, and a pointer to the first entry in R that's too high. When you advance to the next entry in L, each pointer might either move forward or stay put, but they won't have to move backwards. Thus, the second stage of the Horowitz-Sahni algorithm only takes linear time in the data generated in the first stage, plus linear time in the length of the output. Up to a constant factor, you can't do better than that (once you have committed to this meet-in-the-middle algorithm).
Here is a Python code with example input:
# Input
terms = [29371, 108810, 124019, 267363, 298330, 368607,
         438140, 453243, 515250, 575143, 695146, 840979, 868052, 999760]
(A, B) = (500000, 600000)

# Subset iterator stolen from Sage
def subsets(X):
    yield []
    pairs = []
    for x in X:
        pairs.append((2**len(pairs), x))
        for w in range(2**(len(pairs)-1), 2**len(pairs)):
            yield [x for m, x in pairs if m & w]

# Modified Horowitz-Sahni with toolow and toohigh indices
L = sorted([(sum(S), S) for S in subsets(terms[:len(terms)//2])])
R = sorted([(sum(S), S) for S in subsets(terms[len(terms)//2:])])
(toolow, toohigh) = (-1, 0)
for (Lsum, S) in reversed(L):
    # Check the index bounds first so that R is never indexed out of range.
    while toolow < len(R)-1 and R[toolow+1][0] < A-Lsum: toolow += 1
    while toohigh < len(R) and R[toohigh][0] <= B-Lsum: toohigh += 1
    for n in range(toolow+1, toohigh):
        print('+'.join(map(str, S+R[n][1])), '=', sum(S+R[n][1]))
"Moron" (I think he should change his user name) raises the reasonable issue of optimizing the algorithm a little further by skipping one of the sorts. Actually, because each list L and R is a list of sums of subsets, you can do a combined generate and sort of each one in linear time! (That is, linear in the lengths of the lists.) L is the union of two lists of sums, those that include the first term, term[0], and those that don't. So actually you should just make one of these halves in sorted form, add a constant, and then do a merge of the two sorted lists. If you apply this idea recursively, you save a logarithmic factor in the time to make a sorted L, i.e., a factor of N in the original variable of the problem. This gives a good reason to sort both lists as you generate them. If you only sort one list, you have some binary searches that could reintroduce that factor of N; at best you have to optimize them somehow.
At first glance, a factor of O(N) could still be there for a different reason: If you want not just the subset sum, but the subset that makes the sum, then it looks like O(N) time and space to store each subset in L and in R. However, there is a data-sharing trick that also gets rid of that factor of O(N). The first step of the trick is to store each subset of the left or right half as a linked list of bits (1 if a term is included, 0 if it is not included). Then, when the list L is doubled in size as in the previous paragraph, the two linked lists for a subset and its partner can be shared, except at the head:
     0
     |
     v
1 -> 1 -> 0 -> ...
Actually, this linked list trick is an artifact of the cost model and never truly helpful. Because, in order to have pointers in a RAM architecture with O(1) cost, you have to define data words with O(log(memory)) bits. But if you have data words of this size, you might as well store each word as a single bit vector rather than with this pointer structure. I.e., if you need less than a gigaword of memory, then you can store each subset in a 32-bit word. If you need more than a gigaword, then you have a 64-bit architecture or an emulation of it (or maybe 48 bits), and you can still store each subset in one word. If you patch the RAM cost model to take account of word size, then this factor of N was never really there anyway.
So, interestingly, the time complexity for the original Horowitz-Sahni algorithm isn't O(N*2^(N/2)), it's O(2^(N/2)). Likewise the time complexity for this problem is O(K+2^(N/2)), where K is the length of the output.
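For what it's worth, the combined generate-and-sort step described earlier in this answer can be sketched in a few lines of Python. The function name is made up, and heapq.merge from the standard library stands in for a hand-written merge of two sorted lists.

```python
from heapq import merge

def sorted_subset_sums(terms):
    """All subset sums of `terms` in sorted order, built by repeated merging."""
    sums = [0]                               # sums of subsets of no terms at all
    for t in terms:
        shifted = [s + t for s in sums]      # sums that include t, still sorted
        sums = list(merge(sums, shifted))    # merge of two sorted lists
    return sums

merged = sorted_subset_sums([3, 1, 2])
```

Each round doubles the list and costs time linear in its new length, so the total is proportional to the final length 2^n, without a separate sort.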

Resources