In what order should you insert a set of known keys into a B-Tree to get minimal height?

Given a fixed number of keys or values (stored either in an array or in some data structure) and the order of the B-tree, can we determine the sequence of inserting keys that would generate a space-efficient B-tree?
To illustrate, consider a B-tree of order 3. Let the keys be {1,2,3,4,5,6,7}. Inserting the elements in increasing order

for (int i = 1; i < 8; ++i)
{
    tree.push(i);
}

would create a tree like this:

      4
  2       6
1   3   5   7
see http://en.wikipedia.org/wiki/B-tree
But inserting the elements alternately from both ends

flag = true;
for (int i = 1, j = 7; i < 8; ++i, --j)
{
    if (flag)
    {
        tree.push(i);
        flag = false;
    }
    else
    {
        tree.push(j);
        flag = true;
    }
}

creates a tree like this:

     3   5
1 2    4    6 7
where we can see a decrease in height.
So is there a particular way to determine the sequence of insertion that would reduce space consumption?

The following trick should work for most ordered search trees, assuming the data to insert are the integers 1..n.
Consider the binary representation of your integer keys - for 1..7 (with dots for zeros) that's...
Bit : 210
1 : ..1
2 : .1.
3 : .11
4 : 1..
5 : 1.1
6 : 11.
7 : 111
Bit 2 changes least often, Bit 0 changes most often. That's the opposite of what we want, so what if we reverse the order of those bits, then sort our keys in order of this bit-reversed value...
Bit : 210 Rev
4 : 1.. -> ..1 : 1
------------------
2 : .1. -> .1. : 2
6 : 11. -> .11 : 3
------------------
1 : ..1 -> 1.. : 4
5 : 1.1 -> 1.1 : 5
3 : .11 -> 11. : 6
7 : 111 -> 111 : 7
It's easiest to explain this in terms of an unbalanced binary search tree, growing by adding leaves. The first item is dead centre - it's exactly the item we want for the root. Then we add the keys for the next layer down. Finally, we add the leaf layer. At every step, the tree is as balanced as it can be, so even if you happen to be building an AVL or red-black balanced tree, the rebalancing logic should never be invoked.
[EDIT I just realised you don't need to sort the data based on those bit-reversed values in order to access the keys in that order. The trick to that is to notice that bit-reversing is its own inverse. As well as mapping keys to positions, it maps positions to keys. So if you loop through from 1..n, you can use this bit-reversed value to decide which item to insert next - for the first insert use the 4th item, for the second insert use the second item and so on. One complication - you have to round n upwards to one less than a power of two (7 is OK, but use 15 instead of 8) and you have to bounds-check the bit-reversed values. The reason is that bit-reversing can move some in-bounds positions out-of-bounds and vice versa.]
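Here's a minimal Python sketch of that trick (my illustration, not code from the original answer): it rounds n up to one less than a power of two and bounds-checks the bit-reversed positions, exactly as described.

def bit_reversed_order(n):
    """Yield 1..n in bit-reversed order; inserting keys in this order
    keeps an ordered search tree as balanced as possible."""
    bits = n.bit_length()              # effectively rounds n up to 2**bits - 1
    for i in range(1, 2 ** bits):
        rev = int(format(i, '0%db' % bits)[::-1], 2)  # reverse the low bits
        if 1 <= rev <= n:              # bounds-check: some values fall outside 1..n
            yield rev

print(list(bit_reversed_order(7)))     # [4, 2, 6, 1, 5, 3, 7]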
Actually, for a red-black tree some rebalancing logic will be invoked, but it should just be re-colouring nodes - not rearranging them. However, I haven't double checked, so don't rely on this claim.
For a B tree, the height of the tree grows by adding a new root. Proving this works is, therefore, a little awkward (and it may require a more careful node-splitting than a B tree normally requires) but the basic idea is the same. Although rebalancing occurs, it occurs in a balanced way because of the order of inserts.
This can be generalised for any set of known-in-advance keys because, once the keys are sorted, you can assign suitable indexes based on that sorted order.
WARNING - This isn't an efficient way to construct a perfectly balanced tree from known already-sorted data.
If you have your data already sorted, and know its size, you can build a perfectly balanced tree in O(n) time. Here's some pseudocode...
if size is zero, return null
from the size, decide which index should be the (subtree) root
recurse for the left subtree, giving that index as the size (assuming 0 is a valid index)
take the next item to build the (subtree) root
recurse for the right subtree, giving (size - (index + 1)) as the size
add the left and right subtree results as the child pointers
return the new (subtree) root
Basically, this decides the structure of the tree based on the size and traverses that structure, building the actual nodes along the way. It shouldn't be too hard to adapt it for B Trees.
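As an illustration (my sketch, not the answerer's code), here is that pseudocode rendered in Python for a plain binary search tree, with nodes as simple dicts; adapting it to B-Tree nodes is the exercise the last sentence suggests.

def build_balanced(items):
    """Build a perfectly balanced BST from already-sorted items in O(n),
    consuming the items strictly left to right."""
    it = iter(items)

    def build(size):
        if size == 0:
            return None                    # size zero: return null
        index = size // 2                  # which index becomes the (subtree) root
        left = build(index)                # left subtree gets `index` items
        value = next(it)                   # take the next item as the (subtree) root
        right = build(size - (index + 1))  # right subtree gets the rest
        return {'value': value, 'left': left, 'right': right}

    return build(len(items))

print(build_balanced([1, 2, 3, 4, 5, 6, 7])['value'])  # 4, as in the first tree above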

This is how I would add elements to a B-tree.
Thanks to Steve314 for giving me the start with the binary representation.
Given are n elements to add, in order. We have to add them to an order-m B-tree. Take their indexes (1...n) and convert them to radix m. The main idea of this insertion is to insert the number with the highest m-radix digit first, and to keep it above the lesser m-radix numbers added to the tree despite the splitting of nodes.
1,2,3.. are indexes, so you actually insert the numbers they point to.
For example, an order-4 tree:

      4      8       12        <- highest radix digit numbers
1,2,3   5,6,7   9,10,11   13,14,15
Now, depending on the order, the median can be:

order is even -> number of keys is odd -> median is the middle key (mid median)
order is odd -> number of keys is even -> left median or right median

The choice of median (left/right) to be promoted decides the order in which I should insert elements. This has to be fixed for the B-tree.
I add elements to the tree in buckets: first one bucket's elements, then, on completion, the next bucket in order. Buckets can easily be created if the median is known; the bucket size is the order m.
I take the left median for promotion. Choosing buckets for insertion:
|   4 |    8 |     12 |
1,2,|3  5,6,|7  9,10,|11  13,14,|15
     3       2        1            <- order in which to insert buckets
For the left-median choice I insert buckets into the tree starting from the right side; for the right-median choice I insert buckets from the left side. Choosing the left median, within a bucket we insert the median first, then the elements to its left, then the rest of the numbers in the bucket.
Example
Bucket median first
12,
Add elements to left
11,12,
Then, after all the bucket's elements are inserted, it looks like this:
| 12 |
|11 13,14,|
Then I choose the bucket to the left of it and repeat the same process.
Median
12
8,11 13,14,
Add elements to left first
12
7,8,11 13,14,
Adding rest
8 | 12
7 9,10,|11 13,14,
Similarly keep adding all the numbers,
4 | 8 | 12
3 5,6,|7 9,10,|11 13,14,
At the end, add the numbers left over from the buckets.
| 4 | 8 | 12 |
1,2,|3 5,6,|7 9,10,|11 13,14,|15
For the mid median (even-order B-trees) you simply insert the median and then all the numbers in the bucket.
For the right median I add buckets from the left. For elements within a bucket I first insert the median, then the elements to its right, and then the elements to its left.
Here we are adding the highest m-radix numbers, and in the process I add the numbers with the immediately lesser m-radix digit, making sure the highest m-radix numbers stay at the top. Here I have only two levels; for more levels I repeat the same process in descending order of radix digits.
The last case is when the remaining elements all have the same radix digit and there are no numbers with a lesser radix digit: then simply insert them and finish the procedure.
I would give an example for 3 levels, but it is too long to show. So please try with other parameters and tell me if it works.

Unfortunately, trees exhibit their worst-case running times, and require rigid balancing techniques, when data is entered in increasing order like that. Plain binary search trees quickly turn into linked lists, etc.
For typical B-Tree use cases (databases, filesystems, etc), you can typically count on your data naturally being more distributed, producing a tree more like your second example.
Though if it is really a concern, you could hash each key, guaranteeing a wider distribution of values.
for (int i = 1; i < 8; ++i)
    tree.push(hash(i));

To build a particular B-tree using Insert() as a black box, work backward. Given a nonempty B-tree, find a node with more than the minimum number of children that's as close to the leaves as possible. The root is considered to have minimum 0, so a node with the minimum number of children always exists. Delete a value from this node to be prepended to the list of Insert() calls. Work toward the leaves, merging subtrees.
For example, given the 2-3 tree
8
4 c
2 6 a e
1 3 5 7 9 b d f,
we choose 8 and do merges to obtain the predecessor
4 c
2 6 a e
1 3 5 79 b d f.
Then we choose 9.
4 c
2 6 a e
1 3 5 7 b d f
Then a.
4 c
2 6 e
1 3 5 7b d f
Then b.
4 c
2 6 e
1 3 5 7 d f
Then c.
4
2 6 e
1 3 5 7d f
Et cetera.

So is there a particular way to determine sequence of insertion which would reduce space consumption?
Edit note: since the question was quite interesting, I have tried to improve my answer with a bit of Haskell.
Let k be the Knuth order of the B-Tree and list a list of keys.
The minimization of space consumption has a trivial solution:
-- won't use point free notation to ease haskell newbies
trivial k list = concat $ reverse $ chunksOf (k-1) $ sort list
Such an algorithm will efficiently produce a time-inefficient B-Tree, unbalanced on the left but with minimal space consumption.
A lot of non-trivial solutions exist that are less efficient to produce but show better lookup performance (lower height/depth). As you know, it's all about trade-offs!
A simple algorithm that minimizes both the B-Tree depth and the space consumption (but not lookup performance) is the following:
-- Sort the list in increasing order and call sortByBTreeSpaceConsumption
-- with the result
smart k list = sortByBTreeSpaceConsumption k $ sort list
-- Sort list so that inserting in a B-Tree with Knuth order = k
-- will produce a B-Tree with minimal space consumption and minimal depth
-- (but not the best lookup performance)
sortByBTreeSpaceConsumption :: Ord a => Int -> [a] -> [a]
sortByBTreeSpaceConsumption _ [] = []
sortByBTreeSpaceConsumption k list
    | k - 1 >= numOfItems = list -- this will be a leaf
    | otherwise = heads ++ tails ++ sortByBTreeSpaceConsumption k remainder
    where requiredLayers = minNumberOfLayersToArrange k list
          numOfItems = length list
          capacityOfInnerLayers = capacityOfBTree k $ requiredLayers - 1
          blockSize = capacityOfInnerLayers + 1
          blocks = chunksOf blockSize balanced
          heads = map last blocks
          tails = concat $ map (sortByBTreeSpaceConsumption k . init) blocks
          balanced = take (numOfItems - (mod numOfItems blockSize)) list
          remainder = drop (numOfItems - (mod numOfItems blockSize)) list
-- Capacity of a layer n in a B-Tree with Knuth order = k
layerCapacity k 0 = k - 1
layerCapacity k n = k * layerCapacity k (n - 1)
-- Infinite list of capacities of layers in a B-Tree with Knuth order = k
capacitiesOfLayers k = map (layerCapacity k) [0..]
-- Capacity of a B-Tree with Knut order = k and l layers
capacityOfBTree k l = sum $ take l $ capacitiesOfLayers k
-- Infinite list of capacities of B-Trees with Knuth order = k
-- as the number of layers increases
capacitiesOfBTree k = map (capacityOfBTree k) [1..]
-- compute the minimum number of layers in a B-Tree of Knuth order k
-- required to store the items in list
minNumberOfLayersToArrange k list = 1 + f k
    where numOfItems = length list
          f = length . takeWhile (< numOfItems) . capacitiesOfBTree
With this smart function, given a list = [21, 18, 16, 9, 12, 7, 6, 5, 1, 2] and a B-Tree with Knuth order = 3, we should obtain [18, 5, 9, 1, 2, 6, 7, 12, 16, 21], with a resulting B-Tree like
          [18, 21]
          /
      [5 , 9]
     /   |   \
[1,2]  [6,7]  [12, 16]
Obviously this is suboptimal from a performance point of view, but should be acceptable, since obtaining a better one (like the following) would be far more expensive (computationally and economically):
      [7 , 16]
     /    |    \
[5,6]  [9,12]  [18, 21]
  /
[1,2]
If you want to run it, save the previous code in a Main.hs file and compile it with GHC after prepending:
import Data.List (sort)
import Data.List.Split
import System.Environment (getArgs)
main = do
    args <- getArgs
    let knuthOrder = read $ head args
    let keys = (map read $ tail args) :: [Int]
    putStr "smart: "
    putStrLn $ show $ smart knuthOrder keys
    putStr "trivial: "
    putStrLn $ show $ trivial knuthOrder keys

Related

Parallel Merge Sort Using Binary Search

I'm trying to implement the merge sort algorithm on CUDA that is described in the paper Designing Efficient Sorting Algorithms for Manycore GPUs.
For the intermediate level, it suggests assigning a thread block to merge two sorted arrays (say A and B) into a single array. The idea is to assign a thread to element a_i of array A and find its location in array B using binary search; if the position of a_i in B is j, then a_i's position in the new array is (i + j).
But when I implemented this (or tried to), I found a scenario where the above method doesn't seem to work: when the two arrays to be merged contain duplicates. [The example arrays were shown as images in the original post; both arrays contain the value 53, and the first array contains it twice.]
The 53 in the first array (the one shaded grey) finds that its respective position in the second array is 11, so its position in the final array is (11 + 11 = 22). Likewise, the position of the 53 in the first array that is shaded blue is (12 + 11 = 23).
If we take the 53 of the second array, its final position is also given as (11 + 11 = 22), which conflicts with the grey 53 from the first array.
So according to my understanding a simple binary search algorithm cannot be used to merge these two arrays. Is there a known or easier method to resolve this conflict?
In the paper, I read:
Given the sorted sequences A and B, we want to compute the sorted sequence C = merge(A, B). If these sequences are sufficiently small, say of size no greater than t = 256 elements each, we can merge them using a single t-thread block. For an element ai ∈ A, we need only compute rank(ai, C), which is the position of element ai in the merged sequence C. Because both A and B are sorted, we know that rank(ai, C) = i + rank(ai, B), where rank(ai, B) is simply the count of elements bj ∈ B with bj < ai, which we compute efficiently using binary search. Elements of B can obviously be treated in the same way.
You have to deal with duplicates in the binary search when you are looking for rank(ai, B). Imagining B as a BST, rank(ai, B) = max{ j : bj ≤ ai } + 1 should work well.
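In Python terms (a sketch of just that rank computation, not the CUDA kernel), bisect_right computes exactly this duplicate-aware rank:

from bisect import bisect_right

def rank_in_B(ai, B):
    """The number of elements b in sorted B with b <= ai,
    i.e. max{ j : B[j] <= ai } + 1 in 1-based terms."""
    return bisect_right(B, ai)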
Sorry I'm a bit late, but I just tried to implement the same algorithm on an FPGA.
It should work if you use a <= comparison when you binary-search A into B, and a < comparison when you search B into A. That way you have a guaranteed order: elements from A always end up at a lower index than elements from B, even if they are the same.
In your example the positions would be:
Grey A 11 + 11 = 22
Blue A 12 + 11 = 23
Grey B 11 + 13 = 24
Sidenote: I would recommend counting from 0 for the indexing (I chose the same counting as you for clarity). That way, the first index in array C is 1 + 1 = 2.
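Here's a sequential Python sketch of that per-element rank logic (merge_by_rank is my hypothetical name; the real thing would compute one element per thread, and I use 0-based indexing rather than the 1-based counting above). The invariant is the one just described: equal elements from A land before equal elements from B, so no two elements ever claim the same slot:

from bisect import bisect_left, bisect_right

def merge_by_rank(A, B):
    C = [None] * (len(A) + len(B))
    for i, a in enumerate(A):
        C[i + bisect_left(B, a)] = a    # strict <: equal b's count as after a
    for j, b in enumerate(B):
        C[j + bisect_right(A, b)] = b   # <=: equal a's count as before b
    return C

print(merge_by_rank([1, 3, 53, 53], [2, 53, 60]))
# [1, 2, 3, 53, 53, 53, 60]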

Getting the lowest possible sum from numbers' difference

I have to find the lowest possible sum from numbers' difference.
Let's say I have 4 numbers: 1515, 1520, 1500 and 1535. The lowest sum of differences is 30, because 1535 - 1520 = 15 and 1515 - 1500 = 15, and 15 + 15 = 30. If I did it like this: 1520 - 1515 = 5 and 1535 - 1500 = 35, the sum would be 40.
Hope you got it, if not, ask me.
Any ideas how to program this? I just found this online and tried to translate it from my language to English. It sounds interesting. I can't use brute force, because it would take ages to run. I don't need code, just ideas how to program it, or a little fragment of code.
Thanks.
Edit:
I didn't post everything... One more edition:
I have, let's say, 8 possible numbers, but I have to take only 6 of them to make the smallest sum. For instance, with the numbers 1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731, the smallest sum will be 48, but here I have to take only 6 numbers out of the 8.
Taking the edit into account:
Start by sorting the list. Then use a dynamic programming solution with state (i, n) representing the minimum sum of n differences when considering only the first i numbers in the sequence. Initial states: dp[*][0] = 0, everything else = infinity. Use two loops: an outer loop over i from 1 to N, and an inner loop over n from 0 to R (R = 3 in the example case in your edit - this uses 3 pairs of numbers, which means 6 individual numbers). The recurrence relation is dp[i][n] = min(dp[i-1][n], dp[i-2][n-1] + seq[i] - seq[i-1]).
You have to be aware of handling boundary cases which I've ignored, but the general idea should work and will run in O(N log N + NR) and use O(NR) space.
The solution by marcog is a correct, non-recursive, polynomial-time solution to the problem — it's a pretty standard DP problem — but, just for completeness, here's a proof that it works, and actual code for the problem. [@marcog: Feel free to copy any part of this answer into your own if you wish; I'll then delete this.]
Proof
Let the list be x_1, …, x_N. Assume wlog that the list is sorted. We're trying to find K (disjoint) pairs of elements from the list such that the sum of their differences is minimised.
Claim: An optimal solution always consists of the differences of consecutive elements.
Proof: Suppose you fix the subset of elements whose differences are taken. Then, by the proof given by Jonas Kölker, the optimal solution for just this subset consists of differences of consecutive elements from the list. Now suppose there is a solution corresponding to a subset that does not comprise pairs of consecutive elements, i.e. the solution involves a difference x_j - x_i where j > i+1. Then we can replace x_j with x_{i+1} to get a smaller difference, since
x_i ≤ x_{i+1} ≤ x_j ⇒ x_{i+1} - x_i ≤ x_j - x_i.
(Needless to say, if x_{i+1} = x_j, then taking x_{i+1} is indistinguishable from taking x_j.) This proves the claim.
The rest is just routine dynamic programming stuff: the optimal solution using k pairs from the first n elements either doesn't use the nth element at all (in which case it's just the optimal solution using k pairs from the first n-1), or it uses the nth element, in which case it's the difference x_n - x_{n-1} plus the optimal solution using k-1 pairs from the first n-2.
The whole program runs in time O(N log N + NK), as marcog says. (Sorting + DP.)
Code
Here's a complete program. I was lazy with initializing arrays and wrote Python code using dicts; this is a small log(N) factor over using actual arrays.
'''
The minimum possible sum of |x_i - x_j| using K pairs (2K numbers) from N numbers
'''
import sys

def ints(): return [int(s) for s in sys.stdin.readline().split()]

N, K = ints()
num = sorted(ints())

best = {}  # best[(k,n)] = minimum sum using k pairs out of 0 to n

def b(k, n):
    if (k, n) in best: return best[(k, n)]
    if k == 0: return 0
    return float('inf')

for n in range(1, N):
    for k in range(1, K + 1):
        best[(k, n)] = min(b(k, n - 1),                            # not using num[n]
                           b(k - 1, n - 2) + num[n] - num[n - 1])  # using num[n]

print(best[(K, N - 1)])
Test it:
Input
4 2
1515 1520 1500 1535
Output
30
Input
8 3
1731 1572 2041 1561 1682 1572 1609 1731
Output
48
I assume the general problem is this: given a list of 2n integers, output a list of n pairs, such that the sum of |x - y| over all pairs (x, y) is as small as possible.
In that case, the idea would be:
sort the numbers
emit (numbers[2k], numbers[2k+1]) for k = 0, ..., n - 1.
This works. Proof:
Suppose you have x_1 < x_2 < x_3 < x_4 (possibly with other values between them) and output (x_1, x_3) and (x_2, x_4). Then
|x_4 - x_2| + |x_3 - x_1| = |x_4 - x_3| + |x_3 - x_2| + |x_3 - x_2| + |x_2 - x_1| >= |x_4 - x_3| + |x_2 - x_1|.
In other words, it's always better to output (x_1, x_2) and (x_3, x_4) because you don't redundantly cover the space between x_2 and x_3 twice. By induction, the smallest number of the 2n must be paired with the second smallest number; by induction on the rest of the list, pairing up smallest neighbours is always optimal, so the algorithm sketch I proposed is correct.
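A sketch of that procedure in Python (my illustration, covering the case where all 2n numbers are used):

def min_diff_sum(xs):
    """Sort, then pair adjacent elements; optimal when every number is used."""
    xs = sorted(xs)
    pairs = [(xs[i], xs[i + 1]) for i in range(0, len(xs), 2)]
    return sum(b - a for a, b in pairs), pairs

print(min_diff_sum([1515, 1520, 1500, 1535]))
# (30, [(1500, 1515), (1520, 1535)])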
Order the list, then do the difference calculation.
EDIT: hi @hey
You can solve the problem using dynamic programming.
Say you have a list L of N integers, and you must form k pairs (with 2*k <= N).
Build a function that finds the smallest difference within a list (if the list is sorted, it will be faster ;)); call it smallest(list l).
Build another one that finds the same for two pairs (can be tricky, but doable) and call it smallest2(list l).
Let's define best(int i, list l), the function that gives you the best result for i pairs within the list l.
The algorithm goes as follows:
best(1, L) = smallest(L)
best(2, L) = smallest2(L)
for i from 3 to k:
    compute min(
        stored_best(i-2) + smallest2( stored_remainder(i-2) ),
        stored_best(i-1) + smallest ( stored_remainder(i-1) )
    ) and store as best(i)
    store the remainder as well for the chosen solution
Now, the problem is that once you have chosen a pair, the two ints that form the boundaries are reserved and can't be used to form a better solution. But by looking two levels back you can guarantee that you have allowed switching candidates.
(The switching work is done by smallest2.)
Step 1: Calculate pair differences
I think it is fairly obvious that the right approach is to sort the numbers and then take differences between each
adjacent pair of numbers. These differences are the "candidate" differences contributing to the
minimal difference sum. Using the numbers from your example would lead to:
Number   Diff
======   ====
 1561
           11
 1572
            0
 1572
           37
 1609
           73
 1682
           49
 1731
            0
 1731
          310
 2041
Save the differences into an array or table or some other data structure where you can maintain the
differences and the two numbers that contributed to each difference. Call this the DiffTable. It
should look something like:
Index   Diff   Number1   Number2
=====   ====   =======   =======
  1       11    1561      1572
  2        0    1572      1572
  3       37    1572      1609
  4       73    1609      1682
  5       49    1682      1731
  6        0    1731      1731
  7      310    1731      2041
Step 2: Choose minimal Differences
If all numbers had to be chosen, we could have stopped at step 1 by choosing the number pairs at the odd-numbered
indices: 1, 3, 5, 7. This is the correct answer. However,
the problem states that a subset of pairs is chosen, and this complicates the problem quite a bit.
In your example 3 differences (6 numbers = 3 pairs = 3 differences) need to be chosen such that:
The sum of the differences is minimal
The numbers participating in any chosen difference are removed from the list.
The second point means that if we chose Diff 11 (Index = 1 above), the numbers 1561 and 1572 are
removed from the list, and consequently, the next Diff of 0 at index 2 cannot be used because only 1 instance
of 1572 is left. Whenever a
Diff is chosen the adjacent Diff values are removed. This is why there is only one way to choose 4 pairs of
numbers from a list containing eight numbers.
About the only method I can think of to minimize the sum of the Diff above is to generate and test.
The following pseudo code outlines a process to generate
all 'legal' sets of index values for a DiffTable of arbitrary size
where an arbitrary number of number pairs are chosen. One (or more) of the
generated index sets will contain the indices into the DiffTable yielding a minimum Diff sum.
/* Global Variables */
M = 7          /* Number of candidate pair differences in DiffTable */
N = 3          /* Number of indices in each candidate pair set (3 pairs of numbers) */
AllSets = []   /* Set of candidate index sets (set of sets) */

call GenIdxSet(1, [])   /* Call generator with seed values */
/* AllSets now contains candidate index sets to perform min sum tests on */
end

procedure: GenIdxSet(i, IdxSet)
    /* Generate all the valid index values for current level */
    /* and subsequent levels until a complete index set is generated */
    do while i <= M
        if CountMembers(IdxSet) = N - 1 then   /* Set is complete */
            AllSets = AppendToSet(AllSets, AppendToSet(IdxSet, i))
        else                                   /* Add another index */
            call GenIdxSet(i + 2, AppendToSet(IdxSet, i))
        i = i + 1
    end
    return
Function CountMembers returns the number of members in the given set, function AppendToSet returns a new set
where the arguments are appended into a single ordered set. For example
AppendToSet([a, b, c], d) returns the set: [a, b, c, d].
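Here's a compact Python rendering of that generator (my sketch; plain lists stand in for the set-of-sets structure, and the helper functions are inlined):

def gen_idx_sets(M, N):
    """All N-element index sets from 1..M in which no two chosen indices
    are adjacent (choosing a Diff consumes both of its numbers,
    ruling out the neighbouring Diffs)."""
    all_sets = []

    def gen(i, idx_set):
        while i <= M:
            if len(idx_set) == N - 1:       # set is complete
                all_sets.append(idx_set + [i])
            else:                           # add another index
                gen(i + 2, idx_set + [i])
            i += 1

    gen(1, [])
    return all_sets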
For the given parameters, M = 7 and N = 3, AllSets becomes:
[[1 3 5]
[1 3 6] <= Diffs = (11 + 37 + 0) = 48
[1 3 7]
[1 4 6]
[1 4 7]
[1 5 7]
[2 4 6]
[2 4 7]
[2 5 7]
[3 5 7]]
Calculate the sums using each set of indices, the one that is minimum identifies the
required number pairs in DiffTable. Above I show that the second set of indices gives
the minimum you are looking for.
This is a simple brute force technique and it does not scale very well. If you had a list of
50 number pairs and wanted to choose the 5 pairs, AllSets would contain 1,221,759 sets of
number pairs to test.
I know you said you did not need code but it is the best way for me to describe a set based solution. The solution runs under SQL Server 2008. Included in the code is the data for the two examples you give. The sql solution could be done with a single self joining table but I find it easier to explain when there are multiple tables.
--table 1 holds the values
declare @Table1 table (T1_Val int)
Insert @Table1
--this data is test 1
--Select (1515) Union ALL
--Select (1520) Union ALL
--Select (1500) Union ALL
--Select (1535)
--this data is test 2
Select (1731) Union ALL
Select (1572) Union ALL
Select (2041) Union ALL
Select (1561) Union ALL
Select (1682) Union ALL
Select (1572) Union ALL
Select (1609) Union ALL
Select (1731)
--Select * from @Table1
--table 2 holds the sorted numbered list
Declare @Table2 table (T2_id int identity(1,1), T1_Val int)
Insert @Table2 Select T1_Val from @Table1 order by T1_Val
--table 3 will hold the sorted pairs
Declare @Table3 table (T3_id int identity(1,1), T21_id int, T21_Val int, T22_id int, T22_val int)
Insert @Table3
Select T2_1.T2_id, T2_1.T1_Val, T2_2.T2_id, T2_2.T1_Val from @Table2 AS T2_1
LEFT Outer join @Table2 AS T2_2 on T2_1.T2_id = T2_2.T2_id + 1
--select * from @Table3
--remove odd numbered rows
delete from @Table3 where T3_id % 2 > 0
--select * from @Table3
--show the diff values
--select *, ABS(T21_Val - T22_val) from @Table3
--show the diff values in order
--select *, ABS(T21_Val - T22_val) from @Table3 order by ABS(T21_Val - T22_val)
--display the two lowest
select TOP 2 CAST(T22_val as varchar(24)) + ' and ' + CAST(T21_val as varchar(24)) as 'The minimum difference pairs are'
, ABS(T21_Val - T22_val) as 'Difference'
from @Table3
ORDER by ABS(T21_Val - T22_val)
I think @marcog's approach can be simplified further.
Take the basic approach that @jonas-kolker proved for finding the smallest differences. Take the resulting list and sort it. Take the R smallest entries from this list and use them as your differences. Proving that this is the smallest sum is trivial.
@marcog's approach is effectively O(N^2) because R == N is a legit option. This approach should be (2*(N log N)) + N, a.k.a. O(N log N).
This requires a small data structure to hold a difference and the values it was derived from. But, that is constant per entry. Thus, space is O(N).
I would go with marcog's answer; you can sort using any of the sorting algorithms. But there is a little thing to analyze now.
If you have to choose R numbers out of N numbers so that the sum of their differences is minimal, then the numbers must be chosen in a sequence without missing any numbers in between.
Hence, after sorting the array, you should run an outer loop from 0 to N-R and an inner loop from 0 to R-1 to calculate the sum of differences.
If needed, you should try with some examples.
I've taken an approach which uses a recursive algorithm, but it does take some of what other people have contributed.
First of all we sort the numbers:
[1561,1572,1572,1609,1682,1731,1731,2041]
Then we compute the differences, keeping track of the indices of the numbers that contributed to each difference:
[(11,(0,1)),(0,(1,2)),(37,(2,3)),(73,(3,4)),(49,(4,5)),(0,(5,6)),(310,(6,7))]
So we got 11 by taking the difference between the number at index 0 and the number at index 1, and 37 from the numbers at indices 2 & 3.
I then sorted this list, so it tells me which pairs give me the smallest difference:
[(0,(1,2)),(0,(5,6)),(11,(0,1)),(37,(2,3)),(49,(4,5)),(73,(3,4)),(310,(6,7))]
What we can see here is that, given that we want to select n numbers, a naive solution might be to select the first n / 2 items of this list. The trouble is, in this list the third item shares an index with the first, so we'd only actually get 5 numbers, not 6. In this case you need to select the fourth pair as well to get a set of 6 numbers.
From here, I came up with this algorithm. Throughout, there is a set of accepted indices which starts empty, and there's a number of numbers left to select n:
If n is 0, we're done.
If n is 1, and the first item provides just 1 index which isn't in our set, we take the first item, and we're done.
If n is 2 or more, and the first item provides 2 indices which aren't in our set, we take the first item and recurse (i.e. go to 1), this time looking for n - 2 numbers that make the smallest difference in the remainder of the list.
This is the basic routine, but life isn't that simple. There are cases we haven't covered yet, but make sure you get the idea before you move on.
Actually step 3 is wrong (found that just before I posted this :-/), as it may be unnecessary to include an early difference to cover indices which are covered by later, essential differences. The first example ([1515, 1520, 1500, 1535]) falls foul of this. Because of this I've thrown it away in the section below, and expanded step 4 to deal with it.
So, now we get to look at the special cases:
** as above **
** as above **
If n is 1, but the first item would provide two indices, we can't select it. We have to throw that item away and recurse. This time we're still looking for n indices, and there have been no changes to our accepted set.
If n is 2 or more, we have a choice. Either we can a) choose this item and recurse looking for n - (1 or 2) indices, or b) skip this item and recurse looking for n indices.
4 is where it gets tricky, and where this routine turns into a search rather than just a sorting exercise. How can we decide which branch (a or b) to take? Well, we're recursive, so let's call both, and see which one is better. How will we judge them?
We'll want to take whichever branch produces the lowest sum.
...but only if it will use up the right number of indices.
So step 4 becomes something like this (pseudocode):
x = numberOfIndicesProvidedBy(currentDifference)
branchA = findSmallestDifference (n-x, remainingDifferences) // recurse looking for **n-(1 or 2)**
branchB = findSmallestDifference (n , remainingDifferences) // recurse looking for **n**
sumA = currentDifference + sumOf(branchA)
sumB = sumOf(branchB)
validA = indicesAddedBy(branchA) == n
validB = indicesAddedBy(branchB) == n
if not validA && not validB then return an empty branch
if validA && not validB then return branchA
if validB && not validA then return branchB
// Here, both must be valid.
if sumA <= sumB then return branchA else return branchB
I coded this up in Haskell (because I'm trying to get good at it). I'm not sure about posting the whole thing, because it might be more confusing than useful, but here's the main part:
findSmallestDifference = findSmallestDifference' Set.empty

findSmallestDifference' _ _ [] = []
findSmallestDifference' taken n (d:ds)
    | n == 0                           = []                                 -- Case 1
    | n == 1 && provides1 d            = [d]                                -- Case 2
    | n == 1 && provides2 d            = findSmallestDifference' taken n ds -- Case 3
    | provides0 d                      = findSmallestDifference' taken n ds -- Case 3a (See Edit)
    | validA && not validB             = branchA                            -- Case 4
    | validB && not validA             = branchB                            -- Case 4
    | validA && validB && sumA <= sumB = branchA                            -- Case 4
    | validA && validB && sumB <= sumA = branchB                            -- Case 4
    | otherwise                        = []                                 -- Case 4
    where branchA = d : findSmallestDifference' (newTaken d) (n - (provides taken d)) ds
          branchB = findSmallestDifference' taken n ds
          sumA   = sumDifferences branchA
          sumB   = sumDifferences branchB
          validA = n == (indicesTaken branchA)
          validB = n == (indicesTaken branchB)
          newTaken x = insertIndices x taken
Hopefully you can see all the cases there. That code(-ish), plus some wrapper produces this:
*Main> findLeastDiff 6 [1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731]
Smallest Difference found is 48
1572 - 1572 = 0
1731 - 1731 = 0
1572 - 1561 = 11
1609 - 1572 = 37
*Main> findLeastDiff 4 [1515, 1520, 1500,1535]
Smallest Difference found is 30
1515 - 1500 = 15
1535 - 1520 = 15
This has become long, but I've tried to be explicit. Hopefully it was worth while.
Edit : There is a case 3a that can be added to avoid some unnecessary work. If the current difference provides no additional indices, it can be skipped. This is taken care of in step 4 above, but there's no point in evaluating both halves of the tree for no gain. I've added this to the Haskell.
Something like
Sort List
Find Duplicates
Make the duplicates a pair
remove duplicates from list
break rest of list into pairs
calculate differences of each pair
take lowest amounts
In your example you have 8 numbers and need the best 3 pairs. First sort the list, which gives you
1561, 1572, 1572, 1609, 1682, 1731, 1731, 2041
If you have duplicates, make them a pair and remove them from the list, so you have
[1572, 1572] = 0
[1731, 1731] = 0
L = { 1561, 1609, 1682, 2041 }
Break the remaining list into pairs, which together with the duplicate pairs gives you the following 4 pairs:
[1572, 1572] = 0
[1731, 1731] = 0
[1561, 1609] = 48
[1682, 2041] = 359
Then drop as many of the highest pairs as you need to.
This gives you the following 3 pairs with the lowest differences:
[1572, 1572] = 0
[1731, 1731] = 0
[1561, 1609] = 48
So
0 + 0 + 48 = 48

More localized, efficient Lowest Common Ancestor algorithm given multiple binary trees?

I have multiple binary trees stored as an array. In each slot is either nil (or null; pick your language) or a fixed tuple storing two numbers: the indices of the two "children". No node will have only one child -- it's either none or two.
Think of each slot as a binary node that only stores pointers to its children, and no inherent value.
Take this system of binary trees:
      0           1
     / \         / \
    2   3       4   5
   / \             / \
  6   7           8   9
     / \
   10   11
The associated array would be:
   0      1      2     3    4     5     6     7       8    9    10   11
[ [2,3], [4,5], [6,7], nil, nil, [8,9], nil, [10,11], nil, nil, nil, nil ]
I've already written simple functions to find direct parents of nodes (simply by searching from the front until there is a node that contains the child)
Furthermore, let us say that, at the relevant times, the trees are anywhere between a few and a few thousand levels deep.
I'd like to find a function

P(m, n)

that finds the lowest common ancestor of m and n -- more formally, the LCA is defined as the "lowest", or deepest, node which has both m and n as descendants (children, or children of children, etc.). If there is none, nil would be a valid return.
Some examples, given our given tree:
P( 6,11) # => 2
P( 3,10) # => 0
P( 8, 6) # => nil
P( 2,11) # => 2
The main method I've been able to find is one that uses an Euler trace, which turns the given tree (adding node A as the invisible parent of 0 and 1, with a "value" of -1) into:
A-0-2-6-2-7-10-7-11-7-2-0-3-0-A-1-4-1-5-8-5-9-5-1-A
And from that, simply find the node between your given m and n that has the lowest number; For example, to find P(6,11), look for a 6 and an 11 on the trace. The number between them that is the lowest is 2, and that's your answer. If A (-1) is in between them, return nil.
-- Calculating P(6,11) --
A-0-2-6-2-7-10-7-11-7-2-0-3-0-A-1-4-1-5-8-5-9-5-1-A
      ^ ^        ^
      | |        |
      m lowest   n
Unfortunately, I do believe that finding the Euler trace of a tree that can be several thousand levels deep is a bit machine-taxing... and because my tree is constantly being changed throughout the course of the program, every time I wanted to find the LCA, I'd have to re-calculate the Euler trace and hold it in memory.
Is there a more memory-efficient way, given the framework I'm using? One that maybe iterates upwards? One way I could think of would be to "count" the generation/depth of both nodes, climb the lower node until it matches the depth of the higher, and then increment both until they meet.
But that'd involve climbing up from level, say, 3025, back to 0, twice, just to count the generations, using a terribly inefficient climbing-up algorithm in the first place, and then re-climbing back up.
Are there any other better ways?
Clarifications
In the way this system is built, every child will have a number greater than their parents.
This does not guarantee that if n is in generation X, there are no nodes in generation (X-1) that are greater than n. For example:
          0
         / \
        /   \
       /     \
      1       2        6
     / \     / \      / \
    2   3   9   10   7   8
   / \          / \
  4   5       11   12
is a valid tree system.
Also, an artifact of the way the trees are built are that the two immediate children of the same parent will always be consecutively numbered.
Are the nodes in order like in your example, where the children have a larger id than the parent? If so, you might be able to do something similar to a merge sort to find them. For your example, the parent chains of 6 and 11 are:
6 -> 2 -> 0
11 -> 7 -> 2 -> 0
So perhaps the algorithm would be:

left = left_start
right = right_start
while left > 0 and right > 0
    if left = right
        return left
    else if left > right
        left = parent(left)
    else
        right = parent(right)
Which would run as:

left    right
----    -----
   6       11    (right -> 7)
   6        7    (right -> 2)
   6        2    (left -> 2)
   2        2    (return 2)
Is this correct?
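A Python sketch of that loop against the array representation from the question (find_parent being the linear scan the questioner says they already have):

def find_parent(array, node):
    """Linear scan for the slot whose child tuple mentions node."""
    for i, children in enumerate(array):
        if children is not None and node in children:
            return i
    return None  # node is a root

def lca(array, m, n):
    # Repeatedly replace the larger index by its parent: children are
    # guaranteed to have larger indices than their parents.
    while m is not None and n is not None:
        if m == n:
            return m
        if m > n:
            m = find_parent(array, m)
        else:
            n = find_parent(array, n)
    return None  # m and n live in different trees

forest = [[2, 3], [4, 5], [6, 7], None, None, [8, 9],
          None, [10, 11], None, None, None, None]
print(lca(forest, 6, 11))  # 2
print(lca(forest, 8, 6))   # None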
Maybe this will help: Dynamic LCA Queries on Trees (Richard Cole, Ramesh Hariharan).
Abstract:
We show how to maintain a data structure on trees which allows for the following operations, all in worst-case constant time: 1. Insertion of leaves and internal nodes. 2. Deletion of leaves. 3. Deletion of internal nodes with only one child. 4. Determining the Least Common Ancestor of any two nodes.
Conference: Symposium on Discrete Algorithms - SODA 1999
I've solved your problem in Haskell. Assuming you know the roots of the forest, the solution takes time linear in the size of the forest and constant additional memory. You can find the full code at http://pastebin.com/ha4gqU0n.
The solution is recursive, and the main idea is that you can call a function on a subtree which returns one of four results:
The subtree contains neither m nor n.
The subtree contains m but not n.
The subtree contains n but not m.
The subtree contains both m and n, and the index of their least common ancestor is k.
A node without children may contain m, n, or neither, and you simply return the appropriate result.
If a node with index k has two children, you combine the results as follows:
join :: Int -> Result -> Result -> Result
join _ (HasBoth k) _ = HasBoth k
join _ _ (HasBoth k) = HasBoth k
join _ HasNeither r = r
join _ r HasNeither = r
join k HasLeft HasRight = HasBoth k
join k HasRight HasLeft = HasBoth k
After computing this result you have to check the index k of the node itself; if k is equal to m or n, you will "extend" the result of the join operation.
My code uses algebraic data types, but I've been careful to assume you need only the following operations:
Get the index of a node
Find out if a node is empty, and if not, find its two children
Since your question is language-agnostic I hope you'll be able to adapt my solution.
There are various performance tweaks you could put in. For example, if you find a root that has exactly one of the two nodes m and n, you can quit right away, because you know there's no common ancestor. Also, if you look at one subtree and it has the common ancestor, you can ignore the other subtree (that one I get for free using lazy evaluation).
Your question was primarily about how to save memory. If a linear-time solution is too slow, you'll probably need an auxiliary data structure. Space-for-time tradeoffs are the bane of our existence.
I think that you can simply loop backwards through the array, always replacing the higher of the two indices by its parent, until they are either equal or no further parent is found:
(defun lowest-common-ancestor (array node-index-1 node-index-2)
(cond ((or (null node-index-1)
(null node-index-2))
nil)
((= node-index-1 node-index-2)
node-index-1)
((< node-index-1 node-index-2)
(lowest-common-ancestor array
node-index-1
(find-parent array node-index-2)))
(t
(lowest-common-ancestor array
(find-parent array node-index-1)
node-index-2))))

generate sequence with all permutations

How can I generate the shortest sequence which contains all possible permutations?
Example:
For length 2 the answer is 121, because this list contains 12 and 21, which are all possible permutations.
For length 3 the answer is 123121321, because this list contains all possible permutations:
123, 231, 312, 121 (invalid), 213, 132, 321.
Each number (within a given permutation) may only occur once.
This greedy algorithm produces fairly short minimal sequences.
UPDATE: Note that for n ≥ 6, this algorithm does not produce the shortest possible string!
Make a collection of all permutations.
Remove the first permutation from the collection.
Let a = the first permutation.
Find the sequence in the collection that has the greatest overlap with the end of a. If there is a tie, choose the sequence that comes first in lexicographic order. Remove the chosen sequence from the collection and add the non-overlapping part to the end of a. Repeat this step until the collection is empty.
The curious tie-breaking step is necessary for correctness; breaking the tie at random instead seems to result in longer strings.
I verified (by writing a much longer, slower program) that the answer this algorithm gives for length 4, 123412314231243121342132413214321, is indeed the shortest answer. However, for length 6 it produces an answer of length 873, which is longer than the shortest known solution.
The algorithm is O((n!)²).
An implementation in Python:
import itertools

def costToAdd(a, b):
    for i in range(1, len(b)):
        if a.endswith(b[:-i]):
            return i
    return len(b)

def stringContainingAllPermutationsOf(s):
    perms = set(''.join(tpl) for tpl in itertools.permutations(s))
    perms.remove(s)
    a = s
    while perms:
        cost, next = min((costToAdd(a, x), x) for x in perms)
        perms.remove(next)
        a += next[-cost:]
    return a
The lengths of the strings generated by this function are 1, 3, 9, 33, 153, 873, 5913, ..., which appears to be OEIS A007489, the partial sums of k!.
I have a hunch you can do better than O((n!)²).
Create all permutations.
Let each permutation represent a node in a graph.
Now, for any two states, add an edge with value 1 if they share n-1 digits (taken from the end of the source and the beginning of the target), value 2 if they share n-2 digits, and so on.
Now, you are left to find the shortest path containing n! vertices.
Here is a fast algorithm that produces a short string containing all permutations. I am pretty sure it produces the shortest possible answer, but I don't have a complete proof in hand.
Explanation. Below is a tree of All Permutations. The picture is incomplete; imagine that the tree goes on forever to the right.
1 --+-- 12 --+-- 123 ...
    |        |
    |        +-- 231 ...
    |        |
    |        +-- 312 ...
    |
    +-- 21 --+-- 213 ...
             |
             +-- 132 ...
             |
             +-- 321 ...
The nodes at level k of this tree are all the permutations of length k. Furthermore, the permutations are in a particular order, with a lot of overlap between each permutation and its neighbors above and below.
To be precise, each node's first child is found by simply adding the next symbol to the end. For example, the first child of 213 would be 2134. The rest of the children are found by rotating the first child to the left one symbol at a time. Rotating 2134 would produce 1342, 3421, 4213.
Taking all the nodes at a given level and stringing them together, overlapping as much as possible, produces the strings 1, 121, 123121321, etc.
The length of the nth string in that sequence is the sum for x=1 to n of x!. (You can prove this by observing how much non-overlap there is between neighboring permutations. Siblings overlap in all but 1 symbol; first-cousins overlap in all but 2 symbols; and so on.)
Sketch of proof. I haven't completely proved that this is the best solution, but here's a sketch of how the proof would proceed. First show that any string containing n distinct permutations has length ≥ 2n - 1. Then show that any string containing n+1 distinct permutations has length ≥ 2n + 1; that is, adding one more permutation costs you two digits. Proceed by calculating the minimum length of strings containing nPr and nPr + 1 distinct permutations, up to n!. In short, this sequence is optimal because you can't make it worse somewhere in the hope of making it better someplace else. It's already locally optimal everywhere. All the moves are forced.
Algorithm. Given all this background, the algorithm is very simple. Walk this tree to the desired depth and string together all the nodes at that depth.
Fortunately we do not actually have to build the tree in memory.
def build(node, s):
    """String together all descendants of the given node at the target depth."""
    d = len(node)  # depth of this node. depth of "213" is 3.
    n = len(s)     # target depth
    if d == n - 1:
        return node + s[n - 1] + node  # children of 213 join to make "2134213"
    else:
        c0 = node + s[d]  # first child node
        children = [c0[i:] + c0[:i] for i in range(d + 1)]  # all child nodes
        strings = [build(c, s) for c in children]  # recurse to the desired depth
        for j in range(1, d + 1):
            strings[j] = strings[j][d:]  # cut off overlap with previous sibling
        return ''.join(strings)  # join what's left

def stringContainingAllPermutationsOf(s):
    return build(s[:1], s)
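For example (using the two functions above):

print(stringContainingAllPermutationsOf("123"))         # 123121321
print(len(stringContainingAllPermutationsOf("1234")))   # 33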
Performance. The above code is already much faster than my other solution, and it does a lot of cutting and pasting of large strings that you can optimize away. The algorithm can be made to run in time and memory proportional to the size of the output.
For n = 3, the length of the chain is 8:
12312132
It seems to me we are working with a cyclic system - a ring, in other words - but we treat it as a chain. The chain is really 123121321, of length 9, but the ring is 12312132, of length 8: we take the last 1 of the 321 from the beginning of the sequence, 12312132[1].
These are called (minimal length) superpermutations (cf. Wikipedia).
Interest in this was re-sparked when an anonymous user posted a new lower bound on 4chan. (See Wikipedia and many other web pages for the history.)
AFAIK, as of today we just know:
Their length is A180632(n) ≤ A007489(n) = Sum_{k=1..n} k! but this bound is only sharp for n ≤ 5, i.e., we have equality for n ≤ 5 but strictly less for n > 5.
There's a very simple recursive algorithm, given below, producing a superpermutation of length A007489(n), which is always palindromic (but as said above this is not the minimal length for n > 5).
For n ≥ 7 we have the better upper bound n! + (n−1)! + (n−2)! + (n−3)! + n − 3.
For n ≤ 5 all minimal SPs are known; for n > 5 we don't know which SP is minimal.
For n = 1, 2, 3, 4 the minimal SP's are unique (up to changing the symbols), given by (1, 121, 123121321, 123412314231243121342132413214321) of length A007489(1..4) = (1, 3, 9, 33).
For n = 5 there are 8 inequivalent ones of minimal length 153 = A007489(5); the palindromic one produced by the algorithm below is the 3rd in lexicographic order.
For n = 6 Houston produced thousands of SPs of the smallest known length 872 = A007489(6) - 1, but AFAIK we still don't know whether this is minimal.
For n = 7 Egan produced one of length 5906 (one less than the better upper bound given above) but again we don't know whether that's minimal.
I've written a very short PARI/GP program (you can paste to run it on the PARI/GP web site) which implements the standard algorithm producing a palindromic superpermutation of length A007489(n):
extend(S,n=vecmax(s))={ my(t); concat([
if(#Set(s)<n, [], /* discard if not a permutation */
s=concat([s, n+1, s]); /* Now merge with preceding segment: */
forstep(i=min(#s, #t)-1, 0, -1,
if(s[1..1+i]==t[#t-i..#t], s=s[2+i..-1]; break));
t=s /* store as previous for next */
)/*endif*/
| s <- [ S[i+1..i+n] | i <- [0..#S-n] ]])
}
SSP=vector(6, n, s=if(n>1, extend(s), [1])); // gives the first 6, the 6th being non-minimal
I think that easily translates to any other language. (For non-PARI speaking persons: "| x <-" means "for x in".)
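In that spirit, here's a rough Python translation (my sketch, using lists of ints; like the PARI version it produces the palindromic superpermutations of length A007489(n), which are not minimal for n > 5):

def extend(s, k):
    """Given a superpermutation s of 1..k-1, build one for 1..k: every
    length-(k-1) window of s that is a permutation becomes
    window + [k] + window, merged on the largest overlap."""
    out = []
    for i in range(len(s) - (k - 1) + 1):
        w = s[i:i + k - 1]
        if len(set(w)) < k - 1:
            continue                       # window is not a permutation: discard
        seg = w + [k] + w
        j = next(j for j in range(min(len(out), len(seg)), -1, -1)
                 if out[len(out) - j:] == seg[:j])
        out += seg[j:]                     # append the non-overlapping part
    return out

def superperm(n):
    s = [1]
    for k in range(2, n + 1):
        s = extend(s, k)
    return ''.join(map(str, s))

print(superperm(3))                              # 123121321
print([len(superperm(n)) for n in range(1, 6)])  # [1, 3, 9, 33, 153]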

Efficiently get sorted sums of a sorted list

You have an ascending list of numbers. What is the most efficient algorithm you can think of to get the ascending list of sums of every two numbers in that list? Duplicates in the resulting list are irrelevant; you can remove them or avoid them if you like.
To be clear, I'm interested in the algorithm. Feel free to post code in any language and paradigm that you like.
Edit as of 2018: You should probably stop reading this. (But I can't delete it as it is accepted.)
If you write out the sums like this:
  1  4  5  6  8  9
------------------
  2  5  6  7  9 10
     8  9 10 12 13
       10 11 13 14
          12 14 15
             16 17
                18
You'll notice that since M[i,j] <= M[i,j+1] and M[i,j] <= M[i+1,j], then you only need to examine the top left "corners" and choose the lowest one.
e.g.
only 1 top left corner, pick 2
only 1, pick 5
6 or 8, pick 6
7 or 8, pick 7
9 or 8, pick 8
9 or 9, pick both :)
10 or 10 or 10, pick all
12 or 11, pick 11
12 or 12, pick both
13 or 13, pick both
14 or 14, pick both
15 or 16, pick 15
only 1, pick 16
only 1, pick 17
only 1, pick 18
Of course, when you have lots of top left corners then this solution devolves.
I'm pretty sure this problem is Ω(n²), because you have to calculate the sums for each M[i,j] -- unless someone has a better algorithm for the summation :)
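One lazy way to exploit that corner property (my sketch, not the answerer's code) is a min-heap seeded with M[0,0]; each popped cell pushes its right and down neighbours, so the sums come out in ascending order without materializing the whole matrix. Duplicates are kept; filter them out if you don't want them:

import heapq

def sorted_sums(xs):
    """Yield xs[i] + xs[j] (i <= j, xs sorted ascending) in ascending
    order by walking the implicit sum matrix with a min-heap."""
    n = len(xs)
    if n == 0:
        return
    heap = [(xs[0] + xs[0], 0, 0)]
    seen = {(0, 0)}
    while heap:
        s, i, j = heapq.heappop(heap)
        yield s
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni <= nj < n and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (xs[ni] + xs[nj], ni, nj))

print(list(sorted_sums([1, 4, 5, 6, 8, 9]))[:6])  # [2, 5, 6, 7, 8, 9]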
Rather than coding this out, I figure I'll pseudo-code it in steps and explain my logic, so that better programmers can poke holes in my logic if necessary.
On the first step we start out with a list of numbers of length n. For each number we need to create a list of length n-1, because we aren't adding a number to itself. By the end we have about n sorted lists, generated in O(n^2) time.
step 1 (startinglist)
    for each number num1 in startinglist
        for each number num2 in startinglist
            add num1 plus num2 into templist
        add templist to sumlist
    return sumlist
In step 2 because the lists were sorted by design (add a number to each element in a sorted list and the list will still be sorted) we can simply do a mergesort by merging each list together rather than mergesorting the whole lot. In the end this should take O(n^2) time.
step 2 (sumlist)
    create an empty list mergedlist
    for each list templist in sumlist
        set mergedlist equal to: merge(mergedlist, templist)
    return mergedlist
The merge method would be then the normal merge step with a check to make sure that there are no duplicate sums. I won't write this out because anyone can look up mergesort.
So there's my solution. The entire algorithm is O(n^2) time. Feel free to point out any mistakes or improvements.
You can do this in two lines in python with
allSums = set(a+b for a in X for b in X)
allSums = sorted(allSums)
The cost of this is n^2 (maybe an extra log factor for the set?) for the iteration and s * log(s) for the sorting where s is the size of the set.
The size of the set could be as big as n*(n-1)/2 for example if X = [1,2,4,...,2^n]. So if you want to generate this list it will take at least n^2/2 in the worst case since this is the size of the output.
However if you want to select the first k elements of the result you can do this in O(kn) using a selection algorithm for sorted X+Y matrices by Frederickson and Johnson (see here for gory details). Although this can probably be modified to generate them online by reusing computation and get an efficient generator for this set.
@deuseldorf, Peter: There is some confusion about (n!). I seriously doubt deuseldorf meant "n factorial" but simply "n, (very excited)!"
The best I could come up with is to produce a matrix of sums of each pair, and then merge the rows together, a-la merge sort. I feel like I'm missing some simple insight that will reveal a much more efficient solution.
My algorithm, in Haskell:
matrixOfSums list = [[a+b | b <- list, b >= a] | a <- list]
sortedSums = foldl merge [] matrixOfSums
--A normal merge, save that we remove duplicates
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys) = case compare x y of
LT -> x:(merge xs (y:ys))
EQ -> x:(merge xs (dropWhile (==x) ys))
GT -> y:(merge (x:xs) ys)
I found a minor improvement, one that's more amenable to lazy stream-based coding. Instead of merging the columns pair-wise, merge all of them at once. The advantage being that you start getting elements of the list immediately.
-- wide-merge does a standard merge (a la merge sort) across an arbitrary number of lists
-- wideNubMerge does this while eliminating duplicates
wideNubMerge :: Ord a => [[a]] -> [a]
wideNubMerge ls = wideNubMerge1 $ filter (/= []) ls

wideNubMerge1 [] = []
wideNubMerge1 ls = mini : wideNubMerge rest
    where mini = minimum $ map head ls
          rest = map (dropWhile (== mini)) ls

betterSortedSums list = wideNubMerge (matrixOfSums list)
However, if you know you're going to use all of the sums, and there's no advantage to getting some of them earlier, go with 'foldl merge []', as it's faster.
In SQL:
create table numbers(n int not null)
insert into numbers(n) values(1),(1), (2), (2), (3), (4)
select distinct num1.n+num2.n sum2n
from numbers num1
inner join numbers num2
on num1.n<>num2.n
order by sum2n
C# LINQ:
List<int> num = new List<int>{ 1, 1, 2, 2, 3, 4};
var uNum = num.Distinct().ToList();
var sums=(from num1 in uNum
from num2 in uNum
where num1!=num2
select num1+num2).Distinct();
foreach (var s in sums)
{
Console.WriteLine(s);
}
No matter what you do, without additional constraints on the input values, you cannot do better than O(n^2), simply because you have to iterate through all pairs of numbers. The iteration will dominate sorting (which you can do in O(n log n) or faster).
This question has been wracking my brain for about a day now. Awesome.
Anyways, you can't get away from the n^2 nature of it easily, but you can do slightly better with the merge since you can bound the range to insert each element in.
If you look at all the lists you generate, they have the following form:
(a[i], a[j]) | j>=i
If you flip it 90 degrees, you get:
(a[i], a[j]) | i<=j
Now, the merge process should be taking two lists i and i+1 (which correspond to lists where the first member is always a[i] and a[i+1]), you can bound the range to insert element (a[i + 1], a[j]) into list i by the location of (a[i], a[j]) and the location of (a[i + 1], a[j + 1]).
This means that you should merge in reverse in terms of j. I don't know (yet) if you can leverage this across j as well, but it seems possible.
If you are looking for a truly language agnostic solution then you will be sorely disappointed in my opinion because you'll be stuck with a for loop and some conditionals. However if you opened it up to functional languages or functional language features (I'm looking at you LINQ) then my colleagues here can fill this page with elegant examples in Ruby, Lisp, Erlang, and others.
