Sorting in O(n) intersect - algorithm

Let S1 and S2 be two sets of integers (they are not necessarily disjoint). We know that |S1| = |S2| = n (i.e. each set has n integers). Each set is stored in an array of length n, where its integers are sorted in ascending order. Let k ≥ 1 be an integer. Design an algorithm to find the k smallest integers in S1 ∩ S2 in O(n) time.
This is what I have so far:
Create a new array called Intersection
For each e in S1 add e to hashset in O(n) time
For each e in S2 check if e exists in hashset in O(n) time
If e exists in hashset add e to Intersection
Once comparisons are done sort Intersection by count sort in O(n) time
return the first k integers
Thus O(n) + O(n) + O(n) = O(n)
Am I on the right track?

Yes, you're definitely on the right track but there's actually no need at all to generate a hash-table or extra set. As your two sets are already sorted, you can simply run an index/pointer through both of them, looking for the common elements.
For example, to find the first common element from the two sets, use the following pseudo-code:
start at first index of both sets
while more elements in both sets, and current values are different:
    if set1 value is less than set2 value:
        advance set1 index
    else
        advance set2 index
At the end of that, set1 index will refer to an intersect point provided that neither index has moved beyond the last element in their respective list. You can then just use that method in a loop to find the first x intersection values.
Here's a proof of concept in Python 3 that gives you the first three numbers that are in the two lists (multiples-of-two and multiples-of-three). The full intersection would be {0, 6, 12, 18, 24} but you will see that it will only extract the first three of those:
# Create the two lists to be used for intersection.
set1 = [i * 2 for i in range(15)] ; print(set1) # doubles
set2 = [i * 3 for i in range(15)] ; print(set2) # trebles
idx1 = 0 ; count1 = len(set1)
idx2 = 0 ; count2 = len(set2)

# Only want first three.
need = 3
while need > 0:
    # Continue until we find next intersect or end of a list.
    while idx1 < count1 and idx2 < count2 and set1[idx1] != set2[idx2]:
        # Advance pointer of list with lowest value.
        if set1[idx1] < set2[idx2]:
            idx1 += 1
        else:
            idx2 += 1

    # Break if reached end of a list with no intersect.
    if idx1 >= count1 or idx2 >= count2:
        break

    # Otherwise print intersect and advance to next list candidate.
    print(set1[idx1]) ; need -= 1
    idx1 += 1 ; idx2 += 1
The output is, as expected:
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
[0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42]
0
6
12
If you needed a list at the end rather than just printing out the intersect points, you would simply initialise an empty container before the loop and then append the value to it rather than printing it. This then becomes a little more like your proposed solution but with the advantage of not needing hash tables or sorting.
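The list-returning variant described above could be sketched as follows. The function name first_k_common and its parameters are illustrative assumptions rather than part of the original answer, and the sketch assumes both inputs are sorted in ascending order with no duplicates.

```python
def first_k_common(a, b, k):
    """Return up to k smallest values present in both sorted lists a and b."""
    common = []
    i = j = 0
    # Stop once k values are collected or either list is exhausted.
    while len(common) < k and i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            common.append(a[i])
            i += 1
            j += 1
    return common


doubles = [n * 2 for n in range(15)]
trebles = [n * 3 for n in range(15)]
print(first_k_common(doubles, trebles, 3))  # prints [0, 6, 12]
```

Each loop iteration advances at least one index, so the loop runs at most 2n times regardless of k.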

Create two arrays, call them arr1 and arr2, of size array_size and populate them with integer values in ascending order. Create two indexes, call them i and j, that will be used to iterate over arr1 and arr2 respectively, and initialize them to 0. Compare the current values of the two arrays: if arr1[i] is less than arr2[j] then increment i, else if arr1[i] is greater than arr2[j] then increment j, else the values intersect, so we record this value and increment both i and j. Once we have found k intersecting values we can stop iterating. In the worst case, when no intersections occur between the two sets of values, we have to iterate until one array is exhausted, which takes at most i + j ≤ 2n steps, i.e. O(n).
Here is the solution in bash:
#!/bin/bash
#-------------------------------------------------------------------------------
# Design an algorithm to find the k smallest integers in S1 ∩ S2 in O(n) time.
#-------------------------------------------------------------------------------
typeset -a arr1 arr2 arr_answer
typeset -i array_size=20 k=5

function populate_arrs {
    typeset -i counter=0
    while [[ ${counter} -lt ${array_size} ]]; do
        arr1[${counter}]=$((${counter} * 2))
        arr2[${counter}]=$((${counter} * 3))
        counter=${counter}+1
    done
    printf "%8s" "Set1: "; printf "%4d" ${arr1[*]}; printf "\n"
    printf "%8s" "Set2: "; printf "%4d" ${arr2[*]}; printf "\n\n"
}

function k_smallest_integers_main {
    populate_arrs
    typeset -i counter=0 i=0 j=0
    # Stop after k matches, or as soon as either array is exhausted.
    while [[ ${counter} -lt ${k} && ${i} -lt ${array_size} && ${j} -lt ${array_size} ]]; do
        if [[ ${arr1[${i}]} -eq ${arr2[${j}]} ]]; then
            arr_answer[${counter}]=${arr1[${i}]}
            counter=${counter}+1; i=${i}+1; j=${j}+1
        elif [[ ${arr1[${i}]} -lt ${arr2[${j}]} ]]; then
            i=${i}+1
        else
            j=${j}+1
        fi
    done
    printf "%8s" "Answer: "; printf "%4d" ${arr_answer[*]}; printf "\n"
}

k_smallest_integers_main
Output:
Set1: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
Set2: 0 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57
Answer: 0 6 12 18 24

In Python:
i1= 0; i2= 0
while k > 0 and i1 < n and i2 < n:
    if S1[i1] < S2[i2]:
        i1+= 1
    elif S1[i1] > S2[i2]:
        i2+= 1
    else:
        Process(S1[i1], S2[i2])
        i1+= 1; i2+= 1
        k-= 1
Execution will perform fewer than k calls to Process if there aren't sufficiently many elements in the intersection.

Related

Maxsubsequence - What are the key insights for this problem?

Below is the problem assignment using tree recursion approach:
Maximum Subsequence
A subsequence of a number is a series of (not necessarily contiguous) digits of the number. For example, 12345 has subsequences that include 123, 234, 124, 245, etc. Your task is to get the maximum subsequence below a certain length.
def max_subseq(n, l):
    """
    Return the maximum subsequence of length at most l that can be found in the given number n.
    For example, for n = 20125 and l = 3, we have that the subsequences are
        2
        0
        1
        2
        5
        20
        21
        22
        25
        01
        02
        05
        12
        15
        25
        201
        202
        205
        212
        215
        225
        012
        015
        025
        125
    and of these, the maximum number is 225, so our answer is 225.
    >>> max_subseq(20125, 3)
    225
    >>> max_subseq(20125, 5)
    20125
    >>> max_subseq(20125, 6) # note that 20125 == 020125
    20125
    >>> max_subseq(12345, 3)
    345
    >>> max_subseq(12345, 0) # 0 is of length 0
    0
    >>> max_subseq(12345, 1)
    5
    """
    "*** YOUR CODE HERE ***"
There are two key insights for this problem
You need to split into the cases where the ones digit is used and the one where it is not. In the case where it is, we want to reduce l since we used one of the digits, and in the case where it isn't we do not.
In the case where we are using the ones digit, you need to put the digit back onto the end, and the way to attach a digit d to the end of a number n is 10 * n + d.
I could not understand the insights for this problem, specifically these two points:
split into the cases where the ones digit is used and the one where it is not
In the case where we are using the ones digit, you need to put the digit back onto the end
My understanding of this problem:
A solution to this problem seems to need to generate all subsequences up to length l; the pseudocode looks like:
digitSequence := strconv.Itoa(n) // "20125"
printSubSequence = func(digitSequence string, currenSubSequenceSize int) { // digitSequence is "20125" and currenSubSequenceSize is say 3
    printNthSubSequence(digitSequence, currenSubSequenceSize) + printSubSequence(digitSequence, currenSubSequenceSize-1)
}
where printNthSubSequence prints subsequences for (20125, 3) or (20125, 2) etc...
Finding max_subseq among all these sequences then becomes easy
Can you help me understand the insights given in this problem, with an example(say 20125, 1)? here is the complete question
Something like this? As the instructions suggest, try it with and without the current digit:
function f(s, i, l){
  if (i + 1 <= l)
    return Number(s.substr(0, l));
  if (!l)
    return 0;
  return Math.max(
    // With
    Number(s[i]) + 10 * f(s, i - 1, l - 1),
    // Without
    f(s, i - 1, l)
  );
}

var input = [
  ['20125', 3],
  ['20125', 5],
  ['20125', 6],
  ['12345', 3],
  ['12345', 0],
  ['12345', 1]
];

for (let [s, l] of input){
  console.log(s + ', l: ' + l);
  console.log(f(s, s.length-1, l));
  console.log('');
}
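For comparison, here is a minimal Python sketch that works directly on the integer, following the two insights quoted in the question: branch on whether the ones digit is used, and re-attach a used digit d with 10 * n + d. The base case and the local variable names are my own choices, not taken from the assignment.

```python
def max_subseq(n, l):
    """Largest number formed by at most l digits of n, kept in order."""
    if n == 0 or l == 0:
        return 0
    # Insight 1, first case: use the ones digit, so one fewer digit remains.
    # Insight 2: put that digit back on the end with 10 * rest + digit.
    with_ones = max_subseq(n // 10, l - 1) * 10 + n % 10
    # Insight 1, second case: skip the ones digit, l is unchanged.
    without_ones = max_subseq(n // 10, l)
    return max(with_ones, without_ones)


print(max_subseq(20125, 1))  # prints 5
```

For (20125, 1) the "with" branch gives max_subseq(2012, 0) * 10 + 5 = 5, while the "without" branch gives max_subseq(2012, 1) = 2, so the answer is 5.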

Determining the pairs of integers that sum to some value in the array

I have the program which counts the number of pairs of N integers that sum to value. To simplify the problem, assume also that the integers are distinct.
l.Sort();
for (int i = 0; i < l.Count; ++i)
{
    int j = l.BinarySearch(value - l[i]);
    if (j > i)
    {
        Console.WriteLine("{0} {1}", i + 1, j+1);
    }
}
To solve the problem, we sort the array (to enable binary search) and then, for every entry a[i] in the array, do a binary search for value - a[i]. If the result is an index j with j > i, we show this pair.
But this algorithm doesn't work on the following input:
1 2 3 4 4 9 56 90, because j is never larger than i.
How can I fix that?
I would go with a more efficient solution that needs more space.
Assume that the numbers are not distinct.
Create a hash table with your integers as keys and their frequencies as values.
Iterate over this hash table.
For each key k:
calculate the difference diff = value - k
look up diff in the hash table
if there is a match, check whether its frequency is > 0
if the frequency is > 0, decrement it by 1 and yield the current pair k, diff
Here is the Python code:
def count_pairs(arr, value):
    # Build a frequency table of the values.
    hsh = {}
    for k in arr:
        cnt = hsh.get(k, 0)
        hsh[k] = cnt + 1
    for k in arr:
        diff = value - k
        # Default to 0 so that a missing key compares cleanly in Python 3.
        cnt = hsh.get(diff, 0)
        if cnt > 0:
            hsh[k] -= 1
            print("Pair detected: " + str(k) + " and " + str(diff))

count_pairs([4, 2, 3, 4, 9, 1, 5, 4, 56, 90], 8)
#=> Pair detected: 4 and 4
#=> Pair detected: 3 and 5
#=> Pair detected: 4 and 4
#=> Pair detected: 4 and 4
Since "counts the number of pairs" is a rather vague description, note that here you can see 4 pairs that are distinct by the indexes of their numbers.
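If the goal is only the number of index pairs, the frequency table can be used without walking the array a second time. The sketch below is one possible reading of "count the pairs", assuming a pair means two different positions i < j; the function name count_index_pairs is hypothetical. It treats the case where a value is exactly half of the target separately, so a single 4 is never paired with itself.

```python
from collections import Counter


def count_index_pairs(arr, value):
    """Count pairs of positions i < j with arr[i] + arr[j] == value."""
    freq = Counter(arr)
    total = 0
    for k, cnt in freq.items():
        diff = value - k
        if k < diff:
            # Every occurrence of k pairs with every occurrence of diff.
            total += cnt * freq.get(diff, 0)
        elif k == diff:
            # Choose 2 of the cnt equal values.
            total += cnt * (cnt - 1) // 2
    return total


print(count_index_pairs([4, 2, 3, 4, 9, 1, 5, 4, 56, 90], 8))  # prints 4
```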
If you want this to work for non-distinct values (which your question does not say, but your comment implies), binary search only the portion of the array after i. This also eliminates the need for the if (j > i) test.
Would show the code, but I don't know how to specify such a slice in whatever language you're using.
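In Python, searching only the part of the sorted list after position i can be sketched with the standard bisect module, whose lo argument restricts where the search starts. The function name pairs_with_sum is an assumption for illustration; it prints nothing and returns 1-based index pairs like the original program. Note that it reports at most one partner per i, so inputs with many repeated partners would need a small extension.

```python
from bisect import bisect_left


def pairs_with_sum(values, target):
    """Return 1-based index pairs (i, j), i < j, in the sorted list."""
    a = sorted(values)
    pairs = []
    for i in range(len(a)):
        # Search only a[i+1:], so any hit automatically has j > i.
        j = bisect_left(a, target - a[i], i + 1)
        if j < len(a) and a[j] == target - a[i]:
            pairs.append((i + 1, j + 1))
    return pairs


print(pairs_with_sum([1, 2, 3, 4, 4, 9, 56, 90], 8))  # prints [(4, 5)]
```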

Sort array by pairwise difference

For example we have the array X[n] = {X0, X1, X2, ... Xn}
The goal is to sort this array so that the differences between adjacent elements are in ascending order.
For example X[] = {10, 2, 7, 4}
Answers are:
2 7 10 4
4 10 7 2
I have some code but it's brute force :)
#include <stdio.h>

int main(int argc, char **argv)
{
    int array[] = { 10, 2, 7, 4 };
    int a[4];
    for(int i = 0; i < 4; i++){
        a[0] = array[i];
        for(int j = 0; j < 4; j++){
            a[1] = array[j];
            if(a[0] == a[1])
                continue;
            for(int k = 0; k < 4; k++){
                a[2] = array[k];
                if(a[0] == a[2] || a[1] == a[2])
                    continue;
                for(int l = 0; l < 4; l++){
                    a[3] = array[l];
                    if(a[0] == a[3] || a[1] == a[3] || a[2] == a[3])
                        continue;
                    if(a[0] - a[1] < a[1] - a[2] && a[1] - a[2] < a[2] - a[3])
                        printf("%d %d %d %d\n", a[0], a[1], a[2], a[3]);
                }
            }
        }
    }
    return 0;
}
Any idea for "pretty" algorithm ? :)
DISCLAIMER: This solution arranges the items so that the differences grow by absolute value. Thanks to @Will Ness.
One solution, according to the "difference between every pair is in ascending order" requirement:
Just sort the array in ascending order, O(n log n), and then start in the middle. Then arrange the elements like this:
[n/2, n/2+1, n/2-1, n/2+2, n/2-2, n/2+3 ...] you go to +1 first if more elements are on the right side of the (n/2)th element
[n/2, n/2-1, n/2+1, n/2-2, n/2+2, n/2-3 ...] you go to -1 first otherwise.
This gives you ascending pairwise differences.
NOTE: It is not guaranteed that this algorithm will find the smallest difference and start with it, but I do not see that in the requirements.
Example
Sorted array: {1, 2, 10, 15, 40, 50, 60, 61, 100, 101}
Then you pick 40 (the 10/2 = 5th element), 50 (the 10/2 + 1 = 6th), 15 and so on...
You'll get: {40, 50, 15, 60, 10, 61, 2, 100, 1, 101}
Which gets you the diffs: 10, 35, 45, 50, 51, 59, 98, 99, 100
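The middle-out arrangement can be sketched in Python as below. This is only an illustration of the idea in this answer, so it orders the absolute differences (non-strictly), not the signed differences the question asks for; the function name zigzag_from_middle and the choice to start at index (n - 1) // 2 and step right first are assumptions made so that the result matches the worked example.

```python
def zigzag_from_middle(values):
    """Arrange values so absolute adjacent differences never decrease."""
    a = sorted(values)
    n = len(a)
    lo = (n - 1) // 2          # start at the (lower) middle element
    hi = lo + 1
    out = [a[lo]]
    lo -= 1
    # Alternate right, left, right, ... moving outwards from the middle.
    while lo >= 0 or hi < n:
        if hi < n:
            out.append(a[hi])
            hi += 1
        if lo >= 0:
            out.append(a[lo])
            lo -= 1
    return out


arranged = zigzag_from_middle([1, 2, 10, 15, 40, 50, 60, 61, 100, 101])
print(arranged)  # prints [40, 50, 15, 60, 10, 61, 2, 100, 1, 101]
```

Each new element lies outside the interval spanned by everything chosen so far, which is why the gaps can only grow or stay equal.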
Let's see. Your example array is {10,2,7,4} and the answers you show are:
2 7 10 4
5 3 -6 differences, a[i+1] - a[i]
4 10 7 2
6 -3 -5
I show the flipped differences here, it's easier to analyze that way.
So, the goal is to have the differences a[i+1] - a[i] in descending order. Obviously some positive difference values will go first, then some negative. This means the maximal element of the array will appear somewhere in the middle. The positive differences to the left of it must be in descending order of absolute value, and the negatives to the right - in ascending order of absolute value.
Let's take another array as an example: {4,8,20,15,16,1,3}. We start by sorting it:
1 3 4 8 15 16 20
2 1 4 7 1 4 differences, a[i+1] - a[i]
Now, 20 goes in the middle, and after it to the right must go values progressively further apart. Since the differences to the left of 20 in the solution are positive, the values themselves are ascending, i.e. sorted. So whatever's left after we pick some of them to move to the right of the maximal element, stays as is, and the (positive) differences must be in descending order. If they are, the solution is found.
Here there are no solutions. The possibilities are:
... 20 16 8 (no more) left: 1 3 4 15 (diffs: 2 1 11 5)
... 20 16 4 (no more) left: 1 3 8 15 (diffs: 2 5 7 5)
... 20 16 3 (no more) left: 1 4 8 15 (diffs: 3 4 7 5)
... 20 16 1 (no more) left: 3 4 8 15 ....................
... 20 15 8 (no more) left: 1 3 4 16
... 20 15 4 (no more) left: 1 3 8 16
... 20 15 3 (no more) left: 1 4 8 16
... 20 15 1 (no more) left: 3 4 8 16
... 20 8 (no more) left: 1 3 4 15 16
... 20 4 (no more) left: 1 3 8 15 16
... 20 3 (no more) left: 1 4 8 15 16
... 20 1 (no more) left: 3 4 8 15 16
... 20 (no more) left: 1 3 4 8 15 16
Without 1 and 3, several solutions are possible.
Solution for this problem is not always possible. For example, array X[] = {0, 0, 0} cannot be "sorted" as required because both differences are always equal.
In case this problem has a solution, array values should be "sorted" as shown on the left diagram: some subset of the values in ascending order should form prefix of the resulting array, then all the remaining values in descending order should form its suffix. And "sorted" array should be convex.
This gives a hint for an algorithm: sort the array, then split its values into two convex subsets, then extract one of these subsets and append it (in reverse order) at the end.
A simple (partial) implementation would be: sort the array, find a subset of values that belong to convex hull, then check all the remaining values, and if they are convex, append them at the end. This algorithm works only if one of the subsets lies completely below the other one.
If the resulting subsets intersect (as shown on the right diagram), an improved version of this algorithm may be used: split sorted array into segments where one of the subsets lies completely below other one (A-B, B-C), then for each of these segments find convex hull and check convexity of the remaining subset. Note that X axis on the right diagram corresponds to the array indexes in a special way: for subset intersections (A, B, C) X corresponds to an index in ascending-sorted array; X coordinates for values between intersections are scaled according to their positions in the resulting array.
Sketch of an algorithm
1. Sort the array in ascending order.
2. Starting from the largest value, try adding convex hull values to the "top" subset (in a way similar to the Graham scan algorithm). Also put all the values not belonging to the convex hull into the "bottom" subset and check its convexity. Continue while all the values properly fit into either the "top" or the "bottom" subset. When the smallest value is processed, remove one of these subsets from the array, reverse the subset, and append it at the end of the array.
3. If, after adding some value to the "top" subset, the "bottom" subset is not convex anymore, roll back the last addition and check if this value can be properly added to the "bottom" subset. If not, stop, because the input array cannot be "sorted" as required. Otherwise, exchange the "top" and "bottom" subsets and continue with step 2 (already processed values should not be moved between subsets; any attempt to move them should result in going to step 3).
In other words, we could process each value of sorted array, from largest to smallest, trying to append this value to one of two subsets in such a way that both subsets stay convex. At first, we try to place a new value to the subset where previous value was added. This may make several values, added earlier, unfit to this subset - then we check if they all fit to other subset. If they do - move them to other subset, if not - leave them in "top" subset but move current value to other subset.
Time complexity
Each value is added or removed from "top" subset at most once, also it may be added to "bottom" subset at most once. And for each operation on an element we need to inspect only two its nearest predecessors. This means worst-case time complexity of steps 2 and 3 is O(N). So overall time complexity is determined by the sorting algorithm on step 1.

Google Interview: Arrangement of Blocks

You are given N blocks of height 1…N. In how many ways can you arrange these blocks in a row such that when viewed from left you see only L blocks (rest are hidden by taller blocks) and when seen from right you see only R blocks? Example given N=3, L=2, R=1 there is only one arrangement {2, 1, 3} while for N=3, L=2, R=2 there are two ways {1, 3, 2} and {2, 3, 1}.
How should we solve this problem by programming? Any efficient ways?
This is a counting problem, not a construction problem, so we can approach it using recursion. Since the problem has two natural parts, looking from the left and looking from the right, break it up and solve for just one part first.
Let b(N, L, R) be the number of solutions, and let f(N, L) be the number of arrangements of N blocks so that L are visible from the left. First think about f because it's easier.
APPROACH 1
Let's get the initial conditions and then go for recursion. If all are to be visible, then they must be ordered increasingly, so
f(N, N) = 1
If there are supposed to be more visible blocks than available blocks, then there is nothing we can do, so
f(N, M) = 0 if N < M
If only one block should be visible, then put the largest first and then the others can follow in any order, so
f(N,1) = (N-1)!
Finally, for the recursion, think about the position of the tallest block, say N is in the kth spot from the left. Then choose the blocks to come before it in (N-1 choose k-1) ways, arrange those blocks so that exactly L-1 are visible from the left, and order the N-k blocks behind N in any order you like, giving:
f(N, L) = sum_{1<=k<=N} (N-1 choose k-1) * f(k-1, L-1) * (N-k)!
In fact, since f(k-1,L-1) = 0 for k<L, we may as well start k at L instead of 1:
f(N, L) = sum_{L<=k<=N} (N-1 choose k-1) * f(k-1, L-1) * (N-k)!
Right, so now that the easier bit is understood, let's use f to solve for the harder bit b. Again, use recursion based on the position of the tallest block, again say N is in position k from the left. As before, choose the blocks before it in N-1 choose k-1 ways, but now think about each side of that block separately. For the k-1 blocks left of N, make sure that exactly L-1 of them are visible. For the N-k blocks right of N, make sure that R-1 are visible and then reverse the order you would get from f. Therefore the answer is:
b(N,L,R) = sum_{1<=k<=N} (N-1 choose k-1) * f(k-1, L-1) * f(N-k, R-1)
where f is completely worked out above. Again, many terms will be zero, so we only want to take k such that k-1 >= L-1 and N-k >= R-1 to get
b(N,L,R) = sum_{L <= k <= N-R+1} (N-1 choose k-1) * f(k-1, L-1) * f(N-k, R-1)
APPROACH 2
I thought about this problem again and found a somewhat nicer approach that avoids the summation.
If you work the problem the opposite way, that is think of adding the smallest block instead of the largest block, then the recurrence for f becomes much simpler. In this case, with the same initial conditions, the recurrence is
f(N,L) = f(N-1,L-1) + (N-1) * f(N-1,L)
where the first term, f(N-1,L-1), comes from placing the smallest block in the leftmost position, thereby adding one more visible block (hence L decreases to L-1), and the second term, (N-1) * f(N-1,L), accounts for putting the smallest block in any of the N-1 non-front positions, in which case it is not visible (hence L stays fixed).
This recursion has the advantage of always decreasing N, though it makes it more difficult to see some formulas, for example f(N,N-1) = (N choose 2). This formula is fairly easy to show from the previous formula, though I'm not certain how to derive it nicely from this simpler recurrence.
Now, to get back to the original problem and solve for b, we can also take a different approach. Instead of the summation before, think of the visible blocks as coming in packets, so that if a block is visible from the left, then its packet consists of all blocks right of it and in front of the next block visible from the left, and similarly if a block is visible from the right then its packet contains all blocks left of it until the next block visible from the right. Do this for all but the tallest block. This makes for L+R-2 packets (L-1 to the left of the tallest block and R-1 to its right). Given the packets, you can move one from the left side to the right side simply by reversing the order of the blocks. Therefore the general case b(N,L,R) actually reduces to solving the case b(N,L+R-1,1) = f(N-1,L+R-2) and then choosing which of the packets to put on the left and which on the right. Therefore we have
b(N,L,R) = (L+R-2 choose L-1) * f(N-1,L+R-2)
As a check against the examples in the question, b(3,2,1) = (1 choose 1) * f(2,1) = 1 and b(3,2,2) = (2 choose 1) * f(2,2) = 2.
Again, this reformulation has some advantages over the previous version. Putting these latter two formulas together, it's much easier to see the complexity of the overall problem. However, I still prefer the first approach for constructing solutions, though perhaps others will disagree. All in all it just goes to show there's more than one good way to approach the problem.
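A minimal Python sketch of Approach 2 follows. It uses the recurrence f(N,L) = f(N-1,L-1) + (N-1) * f(N-1,L) and combines it with the packet argument as b(N,L,R) = (L+R-2 choose L-1) * f(N-1,L+R-2), which is the form that reproduces the two examples given in the question. The memoisation with functools.lru_cache and the guard clauses are my own additions; math.comb needs Python 3.8 or later.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def f(n, l):
    """Arrangements of n blocks with exactly l visible from the left."""
    if n == l:
        return 1              # all visible: increasing order (also f(0, 0))
    if l <= 0 or l > n:
        return 0
    # Smallest block in front (visible) or in one of n - 1 hidden spots.
    return f(n - 1, l - 1) + (n - 1) * f(n - 1, l)


def b(n, l, r):
    """Arrangements of n blocks with l visible from left, r from right."""
    if n < 1 or l < 1 or r < 1:
        return 0
    # Split the l + r - 2 packets around the tallest block.
    return comb(l + r - 2, l - 1) * f(n - 1, l + r - 2)


print(b(3, 2, 1), b(3, 2, 2))  # prints 1 2
```

With memoisation, f fills at most an N by N table, so computing b this way takes O(N^2) arithmetic operations.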
What's with the Stirling numbers?
As Jason points out, the f(N,L) numbers are precisely the (unsigned) Stirling numbers of the first kind. One can see this immediately from the recursive formulas for each. However, it's always nice to be able to see it directly, so here goes.
The (unsigned) Stirling numbers of the First Kind, denoted S(N,L) count the number of permutations of N into L cycles. Given a permutation written in cycle notation, we write the permutation in canonical form by beginning the cycle with the largest number in that cycle and then ordering the cycles increasingly by the first number of the cycle. For example, the permutation
(2 6) (5 1 4) (3 7)
would be written in canonical form as
(5 1 4) (6 2) (7 3)
Now drop the parentheses and notice that if these are the heights of the blocks, then the number of visible blocks from the left is exactly the number of cycles! This is because the first number of each cycle blocks all other numbers in the cycle, and the first number of each successive cycle is visible behind the previous cycle. Hence this problem is really just a sneaky way to ask you to find a formula for Stirling numbers.
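The correspondence can be checked on the example above with a short Python sketch. Both function names are hypothetical, and cycles are represented as tuples purely for illustration.

```python
def canonical_blocks(cycles):
    """Write cycles in canonical form and drop the parentheses."""
    rotated = []
    for cycle in cycles:
        # Rotate each cycle so that its largest element comes first.
        i = cycle.index(max(cycle))
        rotated.append(cycle[i:] + cycle[:i])
    # Order the cycles increasingly by their first (largest) element.
    rotated.sort(key=lambda c: c[0])
    return [x for c in rotated for x in c]


def visible_from_left(heights):
    """Count blocks taller than everything in front of them."""
    count, tallest = 0, 0
    for h in heights:
        if h > tallest:
            count += 1
            tallest = h
    return count


blocks = canonical_blocks([(2, 6), (5, 1, 4), (3, 7)])
print(blocks, visible_from_left(blocks))  # prints [5, 1, 4, 6, 2, 7, 3] 3
```

The three visible blocks 5, 6 and 7 are exactly the leaders of the three cycles.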
well, just as an empirical solution for small N:
blocks.py:
import itertools
from collections import defaultdict

def countPermutation(p):
    # Count the blocks visible from the front of sequence p.
    n = 0
    tallest = 0
    for block in p:
        if block > tallest:
            n += 1
            tallest = block
    return n

def countBlocks(n):
    # Tally every permutation of 1..n by its (left, right) visible counts.
    count = defaultdict(int)
    for p in itertools.permutations(range(1,n+1)):
        fwd = countPermutation(p)
        rev = countPermutation(reversed(p))
        count[(fwd,rev)] += 1
    return count

def printCount(count, n, places):
    for i in range(1,n+1):
        for j in range(1,n+1):
            c = count[(i,j)]
            if c > 0:
                print("%*d" % (places, c), end=" ")
            else:
                print(" " * places, end=" ")
        print()

def countAndPrint(nmax, places):
    for n in range(1,nmax+1):
        printCount(countBlocks(n), n, places)
        print()
and sample output:
blocks.countAndPrint(10)
1
1
1
1 1
1 2
1
2 3 1
2 6 3
3 3
1
6 11 6 1
6 22 18 4
11 18 6
6 4
1
24 50 35 10 1
24 100 105 40 5
50 105 60 10
35 40 10
10 5
1
120 274 225 85 15 1
120 548 675 340 75 6
274 675 510 150 15
225 340 150 20
85 75 15
15 6
1
720 1764 1624 735 175 21 1
720 3528 4872 2940 875 126 7
1764 4872 4410 1750 315 21
1624 2940 1750 420 35
735 875 315 35
175 126 21
21 7
1
5040 13068 13132 6769 1960 322 28 1
5040 26136 39396 27076 9800 1932 196 8
13068 39396 40614 19600 4830 588 28
13132 27076 19600 6440 980 56
6769 9800 4830 980 70
1960 1932 588 56
322 196 28
28 8
1
40320 109584 118124 67284 22449 4536 546 36 1
40320 219168 354372 269136 112245 27216 3822 288 9
109584 354372 403704 224490 68040 11466 1008 36
118124 269136 224490 90720 19110 2016 84
67284 112245 68040 19110 2520 126
22449 27216 11466 2016 126
4536 3822 1008 84
546 288 36
36 9
1
You'll note a few obvious (well, mostly obvious) things from the problem statement:
the total # of permutations is always N!
with the exception of N=1, there is no solution for L,R = (1,1): if a count in one direction is 1, then it implies the tallest block is on that end of the stack, so the count in the other direction has to be >= 2
the situation is symmetric (reverse each permutation and you reverse the L,R count)
if p is a permutation of N-1 blocks and has count (Lp,Rp), then the N permutations of block N inserted in each possible spot can have a count ranging from L = 1 to Lp+1, and R = 1 to Rp + 1.
From the empirical output:
the leftmost column or topmost row (where L = 1 or R = 1) with N blocks is the sum of the rows/columns with N-1 blocks: i.e. in @PengOne's notation,
b(N,1,R) = sum(b(N-1,k,R-1)) for k = 1 to N-R+1
Each diagonal is a row of Pascal's triangle, times a constant factor K for that diagonal -- I can't prove this, but I'm sure someone can -- i.e.:
b(N,L,R) = K * (L+R-2 choose L-1) where K = b(N,1,L+R-1)
So the computational complexity of computing b(N,L,R) is the same as the computational complexity of computing b(N,1,L+R-1) which is the first column (or row) in each triangle.
This observation is probably 95% of the way towards an explicit solution (the other 5% I'm sure involves standard combinatoric identities, I'm not too familiar with those).
A quick check with the Online Encyclopedia of Integer Sequences shows that b(N,1,R) appears to be OEIS sequence A094638:
A094638 Triangle read by rows: T(n,k) =|s(n,n+1-k)|, where s(n,k) are the signed Stirling numbers of the first kind (1<=k<=n; in other words, the unsigned Stirling numbers of the first kind in reverse order).
1, 1, 1, 1, 3, 2, 1, 6, 11, 6, 1, 10, 35, 50, 24, 1, 15, 85, 225, 274, 120, 1, 21, 175, 735, 1624, 1764, 720, 1, 28, 322, 1960, 6769, 13132, 13068, 5040, 1, 36, 546, 4536, 22449, 67284, 118124, 109584, 40320, 1, 45, 870, 9450, 63273, 269325, 723680, 1172700
As far as how to efficiently compute the Stirling numbers of the first kind, I'm not sure; Wikipedia gives an explicit formula but it looks like a nasty sum. This question (computing Stirling #s of the first kind) shows up on MathOverflow and it looks like O(n^2), as PengOne hypothesizes.
Based on @PengOne's answer, here is my Javascript implementation:
function g(N, L, R) {
    var acc = 0;
    for (var k=1; k<=N; k++) {
        acc += comb(N-1, k-1) * f(k-1, L-1) * f(N-k, R-1);
    }
    return acc;
}

function f(N, L) {
    if (N==L) return 1;
    else if (N<L) return 0;
    else {
        var acc = 0;
        for (var k=1; k<=N; k++) {
            acc += comb(N-1, k-1) * f(k-1, L-1) * fact(N-k);
        }
        return acc;
    }
}

function comb(n, k) {
    return fact(n) / (fact(k) * fact(n-k));
}

function fact(n) {
    var acc = 1;
    for (var i=2; i<=n; i++) {
        acc *= i;
    }
    return acc;
}

$("#go").click(function () {
    alert(g($("#N").val(), $("#L").val(), $("#R").val()));
});
Here is my construction solution inspired by @PengOne's ideas.
import itertools

def f(blocks, m):
    # All orderings of blocks with exactly m of them visible from the left.
    n = len(blocks)
    if m > n:
        return []
    if m < 0:
        return []
    if n == m:
        return [sorted(blocks)]
    maximum = max(blocks)
    blocks = list(set(blocks) - set([maximum]))
    results = []
    for k in range(0, n):
        for left_set in itertools.combinations(blocks, k):
            for left in f(left_set, m - 1):
                rights = itertools.permutations(list(set(blocks) - set(left)))
                for right in rights:
                    results.append(list(left) + [maximum] + list(right))
    return results

def b(n, l, r):
    blocks = range(1, n + 1)
    results = []
    maximum = max(blocks)
    blocks = list(set(blocks) - set([maximum]))
    for k in range(0, n):
        for left_set in itertools.combinations(blocks, k):
            for left in f(left_set, l - 1):
                other = list(set(blocks) - set(left))
                rights = f(other, r - 1)
                for right in rights:
                    results.append(list(left) + [maximum] + list(right))
    return results

# Sample
print(b(4, 3, 2)) # -> [[1, 2, 4, 3], [1, 3, 4, 2], [2, 3, 4, 1]]
We derive a general solution F(N, L, R) by examining a specific testcase: F(10, 4, 3).
We first consider 10 in the leftmost possible position, the 4th ( _ _ _ 10 _ _ _ _ _ _ ).
Then we find the product of the number of valid sequences in the left and in the right of 10.
Next, we'll consider 10 in the 5th slot, calculate another product and add it to the previous one.
This process will go on until 10 is in the last possible slot, the 8th.
We'll use the variable named pos to keep track of N's position.
Now suppose pos = 6 ( _ _ _ _ _ 10 _ _ _ _ ). In the left of 10, there are 9C5 = (N-1)C(pos-1) sets of numbers to be arranged.
Since only the order of these numbers matters, we could look at 1, 2, 3, 4, 5.
To construct a sequence with these numbers so that 3 = L-1 of them are visible from the left, we can begin by placing 5 in the leftmost possible slot ( _ _ 5 _ _ ) and follow similar steps to what we did before.
So if F were defined recursively, it could be used here.
The only difference now is that the order of numbers in the right of 5 is immaterial.
To resolve this issue, we'll use a signal, INF (infinity), for R to indicate its unimportance.
Turning to the right of 10, there will be 4 = N-pos numbers left.
We first consider 4 in the last possible slot, position 2 = R-1 from the right ( _ _ 4 _ ).
Here what appears in the left of 4 is immaterial.
But counting arrangements of 4 blocks with the mere condition that 2 of them should be visible from the right is no different than counting arrangements of the same blocks with the mere condition that 2 of them should be visible from the left.
ie. instead of counting sequences like 3 1 4 2, one can count sequences like 2 4 1 3
So the number of valid arrangements in the right of 10 is F(4, 2, INF).
Thus the number of arrangements when pos == 6 is 9C5 * F(5, 3, INF) * F(4, 2, INF) = (N-1)C(pos-1) * F(pos-1, L-1, INF)* F(N-pos, R-1, INF).
Similarly, in F(5, 3, INF), 5 will be considered in a succession of slots with L = 2 and so on.
Since the function calls itself with L or R reduced, it must return a value when L = 1, that is F(N, 1, INF) must be a base case.
Now consider the arrangement _ _ _ _ _ 6 7 10 _ _.
The only slot 5 can take is the first, and the following 4 slots may be filled in any manner; thus F(5, 1, INF) = 4!.
Then clearly F(N, 1, INF) = (N-1)!.
Other (trivial) base cases and details could be seen in the C implementation below.
#include <limits.h>

#define INF UINT_MAX

long long unsigned fact(unsigned n) { return n ? n * fact(n-1) : 1; }

unsigned C(unsigned n, unsigned k) { return fact(n) / (fact(k) * fact(n-k)); }

unsigned F(unsigned N, unsigned L, unsigned R)
{
    unsigned pos, sum = 0;
    if(R != INF)
    {
        if(L == 0 || R == 0 || N < L || N < R) return 0;
        if(L == 1) return F(N-1, R-1, INF);
        if(R == 1) return F(N-1, L-1, INF);
        for(pos = L; pos <= N-R+1; ++pos)
            sum += C(N-1, pos-1) * F(pos-1, L-1, INF) * F(N-pos, R-1, INF);
    }
    else
    {
        /* No block may be visible: only the empty arrangement qualifies. */
        if(L == 0) return N == 0;
        if(L == 1) return fact(N-1);
        for(pos = L; pos <= N; ++pos)
            sum += C(N-1, pos-1) * F(pos-1, L-1, INF) * fact(N-pos);
    }
    return sum;
}

Link list algorithm to find pairs adding up to 10

Can you suggest an algorithm that finds all pairs of nodes in a linked list that add up to 10?
I came up with the following.
Algorithm: Compare each node, starting with the second node, with each node from the head node up to the previous node (the one before the current node being compared) and report all such pairs.
I think this algorithm should work; however, it's certainly not the most efficient one, having a complexity of O(n^2).
Can anyone hint at a solution which is more efficient (perhaps one that takes linear time)? Additional or temporary nodes can be used by such a solution.
If their range is limited (say between -100 and 100), it's easy.
Create an array quant[-100..100] then just cycle through your linked list, executing:
quant[value] = quant[value] + 1
Then the following loop will do the trick.
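for i = -90 to 4:
    j = 10 - i
    for k = 1 to quant[i] * quant[j]:
        output i, " ", j
for k = 1 to quant[5] * (quant[5] - 1) / 2:
    output 5, " ", 5
(The first loop stops at 4 so that j stays within -100..100 and each pair is reported once rather than twice; 5 pairs with itself, so it is handled separately.)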
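The counting idea can be sketched in Python as follows. This is only an illustration under the stated assumption that every value lies in a known range (here -100 to 100); the function name pairs_bounded and its parameters are made up, and it reports each unordered pair once.

```python
def pairs_bounded(values, target=10, lo=-100, hi=100):
    """Return all pairs summing to target, assuming lo <= value <= hi."""
    # quant[v - lo] holds how many times v occurs (offset so indexes start at 0).
    quant = [0] * (hi - lo + 1)
    for v in values:
        quant[v - lo] += 1

    out = []
    for i in range(lo, hi + 1):
        j = target - i
        if j < i or j > hi:
            continue                      # partner out of range, or pair already seen
        if i == j:
            n = quant[i - lo] * (quant[i - lo] - 1) // 2   # choose 2 of the copies
        else:
            n = quant[i - lo] * quant[j - lo]
        out.extend([(i, j)] * n)
    return out
```

With the list 2 3 1 3 5 7 10 -1 11 this yields (-1, 11) once and (3, 7) twice. The cost is one pass over the list plus one pass over the range, so it is linear as long as the range is a fixed size.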
Even if their range isn't limited, you can have a more efficient method than what you proposed, by sorting the values first and then just keeping counts rather than individual values (same as the above solution).
This is achieved by running two pointers, one at the start of the list and one at the end. When the numbers at those pointers add up to 10, output them and move the end pointer down and the start pointer up.
When they're greater than 10, move the end pointer down. When they're less, move the start pointer up.
This relies on the sorted nature. Less than 10 means you need to make the sum higher (move start pointer up). Greater than 10 means you need to make the sum less (end pointer down). Since there are no duplicates in the list (because of the counts), being equal to 10 means you move both pointers.
Stop when the pointers pass each other.
There's one more tricky bit and that's when the pointers are equal and the value sums to 10 (this can only happen when the value is 5, obviously).
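Here you don't output pairs based on the plain product of the counts; instead, 1 is subtracted from the count first. That's because a value 5 with a count of 1 doesn't actually sum to 10 (since there's only one 5). Strictly, c copies of 5 form c(c-1)/2 distinct pairs; (c-1)*(c-1) gives the same answer only when c is 1 or 2.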
So, for the list:
2 3 1 3 5 7 10 -1 11
you get:
Index    a    b    c    d    e    f    g    h
Value   -1    1    2    3    5    7   10   11
Count    1    1    1    2    1    1    1    1
You start pointer p1 at a and p2 at h. Since -1 + 11 = 10, you output those two numbers (as above, you do it N times where N is the product of the counts). That's one copy of (-1,11). Then you move p1 to b and p2 to g.
1 + 10 > 10 so leave p1 at b, move p2 down to f.
1 + 7 < 10 so move p1 to c, leave p2 at f.
2 + 7 < 10 so move p1 to d, leave p2 at f.
3 + 7 = 10, output two copies of (3,7) since the count of d is 2, move p1 to e, p2 to e.
5 + 5 = 10 but p1 = p2 so the product is 0 times 0 or 0. Output nothing, move p1 to f, p2 to d.
Loop ends since p1 > p2.
Hence the overall output was:
(-1,11)
( 3, 7)
( 3, 7)
which is correct.
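The walk-through above can be sketched in Python like this. It is a minimal illustration, not a definitive implementation: the name pairs_two_pointer is made up, collections.Counter stands in for the value/count table, and at the midpoint it uses c(c-1)/2 (the number of ways to pick two of c equal values), which agrees with the "count minus 1" product only for counts of 1 or 2.

```python
from collections import Counter


def pairs_two_pointer(values, target=10):
    """Two pointers over the sorted distinct values, using counts for repeats."""
    counts = Counter(values)      # value -> number of occurrences
    keys = sorted(counts)         # distinct values in ascending order
    lo, hi = 0, len(keys) - 1
    out = []
    while lo <= hi:
        a, b = keys[lo], keys[hi]
        total = a + b
        if total == target:
            if lo == hi:
                # Same value on both sides: pick 2 of its copies.
                n = counts[a] * (counts[a] - 1) // 2
            else:
                n = counts[a] * counts[b]
            out.extend([(a, b)] * n)
            lo += 1
            hi -= 1
        elif total < target:
            lo += 1               # need a larger sum
        else:
            hi -= 1               # need a smaller sum
    return out
```

Calling pairs_two_pointer([2, 3, 1, 3, 5, 7, 10, -1, 11]) returns [(-1, 11), (3, 7), (3, 7)], matching the output above. The sort makes this O(n log n) overall; the pointer walk itself is linear.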
Here's some test code. You'll notice that I've forced the count for 7 (the midpoint, since SUM is 14 here) to a specific value for testing. Obviously, you wouldn't do this in real code.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SZSRC 30
#define SZSORTED 20
#define SUM 14

int main (void) {
    int i, s, e, prod;
    int srcData[SZSRC];
    int sortedVal[SZSORTED];
    int sortedCnt[SZSORTED];

    // Make some random data.
    srand (time (0));
    for (i = 0; i < SZSRC; i++) {
        srcData[i] = rand() % SZSORTED;
        printf ("srcData[%2d] = %5d\n", i, srcData[i]);
    }

    // Convert to value/size array.
    for (i = 0; i < SZSORTED; i++) {
        sortedVal[i] = i;
        sortedCnt[i] = 0;
    }
    for (i = 0; i < SZSRC; i++)
        sortedCnt[srcData[i]]++;

    // Force 7+7 to specific count for testing.
    sortedCnt[7] = 2;
    for (i = 0; i < SZSORTED; i++)
        if (sortedCnt[i] != 0)
            printf ("Sorted [%3d], count = %3d\n", i, sortedCnt[i]);

    // Start and end pointers.
    s = 0;
    e = SZSORTED - 1;

    // Loop until they overlap.
    while (s <= e) {
        // Equal to desired value?
        if (sortedVal[s] + sortedVal[e] == SUM) {
            // Get product (note special case at midpoint: c copies of the
            // same value form c*(c-1)/2 pairs, which equals (c-1)*(c-1)
            // for c <= 2).
            prod = (s == e)
                ? sortedCnt[s] * (sortedCnt[s] - 1) / 2
                : sortedCnt[s] * sortedCnt[e];

            // Output the right count.
            for (i = 0; i < prod; i++)
                printf ("(%3d,%3d)\n", sortedVal[s], sortedVal[e]);

            // Move both pointers and continue.
            s++;
            e--;
            continue;
        }

        // Less than desired, move start pointer.
        if (sortedVal[s] + sortedVal[e] < SUM) {
            s++;
            continue;
        }

        // Greater than desired, move end pointer.
        e--;
    }
    return 0;
}
You'll see that the code above is all O(n) since I'm not sorting in this version, just intelligently using the values as indexes.
If the minimum is below zero (or very high to the point where it would waste too much memory), you can just use a minVal to adjust the indexes (another O(n) scan to find the minimum value and then just use i-minVal instead of i for array indexes).
And, even if the range from low to high is too expensive on memory, you can use a sparse array. You'll have to sort it, O(n log n), and search it for updating counts, also O(n log n), but that's still better than the original O(n^2). The binary searches total O(n log n) because a single search is O(log n) but you have to do one for each value.
And here's the output from a test run, which shows you the various stages of calculation.
srcData[ 0] = 13
srcData[ 1] = 16
srcData[ 2] = 9
srcData[ 3] = 14
srcData[ 4] = 0
srcData[ 5] = 8
srcData[ 6] = 9
srcData[ 7] = 8
srcData[ 8] = 5
srcData[ 9] = 9
srcData[10] = 12
srcData[11] = 18
srcData[12] = 3
srcData[13] = 14
srcData[14] = 7
srcData[15] = 16
srcData[16] = 12
srcData[17] = 8
srcData[18] = 17
srcData[19] = 11
srcData[20] = 13
srcData[21] = 3
srcData[22] = 16
srcData[23] = 9
srcData[24] = 10
srcData[25] = 3
srcData[26] = 16
srcData[27] = 9
srcData[28] = 13
srcData[29] = 5
Sorted [ 0], count = 1
Sorted [ 3], count = 3
Sorted [ 5], count = 2
Sorted [ 7], count = 2
Sorted [ 8], count = 3
Sorted [ 9], count = 5
Sorted [ 10], count = 1
Sorted [ 11], count = 1
Sorted [ 12], count = 2
Sorted [ 13], count = 3
Sorted [ 14], count = 2
Sorted [ 16], count = 4
Sorted [ 17], count = 1
Sorted [ 18], count = 1
( 0, 14)
( 0, 14)
( 3, 11)
( 3, 11)
( 3, 11)
( 5, 9)
( 5, 9)
( 5, 9)
( 5, 9)
( 5, 9)
( 5, 9)
( 5, 9)
( 5, 9)
( 5, 9)
( 5, 9)
( 7, 7)
Create a hash set (HashSet in Java) (could use a sparse array if your numbers are well-bounded, i.e. you know they fall into +/- 100)
For each node, first check if 10-n is in the set. If so, you have found a pair. Either way, then add n to the set and continue.
So for example you have
1 - 6 - 3 - 4 - 9
1 - is 9 in the set? Nope
6 - 4? No.
3 - 7? No.
4 - 6? Yup! Print (6,4)
9 - 1? Yup! Print (9,1)
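A minimal Python sketch of this single-pass idea, using a plain list in place of the linked list (the name pairs_hash is made up for illustration):

```python
def pairs_hash(values, target=10):
    """For each value, check whether its partner was seen earlier."""
    seen = set()
    out = []
    for n in values:
        partner = target - n
        if partner in seen:
            out.append((partner, n))   # (earlier value, current value)
        seen.add(n)
    return out
```

For 1 - 6 - 3 - 4 - 9 this returns [(6, 4), (1, 9)]. Each lookup and insert is O(1) on average, so the whole pass is O(n). Note that a set only remembers whether a value was seen, so with repeated values it reports one match per later node rather than every possible pair; keeping a count per value instead would be needed for that.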
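This is a mini version of the subset sum problem. The general subset sum problem is NP-complete, but this fixed two-element case can be solved in polynomial time.
If you were to first sort the set, it would reduce the number of pairs that need to be evaluated.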

Resources