Permutation algorithm in the form of - algorithm

R= repeats allowed -> 2
A= alphabet (1-10)
S= space = 4;
So we want example:
[1][1][4][5]
[1][7][4][5]
[5][1][4][5]
But need a fancy math formula to calculate this and all combinations ?

As I understand it, your alphabet is 1 .. 10, with each 'letter' possibly occurring twice. So what you really have is an alphabet that is ...
1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10
It has a length of 20, not 10.
The problem now becomes 20 permute 4.
Hope this helps.
EDIT:
As per your additional comments to your question, you can then check each generated permutation to see if it is of the form XXYY as that would be invalid according to what you have written.

A correct general answer requires a summation. I will show you how to do it for these particular values, and let you generalize it.
There are two cases:
Permutations containing no duplicates. This is just 10 P 4
Permutations containing exactly one duplicate:
Choose which number is the duplicate: 10 C 1
Choose two places for it: 4 C 2
Choose the numbers which fit into the remaining two places: 9 P 2
Thus the answer to this particular case is 9360.

The total number is
N * [ (S choose 1) * ((N-1) permute (S-1))
+ (S choose 2) * ((N-1) permute (S-2))
+ ...
+ (S choose R) * ((N-1) permute (S-R)) ]
In otherwords, probably best to fix 1
of the repeated item in place (S
choose 1 different ways of doing
this) and permute the remaining N-1
items over the S-1 remaining spaces; (same as normal N permute S)
then fix 2 of your identical item in
place (S choose 2 different ways of
doing this) and permute the remaining
N-1 items over the remaining S-2 spaces.
etc for each possible number of repeated items, from 1 up to R
And then there are N choices for
your possible repeated item.
You can use this algorithm to enumerate the possibilities too.
Edit
Oh dear.
Thanks #blueraja, you are absolutely correct! the n-repeated-items case does not generalise to 1 item!
corrected formula is therefore
(N permute S)
+ N * [ (S choose 2) * ((N-1) permute (S-2))
+ (S choose 3) * ((N-1) permute (S-3))
+ ...
+ (S choose R) * ((N-1) permute (S-R)) ]

There are relatively few possible solutions (< 10000), so it should be ok to generate all words in A^4, then remove the words with more than 2 repeats.
OR
Generate the (ordered) combinations of N distinct words
Generate the permutations of that subset to get all the possibilities without duplicates
Do the same with N-1 words
For each element in these words, add a duplicate at all the positions but the position of the said character.

I assume that only one item can repeat, and this item is not predetermined.
Here is a formula that works on A(alphabet size), S(strings size) and R(maximum repetition count):
f(A,S,R) = (A perm S) + ASum[r=2 to R] ( (S choose r)(A-1 perm S-r) )
For example, for R=1 (simple permutation) we get f(A,S,R)=(A perm S) as expected. For A=S=R=2 we have f(A,S,R)=4 which corresponds to:
1,2
2,1
1,1
2,2
The case you describe in the question is A=10, R=2, S=4, and then we have:
f(A,S,R) = 9360
(Exactly as BlueRaja calculated)

This is a nice article about combinations and permutations, there you can find all the formulas

Related

How can I minimise number of additions?

Multiply two numbers without using * operator, and with minimum number of additions
For eg: If input is, 5*8, one of the following ways, can be add the bigger number smaller number of times, and that will be the answer. But how can I minimise the number of additions?
One strategy to minimize reduce the number of additions is to add things hierarchically. This is the same strategy that is used in the classic power algorithm, which follows the same technique for minimizing the number of multiplications.
Let's say you need
M = a * 8 = a + a + a + a + a + a + a + a
Once you calculate m2 = a + a, you can substitute it into the above addition and get
M = m2 + m2 + m2 + m2
Then you can calculate m4 = m2 + m2 and arrive at
M = m4 + m4
So, the result is calculated in 3 additions instead of the original 8. However, adding a value to itself can be replaced by a left-shift by 1 bit (if this is allowed), this greatly reducing the number of additions.
This technique can be elegantly implemented through analyzing the binary representation of one of the multiplicands (exactly as it is typically implemented in the power algorithm). E.g. if you need to calculate a * b you can do it in this fashion
int M = 0;
for (int m = a; b != 0; b >>= 1, m <<= 1)
if ((b & 1) != 0)
M += m;
The total number of additions such implementation will use is the total number of 1 bits in b. It will multiply 5 by 8 in 1 addition.
Note that in order to achieve the lowest the number of additions provided by this strategy, multiplying larger number by smaller number is not necessarily the best idea. E.g. multiplying by 8 uses less additions than multiplying by 5.
A better example will be 5 * 7. This is essentially the binary multiplication using old methods, but with clever choice of the multiplier.
If we can use left-shift and that doesn't count as an addition: choose the number with the smaller number of bits as the multiplier. This will be 5 in this case.
111
x 101
------
111
000x <== This is not an addition, only a left shift
111xx
-------
100011 <== 2 additions totally.
-------
If we cannot use left-shift: note that left shift is the same as doubling / additions. Then we will have to use a slightly different tactic. Since the multiplicand will be shifted the same number of times as the (position of MSB - 1), the number of additions will be the number with the lesser value of (position of MSB - 1) + (number of bits set). In the case of 5 * 8, the values are (3-1) + 2 = 4 and (4-1) = 3 respectively. The lesser is for 8 and hence use that as the multiplier.
101
x 1000
-------
000
000x <== left shift
000xx <== left shift
101xxx <== left shift
--------
101000 <== no addition needed, so 3 additions totally.
--------
The above has three shifts and zero additions.
I like Codor's suggestion of using shifts and having zero additions!
But if you can truly only use additions and no other operations like shifts, logs, subtractions, etc, I believe the minimal number of additions to compute a * b will be:
min{int[log2(a+1)] + numbits(a), int[log2(b+1)] + numbits(b)} - 2
where
numbits(n) is the number of ones in the binary representation of
integer n
For example, numbits(4)=1, numbits(5)=2, etc.
int[x] is the integer part of float x
For example, int[3.9]=3
Now, how did we get there? First look at your original example. You can at least group additions together. E.g.
8+8=16
16+16=32
32+8=40
To generalize this, if you need to multiply a b times by only using additions that used a or the results of additions already computed, you need:
int[log2(b+1)]-1 additions to compute all the 2^n.a intermediate numbers you need.
In your example, int[log2(5+1)]-1 = 2: you need 2 additions to compute 16 and 32
numbits(b)-1 additions to add all intermediate results together, where numbits(b) is the number of ones in the binary representation of b.
In your example, 5 = 2^2 + 2^0 so numbits(5)-1 = 1: you need 1 addition to do 32 + 8
Interestingly, this means that your statement
add the bigger number smaller number of times
is not always the recipe to minimize the number of additions.
For example, if you need to compute 2^9 * (2^9 - 1), you are better off computing additions based on (2^9-1) than on 2^9 even though 2^9 is larger. The fastest approach is:
x = (2^9-1) + (2^9-1)
And then
x = x+x
8 times for a total of 9 additions.
If instead you added 2^9 to itself, you would need 8 additions to get all the 2^k*2^9 first and then an additional 8 additions to add all these numbers together for a total of 16 additions.
suppose a is to be multiplied with b and we are storing the result in res, we add a to res only if b is odd, else keep dividing b by 2 and multiplying a by 2. this is done in a loop till b becomes 0. multiplication and division can be done using bitwise operator.
Let the two given numbers be 'a' and 'b'
1) Initialize result 'res' as 0.
2) Do following while 'b' is greater than 0
a) If 'b' is odd, add 'a' to 'res'
b) Double 'a' and halve 'b'
3) Return 'res'.

Determine whether a symbol is part of the ith combination nCr

UPDATE:
Combinatorics and unranking was eventually what I needed.
The links below helped alot:
http://msdn.microsoft.com/en-us/library/aa289166(v=vs.71).aspx
http://www.codeproject.com/Articles/21335/Combinations-in-C-Part-2
The Problem
Given a list of N symbols say {0,1,2,3,4...}
And NCr combinations of these
eg. NC3 will generate:
0 1 2
0 1 3
0 1 4
...
...
1 2 3
1 2 4
etc...
For the ith combination (i = [1 .. NCr]) I want to determine Whether a symbol (s) is part of it.
Func(N, r, i, s) = True/False or 0/1
eg. Continuing from above
The 1st combination contains 0 1 2 but not 3
F(N,3,1,"0") = TRUE
F(N,3,1,"1") = TRUE
F(N,3,1,"2") = TRUE
F(N,3,1,"3") = FALSE
Current approaches and tibits that might help or be related.
Relation to matrices
For r = 2 eg. 4C2 the combinations are the upper (or lower) half of a 2D matrix
1,2 1,3 1,4
----2,3 2,4
--------3,4
For r = 3 its the corner of a 3D matrix or cube
for r = 4 Its the "corner" of a 4D matrix and so on.
Another relation
Ideally the solution would be of a form something like the answer to this:
Calculate Combination based on position
The nth combination in the list of combinations of length r (with repitition allowed), the ith symbol can be calculated
Using integer division and remainder:
n/r^i % r = (0 for 0th symbol, 1 for 1st symbol....etc)
eg for the 6th comb of 3 symbols the 0th 1st and 2nd symbols are:
i = 0 => 6 / 3^0 % 3 = 0
i = 1 => 6 / 3^1 % 3 = 2
i = 2 => 6 / 3^2 % 3 = 0
The 6th comb would then be 0 2 0
I need something similar but with repition not allowed.
Thank you for following this question this far :]
Kevin.
I believe your problem is that of unranking combinations or subsets.
I will give you an implementation in Mathematica, from the package Combinatorica, but the Google link above is probably a better place to start, unless you are familiar with the semantics.
UnrankKSubset::usage = "UnrankKSubset[m, k, l] gives the mth k-subset of set l, listed in lexicographic order."
UnrankKSubset[m_Integer, 1, s_List] := {s[[m + 1]]}
UnrankKSubset[0, k_Integer, s_List] := Take[s, k]
UnrankKSubset[m_Integer, k_Integer, s_List] :=
Block[{i = 1, n = Length[s], x1, u, $RecursionLimit = Infinity},
u = Binomial[n, k];
While[Binomial[i, k] < u - m, i++];
x1 = n - (i - 1);
Prepend[UnrankKSubset[m - u + Binomial[i, k], k-1, Drop[s, x1]], s[[x1]]]
]
Usage is like:
UnrankKSubset[5, 3, {0, 1, 2, 3, 4}]
{0, 3, 4}
Yielding the 6th (indexing from 0) length-3 combination of set {0, 1, 2, 3, 4}.
There's a very efficient algorithm for this problem, which is also contained in the recently published:Knuth, The Art of Computer Programming, Volume 4A (section 7.2.1.3).
Since you don't care about the order in which the combinations are generated, let's use the lexicographic order of the combinations where each combination is listed in descending order. Thus for r=3, the first 11 combinations of 3 symbols would be: 210, 310, 320, 321, 410, 420, 421, 430, 431, 432, 510. The advantage of this ordering is that the enumeration is independent of n; indeed it is an enumeration over all combinations of 3 symbols from {0, 1, 2, …}.
There is a standard method to directly generate the ith combination given i, so to test whether a symbol s is part of the ith combination, you can simply generate it and check.
Method
How many combinations of r symbols start with a particular symbol s? Well, the remaining r-1 positions must come from the s symbols 0, 1, 2, …, s-1, so it's (s choose r-1), where (s choose r-1) or C(s,r-1) is the binomial coefficient denoting the number of ways of choosing r-1 objects from s objects. As this is true for all s, the first symbol of the ith combination is the smallest s such that
&Sum;k=0s(k choose r-1) ≥ i.
Once you know the first symbol, the problem reduces to finding the (i - &Sum;k=0s-1(k choose r-1))-th combination of r-1 symbols, where we've subtracted those combinations that start with a symbol less than s.
Code
Python code (you can write C(n,r) more efficiently, but this is fast enough for us):
#!/usr/bin/env python
tC = {}
def C(n,r):
if tC.has_key((n,r)): return tC[(n,r)]
if r>n-r: r=n-r
if r<0: return 0
if r==0: return 1
tC[(n,r)] = C(n-1,r) + C(n-1,r-1)
return tC[(n,r)]
def combination(r, k):
'''Finds the kth combination of r letters.'''
if r==0: return []
sum = 0
s = 0
while True:
if sum + C(s,r-1) < k:
sum += C(s,r-1)
s += 1
else:
return [s] + combination(r-1, k-sum)
def Func(N, r, i, s): return s in combination(r, i)
for i in range(1, 20): print combination(3, i)
print combination(500, 10000000000000000000000000000000000000000000000000000000000000000)
Note how fast this is: it finds the 10000000000000000000000000000000000000000000000000000000000000000th combination of 500 letters (it starts with 542) in less than 0.5 seconds.
I have written a class to handle common functions for working with the binomial coefficient, which is the type of problem that your problem falls under. It performs the following tasks:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters. This method makes solving this type of problem quite trivial.
Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes.
Uses Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to perform the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coeffieicent.
This class can easily be applied to your problem. If you have the rank (or index) to the binomial coefficient table, then simply call the class method that returns the K-indexes in an array. Then, loop through that returned array to see if any of the K-index values match the value you have. Pretty straight forward...

Getting the lowest possible sum from numbers' difference

I have to find the lowest possible sum from numbers' difference.
Let's say I have 4 numbers. 1515, 1520, 1500 and 1535. The lowest sum of difference is 30, because 1535 - 1520 = 15 && 1515 - 1500 = 15 and 15 + 15 = 30. If I would do like this: 1520 - 1515 = 5 && 1535 - 1500 = 35 it would be 40 in sum.
Hope you got it, if not, ask me.
Any ideas how to program this? I just found this online, tried to translate from my language to English. It sounds interesting. I can't do bruteforce, because it would take ages to compile. I don't need code, just ideas how to program or little fragment of code.
Thanks.
Edit:
I didn't post everything... One more edition:
I have let's say 8 possible numbers. But I have to take only 6 of them to make the smallest sum. For instance, numbers 1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731, the smallest sum will be 48, but here I have to take only 6 numbers from 8.
Taking the edit into account:
Start by sorting the list. Then use a dynamic programming solution, with state i, n representing the minimum sum of n differences when considering only the first i numbers in the sequence. Initial states: dp[*][0] = 0, everything else = infinity. Use two loops: outer loop looping through i from 1 to N, inner loop looping through n from 0 to R (3 in your example case in your edit - this uses 3 pairs of numbers which means 6 individual numbers). Your recurrence relation is dp[i][n] = min(dp[i-1][n], dp[i-2][n-1] + seq[i] - seq[i-1]).
You have to be aware of handling boundary cases which I've ignored, but the general idea should work and will run in O(N log N + NR) and use O(NR) space.
The solution by marcog is a correct, non-recursive, polynomial-time solution to the problem — it's a pretty standard DP problem — but, just for completeness, here's a proof that it works, and actual code for the problem. [#marcog: Feel free to copy any part of this answer into your own if you wish; I'll then delete this.]
Proof
Let the list be x1, …, xN. Assume wlog that the list is sorted. We're trying to find K (disjoint) pairs of elements from the list, such that the sum of their differences is minimised.
Claim: An optimal solution always consists of the differences of consecutive elements.
Proof: Suppose you fix the subset of elements whose differences are taken. Then by the proof given by Jonas Kölker, the optimal solution for just this subset consists of differences of consecutive elements from the list. Now suppose there is a solution corresponding to a subset that does not comprise pairs of consecutive elements, i.e. the solution involves a difference xj-xi where j>i+1. Then, we can replace xj with xi+1 to get a smaller difference, since
xi ≤ xi+1 ≤ xj ⇒ xi+1-xi ≤ xj-xi.
(Needless to say, if xi+1=xj, then taking xi+1 is indistinguishable from taking xj.) This proves the claim.
The rest is just routine dynamic programming stuff: the optimal solution using k pairs from the first n elements either doesn't use the nth element at all (in which case it's just the optimal solution using k pairs from the first n-1), or it uses the nth element in which case it's the difference xn-xn-1 plus the optimal solution using k-1 pairs from the first n-2.
The whole program runs in time O(N log N + NK), as marcog says. (Sorting + DP.)
Code
Here's a complete program. I was lazy with initializing arrays and wrote Python code using dicts; this is a small log(N) factor over using actual arrays.
'''
The minimum possible sum|x_i - x_j| using K pairs (2K numbers) from N numbers
'''
import sys
def ints(): return [int(s) for s in sys.stdin.readline().split()]
N, K = ints()
num = sorted(ints())
best = {} #best[(k,n)] = minimum sum using k pairs out of 0 to n
def b(k,n):
if best.has_key((k,n)): return best[(k,n)]
if k==0: return 0
return float('inf')
for n in range(1,N):
for k in range(1,K+1):
best[(k,n)] = min([b(k,n-1), #Not using num[n]
b(k-1,n-2) + num[n]-num[n-1]]) #Using num[n]
print best[(K,N-1)]
Test it:
Input
4 2
1515 1520 1500 1535
Output
30
Input
8 3
1731 1572 2041 1561 1682 1572 1609 1731
Output
48
I assume the general problem is this: given a list of 2n integers, output a list of n pairs, such that the sum of |x - y| over all pairs (x, y) is as small as possible.
In that case, the idea would be:
sort the numbers
emit (numbers[2k], numbers[2k+1]) for k = 0, ..., n - 1.
This works. Proof:
Suppose you have x_1 < x_2 < x_3 < x_4 (possibly with other values between them) and output (x_1, x_3) and (x_2, x_4). Then
|x_4 - x_2| + |x_3 - x_1| = |x_4 - x_3| + |x_3 - x_2| + |x_3 - x_2| + |x_2 - x_1| >= |x_4 - x_3| + |x_2 - x_1|.
In other words, it's always better to output (x_1, x_2) and (x_3, x_4) because you don't redundantly cover the space between x_2 and x_3 twice. By induction, the smallest number of the 2n must be paired with the second smallest number; by induction on the rest of the list, pairing up smallest neighbours is always optimal, so the algorithm sketch I proposed is correct.
Order the list, then do the difference calculation.
EDIT: hi #hey
You can solve the problem using dynamic programming.
Say you have a list L of N integers, you must form k pairs (with 2*k <= N)
Build a function that finds the smallest difference within a list (if the list is sorted, it will be faster ;) call it smallest(list l)
Build another one that finds the same for two pairs (can be tricky, but doable) and call it smallest2(list l)
Let's define best(int i, list l) the function that gives you the best result for i pairs within the list l
The algorithm goes as follows:
best(1, L) = smallest(L)
best(2, L) = smallest2(L)
for i from 1 to k:
loop
compute min (
stored_best(i-2) - smallest2( stored_remainder(i-2) ),
stored_best(i-1) - smallest( stored_remainder(i-1)
) and store as best(i)
store the remainder as well for the chosen solution
Now, the problem is once you have chosen a pair, the two ints that form the boundaries are reserved and can't be used to form a better solution. But by looking two levels back you can guaranty you have allowed switching candidates.
(The switching work is done by smallest2)
Step 1: Calculate pair differences
I think it is fairly obvious that the right approach is to sort the numbers and then take differences between each
adjacent pair of numbers. These differences are the "candidate" differences contributing to the
minimal difference sum. Using the numbers from your example would lead to:
Number Diff
====== ====
1561
11
1572
0
1572
37
1609
73
1682
49
1731
0
1731
310
2041
Save the differences into an array or table or some other data structure where you can maintain the
differences and the two numbers that contributed to each difference. Call this the DiffTable. It
should look something like:
Index Diff Number1 Number2
===== ==== ======= =======
1 11 1561 1572
2 0 1572 1572
3 37 1572 1609
4 73 1609 1682
5 49 1682 1731
6 0 1731 1731
7 310 1731 2041
Step 2: Choose minimal Differences
If all numbers had to be chosen, we could have stopped at step 1 by choosing the number pair for odd numbered
indices: 1, 3, 5, 7. This is the correct answer. However,
the problem states that a subset of pairs are chosen and this complicates the problem quite a bit.
In your example 3 differences (6 numbers = 3 pairs = 3 differences) need to be chosen such that:
The sum of the differences is minimal
The numbers participating in any chosen difference are removed from the list.
The second point means that if we chose Diff 11 (Index = 1 above), the numbers 1561 and 1572 are
removed from the list, and consequently, the next Diff of 0 at index 2 cannot be used because only 1 instance
of 1572 is left. Whenever a
Diff is chosen the adjacent Diff values are removed. This is why there is only one way to choose 4 pairs of
numbers from a list containing eight numbers.
About the only method I can think of to minimize the sum of the Diff above is to generate and test.
The following pseudo code outlines a process to generate
all 'legal' sets of index values for a DiffTable of arbitrary size
where an arbitrary number of number pairs are chosen. One (or more) of the
generated index sets will contain the indices into the DiffTable yielding a minimum Diff sum.
/* Global Variables */
M = 7 /* Number of candidate pair differences in DiffTable */
N = 3 /* Number of indices in each candidate pair set (3 pairs of numbers) */
AllSets = [] /* Set of candidate index sets (set of sets) */
call GenIdxSet(1, []) /* Call generator with seed values */
/* AllSets now contains candidate index sets to perform min sum tests on */
end
procedure: GenIdxSet(i, IdxSet)
/* Generate all the valid index values for current level */
/* and subsequent levels until a complete index set is generated */
do while i <= M
if CountMembers(IdxSet) = N - 1 then /* Set is complete */
AllSets = AppendToSet(AllSets, AppendToSet(IdxSet, i))
else /* Add another index */
call GenIdxSet(i + 2, AppendToSet(IdxSet, i))
i = i + 1
end
return
Function CountMembers returns the number of members in the given set, function AppendToSet returns a new set
where the arguments are appended into a single ordered set. For example
AppendToSet([a, b, c], d) returns the set: [a, b, c, d].
For the given parameters, M = 7 and N = 3, AllSets becomes:
[[1 3 5]
[1 3 6] <= Diffs = (11 + 37 + 0) = 48
[1 3 7]
[1 4 6]
[1 4 7]
[1 5 7]
[2 4 6]
[2 4 7]
[2 5 7]
[3 5 7]]
Calculate the sums using each set of indices, the one that is minimum identifies the
required number pairs in DiffTable. Above I show that the second set of indices gives
the minimum you are looking for.
This is a simple brute force technique and it does not scale very well. If you had a list of
50 number pairs and wanted to choose the 5 pairs, AllSets would contain 1,221,759 sets of
number pairs to test.
I know you said you did not need code but it is the best way for me to describe a set based solution. The solution runs under SQL Server 2008. Included in the code is the data for the two examples you give. The sql solution could be done with a single self joining table but I find it easier to explain when there are multiple tables.
--table 1 holds the values
declare #Table1 table (T1_Val int)
Insert #Table1
--this data is test 1
--Select (1515) Union ALL
--Select (1520) Union ALL
--Select (1500) Union ALL
--Select (1535)
--this data is test 2
Select (1731) Union ALL
Select (1572) Union ALL
Select (2041) Union ALL
Select (1561) Union ALL
Select (1682) Union ALL
Select (1572) Union ALL
Select (1609) Union ALL
Select (1731)
--Select * from #Table1
--table 2 holds the sorted numbered list
Declare #Table2 table (T2_id int identity(1,1), T1_Val int)
Insert #Table2 Select T1_Val from #Table1 order by T1_Val
--table 3 will hold the sorted pairs
Declare #Table3 table (T3_id int identity(1,1), T21_id int, T21_Val int, T22_id int, T22_val int)
Insert #Table3
Select T2_1.T2_id, T2_1.T1_Val,T2_2.T2_id, T2_2.T1_Val from #Table2 AS T2_1
LEFT Outer join #Table2 AS T2_2 on T2_1.T2_id = T2_2.T2_id +1
--select * from #Table3
--remove odd numbered rows
delete from #Table3 where T3_id % 2 > 0
--select * from #Table3
--show the diff values
--select *, ABS(T21_Val - T22_val) from #Table3
--show the diff values in order
--select *, ABS(T21_Val - T22_val) from #Table3 order by ABS(T21_Val - T22_val)
--display the two lowest
select TOP 2 CAST(T22_val as varchar(24)) + ' and ' + CAST(T21_val as varchar(24)) as 'The minimum difference pairs are'
, ABS(T21_Val - T22_val) as 'Difference'
from #Table3
ORDER by ABS(T21_Val - T22_val)
I think #marcog's approach can be simplified further.
Take the basic approach that #jonas-kolker proved for finding the smallest differences. Take the resulting list and sort it. Take the R smallest entries from this list and use them as your differences. Proving that this is the smallest sum is trivial.
#marcog's approach is effectively O(N^2) because R == N is a legit option. This approach should be (2*(N log N))+N aka O(N log N).
This requires a small data structure to hold a difference and the values it was derived from. But, that is constant per entry. Thus, space is O(N).
I would go with answer of marcog, you can sort using any of the sorting algoriothms. But there is little thing to analyze now.
If you have to choose R numbers out N numbers so that the sum of their differences is minimum then the numbers be chosen in a sequence without missing any numbers in between.
Hence after sorting the array you should run an outer loop from 0 to N-R and an inner loop from 0 to R-1 times to calculate the sum of differnces.
If needed, you should try with some examples.
I've taken an approach which uses a recursive algorithm, but it does take some of what other people have contributed.
First of all we sort the numbers:
[1561,1572,1572,1609,1682,1731,1731,2041]
Then we compute the differences, keeping track of which the indices of the numbers that contributed to each difference:
[(11,(0,1)),(0,(1,2)),(37,(2,3)),(73,(3,4)),(49,(4,5)),(0,(5,6)),(310,(6,7))]
So we got 11 by getting the difference between number at index 0 and number at index 1, 37 from the numbers at indices 2 & 3.
I then sorted this list, so it tells me which pairs give me the smallest difference:
[(0,(1,2)),(0,(5,6)),(11,(0,1)),(37,(2,3)),(49,(4,5)),(73,(3,4)),(310,(6,7))]
What we can see here is that, given that we want to select n numbers, a naive solution might be to select the first n / 2 items of this list. The trouble is, in this list the third item shares an index with the first, so we'd only actually get 5 numbers, not 6. In this case you need to select the fourth pair as well to get a set of 6 numbers.
From here, I came up with this algorithm. Throughout, there is a set of accepted indices which starts empty, and there's a number of numbers left to select n:
If n is 0, we're done.
if n is 1, and the first item will provide just 1 index which isn't in our set, we taken the first item, and we're done.
if n is 2 or more, and the first item will provide 2 indices which aren't in our set, we taken the first item, and we recurse (e.g. goto 1). This time looking for n - 2 numbers that make the smallest difference in the remainder of the list.
This is the basic routine, but life isn't that simple. There are cases we haven't covered yet, but make sure you get the idea before you move on.
Actually step 3 is wrong (found that just before I posted this :-/), as it may be unnecessary to include an early difference to cover indices which are covered by later, essential differences. The first example ([1515, 1520, 1500, 1535]) falls foul of this. Because of this I've thrown it away in the section below, and expanded step 4 to deal with it.
So, now we get to look at the special cases:
** as above **
** as above **
If n is 1, but the first item will provide two indices, we can't select it. We have to throw that item away and recurse. This time we're still looking for n indices, and there have been no changes to our accepted set.
If n is 2 or more, we have a choice. Either we can a) choose this item, and recurse looking for n - (1 or 2) indices, or b) skip this item, and recurse looking for n indices.
4 is where it gets tricky, and where this routine turns into a search rather than just a sorting exercise. How can we decide which branch (a or b) to take? Well, we're recursive, so let's call both, and see which one is better. How will we judge them?
We'll want to take whichever branch produces the lowest sum.
...but only if it will use up the right number of indices.
So step 4 becomes something like this (pseudocode):
x = numberOfIndicesProvidedBy(currentDifference)
branchA = findSmallestDifference (n-x, remainingDifferences) // recurse looking for **n-(1 or 2)**
branchB = findSmallestDifference (n , remainingDifferences) // recurse looking for **n**
sumA = currentDifference + sumOf(branchA)
sumB = sumOf(branchB)
validA = indicesAddedBy(branchA) == n
validB = indicesAddedBy(branchB) == n
if not validA && not validB then return an empty branch
if validA && not validB then return branchA
if validB && not validA then return branchB
// Here, both must be valid.
if sumA <= sumB then return branchA else return branchB
I coded this up in Haskell (because I'm trying to get good at it). I'm not sure about posting the whole thing, because it might be more confusing than useful, but here's the main part:
findSmallestDifference = findSmallestDifference' Set.empty
findSmallestDifference' _ _ [] = []
findSmallestDifference' taken n (d:ds)
| n == 0 = [] -- Case 1
| n == 1 && provides1 d = [d] -- Case 2
| n == 1 && provides2 d = findSmallestDifference' taken n ds -- Case 3
| provides0 d = findSmallestDifference' taken n ds -- Case 3a (See Edit)
| validA && not validB = branchA -- Case 4
| validB && not validA = branchB -- Case 4
| validA && validB && sumA <= sumB = branchA -- Case 4
| validA && validB && sumB <= sumA = branchB -- Case 4
| otherwise = [] -- Case 4
where branchA = d : findSmallestDifference' (newTaken d) (n - (provides taken d)) ds
branchB = findSmallestDifference' taken n ds
sumA = sumDifferences branchA
sumB = sumDifferences branchB
validA = n == (indicesTaken branchA)
validB = n == (indicesTaken branchA)
newTaken x = insertIndices x taken
Hopefully you can see all the cases there. That code(-ish), plus some wrapper produces this:
*Main> findLeastDiff 6 [1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731]
Smallest Difference found is 48
1572 - 1572 = 0
1731 - 1731 = 0
1572 - 1561 = 11
1609 - 1572 = 37
*Main> findLeastDiff 4 [1515, 1520, 1500,1535]
Smallest Difference found is 30
1515 - 1500 = 15
1535 - 1520 = 15
This has become long, but I've tried to be explicit. Hopefully it was worth while.
Edit : There is a case 3a that can be added to avoid some unnecessary work. If the current difference provides no additional indices, it can be skipped. This is taken care of in step 4 above, but there's no point in evaluating both halves of the tree for no gain. I've added this to the Haskell.
Something like
Sort List
Find Duplicates
Make the duplicates a pair
remove duplicates from list
break rest of list into pairs
calculate differences of each pair
take lowest amounts
In your example you have 8 number and need the best 3 pairs. First sort the list which gives you
1561, 1572, 1572, 1609, 1682, 1731, 1731, 2041
If you have duplicates make them a pair and remove them from the list so you have
[1572, 1572] = 0
[1731, 1731] = 0
L = { 1561, 1609, 1682, 2041 }
Break the remaining list into pairs, giving you the 4 following pairs
[1572, 1572] = 0
[1731, 1731] = 0
[1561, 1609] = 48
[1682, 2041] = 359
Then drop the amount of numbers you need to.
This gives you the following 3 pairs with the lowest pairs
[1572, 1572] = 0
[1731, 1731] = 0
[1561, 1609] = 48
So
0 + 0 + 48 = 48

Algorithm to count the number of valid blocks in a permutation [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Finding sorted sub-sequences in a permutation
Given an array A which holds a permutation of 1,2,...,n. A sub-block A[i..j]
of an array A is called a valid block if all the numbers appearing in A[i..j]
are consecutive numbers (may not be in order).
Given an array A= [ 7 3 4 1 2 6 5 8] the valid blocks are [3 4], [1,2], [6,5],
[3 4 1 2], [3 4 1 2 6 5], [7 3 4 1 2 6 5], [7 3 4 1 2 6 5 8]
So the count for above permutation is 7.
Give an O( n log n) algorithm to count the number of valid blocks.
Ok, I am down to 1 rep because I put 200 bounty on a related question: Finding sorted sub-sequences in a permutation
so I cannot leave comments for a while.
I have an idea:
1) Locate all permutation groups. They are: (78), (34), (12), (65). Unlike in group theory, their order and position, and whether they are adjacent matters. So, a group (78) can be represented as a structure (7, 8, false), while (34) would be (3,4,true). I am using Python's notation for tuples, but it is actually might be better to use a whole class for the group. Here true or false means contiguous or not. Two groups are "adjacent" if (max(gp1) == min(gp2) + 1 or max(gp2) == min(gp1) + 1) and contigous(gp1) and contiguos(gp2). This is not the only condition, for union(gp1, gp2) to be contiguous, because (14) and (23) combine into (14) nicely. This is a great question for algo class homework, but a terrible one for interview. I suspect this is homework.
Just some thoughts:
At first sight, this sounds impossible: a fully sorted array would have O(n2) valid sub-blocks.
So, you would need to count more than one valid sub-block at a time. Checking the validity of a sub-block is O(n). Checking whether a sub-block is fully sorted is O(n) as well. A fully sorted sub-block contains n·(n - 1)/2 valid sub-blocks, which you can count without further breaking this sub-block up.
Now, the entire array is obviously always valid. For a divide-and-conquer approach, you would need to break this up. There are two conceivable breaking points: the location of the highest element, and that of the lowest element. If you break the array into two at one of these points, including the extremum in the part that contains the second-to-extreme element, there cannot be a valid sub-block crossing this break-point.
By always choosing the extremum that produces a more even split, this should work quite well (average O(n log n)) for "random" arrays. However, I can see problems when your input is something like (1 5 2 6 3 7 4 8), which seems to produce O(n2) behaviour. (1 4 7 2 5 8 3 6 9) would be similar (I hope you see the pattern). I currently see no trick to catch this kind of worse case, but it seems that it requires other splitting techniques.
This question does involve a bit of a "math trick" but it's fairly straight forward once you get it. However, the rest of my solution won't fit the O(n log n) criteria.
The math portion:
For any two consecutive numbers their sum is 2k+1 where k is the smallest element. For three it is 3k+3, 4 : 4k+6 and for N such numbers it is Nk + sum(1,N-1). Hence, you need two steps which can be done simultaneously:
Create the sum of all the sub-arrays.
Determine the smallest element of a sub-array.
The dynamic programming portion
Build two tables using the results of the previous row's entries to build each successive row's entries. Unfortunately, I'm totally wrong as this would still necessitate n^2 sub-array checks. Ugh!
My proposition
STEP = 2 // amount of examed number
B [0,0,0,0,0,0,0,0]
B [1,1,0,0,0,0,0,0]
VALID(A,B) - if not valid move one
B [0,1,1,0,0,0,0,0]
VALID(A,B) - if valid move one and step
B [0,0,0,1,1,0,0,0]
VALID (A,B)
B [0,0,0,0,0,1,1,0]
STEP = 3
B [1,1,1,0,0,0,0,0] not ok
B [0,1,1,1,0,0,0,0] ok
B [0,0,0,0,1,1,1,0] not ok
STEP = 4
B [1,1,1,1,0,0,0,0] not ok
B [0,1,1,1,1,0,0,0] ok
.....
CON <- 0
STEP <- 2
i <- 0
j <- 0
WHILE(STEP <= LEN(A)) DO
j <- STEP
WHILE(STEP <= LEN(A) - j) DO
IF(VALID(A,i,j)) DO
CON <- CON + 1
i <- j + 1
j <- j + STEP
ELSE
i <- i + 1
j <- j + 1
END
END
STEP <- STEP + 1
END
The valid method check that all elements are consecutive
Never tested but, might be ok
The original array doesn't contain duplicates so must itself be a consecutive block. Lets call this block (1 ~ n). We can test to see whether block (2 ~ n) is consecutive by checking if the first element is 1 or n which is O(1). Likewise we can test block (1 ~ n-1) by checking whether the last element is 1 or n.
I can't quite mould this into a solution that works but maybe it will help someone along...
Like everybody else, I'm just throwing this out ... it works for the single example below, but YMMV!
The idea is to count the number of illegal sub-blocks, and subtract this from the total possible number. We count the illegal ones by examining each array element in turn and ruling out sub-blocks that include the element but not its predecessor or successor.
Foreach i in [1,N], compute B[A[i]] = i.
Let Count = the total number of sub-blocks with length>1, which is N-choose-2 (one for each possible combination of starting and ending index).
Foreach i, consider A[i]. Ignoring edge cases, let x=A[i]-1, and let y=A[i]+1. A[i] cannot participate in any sub-block that does not include x or y. Let iX=B[x] and iY=B[y]. There are several cases to be treated independently here. The general case is that iX<i<iY<i. In this case, we can eliminate the sub-block A[iX+1 .. iY-1] and all intervening blocks containing i. There are (i - iX + 1) * (iY - i + 1) such sub-blocks, so call this number Eliminated. (Other cases left as an exercise for the reader, as are those edge cases.) Set Count = Count - Eliminated.
Return Count.
The total cost appears to be N * (cost of step 2) = O(N).
WRINKLE: In step 2, we must be careful not to eliminate each sub-interval more than once. We can accomplish this by only eliminating sub-intervals that lie fully or partly to the right of position i.
Example:
A = [1, 3, 2, 4]
B = [1, 3, 2, 4]
Initial count = (4*3)/2 = 6
i=1: A[i]=1, so need sub-blocks with 2 in them. We can eliminate [1,3] from consideration. Eliminated = 1, Count -> 5.
i=2: A[i]=3, so need sub-blocks with 2 or 4 in them. This rules out [1,3] but we already accounted for it when looking right from i=1. Eliminated = 0.
i=3: A[i] = 2, so need sub-blocks with [1] or [3] in them. We can eliminate [2,4] from consideration. Eliminated = 1, Count -> 4.
i=4: A[i] = 4, so we need sub-blocks with [3] in them. This rules out [2,4] but we already accounted for it when looking right from i=3. Eliminated = 0.
Final Count = 4, corresponding to the sub-blocks [1,3,2,4], [1,3,2], [3,2,4] and [3,2].
(This is an attempt to do this N.log(N) worst case. Unfortunately it's wrong -- it sometimes undercounts. It incorrectly assumes you can find all the blocks by looking at only adjacent pairs of smaller valid blocks. In fact you have to look at triplets, quadruples, etc, to get all the larger blocks.)
You do it with a struct that represents a subblock and a queue for subblocks.
struct
c_subblock
{
int index ; /* index into original array, head of subblock */
int width ; /* width of subblock > 0 */
int lo_value;
c_subblock * p_above ; /* null or subblock above with same index */
};
Alloc an array of subblocks the same size as the original array, and init each subblock to have exactly one item in it. Add them to the queue as you go. If you start with array [ 7 3 4 1 2 6 5 8 ] you will end up with a queue like this:
queue: ( [7,7] [3,3] [4,4] [1,1] [2,2] [6,6] [5,5] [8,8] )
The { index, width, lo_value, p_above } values for subbblock [7,7] will be { 0, 1, 7, null }.
Now it's easy. Forgive the c-ish pseudo-code.
loop {
c_subblock * const p_left = Pop subblock from queue.
int const right_index = p_left.index + p_left.width;
if ( right_index < length original array ) {
// Find adjacent subblock on the right.
// To do this you'll need the original array of length-1 subblocks.
c_subblock const * p_right = array_basic_subblocks[ right_index ];
do {
Check the left/right subblocks to see if the two merged are also a subblock.
If they are add a new merged subblock to the end of the queue.
p_right = p_right.p_above;
}
while ( p_right );
}
}
This will find them all I think. It's usually O(N log(N)), but it'll be O(N^2) for a fully sorted or anti-sorted list. I think there's an answer to this though -- when you build the original array of subblocks you look for sorted and anti-sorted sequences and add them as the base-level subblocks. If you are keeping a count increment it by (width * (width + 1))/2 for the base-level. That'll give you the count INCLUDING all the 1-length subblocks.
After that just use the loop above, popping and pushing the queue. If you're counting you'll have to have a multiplier on both the left and right subblocks and multiply these together to calculate the increment. The multiplier is the width of the leftmost (for p_left) or rightmost (for p_right) base-level subblock.
Hope this is clear and not too buggy. I'm just banging it out, so it may even be wrong.
[Later note. This doesn't work after all. See note below.]

(ProjectEuler) Sum Combinations

From ProjectEuler.net:
Prob 76: How many different ways can one hundred be written as a sum of at least two positive integers?
I have no idea how to start this...any points in the right direction or help? I'm not looking for how to do it but some hints on how to do it.
For example 5 can be written like:
4 + 1
3 + 2
3 + 1 + 1
2 + 2 + 1
2 + 1 + 1 + 1
1 + 1 + 1 + 1 + 1
So 6 possibilities total.
Partition Numbers (or Partition Functions) are the key to this one.
Problems like these are usually easier if you start at the bottom and work your way up to see if you can detect any patterns.
P(1) = 1 = {1}
P(2) = 2 = {[2], [1 + 1]}
P(3) = 3 = {[3], [2 + 1], [1 + 1 + 1]}
P(4) = 5 = {[4], [3 + 1], [2 + 2], [2 + 1 + 1], [1 + 1 + 1 + 1]}
P(5) = 7 ...
P(6) = 11 ...
P(7) = 15 ...
P(8) = 22 ...
P(9) = 30 ...
Hint: See if you can build P(N) up from some combination of the results prior to P(N).
The solution can be found using a chopping algorithm.
Use for example the 6. Then we have:
6
5+1
4+2
3+3
but we are not finished yet.
If we take the 5+1, and consider the +1 part as finished, because all other ending combinations are handled by the 4+2 and 3+3. So we need to apply the same trick to the 5.
4+1+1
3+2+1
And we can continue. But not mindlessly. Because for example 4+2 produces 3+1+2 and 2+2+2. But we don't want the 3+1+2 because we will have a 3+2+1. So we only use all productions of 4 where the lowest number is greater or equal than 2.
6
5+1
4+1+1
3+1+1+1
2+1+1+1+1
1+1+1+1+1+1
2+2+1+1
3+2+1
4+2
2+2+2
3+3
Next step is to put this in an algorithm.
Ok we need a recursive function that takes two parameters. The number to be chopped and the minimal value:
func CountCombinations(Number, Minimal)
temp = 1
if Number<=1 then return 1
for i = 1 to Floor(Number/2)
if i>=Minimal then
temp := temp + CountCombinations(Number-i, i)
end for
return temp
end func
To check the algorithm:
C(6,1) = 1 + C(5,1) + C(4,2) + C(3,3) = 11, which is correct.
C(5,1) = 1 + C(4,1) + C(3,2) = 7
C(4,1) = 1 + C(3,1) + C(2,2) = 5
C(3,1) = 1 + C(2,1) = 3
C(2,1) = 1 + C(1,1) = 2
C(1,1) = 1
C(2,2) = 1
C(3,2) = 1
C(4,2) = 1 + C(2,2) = 2
C(3,3) = 1
By the way, the number of combinations of 100:
CC(100) = 190569292
CC(100) = 190569291 (if we don't take into account 100 + 0)
A good way to approach these is not get fixated on the '100' but try to consider what the difference between totalling for a sum n and n+1 would be, by looking for patterns as n increases 1,2,3....
I'd have a go now but I have work to do :)
Like most problems in Project Euler with big numbers, the best way to think about them is not to get stumped with that huge upper bound, and think of the problem in smaller terms, and gradually work your way up. Maybe, on the way you'll recognize a pattern, or learn enough to get you to the answer easily.
The only other hint I think I can give you without spoiling your epiphany is the word 'partition'.
Once you've figured that out, you'll have it in no time :)
one approach is to think recursive function: find permutations of a numeric series drawn from the positive integers (duplicates allowed) that add up to 100
the zero is 1, i.e. for the number 1 there are zero solutions
the unit is 2, i.e for the number 2 there is only one solution
another approach is to think generative function: start with the zero and find permutation series up to the target, keeping a map/hash or the intermediate values and counts
you can iterate up from 1, or recurse down from 100; you'll get the same answer either way. At each point you could (for a naive solution) generate all permutations of the series of positive integers counting up to the target number minus 1, and count only those that add up to the target number
good luck!
Notice: My maths is a bit rusty but hopefully this will help...
You are going well with your break down of the problem.
Think Generally:
A number n can be written as (n-1)+1 or (n-2)+2
You generalise this to (n-m)+m
Remember that the above also applies to all numbers (including m)
So the idea is to find the first set (lets say 5 = (5-1)+1) and then treat (5-1) as a new n...5 = 4 +1...5 = ((4-1)+1)+1. The once that is exhausted begin again on 5 = 3 + 2....which becomes 5 = ((3-1)+1)+2 ....= 2+1+2....breaking down each one as you go along.
Many math problems can be solved by induction.
You know the answer for a specific value, and you can find the answer for every value, if you find something that link n with n+1.
For example in your case you know that the answer for How many different ways can one be written as a sum of at least two positive integers? is just 1.
What do I mean with the link between n and n+1? Well I mean exactly that you must find a formula that, provide you know the answer for n, you will find the answer for n+1. Then, calling recursively that formula, you'll know the answer and you've done (note: this is just the mathematical part of it, in real life you might find that this approach would gave you something too slow to be practical, so you've not done yet - in this case I think you will be done).
Now, suppose that you know n can be written as a sum of at least two positive integers? in k different ways, one of which would be:
n=a1+a2+a3+...am (this sum has m terms)
What can you say about n+1? Since you would like just hints I'm not writing the solution here, but just what follows. Surely you have the same k different ways, but in each of them there will be the +1 term, one of which would be:
n+1=a1+a2+a3+...am+1 (this sum has m+1 terms)
Then, of course, you will have other k possibilities, such those in which the last term of each sum is not the same, but it will be increased by one, like:
n+1=a1+a2+a3+...(am+1) (this sum has m terms)
Thus there are at least 2k ways of writing n+1 as a sum of at least two positive integers. Well, there are other ones. Find it out, if you can :-)
And enjoy :-))

Resources