Number of different rolls of K N-sided dice - dice

I needed to calculate the number of different possible rolls that could arise from rolling K dice, each with N sides. My definition of roll is that something like {1, 1, 2, 3, 4} is equivalent to {1, 4, 3, 1, 2} (order doesn't matter), but not to {1, 1, 3, 3, 3} (they're not the same set of results). For example: Yahtzee is a game involving rolls of 5 6-sided dice (at least initially, before rerolls), and the number of different rolls is thus 252. The case when N = K leads to OEIS sequence A001700.
If I'm not horribly mistaken, this is given by "(N-1+K) choose (N-1)", or equivalently, "(N+K-1) choose K", which is K ! <: K + N in J. This leads me to four different tacit representations:
d =: ([ ! [: <: +). Simple train, without parentheses, though I need to use a cap.
d =: ([ (! <:) +). No cap, but parenthesizing the inner hook.
d =: (] !&<: +). Only a three verb train, but using a Compose. It uses the (<: N) ! <: K + N version.
d =: (([ ! +) * ] % +). This one rewrites "C(N+K-1, K)" as "C(N+K, K) * N / (N+K)". It's uglier, but in the 0 dice of 0 sides case, it gives 0 instead of 1, which is arguably a less nonsensical answer.
Which of these is the most "J-ish" solution?
Also, the monadic case for all of these is meaningless: 1 0 0 0 0 ... for the first three and 0 1 1 1 ... for the fourth. A more logical monad for this verb would be the reflexive, as given by d~, so would it be better to define this verb as (d~ : d)?
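For readers who do not use J, the same count can be cross-checked in Python. This is only a sketch: the names rolls and rolls_brute are made up for illustration, and it assumes N >= 1 because math.comb rejects negative arguments, so the 0 d 0 case discussed above is not covered.

```python
from itertools import combinations_with_replacement
from math import comb


def rolls(k, n):
    """Number of distinct unordered rolls of k dice with n sides (n >= 1 assumed)."""
    return comb(n + k - 1, k)


def rolls_brute(k, n):
    """Same count, obtained by enumerating every multiset of k faces."""
    return sum(1 for _ in combinations_with_replacement(range(1, n + 1), k))


print(rolls(5, 6))        # 5 six-sided dice
print(rolls_brute(5, 6))  # should agree with the closed form
```

rolls(5, 6) reproduces the 252 Yahtzee rolls mentioned in the question.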

My preference would be:
d =: ([ (! <:) +)
and to add a monadic option to the dyadic
d =: d~ : ([ (! <:) +) NB. 4 d 5 ( 4 rolls of 5 sided dice : 70 possible combinations)
I would add the comment including sample arguments and an expected purpose to save me time were I to stumble across it later.
Of course, the fourth version (the one that multiplies by N / (N+K)) would be the choice if 0 d 0 should return 0, even if it does look a little more complicated.

Related

Minimum Delete operations to empty the vector

My friend was asked this question in an interview:
We have a vector of integers consisting only of 0s and 1s. A delete consists of selecting consecutive equal numbers and removing them. The remaining parts are then attached to each other. For example, if the vector is [0,1,1,0], then after removing [1,1] we get [0,0]. Removing a single element that has no equal neighbour also takes one delete.
We need to write a function that returns the minimum number of deletes to make the vector empty.
Example 1:
Input: [0,1,1,0]
Output: 2
Explanation: [0,1,1,0] -> [0,0] -> []
Example 2:
Input: [1,0,1,0]
Output: 3
Explanation: [1,0,1,0] -> [0,1,0] -> [0,0] -> [].
Example 3:
Input: [1,1,1]
Output: 1
Explanation: [1,1,1] -> []
I am unsure of how to solve this question. I feel that we can use a greedy approach:
Remove all consecutive equal elements and increment the delete counter for each;
Remove elements of the form <a, b, c> where a==c and a!=b. There is exactly one b here, because if we had multiple consecutive bs, they would have been deleted in step (1) above. Increment the delete counter once, as we delete one b.
Repeat steps (1) and (2) as long as we can.
Increment delete counter once for each of the remaining elements in the vector.
But I am not sure if this would work. Could someone please confirm if this is the right approach? If not, how do we solve this?
Hint
You can simplify this problem greatly by noticing the following fact: a chain of consecutive zeros or ones can be shortened or lengthened without changing the final solution. By example, the two vectors have the same solution:
[1, 0, 1]
[1, 0, 0, 0, 0, 0, 0, 1]
With that in mind, the solution becomes simpler. So I encourage you to pause and try to figure it out!
Solution
With the previous remark, we can reduce the problem to vectors of alternating zeros and ones. In fact, since zero and one have no special meaning here, it suffices to solve for all such vectors that start with, say, a one.
[] # number of steps: 0
[1] # number of steps: 1
[1, 0] # number of steps: 2
[1, 0, 1] # number of steps: 2
[1, 0, 1, 0] # number of steps: 3
[1, 0, 1, 0, 1] # number of steps: 3
[1, 0, 1, 0, 1, 0] # number of steps: 4
[1, 0, 1, 0, 1, 0, 1] # number of steps: 4
We notice a pattern, the solution seems to be floor(n / 2) + 1 for n > 1 where n is the length of those sequences. But can we prove it..?
Proof
We will proceed by induction. Suppose you have a solution for a vector of length n - 2, then any move you do (except for deleting the two characters on the edges of the vector) will have the following result.
[..., 0, 1, 0, 1, 0, ...]
            ^------------ delete this one
Result:
[..., 0, 1, 1, 0, ...]
But we already mentioned that a chain of consecutive zeros or ones can be shortened or lengthened without changing the final solution. So the result of the deletion is in fact equivalent to now having to solve for:
[..., 0, 1, 0, ...]
What we did is one deletion in n elements and arrived to a case which is equivalent to having to solve for n - 2 elements. So the solution for a vector of size n is...
Solution(n) = Solution(n - 2) + 1
            = [floor((n - 2) / 2) + 1] + 1
            = floor(n / 2) + 1
Keeping in mind that the solutions for [1] and [1, 0] are respectively 1 and 2, this concludes our proof. Notice here, that [] turns out to be an edge case.
Interestingly enough, this proof also shows us that the optimal sequence of deletions for a given vector is highly non-unique. You can simply delete any block of ones or zeros, except for the first and last ones, and you will end up with an optimal solution.
Conclusion
In conclusion, given an arbitrary vector of ones and zeros, the smallest number of deletions you will need can be computed by counting the number n of groups of consecutive ones or zeros. The answer is then floor(n / 2) + 1 for n > 1 (for n <= 1 the answer is simply n).
Just for fun, here is a Python implementation to solve this problem.
from itertools import groupby
def solution(vector):
    n = 0
    for group in groupby(vector):
        n += 1
    return n // 2 + 1 if n > 1 else n
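The formula can be checked against an exhaustive search on short vectors. What follows is a minimal sketch, assuming that a delete may remove any run of consecutive equal elements (not only a maximal one). The names brute_force and by_formula are illustrative and not part of the answer above.

```python
from functools import lru_cache
from itertools import groupby, product


def by_formula(vector):
    """Group-counting formula: floor(groups / 2) + 1, or 0 for the empty vector."""
    groups = sum(1 for _ in groupby(vector))
    return groups // 2 + 1 if groups else 0


@lru_cache(maxsize=None)
def brute_force(vector):
    """Minimum number of deletes, trying every run of equal consecutive elements."""
    if not vector:
        return 0
    best = len(vector)  # deleting one element at a time always works
    for start in range(len(vector)):
        for end in range(start, len(vector)):
            if vector[end] != vector[start]:
                break
            rest = vector[:start] + vector[end + 1:]
            best = min(best, 1 + brute_force(rest))
    return best


# Compare both functions on every 0/1 vector of length 0..8.
mismatches = [v for length in range(9)
              for v in product((0, 1), repeat=length)
              if brute_force(v) != by_formula(v)]
print(mismatches)
```

An empty list of mismatches is what the proof above predicts.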
Intuition: If we remove every subsegment of one of the two integers, all the remaining elements are of the other type, and removing them takes only one more operation.
Choosing to remove the subsegments of the integer that is not the starting one leads to the optimal result.
Solution:
Take the integer other than the one that is starting as a flag.
Count the number of contiguous segments of the flag in a vector.
The answer will be the above count + 1(one operation for removing a segment of starting integer)
So, the answer is:
answer = Count of contiguous segments of flag + 1
Example 1:
[0,1,1,0]
flag = 1
Count of subsegments with flag = 1
So, answer = 1 + 1 = 2
Example 2:
[1,0,1,0]
flag = 0
Count of subsegments with flag = 2
So, answer = 2 + 1 = 3
Example 3:
[1,1,1]
flag = 0
Count of subsegments with flag = 0
So, answer = 0 + 1 = 1
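The flag-counting rule above fits in a few lines of Python. This is a sketch under the assumptions that the vector contains only 0s and 1s and that an empty vector needs 0 deletes (a case the answer does not discuss). The function name min_deletes_flag is made up for illustration.

```python
from itertools import groupby


def min_deletes_flag(vector):
    """Count segments of the non-starting value, then add one final delete."""
    if not vector:
        return 0  # assumption: nothing to delete
    flag = 1 - vector[0]  # the value the vector does NOT start with
    segments = sum(1 for value, _ in groupby(vector) if value == flag)
    return segments + 1


for example in ([0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 1]):
    print(example, min_deletes_flag(example))
```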

permutations without repetition

I would like to know, what is the best approach to solve this problem:
Given x, y and y integers: a1, a2, a3 .. ay find all combinations of
a1 ± a2 ± ... ± ay = x, y < 20.
My current approach is to find all permutations of 1 and 0 stored in a table T and then, depending on whether T[i] is 1 or 0, add or subtract ai from the sum. The problem is that there are n! permutations of an n-element array. Hence, for a 20-element array, I have to check 20! possibilities, most of which are repeated. Could you please suggest any potential approach to solving my problem?
There are only 2^20 (just over a million) binary vectors of length 20, rather than the infeasible 20!. You should be able to brute-force that few in less than a second, especially if you use a Gray code, which would allow you to pass from one candidate sum to another in a single step (e.g. to go from a + b - c - d to a + b - c + d, just add 2*d).
The excellent branch and bound idea of @MikeWise would be good if y gets much larger. Generate a tree starting with a root node of 0. Give it children of -a1 and +a1. Then 4 grandchildren by adding and subtracting a2, etc. If you ever get farther than the sum of the remaining ai from the target x, you can prune that branch. In the worst case, this might be slightly worse than the Gray-code based brute force (because you need to do so much more processing at each node), but in the best case you might be able to prune away most possibilities.
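The pruning rule just described can be sketched as a depth-first search. This is an illustration only: signed_sums_pruned is a hypothetical name, a1 is allowed either sign, and the bound used is the sum of the absolute values of the numbers not yet placed.

```python
def signed_sums_pruned(nums, target):
    """Return every sign assignment of nums whose total equals target."""
    n = len(nums)
    # remaining[i] = largest amount the numbers nums[i:] can still move the total
    remaining = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        remaining[i] = remaining[i + 1] + abs(nums[i])

    results = []

    def walk(i, total, chosen):
        if abs(target - total) > remaining[i]:
            return  # target is out of reach: prune this branch
        if i == n:
            results.append(list(chosen))  # reachable with nothing left => exact hit
            return
        for sign in (1, -1):
            chosen.append(sign * nums[i])
            walk(i + 1, total + sign * nums[i], chosen)
            chosen.pop()

    walk(0, 0, [])
    return results


print(signed_sums_pruned([1, 2, 3, 4], 2))
```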
On Edit: Here is some Python code. First I define a generator which, given an integer n, successively returns which bit position needs to flip to step through a Gray code:
def grayBit(n):
    code = [0]*n
    odd = True
    done = False
    while not done:
        if odd:
            code[0] = 1 - code[0] #flip bit
            odd = False
            yield 0
        else:
            i = code.index(1)
            if i == n-1:
                done = True
            else:
                code[i+1] = 1 - code[i+1]
                odd = True
                yield i+1
(This uses an algorithm which I learned years ago in the excellent book "Constructive Combinatorics" by Stanton and White).
Then -- I use this to return all solutions (as lists consisting of the input list of numbers with negative signs inserted as needed). The key point is that I can take the current bit-to-flip and either add or subtract twice the corresponding number:
def signedSums(nums, target):
    n = len(nums)
    patterns = []
    total = sum(nums)
    pattern = [1]*n
    if target == total: patterns.append([x*y for x,y in zip(nums,pattern)])
    deltas = [2*i for i in nums]
    for i in grayBit(n):
        if pattern[i] == 1:
            total -= deltas[i]
        else:
            total += deltas[i]
        pattern[i] = -1 * pattern[i]
        if target == total: patterns.append([x*y for x,y in zip(nums,pattern)])
    return patterns
Typical output:
>>> signedSums([1,2,3,4,5,9],6)
[[1, -2, -3, -4, 5, 9], [1, 2, 3, -4, -5, 9], [-1, 2, -3, 4, -5, 9], [1, 2, 3, 4, 5, -9]]
It only takes about a second to evaluate:
>>> len(signedSums([i for i in range(1,21)],100))
2865
Hence there are 2865 ways to add or subtract the integers in the range 1,2,..,20 to get a net sum of 100.
I assumed that a1 can be either added or subtracted (instead of just added, which is what your question implies if taken literally). Note that if you really want to insist that a1 occurs positively, then you could just subtract it from x and apply the above algorithm to the rest of the list and the adjusted target.
Finally, it is not too hard to see that if you solve the subset sum problem with the set of weights {2*a1, 2*a2, 2*a3, ..., 2*ay} and with a target sum of x + a1 + a2 + ... + ay, then the subsets selected will correspond exactly to the subsets where the positive signs occur in the solution to the original problem. Thus your problem is easily reducible to the subset-sum problem. The reduction also works in the other direction: a subset-sum instance with weights w1, ..., wy and target t is the signed-sum instance with ai = wi and x = 2*t - (w1 + ... + wy). The two problems are therefore equivalent, so it is NP-complete to determine if yours has any solutions (and NP-hard to list them all).
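The correspondence with subset sum can be made concrete with a small sketch. It enumerates subsets naively, so it only illustrates the mapping (weights 2*ai, goal x + a1 + ... + ay) and is not meant to be fast. The function name is hypothetical.

```python
from itertools import combinations


def signed_sums_via_subset_sum(nums, target):
    """Solve the signed-sum problem by searching for subsets of doubled weights."""
    weights = [2 * a for a in nums]
    goal = target + sum(nums)
    solutions = []
    indices = range(len(nums))
    for size in range(len(nums) + 1):
        for subset in combinations(indices, size):
            if sum(weights[i] for i in subset) == goal:
                plus = set(subset)  # the chosen subset receives the '+' signs
                solutions.append([a if i in plus else -a
                                  for i, a in enumerate(nums)])
    return solutions


print(signed_sums_via_subset_sum([1, 2, 3, 4], 2))
```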
We have conditions:
a1 ± a2 ± ... ± ay = x, y<20 [1]
First of all, I would generalize the condition [1], allowing all 'a' including 'a1' to be ±:
±a1 ± a2 ± ... ± ay = x [2]
If we have solution for [2], we can easily get solution for [1]
To solve [2] we can use the following approach:
combinations list x
    | x == 0 && null list = [[]]
    | null list = []
    | otherwise = plusCombinations ++ minusCombinations where
        a = head list
        rest = tail list
        plusCombinations = map (\c -> a:c) $ combinations rest (x-a)
        minusCombinations = map (\c -> -a:c) $ combinations rest (x+a)
Explanation:
First condition checks whether x has reached zero and all numbers from the list have been used. This means that a solution has been found, and we return the single solution: [[]]
Second condition checks whether the list is empty; since x is not 0 here, no solution can be found, so we return the empty list of solutions: []
Third branch means that we have two alternatives: to use ai with '+' or with '-', so we concatenate the plus and minus combinations
Example output:
*Main> combinations [1,2,3,4] 2
[[1,2,3,-4],[-1,2,-3,4]]
*Main> combinations [1,2,3,4] 3
[]
*Main> combinations [1,2,3,4] 4
[[1,2,-3,4],[-1,-2,3,4]]

Modifying the range of a uniform random number generator

I am given a function rand5() that generates, with a uniform distribution, a random integer in the closed interval [1,5]. How can I use rand5(), and nothing else, to create a function rand7(), which generates integers in [1,7] (again, uniformly distributed) ?
I searched stackoverflow, and found many similar questions, but not exactly like this one.
My initial attempt was rand5() + 0.5*rand5() + 0.5*rand5(). But this won't generate integers from 1 to 7 with uniform probability. Any answers, or links to answers, are very welcome.
Note that a perfect uniform distribution cannot be achieved with a bounded number of rand5() invocations, because for every k: 5^k % 7 != 0, so you will always have some "spare" elements.
Here is a solution with an unbounded number of rand5() uses:
Draw two numbers, x1,x2. There are 5*5=25 possible outcomes for this.
Note that 25/7 ~= 3.57. Choose 3*7=21 combinations, such that each number in [1,7] is assigned exactly 3 combinations; for the other 4 combinations, redraw.
For example:
(1,1), (1,2), (2,1) : 1
(3,1), (1,3), (3,2) : 2
(3,3), (1,4), (4,1) : 3
(2,4), (4,2), (3,4) : 4
(4,3), (4,4), (1,5) : 5
(5,1), (2,5), (5,2) : 6
(5,3), (3,5), (4,5) : 7
(5,4), (5,5), (2,3), (2,2) : redraw
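The same redraw idea can be written without a lookup table. The sketch below uses a different (arithmetic) assignment of pairs to results than the table above: it numbers the 25 pairs 0..24, redraws on 21..24, and maps the rest with a modulus. The helper pair_to_seven is a hypothetical name, and rand5 is simulated with the random module.

```python
import random


def rand5():
    """Stand-in for the given generator: uniform integer in [1, 5]."""
    return random.randint(1, 5)


def pair_to_seven(x1, x2):
    """Map two rand5() draws to 1..7, or None when the pair must be redrawn."""
    index = 5 * (x1 - 1) + (x2 - 1)  # 0..24, each equally likely
    if index >= 21:
        return None  # the 4 spare pairs
    return index % 7 + 1  # each of 1..7 is hit by exactly 3 indices


def rand7():
    while True:
        result = pair_to_seven(rand5(), rand5())
        if result is not None:
            return result


print([rand7() for _ in range(10)])
```

Each call is expected to use 25/21 pairs on average, so about 2.4 calls to rand5().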
Here's a simple way:
Use rand5() to generate a sequence of three random integers from the set { 1, 2, 4, 5 } (i.e., throw away any 3 that is generated).
If all three numbers are in the set { 1, 2 }, discard the sequence and return to step 1.
For each number in the sequence, map { 1, 2} to 0 and { 4, 5 } to 1. Use these as the three bit values for a 3-bit number. Because the bits cannot all be 0, the number will be in the range [1, 7]. Because each bit is 0 or 1 with equal probability, the distribution over [1, 7] should be uniform.
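The three steps above can be sketched as follows. The names fair_bit and rand7_bits are invented for illustration, and the draw parameter exists only so that a deterministic source can be substituted for rand5 when experimenting.

```python
import random


def rand5():
    """Stand-in for the given generator: uniform integer in [1, 5]."""
    return random.randint(1, 5)


def fair_bit(draw=rand5):
    """Discard 3s; map {1, 2} to 0 and {4, 5} to 1."""
    while True:
        r = draw()
        if r != 3:
            return 0 if r < 3 else 1


def rand7_bits(draw=rand5):
    """Build a 3-bit number from fair bits and reject the all-zero pattern."""
    while True:
        value = 4 * fair_bit(draw) + 2 * fair_bit(draw) + fair_bit(draw)
        if value:
            return value


print([rand7_bits() for _ in range(10)])
```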
OK, I had to think about it for a while, but it is actually not that hard. Imagine that instead of rand5 you had rand2, which outputs 0 or 1 with equal probability. You can make rand2 out of rand5 by discarding every 3 and splitting the remaining four values evenly (simply testing rand5() > 2.5 would return 1 three times out of five):
rand2() {
    do { r = rand5() } while (r == 3)
    if (r > 3) return 1
    else return 0
}
Now use rand2 multiple times to walk down a tree to get rand7. To keep every leaf equally likely the tree has to be balanced, so start from eight leaves, [1,2,3,4,5,6,7,redraw]. For example, after a throw of rand2 which gives 0 you subset to [1,2,3,4], after another throw of rand2 which gives 1 you subset to [3,4], and a final throw of 1 gives the output of rand7 as 4. If you land on the redraw leaf, start over. In general this tree trick can take a rand2 and map it to randx, where x is any integer.
Here's one meta-trick which comes in handy for lots of these problems: the bias is introduced when we treat the terms differently in some fashion, so if we treat them all the same at each step and perform operations only on the set, we'll stay out of trouble.
We have to call rand5() at least once (obviously!), but if we branch on that bad things happen unless we're clever. So instead let's call it once for each of the 7 possibilities:
In [126]: import random
In [127]: def r5():
   .....:     return random.randint(1, 5)
   .....:
In [128]: [r5() for i in range(7)]
Out[128]: [3, 1, 3, 4, 1, 1, 2]
Clearly each of these terms was equally likely to be any of these numbers, but only one of them happened to be 2, so if our rule had been "choose whichever term rand5() returns 2 for" then it would have worked. Or 4, or whatever, and if we simply looped long enough that would happen. So there are lots of ways to come up with something that works. Here (in pseudocode -- this is terrible Python) is one way:
import random, collections
def r5():
    return random.randint(1, 5)
def r7():
    left = range(1, 8)
    while True:
        if len(left) == 1:
            return left[0]
        rs = [r5() for n in left]
        m = max(rs)
        how_many_at_max = rs.count(m)
        if how_many_at_max == len(rs):
            # all the same: try again
            continue
        elif how_many_at_max == 1:
            # hooray!
            return left[rs.index(m)]
        # keep only the non-maximals
        left = [l for l,r in zip(left, rs) if r != m]
which gives
In [189]: collections.Counter(r7() for _ in xrange(10**6))
Out[189]: Counter({7: 143570, 5: 143206, 4: 142827, 2: 142673, 6: 142604, 1: 142573, 3: 142547})

Code Golf: Countdown Number Game

Challenge
Here is the task, inspired by the well-known British TV game show Countdown. The challenge should be pretty clear even without any knowledge of the game, but feel free to ask for clarifications.
And if you fancy seeing a clip of this game in action, check out this YouTube clip. It features the wonderful late Richard Whiteley in 1997.
You are given 6 numbers, chosen at random from the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 75, 100}, and a random target number between 100 and 999. The aim is to use the six given numbers and the four common arithmetic operations (addition, subtraction, multiplication, division; all over the rational numbers) to generate the target, or to get as close as possible on either side. Each number may be used at most once, while each arithmetic operator may be used any number of times (including zero). Note that it does not matter how many numbers are used.
Write a function that takes the target number and set of 6 numbers (can be represented as list/collection/array/sequence) and returns the solution in any standard numerical notation (e.g. infix, prefix, postfix). The function must always return the closest-possible result to the target, and must run in at most 1 minute on a standard PC. Note that in the case where more than one solution exists, any single solution is sufficient.
Examples:
{50, 100, 4, 2, 2, 4}, target 203
e.g. 100 * 2 + 2 + (4 / 4) (exact)
e.g. (100 + 50) * 4 * 2 / (4 + 2) (= 200; off by 3)
{25, 4, 9, 2, 3, 10}, target 465
e.g. (25 + 10 - 4) * (9 * 2 - 3) (exact)
{9, 8, 10, 5, 9, 7}, target 241
e.g. (((10 + 9) * 9 * 7) + 8) / 5 (exact)
{3, 7, 6, 2, 1, 7}, target 824
e.g. (((7 * 3) - 1) * 6 - 2) * 7 (= 826; off by 2)
Rules
Other than mentioned in the problem statement, there are no further restrictions. You may write the function in any standard language (standard I/O is not necessary). The aim as always is to solve the task with the smallest number of characters of code.
Saying that, I may not simply accept the answer with the shortest code. I'll also be looking at elegance of the code and time complexity of the algorithm!
My Solution
I'm attempting an F# solution when I find the free time - will post it here when I have something!
Format
Please post all answers in the following format for the purpose of easy comparison:
Language
Number of characters: ???
Fully obfuscated function:
(code here)
Clear (ideally commented) function:
(code here)
Any notes on the algorithm/clever shortcuts it takes.
Python
Number of characters: 408 (previously 548, 482, 425, 421, 416, 413)
from operator import *
n=len
def C(N,T):
 R=range(1<<n(N));M=[{}for i in R];p=1
 for i in range(n(N)):M[1<<i][1.*N[i]]="%d"%N[i]
 while p:
  p=0
  for i in R:
   for j in R:
    m=M[i|j];l=n(m)
    if not i&j:m.update((f(x,y),"("+s+o+t+")")for(y,t)in M[j].items()if y for(x,s)in M[i].items() for(o,f)in zip('+-*/',(add,sub,mul,div)))
    p|=l<n(m)
 return min((abs(x-T),e)for t in M for(x,e)in t.items())[1]
you can call it like this:
>>> print C([50, 100, 4, 2, 2, 4], 203)
((((4+2)*(2+100))/4)+50)
Takes about half a minute on the given examples on an oldish PC.
Here's the commented version:
def countdown(N,T):
    # M is a map: (bitmask of used input numbers -> (expression value -> expression text))
    M=[{} for i in range(1<<len(N))]
    # initialize M with single-number expressions
    for i in range(len(N)):
        M[1<<i][1.0*N[i]] = "%d" % N[i]
    # allowed operators
    ops = (("+",lambda x,y:x+y),("-",lambda x,y:x-y),("*",lambda x,y:x*y),("/",lambda x,y:x/y))
    # enumerate all expressions
    n=0
    while 1:
        # test to see if we're done (last iteration didn't change anything)
        c=0
        for x in M: c +=len(x)
        if c==n: break
        n=c
        # loop over all values we have so far, indexed by bitmask of used input numbers
        for i in range(len(M)):
            for j in range(len(M)):
                if i & j: continue # skip if both expressions used the same input number
                for (x,s) in M[i].items():
                    for (y,t) in M[j].items():
                        if y: # avoid /0 (and +0,-0,*0 while we're at it)
                            for (o,f) in ops:
                                M[i|j][f(x,y)]="(%s%s%s)"%(s,o,t)
    # pick best expression
    L=[]
    for t in M:
        for(x,e) in t.items():
            L+=[(abs(x-T),e)]
    L.sort();return L[0][1]
It works through exhaustive enumeration of all possibilities. It is a bit smart in that if there are two expressions with the same value that use the same input numbers, it discards one of them. It is also smart in how it considers new combinations, using the index into M to prune quickly all the potential combinations that share input numbers.
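The bitmask idea described above can also be written as a single pass over the masks in increasing order, since every proper sub-mask is numerically smaller than its mask. The following Python 3 sketch is a variation, not the golfed answer itself: it uses Fraction for exact rational arithmetic, the name closest_expression is hypothetical, and no running time is claimed for six inputs.

```python
from fractions import Fraction
from operator import add, sub, mul, truediv

OPS = (("+", add), ("-", sub), ("*", mul), ("/", truediv))


def closest_expression(numbers, target):
    """Return the text of an expression over a subset of numbers closest to target."""
    size = 1 << len(numbers)
    # table[mask] maps each reachable value to one expression using exactly those inputs
    table = [{} for _ in range(size)]
    for i, number in enumerate(numbers):
        table[1 << i][Fraction(number)] = str(number)
    for mask in range(1, size):
        left = (mask - 1) & mask  # walk over all proper non-empty sub-masks
        while left:
            right = mask ^ left
            for x, s in table[left].items():
                for y, t in table[right].items():
                    if y == 0:
                        continue  # avoids division by zero (and useless +0, -0, *0)
                    for symbol, fn in OPS:
                        table[mask].setdefault(fn(x, y), "(%s%s%s)" % (s, symbol, t))
            left = (left - 1) & mask
    return min((abs(value - target), text)
               for entries in table for value, text in entries.items())[1]


print(closest_expression([100, 2, 4, 3], 203))
```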
Haskell
Number of characters: 322 (previously 361, 350, 338)
Fully obfuscated function:
m=map
f=toRational
a%w=m(\(b,v)->(b,a:v))w
p[]=[];p(a:w)=(a,w):a%p w
q[]=[];q(a:w)=[((a,b),v)|(b,v)<-p w]++a%q w
z(o,p)(a,w)(b,v)=[(a`o`b,'(':w++p:v++")")|b/=0]
y=m z(zip[(-),(/),(+),(*)]"-/+*")++m flip(take 2 y)
r w=do{((a,b),v)<-q w;o<-y;c<-o a b;c:r(c:v)}
c t=snd.minimum.m(\a->(abs(fst a-f t),a)).r.m(\a->(f a,show a))
Clear function:
-- | add an element on to the front of the remainder list
onRemainder :: a -> [(b,[a])] -> [(b,[a])]
a`onRemainder`w = map (\(b,as)->(b,a:as)) w
-- | all ways to pick one item from a list, returns item and remainder of list
pick :: [a] -> [(a,[a])]
pick [] = []
pick (a:as) = (a,as) : a `onRemainder` (pick as)
-- | all ways to pick two items from a list, returns items and remainder of list
pick2 :: [a] -> [((a,a),[a])]
pick2 [] = []
pick2 (a:as) = [((a,b),cs) | (b,cs) <- pick as] ++ a `onRemainder` (pick2 as)
-- | a value, and how it was computed
type Item = (Rational, String)
-- | a specification of a binary operation
type OpSpec = (Rational -> Rational -> Rational, String)
-- | a binary operation on Items
type Op = Item -> Item -> Maybe Item
-- | turn an OpSpec into a operation
-- applies the operator to the values, and builds up an expression string
-- in this context there is no point to doing +0, -0, *0, or /0
combine :: OpSpec -> Op
combine (op,os) (ar,as) (br,bs)
    | br == 0 = Nothing
    | otherwise = Just (ar`op`br,"("++as++os++bs++")")
-- | the operators we can use
ops :: [Op]
ops = map combine [ ((+),"+"), ((-), "-"), ((*), "*"), ((/), "/") ]
    ++ map (flip . combine) [((-), "-"), ((/), "/")]
-- | recursive reduction of a list of items to a list of all possible values
-- includes values that don't use all the items, includes multiple copies of
-- some results
reduce :: [Item] -> [Item]
reduce is = do
    ((a,b),js) <- pick2 is
    op <- ops
    c <- maybe [] (:[]) $ op a b
    c : reduce (c : js)
-- | convert a list of real numbers to a list of items
items :: (Real a, Show a) => [a] -> [Item]
items = map (\a -> (toRational a, show a))
-- | return the first reduction of a list of real numbers closest to some target
countDown:: (Real a, Show a) => a -> [a] -> Item
countDown t is = snd $ minimum $ map dist $ reduce $ items is
    where dist is = (abs . subtract t' . fst $ is, is)
          t' = toRational t
Any notes on the algorithm/clever shortcuts it takes:
In the golf'd version, z returns in the list monad, rather than Maybe as ops does.
While the algorithm here is brute force, it operates in small, fixed, linear space due to Haskell's laziness. I coded the wonderful @keith-randall algorithm, but it ran in about the same time and took over 1.5G of memory in Haskell.
reduce generates some answers multiple times, in order to easily include solutions with fewer terms.
In the golf'd version, y is defined partially in terms of itself.
Results are computed with Rational values. Golf'd code would be 17 characters shorter, and faster if computed with Double.
Notice how the function onRemainder factors out the structural similarity between pick and pick2.
Driver for golf'd version:
main = do
print $ c 203 [50, 100, 4, 2, 2, 4]
print $ c 465 [25, 4, 9, 2, 3, 10]
print $ c 241 [9, 8, 10, 5, 9, 7]
print $ c 824 [3, 7, 6, 2, 1, 7]
Run, with timing (still under one minute per result):
[1076] : time ./Countdown
(203 % 1,"(((((2*4)-2)/100)+4)*50)")
(465 % 1,"(((((10-4)*25)+2)*3)+9)")
(241 % 1,"(((((10*9)/5)+8)*9)+7)")
(826 % 1,"(((((3*7)-1)*6)-2)*7)")
real 2m24.213s
user 2m22.063s
sys 0m 0.913s
Ruby 1.9.2
Number of characters: 404
I give up for now, it works as long as there is an exact answer. If there isn't it takes way too long to enumerate all possibilities.
Fully Obfuscated
def b a,o,c,p,r
o+c==2*p ?r<<a :o<p ?b(a+['('],o+1,c,p,r):0;c<o ?b(a+[')'],o,c+1,p,r):0
end
w=a=%w{+ - * /}
4.times{w=w.product a}
b [],0,0,3,g=[]
*n,l=$<.read.split.map(&:to_f)
h={}
catch(0){w.product(g).each{|c,f|k=f.zip(c.flatten).each{|o|o.reverse! if o[0]=='('};n.permutation{|m|h[x=eval(d=m.zip(k)*'')]=d;throw 0 if x==l}}}
c=h[k=h.keys.min_by{|i|(i-l).abs}]
puts c.gsub(/(\d*)\.\d*/,'\1')+"=#{k}"
Decoded
Coming soon
Test script
#!/usr/bin/env ruby
[
[[50,100,4,2,2,4],203],
[[25,4,9,2,3,10],465],
[[9,8,10,5,9,7],241],
[[3,7,6,2,1,7],824]
].each do |b|
start = Time.now
puts "{[#{b[0]*', '}] #{b[1]}} gives #{`echo "#{b[0]*' '} #{b[1]}" | ruby count-golf.rb`.strip} in #{Time.now-start}"
end
Output
→ ./test.rb
{[50, 100, 4, 2, 2, 4] 203} gives 100+(4+(50-(2)/4)*2)=203.0 in 3.968534736
{[25, 4, 9, 2, 3, 10] 465} gives 2+(3+(25+(9)*10)*4)=465.0 in 1.430715549
{[9, 8, 10, 5, 9, 7] 241} gives 5+(9+(8)+10)*9-(7)=241.0 in 1.20045702
{[3, 7, 6, 2, 1, 7] 824} gives 7*(6*(7*(3)-1)-2)=826.0 in 193.040054095
Details
The function used for generating the bracket pairs (b) is based off this one: Finding all combinations of well-formed brackets
Ruby 1.9.2 second attempt
Number of characters: 492 440(426)
Again there is a problem with the non-exact answer. This time this is easily fast enough but for some reason the closest it gets to 824 is 819 instead of 826.
I decided to put this in a new answer since it is using a very different method to my last attempt.
Removing the total from the output (as it's not required by the spec) saves 14 characters.
Fully Obfuscated
def r d,c;d>4?[0]:(k=c.pop;a=[];r(d+1,c).each{|b|a<<[b,k,nil];a<<[nil,k,b]};a)end
def f t,n;[0,2].each{|a|Array===t[a] ?f(t[a],n): t[a]=n.pop}end
def d t;Float===t ?t:d(t[0]).send(t[1],d(t[2]))end
def o c;Float===c ?c.round: "(#{o c[0]}#{c[1]}#{o c[2]})"end
w=a=%w{+ - * /}
4.times{w=w.product a}
*n,l=$<.each(' ').map(&:to_f)
h={}
w.each{|y|r(0,y.flatten).each{|t|f t,n.dup;h[d t]=o t}}
puts h[k=h.keys.min_by{|i|(l-i).abs}]+"=#{k.round}"
Decoded
Coming soon
Test script
#!/usr/bin/env ruby
[
[[50,100,4,2,2,4],203],
[[25,4,9,2,3,10],465],
[[9,8,10,5,9,7],241],
[[3,7,6,2,1,7],824]
].each do |b|
start = Time.now
puts "{[#{b[0]*', '}] #{b[1]}} gives #{`echo "#{b[0]*' '} #{b[1]}" | ruby count-golf.rb`.strip} in #{Time.now-start}"
end
Output
→ ./test.rb
{[50, 100, 4, 2, 2, 4] 203} gives ((4-((2-(2*4))/100))*50)=203 in 1.089726252
{[25, 4, 9, 2, 3, 10] 465} gives ((10*(((3+2)*9)+4))-25)=465 in 1.039455671
{[9, 8, 10, 5, 9, 7] 241} gives (7+(((9/(5/10))+8)*9))=241 in 1.045774539
{[3, 7, 6, 2, 1, 7] 824} gives ((((7-(1/2))*6)*7)*3)=819 in 1.012330419
Details
This constructs the set of ternary trees representing all possible combinations of 5 operators. It then goes through and inserts all permutations of the input numbers into the leaves of these trees. Finally it simply iterates through these possible equations storing them into a hash with the result as index. Then it's easy enough to pick the closest value to the required answer from the hash and display it.

How can you compare to what extent two lists are in the same order?

I have two arrays containing the same elements, but in different orders, and I want to know the extent to which their orders differ.
The method I tried didn't work. It was as follows:
For each list I built a matrix which recorded for each pair of elements whether they were above or below each other in the list. I then calculated a pearson correlation coefficient of these two matrices. This worked extremely badly. Here's a trivial example:
list 1:
1
2
3
4
list 2:
1
3
2
4
The method I described above produced matrices like this (where 1 means the row number is higher than the column, and 0 vice-versa):
list 1:
  1 2 3 4
1   1 1 1
2     1 1
3       1
4
list 2:
  1 2 3 4
1   1 1 1
2     0 1
3       1
4
Since the only difference is the order of elements 2 and 3, these should be deemed to be very similar. The Pearson Correlation Coefficient for those two matrices is 0, suggesting they are not correlated at all. I guess the problem is that what I'm looking for is not really a correlation coefficient, but some other kind of similarity measure. Edit distance, perhaps?
Can anyone suggest anything better?
Mean square of differences of indices of each element.
List 1: A B C D E
List 2: A D C B E
Indices of each element of List 1 in List 2 (zero based)
A B C D E
0 3 2 1 4
Indices of each element of List 1 in List 1 (zero based)
A B C D E
0 1 2 3 4
Differences:
A B C D E
0 -2 0 2 0
Square of differences:
A B C D E
0 4 0 4 0
Average differentness = 8 / 5.
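The calculation above can be sketched in Python, assuming both lists contain the same distinct elements. The function name mean_square_displacement is made up for illustration.

```python
def mean_square_displacement(list1, list2):
    """Mean of squared differences between each element's index in the two lists."""
    position_in_2 = {item: index for index, item in enumerate(list2)}
    differences = [index - position_in_2[item] for index, item in enumerate(list1)]
    return sum(d * d for d in differences) / len(list1)


print(mean_square_displacement("ABCDE", "ADCBE"))
```

For the lists in the example this evaluates 8 / 5.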
Just an idea, but is there any mileage in adapting a standard sort algorithm to count the number of swap operations needed to transform list1 into list2?
I think that defining the compare function may be difficult though (perhaps even just as difficult as the original problem!), and this may be inefficient.
edit: thinking about this a bit more, the compare function would essentially be defined by the target list itself. So for example if list 2 is:
1 4 6 5 3
...then the compare function should result in 1 < 4 < 6 < 5 < 3 (and return equality where entries are equal).
Then the swap function just needs to be extended to count the swap operations.
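One way to realize this idea is a merge sort that counts inversions, with the target list supplying the ordering. The count equals the minimum number of adjacent swaps needed to turn list1 into list2. This is a sketch assuming both lists hold the same distinct elements. The name count_inversions is hypothetical.

```python
def count_inversions(list1, list2):
    """Number of pairs that appear in opposite order in list1 and list2."""
    rank = {item: index for index, item in enumerate(list2)}
    sequence = [rank[item] for item in list1]  # list1 expressed in list2's order

    def sort_count(values):
        if len(values) <= 1:
            return values, 0
        middle = len(values) // 2
        left, left_count = sort_count(values[:middle])
        right, right_count = sort_count(values[middle:])
        merged, count = [], left_count + right_count
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
                count += len(left) - i  # right[j] jumps over the rest of left
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, count

    return sort_count(sequence)[1]


print(count_inversions([1, 3, 4, 5, 6], [1, 4, 6, 5, 3]))
```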
A bit late for the party here, but just for the record, I think Ben almost had it... if you'd looked further into correlation coefficients, I think you'd have found that Spearman's rank correlation coefficient might have been the way to go.
Interestingly, jamesh seems to have derived a similar measure, but not normalized.
See this recent SO answer.
You might consider how many changes it takes to transform one string into another (which I guess it was you were getting at when you mentioned edit distance).
See: http://en.wikipedia.org/wiki/Levenshtein_distance
Although I don't think Levenshtein distance takes rotation into account. If you allow rotation as an operation, then:
1, 2, 3, 4
and
2, 3, 4, 1
Are pretty similar.
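For reference, plain Levenshtein distance (insert, delete, substitute; no rotation operation) can be sketched with the usual row-by-row dynamic programme. The function name is illustrative.

```python
def levenshtein(a, b):
    """Edit distance between two sequences using insert, delete and substitute."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(previous[j] + 1,              # delete x
                               current[j - 1] + 1,           # insert y
                               previous[j - 1] + (x != y)))  # substitute or match
        previous = current
    return previous[-1]


print(levenshtein([1, 2, 3, 4], [2, 3, 4, 1]))
```

Without a rotation operation the rotated list above costs 2 edits (delete the leading 1, append it at the end).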
There is a branch-and-bound algorithm that should work for any set of operators you like. It may not be real fast. The pseudocode goes something like this:
bool bounded_recursive_compare_routine(int* a, int* b, int level, int bound){
    if (level > bound) return false;
    // if at end of a and b, return true
    // apply rule 0, like no-change
    if (*a == *b){
        if (bounded_recursive_compare_routine(a+1, b+1, level+0, bound)) return true;
    }
    // if can apply rule 1, like rotation, to b, try that and recur
    if (bounded_recursive_compare_routine(a+1, b+1, level+cost_of_rotation, bound)) return true;
    ...
    return false;
}
int get_minimum_cost(int* a, int* b){
    int bound;
    // iterative deepening: the first bound that succeeds is the minimum cost
    for (bound=0; ; bound++){
        if (bounded_recursive_compare_routine(a, b, 0, bound)) break;
    }
    return bound;
}
The time it takes is roughly exponential in the answer, because it is dominated by the last bound that works.
Added: This can be extended to find the nearest-matching string stored in a trie. I did that years ago in a spelling-correction algorithm.
I'm not sure exactly what formula it uses under the hood, but difflib.SequenceMatcher.ratio() does exactly this:
ratio(self) method of difflib.SequenceMatcher instance:
Return a measure of the sequences' similarity (float in [0,1]).
Code example:
from difflib import SequenceMatcher
sm = SequenceMatcher(None, '1234', '1324')
print(sm.ratio())
# prints 0.75
Another approach that is based on a little bit of mathematics is to count the number of inversions to convert one of the arrays into the other one. An inversion is the exchange of two neighboring array elements. In ruby it is done like this:
# extend class Array with a new method
class Array
  def dist(other)
    raise 'can calculate distance only to array with same length' if length != other.length
    # initialize count of inversions to 0
    count = 0
    # loop over all pairs of indices i, j with i<j
    length.times do |i|
      (i+1).upto(length - 1) do |j|
        # increase count if i-th and j-th element have different order
        count += 1 if (self[i] <=> self[j]) != (other[i] <=> other[j])
      end
    end
    return count
  end
end
l1 = [1, 2, 3, 4]
l2 = [1, 3, 2, 4]
# try an example (prints 1)
puts l1.dist(l2)
The distance between two arrays of length n can be between 0 (they are the same) and n*(n-1)/2, the number of index pairs (reversing the first array one gets the second). If you prefer to have distances always between 0 and 1 to be able to compare distances of pairs of arrays of different length, just divide by n*(n-1)/2.
A disadvantage of this algorithm is its running time of n^2. It also assumes that the arrays don't have duplicate entries, but it could be adapted.
A remark about the code line "count += 1 if ...": the count is increased only if either the i-th element of the first list is smaller than its j-th element and the i-th element of the second list is bigger than its j-th element or vice versa (meaning that the i-th element of the first list is bigger than its j-th element and the i-th element of the second list is smaller than its j-th element). In short: (l1[i] < l1[j] and l2[i] > l2[j]) or (l1[i] > l1[j] and l2[i] < l2[j])
If one has two orders one should look at two important ranking correlation coefficients:
Spearman's rank correlation coefficient: https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
This is almost the same as jamesh's answer, but scaled to the range -1 to 1.
It is defined as:
1 - ( 6 * sum_of_squared_distances ) / ( n_samples * (n_samples**2 - 1) )
Kendalls tau: https://nl.wikipedia.org/wiki/Kendalls_tau
When using python one could use:
from scipy import stats
order1 = [ 1, 2, 3, 4]
order2 = [ 1, 3, 2, 4]
print(stats.spearmanr(order1, order2)[0])
# prints approximately 0.8
print(stats.kendalltau(order1, order2)[0])
# prints approximately 0.6667
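If SciPy is not available, both coefficients can be computed directly from their definitions. The sketch below assumes the two inputs are rankings of the same n items without ties. The function names are illustrative and are not SciPy's API.

```python
from itertools import combinations


def spearman_rho(order1, order2):
    """Spearman's rho from the sum of squared rank differences (no ties assumed)."""
    n = len(order1)
    squared = sum((a - b) ** 2 for a, b in zip(order1, order2))
    return 1 - 6 * squared / (n * (n * n - 1))


def kendall_tau(order1, order2):
    """Kendall's tau: (concordant - discordant pairs) / total pairs (no ties assumed)."""
    n = len(order1)
    score = 0
    for i, j in combinations(range(n), 2):
        product = (order1[i] - order1[j]) * (order2[i] - order2[j])
        score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


print(spearman_rho([1, 2, 3, 4], [1, 3, 2, 4]))
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```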
If anyone is using the R language, I've implemented a function that computes the Spearman rank correlation coefficient using the method described above by @bubake:
get_spearman_coef <- function(objectA, objectB) {
  #getting the spearman rho rank test
  spearman_data <- data.frame(listA = objectA, listB = objectB)
  spearman_data$rankA <- 1:nrow(spearman_data)
  rankB <- c()
  for (index_valueA in 1:nrow(spearman_data)) {
    for (index_valueB in 1:nrow(spearman_data)) {
      if (spearman_data$listA[index_valueA] == spearman_data$listB[index_valueB]) {
        rankB <- append(rankB, index_valueB)
      }
    }
  }
  spearman_data$rankB <- rankB
  spearman_data$distance <-(spearman_data$rankA - spearman_data$rankB)**2
  spearman <- 1 - ( (6 * sum(spearman_data$distance)) / (nrow(spearman_data) * ( nrow(spearman_data)**2 -1) ) )
  print(paste("spearman's rank correlation coefficient"))
  return( spearman)
}
results :
get_spearman_coef(c("a","b","c","d","e"), c("a","b","c","d","e"))
spearman's rank correlation coefficient: 1
get_spearman_coef(c("a","b","c","d","e"), c("b","a","d","c","e"))
spearman's rank correlation coefficient: 0.8

Resources