Related
I need an algorithm that produces a partition of the number n into k parts with the added restrictions that each element of the partition must be between a and b. Ideally, all possible partitions satisfying the restrictions should be equally likely. Partitions are considered the same if they have the same elements in different order.
For example, with n=10, k=3, a=2, b=4 one has only {4,4,2} and {4,3,3} as possible outcomes.
Is there a standard algorithm for such a problem? One can assume that at least one partition satisfying the restrictions always exists.
You can implement this as a recursive algorithm. Basically, the recurrence is like this:
if k == 1 and a <= n <= b, then the only partition is [n], otherwise none
otherwise, combine all the elements x from a to b with all the partitions for n-x, k-1
to prevent duplicates, also substitute the lower bound a with x
Here's some Python (aka executable pseudo-code):
def partitions(n, k, a, b):
if k == 1 and a <= n <= b:
yield [n]
elif n > 0 and k > 0:
for x in range(a, b+1):
for p in partitions(n-x, k-1, x, b):
yield [x] + p
print(list(partitions(10, 3, 2, 4)))
# [[2, 4, 4], [3, 3, 4]]
This could be further improved by checking (k-1)*a and (k-1)*b for the lower and upper bounds for the remaining elements, respectively, and restricting the range for x accordingly:
min_x = max(a, n - (k-1) * b)
max_x = min(b, n - (k-1) * a)
for x in range(min_x, max_x+1):
For partitions(110, 12, 3, 12) with 3,157 solutions, this reduces the number of recursive calls from 638,679 down to 24,135.
Here's a sampling algorithm that uses conditional probability.
import collections
import random
countmemo = {}
def count(n, k, a, b):
assert n >= 0
assert k >= 0
assert a >= 0
assert b >= 0
if k == 0:
return 1 if n == 0 else 0
key = (n, k, a, b)
if key not in countmemo:
countmemo[key] = sum(
count(n - c, k - 1, a, c) for c in range(a, min(n, b) + 1))
return countmemo[key]
def sample(n, k, a, b):
partition = []
x = random.randrange(count(n, k, a, b))
while k > 0:
for c in range(a, min(n, b) + 1):
y = count(n - c, k - 1, a, c)
if x < y:
partition.append(c)
n -= c
k -= 1
b = c
break
x -= y
else:
assert False
return partition
def test():
print(collections.Counter(
tuple(sample(20, 6, 2, 5)) for i in range(10000)))
if __name__ == '__main__':
test()
If k and b - a are not too big you can try a randomized depth-first search:
import random
def restricted_partition_rec(n, k, min, max):
if k <= 0 or n < min:
return []
ps = list(range(min, max + 1))
random.shuffle(ps)
for p in ps:
if p > n:
continue
elif p < n:
subp = restricted_partition(n - p, k - 1, min, max)
if subp:
return [p] + subp
elif k == 1:
return [p]
return []
def restricted_partition(n, k, min, max):
return sorted(restricted_partition_rec(n, k, min, max), reverse=True)
print(restricted_partition(10, 3, 2, 4))
>>>
[4, 4, 2]
Although I'm not sure if all the partitions have exactly the same probability in this case.
We have K different sets of numbers. We have to choose a number from each set, so that the difference between the higher and the lower number is the minimum.
Any ideas?
Something like this (written in Haskell)?
import Data.List (minimum, maximum, minimumBy)
minDiff (x:xs) = comb (head x) (diff $ matches (head x)) x where
lenxs = length xs
diff m = maximum m - minimum m
matches y = minimumBy (\a b -> compare (diff a) (diff b)) $ p [] 0 where
md = map (minimumBy (\a b -> compare (abs (a - y)) (abs (b - y)))) xs
mds = [m | m <- foldl (\b a -> filter (\z -> abs (z - y) == abs (y - md!!a)) (xs!!a) : b) [] [0..lenxs - 1]]
p result index
| index == lenxs = [y:result]
| otherwise = do
p' <- mds!!index
p (p':result) (index + 1)
comb result difference [] = matches result
comb result difference (z:zs) =
let diff' = diff (matches z)
in if diff' < difference
then comb z diff' zs
else comb result difference zs
OUTPUT:
*Main> minDiff [[1,3,5,9,10],[2,4,6,8],[7,11,12,13]]
[5,6,7]
Evaluating the following integral should be non-zero, and mathematica correctly gives a non-zero result
Integrate[ Cos[ (Pi * x)/2 ]^2 * Cos[ (3*Pi*x)/2 ]^2, {x, -1, 1}]
However, attempting a more general integral:
FullSimplify[
Integrate[Cos[(Pi x)/2]^2 Cos[((2 n + 1) Pi x)/2] Cos[((2 m + 1) Pi x)/2],
{x, -1, 1}],
Element[{m, n}, Integers]]
yields zero, which is definitely not true for m = n = 1
I'd expect a conditional expression. Is it possible to "tell" mathematica about my constraints on m and n before the integral is evaluated so that it handles the special cases properly?
While I'm late to the party, no one has given a complete solution, thus far.
Sometimes, it pays to understand the integrand better before you integrate. Consider,
ef = TrigReduce[
Cos[(Pi x)/2]^2 Cos[((2 n + 1) Pi x)/2] Cos[((2 m + 1) Pi x)/2]]/.
Cos[a_] :> Cos[ Simplify[a, Element[{m,n}, Integers] ] ]
which returns
(2 Cos[(m - n) Pi x] + Cos[(1 + m - n) Pi x] + Cos[(1 - m + n) Pi x] +
Cos[(m + n) Pi x] + 2 Cos[(1 + m + n) Pi x] + Cos[(2 + m + n) Pi x] )/8
where each term has the form Cos[q Pi x] with integral q. Now, there are two cases to consider when integrating Cos[q Pi x] over -1 to 1 (where q is integral): q == 0 and q != 0.
Case q = 0: This is a special case that Mathematica misses in the general result, as it implies a constant integrand. (I'll often miss it, also, when doing this by hand, so Mathematica isn't entirely to blame.) So, the integral is 2, in this case.
Strictly speaking, this isn't true. When told to integrate Cos[ q Pi x ] over -1 < x < 1, Mathematica returns
2 Sin[ Pi q ]/( Pi q )
which is 0 except when q == 0. At that point, the function is undefined in the strict sense, but Limit[Sin[x]/x, q -> 0] == 1. As the singularity at q == 0 is removable, the integral is 2 when q -> 0. So, Mathematica does not miss it, it is just in a form not immediately recognized.
Case q != 0: Since Cos[Pi x] is periodic with period 2, an integral of Cos[q Pi x] from x == -1 to x == 1 will always be over q periods. In other words,
Integrate[ Cos[q Pi x], {x, -1, 1},
Assumptions -> (Element[ q, Integers ] && q != 0) ] == 0
Taken together, this means
Integrate[ Cos[q Pi x], {x, -1, 1}, Assumptions -> Element[ q, Integers ] ] ==
Piecewise[{{ q == 0, 2 }, { 0, q!=0 }}]
Using this, we can integrate the expanded form of the integrand via
intef = ef /. Cos[q_ Pi x] :> Piecewise[{{2, q == 0}, {0, q != 0}}] //
PiecewiseExpand
which admits non-integral solutions. To clean that up, we need to reduce the conditions to only those that have integral solutions, and we might as well simplify as we go:
(Piecewise[{#1,
LogicalExpand[Reduce[#2 , {m, n}, Integers]] //
Simplify[#] &} & ### #1, #2] & ## intef) /. C[1] -> m
\begin{Edit}
To limit confusion, internally Piecewise has the structure
{ { { value, condition } .. }, default }
In using Apply (##), the condition list is the first parameter and the default is the second. To process this, I need to simplify the condition for each value, so then I use the second short form of Apply (###) on the condition list so that for each value-condition pair I get
{ value, simplified condition }
The simplification process uses Reduce to restrict the conditions to integers, LogicalExpand to help eliminate redundancy, and Simplify to limit the number of terms. Reduce internally uses the arbitrary constant, C[1], which it sets as C[1] == m, so we set C[1] back to m to complete the simplification
\end{Edit}
which gives
Piecewise[{
{3/4, (1 + n == 0 || n == 0) && (1 + m == 0 || m == 0)},
{1/2, Element[m, Integers] &&
(n == m || (1 + m + n == 0 && (m <= -2 || m >= 1)))},
{1/4, (n == 1 + m || (1 + n == m && (m <= -1 || m >= 1)) ||
(m + n == 0 && (m >= 1 || m <= 0)) ||
(2 + m + n == 0 && (m <= -1 || m >= 0))) &&
Element[m, Integers]},
{0, True}
}
as the complete solution.
Another Edit: I should point out that both the 1/2 and 1/4 cases include the values for m and n in the 3/4 case. It appears that the 3/4 case may be the intersection of the other two, and, hence, their sum. (I have not done the calc out, but I strongly suspect it is true.) Piecewise evaluates the conditions in order (I think), so there is no chance of getting this incorrect.
Edit, again: The simplification of the Piecewise object is not as efficient as it could be. At issue is the placement of the replacement rule C[1] -> m. It happens to late in the process for Simplify to make use of it. But, if it is brought inside the LogicalExpand and assumptions are added to Simplify
(Piecewise[{#1,
LogicalExpand[Reduce[#2 , {m, n}, Integers] /. C[1] -> m] //
Simplify[#, {m, n} \[Element] Integers] &} & ### #1, #2] & ## intef)
then a much cleaner result is produce
Piecewise[{
{3/4, -2 < m < 1 && -2 < n < 1},
{1/2, (1 + m + n == 0 && (m >= 1 || m <= -2)) || m == n},
{1/4, 2 + m + n == 0 || (m == 1 + n && m != 0) || m + n == 0 || 1 + m == n},
{0, True}
}]
Not always zero ...
k = Integrate[
Cos[(Pi x)/2]^2 Cos[((2 (n) + 1) Pi x)/2] Cos[((2 m + 1) Pi x)/ 2],
{x, -1, 1}, Assumptions -> Element[{m, n}, Integers]];
(*Let's find the zeroes of the denominator *)
d = Denominator[k];
s = Solve[d == 0, {m, n}]
(*The above integral is indeterminate at those zeroes, so let's compute
the integral again there (a Limit[] could also do the work) *)
denZ = Integrate[
Cos[(Pi x)/2]^2 Cos[((2 (n) + 1) Pi x)/2] Cos[((2 m + 1) Pi x)/ 2] /.s,
{x, -1, 1}, Assumptions -> Element[{m, n}, Integers]];
(* All possible results are generated with m=1 *)
denZ /. m -> 1
(*
{1/4, 1/2, 1/4, 1/4, 1/2, 1/4}
*)
Visualizing those cases:
Plot[Cos[(Pi x)/2]^2 Cos[((2 (n) + 1) Pi x)/2] Cos[((2 m + 1) Pi x)/2]
/. s /. m -> 1, {x, -1, 1}]
Compare with a zero result integral one:
Plot[Cos[(Pi x)/2]^2 Cos[((2 (n) + 1) Pi x)/2] Cos[((2 m + 1) Pi x)/ 2]
/. {m -> 1, n -> 4}, {x, -1, 1}]
If you just drop the whole FullSimplify part, mathematica does the integration neatly for you.
Integrate[
Cos[(Pi x)/2]^2 Cos[((2 n + 1) Pi x)/2] Cos[((2 m + 1) Pi x)/
2], {x, -1, 1}]
To include the condition that m and n are integers, it's better to use the Assumptions option in Integrate.
Integrate[
Cos[(Pi x)/2]^2 Cos[((2 n + 1) Pi x)/2] Cos[((2 m + 1) Pi x)/
2], {x, -1, 1}, Assumptions -> Element[{m, n}, Integers]]
Lets use some conclusive conditions about the two integers m=n||m!=n.
Assuming[{(n \[Element] Integers && m \[Element] Integers && m == n)},
Integrate[Cos[(Pi x)/2]^2 Cos[((2 n + 1) Pi x)/2] Cos[((2 m + 1) Pi x)/2],
{x, -1, 1}]]
The answer for this case is 1/2. For the other case it is
Assuming[{(n \[Element] Integers && m \[Element] Integers && m != n)},
Integrate[
Cos[(Pi x)/2]^2 Cos[((2 n + 1) Pi x)/2] Cos[((2 m + 1) Pi x)/
2], {x, -1, 1}]]
and the answer is 0.
However I am amazed to see that if we add this two conditions as an "either or stuff", Mathematica returns one zero after integration. I mean in case of the following I am getting only zero but not ``1/2||0`.
Assuming[{(n \[Element] Integers && m \[Element] Integers &&
m == n) || (n \[Element] Integers && m \[Element] Integers &&
m != n)},
Integrate[
Cos[(Pi x)/2]^2 Cos[((2 n + 1) Pi x)/2] Cos[((2 m + 1) Pi x)/
2], {x, -1, 1}]]
By the way we can see the conditions exclusively where this integral becomes Indeterminate.
res = Integrate[
Cos[(Pi x)/2]^2 Cos[((2 n + 1) Pi x)/2] Cos[((2 m + 1) Pi x)/
2], {x, -1, 1}] // Simplify
The output is here.
Now lets see all the relations m and n can have to make the Integral bad!
BadPart = (res*4 Pi);
Flatten#(Solve[(Denominator[#] == 0), m] & /#
Table[BadPart[[i]], {i, 1, Length#BadPart}] /.
Rule -> Equal) // TableForm
So these are the special cases which as Sjoerd mentioned are having infinite instances.
BR
Let's say, we have a list/an array of positive integers x1, x2, ... , xn.
We can do a join operation on this sequence, that means that we can replace two elements that are next to each other with one element, which is sum of these elements. For example:
-> array/list: [1;2;3;4;5;6]
we can join 2 and 3, and replace them with 5;
we can join 5 and 6, and replace them with 11;
we cannot join 2 and 4;
we cannot join 1 and 3 etc.
Main problem is to find minimum join operations for given sequence, after which this sequence will be sorted in increasing order.
Note: empty and one-element sequences are sorted in increasing order.
Basic examples:
for [4; 6; 5; 3; 9] solution is 1 (we join 5 and 3)
for [1; 3; 6; 5] solution is also 1 (we join 6 and 5)
What I am looking for, is an algorithm that solve this problem. It could be in pseudocode, C, C++, PHP, OCaml or similar (I mean: I would understand solution, if You wrote solution in one of these languages).
This is an ideal problem to solve using Dynamic Programming, and the recurrence described by #lijie is exactly the right approach, with a few minor tweaks to ensure all possibilities are considered. There are two key observations: (a) Any sequence of join operations results in a set of non-overlapping summed subsequences of the original vector, and (b) For the optimal join-sequence, if we look to the right of any summed subsequence (m...n), that portion is an optimal solution to the problem: "find an optimal join-sequence for the sub-vector (n+1)...N such that the resulting final sequence is sorted, and all elements are >= sum(m...n).
Implementing the recurrence directly would of course result in an exponential time algorithm, but a simple tweak using Dynamic Programming makes it O(N^2), because essentially all (m,n) pairs are considered once. An easy way to implement the recurrence using Dynamic Programming is to have a data-structure indexed by (m,n) that stores the results of f(m,n) once they are computed, so that the next time we invoke f(m,n), we can lookup the previously saved results. The following code does this using the R programming language. I am using the formulation where we want to find the min-number of joins to get a non-decreasing sequence. For those new to R, to test this code, simply download R from any mirror (Google "R Project"), fire it up, and paste the two function definitions (f and solve) into the console, and then solve any vector using "solve(c(...))" as in the examples below.
f <- function(m,n) {
name <- paste(m,n)
nCalls <<- nCalls + 1
# use <<- for global assignment
if( !is.null( Saved[[ name ]] ) ) {
# the solution for (m,n) has been cached, look it up
nCached <<- nCached + 1
return( Saved[[ name ]] )
}
N <- length(vec) # vec is global to this function
sum.mn <- -Inf
if(m >= 1)
sum.mn <- sum( vec[m:n] )
if(n == N) { # boundary case: the (m,n) range includes the last number
result <- list( num = 0, joins = list(), seq = c())
} else
{
bestNum <- Inf
bestJoins <- list()
bestSeq <- c()
for( k in (n+1):N ) {
sum.nk <- sum( vec[ (n+1):k ] )
if( sum.nk < sum.mn ) next
joinRest <- f( n+1, k )
numJoins <- joinRest$num + k-n-1
if( numJoins < bestNum ) {
bestNum <- numJoins
if( k == n+1 )
bestJoins <- joinRest$joins else
bestJoins <- c( list(c(n+1,k)), joinRest$joins )
bestSeq <- c( sum.nk, joinRest$seq)
}
}
result <- list( num = bestNum, joins = bestJoins, seq = bestSeq )
}
Saved[[ name ]] <<- result
result
}
solve <- function(input) {
vec <<- input
nCalls <<- 0
nCached <<- 0
Saved <<- c()
result <- f(0,0)
cat( 'Num calls to f = ', nCalls, ', Cached = ', nCached, '\n')
cat( 'Min joins = ', result$num, '\n')
cat( 'Opt summed subsequences: ')
cat( do.call( paste,
lapply(result$joins,
function(pair) paste(pair[1], pair[2], sep=':' ))),
'\n')
cat( 'Final Sequence: ', result$seq, '\n' )
}
Here are some sample runs:
> solve(c(2,8,2,2,9,12))
Num calls to f = 22 , Cached = 4
Min joins = 2
Opt summed subsequences: 2:3 4:5
Final Sequence: 2 10 11 12
> solve(c(1,1,1,1,1))
Num calls to f = 19 , Cached = 3
Min joins = 0
Opt summed subsequences:
Final Sequence: 1 1 1 1 1
> solve(c(4,3,10,11))
Num calls to f = 10 , Cached = 0
Min joins = 1
Opt summed subsequences: 1:2
Final Sequence: 7 10 11
> solve(c (2, 8, 2, 2, 8, 3, 8, 9, 9, 2, 9, 8, 8, 7, 4, 2, 7, 5, 9, 4, 6, 7, 4, 7, 3, 4, 7, 9, 1, 2, 5, 1, 8, 7, 3, 3, 6, 3, 8, 5, 6, 5))
Num calls to f = 3982 , Cached = 3225
Min joins = 30
Opt summed subsequences: 2:3 4:5 6:7 8:9 10:12 13:16 17:19 20:23 24:27 28:33 34:42
Final Sequence: 2 10 10 11 18 19 21 21 21 21 26 46
Note that the min number of joins for the sequence considered by #kotlinski is 30, not 32 or 33.
Greedy algorithm!
import Data.List (inits)
joinSequence :: (Num a, Ord a) => [a] -> Int
joinSequence (x:xs) = joinWithMin 0 x xs
where joinWithMin k _ [] = k
joinWithMin k x xs =
case dropWhile ((< x) . snd) $ zip [0..] $ scanl1 (+) xs
of (l, y):_ -> joinWithMin (k + l) y $ drop (l+1) xs
_ -> k + length xs
joinSequence _ = 0
At each step, grab more elements until their sum is not less than the last. If you run out of elements, just join all the ones that remain to the prior group.
That was wrong.
Combinatorial explosion!
joinSequence :: (Num a, Ord a) => [a] -> Int
joinSequence = joinWithMin 0 0
where joinWithMin k _ [] = k
joinWithMin k m xs =
case dropWhile ((< m) . snd) $ zip [0..] $ scanl1 (+) xs
of [] -> k + length xs
ys -> minimum [ joinWithMin (k+l) y $ drop (l+1) xs
| (l, y) <- ys ]
Just try every possible joining and take the minimum. I couldn't think of a smart heuristic to limit backtracking, but this should be O(n²) with dynamic programming, and O(2n) as written.
A dynamic programming approach:
Let the original array be a[i], 0 <= i < N.
Define f(m, n) to be the minimum number of joins needed to make a[n..N-1] sorted, such that all elements in the sorted sublist are > (or >=, if another variant is desired) the sum of a[m..n-1] (let the sum of an empty list to be -inf).
The base case is f(m, N) = 0 (the sublist is empty).
The recursion is f(m, n) = min_{n < k <= N s.t. sum(a[n..k-1]) > sum(a[m..n-1])} f(n, k) + k-n-1. If no values of k are suitable, then let f(m, n) = inf (anything >= N will also work, because there are at most N-1 joins).
Calculate f(m,n) in decreasing order of m and n.
Then, the desired answer is f(0,0).
EDIT
Oops this is basically ephemient's second answer, I believe, although I am not familiar enough with Haskell to know exactly what it is doing.
Some Haskell code:
sortJoin (a:b:c:xs)
| a <= b = a : sortJoin (b:c:xs)
| a+b <= c = a+b : sortJoin (c:xs)
| otherwise = sortJoin (a:b+c:xs)
sortJoin (a:b:[]) = if a <= b then [a,b] else [a+b]
sortJoin a#_ = a
edits xs = length xs - length (sortJoin xs)
UPDATE: Made this work with test = [2, 8, 2, 2, 8, 3, 8, 9, 9, 2, 9, 8, 8, 7, 4, 2, 7, 5, 9, 4, 6, 7, 4, 7, 3, 4, 7, 9, 1, 2, 5, 1, 8, 7, 3, 3, 6, 3, 8, 5, 6, 5]
...now we get:
> sortJoin test
[2,8,12,20,20,23,27,28,31,55]
> edits test
32
Hopefully keeping it simple. Here's some pseudo-code that's exponential time.
Function "join" (list, max-join-count, join-count) ->
Fail if join-count is greater than max-join-count.
If the list looks sorted return join-count.
For Each number In List
Recur (list with current and next number joined, max-join-count, join-count + 1)
Function "best-join" (list) ->
max-join-count = 0
while not join (list, max-join-count++)
Here's an implementation on Clojure:
(defn join-ahead [f i v]
(concat (take i v)
[(f (nth v i) (nth v (inc i)))]
(drop (+ 2 i) v)))
(defn sort-by-joining
"Sort a list by joining neighboring elements with `+'"
([v max-join-count join-count]
(if (or (nil? max-join-count)
(<= join-count max-join-count))
(if (or (empty? v)
(= v (sort v)))
{:vector v :join-count join-count}
(loop [i 0]
(when (< (inc i) (count v))
(let [r (sort-by-joining (join-ahead + i v)
max-join-count
(inc join-count))]
(or r (recur (inc i)))))))))
([v max-join-count]
(sort-by-joining v max-join-count 0))
([v]
(sort-by-joining v nil 0)))
(defn fewest-joins [v]
(loop [i 0]
(if (sort-by-joining v i)
i
(recur (inc i)))))
(deftest test-fewest-joins
(is (= 0 (fewest-joins nil)))
(is (= 1 (fewest-joins [4 6 5 3 9])))
(is (= 6 (fewest-joins [1 9 22 90 1 1 1 32 78 13 1]))))
This is pchalasani code in F# with some modifications. The memoization is similar, I added a sumRange function generator for sums in O(1) time and moved the start position to f 1 0 to skip checking for n = 0 in minJoins.
let minJoins (input: int array) =
let length = input.Length
let sum = sumRange input
let rec f = memoize2 (fun m n ->
if n = length then
0
else
let sum_mn = sum m n
{n + 1 .. length}
|> Seq.filter (fun k -> sum (n + 1) k >= sum_mn)
|> Seq.map (fun k -> f (n + 1) k + k-n-1)
|> Seq.append {length .. length}
|> Seq.min
)
f 1 0
Full code.
open System.Collections.Generic
// standard memoization
let memoize2 f =
let cache = new Dictionary<_, _>()
(fun x1 x2 ->
match cache.TryGetValue((x1, x2)) with
| true, y -> y
| _ ->
let v = f x1 x2
cache.Add((x1, x2), v)
v)
// returns a function that takes two integers n,m and returns sum(array[n:m])
let sumRange (array : int array) =
let forward = Array.create (array.Length + 1) 0
let mutable total = 0
for i in 0 .. array.Length - 1 do
total <- total + array.[i]
forward.[i + 1] <- total
(fun i j -> forward.[j] - forward.[i - 1])
// min joins to sort an array ascending
let minJoins (input: int array) =
let length = input.Length
let sum = sumRange input
let rec f = memoize2 (fun m n ->
if n = length then
0
else
let sum_mn = sum m n
{n + 1 .. length}
|> Seq.filter (fun k -> sum (n + 1) k >= sum_mn)
|> Seq.map (fun k -> f (n + 1) k + k-n-1)
|> Seq.append {length .. length} // if nothing passed the filter return length as the min
|> Seq.min
)
f 1 0
let input = [|2;8;2;2;8;3;8;9;9;2;9;8;8;7;4;2;7;5;9;4;6;7;4;7;3;4;7;9;1;2;5;1;8;7;3;3;6;3;8;5;6;5|]
let output = minJoins input
printfn "%A" output
// outputs 30
I have a sorted list of inputs:
let x = [2; 4; 6; 8; 8; 10; 12]
let y = [-8; -7; 2; 2; 3; 4; 4; 8; 8; 8;]
I want to write a function which behaves similar to an SQL INNER JOIN. In other words, I want to return the cartesian product of x and y which contains only items shared in both lists:
join(x, y) = [2; 2; 4; 4; 8; 8; 8; 8; 8; 8]
I've written a naive version as follows:
let join x y =
[for x' in x do
for y' in y do
yield (x', y')]
|> List.choose (fun (x, y) -> if x = y then Some x else None)
It works, but this runs in O(x.length * y.length). Since both my lists are sorted, I think its possible to get the results I want in O(min(x.length, y.length)).
How can I find common elements in two sorted lists in linear time?
I can't help you with the F#, but the basic idea is to use two indices, one for each list. Choose the item in each list at the current index for that list. If the two items are the same value, then add that value to your result set and increment both indices. If the items have different values, increment just the index for the list containing the lesser of the two values. Repeat the comparison until one of your lists is empty and then return the result set.
O(min(n,m)) time is impossible: Take two lists [x;x;...;x;y] and [x;x;...;x;z]. You have to browse both lists till the end to compare y and z.
Even O(n+m) is impossible. Take
[1,1,...,1] - n times
and
[1,1,...,1] - m times
Then the resulting list should have n*m elements. You need at least O(n m) (correctly Omega(n m)) time do create such list.
Without cartesian product (simple merge), this is quite easy. Ocaml code (I don't know F#, should be reasonably close; compiled but not tested):
let rec merge a b = match (a,b) with
([], xs) -> xs
| (xs, []) -> xs
| (x::xs, y::ys) -> if x <= y then x::(merge xs (y::ys))
else y::(merge (x::xs) (y::ys));;
(Edit: I was too late)
So your code in O(n m) is the best possible in worst case. However, IIUIC it performs always n*m operations, which is not optimal.
My approach would be
1) write a function
group : 'a list -> ('a * int) list
that counts the number of same elements:
group [1,1,1,1,1,2,2,3] == [(1,5);(2,2);(3,1)]
2) use it to merge both lists using similar code as before (there you can multiply those coefficients)
3) write a function
ungroup : ('a * int) list -> 'a list
and compose those three.
This has complexity O(n+m+x) where x is the length of resulting list. This is the best possible up to constant.
Edit: Here you go:
let group x =
let rec group2 l m =
match l with
| [] -> []
| a1::a2::r when a1 == a2 -> group2 (a2::r) (m+1)
| x::r -> (x, m+1)::(group2 r 0)
in group2 x 0;;
let rec merge a b = match (a,b) with
([], xs) -> []
| (xs, []) -> []
| ((x, xm)::xs, (y, ym)::ys) -> if x == y then (x, xm*ym)::(merge xs ys)
else if x < y then merge xs ((y, ym)::ys)
else merge ((x, xm)::xs) ys;;
let rec ungroup a =
match a with
[] -> []
| (x, 0)::l -> ungroup l
| (x, m)::l -> x::(ungroup ((x,m-1)::l));;
let crossjoin x y = ungroup (merge (group x) (group y));;
# crossjoin [2; 4; 6; 8; 8; 10; 12] [-7; -8; 2; 2; 3; 4; 4; 8; 8; 8;];;
- : int list = [2; 2; 4; 4; 8; 8; 8; 8; 8; 8]
The following is also tail-recursive (so far as I can tell), but the output list is consequently reversed:
let rec merge xs ys acc =
match (xs, ys) with
| ((x :: xt), (y :: yt)) ->
if x = y then
let rec count_and_remove_leading zs acc =
match zs with
| z :: zt when z = x -> count_and_remove_leading zt (acc + 1)
| _ -> (acc, zs)
let rec replicate_and_prepend zs n =
if n = 0 then
zs
else
replicate_and_prepend (x :: zs) (n - 1)
let xn, xt = count_and_remove_leading xs 0
let yn, yt = count_and_remove_leading ys 0
merge xt yt (replicate_and_prepend acc (xn * yn))
else if x < y then
merge xt ys acc
else
merge xs yt acc
| _ -> acc
let xs = [2; 4; 6; 8; 8; 10; 12]
let ys = [-7; -8; 2; 2; 3; 4; 4; 8; 8; 8;]
printf "%A" (merge xs ys [])
Output:
[8; 8; 8; 8; 8; 8; 4; 4; 2; 2]
Note that, as sdcvvc says in his answer, this is still O(x.length * y.length) in worst case, simply because the edge case of two lists of repeating identical elements would require the creation of x.length * y.length values in the output list, which is by itself inherently an O(m*n) operation.
I don't know F#, however I suppose it has arrays and binary-search implementation over arrays(can be implemented also)
choose smallest list
copy it to array (for O(1) random access, if F# already gives you that, you can skip this step)
go over big list and using binary search find in small array elements from big list,
if found add it to result list
Complexity O(min + max*log min), where min = sizeof small list and max - sizeof(big list)
I don't know F#, but I can provide a functional Haskell implementation, based on the algorithm outlined by tvanfosson (further specified by Lasse V. Karlsen).
import Data.List
join :: (Ord a) => [a] -> [a] -> [a]
join l r = gjoin (group l) (group r)
where
gjoin [] _ = []
gjoin _ [] = []
gjoin l#(lh#(x:_):xs) r#(rh#(y:_):ys)
| x == y = replicate (length lh * length rh) x ++ gjoin xs ys
| x < y = gjoin xs r
| otherwise = gjoin l ys
main :: IO ()
main = print $ join [2, 4, 6, 8, 8, 10, 12] [-7, -8, 2, 2, 3, 4, 4, 8, 8, 8]
This prints [2,2,4,4,8,8,8,8,8,8]. I case you're not familiar with Haskell, some references to the documentation:
group
length
replicate
I think it can be done simply by using hash tables. The hash tables store the frequencies of the elements in each list. These are then used to create a list where the frequency of each element e is frequency of e in X multiplied by the frequency of e in Y. This has a complexity of O(n+m).
(EDIT: Just noticed that this can be worst case O(n^2), after reading comments on other posts. Something very much like this has already been posted. Sorry for the duplicate. I'm keeping the post in case the code helps.)
I don't know F#, so I'm attaching Python code. I'm hoping the code is readable enough to be converted to F# easily.
def join(x,y):
x_count=dict()
y_count=dict()
for elem in x:
x_count[elem]=x_count.get(elem,0)+1
for elem in y:
y_count[elem]=y_count.get(elem,0)+1
answer=[]
for elem in x_count:
if elem in y_count:
answer.extend( [elem]*(x_count[elem]*y_count[elem] ) )
return answer
A=[2, 4, 6, 8, 8, 10, 12]
B=[-8, -7, 2, 2, 3, 4, 4, 8, 8, 8]
print join(A,B)
The problem with what he wants is that it obviously has to re-traverse the list.
In order to get 8,8,8 to show up twice, the function has to loop thru the second list a bit. Worst case scenario (two identical lists) will still yield O(x * y)
As a note, this is not utilizing external functions that loop on their own.
for (int i = 0; i < shorterList.Length; i++)
{
if (shorterList[i] > longerList[longerList.Length - 1])
break;
for (int j = i; j < longerList.Length && longerList[j] <= shorterList[i]; j++)
{
if (shorterList[i] == longerList[j])
retList.Add(shorterList[i]);
}
}
I think this is O(n) on the intersect/join code, though the full thing traverses each list twice:
// list unique elements and their multiplicity (also reverses sorting)
// e.g. pack y = [(8, 3); (4, 2); (3, 1); (2, 2); (-8, 1); (-7, 1)]
// we assume xs is ordered
let pack xs = Seq.fold (fun acc x ->
match acc with
| (y,ny) :: tl -> if y=x then (x,ny+1) :: tl else (x,1) :: acc
| [] -> [(x,1)]) [] xs
let unpack px = [ for (x,nx) in px do for i in 1 .. nx do yield x ]
// for lists of (x,nx) and (y,ny), returns list of (x,nx*ny) when x=y
// assumes inputs are sorted descending (from pack function)
// and returns results sorted ascending
let intersect_mult xs ys =
let rec aux rx ry acc =
match (rx,ry) with
| (x,nx)::xtl, (y,ny)::ytl ->
if x = y then aux xtl ytl ((x,nx*ny) :: acc)
elif x < y then aux rx ytl acc
else aux xtl ry acc
| _,_ -> acc
aux xs ys []
let inner_join x y = intersect_mult (pack x) (pack y) |> unpack
Now we test it on your sample data
let x = [2; 4; 6; 8; 8; 10; 12]
let y = [-7; -8; 2; 2; 3; 4; 4; 8; 8; 8;]
> inner_join x y;;
val it : int list = [2; 2; 4; 4; 8; 8; 8; 8; 8; 8]
EDIT: I just realized this is the same idea as the earlier answer by sdcvvc (after the edit).
You can't get O(min(x.length, y.length)), because the output may be greater than that. Supppose all elements of x and y are equal, for instance. Then the output size is the product of the size of x and y, which gives a lower bound to the efficiency of the algorithm.
Here's the algorithm in F#. It is not tail-recursive, which can be easily fixed. The trick is doing mutual recursion. Also note that I may invert the order of the list given to prod to avoid unnecessary work.
let rec prod xs ys =
match xs with
| [] -> []
| z :: zs -> reps xs ys ys
and reps xs ys zs =
match zs with
| [] -> []
| w :: ws -> if xs.Head = w then w :: reps xs ys ws
else if xs.Head > w then reps xs ys ws
else match ys with
| [] -> []
| y :: yss -> if y < xs.Head then prod ys xs.Tail else prod xs.Tail ys
The original algorithm in Scala:
def prod(x: List[Int], y: List[Int]): List[Int] = x match {
case Nil => Nil
case z :: zs => reps(x, y, y)
}
def reps(x: List[Int], y: List[Int], z: List[Int]): List[Int] = z match {
case w :: ws if x.head == w => w :: reps(x, y, ws)
case w :: ws if x.head > w => reps(x, y, ws)
case _ => y match {
case Nil => Nil
case y1 :: ys if y1 < x.head => prod(y, x.tail)
case _ => prod(x.tail, y)
}
}