Let's say we have a list (or array) of positive integers x1, x2, ..., xn.
We can perform a join operation on this sequence: we replace two adjacent elements with a single element equal to their sum. For example:
-> array/list: [1;2;3;4;5;6]
we can join 2 and 3, and replace them with 5;
we can join 5 and 6, and replace them with 11;
we cannot join 2 and 4;
we cannot join 1 and 3 etc.
The main problem is to find the minimum number of join operations after which the given sequence is sorted in increasing order.
Note: empty and one-element sequences are sorted in increasing order.
Basic examples:
for [4; 6; 5; 3; 9] solution is 1 (we join 5 and 3)
for [1; 3; 6; 5] solution is also 1 (we join 6 and 5)
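To make the definition concrete, here is a brute-force sketch in Python (min_joins_brute is a hypothetical name, not from any answer). It assumes "sorted" means non-decreasing, as in the R answer below, and tries all 2^(n-1) ways of cutting the sequence into contiguous groups; a cutting into g groups costs n - g joins. It is only practical for short inputs, but it is handy for checking faster solutions.

```python
from itertools import combinations

def min_joins_brute(xs):
    """Fewest joins so that xs becomes non-decreasing, by exhaustive search."""
    n = len(xs)
    if n <= 1:
        return 0
    best = n - 1                       # one group (join everything) always works
    for g in range(2, n + 1):          # g = number of groups after joining
        for cuts in combinations(range(1, n), g - 1):
            bounds = (0,) + cuts + (n,)
            sums = [sum(xs[i:j]) for i, j in zip(bounds, bounds[1:])]
            if all(s <= t for s, t in zip(sums, sums[1:])):
                best = min(best, n - g)   # g groups cost n - g joins
    return best

print(min_joins_brute([4, 6, 5, 3, 9]))  # 1
print(min_joins_brute([1, 3, 6, 5]))     # 1
```

For a strictly increasing result, change `s <= t` to `s < t`.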
What I am looking for is an algorithm that solves this problem. It could be in pseudocode, C, C++, PHP, OCaml or similar (I mean: I would understand the solution if you wrote it in one of these languages).
This is an ideal problem to solve using Dynamic Programming, and the recurrence described by @lijie is exactly the right approach, with a few minor tweaks to ensure all possibilities are considered. There are two key observations: (a) any sequence of join operations results in a set of non-overlapping summed subsequences of the original vector, and (b) for the optimal join-sequence, if we look to the right of any summed subsequence (m...n), that portion is an optimal solution to the problem: "find an optimal join-sequence for the sub-vector (n+1)...N such that the resulting final sequence is sorted, and all elements are >= sum(m...n)".
Implementing the recurrence directly would of course result in an exponential-time algorithm, but a simple tweak using Dynamic Programming makes it polynomial: there are only O(N^2) distinct (m,n) pairs, each f(m,n) is computed once, and each computation scans O(N) values of k, so the total is O(N^3) when range sums take O(1) (the code below recomputes them with sum(), which adds another factor of N). An easy way to implement the recurrence using Dynamic Programming is to have a data structure indexed by (m,n) that stores the result of f(m,n) once it is computed, so that the next time we invoke f(m,n) we can look up the previously saved result. The following code does this using the R programming language. I am using the formulation where we want to find the minimum number of joins to get a non-decreasing sequence. For those new to R: to test this code, download R from any mirror (search for "R Project"), start it, paste the two function definitions (f and solve) into the console, and then solve any vector using "solve(c(...))" as in the examples below.
f <- function(m,n) {
  name <- paste(m,n)
  nCalls <<- nCalls + 1
  # use <<- for global assignment
  if( !is.null( Saved[[ name ]] ) ) {
    # the solution for (m,n) has been cached, look it up
    nCached <<- nCached + 1
    return( Saved[[ name ]] )
  }
  N <- length(vec) # vec is global to this function
  sum.mn <- -Inf
  if(m >= 1)
    sum.mn <- sum( vec[m:n] )
  if(n == N) { # boundary case: the (m,n) range includes the last number
    result <- list( num = 0, joins = list(), seq = c())
  } else
  {
    bestNum <- Inf
    bestJoins <- list()
    bestSeq <- c()
    for( k in (n+1):N ) {
      sum.nk <- sum( vec[ (n+1):k ] )
      if( sum.nk < sum.mn ) next
      joinRest <- f( n+1, k )
      numJoins <- joinRest$num + k-n-1
      if( numJoins < bestNum ) {
        bestNum <- numJoins
        if( k == n+1 )
          bestJoins <- joinRest$joins else
          bestJoins <- c( list(c(n+1,k)), joinRest$joins )
        bestSeq <- c( sum.nk, joinRest$seq)
      }
    }
    result <- list( num = bestNum, joins = bestJoins, seq = bestSeq )
  }
  Saved[[ name ]] <<- result
  result
}
solve <- function(input) {
  vec <<- input
  nCalls <<- 0
  nCached <<- 0
  Saved <<- c()
  result <- f(0,0)
  cat( 'Num calls to f = ', nCalls, ', Cached = ', nCached, '\n')
  cat( 'Min joins = ', result$num, '\n')
  cat( 'Opt summed subsequences: ')
  cat( do.call( paste,
                lapply(result$joins,
                       function(pair) paste(pair[1], pair[2], sep=':' ))),
       '\n')
  cat( 'Final Sequence: ', result$seq, '\n' )
}
Here are some sample runs:
> solve(c(2,8,2,2,9,12))
Num calls to f = 22 , Cached = 4
Min joins = 2
Opt summed subsequences: 2:3 4:5
Final Sequence: 2 10 11 12
> solve(c(1,1,1,1,1))
Num calls to f = 19 , Cached = 3
Min joins = 0
Opt summed subsequences:
Final Sequence: 1 1 1 1 1
> solve(c(4,3,10,11))
Num calls to f = 10 , Cached = 0
Min joins = 1
Opt summed subsequences: 1:2
Final Sequence: 7 10 11
> solve(c (2, 8, 2, 2, 8, 3, 8, 9, 9, 2, 9, 8, 8, 7, 4, 2, 7, 5, 9, 4, 6, 7, 4, 7, 3, 4, 7, 9, 1, 2, 5, 1, 8, 7, 3, 3, 6, 3, 8, 5, 6, 5))
Num calls to f = 3982 , Cached = 3225
Min joins = 30
Opt summed subsequences: 2:3 4:5 6:7 8:9 10:12 13:16 17:19 20:23 24:27 28:33 34:42
Final Sequence: 2 10 10 11 18 19 21 21 21 21 26 46
Note that the min number of joins for the sequence considered by @kotlinski is 30, not 32 or 33.
Greedy algorithm!
import Data.List (inits)
joinSequence :: (Num a, Ord a) => [a] -> Int
joinSequence (x:xs) = joinWithMin 0 x xs
  where joinWithMin k _ [] = k
        joinWithMin k x xs =
          case dropWhile ((< x) . snd) $ zip [0..] $ scanl1 (+) xs
            of (l, y):_ -> joinWithMin (k + l) y $ drop (l+1) xs
               _ -> k + length xs
joinSequence _ = 0
At each step, grab more elements until their sum is not less than the previous group's sum. If you run out of elements, just join all the ones that remain to the prior group.
That was wrong.
Combinatorial explosion!
joinSequence :: (Num a, Ord a) => [a] -> Int
joinSequence = joinWithMin 0 0
  where joinWithMin k _ [] = k
        joinWithMin k m xs =
          case dropWhile ((< m) . snd) $ zip [0..] $ scanl1 (+) xs
            of [] -> k + length xs
               ys -> minimum [ joinWithMin (k+l) y $ drop (l+1) xs
                             | (l, y) <- ys ]
Just try every possible joining and take the minimum. I couldn't think of a smart heuristic to limit backtracking, but this should be O(n²) with dynamic programming, and O(2^n) as written.
A dynamic programming approach:
Let the original array be a[i], 0 <= i < N.
Define f(m, n) to be the minimum number of joins needed to make a[n..N-1] sorted, such that all elements in the sorted sublist are > (or >=, if another variant is desired) the sum of a[m..n-1] (let the sum of an empty list be -inf).
The base case is f(m, N) = 0 (the sublist is empty).
The recursion is f(m, n) = min_{n < k <= N s.t. sum(a[n..k-1]) > sum(a[m..n-1])} f(n, k) + k-n-1. If no values of k are suitable, then let f(m, n) = inf (anything >= N will also work, because there are at most N-1 joins).
Calculate f(m,n) in decreasing order of m and n.
Then, the desired answer is f(0,0).
EDIT
Oops this is basically ephemient's second answer, I believe, although I am not familiar enough with Haskell to know exactly what it is doing.
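For reference, here is a minimal bottom-up sketch of the f(m, n) recurrence above in Python. It assumes 0-based half-open ranges (so "the sum of a[m..n-1]" is sum(a[m:n])), a prefix-sum array so each range sum costs O(1), and a hypothetical strict flag to choose between the > and >= variants; min_joins is also a hypothetical name. The table has O(N^2) entries and each entry scans O(N) values of k, so this runs in O(N^3) time and O(N^2) space.

```python
from math import inf

def min_joins(a, strict=False):
    """Bottom-up f(m, n) table for the join-to-sort problem.

    f[m][n] = fewest joins that make a[n:] sorted with every resulting
    element >= sum(a[m:n]) (> if strict); the empty sum counts as -inf.
    """
    N = len(a)
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)

    def seg(i, j):
        # sum(a[i:j]) in O(1); -inf for an empty range (nothing to the left)
        return prefix[j] - prefix[i] if i < j else -inf

    f = [[inf] * (N + 1) for _ in range(N + 1)]
    for m in range(N + 1):
        f[m][N] = 0                      # base case: empty suffix
    for n in range(N - 1, -1, -1):       # f[m][n] needs f[n][k] for k > n
        for m in range(n + 1):
            prev = seg(m, n)
            best = inf
            for k in range(n + 1, N + 1):   # next group is a[n:k]
                s = seg(n, k)
                if s > prev or (not strict and s == prev):
                    best = min(best, f[n][k] + (k - n - 1))
            f[m][n] = best
    return f[0][0]

print(min_joins([4, 6, 5, 3, 9]))  # 1
```

Filling the table in decreasing order of n is enough, because f(m, n) only reads entries whose first index is n and whose second index is larger than n.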
Some Haskell code:
sortJoin (a:b:c:xs)
  | a <= b    = a : sortJoin (b:c:xs)
  | a+b <= c  = a+b : sortJoin (c:xs)
  | otherwise = sortJoin (a:b+c:xs)
sortJoin (a:b:[]) = if a <= b then [a,b] else [a+b]
sortJoin a@_ = a
edits xs = length xs - length (sortJoin xs)
UPDATE: Made this work with test = [2, 8, 2, 2, 8, 3, 8, 9, 9, 2, 9, 8, 8, 7, 4, 2, 7, 5, 9, 4, 6, 7, 4, 7, 3, 4, 7, 9, 1, 2, 5, 1, 8, 7, 3, 3, 6, 3, 8, 5, 6, 5]
...now we get:
> sortJoin test
[2,8,12,20,20,23,27,28,31,55]
> edits test
32
Hopefully keeping it simple. Here's some pseudo-code that's exponential time.
Function "join" (list, max-join-count, join-count) ->
    Fail if join-count is greater than max-join-count.
    If the list is sorted, return join-count.
    For each position i in the list except the last:
        result = join (list with elements i and i+1 joined, max-join-count, join-count + 1)
        If result did not fail, return result.
    Fail.
Function "best-join" (list) ->
    max-join-count = 0
    while join (list, max-join-count, 0) fails:
        max-join-count = max-join-count + 1
    return max-join-count
Here's an implementation in Clojure:
(require '[clojure.test :refer [deftest is]])
(defn join-ahead [f i v]
  (concat (take i v)
          [(f (nth v i) (nth v (inc i)))]
          (drop (+ 2 i) v)))
(defn sort-by-joining
  "Sort a list by joining neighboring elements with `+'"
  ([v max-join-count join-count]
     (if (or (nil? max-join-count)
             (<= join-count max-join-count))
       (if (or (empty? v)
               (= v (sort v)))
         {:vector v :join-count join-count}
         (loop [i 0]
           (when (< (inc i) (count v))
             (let [r (sort-by-joining (join-ahead + i v)
                                      max-join-count
                                      (inc join-count))]
               (or r (recur (inc i)))))))))
  ([v max-join-count]
     (sort-by-joining v max-join-count 0))
  ([v]
     (sort-by-joining v nil 0)))
(defn fewest-joins [v]
  (loop [i 0]
    (if (sort-by-joining v i)
      i
      (recur (inc i)))))
(deftest test-fewest-joins
  (is (= 0 (fewest-joins nil)))
  (is (= 1 (fewest-joins [4 6 5 3 9])))
  (is (= 6 (fewest-joins [1 9 22 90 1 1 1 32 78 13 1]))))
This is pchalasani's code ported to F# with some modifications. The memoization is similar; I added a sumRange function generator that returns range sums in O(1) time, and moved the start position to f 1 0 to skip checking for n = 0 in minJoins.
Full code.
open System.Collections.Generic
// standard memoization
let memoize2 f =
    let cache = new Dictionary<_, _>()
    (fun x1 x2 ->
        match cache.TryGetValue((x1, x2)) with
        | true, y -> y
        | _ ->
            let v = f x1 x2
            cache.Add((x1, x2), v)
            v)
// returns a function that takes 1-based indices i, j and returns array.[i-1] + ... + array.[j-1]
let sumRange (array : int array) =
    let forward = Array.create (array.Length + 1) 0
    let mutable total = 0
    for i in 0 .. array.Length - 1 do
        total <- total + array.[i]
        forward.[i + 1] <- total
    (fun i j -> forward.[j] - forward.[i - 1])
// min joins to sort an array ascending
let minJoins (input: int array) =
    let length = input.Length
    let sum = sumRange input
    let rec f = memoize2 (fun m n ->
        if n = length then
            0
        else
            let sum_mn = sum m n
            {n + 1 .. length}
            |> Seq.filter (fun k -> sum (n + 1) k >= sum_mn)
            |> Seq.map (fun k -> f (n + 1) k + k-n-1)
            |> Seq.append {length .. length} // if nothing passed the filter return length as the min
            |> Seq.min)
    f 1 0
let input = [|2;8;2;2;8;3;8;9;9;2;9;8;8;7;4;2;7;5;9;4;6;7;4;7;3;4;7;9;1;2;5;1;8;7;3;3;6;3;8;5;6;5|]
let output = minJoins input
printfn "%A" output
// outputs 30
Related
I'm quite new to Prolog and I am trying to write a predicate which gives the value of the nth prime number; it should look like nth_prime(N, Prime).
I have already written the predicate that checks whether a number is prime or not:
div(X, Y):- 0 is X mod Y.
div(X, Y):- X>Y+1, Y1 is Y+1, div(X, Y1).
prime(2):- true.
prime(X):- X<2, false.
prime(X):- not(div(X, 2)).
I don't understand what my next step is, or how I should determine which prime corresponds to N.
Your code is a bit unusual for Prolog, but (with the exception of prime(1), which it wrongly reports as prime) it works.
Here is a solution for your predicate:
nextprime(N,N):-
    prime(N),
    !.
nextprime(P, Prime):-
    PP is P+1,
    nextprime(PP,Prime).
nthprime(1, 2).
nthprime(N, Prime):-
    N>1,
    NN is N-1,
    nthprime(NN, PrevPrime),
    PP is PrevPrime+1,
    nextprime(PP, Prime).
?- nthprime(1,P).
P = 2 ;
false.
?- nthprime(2,P).
P = 3 ;
false.
?- nthprime(3,P).
P = 5 ;
false.
It works as follows: it is known that the first prime number is 2 (nthprime(1, 2).). For every larger N, get the previous prime number (nthprime(NN, PrevPrime)) and add 1 until you hit a prime number. The add-1 part is done through a helper predicate nextprime/2: for a given number P it checks whether P is prime. If yes, it returns this number; otherwise it calls itself for the next higher number (nextprime(PP,Prime)) and forwards the output. The ! is called a cut; it cuts the other choice branches, so once you hit a prime you cannot go back and try the other path.
To test it you can ask ?- nthprime(N,P). for a given N. Or, to check multiple answers at once, let's introduce a helper predicate nthprimeList/2 which calls nthprime/2 for every item in the first list and puts the "output" into a list:
nthprimeList([],[]).
nthprimeList([N|TN],[P|TP]):-
    nthprime(N,P),
    nthprimeList(TN,TP).
?- nthprimeList([1,2,3,4,5,6,7,8,9],[P1,P2,P3,P4,P5,P6,P7,P8,P9]).
P1 = 2,
P2 = 3,
P3 = 5,
P4 = 7,
P5 = 11,
P6 = 13,
P7 = 17,
P8 = 19,
P9 = 23;
false.
Using your definitions, we define the following to count up and test all numbers from 2 and up, one after another:
nth_prime(N, Prime):-
    nth_prime(N, Prime, 1, 2).          % 2 is the candidate for 1st prime
nth_prime(N, P, I, Q):-                 % Q is I-th prime candidate
    prime(Q)
    ->  ( I = N, P = Q
        ; I1 is I+1, Q1 is Q+1, nth_prime(N, P, I1, Q1)
        )
    ;   Q1 is Q+1, nth_prime(N, P, I, Q1).
Testing:
30 ?- nth_prime(N,P).
N = 1,
P = 2 ;
N = 2,
P = 3 ;
N = 3,
P = 5 ;
N = 4,
P = 7 ;
N = 5,
P = 11 .
31 ?- nth_prime(N,P), N>24.
N = 25,
P = 97 ;
N = 26,
P = 101 ;
N = 27,
P = 103 .
32 ?- nth_prime(N,P), N>99.
N = 100,
P = 541 ;
N = 101,
P = 547 ;
N = 102,
P = 557 .
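The same "count up and test" idea can be sketched in Python for comparison. This is an illustration, not a translation of either Prolog answer: is_prime and nth_prime are hypothetical names, the primality test only tries divisors up to the square root (the question's div/2 tries every divisor below X), and 0 and 1 are treated as non-prime, which avoids the prime(1) issue mentioned above.

```python
def is_prime(q):
    """Trial division by 2, 3, ... up to sqrt(q); 0 and 1 are not prime."""
    if q < 2:
        return False
    d = 2
    while d * d <= q:
        if q % d == 0:
            return False
        d += 1
    return True

def nth_prime(n):
    """Test 2, 3, 4, ... in turn, counting primes until the n-th one (n >= 1)."""
    count = 0
    candidate = 1
    while count < n:
        candidate += 1
        if is_prime(candidate):
            count += 1
    return candidate

print([nth_prime(i) for i in range(1, 10)])  # [2, 3, 5, 7, 11, 13, 17, 19, 23]
```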
I need an algorithm that produces a partition of the number n into k parts with the added restrictions that each element of the partition must be between a and b. Ideally, all possible partitions satisfying the restrictions should be equally likely. Partitions are considered the same if they have the same elements in different order.
For example, with n=10, k=3, a=2, b=4 one has only {4,4,2} and {4,3,3} as possible outcomes.
Is there a standard algorithm for such a problem? One can assume that at least one partition satisfying the restrictions always exists.
You can implement this as a recursive algorithm. Basically, the recurrence is like this:
if k == 1 and a <= n <= b, then the only partition is [n], otherwise none
otherwise, combine all the elements x from a to b with all the partitions for n-x, k-1
to prevent duplicates, also substitute the lower bound a with x
Here's some Python (aka executable pseudo-code):
def partitions(n, k, a, b):
    if k == 1 and a <= n <= b:
        yield [n]
    elif n > 0 and k > 0:
        for x in range(a, b+1):
            for p in partitions(n-x, k-1, x, b):
                yield [x] + p
print(list(partitions(10, 3, 2, 4)))
# [[2, 4, 4], [3, 3, 4]]
This could be further improved by checking (k-1)*a and (k-1)*b for the lower and upper bounds for the remaining elements, respectively, and restricting the range for x accordingly:
min_x = max(a, n - (k-1) * b)
max_x = min(b, n - (k-1) * a)
for x in range(min_x, max_x+1):
For partitions(110, 12, 3, 12) with 3,157 solutions, this reduces the number of recursive calls from 638,679 down to 24,135.
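Putting the recurrence and the tighter bounds together, a minimal sketch of the pruned generator might look like this (partitions_pruned is a hypothetical name). Because each recursive call raises the lower bound to x, the parts come out in non-decreasing order, so every partition is produced exactly once. This only enumerates partitions; picking one uniformly at random still needs counting, as in the sampling answer below.

```python
def partitions_pruned(n, k, a, b):
    """Yield each partition of n into k parts in [a, b] once, parts non-decreasing."""
    if k <= 0:
        return
    if k == 1:
        if a <= n <= b:
            yield [n]
        return
    # the other k-1 parts are each at least a and at most b, which bounds x
    min_x = max(a, n - (k - 1) * b)
    max_x = min(b, n - (k - 1) * a)
    for x in range(min_x, max_x + 1):
        for p in partitions_pruned(n - x, k - 1, x, b):  # later parts are >= x
            yield [x] + p

print(list(partitions_pruned(10, 3, 2, 4)))  # [[2, 4, 4], [3, 3, 4]]
```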
Here's a sampling algorithm that uses conditional probability.
import collections
import random
countmemo = {}
def count(n, k, a, b):
    assert n >= 0
    assert k >= 0
    assert a >= 0
    assert b >= 0
    if k == 0:
        return 1 if n == 0 else 0
    key = (n, k, a, b)
    if key not in countmemo:
        countmemo[key] = sum(
            count(n - c, k - 1, a, c) for c in range(a, min(n, b) + 1))
    return countmemo[key]
def sample(n, k, a, b):
    partition = []
    x = random.randrange(count(n, k, a, b))
    while k > 0:
        for c in range(a, min(n, b) + 1):
            y = count(n - c, k - 1, a, c)
            if x < y:
                partition.append(c)
                n -= c
                k -= 1
                b = c
                break
            x -= y
        else:
            assert False
    return partition
def test():
    print(collections.Counter(
        tuple(sample(20, 6, 2, 5)) for i in range(10000)))
if __name__ == '__main__':
    test()
If k and b - a are not too big you can try a randomized depth-first search:
import random
def restricted_partition_rec(n, k, min, max):
    if k <= 0 or n < min:
        return []
    ps = list(range(min, max + 1))
    random.shuffle(ps)
    for p in ps:
        if p > n:
            continue
        elif p < n:
            subp = restricted_partition_rec(n - p, k - 1, min, max)
            if subp:
                return [p] + subp
        elif k == 1:
            return [p]
    return []
def restricted_partition(n, k, min, max):
    return sorted(restricted_partition_rec(n, k, min, max), reverse=True)
print(restricted_partition(10, 3, 2, 4))
>>>
[4, 4, 2]
Although I'm not sure if all the partitions have exactly the same probability in this case.
I am working through Cormen et al., Introduction to Algorithms, 3rd ed., but I also have an interest in Haskell. Section 8.2 (p. 194) covers counting sort. I was interested in how it and many other algorithms might be implemented in Haskell, as they often use array access and destructive update. I took a look at the implementation on RosettaCode (copied below) and I find it very difficult to follow.
import Data.Array
countingSort :: (Ix n) => [n] -> n -> n -> [n]
countingSort l lo hi = concatMap (uncurry $ flip replicate) count
  where count = assocs . accumArray (+) 0 (lo, hi) . map (\i -> (i, 1)) $ l
One of the things I like about Haskell is how algorithms can be very clear (e.g. the Haskell quicksort examples), at least as a specification that hasn't been optimized. This seems very unclear, and I wonder whether it's necessarily so or just overdone.
Can someone
clarify what's going on here,
perhaps provide a more instructive and clear implementation, and
tackle whether this is actually implementing counting sort, or whether non-strictness (laziness) and immutability mean that this is actually some other sort disguised as counting sort?
It is indeed doing a counting sort. Here's a slightly rewritten version that I find easier to understand:
import Data.Array
countingSort :: (Ix n) => [n] -> n -> n -> [n]
countingSort l lo hi = concat [replicate times n | (n, times) <- counts]
  where counts = assocs (accumArray (+) 0 (lo, hi) [(i, 1) | i <- l])
Let's break it down step by step. We'll use the list [5, 3, 1, 2, 3, 4, 5].
*Main> [(i, 1) | i <- [5, 3, 1, 2, 3, 4, 5]]
[(5,1),(3,1),(1,1),(2,1),(3,1),(4,1),(5,1)]
We're just taking every element of the list and turning it into a tuple with 1. This is the basis for our counts. Now we need a way to sum up those counts per element. This is where accumArray comes into play.
*Main> accumArray (+) 0 (1, 5) [(i, 1) | i <- [5, 3, 1, 2, 3, 4, 5]]
array (1,5) [(1,1),(2,1),(3,2),(4,1),(5,2)]
The first parameter to accumArray is the operation to apply during accumulation (just simple addition for us). The second parameter is the starting value, and the third parameter is the bounds. So we end up with an array mapping numbers to their counts in the input.
Next we use assocs to get key/value tuples from the array:
*Main> assocs $ accumArray (+) 0 (1, 5) [(i, 1) | i <- [5, 3, 1, 2, 3, 4, 5]]
[(1,1),(2,1),(3,2),(4,1),(5,2)]
And then replicate to repeat each number based on its count:
*Main> [replicate times n | (n, times) <- assocs $ accumArray (+) 0 (1, 5) [(i, 1) | i <- [5, 3, 1, 2, 3, 4, 5]]]
[[1],[2],[3,3],[4],[5,5]]
Finally, we use concat to turn this list of lists into a single list:
*Main> concat [replicate times n | (n, times) <- assocs $ accumArray (+) 0 (1, 5) [(i, 1) | i <- [5, 3, 1, 2, 3, 4, 5]]]
[1,2,3,3,4,5,5]
In the actual function I wrote above, I used where to break up this one-liner.
I find list comprehension to be easier to deal with than map and concatMap, so those are the main changes I made in my version of the function.
(uncurry $ flip replicate) is a nice trick... flip replicate gives you a version of replicate that takes its arguments in the opposite order. uncurry takes that curried function and turns it into a function that takes a tuple as an argument instead. Together, those produce the same result as my list comprehension, which destructured the tuple and then passed its parameters in reverse order. I'm not familiar enough with Haskell to know if this is a common idiom, but for me, the list comprehension was easier to follow.
Here count, built with accumArray, works as a histogram builder: for each number from lo to hi, it returns how many times that number occurs in the argument list.
count :: (Ix i, Num e) => [i] -> i -> i -> [(i, e)]
count l lo hi = assocs . accumArray (+) 0 (lo, hi) . map (\i -> (i, 1)) $ l
count [6,2,1,6] 0 10 ==
[(0,0),(1,1),(2,1),(3,0),(4,0),(5,0),(6,2),(7,0),(8,0),(9,0),(10,0)]
The results of count are then used to regenerate the original elements: the fst element of each tuple is replicated snd times. This yields a list of lists, which are concatenated together.
f l lo hi = map (uncurry $ flip replicate ) $ count l lo hi
f [6,2,1,6] 0 10 == [[],[1],[2],[],[],[],[6,6],[],[],[],[]]
Full solution is equivalent to
countingSort l lo hi = concat $ f l lo hi
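On the third question: the versions above rebuild the output from the counts alone, which is fine when the keys are the values themselves. The textbook formulation in CLRS 8.2 instead turns the counts into running totals and places each input element at its final position, which keeps equal keys in their original order (stability) and therefore also works for sorting records by a key. A minimal imperative sketch in Python for comparison; the key parameter is a hypothetical addition for records.

```python
def counting_sort(items, lo, hi, key=lambda item: item):
    """Stable counting sort for items whose integer keys lie in [lo, hi]."""
    counts = [0] * (hi - lo + 1)
    for item in items:                     # histogram: how often each key occurs
        counts[key(item) - lo] += 1
    for i in range(1, len(counts)):        # running totals: number of keys <= lo + i
        counts[i] += counts[i - 1]
    out = [None] * len(items)
    for item in reversed(items):           # right to left keeps equal keys stable
        counts[key(item) - lo] -= 1
        out[counts[key(item) - lo]] = item
    return out

print(counting_sort([5, 3, 1, 2, 3, 4, 5], 1, 5))  # [1, 2, 3, 3, 4, 5, 5]
```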
We have K different sets of numbers. We have to choose one number from each set so that the difference between the highest and the lowest chosen number is minimal.
Any ideas?
Something like this (written in Haskell)?
import Data.List (minimum, maximum, minimumBy)
minDiff (x:xs) = comb (head x) (diff $ matches (head x)) x where
  lenxs = length xs
  diff m = maximum m - minimum m
  matches y = minimumBy (\a b -> compare (diff a) (diff b)) $ p [] 0 where
    md = map (minimumBy (\a b -> compare (abs (a - y)) (abs (b - y)))) xs
    mds = [m | m <- foldl (\b a -> filter (\z -> abs (z - y) == abs (y - md!!a)) (xs!!a) : b) [] [0..lenxs - 1]]
    p result index
      | index == lenxs = [y:result]
      | otherwise = do
          p' <- mds!!index
          p (p':result) (index + 1)
  comb result difference [] = matches result
  comb result difference (z:zs) =
    let diff' = diff (matches z)
    in if diff' < difference
         then comb z diff' zs
         else comb result difference zs
OUTPUT:
*Main> minDiff [[1,3,5,9,10],[2,4,6,8],[7,11,12,13]]
[5,6,7]
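A different technique (not the one used in the Haskell answer above) is the standard sweep over the sorted sets with a min-heap: keep one current element per set, record the range max - min, and repeatedly advance the set that holds the current minimum, since raising the minimum is the only move that can shrink the range. A minimal Python sketch, assuming every set is non-empty; min_range_choice is a hypothetical name. With T numbers in total it runs in O(T log K).

```python
import heapq

def min_range_choice(sets):
    """Choose one number per set minimising max(choice) - min(choice).

    Assumes every set is non-empty. Returns the choice ordered like `sets`.
    """
    lists = [sorted(s) for s in sets]
    # one heap entry per set: (current value, set index, position in that set)
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists)]
    heapq.heapify(heap)
    current_max = max(lst[0] for lst in lists)
    best_range, best_choice = None, None
    while True:
        value, i, j = heap[0]              # smallest of the current choices
        if best_range is None or current_max - value < best_range:
            best_range = current_max - value
            best_choice = [v for v, _, _ in sorted(heap, key=lambda e: e[1])]
        if j + 1 == len(lists[i]):         # the minimum cannot be raised any more
            return best_choice
        nxt = lists[i][j + 1]
        heapq.heapreplace(heap, (nxt, i, j + 1))   # advance the set holding the minimum
        current_max = max(current_max, nxt)

print(min_range_choice([[1, 3, 5, 9, 10], [2, 4, 6, 8], [7, 11, 12, 13]]))  # [5, 6, 7]
```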
I have a sorted list of inputs:
let x = [2; 4; 6; 8; 8; 10; 12]
let y = [-8; -7; 2; 2; 3; 4; 4; 8; 8; 8;]
I want to write a function which behaves similarly to an SQL INNER JOIN. In other words, I want to return the Cartesian product of x and y which contains only items shared in both lists:
join(x, y) = [2; 2; 4; 4; 8; 8; 8; 8; 8; 8]
I've written a naive version as follows:
let join x y =
    [for x' in x do
        for y' in y do
            yield (x', y')]
    |> List.choose (fun (x, y) -> if x = y then Some x else None)
It works, but this runs in O(x.length * y.length). Since both my lists are sorted, I think it's possible to get the results I want in O(min(x.length, y.length)).
How can I find common elements in two sorted lists in linear time?
I can't help you with the F#, but the basic idea is to use two indices, one for each list. Choose the item in each list at the current index for that list. If the two items are the same value, then add that value to your result set and increment both indices. If the items have different values, increment just the index for the list containing the lesser of the two values. Repeat the comparison until one of your lists is empty and then return the result set.
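Incrementing both indices on a match produces each common value only min(count in x, count in y) times. To get the Cartesian-product counts the question asks for, one option is to advance over whole runs of equal values and emit len_x * len_y copies. A minimal Python sketch (inner_join is a hypothetical name); it runs in O(n + m + size of the output).

```python
def inner_join(xs, ys):
    """Two-index walk over sorted lists; equal runs contribute len_x * len_y copies."""
    result = []
    i = j = 0
    while i < len(xs) and j < len(ys):
        if xs[i] < ys[j]:
            i += 1
        elif xs[i] > ys[j]:
            j += 1
        else:
            value = xs[i]
            i0, j0 = i, j
            while i < len(xs) and xs[i] == value:   # length of the run in xs
                i += 1
            while j < len(ys) and ys[j] == value:   # length of the run in ys
                j += 1
            result.extend([value] * ((i - i0) * (j - j0)))
    return result

print(inner_join([2, 4, 6, 8, 8, 10, 12], [-8, -7, 2, 2, 3, 4, 4, 8, 8, 8]))
# [2, 2, 4, 4, 8, 8, 8, 8, 8, 8]
```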
O(min(n,m)) time is impossible: Take two lists [x;x;...;x;y] and [x;x;...;x;z]. You have to browse both lists till the end to compare y and z.
Even O(n+m) is impossible. Take
[1,1,...,1] - n times
and
[1,1,...,1] - m times
Then the resulting list should have n*m elements. You need at least O(n m) (more precisely, Omega(n m)) time to create such a list.
Without the Cartesian product (a simple merge), this is quite easy. OCaml code (I don't know F#, but it should be reasonably close):
let rec merge a b = match (a,b) with
  | ([], xs) -> xs
  | (xs, []) -> xs
  | (x::xs, y::ys) -> if x <= y then x::(merge xs (y::ys))
                      else y::(merge (x::xs) ys);;
(Edit: I was too late)
So your O(n m) code is the best possible in the worst case. However, IIUIC it always performs n*m operations, which is not optimal.
My approach would be
1) write a function
group : 'a list -> ('a * int) list
that counts the number of same elements:
group [1;1;1;1;1;2;2;3] = [(1,5);(2,2);(3,1)]
2) use it to merge both lists using similar code as before (there you can multiply those coefficients)
3) write a function
ungroup : ('a * int) list -> 'a list
and compose those three.
This has complexity O(n+m+x) where x is the length of resulting list. This is the best possible up to constant.
Edit: Here you go:
let group x =
  let rec group2 l m =
    match l with
    | [] -> []
    | a1::a2::r when a1 = a2 -> group2 (a2::r) (m+1)
    | x::r -> (x, m+1)::(group2 r 0)
  in group2 x 0;;
let rec merge a b = match (a,b) with
  | ([], xs) -> []
  | (xs, []) -> []
  | ((x, xm)::xs, (y, ym)::ys) -> if x = y then (x, xm*ym)::(merge xs ys)
                                  else if x < y then merge xs ((y, ym)::ys)
                                  else merge ((x, xm)::xs) ys;;
let rec ungroup a =
  match a with
  | [] -> []
  | (x, 0)::l -> ungroup l
  | (x, m)::l -> x::(ungroup ((x,m-1)::l));;
let crossjoin x y = ungroup (merge (group x) (group y));;
# crossjoin [2; 4; 6; 8; 8; 10; 12] [-7; -8; 2; 2; 3; 4; 4; 8; 8; 8;];;
- : int list = [2; 2; 4; 4; 8; 8; 8; 8; 8; 8]
The following is also tail-recursive (so far as I can tell), but the output list is consequently reversed:
let rec merge xs ys acc =
    match (xs, ys) with
    | ((x :: xt), (y :: yt)) ->
        if x = y then
            let rec count_and_remove_leading zs acc =
                match zs with
                | z :: zt when z = x -> count_and_remove_leading zt (acc + 1)
                | _ -> (acc, zs)
            let rec replicate_and_prepend zs n =
                if n = 0 then
                    zs
                else
                    replicate_and_prepend (x :: zs) (n - 1)
            let xn, xt = count_and_remove_leading xs 0
            let yn, yt = count_and_remove_leading ys 0
            merge xt yt (replicate_and_prepend acc (xn * yn))
        else if x < y then
            merge xt ys acc
        else
            merge xs yt acc
    | _ -> acc
let xs = [2; 4; 6; 8; 8; 10; 12]
let ys = [-7; -8; 2; 2; 3; 4; 4; 8; 8; 8;]
printf "%A" (merge xs ys [])
Output:
[8; 8; 8; 8; 8; 8; 4; 4; 2; 2]
Note that, as sdcvvc says in his answer, this is still O(x.length * y.length) in worst case, simply because the edge case of two lists of repeating identical elements would require the creation of x.length * y.length values in the output list, which is by itself inherently an O(m*n) operation.
I don't know F#, but I suppose it has arrays and a binary-search implementation over arrays (which can also be implemented by hand):
choose smallest list
copy it to array (for O(1) random access, if F# already gives you that, you can skip this step)
go over big list and using binary search find in small array elements from big list,
if found add it to result list
Complexity: O(min + max*log min), where min is the size of the small list and max is the size of the big list.
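A minimal Python sketch of this binary-search approach, using the standard bisect module (inner_join_bisect is a hypothetical name). Python lists already give O(1) random access, so the copy step is skipped. One assumption beyond the steps above: instead of adding a found element once, it counts how many copies the small list holds (bisect_right minus bisect_left) and emits that many, which reproduces the join output from the question. The cost is O(max * log min) plus the size of the output.

```python
from bisect import bisect_left, bisect_right

def inner_join_bisect(xs, ys):
    """For each element of the longer sorted list, binary-search the shorter one."""
    small, big = (xs, ys) if len(xs) <= len(ys) else (ys, xs)
    result = []
    for value in big:
        # number of copies of value in the small list
        copies = bisect_right(small, value) - bisect_left(small, value)
        result.extend([value] * copies)
    return result

print(inner_join_bisect([2, 4, 6, 8, 8, 10, 12], [-8, -7, 2, 2, 3, 4, 4, 8, 8, 8]))
# [2, 2, 4, 4, 8, 8, 8, 8, 8, 8]
```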
I don't know F#, but I can provide a functional Haskell implementation, based on the algorithm outlined by tvanfosson (further specified by Lasse V. Karlsen).
import Data.List
join :: (Ord a) => [a] -> [a] -> [a]
join l r = gjoin (group l) (group r)
  where
    gjoin [] _ = []
    gjoin _ [] = []
    gjoin l@(lh@(x:_):xs) r@(rh@(y:_):ys)
      | x == y    = replicate (length lh * length rh) x ++ gjoin xs ys
      | x < y     = gjoin xs r
      | otherwise = gjoin l ys
main :: IO ()
main = print $ join [2, 4, 6, 8, 8, 10, 12] [-7, -8, 2, 2, 3, 4, 4, 8, 8, 8]
This prints [2,2,4,4,8,8,8,8,8,8]. In case you're not familiar with Haskell, some references to the documentation:
group
length
replicate
I think it can be done simply by using hash tables. The hash tables store the frequencies of the elements in each list. These are then used to create a list where the frequency of each element e is frequency of e in X multiplied by the frequency of e in Y. This has a complexity of O(n+m).
(EDIT: Just noticed that this can be worst case O(n^2), after reading comments on other posts. Something very much like this has already been posted. Sorry for the duplicate. I'm keeping the post in case the code helps.)
I don't know F#, so I'm attaching Python code. I'm hoping the code is readable enough to be converted to F# easily.
def join(x, y):
    x_count = dict()
    y_count = dict()
    for elem in x:
        x_count[elem] = x_count.get(elem, 0) + 1
    for elem in y:
        y_count[elem] = y_count.get(elem, 0) + 1
    answer = []
    for elem in x_count:
        if elem in y_count:
            answer.extend([elem] * (x_count[elem] * y_count[elem]))
    return answer
A = [2, 4, 6, 8, 8, 10, 12]
B = [-8, -7, 2, 2, 3, 4, 4, 8, 8, 8]
print(join(A, B))
The problem with what he wants is that it obviously has to re-traverse the list.
In order to get 8,8,8 to show up twice, the function has to loop through the second list a bit. The worst-case scenario (two identical lists) will still yield O(x * y).
As a note, this is not utilizing external functions that loop on their own.
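int start = 0;  // first index of longerList not yet known to be smaller than shorterList[i]
for (int i = 0; i < shorterList.Length; i++)
{
    if (shorterList[i] > longerList[longerList.Length - 1])
        break;
    // skip the elements of longerList that are smaller than shorterList[i]
    while (start < longerList.Length && longerList[start] < shorterList[i])
        start++;
    // every equal element of longerList pairs with shorterList[i]
    for (int j = start; j < longerList.Length && longerList[j] == shorterList[i]; j++)
        retList.Add(shorterList[i]);
}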
I think this is O(n) on the intersect/join code, though the full thing traverses each list twice:
// list unique elements and their multiplicity (also reverses sorting)
// e.g. pack y = [(8, 3); (4, 2); (3, 1); (2, 2); (-8, 1); (-7, 1)]
// we assume xs is ordered
let pack xs = Seq.fold (fun acc x ->
                            match acc with
                            | (y,ny) :: tl -> if y=x then (x,ny+1) :: tl else (x,1) :: acc
                            | [] -> [(x,1)]) [] xs
let unpack px = [ for (x,nx) in px do for i in 1 .. nx do yield x ]
// for lists of (x,nx) and (y,ny), returns list of (x,nx*ny) when x=y
// assumes inputs are sorted descending (from pack function)
// and returns results sorted ascending
let intersect_mult xs ys =
    let rec aux rx ry acc =
        match (rx,ry) with
        | (x,nx)::xtl, (y,ny)::ytl ->
            if x = y then aux xtl ytl ((x,nx*ny) :: acc)
            elif x < y then aux rx ytl acc
            else aux xtl ry acc
        | _,_ -> acc
    aux xs ys []
let inner_join x y = intersect_mult (pack x) (pack y) |> unpack
Now we test it on your sample data:
let x = [2; 4; 6; 8; 8; 10; 12]
let y = [-7; -8; 2; 2; 3; 4; 4; 8; 8; 8;]
> inner_join x y;;
val it : int list = [2; 2; 4; 4; 8; 8; 8; 8; 8; 8]
EDIT: I just realized this is the same idea as the earlier answer by sdcvvc (after the edit).
You can't get O(min(x.length, y.length)), because the output may be larger than that. Suppose all elements of x and y are equal, for instance. Then the output size is the product of the sizes of x and y, which gives a lower bound on the running time of any algorithm.
Here's the algorithm in F#. It is not tail-recursive, which can be easily fixed. The trick is doing mutual recursion. Also note that I may invert the order of the list given to prod to avoid unnecessary work.
let rec prod xs ys =
    match xs with
    | [] -> []
    | _ -> reps xs ys ys
and reps xs ys zs =
    match zs with
    | w :: ws when List.head xs = w -> w :: reps xs ys ws
    | w :: ws when List.head xs > w -> reps xs ys ws
    | _ ->
        // head of xs has no more matches in zs: move on, as in the Scala version
        match ys with
        | [] -> []
        | y :: _ -> if y < List.head xs then prod ys (List.tail xs) else prod (List.tail xs) ys
The original algorithm in Scala:
def prod(x: List[Int], y: List[Int]): List[Int] = x match {
  case Nil => Nil
  case z :: zs => reps(x, y, y)
}
def reps(x: List[Int], y: List[Int], z: List[Int]): List[Int] = z match {
  case w :: ws if x.head == w => w :: reps(x, y, ws)
  case w :: ws if x.head > w => reps(x, y, ws)
  case _ => y match {
    case Nil => Nil
    case y1 :: ys if y1 < x.head => prod(y, x.tail)
    case _ => prod(x.tail, y)
  }
}