Is a list comprehension or a sequential filter more optimized? - algorithm

Let's say you have to return the sum of all the multiples of 2 and 3 in a set of integers from 1-100. In Haskell, the code I would write would look something like this:
sum ([x*2 | x<-[1..100], x*2 < 100] `union` [x*3 | x<-[1..100], x*3 < 100])
This uses 2 list comprehensions with a union. Another solution would be to step through each item in the list and evaluate it (using a modulus), then add it to a separate list, which you would later add together.
Both of these solutions come out with the same answer, but which one is more optimized if you had to do the same for, say, a list from 1..1000000?
The answer to the original question is 3317 if you want to create your own algorithm.

If you are looking for performance, you can simplify this problem to the point where you don't even need a computer....
Numbers divisible by 2 or 3 fall into a pattern
0 (1) 2 3 4 (5).... 6 (7) 8 9 10 (11).... etc
or
TFTTTF.... TFTTTF....
Assume that the max bound is divisible by 6, (if not, you can just choose the highest value below the real bound and add the remaining few values by hand). Let maxBound=6*N.
For each additional N, you add the following values
6*n, 0, 6*n+2, 6*n+3, 6*n+4, 0
which sums to
24*n+9
so all you need to do is sum up
sum from n=0 to N of (24*n+9)
=24*(sum from n=0 to N of n) + 9*N
=24*N*(N-1)/2 + 9*N
=12*N^2-3*N
so a very fast Haskell program that would solve this problem would look something like this
f maxBound = 12*n^2-3*n + remainingStuff
where n = maxBound `quot` 6
remainingStuff = sum $ filter (<= maxBound) [6*n, 6*n+2, 6*n+3, 6*n+4]

The union function is a "quadratic" algorithm, so using one list comprehension will be faster.
A better way which is useful for generating these kinds of sequences is to take advantage of the fact that they are ordered and merge them together with a function like:
merge :: [Int] -> [Int] -> [Int]
merge as [] = as
merge [] bs = bs
merge as#(a:at) bs#(b:bt) =
case compare a b of
LT -> a : merge at bs
EQ -> a : merge at bt
GT -> b : merge as bt
and then generate your sequence with:
[ x | x <- merge [2,4..100] [3,6..100] ]
One last tip for writing combinatorial loops... replace expressions like x <- [1..100], 2*x < 100 with x <- [1..49], or if you can't compute the upper bound explicitly, use x <- takeWhile (\x -> 2*x < 100) [1..100]. The latter forms only generates as many items as needed.

Related

Algorithm to find list given dot product and another list

I need to write a function findL that takes a list L1 of integers and a desired dot product n, and returns a list L2 of nonnegative integers such that L1 · L2 = n. (By "dot product" I mean the sum of the pairwise products; for example, [1,2] · [3,4] = 1·3+2·4 = 11.)
So, for example, findL(11, [1,2]) might return SOME [3,4]. If there's no possible list, I return NONE.
I'm using a functional language. (Specifically Standard ML, but the exact language isn't so important, I'm just trying to think of an FP algorithm.) What I have written so far:
Let's say I have findL(n, L1):
if L1 = [], I return NONE.
if L1 = [x] (list of length 1)
if (n >= 0 and x > 0 and n mod x = 0), return SOME [n div x]
else return NONE
If L1 has length greater than 1, I recurse on findL (n, L[1:]). If that returns a list L2, I return [1] concatenated to L2. If the recursive call returns NONE, I did another recursive call on findL (0, L[1:]) and prepended [n div x] to the result if it wasn't NONE. This works on many inputs but are failing on others.
I need to change part 3, but I'm not sure if I have the right idea. I would appreciate any tips!
Unless you need to say that empty lists in the input are always bad (even n = 0 with the list []), I'd recommend returning something different for an empty list based on whether you've reached 0 at the end (everything has been subtracted away) or not, then recurse when receiving any nonempty list rather than special-casing a one-element list.
As far as step three, you need to test every possible positive integer multiple of the first element of your input list until they exceed n, not just the first and last. The first non-None value you get is good enough, so you just prepend the multiplier (not the multiple) to the return list. If everything gives you Nones, you return None.
I don't know SML, but here's how I'd do it in Haskell:
import Data.Maybe (isJust, listToMaybe)
-- Find linear combinations of positive integers
solve :: Integer -> [Integer] -> Maybe [Integer]
-- If we've made it to the end with zero left, good!
solve 0 [] = Just []
-- Otherwise, this way isn't the way to go.
solve _ [] = Nothing
-- If one of the elements of the input list is zero, just multiply that element by one.
solve n (0:xs) = case solve n xs of
Nothing -> Nothing
Just ys -> Just (1:ys)
solve n (x:xs) = listToMaybe -- take first solution if it exists
. map (\ (m, Just ys) -> m:ys) -- put multiplier at front of list
. filter (isJust . snd) -- remove nonsolutions
. zip [1 ..] -- tuple in the multiplier
. map (\ m -> solve (n - m) xs) -- use each multiple
$ [x, x + x .. n] -- the multiples of x up to n
Here it is solving 11 with [1, 2] and 1 with [1, 2].

Recursion confusion in Haskell again - subsets with an inclusion test

I'm testing a simple program to generate subsets with an inclusion test. For example, given
*Main Data.List> factorsets 7
[([2],2),([2,3],1),([3],1),([5],1),([7],1)]
calling chooseP 3 (factorsets 7), I would like to get (read from right to left, a la cons)
[[([5],1),([3],1),([2],2)]
,[([7],1),([3],1),([2],2)]
,[([7],1),([5],1),([2],2)]
,[([7],1),([5],1),([2,3],1)]
,[([7],1),([5],1),([3],1)]]
But my program is returning an extra [([7],1),([5],1),([3],1)] (and missing a [([7],1),([5],1),([2],2)]):
[[([5],1),([3],1),([2],2)]
,[([7],1),([3],1),([2],2)]
,[([7],1),([5],1),([3],1)]
,[([7],1),([5],1),([2,3],1)]
,[([7],1),([5],1),([3],1)]]
The inclusion test is: members' first part of the tuple must have a null intersection.
Once tested as working, the plan is to sum the internal products of each subset's snds, rather than accumulate them.
Since I've asked a similar question before, I imagine that an extra branch is generated since when the recursion splits at [2,3], the second branch runs over the same possibilities once it passes the skipped section. Any pointers on how to resolve that would be appreciated; and if you'd like to share ideas about how to enumerate and sum such product combinations more efficiently, that would be great, too.
Haskell code:
chooseP k xs = chooseP' xs [] 0 where
chooseP' [] product count = if count == k then [product] else []
chooseP' yys product count
| count == k = [product]
| null yys = []
| otherwise = f ++ g
where (y:ys) = yys
(factorsY,numY) = y
f = let zzs = dropWhile (\(fs,ns) -> not . and . map (null . intersect fs . fst) $ product) yys
in if null zzs
then chooseP' [] product count
else let (z:zs) = zzs in chooseP' zs (z:product) (count + 1)
g = if and . map (null . intersect factorsY . fst) $ product
then chooseP' ys product count
else chooseP' ys [] 0
Your code is complicated enough that I might recommend starting over. Here's how I would proceed.
Write a specification. Let it be as stupidly inefficient as necessary -- for example, the spec I choose below will build all combinations of k elements from the list, then filter out the bad ones. Even the filter will be stupidly slow.
sorted xs = sort xs == xs
unique xs = nub xs == xs
disjoint xs = and $ liftM2 go xs xs where
go x1 x2 = x1 == x2 || null (intersect x1 x2)
-- check that x is valid according to all the validation functions in fs
-- (there are other fun ways to spell this, but this is particularly
-- readable and clearly correct -- just what we want from a spec)
allFuns fs x = all ($x) fs
choosePSpec k = filter good . replicateM k where
good pairs = allFuns [unique, disjoint, sorted] (map fst pairs)
Just to make sure it's right, we can test it at the prompt:
*Main> mapM_ print $ choosePSpec 3 [([2],2),([2,3],1),([3],1),([5],1),([7],1)]
[([2],2),([3],1),([5],1)]
[([2],2),([3],1),([7],1)]
[([2],2),([5],1),([7],1)]
[([2,3],1),([5],1),([7],1)]
[([3],1),([5],1),([7],1)]
Looks good.
Now that we have a spec, we can try to improve the speed one refactoring at a time, always checking that it matches the spec. The first thing I'd want to do is notice that we can ensure uniqueness and sortedness just by sorting the input and picking things "in an increasing way". To do this, we can define a function which chooses subsequences of a given length. It piggy-backs on the tails function, which you can think of as nondeterministically choosing a place to split its input list.
subseq 0 xs = [[]]
subseq n xs = do
x':xt <- tails xs
xs' <- subseq (n-1) xt
return (x':xs')
Here's an example of this function in action:
*Main> subseq 3 [1..4]
[[1,2,3],[1,2,4],[1,3,4],[2,3,4]]
Now we can write a slightly faster chooseP by replacing replicateM with subseq. Recall that we're assuming the inputs are already sorted and unique, though.
choosePSlow k = filter good . subseq k where
good pairs = disjoint $ map fst pairs
We can sanity-check that it's working by running it on the particular input we have from above:
*Main> let i = [([2],2),([2,3],1),([3],1),([5],1),([7],1)]
*Main> choosePSlow 3 i == choosePSpec 3 i
True
Or, better yet, we can stress-test it with QuickCheck. We'll need a tiny bit more code. The condition k < 5 is just because the spec is so hopelessly slow that bigger values of k take forever.
propSlowMatchesSpec :: NonNegative Int -> OrderedList ([Int], Int) -> Property
propSlowMatchesSpec (NonNegative k) (Ordered xs)
= k < 5 && unique (map fst xs)
==> choosePSlow k xs == choosePSpec k xs
*Main> quickCheck propSlowMatchesSpec
+++ OK, passed 100 tests.
There are several more opportunities to make things faster. For instance, the disjoint test could be sped up using choose 2 instead of liftM2; or we might be able to ensure disjointness during element selection and prune the search even earlier; etc. How you want to improve it from here I leave to you -- but the basic technique (start with stupid and slow, then make it smarter, testing as you go) should be helpful to you.

Finding largest f satisfying a property given f is non-decreasing in its arguments

this has been bugging me for a while.
Lets say you have a function f x y where x and y are integers and you know that f is strictly non-decreasing in its arguments,
i.e. f (x+1) y >= f x y and f x (y+1) >= f x y.
What would be the fastest way to find the largest f x y satisfying a property given that x and y are bounded.
I was thinking that this might be a variation of saddleback search and I was wondering if there was a name for this type of problem.
Also, more specifically I was wondering if there was a faster way to solve this problem if you knew that f was the multiplication operator.
Thanks!
Edit: Seeing the comments below, the property can be anything
Given a property g (where g takes a value and returns a boolean) I am simply looking for the largest f such that g(f) == True
For example, a naive implementation (in haskell) would be:
maximise :: (Int -> Int -> Int) -> (Int -> Bool) -> Int -> Int -> Int
maximise f g xLim yLim = head . filter g . reverse . sort $ results
where results = [f x y | x <- [1..xLim], y <- [1..yLim]]
Let's draw an example grid for your problem to help think about it. Here's an example plot of f for each x and y. It is monotone in each argument, which is an interesting constraint we might be able to do something clever with.
+------- x --------->
| 0 0 1 1 1 2
| 0 1 1 2 2 4
y 1 1 3 4 6 6
| 1 2 3 6 6 7
| 7 7 7 7 7 7
v
Since we don't know anything about the property, we can't really do better than to list the values in the range of f in decreasing order. The question is how to do that efficiently.
The first thing that comes to mind is to traverse it like a graph starting at the lower-right corner. Here is my attempt:
import Data.Maybe (listToMaybe)
maximise :: (Ord b, Num b) => (Int -> Int -> b) -> (b -> Bool) -> Int -> Int -> Maybe b
maximise f p xLim yLim =
listToMaybe . filter p . map (negate . snd) $
enumIncreasing measure successors (xLim,yLim)
where
measure (x,y) = negate $ f x y
successors (x,y) = [ (x-1,y) | x > 0 ] ++ [ (x,y-1) | y > 0 ] ]
The signature is not as general as it could be (Num should not be necessary, but I needed it to negate the measure function because enumIncreasing returns an increasing rather than a decreasing list -- I could have also done it with a newtype wrapper).
Using this function, we can find the largest odd number which can be written as a product of two numbers <= 100:
ghci> maximise (*) odd 100 100
Just 9801
I wrote enumIncreasing using meldable-heap on hackage to solve this problem, but it is pretty general. You could tweak the above to add additional constraints on the domain, etc.
The answer depends on what's expensive. The case that might be intersting is when f is expensive.
What you might want to do is look at pareto-optimality. Suppose you have two points
(1, 2) and (3, 4)
Then you know that the latter point is going to be a better solution, so long as f is a nondecreasing function. However, of course, if you have points,
(1, 2) and (2, 1)
then you can't know. So, one solution would be to establish a pareto-optimal frontier of points that the predicate g permits, and then evaluate these though f.

Evaluate all possible interpretations in OCaml

I need to evaluate whether two formulas are equivalent or not. Here, I use a simple definition of formula, which is a prefix formula.
For example, And(Atom("b"), True) means b and true, while And(Atom("b"), Or(Atom("c"), Not(Atom("c")))) means (b and (c or not c))
My idea is simple, get all atoms, apply every combination (for my cases, I will have 4 combination, which are true-true, true-false, false-true, and false-false). The thing is, I don't know how to create these combinations.
For now, I have known how to get all involving atoms, so in case of there are 5 atoms, I should create 32 combinations. How to do it in OCaml?
Ok, so what you need is a function combinations n that will produce all the booleans combinations of length n; let's represent them as lists of lists of booleans (i.e. a single assignment of variables will be a list of booleans). Then this function would do the job:
let rec combinations = function
| 0 -> [[]]
| n ->
let rest = combinations (n - 1) in
let comb_f = List.map (fun l -> false::l) rest in
let comb_t = List.map (fun l -> true::l) rest in
comb_t # comb_f
There is only one empty combination of length 0 and for n > 0 we produce combinations of n-1 and prefix them with false and with true to produce all possible combinations of length n.
You could write a function to print such combinations, let's say:
let rec combinations_to_string = function
| [] -> ""
| x::xs ->
let rec bools_to_str = function
| [] -> ""
| b::bs -> Printf.sprintf "%s%s" (if b then "T" else "F") (bools_to_str bs)
in
Printf.sprintf "[%s]%s" (bools_to_str x) (combinations_to_string xs)
and then test it all with:
let _ =
let n = int_of_string Sys.argv.(1) in
let combs = combinations n in
Printf.eprintf "combinations(%d) = %s\n" n (combinations_to_string combs)
to get:
> ./combinations 3
combinations(3) = [TTT][TTF][TFT][TFF][FTT][FTF][FFT][FFF]
If you think of a list of booleans as a list of bits of fixed length, there is a very simple solution: Count!
If you want to have all combinations of 4 booleans, count from 0 to 15 (2^4 - 1) -- then interpret each bit as one of the booleans. For simplicity I'll use a for-loop, but you can also do it with a recursion:
let size = 4 in
(* '1 lsl size' computes 2^size *)
for i = 0 to (1 lsl size) - 1 do
(* from: is the least significant bit '1'? *)
let b0 = 1 = ((i / 1) mod 2) in
let b1 = 1 = ((i / 2) mod 2) in
let b2 = 1 = ((i / 4) mod 2) in
(* to: is the most significant bit '1'? *)
let b3 = 1 = ((i / 8) mod 2) in
(* do your thing *)
compute b0 b1 b2 b3
done
Of course you can make the body of the loop more general so that it e.g. creates a list/array of booleans depending on the size given above etc.;
The point is that you can solve this problem by enumerating all values you are searching for. If this is the case, compute all integers up to your problem size. Write a function that generates a value of your original problem from an integer. Put it all together.
This method has the advantage that you do not need to first create all combinations, before starting your computation. For large problems this might well save you. For rather small size=16 you will already need 65535 * sizeof(type) memory -- and this is growing exponentially with the size! The above solution will require only a constant amount of memory of sizeof(type).
And for science's sake: Your problem is NP-complete, so if you want the exact solution, it will take exponential time.

variant of pascal's triangle in haskell - problem with lazy evaluation

To solve some problem I need to compute a variant of the pascal's triangle which is defined like this:
f(1,1) = 1,
f(n,k) = f(n-1,k-1) + f(n-1,k) + 1 for 1 <= k < n,
f(n,0) = 0,
f(n,n) = 2*f(n-1,n-1) + 1.
For n given I want to efficiently get the n-th line (f(n,1) .. f(n,n)). One further restriction: f(n,k) should be -1 if it would be >= 2^32.
My implementation:
next :: [Int64] -> [Int64]
next list#(x:_) = x+1 : takeWhile (/= -1) (nextRec list)
nextRec (a:rest#(b:_)) = boundAdd a b : nextRec rest
nextRec [a] = [boundAdd a a]
boundAdd x y
| x < 0 || y < 0 = -1
| x + y + 1 >= limit = -1
| otherwise = (x+y+1)
-- start shoud be [1]
fLine d start = until ((== d) . head) next start
The problem: for very large numbers I get a stack overflow. Is there a way to force haskell to evaluate the whole list? It's clear that each line can't contain more elements than an upper bound, because they eventually become -1 and don't get stored and each line only depends on the previous one. Due to the lazy evaluation only the head of each line is computed until the last line needs it's second element and all the trunks along the way are stored...
I have a very efficient implementation in c++ but I am really wondering if there is a way to get it done in haskell, too.
Works for me: What Haskell implementation are you using? A naive program to calculate this triangle works fine for me in GHC 6.10.4. I can print the 1000th row just fine:
nextRow :: [Integer] -> [Integer]
nextRow row = 0 : [a + b + 1 | (a, b) <- zip row (tail row ++ [last row])]
tri = iterate nextRow [0]
main = putStrLn $ show $ tri !! 1000 -- print 1000th row
I can even print the first 10 numbers in row 100000 without overflowing the stack. I'm not sure what's going wrong for you. The global name tri might be keeping the whole triangle of results alive, but even if it is, that seems relatively harmless.
How to force order of evaluation: You can force thunks to be evaluated in a certain order using the Prelude function seq (which is a magic function that can't be implemented in terms of Haskell's other basic features). If you tell Haskell to print a `seq` b, it first evaluates the thunk for a, then evaluates and prints b.
Note that seq is shallow: it only does enough evaluation to force a to no longer be a thunk. If a is of a tuple type, the result might still be a tuple of thunks. If it's a list, the result might be a cons cell having thunks for both the head and the tail.
It seems like you shouldn't need to do this for such a simple problem; a few thousand thunks shouldn't be too much for any reasonable implementation. But it would go like this:
-- Evaluate a whole list of thunks before calculating `result`.
-- This returns `result`.
seqList :: [b] -> a -> a
seqList lst result = foldr seq result lst
-- Exactly the same as `nextRow`, but compute every element of `row`
-- before calculating any element of the next row.
nextRow' :: [Integer] -> [Integer]
nextRow' row = row `seqList` nextRow row
tri = iterate nextRow' [0]
The fold in seqList basically expands to lst!!0 `seq` lst!!1 `seq` lst!!2 `seq` ... `seq` result.
This is much slower for me when printing just the first 10 elements of row 100,000. I think that's because it requires computing 99,999 complete rows of the triangle.

Resources