Maybe it's already implemented in F# ?
basically i would like to define a class of generic filter functions with infix operators, so something which would look like
type Filter<'T> = ('T -> bool) with
static member (|*) (f:Filter<'T>) (g:Filter<'T>) = (fun x -> (f x) ||
(g x)) // OR operator
but this is not the right syntax it seems
Stopped due to error System.Exception: Operation could not be
completed due to earlier error Type abbreviations cannot have
augmentations at 2,5 Type abbreviations cannot have members at 3,19
thanks
What you are defining there is a type abbreviation, which, as the errors will indicate, can neither have augmentations nor members. You could fix that by using a single case discriminated union:
type Filter<'T> = Filter of ('T -> bool) with
static member (|* ) (Filter f, Filter g) =
Filter(fun x -> f x || g x) // OR operator
Of course, you now need to unwrap the predicate function prior to the boolean operation, and wrap the composed function again afterwards. A simple test...
let odd x = x % 2 <> 0
let big x = x > 10
let (Filter f) = Filter odd |* Filter big in [8..12] |> List.filter f
// val it : int list = [9; 11; 12]
I'm writing a function called after which takes a list of integers and two integers as parameters. after list num1 num2 should return True if num1 occurs in the list and num2 occurs in list afternum1. (Not necessarily immediately after).
after::[Int]->Int->Int->Bool
after [] _ _=False
after [x:xs] b c
|x==b && c `elem` xs =True
|x/=b && b `elem` xs && b `elem` xs=True
This is what I have so far,my biggest problem is that I don't know how to force num2 to be after num1.
There's a few different ways to approach this one; while it's tempting to go straight for recursion on this, it's nice to
avoid using recursion explicitly if there's another option.
Here's a simple version using some list utilities. Note that it's a Haskell idiom that the object we're operating over is usually the last argument. In this case switching the arguments lets us write it as a pipeline with it's third argument (the list) passed implicitly:
after :: Int -> Int -> [Int] -> Bool
after a b = elem b . dropWhile (/= a)
Hopefully this is pretty easy to understand; we drop elements of the list until we hit an a, assuming we find one we check if there's a b in the remaining list. If there was no a, this list is [] and obviously there's no b there, so it returns False as expected.
You haven't specified what happens if 'a' and 'b' are equal, so I'll leave it up to you to adapt it for that case. HINT: add a tail somewhere ;)
Here are a couple of other approaches if you're interested:
This is pretty easily handled using a fold;
We have three states to model. Either we're looking for the first elem, or
we're looking for the second elem, or we've found them (in the right order).
data State =
FindA | FindB | Found
deriving Eq
Then we can 'fold' (aka reduce) the list down to the result of whether it matches or not.
after :: Int -> Int -> [Int] -> Bool
after a b xs = foldl go FindA xs == Found
where
go FindA x = if x == a then FindB else FindA
go FindB x = if x == b then Found else FindB
go Found _ = Found
You can also do it recursively if you like:
after :: Int -> Int -> [Int] -> Bool
after _ _ [] = False
after a b (x:xs)
| x == a = b `elem` xs
| otherwise = after a b xs
Cheers!
You can split it into two parts: the first one will find the first occurrence of num1. After that, you just need to drop all elements before it and just check that num2 is in the remaining part of the list.
There's a standard function elemIndex for the first part. The second one is just elem.
import Data.List (elemIndex)
after xs x y =
case x `elemIndex` xs of
Just i -> y `elem` (drop (i + 1) xs)
Nothing -> False
If you'd like to implement it without elem or elemIndex, you could include a subroutine. Something like:
after xs b c = go xs False
where go (x:xs) bFound
| x == b && not (null xs) = go xs True
| bFound && x == c = True
| null xs = False
| otherwise = go xs bFound
Trying to implement the straightforward dynamic programming algorithm for the Knapsack problem. Obviously this approach uses a lot of memory and so I am trying to optimize the memory utilized. I am simply trying to store only the previous row of my table in memory just long enough to compute the next row, and so on. At first I thought my implementation was solid, but it still ran out of memory as an implementation designed to store the whole table. So next I thought maybe I need foldl' instead of foldr, but it did not make any difference. My program continues to eat memory until my system runs out.
So I have 2 specific questions:
What is it about my code that is using up all the memory? I thought I was being clever by using a fold, because I assumed only the current value of the accumulator would be stored in memory.
What is the proper approach for achieving my goal; that is, storing only the most recent row in memory? I don't necessarily need code, maybe just some helpful functions and data types. More generally, what are some tips and techniques for understanding memory usage in Haskell?
Here is my implementation
data KSItem a = KSItem { ksItem :: a, ksValue :: Int, ksWeight :: Int} deriving (Eq, Show, Ord)
dynapack5 size items = finalR ! size
where
noItems = length items
itemsArr = listArray(1,noItems) items
row = listArray(1,size) (replicate size (0,[]))
computeRow row item =
let w = ksWeight item
v = ksValue item
idx = ksItem item
pivot = let (lastVal, selections) = row ! w
in if v > lastVal
then (v, [idx])
else (lastVal, selections)
figure r c =
if (prevVal + v) > lastVal
then (prevVal + v, prevItems ++ [idx])
else (lastVal, lastItems)
where (lastVal, lastItems) = (r ! c)
(prevVal, prevItems) = (r ! (c - w))
theRest = [ (figure row cw) | cw <- [(w+1)..size] ]
newRow = (map (row!) [1..(w-1)]) ++
[pivot] ++
theRest
in listArray (1,size) newRow
finalR = foldl' computeRow row items
In my head, what I think this is doing is initializing the first row to (0,[])... repeated as necessary, then kicking off the fold where the next row is calculated based on the supplied row, and this value then becomes the accumulator. I'm not seeing where more and more memory is being consumed...
Random thought: what if i used the \\ operator on the accumulator instead?
As Tom Ellis said, using force on the array solves the space issues. However, it is extremely slow, because force traverses all the lists in the array from start to end each time it is invoked. So we should only force as needed:
let res = listArray (1,size) newRow in force (map fst $ elems res) `seq` res
This fixes the space leak and it's also pretty fast.
If you want to take space efficiency to the logical next step, you could use bitsets of the indices of the items instead of lists of items. Integers are good for the job here since they automatically resize themselves to accommodate the highest set bit. Also, with Integer-s forcing is straightforward:
import qualified Data.Vector as V -- using this instead of Array cause I like it more
import Data.List
import Control.Arrow
import Data.Bits
import Control.DeepSeq
data KSItem a = KSItem { ksItem :: a, ksValue :: Int, ksWeight :: Int} deriving (Eq, Show, Ord)
dynapack5' :: Int -> [KSItem a] -> (Int, Integer)
dynapack5' size items = V.last solutions where
items' = [KSItem i v w | (i, KSItem _ v w) <- zip [0..] items]
solutions = foldl' add (V.replicate (size + 1) (0, 0::Integer)) items'
add arr (KSItem item currVal w) = force $ V.imap go arr where
go i (v, is) | w < i && v' > v = (v', is')
| otherwise = (v, is)
where (v', is') = (+currVal) *** (`setBit` item) $ arr V.! (i - w)
Data.Array is non-strict in its elements so even though foldl' forces it to WHNF each time around the loop the contents don't get evaluated. The simplest fix would be to import Control.DeepSeq and change
in listArray (1,size) newRow
to
in force (listArray (1,size) newRow)
This is doing more work than strictly necessary each time around the loop, but will do the job.
Unfortunately you can't just substitute unboxed arrays here, since your arrays contain a tuple containing a list.
I'm trying to resolve problem 14 of Project Euler (http://projecteuler.net/problem=14) and I hit a dead end using Haskell.
Now, I know that the numbers may be small enough and I could do a brute force, but that isn't the purpose of my exercise.
I am trying to memorize the intermediate results in a Map of type Map Integer (Bool, Integer) with the meaning of:
- the first Integer (the key) holds the number
- the Tuple (Bool, Interger) holds either (True, Length) or (False, Number)
where Length = length of the chain
Number = the number before him
Ex:
for 13: the chain is 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
My map should contain :
13 - (True, 10)
40 - (False, 13)
20 - (False, 40)
10 - (False, 20)
5 - (False, 10)
16 - (False, 5)
8 - (False, 16)
4 - (False, 8)
2 - (False, 4)
1 - (False, 2)
Now when I search for another number like 40 i know that the chain has (10 - 1) length and so on.
I want now, if I search for 10, not only to tell me that length of 10 is (10 - 3) length and update the map, but also I want to update 20, 40 in case they are still (False, _)
My code:
import Data.Map as Map
solve :: [Integer] -> Map Integer (Bool, Integer)
solve xs = solve' xs Map.empty
where
solve' :: [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
solve' [] table = table
solve' (x:xs) table =
case Map.lookup x table of
Nothing -> countF x 1 (x:xs) table
Just (b, _) ->
case b of
True -> solve' xs table
False -> {-WRONG-} solve' xs table
f :: Integer -> Integer
f x
| x `mod` 2 == 0 = x `quot` 2
| otherwise = 3 * x + 1
countF :: Integer -> Integer -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
countF n cnt (x:xs) table
| n == 1 = solve' xs (Map.insert x (True, cnt) table)
| otherwise = countF (f n) (cnt + 1) (x:xs) $ checkMap (f n) n table
checkMap :: Integer -> Integer -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
checkMap n rez table =
case Map.lookup n table of
Nothing -> Map.insert n (False, rez) table
Just _ -> table
At the {-WRONG-} part we should update all the values like in the following example:
--We are looking for 10:
10 - (False, 20)
|
V {-finally-} update 10 => (True, 10 - 1 - 1 - 1)
20 - (False, 40) ^
| |
V update 20 => 20 - (True, 10 - 1 - 1)
40 - (False, 13) ^
| |
V update 40 => 40 - (True, 10 - 1)
13 - (True, 10) ^
| |
---------------------------
The problem is that I don't know if its possible to do 2 things in a function like updating a number and continue the recurence. In a C like language I may do something like (pseudocode):
void f(int n, tuple(b,nr), int &length, table)
{
if(b == False) f (nr, (table lookup nr), 0, table);
// the bool is true so we got a length
else
{
length = nr;
return;
}
// Since this is a recurence it would work as a stack, producing the right output
table update(n, --cnt);
}
The last instruction would work since we are sending cnt by reference. Also we always know that it will finish at some point and cnt should not be < 1.
The easiest optimization (as you have identified) is memoization. You have attempted create a memoization system yourself, however have come across issues on how to store the memoized values. There are solutions to doing this in a maintainable way, such as using a State monad or a STArray. However, there is a much simpler solution to your problem - use haskell's existing memoization. Haskell by default remembers constant values, so if you create a value that stores the collatz values, it will be automatically memoized!
A simple example of this is the following fibonacci definition:
fib :: Int -> Integer
fib n = fibValues !! n where
fibValues = 1 : 1 : zipWith (+) fibValues (tail fibValues)
The fibValues is a [Integer], and as it is just a constant value, it is memoized. However, that doesn't mean it is all memoized at once, since as it is an infinte list, this would never finish. Instead, the values are only calculated when needed, as haskell is lazy.
So if you do something similar with your problem, you will get memoization without a lot of the work. However, using a list like above won't work well in your solution. This is because the collatz algorithm uses many different values to get the result for a given number, so the container used will require random access to be efficient. The obvious choice is an array.
collatzMemoized :: Array Integer Int
Next, we need to fill up the array with the correct values. I'll write this function pretending a collatz function exists that calculates the collatz value for any n. Also, note that arrays are fixed size, so a value needs to be used to determine the maximum number to memoize. I'll use a million, but any value can be used (it is a memory/speed tradeoff).
collatzMemoized = listArray (1, maxNumberToMemoize) $ map collatz [1..maxNumberToMemoize] where
maxNumberToMemroize = 1000000
That is pretty straightforward, the listArray is given bounds, and the a list of all the collatz values in that range is given to it. Remember that this won't calculate all the collatz values straight away, as the values are lazy.
Now, the collatz function can be written. The most important part is to only check the collatzMemoized array if the number being checked is within its bounds:
collatz :: Integer -> Int
collatz 1 = 1
collatz n
| inRange (bounds collatzMemoized) nextValue = 1 + collatzMemoized ! nextValue
| otherwise = 1 + collatz nextValue
where
nextValue = case n of
1 -> 1
n | even n -> n `div` 2
| otherwise -> 3 * n + 1
In ghci, you can now see the effectiveness of the memoization. Try collatz 200000. It will take about 2 seconds to finish. However, if you run it again, it will complete instantly.
Finally, the solution can be found:
maxCollatzUpTo :: Integer -> (Integer, Int)
maxCollatzUpTo n = maximumBy (compare `on` snd) $ zip [1..n] (map collatz [1..n]) where
and then printed:
main = print $ maxCollatzUpTo 1000000
If you run main, the result will be printed in about 10 seconds.
Now, a small problem with this approach is it uses a lot of stack space. It will work fine in ghci (which seems to use be more flexible with regards to stack space). However, if you compile it and try to run the executable, it will crash (with a stack space overflow). So to run the program, you have to specify more when you compile it. This can be done by adding -with-rtsopts='K64m' to the compile options. This increases the stack to 64mb.
Now the program can be compiled and ran:
> ghc -O3 --make -with-rtsopts='-K6m' problem.hs
Running ./problem will give the result in less than a second.
You are going about memoization the hard way, trying to write an imperative program in Haskell. Borrowing from David Eisenstat's solution, we'll solve it as j_random_hacker suggested:
collatzLength :: Integer -> Integer
collatzLength n
| n == 1 = 1
| even n = 1 + collatzLength (n `div` 2)
| otherwise = 1 + collatzLength (3*n + 1)
The dynamic programming solution for this is to replace the recursion with looking things up in a table. Let's make a function where we can replace the recursive call:
collatzLengthDef :: (Integer -> Integer) -> Integer -> Integer
collatzLengthDef r n
| n == 1 = 1
| even n = 1 + r (n `div` 2)
| otherwise = 1 + r (3*n + 1)
Now we could define the recursive algorithm as
collatzLength :: Integer -> Integer
collatzLength = collatzLengthDef collatzLength
Now we could also make a tabled version of this (it takes a number for the table size, and returns a collatzLength function that is calculated using a table of that size):
-- A utility function that makes memoizing things easier
buildTable :: (Ix i) => (i, i) -> (i -> e) -> Array i e
buildTable bounds f = array $ map (\x -> (x, f x)) $ range bounds
collatzLengthTabled :: Integer -> Integer -> Integer
collatzLengthTabled n = collatzLengthTableLookup
where
bounds = (1, n)
table = buildTable bounds (collatzLengthDef collatzLengthTableLookup)
collatzLengthTableLookup =
\x -> Case inRange bounds x of
True -> table ! x
_ -> (collatzLengthDef collatzLengthTableLookup) x
This works by defining the collatzLength to be a table lookup, with the table being the definition of the function, but with recursive calls replaced by table lookup. The table lookup function checks to see if the argument to the function is in the range that is tabled, and falls back on the definition of the function. We can even make this work for tabling any function like this:
tableRange :: (Ix a) => (a, a) -> ((a -> b) -> a -> b) -> a -> b
tableRange bounds definition = tableLookup
where
table = buildTable bounds (definition tableLookup)
tableLookup =
\x -> Case inRange bounds x of
True -> table ! x
_ -> (definition tableLookup) x
collatzLengthTabled n = tableRange (1, n) collatzLengthDef
You just need to make sure that you
let memoized = collatzLengthTabled 10000000
... memoized ...
So that only one table is built in memory.
I remember finding memoisation of dynamic programming algorithms very counterintuitive in Haskell, and it's been a while since I've done it, but hopefully the following trick works for you.
But first, I don't quite understand your current DP scheme, though I suspect it may be quite inefficient as it seems like it will need to update many entries for each answer. (a) I don't know how to do this in Haskell, and (b) you don't need to do this to solve the problem efficiently ;-)
I suggest the following approach instead: first build an ordinary recursive function that computes the right answer for an input number. (Hint: it will have a signature like collatzLength :: Int -> Int.) When you have this function working, just replace its definition with the definition of an array whose elements are defined lazily with the array function using an association list, and replace all recursive calls to the function to array lookups (e.g. collatzLength 42 would become collatzLength ! 42). This will automagically populate the array in the necessary order! So your "top-level" collatzLength object will now actually be an array, rather than a function.
As I suggested above, I would use an array instead of a map datatype to hold the DP table, since you will need to store values for all integer indices from 1 up to 1,000,000.
I don't have a Haskell compiler handy, so I apologize for any broken code.
Without memoization, there's a function
collatzLength :: Integer -> Integer
collatzLength n
| n == 1 = 1
| even n = 1 + collatzLength (n `div` 2)
| otherwise = 1 + collatzLength (3*n + 1)
With memoization, the type signature is
memoCL :: Map Integer Integer -> Integer -> (Map Integer Integer, Integer)
since memoCL receives a table as input and gives the updated table as output. What memoCL needs to do is intercept the return of the recursive call with a let form and insert the new result.
-- table must have an initial entry for 1
memoCL table n = case Map.lookup n table of
Just m -> (table, m)
Nothing -> let (table', m) = memoCL table (collatzStep n) in (Map.insert n (1 + m) table', 1 + m)
collatzStep :: Integer -> Integer
collatzStep n = if even n then n `div` 2 else 3*n + 1
At some point you'll get sick of the above idiom. Then it's time for monads.
I eventually modify the {-WRONG-} part to do what it should with a call to mark x (b, n) [] xs table where
mark :: Integer -> (Bool, Integer) -> [Integer] -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
mark crtElem (b, n) list xs table
| b == False = mark n (findElem n table) (crtElem:list) xs table
| otherwise = continueWith n list xs table
continueWith :: Integer -> [Integer] -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
continueWith _ [] xs table = solve' xs table
continueWith cnt (y:ys) xs table = continueWith (cnt - 1) ys xs (Map.insert y (True, cnt - 1) table)
findElem :: Integer -> Map Integer (Bool, Integer) -> (Bool, Integer)
findElem n table =
case Map.lookup n table of
Nothing -> (False, 0)
Just (b, nr) -> (b, nr)
But it seams that there are better (and far less verbose) answers than this 1
Maybe you might find interesting how I solved the problem. Its is pretty functional though it might be not the most efficient thing on earth :)
You can find the code here: https://github.com/fmancinelli/project-euler/blob/master/haskell/project-euler/Problem014.hs
P.S.: Disclaimer: I was doing Project Euler exercises in order to learn Haskell, so the quality of the solution could be debatable.
Since we are studying recursion schemes, here's one for you.
Let's consider functor N(A,B,X)=A+B*X, which is a stream of Bs with the last element being A.
{-# LANGUAGE DeriveFunctor
, TypeFamilies
, TupleSections #-}
import Data.Functor.Foldable
import qualified Data.Map as M
import Data.List
import Data.Function
import Data.Int
data N a b x = Z a | S b x deriving (Functor)
This stream is handy for several kinds of iterations. For one, we can use it to represent a chain of Ints in a Collatz sequence:
type instance Base Int64 = N Int Int64
instance Foldable Int64 where
project 1 = Z 1
project x | odd x = S x $ 3*x+1
project x = S x $ x `div` 2
This is just a algebra, not a initial one, because the transformation is not a isomorphism (same chain of Ints is part of a chain for 2*x and (x-1)/3), but this is sufficient to represent the fixpoint Base Int64 Int64.
With this definition, cata is going to feed the chain to the algebra given to it, and you can use it to construct a memo Map of integers to the chain length. Finally, anamorphism can use it to generate a stream of solutions to the problem of different sizes:
problems = ana (uncurry $ cata . phi) (M.empty, 1) where
phi :: M.Map Int64 Int ->
Base Int64 (Prim [(Int64, Int)] (M.Map Int64 Int, Int64)) ->
Prim [(Int64, Int)] (M.Map Int64 Int, Int64)
phi m (Z v) = found m 1 v
phi m (S x ~(Cons (_, v') (m', _))) = maybe (notFound m' x v') (found m x) $
M.lookup x m
The ~ before (Cons ...) means lazy pattern matching. We don't touch the pattern until the values are needed. If not for lazy pattern matching, it would always construct the whole chain, and using the map would be useless. With lazy pattern matching we only construct the values v' and m' if the chain length for x was not in the map.
Helper functions construct the stream of (Int, chain length) pairs:
found m x v = Cons (x, v) (m, x+1)
notFound m x v = Cons (x, 1+v) (M.insert x (1+v) m, x+1)
Now just take the first 999999 problems, and figure out the one that has the longest chain:
main = print $ maximumBy (compare `on` snd) $ take 999999 problems
This works slower than array-based solution, because Map lookup is logarithmic of map size, but this solution is not fixed size. Still, it finishes in about 5 seconds.