I am implementing an algorithm in Haskell. The algorithm requires generating some data.
I have a function for the algorithm that takes a generation function as a parameter. For example, say the algorithm just multiplies the input data by n:
algo :: a -> ??? -> [a]
algo n dgf = map (\x -> x * n) $ dgf
dgf is used to generate the data. How do I write the type signature correctly, given that dgf can be any function with any number of parameters?
Another variant is to accept already generated data instead of the generation function:
algo :: Num a => a -> [a] -> [a]
algo n d = map (\x -> n*x) d
So, now let's imagine I'm generating the data with StdGen, which uses IO. How can I make the function more generic, so that it accepts both an IO action and plain values like [1,2,3]? This also applies to the function variant, since the generation function can also produce IO.
All in all, which solution is better: passing a generation function or passing pre-generated data?
Thanks in advance.
One option is to take a stream rather than a list. If generating the values involves performing IO, and there may be a great many values, this is often the best approach. Several packages offer streams of some sort; I'll use the streaming package in this example.
import qualified Streaming.Prelude as S
import Streaming
algo :: (Num a, Monad m) => a -> Stream (Of a) m r -> Stream (Of a) m r
algo a = S.map (a +)
You can read Stream (Of a) m r as "a way to use operations in m to produce successive values of type a and finally a result of type r". This algo function doesn't commit to any particular way of generating the data; they can be created purely:
algo a (S.each [these, are, my, elements])
or within IO,
algo a $ S.takeWhile (> 3) (S.readLn :: Stream (Of Int) IO ())
or using a randomness monad, or whatever you like.
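To make it concrete why a stream decouples the algorithm from the source of its data, here is a stripped-down, hand-rolled stream type. This is only a sketch: it is not the streaming package's actual Stream (which is more general and has far more operations). The names each, mapS, toListS and fromActions are made up here and only loosely mirror S.each, S.map and S.toList_.

```haskell
import Data.Functor.Identity (Identity (..))

-- Simplified stand-in for Stream (Of a) m r (NOT the streaming package's type):
-- a value followed by the rest, an effect in m producing the rest, or a final result.
data Stream a m r
  = Yield a (Stream a m r)
  | Effect (m (Stream a m r))
  | Return r

-- Turn a pure list into a stream (loosely mirrors S.each).
each :: [a] -> Stream a m ()
each = foldr Yield (Return ())

-- Turn a list of actions into a stream whose elements come from running them.
fromActions :: Functor m => [m a] -> Stream a m ()
fromActions = foldr (\act rest -> Effect (fmap (`Yield` rest) act)) (Return ())

-- Transform every element, for any Functor m (loosely mirrors S.map).
mapS :: Functor m => (a -> b) -> Stream a m r -> Stream b m r
mapS f (Yield a rest) = Yield (f a) (mapS f rest)
mapS f (Effect m)     = Effect (fmap (mapS f) m)
mapS _ (Return r)     = Return r

-- Run the effects and collect the elements (loosely mirrors S.toList_).
toListS :: Monad m => Stream a m r -> m [a]
toListS (Yield a rest) = (a :) <$> toListS rest
toListS (Effect m)     = m >>= toListS
toListS (Return _)     = pure []

-- The algorithm itself never mentions IO, lists or any concrete monad.
algo :: (Num a, Functor m) => a -> Stream a m r -> Stream a m r
algo n = mapS (n +)

-- Pure use: Identity performs no effects.
pureResult :: [Int]
pureResult = runIdentity (toListS (algo 10 (each [1, 2, 3])))
```

The same algo also accepts an IO-backed stream, for example toListS (algo 10 (fromActions [readLn, readLn])), without any change to its definition.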
For contrast, I'm going to take the opposite approach to dfeuer's answer.
Just use lists.
Consider your first example:
algo :: a -> ??? -> [a]
algo n dgf = map (\x -> x * n) $ dgf
You ask "How to write function header correctly, as dgf can be any function with any number of parameters?"
Well, one way is to use uncurrying.
Normally, Haskell functions are curried. If we have a function like
add :: Int -> Int -> Int
add x y = x + y
and we want a function that adds two to its input, we can just use add 2:
>>> map (add 2) [1..10]
[3,4,5,6,7,8,9,10,11,12]
This works because add is not actually a function that takes two arguments;
it's a function of one argument that returns a function of one argument.
We could have added parentheses to the argument of add above to make this more clear:
add :: Int -> (Int -> Int)
In Haskell, all functions are functions of one argument.
However, we can also go the other way - uncurry a function
that returns a function to get a function that takes a pair:
>>> :t uncurry
uncurry :: (a -> b -> c) -> (a, b) -> c
>>> :t uncurry add
uncurry add :: (Int, Int) -> Int
This can also be useful, say if we want to find the sum of each pair in a list:
>>> map (uncurry add) [ (1,2), (3,4), (5,6), (7,8), (9,10) ]
[3,7,11,15,19]
In general, we can uncurry any function of type a0 -> a1 -> ... -> aN -> b
into a function (a0, a1, ..., aN) -> b, though there might not be
a cute library function to do it for us.
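For three arguments, for instance, the Prelude has no uncurry3, but writing one by hand takes a single line. The names uncurry3 and add3 below are hypothetical, chosen for this sketch.

```haskell
-- Hand-written uncurrying for three arguments (not a Prelude function).
uncurry3 :: (a -> b -> c -> d) -> (a, b, c) -> d
uncurry3 f (x, y, z) = f x y z

-- A curried three-argument function to uncurry.
add3 :: Int -> Int -> Int -> Int
add3 x y z = x + y + z
```

With these, map (uncurry3 add3) [(1, 2, 3), (4, 5, 6)] evaluates to [6, 15].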
With that in mind, we could implement algo by passing it an uncurried
function and a tuple of values:
algo :: Num a => a -> (t -> [a]) -> t -> [a]
algo n f t = map (\x -> x * n) $ f t
And then use anonymous functions to uncurry our argument functions:
>>> algo 2 (\(lo,hi) -> enumFromTo lo hi) (5, 10)
[10,12,14,16,18,20]
>>> algo 3 (\(a,b,c,d) -> zipWith (+) [a..b] [c..d]) (1, 5, 10, 14)
[33,39,45,51,57]
Now we could do it this way, but we don't need to. As implemented above,
algo is only using f and t once. So why not pass it the list directly?
algo' :: Num a => a -> [a] -> [a]
algo' n ns = map (\x -> x * n) ns
It calculates the same results:
>>> algo' 2 $ (\(lo,hi) -> enumFromTo lo hi) (5, 10)
[10,12,14,16,18,20]
>>> algo' 2 $ enumFromTo 5 10
[10,12,14,16,18,20]
>>> algo' 3 $ (\(a,b,c,d) -> zipWith (+) [a..b] [c..d]) (1, 5, 10, 14)
[33,39,45,51,57]
>>> algo' 3 $ zipWith (+) [1..5] [10..14]
[33,39,45,51,57]
Furthermore, since Haskell is non-strict, the argument to algo' isn't evaluated
until it's actually used, so we don't have to worry about "wasting" time computing
arguments that won't actually be used:
algo'' :: Num a => a -> [a] -> [a]
algo'' n ns = [n,n,n,n]
algo'' doesn't use the list passed to it, so it's never forced, so whatever
computation is used to calculate it never runs:
>>> let isPrime n = n >= 2 && null [ i | i <- [2..n-1], n `rem` i == 0 ]
>>> :set +s
>>> isPrime 10000019
True
(6.18 secs, 2,000,067,648 bytes)
>>> algo'' 5 (filter isPrime [1..999999999999999])
[5,5,5,5]
(0.01 secs, 68,936 bytes)
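The same point can be checked without timing anything. The sketch below repeats the algo' and algo'' definitions from above so that it is self-contained. It then feeds algo' an infinite list and passes algo'' an undefined argument. Both calls are fine because only the part of the argument that is actually demanded ever gets evaluated.

```haskell
-- Same definitions as above.
algo' :: Num a => a -> [a] -> [a]
algo' n ns = map (\x -> x * n) ns

-- Ignores its list argument, so that argument is never forced.
algo'' :: Num a => a -> [a] -> [a]
algo'' n _ = [n, n, n, n]
```

For example, take 3 (algo' 2 [1 ..]) gives [2, 4, 6], and algo'' 5 undefined gives [5, 5, 5, 5] without ever touching undefined.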
Now to the second part of your question - what if your data is being generated within some monad?
Rather than convince algo to operate on monadic values, you could take the
stream-based approach as dfeuer explains. Or you could just use a list.
Just because you're in a monad doesn't mean that your values suddenly become strict.
For example, want an infinite list of random numbers? No problem.
import Data.List (unfoldr)
import System.Random (Random, newStdGen, random)
newRandoms :: Random a => IO [a]
newRandoms = unfoldr (Just . random) <$> newStdGen
Now I can just pass those to some algorithm:
>>> rints <- newRandoms :: IO [Int]
(0.00 secs, 60,624 bytes)
>>> algo'' 5 rints
[5,5,5,5]
(0.00 secs, 68,920 bytes)
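The same holds without the random package. In this base-only sketch, the hypothetical lcgNext is a toy linear congruential step standing in for StdGen. It is not a good random number generator, and it assumes a 64-bit Int so that the multiplication cannot overflow. The IO action newToyRandoms (also a made-up name) returns an infinite list, and only the prefix a consumer forces is ever computed.

```haskell
import Data.List (unfoldr)

-- Hypothetical toy generator standing in for StdGen; NOT a good RNG.
-- One linear congruential step: returns (output, next state).
-- Assumes a 64-bit Int, so 1103515245 * s cannot overflow for s < 2^31.
lcgNext :: Int -> (Int, Int)
lcgNext s = let s' = (1103515245 * s + 12345) `mod` 2147483648 in (s', s')

-- An IO action whose result is an infinite list; elements are only
-- computed when a consumer forces them.
newToyRandoms :: Int -> IO [Int]
newToyRandoms seed = pure (unfoldr (Just . lcgNext) seed)
```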
For a small program which is just reading input from a file or two, there's no problem
with just using readFile and lazy I/O to get a list to operate on.
For example
>>> import Data.List (isInfixOf)
>>> let grep pat lines = [ line | line <- lines, pat `isInfixOf` line ]
>>> :set +s
>>> dict <- lines <$> readFile "/usr/share/dict/words"
(0.01 secs, 81,504 bytes)
>>> grep "poop" dict
["apoop","epoophoron","nincompoop","nincompoopery","nincompoophood","nincompoopish","poop","pooped","poophyte","poophytic","whisterpoop"]
(0.72 secs, 423,650,152 bytes)
I'm trying to understand function composition in Haskell.
According to ZVON http://zvon.org/other/haskell/Outputprelude/filter_f.html
the filter function takes two arguments, a Boolean function and a list.
For example, filter (>5) [1,2,3,4,5,6,7,8] returns the elements greater than 5:
[6,7,8]
Question: how does the following line, with several function compositions, pass in a Boolean function for filter to use?
map fst . filter snd . assocs . soeA
Shouldn't it be map fst . filter (==True) snd . assocs . soeA?
To analyze it, I ran the two functions that are applied first, passing an argument: assocs . soeA $ 9 returns
[(0,False),(1,False),(2,True),(3,True),(4,False),(5,True),(6,False),(7,True),(8,False),(9,False)]
soe 9 returns [2,3,5,7]
Somehow the Bool value in each array element of soeA is being used, but any help explaining how this composition works would be very much appreciated.
The full code is:
```haskell
{-# LANGUAGE ScopedTypeVariables #-}
module FastSeive where

import Control.Monad
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Unboxed

soeST :: forall s. Int -> ST s (STUArray s Int Bool)
soeST n = do
  arr <- newArray (0, n) True
  mapM_ (\i -> writeArray arr i False) [0, 1]
  let n2 = n `div` 2
  let loop :: Int -> ST s ()
      loop i | i > n2 = return ()
      loop i = do
        b <- readArray arr i
        let reset :: Int -> ST s ()
            reset j | j > n = return ()
            reset j = writeArray arr j False >> reset (j + i)
        when b (reset (2*i))
        loop (succ i)
  loop 2
  return arr

soeA :: Int -> UArray Int Bool
soeA n = runST (soeST n >>= freeze)

soe :: Int -> [Int]
soe = map fst . filter snd . assocs . soeA

soeCount :: Int -> Int
soeCount = length . filter id . elems . soeA
```
The short answer is: here, snd is the Bool-returning function that filter expects. In the expression you wrote, map fst . filter (==True) snd . assocs . soeA, (==True) would be filter's first argument and snd its second. That won't typecheck: filter's second argument must be a list, and snd is a function. Even if it were a list, filter applied to both of its arguments is a list rather than a function, so it could not be used in a function composition.
For a longer answer, we can actually apply (.)'s definition to find out what's happening:
(f . g) x = f (g x)
-- In Haskell, (.) is declared right-associative (infixr 9 .),
-- meaning that if we put in explicit parentheses, we'd have:
soe = (map fst . (filter snd . (assocs . soeA)))
-- That only really matters for the compiler, though,
-- because we know function composition is associative.
soe = map fst . filter snd . assocs . soeA
-- "Un-pointfree-ing" it:
soe x = (map fst . filter snd . assocs . soeA) x
-- Applying (.)'s definition:
soe x = map fst ((filter snd . assocs . soeA) x)
-- Again:
soe x = map fst (filter snd ((assocs . soeA) x))
-- And again:
soe x = map fst (filter snd (assocs (soeA x)))
It's now clear that snd is filter's first argument, while the second argument will evaluate to what assocs (soeA x) will evaluate to.
More generally, when one writes f . g . h, this can be read right-to-left as a function that first applies h to its argument, then g to the result, then f to the next result, and yields that final value.
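To see the pipeline on concrete data, here is a small sketch. It types out by hand the list that assocs . soeA $ 9 produced in the question (sieveAssocs is a hypothetical name). It then applies the remaining two stages, once point-free and once with each stage named.

```haskell
-- Hand-typed copy of assocs (soeA 9) from the question.
sieveAssocs :: [(Int, Bool)]
sieveAssocs =
  [(0,False),(1,False),(2,True),(3,True),(4,False),(5,True),(6,False),(7,True),(8,False),(9,False)]

-- Point-free: snd is the (Int, Bool) -> Bool predicate given to filter.
primesFrom :: [(Int, Bool)] -> [Int]
primesFrom = map fst . filter snd

-- The same pipeline with each stage named, read right to left above.
primesFromSteps :: [(Int, Bool)] -> [Int]
primesFromSteps pairs =
  let kept = filter snd pairs  -- keep the pairs whose Bool is True
  in map fst kept              -- drop the Bools, keep the indices
```

Both functions return [2, 3, 5, 7] on sieveAssocs, matching soe 9.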
Now, for the even longer answer, we can look at how the types for your expression will be inferred. It'll tell us why snd is the Bool-returning function filter expects even though it has a type signature of snd :: (a, b) -> b.
Disclaimer: I don't have a background in compiler engineering; the terms I'll be using may be inexact.
The type of filter is (a -> Bool) -> [a] -> [a]. The type of snd is (a, b) -> b.
Those are actually parameterized types. We can make the type parameters explicit:
filter :: forall a. (a -> Bool) -> [a] -> [a]
snd :: forall a b. (a, b) -> b
We'll also rename filter's type variable to keep it unambiguous in what follows:
filter :: forall c. (c -> Bool) -> [c] -> [c]
filter gets applied first to snd. So we can try to unify c -> Bool from filter with (a, b) -> b, snd's type. We get these equations:
c -> Bool = (a, b) -> b
===
c = (a, b)
b = Bool
===
c = (a, Bool)
b = Bool
Because soeA x :: UArray Int Bool, assocs (soeA x) has type [(Int, Bool)]. As filter's second argument has the type [c], we can unify further:
[c] = [(Int, Bool)]
===
c = (Int, Bool)
This also gives us:
(Int, Bool) = c = (a, Bool)
===
a = Int
So, after type application, we get these concrete types for our sub-expressions:
filter :: ((Int, Bool) -> Bool) -> [(Int, Bool)] -> [(Int, Bool)]
snd :: (Int, Bool) -> Bool
Well, of course, we could have used GHC's type inference all along to tell us this, either in GHCi or through a text editor's Haskell plugin.
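One way to have GHC confirm the instantiations derived above is to spell them out with explicit type applications. The type checker then accepts or rejects them. This sketch assumes GHC with the TypeApplications extension. The names filterPairs, sndPair and keepTrue are made up for illustration.

```haskell
{-# LANGUAGE TypeApplications #-}

-- filter instantiated at c = (Int, Bool); GHC rejects this if the derivation is wrong.
filterPairs :: ((Int, Bool) -> Bool) -> [(Int, Bool)] -> [(Int, Bool)]
filterPairs = filter @(Int, Bool)

-- snd instantiated at a = Int, b = Bool (type arguments follow the forall order).
sndPair :: (Int, Bool) -> Bool
sndPair = snd @Int @Bool

-- The filter stage of soe, rebuilt from the explicitly instantiated pieces.
keepTrue :: [(Int, Bool)] -> [(Int, Bool)]
keepTrue = filterPairs sndPair
```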
Is there a variant of
sortBy :: (a -> a -> Ordering) -> [a] -> [a]
(in Data.List) that allows me to use a a -> a -> Maybe Ordering sorting function instead of a -> a -> Ordering?
What this variant would do is this:
sortBy' :: (a -> a -> Maybe Ordering) -> [a] -> Maybe [a]
If a -> a -> Maybe Ordering ever returns Nothing when it's called during the sort, sortBy' would return Nothing. Otherwise it would return the sorted list wrapped in Just.
If such a variant is not already available, can you please help me construct one? (Preferably one that is at least as efficient as sortBy.)
You can adapt quicksort:
quickSortBy :: (a -> a -> Maybe Ordering) -> [a] -> Maybe [a]
quickSortBy _ [] = Just []
quickSortBy f (x:xs) = do
  comparisons <- fmap (zip xs) $ mapM (f x) xs
  sortLesser <- quickSortBy f . map fst $ filter ((`elem` [GT, EQ]) . snd) comparisons
  sortUpper <- quickSortBy f . map fst $ filter ((== LT) . snd) comparisons
  return $ sortLesser ++ [x] ++ sortUpper
At the very least, assume that your comparison function f :: a -> a -> Maybe Ordering is antisymmetric: f x y == Just LT if and only if f y x == Just GT. Then, when quickSortBy f returns Just [x1,...,xn], I think you have this guarantee: for all i in [1..n-1], f x_i x_(i+1) is Just LT or Just EQ.
When in particular f is a partial order (transitive), then [x1,...,xn] is totally ordered.
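Quicksort has a quadratic worst case, whereas GHC's Data.List.sortBy is a merge sort with O(n log n) comparisons. To match that bound, the same Maybe-threading idea can be applied to a top-down merge sort. This is a sketch rather than a drop-in replacement: mergeSortByM is a made-up name, and the code is not tuned like sortBy. It returns Nothing as soon as any comparison it performs returns Nothing. It is stable because ties take the element from the left half first.

```haskell
-- mergeSortByM is a hypothetical name for this sketch.
mergeSortByM :: (a -> a -> Maybe Ordering) -> [a] -> Maybe [a]
mergeSortByM cmp = go
  where
    -- Split in half, sort both halves, then merge; any Nothing aborts the sort.
    go []  = Just []
    go [x] = Just [x]
    go xs  = do
      let (l, r) = splitAt (length xs `div` 2) xs
      l' <- go l
      r' <- go r
      merge l' r'

    -- Merge two sorted lists, asking cmp only about the two front elements.
    merge [] ys = Just ys
    merge xs [] = Just xs
    merge (x : xs) (y : ys) = do
      o <- cmp x y
      case o of
        GT -> (y :) <$> merge (x : xs) ys
        _  -> (x :) <$> merge xs (y : ys)  -- LT or EQ: take from the left (stable)
```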
I'm interested in efficient functional algorithms (preferably in Haskell, and even more preferably already implemented as part of a library!) for computing the closure of a container under a unary operator.
A basic and inefficient example of what I have in mind, for lists, is:
import Data.List (nub, sort)
closure :: Ord a => (a -> a) -> [a] -> [a]
closure f xs = first_dup (iterate (\xs -> nub $ sort $ xs ++ map f xs) xs) where
  first_dup (xs:ys:rest) = if xs == ys then xs else first_dup (ys:rest)
A more efficient implementation keeps track of the new elements generated at each stage (the "fringe") and doesn't apply the function to elements to which it has already been applied:
closure' :: Ord a => (a -> a) -> [a] -> [a]
closure' f xs = stable (iterate close (xs, [])) where
  -- return list when it stabilizes, i.e., when fringe is empty
  stable ((fringe,xs):iterates) = if null fringe then xs else stable iterates
  -- one iteration of closure on (fringe, rest); key invariants:
  -- (1) fringe and rest are disjoint; (2) (map f rest) subset (fringe ++ rest)
  close (fringe, xs) = (fringe', xs') where
    xs' = sort (fringe ++ xs)
    -- nub drops duplicates when f maps two fringe elements to the same value
    fringe' = nub (filter (`notElem` xs') (map f fringe))
As an example, if xs is a nonempty sublist of [0..19], then closure' (\x->(x+3)`mod`20) xs is [0..19], and the iteration stabilizes in 20 steps for [0], 13 steps for [0,1], and 4 steps for [0,4,8,12,16].
Even more efficiency could be gotten using a tree-based ordered-set implementation.
Has this been done already? What about the related but harder question of closure under binary (or higher-arity) operators?
How about something like this, which uses the hash array mapped trie (HAMT) data structures in unordered-containers? In unordered-containers, member and insert are O(min(n, W)), where W is the number of bits in the hash.
module Closed where
import Data.HashSet (HashSet)
import Data.Hashable
import qualified Data.HashSet as Set
data Closed a = Closed { seen :: HashSet a, iter :: a -> a }
insert :: (Hashable a, Eq a) => a -> Closed a -> Closed a
insert a c@(Closed set iter)
  | Set.member a set = c
  | otherwise = insert (iter a) $ Closed (Set.insert a set) iter
empty :: (a -> a) -> Closed a
empty = Closed Set.empty
close :: (Hashable a, Eq a) => (a -> a) -> [a] -> Closed a
close iter = foldr insert (empty iter)
Here's a variation on the above that generates the solution set more lazily, in a breadth-first manner.
data Closed' a = Unchanging | Closed' (a -> a) (HashSet a) (Closed' a)
close' :: (Hashable a, Eq a) => (a -> a) -> [a] -> Closed' a
close' iter = build Set.empty where
  inserter :: (Hashable a, Eq a) => a -> (HashSet a, [a]) -> (HashSet a, [a])
  inserter a (set, fresh) | Set.member a set = (set, fresh)
                          | otherwise = (Set.insert a set, a:fresh)
  build curr [] = Unchanging
  build curr as =
    Closed' iter curr $ step (foldr inserter (curr, []) as)
  step (set, added) = build set (map iter added)
-- Only computes enough iterations of the closure to
-- determine whether a particular element has been generated yet
--
-- Returns both a boolean and a new 'Closed'' value which will
-- be more precisely defined and thus be faster to query
member :: (Hashable a, Eq a) => a -> Closed' a -> (Bool, Closed' a)
member _ Unchanging = (False, Unchanging)
member a c@(Closed' _ set next)
  | Set.member a set = (True, c)
  | otherwise = member a next
improve :: Closed' a -> Maybe ([a], Closed' a)
improve Unchanging = Nothing
improve (Closed' _ set next) = Just (Set.toList set, next)
seen' :: Closed' a -> HashSet a
seen' Unchanging = Set.empty
seen' (Closed' _ set Unchanging) = set
seen' (Closed' _ set next) = seen' next
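And to check (the first query never returns, because the eager Closed tries to build the entire infinite closure of (+1) starting from 0, while close' stops as soon as 6 has been generated):
>>> Set.member 6 . seen $ close (+1) [0]
...
>>> fst . member 6 $ close' (+1) [0]
True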
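For element types that have an Ord instance but no Hashable instance, the same insert-until-already-seen idea as Closed can be sketched with Data.Set from the containers package. Data.Set is the tree-based ordered set mentioned in the question, and its member and insert are O(log n). The names closureSet and closureBin are hypothetical. closureBin extends the idea to the question's binary-operator follow-up by pairing each newly added element with everything already seen, in both argument orders. Like the first Closed variant, both functions loop forever if the closure is infinite.

```haskell
import qualified Data.Set as Set

-- Closure of the starting elements under a unary function f.
-- Each distinct element is inserted once, so f is applied to it exactly once.
closureSet :: Ord a => (a -> a) -> [a] -> Set.Set a
closureSet f = go Set.empty
  where
    go seen [] = seen
    go seen (x : pending)
      | x `Set.member` seen = go seen pending
      | otherwise           = go (Set.insert x seen) (f x : pending)

-- Closure under a binary operator: when x is new, combine it with every
-- element seen so far (including x itself), in both argument orders.
closureBin :: Ord a => (a -> a -> a) -> [a] -> Set.Set a
closureBin op = go Set.empty
  where
    go seen [] = seen
    go seen (x : pending)
      | x `Set.member` seen = go seen pending
      | otherwise =
          let seen' = Set.insert x seen
              new   = concatMap (\y -> [op x y, op y x]) (Set.toList seen')
          in go seen' (new ++ pending)
```

For example, closureSet (\x -> (x + 3) `mod` 20) [0] contains all of [0..19], matching the question's list-based closure'.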
For simple problems like Fibonacci, writing in CPS is relatively straightforward:
let fibonacciCPS n =
    let rec fibonacci_cont a cont =
        if a <= 2 then cont 1
        else
            fibonacci_cont (a - 2) (fun x ->
                fibonacci_cont (a - 1) (fun y ->
                    cont (x + y)))
    fibonacci_cont n (fun x -> x)
However, in the rod-cutting example from here (or from the book Introduction to Algorithms), the number of closures is not always 2 and can't be hard-coded.
I imagine one has to change the intermediate variables into sequences.
(I like to think of the continuation as a contract saying "when you have the value, pass it on to me, and I'll pass it on to my boss after processing it", or something along those lines, which defers the actual execution.)
For rod cutting, we have:
//rod cutting
let p = [|1;5;8;9;10;17;17;20;24;30|]
let rec r n = seq { yield p.[n-1]; for i in 1..(n-1) -> (p.[i-1] + r (n-i)) } |> Seq.max
[1 .. 10] |> List.map (fun i -> i, r i)
In this case, I will need to attach the newly created continuation
let cont' = fun (ks: _ array) -> cont(seq { yield p.[n-1]; for i in 1..(n-1) -> (p.[i-1] + ks.[n-i]) } |> Seq.max)
to the "Cartesian product" continuation made by the returning subproblems.
Has anyone seen a CPS version of rod cutting, or does anyone have tips on this?
I assume you want to explicitly CPS everything, which means some nice features like the sequence expression will be lost (maybe async blocks can help; I don't know F# very well). So, starting from a simple recursive function:
let rec cutrod (prices: int[]) = function
    | 0 -> 0
    | n ->
        [1 .. min n (prices.Length - 1)]
        |> List.map (fun i -> prices.[i] + cutrod prices (n - i))
        |> List.max
It's clear that we need CPS versions of the list functions used (map, max and perhaps a list-building function if you want to CPS the [1..(blah)] expression too). map is quite interesting since it's a higher-order function, so its first parameter needs to be modified to take a CPS-ed function instead. Here's an implementation of a CPS List.map:
let rec map_k f list k =
    match list with
    | [] -> k []
    | x :: xs -> f x (fun y -> map_k f xs (fun ys -> k (y :: ys)))
Note that map_k invokes its argument f like any other CPS function, and puts the recursion in map_k into the continuation. With map_k, max_k, gen_k (which builds a list from 1 to some value), the cut-rod function can be CPS-ed:
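(* gen_k and max_k are not spelled out in this answer; one possible CPS definition of each: *)
let gen_k n k = k [1 .. n]
let rec max_k list k =
    match list with
    | [] -> failwith "max_k: empty list"
    | [x] -> k x
    | x :: xs -> max_k xs (fun m -> k (max x m))
let rec cutrod_k (prices: int[]) n k =
    match n with
    | 0 -> k 0
    | n -> gen_k (min n (prices.Length - 1)) (fun indices ->
             map_k (fun i k -> cutrod_k prices (n - i) (fun ret -> k (prices.[i] + ret)))
                   indices
                   (fun totals -> max_k totals k))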