Why is there a let in OCaml's List.map? - syntax

In OCaml 3.12.1, List.map is written as follows:
let rec map f = function
    [] -> []
  | a::l -> let r = f a in r :: map f l
I'd expect that last line to be written as | a::l -> f a :: map f l, but instead, there is a seemingly useless let binding. Why?

I believe it is there to guarantee an order of function application for the map. The order of evaluation of simple expressions in OCaml is unspecified, so without the let the order of applications of f to the elements of the list would be unspecified. Since OCaml is not a pure language, you really would like the order to be specified (f is called on the head of the list first, and so on recursively).
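A small experiment makes the difference visible. In the sketch below, the names map_with_let, map_without_let and trace are made up for illustration; trace records the order in which the mapped function is applied. Only the let version guarantees head-first order. For the other version the order is unspecified, and in practice OCaml compilers often evaluate the right operand of :: first.

```ocaml
(* Same shape as the standard library version: the let forces f a to run first. *)
let rec map_with_let f = function
  | [] -> []
  | a :: l -> let r = f a in r :: map_with_let f l

(* Without the let, the evaluation order of the two operands of :: is unspecified. *)
let rec map_without_let f = function
  | [] -> []
  | a :: l -> f a :: map_without_let f l

(* Run a map function with a callback that logs each argument it receives,
   and return the arguments in the order they were seen. *)
let trace mapper xs =
  let seen = ref [] in
  let _ = mapper (fun x -> seen := x :: !seen; x) xs in
  List.rev !seen
```

trace map_with_let [1; 2; 3] always yields [1; 2; 3], whereas trace map_without_let [1; 2; 3] may well yield [3; 2; 1]. Both functions return the same list; only the order of side effects differs.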

Related

Functional programming with OCAML

I'm new to functional programming and I'm trying to implement a basic algorithm in OCaml for a course I'm currently taking.
I'm trying to implement the following algorithm:
Inputs:
- E: a non-empty set of integers
- s: an integer
- d: a strictly positive float
Output:
- T: a set of integers that is a subset of E
m <- min(E)
T <- {m}
FOR EACH e ∈ sort_ascending(E \ {m}) DO
    IF e > (1+d)m AND e <= s THEN
        T <- T U {e}
        m <- e
RETURN T
let f = fun (l: int list) (s: int) (d: float) ->
  List.fold_left (fun acc x -> if ... then (list_union acc [x]) else acc)
    [(list_min l)] (list_sort_ascending l) ;;
So far, this is what I have, but I don't know how to handle the modification of the "m" variable mentioned in the algorithm. I need help understanding the best way to implement the algorithm; maybe I'm not going in the right direction.
Thanks in advance to anyone who takes the time to help me!
The basic trick of functional programming is that although you can't modify the values of any variables, you can call a function with different arguments. In the initial stages of switching away from imperative ways of thinking, you can imagine making every variable you want to modify into the parameters of your function. To modify the variables, you call the function recursively with the desired new values.
This technique will work for "modifying" the variable m. Think of m as a function parameter instead.
You are already using this technique with acc. Each call inside the fold gets the old value of acc and returns the new value, which is then passed to the function again. You might imagine having both acc and m as parameters of this inner function.
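One way to sketch this, assuming the set is represented as an int list: make the fold's accumulator a pair (acc, m), so that every step returns both the new result list and the new value of m. The name select is made up here, and List.sort from the standard library stands in for the asker's list_sort_ascending and list_min helpers.

```ocaml
(* Hypothetical name; l is the set E, s and d are as in the pseudocode. *)
let select (l : int list) (s : int) (d : float) : int list =
  match List.sort compare l with
  | [] -> []                        (* the problem statement excludes this case *)
  | m0 :: rest ->
    (* The accumulator carries the result so far (reversed) and the current m. *)
    let step (acc, m) e =
      if float_of_int e > (1. +. d) *. float_of_int m && e <= s
      then (e :: acc, e)            (* keep e, and e becomes the new m *)
      else (acc, m)                 (* skip e, m is unchanged *)
    in
    let (acc, _) = List.fold_left step ([m0], m0) rest in
    List.rev acc
```

For example, select [10; 4; 3; 2; 1] 5 0.5 walks the sorted list 1, 2, 3, 4, 10 and keeps 1, 2 and 4: 3 is not greater than 1.5 * 2, and 10 exceeds s.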
Assuming list_min is defined, you should think through the problem methodically. Let's say you represent a set with a list. Your function takes this set and some arguments and returns a subset of the original set, containing the elements that meet certain conditions.
Now, when I read this for the first time, List.filter automatically came to my mind.
List.filter : ('a -> bool) -> 'a list -> 'a list
But you wanted to modify the m so this wouldn't be useful. It's important to know when you can use library functions and when you really need to create your own functions from scratch. You could clearly use filter while handling m as a reference but it wouldn't be the functional way.
First let's focus on your predicate:
let predicate s d m e = (float e) > (1. +. d) *. (float m) && (e <= s)
Note that +. and *. are the addition and multiplication operators for floats, and float is a function that converts an int to a float.
From here on, predicate refers to the function just defined.
Now, this is also a matter of opinion, but I wouldn't use fold_left here: it makes the code more complicated than necessary.
So let's begin with my idea of the code:
let m = list_min l;;
This is the initial m.
Then I will define an auxiliary function that takes m as an argument, with l as your original set, and s, d and m the variables you used in your original imperative code.
let rec f' l s d m =
  match l with
  | [] -> []
  | x :: xs ->
    if predicate s d m x then
      (* x is kept and becomes the new m for the rest of the list *)
      x :: (f' xs s d x)
    else
      f' xs s d m
Then for each element of your set, you check if it satisfies the predicate, and if it does, you call the function again but you replace the value of m with x.
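Finally you could just call f' from a function f. The pseudocode starts with T = {m} and then visits the remaining elements in ascending order, so f sorts the list, keeps its head m, and runs f' over the rest:
let f (l: int list) (s: int) (d: float) =
  match List.sort compare l with
  | [] -> []
  | m :: rest -> m :: f' rest s d m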
Be careful when creating a function like your list_min: what would happen if the list was empty? Normally you would use the option type to handle those cases, but you assumed you're dealing with a non-empty set, so that's fine.
When doing functional programming it's important to think functionally. Pattern matching is highly recommended, while pointers/references should be kept to a minimum. I hope this is useful. Contact me if you have any other doubts or recommendations.

Is there an F# equivalent of Enumerable.DefaultIfEmpty?

After searching quite a bit, I couldn't find an F# equivalent of Enumerable.DefaultIfEmpty.
Does something similar exist in F# (perhaps in a different, idiomatic way)?
To preserve the laziness of the sequence, we could work with the enumerator's state.
let DefaultIfEmpty (l:'t seq) (d:'t) =
    seq {
        use en = l.GetEnumerator()
        if en.MoveNext() then
            yield en.Current
            while en.MoveNext() do
                yield en.Current
        else
            yield d }
Seq module functions operate on and return IEnumerable<_> values, and so does Enumerable.DefaultIfEmpty. How about just wrapping it in a function that is composable?
let inline DefaultIfEmpty d l = System.Linq.Enumerable.DefaultIfEmpty(l, d)
This also preserves laziness.
example:
Seq.empty |> DefaultIfEmpty 0
Update
I've made an open source library that wraps many extension and static methods as inline functions, including Enumerable.defaultIfEmpty: ComposableExtensions
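There are a few options:
- Use Enumerable.DefaultIfEmpty directly, which might be non-idiomatic but will work.
- Write your own, like so:
let DefaultIfEmpty (l:'t seq) (d:'t) =
    match Seq.length l with | 0 -> seq [d] | _ -> l
- Seq.length never returns on an infinite sequence. If you need to worry about infinite sequences, test with Seq.isEmpty instead, which looks only at the first element:
let DefaultIfEmpty (l:'t seq) (d:'t) =
    match Seq.isEmpty l with | true -> seq [d] | false -> l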
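For comparison, the enumerator-based approach carries over to OCaml's lazy Seq type (standard library, OCaml 4.07 or later). This is a sketch in OCaml rather than F#, and default_if_empty is a name chosen here. It forces only the first node of the sequence and hands that same node back, so the head is not evaluated a second time.

```ocaml
(* Return s unchanged if it has at least one element,
   otherwise a one-element sequence containing d. *)
let default_if_empty (d : 'a) (s : 'a Seq.t) : 'a Seq.t =
  fun () ->
    match s () with
    | Seq.Nil -> Seq.Cons (d, Seq.empty)  (* empty input: yield the default *)
    | node -> node                        (* non-empty: reuse the forced node *)
```

Because nothing beyond the first node is forced, this also works on infinite sequences.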

F# equivalent of LINQ Single

OK, so for most LINQ operations there is an F# equivalent.
(Generally in the Seq module, since seq = IEnumerable.)
I can't find the equivalent of IEnumerable.Single. I prefer Single over First (which is Seq.find) because it is more defensive: it asserts for me that the state is what I expect.
So I see a couple of solutions (other than using Seq.find).
(These could be written as extension methods)
The type signature for this function, which I'm calling only, is
('a->bool) -> seq<'a> -> 'a
let only = fun predicate src -> System.Linq.Enumerable.Single<'a>(src, predicate)
let only2 = Seq.filter >> Seq.exactlyOne
only2 is preferred, however it won't compile (any clues on that?).
In F# 2.0, here is a solution that works without enumerating the whole sequence (close to your 2nd approach):
module Seq =
    let exactlyOne seq =
        match seq |> Seq.truncate 2 with
        | s when Seq.length s = 1 -> s |> Seq.head |> Some
        | _ -> None
    let single predicate =
        Seq.filter predicate >> exactlyOne
I chose to return an option type since raising exceptions is quite unusual in F# higher-order functions.
EDIT:
In F# 3.0, as @Oxinabox mentioned in his comment, Seq.exactlyOne exists in the Seq module.
What about
let Single source f =
    let worked = ref false
    let newf = fun a ->
        match f a with
        | true ->
            if !worked = true then failwith "not single"
            worked := true
            Some(a)
        | false -> None
    let r = source |> Seq.choose newf
    Seq.nth 0 r
Very unidiomatic but probably close to optimal
EDIT:
Solution with exactlyOne
let only2 f s = Seq.filter f s |> Seq.exactlyOne
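The option-returning variant discussed above can also be sketched with OCaml's lazy Seq type (OCaml 4.07 or later). This is an OCaml illustration, not F#, and only is simply the name borrowed from the question. It filters lazily and inspects at most two matching elements.

```ocaml
(* Some x if exactly one element satisfies pred, None otherwise. *)
let only (pred : 'a -> bool) (s : 'a Seq.t) : 'a option =
  match (Seq.filter pred s) () with
  | Seq.Nil -> None                 (* no match at all *)
  | Seq.Cons (x, rest) ->
    (match rest () with
     | Seq.Nil -> Some x            (* exactly one match *)
     | Seq.Cons _ -> None)          (* a second match exists *)
```

The search stops as soon as a second match is found, so the rest of the sequence is never traversed in that case. If there is only one match, the whole sequence still has to be scanned to rule out a second one.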

Compiler optimizations for infinite lists in Haskell?

I've various "partial permutation" functions of type t -> Maybe t that either take me to a new location in a data structure by returning a Just or else return a Nothing if they cannot yet get there.
I routinely must apply these partial permutations in specific repeated patterns, building a list of all intermediate values, but truncating the list whenever I return to my starting position or a permutation fails.
scan_partial_perms :: Eq t => [t -> Maybe t] -> t -> [t]
scan_partial_perms ps v = map fromJust . takeWhile test $ scanl (>>=) (Just v) ps
  where test (Just i) | i /= v = True
        test _ = False
iterate_partial_perm = scan_partial_perms . iterate
cycle_partial_perms = scan_partial_perms perms . cycle
I'm fairly confident that scanl has the desirable strictness and tail recursion properties in this context. Any other tips on optimizing this code? In particular, what compiler options beyond -O3 -fllvm should I read about?
At worst, I could replace the scanl and infinite list with an accessor function defined like
perm l i = l !! i `rem` length l
I'd imagine this cannot improve performance with the right optimizations however.
I think you have a bug in scan_partial_perms,
scan_partial_perms ps v = map fromJust . takeWhile test $ scanl (>>=) (Just v) ps
scanl f s list always starts with s, so takeWhile test (scanl ...) is []. If that is intentional, it's quite obfuscated. Assuming what you want is
scan_partial_perms ps v = (v:) . map fromJust . takeWhile test . tail $ scanl (>>=) (Just v) ps
there's not much you can do. You can {-# SPECIALISE #-} it so the Eq dictionary is eliminated for the specialised-for types. That'll do you some good if the compiler doesn't do that on its own (which it may if it can see the use site). With ghc >= 7, you can instead make it {-# INLINABLE #-}, so that it can be specialised and perhaps inlined at each use site.
I don't know what happens down the llvm road, but at the core-level, map, fromJust and takeWhile are not yet inlined, so if you're desperate enough, you can get maybe a few tenths of a percent by inlining them manually if they aren't inlined later in the llvm backend:
scan_partial_perms ps v = v : go v ps
  where
    go w (q:qs) = case q w of
                    Just z
                      | z /= v -> z : go z qs
                    _ -> []
    go _ _ = []
But those are very cheap functions, so the gains - if at all present - would be small.
So what you have is rather good already. If it's not good enough, you need a different route of attack.
The one with the list indexing,
perm l i = l !! (i `rem` length l)
-- parentheses necessary, I don't think (l !! i) `rem` length l was what you want
doesn't look good. length is expensive, (!!) is expensive too, so both should in general be avoided.

Haskell mutable map/tree

I am looking for a mutable (balanced) tree/map/hash table in Haskell, or a way to simulate one inside a function, i.e. when I call the same function several times, the structure is preserved. So far I have tried Data.HashTable (which is OK, but somewhat slow) and tried Data.Array.Judy, but I was unable to make it work with GHC 6.10.4. Are there any other options?
If you want mutable state, you can have it. Just keep passing the updated map around, or keep it in a state monad (which turns out to be the same thing).
import qualified Data.Map as Map
import Control.Monad.ST
import Data.STRef
memoize :: Ord k => (k -> ST s a) -> ST s (k -> ST s a)
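memoize f = do
    mc <- newSTRef Map.empty
    return $ \k -> do
        c <- readSTRef mc
        case Map.lookup k c of
            Just a -> return a
            Nothing -> do
                a <- f k
                -- update the current cache, not the snapshot c read before f ran,
                -- so entries added by recursive calls inside f are not lost
                modifySTRef mc (Map.insert k a)
                return a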
You can use this like so. (In practice, you might want to add a way to clear items from the cache, too.)
import Control.Monad
main :: IO ()
main = do
    fib <- stToIO $ fixST $ \fib -> memoize $ \n ->
        if n < 2 then return n else liftM2 (+) (fib (n-1)) (fib (n-2))
    mapM_ (print <=< stToIO . fib) [1..10000]
At your own risk, you can unsafely escape from the requirement of threading state through everything that needs it.
import System.IO.Unsafe
unsafeMemoize :: Ord k => (k -> a) -> k -> a
unsafeMemoize f = unsafePerformIO $ do
    f' <- stToIO $ memoize $ return . f
    return $ unsafePerformIO . stToIO . f'
fib :: Integer -> Integer
fib = unsafeMemoize $ \n -> if n < 2 then n else fib (n-1) + fib (n-2)
main :: IO ()
main = mapM_ (print . fib) [1..1000]
Building on @Ramsey's answer, I also suggest you reconceive your function to take a map and return a modified one. Then write your code using good ol' Data.Map, which is pretty efficient at modifications. Here is a pattern:
import qualified Data.Map as Map
-- | takes input and a map, and returns a result and a modified map
myFunc :: a -> Map.Map k v -> (r, Map.Map k v)
myFunc a m = … -- put your function here
-- | run myFunc over a list of inputs, gathering the outputs
mapFuncWithMap :: [a] -> Map.Map k v -> ([r], Map.Map k v)
mapFuncWithMap as m0 = foldr step ([], m0) as
  where step a (rs, m) = let (r, m') = myFunc a m in (r:rs, m')
-- this starts with an initial map, uses successive versions of the map
-- on each iteration, and returns a tuple of the results, and the final map
-- | run myFunc over a list of inputs, gathering the outputs
mapFunc :: [a] -> [r]
mapFunc as = fst $ mapFuncWithMap as Map.empty
-- same as above, but starts with an empty map, and ignores the final map
It is easy to abstract this pattern and make mapFuncWithMap generic over functions that use maps in this way.
Although you ask for a mutable type, let me suggest that you use an immutable data structure and that you pass successive versions to your functions as an argument.
Regarding which data structure to use:
- There is an implementation of red-black trees at Kent.
- If you have integer keys, Data.IntMap is extremely efficient.
- If you have string keys, the bytestring-trie package from Hackage looks very good.
The problem is that I cannot use (or I don't know how to use) a non-mutable type.
If you're lucky, you can pass your table data structure as an extra parameter to every function that needs it. If, however, your table needs to be widely distributed, you may wish to use a state monad where the state is the contents of your table.
If you are trying to memoize, you can try some of the lazy memoization tricks from Conal Elliott's blog, but as soon as you go beyond integer arguments, lazy memoization becomes very murky—not something I would recommend you try as a beginner. Maybe you can post a question about the broader problem you are trying to solve? Often with Haskell and mutability the issue is how to contain the mutation or updates within some kind of scope.
It's not so easy learning to program without any global mutable variables.
If I read your comments right, then you have a structure with possibly ~500k total values to compute. The computations are expensive, so you want them done only once, and on subsequent accesses, you just want the value without recomputation.
In this case, use Haskell's laziness to your advantage! ~500k is not so big: Just build a map of all the answers, and then fetch as needed. The first fetch will force computation, subsequent fetches of the same answer will reuse the same result, and if you never fetch a particular computation - it never happens!
You can find a small implementation of this idea using 3D point distances as the computation in the file PointCloud.hs. That file uses Debug.Trace to log when the computation actually gets done:
> ghc --make PointCloud.hs
[1 of 1] Compiling Main ( PointCloud.hs, PointCloud.o )
Linking PointCloud ...
> ./PointCloud
(1,2)
(<calc (1,2)>)
Just 1.0
(1,2)
Just 1.0
(1,5)
(<calc (1,5)>)
Just 1.0
(1,2)
Just 1.0
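The same build-everything-lazily idea can be sketched outside Haskell too. The OCaml version below is only an illustration under stated assumptions: expensive is a hypothetical stand-in for the real computation, and the counter computations exists purely to show when work actually happens. Since OCaml is strict, each table entry is wrapped explicitly in lazy.

```ocaml
module IntMap = Map.Make (Int)

(* Counts how many times the expensive function really runs. *)
let computations = ref 0

(* Hypothetical stand-in for an expensive computation. *)
let expensive (k : int) : int =
  incr computations;
  k * k

(* Build the whole table up front; every entry is a suspended computation. *)
let build_table (keys : int list) : int Lazy.t IntMap.t =
  List.fold_left
    (fun m k -> IntMap.add k (lazy (expensive k)) m)
    IntMap.empty keys

(* The first fetch of a key forces its value; later fetches reuse the result. *)
let fetch (table : int Lazy.t IntMap.t) (k : int) : int option =
  Option.map Lazy.force (IntMap.find_opt k table)
```

Building the table runs no computation at all. Fetching the same key twice runs expensive once, and keys that are never fetched are never computed.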
Are there any other options?
A mutable reference to a purely functional dictionary like Data.Map.
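As a rough illustration of that last suggestion, here is the same shape in OCaml (4.08 or later for the Int module): a single ref cell holding an immutable Map, used to memoize fib. The names cache and fib are chosen for this sketch only. In Haskell the analogous pieces would be an IORef or STRef holding a Data.Map.

```ocaml
module IntMap = Map.Make (Int)

(* One mutable cell; the map stored inside it is purely functional. *)
let cache : int IntMap.t ref = ref IntMap.empty

let rec fib (n : int) : int =
  match IntMap.find_opt n !cache with
  | Some v -> v                          (* already computed *)
  | None ->
    let v = if n < 2 then n else fib (n - 1) + fib (n - 2) in
    (* Re-read the cell here: recursive calls may have extended the map. *)
    cache := IntMap.add n v !cache;
    v
```

Each update replaces the contents of the cell with a new version of the map, so any older version that some other code still holds stays valid.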
