Reordering match clauses in a recursive function - algorithm

I'm taking an OCaml course at school, and for an exercise we have to write the function length.
My teacher showed us how Xavier Leroy wrote his function:
let rec length_aux len = function
[] -> len
| a::l -> length_aux (len + 1) l
let length l = length_aux 0 l
When my teacher explained why the length function is written this way, he said he didn't know why Xavier Leroy hadn't written:
let rec length_aux len = function
a::l -> length_aux (len + 1) l
| [] -> len
let length l = length_aux 0 l
... in order to make it faster (since in most cases the list is nonempty).
So if someone knows why the second version is no better than the first, could you please explain?
Thank you.

For OCaml, these are the same function. The pattern matching is compiled to a test on whether the list is empty, followed by a jump to one branch or the other.
Similar code in C would be reordering cases in a switch statement.
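To check that the clause order does not change the result, here is a small sketch that defines both orderings side by side and verifies that they agree. The names length_nil_first and length_cons_first are made up for this illustration, and the sketch only demonstrates equal behavior; it does not measure speed.

```ocaml
(* Empty-list clause first, as in the version shown by the teacher. *)
let rec length_aux_nil_first len = function
  | [] -> len
  | _ :: l -> length_aux_nil_first (len + 1) l

(* Cons clause first, as proposed in the question. *)
let rec length_aux_cons_first len = function
  | _ :: l -> length_aux_cons_first (len + 1) l
  | [] -> len

let length_nil_first l = length_aux_nil_first 0 l
let length_cons_first l = length_aux_cons_first 0 l

(* Both orderings return the same length on a few sample lists. *)
let () =
  let samples = [ []; [ 1 ]; [ 1; 2; 3 ]; List.init 1000 (fun i -> i) ] in
  List.iter
    (fun l -> assert (length_nil_first l = length_cons_first l))
    samples
```

The two patterns are mutually exclusive, so whichever clause is written first, exactly one of them can match a given list.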

Related

Difference between two kinds of recursive function

In OCaml, there are two ways I have seen to write a map function for example
let rec map f xs =
match xs with
| [] -> []
| x::rest -> f x :: map f rest
and
let map f xs =
let rec go xs =
match xs with
| [] -> []
| x::rest -> f x :: go rest
in go xs
The second one looks more optimized, since it resembles loop-invariant elimination, but in a functional language it may involve allocating a new closure. Can anyone explain the difference between the two styles of recursive function, in particular in terms of performance? Thanks for your help!
I couldn't find similar questions on SO, and I'm expecting there is a term like "recursive invariant elimination" to describe the kind of transformation from the first program to the second one.
I've always wondered the exact same thing: does the compiler optimize invariant arguments in recursive functions?
Since your question motivated me to benchmark it, let me share here my results.
Protocol
I have not tried it with map, since it would require big lists, which would result in a stack overflow. I could try it with rev_map, but I don't see the point of allocating huge lists when it's easier to test equivalent behavior on integers (plus I'm afraid that the allocations would eventually trigger a round of GC, which would skew my time measurements).
The following code reproduces your use-case with a dummy recursive function with an invariant argument, as in map:
let rec g f x = if x = 0 then 0 else g f (f x)
let g2 f x =
let rec aux x = if x = 0 then 0 else aux (f x) in
aux x
let time title f x =
let t = Sys.time () in
let fx = f x in
Printf.printf "%s: %fs\n%!" title (Sys.time () -. t) ;
fx
let main =
let nb = int_of_string Sys.argv.(1) in
ignore (time "normal" (g pred) nb) ;
ignore (time "invariant elimination" (g2 pred) nb)
You can compile it (ocamlopt a.ml, for example) and run it with ./a.out 10000000000. You can change the integer parameter to tune the number of recursive calls.
Results
On my computer, for an input number of 10000000000, it outputs:
normal: 11.813643s
invariant elimination: 11.646377s
On bigger values:
20000000000
normal: 23.353022s
invariant elimination: 22.977813s
30000000000
normal: 35.586871s
invariant elimination: 35.421313s
I didn't bother going higher.
To me, this indicates that both versions perform the same. Maybe the compiler does optimize invariant arguments in recursive functions, maybe it doesn't and the difference is simply not measurable.
Bytecode comparison
I have also tried to see if the generated bytecode is the same or not (ocamlc -dinstr a.ml), and it does differ slightly as you can see in the following code snippet
normal
compiling a file with only this in it:
let g f x =
let rec aux f x = if x = 0 then 0 else aux f (f x) in
aux f x
gives
branch L2
restart
L3: grab 1
acc 1
push
const 0
eqint
branchifnot L4
const 0
return 2
L4: acc 1
push
acc 1
apply 1
push
acc 1
push
offsetclosure 0
appterm 2, 4
restart
L1: grab 1
closurerec 3, 0
acc 2
push
acc 2
push
acc 2
appterm 2, 5
L2: closure L1, 0
push
acc 0
makeblock 1, 0
pop 1
setglobal E!
invariant elimination
compiling a file with only this in it:
let g2 f x =
let rec aux x = if x = 0 then 0 else aux (f x) in
aux x
gives:
branch L2
L3: acc 0
push
const 0
eqint
branchifnot L4
const 0
return 1
L4: acc 0
push
envacc 1
apply 1
push
offsetclosure 0
appterm 1, 2
restart
L1: grab 1
acc 0
closurerec 3, 1
acc 2
push
acc 1
appterm 1, 4
L2: closure L1, 0
push
acc 0
makeblock 1, 0
pop 1
setglobal E2!
But I'm not expert enough to draw any conclusion, as I don't speak bytecode.
That's also about where I decided that the answer is not that important for now, and that it's easier anyway to ask @gasche next time I see him.
The use of go suggests a Haskell background. Both OCaml and Haskell are functional programming languages, but there are substantial differences and what one knows about Haskell should not be used to make assumptions about OCaml.
I see no particular reason to write map the second way. If you're using OCaml 4.14.0 or later, you might want to use tail_mod_cons to make map tail-recursive without an explicit accumulator as in Stef's comment.
let[@tail_mod_cons] rec map f =
  function
  | [] -> []
  | x::xs -> f x :: map f xs
And of course, the real solution is:
let map = List.map
Like the others, I have never seen the second form, and it's hard for me to imagine what kind of optimization it could provide. What I do know, however, is that (as @Stef and @Chris pointed out) this function can be written in a tail-recursive way. So, just for the sake of completeness:
let map f xs =
let rec go xs ys =
match xs with
| [] -> ys
| x::rest -> go rest ((f x)::ys)
in List.rev (go xs [])
This version runs in constant stack space: each recursive call is a tail call, so it can reuse the current stack frame instead of growing the stack. The price is one extra pass over the list for the final List.rev.
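The same accumulate-then-reverse idea is also available from the standard library: List.rev_map builds the mapped list in reverse order with a tail-recursive loop, and List.rev restores the original order. A minimal sketch, where the name map_tr is hypothetical and the list size is an arbitrary choice:

```ocaml
(* Tail-recursive map: List.rev_map applies f while reversing,
   then List.rev puts the elements back in their original order. *)
let map_tr f xs = List.rev (List.rev_map f xs)

let () =
  assert (map_tr succ [ 1; 2; 3 ] = [ 2; 3; 4 ]);
  (* A long list is processed without deep recursion. *)
  let big = List.init 1_000_000 (fun i -> i) in
  assert (List.length (map_tr succ big) = 1_000_000)
```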

Random number generation in OCaml

When using a strict functional language, you are bound to a certain way of writing programs. I ran into the problem of generating a large quantity of pseudo-random numbers with OCaml, and I'm not sure I'm using the best way to generate these numbers in such a language.
What I did was create a module with a function (gen) that takes an integer size and an empty list, and returns a list of size pseudo-random numbers. The problem is that when size is too large, it fails with a stack overflow, which is to be expected.
Should I use tail recursion? Should I use a better method that I'm not aware of?
module RNG =
struct
(* Append a number n in the end of the list l *)
let rec append l n =
match l with
| [] -> [n]
| h :: t -> h :: (append t n)
(* Generate a list l with size random numbers *)
let rec gen size l =
if size = 0 then
l
else
let n = Random.int 1000000 in
let list = append l n in
gen (size - 1) list
end
Testing the code to generate a billion pseudo random numbers returns:
# let l = RNG.gen 1000000000 [];;
Stack overflow during evaluation (looping recursion?).
The problem is that the append function is not tail recursive. Each recursive call uses up a bit of stack space to store its state, and as the list gets longer the append function takes more and more stack space. At some point the stack simply isn't big enough and the code fails.
As you suggested in the question the way to avoid that is using tail recursion. When working with lists that usually means constructing the lists in reverse order. The append function then becomes simply ::.
If the order of the resulting list is important the list needs to be reversed at the end. So it is not uncommon to see code returning List.rev acc. This takes O(n) time but constant space and is tail recursive. So the stack is no limit there.
So your code would become:
let rec gen size l =
if size = 0 then
List.rev l
else
let n = Random.int 1000000 in
let list = n :: l in
gen (size - 1) list
A few more things to optimize:
When building a result bit by bit through recursion, the result is usually named acc, short for accumulator, and passed first:
let rec gen acc size =
if size = 0 then
List.rev acc
else
let n = Random.int 1000000 in
let list = n :: acc in
gen list (size - 1)
This then allows the use of function and pattern matching instead of the size argument and if construct:
let rec gen acc = function
| 0 -> List.rev acc
| size ->
let n = Random.int 1000000 in
let list = n :: acc in
gen list (size - 1)
A list of random numbers is usually just as good reversed. Unless you want lists of different sizes, generated from the same seed, to begin with the same sequence of numbers, you can skip the List.rev. And n :: acc is such a simple construct that one usually doesn't bind it to a variable.
let rec gen acc = function
| 0 -> acc
| size ->
let n = Random.int 1000000 in
gen (n :: acc) (size - 1)
And lastly, you can take advantage of optional arguments. While that makes the code a bit more complex to read, it greatly simplifies its use:
let rec gen ?(acc=[]) = function
| 0 -> acc
| size ->
let n = Random.int 1000000 in
gen ~acc:(n :: acc) (size - 1)
# gen 5;;
- : int list = [180439; 831641; 180182; 326685; 809344]
You no longer need to specify the empty list to generate a list of random numbers.
Note: An alternative way is to use a wrapper function:
let gen size =
let rec loop acc = function
| 0 -> acc
| size ->
let n = Random.int 1000000 in
loop (n :: acc) (size - 1)
in loop [] size
It would be a big improvement to generate your list in reverse order, then reverse it once at the end. Adding successive values to the end of a list is very slow. Adding to the front of a list can be done in constant time.
Even better, just generate the list in reverse order and return it that way. Do you care that the list is in the same order that the values were generated?
Why do you need to compute the full list explicitly? Another option might be to generate the elements lazily (and deterministically) using the new sequence module:
let rec random_seq state () =
let state' = Random.State.copy state in
Seq.Cons(Random.State.int state' 10, random_seq state')
Then the random sequence random_seq state is fully determined by the initial state state: it can be reused without trouble, and it only generates new elements as needed.
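As a sketch of what "reused without trouble" means in practice, the snippet below forces the first few elements of the sequence twice and checks that both runs agree. The helper take and the seed 42 are assumptions made for this illustration; they are not part of the answer above.

```ocaml
(* The lazy generator from the answer: each node copies the state
   before drawing, so the state passed in is never mutated. *)
let rec random_seq state () =
  let state' = Random.State.copy state in
  Seq.Cons (Random.State.int state' 10, random_seq state')

(* Hypothetical helper: force the first n elements into a list. *)
let rec take n seq =
  if n <= 0 then []
  else
    match seq () with
    | Seq.Nil -> []
    | Seq.Cons (x, rest) -> x :: take (n - 1) rest

let () =
  let state = Random.State.make [| 42 |] in
  let first = take 5 (random_seq state) in
  let again = take 5 (random_seq state) in
  (* Same initial state, same elements. *)
  assert (first = again);
  assert (List.length first = 5)
```

Only the elements that take actually forces are ever computed, so the sequence can be conceptually infinite.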
The standard List module has an init function you can use to write all this in one line:
let upperbound = 10
let gen size =
  List.init size (fun _ -> Random.int upperbound)

Ocaml summing up values in an integer list

I'm facing a syntax error in the code below:
let sum list =
let current_sum = List.hd list in
for i = 1 to List.length list - 1 do
let counter = List.nth list i
current_sum = current_sum + counter
done;;
The error that I'm facing is here
done;
^^^^
Error: Syntax error
The code is supposed to sum up the current values of the list at each iteration, for example
sum [1;2;3;4];;
- : int list = [1; 3; 6; 10]
So I think I'm going about this in the right direction, what I don't understand is why this error keeps popping up?
The keyword in is missing in the let counter statement.
Another error will pop up afterwards: current_sum is immutable. You will have to change this.
Another way to compute your sums: use the List.fold_left function.
Putting the comments below into shape:
let sum_l l =
let (r,_) = List.fold_left
(fun (a_l, a_i) x -> ((a_i + x) :: a_l , a_i+x))
([],0) l in
List.rev r;;
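For comparison, the imperative approach from the question can also be made to work once the running total is stored in a ref, which is how OCaml expresses a mutable variable. This is only a sketch; the name running_sums is hypothetical, and List.iter replaces the index-based for loop so that List.nth is not needed.

```ocaml
let running_sums list =
  let current_sum = ref 0 in (* mutable running total *)
  let result = ref [] in     (* partial sums, most recent first *)
  List.iter
    (fun x ->
      current_sum := !current_sum + x;
      result := !current_sum :: !result)
    list;
  (* Restore the original order. *)
  List.rev !result

let () = assert (running_sums [ 1; 2; 3; 4 ] = [ 1; 3; 6; 10 ])
```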
You have simply forgotten the in keyword on line 4.
However, OCaml is a functional language, and you're trying to use an imperative method here.
Even once the syntax error is fixed, this is not the way you would do this in OCaml. For example, a function summing up the elements of an integer list can be written as follows:
let sum = List.fold_left (+) 0;;
or even
let sum = List.reduce ~f:(+);;
if you're using the Core library (there, List.reduce returns an int option, with None for an empty list).
EDIT
After reading the comments under another answer, I've understood what you're really trying to do:
# sum [1;2;3;4];;
- : int list = [1; 3; 6; 10]
And here is a way to do so, using OCaml's functional features:
let sum l =
  let sums =
    List.fold_left (fun l x -> match l with
      | [] -> [x]
      | h :: _ -> (x + h) :: l) [] l
  in List.rev sums;;
The code is more complicated than just computing the sum itself, but it does the trick.

Some OCAML concerns

So I have a couple of questions, as a newbie trying to learn O'Caml.
In functions, I often see a |. What does that mean? Also, why are functions sometimes defined as:
let rec a = function
Why is it specifically set equal to function, followed by the code?
My main question however is, I was trying to write a function that would count the number of times an element exists in a list, so if I had 1, 5,5,6,9 with the target val as 5, then I'd return 2, if target val was 9, then I'd return 1, since it repeats once.
here is my attempt, please tell me what I'm doing wrong:
let rec track (x, l)= let rec helper(x,l, count)
in counthelper
match l with [] --> count
| (a::as) -> if(x = a)
then helper(as,l, count+1)
else count( as, l, count);;
The match and function keywords take a list of patterns to be matched. The | symbol is used to separate the different patterns. That's why it shows up so frequently in OCaml code.
The function keyword is like an abbreviation for fun and match. It lets you define a function as a set of patterns to be matched against an argument.
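A minimal sketch of that equivalence, using two made-up names (is_empty_match and is_empty_function) for the same function written both ways:

```ocaml
(* Written with fun and match: the argument is named, then matched. *)
let is_empty_match = fun l ->
  match l with
  | [] -> true
  | _ :: _ -> false

(* Written with function: the argument is matched directly. *)
let is_empty_function = function
  | [] -> true
  | _ :: _ -> false

let () =
  assert (is_empty_match [] = is_empty_function []);
  assert (is_empty_match [ 1 ] = is_empty_function [ 1 ])
```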
Your code has let rec helper (x, l, count) in .... This isn't a proper let expression. You want something like this: let helper (x, l, count) = def in expr.
More generally your code might look like this:
let track (x, l) =
let rec helper (x, l, count) =
... definition of helper ...
in
helper (x, l, 0)
As a side comment, you're using tuples for function parameters. It's more idiomatic in OCaml to use currying, i.e., to have separate parameters more like this:
let track x l =
...
This lets you do partial application (specify only some of the parameters), and also is cleaner syntactically.
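To illustrate partial application, here is a sketch of a curried track. Its body uses List.fold_left rather than the recursive helper discussed above, purely to keep the example short, and count_fives is a hypothetical name.

```ocaml
(* Curried version: two separate parameters instead of one tuple. *)
let track x l =
  List.fold_left (fun count a -> if a = x then count + 1 else count) 0 l

(* Partial application: fix the target value, leave the list open. *)
let count_fives = track 5

let () =
  assert (track 5 [ 1; 5; 5; 6; 9 ] = 2);
  assert (track 9 [ 1; 5; 5; 6; 9 ] = 1);
  assert (count_fives [ 5; 5; 5 ] = 3)
```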
Update
Your latest code doesn't return a value because it has infinite recursion.
Usually | means pattern matching.
let rec means that function can be recursive (call itself). Tutorial.
This is my solution where some useful symbols are changed to _ symbols. Let it be an exercise for you:
let rec count y xs =
let rec inner n = function
| __ -> n
| ______________ -> inner (n+1) xs
| ____ -> inner n xs
in
inner 0 xs;;
Your implementation has some issues.
The most obvious one is that you are using as in pattern matching. You can't use a keyword as a variable name in a pattern.
You need to reread the chapter about function declarations. It seems that you are mixing them up with function invocation.
You are not using curried functions. You did some C before, didn't you?
You are using if where using when would be nicer. This construction is called a guard.
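As a small illustration of the last point, here is the same function written once with if and once with a when guard. The function (the sign of an integer) is chosen arbitrarily so as not to give away the exercise.

```ocaml
(* With nested if expressions. *)
let sign_if n = if n > 0 then 1 else if n < 0 then -1 else 0

(* With guards: a pattern followed by when and a boolean condition. *)
let sign_guard = function
  | 0 -> 0
  | n when n > 0 -> 1
  | _ -> -1

let () =
  List.iter (fun n -> assert (sign_if n = sign_guard n)) [ -3; 0; 7 ]
```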

Let and construct versus let in sequence

Consider this OCaml code:
let coupe_inter i j cases =
let lcases = Array.length cases in
let low,_,_ = cases.(i)
and _,high,_ = cases.(j) in
low,high,
Array.sub cases i (j-i+1),
case_append (Array.sub cases 0 i) (Array.sub cases (j+1) (lcases-(j+1)))
Why is the expression let ... and ... in used in place of a let ... in let ... in sequence (as F# forces you to do)? This construct seems quite frequent in OCaml code.
Thanks!
let x = a and y = b in c has the effect of defining x and y "simultaneously". This means that the order of evaluation (a after or before b) is unspecified (you must not assume that a will be evaluated before), and that x is not bound in b and y not bound in a, they are only available in c.
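The scoping rule is easiest to see when an outer binding has the same name. In this sketch (all names are made up), the y defined with and sees the outer x, while the y defined with a nested let sees the inner one:

```ocaml
let x = 1

(* Simultaneous: in "y = x + 1", x is still the outer x (1). *)
let simultaneous =
  let x = 10 and y = x + 1 in
  (x, y)

(* Sequential: in "y = x + 1", x is the freshly bound x (10). *)
let sequential =
  let x = 10 in
  let y = x + 1 in
  (x, y)

(* A classic use: swapping two names without a temporary. *)
let swapped =
  let a = 1 and b = 2 in
  let a = b and b = a in
  (a, b)

let () =
  assert (simultaneous = (10, 2));
  assert (sequential = (10, 11));
  assert (swapped = (2, 1))
```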
I rarely use this construction, because I have been bitten in the past by the evaluation order thing. I often use the recursive variant of it, let rec ... and ... in ... (where all variable bound are available everywhere), however, to define mutually recursive functions.
let rec even n = (n = 0) || odd (n - 1)
and odd n = (n <> 0) && even (n - 1)
In F# let ... and ... is prohibited, but you still can write:
let rec low,_,_ = cases.[i]
and _,high,_ = cases.[j]
As @gasche said, let rec ... and ... is mainly used for defining mutually recursive functions/types. I think using a sequence of let is more intuitive and less error-prone, and hence should be preferred.

Resources