Use Set.Make.iter to transform the elements of a set? - data-structures

I've got a library which implements a set (interface with documentation available here: http://pastebin.com/j9QUyN1G). I understand everything apart from this fragment:
val iter : ('a -> unit) -> 'a t -> unit
(** [iter f s] applies [f] to all elements in set [s]. The elements
are passed to [f] in increasing order with respect to the ordering
used to create the set. *)
So iter takes a function as one of the arguements and applies it to all elements of set. So I would expect something like ('a -> 'a) which takes an element of the set and changes it to element of the same type with other value or ('a -> 'b) which takes 'a t and transforms it to 'b t. But instead iter takes a function of type ('a -> unit) and also returns unit, not an 'a t nor 'b t.
So how should an example function passed to iter look like?

iter doesn't change the elements of the set. It's executed purely for its side effects. You might use it to print the elements, for example:
module StringSet = Set.Make(String)
…
StringSet.iter print_endline ss
The set data structure is immutable, so you can't change the elements of the set. You can build a new set whose elements are derived from an existing set. For a list, there's the function map which takes a list [x1; …; xn] and returns a new list [f x1; …; f xn]. There is no similar function in the Set module because elements in a set are not stored in the order chosen by the caller: there's no such thing as a set with its elements in an order derived from another set. If you want to build a set from the images of the elements of a set, insert the new elements one by one.
module Int = struct
type t = int
let compare = Pervasives.compare
end
module IntSet = Set.Make(Int)
module StringSet = Set.Make(String)
let int_to_string_set is =
IntSet.fold (fun i ss -> StringSet.add (string_of_int i) ss) is StringSet.empty

iter takes such function that accepts argument of type 'a do with it whatever it whats and returns a value of type unit. In other words it is evaluated for the side-effects since it can't return anything worthwhile.
What you're looking for is a map function, that usually accepts a function of type 'a -> 'b a container with elements of type 'a and returns an container with elements of type 'b. Unfortunately to you, the interface you've shown, doesn't provide such function. But this is not a problem, since it provides a fold function, that is the most general iterator. Having only fold you can implement any other iteratos, like map, iter, exists, etc... Indeed in Core library you can find Container.Make functor that will automatically derive a common container interface from only one function - fold. But also, you can define map by yourself:
let map f xs =
fold (fun x ys -> add (f x) ys) xs empty

It would be a function with side effects, like this:
Let p x = Printf.printf "%d\n" x

Related

SML Syntax Breakdown

I am trying to study SML (for full transparency this is in preparation for an exam (exam has not started)) and one area that I have been struggling with is higher level functions such as map and foldl/r. I understand that they are used in situations where you would use a for loop in oop languages (I think). What I am struggling with though is what each part in a fold or map function is doing. Here are some examples that if someone could break them down I would be very appreciative
fun cubiclist L = map (fn x=> x*x*x) L;
fun min (x::xs) = foldr (fn (a,b) => if (a < b) then a else b) x xs;
So if I could break down the parts I see and high light the parts I'm struggling with I believe that would be helpful.
Obviously right off the bat you have the name of the functions and the parameters that are being passed in but one question I have on that part is why are we just passing in a variable to cubiclist but for min we pass in (x::xs)? Is it because the map function is automatically applying the function to each part in the map? Also along with that will the fold functions typically take the x::xs parameters while map will just take a variable?
Then we have the higher order function along with the anonymous functions with the logic/operations that we want to apply to each element in the list. But the parameters being passed in for the foldr anonymous function I'm not quite sure about. I understand we are trying to capture the lowest element in the list and the then a else b is returning either a or b to be compared with the other elements in the list. I'm pretty sure that they are rutnred and treated as a in future comparisons but where do we get the following b's from? Where do we say b is the next element in the list?
Then the part that I really don't understand and have no clue is the L; and x xs; at the end of the respective functions. Why are they there? What are they doing? what is their purpose? is it just syntax or is there actually a purpose for them being there, not saying that syntax isn't a purpose or a valid reason, but does they actually do something? Are those variables that can be changed out with something else that would provide a different answer?
Any help/explanation is much appreciated.
In addition to what #molbdnilo has already stated, it can be helpful to a newcomer to functional programming to think about what we're actually doing when we crate a loop: we're specifying a piece of code to run repeatedly. We need an initial state, a condition for the loop to terminate, and an update between each iteration.
Let's look at simple implementation of map.
fun map f [] = []
| map f (x :: xs) = f x :: map f xs
The initial state of the contents of the list.
The termination condition is the list is empty.
The update is that we tack f x onto the front of the result of mapping f to the rest of the list.
The usefulness of map is that we abstract away f. It can be anything, and we don't have to worry about writing the loop boilerplate.
Fold functions are both more complex and more instructive when comparing to loops in procedural languages.
A simple implementation of fold.
fun foldl f init [] = init
| foldl f init (x :: xs) = foldl f (f init x) xs
We explicitly provide an initial value, and a list to operate on.
The termination condition is the list being empty. If it is, we return the initial value provided.
The update is to call the function again. This time the initial value is updated, and the list is the tail of the original.
Consider summing a list of integers.
foldl op+ 0 [1,2,3,4]
foldl op+ 1 [2,3,4]
foldl op+ 3 [3,4]
foldl op+ 6 [4]
foldl op+ 10 []
10
Folds are important to understand because so many fundamental functions can be implemented in terms of foldl or foldr. Think of folding as a means of reducing (many programming languages refer to these functions as "reduce") a list to another value of some type.
map takes a function and a list and produces a new list.
In map (fn x=> x*x*x) L, the function is fn x=> x*x*x, and L is the list.
This list is the same list as cubiclist's parameter.
foldr takes a function, an initial value, and a list and produces some kind of value.
In foldr (fn (a,b) => if (a < b) then a else b) x xs, the function is fn (a,b) => if (a < b) then a else b, the initial value is x, and the list is xs.
x and xs are given to the function by pattern-matching; x is the argument's head and xs is its tail.
(It follows from this that min will fail if it is given an empty list.)

How do I modify a Haskell list without entering an infinite loop?

I am writing a piece of code in Haskell, where I have a line that does something like this:
addElement :: [a] -> a -> [a]
addElement list elem = list ++ [elem]
I need (or at least, I think so) a function like this for the purpose of adding new vertices in a vertex list of a graph data structure that I'm implementing. Now, I can call this function as follows
newlist = addElement oldlist elem
and everything works out fine. However, if I write
mylist = addElement mylist elem
and then try to do anything with mylist after the call has terminated (it does), I enter an infinite loop, and if I understand correctly this is due to the lazy evaluation of Haskell or something of the sort (mylist gets expanded to addElement (addElement ... elem) elem if I got it right ?).
This is of course bad for my particular implementation, since for my purposes I now have to make new lists every time I need to add an element to a list. So how do I make an element-adding function that works the way I want?
First of all mylist = addElement mylist elem is an equation, it is not an assignment. It is not evaluated once: since Haskell is a declarative language, you cannot alter a variable: once you give it a value, it will always have that value.
Your equation will thus result in:
mylist = addElement mylist elem
= addElement (addElement mylist elem) elem
= addElement (... (addElement mylist elem) ...) elem
you get the idea.
Nevertheless, you do not need to construct an complete new list each time: you can simply use (h:t) to append to the head:
addElement :: [a] -> a -> [a]
addElement t h = (h:t)
This will construct a "new" list in O(1) that reuses the old list as tail. As mentioned before the element will be added to the front.
Another way to solve the issue is using difference lists. Here a list is denoted as:
type DiffList a = a -> [a]
and an empty list is:
emptyDiffList :: DiffList a
emptyDiffList = \x -> x
In that case you ground the difference list with:
groundDiffList :: DiffList a -> [a]
groundDiffList x = x []
and you can add an element to the end of the list with:
addElement :: DiffList a -> a -> DiffList a
addElement l el = \x -> l (el:x)
Nevertheless you will always need to create a new variable for a "new list": you cannot all of a sudden give mylist another value (you can of course use recursion but in that case those are technically two different variables: the mylist of the caller, and the mylist of the callee).

Scope of implicit type variables in OCaml constraints

In Ocaml you can introduce new type variables inside a constraint, which is useful to enforce type-identities in the type-checker:
let f g n = (g (n:'n):'n) ;;
val f : ('n -> 'n) -> 'n -> 'n = <fun>
It is obviously possible to re-use these type variables (otherwise it would be a rather pointless exercise). However, since they are not introduced by some special statement, I wonder what there scope is? Is it the enclosing function, let-binding or top-level statement?
Is there a way to limit the scope of such an implicitly introduced type-variable?
A scope of any type variable used in a type constraint is the body of the enclosing let-expression. If an expression is mutually recursive, then the scope is extended to the whole set of mutual recursive expressions. The scope cannot be reduced. Let-expression is a typing primitive. It is not possible to hide or override a type variable.
Whenever a new type variable is introduced, it is looked up in a current typing context. If it was already introduced, then it is unified. Otherwise a new type variable is added to the context. (That can be later used for unification).
An example to clarify the idea:
let rec f g h x y = g (x : 'a) + h (y : 'a) and e (x : 'a) = x + 1;;
Here, 'a used to constraint x in e is the same 'a that was used to contraint x and y in the body of function f. Since, x in e is unified with int type the unification extends to function f, constraining function g and h to type int -> int.

What is the role of `while`-loops in computation expressions in F#?

If you define a While method of the builder-object, you can use while-loops in your computation expressions. The signature of the While method is:
member b.While (predicate:unit->bool, body:M<'a>) : M<'a>
For comparison, the signature of the For method is:
member b.For (items:seq<'a>, body:unit->M<'a>) : M<'a>
You should notice that, in the While-method, the body is a simple type, and not a function as in the For method.
You can embed some other statements, like let and function-calls inside your computation-expressions, but those can impossibly execute in a while-loop more than once.
builder {
while foo() do
printfn "step"
yield bar()
}
Why is the while-loop not executed more than once, but merely repeated? Why the significant difference from for-loops? Better yet, is there some intended strategy for using while-loops in computation-expressions?
If you look at how computation expressions are evaluated, you'll see that
while foo() do
printfn "step"
yield bar()
is translated to something like
builder.While(fun () -> foo(),
builder.Delay(fun () ->
printfn "step"
builder.Yield(bar()))))
This translation allows the body of the while loop to be evaluated multiple times. While your type signatures are accurate for some computation expressions (such as seq or async), note that the insertion of the call to Delay may result in a different signature. For instance, you could define a list builder like this:
type ListBuilder() =
member x.Delay f = f
member x.While(f, l) = if f() then l() # (x.While(f, l)) else []
member x.Yield(i) = [i]
member x.Combine(l1,l2) = l1 # l2()
member x.Zero() = []
member x.Run f = f()
let list = ListBuilder()
Now you can evaluate an expression like:
list {
let x = ref 0
while !x < 10 do
yield !x
x := !x + 1
}
to get the equivalent of [0 .. 9].
Here, our While method has the signature (unit -> bool) * (unit -> 'a list) -> 'a list, rather than (unit -> bool) * 'a list -> 'a list. In general, when the Delay operation has type (unit -> M<'a>) -> D<M<'a>>, the While method's signature will be (unit -> bool) * D<M<'a>> -> M<'a>.

Why is F#'s Seq.sortBy much slower than LINQ's IEnumerable<T>.OrderBy extension method?

I've recently written a piece of code to read some data from a file, store it in a tuple and sort all the collected data by the first element of the tuple. After some tests I've noticed that using Seq.sortBy (and Array.sortBy) is extremely slower than using IEnumerable.OrderBy.
Below are two snippets of code which should show the behaviour I'm talking about:
(filename
|> File.ReadAllLines
|> Array.Parallel.map(fun ln -> let arr = ln.Split([|' '|], StringSplitOptions.RemoveEmptyEntries)
|> Array.map(double)
|> Array.sort in arr.[0], arr.[1])
).OrderBy(new Func(fun (a,b) -> a))
and
filename
|> File.ReadAllLines
|> Array.Parallel.map(fun ln -> let arr = ln.Split([|' '|], StringSplitOptions.RemoveEmptyEntries) |> Array.map(double) |> Array.sort in arr.[0], arr.[1])
|> Seq.sortBy(fun (a,_) -> a)
On a file containing 100000 lines made of two doubles, on my computer the latter version takes over twice as long as the first one (no improvements are obtained if using Array.sortBy).
Ideas?
the f# implementation uses a structural comparison of the resulting key.
let sortBy keyf seq =
let comparer = ComparisonIdentity.Structural
mkDelayedSeq (fun () ->
(seq
|> to_list
|> List.sortWith (fun x y -> comparer.Compare(keyf x,keyf y))
|> to_array) :> seq<_>)
(also sort)
let sort seq =
mkDelayedSeq (fun () ->
(seq
|> to_list
|> List.sortWith Operators.compare
|> to_array) :> seq<_>)
both Operators.compare and the ComparisonIdentity.Structural.Compare become (eventually)
let inline GenericComparisonFast<'T> (x:'T) (y:'T) : int =
GenericComparisonIntrinsic x y
// lots of other types elided
when 'T : float = if (# "clt" x y : bool #)
then (-1)
else (# "cgt" x y : int #)
but the route to this for the Operator is entirely inline, thus the JIT compiler will end up inserting a direct double comparison instruction with no additional method invocation overhead except for the (required in both cases anyway) delegate invocation.
The sortBy uses a comparer so will go through an additional virtual method call but is basically about the same.
In comparison the OrderBy function also must go through virtual method calls for the equality (Using EqualityComparer<T>.Default) but the significant difference is that it sorts in place and uses the buffer created for this as the result. In comparison if you take a look at the sortBy you will see that it sorts the list (not in place, it uses the StableSortImplementation which appears to be merge sort) and then creates a copy of it as a new array. This additional copy (given the size of your input data) is likely the principle cause of the slow down though the differing sort implementations may also have an effect.
That said this is all guessing. If this area is a concern for you in performance terms then you should simply profile to find out what is taking the time.
If you wish to see what effect the sorting/copying change would have try this alternate:
// these are taken from the f# source so as to be consistent
// beware doing this, the compiler may know about such methods
open System.Collections.Generic
let mkSeq f =
{ new IEnumerable<'b> with
member x.GetEnumerator() = f()
interface System.Collections.IEnumerable with
member x.GetEnumerator() = (f() :> System.Collections.IEnumerator) }
let mkDelayedSeq (f: unit -> IEnumerable<'T>) =
mkSeq (fun () -> f().GetEnumerator())
// the function
let sortByFaster keyf seq =
let comparer = ComparisonIdentity.Structural
mkDelayedSeq (fun () ->
let buffer = Seq.to_array seq
Array.sortInPlaceBy (fun x y -> comparer.Compare(keyf x,keyf y)) buffer
buffer :> seq<_>)
I get some reasonable percentage speedups within the repl with very large (> million) input sequences but nothing like an order of magnitude. Your mileage, as always, may vary.
A difference of x2 is not much when sorts are O(n.log(n)).
Small differences in data structures (e.g. optimising for input being ICollection<T>) could make this scale of difference.
And F# is currently Beta (not so much focus on optimisation vs. getting the language and libraries right), plus the generality of F# functions (supporting partial application etc.) could lead to a slight slow down in calling speed: more than enough to account for the different.

Resources