SML set intersection using map, foldr, or foldl functions - set

I am working on a SML programming assingment for class and stuck on a question. The question is:
"Write an ML function that uses map, foldr, or foldl to compute the intersection of a nonempty list of sets.Here you may assume that sets are denoted as lists. For this problem, you may use auxilary named functions (e.g., isMember). Hint: A nonempty list of sets contains at least one set."
This is what I have so far can anyone point me in the right direction I am fairly new to SML?
fun member(x,[]) = false
| member(x,L) =
if x=hd(L) then true
else member(x,tl(L));
fun intersect(L1,L2) = if tl(L1) = [] then L1
else if member(hd(L1),L2) = true then L1
else intersect(tl(L1),L2);
fun combine(L1) = if tl(L1) = [] then hd(L1)
else
foldr intersect [] L1;
What I want the code to do is start off by executing the combine function with a lists of lists. It checks if there is only one list (i.e. tl(L1) = []) and if it is true then just print the first list. If its false I want to call the foldr function that then calls the intersect function. In theory during the foldr function I want it to check the first list and second list and only keep what values are the same, then check the next list to those kept values and keeping doing this until it has checked every list. After that is done I want it to print each value that was in each list (i.e. intersection of sets).
I know my member function works and the combine function does what its supposed to do, my QUESTION is, what is wrong with the intersection function and can someone explain what the intersection should be doing?
I obviously don't want the straight up answer, that's not what I am here for. I need help to get on track for the right answer.

Your member and combine functions seem fine, but you need to think through the intersect function more carefully. For starters, if L1 is an empty list, it rases an exception, because tl doesn't operate on empty lists. Think about the three cases you need to worry about:
L1 is nil
hd(L1) is in L2
hd(L1) is not in L2
Figure out what you need to do in each case and go from there.

Here is my working answer!
fun member(x,[]) = false
| member(x,L) =
if x=hd(L) then true
else member(x,tl(L));
fun intersect([],[]) = []
| intersect(L1,[]) = []
| intersect(L1,L2) = if member(hd(L2), L1) then hd(L2)::intersect(L1, tl(L2))
else intersect(L1, tl(L2));
fun combine(L1) = if L1 = [] then []
else if tl(L1) = [] then hd(L1)
else foldr intersect (hd(L1)) (tl(L1));
Here are some test cases ran.
combine([[1,2],[1,3],[1,4]]);
val it = [1] : int list
combine([[1,2,3],[1,8,9,3],[1,4,3,8,9]]);
val it = [1,3] : int list
combine([[1,2,3,8,9],[1,8,9,3],[1,4,3,8,9]]);
val it = [1,3,8,9] : int list

Related

SML Syntax Breakdown

I am trying to study SML (for full transparency this is in preparation for an exam (exam has not started)) and one area that I have been struggling with is higher level functions such as map and foldl/r. I understand that they are used in situations where you would use a for loop in oop languages (I think). What I am struggling with though is what each part in a fold or map function is doing. Here are some examples that if someone could break them down I would be very appreciative
fun cubiclist L = map (fn x=> x*x*x) L;
fun min (x::xs) = foldr (fn (a,b) => if (a < b) then a else b) x xs;
So if I could break down the parts I see and high light the parts I'm struggling with I believe that would be helpful.
Obviously right off the bat you have the name of the functions and the parameters that are being passed in but one question I have on that part is why are we just passing in a variable to cubiclist but for min we pass in (x::xs)? Is it because the map function is automatically applying the function to each part in the map? Also along with that will the fold functions typically take the x::xs parameters while map will just take a variable?
Then we have the higher order function along with the anonymous functions with the logic/operations that we want to apply to each element in the list. But the parameters being passed in for the foldr anonymous function I'm not quite sure about. I understand we are trying to capture the lowest element in the list and the then a else b is returning either a or b to be compared with the other elements in the list. I'm pretty sure that they are rutnred and treated as a in future comparisons but where do we get the following b's from? Where do we say b is the next element in the list?
Then the part that I really don't understand and have no clue is the L; and x xs; at the end of the respective functions. Why are they there? What are they doing? what is their purpose? is it just syntax or is there actually a purpose for them being there, not saying that syntax isn't a purpose or a valid reason, but does they actually do something? Are those variables that can be changed out with something else that would provide a different answer?
Any help/explanation is much appreciated.
In addition to what #molbdnilo has already stated, it can be helpful to a newcomer to functional programming to think about what we're actually doing when we crate a loop: we're specifying a piece of code to run repeatedly. We need an initial state, a condition for the loop to terminate, and an update between each iteration.
Let's look at simple implementation of map.
fun map f [] = []
| map f (x :: xs) = f x :: map f xs
The initial state of the contents of the list.
The termination condition is the list is empty.
The update is that we tack f x onto the front of the result of mapping f to the rest of the list.
The usefulness of map is that we abstract away f. It can be anything, and we don't have to worry about writing the loop boilerplate.
Fold functions are both more complex and more instructive when comparing to loops in procedural languages.
A simple implementation of fold.
fun foldl f init [] = init
| foldl f init (x :: xs) = foldl f (f init x) xs
We explicitly provide an initial value, and a list to operate on.
The termination condition is the list being empty. If it is, we return the initial value provided.
The update is to call the function again. This time the initial value is updated, and the list is the tail of the original.
Consider summing a list of integers.
foldl op+ 0 [1,2,3,4]
foldl op+ 1 [2,3,4]
foldl op+ 3 [3,4]
foldl op+ 6 [4]
foldl op+ 10 []
10
Folds are important to understand because so many fundamental functions can be implemented in terms of foldl or foldr. Think of folding as a means of reducing (many programming languages refer to these functions as "reduce") a list to another value of some type.
map takes a function and a list and produces a new list.
In map (fn x=> x*x*x) L, the function is fn x=> x*x*x, and L is the list.
This list is the same list as cubiclist's parameter.
foldr takes a function, an initial value, and a list and produces some kind of value.
In foldr (fn (a,b) => if (a < b) then a else b) x xs, the function is fn (a,b) => if (a < b) then a else b, the initial value is x, and the list is xs.
x and xs are given to the function by pattern-matching; x is the argument's head and xs is its tail.
(It follows from this that min will fail if it is given an empty list.)

Adding 2 Int Lists Together F#

I am working on homework and the problem is where we get 2 int lists of the same size, and then add the numbers together. Example as follows.
vecadd [1;2;3] [4;5;6];; would return [5;7;9]
I am new to this and I need to keep my code pretty simple so I can learn from it. I have this so far. (Not working)
let rec vecadd L K =
if L <> [] then vecadd ((L.Head+K.Head)::L) K else [];;
I essentially want to just replace the first list (L) with the added numbers. Also I have tried to code it a different way using the match cases.
let rec vecadd L K =
match L with
|[]->[]
|h::[]-> L
|h::t -> vecadd ((h+K.Head)::[]) K
Neither of them are working and I would appreciate any help I can get.
First, your idea about modifying the first list instead of returning a new one is misguided. Mutation (i.e. modifying data in place) is the number one reason for bugs today (used to be goto, but that's been banned for a long time now). Making every operation produce a new datum rather than modify existing ones is much, much safer. And in some cases it may be even more performant, quite counterintuitively (see below).
Second, the way you're trying to do it, you're not doing what you think you're doing. The double-colon doesn't mean "modify the first item". It means "attach an item in front". For example:
let a = [1; 2; 3]
let b = 4 :: a // b = [4; 1; 2; 3]
let c = 5 :: b // c = [5; 4; 1; 2; 3]
That's how lists are actually built: you start with a empty list and prepend items to it. The [1; 2; 3] syntax you're using is just a syntactic sugar for that. That is, [1; 2; 3] === 1::2::3::[].
So how do I modify a list, you ask? The answer is, you don't! F# lists are immutable data structures. Once you've created a list, you can't modify it.
This immutability allows for an interesting optimization. Take another look at the example I posted above, the one with three lists a, b, and c. How many cells of memory do you think these three lists occupy? The first list has 3 items, second - 4, and third - 5, so the total amount of memory taken must be 12, right? Wrong! The total amount of memory taken up by these three lists is actually just 5 cells. This is because list b is not a block of memory of length 4, but rather just the number 4 paired with a pointer to the list a. The number 4 is called "head" of the list, and the pointer is called its "tail". Similarly, the list c consists of one number 5 (its "head") and a pointer to list b, which is its "tail".
If lists were not immutable, one couldn't organize them like this: what if somebody modifies my tail? Lists would have to be copied every time (google "defensive copy").
So the only way to do with lists is to return a new one. What you're trying to do can be described like this: if the input lists are empty, the result is an empty list; otherwise, the result is the sum of tails prepended with the sum of heads. You can write this down in F# almost verbatim:
let rec add a b =
match a, b with
| [], [] -> [] // sum of two empty lists is an empty list
| a::atail, b::btail -> (a + b) :: (add atail btail) // sum of non-empty lists is sum of their tails prepended with sum of their heads
Note that this program is incomplete: it doesn't specify what the result should be when one input is empty and the other is not. The compiler will generate a warning about this. I'll leave the solution as an exercise for the reader.
You can map over both lists together with List.map2 (see the docs)
It goes over both lists pairwise and you can give it a function (the first parameter of List.map2) to apply to every pair of elements from the lists. And that generates the new list.
let a = [1;2;3]
let b = [4;5;6]
let vecadd = List.map2 (+)
let result = vecadd a b
printfn "%A" result
And if you want't to do more work 'yourself' something like this?
let a = [1;2;3]
let b = [4;5;6]
let vecadd l1 l2 =
let rec step l1 l2 acc =
match l1, l2 with
| [], [] -> acc
| [], _ | _, [] -> failwithf "one list is bigger than the other"
| h1 :: t1, h2 :: t2 -> step t1 t2 (List.append acc [(h1 + h2)])
step l1 l2 []
let result = vecadd a b
printfn "%A" result
The step function is a recursive function that takes two lists and an accumulator to carry the result.
In the last match statement it does three things
Sum the head of both lists
Add the result to the accumulator
Recursively call itself with the new accumulator and the tails of the lists
The first match returns the accumulator when the remaining lists are empty
The second match returns an error when one of the lists is longer than the other.
The accumulator is returned as the result when the remaining lists are empty.
The call step l1 l2 [] kicks it off with the two supplied lists and an empty accumulator.
I have done this for crossing two lists (multiply items with same index together):
let items = [1I..50_000I]
let another = [1I..50_000I]
let rec cross a b =
let rec cross_internal = function
| r, [], [] -> r
| r, [], t -> r#t
| r, t, [] -> r#t
| r, head::t1, head2::t2 -> cross_internal(r#[head*head2], t1, t2)
cross_internal([], a, b)
let result = cross items another
result |> printf "%A,"
Note: not really performant. There are list object creations at each step which is horrible. Ideally the inner function cross_internal must create a mutable list and keep updating it.
Note2: my ranges were larger initially and using bigint (hence the I suffix in 50_000) but then reduced the sample code above to just 50,500 elements.

is this implementation of merge sort good?

I've just started to learn Haskell last night and I've never used a functional programming language before.
I just want to know if my implemention of merge sort is good or bad and what exactly is good or bad.
Maybe it's even wrong - Well it does sort but maybe the Algorithm is not what I think what merge sort is.
Just tell me everything I could improve here. I by myself think its a pretty clear and simple implementation.
Thanks for your advice, here's the code :)
merge [] ys = ys
merge xs [] = xs
merge xs ys = sorted : merge left right
where
sorted = if head(xs) < head(ys) then head(xs) else head(ys)
left = if head(xs) <= head(ys) then tail(xs) else xs
right = if head(xs) > head(ys) then tail(ys) else ys
msort [] = []
msort [x] = [x]
msort xs = merge (msort left) (msort right)
where
left = take (div (length xs) 2) xs
right = drop (div (length xs) 2) xs
Well, first of all, we can rewrite merge to be a little more elegant using pattern matching
merge [] ys = ys
merge xs [] = xs
merge xs#(x:xs1) ys#(y:ys1)
| x <= y = x : merge xs1 ys
| otherwise = y : merge xs ys1
In general you should avoid using head and tail since they are a bit unsafe (they raise an error for the empty list) and use pattern matching whenever possible.
The implementation of msort is pretty much spot on, except that we can split the list in a more efficient way. That's because length xs - takes O(N) to complete. The compiler might save you and cache the result of the length call so that the second call to length won't traverse the list again. But the take and drop will pretty much cause another two traversals thus splitting the list using 3 traversals which may prove to be expensive. We can do better by splitting the list in two lists - the first one containing the elements on the odd positions and the second list with the elements placed on the even positions, like so:
msort [] = []
msort [x] = [x]
msort xs = merge (msort first) (msort second)
where
(first, second) = splitInHalves xs
splitInHalves [] = ([], [])
splitInHalves [x] = ([x], [])
splitInHalves (x:y:xs) =
let (xs1, ys1) = splitInHalves xs
in (x:xs1, y:ys1)
This gets you the same Merge Sort in O(NlogN) time. It feels different because you would probably implement it in place (by modifying the original list) in an imperative language such as C. This version is slightly more costly on the memory, but it does have it's advantages - it is more easy to reason about, so it is more maintainable, and also it is very easy to parallelize without being concerned of anything else except the algorithm itself - which is exactly what a good programming language should provide for the developers that use it.
EDIT 1 :
If the syntax is a bit much, here are some resources:
Pattern Matching - the bit with the # symbol is called an as-pattern. You'll find it in there
let is a keyword used to declare a variable to be used in the expression that follows it (whereas where binds a variable in the expression that precedes it). More on Haskell syntax, including guards (the things with | condition = value) can be found here, in this chapter of Learn You a Haskell
EDIT 2 :
#is7s proposed a far more concise version of splitInHalves using the foldr function:
splitInHalves = foldr (\x (l,r) -> (x:r,l)) ([],[])
EDIT 3 :
Here is another answer which provides an alternative implementation of merge sort, which also has the property of being stable:
Lazy Evaluation and Time Complexity
Hope this helps and welcome to the wonderful world of Functional Programming !

Ocaml homework need some advices

We have N sets of integers A1, A2, A3 ... An. Find an algorithm that returns a list containg one element from each of the sets, with the property that the difference between the largest and the smallest element in the list is minimal
Example:
IN: A1 = [0,4,9], A2 = [2,6,11], A3 = [3,8,13], A4 = [7,12]
OUT: [9,6,8,7]
I have an idea about this exercise, first we need sort all the elements on one list(every element need to be assigned to its set), so with that input we get this:
[[0,1],[2,2],[3,3],[4,1],[6,2],[7,4],[8,3],[9,1],[11,2],[12,4],[13,3]]
later on we create all possible list and find this one with the difference between smallest and largest element, and return correct out like this: [9,6,8,7]
I am newbie in ocaml so I have some questions about coding this stuff:
Can I create a function with N(infinite amount of) arguments?
Should I create a new type, like list of pair to realize assumptions?
Sorry for my bad english, hope you will understand what I wanted to express.
This answer is about the algorithmic part, not the OCaml code.
You might want to implement your proposed solution first, to have a working one and to compare its results with an improved solution, which I now write about.
Here is a hint about how to improve the algorithmic part. Consider sorting all sets, not only the first one. Now, the list of all minimum elements from all sets is a candidate to the output.
To consider other candidate output, how can you move from there?
I'm just going to answer your questions, rather than comment on your proposed solution. (But I think you'll have to work on it a little more before you're done.)
You can write a function that takes a list of lists. This is pretty much the same
as allowing an arbitrary number of arguments. But really it just has one argument
(like all functions in OCaml).
You can just use built-in types like lists and tuples, you don't need to create or
declare them explicitly.
Here's an example function that takes a list of lists and combines them into one big long list:
let rec concat lists =
match lists with
| [] -> []
| head :: tail -> head # concat tail
Here is the routine you described in the question to get you started. Note that
I did not pay any attention to efficiency. Also added the reverse apply (pipe)
operator for clarity.
let test_set = [[0;4;9];[2;6;11];[3;8;13]; [7;12]]
let (|>) g f = f g
let linearize sets =
let open List in sets
|> mapi (fun i e -> e |> map (fun x -> (x, i+1) ))
|> flatten |> sort (fun (e1,_) (e2, _) -> compare e1 e2)
let sorted = linearize test_set
Your approach does not sound very efficient, with an n number of sets, each with x_i elments, your sorted list will have (n * x_i) elements, and the number of sub-lists you can generate out of that would be: (n * x_i)! (factorial)
I'd like to propose a different approach, but you'll have to work out the details:
Tag (index) each element with it's set identifier (like you have done).
Sort each set individually.
Build the exact opposite to that of your desired result!
Optimize!
I hope you can figure out steps 3, 4 on your own... :)

FP homework. Is it possible to define a function using nested pattern matching instead of auxiliary function?

I am solving the Programming assinment for Harvard CS 51 programming course in ocaml.
The problem is to define a function that can compress a list of chars to list of pairs where each pair contains a number of consequent occurencies of the character in the list and the character itself, i.e. after applying this function to the list ['a';'a';'a';'a';'a';'b';'b';'b';'c';'d';'d';'d';'d'] we should get the list of [(5,'a');(3,'b');(1,'c');(4,'d')].
I came up with the function that uses auxiliary function go to solve this problem:
let to_run_length (lst : char list) : (int*char) list =
let rec go i s lst1 =
match lst1 with
| [] -> [(i,s)]
| (x::xs) when s <> x -> (i,s) :: go 0 x lst1
| (x::xs) -> go (i + 1) s xs
in match lst with
| x :: xs -> go 0 x lst
| [] -> []
My question is: Is it possible to define recursive function to_run_length with nested pattern matching without defining an auxiliary function go. How in this case we can store a state of counter of already passed elements?
The way you have implemented to_run_length is correct, readable and efficient. It is a good solution. (only nitpick: the indentation after in is wrong)
If you want to avoid the intermediary function, you must use the information present in the return from the recursive call instead. This can be described in a slightly more abstract way:
the run length encoding of the empty list is the empty list
the run length encoding of the list x::xs is,
if the run length encoding of xs start with x, then ...
if it doesn't, then (x,1) ::run length encoding of xs
(I intentionally do not provide source code to let you work the detail out, but unfortunately there is not much to hide with such relatively simple functions.)
Food for thought: You usually encounter this kind of techniques when considering tail-recursive and non-tail-recursive functions (what I've done resembles turning a tail-rec function in non-tail-rec form). In this particular case, your original function was not tail recursive. A function is tail-recursive when the flows of arguments/results only goes "down" the recursive calls (you return them, rather than reusing them to build a larger result). In my function, the flow of arguments/results only goes "up" the recursive calls (the calls have the least information possible, and all the code logic is done by inspecting the results). In your implementation, flows goes both "down" (the integer counter) and "up" (the encoded result).
Edit: upon request of the original poster, here is my solution:
let rec run_length = function
| [] -> []
| x::xs ->
match run_length xs with
| (n,y)::ys when x = y -> (n+1,x)::ys
| res -> (1,x)::res
I don't think it is a good idea to write this function. Current solution is OK.
But if you still want to do it you can use one of two approaches.
1) Without changing arguments of your function. You can define some toplevel mutable values which will contain accumulators which are used in your auxilary function now.
2) You can add argument to your function to store some data. You can find some examples when googling for continuation-passing style.
Happy hacking!
P.S. I still want to underline that your current solution is OK and you don't need to improve it!

Resources