OCaml insert an element in list - algorithm

What is the standard way of inserting an element at a specific position in a list in OCaml? Only recursion is allowed. No assignment operation is permitted.
My goal is to compress a graph in OCaml by removing vertices with in_degree = out_degree = 1. To do this I need to replace the two edges adjacent to such a vertex with a single edge. The edges are stored in a list, [(6,7);(1,2);(2,3);(5,4)], so I need to remove those two edges from the list and add a single edge.
The above list will then look like [(6,7);(1,3);(5,4)]. Here (1,2);(2,3) has been removed and (1,3) inserted in the second position. I have devised an algorithm for this, but I need to know how to remove the edges (1,2);(2,3) from positions 2 and 3 and insert (1,3) at position 2 without any explicit variable and in a recursive manner.

OCaml lists are immutable, so there is no such thing as removing or inserting elements in place.
What you can do is create a new list that reuses part of the old one. For example, to create the list (1, 3)::xs' from (1, 2)::(2, 3)::xs', you reuse xs' and build the new list with the cons constructor.
Pattern matching is very handy for this:
let rec transform xs =
  match xs with
  | [] | [_] -> xs
  | (x, y1)::(y2, z)::xs' when y1 = y2 -> (x, z)::transform xs'
  | (x, y1)::(y2, z)::xs' -> (x, y1)::transform ((y2, z)::xs')

You can do something like this:
let rec compress l = match l with
  | [] -> []
  | x :: [] -> [x]
  | x1 :: x2 :: xs ->
    if snd x1 = fst x2 then
      (fst x1, snd x2) :: compress xs
    else x1 :: compress (x2 :: xs)
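Note that both functions above continue with the rest of the list after a merge, so a chain such as (1,2);(2,3);(3,4) becomes (1,3);(3,4) rather than (1,4). Here is a minimal sketch of a variant that re-checks the merged edge against the next one. The name compress_chain is mine, and like the versions above it only looks at adjacent list elements and does not check vertex degrees:

```ocaml
(* Merge adjacent edges (a, b); (b, c) into (a, c). The merged edge is put
   back at the front so that it is compared with the next edge as well. *)
let rec compress_chain edges =
  match edges with
  | (a, b) :: (b', c) :: rest when b = b' -> compress_chain ((a, c) :: rest)
  | e :: rest -> e :: compress_chain rest
  | [] -> []
```

On the list from the question this gives [(6,7);(1,3);(5,4)], the same as the two versions above.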

You are using the wrong data structure to store your edges, and your question doesn't indicate that you can't choose a different one. As other posters already said, lists are immutable, so repeated deletion of elements deep within them is a relatively costly (O(n)) operation.
I also don't understand why you have to reinsert the new edge at position 2. A graph is defined by G=(V,E), where V and E are sets of vertices and edges. Their order therefore doesn't matter. This definition of graphs also already tells you a better data structure for your edges: sets.
In OCaml, sets are represented by balanced binary trees, so the complexity of insertion and deletion of members is O(log n). For deletion this is definitely better than with lists (O(n)); on the other hand, it is more costly to add members to a set than it is to prepend elements to a list using the cons operation.
An alternative data structure would be a hashtable, where insertion and deletion can be done in O(1) time on average. Let the keys in the hashtable be your edges and, since you don't use the values, just use a constant like unit or 0.
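As a rough sketch of the set-based idea, assuming edges are pairs of ints, something like the following would work with the standard Set functor. The module names Edge and EdgeSet and the function contract are made up for this example:

```ocaml
(* Edges as ordered pairs of ints; the polymorphic compare orders them
   lexicographically. *)
module Edge = struct
  type t = int * int
  let compare (x : t) (y : t) = compare x y
end

module EdgeSet = Set.Make (Edge)

(* Replace the two edges (a, b) and (b', c) by the single edge (a, c).
   No check is made that b = b' or that both edges are present. *)
let contract (a, b) (b', c) edges =
  edges
  |> EdgeSet.remove (a, b)
  |> EdgeSet.remove (b', c)
  |> EdgeSet.add (a, c)

let edges = EdgeSet.of_list [(6, 7); (1, 2); (2, 3); (5, 4)]
let edges' = contract (1, 2) (2, 3) edges
```

EdgeSet.elements edges' returns the edges in sorted order, [(1,3);(5,4);(6,7)], which illustrates that the original list position is no longer preserved.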

Related

Generate infinite list from function results

I have a function that takes an integer and returns a list of integers.
How do I efficiently map this function to an initial integer, then, for each item of the resulting list that has not been previously mapped, apply the same function, essentially generating an infinite list?
E.g.
f :: Int -> [Int]
f 0 = [1,2]++(f 1)++(f 2)
Additionally, I need to be able to index the resulting list up to 10E10. How would this be optimised? Memoization?
You want a breadth-first search. The basic idiom goes like this:
bfs :: (a -> [a]) -> [a] -> [a]
bfs f xs = xs ++ bfs f (concatMap f xs)
Notice how we keep the current "state" in the argument xs, output it and then recursively call with a new state which is f applied to each element of the input state.
If you want to filter out elements you have already seen, you need to also pass along some extra state keeping track of which elements you've seen, e.g. a Data.Set, and adjust the algorithm accordingly. I'll leave that bit to you because I'm an irritating pedagogue.
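For comparison, here is a minimal sketch of the same idea with the seen-set threaded through, written in OCaml with the lazy Seq type rather than in Haskell. The names bfs and take are mine, it assumes OCaml 4.08 or later (for the Int module), and the queue is a plain list with append, so it is not tuned for indexing very far into the sequence:

```ocaml
module IntSet = Set.Make (Int)

(* Lazy breadth-first enumeration: each value is emitted at most once.
   [seen] holds the values already emitted, [queue] the pending ones. *)
let bfs (f : int -> int list) (start : int list) : int Seq.t =
  let rec go seen queue () =
    match queue with
    | [] -> Seq.Nil
    | x :: rest ->
      if IntSet.mem x seen then go seen rest ()
      else Seq.Cons (x, go (IntSet.add x seen) (rest @ f x))
  in
  go IntSet.empty start

(* Force the first n elements of a sequence into a list. *)
let rec take n seq =
  if n <= 0 then []
  else
    match seq () with
    | Seq.Nil -> []
    | Seq.Cons (x, tl) -> x :: take (n - 1) tl
```

With the hypothetical successor function fun n -> [n + 1; n + 2], take 5 (bfs (fun n -> [n + 1; n + 2]) [0]) gives [0; 1; 2; 3; 4], with the repeated values skipped.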

Recursive algorithm that returns every pair of a set

I was wondering whether an algorithm of this kind exists; I don't have the slightest idea how to program it...
For example, if you give it [1;5;7]
it should return [(1,5);(1,7);(5,1);(5,7);(7,1);(7,5)]
I don't want to use any for loop.
Do you have any clue on how to achieve this?
You have two cases: list is empty -> return empty list; list is not empty -> take the first element x, yield (x, y) for each element y of the tail, and make a recursive call on the tail of the list. Haskell:
pairs :: [a] -> [(a, a)]
pairs [] = []
pairs (x:xs) = [(x, x') | x' <- xs] ++ pairs xs
--*Main> pairs [1..10]
--[(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),(1,8),(1,9),(1,10),(2,3),(2,4),(2,5),(2,6),(2,7),(2,8),(2,9),(2,10),(3,4),(3,5),(3,6),(3,7),(3,8),(3,9),(3,10),(4,5),(4,6),(4,7),(4,8),(4,9),(4,10),(5,6),(5,7),(5,8),(5,9),(5,10),(6,7),(6,8),(6,9),(6,10),(7,8),(7,9),(7,10),(8,9),(8,10),(9,10)]
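Since the question uses OCaml list syntax, here is a sketch of the same recursion in OCaml. Note that it yields each unordered pair once, whereas the expected output in the question contains both orders; ordered_pairs is an assumed variant for that case, and it relies on the elements being distinct, as in a set:

```ocaml
(* Each unordered pair once: pair the head with every element of the tail,
   then recurse on the tail. *)
let rec pairs = function
  | [] -> []
  | x :: xs -> List.map (fun y -> (x, y)) xs @ pairs xs

(* Every ordered pair of two different elements: pair each x with all
   elements of the list other than x itself. *)
let ordered_pairs l =
  List.concat
    (List.map
       (fun x -> List.map (fun y -> (x, y)) (List.filter (fun y -> y <> x) l))
       l)
```

Here pairs [1;5;7] gives [(1,5);(1,7);(5,7)], while ordered_pairs [1;5;7] gives [(1,5);(1,7);(5,1);(5,7);(7,1);(7,5)].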
I don't know if the algorithm used is recursive or not, but what you are asking for is the itertools.combinations('ABCD', 2) function from Python, and I suppose the same thing is implemented in other programming languages, so you can probably use a native function.
But if you need to write your own, then you can take a look at "Algorithm to return all combinations of k elements from n" (on this site) for some ideas.

OCaml homework: need some advice

We have N sets of integers A1, A2, A3 ... An. Find an algorithm that returns a list containing one element from each of the sets, with the property that the difference between the largest and the smallest element in the list is minimal.
Example:
IN: A1 = [0,4,9], A2 = [2,6,11], A3 = [3,8,13], A4 = [7,12]
OUT: [9,6,8,7]
I have an idea for this exercise. First we sort all the elements into one list (every element needs to be tagged with its set), so with that input we get this:
[[0,1],[2,2],[3,3],[4,1],[6,2],[7,4],[8,3],[9,1],[11,2],[12,4],[13,3]]
Later on we create all possible lists, find the one with the smallest difference between its smallest and largest element, and return the correct output, like this: [9,6,8,7]
I am a newbie in OCaml, so I have some questions about coding this:
Can I create a function with N (an arbitrary number of) arguments?
Should I create a new type, like a list of pairs, to represent these assumptions?
Sorry for my bad English; I hope you understand what I wanted to express.
This answer is about the algorithmic part, not the OCaml code.
You might want to implement your proposed solution first, to have a working one and to compare its results with an improved solution, which I now write about.
Here is a hint about how to improve the algorithmic part. Consider sorting all sets, not only the first one. Now, the list of all minimum elements from all sets is a candidate to the output.
To consider other candidate output, how can you move from there?
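One possible reading of this hint, sketched in OCaml: keep the smallest remaining element of each sorted set as the current candidate, and move to the next candidate by dropping the overall minimum. The name min_range_pick is mine, the sketch assumes every set is non-empty, and it favours clarity over efficiency:

```ocaml
(* Pick one element per set so that (largest - smallest) is minimal. *)
let min_range_pick (sets : int list list) : int list =
  let sorted = List.map (List.sort compare) sets in
  let heads ls = List.map List.hd ls in
  let spread xs =
    List.fold_left max min_int xs - List.fold_left min max_int xs in
  (* Drop the head of the first list whose head is the overall minimum.
     None means that list has no further element, so no other candidate
     can have a smaller spread. *)
  let advance ls =
    let m = List.fold_left min max_int (heads ls) in
    let rec go = function
      | [] -> None
      | (x :: rest) :: others when x = m ->
        if rest = [] then None else Some (rest :: others)
      | l :: others -> Option.map (fun o -> l :: o) (go others)
    in
    go ls
  in
  (* Walk through the candidates, remembering the best one seen so far. *)
  let rec search ls best =
    let cand = heads ls in
    let best = if spread cand < spread best then cand else best in
    match advance ls with
    | None -> best
    | Some ls' -> search ls' best
  in
  search sorted (heads sorted)
```

On the example input, min_range_pick [[0;4;9];[2;6;11];[3;8;13];[7;12]] returns [9;6;8;7], matching the expected output in the question.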
I'm just going to answer your questions, rather than comment on your proposed solution. (But I think you'll have to work on it a little more before you're done.)
You can write a function that takes a list of lists. This is pretty much the same as allowing an arbitrary number of arguments. But really it just has one argument (like all functions in OCaml).
You can just use built-in types like lists and tuples; you don't need to create or declare them explicitly.
Here's an example function that takes a list of lists and combines them into one big long list:
let rec concat lists =
  match lists with
  | [] -> []
  | head :: tail -> head @ concat tail
Here is the routine you described in the question to get you started. Note that
I did not pay any attention to efficiency. Also added the reverse apply (pipe)
operator for clarity.
let test_set = [[0;4;9]; [2;6;11]; [3;8;13]; [7;12]]
let (|>) g f = f g
let linearize sets =
  let open List in
  sets
  |> mapi (fun i e -> e |> map (fun x -> (x, i+1)))
  |> flatten
  |> sort (fun (e1, _) (e2, _) -> compare e1 e2)
let sorted = linearize test_set
Your approach does not sound very efficient. With n sets, each with x_i elements, your sorted list will have about (n * x_i) elements, and the number of sub-lists you can generate out of that would be on the order of (n * x_i)! (factorial).
I'd like to propose a different approach, but you'll have to work out the details:
1. Tag (index) each element with its set identifier (like you have done).
2. Sort each set individually.
3. Build the exact opposite to that of your desired result!
4. Optimize!
I hope you can figure out steps 3, 4 on your own... :)

Haskell: brute force and maximum subarray problem

I am trying to solve the maximum subarray problem with a brute force approach, i.e. generating all the possible subarray combinations. I got something that works, but it's not satisfying at all because it produces way too many duplicated subarrays.
Does anyone know a smart way to generate all the subarrays (in [[]] form) with a minimal number of duplicated elements?
By the way, I'm new to Haskell. Here's my current solution:
import qualified Data.List as L
maximumSubList :: [Integer] -> [Integer]
maximumSubList x = head $ L.sortBy (\a b -> compare (sum b) (sum a)) $ L.nub $ slice x
  where
    -- slice will return all the "sub lists"
    slice [] = []
    slice x = (slice $ tail x) ++ (sliceLeft x) ++ (sliceRight x)
    -- Create sub lists by removing "left" part
    -- ex [1,2,3] -> [[1,2,3],[2,3],[3]]
    sliceRight [] = []
    sliceRight x = x : (sliceRight $ tail x)
    -- Create sub lists by removing "right" part
    -- ex [1,2,3] -> [[1,2,3],[1,2],[1]]
    sliceLeft [] = []
    sliceLeft x = x : (sliceLeft $ init x)
There are many useful functions for operating on lists in the standard Data.List module.
import Data.List
slice :: [a] -> [[a]]
slice = filter (not . null) . concatMap tails . inits
dave4420's answer is how to do what you want to do using smart, concise Haskell. I'm no Haskell expert, but I occasionally play around with it and find solving a problem like this to be an interesting distraction, and enjoy figuring out exactly why it works. Hopefully the following explanation will be helpful :)
The key property of dave4420's answer (which your answer doesn't have) is that the pair (startPos, endPos) is unique for each subarray it generates. Now, observe that two subarrays are distinct if either their startPos or endPos is different. Applying inits to the original array returns a list of prefixes that all share the same startPos (the start of the array) and each have a unique endPos. Applying tails to each of these prefixes in turn produces another list of subarrays, one list per input prefix. Notice that tails does not disturb the distinctness between input prefixes, because the subarrays output by invoking tails on a single prefix all retain that prefix's endPos: that is, if you have two prefixes with distinct endPoses, and put both of them through tails, each of the non-empty subarrays produced from the first prefix will be distinct from each of those produced from the second one.
Additionally, the non-empty subarrays produced by the invocation of tails on a single prefix are distinct from one another because, although they all share the same endPos, they all have distinct startPoses. Therefore all non-empty subarrays produced by (concatMap tails) . inits are distinct (the empty lists are removed by the filter). It only remains to note that no subarray is missed out: for any subarray starting at position i and ending at position j (counting from 0), that subarray appears as the (i+1)th list produced by applying tails to the (j+2)th list produced by inits (the first list produced by inits is the empty one). So in conclusion, every possible subarray appears exactly once!
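To make the order of generation concrete, here is a small sketch of the same composition in OCaml. The standard library has no inits or tails, so the versions below are hand-written assumptions that mirror the behaviour of the Data.List functions:

```ocaml
(* All suffixes, longest first, ending with the empty list. *)
let rec tails = function
  | [] -> [[]]
  | (_ :: tl) as l -> l :: tails tl

(* All prefixes, shortest first, starting with the empty list. *)
let rec inits = function
  | [] -> [[]]
  | x :: tl -> [] :: List.map (fun p -> x :: p) (inits tl)

(* Every non-empty contiguous sublist, once per (start, end) position:
   take each prefix, then every suffix of that prefix. *)
let slices l =
  List.concat (List.map tails (inits l))
  |> List.filter (fun s -> s <> [])
```

For [1;2;3] this gives [[1];[1;2];[2];[1;2;3];[2;3];[3]]: six sublists, one for each (startPos, endPos) pair.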

binary search tree for finding more than one object

I've just read about binary search trees from the "Learn You a Haskell" book, and I'm wondering whether it is effective to search more than one element using this tree? For example, suppose I have a bunch of objects where every object has some index, and
      5
    /   \
   3     7
  / \   / \
 1   4 6   8
if I need to find an element by index 8, I need to do only three steps, 5 -> 7 -> 8, instead of iterating over the whole list until the end. But what if I need to find several objects, say 1, 4, 6, 8? It seems like I'd need to repeat the same action for each element: 5 -> 3 -> 1, 5 -> 3 -> 4, 5 -> 7 -> 6 and 5 -> 7 -> 8.
So my question is: does it still make sense to use binary search tree for finding more than one element? Could it be better than checking each element for condition (which leads only to O(n) in the worst case)?
Also, what kind of data structure is better to use if I need to check more than one attribute? E.g. in the example above, I was looking only for the id attribute, but what if I also need to search by name, or color, etc.?
You can share some of the work. See members, which takes in a list of values and outputs a list of exactly those values of the input list that are in the tree. Note: the order of the input list is not preserved in the output list.
EDIT: I'm actually not sure if you can get better performance (from a theoretical standpoint) with members over doing map member. I think that if the input list is sorted, then splitting the list in three (lts, eqs, gts) could be done easily.
data BinTree a
  = Branch (BinTree a) a (BinTree a)
  | Leaf
  deriving (Show, Eq, Ord)
empty :: BinTree a
empty = Leaf
singleton :: a -> BinTree a
singleton x = Branch Leaf x Leaf
add :: (Ord a) => a -> BinTree a -> BinTree a
add x Leaf = singleton x
add x tree@(Branch left y right) = case compare x y of
  EQ -> tree
  LT -> Branch (add x left) y right
  GT -> Branch left y (add x right)
member :: (Ord a) => a -> BinTree a -> Bool
member x Leaf = False
member x (Branch left y right) = case compare x y of
  EQ -> True
  LT -> member x left
  GT -> member x right
members :: (Ord a) => [a] -> BinTree a -> [a]
members xs Leaf = []
members xs (Branch left y right) = eqs ++ members lts left ++ members gts right
  where
    comps = map (\x -> (compare x y, x)) xs
    grab ordering = map snd . filter ((ordering ==) . fst)
    eqs = grab EQ comps
    lts = grab LT comps
    gts = grab GT comps
A quite acceptable solution when searching for multiple elements is to search for them one at a time with the most efficient algorithm (which is O(log n) in your case). However, it can be quite advantageous to step through the entire tree and pool all the elements that match a certain condition, it really depends on where and how often you search inside your code. If you only search at one point in your code it would make sense to collect all the elements in the tree in one shot instead of searching for them one by one. If you decide to opt for that solution then you could feasibly use other data structures such as a list.
If you need to check for multiple attributes I suggest replacing "id" with a tuple containing all the different possible identifiers (id, color, ...). You can then unpack the tuple and compare whichever identifiers you want.
Assuming your binary tree is balanced, if you have a constant number k of search items, then k searches with a total time of O(k * log(n)) is still better than a single O(n) scan, where at each element you still have to do k comparisons, making it O(k*n). Even if the list of search items is sorted, and you can binary search in O(log(k)) time to see if your current item is a match, you're still at O(n * log(k)), which is worse than the tree unless k is Theta(n).
No.
A single search is O(log n). Four searches are O(4 log n). A linear search, which would pick up all items, is O(n). The structure of a binary tree means finding more than one datum requires a walk (which is actually worse than a list walk).

Resources