I am quite new to Coq, but for my project I have to use a union-find data structure in Coq. Are there any implementations of the union-find (disjoint set) data structure in Coq?
If not, can someone provide an implementation or some ideas? It doesn't have to be very efficient (no need for path compression or other fancy optimizations). I just need a data structure that can hold an arbitrary data type (or nat if that's too hard) and perform two operations: union and find.
Thanks in advance
If all you need is a mathematical model, with no concern for actual performance, I would go for the most straightforward one: a functional map (finite partial function) in which each element optionally links to another element with which it has been merged.
If an element links to nothing, then its canonical representative is itself.
If an element links to another element, then its canonical representative is the canonical representative of that other element.
Note: in the remainder of this answer, as is standard with union-find, I will assume that elements are simply natural numbers. If you want another type of elements, simply have another map that binds all elements to unique numbers.
Then you would define a function find : UnionFind → nat → nat that returns the canonical representative of a given element, by following links as long as you can. Notice that the function would use recursion, whose termination argument is not trivial. To make it happen, I think that the easiest way is to maintain the invariant that a number only links to a lesser number (i.e. if i links to j, then i > j). Then the recursion terminates because, when following links, the current element is a decreasing natural number.
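Defining the function union : UnionFind → nat → nat → UnionFind is easier: union m i j simply returns an updated map with max i' j' linking to min i' j', where i' = find m i and j' = find m j. (If i' = j', the two elements are already in the same set and the map should be returned unchanged; otherwise an element would link to itself and break the invariant.)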
[Side note on performance: maintaining the invariant means that you cannot adequately choose which of a pair of partitions to merge into the other, based on their ranks; however you can still implement path compression if you want!]
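To make the model above concrete, here is a minimal sketch in Python rather than Coq, so it says nothing about the termination proof. It assumes the map is a plain dict from an element to the smaller element it links to; the names `find` and `union` follow the answer, while the dict encoding and the copy-on-update are assumptions of this sketch.

```python
def find(links, i):
    """Follow links until reaching an element that links to nothing."""
    while i in links:
        # Invariant from the answer: an element only links to a lesser one,
        # so this loop visits strictly decreasing numbers and must stop.
        assert links[i] < i
        i = links[i]
    return i


def union(links, i, j):
    """Return a new map in which the sets of i and j are merged."""
    ri, rj = find(links, i), find(links, j)
    if ri == rj:
        return links  # already in the same set: nothing to do
    new_links = dict(links)  # copy, so the old map stays valid
    new_links[max(ri, rj)] = min(ri, rj)
    return new_links


# Build the links { 6 -> 2, 4 -> 2, 3 -> 0, 2 -> 1 } one union at a time.
m = {}
for a, b in [(6, 2), (4, 2), (3, 0), (2, 1)]:
    m = union(m, a, b)
print(find(m, 6), find(m, 3), find(m, 5))
```

Elements that never appear as keys are their own representatives, which is why the empty dict is a valid initial structure.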
As for which data structure exactly to use for the map: there are several available.
The standard library (look under the title FSets) has several implementations (FMapList, FMapPositive and so on) satisfying the interface FMapInterface.
The stdpp library has gmap.
Again if performance is not a concern, just pick the simplest encoding or, more importantly, the one that makes your proofs the simplest. I am thinking of just a list of natural numbers.
The positions of the list are the elements in reverse order.
The values of the list are offsets, i.e. the number of positions to skip forward in order to reach the target of the link.
For an element i linking to j (i > j), the offset is i − j.
For a canonical representative, the offset is zero.
With my best pseudo-ASCII-art skills, here is a map where the links are { 6↦2, 4↦2, 3↦0, 2↦1 } and the canonical representatives are { 5, 1, 0 }:
  6   5   4   3   2   1   0    element
  ↓   ↓   ↓   ↓   ↓   ↓   ↓
               /‾‾‾‾‾‾‾‾‾↘
[ 4 ; 0 ; 2 ; 3 ; 1 ; 0 ; 0 ]  map
   \       \____↗↗ \_↗
    \___________/
The motivation is that the invariant discussed above is then enforced structurally. Hence, there is hope that find could actually be defined by structural induction (on the structure of the list), and have termination for free.
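As an illustration of the offset encoding, here is a small Python sketch (not Coq, so it does not demonstrate the structural-recursion argument itself). It assumes the list layout described above: position p holds element len - 1 - p, and each value is the number of positions to skip forward. The helper names are hypothetical.

```python
def find(offsets, i):
    """Return the canonical representative of element i."""
    pos = len(offsets) - 1 - i      # elements are stored in reverse order
    while offsets[pos] != 0:        # offset zero marks a representative
        pos += offsets[pos]         # skip forward to the link target
    return len(offsets) - 1 - pos


def union(offsets, i, j):
    """Return a new list in which the sets of i and j are merged."""
    ri, rj = find(offsets, i), find(offsets, j)
    if ri == rj:
        return offsets
    hi, lo = max(ri, rj), min(ri, rj)
    new_offsets = list(offsets)
    new_offsets[len(offsets) - 1 - hi] = hi - lo  # hi now links to lo
    return new_offsets


example = [4, 0, 2, 3, 1, 0, 0]     # the map drawn above
print([find(example, i) for i in range(7)])
```

Because every non-zero offset moves strictly towards the end of the list, each step of `find` works on a shorter suffix, which is the property the answer hopes to exploit for a structural definition.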
A related paper is: Sylvain Conchon and Jean-Christophe Filliâtre. A Persistent Union-Find Data Structure. In ACM SIGPLAN Workshop on ML.
It describes the implementation of an efficient union-find data structure in ML, that is persistent from the user perspective, but uses mutation internally. What may be more interesting for you, is that they prove it correct in Coq, which implies that they have a Coq model for union-find. However, this model reflects the memory store for the imperative program that they seek to prove correct. I’m not sure how applicable it is to your problem.
Maëlan has a good answer, but for an even simpler and more inefficient disjoint set data structure, you can just use functions to nat to represent them. This avoids any termination stickiness. In essence, the preimages of any total function form disjoint sets over the domain. Another way of looking at this is as representing any disjoint set G as the curried application find_root G : nat -> nat since find_root is the essential interface that disjoint sets provide.
This is also analogous to using functions to represent Maps in Coq like in Software Foundations. https://softwarefoundations.cis.upenn.edu/lf-current/Maps.html
Require Import Arith.
Search eq_nat_decide.
(* disjoint set *)
Definition ds := nat -> nat.
Definition init_ds : ds := fun x => x.
Definition find_root (g : ds) x := g x.
Definition in_same_set (g : ds) x y :=
  eq_nat_decide (g x) (g y).
Definition union (g : ds) x y : ds :=
  fun z =>
    if in_same_set g x z
    then find_root g y
    else find_root g z.
You can also make it generic over the type held in the disjoint set like so
Definition ds (a : Type) := a -> nat.
Definition find_root {a} (g : ds a) x := g x.
Definition in_same_set {a} (g : ds a) x y :=
  eq_nat_decide (g x) (g y).
Definition union {a} (g : ds a) x y : ds a :=
  fun z =>
    if in_same_set g x z
    then find_root g y
    else find_root g z.
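To initialize the disjoint set for a particular type a, you basically need an injective function from a to nat (something like an Enum instance for a), so that distinct elements start out in distinct sets.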
Definition init_bool_ds : ds bool := fun x => if x then 0 else 1.
You may want to trade out eq_nat_decide for eqb or some other roughly equivalent thing depending on your proof style and needs.
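The same idea can be sketched with closures in Python, purely as an illustration of how the function representation behaves. The names mirror the Coq definitions above; nothing here is verified code.

```python
def init_ds():
    """Every element starts as its own root."""
    return lambda x: x


def find_root(g, x):
    return g(x)


def in_same_set(g, x, y):
    return g(x) == g(y)


def union(g, x, y):
    """Everything that was in x's set now reports y's root."""
    root_x, root_y = g(x), g(y)
    return lambda z: root_y if g(z) == root_x else g(z)


g = union(union(init_ds(), 1, 2), 2, 3)
print(in_same_set(g, 1, 3), in_same_set(g, 1, 4))
```

Each union wraps the previous function in another layer, so a lookup costs time proportional to the number of unions performed so far. That matches the "simple but inefficient" trade-off described above.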
The question "haskell fold rose tree paths" delved into the code for folding a rose tree to its paths. I was experimenting with infinite rose trees, and I found that the provided solution was not lazy enough to work on rose trees that are infinite in both depth and breadth.
Consider a rose tree like:
data Rose a = Rose a [Rose a] deriving (Show, Functor)
Here's a finite rose tree:
finiteTree = Rose "root" [
    Rose "a" [
        Rose "d" [],
        Rose "e" []
    ],
    Rose "b" [
        Rose "f" []
    ],
    Rose "c" []
  ]
The output of the edge path list should be:
[["root","a","d"],["root","a","e"],["root","b","f"],["root","c"]]
Here is an infinite Rose tree in both dimensions:
infiniteRoseTree :: [[a]] -> Rose a
infiniteRoseTree ((root:_):breadthGens) = Rose root (infiniteRoseForest breadthGens)
infiniteRoseForest :: [[a]] -> [Rose a]
infiniteRoseForest (breadthGen:breadthGens) = [ Rose x (infiniteRoseForest breadthGens) | x <- breadthGen ]
infiniteTree = infiniteRoseTree depthIndexedBreadths where
  depthIndexedBreadths = iterate (map (+1)) [0..]
The tree looks like this (it's just an excerpt, there's infinite depth and infinite breadth):
        0
        |
        |
     [1,2..]
      /   \
     /     \
    /       \
[2,3..]   [2,3..]
The paths would look like:
[[0,1,2..]..[0,2,2..]..]
Here was my latest attempt (doing it on GHCi causes an infinite loop, no streaming output):
rosePathsLazy (Rose x []) = [[x]]
rosePathsLazy (Rose x children) =
  concat [ map (x:) (rosePathsLazy child) | child <- children ]
rosePathsLazy infiniteTree
The provided solution in the other answer also did not produce any output:
foldRose f z (Rose x []) = [f x z]
foldRose f z (Rose x ns) = [f x y | n <- ns, y <- foldRose f z n]
foldRose (:) [] infiniteTree
Both of the above work for the finite rose tree.
I tried a number of variations, but I can't figure out how to make the edge folding operation lazy for an infinite 2-dimensional rose tree. I feel like it has something to do with infinite amounts of concat.
Since the output is a 2-dimensional list, I can run a 2-dimensional take and project with a depth limit, a breadth limit, or both at the same time!
Any help is appreciated!
After reviewing the answers here and thinking about it a bit more, I came to the realisation that this is unfoldable, because the resulting list is uncountably infinite. This is because a rose tree of infinite depth and breadth is not a 2-dimensional data structure but an infinite-dimensional one: each depth level confers an extra dimension. In other words, it is somewhat equivalent to an infinite-dimensional matrix; imagine a matrix where each field is another matrix, ad infinitum. The number of paths is then infinity ^ infinity (in cardinal arithmetic, ℵ0^ℵ0 = 2^ℵ0), which is uncountably infinite. This means that the paths of such an infinite-dimensional data structure cannot be enumerated in any useful sense.
To apply this to the rose tree, if we have infinite depth, then the paths never enumerate past the far left of the rose tree. That is this tree:
        0
        |
        |
     [1,2..]
      /   \
     /     \
    /       \
[2,3..]   [2,3..]
Would produce a path like: [[0,1,2..], [0,1,2..], [0,1,2..]..], and we'd never get past [0,1,2..].
Or, put another way: if we have a list containing lists ad infinitum, we can never count (enumerate) it either, as there would be an infinite number of dimensions for the code to jump into.
This is also related to the real numbers being uncountably infinite. A lazy list meant to hold all the real numbers would just produce 0.000.. forever and never enumerate past that.
I'm not sure how to formalise the above explanation, but that's my intuition. (For reference see: https://en.wikipedia.org/wiki/Uncountable_set) It'd be cool to see someone expand on applying https://en.wikipedia.org/wiki/Cantor's_diagonal_argument to this problem.
This book seems to expand on it: https://books.google.com.au/books?id=OPFoJZeI8MEC&pg=PA140&lpg=PA140&dq=haskell+uncountably+infinite&source=bl&ots=Z5hM-mFT6A&sig=ovzWV3AEO16M4scVPCDD-gyFgII&hl=en&sa=X&redir_esc=y#v=onepage&q=haskell%20uncountably%20infinite&f=false
For some reason, dfeuer has deleted his answer, which included a very nice insight and only a minor, easily-fixed problem. Below I discuss his nice insight, and fix the easily-fixed problem.
His insight is that the original code hangs because it is not obvious to concat that any of the elements of its argument list are non-empty. Since we can prove this (outside of Haskell, with paper and pencil), we can cheat just a little bit to convince the compiler that it's so.
Unfortunately, concat isn't quite good enough: if you give concat a list like [[1..], foo], it will never draw elements from foo. The universe collection of packages can help here with its diagonal function, which does draw elements from all sublists.
Together, these two insights lead to the following code:
import Data.Tree
import Data.Universe.Helpers
paths (Node x []) = [[x]]
paths (Node x children) = map (x:) (p:ps) where
  p:ps = diagonal (map paths children)
If we define a particular infinite tree:
infTree x = Node x [infTree (x+i) | i <- [1..]]
We can look at how it behaves in ghci:
> let v = paths (infTree 0)
> take 5 (head v)
[0,1,2,3,4]
> take 5 (map head v)
[0,0,0,0,0]
Looks pretty good! Of course, as observed by ErikR, we cannot have all paths in here. However, given any finite prefix p of an infinite path through t, there is a finite index in paths t whose element starts with prefix p.
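To see what a fair traversal buys over concat, here is a small Python sketch using generators. The `diagonal` below is a hypothetical stand-in written for illustration; it is not the implementation from Data.Universe.Helpers and may emit elements in a different order. The helper `row` is likewise made up for the demonstration.

```python
from itertools import count, islice


def diagonal(rows):
    """Fairly interleave a possibly infinite iterable of possibly
    infinite iterables: each round opens one more row, then takes
    one element from every row opened so far."""
    rows = iter(rows)
    active = []
    rows_exhausted = False
    while True:
        if not rows_exhausted:
            try:
                active.append(iter(next(rows)))
            except StopIteration:
                rows_exhausted = True
        if rows_exhausted and not active:
            return
        still_active = []
        for it in active:
            try:
                yield next(it)
                still_active.append(it)
            except StopIteration:
                pass  # this row is finished; drop it
        active = still_active


def row(i):
    """Infinite row i of an infinite grid: (i, 0), (i, 1), ..."""
    return ((i, j) for j in count())


grid = (row(i) for i in count())
print(list(islice(diagonal(grid), 6)))
```

Chaining the rows one after another would stay in row 0 forever, whereas here every (i, j) is reached after finitely many steps.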
Not a complete answer, but you might be interested in this detailed answer on how Haskell's permutations function is written so that it works on infinite lists:
What does this list permutations implementation in Haskell exactly do?
Update
Here's a simpler way to create an infinite Rose tree:
iRose x = Rose x [ iRose (x+i) | i <- [1..] ]
rindex (Rose a rs) [] = a
rindex (Rose _ rs) (x:xs) = rindex (rs !! x) xs
Examples:
rindex (iRose 0) [0,1,2,3,4,5,6] -- returns: 28
rindex infiniteTree [0,1,2,3,4,5,6] -- returns: 13
Infinite Depth
If a Rose tree has infinite depth and non-trivial width (> 1) there can't be an algorithm to list all of the paths just using a counting argument - the number of total paths is uncountable.
Finite Depth & Infinite Breadth
If the Rose tree has finite depth the number of paths is countable even if the trees have infinite breadth, and there is an algorithm which can produce all possible paths. Watch this space for updates.
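One possible way to enumerate the finite-depth case, sketched in Python under strong assumptions: every node has infinitely many children, every leaf sits at the same depth, and the tree is given by a hypothetical function `child(node, k)` returning the label of the k-th child. A path is then a tuple of child indices, and tuples can be listed by increasing index sum.

```python
from itertools import count, islice


def index_tuples(depth, total):
    """Yield every tuple of `depth` non-negative ints summing to `total`."""
    if depth == 0:
        if total == 0:
            yield ()
        return
    for first in range(total + 1):
        for rest in index_tuples(depth - 1, total - first):
            yield (first,) + rest


def paths(root, child, depth):
    """Enumerate all root-to-leaf paths of a uniform tree of finite depth."""
    for total in count():                    # one "diagonal" per index sum
        for indices in index_tuples(depth, total):
            node, path = root, [root]
            for k in indices:
                node = child(node, k)
                path.append(node)
            yield path


# Same shape as iRose above: the k-th child of x is labelled x + k + 1.
print(list(islice(paths(0, lambda x, k: x + k + 1, 2), 6)))
```

Every path has a finite index sum and only finitely many tuples share a given sum, so each path shows up at a finite position in the output.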
ErikR has explained why you can't produce a list that necessarily contains all the paths, but it is possible to list paths lazily from the left. The simplest trick, albeit a dirty one, is to recognize that the result is never empty and force that fact on Haskell.
paths (Rose x []) = [[x]]
paths (Rose x children) = map (x :) (a : as)
  where
    a : as = concatMap paths children
    -- Note that we know here that children is non-empty, and therefore
    -- the result will not be empty.
For making very infinite rose trees, consider
infTree labels = Rose labels (infForest labels)
infForest labels = [Rose labels' (infForest labels')
                   | labels' <- map (: labels) [0..]]
As chi points out, while this definition of paths is productive, it will in some cases repeat the leftmost path forever, and never reach any more. Oops! So some attempt at fairness or diagonal traversal is necessary to give interesting/useful results.
In DFA we can do the intersection of two automata by doing the cross product of the states of the two automata and accepting those states that are accepting in both the initial automata.
Union is performed similarly. However, although I can easily do the union of NFAs using epsilon transitions, how do I do their intersection?
You can use the cross-product construction on NFAs just as you would DFAs. The only changes are how you'd handle ε-transitions. Specifically, for each state (qi, rj) in the cross-product automaton, you add an ε-transition from that state to each pair of states (qk, rj) where there's an ε-transition in the first machine from qi to qk and to each pair of states (qi, rk) where there's an ε-transition in the second machine from rj to rk.
Alternatively, you can always convert the NFAs into DFAs and then compute the cross product of those DFAs.
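As a rough sketch of the cross-product construction with ε-transitions, here is some Python. The encoding is an assumption of this sketch: an NFA is a tuple (states, delta, start, accepting), delta maps (state, symbol) to a set of states, and the marker `EPS` stands for ε. All function names are hypothetical.

```python
EPS = None  # marker for an epsilon transition (an assumption of this sketch)


def product_nfa(n1, n2, alphabet):
    states1, d1, s1, f1 = n1
    states2, d2, s2, f2 = n2
    delta = {}
    for q in states1:
        for r in states2:
            for a in alphabet:
                # On a real symbol both machines move together.
                delta[((q, r), a)] = {(qk, rk)
                                      for qk in d1.get((q, a), ())
                                      for rk in d2.get((r, a), ())}
            # On epsilon exactly one machine moves, the other stays put.
            delta[((q, r), EPS)] = (
                {(qk, r) for qk in d1.get((q, EPS), ())}
                | {(q, rk) for rk in d2.get((r, EPS), ())})
    states = {(q, r) for q in states1 for r in states2}
    accepting = {(q, r) for q in f1 for r in f2}
    return states, delta, (s1, s2), accepting


def eps_closure(delta, states):
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for t in delta.get((q, EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def accepts(nfa, word):
    _, delta, start, accepting = nfa
    current = eps_closure(delta, {start})
    for a in word:
        moved = {t for q in current for t in delta.get((q, a), ())}
        current = eps_closure(delta, moved)
    return bool(current & accepting)


# a*b* (with an epsilon move between the two loops) and "even length".
n1 = ({0, 1}, {(0, "a"): {0}, (0, EPS): {1}, (1, "b"): {1}}, 0, {1})
n2 = ({"e", "o"},
      {("e", "a"): {"o"}, ("e", "b"): {"o"},
       ("o", "a"): {"e"}, ("o", "b"): {"e"}}, "e", {"e"})
both = product_nfa(n1, n2, "ab")
print(accepts(both, "aabb"), accepts(both, "aab"))
```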
Hope this helps!
We can also use De Morgan's Laws: A intersection B = (A' U B')'
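Taking the union of the complements of the two NFAs is comparatively simple, especially if you are used to the epsilon method of union. Note, however, that complementing an NFA is not just a matter of swapping accepting and non-accepting states: in general the NFA has to be converted to a DFA first.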
There is a huge mistake in templatetypedef's answer.
The product automaton of L1 and L2 which are NFAs :
New states Q = product of the states of L1 and L2.
Now the transition function:
a is a symbol in the union of both automatons' alphabets
delta( (q1,q2) , a) = delta_L1(q1 , a) X delta_L2(q2 , a)
which means you should take the Cartesian product of the set that results from delta_L1(q1 , a) with the set that results from delta_L2(q2 , a).
The problem in the templatetypedef's answer is that the product result (qk ,rk) is not mentioned.
Probably a late answer, but since I had the similar problem today I felt like sharing it. Realise the meaning of intersection first. Here, it means that given the string e, e should be accepted by both automata.
Consider the following automata:
m1 accepting the language {w | w contains '11' as a substring}
m2 accepting the language {w | w contains '00' as a substring}
Intuitively, m = m1 ∩ m2 is the automaton accepting the strings containing both '11' and '00' as substrings. The idea is to simulate both automata simultaneously.
Let's now formally define the intersection.
m = (Q, Σ, Δ, q0, F)
Let's start by defining the states for m; this is, as mentioned above the Cartesian product of the states in m1 and m2. So, if we have a1, a2 as labels for the states in m1, and b1, b2 the states in m2, Q will consist of following states: a1b1, a2b1, a1b2, a2b2. The idea behind this product construction is to keep track of where we are in both m1 and m2.
Σ most likely remains the same, however in some cases they differ and we just take the union of alphabets in m1 and m2.
q0 is now the state in Q containing both the start state of m1 and the start state of m2. (a1b1, to give an example.)
F contains state s IF and only IF both states mentioned in s are accept states of m1, m2 respectively.
Last but not least, Δ; we define delta again in terms of the Cartesian product, as follows: Δ(a1b1, E) = Δ(m1)(a1, E) x Δ(m2)(b1, E), as also mentioned in one of the answers above (if I am not mistaken). The intuitive idea behind this construction for Δ is just to tear a1b1 apart and consider the states a1 and b1 in their original automata. Now we 'iterate' over each possible edge, let's pick E for example, and see where it brings us in each original automaton. After that, we glue these results together using the Cartesian product. If Δ(a1, E) is present in m1 but Δ(b1, E) is not present in m2, then the edge will not exist in m; otherwise we'll have some kind of a union construction.
An alternative to constructing the product automaton is allowing more complicated acceptance criteria. Ordinarily, an NFA accepts an input string when it has reached any one of a set of accepting final states. That can be extended to boolean combinations of states. Specifically, you construct the automaton for the intersection like you do for the union, but consider the resulting automaton to accept an input string only when it is in (what corresponds to) accepting final states in both automata.
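A minimal Python sketch of that idea, under the assumptions that neither NFA has ε-transitions, that the two state sets are disjoint, and that an NFA is a tuple (delta, start, accepting) with delta mapping (state, symbol) to a set of states. The function name `accepts_both` is hypothetical; the two example machines are the "contains 11" and "contains 00" automata mentioned earlier in this thread.

```python
def accepts_both(m1, m2, word):
    (d1, s1, f1), (d2, s2, f2) = m1, m2
    delta = {**d1, **d2}          # union-style automaton: just merge the tables
    current = {s1, s2}            # start in both machines at once
    for symbol in word:
        current = {t for q in current for t in delta.get((q, symbol), ())}
    # Boolean acceptance condition: an accepting state of m1 AND one of m2.
    return bool(current & f1) and bool(current & f2)


# m1: contains "11" as a substring; m2: contains "00" as a substring.
m1 = ({("a0", "0"): {"a0"}, ("a0", "1"): {"a0", "a1"},
       ("a1", "1"): {"a2"},
       ("a2", "0"): {"a2"}, ("a2", "1"): {"a2"}}, "a0", {"a2"})
m2 = ({("b0", "1"): {"b0"}, ("b0", "0"): {"b0", "b1"},
       ("b1", "0"): {"b2"},
       ("b2", "0"): {"b2"}, ("b2", "1"): {"b2"}}, "b0", {"b2"})

print(accepts_both(m1, m2, "1100"), accepts_both(m1, m2, "1010"))
```

This keeps the state count at the sum rather than the product of the two machines, at the price of an acceptance test that is no longer a plain "any accepting state reached" check.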
Okay, so I've been trying to teach myself Prolog recently, and am having a hard time wrapping my head around finding a "Shortest Path" between two (defined) elements in a list of lists. It may not be the most effective way of representing a Grid or finding a Shortest Path, but I'd like to try it this way.
For example:
[[x,x,x,x,x,x,x],
[x,1,o,o,o,o,x],
[x,-,-,-,o,-,x],
[x,-,-,o,o,-,x],
[x,o,o,o,o,2,x],
[x,o,-,-,o,o,x],
[x,x,x,x,x,x,x]]
A few assumptions I can make (either given or based on checking before path-finding):
The grid is square
There will always exist a path from 1 to 2
'1' can pass through anything except '-' (walls) or 'x' (borders)
The goal is for '1' to find a shortest path to '2'.
In the instance of:
[[x,x,x,x,x,x,x],
[x,o,o,1,o,o,x],
[x,-,o,o,o,-,x],
[x,-,o,-,o,-,x],
[x,o,o,2,o,o,x],
[x,o,-,-,-,o,x],
[x,x,x,x,x,x,x]]
Notice, there are two "Shortest paths":
[d,l,d,d,r]
and
[d,r,d,d,l]
In Prolog, I'm trying to make the function (if that's the proper name):
shortestPath(Grid,Path)
I've made a function to find elements '1' and '2', and a function that verifies that the grid is valid, but I can't even figure out how to start constructing a function to find a shortest path from '1' to '2'.
Given a defined Grid, I'd like the output of Path to be the shortest path. Or, given a defined Grid AND a defined Path, I'd like to check if it's indeed a shortest path.
Help would be much appreciated! If I missed anything, or was unclear, let me know!
not optimized solution
shortestPath(G, S) :-
    findall(L-P, (findPath(G,P), length(P,L)), All),
    keysort(All, [_-S|_]).

findPath(G, Path) :-
    pos(G, (Rs,Cs), 1),
    findPath(G, [(Rs,Cs)], [], Path).

findPath(G, [Act|Rest], Trail, Path) :-
    move(Act, Next, Move),
    pos(G, Next, Elem),
    (   Elem == 2
    ->  reverse([Move|Trail], Path)
    ;   Elem == o
    ->  \+ memberchk(Next, Rest),
        findPath(G, [Next,Act|Rest], [Move|Trail], Path)
    ).

move((R,C), (R1,C1), M) :-
    R1 is R-1, C1 is C  , M = u;
    R1 is R  , C1 is C-1, M = l;
    R1 is R+1, C1 is C  , M = d;
    R1 is R  , C1 is C+1, M = r.

pos(G, (R,C), E) :- nth1(R, G, Row), nth1(C, Row, E).
grid(1,
[[x,x,x,x,x,x,x],
[x,1,o,o,o,o,x],
[x,-,-,-,o,-,x],
[x,-,-,o,o,-,x],
[x,o,o,o,o,2,x],
[x,o,-,-,o,o,x],
[x,x,x,x,x,x,x]]).
grid(2,
[[x,x,x,x,x,x,x],
[x,o,o,1,o,o,x],
[x,-,o,o,o,-,x],
[x,-,o,-,o,-,x],
[x,o,o,2,o,o,x],
[x,o,-,-,-,o,x],
[x,x,x,x,x,x,x]]).
What is the standard way of inserting an element at a specific position in a list in OCaml? Only recursion is allowed. No assignment operation is permitted.
My goal is to compress a graph in ocaml by removing vertexes with in_degree=out_degree=1. For this reason I need to remove the adjacent edges to make a single edge. Now the edges are in a list [(6,7);(1,2);(2,3);(5,4)]. So I need to remove those edges from the list and add a single edge.
so the above list will now look like [(6,7);(1,3);(5,4)]. Here we see (1,2);(2,3) is removed and (1,3) is inserted in the second position. I have devised an algorithm for this. But to do this I need to know how can I remove the edges (1,2);(2,3) from position 2,3 and insert (1,3) in position 2 without any explicit variable and in a recursive manner.
OCaml lists are immutable, so there is no such thing as removing or inserting elements in place.
What you can do is create a new list by reusing part of the old list. For example, to create the list (1, 3)::xs' from (1, 2)::(2, 3)::xs', you reuse xs' and build the new list with the cons constructor.
And pattern matching is very handy to use:
let rec transform xs =
  match xs with
  | [] | [_] -> xs
  | (x, y1)::(y2, z)::xs' when y1 = y2 -> (x, z)::transform xs'
  | (x, y1)::(y2, z)::xs' -> (x, y1)::transform ((y2, z)::xs')
You can do something like this:
let rec compress l = match l with
  | [] -> []
  | x :: [] -> [x]
  | x1 :: x2 :: xs ->
    if snd x1 = fst x2 then
      (fst x1, snd x2) :: compress xs
    else x1 :: compress (x2 :: xs)
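One thing to be aware of: both recursive functions above merge a matching pair and then continue with the rest of the list, so a chain such as (1,2);(2,3);(3,4) only collapses to (1,3);(3,4). If longer chains should collapse fully, the merged edge can be put back at the head of the list before recursing. Here is that variant sketched in Python (the function name is only illustrative); the same one-line change applies to the OCaml versions.

```python
def compress(edges):
    """Merge adjacent edges (x, y), (y, z) into (x, z), repeatedly."""
    if len(edges) < 2:
        return list(edges)
    (x, y1), (y2, z), rest = edges[0], edges[1], edges[2:]
    if y1 == y2:
        # Re-examine the merged edge against whatever follows it.
        return compress([(x, z)] + rest)
    return [(x, y1)] + compress([(y2, z)] + rest)


print(compress([(6, 7), (1, 2), (2, 3), (5, 4)]))
print(compress([(1, 2), (2, 3), (3, 4)]))
```

Like the answers above, this only looks at neighbouring list entries and does not check vertex degrees, so it relies on the edges to be merged already sitting next to each other.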
You are using the wrong data structure to store your edges, and your question doesn't indicate that you can't choose a different one. As other posters already said, lists are immutable, so repeated deletion of elements deep within them is a relatively costly (O(n)) operation.
I also don't understand why you have to reinsert the new edge at position 2. A graph is defined by G=(V,E), where V and E are sets of vertices and edges. Their order therefore doesn't matter. This definition of graphs also already tells you a better data structure for your edges: sets.
In OCaml, sets are represented by balanced binary trees, so insertion and deletion of members take O(log n) time. For deletion this is definitely better than lists (O(n)); on the other hand, adding a member to a set is more costly than prepending an element to a list with the cons operation.
An alternative data structure would be a hashtable, where insertion and deletion take O(1) expected time. Let the keys in the hashtable be your edges and, since you don't use the values, just use a constant like unit or 0.
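To illustrate the set-based view, here is a rough Python sketch that treats the edges as an unordered set and contracts any vertex with in-degree and out-degree both equal to 1. The function name and the simple rescanning loop are assumptions of this sketch; it favours clarity over speed and recomputes degrees by scanning the edge set.

```python
def compress_graph(edges):
    """Contract every vertex that has exactly one incoming and one
    outgoing edge, replacing (a, v), (v, b) by (a, b)."""
    edges = set(edges)
    changed = True
    while changed:
        changed = False
        for v in {u for e in edges for u in e}:
            ins = [e for e in edges if e[1] == v]
            outs = [e for e in edges if e[0] == v]
            # Skip self-loops, where the single in-edge is also the out-edge.
            if len(ins) == 1 and len(outs) == 1 and ins[0] != outs[0]:
                (a, _), (_, b) = ins[0], outs[0]
                edges -= {ins[0], outs[0]}
                edges.add((a, b))
                changed = True
                break  # the edge set changed, so rescan from scratch
    return edges


print(compress_graph([(6, 7), (1, 2), (2, 3), (5, 4)]))
```

Each contraction removes at least one edge, so the loop terminates. With a set there is no "position 2" to preserve, which is the point made above.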