Tarjan's offline Least Common Ancestor Algorithm - data-structures

I am currently reading about Tarjan's algorithm for finding the Least Common Ancestor of two nodes in a binary tree.
I have read the pseudocode on Wikipedia, but I'm not getting the gist of it: I am not able to apply the algorithm to any given binary tree. I also tried to find an explanation of each step on Google, but I did not find anything worthwhile. So if anybody can help me understand how this algorithm works on a binary tree, it would be really appreciated.

Besides the given binary tree, you need to implement another data structure, the disjoint set (union-find), to apply this algorithm. There are three main operations on this data structure: MAKE-SET, UNION, and FIND-SET. I strongly recommend reading Chapter 21, "Data Structures for Disjoint Sets", of "Introduction to Algorithms" for a better understanding.
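To make the interplay between the DFS and the disjoint set concrete, here is a minimal Python sketch of Tarjan's offline LCA. The tree encoding (a children list per node) and the example query pairs are illustrative assumptions, not part of the Wikipedia pseudocode:

```python
# Sketch of Tarjan's offline LCA. Nodes are integers 0..n-1;
# children[u] lists the children of u; queries are (u, v) pairs.

class DSU:
    def __init__(self, n):
        self.parent = list(range(n))    # MAKE-SET for every node
        self.ancestor = [0] * n         # representative's current ancestor

    def find(self, x):                  # FIND-SET with path halving
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):              # UNION: merge y's set into x's
        self.parent[self.find(y)] = self.find(x)

def tarjan_lca(children, root, queries):
    n = len(children)
    dsu = DSU(n)
    visited = [False] * n
    # queries_at[u] lists (other_node, query_index) pairs touching u
    queries_at = [[] for _ in range(n)]
    for i, (u, v) in enumerate(queries):
        queries_at[u].append((v, i))
        queries_at[v].append((u, i))
    answers = [None] * len(queries)

    def dfs(u):
        dsu.ancestor[dsu.find(u)] = u
        for c in children[u]:
            dfs(c)
            dsu.union(u, c)             # fold the finished subtree into u
            dsu.ancestor[dsu.find(u)] = u
        visited[u] = True
        for other, i in queries_at[u]:
            if visited[other]:          # both endpoints seen: answer is known
                answers[i] = dsu.ancestor[dsu.find(other)]

    dfs(root)
    return answers

# Example: node 0 is the root with children 1 and 2; node 1 has children 3 and 4.
children = [[1, 2], [3, 4], [], [], []]
print(tarjan_lca(children, 0, [(3, 4), (3, 2)]))  # → [1, 0]
```

The key invariant is that whenever the DFS finishes a node u, every already-visited node's set representative points (via `ancestor`) to the deepest common ancestor it shares with the current search path, which is why a single FIND-SET answers each query.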

Related

Where can I find the definitions of binary trees and the algorithms associated with them in isabelle?

Where can I find the definition of a binary tree and the algorithms associated with binary trees in Isabelle?
I am a beginner in Isabelle and, therefore, I am looking for new learning materials. Recently, I was trying to find the definition of a binary tree and the algorithms on binary trees in Isabelle, but, unfortunately, my attempt failed. Where can I find them? Thank you for your help in advance.
Binary trees are defined in HOL-Library (~~/src/HOL/Library/Tree.thy). Some algorithms on them (e.g. implementations of data structures such as AVL trees) are defined in HOL-Data_Structures (~~/src/HOL/Data_Structures/).
Both of these are in the Isabelle distribution. You can import them by writing e.g. "Data_Structures.AVL_Set" or "HOL-Library.Tree" (the quotation marks are required when there's a dash in the name).

create decision tree from data

I'm trying to create a decision tree from data. I'm using the tree for a guess-the-animal-game kind of application: the user answers yes/no questions and the program guesses the answer. This program is for homework.
I don't know how to create the decision tree from the data. I have no way of knowing in advance what the root node will be, since the data will be different every time, so I can't do it by hand. My data looks like this:
Animal1: property1, property3, property5
Animal2: property2, property3, property5, property6
Animal3: property1, property6
etc.
I searched Stack Overflow and I found the ID3 and C4.5 algorithms, but I don't know if I should use them.
Can someone tell me what algorithm I should use to build a decision tree in this situation?
"I searched Stack Overflow and I found the ID3 and C4.5 algorithms, but I don't know if I should use them."
Yes, you should. They are very commonly used decision tree algorithms, and there are some nice open-source implementations of them (Weka's J48 is an example implementation of C4.5).
If you need to implement something from scratch, a simple decision tree is fairly easy to build, and it is done recursively:
1. Let the set of labeled samples be S, with the set of properties P = {p1, p2, ..., pk}.
2. Choose a property pi.
3. Split S into two sets S1 and S2: S1 holds the samples that have pi, and S2 those that do not. Create two children for the current node, and move S1 and S2 to them respectively.
4. Repeat with S' = S1 and S' = S2 for each of the subsets of samples, if they are not empty.
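The steps above can be sketched in a few lines of Python. This toy version picks the first usable property rather than the entropy-minimizing one, and the sample encoding (label plus a set of property names) is an assumption matching the question's data format:

```python
# Toy recursive decision tree builder over boolean properties.
# A real ID3/C4.5 would choose the split property by information gain;
# here we naively take the first property that actually splits the data.

def build_tree(samples, properties):
    labels = {label for label, _ in samples}
    if len(labels) <= 1 or not properties:
        return {"leaf": sorted(labels)}          # nothing left to split on
    p = properties[0]                            # step 2: choose a property
    s1 = [s for s in samples if p in s[1]]       # step 3: samples holding p
    s2 = [s for s in samples if p not in s[1]]   #         samples lacking p
    if not s1 or not s2:                         # useless split: try the next
        return build_tree(samples, properties[1:])
    rest = properties[1:]
    return {"property": p,                       # step 4: recurse on children
            "yes": build_tree(s1, rest),
            "no": build_tree(s2, rest)}

# Data shaped like the question's example:
animals = [("Animal1", {"p1", "p3", "p5"}),
           ("Animal2", {"p2", "p3", "p5", "p6"}),
           ("Animal3", {"p1", "p6"})]
tree = build_tree(animals, ["p1", "p2", "p3", "p5", "p6"])
```

Playing the game then amounts to walking the tree, asking "does it have `tree["property"]`?" at each internal node and following the "yes" or "no" child until a leaf is reached.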
Some pointers:
At each iteration you basically split the current data into two subsets: the samples that hold pi and those that do not. You then create two new nodes, which are the current node's children, and repeat the process for each of them, each with the relevant subset of data.
A smart algorithm chooses the property pi (in step 2) in a way that minimizes the tree's height as much as it can (finding the optimal tree is NP-hard, but there are greedy approaches that, for example, minimize entropy at each split).
After the tree is created, some pruning is usually applied to it in order to avoid overfitting.
A simple extension of this algorithm is using multiple decision trees that work separately; this is called a Random Forest, and it usually gets pretty good results empirically.

Similarities Between Trees

I am working on a problem of clustering the results of keyword search on a graph. The results are in the form of trees, and I need to cluster those trees into groups based on their similarities. Every node of a tree has two keys: one is the table name in the SQL database (semantic form), and the second is the actual values of a record of that table (label).
I have used the Zhang-Shasha, Klein, Demaine, and RTED algorithms to find the Tree Edit Distance between the trees based on these two keys. All of these algorithms count the number of deletion/insertion/relabel operations needed to transform one tree into the other.
I want some more metrics for checking the similarity between two trees, e.g. number of nodes, average fan-out, and so on, so that I can take a weighted average of these metrics to arrive at a very good similarity measure which takes into account both the structure of the tree and the information contained in it (the labels at the nodes).
Can you please suggest some approach, or some literature which can be of help?
Even if you had the (pseudo-)distances between each pair of possible trees, this is actually not what you're after. You actually want to do unsupervised learning (clustering) in which you combine structure learning with parameter learning. The types of data structures you want to perform inference on are trees. To postulate "some metric space" for your clustering method, you introduce something that is not really necessary. To find the proper distance measure is a very difficult problem. I'll point in different directions in the following paragraphs and hope they can help you on your way.
The following is not the only way to represent this problem... You can see your problem as Bayesian inference over all possible trees with all possible values at the tree nodes. You probably would have some prior knowledge on what kind of trees are more likely than others and/or what kind of values are more likely than others. The Bayesian approach would allow you to define priors for both.
One article you might like to read is "Learning with Mixtures of Trees" by Meila and Jordan, 2000 (pdf). It explains that it is possible to use a decomposable prior: the tree structure has a different prior from the values/parameters (this of course means that there is some assumption of independence at play here).
I know you were hinting at heuristics such as the average fan-out etc., but you might find it worthwhile to check out these newer applications of Bayesian inference. Note, for example, that within nonparametric Bayesian methods it is also feasible to reason about infinite trees, as done e.g. by Hutter, 2004 (pdf)!
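If you do want to start with the simple structural heuristics mentioned in the question, here is a small sketch of how node count and average fan-out can be combined into a weighted similarity score. The tree encoding (nested dicts with a "children" list) and the equal weights are illustrative assumptions, not part of any of the cited edit-distance algorithms:

```python
# Simple structural metrics for trees encoded as {"children": [subtrees...]}.

def node_count(tree):
    return 1 + sum(node_count(c) for c in tree["children"])

def internal_and_fanout(tree):
    # Returns (number of internal nodes, total children of internal nodes).
    if not tree["children"]:
        return 0, 0
    nodes, fans = 1, len(tree["children"])
    for c in tree["children"]:
        n, f = internal_and_fanout(c)
        nodes += n
        fans += f
    return nodes, fans

def avg_fanout(tree):
    nodes, fans = internal_and_fanout(tree)
    return fans / nodes if nodes else 0.0

def structural_similarity(t1, t2, w_size=0.5, w_fanout=0.5):
    # Each component is a ratio in [0, 1]; 1.0 means the metric matches.
    n1, n2 = node_count(t1), node_count(t2)
    f1, f2 = avg_fanout(t1), avg_fanout(t2)
    size_sim = min(n1, n2) / max(n1, n2)
    fan_sim = min(f1, f2) / max(f1, f2) if max(f1, f2) else 1.0
    return w_size * size_sim + w_fanout * fan_sim
```

A normalized edit-distance term (e.g. from RTED) can be added to the weighted sum in the same way, with the weights tuned against whatever clustering quality measure you use.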

breadth first implementation in Giraph, Graphchi or Pregel

Does anyone know if there exists a breadth-first search (from multiple sources) implementation in any of the graph processing systems Giraph, Pregel, or GraphChi?
Or please point me to some easier implementation on any of these systems.
In the Giraph user mailing list, one can find a few discussions of BFS - and, I guess, an implementation as well.
I have implemented this type of search for Giraph in the past; the implementations are available at:
https://github.com/MarcoLotz/GiraphBFSSO
https://github.com/MarcoLotz/GiraphBFSTO
The difference between them is that one is target-oriented and the other is structure-oriented.
Although they do not start from multiple vertices, one can easily modify the code to support that :)
You are looking for the multi-seed breadth-first search (BFS) algorithm.
For Giraph, this is still an open issue, as can be read in this feature request.
For Pregel, you cannot expect to find any open implementation, as Pregel is a closed-source graph system at Google.
I guess the easiest thing would be to use the code from GitHub and execute it for each source separately. One idea to optimize the runtime is to execute BFS for the first seed vertex and reuse the results for subsequent seed vertices (the first BFS yields a spanning tree, which can easily be transformed into a BFS order for any given seed vertex).
Nevertheless, KISS suggests simply executing BFS k times for k seed vertices until you run into performance issues (which is unlikely, due to the linear runtime complexity of BFS).
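For reference, multi-seed BFS itself is tiny when written sequentially; the only difference from ordinary BFS is that the frontier starts with all seeds at distance 0. This Python sketch assumes a plain adjacency-list graph, and is not Giraph/Pregel code, though each loop pass over the frontier corresponds roughly to a superstep:

```python
from collections import deque

def multi_source_bfs(adj, seeds):
    # adj: dict mapping vertex -> list of neighbors; seeds: iterable of vertices.
    dist = {s: 0 for s in seeds}          # all seeds start at distance 0
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, []):
            if v not in dist:             # first visit = shortest distance
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist                           # distance to the nearest seed
```

In a vertex-centric system the same effect is achieved by having every seed vertex set its value to 0 in superstep 0 and send messages to its neighbors, with non-seed vertices adopting the minimum received distance plus one.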

Depth First Search Basics

I'm trying to improve my current algorithm for the 8 Queens problem, and this is the first time I'm really dealing with algorithm design/algorithms. I want to implement a depth-first search combined with a permutation of the different Y values described here:
http://en.wikipedia.org/wiki/Eight_queens_puzzle#The_eight_queens_puzzle_as_an_exercise_in_algorithm_design
I've implemented the permutation part to solve the problem, but I'm having a little trouble wrapping my mind around the depth-first search. It is described as a way of traversing a tree/graph, but does it also generate the tree/graph? It seems this method would be more efficient only if the depth-first search generates the tree structure as it traverses it, using some logic to generate only certain parts of the tree.
So essentially, I would have to create an algorithm that generates a pruned tree of lexicographic permutations. I know how to implement the pruning logic, but I'm just not sure how to tie it in with the permutation generator, since I've been using next_permutation.
Are there any resources that could help me with the basics of depth-first search or with creating lexicographic permutations in tree form?
In general, yes, the idea of the depth-first search is that you won't have to generate (or "visit" or "expand") every node.
In the case of the Eight Queens problem, if you place a queen such that it can attack another queen, you can abort that branch; it cannot lead to a solution.
If you were solving a variant of Eight Queens such that your goal was to find one solution, not all 92, then you could quit as soon as you found one.
More generally, if you were solving a less discrete problem, like finding the "best" arrangement of queens according to some measure, then you could abort a branch as soon as you knew it could not lead to a final state better than a final state you'd already found on another branch. This is related to the A* search algorithm.
Even more generally, if you are attacking a really big problem (like chess), you may be satisfied with a solution that is not exact, so you can abort a branch that probably can't lead to a solution you've already found.
The DFS algorithm itself does not generate the tree/graph. If you want to build the tree or graph, it is as simple as building it while you perform the search. If you only want to find one solution, a flat LIFO data structure such as a linked list will suffice for tracking the current path: when you visit a new node, push it onto the list; when you leave a node to backtrack, pop it off.
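To see how the tree stays implicit, here is a minimal backtracking sketch in Python: each recursive call is a node of the search tree, the list of placed queens is the path from the root, and attacking placements are pruned before they are ever "generated". Names and structure are my own, not from the Wikipedia article:

```python
# Eight Queens by DFS with pruning. cols[r] holds the column of the
# queen placed in row r; the search tree is never materialized.

def solve_queens(n=8):
    solutions = []

    def safe(cols, col):
        row = len(cols)                # row we are about to fill
        return all(c != col and abs(c - col) != row - r
                   for r, c in enumerate(cols))

    def dfs(cols):
        if len(cols) == n:             # every row filled: a solution
            solutions.append(tuple(cols))
            return
        for col in range(n):
            if safe(cols, col):        # prune attacking branches early
                cols.append(col)       # push: descend into the child
                dfs(cols)
                cols.pop()             # pop: backtrack to the parent

    dfs([])
    return solutions

print(len(solve_queens()))  # → 92
```

Note that `cols` is exactly the flat LIFO structure described above: append on entering a node, pop on leaving it. To stop at the first solution, raise an exception or return early once `solutions` is non-empty.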
The book "Introduction to the Design and Analysis of Algorithms" by Anany Levitin has a proper explanation for your understanding. He also presents the solution to the 8 queens problem just the way you described it. It will help you for sure.
As I understand it, for finding one solution you don't need any permutations; all you need is DFS. That alone will suffice to find a solution.
