XPath: Recursive Ancestor Search

With my current node selected I wish to find an element <name> [./name] and return the text content. If an element <name> does not exist in the currently selected node I wish to check the parent element [./parent::name] and so on recursively up to the root, returning the value of the nearest parent where the element exists.
Can it be done with XPath?

(Edit: I misinterpreted the question the first time)
I propose to use
ancestor-or-self::name[1]
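This selects the name elements on the ancestor-or-self axis: self, then the parent, the grandparent, and so on up to the root. Because ancestor-or-self is a reverse axis, positions inside the predicate count outward from the current node, so selecting [1] gives you the nearest one. Note that this matches an element that is itself named name. If, as in the question, name is a child of the current node or of one of its ancestors, use ancestor-or-self::*[name][1]/name instead.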
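If you need the same lookup from Python's standard library, note that xml.etree.ElementTree supports only a subset of XPath and has no ancestor axes. The lookup described in the question (a name child of the current node or of its nearest ancestor) can be emulated with a child-to-parent map. This is a minimal sketch; nearest_name is a hypothetical helper, not an ElementTree function, and the sample XML is made up.

```python
import xml.etree.ElementTree as ET


def nearest_name(root, node, tag="name"):
    """Emulate ancestor-or-self::*[name][1]/name: return the text of the first
    `tag` child found on `node` or on its nearest ancestor, else None."""
    # ElementTree elements do not know their parents, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    current = node
    while current is not None:
        found = current.find(tag)  # direct child named `tag`, if any
        if found is not None:
            return found.text
        current = parent.get(current)  # None once we step above the root
    return None


doc = ET.fromstring(
    "<root><name>top</name><a><b><name>inner</name><c><d/></c></b></a></root>"
)
print(nearest_name(doc, doc.find(".//d")))  # inner
print(nearest_name(doc, doc.find("a")))  # top
```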

Related

Xpath concept confusion

I'm stuck at trying to figure out what does the following XPath expressions lead to in XML:
paper/publisher/parent::*/author
/bib//address[ancestor::book]
/bib//author/ancestor::*//zip
1) Does the first one show all the parents that have author as the root element? What does */ mean?
2) Does the second one list all the ancestor root elements under book?
3) The third one I really have no clue about; does it list all the zip nodes?
I'm just confused about how the ancestor axis works overall; please give some guidance.
The first XPath expression uses the parent:: axis with an element wildcard node test (that is what the * means). From the publisher element, it "jumps up" to its parent, the paper element, and then selects the author child element. It only selects that author element if the publisher sibling is present. It could be written more compactly, without looking up the node tree, by using a predicate filter on the paper element: paper[publisher]/author.
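To check that the two forms select the same nodes, here is a small test with Python's standard xml.etree.ElementTree module on made-up sample data (the bib/paper document is an assumption, not from the question). ElementTree implements only a subset of XPath, so parent::* is written with its abbreviated form ..; for the parent of an element, .. (parent::node()) and parent::* select the same node.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<paper><author>Ann</author><publisher>ACM</publisher></paper>"
    "<paper><author>Bob</author></paper>"
    "</bib>"
)

# ElementTree has no parent:: axis, but ".." (the abbreviated parent step) works.
via_parent = [a.text for a in bib.findall("paper/publisher/../author")]
# "[publisher]" keeps only paper elements that have a publisher child.
via_predicate = [a.text for a in bib.findall("paper[publisher]/author")]

print(via_parent, via_predicate)  # ['Ann'] ['Ann']
```

Bob is not selected by either form because his paper has no publisher child.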
The second XPath selects all of the address elements that are descendants of the bib element and also have a book element as an ancestor. Because /bib makes bib the document element, that book must itself sit somewhere between bib and the address. The square brackets are a predicate, which keeps only the nodes for which the expression is true. In this case, an address must have a book ancestor in order to "pass" the test and be selected.
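The third XPath looks for any zip element that is a descendant of an element that is an ancestor of an author element that is itself a descendant of the bib element. Since bib is an ancestor of every such author, this effectively selects every zip element anywhere under bib, as long as bib contains at least one author.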
Below is a visualization of the XPath axes, in order to understand what it means to select an ancestor, parent, sibling, descendant, etc.
It might be helpful to play with an XPath visualization tool, in order to evaluate your XPath expressions against some sample XML and see what is and is not selected.
For example:
http://chris.photobooks.com/xml

Find number of leaves under each node of a tree

I have a tree which is represented in the following format:
nodes is a list of nodes in the tree in the order of their height from top. Node at height 0 is the first element of nodes. Nodes at height 1 (read from left to right) are the next elements of nodes and so on.
n_children is a list of integers such that n_children[i] = num children of nodes[i]
For example, given a tree like {1: {2, 3:{4,5,2}}}, nodes=[1,2,3,4,5,2] and n_children=[2,0,3,0,0,0].
Given a Tree, is it possible to generate nodes and n_children and the number of leaves corresponding to each node in nodes by traversing the tree only once?
Is such a representation unique? Or is it possible for two different trees to have the same representation?
For the first question - creating the representation given a tree:
I am assuming by "a given tree" we mean a tree that is given in the form of node-objects, each holding its value and a list of references to its children-node-objects.
I propose this algorithm:
1. Start at node=root.
2. If node.children is empty, return {values_list:[[node.value]], children_list:[[0]]}.
3. Otherwise:
3.1. construct two lists. One will be called values_list and each element there shall be a list of values. The other will be called children_list and each element there shall be a list of integers. Each element in these two lists will represent a level in the sub-tree beginning with node, including node itself (will be added at step 3.3).
So values_list[1] will become the list of values of the children-nodes of node, and values_list[2] will become the list of values of the grandchildren-nodes of node. values_list[1][0] will be the value of the leftmost child-node of node. And values_list[0] will be a list with one element alone, values_list[0][0], which will be the value of node.
3.2. for each child-node of node (for which we have references through node.children):
3.2.1. start over at (2.) with the child-node set as node, and assign the returned results (when the call returns) to child_values_list and child_children_list respectively.
3.2.2. for each index i of child_values_list (child_values_list and child_children_list have the same length): if there is already a list in values_list[i], concatenate child_values_list[i] to values_list[i] and child_children_list[i] to children_list[i]. Otherwise assign values_list[i]=child_values_list[i] and children_list[i]=child_children_list[i] (that would be a push, adding to the end of the list).
3.3. Make node.value the sole element of a new list and add that list to the beginning of values_list. Make node.children.length the sole element of a new list and add that list to the beginning of children_list.
3.4. return values_list and children_list
when the above returns with values_list and children_list for node=root (from step (1)), all we need to do is concatenate the elements of the lists (because they are lists, each for one specific level of the tree). After concatenating the list-elements, the resulting values_list_concatenated and children_list_concatenated will be the wanted representation.
In the algorithm above we visit a node only by starting step (2) with it set as node and we do that only once for each child of a node we visit. We start at the root-node and each node has only one parent => every node is visited exactly once.
For the number of leaves associated with each node (if I understand correctly, the number of leaves in the sub-tree rooted at that node), we can add another list that will be generated and returned: leaves_list.
In the stop case (node has no children, step (2)) we return leaves_list:[[1]]. In step (3.2.2) we concatenate its list-elements just like those of the other two lists. And in step (3.3) we sum the first list-element leaves_list[0] (the leaf counts of node's children) and make that sum the sole element of a new list that we add to the beginning of leaves_list (something like leaves_list.add_to_beginning([leaves_list[0].sum()])).
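The recursion described above, including the leaves_list extension, can be sketched in Python as follows. This is a minimal sketch assuming a simple Node class with value and children attributes (the class and function names are illustrative); it follows the numbered steps rather than optimizing the list concatenation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    value: int
    children: list[Node] = field(default_factory=list)


def level_lists(node):
    """Per-level lists of values, child counts and leaf counts for node's sub-tree."""
    if not node.children:  # step 2: stop case, a leaf
        return [[node.value]], [[0]], [[1]]
    values_list, children_list, leaves_list = [], [], []  # step 3.1
    for child in node.children:  # step 3.2
        c_values, c_children, c_leaves = level_lists(child)  # step 3.2.1
        for i in range(len(c_values)):  # step 3.2.2: merge level by level
            if i < len(values_list):
                values_list[i] += c_values[i]
                children_list[i] += c_children[i]
                leaves_list[i] += c_leaves[i]
            else:
                values_list.append(c_values[i])
                children_list.append(c_children[i])
                leaves_list.append(c_leaves[i])
    # step 3.3: prepend this node's own level; its leaves are the sum over its children
    values_list.insert(0, [node.value])
    children_list.insert(0, [len(node.children)])
    leaves_list.insert(0, [sum(leaves_list[0])])
    return values_list, children_list, leaves_list  # step 3.4


def representation(root):
    """Concatenate the per-level lists into flat nodes / n_children / leaves lists."""
    levels = level_lists(root)
    return tuple([x for level in lists for x in level] for lists in levels)


tree = Node(1, [Node(2), Node(3, [Node(4), Node(5), Node(2)])])
nodes, n_children, leaves = representation(tree)
print(nodes)  # [1, 2, 3, 4, 5, 2]
print(n_children)  # [2, 0, 3, 0, 0, 0]
print(leaves)  # [4, 1, 3, 1, 1, 1]
```

Each node is passed to level_lists exactly once; the extra cost is in the list concatenations, not in revisiting nodes.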
For the second question - is this representation unique:
To prove uniqueness we want to show that the function (let's call it rep, for "representation") preserves distinctness over the space of trees, i.e. that it is an injection. As you can see in the linked wiki article, for that it suffices to show that there exists a function (let's call it tre, for "tree") that, given a representation, gives a tree back, and that for every tree t it holds that tre(rep(t))=t. In simple words: we can write a method that takes a representation and builds a tree out of it, and for every tree, if we compute its representation and pass that representation through that method, we get the exact same tree back.
So let's get cracking!
Actually the first job - creating that method (the function tre) is already done by you - by the way you explained what the representation is. But let's make it explicit:
1. If the lists are empty, return the empty tree. Otherwise, continue.
2. Make the root node with values[0] as its value and n_children[0] as its number of children (without making the children nodes yet).
3. Initialize a list index i=1, a level index li=1, a level-elements counter lei=root.children.length and a next-level-elements accumulator nle_acc=0.
4. While lei>0:
4.1. Repeat lei times:
4.1.1. Make a node with values[i] as its value and n_children[i] as its number of children.
4.1.2. Add the new node as the leftmost child in level li that has not been filled yet (traverse the tree to level li from the leftmost node rightwards and assign the new node to the first reference that is not assigned yet. We know the previous level is done, so each node in level li-1 has a children.length property we can check to see whether it already has the number of children it should have).
4.1.3. Add nle_acc+=n_children[i].
4.1.4. Increment ++i.
4.2. Assign lei=nle_acc (the level-elements counter takes what the accumulator gathered for it).
4.3. Clear nle_acc=0 (the next-level-elements accumulator must start from zero for the next round).
4.4. Increment ++li (move on to the next level).
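The same reconstruction can be sketched in Python with a FIFO queue of parents that still need their children. This swaps the explicit scan of level li for the leftmost unfilled slot (step 4.1.2) for a queue, which visits parents in the same level-by-level, left-to-right order. It assumes a well-formed representation and uses an illustrative Node class and rebuild function.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    value: int
    children: list[Node] = field(default_factory=list)


def rebuild(values, n_children):
    """Build a tree level by level, left to right, from the flat representation."""
    if not values:  # empty representation -> empty tree
        return None
    root = Node(values[0])
    # Parents waiting for their children, in level order (left to right).
    pending = deque([(root, n_children[0])])
    i = 1
    while pending:
        parent, k = pending.popleft()
        for _ in range(k):  # the next k entries are this parent's children
            child = Node(values[i])
            parent.children.append(child)
            pending.append((child, n_children[i]))
            i += 1
    return root


tree = rebuild([1, 2, 3, 4, 5, 2], [2, 0, 3, 0, 0, 0])
print(tree == Node(1, [Node(2), Node(3, [Node(4), Node(5), Node(2)])]))  # True
```

Since dataclasses compare field by field, the == check compares the two trees value by value and child by child, which is exactly the tre(rep(t))=t property for this example.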
Now we need to prove that an arbitrary tree that is passed through the first algorithm and then through the second algorithm (this one here) will get out of all of that the same as it was originally.
As I'm not trying to prove the correctness of the algorithms (although I should), let's assume they do what I intended them to do, i.e. the first one writes the representation as you described it, and the second one builds a tree level by level, left to right, assigning a value and the number of children from the representation, and fills the children references according to those numbers when it reaches the next level.
So each node has the right amount of children according to the representation (that's how the children were filled), and that number was written from the tree (when generating the representation). And the same is true for the values and thus it is the same tree as the original.
The proof actually should be much more elaborate and detailed - but I think I'll leave it at that now. If there will be a demand for elaboration maybe I'll make it an actual proof.

What is the need for asymmetric linked list

I am studying data structures. I have come across the asymmetric linked list, which is described as a special type of doubly linked list in which
1. next link points to next node address
2. prev link points to current node address itself
But I wonder,
1. what are the advantages we get by designing such linked list?
2. what kind of applications this would be suitable for?
Could anyone kindly explain more about the asymmetric linked list? I googled but could not find relevant answers. Thank you.
Source :http://en.wikipedia.org/wiki/Doubly_linked_list#Asymmetric_doubly-linked_list
I agree the Wiki page is misleading. Here is the difference between LL and ALL:
Open Linked List:
node.next = nextNode
node.prev = prevNode
Asymmetric Linked List:
node.next = nextNode
node.prev = prevNode.next
Note the difference prevNode vs prevNode.next.
While pointing to a pointer within the previous node still preserves the ability to traverse the list backwards (you can get the prevNode address by subtracting the offset of the next field from the address stored in node.prev), it may simplify insertion and deletion operations on the list, especially on the first element.
Given a node pointer from a doubly linked list, we can reach all the nodes through prev and next, while with a singly linked list we cannot if the pointer provided doesn't point to the first node.
For example, to delete a node from a singly linked list, you have to traverse the list from the head to find that node, and also keep track of the node before it, which takes O(n) time. With a doubly linked list, however, you can delete a given node in constant time.
In short, given a specific node, if we need its previous node, a singly linked list forces an O(n) traversal from the head, while a doubly linked list doesn't.
By the way, std::list in the C++ STL and LinkedList in Java are implemented as doubly linked lists.
Because a picture is worth a thousand words:
As you can see, the "previous" field references the "next" field of the preceding node rather than the preceding element itself. This makes little difference for most nodes, except the first one: its previous field can point to the head, rather than pointing to the last element (as in a circular list) or being null.
The main advantage is for insertion and deletion: you don't need to take care of the head or check whether the element is the first one. A single pointer to an element is enough to perform an insertion or a deletion.
One disadvantage compared with a circular list: the only way to get the last element (e.g. to implement an "add last" operation) is to loop through the whole list.
You also lose the ability to loop through the list in reverse (because there is no pointer to the previous node itself), unless all nodes share the same layout and you are allowed to do pointer arithmetic (as in C/C++) to get from the previous node's next field back to the node.
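Python has no pointers to fields, so the following is only an emulation, assuming that "a pointer to the previous node's next field" can be modeled as "a reference to the object that owns that next field". Giving the list head an attribute that is also called next makes the first node look like any other node, which is why unlink needs no head special case. All class and function names here are illustrative.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
        # Emulates the asymmetric "prev": instead of the previous node, it refers to
        # whatever owns the `next` link pointing at this node (a Node or the list head).
        self.prev = None


class AsymmetricList:
    def __init__(self):
        self.next = None  # the head pointer, deliberately named like a node's `next` field

    def push_front(self, value):
        node = Node(value)
        insert_after(self, node)
        return node

    def values(self):
        out, cur = [], self.next
        while cur is not None:
            out.append(cur.value)
            cur = cur.next
        return out


def insert_after(owner, node):
    """Link `node` right after `owner`, which may be a Node or the list itself."""
    node.next = owner.next
    if node.next is not None:
        node.next.prev = node
    owner.next = node
    node.prev = owner


def unlink(node):
    """Remove `node` in O(1) with no special case for the first element."""
    node.prev.next = node.next
    if node.next is not None:
        node.next.prev = node.prev
    node.next = node.prev = None


lst = AsymmetricList()
c = lst.push_front(3)
b = lst.push_front(2)
a = lst.push_front(1)
unlink(a)  # removing the first element uses the same code path as any other node
print(lst.values())  # [2, 3]
```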

Given a binary search tree and a key, How to find first M tree nodes which values are closest to that key?

I recently encountered an algorithm problem:
Given a binary search tree and a key, how do you find the M tree nodes
whose values are closest to that key?
My idea is to use an in-order traversal to put each tree node value into an array. Then we use binary search to find the node X closest to the given key in this array. Finally, we expand from X to the left and right to find the M values closest to the given key.
However, my idea requires O(n) time and O(n) space. Is there a better approach?
1. Locate the key (or at least where it would go).
2. From that point, advance to the next higher value.
3. Also from that point, regress to the next lower value.
4. Whichever is closer, save that value and move to the next value away from the key. That is, if the higher value is closer, go to the next higher value; otherwise, go to the next lower one.
5. Continue until you have your M nodes.
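The steps above can be sketched in Python with two stacks, one yielding the next higher values and one the next lower values, so no parent pointers are needed. The two-stack technique is an assumed implementation choice, not something the steps specify; each stack holds at most one root-to-leaf path, so this runs in O(h + M) time and O(h) extra space for a tree of height h. Names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    val: int
    left: Optional[TreeNode] = None
    right: Optional[TreeNode] = None


def insert(root, val):
    """Plain BST insertion, used only to build the example tree."""
    if root is None:
        return TreeNode(val)
    if val < root.val:
        root.left = insert(root.left, val)
    else:
        root.right = insert(root.right, val)
    return root


def closest_m(root, key, m):
    succ, pred = [], []  # pending nodes with values >= key / values < key
    node = root
    while node is not None:  # walk down to where key is (or would go)
        if node.val >= key:
            succ.append(node)
            node = node.left
        else:
            pred.append(node)
            node = node.right

    def next_higher():  # pop the smallest unvisited value >= key
        node = succ.pop()
        cur = node.right
        while cur is not None:
            succ.append(cur)
            cur = cur.left
        return node.val

    def next_lower():  # pop the largest unvisited value < key
        node = pred.pop()
        cur = node.left
        while cur is not None:
            pred.append(cur)
            cur = cur.right
        return node.val

    result = []
    while len(result) < m and (succ or pred):
        # take whichever candidate is closer; ties go to the higher value
        if not pred or (succ and succ[-1].val - key <= key - pred[-1].val):
            result.append(next_higher())
        else:
            result.append(next_lower())
    return result


root = None
for v in [8, 3, 10, 1, 6, 14, 4, 7, 13]:
    root = insert(root, v)
print(closest_m(root, 5, 3))  # [6, 4, 7]
```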

Trie iterator, suffix ordering

I need to implement an iterator for a trie. Let's say I have
    a
   / \
  b   c
 / \
d   e
If the current iterator.state="abd", I would like to have iterator.next.state="abe", then "ac". At each level, nodes are sorted in lexicographical order (e.g. on level 2, c comes after b). Also this should happen in log(n) time, where n is the number of nodes.
One solution I can think of: consider a special case, where each branch has the same height. A rather cool implementation, I think, would be to maintain a balanced tree for each "level". On asking "what string follows after abd?", when positioned on b, one could search for the first element bigger than "b" in the tree associated with the third level, giving "abe".
However that might be impractical, due to having to create the trees.
If I understand the question correctly, the iterator state could be the current string and a pointer to the current location in the trie. Then, to move to the next element:
1. If your current location has a next sibling (the next character at that level in sorted order), move to it and replace the last character in the current string with the new location's character.
2. Otherwise, remove the last character and go up the tree. If you're trying to go up from the root, you're done. Otherwise, go to step 1.
So for example when you're at abd (in your example), the current string is "abd" and the pointer points to the 'd'. To move to the next element you change the string to "ab", move to the sibling node ('e') and add it to the string, yielding "abe". After that, you'll be going up since there's no sibling and then to b's sibling, yielding the correct next value 'ac'.
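As you can see, in the worst case each step has to go all the way back to the root before it finds a sibling, so a step costs O(depth) moves. That is the log(n) you were asking for when the trie is balanced; in a deep, unbalanced trie the depth can be much larger than log(n).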
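A minimal sketch of the two steps, assuming each trie node stores its character, a parent pointer and a dict of children (all names illustrative). It enumerates leaves only, and after moving to a sibling it descends to that sibling's leftmost leaf, a step the example does not need because c is already a leaf. For simplicity the sibling lookup sorts the parent's keys on each call.

```python
class TrieNode:
    def __init__(self, char="", parent=None):
        self.char = char
        self.parent = parent
        self.children = {}  # char -> TrieNode

    def add(self, word):
        node = self
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode(ch, node)
            node = node.children[ch]


def next_sibling(node):
    """The sibling that follows `node` in lexicographic order, or None."""
    if node.parent is None:
        return None
    keys = sorted(node.parent.children)
    i = keys.index(node.char)
    return node.parent.children[keys[i + 1]] if i + 1 < len(keys) else None


def leftmost_leaf(node, s):
    """Descend along the smallest child until reaching a leaf."""
    while node.children:
        ch = min(node.children)
        node = node.children[ch]
        s += ch
    return node, s


def advance(node, s):
    """Given the leaf `node` spelling `s`, return the next (leaf, string) or None."""
    while node.parent is not None:
        sibling = next_sibling(node)
        if sibling is not None:  # step 1: move right, replace the last character
            return leftmost_leaf(sibling, s[:-1] + sibling.char)
        node, s = node.parent, s[:-1]  # step 2: drop a character and go up
    return None  # tried to go up from the root: iteration is finished


root = TrieNode()
for word in ["abd", "abe", "ac"]:
    root.add(word)

state = leftmost_leaf(root, "")
while state is not None:
    print(state[1])  # abd, then abe, then ac
    state = advance(*state)
```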

Resources