Typically when I had to find siblings, I had to do this:
find('#child').find(:xpath, '..').find('#some-other-child-from-this-parent')
Does using sibling replace this entire line?
Does ancestor replace the call to traverse upwards in the xpath selector?
How "far down" do these instance methods navigate?
Thank you
Capybara's ancestor and sibling methods are called on an element and take the same parameters as find. They are implemented by locating all elements that match the passed-in parameters and intersecting that set with the element's ancestors or siblings, respectively. In your example, finding a sibling of the element with id 'child' could look like
find('#child').sibling('.some_class')
which would return an element with the class some_class that is a sibling of the element with id child. ancestor works the same way, but looks up the document tree at all of the element's ancestors.
td_element.ancestor('table')
would return the table element that is an ancestor of the previously found td element.
I am dealing with an XML vocabulary that has "default values": i.e., if a node does not have a certain subnode, I'd like to find the nearest enclosing node that has that subnode, and use its string value as the value of the original node's subnode.
E.g., if I have a tree
Super_node
sub_node: XXX
...
context_node
/* does not have a child with name sub_node */
and no intervening nodes between Super_node and context_node have a child sub_node, I want an expression that evaluates to XXX.
I was thinking that the following query should work, but I always get a node list:
string(ancestor-or-self::*/sub_node[1]/text())
My thinking is that ancestor-or-self::* returns, in reverse document order, the list of context_node, parent_of_context_node, ..., Super_node. I apply the sub_node test to that, and get the list of sub_nodes in that list, again, hopefully, in reverse document order.
I then apply the predicate [1], which should return the first element of that list. However, this predicate seems to be not effective: in my implementation (which I think is based on libxml2), I still receive a list.
What does work, I found after poking around on Stack Exchange a bit, is
string((ancestor-or-self::*/sub_node)[last()]/text())
Why is the predicate [1] above not effective?
The expression
ancestor-or-self::*/sub_node[1]
means
ancestor-or-self::*/(child::sub_node[1])
which selects the first sub_node element child of every ancestor element.
I suspect you were thinking of
(ancestor-or-self::*/sub_node)[1]
which selects all the sub_node children of all the ancestor elements, sorts them into document order, and then returns the first node in this list.
Predicates like [1] bind more strongly than "/".
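The difference between the two groupings can be modelled in plain Ruby. This is only a sketch of the semantics, not an XPath engine: the hash layout, the names ANCESTOR_CHAIN, middle_node, first_sub_node_of_each and last_sub_node_overall, and the extra sub_node values are all made up for illustration, and it assumes each sub_node child appears before the nested element, so that listing the chain root-first matches document order.

```ruby
# Plain-Ruby model of the two XPath expressions (hypothetical data, not an XPath engine).
# Each hash stands for one element on the ancestor-or-self axis, listed root first.
ANCESTOR_CHAIN = [
  { name: 'Super_node',   sub_nodes: ['XXX', 'XXX-2'] },
  { name: 'middle_node',  sub_nodes: ['YYY'] },
  { name: 'context_node', sub_nodes: [] }
].freeze

# Models ancestor-or-self::*/sub_node[1]:
# the predicate is applied per element, so every element contributes
# its own first sub_node child and the result can hold several nodes.
def first_sub_node_of_each(chain)
  chain.flat_map { |element| element[:sub_nodes].first(1) }
end

# Models (ancestor-or-self::*/sub_node)[last()]:
# all sub_node children are collected first (in document order),
# and only then is a single node picked from the combined list.
def last_sub_node_overall(chain)
  chain.flat_map { |element| element[:sub_nodes] }.last
end

p first_sub_node_of_each(ANCESTOR_CHAIN) # one entry per element that has a sub_node
p last_sub_node_overall(ANCESTOR_CHAIN)  # the sub_node nearest to the context node
```

In this model the first form still yields a list, which matches what the question observed, while the parenthesized form yields exactly one value.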
I have a tree data structure implemented in Ruby. I'm using it to represent a parse tree.
It works, as you might expect, by having many node objects, each containing useful values as well as an array of references to its child nodes.
I've written a method to traverse the tree that's pretty simple and works like:
def depth_first_traversal(node, &block)
if(node.has_children?)
depth_first_traversal(node.children[0], &block)
yield node
depth_first_traversal(node.children[1], &block)
else
yield node
end
end
The issue is that for each tree I only explicitly hold a reference to the root node. Thus far I've just been using my recursive traversal to access all the other nodes.
Now I need to change the values of the nodes in the tree and I'm not sure how to do it.
How could I modify this traversal so that I could modify each element in the tree, instead of just passing a reference to them in to &block?
--- EDIT: ---
Apologies for the lack of detail, I was trying to make my question broad and useful.
The 'value' of a node in the tree is several instance variables in each instance of the node object. Let's call them @value and @type. There are getter and setter methods for them.
The tree is a binary tree - but that may change later. I also don't think that's the aspect of the problem I'm struggling with:
My tree explicitly creates the Node @root. All other nodes in the tree are created in a loop. So a typical node is accessible, for example, as "the child of the child of the root" and in no other manner.
In other words, searching this structure of pointers is my only means of accessing the nodes.
If ruby passes exclusively by value, any value yielded (like in the above method) will be a copy of this object, not the object itself.
So I'm confused about how I should modify values in any tree, not just this one.
If I understand you correctly, you could probably do something like this:
def df_tree_map(node, &block)
if(node.has_children?)
df_tree_map(node.children[0], &block)
node = yield node
df_tree_map(node.children[1], &block)
else
node = yield node
end
end
Obviously this is going to have consequences for the tree structure, but that might be a benefit. The critical point here, though, is that your block will need to return a node instead of any old thing. Returning a string, for example, isn't going to work the way that Array#map does, because a node inherently has children.
Another solution is to allow the map function to modify the contents of the node but not the structure. I'm taking a little liberty here, as you didn't post the instance variables nodes have access to, but it should make enough sense:
def df_tree_map(node, &block)
if(node.has_children?)
df_tree_map(node.children[0], &block)
node.contents = yield node.contents
df_tree_map(node.children[1], &block)
else
node.contents = yield node.contents
end
end
Here, I'm not passing the node itself to the block, but rather the contents. This way, the tree structure cannot be altered by the map. It seems like it might be more consistent with the Array#map function, but it might not do what you're looking for.
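On the pass-by-value worry in the question: Ruby passes object references by value, so the node a block receives is the same object that the tree holds, not a copy. Calling a setter on it inside the block therefore changes the tree in place, and the original traversal can be used unchanged. A minimal sketch, assuming a hypothetical Node class with value, type and children accessors (the question does not show its class) and assuming every inner node has exactly two children:

```ruby
# Hypothetical node class; the real one in the question is not shown.
class Node
  attr_accessor :value, :type, :children

  def initialize(value, type, children = [])
    @value = value
    @type = type
    @children = children
  end

  def has_children?
    !children.empty?
  end
end

# The in-order traversal from the question, unchanged in behaviour.
def depth_first_traversal(node, &block)
  if node.has_children?
    depth_first_traversal(node.children[0], &block)
    yield node
    depth_first_traversal(node.children[1], &block)
  else
    yield node
  end
end

# Builds a three-node sample tree: a root with two leaves.
def build_sample_tree
  Node.new(2, :operator, [Node.new(1, :number), Node.new(3, :number)])
end

root = build_sample_tree
# The block receives the actual node objects, so the setter mutates the tree.
depth_first_traversal(root) { |node| node.value = node.value * 10 }

values = []
depth_first_traversal(root) { |node| values << node.value }
p values # => [10, 20, 30]
```

Reassigning the block parameter itself (node = something_else) would not affect the tree; only calls made on the yielded object do.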
Hello, I've got the Prolog function member, which tells us if an element exists in a list.
Now I should create a function that returns a list without that element. I know more or less what it should look like, but somehow I have no idea how to do it.
ideas so far:
return the elements in the list before our element, and concat it with the rest of the list after our element.
use member() in a predicate that goes through the list recursively and builds it.
help.
(Seems like homework to me so I'll give you an outline containing some hints ;-)
Given [H|T]...
... if H is the element to remove, return T, (If you need to remove all such elements, remember to recurse on T as well.)
... if H is not the element to remove, return [H|NewTail] where NewTail is result of recursively removing the element from T.
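The same head/tail recursion can be sketched outside Prolog. The following is a Ruby rendering of the outline above, offered only as an illustration of the recursion shape; the method names remove_first and remove_all are made up here, and translating it back into Prolog clauses is left as the exercise.

```ruby
# Removes only the first occurrence of element.
def remove_first(list, element)
  return [] if list.empty?

  head, *tail = list
  # H is the element to remove: the result is simply T.
  return tail if head == element

  # H is kept: the result is [H | NewTail].
  [head] + remove_first(tail, element)
end

# Removes every occurrence of element by also recursing on T after a match.
def remove_all(list, element)
  return [] if list.empty?

  head, *tail = list
  new_tail = remove_all(tail, element)
  head == element ? new_tail : [head] + new_tail
end

p remove_first([1, 2, 3, 2], 2) # => [1, 3, 2]
p remove_all([1, 2, 3, 2], 2)   # => [1, 3]
```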
My rule requires me to apply them only to methods without 'get' as part of their name. In other words, my rules need to apply only to non-getter methods in the class. I know that to get hold of all the non-getter methods, I can use
//MethodDeclarator[not(contains(@Image,'get'))]
However, I don't know the syntax about where I insert my logic for the rules. Is it like
//MethodDeclarator[
not(contains(@Image,'get'))
'Some Rule Statements'
]
I saw the use of . in the beginning of statement inside [] in some example code. what are they used for?
In my particular case, I need to combine following pieces together but so far I am unable to accomplish it yet.
Piece 1:
//PrimaryExpression[not(PrimarySuffix/Arguments)]
Piece 2:
//MethodDeclarator[not(contains(@Image,'get'))]
Piece 3:
//PrimaryExpression[PrimaryPrefix/@Label='this']
You need to have at least some basic knowledge/understanding of XPath.
I saw the use of . in the beginning of statement inside [] in some example code. what are they used for?
[] is called a predicate. Its expression is evaluated as a boolean (a bare number N is shorthand for position() = N). It follows a node test, or another predicate, and specifies an additional condition that a node satisfying the node test must meet in order to be selected.
For example:
/*/num
selects all elements named num that are children of the top element of the XML document.
However, if we want to select only such num elements, whose value is an odd integer, we add this additional condition inside a predicate:
/*/num[. mod 2 = 1]
Now this last expression selects all elements named num that are children of the top element of the XML document and whose string value represents an odd integer.
. denotes the context node -- this is the node that has been selected so-far (or the starting node off which the complete XPath expression is evaluated).
In my particular case, I need to combine following pieces together ...
You forgot to say in what way / how the three expressions should be combined. In XPath some of the frequently used "combinators" are the operators and, or, and the function not().
For example, if you want to select elements that are selected by all three provided XPath expressions, you can use the and operator:
//PrimaryExpression
[not(PrimarySuffix/Arguments)
and
PrimaryPrefix/@Label='this'
]
Given this XML/HTML:
<dl>
<dt>Label1</dt><dd>Value1</dd>
<dt>Label2</dt><dd>Value2</dd>
<dt>Label3</dt><dd>Value3a</dd><dd>Value3b</dd>
<dt>Label4</dt><dd>Value4</dd>
</dl>
I want to find all <dt> and then, for each, find the following <dd> up until the next <dt>.
Using Ruby's Nokogiri I am able to accomplish this like so:
dl.xpath('dt').each do |dt|
ct = dt.xpath('count(following-sibling::dt)')
dds = dt.xpath("following-sibling::dd[count(following-sibling::dt)=#{ct}]")
puts "#{dt.text}: #{dds.map(&:text).join(', ')}"
end
#=> Label1: Value1
#=> Label2: Value2
#=> Label3: Value3a, Value3b
#=> Label4: Value4
However, as you can see I'm creating a variable in Ruby and then composing an XPath using it. How can I write a single XPath expression that does the equivalent?
I guessed at:
following-sibling::dd[count(following-sibling::dt)=count(self/following-sibling::dt)]
but apparently I don't understand what self means there.
This question is similar to XPath : select all following siblings until another sibling except there is no unique identifier for the 'stop' node.
This question is almost the same as xpath to find all following sibling adjacent nodes up til another type except that I'm asking for an XPath-only solution.
This is an interesting question. Most of the problems were already mentioned in @lwburk's answer and in its comments. To open up the complexity hidden in this question a bit more for a random reader, my answer is probably more elaborate and more verbose than the OP needed.
Features of XPath 1.0 related to this problem
In XPath each step, and each node in the set of selected nodes, work independently. This means that
a subexpression has no generic way to access data that was computed in a previous subexpression or share data computed in this subexpression to other subexpressions
a node has no generic way to refer to a node that was used as a context node in a previous subexpression
a node has no generic way to refer to other nodes that are currently selected.
if every one of the selected nodes must be compared to the same particular node, then that node must be uniquely definable in a way that is common to all selected nodes
(Well, in fact I'm not 100% sure if that list is absolutely correct in every case. If anyone has better knowledge of the quirks of XPath, please comment or correct this answer by editing it.)
Despite the lack of generic solutions some of these restrictions can be overcome if there is proper knowledge of the document structure, and/or the axis used previously can be "reverted" with another axis that serves as a backlink i.e. matches only nodes that were used as context node in the previous expression. A common example of this is when a parent axis is used after first using a child axis (the opposite case, from child to parent, is not uniquely revertible without additional information). In such cases, the information from previous steps is more precisely recreated at a later step (instead of accessing previously known information).
Unfortunately in this case I couldn't come up with any other solution to refer to previously known nodes except using XPath variables (that needs to be defined beforehand).
XPath specifies a syntax for referring to a variable but not for defining one; how variables are defined depends on the environment in which XPath is used. In fact, since the recommendation states that "The variable bindings used to evaluate a subexpression are always the same as those used to evaluate the containing expression", you could also argue that XPath explicitly forbids defining variables inside an XPath expression.
Problem reformulated
In your question the problem would be, when given a <dt>, to identify the following <dd> elements or the initially given node after the context node has been switched. Identifying the originally given <dt> is crucial since for each node in the node-set to be filtered, the predicate expression is evaluated with that node as the context node; so one cannot refer to the original <dt> in a predicate, if there is no way to identify it after the context has changed. The same applies to <dd> elements that are following siblings of the given <dt>.
If you are using variables, one could debate whether there is a major difference between 1) using XPath variable syntax and a Nokogiri-specific way to declare that variable or 2) using Nokogiri's extended XPath syntax that allows you to use Ruby variables in an XPath expression. In both cases the variable is defined in an environment-specific way, and the meaning of the XPath is clear only if the definition of the variable is also available. A similar case can be seen with XSLT, where in some cases you could choose between 1) defining a variable with <xsl:variable> prior to using your XPath expression or 2) using current() (inside your XPath expression), which is an XSLT extension.
Solution using nodeset variables and Kaysian method
You can select all the <dd> elements following the current <dt> element with following-sibling::dd (set A). Also you can select all the <dd> elements following the next <dt> element with following-sibling::dt[1]/following-sibling::dd (set B). Now a set difference A\B leaves the <dd> elements you actually wanted (elements that are in set A but not in set B). If variable $setA contains nodeset A and variable $setB contains nodeset B, the set difference can be obtained with (a modification of) Kaysian technique:
dds = $setA[count(.|$setB) != count($setB)]
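To see why the count comparison works as a set difference, here is a plain-Ruby model of it. It is a sketch, not XPath and not Nokogiri: node-sets are modelled as arrays of unique symbols, Array#| stands in for the XPath union operator, and the names kaysian_difference, kaysian_intersection, SET_A and SET_B are chosen only for this illustration.

```ruby
# Models $setA[count(.|$setB) != count($setB)]:
# adding a node to set B grows the union only when the node is NOT in B,
# so the filter keeps exactly the nodes of A that are absent from B.
def kaysian_difference(set_a, set_b)
  set_a.select { |node| (set_b | [node]).size != set_b.size }
end

# Models the unmodified technique, $setA[count(.|$setB) = count($setB)]:
# the union keeps its size only when the node is already in B (intersection).
def kaysian_intersection(set_a, set_b)
  set_a.select { |node| (set_b | [node]).size == set_b.size }
end

# With the <dt>Label3</dt> element from the question as the current node:
# set A = following-sibling::dd
# set B = following-sibling::dt[1]/following-sibling::dd
SET_A = %i[value3a value3b value4].freeze
SET_B = %i[value4].freeze

p kaysian_difference(SET_A, SET_B)   # => [:value3a, :value3b]
p kaysian_intersection(SET_A, SET_B) # => [:value4]
```

For the last <dt> in the list, set B is empty and the difference is simply all of set A.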
A simple workaround without any variables
Currently your method is to select all the <dt> elements and then try to couple the value of each such element with the values of the corresponding <dd> elements in a single operation. Would it be possible to convert that coupling logic to work the other way round? You would first select all <dd> elements and then, for each <dd>, find the corresponding <dt>. This means that you end up accessing the same <dt> elements several times, and each operation adds only one new <dd> value. This could affect performance, and the Ruby code could be more complicated.
The good side is the simplicity of the required XPath. When given a <dd> element, finding the corresponding <dt> is amazingly simple: preceding-sibling::dt[1]
As applied to your current Ruby code
dl.xpath('dd').each do |dd|
dt = dd.xpath("preceding-sibling::dt[1]")
## Insert new Ruby magic here ##
end
One possible solution:
dl.xpath('dt').each_with_index do |dt, i|
dds = dt.xpath("following-sibling::dd[not(../dt[#{i + 2}]) or " +
"following-sibling::dt[1]=../dt[#{i + 2}]]")
puts "#{dt.text}: #{dds.map(&:text).join(', ')}"
end
This relies on a value comparison of dt elements and will fail when there are duplicates. The following (much more complicated) expression does not depend on unique dt values:
following-sibling::dd[not(../dt[$n]) or
(following-sibling::dt[1] and count(following-sibling::dt[1]|../dt[$n])=1)]
Note: Your use of self fails because you're not properly using it as an axis (self::). Also, self always contains just the context node, so it would refer to each dd inspected by the expression, not back to the original dt.