I have the XML below:
<document>
<node name="Node 0 Text here" ID="01" >aa
</node>
<node name="Node 1 Text here" ID="11">bb
</node>
<node name="Node 2 Text here" ID="12">cc
</node>
<node name="Node 3 Text here" ID="22">dd
</node>
<node name="Node 4 Text here" ID="23">ee
</node>
</document>
I need to search for content in a particular node within this XML.
If the search keyword does not exist in that node, I have to continue searching from the node after the current one (its next sibling).
If the keyword does not exist in any of the nodes after the current node, the search should wrap around and start again from the beginning.
I have to do this in my code-behind (a .NET class). I have used:
XmlNodeList xmlNodes = xd.SelectNodes("//12/following-sibling::*");
Here, 12 is the ID of the current node, which will be passed as an argument. But I am getting an error.
Any help is appreciated.
I need to search content in a particular node within this XML
To get a node by its content, the XPath is:
node[contains(text(),'aa')]
In this example it returns the first node, plus any other node whose text content contains aa.
If search keyword does not exist in that node, then I have to begin searching from the next node of current node, you could say sibling. If that keyword does not exist in all the nodes after the current node then it should begin search from start.
This requirement does not map directly onto XPath. The expression above returns all nodes matching the keyword. If you want only the first matched node, you can take it from the XmlNodeList afterwards, or get it directly from XPath by changing the expression to:
node[contains(text(),'aa')][1]
12 refers to nodeid of the current node,which will be passed as an argument
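That's not correct: //12 is not valid XPath, because 12 is a number, not a node name, which is why you get an error. To select the node by its ID attribute you should use, for instance:
node[@ID='12']/text()
This gets the text content of the node whose ID attribute is 12 (attribute names are case-sensitive, so @id would not match ID).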
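To check these expressions outside .NET, here is a minimal sketch using the XPath 1.0 engine built into the JDK (javax.xml.xpath). The class and helper names are made up for illustration, and the sample is the question's XML without its indentation whitespace. It evaluates the ID-based and content-based selections and shows that the question's //12 expression is rejected when compiled.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SelectById {
    // The question's sample document, without the indentation whitespace.
    static final String XML = "<document>"
        + "<node name='Node 0 Text here' ID='01'>aa</node>"
        + "<node name='Node 1 Text here' ID='11'>bb</node>"
        + "<node name='Node 2 Text here' ID='12'>cc</node>"
        + "<node name='Node 3 Text here' ID='22'>dd</node>"
        + "<node name='Node 4 Text here' ID='23'>ee</node>"
        + "</document>";
    static final Document DOC = parse(XML);

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates an expression against DOC and converts the result with XPath string().
    static String evalString(String expr) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, DOC);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the JDK's XPath 1.0 parser accepts the expression.
    static boolean compiles(String expr) {
        try {
            XPathFactory.newInstance().newXPath().compile(expr);
            return true;
        } catch (XPathExpressionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(evalString("/document/node[@ID='12']/text()"));              // cc
        System.out.println(evalString("/document/node[contains(text(),'aa')][1]/@ID")); // 01
        System.out.println(compiles("//12/following-sibling::*"));                      // false
    }
}
```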
Use:
(/*/node[@ID='12']/following-sibling::*[contains(.,$pattern)][1]
|
/*/node[@ID='12']/preceding-sibling::*[contains(.,$pattern)][last()]
)
[last()]
This expression selects the later, in document order, of the two wanted nodes: the first following sibling that contains the value of $pattern, or, if there is none, the first preceding sibling counted from the start of the document that contains it. Because preceding-sibling is a reverse axis, [last()] on it picks the sibling closest to the start, which gives the required wrap-around.
You need to substitute $pattern with the exact value you want to search for.
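A sketch of the whole wrap-around search in Java, using the JDK's javax.xml.xpath (XPath 1.0). Instead of pasting values into the expression text, it binds $id and $pattern through an XPathVariableResolver. The findFrom helper is hypothetical, and its first step (checking the current node itself) is an assumption taken from the question's wording rather than part of the answer. On the reverse preceding-sibling axis it uses [last()], so the wrap-around lands on the matching node nearest the start of the document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class WrapAroundSearch {
    // The question's sample document, without the indentation whitespace.
    static final String XML = "<document>"
        + "<node name='Node 0 Text here' ID='01'>aa</node>"
        + "<node name='Node 1 Text here' ID='11'>bb</node>"
        + "<node name='Node 2 Text here' ID='12'>cc</node>"
        + "<node name='Node 3 Text here' ID='22'>dd</node>"
        + "<node name='Node 4 Text here' ID='23'>ee</node>"
        + "</document>";
    static final Document DOC = parse(XML);

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: ID of the node where the search for `pattern` lands when
    // starting at the node whose ID is `currentId`; null if no node matches.
    static String findFrom(String currentId, String pattern) {
        XPath xp = XPathFactory.newInstance().newXPath();
        // $id and $pattern are the only variables used, so anything not named "id" is $pattern.
        xp.setXPathVariableResolver(name ->
            "id".equals(name.getLocalPart()) ? currentId : pattern);
        // Step 1 (assumed from the question): does the current node itself match?
        String self = "/*/node[@ID=$id][contains(., $pattern)]";
        // Step 2: next matching following sibling, else the first matching node from the start.
        String wrap = "(/*/node[@ID=$id]/following-sibling::*[contains(., $pattern)][1]"
            + " | /*/node[@ID=$id]/preceding-sibling::*[contains(., $pattern)][last()])"
            + "[last()]";
        try {
            Element hit = (Element) xp.evaluate(self, DOC, XPathConstants.NODE);
            if (hit == null) {
                hit = (Element) xp.evaluate(wrap, DOC, XPathConstants.NODE);
            }
            return hit == null ? null : hit.getAttribute("ID");
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findFrom("12", "cc")); // 12 (the current node matches)
        System.out.println(findFrom("12", "dd")); // 22 (next matching sibling)
        System.out.println(findFrom("12", "aa")); // 01 (wrapped around to the start)
        System.out.println(findFrom("12", "zz")); // null
    }
}
```

With the question's data only one node matches each keyword, so [1] and [last()] on the preceding-sibling axis happen to give the same result here; they differ as soon as several earlier siblings match.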
Related
I need my XPath expression to select only the first child element of an XML file that meets a condition, say the first entry having field1=B.
I use the expression below, but it returns the entry with field1=A.
<root>
<entry>
<field1>A</field1>
<field2>10</field2>
</entry>
<entry>
<field1>B</field1>
<field2>20</field2>
</entry>
</root>
/root/entry[//field1='B' or 'C'][1]
How can I do it?
It should be
/root/entry[.//field1='B' or .//field1='C'][1]
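Note that entry[//field1='B'][1] means "return the first entry node if a field1 node with value 'B' exists anywhere in the document", while entry[.//field1='B'][1] means "return the first entry node that has a descendant field1 node with value 'B'". Also, or 'C' in your expression does not compare anything: the non-empty string 'C' converts to true, so your predicate is always true.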
You can also simplify the expression to
/root/entry[field1='B' or field1='C'][1]
if field1 always appears as a direct child of entry.
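The difference between // and .// inside the predicate can be checked with a short Java sketch using the JDK's javax.xml.xpath. The class and helper names are illustrative, and the sample is the question's XML with the missing </root> added.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FirstMatchingEntry {
    // The question's sample with </root> added and indentation removed.
    static final String XML = "<root>"
        + "<entry><field1>A</field1><field2>10</field2></entry>"
        + "<entry><field1>B</field1><field2>20</field2></entry>"
        + "</root>";
    static final Document DOC = parse(XML);

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // field2 text of the entry selected by `entryPath`.
    static String field2Of(String entryPath) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(entryPath + "/field2", DOC);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean evalBoolean(String expr) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate(expr, DOC, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // '//' restarts at the document root, so the predicate is true for every entry.
        System.out.println(field2Of("/root/entry[//field1='B' or 'C'][1]"));            // 10
        // './/' searches below the entry being tested.
        System.out.println(field2Of("/root/entry[.//field1='B' or .//field1='C'][1]")); // 20
        System.out.println(field2Of("/root/entry[field1='B' or field1='C'][1]"));       // 20
        // A non-empty string literal is always true as a boolean.
        System.out.println(evalBoolean("boolean('C')"));                                // true
    }
}
```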
In this xpath:
/A/B[C='hello']
Is C="hello" some kind of syntactic shortcut for C[text()='hello']? Is it documented anywhere?
Edit: Okay, I discovered one difference: C= returns all the text nodes in C and C's children, while C[text()= returns only the text nodes in C.
Now, suppose I have the XML:
<root>
<A>
<B>
<C>hello<E>EEE</E>world</C>
<D>world</D>
</B>
<B>
<C>goodbye</C>
<D>mars</D>
</B>
</A>
</root>
How would I choose the B node containing the first C node using the syntax C[text()=? I can get the B node using the C= syntax like this:
/root/A/B[C="helloEEEworld"]
But this doesn't work:
/root/A/B[C[text()="helloworld"]]
nor do these:
/root/A/B[C[text()="hello world"]]
/root/A/B[C[text()="helloEEEworld"]]
Hmmm...this works:
/root/A/B[C[text()="hello"]]
Why is that? Does text() only return the first text node? According to the W3C, text() returns all text node children of the context node.
text() really does return all text node children, as a node-set.
When you use /root/A/B[C[text()="hello"]], you are asking for a B node with a C child that has any direct child text node equal to "hello".
In the same way, you can match it with:
/root/A/B[C[text()="world"]]
or state explicitly which direct child text node (the first or the second) must match:
/root/A/B[C[text()[1]="hello"]]
/root/A/B[C[text()[2]="world"]]
If you want to match the node by its complete text content, you can use:
/root/A/B[C[.="helloEEEworld"]]
or
/root/A/B[C="helloEEEworld"]
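A Java sketch using the JDK's javax.xml.xpath (XPath 1.0) that counts how many B elements each of the predicates above matches on the question's XML, written without indentation. Helper names are illustrative. The first line also confirms that the first C has two text node children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TextNodePredicates {
    // The question's sample, written without indentation whitespace.
    static final String XML = "<root><A>"
        + "<B><C>hello<E>EEE</E>world</C><D>world</D></B>"
        + "<B><C>goodbye</C><D>mars</D></B>"
        + "</A></root>";
    static final Document DOC = parse(XML);

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of nodes selected by `path`, via the XPath count() function.
    static int count(String path) {
        try {
            Double n = (Double) XPathFactory.newInstance().newXPath()
                .evaluate("count(" + path + ")", DOC, XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/root/A/B[1]/C/text()"));             // 2: "hello" and "world"
        System.out.println(count("/root/A/B[C[text()='hello']]"));      // 1
        System.out.println(count("/root/A/B[C[text()='world']]"));      // 1
        System.out.println(count("/root/A/B[C[text()='helloworld']]")); // 0: each text node is compared on its own
        System.out.println(count("/root/A/B[C[text()[2]='world']]"));   // 1
        System.out.println(count("/root/A/B[C[.='helloEEEworld']]"));   // 1
        System.out.println(count("/root/A/B[C='helloEEEworld']"));      // 1
    }
}
```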
C in the predicate expression [C='hello'] returns all C elements that are direct children of the context element, which is B. So the entire predicate is a boolean expression comparing a node-set with a string (note that an element is a kind of node in the XPath data model), and the behavior for this case is documented in the spec as follows:
If one object to be compared is a node-set and the other is a string, then the comparison will be true if and only if there is a node in the node-set such that the result of performing the comparison on the string-value of the node and the other string is true. If one object to be compared is a node-set and the other is a boolean, then the comparison will be true if and only if the result of performing the comparison on the boolean and on the result of converting the node-set to a boolean using the boolean function is true. (XPath 1.0 specification, section 3.4, Booleans)
C='hello' in /A/B[C='hello'] evaluates to true if any of the C elements, converted to a string, equals 'hello'. So it is closer to a shortcut for C[string()='hello'], if you will.
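To see the "there is a node in the node-set such that..." rule in action, here is a Java sketch (JDK javax.xml.xpath) on a made-up document whose B has two C children. C='hello' and C!='hello' are both true for the same B, because each comparison only needs one C that satisfies it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NodeSetComparison {
    // Made-up document: one B with two C children.
    static final String XML = "<A><B><C>hello</C><C>bye</C></B></A>";
    static final Document DOC = parse(XML);

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of nodes selected by `path`, via the XPath count() function.
    static int count(String path) {
        try {
            Double n = (Double) XPathFactory.newInstance().newXPath()
                .evaluate("count(" + path + ")", DOC, XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/A/B[C='hello']"));             // 1: the first C matches
        System.out.println(count("/A/B[C!='hello']"));            // 1: the second C ('bye') differs
        System.out.println(count("/A/B[C='hellobye']"));          // 0: each C is compared separately
        System.out.println(count("/A/B[C[string()='hello']]"));   // 1
    }
}
```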
"Hmmm...this works:
/root/A/B[C[text()="hello"]]
Why is that? Does text() only return the first text node? According to the W3C, text() returns all text node children of the context node."
Rather than only the first text node, text() in this context returns all direct child text nodes. This is because child:: is the default axis in XPath. Contrast your XPath with its equivalent verbose version:
/child::root/child::A/child::B[child::C[child::text()="hello"]]
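A quick Java check (JDK javax.xml.xpath; names illustrative) that the abbreviated and verbose forms select the same number of nodes on the question's XML, since both use the child axis:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class VerboseAxes {
    // The question's sample, written without indentation whitespace.
    static final String XML = "<root><A>"
        + "<B><C>hello<E>EEE</E>world</C><D>world</D></B>"
        + "<B><C>goodbye</C><D>mars</D></B>"
        + "</A></root>";
    static final Document DOC = parse(XML);

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static int count(String path) {
        try {
            Double n = (Double) XPathFactory.newInstance().newXPath()
                .evaluate("count(" + path + ")", DOC, XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Each pair: abbreviated form, then the same path with child:: spelled out.
        String[][] pairs = {
            {"/root/A/B[C[text()='hello']]",
             "/child::root/child::A/child::B[child::C[child::text()='hello']]"},
            {"/root/A/B[1]/C/text()",
             "/child::root/child::A/child::B[1]/child::C/child::text()"},
        };
        for (String[] p : pairs) {
            System.out.println(count(p[0]) + " " + count(p[1]));
        }
        // 1 1
        // 2 2
    }
}
```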
I have a fairly deeply nested XML structure, and I would like to find an element with a particular value after I have already selected a node. In the sample below I have an array of B elements; after selecting each B node, I would like to get the text of whichever of its descendants (the element names are not consistent) starts with the word 'history'.
<A>
<Items>
<B>
<C>
<D>history: pressed K,E</D> <!-- could be nested anywhere -->
</C>
</B>
<B>
<C>
<E>history: pressed W</E>
</C>
</B>
</Items>
</A>
// Select the nodes from the array (B)
var nodes = select(xmldoc, "//A/Items/B");
// Iterate through the nodes.
nodes.forEach(function (node) {
// Is it possible to select any element that starts with the text 'history' from the already selected node?
var history = select(node, "???[starts-with(.,'history')]");
});
All the samples I have seen start with //*[text()], which searches from the root of the document.
//B//*[starts-with(normalize-space(), 'history')]
looks like it would do what you intend.
It selects "any descendant element of <B> whose text content starts with 'history'".
Manual iteration to find further nodes is not usually necessary; XPath does that for you. If you must iterate for some other reason, start the path with the context node (.) so that the search begins at the selected node:
nodes.forEach(function (node) {
var history = select(node, ".//*[starts-with(.,'history')]");
});
If you are actually looking for "any text node..."
//B//text()[starts-with(normalize-space(), 'history')]
Or "any element node that has no further child elements..."
//B//*[not(*) and starts-with(normalize-space(), 'history')]
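A Java sketch of the per-node loop, using the JDK's javax.xml.xpath in place of the JavaScript select() from the question; class and method names are hypothetical. It evaluates the same condition once with a leading . and once with a leading //, to show that only the former stays inside the current B.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HistoryPerB {
    // The question's sample, without indentation whitespace or the comment.
    static final String XML = "<A><Items>"
        + "<B><C><D>history: pressed K,E</D></C></B>"
        + "<B><C><E>history: pressed W</E></C></B>"
        + "</Items></A>";
    static final Document DOC = parse(XML);

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // For each B, evaluates `innerPath` with that B as the context node and
    // collects the text of the first hit (null if there is none).
    static List<String> historyPerB(String innerPath) {
        XPath xp = XPathFactory.newInstance().newXPath();
        List<String> out = new ArrayList<>();
        try {
            NodeList bs = (NodeList) xp.evaluate("//A/Items/B", DOC, XPathConstants.NODESET);
            for (int i = 0; i < bs.getLength(); i++) {
                Node hit = (Node) xp.evaluate(innerPath, bs.item(i), XPathConstants.NODE);
                out.add(hit == null ? null : hit.getTextContent());
            }
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Leading '.': only descendants of the current B are searched.
        System.out.println(historyPerB(".//*[not(*) and starts-with(normalize-space(), 'history')]"));
        // [history: pressed K,E, history: pressed W]
        // Leading '//': restarts at the document root, so both Bs report the first hit.
        System.out.println(historyPerB("//*[not(*) and starts-with(normalize-space(), 'history')]"));
        // [history: pressed K,E, history: pressed K,E]
    }
}
```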
I have an XML document with fragments like the following:
<x>
abcd
<z>ef</z>
ghij
</x>
I want to find the text "defg" inside the node, and modify that node to the following:
<x>
abc
<y>
d<z>ef</z>g
</y>
hij
</x>
This means creating a new node that contains part of x's text along with the other children.
I can find the node which includes the text, but I don't know how to break it up, and wrap just the matching section inside the <y> tags.
Any ideas that can point me in the right direction are most appreciated. Thanks.
What about turning it into a string, using a regex to change it, and then parsing it with Nokogiri again?
require 'nokogiri'
str = some_xml.to_s
# => '<x>abcd<z>ef</z>ghij</x>'
# Wrap the character before <z>, the <z> element, and the character after it in <y>.
# A single sub on the whole match cannot touch other occurrences of "d" or "g", unlike gsub on each letter.
new_string = str.sub(%r{(.)(<z>.*?</z>)(.)}, '<y>\1\2\3</y>')
Nokogiri::XML(new_string)
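As an alternative to editing the serialized string, the same change can be made directly on the DOM. The Java sketch below is a different technique from the regex answer: it uses org.w3c.dom.Text.splitText to cut the text nodes on either side of <z> and then moves the pieces into a new <y>. The wrap helper is hypothetical and assumes, as in the example, that the match straddles <z> with charsBefore and charsAfter characters in the neighboring text nodes; working out those counts for an arbitrary search string is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

public class WrapWithY {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes a node without an XML declaration.
    static String serialize(Node n) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter w = new StringWriter();
            t.transform(new DOMSource(n), new StreamResult(w));
            return w.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: moves the last `charsBefore` characters before <z>, the <z>
    // element, and the first `charsAfter` characters after it into a new <y> element.
    // Assumes <z> is directly surrounded by text nodes long enough for the split.
    static String wrap(String xml, int charsBefore, int charsAfter) {
        Document doc = parse(xml);
        Element x = doc.getDocumentElement();
        Element z = (Element) x.getElementsByTagName("z").item(0);
        Text left = (Text) z.getPreviousSibling();  // e.g. "abcd"
        Text right = (Text) z.getNextSibling();     // e.g. "ghij"
        // splitText(k) keeps [0, k) in the original node and returns a new node with the rest.
        Text leftTail = left.splitText(left.getLength() - charsBefore); // left "abc", leftTail "d"
        right.splitText(charsAfter);                                    // right "g", then "hij"
        Element y = doc.createElement("y");
        x.insertBefore(y, leftTail);
        // appendChild on a node that is already in the tree moves it.
        y.appendChild(leftTail);
        y.appendChild(z);
        y.appendChild(right);
        return serialize(x);
    }

    public static void main(String[] args) {
        System.out.println(wrap("<x>abcd<z>ef</z>ghij</x>", 1, 1));
        // <x>abc<y>d<z>ef</z>g</y>hij</x>
    }
}
```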
Here is an excerpt of my XML:
<node/>
<node/>
<node id="1">content</node>
<node/>
<node/>
<node/>
<node id="2">content</node>
<node/>
<node/>
I am positioned on node[@id='1']. I need an XPath expression that matches all the <node/> elements up to the next non-empty node (here node[@id='2']).
Edit:
The @id attributes are only there to explain my problem more clearly; they are not in my original XML. I need a solution that does not use the @id attributes.
I do not want to match the empty siblings after node[@id='2'], so I can't use a naive following-sibling::node[text()=''].
How can I achieve this ?
You could do it this way:
../node[not(text()) and preceding-sibling::node[@id][1][@id='1']]
where '1' is the id of the current node (generate the expression dynamically).
The expression says:
from the current context go to the parent
select those child nodes that
have no text and
of all "preceding sibling nodes that have an id", the nearest one must have an id of 1
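A Java sketch (JDK javax.xml.xpath; names illustrative) that runs the expression above with node[@id='1'] as the context, on the excerpt wrapped in a hypothetical <root>. Because the question says the real XML has no @id attributes, it also shows a different technique that is not taken from the answers: an empty sibling belongs to the current non-empty node when the number of non-empty nodes before it equals the number before the current node plus one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EmptyRun {
    // The question's excerpt inside a hypothetical <root> (a document needs one root element).
    static final String XML = "<root><node/><node/><node id='1'>content</node>"
        + "<node/><node/><node/><node id='2'>content</node><node/><node/></root>";
    static final Document DOC = parse(XML);

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // The accepted answer's expression, evaluated with node[@id=$id] as the context node.
    static int withIds(String id) {
        XPath xp = XPathFactory.newInstance().newXPath();
        xp.setXPathVariableResolver(name -> id); // $id is the only variable used
        try {
            Node current = (Node) xp.evaluate("/root/node[@id=$id]", DOC, XPathConstants.NODE);
            NodeList run = (NodeList) xp.evaluate(
                "../node[not(text()) and preceding-sibling::node[@id][1][@id=$id]]",
                current, XPathConstants.NODESET);
            return run.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Id-free variant: counts non-empty siblings instead of reading @id.
    static int withoutIds(int k) {
        XPath xp = XPathFactory.newInstance().newXPath();
        try {
            // Stand-in for "the node I am positioned on": the k-th non-empty node.
            Node current = (Node) xp.evaluate("/root/node[text()][" + k + "]", DOC, XPathConstants.NODE);
            double n = (Double) xp.evaluate(
                "count(preceding-sibling::node[text()]) + 1", current, XPathConstants.NUMBER);
            xp.setXPathVariableResolver(name -> n); // $n is the only variable used
            NodeList run = (NodeList) xp.evaluate(
                "following-sibling::node[not(text()) and count(preceding-sibling::node[text()]) = $n]",
                current, XPathConstants.NODESET);
            return run.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(withIds("1") + " " + withoutIds(1)); // 3 3
        System.out.println(withIds("2") + " " + withoutIds(2)); // 2 2
    }
}
```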
If you are in XSLT, you can select from the following-sibling axis, because you can use the current() function:
<!-- the for-each is merely to switch the current node -->
<xsl:for-each select="node[@id='1']">
<xsl:copy-of select="
following-sibling::node[
not(text()) and
generate-id(preceding-sibling::node[@id][1])
=
generate-id(current())
]
" />
</xsl:for-each>
or simpler (and more efficient) with a key:
<xsl:key
name="kNode"
match="node[not(text())]"
use="generate-id(preceding-sibling::node[@id][1])"
/>
<xsl:copy-of select="key('kNode', generate-id(node[@id='1']))" />
Simpler than the accepted answer:
//node[@id='1']/following-sibling::node[following::node[@id='2']]
Find a node anywhere whose id is '1'
Now find all the following sibling node elements
...but only if those elements also have a node with id="2" somewhere after them.
Shown in action with a clearer test document (and legal id values):
xml = '<root>
<node id="a"/><node id="b"/>
<node id="c">content</node>
<node id="d"/><node id="e"/><node id="f"/>
<node id="g">content</node>
<node id="h"/><node id="i"/>
</root>'
# A Ruby library that uses libxml2; http://nokogiri.org
require 'nokogiri'; doc = Nokogiri::XML(xml)
expression = "//node[@id='c']/following-sibling::node[following::node[@id='g']]"
puts doc.xpath(expression)
#=> <node id="d"/>
#=> <node id="e"/>
#=> <node id="f"/>
XPath 2.0 has the operators '<<' and '>>' where node1 << node2 is true if node1 precedes node2 in document order.
So, based on that, with XPath 2.0 in an XSLT 2.0 stylesheet where the current node is node[@id = '1'], you could use:
following-sibling::node[not(text()) and . << current()/following-sibling::node[@id][1]]
That also needs the current() function from XSLT, which is why I said "with XPath 2.0 in an XSLT 2.0 stylesheet". The syntax above is pure XPath; in an XSLT stylesheet you would need to escape '<<' as '&lt;&lt;'.