Selecting nodes between two processing instructions with XPath

I have something like this:
<?cond_start condition="Online" ?>
<p>This section is tagged with Conditional "Online".</p>
<?cond_end?>
<?cond_start condition="Print" ?>
<p>This section is tagged with Conditional "Print".</p>
<?cond_end?>
I need to process the content between the PIs based on the value of the PIs (condition="Online" / condition="Print").
I can select a specific PI with e.g. this:
//processing-instruction('cond_start')
But I have no idea how to go beyond this, especially since there can be nested PIs like this:
<?cond_start condition="Online" ?>
<p>This section is <?cond_start condition="Comment" ?>Are you sure?<?cond_end?> tagged with Conditional "Online".</p>
<?cond_end?>
Does anyone have any ideas?

I think you want all nodes where
the first preceding processing instruction node is itself a cond_start with certain content, for example 'condition="Online"'.
the first following processing instruction node is itself a cond_end
That would be:
//node()[
preceding-sibling::processing-instruction()[1][
self::processing-instruction('cond_start')
and contains(., 'condition="Online"')
]
and following-sibling::processing-instruction()[1][
self::processing-instruction('cond_end')
]
]
Note that this does not work when there are more processing instructions between <?cond_start condition="..." ?> and <?cond_end?>. If that's the case for you things get more complicated.
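For the nested case, one option is to leave XPath and walk the document in order while keeping a stack of open conditions. The following is only a sketch under stated assumptions: it uses Python's standard xml.dom.minidom, the wrapping <doc> root element and the function name text_by_condition are made up for illustration, and it records only non-blank text nodes.

```python
import re
from xml.dom import minidom, Node

# Sample input; the <doc> wrapper is an assumption so the fragment is well-formed.
sample = '''<doc><?cond_start condition="Online" ?>
<p>This section is <?cond_start condition="Comment" ?>Are you sure?<?cond_end?> tagged with Conditional "Online".</p>
<?cond_end?></doc>'''


def text_by_condition(xml_text):
    """Return (open conditions, text) pairs for every non-blank text node."""
    doc = minidom.parseString(xml_text)
    stack = []    # conditions opened by cond_start and not yet closed, innermost last
    result = []   # collected in document order

    def walk(node):
        for child in node.childNodes:
            if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
                if child.target == "cond_start":
                    # The PI data is plain text such as: condition="Online"
                    match = re.search(r'condition="([^"]*)"', child.data)
                    stack.append(match.group(1) if match else None)
                elif child.target == "cond_end" and stack:
                    stack.pop()
            elif child.nodeType == Node.TEXT_NODE:
                if child.data.strip():
                    result.append((tuple(stack), child.data.strip()))
            elif child.nodeType == Node.ELEMENT_NODE:
                walk(child)

    walk(doc)
    return result


pairs = text_by_condition(sample)
# Keep "Online" text but drop anything inside a nested "Comment" range.
online_only = [text for conds, text in pairs
               if "Online" in conds and "Comment" not in conds]
```

Because the stack survives across element boundaries, a cond_start outside a <p> and a cond_end inside it are still paired in document order, which the sibling-based XPath above cannot do.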

Related

XPath "and" Confusion

I recently started a new job that uses cucumber/Gherkin along with selenium. I was trying to create a XPath for a specific element. The xml looks slightly like this...
<p>
<div class="slds-text-title_bold slds-m-bottom_x-small ncc-input-label">
Amp
</div>
<div class="slds-text-title_bold slds-m-bottom_x-small ncc-input-label required-field-label">
Voltage
</div>
</p>
I am looking to get only the div that has the required field label in its class and the text "Voltage". So far this kinda works...
//div[contains(text(), "Voltage")] | //*[contains(class, "required-field-label")]
however I'm getting way too many false positives. Any time I change the pipe into "and" I get nothing. What am I doing wrong?
HCSloan
Try the following expression on your actual code, and see if it works:
//div[contains(@class, "required-field-label")][contains(text(), "Voltage")]
You can match the element using "and" like this:
//div[contains(@class, 'required-field-label') and contains(text(), 'Voltage')]

How to prevent Xpath recursion

Given I have this (unknown) document structure, how do I write XPath to select div1 and div3, i.e. all divs, but not recursively (no divs contained anywhere within other divs)?
I couldn't find any documentation that would point me in this direction. All I could manage is to select ALL divs, i.e. div1, div2 and div3 (with the //div expression), but I want to exclude div2 here as it is a descendant of another div.
(I need a generic solution to select tags non-recursively; the ids here are for explanatory purposes only.)
...some unknown structure with no divs...
<div id="1">
...some unknown structure with no divs...
<div id="2"></div>
...some unknown structure with no divs...
</div>
...some unknown structure with no divs...
<div id="3"></div>
...some unknown structure with no divs...
If you select //div[not(ancestor::div)] you select all div elements that don't have any ancestor also being a div.
If you have access to XPath 3.1 or 3.0 you can also use the outermost function https://www.w3.org/TR/xpath-functions/#func-outermost as it "returns every node within the sequence that does not have another node within the sequence as an ancestor" so "the expression outermost(//div) returns those div elements that are not contained within further div elements".
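To make the "no div ancestor" rule concrete, here is a rough sketch in Python. It does not run the XPath itself: the standard xml.etree.ElementTree module supports only a subset of XPath with no ancestor axis, so the same selection is emulated with a recursive walk that stops descending at the first match. The sample document, its <root>, <section> and <p> wrappers, and the function name outermost are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the "unknown structure" in the question.
doc = ET.fromstring(
    '<root><section><div id="1"><p><div id="2"/></p></div></section>'
    '<div id="3"/></root>'
)


def outermost(root, tag):
    """Collect descendants of root named tag that have no ancestor named tag
    below root (root itself is not tested)."""
    found = []

    def walk(elem):
        for child in elem:
            if child.tag == tag:
                # Match found: record it and do not descend, so nested
                # matches such as div 2 are never reached.
                found.append(child)
            else:
                walk(child)

    walk(root)
    return found


ids = [d.get("id") for d in outermost(doc, "div")]
```

With a full XPath 1.0 engine the expression //div[not(ancestor::div)] could be evaluated directly instead.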

XPath without specifying the tag? [duplicate]

Given this XML, what XPath returns all elements whose prop attribute contains Foo (the first three nodes):
<bla>
<a prop="Foo1"/>
<a prop="Foo2"/>
<a prop="3Foo"/>
<a prop="Bar"/>
</bla>
//a[contains(@prop,'Foo')]
Works if I use this XML to get results back.
<bla>
<a prop="Foo1">a</a>
<a prop="Foo2">b</a>
<a prop="3Foo">c</a>
<a prop="Bar">a</a>
</bla>
Edit:
Another thing to note is that while the XPath above will return the correct answer for that particular XML, if you want to guarantee you only get the "a" elements in element "bla", you should, as others have mentioned, also use
/bla/a[contains(@prop,'Foo')]
The following searches all "a" elements in your entire XML document, regardless of whether they are nested in a "bla" element:
//a[contains(@prop,'Foo')]
I added this for the sake of thoroughness and in the spirit of stackoverflow. :)
This XPath will give you all nodes that have attributes containing 'Foo' regardless of node name or attribute name:
//attribute::*[contains(., 'Foo')]/..
Of course, if you're more interested in the contents of the attribute themselves, and not necessarily their parent node, just drop the /..
//attribute::*[contains(., 'Foo')]
descendant-or-self::*[contains(@prop,'Foo')]
Or:
/bla/a[contains(@prop,'Foo')]
Or:
/bla/a[position() <= 3]
Dissected:
descendant-or-self::
The Axis - search through every node underneath the context node, and the context node itself. It is often better to say this than //. I have encountered some implementations where // means anywhere (descendant or self of the root node). Others use the default axis.
* or /bla/a
The Tag - a wildcard match, and /bla/a is an absolute path.
[contains(@prop,'Foo')] or [position() <= 3]
The condition within [ ]. @prop is shorthand for attribute::prop, as attribute is another search axis. Alternatively you can select the first 3 by using the position() function.
Have you tried something like:
//a[contains(@prop, "Foo")]
I've never used the contains function before but suspect that it should work as advertised...
John C is the closest, but XPath is case sensitive, so the correct XPath would be:
/bla/a[contains(@prop, 'Foo')]
If you also need to match the content of the link itself, use text():
//a[contains(@href,"/some_link")][text()="Click here"]
/bla/a[contains(@prop, "foo")]
try this:
//a[contains(@prop,'foo')]
that should work for any "a" tags in the document
For the code above...
//*[contains(@prop,'foo')]
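The case-sensitivity point raised above is easy to check. The sketch below emulates contains() over every attribute of every element, in the spirit of //attribute::*[contains(., 'Foo')]/.., using a plain Python substring test, because the standard xml.etree.ElementTree module does not implement contains(). The function name attr_contains is made up for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<bla><a prop="Foo1"/><a prop="Foo2"/><a prop="3Foo"/><a prop="Bar"/></bla>'
)


def attr_contains(root, needle):
    """Return elements with at least one attribute value containing needle.

    The substring test is case-sensitive, as XPath's contains() is.
    """
    return [e for e in root.iter()
            if any(needle in value for value in e.attrib.values())]


upper = [e.get("prop") for e in attr_contains(root, "Foo")]
lower = [e.get("prop") for e in attr_contains(root, "foo")]
```

Here the search for 'Foo' finds the first three elements, while the lowercase 'foo' finds none, which is why the lowercase variants above return nothing for this XML.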

xpath how to skip a node

<article class='article-contents'>
<div class='summary'>xxxx</div>
<p>xxxxxx</p>
<table>...</table>
<p>....</p>...
</article>
I have an HTML structure like the one above. I'd like to skip past <div class='summary'> and get the rest of the content inside the article section using XPath.
You could use a query like this:
//article[@class='article-contents']/node()[not(local-name()='div' and @class='summary')]
This should select all child nodes of the article excluding the summary div.
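The same filter can be sketched in Python for a quick check. This is an approximation, not the XPath itself: the standard xml.etree.ElementTree module has no not() or local-name(), so the predicate is written as a Python condition, and only element children are considered, whereas node() in the query also selects text and comment nodes. The sample markup is a shortened, hypothetical version of the question's structure.

```python
import xml.etree.ElementTree as ET

article = ET.fromstring(
    "<article class='article-contents'>"
    "<div class='summary'>xxxx</div>"
    "<p>xxxxxx</p><table/><p>....</p>"
    "</article>"
)

# Keep every child element except a div whose class attribute is exactly 'summary'.
kept = [child for child in article
        if not (child.tag == "div" and child.get("class") == "summary")]

tags = [child.tag for child in kept]
```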

Using XPath expression how can i get the first text node immediately following a node?

I want to get to the exact node having this text: 'Company'. Once I get to this node I want to get to the next text node immediately following this node because this contains the company name. How can I do this with Xpath?
Fragment of XML is:
<div id="jobsummary">
<div id="jobsummary_content">
<h2>Job Summary</h2>
<dl>
<dt>Company</dt>
<!-- the following element is the one I'm looking for -->
<dd><span class="wrappable">Pinpoint IT Services, LLC</span></dd>
<dt>Location</dt>
<dd><span class="wrappable">Newport News, VA</span></dd>
<dt>Industries</dt>
<dd><span class="wrappable">All</span></dd>
<dt>Job Type</dt>
<dd class="multipledd"><span class="wrappable">Full Time</span></dd><dd class="multipleddlast"><span class="wrappable"> Employee</span></dd>
</dl>
</div>
</div>
I got to the Company tag with the following XPath: //*[text()= 'Company']
Now I want to get to the next text node. My XML is dynamic, so I can't hardcode a node type like <dd> to get the company value. But it is certain that the value will be in the immediately following text node.
So how can I get to the text node immediately after the node with the text 'Company'?
If you cannot hardcode any part of the following-sibling node your xpath should look like this:
//*[text()='Company']/following::*/*/text()
assuming that the desired text is always enclosed in another element like span.
To test for given dt text, modify your xpath to
//*[text()='Company' or text()='Company:' or text()='Company Name']/following::*/*/text()
Use //*[text()='Company']/following-sibling::dd to get the dd elements that follow.
You can even add conditions for that dd and navigate further into it.
following-sibling::elementName selects the following siblings under the same parent that meet your requirements.
With no conditions, as above, it selects every dd after 'Company'; append [1] to keep only the next one.
The text is in the span, so you might try
//*[text()='Company']/following-sibling::dd[1]/span
Another clarifying example: let's say that you also want the Industries text that follows the currently selected 'Company'.
Starting from //*[text()='Company'],
you can modify it like this: //*[text()='Company']/following-sibling::dt[text()='Industries']/following-sibling::dd[1]/span
Of course, instead of hardcoding the values for text(), you can use variables.
You can use XPathNavigator and step through the nodes one by one.
I think XPathNavigator::MoveToNext is the method you are looking for.
Sample code is available at:
http://msdn.microsoft.com/en-us/library/9yxc3x24.aspx
Use this general XPath expression that selects the wanted text node even when it is wrapped in statically unknown markup elements:
(//*[text()='Company']/following-sibling::*[1]//text())[1]
When this XPath expression is evaluated against the provided XML document:
<div id="jobsummary">
<div id="jobsummary_content">
<h2>Job Summary</h2>
<dl>
<dt>Company</dt>
<!-- the following element is the one I'm looking for -->
<dd><span class="wrappable">Pinpoint IT Services, LLC</span></dd>
<dt>Location</dt>
<dd><span class="wrappable">Newport News, VA</span></dd>
<dt>Industries</dt>
<dd><span class="wrappable">All</span></dd>
<dt>Job Type</dt>
<dd class="multipledd"><span class="wrappable">Full Time</span></dd><dd class="multipleddlast"><span class="wrappable"> Employee</span></dd>
</dl>
</div>
</div>
exactly the wanted text node is selected:
Pinpoint IT Services, LLC
Even if we change the XML to this one:
<div id="jobsummary">
<div id="jobsummary_content">
<h2>Job Summary</h2>
<div>
<p>Company</p>
<!-- the following element is the one I'm looking for -->
<dd><span class="wrappable"><b><i><u>Pinpoint IT Services, LLC</u></i></b></span></dd>
<dt>Location</dt>
<dd><span class="wrappable">Newport News, VA</span></dd>
<dt>Industries</dt>
<dd><span class="wrappable">All</span></dd>
<dt>Job Type</dt>
<dd class="multipledd"><span class="wrappable">Full Time</span></dd><dd class="multipleddlast"><span class="wrappable"> Employee</span></dd>
</div>
</div>
</div>
the XPath expression above still selects the wanted text node:
Pinpoint IT Services, LLC
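For readers who want to try the idea without a full XPath engine, here is a hedged Python sketch of the same logic. The standard xml.etree.ElementTree module has no following-sibling axis and no text() node test, so the steps are emulated: find an element whose leading text equals the label, take its next sibling element, and return the first text found inside that sibling. The function name first_text_after is made up, the sample documents are shortened versions of the two above, and matching on the element's leading text only is a simplification of text()='Company'.

```python
import xml.etree.ElementTree as ET


def first_text_after(root, label):
    """Return the first text inside the element that follows the label element."""
    for parent in root.iter():
        children = list(parent)
        # The last child has no following sibling, so it is skipped as a label.
        for i, child in enumerate(children[:-1]):
            if child.text == label:
                # itertext() yields the sibling's text in document order,
                # however deeply the markup is nested.
                for text in children[i + 1].itertext():
                    return text
    return None


flat = ET.fromstring(
    '<div id="jobsummary"><dl><dt>Company</dt>'
    '<dd><span class="wrappable">Pinpoint IT Services, LLC</span></dd>'
    '<dt>Location</dt><dd><span class="wrappable">Newport News, VA</span></dd>'
    '</dl></div>'
)
nested = ET.fromstring(
    '<div id="jobsummary"><div><p>Company</p>'
    '<dd><span class="wrappable"><b><i><u>Pinpoint IT Services, LLC</u></i></b></span></dd>'
    '<dt>Location</dt><dd><span class="wrappable">Newport News, VA</span></dd>'
    '</div></div>'
)

company_flat = first_text_after(flat, "Company")
company_nested = first_text_after(nested, "Company")
```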

Resources