Unable to create XPath to locate an individual value

I found an element that contains several pieces of information. How can I write an XPath expression that locates an individual item, such as the email or the phone number?
I tried:
//div[@class="professional-info-content"]/text()
but it returns all of the items.
Here is the element:
<div class='professional-info-content'><b>Impressum:</b><br/>hokon
<br/>Jörn Brenscheidt GmbH
<br/>Wasserbank 21
<br/>D-58456 Witten Herbede
<br/>
<br/>Tel.: 0049 (0) 2302 780100
<br/>Fax: 0049 (0) 2302 780110
<br/>E-Mail: info@hokon.de
<br/>www.hokon.de</div>

//div[@class="professional-info-content"]/text() returns a node list of all the text nodes directly under the div. To get only a specific one, add a positional index, e.g.:
//div[@class="professional-info-content"]/text()[6]
returns the telephone number and
//div[@class="professional-info-content"]/text()[8]
returns the email. The whitespace-only text node between the two consecutive <br/> elements counts as text()[5], provided the parser keeps whitespace.
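To check which index holds which line, here is a minimal sketch in Python. It assumes only the standard library, whose ElementTree module does not support text() in its XPath subset, so the helper text_nodes (a hypothetical name) rebuilds the same list of text nodes by hand from the .text and .tail fields.

```python
import xml.etree.ElementTree as ET

HTML = """<div class='professional-info-content'><b>Impressum:</b><br/>hokon
<br/>Jörn Brenscheidt GmbH
<br/>Wasserbank 21
<br/>D-58456 Witten Herbede
<br/>
<br/>Tel.: 0049 (0) 2302 780100
<br/>Fax: 0049 (0) 2302 780110
<br/>E-Mail: info@hokon.de
<br/>www.hokon.de</div>"""


def text_nodes(element):
    """Return the text nodes that are direct children of element, in document order."""
    nodes = []
    if element.text:
        nodes.append(element.text)
    for child in element:
        # Text that follows a child element is stored in that child's tail.
        if child.tail:
            nodes.append(child.tail)
    return nodes


div = ET.fromstring(HTML)
nodes = text_nodes(div)

# XPath positions are 1-based, Python lists are 0-based.
phone = nodes[6 - 1].strip()
email = nodes[8 - 1].strip()

print(phone)  # Tel.: 0049 (0) 2302 780100
print(email)  # E-Mail: info@hokon.de
```

The fifth entry in nodes is the lone newline between the two adjacent <br/> elements, which is why the phone number sits at index 6 rather than 5.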

Related

Find locator with contains text (Robotframework)

Given the markup below:
<span id="cTDQo7-img" class="z-menu-img"></span> payment
<span id="cTDQo7-img" class="z-menu-img"></span>
"payment"
I would like to build a locator using contains, but the word "payment" appears many times on the page (payment1, payment2, payment3, ...),
and the id is not unique.
I tried the expressions below, but none of them worked for me.
//a[contains(.,'payment')]
//span[@class='z-menu-img'] [contains(.,'payment')]
//span[@class='z-menu-img'] and [contains(.,'payment')]
//span[@class='z-menu-img'] contains(.,'payment')
Option 1: Combine other attributes with the text
//a[@class='z-menu-cnt z-menu-cnt-img' and normalize-space(.)='payment']
Option 2: Specify the position if several elements match and none has a unique attribute or path
(//a[contains(.,'payment')])[1]
The second XPath selects the first link in the document whose text contains 'payment'. Change the tag name and the index to suit your page.
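The difference between the two options can be sketched in Python. This is only an illustration: the menu markup is hypothetical (the question does not show the surrounding a elements), and because the standard library's ElementTree has no contains() or normalize-space(), the helpers string_value and normalize_space below are hand-written stand-ins for them.

```python
import xml.etree.ElementTree as ET

# Hypothetical menu markup; the real page structure is not shown in the question.
HTML = """<ul>
<li><a class="z-menu-cnt z-menu-cnt-img"><span class="z-menu-img"></span> payment</a></li>
<li><a class="z-menu-cnt z-menu-cnt-img"><span class="z-menu-img"></span> payment1</a></li>
<li><a class="z-menu-cnt z-menu-cnt-img"><span class="z-menu-img"></span> payment2</a></li>
</ul>"""


def string_value(element):
    """Concatenate all descendant text, like '.' inside an XPath predicate."""
    return "".join(element.itertext())


def normalize_space(text):
    """Trim and collapse whitespace, approximating XPath normalize-space()."""
    return " ".join(text.split())


root = ET.fromstring(HTML)
links = list(root.iter("a"))

# contains(., 'payment') also matches payment1 and payment2.
contains_matches = [a for a in links if "payment" in string_value(a)]

# normalize-space(.) = 'payment' matches only the exact label.
exact_matches = [a for a in links if normalize_space(string_value(a)) == "payment"]

# (//a[contains(., 'payment')])[1]: first match in document order.
first_match = contains_matches[0]

print(len(contains_matches), len(exact_matches))  # 3 1
```

The exact comparison is the more robust choice when the label is known in full; the positional form depends on the order of elements on the page.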

How to prevent Xpath recursion

Given this (unknown) document structure, how do I write an XPath that selects div1 and div3, i.e. all divs, but not recursively (no divs contained anywhere within another div)?
I couldn't find any documentation that points me in this direction. All I could manage was to select ALL divs, i.e. div1, div2 and div3 (with the //div expression), but I want to exclude div2 here because it is a descendant of another div.
(I need a generic solution for selecting tags non-recursively; the ids here are for explanatory purposes only.)
...some unknown structure with no divs...
<div id="1">
...some unknown structure with no divs...
<div id="2"></div>
...some unknown structure with no divs...
</div>
...some unknown structure with no divs...
<div id="3"></div>
...some unknown structure with no divs...
If you select //div[not(ancestor::div)] you select all div elements that don't have any ancestor also being a div.
If you have access to XPath 3.1 or 3.0 you can also use the outermost function https://www.w3.org/TR/xpath-functions/#func-outermost as it "returns every node within the sequence that does not have another node within the sequence as an ancestor" so "the expression outermost(//div) returns those div elements that are not contained within further div elements".
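The selection rule can be sketched in Python with the standard library. ElementTree supports neither the ancestor axis nor not(), so this is an emulation rather than an XPath evaluation: the hypothetical helper outermost_elements stops descending as soon as it meets a div, which yields the same set as //div[not(ancestor::div)]. The wrapper root element and the filler tags are assumptions standing in for the "unknown structure".

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper and filler elements stand in for the unknown structure.
HTML = """<root>
<p>no divs here</p>
<div id="1">
  <p>no divs here</p>
  <section><div id="2"></div></section>
</div>
<p>no divs here</p>
<div id="3"></div>
</root>"""


def outermost_elements(element, tag):
    """Collect elements named tag that have no ancestor with the same tag."""
    found = []
    for child in element:
        if child.tag == tag:
            # Keep this one and do not look inside it.
            found.append(child)
        else:
            found.extend(outermost_elements(child, tag))
    return found


root = ET.fromstring(HTML)

all_ids = [d.get("id") for d in root.iter("div")]
outer_ids = [d.get("id") for d in outermost_elements(root, "div")]

print(all_ids)    # ['1', '2', '3']
print(outer_ids)  # ['1', '3']
```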

How to find xpath of a text element without node

<h1>
<span class="visually-hidden">BBC Radio</span>
Search results for 'archers'
</h1>
I want to locate the text "Search results for 'archers'". Which XPath selects it and not the text inside the span element?
For your sample input:
/h1/text()
returns Search results for 'archers'
(tested on http://videlibri.sourceforge.net/cgi-bin/xidelcgi).
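Strictly speaking, h1 in this sample has two text node children: a whitespace-only one before the span and the one holding the search text. If the blank node gets in the way, /h1/text()[normalize-space()] keeps only the non-blank one. A minimal Python sketch of the same idea, using the standard library's .text and .tail fields because ElementTree does not support text():

```python
import xml.etree.ElementTree as ET

HTML = """<h1>
<span class="visually-hidden">BBC Radio</span>
Search results for 'archers'
</h1>"""

h1 = ET.fromstring(HTML)

# Direct text children of h1: its leading text plus the tail after each child element.
text_nodes = [h1.text] + [child.tail for child in h1]
text_nodes = [t for t in text_nodes if t is not None]

# Equivalent of the [normalize-space()] filter: drop whitespace-only nodes.
non_blank = [t.strip() for t in text_nodes if t.strip()]

print(len(text_nodes))  # 2
print(non_blank)        # ["Search results for 'archers'"]
```

The span's own text ("BBC Radio") never appears, because it is a child of the span and not a direct text child of h1.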

Get count of specific nodes between two specific sibling nodes

I'm using HtmlAgilityPack to get a filtered DOM of <h2> and <h3> nodes and using Xpath 1.0 (from my Xpath 1.0 crash course this week) I need to get the number of <h3>'s (the number varies) that are between sibling <h2>'s as follows:
<div>
<h2>heading 1</h2>
<h3>sub 1.1</h3>
<h3>sub 1.2</h3>
<h2>heading 2</h2>
<h3>sub 2.1</h3>
<h2>heading 3</h2>
....
</div>
When I iterate (using C#) through the filtered nodes I want the exact number of <h3>'s that are after a <h2> and before the next <h2>. When I use the following I get all the <h3>'s as the result.
int countH3 = n.SelectNodes("./preceding-sibling::h2[2]/following-sibling::h2[3]/preceding-sibling::h3").Count(); //the [position] is set dynamically
For the node structure above would like the result of the code line to be:
countH3 = 1
but it is:
countH3 = 3
I've found many similar SO questions regarding "sibling nodes between sibling nodes" and have to thank @LarsH for his comment in another question that /preceding::h3 returns ALL <h3>'s, which helped explain the issue. I think I may need to use the Kayessian method of node-set intersection, but I get an "invalid token" error when I include the | union character as follows:
countH3 = n.SelectNodes("./h2[2]/following-sibling::h2[3]
[count(.|./h2[2]/following-sibling::h2[3]/preceding-sibling::h3)=
count(./h2[2]/following-sibling::h2[3]/preceding-sibling::h3)]").Count();
Any suggestions appreciated.
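One way to reason about the count: an <h3> belongs to the k-th <h2> exactly when it has k preceding <h2> siblings, which in XPath 1.0 can be written from the parent div as h3[count(preceding-sibling::h2) = k]. The sketch below is not HtmlAgilityPack code; it is a Python illustration using the standard library (an assumption made purely to show the logic), and since ElementTree cannot evaluate that expression it applies the same rule by walking the siblings in order. The function name h3_counts_per_h2 is hypothetical.

```python
import xml.etree.ElementTree as ET

HTML = """<div>
<h2>heading 1</h2>
<h3>sub 1.1</h3>
<h3>sub 1.2</h3>
<h2>heading 2</h2>
<h3>sub 2.1</h3>
<h2>heading 3</h2>
</div>"""


def h3_counts_per_h2(div):
    """Return, for each h2 in order, how many h3 siblings follow it before the next h2."""
    counts = []
    for child in div:
        if child.tag == "h2":
            counts.append(0)      # start a new group
        elif child.tag == "h3" and counts:
            counts[-1] += 1       # belongs to the most recent h2
    return counts


counts = h3_counts_per_h2(ET.fromstring(HTML))
print(counts)  # [2, 1, 0]
```

For the second heading this gives 1, the value the question expects.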

xpath trying to select content inside a div except one, with text included

I'm trying to select the content inside a div. This div has some text and some additional tags inside. I don't want to select the first div inside it. I tried this selector, but it only gives me the tags, without the text:
//div[@class='contentDealDescriptionFacts cf']/div[@class='viewHalfWidthSize' and position()=2]/*[not(@class='subHeadline')]
The div that is giving me problems is this one:
<div class="viewHalfWidthSize">
.......
</div>
<div class="viewHalfWidthSize">
<div class="subHeadline firefinder-match">The Fine Print</div> <----------Except this div I want everything inside of this div!!
<strong class="firefinder-match">Validity: </strong>
Expires 27 June 2013.
<br class="firefinder-match">
<strong class="firefinder-match">Purchase: </strong>
Limit 1 per 2 people. May buy multiple as gifts.
<br class="firefinder-match">
<strong class="firefinder-match">Redemption: </strong>
Booking required online at
<a target="_blank" href="http://grouponbookings.co.uk/lautre-pied-march/" class="firefinder-match">http://grouponbookings.co.uk/lautre-pied-march/</a>
. 48-hour cancellation policy; late cancellation incurs a £30 surcharge per person.
<br class="firefinder-match">
<strong class="firefinder-match">Further information: </strong>
Valid Mon-Sun midday-2.45pm; Mon-Wed 6pm-10.45pm. Must be 18 or older, ID may be requested. Valid only on set tasting menu only; menu is dependent on market changes and seasonality and is subject to change. Max. two hours seating time. Discretionary service charge will be added to the bill based on original price. Original value verified 19 March 2013 at 9.01am.
<br class="firefinder-match">
<a target="_blank" href="http://www.groupon.co.uk/universal-fine-print" style="color: #339933;" class="firefinder-match">See the rules</a>
that apply to all deals.
</div>
The * matches element nodes and not text nodes. Try replacing * with node() to select all node types.
To break down what your XPath is doing:
You are looking anywhere in the document (//) for a div with class 'contentDealDescriptionFacts cf'.
Then you are looking for the 2nd div under that which also has the class viewHalfWidthSize. Note that this is not the 2nd div that has the class but the div that is 2nd AND has that class, so if the divs with that class are the 3rd and 4th div children, nothing matches, because the 2nd div with the class has position() = 4. If you want the 2nd viewHalfWidthSize div then you'll want [@class='viewHalfWidthSize'][position()=2].
Finally, you are returning a nodelist of all elements without the class subHeadline. If you change the * to node() then you will get a nodelist of all nodes.
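The difference between the two predicate forms in the breakdown above can be sketched in Python. The markup is hypothetical, chosen so that the divs carrying the class are the 3rd and 4th div children. The standard library's ElementTree does not accept position() or a positional predicate after another predicate, so the positions are applied with list indexing instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the two divs with the class are the 3rd and 4th div children.
HTML = """<section>
<div class="other">a</div>
<div class="other">b</div>
<div class="viewHalfWidthSize">c</div>
<div class="viewHalfWidthSize">d</div>
</section>"""

parent = ET.fromstring(HTML)
divs = parent.findall("div")

# div[@class='viewHalfWidthSize' and position()=2]:
# the 2nd div child, kept only if it also has the class.
second_and_class = [
    d for i, d in enumerate(divs, start=1)
    if i == 2 and d.get("class") == "viewHalfWidthSize"
]

# div[@class='viewHalfWidthSize'][position()=2]:
# filter by class first, then take the 2nd of what is left.
with_class = parent.findall("div[@class='viewHalfWidthSize']")
second_with_class = with_class[1:2]

print([d.text for d in second_and_class])   # []
print([d.text for d in second_with_class])  # ['d']
```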
The following XPath:
//div[@class='contentDealDescriptionFacts cf']/div[@class='viewHalfWidthSize' and position()=2]/node()[not(name(.)='div' and position() = 1)]
should return what you want as long as the first child node is the div you want to ignore.
If you change it to:
//div[@class='contentDealDescriptionFacts cf']/div[@class='viewHalfWidthSize' and position()=2]/node()[position() != count(../div[1]/preceding-sibling::node()) + 1]
then it should work regardless. It returns your nodelist, then works out how many preceding nodes there are before the first div, and checks the position isn't one greater than that (i.e. position of first div) and excludes that from the list.
As yet another alternative, you could keep your original expression but replace not(@class='subHeadline') with
not(contains(concat(' ', @class, ' '), ' subHeadline '))
which checks whether subHeadline appears as a whole class name anywhere in the class attribute, on the assumption that your classes are space separated. It therefore matches your fragment, whose class is "subHeadline firefinder-match", and the surrounding not() excludes it.
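The string logic behind that predicate can be checked outside XPath. This is a minimal Python sketch of the same padding trick; has_class is a hypothetical helper name, and like the XPath version it assumes class names are separated by single spaces.

```python
def has_class(class_attr, token):
    """Mirror contains(concat(' ', @class, ' '), ' token ') for space-separated classes."""
    return f" {token} " in f" {class_attr} "


cls = "subHeadline firefinder-match"

# @class='subHeadline' compares the whole attribute value, so it fails here.
exact_match = cls == "subHeadline"

# The padded contains() test finds the token as a whole word.
token_match = has_class(cls, "subHeadline")

# A plain contains(@class, 'subHeadline') would wrongly accept this value.
false_positive = has_class("subHeadlineExtra", "subHeadline")

print(exact_match, token_match, false_positive)  # False True False
```

Padding both the attribute and the token with spaces is what keeps partial names such as subHeadlineExtra from matching.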
