I need to select a link using XPath that matches the following three criteria:
parent @class = 'testItem'
child @class = 'icon icon_checked'
text = 'test text goes here!'
I'm unsure where to put the text test in the XPath expression. I've tried many permutations of the following:
//a[@class="testItem" and child::span[@class="icon icon_checked"] and li[text()="test text goes here!"]]
My issue is that the text is not in its own span.
Here's the raw example:
<li>
<a class="testItem2" data-code="2" href="javascript:void(0);">
<span class="icon icon_checked"></span>
test text goes here2!
</a>
</li>
<li>
<a class="testItem" data-code="2" href="javascript:void(0);">
<span class="icon icon_checked"></span>
test text goes here!
</a>
</li>
Thanks for the help. I've found the answer.
I can simply change the last part of my XPath from li[text()="test text goes here!"] to .[text()="test text goes here!"].
My final working XPath is:
//a[@class='testItem' and child::span[@class='icon icon_checked'] and .[text()='test text goes here!']]
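The three criteria can also be checked outside the XPath engine. Below is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree, which implements only a subset of XPath (it has no text() or normalize-space()), so the text comparison is done in plain Python. The helper name find_links and the wrapping <ul> are made up for illustration. In ElementTree the text that trails the <span> is exposed as span.tail.

```python
import xml.etree.ElementTree as ET

# The two list items from the question, wrapped in a <ul> so the snippet parses.
HTML = """
<ul>
  <li>
    <a class="testItem2" data-code="2" href="javascript:void(0);">
      <span class="icon icon_checked"></span>
      test text goes here2!
    </a>
  </li>
  <li>
    <a class="testItem" data-code="2" href="javascript:void(0);">
      <span class="icon icon_checked"></span>
      test text goes here!
    </a>
  </li>
</ul>
"""


def find_links(root, link_class, icon_class, label):
    """Return <a> elements that satisfy all three criteria (hypothetical helper)."""
    matches = []
    # ElementTree can evaluate the two attribute tests directly.
    for a in root.iterfind(f".//a[@class='{link_class}']"):
        span = a.find(f"span[@class='{icon_class}']")
        # The label is the text that follows the span, i.e. span.tail.
        if span is not None and (span.tail or "").strip() == label:
            matches.append(a)
    return matches


root = ET.fromstring(HTML)
links = find_links(root, "testItem", "icon icon_checked", "test text goes here!")
print([a.get("class") for a in links])
```

The strip() call matters here because the text node in the sample carries the surrounding newlines and indentation.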
From the following XML-document, I'm trying to specify XPath that will capture the text that immediately follows the h4-headline "Source", namely - in this example - "Information about the source":
<div class="doc-inf doc-inf-information">
<h3>Document information</h3>
<div>
<h4>Source</h4>
<ul>
<li>Information about the source</li>
</ul>
I've tried the following:
//h4[contains(text(), "Source")]/ul/li
Which doesn't seem to work. Would anyone be able to help? I would greatly appreciate it.
EDIT:
My problem (which I didn't specify fully, sorry) is that this div tag has multiple h4 tags in it of which I want to select the ul-child for each:
<div class="doc-inf doc-inf-information">
<h3>Document information</h3>
<div>
<h4>Source</h4>
<ul>
<li>Source information</li>
</ul>
<h4>Language</h4>
<ul>
<li>Swedish</li>
</ul>
<h4>Publishers</h4>
<ul>
<li>Publishing Project</li>
</ul>
<h4>Record ID</h4>
<ul>
<li>36785</li>
</ul>
In essence, I'm trying to grab the child under h4 headlines "Source", "Language", "Publishers", "Record ID" (= what I'm interested in is "Source information", "Swedish", "Publishing Project" and "36785") but the h4 headlines are inconsistently placed across pages so I need to be able to target the children of the specific headlines.
You are directly accessing the tag <h4>, which has no children, therefore the following doesn't work:
//h4[contains(text(), "Source")]/ul/li
Try this instead:
//div[h4[contains(text(), "Source")]]/ul/li/text()
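This searches for a <div> that has an <h4> child containing the text 'Source' and then selects the text of each <li> under that div's <ul> children. Note that when the div holds several h4/ul pairs, as in the edited example, this returns the items of every list; to take only the list that directly follows a given heading, use //h4[contains(text(), "Source")]/following-sibling::ul[1]/li/text().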
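Since the headings and lists are siblings, pairing each <h4> with the <ul> that follows it is the key step. Here is a minimal sketch of that pairing, assuming Python's standard-library xml.etree.ElementTree (which has no following-sibling axis, so the sibling walk is written by hand). The function name items_by_heading is made up, and the closing </div> tags are added so the fragment parses.

```python
import xml.etree.ElementTree as ET

# Sample from the edited question; closing </div> tags added (assumption).
HTML = """
<div class="doc-inf doc-inf-information">
  <h3>Document information</h3>
  <div>
    <h4>Source</h4>
    <ul><li>Source information</li></ul>
    <h4>Language</h4>
    <ul><li>Swedish</li></ul>
    <h4>Publishers</h4>
    <ul><li>Publishing Project</li></ul>
    <h4>Record ID</h4>
    <ul><li>36785</li></ul>
  </div>
</div>
"""


def items_by_heading(root):
    """Map each <h4> text to the <li> texts of the <ul> that follows it."""
    result = {}
    for parent in root.iter():
        heading = None
        for child in parent:
            if child.tag == "h4":
                # Remember the most recent heading among the siblings.
                heading = (child.text or "").strip()
            elif child.tag == "ul" and heading is not None:
                result[heading] = [(li.text or "").strip() for li in child.findall("li")]
                heading = None  # one list per heading
    return result


info = items_by_heading(ET.fromstring(HTML))
print(info["Source"], info["Record ID"])
```

Because the lookup is keyed on the heading text, it does not depend on the order in which the headings appear on a page.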
I am trying to scrape a web page for NAME OF COMPANY and CITY AND STATE OF COMPANY shown below.
I have an xpath code snippet that identifies both text elements at the same time:
//span[starts-with(@class,"text-align")]/text()[2]
This xpath snippet pulls the first text value (COMPANY NAME). How do I get the second text element (CITY,STATE)?
A snip of the web page code looks like this:
<div>
<ul class="pv-top-card-v3--experience-list">
<li>
<a class="pv-top-card-v3--experience-list-item" href="#" data-control-name="position_see_more" data-ember-action="" data-ember-action-172="172">
<img src="https://media.licdn.com/dms/image/C4E0BAQFhA8h46hvabA/company-logo_100_100/0?e=1582761600&v=beta&t=VAeZqaGu3Lu6Ol_n5kiiI74FSRuSOZA1ggAI5qTVRjE" id="ember173" class="EntityPhoto-square-1 flex-shrink-zero ember-view">
<span id="ember174" class="text-align-left ml2 t-14 t-black t-bold full-width lt-line-clamp lt-line-clamp--multi-line ember-view" style="-webkit-line-clamp: 2"> THIS IS THE NAME OF A COMPANY
<!----></span>
</a>
</li>
<li>
<a class="pv-top-card-v3--experience-list-item" href="#" data-control-name="education_see_more" data-ember-action="" data-ember-action-176="176">
<img src="https://media.licdn.com/dms/image/C560BAQEr2uQX-x2EwQ/company-logo_100_100/0?e=1582761600&v=beta&t=aDbYLUDMvlS4DpwOLjOaQj3Dj60C_cYLC5UUvGoyld0" id="ember177" class="EntityPhoto-square-1 flex-shrink-zero ember-view">
<span id="ember178" class="text-align-left ml2 t-14 t-black t-bold full-width lt-line-clamp lt-line-clamp--multi-line ember-view" style="-webkit-line-clamp: 2"> THIS IS THE CITY AND STATE OF COMPANY
<!----></span>
</a>
</li>
</ul>
</div>
The xpath string is picking up the two span elements using class. I can't use the span id attributes because they are dynamic and change with each page (one page per company).
Can someone advise how I extract the desired text?
Thanks.
Point to the li level:
//ul/li[2]/a/span[starts-with(@class,"text-align")]
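To see the positional step at work, here is a minimal sketch assuming Python's standard-library xml.etree.ElementTree. It supports the li[2] position predicate but not starts-with(), so the class prefix is checked in Python. The markup is a trimmed version of the page snippet (images and Ember attributes dropped so it parses as XML), and span_text is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed version of the snippet from the question (assumption).
HTML = """
<div>
  <ul class="pv-top-card-v3--experience-list">
    <li>
      <a class="pv-top-card-v3--experience-list-item" href="#">
        <span class="text-align-left ml2 t-14"> THIS IS THE NAME OF A COMPANY
        </span>
      </a>
    </li>
    <li>
      <a class="pv-top-card-v3--experience-list-item" href="#">
        <span class="text-align-left ml2 t-14"> THIS IS THE CITY AND STATE OF COMPANY
        </span>
      </a>
    </li>
  </ul>
</div>
"""


def span_text(root, position):
    """Text of the text-align span inside the <li> at the given 1-based position."""
    for span in root.iterfind(f".//ul/li[{position}]/a/span"):
        # Stand-in for starts-with(@class, "text-align").
        if span.get("class", "").startswith("text-align"):
            return (span.text or "").strip()
    return None


root = ET.fromstring(HTML)
company = span_text(root, 1)
location = span_text(root, 2)
print(company, "|", location)
```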
I am trying to get the error message from a page on a site. The list contains several possible errors, so I can't check by id, but I do know that the one with display:list-item is the one I want. This is my rule, but it doesn't seem to work. What is wrong with it? What I want returned is the error text in the element.
//*[@id='errors']/ul/li[contains(@style,'display:list-item')]
Example dom elements:
<div id="errors" class="some class" style="display: block;">
<div class="some other class"></div>
<div class="some other class 2">
<span class="displayError">Please correct the errors listed in red below:</span>
<ul>
<li style="display:none;" id="invalidId">Enter a valid id</li>
<li style="display:list-item;" id="genericError">Something bad happened</li>
<li style="display:none;" id="somethingBlah" ............ </li>
....
</ul>
</div>
The correct XPath should be:
//*[@id='errors']//ul/li[contains(@style,'display:list-item')]
After //*[@id='errors'] you need an extra /, because <ul> is not directly beneath it. Using // again scans all underlying elements for <ul>.
If you can avoid //, it would be better: faster and less resource-intensive.
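The difference between the child step (/) and the descendant step (//) can be checked on the sample. This is a minimal sketch assuming Python's standard-library xml.etree.ElementTree, which understands both steps but not contains(), so the style test is done in Python. The <body> wrapper is added for illustration, and the elided third <li> from the question is left out.

```python
import xml.etree.ElementTree as ET

# Sample DOM from the question, wrapped in <body>; the elided <li> is omitted.
HTML = """
<body>
  <div id="errors" class="some class" style="display: block;">
    <div class="some other class"></div>
    <div class="some other class 2">
      <span class="displayError">Please correct the errors listed in red below:</span>
      <ul>
        <li style="display:none;" id="invalidId">Enter a valid id</li>
        <li style="display:list-item;" id="genericError">Something bad happened</li>
      </ul>
    </div>
  </div>
</body>
"""

root = ET.fromstring(HTML)

# "/ul" only looks at direct children of #errors, where there is no <ul>.
direct = root.findall(".//*[@id='errors']/ul/li")
# "//ul" looks at every level beneath #errors.
nested = root.findall(".//*[@id='errors']//ul/li")


def is_list_item(li):
    """Stand-in for contains(@style, 'display:list-item'), ignoring spaces."""
    return "display:list-item" in li.get("style", "").replace(" ", "")


visible = [li.text for li in nested if is_list_item(li)]
print(len(direct), len(nested), visible)
```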
I want to select all the LI elements that contain a SPAN with id="liveDeal152_dealPrice" as a descendant. How do I do this with XPath?
Here is a sample html
<ul>
<li id="liveDeal_152">
<p class="price">
<em>|
<span class="WebRupee">₹ </span>
<span id="liveDeal152_dealPrice">495 </span>
</p>
</li>
<li id="liveDeal_152">
<p class="price">
<em>|
<span class="WebRupee">₹ </span>
(price hidden)
</p>
</li>
</ul>
//li[.//span[@id = 'liveDeal152_dealPrice']] should do. Or, more verbose but closer to your textual description, //li[descendant::span[@id = 'liveDeal152_dealPrice']].
Use this
//li[.//span[@id="liveDeal152_dealPrice"]]
It selects
ALL <li> ELEMENTS
//li[ ]
THAT HAVE A <span> DESCENDANT
.//span[ ]
WITH id ATTRIBUTE EQUAL TO "liveDeal152_dealPrice"
@id="liveDeal152_dealPrice"
That said, it doesn't seem like a very wise element selection, mostly due to the dynamic-looking id. If you're going to use it once, it's probably ok, but if you're using it, say, for testing and will reuse it many times, it might cause trouble. Are you sure this won't change when you change your website and/or database?
As a side note:
ul stands for "unordered list"
ol stands for "ordered list"
li stands for "list item"
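The "element that has a given descendant" test can be reproduced outside a full XPath engine. This is a minimal sketch assuming Python's standard-library xml.etree.ElementTree, which does not accept a nested path such as .//span inside a predicate, so the descendant check is a separate find() call per <li>. The <em> elements are closed here so the sample parses; that closing tag is an assumption, not part of the original markup.

```python
import xml.etree.ElementTree as ET

# Sample from the question; </em> added so the fragment is well-formed.
HTML = """
<ul>
  <li id="liveDeal_152">
    <p class="price">
      <em>|</em>
      <span class="WebRupee">₹ </span>
      <span id="liveDeal152_dealPrice">495 </span>
    </p>
  </li>
  <li id="liveDeal_152">
    <p class="price">
      <em>|</em>
      <span class="WebRupee">₹ </span>
      (price hidden)
    </p>
  </li>
</ul>
"""

# Relative path: any <span> descendant with the wanted id.
SPAN_PATH = ".//span[@id='liveDeal152_dealPrice']"

root = ET.fromstring(HTML)
# Keep each <li> for which the descendant lookup finds something.
matching = [li for li in root.iter("li") if li.find(SPAN_PATH) is not None]
prices = [li.find(SPAN_PATH).text.strip() for li in matching]
print(len(matching), prices)
```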
I'm trying to use Xpath to get the text of the parent anchor without also getting the text from the span of the example below:
<a id="readingListBtn" class="btn btn-transparent" title="Reading List" href="javascript:void(0);">
<span class="icon icon_headerBookmark">Header Bookmark</span>
0
</a>
The XPath I'm using (//a[@id = 'readingListBtn']) returns "Header Bookmark0", but I'm just interested in the "0" part.
Just get the direct text child:
//a[@id = 'readingListBtn']/text()
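The distinction is between the element's full string value and its own text nodes. Here is a minimal sketch of that distinction, assuming Python's standard-library xml.etree.ElementTree, where an element's own text is split between elem.text (before the first child) and each child's tail (after that child). The helper name direct_text is made up for illustration.

```python
import xml.etree.ElementTree as ET

HTML = """
<a id="readingListBtn" class="btn btn-transparent" title="Reading List" href="javascript:void(0);">
  <span class="icon icon_headerBookmark">Header Bookmark</span>
  0
</a>
"""


def direct_text(elem):
    """Join only the text nodes that are direct children of elem."""
    parts = [elem.text or ""]
    parts.extend(child.tail or "" for child in elem)
    return "".join(parts).strip()


a = ET.fromstring(HTML)
# All descendant text, roughly what selecting the element itself yields.
everything = " ".join("".join(a.itertext()).split())
# Only the anchor's own text, the analogue of a/text().
own = direct_text(a)
print(repr(everything), repr(own))
```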