Unable to create an accurate XPath to locate an item

I'm stuck creating an appropriate XPath to locate the Title: and Genre: values in the elements below.
HTML elements:
<div class="mdif">
<ul>
<li><b>Title:</b>Army Of Darkness</li>
<li><b>Genre:</b></li> Horror
</ul>
</div>
Output I would like to have:
Army Of Darkness
Horror
I tried the expression below to get Army Of Darkness and it worked, but I don't know whether it is the most accurate one:
root.xpath("//div[@class='mdif']//li/text()")[0]
However, I'm stuck on getting Horror. Any help getting Horror as the result using XPath on the elements above would be highly appreciated.

Try the XPath below and let me know if you run into any issues:
//div[@class='mdif']//ul//text()[normalize-space() and not(parent::b)]
The normalize-space() predicate discards text nodes that consist only of whitespace, and the not(parent::b) predicate skips the "Title:" and "Genre:" text nodes.
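A minimal sketch for checking both expressions, assuming Python with lxml (the root.xpath call in the question suggests lxml). Because "Horror" sits after </li>, the parser attaches it to the <ul>, which is why //li/text() never reaches it while the answer's //ul//text() does. The returned text nodes keep their surrounding whitespace, so the sketch strips it.

```python
from lxml import html

# Markup from the question. "Horror" follows </li>, so the parser attaches it
# to the <ul> instead of to an <li>.
SOURCE = """<div class="mdif">
<ul>
<li><b>Title:</b>Army Of Darkness</li>
<li><b>Genre:</b></li> Horror
</ul>
</div>"""

root = html.fromstring(SOURCE)

# The question's expression: only text nodes whose parent is an <li>.
title = root.xpath("//div[@class='mdif']//li/text()")[0]

# The answer's expression: every non-blank text node under the <ul>,
# except the labels inside <b>.
raw = root.xpath(
    "//div[@class='mdif']//ul//text()[normalize-space() and not(parent::b)]"
)
values = [text.strip() for text in raw]  # trim the leading space and newline

print(title)   # Army Of Darkness
print(values)  # ['Army Of Darkness', 'Horror']
```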

Related

Can't get xpath to locate element

I have this piece of HTML and I'm trying to select the <a href> link using xpath.
<li class="footable-page-nav" data-page="next" aria-label="next"><a class="footable-page-link xh-highlight" href="#">›</a></li>
I need the selector to be reasonably specific since "footable-page-link" exists in multiple places in the HTML.
I've tried this:
//li[@class='footable-page-nav']/a[@class='xh-highlight']//@href
Selenium throws an error: selenium.common.exceptions.NoSuchElementException
If I shorten the XPath expression to //li[@class='footable-page-nav'] just to see if I'm on the right track, then I get
selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable: element has zero size
What am I missing?
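Try changing your XPath expression to
//li[@class='footable-page-nav']/a[contains(@class,'xh-highlight')]//@href
and see if it works. [@class='xh-highlight'] only matches when the whole attribute value is exactly "xh-highlight", but this a element's class is "footable-page-link xh-highlight"; contains() matches the substring instead.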
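A small lxml sketch (Python, used here only because it runs without a browser) showing why the exact comparison finds nothing while contains() matches. Two hedged caveats: xh-highlight looks like the class the XPath Helper extension adds to highlighted elements, so it may not exist in the page Selenium loads, which is why the sketch also matches on footable-page-link as a whole class token; and Selenium's find_element needs an expression that selects an element, so in Selenium you would likely drop the trailing //@href and call get_attribute("href") on the element instead.

```python
from lxml import html

# Markup from the question, wrapped in a <ul> so the <li> has a parent.
SOURCE = (
    '<ul><li class="footable-page-nav" data-page="next" aria-label="next">'
    '<a class="footable-page-link xh-highlight" href="#">›</a></li></ul>'
)
root = html.fromstring(SOURCE)

# Exact comparison: "footable-page-link xh-highlight" != "xh-highlight".
exact = root.xpath(
    "//li[@class='footable-page-nav']/a[@class='xh-highlight']//@href"
)

# Substring comparison, as in the answer.
loose = root.xpath(
    "//li[@class='footable-page-nav']/a[contains(@class,'xh-highlight')]//@href"
)

# Whole-token match on the page's own class (does not rely on xh-highlight).
token = root.xpath(
    "//li[@class='footable-page-nav']"
    "/a[contains(concat(' ', normalize-space(@class), ' '), ' footable-page-link ')]"
    "/@href"
)

print(exact, loose, token)  # [] ['#'] ['#']
```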

XPath: How do I find a page element which contains another element, using the full text of both?

I have an HTML page which contains the following:
<div class="book-info">
The book is <i>Italicized Title</i> by Author McWriter
</div>
When I view this in Chrome Dev Tools, it looks like:
<div class="book-info">
"The book is "
<i>Italicized Title</i>
" by Author McWriter"
</div>
I need a way to find this single div using XPath.
Constraints:
There are many book-info divs on the page, so I can't just look for a div with that class.
Any part of the text within the book-info div might also appear in another, but the complete text within the div is unique. So I want to match the entire text, if possible.
It is not guaranteed that an <i> will exist within the book-info div. The following could also exist, and I need to be able to find it as well (but my code is working for this case):
<div class="book-info">
"Author McWriter's Legacy"
</div>
I think I can detect whether the div I'm looking for contains an <i> or not, and construct a different XPath expression depending on that.
Things I have tried:
//div[text()=concat("The book is ","Italicized Title"," by Author McWriter")]
//div[text()=concat("The book is ","<i>Italicized Title"</i>," by Author McWriter")]
//div[text()=concat("The book is ",[./i[text()="Italicized Title"]," by Author McWriter")]
//div[concat(text()="The book is ", i[text()="Italicized Title"],text()=" by Author McWriter")]
None of these worked for me. What XPath expression would?
You can use this combination of XPath-1.0 predicates in one expression. It matches both cases:
//div[@class="book-info" and ((i and contains(text()[1],"The book is") and contains(text()[2],"by Author McWriter")) or (not(i) and contains(string(.),"Author McWriter's Legacy")))]
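A sketch to try the answer's expression in Python with lxml, plus an alternative that is not from the answer: comparing normalize-space(.) of the div (its complete text, with the <i> content included) against the full expected sentence. The third div below is a hypothetical decoy that differs only in its italic title; the answer's expression matches it too, while the full-text comparison does not. Passing the sentence as an lxml XPath variable ($text) avoids quoting trouble with the apostrophe.

```python
from lxml import html

# The two layouts from the question, plus a hypothetical decoy that differs
# from the first only in its italic title.
SOURCE = """<div>
<div class="book-info">
The book is <i>Italicized Title</i> by Author McWriter
</div>
<div class="book-info">
Author McWriter's Legacy
</div>
<div class="book-info">
The book is <i>Another Title</i> by Author McWriter
</div>
</div>"""
root = html.fromstring(SOURCE)

# The answer's expression, with @ and a plain apostrophe.
ANSWER_XPATH = (
    '//div[@class="book-info" and ('
    '(i and contains(text()[1],"The book is")'
    ' and contains(text()[2],"by Author McWriter"))'
    " or (not(i) and contains(string(.),\"Author McWriter's Legacy\")))]"
)


def find_book_info(tree, full_text):
    """Return book-info divs whose whitespace-normalised text equals full_text."""
    # string(.) of a div joins its own text with the text of the <i> child,
    # so the same comparison works with or without the <i>.
    return tree.xpath(
        '//div[@class="book-info"][normalize-space(.)=$text]', text=full_text
    )


broad = root.xpath(ANSWER_XPATH)
with_i = find_book_info(root, "The book is Italicized Title by Author McWriter")
without_i = find_book_info(root, "Author McWriter's Legacy")

print(len(broad))                   # 3: the decoy matches as well
print(len(with_i), len(without_i))  # 1 1
```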

How to get number of list element (ul tag) of HTML using Get matching XPath count?

I'm fairly new to XPath queries. I use RF (Robot Framework) with Selenium2Library, and the XPath Helper plugin in Chrome, to test a certain web page. I'm new to HTML/CSS/JavaScript as well.
The web page consists of two ULs (lists) for the left and right sides of the page, and each one has a few LIs, which contain a few divs made up of widgets (JPEG images, etc.).
I need to count these list rows (the number of LIs in each UL). I have already done the same thing for a drop-down menu, counting its elements with no problem (perhaps because it was considered a web element). But now the same "Get Matching Xpath Count" keyword returns almost the whole page HTML source instead of a number, and then fails.
My whole program is based on getting the number of LIs in a UL (of a drop-down menu, page, table, ...), so I wonder what to do now. Here is an example of the HTML code of the page:
<ul class="rqcol" id="col8a580456553ae">
<li class="rqportlet" id="por8a58045655">
<div id="hdrpor8a580" class="rqhdr" onmouseover="RQ.util.showTools(this)" onmouseout="RQ.util.hideTools(this)"> </div> </li>
<li class="rqportlet" id="por8a580456" >
<div id="hdrpor8a581" class="rqhdr" onmouseover="RQ.util.showTools(this)" onmouseout="RQ.util.hideTools(this)"> </div></li>
</ul>
and my code was:
Get Matching Xpath Count | //ul[@id="ccol8a580456553ae"]/li
which gives me some text plus HTML code. I also tried:
Get Length | //ul[@id="ccol8a580456553ae"]
which doesn't give me 2 but a big number.
An XPath 2.0 expression to count the li elements of that specific ul would be:
//ul[@id="col8a580456553ae"]/count(li)
Try this new Chrome extension:
https://chrome.google.com/webstore/detail/relative-xpath-helper/eanaofphbanknlngejejepmfomkjaiic
You've made a typo in the id value (an extra "c" at the beginning); otherwise the XPath is correct:
${count}=    Get Matching Xpath Count    //ul[@id="col8a580456553ae"]/li
By the way, the keyword Get Matching Xpath Count is deprecated in the latest version of SeleniumLibrary in favour of Get Element Count.
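lxml, like browsers, implements XPath 1.0, where count() has to wrap the whole path rather than appear as a step; the 2.0 form //ul[...]/count(li) would be rejected there. A minimal sketch in Python with lxml, using the question's markup with the onmouseover/onmouseout attributes left out:

```python
from lxml import html

# The question's list, with the event-handler attributes omitted.
SOURCE = """<ul class="rqcol" id="col8a580456553ae">
<li class="rqportlet" id="por8a58045655">
<div id="hdrpor8a580" class="rqhdr"> </div> </li>
<li class="rqportlet" id="por8a580456" >
<div id="hdrpor8a581" class="rqhdr"> </div></li>
</ul>"""
root = html.fromstring(SOURCE)

# XPath 1.0: count() wraps the path and returns a float.
count = root.xpath('count(//ul[@id="col8a580456553ae"]/li)')

# With the extra "c" from the question, the path matches nothing.
typo_count = root.xpath('count(//ul[@id="ccol8a580456553ae"]/li)')

print(int(count), int(typo_count))  # 2 0
```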

XPath/Scrapy crawling weirdly formatted pages

I've been playing around with Scrapy and I see that knowledge of XPath is vital to using Scrapy successfully. I have a webpage I'm trying to gather some information from, where the tags are formatted like this:
<div id = "content">
<h1></h1>
<p></p>
<p></p>
<h1></h1>
<p></p>
<p></p>
Now the heading contains a title, the first 'p' contains data1 and the second 'p' contains data2. This seems like a pretty straightforward task, and if this were always the case I would have no problem, e.g. hsx.select('//*[@id="content"]') etc.
The problem is, sometimes there will only be ONE p tag following a header instead of two.
<div id = "content">
<h1></h1>
<p></p> (a)
<h1></h1>
<p></p> (b)
<p></p> (c)
What I would like is that, if a paragraph tag is missing, that information is stored as blank data in my list. Right now the lists store the first heading 1, the first paragraph tag (a), and then the paragraph tag under the second h1 (b).
What it should be doing is storing
title -> h1[0]
data1[0] -> (a)
data2[0] ->[]
I hope that makes sense. I've been looking for a good XPath or Scrapy solution to do this but I can't seem to find one. Any helpful tips would be awesome. Thanks!
Use:
//div[@id='content']
/h1[1]/following-sibling::*
[not(position()>2)][self::p]
This selects at most the two immediate following siblings of the first h1 child of any div whose id attribute has the string value "content" (we know that this must be just one div), and keeps each of them only if it is a p.
If only the first immediate sibling is a p, then the returned node-list contains only one item.
You can check whether the length of the returned node-list is 1 or 2 and use this to control your processing.
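A sketch of the same idea applied relative to each h1, using lxml so it runs standalone (Scrapy's selectors are built on lxml and accept the same XPath 1.0). The heading and paragraph texts are hypothetical placeholders. It assumes every h1 is followed by at least one p, as in the question; if a heading had none, the two-sibling window could reach into the next section.

```python
from lxml import html

# The second layout from the question; the texts are hypothetical placeholders.
SOURCE = """<div id="content">
<h1>Title A</h1>
<p>a</p>
<h1>Title B</h1>
<p>b</p>
<p>c</p>
</div>"""
root = html.fromstring(SOURCE)

rows = []
for h1 in root.xpath("//div[@id='content']/h1"):
    # At most the two siblings right after this <h1>, kept only if they are <p>.
    # Assumes each heading has at least one <p>; otherwise the window could
    # reach the next heading's first paragraph.
    paras = h1.xpath("following-sibling::*[not(position()>2)][self::p]")
    texts = [p.text_content() for p in paras]
    texts += [""] * (2 - len(texts))  # store a missing paragraph as blank data
    rows.append((h1.text_content(), texts[0], texts[1]))

print(rows)  # [('Title A', 'a', ''), ('Title B', 'b', 'c')]
```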
I think you'd want something like this; not 100% though / untested.
//h1/following-sibling::*[2][self::p]/text()|//h1[not(following-sibling::*[2][self::p])]/string('')

Xpath getting node without node child contents

Hey guys, I couldn't get around this. I have HTML structured as follows:
<div class="review-text">
<div id="reviewerprofile">
<div id="revimg"></div>
<div id="reviewr">marc</div>
<div id="revdate">2011-07-06</div>
</div>
this is an awesome review
</div>
What I am trying to get is just the text "this is an awesome review", but every time I query the node I also get the content of its children. I'm currently using something like ".//div[@class='review-text']". How do I get just that text? Thank you very much.
You're almost there! Just add /text() at the end of your XPath to get the text node.
An XPath expression such as //div returns a set of nodes, in this case div elements. These are in effect pointers to the original nodes in the original tree; the nodes are still connected to their parents, children, ancestors, and siblings. If you see the children of the div element and don't want them, that's not the fault of the XPath processor, it's the fault of whatever software is processing the results returned by the XPath expression.
You can get the text that's an immediate child of the div element by using /text() as suggested. However, that assumes that you know exactly what you are expecting to find in the HTML page - if "awesome" were in italics, it would give you something different.
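A sketch in Python with lxml contrasting the two (the outer wrapper div is an assumption, added so that the question's relative .//div path has something to search from). string() of the element includes the reviewer's name, while the /text()[normalize-space()] step keeps only the div's own non-blank text node.

```python
from lxml import html

# The question's markup inside a hypothetical wrapper <div>, so the relative
# path .//div[...] starts searching from above the review.
SOURCE = """<div>
<div class="review-text">
<div id="reviewerprofile">
<div id="revimg"></div>
<div id="reviewr">marc</div>
<div id="revdate">2011-07-06</div>
</div>
this is an awesome review
</div>
</div>"""
root = html.fromstring(SOURCE)

# The string value of the element includes all descendant text, "marc" too.
everything = root.xpath("string(.//div[@class='review-text'])")

# /text() keeps only the div's own text nodes; [normalize-space()] drops any
# whitespace-only one, and the outer normalize-space() trims the newlines.
review = root.xpath(
    "normalize-space(.//div[@class='review-text']/text()[normalize-space()])"
)

print(review)  # this is an awesome review
```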

Resources