Returning a list of <li> WebElements via find_element_by_xpath

So I am using a combination of Selenium and Python 2.7 (and if it matters the browser I am in is Firefox). I am new to XPath but it seems very useful for fetching WebElements.
I have the following HTML file that I am parsing through:
<html>
<head></head>
<body>
..
<div id="childItem">
<ul>
<li class="listItem"><img/><span>text1</span></li>
<li class="listItem"><img/><span>text2</span></li>
...
<li class="listItem"><img/><span>textN</span></li>
</ul>
</div>
</body>
</html>
Now I can use the following code to get a list of all the li elements:
root = element.find_element_by_xpath('..')
child = root.find_element_by_id('childDiv')
list = child.find_elements_by_css_selector('div.childDiv > ul > li.listItem')
I am wondering how I can do this in an XPath statement. I have tried a few statements, but the simplest is:
list = child.find_element_by_xpath('li[@class="listItem"]')
But I always end up getting the error:
selenium.common.exceptions.NoSuchElementException: Message: u'Unable to locate element: {"method":"xpath","selector":"li[@class=\\"listItem\\"]"}';
As I do have a workaround (the first three lines), this is not critical for me, but I would like to know what I am doing wrong.

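You are missing the .// at the start of the XPath:
list = child.find_elements_by_xpath('.//li[@class="listItem"]')
The .// means to search anywhere within the child element. Without it, li[...] only matches li elements that are direct children of child, and here the li elements sit one level deeper, inside the ul. Also note the plural find_elements_by_xpath: the singular find_element_by_xpath returns only the first match rather than a list.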
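The difference between the two paths can be reproduced without a browser. The following is a minimal sketch that uses Python's standard-library ElementTree as a stand-in for Selenium (an assumption made only to keep the example self-contained); the markup is a trimmed copy of the question's HTML, using the id that appears in that HTML.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the page from the question.
PAGE = """
<html><body>
  <div id="childItem">
    <ul>
      <li class="listItem"><img/><span>text1</span></li>
      <li class="listItem"><img/><span>text2</span></li>
      <li class="listItem"><img/><span>textN</span></li>
    </ul>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE)
child = root.find(".//div[@id='childItem']")

# Without './/' only direct children of the div are tested, and the
# div's only direct child is the <ul>, so nothing matches.
direct = child.findall("li[@class='listItem']")

# './/' searches every descendant of the context node.
nested = child.findall(".//li[@class='listItem']")

texts = [li.find("span").text for li in nested]
print(len(direct), texts)
```

The same reasoning applies to the Selenium call: the context element is the div, so the path has to descend through the ul before it can reach the li elements.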

Related

XPath valid in Firefox but not in Chrome

I am trying to find a menu element via XPath in the JupyterLab UI. The following is an extract of the list of elements in the menu I am interested in, and should be a good minimal example of my problem:
<li tabindex="0" aria-disabled="true" role="menuitem" class="lm-Menu-item p-Menu-item lm-mod-disabled p-mod-disabled lm-mod-hidden p-mod-hidden" data-type="command" data-command="filemenu:logout">
<div class="f1vya9e0 lm-Menu-itemIcon p-Menu-itemIcon jp-Icon"></div>
<div class="lm-Menu-itemLabel p-Menu-itemLabel">Log Out</div>
<div class="lm-Menu-itemShortcut p-Menu-itemShortcut"></div>
<div class="lm-Menu-itemSubmenuIcon p-Menu-itemSubmenuIcon"></div>
</li>
<li tabindex="0" role="menuitem" class="lm-Menu-item p-Menu-item" data-type="command" data-command="hub:logout"><div class="f1vya9e0 lm-Menu-itemIcon p-Menu-itemIcon jp-Icon">
<div class="f1vya9e0 lm-Menu-itemIcon p-Menu-itemIcon jp-Icon"></div>
<div class="lm-Menu-itemLabel p-Menu-itemLabel">Log Out</div>
<div class="lm-Menu-itemShortcut p-Menu-itemShortcut"></div>
<div class="lm-Menu-itemSubmenuIcon p-Menu-itemSubmenuIcon"></div>
</li>
As you can see, both <li> items contain a <div> with the text Log Out, which is my main problem, as I am trying to write a general XPath expression that can work for any menu item. What I am currently trying to use is:
//div[contains(@class, 'p-Menu-itemLabel')][text() = '${item}']
Where ${item} can be any menu item, as all <li> items will have a similar div with text in them. The problem arises with the Log Out item, which is the only one that appears twice. In order to handle this special case, I have thought of using
//div[contains(@class, 'p-Menu-itemLabel')][text() = 'Log Out']/..[not(contains(@class,'p-mod-hidden'))]
Since one of the two <li> items will not contain that specific class (i.e., the currently active Log Out element).
This XPath works fine in Firefox and finds the element I am looking for every time; however, Chrome complains that it is not a valid XPath expression. Somehow this reduced version:
//div[contains(@class, 'p-Menu-itemLabel')][text() = 'Log Out']/..
works in Chrome, but any time I try to use an attribute selector on the parent element (i.e. /..[something]) it fails to recognize it as a valid XPath.
Does anyone have any idea why? And what can I do to make Chrome recognize it as a valid XPath?
It seems that Chrome doesn't like applying a predicate directly to the abbreviated .. step. Strictly speaking, the XPath 1.0 grammar does not allow a predicate after the abbreviated steps . and .., so Chrome is following the specification here and Firefox is being lenient.
But you can modify it to use the long form: parent::*
//div[contains(@class, 'p-Menu-itemLabel')][text() = 'Log Out']/parent::*[not(contains(@class,'p-mod-hidden'))]
Or apply the self::* axis and then apply the predicate:
//div[contains(@class, 'p-Menu-itemLabel')][text() = 'Log Out']/../self::*[not(contains(@class,'p-mod-hidden'))]
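Both rewritten expressions can be checked outside the browser. This is a rough sketch that assumes the third-party lxml package as the XPath engine (not something the question uses) and a trimmed, hypothetical copy of the menu markup that keeps only the label divs.

```python
from lxml import etree

# Trimmed menu: one hidden and one visible "Log Out" entry.
MENU = """
<ul>
  <li class="lm-Menu-item p-Menu-item lm-mod-hidden p-mod-hidden" data-command="filemenu:logout">
    <div class="lm-Menu-itemLabel p-Menu-itemLabel">Log Out</div>
  </li>
  <li class="lm-Menu-item p-Menu-item" data-command="hub:logout">
    <div class="lm-Menu-itemLabel p-Menu-itemLabel">Log Out</div>
  </li>
</ul>
"""

tree = etree.fromstring(MENU)

LABEL = "//div[contains(@class, 'p-Menu-itemLabel')][text() = 'Log Out']"
VISIBLE = "[not(contains(@class, 'p-mod-hidden'))]"

# Long-form parent axis carrying the predicate.
via_parent = tree.xpath(LABEL + "/parent::*" + VISIBLE)

# Abbreviated '..' followed by a self::* step that carries the predicate.
via_self = tree.xpath(LABEL + "/../self::*" + VISIBLE)

commands_parent = [li.get("data-command") for li in via_parent]
commands_self = [li.get("data-command") for li in via_self]
print(commands_parent, commands_self)
```

In both cases the predicate is attached to a full step (axis plus node test), which is the form the XPath 1.0 grammar allows.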

XPath node that doesn't contain a child

I'm trying to access a certain element using XPath, but I just can't seem to get it, and I don't quite understand why.
<ul class="test1" id="content">
<li class="list">
<p>Insert random text here</p>
<div class="author">
</div>
</li>
<li class="list">
<p>I need this text here</p>
</li>
</ul>
Basically, the text I want is the second one, and I want/need to use something similar to p[not(div)] to retrieve it.
I have tried the methods from the following link, but to no avail (xpath find node that does not contain child).
Here is how I tried accessing the text:
ul[contains(@id,"content")]//p[not(.//div)]/text()
If you have any possible answers, thank you!
The HTML snippet posted in the question shows that neither p element contains a div, so the expression //p[not(.//div)] would match both. The first p element is a sibling of the div (both share the same parent element li), not its parent or ancestor. The following XPath expression matches the text nodes of the second p and not those of the first:
//ul[contains(@id,"content")]/li[not(div)]/p/text()
Brief explanation:
//ul[contains(@id,"content")]: find ul elements whose id attribute value contains the text "content"
/li[not(div)]: from such ul, find child li elements that don't have a child div element. This matches only the last li in the example HTML
/p/text(): from such li, find child p elements and then return the child text nodes from such p
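To see the two expressions side by side, here is a small sketch that assumes the third-party lxml package (the question does not say which parser is in use) and the HTML from the question.

```python
from lxml import etree

SNIPPET = """
<ul class="test1" id="content">
  <li class="list">
    <p>Insert random text here</p>
    <div class="author"></div>
  </li>
  <li class="list">
    <p>I need this text here</p>
  </li>
</ul>
"""

tree = etree.fromstring(SNIPPET)

# Predicate on p: neither p has a div descendant, so both match.
both = tree.xpath('//ul[contains(@id,"content")]//p[not(.//div)]/text()')

# Predicate on li: only the li without a div child survives.
wanted = tree.xpath('//ul[contains(@id,"content")]/li[not(div)]/p/text()')

print(both)
print(wanted)
```

The predicate has to sit on the element that actually owns the div as a child, which here is the li rather than the p.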

Obtain an xpath element containing another element with an specific class

Hello I have this HTML:
<div class="_3Vhpd"><span>Your commerce Data</span>
<a class="n3G0C" href='http://www.webadress.......'><span>Some Text</span></a>
</div>
I tried to obtain the tag as follows:
parser.xpath('//div[contains(@class,"_3Vhpd")]//following-sibling::*[a[@class="n3G0C"]]/@href')
but I received an empty list ('[]'). Maybe that is because the <a> is not directly after the div but after a span...
First, your sample HTML doesn't have a class="n3G0C", but assuming you fix it, this XPath expression should work:
//div[contains(@class,"_3Vhpd")]//following-sibling::a/@href
Output:
http://www.webadress.......

Access two elements simultaneously in Nokogiri

I have some weirdly formatted HTML files which I have to parse.
This is my Ruby code:
require 'nokogiri'
File.open('2.html', 'r:utf-8') do |f|
  @parsed = Nokogiri::HTML(f, nil, 'windows-1251')
  puts @parsed.xpath('//span[@id="f5"]//div[@id="f5"]').inner_text
end
I want to parse a file containing:
<span style="position:absolute;top:156pt;left:24pt" id=f6>36.4.1.1. варенье, джемы, конфитюры, сиропы</span>
<div style="position:absolute;top:167.6pt;left:24.7pt;width:709.0;height:31.5;padding-top:23.8;font:0pt Arial;border-width:1.4; border-style:solid;border-color:#000000;"><table></table></div>
<span style="position:absolute;top:171pt;left:28pt" id=f5>003874</span>
<div style="position:absolute;top:171pt;left:99pt" id=f5>ВАРЕНЬЕ "ЭКОПРОДУКТ" ЧЕРНАЯ СМОРОДИНА</div>
<div style="position:absolute;top:180pt;left:99pt" id=f5>325гр. </div>
<div style="position:absolute;top:167.6pt;left:95.8pt;width:2.8;height:31.5;padding-top:23.8;font:0pt Arial;border-width:0 0 0 1.4; border-style:solid;border-color:#000000;"><table></table></div>
I need to select every <div> or <span> with id "f5". With my current XPath selector that's not possible. If I remove //span[@id="f5"], for example, then the divs are selected correctly. I can output them one after another:
puts @parsed.xpath('//div[@id="f5"]').inner_text
puts @parsed.xpath('//span[@id="f5"]').inner_text
but then the order would be a complete mess. Each parsed span has to stay right next to its divs, in the same order as in the original file.
Am I missing some basics? I haven't found anything on the web regarding parallel parsing of two elements. Most posts are concerned with parsing two classes of a div for example, but not two different elements at a time.
If I understand this correctly, you can use the following XPath:
//*[self::div or self::span][@id="f5"]
The XPath above finds every element named either div or span whose id attribute value equals "f5", in document order.
Output:
<span id="f5" style="position:absolute;top:171pt;left:28pt">003874</span>
<div id="f5" style="position:absolute;top:171pt;left:99pt">ВАРЕНЬЕ "ЭКОПРОДУКТ" ЧЕРНАЯ СМОРОДИНА</div>
<div id="f5" style="position:absolute;top:180pt;left:99pt">325гр.</div>
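The same expression can be tried in Python. The sketch below assumes the third-party lxml package rather than Nokogiri, and it replaces the question's content with short placeholder strings, so the element texts are hypothetical.

```python
from lxml import etree

# Simplified stand-in for the page: placeholder texts, quoted ids,
# and a <body> wrapper so the fragment has a single root.
DOC = """
<body>
  <span id="f6">section heading</span>
  <div><table></table></div>
  <span id="f5">003874</span>
  <div id="f5">product name</div>
  <div id="f5">325g</div>
  <div><table></table></div>
</body>
"""

tree = etree.fromstring(DOC)

# One node test for both element names, then the shared id predicate.
matches = tree.xpath('//*[self::div or self::span][@id="f5"]')

tags = [el.tag for el in matches]
texts = [el.text for el in matches]
print(tags, texts)
```

Because a single expression selects both element types, the results come back interleaved as they appear in the file, which is what running two separate queries could not give.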

Extracting contents from a list split across different divs

Consider the following html
<div id="relevantID">
<div class="column left">
<h1> Section-Header-1 </h1>
<ul>
<li>item1a</li>
<li>item1b</li>
<li>item1c</li>
<li>item1d</li>
</ul>
</div>
<div class="column">
<ul> <!-- Pay attention here -->
<li>item1e</li>
<li>item1f</li>
</ul>
<h1> Section-Header-2 </h1>
<ul>
<li>item2a</li>
<li>item2b</li>
<li>item2c</li>
<li>item2d</li>
</ul>
</div>
<div class="column right">
<h1> Section-Header-3 </h1>
<ul>
<li>item3a</li>
<li>item3b</li>
<li>item3c</li>
<li>item3d</li>
</ul>
</div>
</div>
My objective is to extract the items for each section header. However, inconveniently, the designer of the webpage decided to break up the data into three columns, adding an additional div (with classes column, right, etc.).
My current method of extraction uses XPath.
For the section headers, I use this XPath (get all h1 elements within a div with the given id):
//div[@id="relevantID"]//h1
The above returns a list of h1 elements. Looping over them, I apply an additional selector to each matched h1, which looks up the following ul nodes and retrieves all their li nodes:
following-sibling::ul//li
But thanks to the designer's aesthetics, I am failing in the one particular case I've marked in the HTML file, where the items are split across two different column divs.
I can probably bypass this problem by stripping out the column divs entirely, but I don't think modifying the HTML to make a selector match is considered good practice (I haven't seen it needed anywhere in the examples I've browsed so far).
What would be a good way to extract data that has been formatted like this? Full solutions are not necessary; hints/tips will do. Thanks!
The columns do frustrate the use of following-sibling:: and preceding-sibling::, but you could instead use the following:: and preceding:: axes if the columns at least keep the list items in proper document order. (That is indeed the case in your example.)
The following XPath will select all li items, regardless of column, occurring after the "Section-Header-1" h1 and before the "Section-Header-2" h1 header in document order:
//div[@id='relevantID']//li[normalize-space(preceding::h1) = 'Section-Header-1'
and normalize-space(following::h1) = 'Section-Header-2']
Specifically, it selects the following items from your example HTML:
<li>item1a</li>
<li>item1b</li>
<li>item1c</li>
<li>item1d</li>
<li>item1e</li>
<li>item1f</li>
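To apply the same idea to every header rather than just the first, one option is to compare each li against its nearest preceding h1, written preceding::h1[1]. The position matters because normalize-space() on a node-set uses the first node in document order, which for preceding::h1 is always the very first header. The following is only a sketch: it assumes the third-party lxml package, header texts that are unique, and a condensed copy of the question's markup; $title is an XPath variable that lxml fills from a keyword argument.

```python
from lxml import etree

# Condensed copy of the question's markup.
PAGE = """
<div id="relevantID">
  <div class="column left">
    <h1> Section-Header-1 </h1>
    <ul><li>item1a</li><li>item1b</li><li>item1c</li><li>item1d</li></ul>
  </div>
  <div class="column">
    <ul><li>item1e</li><li>item1f</li></ul>
    <h1> Section-Header-2 </h1>
    <ul><li>item2a</li><li>item2b</li><li>item2c</li><li>item2d</li></ul>
  </div>
  <div class="column right">
    <h1> Section-Header-3 </h1>
    <ul><li>item3a</li><li>item3b</li><li>item3c</li><li>item3d</li></ul>
  </div>
</div>
"""

tree = etree.fromstring(PAGE)

# An li belongs to a section when its nearest preceding h1 has that title.
ITEMS = ('//div[@id="relevantID"]'
         '//li[normalize-space(preceding::h1[1]) = $title]/text()')

sections = {}
for h1 in tree.xpath('//div[@id="relevantID"]//h1'):
    title = h1.text.strip()
    sections[title] = tree.xpath(ITEMS, title=title)

for title, items in sections.items():
    print(title, items)
```

With this grouping, item1e and item1f are attributed to Section-Header-1 even though they live in the second column, since the preceding axis ignores the column divs.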
You can combine following-sibling and preceding-sibling with the union operator | to also pick up li elements that sit before the h1 in the same div. As an example, for the second h1:
((//div[@id="relevantID"]//h1)[2]/preceding-sibling::ul//li) |
((//div[@id="relevantID"]//h1)[2]/following-sibling::ul//li)
Result:
<li>item1e</li>
<li>item1f</li>
<li>item2a</li>
<li>item2b</li>
<li>item2c</li>
<li>item2d</li>
As you're already selecting all h1 elements using //div[@id="relevantID"]//h1 and, as a second step, retrieving all li items for each h1 using following-sibling::ul//li, you could change that second step to following-sibling::ul//li | preceding-sibling::ul//li.

Resources