I would like to show an example.
This is how the page looks:
<a class="aclass">
<div class="divclass"></div>
<div id="innerclass">
<span class="spanclass">Hello</span>
</div>
</a>
<a class="aclass">
<div class="divclass"></div>
<div id="innerclass">
<span class="spanclass">Pick Delivery Location</span>
</div>
</a>
I want to select anchor tags that have a child (direct or non-direct) span that has the text 'Hello'.
Right now, I do something like this:
//a[@class='aclass'][div/span[text() = 'Hello']]
I want to be able to select without having to select direct children (div in this case), like this:
//a[@class='aclass'][//span[text() = 'Hello']]
However, the second one finds all the anchor tags with the class 'aclass' rather than the one with the span with 'Hello' text.
I hope I worded my question clearly. Please feel free to edit if necessary.
In your attempt, // goes back to the root of the document. Effectively you are saying "give me the a elements for which there is a matching span anywhere in the document", which is why you get them all.
What you need is the descendant axis:
//a[@class='aclass' and descendant::span[text() = 'Hello']]
Note I have joined the conditions with and, but two separate conditions would also work.
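To make the difference concrete, here is a minimal sketch using only Python's standard library. ElementTree does not support the descendant:: axis, so the relative search is written as .//span instead; the <root> wrapper and the variable names are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# The two anchors from the question, wrapped in a hypothetical <root>
# element so the snippet parses as a single XML document.
PAGE = """
<root>
  <a class="aclass">
    <div class="divclass"></div>
    <div id="innerclass"><span class="spanclass">Hello</span></div>
  </a>
  <a class="aclass">
    <div class="divclass"></div>
    <div id="innerclass"><span class="spanclass">Pick Delivery Location</span></div>
  </a>
</root>
"""

root = ET.fromstring(PAGE)
anchors = root.findall(".//a[@class='aclass']")

# Document-wide check: "is there a matching span anywhere in the document?"
# The answer is the same for every anchor, so nothing gets filtered out.
span_in_document = root.find(".//span[.='Hello']") is not None
absolute = [a for a in anchors if span_in_document]

# Relative check: "is there a matching span below THIS anchor?"
# This mirrors descendant::span[text() = 'Hello'].
relative = [a for a in anchors if a.find(".//span[.='Hello']") is not None]

print(len(absolute), len(relative))
```

The document-wide check keeps both anchors, while the relative check keeps only the anchor that actually contains the span.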
I am trying to find a menu element via XPath in the JupyterLab UI; The following is an extract of the list of elements in the menu I am interested in, and should be a good minimal example of my problem:
<li tabindex="0" aria-disabled="true" role="menuitem" class="lm-Menu-item p-Menu-item lm-mod-disabled p-mod-disabled lm-mod-hidden p-mod-hidden" data-type="command" data-command="filemenu:logout">
<div class="f1vya9e0 lm-Menu-itemIcon p-Menu-itemIcon jp-Icon"></div>
<div class="lm-Menu-itemLabel p-Menu-itemLabel">Log Out</div>
<div class="lm-Menu-itemShortcut p-Menu-itemShortcut"></div>
<div class="lm-Menu-itemSubmenuIcon p-Menu-itemSubmenuIcon"></div>
</li>
<li tabindex="0" role="menuitem" class="lm-Menu-item p-Menu-item" data-type="command" data-command="hub:logout">
<div class="f1vya9e0 lm-Menu-itemIcon p-Menu-itemIcon jp-Icon"></div>
<div class="lm-Menu-itemLabel p-Menu-itemLabel">Log Out</div>
<div class="lm-Menu-itemShortcut p-Menu-itemShortcut"></div>
<div class="lm-Menu-itemSubmenuIcon p-Menu-itemSubmenuIcon"></div>
</li>
As you can see, both <li> items contain a <div> with the text Log Out, which is my main problem, as I am trying to write a general Xpath expression that can work for any Menu item. What I am currently trying to use is:
//div[contains(@class, 'p-Menu-itemLabel')][text() = '${item}']
Where ${item} can be any menu item, as all <li> items will have a similar div with text in them. The problem arises with the Log Out item, which is the only one that appears twice. In order to handle this special case, I have thought of using
//div[contains(@class, 'p-Menu-itemLabel')][text() = 'Log Out']/..[not(contains(@class,'p-mod-hidden'))]
Since either one of the two <li> items will not contain that specific class (i.e., the currently active Log Out element).
This XPath works fine in Firefox and finds the element I am looking for every time; however, Chrome complains that it is not a valid XPath expression. Somehow this reduced version:
//div[contains(@class, 'p-Menu-itemLabel')][text() = 'Log Out']/..
works in Chrome, but any time I try to use an attribute selector on the parent element (i.e. /..[something]) it fails to recognize it as a valid XPath.
Does anyone have any idea of why? And what can I do to make Chrome recognize it as a valid XPath?
It seems that Chrome doesn't like applying a predicate directly to the abbreviated parent step (..). That matches the XPath 1.0 grammar, in which the abbreviated steps . and .. cannot be followed by a predicate, so Firefox is apparently just being lenient.
But you can switch to the long form, parent::*
//div[contains(@class, 'p-Menu-itemLabel')][text() = 'Log Out']/parent::*[not(contains(@class,'p-mod-hidden'))]
Or add a self::* step and apply the predicate to that:
//div[contains(@class, 'p-Menu-itemLabel')][text() = 'Log Out']/../self::*[not(contains(@class,'p-mod-hidden'))]
<ul>
<li class="xyz">
<div class="divClass">
<span class="ContentItem---status---dL0iS">
<span>Success</span>
</span>
<p class="ContentItem---title---37IqA">
<span>Test Check</span>
: Please display the text
</p>
</div>
</li>
<li class="xyz">
<div class="divClass">
<span class="ContentItem---status---dL0iS">
<span>Not COMPLETED</span>
</span>
<p class="ContentItem---title---37IqA">
<span>Knowledge</span> A Team
</p>
</div>
</li>
.... and so on
</ul>
This is my HTML structure. The text Test Check is inside a span, and : Please display the text is inside the enclosing paragraph tag.
I need to check whether the structure contains the complete text Test Check: Please display the text.
I have tried several approaches and couldn't work out the complete path. This is what I tried:
//span[text()='Test Check']/p[text()=': Please display the text']
Can you please provide me the xpath for this?
I think there is one possible solution to identify within the given html text and retrieve. I hope this solves your problem.
import re

from bs4 import BeautifulSoup


def get_tag_if_present(html_text):
    soup_obj = BeautifulSoup(html_text, "html.parser")
    # Collect every text node that contains "Test Check"
    test_check = soup_obj.find_all(string=re.compile(r"Test Check"))
    result_val = "NOT FOUND"
    if test_check:
        for each_value in test_check:
            parent_tag_span = each_value.parent
            if parent_tag_span.name == "span":
                parent_p_tag = parent_tag_span.parent
                # The enclosing <p> must also hold the rest of the sentence
                if parent_p_tag.name == "p" and "Please display the text" in parent_p_tag.get_text():
                    result_val = parent_p_tag
                    break
    return result_val
The returned result_val is the p tag that contains the full text. If no such element exists, the function returns the string NOT FOUND.
I have assumed that the two pieces of text live in a "span" tag nested inside a "p" tag; feel free to remove those conditions to match the text anywhere in the given HTML.
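The XPath attempt in the question fails because the p is the parent of the span, not its child. An expression along the lines of //p[span[text()='Test Check'] and contains(., ': Please display the text')] should express the intended check. The same idea can be sketched with Python's standard library only; the helper names below are hypothetical, and the sample is a trimmed copy of the HTML from the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the structure from the question.
SNIPPET = """
<ul>
  <li class="xyz">
    <div class="divClass">
      <p class="ContentItem---title---37IqA">
        <span>Test Check</span>
        : Please display the text
      </p>
    </div>
  </li>
  <li class="xyz">
    <div class="divClass">
      <p class="ContentItem---title---37IqA">
        <span>Knowledge</span> A Team
      </p>
    </div>
  </li>
</ul>
"""


def normalized_text(element):
    """Join all text below the element and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


def find_paragraph(root, span_text, rest):
    """Return the first <p> whose child <span> holds span_text and whose
    full text also contains rest, or None if there is no such <p>."""
    for p in root.iter("p"):
        span = p.find("span")
        if span is not None and span.text == span_text and rest in normalized_text(p):
            return p
    return None


root = ET.fromstring(SNIPPET)
match = find_paragraph(root, "Test Check", ": Please display the text")
print(normalized_text(match) if match is not None else "NOT FOUND")
```

Because the span and the rest of the sentence are separated by a line break in the markup, the normalized text has a space before the colon, so a comparison against the exact string "Test Check: Please display the text" would need to account for that.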
I am newbie here. Please advise. How to select checkbox in my case?
<ul class="phrases-list" style="">
<li>
<input type="checkbox" class="select-phrase">
<span class="prase-title"> Dog - Wikipedia, the free encyclopedia </span>
(en.wikipedia.org)
<div class="prase-desc hidden">The domestic dog (Canis lupus familiaris or Canis familiaris) is a domesticated...</div>
</li>
The following doesn't work for me:
When /I check box "([^\"]+)"$/ do |label|
page.check(label)
end
step: And I check box "Dog - Wikipedia, the free encyclopedia"
If you can change the html, wrap the input and span in a label element
<ul class="phrases-list" style="">
<li>
<label>
<input type="checkbox" class="select-phrase">
<span class="prase-title"> Dog - Wikipedia, the free encyclopedia </span>
</label>
(en.wikipedia.org)
<div class="prase-desc hidden">The domestic dog (Canis lupus familiaris or Canis familiaris) is a domesticated...</div>
</li>
which has the added benefit of clicks on the "Dog - Wikipedia ..." text triggering the checkbox too. With that change your step should work as written. If you can't modify the html then things get more difficult.
Something like
find('span', text: label).find(:xpath, './preceding-sibling::input').set(true)
should work, although I'm curious how you're using these checkboxes from JS with nothing tying them to any specific value
Let's assume that you are prevented from changing the HTML. In this case, it would probably be easiest to query for the element via XPath. For example:
# Here's the XPath query
q = "//span[contains(text(), 'Dog - Wikipedia')]/preceding-sibling::input"
# Use the query to find the checkbox. Then, check the checkbox.
page.find(:xpath, q).set(true)
Okay - it's not as bad as it looks! Let's analyze this XPath so we can understand what it's doing:
//span
This first part says: search the entire HTML document and find all "span" elements. Of course, there are probably a LOT of "span" elements in the HTML document, so we'll need to restrict this:
//span[contains(text(), 'Dog - Wikipedia')]
Now we're only searching for the "span" elements that contain the text "Dog - Wikipedia". Presumably, this text will uniquely identify the desired "span" element on the page (if not, then just search for more of the text).
At this point, we have the "span" element that is adjacent to the desired "input" element. So, we can query for the "input" element using the "preceding-sibling::" XPath Axis:
//span[contains(text(), 'Dog - Wikipedia')]/preceding-sibling::input
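For readers who want to see what the preceding-sibling:: step does outside of Capybara, here is a rough standard-library Python sketch. ElementTree has no sibling axes, so the walk is done by hand. The sample is an XML-friendly copy of the list (self-closing input tags), the second li is a made-up entry added for contrast, and checkbox_before_span is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# XML-friendly copy of the list; the "Cat" entry is hypothetical.
PHRASES = """
<ul class="phrases-list">
  <li>
    <input type="checkbox" class="select-phrase" />
    <span class="prase-title"> Dog - Wikipedia, the free encyclopedia </span>
    (en.wikipedia.org)
  </li>
  <li>
    <input type="checkbox" class="select-phrase" />
    <span class="prase-title"> Cat - Wikipedia, the free encyclopedia </span>
  </li>
</ul>
"""


def checkbox_before_span(root, needle):
    """Find a span whose text contains needle, then return the first
    input element among the siblings that come before it."""
    for parent in root.iter():
        siblings = list(parent)
        for index, child in enumerate(siblings):
            if child.tag == "span" and needle in (child.text or ""):
                # Everything before `index` is a preceding sibling of the span.
                for sibling in siblings[:index]:
                    if sibling.tag == "input":
                        return sibling
    return None


root = ET.fromstring(PHRASES)
dog_box = checkbox_before_span(root, "Dog - Wikipedia")
print(dog_box.attrib if dog_box is not None else None)
```

Unlike the XPath, which selects every preceding input sibling, this sketch stops at the first one; with a single checkbox per list item the result is the same.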
Let's say I have a simple page that has fewer IDs than I'd like for testing:
<div class="__panel_body">
<div class="__panel_header">Real Estate Rating</div>
<div class="__panel_body">
<div class="__panel_header">Property Rating Info</div>
<a class="icon.edit"></a>
<a class="icon.edit"></a>
</div>
<div class="__panel_body">
<div class="__panel_header">General Risks</div>
<a class="icon.edit"></a>
<a class="icon.edit"></a>
</div>
<div class="__panel_body">
<div class="__panel_header">Amenities</div>
<a class="icon.edit"></a>
<a class="icon.edit"></a>
</div>
</div>
I'm using Jeff Morgan's Page Object gem and I want to make accessors for the edit links in any given section.
The challenge is that the panel headers differentiate what body I want to choose. Then I need to access the parent and get all links with class "icon.edit". Assume I can't change the HTML to solve this.
Here's a start
module RealEstateRatingPageFields
div(:general_risks_section, ....)
def general_risks_edit_links
general_risks_section_element.links(class: "icon.edit")
end
end
How do I get the general_risks_section accessor to work, though?
I want that to represent the parent div to the panel header with text 'General Risks'...
There are a number of ways to get the general risk section.
Using a Block
The accessors can take a block where you can more programmatically describe how to locate the element. This allows you to locate a distinguishing element and then traverse the DOM to the element you actually want. In this case, you can locate the header with the matching text and navigate to its parent.
div(:general_risks_section) { div_element(class: '__panel_header', text: 'General Risks').parent }
Using XPath
While harder to read and write, you could also use an XPath locator. The concept and thought process is the same as using the block. The only benefit is that it reduces the number of element calls, which slightly improves performance.
div(:general_risks_section, xpath: './/div[@class="__panel_body"][./div[@class="__panel_header" and text() = "General Risks"]]')
The XPath is saying:
.//div # Find a div element that
[@class="__panel_body"] # Has the class "__panel_body" and
[./div[ # Contains a div element that
@class="__panel_header" and # Has the class "__panel_header" and
text() = "General Risks" # Has the text "General Risks"
]]
Using the Body Text
Given the HTML, you could also just locate the section directly based on its text.
div(:general_risks_section, class: '__panel_body', text: 'General Risks')
Note that this assumes that the HTML given was not simplified. If there are actually other text nodes, this probably would not be the best option.
Consider the following html
<div id="relevantID">
<div class="column left">
<h1> Section-Header-1 </h1>
<ul>
<li>item1a</li>
<li>item1b</li>
<li>item1c</li>
<li>item1d</li>
</ul>
</div>
<div class="column">
<ul> <!-- Pay attention here -->
<li>item1e</li>
<li>item1f</li>
</ul>
<h1> Section-Header-2 </h1>
<ul>
<li>item2a</li>
<li>item2b</li>
<li>item2c</li>
<li>item2d</li>
</ul>
</div>
<div class="column right">
<h1> Section-Header-3 </h1>
<ul>
<li>item3a</li>
<li>item3b</li>
<li>item3c</li>
<li>item3d</li>
</ul>
</div>
</div>
My objective is to extract the items for each Section headers. However, inconveniently the designer of the webpage decided to break up the data into three columns, adding an additional div (with classes column right etc).
My current method of extraction was using the xpath
For section headers, I use this XPath (get all h1 elements within a div with the given id):
//div[@id="relevantID"]//h1
The above returns a list of h1 elements. Looping over them, I apply an additional selector to each matched h1 element: look up the next ul node and retrieve all its li nodes.
following-sibling::ul//li
But thanks to the designer's aesthetics, I am failing in the one particular case I've marked in the HTML file. Where the items are split across two different column divs.
I can probably bypass this problem by stripping out the column divs entirely, but I don't think modifying the html to make a selector match is considered good (I haven't seen it needed anywhere in the examples I've browsed so far).
What would be a good way to extract data that has been formatted like this? Full solutions are not necessary, hints/tips will do. Thanks!
The columns do frustrate use of following-sibling:: and preceding-sibling::, but you could instead use the following:: and preceding:: axes if the columns at least keep the list items in proper document order. (That is indeed the case in your example.)
The following XPath will select all li items, regardless of column, occurring after the "Section-Header-1" h1 and before the "Section-Header-2" h1 header in document order:
//div[@id='relevantID']//li[normalize-space(preceding::h1) = 'Section-Header-1'
and normalize-space(following::h1) = 'Section-Header-2']
Specifically, it selects the following items from your example HTML:
<li>item1a</li>
<li>item1b</li>
<li>item1c</li>
<li>item1d</li>
<li>item1e</li>
<li>item1f</li>
You can combine following-sibling and preceding-sibling with the union operator | to also pick up li elements that sit in the same div before the h1. As an example, for the second h1:
((//div[@id="relevantID"]//h1)[2]/preceding-sibling::ul//li) |
((//div[@id="relevantID"]//h1)[2]/following-sibling::ul//li)
Result:
<li>item1e</li>
<li>item1f</li>
<li>item2a</li>
<li>item2b</li>
<li>item2c</li>
<li>item2d</li>
As you're already selecting all h1 using //div[@id="relevantID"]//h1 and retrieving all li items for each h1 using as a second step following-sibling::ul//li, you could combine this to following-sibling::ul//li | preceding-sibling::ul//li.
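If the extraction happens in code rather than in a single XPath, the document-order idea behind the following:: and preceding:: axes can be sketched with Python's standard library. This is only an illustration under the assumption that every li belongs to the most recent h1 before it in document order; items_by_header is a hypothetical helper, and the sample is a shortened copy of the HTML from the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the three-column layout from the question.
COLUMNS = """
<div id="relevantID">
  <div class="column left">
    <h1> Section-Header-1 </h1>
    <ul><li>item1a</li><li>item1b</li></ul>
  </div>
  <div class="column">
    <ul><li>item1e</li><li>item1f</li></ul>
    <h1> Section-Header-2 </h1>
    <ul><li>item2a</li><li>item2b</li></ul>
  </div>
  <div class="column right">
    <h1> Section-Header-3 </h1>
    <ul><li>item3a</li></ul>
  </div>
</div>
"""


def items_by_header(root):
    """Walk the tree in document order and file each <li> under the
    most recently seen <h1>, ignoring the column divs entirely."""
    groups = {}
    current = None
    for element in root.iter():
        if element.tag == "h1":
            current = (element.text or "").strip()
            groups[current] = []
        elif element.tag == "li" and current is not None:
            groups[current].append(element.text)
    return groups


sections = items_by_header(ET.fromstring(COLUMNS))
for header, items in sections.items():
    print(header, items)
```

One caveat if you generalize the pure-XPath version instead: in XPath 1.0, a node-set passed to normalize-space() is reduced to its first node in document order, so matching later sections would probably need the nearest header, preceding::h1[1], rather than preceding::h1.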