how to select 2nd set of <ol> list with xpath - xpath

I'm trying to scrape the tips sections of these exercises but a lot of the pages are different resulting in a blank field.
The only thing they all have in common is that the tips are always in the 2nd oldered list. The 1st ordered list is the instructions. 2nd ordered list are the tips.
Here are some of the xpath that I have tried:
//ol (this selects both ordered lists)
//ol[2] (this doesn't work at all for some reason)
//h3[contains(text(),'​Exercise Tips:')]/following::ol (some of the pages it didn't pick up tips section)
//DIV[#class="field field-name-body field-type-text-with-summary field-label-hidden"]/DIV[1]/DIV[1]/OL[2] (again some of the pages it returned blank)
Link to the some of the exercises that the page are different:
https://www.muscleandstrength.com/exercises/one-arm-kettlebell-floor-press
https://www.muscleandstrength.com/exercises/overhead-tricep-extension.html

I would try //div[contains(#class, "exercise-tips")]//li/text().
Based on your example xpaths, you could use //div[contains(#class, "exercise-tips")]//ol if you truly want to select the ol element and not the text of the tips.
//ol[2] doesn't select any nodes because both ol nodes are the first ol child of their respective parents, not the second. (//ol)[2] does work, however. See https://stackoverflow.com/a/52166328/5225301 for more details.

Related

Trying to exclude a portion of an xPath

I have looked through several posts about this, but have failed to apply the principles used to get the result I desire, so I'm going to just post my specific problem.
I am building a Google Sheet that enables the user to pull up Bible verses.
I have it all working, however I am running into an issue with a hidden element being pulled into my text().
FUNCTION:
=IMPORTXML("http://www.biblestudytools.com/ESV/Numbers/5-3.html",
"//*[#class='scripture']//span[2]//text()")
RESULT: You shall put out both male and female, putting them outside the camp, that they may not defile their camp, 1in the midst of which I dwell."
You can see the "1" that is showing up before the word "in"
I have found the xPath that pulls only that "1"
//*[#class='scripture']//span[2]//sup//text()
I am trying to remove that "1" from the text.
HELP PLEASE!!! :)
You can add a predicate to the end to exclude text nodes that are inside sup elements:
=IMPORTXML("http://www.biblestudytools.com/ESV/Numbers/5-3.html",
"//*[#class='scripture']//span[2]//text()[not(ancestor::sup)]")
This will retrieve only the text nodes that are not inside a sup element, but it will still result in having the verse spread out across two cells, because there are two text nodes. You can rectify this by wrapping this expression in a JOIN():
=JOIN("", IMPORTXML("http://www.biblestudytools.com/ESV/Numbers/5-3.html",
"//*[#class='scripture']//span[2]//text()[not(ancestor::sup)]"))

Retrieve an xpath text contains using text()

I've been hacking away at this one for hours and I just can't figure it out. Using XPath to find text values is tricky and this problem has too many moving parts.
I have a webpage with a large table and a section in this table contains a list of users (assignees) that are assigned to a particular unit. There is nearly always multiple users assigned to a unit and I need to make sure a particular user is assigned to any of the units on the table. I've used XPath for nearly all of my selectors and I'm half way there on this one. I just can't seem to figure out how to use contains with text() in this context.
Here's what I have so far:
//td[#id='unit']/span [text()='asdfasdfasdfasdfasdf (Primary); asdfasdfasdfasdfasdf, asdfasdfasdfasdf; 456, 3456'; testuser]
The XPath Query above captures all text in the particular section I am looking at, which is great. However, I only need to know if testuser is in that section.
text() gets you a set of text nodes. I tend to use it more in a context of //span//text() or something.
If you are trying to check if the text inside an element contains something you should use contains on the element rather than the result of text() like this:
span[contains(., 'testuser')]
XPath is pretty good with context. If you know exactly what text a node should have you can do:
span[.='full text in this span']
But if you want to do something like regular expressions (using exslt for example) you'll need to use the string() function:
span[regexp:test(string(.), 'testuser')]

Dealing with duplicate ids in selenium webdriver

I am trying to automate some tests using selenium webdriver. I am dealing with a third-party login provider (OAuth) who is using duplicate id's in their html. As a result I cannot "find" the input fields correctly. When I just select on an id, I get the wrong one.
This question has already been answered for JQuery. But I would like an answer (I am presuming using Xpath) that will work in Selenium webdriver.
On other questions about this issue, answers typically say "you should not have duplicate id's in html". Preaching to the choir there. I am not in control of the webpage in question. If it was, I would use class and id properly and just fix the problem that way.
Since I cannot do that. What options do I get with xpath etc?
you can do it by driver.find_element_by_id, for example ur duplicate "duplicate_ID" is inside "div_ID" wich is unique :
driver.find_element_by_id("div_ID").find_element_by_id("duplicate_id")
for other duplicate id under another div :
driver.find_element_by_id("div_ID2").find_element_by_id("duplicate_id")
This XPath expression:
//div[#id='something']
selects all div elements in the XML document, the string value of whose id attribute is the string "something".
This Xpath expression:
count(//div[#id='something'])
produces the number of the div elements selected by the first XPath expression.
And this XPath expression:
(//div[#id='something'])[3]
selects the third (in document order) div element that is selected by the first XPath expression above.
Generally:
(//div[#id='something'])[$k]
selects the $k-th such div element ($k must be substituted with a positive integer).
Equipped with this knowledge, one can get any specific div whose id attribute has string value "something".
Which language are you working on? Dublicate id's shouldn't be a problem as you can virtually grab any attribute not just the id tag using xpath. The syntax will differ slightly in other languages (let me know if you want something else than Ruby) but this is how you do it:
driver.find_element(:xpath, "//input[#id='loginid']"
The way you go about constructing the xpath locator is the following:
From the html code you can pick any attribute:
<input id="gbqfq" class="gbqfif" type="text" value="" autocomplete="off" name="q">
Let's say for example that you want to consturct your xpath with the html code above (Google's search box) using name attribute. Your xpath will be:
driver.find_element(:xpath, "//input[#name='q']"
In other words when the id's are the same just grab another attribute available!
Improvement:
To avoid fragile xpath locators such as order in the XML document (which can change easily) you can use something even more robust. Two xpath locators instead of one. This can also be useful when dealing with hmtl tags that are really similar. You can locate an element by 2 of its attributes like this:
driver.find_element(:id, 'amount') and driver.find_element(xpath: "//input[#maxlength='50']")
or in pure xpath one liner if you prefer:
//input[#id="amount" and #maxlength='50']
Alternatively (and provided your xpath will only return one unique element) you can move one more step higher in the abstraction level; completely omitting the attribute values:
//input[#id and #maxlength]
It's not listed at http://selenium-python.readthedocs.io/locating-elements.html but I'm able access a method find_elements_by_id
This returns a list of all elements with the duplicate ID.
links = browser.find_elements_by_id("link")
for link in links:
print(link.get_attribute("href"))
you should use driver.findElement(By.xpath() but while locating element with firebug you should select absolute path for particular element instead of getting relative path this is how you will get the element even with duplicate ID's

Matching text with xpath?

I'm screen-scraping an HTML page which contains:
<table border=1 class="searchresult" cellpadding=2>
<tr><th colspan=2>Last search</th></tr>
<tr><th align=left>Search term</th><td>xxxxxx</td></tr>
<tr><th align=left>Result</th><td>yyyyyyyy/td></tr>
</table>
I want to write an XPATH expression which gets me the data cell containing "yyyyyyyy". I've gotten as far as
.//table[#class='searchresult']//tr/th
which gets me a list of all the table-header nodes in the table. I can iterate over them in user code, find the one whose .text is "Results" and then call .getnext() on that to get the table-data. But, is there a cleaner way to do this by writing a more specific XPATH pattern? It seems like there should be, but I haven't gotten my head that far around XPATH yet to figure out how.
If it matters, I'm doing this in Python with lxml.
.//table[#class='searchresult']//tr/td[preceding-sibling::th] might give you what you need.
Two comprehensive papers on semi-automatically creating XPath statements like this one, specifically for screen scraping purposes can be found here:
http://tobiasanton.com/Tobias_Anton/Academia.html
Use:
//table/tr[last()]/td
This selects any td element that is a child of any tr that is the last tr child of any table in this XHTML document.
This may select more than one td element, depending on whether or not there is only one table in the XHTML document. You need to make this expression more precise, if more than one table element is present.
For example, if the table in question is the first in the document, use:
(//table)[1]/tr[last()]/td

Selenium RC locators - referring to subsequent elements?

When there is more than a single element with the same locator in a page, how should the next elements be referenced?
Using Xpath locators it's possible to add array notation, e.g. xpath=(//span/div)[1]
But with simple locators?
For example, if there are 3 links identified by "link=Click Here", simply appending [3] won't get the 3rd element.
And where is the authoritative reference for addressing array of elements? I couldn't find any.
Selenium doesn't handle arrays of locators by itself. It just returns the first element that meets your query, so if you want to do that, you have to use xpath, dom or even better, css.
So for the link example you should use:
selenium.click("css=a:contains('Click Here'):nth-child(3)")
Santi is correct that Selenium returns the first element matching your specified locator and you have to apply the appropriate expression of the locator type you use. I thought it would be useful to give the details here, though, for in this case they do border on being "gory details":
CSS
The :nth-child pseudo-class is tricky to use; it has subtleties that are little-known and not clearly documented, even on the W3C pages. Consider a list such as this:
<ul>
<li class="bird">petrel</li>
<li class="mammal">platypus</li>
<li class="bird">albatross</li>
<li class="bird">shearwater</li>
</ul>
Then the selector css=li.bird:nth-child(3) returns the albatross element not the shearwater! The reason for this is that it uses your index (3) into the list of elements that are siblings of the first matching element--unfiltered by the .bird class! Once it has the correct element, in this example the third one, it then applies the bird class filter: if the element in hand matches, it returns it. If it does not, it fails to match.
Now consider the selector css=li.bird:nth-child(2). This starts with the second element--platypus--sees it is not a bird and comes up empty. This manifests as your code throwing a "not found" exception!
What might fit the typical mental model of finding an indexed entry is the CSS :nth-of-type pseudo-class which applies the filter before indexing. Unfortunately, this is not supported by Selenium, according to the official documentation on locators.
XPath
Your question already showed that you know how to do this in XPath. Add an array reference at any point in the expression with square brackets. You could, for example use something like this: //*[#id='abc']/div[3]/p[2]/span to find a span in the second paragraph under the 3rd div under the specified id.
DOM
DOM uses the same square bracket notation as XPath except that DOM indexes from zero while XPath indexes from 1: document.getElementsByTagName("div")[1] returns the second div, not the first div! DOM offers an alternate syntax as well: document.getElementsByTagName("div").item(0) is exactly equivalent. And note that with getElementsByTagName you always have to use an index since it returns a node set, not a single node.

Resources