Case sensitivity of XPath function name() is inconsistent in Edge compared to other browsers

Issue:
When using [name() = "SomeValue"] in Edge, it will not return nodes if the "SomeValue" to match contains capital letters, even if those capital letters match the node name exactly.
Example:
I have created this JSFiddle, which exhibits the problem. It uses two XML strings, both a subset of the books.xml sample on MSDN: the first has capitalized node names, and the second I have modified to use lowercase node names. There is also a fiddle with cleaner code.
Current Results:
Running the fiddle in Edge, you will see that when searching for [name() = "catalog"], where "catalog" is in any mixed case, the XPath matches nodes only when the search term is fully lowercase. The case of the node name itself does not matter: the term "catalog" matches a node whether its name is capitalized, all caps, or all lowercase.
Edge will match all three of these nodes:
<Catalog/>
<CATALOG/>
<catalog/>
When running the same in another browser (I have tested Firefox, Chrome, and Opera), the search term must match the node name's case exactly, which is how I would expect XPath to operate. Out of the three node names above, these browsers will only match <Catalog/> when using [name() = "Catalog"].
Expected Results:
I would expect Edge to behave the same as other browsers, since other functions like text() don't operate this way in Edge, which makes it even more inconsistent. This is shown in the JSFiddle as well.
Another reason I expect the same behavior is that only XPath 1.0 is supported in all the browsers I tested, so there should be no difference there.
In summary:
Is this a defect in Edge? / Is this allowed by the standard? If it is not allowed, I can write up a bug report to Microsoft. If it is allowed by the standard, do I just need to account for the browser difference?
Additional Info
Supporting existing software using jQuery, and looking for a solution which does not require additional third party software.

XPath 1.0 is defined over a particular data model, which is not exactly the same as the HTML DOM. And the HTML5 DOM in particular was defined many years after XPath 1.0 was frozen. This means that anyone implementing XPath 1.0 over the HTML5 DOM has to decide how to map the HTML5 DOM to the XPath data model. It's very unfortunate if different vendors do this mapping in different ways, but it's not actually a violation of any standard. One of the key decisions to make in defining this mapping is how to cope with the fact that HTML5 is case-insensitive while XPath 1.0 is case-sensitive.
The underlying problem here is that you are using the HTML5 DOM to hold stuff that isn't HTML. This is a bad idea, because HTML5 tries hard to bend your content to the HTML5 model, which may corrupt your data in surprising ways. It would be much better to create an XML DOM for this data.
Also, using the predicate [name()='SomeValue'] is bad practice anyway, because XPath 1.0 gives no guarantees about namespace prefixes in the result of the name() function. It's much better to use self::SomeValue, or self::hh:SomeValue if the data is in a namespace (although the mapping of HTML5 to a namespaced instance of the XPath data model raises another set of potential issues.)
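To illustrate the point about holding this data in an XML DOM, here is a minimal sketch in Python (chosen only for illustration; the XML_SOURCE string and the names_matching helper are assumptions, not taken from the fiddle). It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, so a plain name test stands in for self::Catalog. In a browser, the comparable step would be to parse the string with DOMParser using an XML MIME type before evaluating the XPath.

```python
import xml.etree.ElementTree as ET

# Assumed sample data: three sibling elements whose names differ only in case.
XML_SOURCE = "<root><Catalog/><CATALOG/><catalog/></root>"

# Parsing as XML (not HTML) preserves element-name case exactly.
root = ET.fromstring(XML_SOURCE)


def names_matching(parent, name):
    """Return the tags of child elements selected by a plain name test."""
    return [child.tag for child in parent.findall(name)]


print(names_matching(root, "Catalog"))  # only the exact-case element
print(names_matching(root, "catalog"))  # only the lowercase element
print(names_matching(root, "CataLog"))  # no element has this exact name
```

With an XML parser the name test is case-sensitive in both directions, which is the behaviour the question expects.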
Suggestion: use Saxon-JS as your XPath engine. That way (a) you get support for XPath 3.0 rather than 1.0, and (b) you're using the same XPath engine on every browser, so it will give compatible behaviour across browsers.

Related

Tool for extracting an XPath query from a specified/selected node

Normally, one would use an XPath query to obtain a certain value or node. In my case, I'm doing some web-scraping with google spreadsheets, using the importXML function to update automatically some values. Two examples are given below:
=importxml("http://www.creditagricoledtvm.com.br/";"(//td[@class='xl7825385'])[9]")
=importxml("http://www.bloomberg.com/quote/ELIPCAM:BZ";"(//span)[32]")
The problem is that the pages I'm scraping will change every now and then and I understand very little about XML/XPath, so it takes a lot of trial and error to get to a node. I was wondering if there is any tool I could use to point to an element (either in the page or in its code) that would provide an appropriate query.
For example, in the second case, I noticed the info I wanted was in a span node (hence (//span)), so I printed all of them in a spreadsheet and used the line count to find the [32] index. This takes a long time to load, so it's pretty inconvenient. Also, I don't even remember how I figured out the //td[@class='xl7825385'] query. That is why I'm wondering if there is a more practical method of pointing to page elements.
Some clues:
Learning XPath basics is still useful. W3Schools is a good starting point.
https://www.w3schools.com/xml/xpath_intro.asp
Otherwise, the built-in dev tools of your browser can help you generate an absolute XPath. Select an element, right-click on it, then choose Copy > Copy XPath.
https://developers.google.com/web/tools/chrome-devtools/open
Browser extensions like Chropath can generate absolute or relative XPath for you.
https://autonomiq.io/chropath/
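For intuition about what an "absolute XPath" from such a tool looks like, here is a minimal sketch in Python using only the standard library. It is an assumption-laden illustration, not the algorithm of any particular browser or extension: the absolute_xpath helper and the sample page are hypothetical, and real tools may format the path differently (for example by preferring an id shortcut).

```python
import xml.etree.ElementTree as ET


def absolute_xpath(root, target):
    """Build a positional path from the root element down to `target`."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        # XPath positions are 1-based and count siblings with the same tag.
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    steps.append(root.tag)
    return "/" + "/".join(reversed(steps))


# Hypothetical page: the third <span> in document order is the target.
page = ET.fromstring(
    "<html><body>"
    "<div><span>a</span></div>"
    "<div><span>b</span><span>c</span></div>"
    "</body></html>"
)
target = page.findall(".//span")[2]
path = absolute_xpath(page, target)
print(path)  # /html/body[1]/div[2]/span[2]
```

A path like this breaks as soon as the page layout shifts, which is why a shorter query anchored on a stable attribute tends to survive page changes better.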

Webscraping Selectors

At what level of hierarchy do you begin your selectors?
There seems to be a convention of beginning with the container of the target element, but why not ever the target element itself, especially in the case of an id or starting with a wildcard plus a unique identifier?
Recursive descent seems like everyone's best friend.
XPaths and CSS selectors are very versatile and can describe the same element in many different ways, i.e. a single element has infinitely many possible locators that describe it. The goal is to get something that fits the needs of the developer, which might include being readable, unique, and/or adaptive.
Consider the following html example:
<div id='mainContainer'>
<span>some span</span>
</div>
If I were trying to make a locator for the <span> element, I wouldn't choose //span, because that will probably yield way too many results. Instead you could start with its parent, which has an id, and then proceed to the span: //*[@id='mainContainer']/span, or alternatively: //span[parent::*[@id='mainContainer']]. Which XPath is better? Whichever one you personally find more readable. I agree with you that the first example does seem to be more common, although I myself am more partial to the latter.
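A minimal sketch of the difference, in Python with the standard library's xml.etree.ElementTree (which handles only a subset of XPath, so only the first of the two anchored forms is shown). The extra "unrelated span" in the sample markup is an assumption added to make the contrast visible.

```python
import xml.etree.ElementTree as ET

# Assumed markup: the example above plus one unrelated span.
page = ET.fromstring(
    "<body>"
    "<span>unrelated span</span>"
    "<div id='mainContainer'><span>some span</span></div>"
    "</body>"
)

# //span : every span in the document, usually too broad.
every_span = [s.text for s in page.findall(".//span")]

# //*[@id='mainContainer']/span : anchor on the container, then step down.
anchored = [s.text for s in page.findall(".//*[@id='mainContainer']/span")]

print(every_span)  # ['unrelated span', 'some span']
print(anchored)    # ['some span']
```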
Sometimes the point of making a locator a certain way is to be adaptable. For instance, I rarely write a locator like this: //*[@class='fooBar']. The reason is that in modern web development classes come and go frequently, and it's likely that the element's class could change at the slightest breeze. Instead you might write //*[contains(@class,'fooBar')]. Now when a developer goes in and adds a class for pure styling, you don't have to go back and update all of your Selenium tests. That is also the reason I use wildcard characters frequently. If a developer goes in and updates a div to a span, my test will still work.
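To make the trade-off concrete, the sketch below emulates three class predicates in plain Python and shows which attribute values each one accepts. The helper names and sample values are hypothetical. The third predicate is the common whitespace-token idiom, included as an assumption about what you may want when a substring match turns out to be too loose.

```python
def matches_exact(class_attr, name):
    """Emulates [@class='name']: the whole attribute must equal the name."""
    return class_attr == name


def matches_substring(class_attr, name):
    """Emulates [contains(@class,'name')]: a plain substring test."""
    return name in class_attr


def matches_token(class_attr, name):
    """Emulates contains(concat(' ', normalize-space(@class), ' '), ' name ')."""
    return name in class_attr.split()


# Hypothetical class attribute values seen on a page over time.
SAMPLES = ["fooBar", "fooBar highlighted", "fooBarBaz"]

for value in SAMPLES:
    print(
        repr(value),
        matches_exact(value, "fooBar"),
        matches_substring(value, "fooBar"),
        matches_token(value, "fooBar"),
    )
```

The exact test fails as soon as a styling class is added, the substring test survives that but also accepts the unrelated class fooBarBaz, and the token test accepts only values that contain fooBar as a whole class name.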
As @Gilles Quenot commented, it isn't always safe to assume that ids are unique. Many websites were written by someone's unemployed uncle who took an HTML class back in '86. They are terrible, and don't care at all about standards or audits. This is another reason your locator needs enough information to pin down exactly the element or elements you mean, rather than matching more elements than intended.
One more comment is that XPaths are bidirectional, whereas CSS selectors are not. This means XPaths can go from child to parent and from parent to child, whereas CSS selectors can only go from parent to child. This affects which node you start at, and may be a reason that you see more CSS selectors start from a parent/ancestor node.
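A short sketch of the upward direction, again in Python with xml.etree.ElementTree and an assumed sample page: the path starts at the child and uses .. to step up to its parent.

```python
import xml.etree.ElementTree as ET

# Assumed markup: only one of the two containers holds a span.
page = ET.fromstring(
    "<body>"
    "<div id='first'><p>no span here</p></div>"
    "<div id='mainContainer'><span>some span</span></div>"
    "</body>"
)

# //span/.. : start at each span and step up to its parent element.
parents_of_spans = [el.get("id") for el in page.findall(".//span/..")]

print(parents_of_spans)  # ['mainContainer']
```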
TL;DR There isn't a convention, just personal preferences. Do what meets your needs.

Identifying objects in Tosca with XPath

I am brushing up my skills in Tosca. I worked with it two years ago and then switched to Selenium. I noticed that the new Tosca allows identification using XPath, which I am now really familiar with. However, I cannot make it work in Tosca, even though I am sure the object identification itself works, because I am testing my XPath in the Google Chrome developer tools.
Something as simple as (//*[text()='Forgot Password?'])[1] does not seem to be working. Could I be missing something?
This is the webpage I am using as reference for this example:
https://www.freecrm.com/index.html
XPath certainly can be used to identify elements of an HTML web UI in Tosca.
Since the question was originally posted, the "Forgot Password?" link at https://www.freecrm.com/index.html appears to have changed so that its text is now "Forgot your password?" and it is actually located at https://ui.freecrm.com/.
To account for that change, this answer uses "(//*[text()='Forgot your password?'])[1]" instead of the expression provided in the original post.
With the text modification, the expression works to identify the element in XScan after wrapping it in double quotes:
"(//*[text()='Forgot your password?'])[1]"
Some things to keep in mind when using XPath in Tosca:
It seems that XPath expressions need to be wrapped in double quotes (") so that XScan knows when to start evaluating XPath instead of using its normal rules. Looking closely at the expression that is pregenerated when XScan starts, we see that it is wrapped in double quotes:
"id('ui')/div[1]/div[1]/div[1]/a[1]"
A valid XPath expression doesn't necessarily guarantee uniqueness, so it is helpful to pay attention to any feedback messages at the bottom of XScan. There is a significant difference between "The selected element was not found" and "The selected element is not unique". The former simply indicates XScan can't find a match, the latter indicates that XScan matches successfully, but cannot uniquely identify the element.
My experience has been that it helps to explicitly identify the element to reduce the possibility of ambiguity. If the idea is to target the anchor element so that tests can click a link, then reduce the scope from any element, i.e. "(//*[text()='Forgot your password?'])[1]", to only anchor elements with that text: "//a[text()='Forgot your password?']".
In general, Tricentis (or at least the trainers with whom I have spoken) recommends using methods other than XPath to identify a target if they are available. That said, in my experience I've had better luck with XPath than with "Identify by Anchor".
An XPath expression is visible and editable in the XModuleAttribute properties without having to rescan. Personally, I find it easier to work with than the XML value of the RelativeId property that is generated when using Identify by Anchor.
With Anchor, I've had issues where XModuleAttributes scanned in one browser can no longer be found when switching to another browser, specifically from IE to Chrome. With XPath, I've not had these issues.
While XPath works well to identify the properties of one element with attributes of another because it can identify the relationship between them (very common with controls in Angular applications), the same can often be accomplished by adapting the engine layer using the TBox API (i.e. building a custom control). This requires some initial work up front from developer resources, but it can significantly improve how tests steer these controls in addition to reducing the need for Automation Specialists to have to rely on XPath.
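The distinction between "not found" and "not unique", and the effect of narrowing //* to //a, can be sketched outside Tosca. The Python below is only an illustration under stated assumptions: the markup is hypothetical (it is not the real freecrm page), find_by_text and describe are made-up helpers, and the returned strings merely paraphrase the XScan messages quoted above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup in which two different elements carry the same text.
page = ET.fromstring(
    "<div>"
    "<span>Forgot your password?</span>"
    "<a href='#'>Forgot your password?</a>"
    "</div>"
)
TEXT = "Forgot your password?"


def find_by_text(root, text, tag=None):
    """Elements whose own text equals `text`, optionally limited to one tag."""
    # tag=None plays the role of //*, tag="a" the role of //a.
    return [el for el in root.iter(tag) if el.text == text]


def describe(matches):
    """Classify a result set the way the feedback messages distinguish it."""
    if not matches:
        return "not found"
    if len(matches) > 1:
        return "not unique"
    return "unique"


print(describe(find_by_text(page, TEXT)))                # not unique
print(describe(find_by_text(page, TEXT, "a")))           # unique
print(describe(find_by_text(page, "Forgot Password?")))  # not found
```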
What I know is that you can identify elements with XPath when working with XML messages in Tosca API testing. Your use case seems to be UI testing, but I am not sure about that.
Did you try to use XScan to scan the page? Usually Tosca automatically calculates an XPath expression for you that you can use immediately.
Please see the manual for details.
If it still does not work, please try to be more specific. What isn't working? Is there an error message? Unexpected behavior?
Tosca provides its own set of attributes for locating any type of element. You can select any number of attributes to make your element unique, along with the index of that element. Just make sure that you are not using any dynamic values in the 'id' or 'class-name' of that element, and that the index range is not too large: something like 5 out of 10 rather than 20 out of 100 will be easier to update in the future.
Also use parent elements that are easy to locate uniquely, and then locate your target element from there.
Tosca provides various ways to locate an element, just like Selenium, and it offers additional properties as well. Under transition properties you will find the XPath, and it will be an absolute XPath; since you know Selenium, you know the difference between absolute and relative XPath. If your relative XPath is not working, I would suggest you go with:
1. Identify by ID or name
2. Identify by anchor
Try "Load all properties" at the bottom right. For me, though, it showed up without clicking on it.

Facing an issue finding an XPath expression

My XPath '//div[@id='sharetools-container-div']/iframe[@id='sharetools-iframe']' is working fine. After this tag, however, the text '#document' appears, followed by an html tag. When I extend the XPath expression to '//div[@id='sharetools-container-div']/iframe[@id='sharetools-iframe']/#document/html', it throws the following exception:
Caused by: class org.jaxen.saxpath.XPathSyntaxException:
//div[@id='sharetools-container-div']/iframe[@id='sharetools-iframe']/#document:
70: Expected one of '.', '..', '@', '*', QName.
So please guide me how to write XPath for this.
Thanks,
Dhananjay
From what I can gather, XPath does not descend into iframes. XPath expressions are tied to a particular XML document, such as an HTML document,[1] that they can be evaluated against. In the browser, an iframe counts as a separate document. The <iframe> node itself is part of the parent document, but it is merely a pointer to another document (the iframe's contents), which is completely separate.
That seems to be the gist of this email chain, and it seems to fall naturally out of the fact that XPath expressions are evaluated by calling document.evaluate (that is, a member of a particular document object), as implemented in Firefox. This suggests that the overlap between the various specs defining iframes and XPath excludes traversing that document boundary in a single XPath expression, or at least that seems to be Mozilla's interpretation.
But take note that all of this is guesswork based on Firefox's particular implementation of the XPath specification. This limitation may or may not apply to other browsers, but I would suspect that it does.
It also seems to explain why Selenium requires you to switch context from one document (the parent HTML page) to another (the iframe itself) in order to execute XPath expressions against it, as hinted at by the solution posted by @singaravelan and others.
[1] But only if the HTML document is magical enough! (Not all HTML documents are well-formed XML: browsers are much more lenient than XML parsers can be; cf. @MathiasMüller's comment.)
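The document boundary described above can be simulated without a browser. The sketch below is an illustration only, in Python with the standard library: the frame_documents mapping, the src value inner.html and the frame's contents are all hypothetical stand-ins for what a browser keeps as a separate document. In Selenium, the lookup in the last step corresponds to switching into the frame before querying.

```python
import xml.etree.ElementTree as ET

# Parent page: the iframe element exists here but has no children of its own.
parent_doc = ET.fromstring(
    "<html><body><div id='sharetools-container-div'>"
    "<iframe id='sharetools-iframe' src='inner.html'></iframe>"
    "</div></body></html>"
)

# Hypothetical frame content, held as a separate document.
frame_documents = {
    "inner.html": ET.fromstring("<html><body><a>Share</a></body></html>"),
}

IFRAME_PATH = (
    ".//div[@id='sharetools-container-div']/iframe[@id='sharetools-iframe']"
)

# Evaluated against the parent document, nothing lies below the iframe.
iframe = parent_doc.find(IFRAME_PATH)
inside_from_parent = parent_doc.findall(IFRAME_PATH + "//*")

# "Switch" to the frame's own document, then query that document instead.
frame_doc = frame_documents[iframe.get("src")]
links_in_frame = [a.text for a in frame_doc.findall(".//a")]

print(len(inside_from_parent))  # 0
print(links_in_frame)           # ['Share']
```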
You haven't shown your source XML, but one thing we know for sure is that it doesn't contain an element called "#document", because that isn't a legal element name. For the same reason, you can't request an element called "#document" in your XPath expression.
You can use a different XPath that bypasses the word #document by using the descendant axis instead.
For example:
//div[@id='sharetools-container-div']/iframe[@id='sharetools-iframe']/descendant::*[1]
or something like that. It depends on what you want from the inner HTML.
First, thanks for raising this question. I faced the same problem.
The following line solved it in my case:
driver.SwitchTo().Frame(driver.FindElement(By.Name("fraToc")));
Thanks.

CSS equivalent to XPath parenthetical grouping and indexing?

This question is geared towards testing via Selenium / Web Driver, though applies to general web application/development.
XPath has a very nice feature: you can group a given XPath in parentheses and combine it with indexing to say "give me element N of all the elements returned by the given XPath", written as (//someXpath)[n].
I was wondering if there is a translatable equivalent in CSS. If not in standard CSS, then how about Sizzle/jQuery? If none exists, it would be nice if that kind of thing were added to the CSS standard in the future, something like (someCssSelector):nth-of-type(n).
Other than that, the alternative for XPath and CSS is to be more specific in describing the DOM tree, going up the tree to get uniqueness in identifying elements (as opposed to (someShorterSimplerXpath)[n]).
You can access jQuery sets like arrays: $('selector')[n] (the index is zero-based, unlike XPath's one-based [n]).
For the / step of an XPath, you can use children(): for an XPath like //selector/foo you'd do $('selector').children('foo'). For the // step, you can use find(): for //selector//foo use $('selector').find('foo'). For .. you can use parent(): for //selector/.. use $('selector').parent().
With CSS, while there are no parent selectors, there is an nth-of-type pseudo-class, so you can do selector:nth-of-type(n). Note that nth-of-type counts an element's position among its siblings of the same type, so it behaves roughly like //selector[n] rather than (//selector)[n].
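The difference between indexing the grouped result and indexing per parent is easy to miss, so here is a minimal sketch in Python with xml.etree.ElementTree. The sample list markup is an assumption. ElementTree cannot parse the parenthesised form, so (//a)[3] is emulated by indexing the flattened result list, which is also what $('a')[2] does in jQuery.

```python
import xml.etree.ElementTree as ET

# Assumed markup: two list items with two links each.
DOC = ET.fromstring(
    "<ul>"
    "<li><a>one</a><a>two</a></li>"
    "<li><a>three</a><a>four</a></li>"
    "</ul>"
)

# (//a)[3] : collect every match in document order, then index the list.
all_links = DOC.findall(".//a")
third_overall = all_links[3 - 1].text  # XPath is 1-based, Python is 0-based

# //a[2] : keep each <a> that is the 2nd <a> among its own siblings,
# which is the per-parent counting that a:nth-of-type(2) also performs.
second_per_parent = [a.text for a in DOC.findall(".//a[2]")]

print(third_overall)      # three
print(second_per_parent)  # ['two', 'four']
```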

Resources