Some websites not allowed to be parsed by xpath? - xpath

I am trying to parse one element from a website that is inside of a table. This is the exact xpath expression that I use:
[xpathParser search:@"/table[1]/tr[2]/td[1]"];
However, when I run the program, my string comes up empty. I'm wondering if the site is blocking me from parsing, or whether my expression is correct. If it helps, this is the site, and the piece I am trying to parse is the element Atlantic.
http://cluster.leaguestat.com/download.php?client_code=ahl&file_path=daily-report/daily-report.html

There are several 'Atlantic' sections on the page, so it is not clear which element you mean. Your XPath expression might not be correct, as the 'tr' is not a direct child of the table (there is a tbody in between). You might want to try //table/tbody/tr[2]/td[1], and use the XPath Checker Firefox plugin to test expressions.
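To illustrate the tbody point, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree, which supports only a subset of XPath. The markup is a made-up stand-in, not the real page, and it assumes the tbody is actually present in the document being parsed (browsers often insert it even when the raw HTML omits it).

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page's table markup.
SAMPLE = (
    "<html><body>"
    "<table><tbody>"
    "<tr><td>Header</td></tr>"
    "<tr><td>Atlantic</td></tr>"
    "</tbody></table>"
    "</body></html>"
)

root = ET.fromstring(SAMPLE)

# tr is not a direct child of table here, so this path matches nothing.
direct = root.findall(".//table/tr[2]/td[1]")

# Stepping through tbody reaches the cell.
via_tbody = [td.text for td in root.findall(".//table/tbody/tr[2]/td[1]")]

# '//' between table and tr skips any intermediate wrapper element.
any_depth = [td.text for td in root.findall(".//table//tr[2]/td[1]")]

print(len(direct), via_tbody, any_depth)
```

If the parser you use works on the raw source and the source has no tbody, the situation is reversed, so the '//' form is the more forgiving one.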

Related

tool for extracting xpath query from specified/selected node

Normally, one would use an XPath query to obtain a certain value or node. In my case, I'm doing some web scraping with Google Spreadsheets, using the importXML function to automatically update some values. Two examples are given below:
=importxml("http://www.creditagricoledtvm.com.br/";"(//td[@class='xl7825385'])[9]")
=importxml("http://www.bloomberg.com/quote/ELIPCAM:BZ";"(//span)[32]")
The problem is that the pages I'm scraping will change every now and then and I understand very little about XML/XPath, so it takes a lot of trial and error to get to a node. I was wondering if there is any tool I could use to point to an element (either in the page or in its code) that would provide an appropriate query.
For example, in the second case, I noticed the info I wanted was in a span node (hence (//span)), so I printed all of them in a spreadsheet and used the line count to find the [32] index. This takes long to load, so it's pretty inconvenient. Also, I don't even remember how I figured out the //td[@class='xl7825385'] query. That is why I'm wondering if there is a more practical method of pointing to page elements.
Some clues :
Learning XPath basics is still useful. W3Schools is a good starting point.
https://www.w3schools.com/xml/xpath_intro.asp
Otherwise, the built-in dev tools of your browser can help you generate an absolute XPath. Select an element, right-click on it, then choose Copy > Copy XPath.
https://developers.google.com/web/tools/chrome-devtools/open
Browser extensions like ChroPath can generate an absolute or relative XPath for you.
https://autonomiq.io/chropath/
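As a rough idea of what such tools do, the sketch below turns "the third span in document order" into an absolute, index-based path instead of counting rows by hand. It uses Python's standard xml.etree.ElementTree on a made-up snippet; the function name absolute_xpath is hypothetical, and real browser tools produce somewhat different output (for example, they may prefer id-based paths).

```python
import xml.etree.ElementTree as ET

def absolute_xpath(root, target):
    """Build an absolute, index-based path from root down to target."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        # XPath positions are 1-based and count siblings with the same tag.
        position = next(i for i, c in enumerate(same_tag, start=1) if c is node)
        steps.append(f"{node.tag}[{position}]")
        node = parent
    steps.append(root.tag)
    return "/" + "/".join(reversed(steps))

# Hypothetical page markup, for illustration only.
PAGE = ET.fromstring(
    "<html><body>"
    "<div><span>a</span><span>b</span></div>"
    "<div><span>c</span></div>"
    "</body></html>"
)

target = list(PAGE.iter("span"))[2]   # same node as (//span)[3]
path = absolute_xpath(PAGE, target)
print(path)                           # /html/body[1]/div[2]/span[1]

# ElementTree evaluates paths relative to the root element, so drop "/html".
relative = "." + path[len("/" + PAGE.tag):]
round_trip = PAGE.find(relative) is target
print(round_trip)
```

Absolute paths like this break as soon as the page layout changes, which is why a path anchored on a stable attribute is usually the better choice when one exists.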

Identifying objects in Tosca with Xpath

I have recently been brushing up my skills in TOSCA. I worked with it two years ago and then switched to Selenium. I noticed that the new TOSCA allows identification using XPath, which I am now very familiar with. However, I cannot make it work in TOSCA, even though I am sure the object identification works because I am testing my XPath in the Google Chrome developer tools.
Something as simple as (//*[text()='Forgot Password?'])[1] does not seem to be working. Could I be missing something?
This is the webpage I am using as reference for this example:
https://www.freecrm.com/index.html
XPath certainly can be used to identify elements of an HTML web UI in Tosca.
Since the question was originally posted, the "Forgot Password?" link at https://www.freecrm.com/index.html appears to have changed so that its text is now "Forgot your password?" and it is actually located at https://ui.freecrm.com/.
To account for that change, this answer uses "(//*[text()='Forgot your password?'])[1]" instead of the expression provided in the original post.
With the text modification, the expression works to identify the element in XScan after wrapping it in double quotes:
"(//*[text()='Forgot your password?'])[1]"
Some things to keep in mind when using XPath in Tosca:
It seems that XPath expressions need to be wrapped in double quotes (") so that XScan knows when to start evaluating XPath instead of using its normal rules. Looking closely at the expression that is pregenerated when XScan starts, we see that it is wrapped in double quotes:
"id('ui')/div[1]/div[1]/div[1]/a[1]"
A valid XPath expression doesn't necessarily guarantee uniqueness, so it is helpful to pay attention to any feedback messages at the bottom of XScan. There is a significant difference between "The selected element was not found" and "The selected element is not unique". The former simply indicates that XScan can't find a match; the latter indicates that XScan matches successfully but cannot uniquely identify the element.
My experience has been that it helps to identify the element explicitly to reduce the possibility of ambiguity. If the idea is to target the anchor element so that tests can click a link, then reduce the scope from any element, i.e. "(//*[text()='Forgot your password?'])[1]", to only anchor elements with that text: "//a[text()='Forgot your password?']".
In general, Tricentis (or at least the trainers with whom I have spoken) recommends using methods other than XPath to identify a target if they are available. That said, in my experience I've had better luck with XPath than with "Identify by Anchor".
An XPath expression is visible and editable in the XModuleAttribute properties without having to rescan. Personally, I find it easier to work with than the XML value of the RelativeId property that is generated when using Identify by Anchor.
With Anchor, I've had issues where XModuleAttributes scanned in one browser can no longer be found when switching to another browser, specifically from IE to Chrome. With XPath, I've not had these issues.
While XPath works well to identify the properties of one element with attributes of another because it can identify the relationship between them (very common with controls in Angular applications), the same can often be accomplished by adapting the engine layer using the TBox API (i.e. building a custom control). This requires some initial work up front from developer resources, but it can significantly improve how tests steer these controls in addition to reducing the need for Automation Specialists to have to rely on XPath.
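The "not found" versus "not unique" distinction above can be sketched outside Tosca by counting matches. This is only an illustration in Python's standard xml.etree.ElementTree, not how XScan works internally; the markup and the classify helper are made up, and because ElementTree has no text() function the predicate is written as [.='...'] instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical page where two different elements carry the same text.
PAGE = ET.fromstring(
    "<body>"
    "<span>Forgot your password?</span>"
    "<a href='/reset'>Forgot your password?</a>"
    "</body>"
)

def classify(root, path):
    """Report whether a path matches zero, one, or several elements."""
    matches = root.findall(path)
    if not matches:
        return "not found"
    if len(matches) > 1:
        return "not unique"
    return "unique"

any_element = classify(PAGE, ".//*[.='Forgot your password?']")
anchor_only = classify(PAGE, ".//a[.='Forgot your password?']")
old_text = classify(PAGE, ".//a[.='Forgot Password?']")

print(any_element, anchor_only, old_text)
```

Narrowing the wildcard to the a element removes the ambiguity, while the outdated link text simply matches nothing.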
What I know is that you can identify elements with XPath when working with XML messages in Tosca API testing. Your use case seems to be UI testing, but I am not sure about that.
Did you try to use XScan to scan the page? Usually Tosca automatically calculates an XPath expression for you that you can use immediately.
Please see the manual for details.
If it still does not work, please be more specific. What isn't working? Is there an error message or unexpected behavior?
Tosca provides its own set of attributes for locating any type of element. You can directly select any number of attributes you need to make your element unique, along with the index of that element. Just make sure that you are not using any dynamic values in the 'id' or 'class-name' of that element, and that the index is not taken from a large range such as 20 out of 100; something like 5 out of 10 will be easier to update in the future.
Also make use of parent elements that can easily be located uniquely, and then locate your expected element from there.
TOSCA provides various ways to locate an element, just like Selenium, and in addition it provides other properties. Under transition properties you will find the XPath, and it will be an absolute XPath; since you know Selenium, you know the difference between absolute and relative XPath. If your relative XPath is not working, I would suggest you go with:
1. Identify by ID or name
2. Identify by anchor
Try "load all properties" at the bottom right. For me, though, it was shown without clicking on it. See here

Standard process for creating complex xpath in protractor

I am looking for standard ways to arrive at complex xpath expressions in protractor.
For e.g. I have a complex xpath as follows:
(//*[contains(@class,'day')][normalize-space(text())='2'])[1]
Here I first have to get the elements matching the xpath
//*[contains(@class,'day')][normalize-space(text())='2']
and then pick the first from the matching ones. Any pointers?
Protractor in its documentation clearly describes any process for creating xpaths:
http://www.protractortest.org/#/style-guide [section Locator strategies].
Firstly, you shouldn't use XPath except as a last resort. I second the recommendation by @Kacper to read the style guide he posted.
However, if you're dead set on using XPath (sometimes it is unavoidable), you can pick the first element that matches like so:
element.all(by.xpath("//*[contains(@class,'day')][normalize-space(text())='2']")).first();
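The same "collect every match, then take the first in document order" idea can be sketched outside Protractor. The example below uses Python's standard xml.etree.ElementTree, which supports neither contains() nor normalize-space(), so the two predicates are approximated in plain Python; the calendar markup and the helper name matches are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical calendar fragment with two cells that display "2".
CALENDAR = ET.fromstring(
    "<table>"
    "<tr><td class='old day'> 29 </td><td class='day'> 2 </td></tr>"
    "<tr><td class='new day'>2</td><td class='label'>2</td></tr>"
    "</table>"
)

def matches(elem):
    """Approximate [contains(@class,'day')][normalize-space(text())='2']."""
    has_day = "day" in elem.get("class", "")
    # Collapse runs of whitespace and trim, like normalize-space().
    text = " ".join((elem.text or "").split())
    return has_day and text == "2"

# iter() walks the tree in document order, so index 0 is the first match,
# which is what (…)[1] in XPath or .first() in Protractor selects.
all_matches = [elem for elem in CALENDAR.iter() if matches(elem)]
first = all_matches[0]

print(len(all_matches), first.get("class"))
```

Note that contains(@class,'day') is a substring test, so it would also match a class such as 'today'; that is true of the original XPath as well.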

How to search elements matching an xpath expression in emacs nxml-mode?

Is there a way to interactively search for nodes that match a given xpath expression in emacs?
I would like something similar to re-forward-search but instead of using a regular expression I'd type an xpath expression.
I don't have an answer wrt XPath queries; sorry. But you might try the Icicles search keys M-s M-s x and M-s M-s X (commands icicle-search-xml-element and icicle-search-xml-element-text-node).
These let you search the contents and the text() nodes, respectively, of top-level XML elements whose names match a regexp that you provide.
For icicle-search-xml-element, the elements can have any of these
forms:
<ELEMENTNAME>...</ELEMENTNAME>
<ELEMENTNAME ATTRIBUTE1="..."...>...</ELEMENTNAME>
<ELEMENTNAME/>
<ELEMENTNAME ATTRIBUTE1="...".../>
You can alternatively choose to search, not the search contexts as
defined by the element-name regexp, but the non-contexts, that is, the
buffer text that is outside such elements. To do this, use `C-M-~'
during completion. (This is a toggle, and it affects only future
search commands, not the current one.)
For icicle-search-xml-element-text-node, the top-level matching elements must not have attributes. Only top-level elements of the form <ELEMENTNAME>...</ELEMENTNAME> are
matched.
HTH.
I did something like that a long time ago. I can't give you any details now, but I'll provide an overview of the approach I took.
I created some Emacs functions to interact with (query) a native XML database. I did it with a MarkLogic server once and with a Berkeley DB XML database another time. One of those functions simply queried the database. Another one of the functions would send an XQuery query that included an Emacs buffer or buffer selection.
The native XML database server would process the query, return the results, and my Emacs functions would render the result in a result buffer.
This approach allowed me to query the XML with XPath and XQuery, which is a much more powerful query language that includes XPath. (I wrote about XQuery a long time ago, here: https://www.ibm.com/developerworks/library/x-xqueryxpath/)
As difficult as all of this might sound, it turned out to be surprisingly easy.

facing issue to find xpath expression

My XPath '//div[@id='sharetools-container-div']/iframe[@id='sharetools-iframe']' is working fine, but after this tag there is '#document' text present, and after this '#document' there is an html tag. When I extend the XPath expression to '//div[@id='sharetools-container-div']/iframe[@id='sharetools-iframe']/#document/html', it throws the following exception:
Caused by: class org.jaxen.saxpath.XPathSyntaxException:
//div[@id='sharetools-container-div']/iframe[@id='sharetools-iframe']/#document:
70: Expected one of '.', '..', '@', '*', QName.
So please guide me how to write XPath for this.
Thanks,
Dhananjay
From what I can gather, XPath does not descend into iframes. You see, XPath expressions are tied to a particular XML document, such as an HTML document,[1] that they can be evaluated against. In the browser, an iframe counts as a separate document. The <iframe> node itself is a part of the parent document, but it is merely a pointer to another document (the iframe's contents) which is completely separate.
That seems to be the gist of this email chain, and seems to fall naturally out of the fact that XPath expressions are evaluated by calling document.evaluate (that is, a member of a particular document object), as implemented in Firefox. This suggests that the overlap between the various specs defining iframes and XPath excludes traversing that document boundary in a single XPath expression, or at least that seems to be Mozilla's interpretation.
But take note that all of this is guesswork based on Firefox's particular implementation of the XPath specification. This limitation may or may not apply to other browsers, but I would suspect that it does.
It also seems to explain why Selenium requires you to switch context from one document (the parent HTML page) to another (the iframe itself) in order to execute XPath expressions against it, as hinted at by the solution posted by @singaravelan, and others.
[1] But only if the HTML document is magical enough! (Not all HTML documents are well-formed XML: browsers are much more lenient than XML parsers can be; cf. @MathiasMüller's comment.)
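The document boundary can be sketched with two separately parsed trees. This is a simplified illustration in Python's standard xml.etree.ElementTree, not browser or Selenium behavior; the markup, the frame.html name, and the a element with id 'share' are all made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical parent page: the iframe element only points at another document.
PARENT = ET.fromstring(
    "<html><body>"
    "<div id='sharetools-container-div'>"
    "<iframe id='sharetools-iframe' src='frame.html'/>"
    "</div>"
    "</body></html>"
)

# Hypothetical content of frame.html, parsed as its own, separate document.
FRAME = ET.fromstring("<html><body><a id='share'>Share</a></body></html>")

iframe = PARENT.find(
    ".//div[@id='sharetools-container-div']/iframe[@id='sharetools-iframe']"
)

# Within the parent document the iframe has no child elements to descend into.
children_in_parent = list(iframe)

# A query against the parent cannot see the frame's content...
link_from_parent = PARENT.find(".//a[@id='share']")

# ...but the same query against the frame's own document can.
link_from_frame = FRAME.find(".//a[@id='share']")

print(iframe.get("src"), len(children_in_parent), link_from_parent, link_from_frame.text)
```

Switching the evaluation context to the second document is the step that a frame switch in Selenium performs for you.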
You haven't shown your source XML, but one thing we know for sure is that it doesn't contain an element called "#document", because that isn't a legal element name. For the same reason, you can't request an element called "#document" in your XPath expression.
You can use a different XPath that avoids the word #document by using the descendant axis instead.
For example:
//div[@id='sharetools-container-div']/iframe[@id='sharetools-iframe']/descendant::*[1]
or something like that. It depends on what you want from the inner HTML.
First, thanks for raising this question. I faced the same problem.
The following line solved it in my case:
driver.SwitchTo().Frame(driver.FindElement(By.Name("fraToc")));
Thanks.

Resources