Working with node-sets in Xpath 1.0 - xpath

I'm working with CALS tables, which have multiple colspec elements with a tgroup element as their parent.
In XPath 2.0 the following works:
colspec/substring-before( @colwidth , '*' )
In XPath 1.0 it complains: Unexpected token - "substring-before( @colwid"
There has to be a way to accomplish this. I need to sum the number values before the asterisk so that I can convert the relative column widths to percentages. At this point in the day I can't even think of an inelegant solution.

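In XPath 1.0 it complains: Unexpected token - "substring-before( @colwid"
That's because the right-hand operand of the / step operator can't be a function call in XPath 1.0 (this is a feature of XPath 2.0).
Assuming you are working in XSLT 1.0, you have to write a recursive template that processes the colspec elements one at a time and accumulates the sum.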
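If the XPath 1.0 expression is evaluated from a host language rather than from inside XSLT, one possible workaround is to select the colspec elements and do the string handling outside XPath. Below is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The sample tgroup and the helper name relative_widths_to_percent are made up for illustration, and it assumes every colwidth has the form "N*".

```python
import xml.etree.ElementTree as ET

# Hypothetical CALS fragment; real tables carry more attributes.
SAMPLE = """
<tgroup cols="3">
  <colspec colname="c1" colwidth="1*"/>
  <colspec colname="c2" colwidth="2*"/>
  <colspec colname="c3" colwidth="1*"/>
</tgroup>
"""

def relative_widths_to_percent(tgroup):
    """Return each colspec's relative width as a percentage of the total."""
    # partition("*")[0] plays the role of substring-before(@colwidth, '*').
    widths = [float(c.get("colwidth").partition("*")[0])
              for c in tgroup.findall("colspec")]
    total = sum(widths)
    return [100.0 * w / total for w in widths]

tgroup = ET.fromstring(SAMPLE)
print(relative_widths_to_percent(tgroup))  # [25.0, 50.0, 25.0]
```

The summing that XPath 1.0 cannot express in a single path step happens in the list comprehension, so no recursion is needed in this setting.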

Related

xpath query url with one folder depth only

I am using this XPath query successfully:
//div[(@class="result")]//a[contains(@href,"pinterest.com")]/@href
The page I am running the XPath query against (with simple_html_dom.php) is this one here.
Now, I would like to find results for pinterest.com/one-folder-deep-only and exclude all URLs deeper than one directory, like pinterest.com/one-folder-deep-only/this or pinterest.com/one-folder-deep-only/this/this. I have no idea if there is a way to achieve that. Have googled a lot, but not found anything. Maybe my search terms weren't the best.
Do you have any ideas? Thanks for helping me here.
I am testing the query using the Chrome XPath Helper.
"//" evaluates all levels/depths. Instead, use a single "/" for the "a" step to evaluate immediate children only:
//div[(@id="first-result")]/a[contains(@href,"url.com")]/@href
Note the use of / instead of // before the "a" tag.
Try the XPath below to select @href from the required anchors only:
//a[contains(@href, "url.com") and not(contains(substring-after(./@href, 'url.com/'), "/"))]/@href
Solution for XPath 2.0:
//a[contains(@href, "url.com") and count(tokenize(@href, "/"))=2]/@href
Note that if the href in the real HTML source starts with "http://url.com", you should specify =4 instead of =2.
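If the hrefs are post-processed in a host language anyway, the depth check can also be done outside XPath. The following is only a sketch in Python using urllib.parse. The function name is_one_folder_deep and the sample links are invented for illustration, and it treats a trailing slash as not adding a level, which differs from the substring-after expression above.

```python
from urllib.parse import urlparse

def is_one_folder_deep(href, host="pinterest.com"):
    """True if href points at host with exactly one non-empty path segment."""
    # urlparse only fills in netloc when a scheme or a leading // is present.
    parsed = urlparse(href if "//" in href else "//" + href)
    if parsed.netloc != host and not parsed.netloc.endswith("." + host):
        return False
    segments = [s for s in parsed.path.split("/") if s]
    return len(segments) == 1

links = [
    "http://pinterest.com/one-folder-deep-only",
    "http://pinterest.com/one-folder-deep-only/this",
    "http://pinterest.com/one-folder-deep-only/this/this",
    "pinterest.com/another-folder/",
]
print([u for u in links if is_one_folder_deep(u)])
```

Only the first and last links pass the filter; the two deeper URLs are dropped.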

finding the current position of xpath match

I'm trying to get the current position of an XPath match. Here is a real-world example:
on this page http://newyork.backpage.com/homes-for-sale/
running the following XPath matches the 8th listing, counting from the top:
//div[contains(@class, 'cat 93893742')]
I want to somehow get the ad position using XPath, which at the time of posting this question is "8". I tried using preceding-sibling::div but I am getting unexpected results.
Is there any way to achieve this with XPath?
I'm not sure whether the current version of HtmlUnit supports XPath 2.0, but if so you can use the expression below:
index-of(//div[starts-with(@class, "cat")], //div[@class='cat 93893742'])
This returns 10, the position in the overall list.
If you want the position in the list for a specific date (Thu. May. 11), you can try:
index-of(//div[normalize-space()="Thu. May. 11"]/following::div[starts-with(@class, "cat")],//div[normalize-space()="Thu. May. 11"]/following::div[@class='cat 93893742'])
which returns 8.
Some explanation added to @har07's answer.
I think this is what you need:
count(//div[contains(@class, 'cat 93893742')]/preceding-sibling::div[starts-with(@class,'cat')])+1
Let's break down the whole expression.
//div[contains(@class, 'cat 93893742')]
matches the required context node, whose class name is cat 93893742.
/preceding-sibling::div[starts-with(@class,'cat')]
matches all sibling div elements before the context node whose class name starts with cat.
Wrapping all of that in count() counts every such div before the context node, so add 1 to include the context node itself.
If you want to select that element by the index calculated above, use this:
//div[starts-with(@class,'cat')][count(//div[contains(@class, 'cat 93893742')]/preceding-sibling::div[starts-with(@class,'cat')])+1]
which is equal to
//div[starts-with(@class,'cat')][10] // 10 is the index number
Based on this and your previous question, maybe the following XPath is what you're looking for:
count(
  //div[contains(@class, 'cat 93893742')]/preceding-sibling::div[contains(@class, 'cat ')]
)+1
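The "count the preceding siblings and add 1" idea can be sketched in a host language too. The Python example below uses xml.etree.ElementTree, which does not implement the preceding-sibling axis, so the siblings are filtered in plain Python instead. The markup and the helper name ad_position are hypothetical and far simpler than the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing markup; the real page is more complex.
SAMPLE = """
<div>
  <div class="date">Thu. May. 11</div>
  <div class="cat 111">first ad</div>
  <div class="cat 222">second ad</div>
  <div class="cat 93893742">third ad</div>
  <div class="cat 333">fourth ad</div>
</div>
"""

def ad_position(container, ad_id):
    """1-based position of the ad among sibling divs whose class starts with 'cat'."""
    ads = [d for d in container.findall("div")
           if d.get("class", "").startswith("cat")]
    for position, div in enumerate(ads, start=1):
        if ad_id in div.get("class", "").split():
            return position
    return None  # no matching ad among the siblings

print(ad_position(ET.fromstring(SAMPLE), "93893742"))  # 3
```

The date div is skipped because its class does not start with cat, which mirrors the starts-with(@class,'cat') predicate on the preceding-sibling step.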

getting attribute via xpath query successful in browser, but not in Robot Framework

I have an XPath query that I use to get the height of a certain HTML element. It returns exactly the desired value when I execute it in Chrome via the XPath Helper plugin.
//*/div[@class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and @class="bar bar1"]/@height
However, when I use the same query via the Get Element Attribute keyword in Robot Framework
Get Element Attribute    //*/div[@class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and @class="bar bar1"]/@height
... I get an InvalidSelectorException about this XPath.
InvalidSelectorException: Message: u'invalid selector: Unable to locate an
element with the xpath expression `//*/div[@class="BarChart"]/*[name()="svg"]/*
[name()="svg"]/*[name()="g"]/*[name()="rect" and @class="bar bar1"]/`
So Robot Framework or Selenium removed the @ sign and everything after it. I thought it was an escaping problem and tried adding and removing slashes before the @height, without success. I also tried wrapping the query in string(), but that did not work either.
Does anybody have an idea how to prevent my XPath query from getting broken?
It looks like you can't include the attribute axis in the XPath itself when you're using Robot. You need to retrieve the element by XPath, and then specify the attribute name outside that. It seems like the syntax is something like this:
Get Element Attribute    xpath=(//*/div[@class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and @class="bar bar1"])@height
or perhaps (I've never used Robot):
Get Element Attribute    xpath=(//*/div[@class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and @class="bar bar1"])[1]@height
This documentation says
attribute_locator consists of element locator followed by an @ sign and attribute name, for example "element_id@class".
so I think what I've posted above is on the right track.
You are correct in your observation that the keyword seems to remove everything after the final @. More precisely, it uses the @ to separate the element locator from the attribute name, and does this by splitting the string at that final @ character.
No amount of escaping will solve the problem, as the code isn't doing any parsing at this point. This is the exact code (as of this writing...) that performs that operation:
def _parse_attribute_locator(self, attribute_locator):
    parts = attribute_locator.rpartition('@')
    ...
The simple solution is to drop that trailing slash, so your XPath will look like this:
//*/div[@class="BarChart"]/... and @class="bar bar1"]@height
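To see why the trailing slash matters, here is a small Python sketch of the same split. The function name split_attribute_locator and the shortened locator are illustrative only; the one call taken from the snippet above is str.rpartition('@').

```python
def split_attribute_locator(attribute_locator):
    """Split at the last '@' into (element locator, attribute name)."""
    locator, _, attribute = attribute_locator.rpartition("@")
    return locator, attribute

# With "/@height" the element locator keeps a trailing slash, which is not valid XPath.
print(split_attribute_locator('//rect[@class="bar bar1"]/@height'))
# Without the slash the element locator is a complete XPath expression.
print(split_attribute_locator('//rect[@class="bar bar1"]@height'))
```

Because rpartition splits at the last @, the @class inside the predicate is left untouched and only the final attribute name is cut off.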

Select element with a changing Id string using XPath

I have a textarea control with an id that looks something like this:
<textarea id="NewTextArea~~51887~~1" rows="2"/>
And the XPath that has worked before has been
//textarea[@id="NewTextArea~~51887~~1"]
But now the '51887' portion of the id has become variable (it changes every time), so I need to select the NewTextArea~~*~~1 element without actually specifying the number. Is there a way I can wildcard part of the string so that it will match a particular pattern? I tried using starts-with and ends-with but couldn't get it to work:
//textarea[starts-with(@id, 'NewTextArea~~') and ends-with(@name, '~~1')]
Bear in mind there are other fields that differ only in the number on the end.
Any advice or guidance would be greatly appreciated :)
I tried using starts-with and ends-with but couldn't get it to work:
//textarea[starts-with(@id, 'NewTextArea~~') and ends-with(@name, '~~1')]
ends-with() is available as a standard function only in XPath 2.0 and you seem to be using XPath 1.0.
Use:
//textarea
  [starts-with(@id, 'NewTextArea~~')
  and
   substring(@id, string-length(@id) - 2) = '~~1'
  ]
Explanation:
See the answer to this question, for how to implement ends-with() in XPath 1.0:
https://stackoverflow.com/a/405507/36305
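The substring() trick relies on XPath string positions being 1-based: for a three-character suffix, string-length(@id) - 2 is the position of the first character of the last three. The arithmetic can be checked with a short Python sketch; the helper name xpath1_ends_with is made up here and is not part of any XPath library.

```python
def xpath1_ends_with(value, suffix):
    """Mimic substring(v, string-length(v) - string-length(s) + 1) = s."""
    # XPath positions are 1-based, so subtract 1 to get a Python index.
    start = len(value) - len(suffix) + 1
    # A start position below 1 means "from the beginning" in XPath.
    return value[max(start - 1, 0):] == suffix

print(xpath1_ends_with("NewTextArea~~51887~~1", "~~1"))  # True
print(xpath1_ends_with("NewTextArea~~51887~~2", "~~1"))  # False
```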

Can't get nth node in Selenium

I try to write XPath expressions so that my tests won't be broken by small design changes. So instead of the expressions that Selenium IDE generates, I write my own.
Here's an issue:
//input[@name='question'][7]
This expression doesn't work at all. Input nodes named 'question' are spread across the page. They're not siblings.
I've tried using an intermediate expression, but it also fails.
(//input[@name='question'])[2]
error = Error: Element (//input[@name='question'])[2] not found
That's why I suppose Selenium has a wrong implementation of XPath.
According to the XPath docs, the position predicate must filter by the position in the node-set, so it must find the seventh input with the name 'question'. In Selenium this doesn't work. CSS selectors (:nth-of-type) don't work either.
I had to write an expression that filters their common parents:
//*[contains(@class, 'question_section')][7]//input[@name='question']
Is this a Selenium-specific issue, or am I reading the spec the wrong way? What can I do to make a shorter expression?
Here's an issue:
//input[@name='question'][7]
This expression doesn't work at all.
This is a FAQ.
[] has a higher priority than //.
The above expression selects every input element with @name = 'question' that is the 7th such input child of its parent, and apparently the parents of input elements in the document (which is not shown) don't have that many input children.
Use (note the brackets):
(//input[@name='question'])[7]
This selects the 7th input element in the document that satisfies the conditions in the predicate.
Edit:
People who know Selenium (Dave Hunt) suggest that the above expression be written in Selenium as:
xpath=(//input[@name='question'])[7]
If you want the 7th input whose name attribute has the value question in the source, then try the following:
/descendant::input[@name='question'][7]
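The difference between the two readings can be illustrated with a Python sketch that spells out each one by hand instead of relying on a particular XPath engine. The markup (eight sections with one input each) and both helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: eight sections, each holding a single 'question' input.
SAMPLE = "<body>" + "".join(
    '<div class="question_section"><input name="question" id="q%d"/></div>' % i
    for i in range(1, 9)
) + "</body>"

root = ET.fromstring(SAMPLE)

def nth_in_document(root, n):
    """Like (//input[@name='question'])[n]: the n-th match in document order."""
    matches = [e for e in root.iter("input") if e.get("name") == "question"]
    return matches[n - 1] if len(matches) >= n else None

def nth_within_each_parent(root, n):
    """Like //input[@name='question'][n]: matches that are n-th under their own parent."""
    found = []
    for parent in root.iter():
        matches = [e for e in parent
                   if e.tag == "input" and e.get("name") == "question"]
        if len(matches) >= n:
            found.append(matches[n - 1])
    return found

print(nth_in_document(root, 7).get("id"))  # q7
print(nth_within_each_parent(root, 7))     # []
```

With one input per section, the per-parent reading never finds a 7th match, while the document-order reading returns the input in the seventh section.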

Resources