Make an xpath balanced with braces - xpath

I have a string (ideally an XPath) that I need to parse so that its opening and closing brackets, '(' , ')' , '[' , ']', are balanced.
Ex: //view/section/row[(cell/data[@value='Other Roles']) and (cell/data[contains(@value,'336')]) and (cell/data[contains(@value,'0')]) and (cell/data[contains(@value,'320')]) and (cell/data[contains(@value,'16')]) and (cell/data[contains(@value,'0')]) and (cell/data[contains(@value,'0')]) and (cell/data[contains(@value,'0')]) and (cell/data[contains(@value,'0')]) and (cell/data[contains(@value,'0')]) and (cell/data[contains(@value,'0')]) and (cell/data[contains(@value,'0')])
Can you please suggest an efficient way to add the missing brackets or parentheses so that the XPath works?
Thanks in advance
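One possible approach, sketched here in Python under the assumption that the only damage is closers missing at the end of the string: scan the expression once, keep a stack of the closers still expected, skip anything inside quoted literals, and append whatever is left on the stack. The function name balance_brackets is hypothetical, and the sketch cannot guess where a closer that is missing in the middle of the expression belongs.

```python
def balance_brackets(expr: str) -> str:
    """Append the closers needed to balance ( ) and [ ] in expr.

    Assumption: brackets are only missing at the end of the expression.
    Brackets inside '...' or "..." string literals are ignored.
    """
    closer = {"(": ")", "[": "]"}
    stack = []    # closers we still expect, innermost last
    quote = None  # quote character of the literal we are inside, if any
    for ch in expr:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in closer:
            stack.append(closer[ch])
        elif ch in ")]":
            if not stack or stack[-1] != ch:
                raise ValueError(f"unexpected {ch!r} in expression")
            stack.pop()
    if quote:
        raise ValueError("unterminated string literal")
    # Close the innermost open bracket first.
    return expr + "".join(reversed(stack))


broken = "//row[(cell/data[@value='Other Roles']) and (cell/data[contains(@value,'336')])"
print(balance_brackets(broken))  # same expression with a final ']' appended
```

The scan is a single pass over the string, so it is linear in the length of the expression.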

Related

XPath V1.0 contains() not specific enough

I have an application that requires me to find an XPath selector for an element and then see if that XPath can be simplified.
So if I have
<a class="abc def gh">
I may determine that the XPath
a[contains(@class, "abc")]
is specific enough. The problem is, it also selects items with class "abcxyz".
Is there a way to select items with ONLY class "abc"?
That is, I want to find items that have a class of "abc" or "abc def" but not "abcxyz".
Here's a more specific example because I believe neither of the answers so far works:
<div>
<span id="x" class="btnSalePriceLabel">Sale:</span>
<span id="y" class="btnSalePrice highlight">$20.40</span>
</div>
I want whatever XPath selector will select the 2nd span and not the first.
If I try
//span[@class and contains(concat(' ', normalize-space(@class), ' '), ' btnSalesPrice ')]
I get nothing selected. Likewise with
//span[contains(concat(' ', normalize-space(@class), ' '), ' btnSalesPrice ')]
Since class is a multi-valued attribute, you have to account for the spaces between the values with concat():
//a[contains(concat(' ', normalize-space(@class), ' '), ' abc ')]
Note that CSS selectors have this ability to match specific class values built in:
a.abc
I think you can see which is more concise and readable.
It is better to use CSS for these exact matches, especially with class attributes, in which case it would be:
a.abc
You can use one of the CSS-to-XPath converters available in several languages (for example, in JavaScript), and its translation would be:
descendant-or-self::a[@class and contains(concat(' ', normalize-space(@class), ' '), ' abc ')]
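To see why the padded test matches whole class names only, the two predicates can be mimicked on plain strings. This is a rough Python sketch, not an XPath engine: the helper names are hypothetical, and str.split() is used as an approximation of normalize-space(). It also shows that the attempts in the question look for the token btnSalesPrice, while the markup spells the class btnSalePrice, which would explain the empty result.

```python
def xpath_contains(attr: str, needle: str) -> bool:
    """Mimic contains(@class, needle): a plain substring test."""
    return needle in attr


def has_class_token(attr: str, token: str) -> bool:
    """Mimic contains(concat(' ', normalize-space(@class), ' '), ' token ')."""
    normalized = " ".join(attr.split())  # approximation of normalize-space()
    return f" {token} " in f" {normalized} "


print(xpath_contains("abcxyz", "abc"))       # True: substring match is too broad
print(has_class_token("abcxyz", "abc"))      # False
print(has_class_token("abc def gh", "abc"))  # True
print(has_class_token("btnSalePrice highlight", "btnSalePrice"))   # True
print(has_class_token("btnSalePrice highlight", "btnSalesPrice"))  # False: spelling differs
```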

Can't use the right XPath expression for a certain item

I have tried a lot but can't extract the item from this element using XPath.
<div class="info-list-text"><b>Contact</b>: James Crisp</div>
I tried this XPath expression, but without luck:
//div[@class="info-list-text"]/text()
Thanks in advance for looking into this.
By the way, I want to get "James Crisp".
Try this:
normalize-space( translate( //div[@class="info-list-text"]/text() , ':', '' ) )
It works as follows:
Get the text node from the <div>
Translate ':' into the empty string
Then strip the leading and trailing whitespace with normalize-space()

xpath expression to remove whitespace

I have this HTML:
<tr class="even expanded first">
<td class="score-time status">
<a href="/matches/2012/08/02/europe/uefa-cup/">
16 : 00
</a>
</td>
</tr>
I want to extract the (16 : 00) string without the extra whitespace. Is this possible?
I. Use this single XPath expression:
translate(normalize-space(/tr/td/a), ' ', '')
Explanation:
normalize-space() produces a new string from its argument, in which any leading or trailing white-space (space, tab, NL or CR characters) is deleted and any intermediary white-space is replaced by a single space character.
translate() takes the result produced by normalize-space() and produces a new string in which each of the remaining intermediary spaces is replaced by the empty string.
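The two steps can be traced on a plain string. The following Python sketch re-implements normalize-space() and translate() as described above purely for illustration; the function names are hypothetical stand-ins, and the sample text (with its indentation) is assumed from the HTML in the question.

```python
import re


def normalize_space(s: str) -> str:
    """Trim leading/trailing whitespace and collapse inner runs to one space."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def translate(s: str, src: str, dst: str) -> str:
    """Replace each character of src with the character at the same
    position in dst; characters of src with no counterpart are removed."""
    mapping = {}
    for i, ch in enumerate(src):
        if ch not in mapping:  # the first occurrence in src wins
            mapping[ch] = dst[i] if i < len(dst) else ""
    return "".join(mapping.get(ch, ch) for ch in s)


raw = "\n    16 : 00\n  "        # string value of the <a> element (assumed)
step1 = normalize_space(raw)     # '16 : 00'
step2 = translate(step1, " ", "")
print(repr(step1), repr(step2))  # '16 : 00' '16:00'
```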
II. Alternatively:
translate(/tr/td/a, ' &#9;&#10;&#13;', '')
Please try the below xpath expression :
//td[@class='score-time status']/a[normalize-space() = '16 : 00']
You can use XPath's normalize-space() as in //a[normalize-space()="16 : 00"]
I came across this thread when I was having my own issue similar to above.
HTML
<div class="d-flex">
<h4 class="flex-auto min-width-0 pr-2 pb-1 commit-title">
<a href="/nsomar/OAStackView/releases/tag/1.0.1">
1.0.1
</a>
XPath start command
tree.xpath('//div[@class="d-flex"]/h4/a/text()')
However this grabbed random whitespace and gave me the output of:
['\n ', '\n 1.0.1\n ']
Adding the predicate [normalize-space()] removed the whitespace-only node and left me with just what I wanted:
tree.xpath('//div[@class="d-flex"]/h4/a/text()[normalize-space()]')
['\n 1.0.1\n ']
I could then grab the first element of the list, and use strip() to remove any further whitespace
XPath final command
tree.xpath('//div[@class="d-flex"]/h4/a/text()[normalize-space()]')[0].strip()
Which left me with exactly what I required:
1.0.1
You can check whether text() nodes are empty:
/path/text()[not(.='')]
This may be useful with axes like following-sibling:: if there are no containers, or with child::.
You can use string(), or the regular-expression functions of XPath 2.0 such as matches() and replace().
NOTE: some comments say that XPath cannot do string manipulation. Even though it is not really designed for that, you can do basic things: contains(), starts-with(), and, in XPath 2.0, replace().
Checking for whitespace-only nodes is much harder, as you will generally have a node list as the result set, and most XPath functions, like matches() or replace(), operate on only one node.
You can separate node selection from string manipulation:
use XPath to retrieve a container, or a list of text nodes, and then process it with another language (Java, PHP, Python, Perl, for instance).
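As a rough illustration of that split, the sketch below uses Python's standard xml.etree.ElementTree (which supports only a small XPath subset) to select the link, then cleans the text in Python. The markup is the snippet from the earlier answer with its closing tags filled in as an assumption, and link_texts is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Snippet from the earlier answer; the closing tags are assumed.
SNIPPET = """<div class="d-flex">
  <h4 class="flex-auto min-width-0 pr-2 pb-1 commit-title">
    <a href="/nsomar/OAStackView/releases/tag/1.0.1">
      1.0.1
    </a>
  </h4>
</div>"""


def link_texts(markup):
    """Select the <a> nodes with a path, then clean their text in Python."""
    root = ET.fromstring(markup)     # root is the <div>
    anchors = root.findall("./h4/a")  # node selection
    texts = (" ".join("".join(a.itertext()).split()) for a in anchors)
    return [t for t in texts if t]    # string clean-up, empty results dropped


print(link_texts(SNIPPET))  # ['1.0.1']
```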

XPath to find elements that do not have an id or class

How can I get all tr elements without id attribute?
<tr id="name">...</tr>
<tr>...</tr>
<tr>...</tr>
Thanks
Pretty straightforward:
//tr[not(@id) and not(@class)]
That will give you all tr elements lacking both id and class attributes. If you want all tr elements lacking at least one of the two, use or instead of and:
//tr[not(@id) or not(@class)]
When attributes and elements are used in this way, an attribute or element that exists is treated as true. If it is missing, it is treated as false.
If you're looking for an element that has class a but doesn't have class b, you can do the following.
//*[contains(@class, 'a') and not(contains(@class, 'b'))]
Or, if you want to be sure not to match partial class names:
//*[contains(concat(' ', normalize-space(@class), ' '), ' some-class ') and
not(contains(concat(' ', normalize-space(@class), ' '), ' another-class '))]
Can you try //tr[not(@id)]?
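For comparison, here is the same filter written in Python with the standard xml.etree.ElementTree module. Its XPath subset has no not() function, so this sketch does the attribute test in Python instead; with a full XPath 1.0 engine the expressions above can be used directly. The sample table and the helper name rows_without are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample table, modelled on the rows in the question.
TABLE = """<table>
  <tr id="name"><td>a</td></tr>
  <tr><td>b</td></tr>
  <tr class="odd"><td>c</td></tr>
</table>"""


def rows_without(markup, *attrs):
    """Return the <tr> elements that carry none of the given attributes."""
    root = ET.fromstring(markup)
    return [tr for tr in root.iter("tr")
            if not any(name in tr.attrib for name in attrs)]


# Equivalent of //tr[not(@id)]
print([tr.find("td").text for tr in rows_without(TABLE, "id")])           # ['b', 'c']
# Equivalent of //tr[not(@id) and not(@class)]
print([tr.find("td").text for tr in rows_without(TABLE, "id", "class")])  # ['b']
```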

(ruby) help matching my regular expression

I am trying to match the value of the following HTML snippet:
<input name="example" type="hidden" value="matchTextHere" />
with the following:
x = response.match(/<input name="example" type="hidden" value="^.+$" \/>/)[0]
Why is this not working? It doesn't match 'matchTextHere'.
Edit:
When I use:
x = response.match(/<input name="example" type="hidden" value="(.+)" \/>/)[0]
It matches the whole HTML element, and not just the value 'matchTextHere'.
^ matches the start of a line and $ matches the end of a line. Change ^.+$ to \w+ and it will work for values that don't contain any symbols. Make it a parenthesized group to capture the value: (\w+)
Update: to match anything between the quotes (assuming that there aren't any quotes in the value), use [^"]+. If there are escaped quotes in the value, it is a different ballgame. .+ will work in this case, but it will be slower due to backtracking: .+ first matches up to the end of the string (because . matches even a "), then looks for a " and fails. Then it steps back one position, looks for a " and fails again, and so on until it finds a ". If there were one more attribute after value, you would get matchTextHere" nextAttr="something as the match.
x = response.match(/<input name="example" type="hidden" value="([^"]+)" \/>/)[1]
That being said, the regex will fail if there is an extra space between any of the attribute values. Parsing html with regex is not a good idea - and if you must use regex, you can allow extra spaces using \s+
/<input\s+name="example"\s+type="hidden"\s+value="([^"]+)"\s*\/>/
Because you have a start-of-line token (^) and an end-of-line token ($) in your regular expression. I think you meant to capture the value, this might solve your problem: value="(.+?)".
Beware, though, that processing html with regular expressions is not a good idea, it can even drive you crazy. Better use an html parser instead.
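As a sketch of the parser route, here is the same extraction done with an HTML parser rather than a regex. It is written in Python with the standard html.parser module rather than Ruby, and the class and function names are hypothetical. Because the parser hands over attributes as name/value pairs, extra spaces or a different attribute order do not matter.

```python
from html.parser import HTMLParser


class InputValueParser(HTMLParser):
    """Collect the value attribute of every <input> with a given name."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        attr = dict(attrs)
        if tag == "input" and attr.get("name") == self.name:
            self.values.append(attr.get("value"))


def input_values(markup, name):
    parser = InputValueParser(name)
    parser.feed(markup)
    parser.close()
    return parser.values


response = '<input name="example" type="hidden" value="matchTextHere" />'
print(input_values(response, "example"))  # ['matchTextHere']
```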
You don't need the ^ and $:
x = response.match(/<input name="example" type="hidden" value=".+" \/>/)[0]
You just need to change [0] to [1]:
response='<input name="example" type="hidden" value="matchTextHere" />'
puts response.match(/<input name="example" type="hidden" value="(.*?)" \/>/)[1]
matchTextHere
