Stop search after matching certain element (XPath) - xpath

I need to match elements that have both the block and result classes, up to the first occurrence of an element with class block but NOT result.
XPath:
//div[contains(@class, 'result') and contains(@class, 'block')][following-sibling::div[contains(@class, 'block') and not(contains(@class, 'result'))]]
Example 1 (works in this case):
<div class="block result"></div> <!-- match this -->
<div class="block result"></div> <!-- match this -->
<div class="block"></div>
<div class="block result"></div> <!-- DONT match this -->
<div class="block result"></div> <!-- DONT match this -->
Example 2 (doesn't match anything here)
<div class="block result"></div> <!-- match this -->
<div class="block result"></div> <!-- match this -->
... so it doesn't match anything in the second example. Can I make the following-sibling condition optional so that it matches in both cases?

Not pretty, but should work...
//div[contains(concat(' ', @class, ' '), ' block ') and contains(concat(' ', @class, ' '), ' result ') and not(preceding-sibling::div[contains(concat(' ', @class, ' '), ' block ') and not(contains(concat(' ', @class, ' '), ' result '))])]
It should match divs with both the block and result classes but only if they don't have a preceding sibling div that contains a block class with no result class.
See answers in this question to see why I'm using concat():
How can I match on an attribute that contains a certain string?
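To check the expression against both examples from the question, here is a minimal sketch using the JDK's built-in XPath 1.0 engine (javax.xml.xpath). The class name LeadingResultBlocks, the helper has() and the <root> wrapper element are assumptions made for illustration; the question's markup is wrapped so that it parses as well-formed XML.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class LeadingResultBlocks {
    // Token test for one class name; concat() pads @class so only whole names match.
    static String has(String name) {
        return "contains(concat(' ', @class, ' '), ' " + name + " ')";
    }

    // block+result divs that have no preceding sibling with block but without result.
    static final String XPATH =
        "//div[" + has("block") + " and " + has("result")
        + " and not(preceding-sibling::div[" + has("block")
        + " and not(" + has("result") + ")])]";

    // Counts the nodes the expression selects in the given XML string.
    static int countMatches(String xml) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(XPATH, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String result = "<div class=\"block result\"></div>";
        String stop = "<div class=\"block\"></div>";
        // Example 1: two results, a plain block, then two more results.
        String example1 = "<root>" + result + result + stop + result + result + "</root>";
        // Example 2: only two results, no plain block at all.
        String example2 = "<root>" + result + result + "</root>";
        System.out.println(countMatches(example1)); // 2
        System.out.println(countMatches(example2)); // 2
    }
}
```

Because the condition looks backwards (preceding-sibling) instead of forwards (following-sibling), the missing plain block in Example 2 no longer prevents a match.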

EDIT:
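You can collect every element whose class contains 'block' into a List, then iterate over it and check with getAttribute("class") whether each one contains "result", stopping at the first one that does not.
// Assumes java.util.List, org.openqa.selenium.By and org.openqa.selenium.WebElement are imported,
// and that driver is an existing WebDriver.
List<WebElement> aux = driver.findElements(By.xpath("//div[contains(@class, 'block')]"));
for (WebElement a : aux) {
    if (a.getAttribute("class").contains("result")) {
        System.out.println(a.getText()); // or save it in another list
    } else {
        break; // first block without result: stop here
    }
}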

Related

xpath:how to find a node that not contains text?

I have HTML like this:
...
<div class="grid">
"abc"
<span class="searchMatch">def</span>
</div>
<div class="grid">
<span class="searchMatch">def</span>
</div>
...
I want to get the div that contains no text, but the XPath
//div[@class='grid' and text()='']
doesn't seem to work. And if I don't know the text that the other divs have, how can I find the node?
Let's suppose I have inferred the requirement correctly as:
Find all <div> elements with @class='grid' that have no directly-contained non-whitespace text content, i.e. no non-whitespace text content unless it's within a child element like a <span>.
Then the answer to this is
//div[@class='grid' and not(text()[normalize-space(.)])]
You need not() combined with normalize-space():
//div[@class='grid' and not(normalize-space(text()))]
or
//div[@class='grid' and normalize-space(text())='']
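The answers above are not quite interchangeable. In XPath 1.0, passing a node-set to normalize-space() uses only the first node in document order, so normalize-space(text()) inspects just the first text child, while text()[normalize-space(.)] tests every text child. The sketch below, using the JDK's javax.xml.xpath, illustrates this; the class name TextlessGrid, the <root> wrapper and the second "trailing text" document are assumptions added for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TextlessGrid {
    // Tests every direct text child of the div.
    static final String ALL_TEXT_NODES = "//div[@class='grid' and not(text()[normalize-space(.)])]";
    // Tests only the first direct text child (XPath 1.0 node-set to string conversion).
    static final String FIRST_TEXT_NODE = "//div[@class='grid' and not(normalize-space(text()))]";

    // Counts the nodes an XPath 1.0 expression selects in the given XML string.
    static int count(String xpath, String xml) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The markup from the question, wrapped in a root element.
        String page = "<root>"
            + "<div class=\"grid\">\n\"abc\"\n<span class=\"searchMatch\">def</span>\n</div>"
            + "<div class=\"grid\">\n<span class=\"searchMatch\">def</span>\n</div>"
            + "</root>";
        System.out.println(count(ALL_TEXT_NODES, page));  // 1
        System.out.println(count(FIRST_TEXT_NODE, page)); // 1

        // Hypothetical variant: the only real text comes after the span.
        String trailing = "<root><div class=\"grid\">\n<span>def</span>tail</div></root>";
        System.out.println(count(ALL_TEXT_NODES, trailing));  // 0
        System.out.println(count(FIRST_TEXT_NODE, trailing)); // 1, the first text child is only whitespace
    }
}
```

For the markup in the question both forms select the same single div; they diverge only when text appears after a leading whitespace-only text node.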

XPath V1.0 contains() not specific enough

I have an application that requires me to find an XPath selector for an element and then see if that XPath can be simplified.
So if I have
<a class="abc def gh">
I may determine that the XPath
a[contains(@class, "abc")]
is specific enough. The problem is, it also selects items with class "abcxyz".
Is there a way to select items with ONLY class "abc"?
To be clear, I want to find items that have a class of "abc" or "abc def" but not "abcxyz".
Here's a more specific example because I believe neither of the answers so far works:
<div>
<span id="x" class="btnSalePriceLabel">Sale:</span>
<span id="y" class="btnSalePrice highlight">$20.40</span>
</div>
I want whatever XPath selector will select the 2nd span and not the first.
If I try
//span[@class and contains(concat(' ', normalize-space(@class), ' '), ' btnSalesPrice ')]
I get nothing selected. Likewise with
//span[contains(concat(' ', normalize-space(@class), ' '), ' btnSalesPrice ')]
Since the class attribute is multi-valued, you have to account for the spaces between its values with concat():
//a[contains(concat(' ', normalize-space(@class), ' '), ' abc ')]
Note that CSS selectors have this ability to match specific class values built-in:
a.abc
I think you can see what is more concise and readable.
It is better to use CSS for exact matches like this, especially with class attributes, in which case it would be:
a.abc
You can use one of the CSS-to-XPath converters available for several languages (check this one for example, in JavaScript), and its translation would be:
descendant-or-self::a[@class and contains(concat(' ', normalize-space(@class), ' '), ' abc ')]
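The difference between a plain substring test and the padded token test can be checked against the span example from the question with a minimal sketch using the JDK's javax.xml.xpath. The class name ClassTokenMatch and the helper names are assumptions for illustration. One detail worth checking in the question's own attempts: they search for btnSalesPrice, while the class in the markup is btnSalePrice, which on its own would explain an empty result.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ClassTokenMatch {
    // The markup from the question.
    static final String XML = "<div>"
        + "<span id=\"x\" class=\"btnSalePriceLabel\">Sale:</span>"
        + "<span id=\"y\" class=\"btnSalePrice highlight\">$20.40</span>"
        + "</div>";

    // Substring test: also matches longer class names that merely start with the value.
    static String substringTest(String name) {
        return "//span[contains(@class, '" + name + "')]";
    }

    // Token test: pads both sides with spaces so only a whole class name matches.
    static String tokenTest(String name) {
        return "//span[contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')]";
    }

    // Counts the nodes an XPath 1.0 expression selects in XML.
    static int count(String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(substringTest("btnSalePrice"))); // 2, both spans
        System.out.println(count(tokenTest("btnSalePrice")));     // 1, only the second span
        System.out.println(count(tokenTest("btnSalesPrice")));    // 0, misspelled class name
    }
}
```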

xpath search for class and text - combine two xpath selector

I know there are thousands of simple questions about XPath, but I don't understand how to combine two not-so-simple expressions...
My xml structure:
<div class="some-container">
<div class="btn btn-blue">
<div class="btn-text"><!-- Select by class -->
<span> <!-- Select by text-->
Download
</span>
</div>
</div>
</div>
Select by class
I have managed to select the div by searching for its class:
//*/div[contains(concat(' ', @class, ' '), ' btn-text')]
Select by text
To also select the span I know I can simply append /span, but I want to select it by its text.
For that use case I found this XPath (from here):
//*/text()[normalize-space(.)='Download']/parent::*
Both selectors work properly on their own, but I want to combine them:
Search for class "btn"
Search for text inside the span that matches exactly
I tried to concatenate them like this, but it doesn't work (test example):
//div[contains(concat(' ', @class, ' '), ' btn-text')]/text()[normalize-space(.)='Download']/parent::*
Even if it worked, it would not select by the span tag.
Can anyone help?
Your XPath is looking for a text node directly inside the div, but the text node you're looking for is inside a span. That's why it's not succeeding.
To get it to work, just change the XPath to look for the span and not the text node:
//div[contains(concat(' ', @class, ' '), ' btn-text ')]/span[normalize-space(.)='Download']
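To see the difference between the question's attempt and the corrected expression, here is a minimal sketch that runs both against the markup from the question with the JDK's javax.xml.xpath. The class name ClassAndText and the constant names are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ClassAndText {
    // The markup from the question, with the whitespace around "Download" preserved.
    static final String XML = "<div class=\"some-container\">"
        + "<div class=\"btn btn-blue\">"
        + "<div class=\"btn-text\">\n<span>\nDownload\n</span>\n</div>"
        + "</div></div>";

    // The question's attempt: looks for a text node directly inside the div.
    static final String ATTEMPT = "//div[contains(concat(' ', @class, ' '), ' btn-text')]"
        + "/text()[normalize-space(.)='Download']/parent::*";

    // The answer: select the span child and compare its normalized string value.
    static final String ANSWER = "//div[contains(concat(' ', @class, ' '), ' btn-text ')]"
        + "/span[normalize-space(.)='Download']";

    // Counts the nodes an XPath 1.0 expression selects in XML.
    static int count(String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(ATTEMPT)); // 0, the div's own text nodes are only whitespace
        System.out.println(count(ANSWER));  // 1, the span whose text is "Download"
    }
}
```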

preg_match_all skips one nested tag

If you look at this tag:
$text = '<div class="inner">
<div class="left">
<h4>text </h4>
<p>Abdijstreet 42b<br>2000 city </p>
</div>
<div class="right">
<span class="red">10:00 - 14:00</span>
</div>
</div>';
I use this to preg_match:
preg_match_all("'<div class=\"inner\">(.*?)</div>'si", $text, $match); // the ul tags
$match[1] = array_splice($match[0], 0);
foreach($match[1] as $val) // whole page
{
echo $val;
}
Well i tried many things, but i only get whats between and never what i need for , what am i doing wrong?
Are you trying to get everything between the beginning and ending div tags? If so, then you're really close. All you need to do is remove the question mark ? from your expression. The question mark makes the quantifier lazy: it stops matching as soon as the next item in the regex can match. In this case, the next item is a closing div tag, so the match ends at the first one it finds. If you leave the question mark out, the quantifier is greedy and keeps matching until the last closing div tag it can find.
$text = '<div class="inner">
<div class="left">
<h4>text </h4>
<p>Abdijstreet 42b<br>2000 city </p>
</div>
<div class="right">
<span class="red">10:00 - 14:00</span>
</div>
</div>';
preg_match_all("'<div class=\"inner\">(.*)</div>'si", $text, $match);
print "<pre><font color=red>"; print_r($match); print "</font></pre>";
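The lazy versus greedy behaviour described above can be reproduced with a minimal sketch. It uses Java's java.util.regex rather than PHP's PCRE, which is an assumption of this example; the two engines treat .*? and .* the same way for this pattern. The class name GreedyVsLazy and the shortened sample text are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyVsLazy {
    // Shortened version of the markup from the question.
    static final String TEXT = "<div class=\"inner\">\n"
        + "<div class=\"left\">\n<h4>text </h4>\n</div>\n"
        + "<div class=\"right\">\n<span class=\"red\">10:00 - 14:00</span>\n</div>\n"
        + "</div>";

    static final String LAZY = "<div class=\"inner\">(.*?)</div>";
    static final String GREEDY = "<div class=\"inner\">(.*)</div>";

    // Returns capture group 1 of the first match, or null when nothing matches.
    // DOTALL and CASE_INSENSITIVE correspond to the s and i modifiers in the PHP pattern.
    static String firstGroup(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL | Pattern.CASE_INSENSITIVE).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Lazy: ends at the first closing div, so the "right" block is cut off.
        System.out.println(firstGroup(LAZY, TEXT).contains("right"));   // false
        // Greedy: runs to the last closing div, so both nested blocks are captured.
        System.out.println(firstGroup(GREEDY, TEXT).contains("right")); // true
    }
}
```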
If you're trying to pull out each item in a div, then you'd probably want to consider using DOM instead of REGEX to tackle this problem. But since you used the preg-match tag, then here it is in REGEX:
preg_match_all('~<div class="(?!inner).*?>\K(.*?)(?=</div>)~ims', $text, $matches);
print "<PRE><FONT COLOR=BLUE>"; print_r($matches[1]); print "</FONT></PRE>";
That gives you this:
Array
(
[0] =>
<h4>text </h4>
<p>Abdijstreet 42b<br>2000 city </p>
[1] =>
<span class="red">10:00 - 14:00</span>
)
Explanation of the REGEX:
<div class=" (?!inner) .*? > \K (.*?) (?=</div>)
^            ^         ^   ^ ^  ^     ^
1            2         3   4 5  6     7
1. <div class=" Look for a literal opening div tag <div, followed by a space, followed by the word class, followed by an equal sign, followed by a quotation mark.
2. (?!inner) This is a negative lookahead (?!) that makes sure the word inner is not coming up next.
3. .*? Matches any one character ., zero or more times *, all the way up until it hits the next item in our regular expression ?. In this case, it will stop once it finds a closing HTML bracket.
4. > Find a closing HTML bracket.
5. \K This tells the expression to forget everything it has matched so far and start matching again from here. This basically makes sure that the first part of the expression is there, but does not store it for us to work with.
6. (.*?) Same as number 3, except we use parentheses () around it so we can capture it and do something with it later.
7. (?=</div>) This is a positive lookahead (?=) that makes sure the closing div tag </div> is coming up at the end of the expression, but does not capture it.
Here is a working demo of the code above

XPath Expressions - find elements without specifying the entire path

I am doing some screen scraping with a library that takes XPath expressions and noticed that several pages are similar, but different.
Is there a way to loosely say "get me divs that have class='mytarget' but exist as a child of a div with class = 'nav' and the exact path is unknown between nav and mytarget."
<div class="nav">
<div>
??????
<div class="mytarget"></div>
??????
</div>
</div>
Yes, using the descendant-or-self axis (//):
//div[@class='nav']//div[@class='mytarget']
Or, if there can be more than one class name on those elements, then this is even better:
//div[contains(concat(' ', @class, ' '), ' nav ')]//
div[contains(concat(' ', @class, ' '), ' mytarget ')]
Warning: this can be very inefficient on large documents. You should use absolute paths wherever the structure is known. Only resort to // when the structure is unknown.
That's what "//" expressions are for. Something like:
//*[@class="nav"]//*[@class="mytarget"]
http://www.w3schools.com/xpath/xpath_syntax.asp
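As a quick check that // between the two steps really ignores the intermediate path, here is a minimal sketch using the JDK's javax.xml.xpath. The class name DescendantSearch, the <root> wrapper, the intermediate <section> element and the second mytarget div placed outside nav are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DescendantSearch {
    // One mytarget div nested at unknown depth inside nav, one outside nav.
    static final String XML = "<root>"
        + "<div class=\"nav\"><div><section><div class=\"mytarget\"></div></section></div></div>"
        + "<div class=\"mytarget\"></div>"
        + "</root>";

    // Counts the nodes an XPath 1.0 expression selects in XML.
    static int count(String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Any depth below nav: finds only the nested target.
        System.out.println(count("//div[@class='nav']//div[@class='mytarget']")); // 1
        // Direct children only: the target is not a child of nav, so nothing matches.
        System.out.println(count("//div[@class='nav']/div[@class='mytarget']"));  // 0
        // Without the nav restriction both targets are selected.
        System.out.println(count("//div[@class='mytarget']"));                    // 2
    }
}
```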

Resources