Xpath Get elements that are between 2 elements - xpath

Can anyone tell me if it's possible to select only the divs 2a and 2b from this HTML fragment?
The problem is that the divs are not children of the h4 element, so the XPath query should say something like "get the divs that are between the h4='Two' and the h4 that comes right after h4='Two'".
Note that I want the query to be dynamic: you tell it the start element (h4='Two') and the end element (any h4), and which filter to apply to the elements in between.
<h4>One</h4>
<div>1a</div>
<div>1b</div>
<div>1c</div>
<h4>Two</h4>
<div>2a</div>
<div>2b</div>
<h4>Three</h4>
<div>3a</div>
<div>3b</div>
<div>3c</div>

div[preceding-sibling::h4[1] = 'Two']

div[preceding-sibling::h4='Two' and following-sibling::h4='Three']
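To make the first expression dynamic, the heading text can be passed in as an XPath variable. The following is a minimal sketch in Python with lxml; the wrapping container div and the helper name `divs_under_heading` are assumptions added for illustration, not part of the question.

```python
from lxml import etree

# The fragment from the question, wrapped in a container div (an assumption)
# so that all h4 and div elements are siblings under one parent.
FRAGMENT = """<div id="container">
<h4>One</h4>
<div>1a</div>
<div>1b</div>
<div>1c</div>
<h4>Two</h4>
<div>2a</div>
<div>2b</div>
<h4>Three</h4>
<div>3a</div>
<div>3b</div>
<div>3c</div>
</div>"""

root = etree.HTML(FRAGMENT)


def divs_under_heading(root, heading):
    """Return the text of each div whose nearest preceding h4 sibling equals `heading`."""
    # preceding-sibling is a reverse axis, so h4[1] is the closest h4 before the div.
    # $heading is an XPath variable that lxml binds from the keyword argument.
    nodes = root.xpath("//div[preceding-sibling::h4[1] = $heading]", heading=heading)
    return [node.text for node in nodes]


print(divs_under_heading(root, "Two"))  # ['2a', '2b']
```

Because the predicate only looks at the nearest preceding h4, the end element does not have to be named: the next h4 implicitly closes the range. The second expression hard-codes both boundaries instead.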

Related

XPath with specific following sibling case

I have structure that looks something like this
<p>
<br>
<b>Text to fetch </b>
<br>
"Some random text"
<b>Text not to fetch</b>
I need an XPath that fetches the following sibling of the br element only if there is no text between the br element and that sibling.
If I do something like this
//br/following-sibling::b/text()[1]
It will fetch both Text to fetch and Text not to fetch, while I only need Text to fetch.
Another possible XPath:
//br/following-sibling::node()[normalize-space()][1][self::b]/text()
Brief explanation:
//br/following-sibling::node(): find all nodes that are following siblings of a br element, where the nodes are...
[normalize-space()]: not empty and not whitespace-only, then...
[1]: for each br found, take only the first such node, then...
[self::b]: check whether that node is a b element, and if it is...
/text(): return the text node that is a child of the b element
Try below XPath to avoid matching b nodes with preceding sibling text:
//br/following-sibling::b[not(preceding-sibling::text()[1][normalize-space()])]/text()
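Both answers can be checked against the markup from the question. This is a small sketch using Python with lxml; the closing `</p>` tag and the variable names are assumptions added so the fragment parses predictably.

```python
from lxml import etree

# Markup from the question; the closing </p> is an assumption.
HTML = """<p>
<br>
<b>Text to fetch </b>
<br>
"Some random text"
<b>Text not to fetch</b>
</p>"""

root = etree.HTML(HTML)

# The attempt from the question: matches every b that follows any br.
naive = root.xpath("//br/following-sibling::b/text()[1]")

# First answer: the first non-blank node after each br must itself be a b.
first_node_is_b = root.xpath(
    "//br/following-sibling::node()[normalize-space()][1][self::b]/text()"
)

# Second answer: reject any b whose nearest preceding text sibling is non-blank.
no_text_before_b = root.xpath(
    "//br/following-sibling::b[not(preceding-sibling::text()[1][normalize-space()])]/text()"
)

print([t.strip() for t in naive])             # ['Text to fetch', 'Text not to fetch']
print([t.strip() for t in first_node_is_b])   # ['Text to fetch']
print([t.strip() for t in no_text_before_b])  # ['Text to fetch']
```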

XPath selecting specific child element

I have a problem with XPath syntax in HTML. I want to select an item that is inside a div.
I have a div identified by the id "popin".
Inside this div, I have a span whose id is "id_yes".
I can get the div with //DIV[contains(@id ,'popin')] but I failed to get the span element.
Do you have a solution?
If you have the ID, you can use:
//span[@id="id_yes"]
If you want to be more specific, //div[@id="popin"]/span[@id="id_yes"]
That, assuming your IDs are unique.
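As a quick check, here is a sketch in Python with lxml. The question does not show its markup, so the HTML below is a hypothetical reconstruction in which the span is a direct child of the div.

```python
from lxml import etree

# Hypothetical markup: the question only describes the ids, not the structure.
HTML = """<div id="popin">
<span id="id_yes">Yes</span>
<span id="id_no">No</span>
</div>"""

root = etree.HTML(HTML)

# Select by id alone.
by_id = root.xpath('//span[@id="id_yes"]')

# Select the span only when it is a direct child of the div with id "popin".
scoped = root.xpath('//div[@id="popin"]/span[@id="id_yes"]')

print([s.text for s in by_id])   # ['Yes']
print([s.text for s in scoped])  # ['Yes']
```

If the span were nested deeper inside the div, the single slash would not match; `//div[@id="popin"]//span[@id="id_yes"]` searches all descendants instead.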

using xpath to select an element after another

I've seen similar questions, but the solutions I've seen won't work on the following. I'm far from an XPath expert; I just need to parse some HTML. How can I select the table that follows Header 2? I thought my solution below should work, but apparently it doesn't. Can anyone help me out here?
content = """<div>
<p><b>Header 1</b></p>
<p><b>Header 2</b><br></p>
<table>
<tr>
<td>Something</td>
</tr>
</table>
</div>
"""
from lxml import etree
tree = etree.HTML(content)
tree.xpath("//table/following::p/b[text()='Header 2']")
Some alternatives to @Arup's answer:
tree.xpath("//p[b='Header 2']/following-sibling::table[1]")
select the first table sibling following the p containing the b header containing "Header 2"
tree.xpath("//b[.='Header 2']/following::table[1]")
select the first table in document order after the b containing "Header 2"
See XPath 1.0 specifications for details on the different axes:
the following axis contains all nodes in the same document as the context node that are after the context node in document order, excluding any descendants and excluding attribute nodes and namespace nodes
the following-sibling axis contains all the following siblings of the context node; if the context node is an attribute node or namespace node, the following-sibling axis is empty
You can use the XPath 1.0 expression below, which uses the preceding axis:
//table[preceding::p[1]/b[.='Header 2']]
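The three expressions from the answers can be run side by side on the question's markup. This is a sketch that reuses the `content` string from the question; the dictionary and its keys are just illustrative names.

```python
from lxml import etree

content = """<div>
<p><b>Header 1</b></p>
<p><b>Header 2</b><br></p>
<table>
<tr>
<td>Something</td>
</tr>
</table>
</div>
"""

tree = etree.HTML(content)

# Each expression starts from a different node but ends on the same table.
expressions = {
    "following-sibling": "//p[b='Header 2']/following-sibling::table[1]",
    "following": "//b[.='Header 2']/following::table[1]",
    "preceding": "//table[preceding::p[1]/b[.='Header 2']]",
}

for name, expr in expressions.items():
    tables = tree.xpath(expr)
    # findtext returns the text of the first td found under each table.
    print(name, [t.findtext(".//td") for t in tables])
```

On this input each expression selects exactly one table. They differ on other inputs: `following-sibling` requires the table to share a parent with the p, while `following` and `preceding` work across nesting levels.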

Use Nokogiri to get all nodes in an element that contain a specific attribute name

I'd like to use Nokogiri to extract all nodes in an element that contain a specific attribute name.
e.g., I'd like to find the 2 nodes that contain the attribute "blah" in the document below.
@doc = Nokogiri::HTML::DocumentFragment.parse <<-EOHTML
<body>
<h1 blah="afadf">Three's Company</h1>
<div>A love triangle.</div>
<b blah="adfadf">test test test</b>
</body>
EOHTML
I found this suggestion (below) at this website: http://snippets.dzone.com/posts/show/7994, but it doesn't return the 2 nodes in the example above. It returns an empty array.
# get elements with attribute:
elements = @doc.xpath("//*[@*[blah]]")
Thoughts on how to do this?
Thanks!
I found this here
elements = @doc.xpath("//*[@*[blah]]")
This is not a useful XPath expression. It says to give you all elements that have attributes that have child elements named 'blah'. And since attributes can't have child elements, this XPath will never return anything.
The DZone snippet is confusing in that when they say
elements = @doc.xpath("//*[@*[attribute_name]]")
the inner square brackets are not literal; they're there to indicate where you put in the attribute name, whereas the outer square brackets are literal. :-p
They also have an extra * in there, after the @.
What you want is
elements = @doc.xpath("//*[@blah]")
This will give you all the elements that have an attribute named 'blah'.
You can use CSS selectors:
elements = @doc.css "[blah]"
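The XPath itself is not specific to Nokogiri. As a cross-check, here is a sketch in Python with lxml rather than Ruby (a swapped-in library, chosen only to keep the examples in one language), run against the same markup.

```python
from lxml import etree

# Markup from the question, parsed with lxml instead of Nokogiri.
HTML = """<body>
<h1 blah="afadf">Three's Company</h1>
<div>A love triangle.</div>
<b blah="adfadf">test test test</b>
</body>"""

root = etree.HTML(HTML)

# Corrected expression: every element that has an attribute named "blah".
with_blah = root.xpath("//*[@blah]")
print([el.tag for el in with_blah])  # ['h1', 'b']

# Expression from the snippet: attributes cannot have child elements,
# so the inner predicate never matches and the result is empty.
from_snippet = root.xpath("//*[@*[blah]]")
print(from_snippet)  # []
```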

XPath query. Preceding-sibling of a conditionally reduced set of nodes

I have HTML like the following:
<p style="margin:0 0 0.5em 0;"><b>Blablub</b></p>
<table> ... </table>
Now I want to query the content of the <b> right above the table, but only if the table does not have any attributes. I tried the following query:
//table[not(@*)]/preceding-sibling::p/b
If I remove the preceding-sibling::p/b part entirely, it works: it gives me exactly the tables I need. However, with the full query I also get the content of a <b> tag that precedes a table WITH attributes.
Use:
//table[not(@*)]/preceding-sibling::*[1][self::p]/b
This means: Select all b elements that are children of all p elements that are the first preceding sibling of a table that has no attributes.
This is quite different from the similar-looking expression:
//table[not(@*)]/preceding-sibling::p[1]/b
The latter selects the b children of the first preceding p sibling. There is no guarantee that the first preceding p sibling is also the first preceding element sibling.
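The difference between the expressions shows up when a table without attributes is not directly preceded by a p. The following sketch uses Python with lxml; the markup is a hypothetical document built to expose that case, and the labels A and B are placeholders.

```python
from lxml import etree

# Hypothetical markup: the second table has no attributes but directly
# follows another table, not a p.
HTML = """<div>
<p><b>A</b></p>
<table border="1"><tr><td>has attributes</td></tr></table>
<table><tr><td>plain, not directly after a p</td></tr></table>
<p><b>B</b></p>
<table><tr><td>plain, directly after a p</td></tr></table>
</div>"""

root = etree.HTML(HTML)


def b_texts(expr):
    """Evaluate `expr` and return the text of each selected b element."""
    return [b.text for b in root.xpath(expr)]


# Every preceding p sibling of every attribute-free table.
all_preceding_p = b_texts("//table[not(@*)]/preceding-sibling::p/b")

# The nearest preceding p, even if other elements sit in between.
nearest_p = b_texts("//table[not(@*)]/preceding-sibling::p[1]/b")

# The immediately preceding element, kept only if it is a p.
immediate_p = b_texts("//table[not(@*)]/preceding-sibling::*[1][self::p]/b")

print(all_preceding_p)  # ['A', 'B']
print(nearest_p)        # ['A', 'B']
print(immediate_p)      # ['B']
```

Only the last expression skips A, because for the second table the immediately preceding element is the table with attributes, not a p.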
