XPath: start scraping after a certain class and stop before the next class

<div class='example'>
<h4 style="" class="">Exterior Design</h4>
<p>Paragraph 1 text</p>
<p>Paragraph 2 text</p>
<h4 class="">Interior Design</h4>
<p>Paragraph 3 text</p>
<h4 class="">Accommodation</h4>
<p>Paragraph 4 text</p>
<h4 class="">Leisure & Entertainment</h4>
<p>Paragraph 5 text</p>
<p>Paragraph 6 text</p>
<h4 class="">Amenities</h4>
<p>Paragraph 7 text</p>
</div>
I am trying to grab the text of every p tag that sits between the h4 headings in this HTML. The problem is that some sections use one p tag and some use two, and the sections appear in a different order on each page, so sometimes the "Design" text is above "Exterior Design" in the page code. I do know the text of the h4 headings, though. What I am trying to do is have XPath:
Find an h4 by name, grab all the text underneath it, and stop at the next h4 on the page, regardless of what that h4's text is.
I have tried lots of variations on this:
//h4[starts-with(., 'Accom')][1]/following-sibling::h4[1]/preceding-sibling::p
//h4[starts-with(., 'Accom')]/following-sibling::h4[1]/preceding-sibling::p/text()
So in this example, my output should be: "Paragraph 4 text" and nothing else at all. But I keep grabbing all the p tags on the page.

//h4[contains(., 'Interi')]/following-sibling::p[preceding-sibling::h4[1][contains(., 'Interi')]]
Result: "Paragraph 3 text"
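For checking the idea outside a browser, the same "start at one h4, stop at the next" logic can be written as a plain loop. This is only a sketch: it assumes Python's standard-library xml.etree.ElementTree (whose XPath subset has no following-sibling axis), a well-formed copy of the markup above, and a helper name, paragraphs_under, that is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the question's markup (the "&" is escaped for the XML parser).
HTML = """
<div class="example">
  <h4>Exterior Design</h4>
  <p>Paragraph 1 text</p>
  <p>Paragraph 2 text</p>
  <h4>Interior Design</h4>
  <p>Paragraph 3 text</p>
  <h4>Accommodation</h4>
  <p>Paragraph 4 text</p>
  <h4>Leisure &amp; Entertainment</h4>
  <p>Paragraph 5 text</p>
  <p>Paragraph 6 text</p>
  <h4>Amenities</h4>
  <p>Paragraph 7 text</p>
</div>
"""


def paragraphs_under(container, heading_prefix):
    """Return the text of each <p> between the matching <h4> and the next <h4>."""
    collected = []
    inside = False
    for child in container:  # direct children, in document order
        if child.tag == "h4":
            # Every h4 either opens the wanted section or closes it.
            inside = (child.text or "").strip().startswith(heading_prefix)
        elif child.tag == "p" and inside:
            collected.append((child.text or "").strip())
    return collected


div = ET.fromstring(HTML)
print(paragraphs_under(div, "Accom"))    # ['Paragraph 4 text']
print(paragraphs_under(div, "Leisure"))  # ['Paragraph 5 text', 'Paragraph 6 text']
```

Because the flag is reset on every h4, it does not matter how many p tags a section has or in which order the sections appear.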

Related

Select only the second preceding tag if there is one, or then just the first one

I have this HTML:
<h4>block 1</h4>
<p>paragraph 1</p>
<p>paragraph 2</p>
<table></table>
<h4>block 2</h4>
<p>paragraph 1</p>
<table></table>
As you can see, the first block contains two <p></p> tags, while the second block only has one.
I am currently using this XPath: //table/preceding::p[1], which returns:
1. <p>paragraph 2</p>
2. <p>paragraph 1</p>
However, this is what I'd like to have:
1. <p>paragraph 1</p>
2. <p>paragraph 1</p>
So basically I want the farthest preceding p tag for each table, as explained in my question title.
I want to keep using //table/preceding, as this is very important in my case.
I already tried //table/preceding::p[1 or 2], but that selects both.
I also tried //table/preceding::p[2] but that will select both paragraphs from the first block, and none from the second one.
As you can probably notice, I'm pretty new to XPath. How can I achieve the desired result?
Try this one to select the desired paragraphs:
//table/preceding-sibling::h4[1]/following-sibling::p[1]
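The expression goes from each table back to its nearest preceding h4 and then forward to the first p after that heading. A rough way to reproduce that selection in plain Python is sketched below. It assumes the standard-library xml.etree.ElementTree, a hypothetical <body> wrapper so the fragment has a single root, at most one table per block, and an illustrative helper name, first_paragraph_per_table.

```python
import xml.etree.ElementTree as ET

# The question's fragment, wrapped in a hypothetical <body> so it has one root.
HTML = """
<body>
  <h4>block 1</h4>
  <p>paragraph 1</p>
  <p>paragraph 2</p>
  <table></table>
  <h4>block 2</h4>
  <p>paragraph 1</p>
  <table></table>
</body>
"""


def first_paragraph_per_table(container):
    """For each <table>, return the first <p> of the block it belongs to."""
    results = []
    first_p = None
    for child in container:  # siblings in document order
        if child.tag == "h4":
            first_p = None  # a new block starts; forget the previous block's <p>
        elif child.tag == "p" and first_p is None:
            first_p = child  # remember only the first <p> of the block
        elif child.tag == "table" and first_p is not None:
            results.append(first_p)
    return results


found = first_paragraph_per_table(ET.fromstring(HTML))
print([p.text for p in found])  # ['paragraph 1', 'paragraph 1']
```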

How to get specific xpath tag value

<div class="container">
<span class="price">
<bdi> 140 </bdi>
</span>
<span class="price">
<del>
<bdi>90</bdi>
</del>
<ins>
<bdi> 120 </bdi>
</ins>
</span>
</div>
I want to scrape a site whose HTML is formatted like the above. I don't want the bdi value that is under the del tag; I want the bdi values that are directly under the span with class "price" and under the ins tag. Is there an XPath for this?
Doesn't the usual //span/ins/bdi/text() work for you?
It reads as "the text of a <bdi> whose parent is an <ins> whose parent is a <span>".
The CSS variant span>ins>bdi::text should also work, I suppose.
Sorry, I hadn't noticed that you need two values. In that case .xpath('//bdi[not(parent::del)]/text()').extract() will work well.
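The "parent is not del" filter can be tried without Scrapy as well. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree rather than the selector API used in the answer; since ElementTree has no parent axis, a child-to-parent map (the name parent_of is illustrative) stands in for not(parent::del).

```python
import xml.etree.ElementTree as ET

HTML = """
<div class="container">
  <span class="price">
    <bdi> 140 </bdi>
  </span>
  <span class="price">
    <del>
      <bdi>90</bdi>
    </del>
    <ins>
      <bdi> 120 </bdi>
    </ins>
  </span>
</div>
"""

root = ET.fromstring(HTML)

# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Keep every <bdi> whose direct parent is not <del>, in document order.
prices = [
    bdi.text.strip()
    for bdi in root.iter("bdi")
    if parent_of[bdi].tag != "del"
]
print(prices)  # ['140', '120']
```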

Xpath for Text of <p> tag until Following-sibling::h2[text()='some text']

I want an XPath for the text of all <p> tags that follow the first h2, up to the next h2.
The XPath should be something like //h2[text()='Title 1']/following::p, with a condition.
----other code-----
<h2>Title 1</h2>
<p>Title 1 text</p>
<p>Title 1 text</p>
<h2>Title 2</h2>
<p>Title 2 text</p>
<p>Title 2 text</p>
----other code-----
I expect the result to be Title 1 text, but the actual output is Title 1 text and Title 2 text.
Note: the number of <p> tags is not fixed.
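With XPath 2.0, use this. Both conditions must hold, so they are joined with "and"; with "or", the paragraphs after Title 2 would match too, because they also have Title 1 as a preceding sibling.
distinct-values(//p[following-sibling::h2[text()='Title 2'] and preceding-sibling::h2[text()='Title 1']]/text())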

CSS / xpath selector to find h3 tag with text in a given class?

I need a selector to find an <h3> element with some text that is a descendant of an element with a given class.
I tried xpath="//*[@class='body']//descendant::h3[contains(text(), sampletext]
This doesn't work. Is there a way to find it?
<div class="body">
<h3> text1 </h3>
<p>....</p>
<h3> text2 </h3>
<p>... </p>
<h3> text3 </h3>
</div>
Which selector finds the <h3> tag containing text3 inside the element with class "body"?
Try this simple XPath and let me know if you face any issue:
//div[@class='body']/h3[text()='text3']
Or, to trim the spaces before and after your text:
//div[@class='body']/h3[normalize-space()='text3']
Use the following to get the element based on a partial text match:
//div[@class='body']/h3[contains(.,'text3')]
You missed the single quotes in contains(text(), sampletext)]
It should be 'sampletext':
xpath="//div[@class='body']//descendant::h3[contains(text(), 'sampletext')]"
If you want to find the h3 tag:
xpath="//div[@class='body']/h3[contains(text(), 'text3')]"
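The difference between the exact and the whitespace-trimmed comparison is easy to see in a quick experiment. This is a sketch under a few assumptions: it uses Python's standard-library xml.etree.ElementTree, which accepts the [@class='body'] predicate but not normalize-space(), so the trimming happens in Python; the <root> wrapper and the second div with class "other" are added only for illustration.

```python
import xml.etree.ElementTree as ET

HTML = """
<root>
  <div class="body">
    <h3> text1 </h3>
    <p>....</p>
    <h3> text2 </h3>
    <p>... </p>
    <h3> text3 </h3>
  </div>
  <div class="other">
    <h3> text3 </h3>
  </div>
</root>
"""

root = ET.fromstring(HTML)

# All <h3> children of the div whose class attribute is exactly "body".
candidates = root.findall(".//div[@class='body']/h3")

# An exact comparison fails here because the text is padded with spaces ...
exact = [h3 for h3 in candidates if h3.text == "text3"]

# ... while comparing the stripped text finds the heading.
trimmed = [h3 for h3 in candidates if (h3.text or "").strip() == "text3"]

print(len(candidates), len(exact), len(trimmed))  # 3 0 1
```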

Xpath / find all elements which contains attribute

I want to find all elements that have an attribute whose name contains the word "aut".
For example:
<div aut20="one" class="model"> Some text </div>
<span aut="two" class="model_1" ng-one="two"> Some text 2 </span>
<a class="three"> some text 2 </a>
The XPath query result would then be the <div> and <span> elements, because they have the attributes "aut20" and "aut".
//@*[contains(local-name(),'aut')]/..
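The expression picks every attribute whose name contains "aut" and then steps up to the element that owns it. The same check can be sketched in plain Python, assuming the standard-library xml.etree.ElementTree (its XPath subset cannot test attribute names) and a hypothetical <root> wrapper around the three sample elements.

```python
import xml.etree.ElementTree as ET

HTML = """
<root>
  <div aut20="one" class="model"> Some text </div>
  <span aut="two" class="model_1" ng-one="two"> Some text 2 </span>
  <a class="three"> some text 2 </a>
</root>
"""

root = ET.fromstring(HTML)

# Keep each element that has at least one attribute whose *name* contains "aut".
matches = [
    el.tag
    for el in root.iter()
    if any("aut" in name for name in el.attrib)
]
print(matches)  # ['div', 'span']
```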

Resources