I am trying to parse a webpage and get all the content inside a div tag with the class div1. I tried ('div[@class="div1"]'), which gives me the content below:
<div class="div1">
<p>
something something <br>
abc<br>
def
</p>
</div>
However, I am trying to get everything inside the div tag, not including the div tag itself, as shown below:
<p>
something something <br>
abc<br>
def
</p>
Try changing your xpath to
div[@class="div1"]/child::*
Quote from https://www.w3.org/TR/xpath/#location-paths:
child::* selects all element children of the context node
For one thing, you're looking for @id when it's @class.
I am trying to scrape "Description" from this HTML structure
<div class="menu-index-page__item-content">
<h6 class="menu-index-page__item-title">
<span> Item title </span>
</h6>
<p class="menu-index-page__item-desc">
<span>
<span>
<span>Description</span>
</span>
</span>
Each tag has an attribute that I don't know how to handle:
data-reactid=".3wrqgx5340.3.5.0.4:$523105.2.$3959254.$menuItemContent.1.0"
Each data-reactid is different. So if I target this attribute I will scrape stuff I don't want.
I've tried .search and .xpath, using tags and classes, but nothing seems to work.
Is there a way to say: give me the p tag that has a class="menu-index-page__item-desc" and scrape the 3rd span from there?
You can get the required value via xpath
//text()[contains(.,'Description')]
I have a page that looks something like this:
<div>
<div>
<div>
<span class="span class one">
some text
</span>
</div>
</div>
<div>
<div>
<span class="span class two">
span i want to pick
</span>
</div>
</div>
</div>
I want to pick <span class="span class two"> based on the text that is in <span class="span class one">. I am not sure if it is even possible. The number of elements is not the same in each part of the tree.
The following could be an alternative answer:
//span[normalize-space(text())='some text']/../../following-sibling::div//span
Explanation:
//span[normalize-space(text())='some text'] finds the span whose trimmed text is 'some text'
/../.. moves up two levels, from the span to its grandparent div
/following-sibling::div//span locates any span inside the following sibling divs of that grandparent
You can select the element by the value of the class attribute with:
//span[@class='span class two']
//span[contains(., "some text")]/following::span
out:
Element='<span class="span class two">
span i want to pick
</span>'
I might've understood it differently but I'll try to give out a different answer:
//span[contains(text(),(//span[@class='span class one']/text())) and not(@class='span class one')]
which means:
//span[contains(text(), - you're looking for a span element that contains a certain text
(//span[@class='span class one']/text())) - that text is whatever the text in span class one is
and not(@class='span class one')] - but the span element should not be span class one
Of course, you can replace text() with a different property such as class or name, e.g. //span[contains(@class,(//span[@class='span class one']/text()))]
Try it this way, since you mentioned that you want to build the XPath starting from span class one:
//span[text()= 'some text']/following::span[@class='span class two']
Explanation of the XPath: match the first <span> by its text with text(), then move forward to the other <span> using the following axis.
I want to get the text within a certain HTML tag. It looks like:
<div id="data123">data1: value1<br>data2: value2<br> data3: value</div>
My code looks like:
require 'nokogiri'
require 'open-uri'
html_page = Nokogiri::HTML(URI.open('my_url'))
who_is_raw = html_page.css('div#data123')[0] #.text
I get either the text within the <div> tag without <br> tags or the whole <div> with all <br> inside. But, I want only the text within that <div> tag and <br> tags inside it.
How do I do that?
Try inner_html:
who_is_raw = html_page.css('div#data123')[0].inner_html
HTML structure looks like this:
<div class="Parent">
<div id="A">more tags and text</div>
<div id="B">more tags and text</div>
more tags
<p> and text </p>
</div>
I would like to extract text just from the parent and the tags apart from the A and B children.
I have tried
/div[@class='Parent']//text()
which extracts text from all the descendant nodes, so I added a constraint like /div[@class='Parent']//text()[not(self::div)]
but it did not change a thing.
Thanks for any advice
/div[@class='Parent']/*[not(self::div and (@id='A' or @id='B'))]//text() | /div[@class='Parent']/text()
This is the code:
<li>
<a>
<h1>Quorn Stukjes</h1>
<p class="price">
</a>
<form>
<button type="submit">+</button>
</form>
</li>
I want to create a locator that finds the first <h1> that has a sibling <p> element with the class "price". Easy so far. But now I also want that <h1> to share its grandparent with a <button> whose type attribute is "submit".
What I created was the following:
//a/p[@class="price"]/preceding-sibling::p/preceding-sibling::h1
I'm wondering if this is the most sensible solution (it does work), or if there is something more elegant and robust.
(//*[form/button[@type = 'submit']]/*[p[@class = 'price']]/h1)[1] should do (assuming a submit button only makes sense in a form parent element).