Let's say I have this HTML, with descendants at various depths and a mixture of element types:
<div class="foo">
<div class="bar"></div>
</div>
<div class="foo">
<div class="baz"></div>
</div>
<div class="foo">
<u><span class="duh">
<div class="bar"></div>
</span></u>
</div>
<div class="foo">
<div class="baz"></div>
</div>
I want to change the class to bex on every foo that contains a descendant with class bar, so it looks like:
<div class="bex">
<div class="bar"></div>
</div>
<div class="foo">
<div class="baz"></div>
</div>
<div class="bex">
<u><span class="duh">
<div class="bar"></div>
</span></u>
</div>
<div class="foo">
<div class="baz"></div>
</div>
How would I do that with Ruby/Nokogiri? I've tried all sorts of things and can't quite get it. Thanks.
Edit: closed the duh, oops.
I spent a long time wondering why the second foo wasn't found.
Your data was broken: "duh" wasn't closed.
To select the nodes, you can use:
doc.xpath("//div[@class='foo' and .//div[@class='bar']]")
As an example:
data = %q(<div class="foo">
<div class="bar"></div>
</div>
<div class="foo">
<div class="baz"></div>
</div>
<div class="foo">
<u><span class="duh">
<div class="bar"></div>
</span></u>
</div>
<div class="foo">
<div class="baz"></div>
</div>)
require 'nokogiri'
doc = Nokogiri.HTML(data)
doc.xpath("//div[@class='foo' and .//div[@class='bar']]").each do |node|
node["class"] = 'bex'
end
puts doc
Related
I'm trying to extract all links that meet these three conditions:
Must be part of <div data-test="cond1">
Must have a <a href="..." class="cond2">
Must not have a <img src="..." class="cond3">
The result should be "/product/1234".
<div data-test="test1">
<div>
<div data-test="cond1">
Link 1
<div class="test4">
<div class="test5">
<div class="test6">
<div class="test7">
<div class="test8">
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div data-test="test2">
<div>
<div data-test="cond1">
Link 2
<div class="test4">
<div class="test5">
<div class="test6">
<div class="test7">
<div class="test8">
<img src="bild.jpg" class="cond3">
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
I'm able to extract the links with the following xpath query.
//div[starts-with(@data-test,"cond")]/a[starts-with(@class,"cond")]/@href
(I know the first part is not really necessary. But better safe than sorry.)
But I'm still struggling with excluding the links whose container has a descendant img tag, and with how to add that to the query above.
This should do what you want:
//div[@data-test="cond1" and not(.//img[@class="cond3"])]
/a[@class="cond2"]
/@href
/product/1234
<div class="table">
<div class="table-head">
<div class="table-head-title">Ranking Equipos</div>
</div>
<div class="table-body">
<div class="table-body-row active">
<div class="col-key">Mark</div>
<div class="col-value">9233</div>
</div>
<div class="table-body-row">
<div class="col-key">Amanda</div>
<div class="col-value">7216</div>
</div>
<div class="table-body-row">
<div class="col-key">Mark</div>
<div class="col-value">6825</div>
</div>
<div class="table-body-row">
<div class="col-key">Paul</div>
<div class="col-value">6184</div>
</div>
<div class="table-body-row">
<div class="col-key">Amanda</div>
<div class="col-value">5866</div>
</div>
</div>
</div>
This is my HTML, and I want to get the last child of .table-body.
I tried JavaScript-like indexing:
$lastChild = $node->filter('.table-body .table-body-row')[4];
but it throws an error: Cannot use object of type "Symfony\Component\DomCrawler\Crawler" as array
I was stuck in a similar situation recently and resolved it by using the last() method. The syntax is: $node->filter('.table-body .table-body-row')->last();
<div class="a">
<div class="a random number of div wrapers">
<div>Random1<em>Median</em>
<div class="b">
<div class="c">Edit</div>
</div>
</div>
<div>Random2<em>Median</em></div>
<div>
<em>Median</em>
</div>
<div>Random3<em>Median</em></div>
<div>Random4<em>Median</em>
<div>Random4<em>Median</em></div>
</div>
</div>
<div class="a">
<div class="a random number of div wrapers">
<div>Random1<em>Median</em></div>
<div>Random2<em>Median</em></div>
<div>
<em>Median</em>
</div>
<div>Random3<em>Median</em>
<div class="b">
<div class="c">Edit</div>
</div>
</div>
<div>Random4<em>Median</em>
</div>
</div>
In this case, how can I use XPath to get the two nodes that contain 'Median' and have no text before them?
I'd prefer not to use an index because the node position could be random.
Maybe try:
//*[.='Median'][not(preceding-sibling::text()[normalize-space()])]
The question is simple but I don't have enough practice for this case :)
How can I get the price text from each "block" div, but only for the blocks that contain an item_promo element?
<div class="block">
<div class="item_promo">item</div>
<div class="item_price">123</div>
</div>
<div class="block">
<div class="item_promo">item</div>
<div class="item_price">456</div>
</div>
<div class="block">
<div class="item_promo">item</div>
<div class="item_price">789</div>
</div>
<div class="block">
<div class="item">item</div>
<div class="item_price">222</div>
</div>
<div class="block">
<div class="item">item</div>
<div class="item_price">333</div>
</div>
You could use this XPath:
//div[@class='block']/*[@class='item_promo']/following-sibling::div[@class='item_price']/text()
It looks for elements whose class attribute is item_promo inside each block, moves to the following sibling div whose class attribute is item_price, and grabs its text.
This XPath,
//div[div/@class='item_promo']/div[@class='item_price']
will return those item_price class div elements with sibling item_promo class div elements:
<div class="item_price">123</div>
<div class="item_price">456</div>
<div class="item_price">789</div>
This will work regardless of label/price order.
This one has me stumped. I'm trying to select the first object with class csb-quantity-listbox in the HTML below, using the XPath //select[@class='csb-quantity-listbox'][1], but instead of selecting the first quantity listbox it's selecting ALL the listboxes on the page with that class (see image below).
What am I doing wrong?
<div class="gwt-product-detail-products-container">
<div class="gwt-product-detail-products-header-column">
</div>
<div id="gwt-product-detail-widget-id-12766" class="gwt-product-detail-widget">
<div class="gwt-product-detail-widget-image-column ui-draggable" title="12766">
<div class="gwt-product-detail-widget-options-column">
</div>
<div class="gwt-product-detail-widget-price-column">
</div>
<div class="gwt-product-detail-widget-quantity-panel">
<select class="csb-quantity-listbox" name="quantity_12766"></select>
</div>
<div class="gwt-bundle-add-to-cart-btn">
</div>
</div>
</div>
<div id="gwt-product-detail-widget-id-10617" class="gwt-product-detail-widget">
<div class="gwt-product-detail-widget-image-column ui-draggable" title="10617">
<div class="gwt-product-detail-widget-options-column">
</div>
<div class="gwt-product-detail-widget-price-column">
</div>
<div class="gwt-product-detail-widget-quantity-panel">
<select class="csb-quantity-listbox" name="quantity_10617"></select>
</div>
<div class="gwt-bundle-add-to-cart-btn">
</div>
</div>
</div>
</div>
Image:
You just need to put parentheses around the expression before the [1].
Like so:
(//select[@class='csb-quantity-listbox'])[1]