Reject li DOM elements that have a specific attribute - Ruby

I am trying to scrape a page with Ruby and Nokogiri and get a collection of link elements from it. The links sit inside a collection of li elements, and some of those li elements have a specific attribute. I need to reject the li elements that have that attribute and get the link tags of the remaining ones.
Here is what my DOM looks like:
<ul>
<li class="carousel-list-item">
<a itemprop="url" data-cr="CharNav23" class="property-icon property-icon-14" href="/max-and-shred/">
<div itemprop="name" class="property-tooltip">
Max & Shred
</div>
</a>
</li>
<li class="carousel-list-item">
<a itemprop="url" data-cr="CharNav24" class="property-icon property-icon-19" href="/rabbids-invasion/">
<div itemprop="name" class="property-tooltip">
Rabbids Invasion
</div>
</a>
</li>
<li data-sponsor="Sponsor" class="carousel-list-item">
<a itemprop="url" data-cr="CharNav21" class="property-icon property-icon-40" target="_blank" href="http://pubads.g.doubleclick.net/gampad/clk?id=47616903&iu=8675">
<div itemprop="name" class="property-tooltip">
LEGO Friends
</div>
</a>
</li>
<li class="carousel-list-item">
<a itemprop="url" data-cr="CharNav24" class="property-icon property-icon-19" href="/rubyds-investment/">
<div itemprop="name" class="property-tooltip">
Rabbids Invasion
</div>
</a>
</li>
</ul>
I need to collect all a tags whose parent li does not have the data-sponsor="Sponsor" attribute. I tried the following, but it includes the links from every li:
page.search('ul.carousel-list > li > a').map{ |link| make_absolute(link['href']) }

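The CSS way to do that is:
page.search('li:not([data-sponsor]) a')
or
page.search('li:not([data-sponsor=Sponsor]) a')
The first selector rejects every li that has a data-sponsor attribute at all; the second rejects only those whose value is exactly Sponsor. Either is probably more readable than the XPath equivalent.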

You should try:
# selects the a elements of every li (directly under ul.carousel-list) that has no data-sponsor attribute
page.search('//ul[@class="carousel-list"]/li[not(@data-sponsor)]/a').map { |link| make_absolute(link['href']) }
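The XPath predicate can be checked with a small script. The sketch below makes a few assumptions. It uses REXML, which ships with Ruby, instead of Nokogiri. It drops the [@class="carousel-list"] test because the markup in the question shows a bare <ul>. It also uses a hypothetical BASE_URL with URI.join in place of the asker's make_absolute helper.

```ruby
require 'rexml/document'
require 'uri'

# Well-formed copy of the carousel markup from the question; the raw "&"
# characters are escaped as "&amp;" because REXML is a strict XML parser.
CAROUSEL = <<~XML
  <ul>
    <li class="carousel-list-item">
      <a itemprop="url" data-cr="CharNav23" href="/max-and-shred/"><div class="property-tooltip">Max &amp; Shred</div></a>
    </li>
    <li class="carousel-list-item">
      <a itemprop="url" data-cr="CharNav24" href="/rabbids-invasion/"><div class="property-tooltip">Rabbids Invasion</div></a>
    </li>
    <li data-sponsor="Sponsor" class="carousel-list-item">
      <a itemprop="url" data-cr="CharNav21" target="_blank" href="http://pubads.g.doubleclick.net/gampad/clk?id=47616903&amp;iu=8675"><div class="property-tooltip">LEGO Friends</div></a>
    </li>
    <li class="carousel-list-item">
      <a itemprop="url" data-cr="CharNav24" href="/rubyds-investment/"><div class="property-tooltip">Rabbids Invasion</div></a>
    </li>
  </ul>
XML

# Hypothetical site root; URI.join stands in for the asker's make_absolute helper.
BASE_URL = 'https://www.example.com/'

# Keeps only the <a> elements whose parent <li> has no data-sponsor attribute,
# then turns each relative href into an absolute URL.
def unsponsored_links(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//ul/li[not(@data-sponsor)]/a').map do |link|
    URI.join(BASE_URL, link.attributes['href']).to_s
  end
end

puts unsponsored_links(CAROUSEL)
```

With Nokogiri the same expression string can be passed to page.search, which accepts XPath as well as CSS.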

Related

How to extract the value of a neighbouring node via XPath?

I have two different web pages and I want to extract a value from each using XPath.
What query can extract 2386028 from the first page and 4019606 from the second page? I need a single query that extracts the value on both pages.
First page fragment:
<ul class="g-ul b-properties">
<li class="b-properties__header">General</li>
<li class="b-properties__item">
<span class="b-properties__label">
<span>VendorCode</span>
</span>
<span class="b-properties__value">2386028</span>
</li>
<li class="b-properties__item">...</li>
<li class="b-properties__item">...</li>
<li class="b-properties__item">...</li>
And the second page fragment:
<div class="b-properties-holder" id="tab_3">
<ul class="g-ul b-properties">
<li class="b-properties__header">General</li>
<li class="b-properties__item">
<span class="b-properties__label">
<span>Trademark</span>
</span>
<span class="b-properties__value">
<a class="link b-properties-link" href="/trademark/moist-diane/?sort=-date&currency=USD">Moist Diane</a>
</span>
</li>
<li class="b-properties__item">
<span class="b-properties__label">
<span>VendorCode</span>
</span>
<span class="b-properties__value">4019606</span>
</li>
<li class="b-properties__item">...</li>
<li class="b-properties__item">...</li>
You can select the <li> element that has a <span class="b-properties__label"> element containing a <span> with the value VendorCode, and then get the value of the <span class="b-properties__value"> under that <li> element.
For example:
//li[span[@class="b-properties__label"]/span="VendorCode"]/span[@class="b-properties__value"]/text()
Alternatively, you can select the <span class="b-properties__label"> element that has a <span> with the value VendorCode, and get its following sibling:
//span[@class="b-properties__label" and span="VendorCode"]/following-sibling::span/text()
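Here is a minimal Ruby sketch that checks whether each expression really works on both pages. It uses REXML, which ships with Ruby, rather than a browser or Nokogiri. It also runs on trimmed, well-formed copies of the two fragments, so the "..." items are left out and "&" is escaped. PAGE_1, PAGE_2, BY_ITEM, BY_SIBLING and vendor_code are illustrative names, not part of either answer.

```ruby
require 'rexml/document'

# Trimmed, well-formed copies of the two fragments from the question.
PAGE_1 = <<~XML
  <ul class="g-ul b-properties">
    <li class="b-properties__header">General</li>
    <li class="b-properties__item">
      <span class="b-properties__label"><span>VendorCode</span></span>
      <span class="b-properties__value">2386028</span>
    </li>
  </ul>
XML

PAGE_2 = <<~XML
  <div class="b-properties-holder" id="tab_3">
    <ul class="g-ul b-properties">
      <li class="b-properties__header">General</li>
      <li class="b-properties__item">
        <span class="b-properties__label"><span>Trademark</span></span>
        <span class="b-properties__value">
          <a class="link b-properties-link" href="/trademark/moist-diane/?sort=-date&amp;currency=USD">Moist Diane</a>
        </span>
      </li>
      <li class="b-properties__item">
        <span class="b-properties__label"><span>VendorCode</span></span>
        <span class="b-properties__value">4019606</span>
      </li>
    </ul>
  </div>
XML

# The two expressions from the answer.
BY_ITEM = '//li[span[@class="b-properties__label"]/span="VendorCode"]' \
          '/span[@class="b-properties__value"]/text()'
BY_SIBLING = '//span[@class="b-properties__label" and span="VendorCode"]' \
             '/following-sibling::span/text()'

# Returns the stripped text of the first node the expression selects, or nil.
def vendor_code(xml, xpath)
  node = REXML::XPath.first(REXML::Document.new(xml), xpath)
  node&.value&.strip
end

[PAGE_1, PAGE_2].each do |page|
  puts vendor_code(page, BY_ITEM)
  puts vendor_code(page, BY_SIBLING)
end
```

Both expressions key on the "VendorCode" label text instead of the item's position in the list. That is why the extra Trademark row on the second page does not shift the result.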

XPath: Select any div that contains one or more descendant divs with a specific class

Assume that the following HTML snippet exists somewhere in the <body> element of a web page:
<div id="root_1000" class="root bacon">
<ul>
<li id="item_1234567" class="active">
<div class="userpost author_4281">
<div>This text should be visible.</div>
</div>
<ul><li>Some item</li></ul>
</li>
</ul>
</div>
<div id="root_2000" class="root bacon">
<ul>
<li id="item_8675309" class="active">
<div class="userpost author_3333">
<div>
This text, and as the DIV.root that contains it, should be hidden.
</div>
</div>
<ul><li>Another item</li></ul>
</li>
</ul>
</div>
<div id="root_3000" class="root bacon">
<ul>
<li id="item_7654321" class="active">
<div class="userpost author_9877">
<div>This text should be visible.</div>
</div>
<ul><li>Yet another item</li></ul>
</li>
</ul>
</div>
So here's my question: what would the XPath syntax be to select the div.root that contains info posted by author #3333 (i.e. div[class~="author_3333"])?
The following XPath statement will properly match the div.userpost element associated with author #3333 that I want to hide, but does not include the <ul><li>Another item</li></ul> node, which I also need to hide:
.//div[contains(@class, 'author_3333')]
What I want to do is select the closest div.root ancestor associated with the node that my XPath statement matches. Any help would be greatly appreciated... thanks in advance!
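You need to select the div that has the author's div somewhere below it, something like:
//div[.//div[contains(@class, "author_3333")]]
Note that this matches every div with such a descendant, so any wrapping div is selected as well, not only the closest div.root.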
You can use this XPath expression:
.//div[contains(@class, 'author_3333')]/ancestor::div[contains(@class,'root')][1]
Output is:
<div id="root_2000" class="root bacon">
<ul>
<li id="item_8675309" class="active">
<div class="userpost author_3333">
<div>
This text, and as the DIV.root that contains it, should be hidden.
</div>
</div>
<ul>
<li>Another item</li>
</ul>
</li>
</ul>
</div>
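A small Ruby sketch can compare the ancestor:: answer with the descendant-predicate answer. It makes several assumptions. It uses REXML from Ruby's standard distribution and trims the question's markup to two roots. It also closes the inner divs and adds a hypothetical wrapper <div id="page"> that is not in the question.

```ruby
require 'rexml/document'

# Trimmed, well-formed version of the question's markup, wrapped in a
# hypothetical outer <div id="page"> to show how the two answers differ.
MARKUP = <<~XML
  <div id="page">
    <div id="root_1000" class="root bacon">
      <ul>
        <li id="item_1234567" class="active">
          <div class="userpost author_4281"><div>This text should be visible.</div></div>
          <ul><li>Some item</li></ul>
        </li>
      </ul>
    </div>
    <div id="root_2000" class="root bacon">
      <ul>
        <li id="item_8675309" class="active">
          <div class="userpost author_3333"><div>This text should be hidden.</div></div>
          <ul><li>Another item</li></ul>
        </li>
      </ul>
    </div>
  </div>
XML

doc = REXML::Document.new(MARKUP)

# Start at the author's post and walk up the ancestor axis to the nearest div.root.
root = REXML::XPath.first(
  doc,
  ".//div[contains(@class, 'author_3333')]/ancestor::div[contains(@class, 'root')][1]"
)
puts root.attributes['id']

# The selected root also contains the <ul><li>Another item</li></ul> list.
items = REXML::XPath.match(root, './/ul/li/ul/li').map(&:text)
p items

# The descendant-predicate form matches every div with such a descendant,
# so the hypothetical wrapper is selected as well.
broad = REXML::XPath.match(doc, '//div[.//div[contains(@class, "author_3333")]]')
p broad.map { |d| d.attributes['id'] }
```

The ancestor:: form returns only root_2000, while the .//div predicate also picks up the wrapper. That is why walking up with ancestor::div[...][1] is the safer way to get the closest div.root.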

My drop-down menu is horizontal. I need it vertical

Please help me, I can't make my drop-down list vertical. When I hover over a menu item, the submenu appears horizontally.
My HTML code:
<div id="header">
<div>
<img src="logo.png" alt="LOGO" height="115" width="115px" />
<ul id="navigation">
<li class="active">
Home
</li>
<li>
What We Offer
</li>
<li>
Solutions
<ul>
<li>
Inbound
</li>
<li>
Outbound
</li>
</ul>
</li>
<li>
About
</li>
<li>
Contact Us
</li>
</ul>
</div>
</div>
css
I can't see your CSS, but did you apply display: inline to both the top-level AND sub-level menu items? This will cause the problem you describe.
The top-level li items should be display: inline, but their children should be display: block.
See this example: https://jsfiddle.net/tLqrrfy0/

XPath - Get the parent element by matching two child nodes

I'd like to use XPath to select a link with class="watchListItem" that has both a child span with class="icon icon_checked" and a child h3 with the text "a test". I can use XPath to match the link and the span, or the link and the h3, but not the link, span, and h3 together.
Here's what I've tried:
//*[@class = 'watchListItem']/span[@class = 'icon icon_checked']
//*[@class= 'watchListItem']/h3[text()='AA']
I'm looking for something like this:
//*[@class = 'watchListItem']//*[span[@class = 'icon icon_checked'] and h3[text()='AA']]
<li>
<a class="watchListItem" data-id="thisid1" href="javascript:void(0);">
<span class="icon icon_checked"/>
<h3 class="itemList_heading">a test</h3>
</a>
</li>
<li>
<a class="watchListItem" data-id="thisid2" href="javascript:void(0);">
<span class="icon icon_unchecked"/>
<h3 class="itemList_heading">another test</h3>
</a>
</li>
<li>
<a class="watchListItem" data-id="thisid3" href="javascript:void(0);">
<span class="icon icon_checked"/>
<h3 class="itemList_heading">yet another test</h3>
</a>
</li>
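You can test both children inside a single predicate with child:: location paths, like so:
//a[@class="watchListItem"
and child::span[@class="icon icon_checked"]
and child::h3[text()="yet another test"]]
This selects the anchor with data-id="thisid3". With text()="a test" it selects the anchor with data-id="thisid1" instead.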
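The combined predicate can be tried out with a short Ruby sketch. It uses REXML (shipped with Ruby) and wraps the question's list in a <ul> so it parses as one document. checked_item_ids is a hypothetical helper that splices the heading into the expression, so it assumes the heading contains no double quotes.

```ruby
require 'rexml/document'

# The list from the question, wrapped in a <ul> so it has a single root.
WATCH_LIST = <<~XML
  <ul>
    <li>
      <a class="watchListItem" data-id="thisid1" href="javascript:void(0);">
        <span class="icon icon_checked"/>
        <h3 class="itemList_heading">a test</h3>
      </a>
    </li>
    <li>
      <a class="watchListItem" data-id="thisid2" href="javascript:void(0);">
        <span class="icon icon_unchecked"/>
        <h3 class="itemList_heading">another test</h3>
      </a>
    </li>
    <li>
      <a class="watchListItem" data-id="thisid3" href="javascript:void(0);">
        <span class="icon icon_checked"/>
        <h3 class="itemList_heading">yet another test</h3>
      </a>
    </li>
  </ul>
XML

DOC = REXML::Document.new(WATCH_LIST)

# Both child conditions live in one predicate on the <a>, so a link matches
# only if it has the checked span AND the h3 with the given text.
# Assumption: heading contains no double quotes, since it is spliced into the XPath.
def checked_item_ids(heading)
  xpath = '//a[@class="watchListItem"' \
          ' and child::span[@class="icon icon_checked"]' \
          " and child::h3[text()=\"#{heading}\"]]"
  REXML::XPath.match(DOC, xpath).map { |a| a.attributes['data-id'] }
end

p checked_item_ids('a test')
p checked_item_ids('yet another test')
p checked_item_ids('another test')
```

The third call shows that text()="another test" combined with the checked icon matches nothing here. The "another test" item has the unchecked icon.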

XPath selectors

I have the following HTML:
<ul>
<li>
<p class="channel-show-time">Test 1</p>
</li>
<li>
<p class="channel-show-time">Test 2</p>
</li>
<li><span class="channel-show-carousel-label">Next</span>
<p class="channel-show-time">Test 3</p>
</li>
<li>
<p class="channel-show-time">Test 4</p>
</li>
</ul>
I want to select the text of the <p> tag in the li that immediately precedes the li containing the span with class 'channel-show-carousel-label', so I want the text 'Test 2'.
I have an XPath expression that selects the text in the <p> tag of the li with the span, i.e.:
xpath=//ul/li/span[@class='channel-show-carousel-label']/../p
Does anyone know how I can achieve this?
You can use the following XPath:
//span[@class="channel-show-carousel-label"]/../preceding-sibling::li[1]/p/text()
It says: find the span with the desired class, go to its parent (li), find the nearest preceding li sibling, go to its p child and return its text.
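Here is a minimal Ruby sketch that runs the asker's expression and the answer's side by side on the markup from the question. It uses REXML from Ruby's standard distribution instead of whatever tool consumes the xpath= locator.

```ruby
require 'rexml/document'

SCHEDULE = <<~XML
  <ul>
    <li><p class="channel-show-time">Test 1</p></li>
    <li><p class="channel-show-time">Test 2</p></li>
    <li><span class="channel-show-carousel-label">Next</span>
      <p class="channel-show-time">Test 3</p>
    </li>
    <li><p class="channel-show-time">Test 4</p></li>
  </ul>
XML

doc = REXML::Document.new(SCHEDULE)

# The asker's expression: the <p> in the same li as the label.
current = REXML::XPath.first(doc, "//ul/li/span[@class='channel-show-carousel-label']/../p")
puts current.text

# preceding-sibling is a reverse axis, so [1] is the li closest to the
# labelled one, not the first li in the list.
PREVIOUS_SHOW = '//span[@class="channel-show-carousel-label"]' \
                '/../preceding-sibling::li[1]/p/text()'
previous = REXML::XPath.first(doc, PREVIOUS_SHOW)
puts previous.value
```

Using [last()] instead of [1] on the same axis would return the farthest preceding li ("Test 1"), which is the usual source of confusion with reverse axes.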
