xpath selectors - xpath

I have the following HTML:
<ul>
<li>
<p class="channel-show-time">Test 1</p>
</li>
<li>
<p class="channel-show-time">Test 2</p>
</li>
<li><span class="channel-show-carousel-label">Next</span>
<p class="channel-show-time">Test 3</p>
</li>
<li>
<p class="channel-show-time">Test 4</p>
</li>
</ul>
I want to select the text of the <p> tag in the li immediately preceding the li
that contains the span with class 'channel-show-carousel-label', so here I want the text 'Test 2'.
I already have an XPath that selects the text of the <p> tag in the li that contains that span:
xpath=//ul/li/span[@class='channel-show-carousel-label']/../p
Does anyone know how I can achieve this?

You can use the following XPath:
//span[@class="channel-show-carousel-label"]/../preceding-sibling::li[1]/p/text()
It says: find the span with the desired class, go to its parent (li), find the nearest preceding li sibling, go to its p child and return its text.
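The same steps can be walked by hand. This is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree, which supports only a subset of XPath and has no preceding-sibling axis, so that step is emulated with a list index. The function name text_before_label is made up for illustration.

```python
import xml.etree.ElementTree as ET

HTML = """
<ul>
  <li><p class="channel-show-time">Test 1</p></li>
  <li><p class="channel-show-time">Test 2</p></li>
  <li><span class="channel-show-carousel-label">Next</span>
      <p class="channel-show-time">Test 3</p></li>
  <li><p class="channel-show-time">Test 4</p></li>
</ul>
"""

def text_before_label(ul):
    """Return the <p> text of the <li> just before the labelled <li>."""
    items = ul.findall("li")  # direct <li> children, in document order
    for i, li in enumerate(items):
        # Same test as span[@class='channel-show-carousel-label']
        if li.find("span[@class='channel-show-carousel-label']") is not None:
            if i == 0:
                return None  # the labelled <li> has no preceding sibling
            # Equivalent of preceding-sibling::li[1]/p/text()
            return items[i - 1].findtext("p")
    return None  # no <li> carries the label

ul = ET.fromstring(HTML)
print(text_before_label(ul))  # Test 2
```

With a full XPath 1.0 engine the single expression above is enough; the loop is only needed where the axis is unavailable.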

Related

UL LI how to remove padding?

Is there a way to remove the empty space just before the list (where the blue line is in the picture)? I would be very grateful if somebody could help me with this.
<p class="has-text-align-justify"><b>List:</b>
<ul>
<li> Text1</li>
<li> Text2</li>
<li> Text3</li></ul>
</p>
How it looks
<p class="has-text-align-justify"><b>List:</b>
<ul style="margin-top:-10px;">
<li> Text1</li>
<li> Text2</li>
<li> Text3</li>
</ul>
</p>
Sorry, I forgot to point out the change: the <ul> now has a negative margin-top.

XPath: Select any div that contains one or more descendant divs with a specific class

Assume that the following HTML snippet exists somewhere in the <body> element of a web page:
<div id="root_1000" class="root bacon">
<ul>
<li id="item_1234567" class="active">
<div class="userpost author_4281">
<div>This text should be visible.</div>
</div>
<ul><li>Some item</li></ul>
</li>
</ul>
</div>
<div id="root_2000" class="root bacon">
<ul>
<li id="item_8675309" class="active">
<div class="userpost author_3333">
<div>
This text, and as the DIV.root that contains it, should be hidden.
</div>
</div>
<ul><li>Another item</li></ul>
</li>
</ul>
</div>
<div id="root_3000" class="root bacon">
<ul>
<li id="item_7654321" class="active">
<div class="userpost author_9877">
<div>This text should be visible.</div>
</div>
<ul><li>Yet another item</li></ul>
</li>
</ul>
</div>
So here's my question: what would the XPath syntax be to select the div.root that contains info posted by author #3333 (i.e. div[class~="author_3333"])?
The following XPath statement will properly match the div.userpost element associated with author #3333 that I want to hide, but does not include the <ul><li>Another item</li></ul> node, which I also need to hide:
.//div[contains(@class, 'author_3333')]
What I want to do is select the closest div.root ancestor associated with the node that my XPath statement matches. Any help would be greatly appreciated... thanks in advance!
You need to select the div that has the author's div as a descendant, something like:
//div[.//div[contains(@class, "author_3333")]]
You can use this XPath expression:
.//div[contains(@class, 'author_3333')]/ancestor::div[contains(@class,'root')][1]
Output is:
<div id="root_2000" class="root bacon">
<ul>
<li id="item_8675309" class="active">
<div class="userpost author_3333">
<div>
This text, and as the DIV.root that contains it, should be hidden.
</div>
</div>
<ul>
<li>Another item</li>
</ul>
</li>
</ul>
</div>
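The "container that has a matching descendant" idea can also be checked outside an XPath engine. This is a rough sketch, assuming Python's standard-library xml.etree.ElementTree, which supports neither contains() nor the ancestor axis, so both are emulated. The markup is a shortened, well-formed version of the question's snippet, and the helper names has_class and roots_for_author are invented for illustration. Note that it matches whole class tokens, whereas contains(@class, ...) matches substrings.

```python
import xml.etree.ElementTree as ET

PAGE = """
<body>
  <div id="root_1000" class="root bacon">
    <ul><li><div class="userpost author_4281"><div>Visible</div></div></li></ul>
  </div>
  <div id="root_2000" class="root bacon">
    <ul><li><div class="userpost author_3333"><div>Hidden</div></div>
        <ul><li>Another item</li></ul></li></ul>
  </div>
</body>
"""

def has_class(el, name):
    """True if `name` is one of the whitespace-separated class tokens."""
    return name in el.get("class", "").split()

def roots_for_author(tree, author_class):
    """Return every div.root that has a descendant div with author_class."""
    return [
        div
        for div in tree.iter("div")
        if has_class(div, "root")
        # iter() yields the element itself first, so skip it explicitly
        and any(has_class(d, author_class)
                for d in div.iter("div") if d is not div)
    ]

body = ET.fromstring(PAGE)
print([d.get("id") for d in roots_for_author(body, "author_3333")])  # ['root_2000']
```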

Using indexing in XPath

I have the following fragment of XML:
<ul>
<li>xxx
<ul>
<li>1</li>
<li>2</li>
</ul>
</li>
<li>yyy
<ul>
<li>3</li>
<li>4</li>
</ul>
</li>
and my XPath
(//ul[@class="some-class"][1]//li)[1]
returns xxx, 1, and 2, as expected. But when I use
(//ul[@class="some-class"][1]//li)[2]
the result starts from 1, not from yyy as I expected. Please advise.
Try the XPath
//ul[@class='some-class'][1]/li[2]//text()[1]
It gives
yyy
34
as output.
How to handle the whitespace is another matter...
(//ul[@class="some-class"][1]//li)[1]
When you use //li, you select every li descendant of the ul tag, at any depth:
<ul>
<li>xxx # the 1st li tag
<ul>
<li>1</li> # the 2nd li tag
<li>2</li> # the 3rd li tag
</ul>
</li>
<li>yyy # the 4th li tag
<ul>
<li>3</li> # the 5th li tag
<li>4</li> # the 6th li tag
</ul>
</li>
You should use (//ul[@class="some-class"][1]/li)[1] instead.
/li selects only the direct li children of the ul tag:
<ul>
<li>xxx # the 1st li tag
<ul>
<li>1</li>
<li>2</li>
</ul>
</li>
<li>yyy # the 2nd li tag
<ul>
<li>3</li>
<li>4</li>
</ul>
</li>
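The difference between the two numberings is easy to reproduce. This is a small sketch, assuming Python's standard-library xml.etree.ElementTree and a closed, well-formed copy of the fragment with class="some-class" added to the outer ul (the question's fragment does not show it). The parenthesised index (...)[2] is emulated with a list index, and the helper name label is invented for illustration.

```python
import xml.etree.ElementTree as ET

XML = """
<ul class="some-class">
  <li>xxx
    <ul>
      <li>1</li>
      <li>2</li>
    </ul>
  </li>
  <li>yyy
    <ul>
      <li>3</li>
      <li>4</li>
    </ul>
  </li>
</ul>
"""

root = ET.fromstring(XML)

def label(li):
    """Leading text of an <li>, without surrounding whitespace."""
    return (li.text or "").strip()

descendants = root.findall(".//li")  # like //li: every depth, document order
children = root.findall("li")        # like /li: direct children only

print([label(li) for li in descendants])  # ['xxx', '1', '2', 'yyy', '3', '4']
print([label(li) for li in children])     # ['xxx', 'yyy']

# XPath positions are 1-based, so (...)[2] corresponds to Python index 1.
print(label(descendants[1]), label(children[1]))  # 1 yyy
```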

reject li dom element having specific attributes

I am trying to scrape a page with Ruby and Nokogiri and collect a set of links from the DOM. The links sit inside a collection of li elements, some of which carry a specific attribute. I need to reject the li elements that have that attribute and get the link tags of all the remaining ones.
Here is what my DOM looks like:
<ul>
<li class="carousel-list-item">
<a itemprop="url" data-cr="CharNav23" class="property-icon property-icon-14" href="/max-and-shred/">
<div itemprop="name" class="property-tooltip">
Max & Shred
</div>
</a>
</li>
<li class="carousel-list-item">
<a itemprop="url" data-cr="CharNav24" class="property-icon property-icon-19" href="/rabbids-invasion/">
<div itemprop="name" class="property-tooltip">
Rabbids Invasion
</div>
</a>
</li>
<li data-sponsor="Sponsor" class="carousel-list-item">
<a itemprop="url" data-cr="CharNav21" class="property-icon property-icon-40" target="_blank" href="http://pubads.g.doubleclick.net/gampad/clk?id=47616903&iu=8675">
<div itemprop="name" class="property-tooltip">
LEGO Friends
</div>
</a>
</li>
<li class="carousel-list-item">
<a itemprop="url" data-cr="CharNav24" class="property-icon property-icon-19" href="/rubyds-investment/">
<div itemprop="name" class="property-tooltip">
Rabbids Invasion
</div>
</a>
</li>
</ul>
I need to collect all a tags whose parent li does not have the data-sponsor="Sponsor" attribute. I tried the following, but it includes every li:
page.search('ul.carousel-list > li > a').map{ |link| make_absolute(link['href']) }
The css way to do that is:
page.search('li:not([data-sponsor]) a')
or
page.search('li:not([data-sponsor=Sponsor]) a')
Probably a better option than xpath.
You should try:
# This selects the a elements inside li elements that have no 'data-sponsor' attribute.
page.search('//ul[@class="carousel-list"]/li[not(@data-sponsor)]/a').map{ |link| make_absolute(link['href']) }
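For comparison, the same filter can be written without XPath or CSS at all. This is only an illustrative sketch in Python's standard-library xml.etree.ElementTree, not Nokogiri. The markup is a trimmed, well-formed stand-in for the DOM above, the example.com URL is a placeholder, and the function name non_sponsor_hrefs is invented. ElementTree has no not() function, so the attribute test is done in Python.

```python
import xml.etree.ElementTree as ET

MARKUP = """
<ul class="carousel-list">
  <li class="carousel-list-item"><a href="/max-and-shred/">Max and Shred</a></li>
  <li class="carousel-list-item"><a href="/rabbids-invasion/">Rabbids Invasion</a></li>
  <li data-sponsor="Sponsor" class="carousel-list-item"><a href="http://example.com/ad">LEGO Friends</a></li>
</ul>
"""

def non_sponsor_hrefs(ul):
    """hrefs of <a> children of every <li> that lacks data-sponsor."""
    return [
        a.get("href")
        for li in ul.findall("li")
        if li.get("data-sponsor") is None  # same idea as li[not(@data-sponsor)]
        for a in li.findall("a")
    ]

ul = ET.fromstring(MARKUP)
print(non_sponsor_hrefs(ul))  # ['/max-and-shred/', '/rabbids-invasion/']
```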

xpath syntax for selecting the text after <strong> tag

<div class="article_details">
<h1>Product name is</h1>
<div class="left">
<ul class="article_list">
<li>
<strong>art. nr.:</strong>
VS7896
</li>
<li>
<b>Shipping time</b>
: 1-3 Days
</li>
</ul>
</div>
I used //DIV[@class='left']/UL[1]/LI[1], but the result is "art. nr.: VS7896".
Please help me with the correct XPath to select just "VS7896".
To select the text after <strong>, use
//strong/following-sibling::text()[1]
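In tree APIs that do not expose text nodes, the text right after an element is usually stored on that element. This is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree, where the text following the closing </strong> tag is available as the element's tail. The variable names are chosen for illustration.

```python
import xml.etree.ElementTree as ET

SNIPPET = """
<div class="article_details">
  <h1>Product name is</h1>
  <div class="left">
    <ul class="article_list">
      <li>
        <strong>art. nr.:</strong>
        VS7896
      </li>
      <li>
        <b>Shipping time</b>
        : 1-3 Days
      </li>
    </ul>
  </div>
</div>
"""

root = ET.fromstring(SNIPPET)
strong = root.find(".//strong")  # first <strong> anywhere below the root

# .tail holds the text between </strong> and the next tag,
# which is what following-sibling::text()[1] selects here.
article_number = (strong.tail or "").strip()
print(article_number)  # VS7896
```

The strip() call removes the surrounding newlines and indentation; in XPath itself, normalize-space() serves the same purpose.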
