Make use of XPath Axes to extract sibling elements' text - xpath

Given the following HTML, how can I get a list of (TIME, COMMENT, OOXX) tuples with XPath? I think I need to use XPath axes, but I'm not sure how. Furthermore, the OOXX text does not seem to belong to any tag!
<div class="contents">
<p></p>
<div class="meta">TIME</div>OOXX
<div class="comment">COMMENT</div>
<p></p>
<div class="meta">TIME</div>OOXX
<div class="comment">COMMENT</div>
<p></p>
<div class="meta">TIME</div>OOXX
<div class="comment">COMMENT</div>
<p></p>
<div class="meta">TIME</div>OOXX
<div class="comment">COMMENT</div>
<p></p>
</div>

How you'll want to deal with multiple such tuples in the input XML will depend on your requirements and the facilities of the context of the XPath evaluation.
However, here's how to get the first TIME:
/div/div[@class="meta"][1]/text()
Here's how to get the first COMMENT:
/div/div[@class="comment"][1]/text()
And here's how to get the first OOXX:
/div/div[@class="meta"][1]/following-sibling::text()[1]
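To collect every tuple rather than only the first, the loop usually lives in the host language. Here is a minimal sketch in Python using only the standard library. It is an assumption on my part that Python is your evaluation context, and note that xml.etree.ElementTree supports only a subset of XPath (no following-sibling axis), so the loose OOXX text is read from the `tail` attribute of the meta element instead. The numbered placeholder values are made up to make the pairing visible.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the markup from the question; values are numbered
# here only so the pairing is visible (hypothetical sample data).
HTML = """<div class="contents">
<p></p>
<div class="meta">TIME1</div>OOXX1
<div class="comment">COMMENT1</div>
<p></p>
<div class="meta">TIME2</div>OOXX2
<div class="comment">COMMENT2</div>
<p></p>
</div>"""

root = ET.fromstring(HTML)

# ElementTree's XPath subset is enough to select by attribute value.
metas = root.findall("./div[@class='meta']")
comments = root.findall("./div[@class='comment']")

# Text that follows an element's closing tag is stored in its .tail,
# which plays the role of following-sibling::text()[1] here.
# Assumption: every meta div is followed by exactly one comment div.
rows = [
    (meta.text, comment.text, (meta.tail or "").strip())
    for meta, comment in zip(metas, comments)
]

print(rows)
```

With a full XPath engine you could instead evaluate the three expressions above once per meta div, changing the [1] index on each pass.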

Related

Parsing through response created with XPath

Using Scrapy, I want to extract some data from a site with well-formed HTML. With XPath I am able to extract a list of items, but I am not able to extract data from the elements in that list with a second XPath.
All XPaths have been tested using XPather. I have also tested with a local file that contains the webpage and see the same issue.
Here goes:
# Get the webpage
fetch("https://www.someurl.com")
# The following gives me the expected items from the HTML
products = response.xpath("//*[@id='product-list-146620']/div/div")
The items are like this:
<div data-pageindex="1" data-guid="13157582" class="col ">
<div class="item item-card item-card--static">
<div class="item-card__inner">
<div class="item__image item__image--overlay">
<a href="/www.something.anywhere?ref_gr=9801" class="ratio_custom" style="padding-bottom:100%">
</a>
</div>
<div class="item__text-container">
<div class="item__name">
<a class="item__name-link" href="/c.aspx?ref_gr=9801">The text I want</a>
</div>
</div>
</div>
</div>
</div>
When using the following XPath to extract "The text I want", I don't get anything:
XPATH_PRODUCT_NAME = "/div/div/div/div/div[contains(@class,'item__name')]/a/text()"
products[0].xpath(XPATH_PRODUCT_NAME).extract()
The output is empty, why?
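Your expression starts with /, which makes it absolute: it is evaluated from the document root rather than from products[0]. Start it with . to make it relative to the selected element. Try the following code.
XPATH_PRODUCT_NAME = ".//div[@class='item__name']/a[@class='item__name-link']/text()"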
products[0].xpath(XPATH_PRODUCT_NAME).extract()
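If you want to check the relative expression without running a crawl, here is a small sketch using Python's standard library as a stand-in for the Scrapy selector. That substitution is my assumption, not what Scrapy does internally: ElementTree supports only a subset of XPath and has no text() step, so the text is read from the `.text` attribute. The markup is a trimmed copy of the item from the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of one product item from the question.
ITEM = """<div data-pageindex="1" data-guid="13157582" class="col ">
  <div class="item item-card item-card--static">
    <div class="item-card__inner">
      <div class="item__text-container">
        <div class="item__name">
          <a class="item__name-link" href="/c.aspx?ref_gr=9801">The text I want</a>
        </div>
      </div>
    </div>
  </div>
</div>"""

# Stand-in for products[0]: the element the second query starts from.
product = ET.fromstring(ITEM)

# The leading "." anchors the search at `product`, and "//" then looks
# at any depth below it, so the exact nesting no longer matters.
RELATIVE_PATH = ".//div[@class='item__name']/a[@class='item__name-link']"
names = [link.text for link in product.findall(RELATIVE_PATH)]

print(names)
```

The point is the leading dot: the search is anchored at the element you already hold instead of at the top of the page.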

How can I make custom class HTML divisions using AsciiDoctor?

I am beginning with AsciiDoctor and I want to output HTML. I've been trying to figure out how to create custom class in divisions, I searched google, manuals etc. and couldn't find a solution. What I want to do is simply write something like this:
Type the word [userinput]#asciidoc# into the search bar.
Which generates HTML
<span class="userinput">asciidoc</span>
but I want to have div tags instead of span. Is there any way to do it or should I just use something like
+++<div class="userinput">asciidoc</div>+++ ?
I think what you need is called "role" in Asciidoctor.
This example:
This is some text.
[.userinput]
Type the word asciidoc into the search bar.
This is some text.
Produces:
<div class="paragraph">
<p>This is some text.</p>
</div>
<div class="paragraph userinput">
<p>Type the word asciidoc into the search bar.</p>
</div>
<div class="paragraph">
<p>This is some text.</p>
</div>
You now have a CSS selector, div.userinput, for the div in question.
See 13.5. Setting attributes on an element in the Asciidoctor User Manual (you can also search for "role").
You may want to use an open block for that purpose:
Type the following commands:
[.userinput]
--
command1
command1
--
Producing:
<div class="paragraph">
<p>Type the following commands:</p>
</div>
<div class="openblock userinput">
<div class="content">
<div class="paragraph">
<p>command1</p>
</div>
<div class="paragraph">
<p>command1</p>
</div>
</div>
</div>
The advantage is that an open block can wrap any other block, so it is not limited to a single paragraph as in the other answer.
For slightly different use cases, you may also consider defining a custom style.

What will be the xpath?

How can I get this element using jsoup or XPath?
My requirement is:
if I have selected an element with class='SecondClass', how do I find the "FirstClass" element it belongs to? For example, if I have selected class="SecondClass">yyyyyyyyy, how do I find the
class="FirstClass">Hi element?
<div class="FirstClass">Hello</div>
<div class="SecondClass">xyza</div>
<div class="SecondClass">lllllllll</div>
<div class="FirstClass">Hi</div>
<div class="SecondClass">ooooooooo</div>
<div class="SecondClass">yyyyyyyyy</div>
<div class="SecondClass">ttttttttyt</div>
<div class="FirstClass">HelloHi</div>
<div class="SecondClass">xysefsfza</div>
<div class="SecondClass">hohoho</div>
<div class="SecondClass">xydadaza</div>
<div class="SecondClass">new</div>
You can try this XPath expression to get the nearest preceding <div> element whose class attribute equals FirstClass:
/preceding-sibling::div[@class='FirstClass'][1]
With the XML data as posted in the question, and the current element being this:
<div class="SecondClass">yyyyyyyyy</div>
the XPath query above will return this element:
<div class="FirstClass">Hi</div>
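If your XPath engine lacks the preceding-sibling axis, the same lookup can be done by hand. Below is a minimal Python sketch using only the standard library; the wrapper element `root` and the helper name `nearest_preceding_first_class` are hypothetical, introduced only for this illustration. It walks the siblings in document order and remembers the last FirstClass div seen before the target.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the data from the question, wrapped in a
# hypothetical <root> element so it parses as a single document.
HTML = """<root>
<div class="FirstClass">Hello</div>
<div class="SecondClass">xyza</div>
<div class="FirstClass">Hi</div>
<div class="SecondClass">ooooooooo</div>
<div class="SecondClass">yyyyyyyyy</div>
<div class="FirstClass">HelloHi</div>
<div class="SecondClass">new</div>
</root>"""


def nearest_preceding_first_class(parent, target_text):
    """Return the closest FirstClass sibling that precedes the
    SecondClass div whose text equals target_text, or None."""
    last_first = None
    for child in parent:  # children are iterated in document order
        css_class = child.get("class")
        if css_class == "FirstClass":
            last_first = child
        elif css_class == "SecondClass" and child.text == target_text:
            return last_first
    return None


root = ET.fromstring(HTML)
match = nearest_preceding_first_class(root, "yyyyyyyyy")
print(match.text)  # Hi
```

This mirrors what preceding-sibling::div[@class='FirstClass'][1] does: on a reverse axis, [1] means the nearest match going backwards.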

EmberJs: nested views

I have this view which can rotate a div element. Something like
<div class="rotatable">
<div class="front">
{{outlet front}}
</div>
<div class="back">
{{outlet back}}
</div>
</div>
Now I have this index template, which contains two of these rotatable elements. Each rotatable element has a different front and back, so it could look like this:
<div id="index">
{{#rotatable}}
{{outlet front App.FrontView1}}
{{outlet back App.BackView1}}
{{/rotatable}}
{{#rotatable}}
<div>This should show up inside {{outlet front}}</div>
{{outlet back App.BackView2}}
{{/rotatable}}
</div>
This doesn't work of course, but how should this be done ?
Cheers
I guess this question was a little bit unclear. Anyway, the answer is given in this post
EmberJs: how to use connectOutlet

xPath strange behaviour - selecting ALL elements even if [1] set

Today I stumbled upon a very interesting case (at least for me). I am messing around with Selenium and XPath and tried to get some elements, but got strange behaviour:
<div class="resultcontainer">
<div class="info">
<div class="title">
<a>
some text
</a>
</div>
</div>
</div>
<div class="resultcontainer">
<div class="info">
<div class="title">
<a>
some other text
</a>
</div>
</div>
</div>
<div class="resultcontainer">
<div class="info">
<div class="title">
<a>
some even unrelated text
</a>
</div>
</div>
</div>
This is my data.
When I run the following XPath query:
//div[@class="title"][1]/a
I get ALL of the <a> elements as a result instead of only the first one. But if I query:
//div[@class="resultcontainer"][1]/div[@class="info"]/div[@class="title"]/a
I get only the first one, not all.
Is there some divine reason behind that?
Best regards,
bisko
I think you want
(//div[@class="title"])[1]/a
This:
//div[@class="title"][1]/a
selects all <a> elements that are children of <div> elements that have a @class of 'title' and are the first such child of their parent. Each of the three <div>s is the first one inside its own parent, which means: it selects all of them.
The working XPath selects all <div> elements that have a @class of 'title' and, of those, takes the first one.
The predicates (the expressions in square brackets []) are applied to each element that matched the preceding location step (i.e. "//div") individually. To apply a predicate to a filtered set of nodes, you need to make the grouping clear with parentheses.
Consequently, this:
//div[1][@class="title"]/a
would select every <div> that is the first <div> child of its parent, and then filter those further by checking the @class value. Also not what you want. ;-)
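The difference can be reproduced with a short Python sketch using only the standard library. Two assumptions: the wrapper element `root` is hypothetical, added so the three containers parse as one document, and ElementTree implements only a subset of XPath that, as far as I know, does not accept the parenthesized form, so the grouping is emulated by indexing the Python list of matches.

```python
import xml.etree.ElementTree as ET

# Compact copy of the data from the question, wrapped in a
# hypothetical <root> element so it parses as a single document.
HTML = """<root>
<div class="resultcontainer"><div class="info"><div class="title"><a>some text</a></div></div></div>
<div class="resultcontainer"><div class="info"><div class="title"><a>some other text</a></div></div></div>
<div class="resultcontainer"><div class="info"><div class="title"><a>some even unrelated text</a></div></div></div>
</root>"""

root = ET.fromstring(HTML)

# [1] is tested for each matching div relative to its own parent,
# so every title div passes and all three links come back.
per_parent = [a.text for a in root.findall(".//div[@class='title'][1]/a")]

# Emulation of (//div[@class="title"])[1]/a: collect all title divs
# first, then take the first one of the whole list.
all_titles = root.findall(".//div[@class='title']")
first_only = [a.text for a in all_titles[0].findall("a")]

print(per_parent)
print(first_only)
```

The first query returns three texts because the position test runs once per parent; the second returns only "some text" because the index is applied to the combined list.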
