I have a question about filtering a datafeed with an Xpath.
What I want:
I want to select all the products with a discount between 30% and 50%.
What I’ve already tried after reading this website and didn’t work:
/node[price div price_from from<=.7 to<=.5]
/node[price div price_from from <=.7 to <=.5]
node[price div price_from @from <=.7 and .5 <= @to]
/node[price div price_from from <=”.7” to <=”.5”]
I don’t know what else I could try. Does anyone have a solution for this headache-inducing problem?
Thank you!
Assuming XML input like this:
<nodes>
<node>
<price>30</price>
<price_from>60</price_from>
</node>
</nodes>
The following XPath expression will match the nodes that have been discounted by anywhere between 30 and 50 percent, that is, nodes whose price is between 50% and 70% of price_from. You had the right idea; only the syntax needed fixing:
//node[(price div price_from >= 0.5) and (price div price_from <= 0.7)]
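To make the arithmetic behind that predicate concrete, here is a minimal Python sketch of the same filter. It does not evaluate the XPath itself; it only mirrors the ratio check using the standard library. The sample feed and the function name `discounted_nodes` are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: discounts of 50%, 10% and 35% respectively.
FEED = """
<nodes>
  <node><price>30</price><price_from>60</price_from></node>
  <node><price>90</price><price_from>100</price_from></node>
  <node><price>65</price><price_from>100</price_from></node>
</nodes>
"""

def discounted_nodes(xml_text, low=0.5, high=0.7):
    """Return <node> elements whose price / price_from ratio lies in [low, high]."""
    root = ET.fromstring(xml_text)
    matches = []
    for node in root.iter("node"):
        price = float(node.findtext("price"))
        price_from = float(node.findtext("price_from"))
        # A ratio of 0.5..0.7 corresponds to a discount of 50%..30%.
        if low <= price / price_from <= high:
            matches.append(node)
    return matches

selected_prices = [n.findtext("price") for n in discounted_nodes(FEED)]
print(selected_prices)
```

The 10% discount is filtered out, while the 50% and 35% discounts are kept.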
For example, if the user has a gift card, we want to assert that the gift card section comes first:
<section id="gift-card-section"> ... </section>
<section id="credit-card-section"> ... </section>
Otherwise, we want to assert that the credit card section comes first. How can this be done in Magellan / Nightwatch?
My thinking is just to get the N in the N-th child... and assert that N1 < N2 or the other way around. How is this done in Magellan / Nightwatch?
This is a great case for using XPATH.
Explicit, if you know that these should be the first and second section elements.
browser
.useXpath().assert.attributeContains('(//section)[1]', 'id', 'gift-card-section')
.assert.attributeContains('(//section)[2]', 'id', 'credit-card-section');
or, if they are not necessarily the first two sections but are siblings at the same level in the DOM, you could use attributeEquals with the following-sibling axis:
browser
.useXpath().assert.attributeEquals("//section[@id='credit-card-section']/following-sibling::section[@id='gift-card-section']", "id", "gift-card-section");
This second option is a bit redundant, but there are plenty of other options if you are using XPATH in Nightwatch.
Suppose I have this structure:
<div class="a" attribute="foo">
<div class="b">
<span>Text Example</span>
</div>
</div>
In xpath, I would like to retrieve the value of the attribute "attribute" given I have the text inside: Text Example
If I use this xpath:
.//*[@class='a']//*[text()='Text Example']
It returns the element span, but I need the div.a, because I need to get the value of the attribute through Selenium WebDriver
Hey, there are a lot of ways to do this.
So let's say Text Example is given; you can identify the element using this text:
//span[text()='Text Example']/../.. --> If you know it's 2 levels up
OR
//span[text()='Text Example']/ancestor::div[@class='a'] --> If you don't know how many levels up this `div` is
The two XPaths above can be used if you want to identify the element via the text Text Example. If you don't need to go through this text, there are simpler ways to identify it directly:
//div[@class='a']
Your question itself already mentions the answer:
but I need the div.a,
try this
driver.findElement(By.cssSelector("div.a")).getAttribute("attribute");
Use cssSelector for best results.
Or else try the following XPath:
//div[contains(@class, 'a')]
If you want the attribute of div.a via its descendant span that contains some text, try this:
driver.findElement(By.xpath("//div[@class = 'a' and descendant::span[text() = 'Text Example']]")).getAttribute("attribute");
Hope it helps..:)
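For readers without a browser session at hand, the "div.a that has a descendant span with the given text" idea can be sketched in plain Python. This is only an illustration of the selection logic, not Selenium code; the wrapping `<body>` element and the function name `attribute_for_text` are assumptions.

```python
import xml.etree.ElementTree as ET

# The structure from the question, wrapped in a hypothetical <body> root.
DOC = """
<body>
  <div class="a" attribute="foo">
    <div class="b">
      <span>Text Example</span>
    </div>
  </div>
</body>
"""

def attribute_for_text(xml_text, text, attr="attribute"):
    """Return attr of the first div.a that has a descendant span with exactly this text."""
    root = ET.fromstring(xml_text)
    for div in root.iter("div"):
        if div.get("class") != "a":
            continue
        # Same idea as: descendant::span[text() = 'Text Example']
        if any(span.text == text for span in div.iter("span")):
            return div.get(attr)
    return None

value = attribute_for_text(DOC, "Text Example")
print(value)
```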
This is my HTML:
<book>
<div id="name"></div>
<span id="age"></span>
<p id="contact_number"></p>
...
...
(more attributes)
</book>
I need to extract all the text() inside <book></book> except the p with id="contact_number",
so basically I need //book//text() except //book//p[@id="contact_number"]//text().
How can I do this in a single XPath query?
There might be a better way if you can state the requirement differently. Anyway, to answer the question the way it was asked, you can try this:
//book//text()[not(ancestor::p/@id='contact_number')]
or maybe just use parent::p instead of ancestor::p:
//book//text()[not(parent::p/@id='contact_number')]
Add [normalize-space()] at the end if you need to filter out empty text nodes.
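As a rough illustration of the parent::p variant combined with the empty-text filter, here is a Python sketch using only the standard library. The sample values inside the elements are made up, and the sketch only looks at text directly inside each element (it ignores text that follows a child element), so treat it as an approximation of the XPath rather than an equivalent.

```python
import xml.etree.ElementTree as ET

# Hypothetical content; the original question shows empty elements.
BOOK = """
<book>
  <div id="name">Alice</div>
  <span id="age">30</span>
  <p id="contact_number">555-0100</p>
</book>
"""

def book_text_except(xml_text, skip_id="contact_number"):
    """Collect non-empty direct text of every element except p[@id=skip_id]."""
    root = ET.fromstring(xml_text)
    texts = []
    for element in root.iter():
        if element.tag == "p" and element.get("id") == skip_id:
            continue  # mirrors not(parent::p/@id='contact_number')
        if element.text and element.text.strip():
            texts.append(element.text.strip())  # mirrors [normalize-space()]
    return texts

texts = book_text_except(BOOK)
print(texts)
```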
Try the following:
//*[not(self::p[@id = 'contact_number'])]/text()[normalize-space()]
DOCUMENT: http://en.wikiquote.org/wiki/The_Matrix
I want to get all quotes (//ul/li) of the first section (Neo's quotes).
I cannot use //ul[1]/li because on some Wikiquote pages the quotes are represented in this form:
<h2><span class="mw-headline" id="Neo">Neo</span></h2>
<ul>
<li> First quote </li>
</ul>
<ul>
<li> Second quote </li>
</ul>
<h2><span class="mw-headline" id="dont wanna this">Useless</span></h2>
Instead of
<ul>
<li> First quote </li>
<li> Second quote </li>
</ul>
I've tried this to get the first section
(//*[@id='mw-content-text']/ul/preceding-sibling::h2/span[@class='mw-headline'])[1]
but I'm having trouble getting only the quotes of the first section. Can you help me?
Use:
(//h2[span/@id='Neo'])[1]/following-sibling::ul
    [count(.
          |
           (//h2[span/@id='Neo'])[1]
              /following-sibling::h2[1]
                 /preceding-sibling::ul
          )
    =
     count((//h2[span/@id='Neo'])[1]
              /following-sibling::h2[1]
                 /preceding-sibling::ul
          )
    ]
    /li
This selects all li children of the ul elements that lie between the first h2 with a span child whose id attribute is "Neo" and the next h2.
To select the quotations for the second such h2, replace the [1] in every occurrence of (//h2[span/@id='Neo'])[1] with [2] (leave following-sibling::h2[1] unchanged).
Do this for all numbers: 1, 2, ..., count(//h2[span/@id='Neo'])
XSLT - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"(//h2[span/@id='Neo'])[1]/following-sibling::ul
    [count(.
          |
           (//h2[span/@id='Neo'])[1]
              /following-sibling::h2[1]
                 /preceding-sibling::ul
          )
    =
     count((//h2[span/@id='Neo'])[1]
              /following-sibling::h2[1]
                 /preceding-sibling::ul
          )
    ]
    /li
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<html>
<h2><span class="mw-headline" id="Neo">Neo</span></h2>
<ul>
<li> First quote </li>
</ul>
<ul>
<li> Second quote </li>
</ul>
<h2><span class="mw-headline" id="dont wanna this">Useless</span></h2>
</html>
the XPath expression is evaluated, and the selected nodes are copied to the output:
<li> First quote </li>
<li> Second quote </li>
Explanation:
This follows from the Kayessian (by Dr. Michael Kay) formula for intersection of two node-sets:
$ns1[count(.|$ns2) = count($ns2)]
The above selects exactly the nodes that belong to both the nodeset $ns1 and the nodeset $ns2.
So, we substitute $ns1 with the nodeset consisting of all following siblings ul of the h2 of interest. We substitute $ns2 with the nodeset consisting of all preceding siblings ul of the h2 that is the immediate (1st) following sibling of the h2 of interest.
The intersection of these two nodesets contains exactly all ul elements that are wanted.
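The counting trick in that formula can be illustrated outside XPath. The sketch below models node-sets as Python lists of labels; the sibling labels and the function name `kayessian_intersection` are assumptions for illustration only. A node of ns1 is kept exactly when adding it to ns2 does not make ns2 any larger.

```python
def kayessian_intersection(ns1, ns2):
    """Model of $ns1[count(.|$ns2) = count($ns2)] using Python sets."""
    ns2_set = set(ns2)
    # The union {n} | ns2 has the same size as ns2 only if n is already in ns2.
    return [n for n in ns1 if len({n} | ns2_set) == len(ns2_set)]

# Hypothetical siblings in document order.
siblings = ["h2-Neo", "ul-1", "ul-2", "h2-Useless", "ul-3"]
start = siblings.index("h2-Neo")
end = siblings.index("h2-Useless")

# $ns1: all ul siblings following the h2 of interest.
ns1 = [s for s in siblings[start + 1:] if s.startswith("ul")]
# $ns2: all ul siblings preceding the next h2.
ns2 = [s for s in siblings[:end] if s.startswith("ul")]

result = kayessian_intersection(ns1, ns2)
print(result)
```

Here ul-3 is in ns1 but not in ns2, so it is dropped, leaving only the ul elements between the two headings.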
Update: In a comment the OP states that he only knows that he wants the results to be from the first section -- the string "Neo" isn't known.
Here is the modified solution:
(//h2[span/@id=$vSectionId])[1]
   /following-sibling::ul
    [count(.
          |
           (//h2[span/@id=$vSectionId])[1]
              /following-sibling::h2[1]
                 /preceding-sibling::ul
          )
    =
     count((//h2[span/@id=$vSectionId])[1]
              /following-sibling::h2[1]
                 /preceding-sibling::ul
          )
    ]
    /li
The variable $vSectionId must be obtained as the string value of the following XPath expression:
substring(//div[h2='Contents']
             /following-sibling::ul[1]
                /li[1]/a/@href,
          2)
Here we are getting the wanted id from the href of the a in the first Table Of Contents entry, and skipping the first character "#".
Here is again an XSLT - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vSectionId" select=
"substring(//div[h2='Contents']
             /following-sibling::ul[1]
                /li[1]/a/@href,
          2)
"/>
<xsl:template match="/">
<xsl:copy-of select=
"(//h2[span/@id=$vSectionId])[1]
   /following-sibling::ul
    [count(.
          |
           (//h2[span/@id=$vSectionId])[1]
              /following-sibling::h2[1]
                 /preceding-sibling::ul
          )
    =
     count((//h2[span/@id=$vSectionId])[1]
              /following-sibling::h2[1]
                 /preceding-sibling::ul
          )
    ]
    /li
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the complete XML document that is at:
http://en.wikiquote.org/wiki/The_Matrix, the result of applying these two XPath expressions (substituting the result of the first in the second, then evaluating the second expression) is the wanted, correct one:
<li>I know you're out there. I can feel you now. I know that you're afraid. You're afraid of us. You're afraid of change. I don't know the future. I didn't come here to tell you how this is going to end. I came here to tell you how it's going to begin. I'm going to hang up this phone, and then I'm going to show these people what you don't want them to see. I'm going to show them a world … without you. A world without rules and controls, without borders or boundaries; a world where anything is possible. Where we go from there is a choice I leave to you.</li>
<li>Whoa.</li>
<li>I know kung-fu.</li>
<li>Yeah. Well, that sounds like a pretty good deal. But I think I may have a better one. How about, I give you the finger [He does] and you give me my phone call.</li>
<li>Guns.. lots of guns...</li>
<li>There is no spoon.</li>
<li>My name...is Neo!</li>
Using the API will make it MUCH easier to parse. Here's a query that will pull the first section:
http://en.wikiquote.org/w/api.php?action=parse&page=The_Matrix&section=1&prop=wikitext
Output:
<?xml version="1.0"?>
<api>
<parse title="The Matrix">
<wikitext xml:space="preserve">== Neo ==
[[File:The.Matrix.glmatrix.2.png|thumb|right|Unfortunately, no one can be ''told'' what The Matrix is. You have to see it for yourself.]]
[[Image:Arty spoon.jpg|thumb|right|Do not try to bend the spoon — that's impossible. Instead, only try to realize the truth: there is no spoon.]]
* I know you're out there. I can feel you now. I know that you're afraid. You're afraid of us. You're afraid of change. I don't know the future. I didn't come here to tell you how this is going to end. I came here to tell you how it's going to begin. I'm going to hang up this phone, and then I'm going to show these people what you don't want them to see. I'm going to show them a world … without you. A world without rules and controls, without borders or boundaries; a world where anything is possible. Where we go from there is a choice I leave to you.
* Whoa.
* I know kung-fu.
* Yeah. Well, that sounds like a pretty good deal. But I think I may have a better one. How about, I give you the finger [He does] and you give me my phone call.
* Guns.. lots of guns...
* There is no spoon.
* My name...is Neo!</wikitext>
</parse>
</api>
Here's one way to parse this (using HTTParty):
require 'httparty'
class Wikiquote
  include HTTParty
  base_uri 'en.wikiquote.org/w/'
  # Returns the bulleted quotes from section 1 of the given page.
  def self.get_quotes(page)
    url = "/api.php?action=parse&page=#{page}&section=1&prop=wikitext&format=xml"
    headers = {"User-Agent" => "Wikiquote scraper 1.0"}
    content = get(url, headers: headers)['api']['parse']['wikitext']['__content__']
    # Each quote is a wikitext list item that starts with "* ".
    return content.scan(/^\* (.*)$/).flatten
  end
end
Usage:
Wikiquote.get_quotes("The_Matrix")
Output:
["I know you're out there. I can feel you now. I know that you're afraid. You're afraid of us. You're afraid of change. I don't know the future. I didn't come here to tell you how this is going to end. I came here to tell you how it's going to begin. I'm going to hang up this phone, and then I'm going to show these people what you don't want them to see. I'm going to show them a world … without you. A world without rules and controls, without borders or boundaries; a world where anything is possible. Where we go from there is a choice I leave to you.",
"Whoa.",
"I know kung-fu.",
"Yeah. Well, that sounds like a pretty good deal. But I think I may have a better one. How about, I give you the finger [He does] and you give me my phone call.",
"Guns.. lots of guns...",
"There is no spoon. ",
"My name...is Neo!"]
I suggest //ul[preceding-sibling::h2[1][span/@id = 'Neo']]/li. Or, if the id attribute is not present or not relevant for the search, then based on the answer in a comment I think you want
(//h2[span[contains(@class, 'mw-headline')]])[1]/following-sibling::ul
[1 = count(preceding-sibling::h2[1] | (//h2[span[contains(@class, 'mw-headline')]])[1])]/li
See "XPath axis, get all following nodes until" for an explanation. I hope I have closed all brackets and braces correctly; I don't have time to test right now.
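The idea behind that predicate, keeping each ul whose nearest preceding h2 is the first section heading, can also be sketched procedurally. The Python below walks the siblings in document order and stops at the second h2. The sample page, including the extra "Unwanted quote" list, and the function name `first_section_quotes` are assumptions for illustration; real Wikiquote HTML would need an HTML parser rather than `xml.etree`.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the page fragment from the question.
PAGE = """
<html>
  <h2><span class="mw-headline" id="Neo">Neo</span></h2>
  <ul><li> First quote </li></ul>
  <ul><li> Second quote </li></ul>
  <h2><span class="mw-headline" id="other">Useless</span></h2>
  <ul><li> Unwanted quote </li></ul>
</html>
"""

def first_section_quotes(xml_text):
    """Collect li texts from the ul siblings between the first and second h2."""
    root = ET.fromstring(xml_text)
    quotes = []
    in_section = False
    for child in root:
        if child.tag == "h2":
            if in_section:
                break  # reached the next heading: the first section is over
            in_section = True
        elif child.tag == "ul" and in_section:
            quotes.extend(li.text.strip() for li in child.findall("li"))
    return quotes

quotes = first_section_quotes(PAGE)
print(quotes)
```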
I'm trying to learn how to scrape these values, which I've split into two separate tasks:
get the 35.00 from the entire text
get the 42.00 from the entire text
below is the html:
<p style="font-size: 30px; margin-left: -10px; padding: 15px 0pt;">
$35.00 - $42.00
</p>
The code that I'm using to get the entire text is below:
node = html_doc.at_css('p')
p node.text
You can get the whole text from node.text and that's as far as you need to go with Nokogiri. From there you could use scan to find the numbers and a bit of list wrangling (flatten and map) and you're done. Something like this:
first, second = node.text.scan(/(\d+(?:\.\d+)?)/).flatten.map(&:to_f)
That should leave you with 35.0 in first and 42.0 in second. If you know that the numbers are prices with decimals then you can simplify the regex a bit:
first, second = node.text.scan(/(\d+\.\d+)/).flatten.map(&:to_f)
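For comparison, the same scan-and-convert idea can be sketched in Python with the standard `re` module. This is a translation for illustration, not part of the Nokogiri answer; the function name `extract_prices` is an assumption, and the input string stands in for `node.text`.

```python
import re

def extract_prices(text):
    """Find every number (with an optional decimal part) and convert it to float."""
    return [float(match) for match in re.findall(r"\d+(?:\.\d+)?", text)]

# Stand-in for node.text, including the surrounding whitespace from the <p>.
first, second = extract_prices("\n$35.00 - $42.00\n")
print(first, second)
```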
mu's answer is correct, but it seems simpler to use split/splat (note that this yields strings rather than floats):
first, second = *node.text.tr('$', '').split(' - ')