How do I extract the value from an xpath expression - xpath

I have a query that looks like...
/freeStyleBuild/artifact[fileName=starts-with(.,'imc')]/fileName
But it returns the data inside tags:
<fileName>imc5.0.0.5078.zip</fileName>
But I just want:
"imc5.0.0.5078.zip"
I'm missing something simple here.

Append /text() to the end of the expression to select the text node instead of the element:
/freeStyleBuild/artifact[fileName=starts-with(.,'imc')]/fileName/text()
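For readers who want to try the idea outside an XPath-only tool, here is a minimal Python sketch. It assumes the standard-library xml.etree.ElementTree module, which supports only a subset of XPath (no text() step and no starts-with()), so the prefix test is done in Python and the element's .text attribute stands in for /text(). The sample document, including the second artifact, is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Sample document modeled on the question; the second artifact is invented.
XML = """<freeStyleBuild>
  <artifact><fileName>imc5.0.0.5078.zip</fileName></artifact>
  <artifact><fileName>notes.txt</fileName></artifact>
</freeStyleBuild>"""

root = ET.fromstring(XML)  # root is the <freeStyleBuild> element

# ElementTree has no starts-with(), so filter in Python. Reading .text
# plays the role of the trailing /text() step: it yields the bare string
# rather than the <fileName> element.
names = [
    el.text
    for el in root.findall("./artifact/fileName")
    if el.text and el.text.startswith("imc")
]

print(names)
```

With a full XPath engine the /text() expression above does the same job in a single query.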

Related

Multiple conditions when contains(@attribute) in Scrapy

For example, if I want to find link tag with application type "application/javascript" or "application/ecmascript", I would like to do something like this:
response.xpath("head/link[contains(@type, "javascript", "ecmascript")]")
It goes without saying that the code above will raise an exception.
But I haven't found a way to apply multiple conditions in one XPath query.
Try this:
response.xpath("head/link[@type[contains(., 'javascript') or contains(., 'ecmascript')]]")
Be careful not to use the same quote character for the Python string and inside the XPath expression.
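The same "either substring" test can be sketched without Scrapy. This example assumes the standard-library xml.etree.ElementTree module, which has no contains() function, so the check is written with Python's in operator. The markup and the href values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed sample markup; the href values are hypothetical.
HTML = """<html><head>
  <link type="application/javascript" href="a.js"/>
  <link type="application/ecmascript" href="b.es"/>
  <link type="text/css" href="c.css"/>
</head></html>"""

root = ET.fromstring(HTML)  # root is the <html> element
wanted = ("javascript", "ecmascript")

# Equivalent of: contains(@type, 'javascript') or contains(@type, 'ecmascript')
hrefs = [
    link.get("href")
    for link in root.findall("./head/link")
    if any(word in link.get("type", "") for word in wanted)
]

print(hrefs)
```

Each condition is its own substring test joined by a logical or, which is exactly what the XPath predicate above expresses.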

Xpath multiply formatted output

I have many entries in an XML file and an XPath expression with a condition:
/XMLReport/Report/PreflightResult/PreflightResultEntry[
@type = 'Check' and @level = 'warning']/PreflightResultEntryMessage/Message/text()
The output is:
onetwothreefour... and more
I need a separator between the values, such as '---':
one---two---three---four
or a line break:
one
two
three
four
Is it possible?
Why did you wrap the XPath expression in single quotes (')?
Use this:
string-join(/XMLReport/Report/PreflightResult/PreflightResultEntry[@type = 'Check' and @level = 'warning']/PreflightResultEntryMessage/Message/text(), '---')
Your XPath expression is actually returning a set of text nodes. The way these are displayed depends on the calling application (which you haven't told us anything about). I think your options are (a) change the way the calling application displays the result, or (b) if you're using XPath 2.0+, use the string-join() function to return the result as a string, formatted any way you like within the XPath expression itself.

How to extract items inside a table using scrapy

I want to extract all the functions listed inside the table in the link below: python functions list
I tried using the Chrome developer console to get the exact XPath to use in spider.py, as below:
$x('//*[@id="built-in-functions"]/table[1]/tbody//a/@href')
but this returns a list of all the hrefs (which I think is what the XPath expression refers to).
I believe I need to extract the text instead, but appending /text() to the above XPath returns nothing. Can someone please help me extract the function names from the table?
I think this should do the trick
response.css('.docutils .reference .pre::text').extract()
A non-exact XPath equivalent (which also works in this case) would be:
response.xpath('//table[contains(@class, "docutils")]//*[contains(@class, "reference")]//*[contains(@class, "pre")]/text()').extract()
Try this:
for td in response.css("#built-in-functions > table:nth-child(4) td"):
    # Print the first text node of the span.pre inside each cell
    print(td.css("span.pre::text").extract_first())

Xpath expression with OR

I'd like to know if there is a way to check for multiple strings in an XPath expression. This is the one I'm using now:
/td[2][text()[contains(.,'Word1')]]
I'd like to do something like this:
/td[2][text()[contains(.,'Word1' OR 'Word2' OR 'Word3')]]
Is that possible?
Updated answer:
I believe the problem you are experiencing is case sensitivity. XPath operators are case-sensitive, so write or in lowercase:
//td[text()[contains(.,'Word1') or contains(.,'Word2') or contains(.,'Word3')]]
If that doesn't help, you can use the union operator (|):
/td[2][text()[contains(.,'Word1')]] | /td[2][text()[contains(.,'Word2')]] | /td[2][text()[contains(.,'Word3')]]
Yes, it's possible:
/td[2][text()[contains(.,'Word1') or contains(.,'Word2') or contains(.,'Word3')]]
Yes, you just need separate contains() calls joined with a lowercase or:
[contains(., 'Word1') or contains(., 'Word2') or contains(., 'Word3')]
As you have it currently, a boolean is being passed as the second argument to contains(), rather than a string.
With XPath 2.0 or 3.0 you could also use either of the following.
A quantified expression that loops over a sequence of words and tests whether any of them is contained:
//td[2][text()[some $word in ('Word1', 'Word2', 'Word3') satisfies contains(., $word)]]
The matches() function, with your list of words written as a regex alternation:
//td[2][text()[matches(., 'Word1|Word2|Word3')]]
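The "any of these words" logic behind the quantified expression can be sketched in Python. This is an illustration of the logic only, not an XPath 2.0 evaluation: it assumes the standard-library xml.etree.ElementTree module, which lacks contains(), so any() does the work. The table content is invented.

```python
import xml.etree.ElementTree as ET

# Invented sample table; only the words come from the question.
XML = """<table>
  <tr><td>a</td><td>has Word2 here</td></tr>
  <tr><td>b</td><td>nothing relevant</td></tr>
  <tr><td>c</td><td>Word3 at the start</td></tr>
</table>"""

root = ET.fromstring(XML)  # root is the <table> element
words = ("Word1", "Word2", "Word3")

# td[2] selects the second <td> in each row. any() mirrors
# "some $word in (...) satisfies contains(., $word)".
matches = [
    td.text
    for td in root.findall("./tr/td[2]")
    if td.text and any(word in td.text for word in words)
]

print(matches)
```

Like contains() in XPath, the in test here is case-sensitive.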

What's the xpath syntax to get tag names?

I'm using Nokogiri to parse a large XML file. Say I've got the following structure:
<menagerie>
<penguin>Pablo</penguin>
<penguin>Mortimer</penguin>
<bull>Ferdinand</bull>
<aardvark>James Cornelius Madison Humphrey Zophar Handlebrush III</aardvark>
</menagerie>
I can count the non-penguins like this:
xml.xpath('//menagerie//*[not(self::penguin)]').length # 2
But how do I get a list of the tags, like this? (The exact format isn't important; I just want to visually scan the non-penguins.)
bull
aardvark
Update
This gave me the list I wanted - thanks Oded and TMN and delnan!
xml.xpath('//menagerie/*[not(self::penguin)]').each do |node|
  puts node.name
end
You can use the name() or local-name() XPath function.
See the examples on zvon.
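I know this is a bit dated, but you could use xml.xpath('//menagerie/*[not(self::penguin)]/name()') as the expression. Note the slash, not the dot: this is how you apply a function to each node selected by a path. A function call as a path step requires XPath 2.0 or later, so an XPath 1.0 engine will reject it.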
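For comparison, here is a short Python sketch of the same idea. It assumes the standard-library xml.etree.ElementTree module rather than Nokogiri. ElementTree cannot evaluate name() or the self:: axis, so the element's .tag attribute stands in for name() and the penguin test is done in Python.

```python
import xml.etree.ElementTree as ET

# The menagerie document from the question.
XML = """<menagerie>
  <penguin>Pablo</penguin>
  <penguin>Mortimer</penguin>
  <bull>Ferdinand</bull>
  <aardvark>James Cornelius Madison Humphrey Zophar Handlebrush III</aardvark>
</menagerie>"""

root = ET.fromstring(XML)  # root is the <menagerie> element

# Iterating over an element yields its direct children; .tag is the
# element name, which is what name() returns in XPath.
non_penguins = [child.tag for child in root if child.tag != "penguin"]

for tag in non_penguins:
    print(tag)
```

The filter tests the element's own name, which corresponds to not(self::penguin). Written as not(penguin), the predicate would instead test for a child element called penguin.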
