How do I extract the value from an xpath expression

I have a query that looks like...
/freeStyleBuild/artifact[fileName=starts-with(.,'imc')]/fileName
But it returns the data inside tags:
<fileName>imc5.0.0.5078.zip</fileName>
But I just want:
"imc5.0.0.5078.zip"
I'm missing something simple here.

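Append /text() to the end of the expression to select the text node instead of the element:
/freeStyleBuild/artifact[starts-with(fileName,'imc')]/fileName/text()
The predicate is written here as starts-with(fileName,'imc'). The original form, fileName=starts-with(.,'imc'), compares the fileName node-set with a boolean rather than testing the file name's text.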
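To illustrate the difference between selecting the element and selecting its text, here is a minimal Python sketch. It uses only the standard library: xml.etree.ElementTree implements a small XPath subset that has neither text() nor starts-with(), so the sketch selects the fileName elements by path, then reads .text and filters in Python. The sample XML is invented for the example; only the element names come from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the element names come from the question.
SAMPLE = """
<freeStyleBuild>
  <artifact><fileName>imc5.0.0.5078.zip</fileName></artifact>
  <artifact><fileName>notes.txt</fileName></artifact>
</freeStyleBuild>
"""

root = ET.fromstring(SAMPLE)

# findall() returns Element objects (the "data inside tags").
# Reading .text gives the bare string, which is what /text() selects
# in a full XPath engine.
file_names = [
    el.text
    for el in root.findall("./artifact/fileName")
    if el.text and el.text.startswith("imc")
]
print(file_names)
```

With a full XPath 1.0 engine the filtering can stay inside the expression, as in the answer above.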

Related

Multiple conditions when contains(@attribute) in Scrapy

For example, if I want to find a link tag whose type is "application/javascript" or "application/ecmascript", I would like to do something like this:
response.xpath("head/link[contains(@type, "javascript", "ecmascript")]")
It goes without saying that the code above raises an exception.
But I haven't found a way to apply multiple conditions in one XPath query.

Try this:
response.xpath("head/link[@type[contains(., 'javascript') or contains(., 'ecmascript')]]")
Be careful not to use the same kind of quotes for the Python string and for the string literals inside the XPath.

Xpath multiply formatted output

I have many entries in an XML file and an XPath expression with a condition:
/XMLReport/Report/PreflightResult/PreflightResultEntry[
@type = 'Check' and @level = 'warning']/PreflightResultEntryMessage/Message/text()
The output is:
onetwothreefour... and more
I need a separator between the values, for example '---':
one---two---three---four
or a line break:
one
two
three
four
Is that possible?

Why is the XPath expression wrapped in single quotes?
Use this:
string-join(/XMLReport/Report/PreflightResult/PreflightResultEntry[@type = 'Check' and @level = 'warning']/PreflightResultEntryMessage/Message/text(), '---')

Your XPath expression is actually returning a set of text nodes. The way these are displayed depends on the calling application (which you haven't told us anything about). I think your options are (a) change the way the calling application displays the result, or (b) if you're using XPath 2.0+, use the string-join() function to return the result as a string, formatted any way you like within the XPath expression itself.
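Option (a), joining the results in the calling application, might look like the following Python sketch. The calling application in the question is unknown, so this is only an assumption about what it could be: standard-library xml.etree.ElementTree, with an invented sample document that reuses the element and attribute names from the question. ElementTree has no `and` operator, so the two attribute tests are written as chained predicates.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; structure follows the XPath in the question.
SAMPLE = """
<XMLReport><Report><PreflightResult>
  <PreflightResultEntry type="Check" level="warning">
    <PreflightResultEntryMessage><Message>one</Message></PreflightResultEntryMessage>
  </PreflightResultEntry>
  <PreflightResultEntry type="Check" level="error">
    <PreflightResultEntryMessage><Message>skipped</Message></PreflightResultEntryMessage>
  </PreflightResultEntry>
  <PreflightResultEntry type="Check" level="warning">
    <PreflightResultEntryMessage><Message>two</Message></PreflightResultEntryMessage>
  </PreflightResultEntry>
</PreflightResult></Report></XMLReport>
"""

root = ET.fromstring(SAMPLE)  # root is <XMLReport>, so the path starts below it

path = (
    "./Report/PreflightResult/PreflightResultEntry"
    "[@type='Check'][@level='warning']"
    "/PreflightResultEntryMessage/Message"
)
messages = [m.text or "" for m in root.findall(path)]

joined = "---".join(messages)   # separator of your choice
print(joined)
print("\n".join(messages))      # or one value per line
```

The separator lives in the application code here, so changing the output format does not require touching the XPath.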

How to extract items inside a table using scrapy

I want to extract all the functions listed in the table on this page: python functions list
I used the Chrome developer console to find the XPath to use in spider.py:
$x('//*[@id="built-in-functions"]/table[1]/tbody//a/@href')
but this returns a list of all the hrefs (which I think is what the XPath expression refers to).
I believe I need to extract the text instead, but appending /text() to the above XPath returns nothing. Can someone please help me extract the function names from the table?

I think this should do the trick:
response.css('.docutils .reference .pre::text').extract()
A non-exact XPath equivalent (which also works in this case) would be:
response.xpath('//table[contains(@class, "docutils")]//*[contains(@class, "reference")]//*[contains(@class, "pre")]/text()').extract()

Try this:
for td in response.css("#built-in-functions > table:nth-child(4) td"):
    # first text node of the span.pre inside this cell, or None if absent
    print(td.css("span.pre::text").extract_first())

Xpath expression with OR

I'd like to know if there is a way to test for multiple strings in an XPath expression. This is the one I'm using now:
/td[2][text()[contains(.,'Word1')]]
I'd like to do something like this:
/td[2][text()[contains(.,'Word1' OR 'Word2' OR 'Word3')]]
Is that possible?

Updated answer:
I believe the problem you are experiencing is case sensitivity. XPath keywords are case-sensitive, so write or in lower case:
//td[text()[contains(.,'Word1') or contains(.,'Word2') or contains(.,'Word3')]]
If that doesn't help, you can use a union instead:
/td[2][text()[contains(.,'Word1')]] | /td[2][text()[contains(.,'Word2')]] | /td[2][text()[contains(.,'Word3')]]

Yes, it's possible:
/td[2][text()[contains(.,'Word1') or contains(.,'Word2') or contains(.,'Word3')]]

Yes, you just need separate contains() calls, joined with a lower-case or:
[contains(., 'Word1') or contains(., 'Word2') or contains(., 'Word3')]
As you have it currently, the second argument to contains() is a boolean expression rather than a string.

With XPath 2.0 or 3.0 you could also use:
A quantified expression, which loops over a sequence of words and tests whether any of them is contained:
//td[2][text()[some $word in ('Word1', 'Word2', 'Word3') satisfies contains(., $word)]]
The matches() function, with the list of words written as a regex:
//td[2][text()[matches(., 'Word1|Word2|Word3')]]
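If the XPath engine at hand only supports a small subset of XPath, the same "any of these words" test can be done in the host language. The Python sketch below is one way to do it under that assumption, using standard-library xml.etree.ElementTree, which has no contains() or or. The table markup is made up for the example. It mirrors the quantified expression above: select td[2], then keep the cells whose text contains at least one word.

```python
import xml.etree.ElementTree as ET

# Hypothetical table; the question does not show the real markup.
SAMPLE = """
<table>
  <tr><td>id-1</td><td>contains Word2 here</td></tr>
  <tr><td>id-2</td><td>nothing relevant</td></tr>
  <tr><td>id-3</td><td>Word3 at the start</td></tr>
</table>
"""

WORDS = ("Word1", "Word2", "Word3")

root = ET.fromstring(SAMPLE)

# td[2] selects the second <td> child of each row (positions are 1-based).
matches = [
    td.text
    for td in root.findall("./tr/td[2]")
    if td.text and any(word in td.text for word in WORDS)
]
print(matches)
```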

What's the xpath syntax to get tag names?

I'm using Nokogiri to parse a large XML file. Say I've got the following structure:
<menagerie>
<penguin>Pablo</penguin>
<penguin>Mortimer</penguin>
<bull>Ferdinand</bull>
<aardvark>James Cornelius Madison Humphrey Zophar Handlebrush III</aardvark>
</menagerie>
I can count the non-penguins like this:
xml.xpath('//menagerie//*[not(self::penguin)]').length # => 2
But how do I get a list of the tags, like this? (The exact format isn't important; I just want to visually scan the non-penguins.)
bull
aardvark
Update
This gave me the list I wanted - thanks Oded and TMN and delnan!
xml.xpath('//menagerie/*[not(self::penguin)]').each do |node|
  puts node.name
end

You can use the name() or local-name() XPath function.
See the examples on zvon.

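I know this is a bit outdated, but in XPath 2.0 or later the expression //menagerie/*[not(self::penguin)]/name() returns the names directly. Note the slash, not the dot: from XPath 2.0 on, a function call can be the last step of a path and is evaluated once for each selected node. Nokogiri relies on libxml2, which implements XPath 1.0 and does not allow this, so there you need to read node.name in Ruby as shown in the question's update.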
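For comparison, the same listing in Python. This is a sketch using standard-library xml.etree.ElementTree instead of Nokogiri, so the API is different from the one in the question. ElementTree has no name() function and no self:: axis; the element name is the .tag attribute and the filter is done in Python.

```python
import xml.etree.ElementTree as ET

# The document from the question.
SAMPLE = """
<menagerie>
  <penguin>Pablo</penguin>
  <penguin>Mortimer</penguin>
  <bull>Ferdinand</bull>
  <aardvark>James Cornelius Madison Humphrey Zophar Handlebrush III</aardvark>
</menagerie>
"""

root = ET.fromstring(SAMPLE)

# Iterating an Element yields its direct children; .tag is the element name.
non_penguins = [child.tag for child in root if child.tag != "penguin"]

for tag in non_penguins:
    print(tag)
```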
