<product>
<book>
<id>111</id>
<name>xxx</name>
</book>
<pen>
<id>222</id>
<name>yyy</name>
</pen>
<pencil>
<id>333</id>
<name>zzz</name>
</pencil>
</product>
I want to remove the "pencil" node and print the remaining XML using REXML (Ruby). Can anybody tell me how to do that?
Use one of the delete methods (see http://rubydoc.info/stdlib/rexml/), for example delete_element with an XPath:
require "rexml/document"
string = <<EOF
<product>
<book>
<id>111</id>
<name>xxx</name>
</book>
<pen>
<id>222</id>
<name>yyy</name>
</pen>
<pencil>
<id>333</id>
<name>zzz</name>
</pencil>
</product>
EOF
doc = REXML::Document.new(string)
doc.delete_element('//pencil')
puts doc
There is also a nice tutorial to get you started: http://www.germane-software.com/software/rexml/docs/tutorial.html
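As far as I can tell, delete_element with an XPath string removes a single matching element. If the document may contain several such nodes, one option is to collect every match first and remove each one. The snippet below is a minimal sketch under that assumption; the sample data with two pencil nodes is hypothetical.

```ruby
require "rexml/document"

# Hypothetical sample with two <pencil> nodes (not the asker's data).
xml = <<~XML
  <product>
    <book><id>111</id><name>xxx</name></book>
    <pencil><id>333</id><name>zzz</name></pencil>
    <pencil><id>444</id><name>www</name></pencil>
  </product>
XML

doc = REXML::Document.new(xml)

# Collect the matches into an array first, then remove them, so the
# tree is not modified while it is being searched.
pencils = REXML::XPath.match(doc, "//pencil")
pencils.each(&:remove)

# Names of the elements still directly under <product>.
remaining = doc.root.elements.map(&:name)
puts remaining.inspect
```

After the removal, puts doc prints the remaining XML as in the answer above.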
<CATALOG>
<BOOK>
<TITLE>Hadoop Defnitive Guide</TITLE>
<AUTHOR>Tom White</AUTHOR>
<COUNTRY>US</COUNTRY>
<COMPANY>CLOUDERA</COMPANY>
<PRICE>24.90</PRICE>
<YEAR>2012</YEAR>
</BOOK>
</CATALOG>
This is the XML I am using.
I want to extract only the TITLE and COMPANY elements. Is there any way to extract them using a regex or XPath?
The first thing you need to do is format your XML like so:
<CATALOG>
<BOOK>
<TITLE>Hadoop Defnitive Guide</TITLE>
<AUTHOR>Tom White</AUTHOR>
<COUNTRY>US</COUNTRY>
<COMPANY>CLOUDERA</COMPANY>
<PRICE>24.90</PRICE>
<YEAR>2012</YEAR>
</BOOK>
</CATALOG>
Then you can extract those elements like so (element names are case-sensitive, so they must be written in upper case here):
/CATALOG/BOOK/*[self::TITLE or self::COMPANY]
More about axes you can find here: http://www.w3schools.com/xsl/xpath_axes.asp
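For readers working in Ruby, here is a minimal sketch that fetches the same two elements with REXML. It uses two plain paths rather than the self:: predicate, which is a simplification on my part and not the only way to do it; the variable names are illustrative.

```ruby
require "rexml/document"

xml = <<~XML
  <CATALOG>
    <BOOK>
      <TITLE>Hadoop Defnitive Guide</TITLE>
      <AUTHOR>Tom White</AUTHOR>
      <COUNTRY>US</COUNTRY>
      <COMPANY>CLOUDERA</COMPANY>
      <PRICE>24.90</PRICE>
      <YEAR>2012</YEAR>
    </BOOK>
  </CATALOG>
XML

doc = REXML::Document.new(xml)

# Element names are case-sensitive, so the paths use upper case.
wanted = %w[TITLE COMPANY]
values = wanted.to_h do |name|
  [name, doc.elements["/CATALOG/BOOK/#{name}"].text]
end

puts values.inspect
```

A regex is not a good fit for this task, because it cannot reliably follow element nesting the way an XML parser does.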
I have this XML file and I need to get the name and the author(s) of each book where the name of at least one author starts with "E".
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library SYSTEM "knihy.dtd">
<library>
<book isbn="0470114878">
<author hlavni="hlavni">David Hunter</author>
<author>Jeff Rafter</author>
<author>Joe Fawcett</author>
<author>Eric van der Vlist</author>
<author>Danny Ayers</author>
<name>Beginning XML</name>
<publisher>Wrox</publisher>
</book>
<book isbn="0596004206">
<author>Erik Ray</author>
<name>Learning XML</name>
<publisher>O'Reilly</publisher>
</book>
<book isbn="0764547593">
<author>Kay Ethier</author>
<author>Alan Houser</author>
<name>XML Weekend Crash Course</name>
<year>2001</year>
</book>
<book isbn="1590596765">
<author>Sas Jacobs</author>
<name>Beginning XML with DOM and Ajax</name>
<publisher>Apress</publisher>
<year>2006</year>
</book>
</library>
I tried this approach
for $book in /library/book[starts-with(author, "E")]
return $book
but it raises an XPathException in invokeTransform: A sequence of more than one item is not allowed as the first argument of starts-with() ("David Hunter", "Jeff Rafter", ...). How can I test each item of this sequence?
As the error message suggests, apply starts-with() in a predicate on each individual author element, instead of passing all author child elements to the function at once:
for $book in /library/book[author[starts-with(., "E")]]
return $book
The above returns all books where the name of at least one author starts with "E".
Output:
<book isbn="0470114878">
<author hlavni="hlavni">David Hunter</author>
<author>Jeff Rafter</author>
<author>Joe Fawcett</author>
<author>Eric van der Vlist</author>
<author>Danny Ayers</author>
<name>Beginning XML</name>
<publisher>Wrox</publisher>
</book>
<book isbn="0596004206">
<author>Erik Ray</author>
<name>Learning XML</name>
<publisher>O'Reilly</publisher>
</book>
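The same idea, testing each author on its own rather than the whole sequence, can be sketched in Ruby with REXML. This is an illustration only: the sample is a shortened copy of the library file, and the per-author test is done with String#start_with? in Ruby instead of the nested XPath predicate.

```ruby
require "rexml/document"

# Shortened copy of the library document from the question.
xml = <<~XML
  <library>
    <book isbn="0470114878">
      <author>David Hunter</author>
      <author>Eric van der Vlist</author>
      <name>Beginning XML</name>
    </book>
    <book isbn="0596004206">
      <author>Erik Ray</author>
      <name>Learning XML</name>
    </book>
    <book isbn="0764547593">
      <author>Kay Ethier</author>
      <author>Alan Houser</author>
      <name>XML Weekend Crash Course</name>
    </book>
  </library>
XML

doc = REXML::Document.new(xml)

# Keep a book if ANY of its authors starts with "E"; each author
# string is tested individually, never the whole list at once.
matches = REXML::XPath.match(doc, "/library/book").select do |book|
  authors = book.elements.to_a("author").map(&:text)
  authors.any? { |author| author.start_with?("E") }
end

names = matches.map { |book| book.elements["name"].text }
puts names.inspect
```

Note that "Kay Ethier" does not qualify, because the test looks at the start of the whole author string, not at the surname.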
I'm new to XPath and am trying to use the XML task in SSIS to load some values, using Microsoft's sample XML inventory shown below.
How can I load the first-name value in bookstore/book where style is novel and award = 'Pulitzer'?
I am trying //book[@style='novel' and ./author/award/text()='Pulitzer'], but it returns the whole book element. What should I change to get just the first-name value?
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="myfile.xsl" ?>
<bookstore specialty="novel">
<book style="autobiography">
<author>
<first-name>Joe</first-name>
<last-name>Bob</last-name>
<award>Trenton Literary Review Honorable Mention</award>
</author>
<price>12</price>
</book>
<book style="textbook">
<author>
<first-name>Mary</first-name>
<last-name>Bob</last-name>
<publication>Selected Short Stories of
<first-name>Mary</first-name>
<last-name>Bob</last-name>
</publication>
</author>
<editor>
<first-name>Britney</first-name>
<last-name>Bob</last-name>
</editor>
<price>55</price>
</book>
<magazine style="glossy" frequency="monthly">
<price>2.50</price>
<subscription price="24" per="year"/>
</magazine>
<book style="novel" id="myfave">
<author>
<first-name>Toni</first-name>
<last-name>Bob</last-name>
<degree from="Trenton U">B.A.</degree>
<degree from="Harvard">Ph.D.</degree>
<award>Pulitzer</award>
<publication>Still in Trenton</publication>
<publication>Trenton Forever</publication>
</author>
<price intl="Canada" exchange="0.7">6.50</price>
<excerpt>
<p>It was a dark and stormy night.</p>
<p>But then all nights in Trenton seem dark and
stormy to someone who has gone through what
<emph>I</emph> have.</p>
<definition-list>
<term>Trenton</term>
<definition>misery</definition>
</definition-list>
</excerpt>
</book>
<my:book xmlns:my="uri:mynamespace" style="leather" price="29.50">
<my:title>Who's Who in Trenton</my:title>
<my:author>Robert Bob</my:author>
</my:book>
</bookstore>
I got an answer:
//book[@style='novel' and ./author/award/text()='Pulitzer']//first-name
Use:
/*/book[@style='novel']/author[award = 'Pulitzer']/first-name
This selects any first-name element whose parent is an author that has an award child with the string value 'Pulitzer', where that author's parent is a book whose style attribute has the value "novel" and whose own parent is the top element of the XML document.
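To try this expression outside SSIS, here is a small sketch in Ruby with REXML. The document is a trimmed, hypothetical version of the bookstore sample that keeps only the parts the expression looks at, with the award value written out as 'Pulitzer'.

```ruby
require "rexml/document"

# Trimmed, hypothetical version of the bookstore sample.
xml = <<~XML
  <bookstore specialty="novel">
    <book style="autobiography">
      <author>
        <first-name>Joe</first-name>
        <last-name>Bob</last-name>
        <award>Trenton Literary Review Honorable Mention</award>
      </author>
    </book>
    <book style="novel" id="myfave">
      <author>
        <first-name>Toni</first-name>
        <last-name>Bob</last-name>
        <award>Pulitzer</award>
      </author>
    </book>
  </bookstore>
XML

doc = REXML::Document.new(xml)

# Each step narrows the selection: book by attribute, author by the
# string value of its award child, then the first-name child itself.
path = "/*/book[@style='novel']/author[award='Pulitzer']/first-name"
first_names = REXML::XPath.match(doc, path).map(&:text)

puts first_names.inspect
```

Putting the predicate on author rather than on book keeps the result tied to the author who actually holds the award.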
A similar question in the same context: how can I do the reverse? Suppose I want to find the id of all books whose price is greater than 20. I know I am being a nudge, but I really want to clear up my understanding.
Here is the XPath you need:
//book/price[text() > 20]/..
This selects the matching book elements themselves; to select only their id attributes, append /@id to the expression.
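As a cross-check, the same selection can be sketched in Ruby with REXML, doing the numeric comparison in Ruby and then reading the id attribute. The ids and prices below are hypothetical; in the original sample only one book carries an id attribute.

```ruby
require "rexml/document"

# Hypothetical data: every book has an id so the result is visible.
xml = <<~XML
  <bookstore>
    <book id="b1"><price>12</price></book>
    <book id="b2"><price>55</price></book>
    <book id="b3"><price>6.50</price></book>
  </bookstore>
XML

doc = REXML::Document.new(xml)

# Select the books whose price, read as a number, exceeds 20.
expensive = REXML::XPath.match(doc, "//book").select do |book|
  price = book.elements["price"]
  price && price.text.to_f > 20
end

# The filter is on a child element, but the result is taken from the parent.
ids = expensive.map { |book| book.attributes["id"] }
puts ids.inspect
```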
I have the XML below, in which a few child elements have empty text.
doc = <<'XML'
<Book>
<BookId>BK45647</BookId>
<BookName>The Client by John Grisham</BookName>
<BookAuthenticationCode></BookAuthenticationCode>
<BookCategory>Suspense</BookCategory>
<BookSequence></BookSequence>
<BookPublisherInfo>
<PublisherId>PBBK12345</PublisherId>
<PublisherName>Mc.GrawHill</PublisherName>
<PublisherIndex></PublisherIndex>
<PublisherCategoryQuota></PublisherCategoryQuota>
</BookPublisherInfo>
<BookPurchaselist>
<Customer>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
<MiddleName></MiddleName>
<NickName></NickName>
</Customer>
<Customer>
<FirstName>Winston</FirstName>
<LastName>Churchill</LastName>
<MiddleName></MiddleName>
<NickName></NickName>
</Customer>
</BookPurchaselist>
</Book>
XML
I tried the code below, but it does not work properly.
cust = doc.at_xpath("//Customer")
cust.each do |cust_obj|
  if cust_obj.has_text? == false
    cust_obj.delete
  end
end
It gives the output below:
<Book>
<BookId>BK45647</BookId>
<BookName>The Client by John Grisham</BookName>
<BookAuthenticationCode></BookAuthenticationCode>
<BookCategory>Suspense</BookCategory>
<BookSequence></BookSequence>
<BookPublisherInfo>
<PublisherId>PBBK12345</PublisherId>
<PublisherName>Mc.GrawHill</PublisherName>
<PublisherIndex></PublisherIndex>
<PublisherCategoryQuota></PublisherCategoryQuota>
</BookPublisherInfo>
<BookPurchaselist>
<Customer>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
<MiddleName></MiddleName>
</Customer>
<Customer>
<FirstName>Winston</FirstName>
<LastName>Churchill</LastName>
<NickName></NickName>
</Customer>
</BookPurchaselist>
</Book>
Some of the elements with empty text are removed and some remain. How can I delete all empty elements under a specific XPath and rewrite the XML?
I'm stuck here and would appreciate suggestions.
doc.xpath('//Customer/child::*[not(text())]').each do |node|
  node.remove
end
The predicate not(text()) matches elements that have no text child, including elements that contain only other elements. Use not(node()) instead if you want to remove only elements that have no child nodes at all.
EDIT: Full working example (using the same code as above)
require 'nokogiri'
xml = <<-XML
<Book>
<BookId>BK45647</BookId>
<BookName>The Client by John Grisham</BookName>
<BookAuthenticationCode></BookAuthenticationCode>
<BookCategory>Suspense</BookCategory>
<BookSequence></BookSequence>
<BookPublisherInfo>
<PublisherId>PBBK12345</PublisherId>
<PublisherName>Mc.GrawHill</PublisherName>
<PublisherIndex></PublisherIndex>
<PublisherCategoryQuota></PublisherCategoryQuota>
</BookPublisherInfo>
<BookPurchaselist>
<Customer>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
<MiddleName></MiddleName>
</Customer>
<Customer>
<FirstName>Winston</FirstName>
<LastName>Churchill</LastName>
<NickName></NickName>
</Customer>
</BookPurchaselist>
</Book>
XML
doc = Nokogiri.parse(xml)
doc.xpath('//Customer/child::*[not(text())]').each do |node|
  node.remove
end
puts doc.to_s
The output of this program is:
<?xml version="1.0"?>
<Book>
<BookId>BK45647</BookId>
<BookName>The Client by John Grisham</BookName>
<BookAuthenticationCode/>
<BookCategory>Suspense</BookCategory>
<BookSequence/>
<BookPublisherInfo>
<PublisherId>PBBK12345</PublisherId>
<PublisherName>Mc.GrawHill</PublisherName>
<PublisherIndex/>
<PublisherCategoryQuota/>
</BookPublisherInfo>
<BookPurchaselist>
<Customer>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
</Customer>
<Customer>
<FirstName>Winston</FirstName>
<LastName>Churchill</LastName>
</Customer>
</BookPurchaselist>
</Book>
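The pattern that matters here is to gather the matching nodes first and only then remove them, so that nothing is skipped while the tree changes. Below is a sketch of the same approach using REXML from the standard distribution instead of Nokogiri; the shortened document and the emptiness test (no text and no child elements) are my own assumptions about what "empty" should mean.

```ruby
require "rexml/document"

# Shortened version of the Book document from the question.
xml = <<~XML
  <Book>
    <BookSequence></BookSequence>
    <BookPurchaselist>
      <Customer>
        <FirstName>John</FirstName>
        <LastName>Smith</LastName>
        <MiddleName></MiddleName>
        <NickName></NickName>
      </Customer>
      <Customer>
        <FirstName>Winston</FirstName>
        <LastName>Churchill</LastName>
        <MiddleName></MiddleName>
        <NickName></NickName>
      </Customer>
    </BookPurchaselist>
  </Book>
XML

doc = REXML::Document.new(xml)

# Step 1: collect every Customer child that has neither text nor
# child elements. Nothing is removed during this pass.
empty_children = REXML::XPath.match(doc, "//Customer/*").select do |node|
  node.text.to_s.strip.empty? && !node.has_elements?
end

# Step 2: remove the collected nodes from their parents.
empty_children.each(&:remove)

left = REXML::XPath.match(doc, "//Customer/*").map(&:name)
puts left.inspect
```

Elements outside Customer, such as BookSequence, are left alone because the path only looks at Customer children.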
I have an XML document like the one below. I am trying to select a Title node with a particular value in it, say "![CDATA[ 1234 ]]". That Title node may be in any Type node. I was using this XPath query:
/Results/ResultSet/Type[Title="![CDATA[ 1234 ]]"]
but nothing was selected. Can someone please help?
<Results>
<Info>...</Info>
<ResultSet num="4">
<Type type="A">
<Title>
<![CDATA[ 1234 ]]>
</Title>
<Description>
<![CDATA[ 1234 ]]>
</Description>
<Domain>
<![CDATA[1234 ]]>
</Domain>
<Target>
<![CDATA[]]>
</Target>
</Type>
<Type type="A">
<Title>
<![CDATA[ abcdef ]]>
</Title>
<Description>
<![CDATA[abcdef]]>
</Description>
<Domain>
<![CDATA[abcdef]]>
</Domain>
<Target>
<![CDATA[abcdef]]>
</Target>
</Type>
EDIT: included the Ruby code that I am using
require 'nokogiri'
# xml holds the document shown above.
# Parse as XML (not HTML) so element names and CDATA sections are preserved.
doc = Nokogiri::XML(xml)
titles = doc.xpath('/Results/ResultSet/Type/Title[text()=" 1234 "]')
if titles.empty?
  puts "not there"
else
  titles.each do |node|
    puts "Found Title: #{node.text}"
  end
end
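The XPath is wrong: the CDATA markers are markup, not part of the node's value, so the comparison must use only the text inside them.
Use this:
/Results/ResultSet/Type/Title[text()=" 1234 "]
Based on the link the OP posted for the XML, here is the working XPath: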
/QuigoResults/ResultSet/Listing/Title[text()=" location in DYNAMICREGION "]
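An exact comparison like this depends on the spaces inside the CDATA section. When the whitespace is not predictable, one option is to trim the value before comparing (in XPath, normalize-space() serves that purpose). The sketch below does the trimming in Ruby with REXML; the type="B" attribute is a hypothetical change to tell the two entries apart.

```ruby
require "rexml/document"

# Reduced sample; type="B" is hypothetical, to distinguish the entries.
xml = <<~XML
  <Results>
    <ResultSet num="2">
      <Type type="A">
        <Title>
          <![CDATA[ 1234 ]]>
        </Title>
      </Type>
      <Type type="B">
        <Title>
          <![CDATA[ abcdef ]]>
        </Title>
      </Type>
    </ResultSet>
  </Results>
XML

doc = REXML::Document.new(xml)
wanted = "1234"

# Join all text and CDATA children of Title, trim the result, and
# compare it with the wanted value.
hits = REXML::XPath.match(doc, "/Results/ResultSet/Type").select do |type|
  title = type.elements["Title"]
  title && title.texts.map(&:value).join.strip == wanted
end

types = hits.map { |type| type.attributes["type"] }
puts types.inspect
```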