I want to reprint the modified xml after deleting entire child node - ruby

<product>
<book>
<id>111</id>
<name>xxx</name>
</book>
<pen>
<id>222</id>
<name>yyy</name>
</pen>
<pencil>
<id>333</id>
<name>zzz</name>
</pencil>
I want to remove the "pencil" node and print the remaining XML using REXML (Ruby). Can anybody tell me how to do that?

By using one of the delete methods http://rubydoc.info/stdlib/rexml/
require "rexml/document"
string = <<EOF
<product>
<book>
<id>111</id>
<name>xxx</name>
</book>
<pen>
<id>222</id>
<name>yyy</name>
</pen>
<pencil>
<id>333</id>
<name>zzz</name>
</pencil>
</product>
EOF
doc = REXML::Document.new(string)
doc.delete_element('//pencil')
puts doc
There is also a nice tutorial to get you started: http://www.germane-software.com/software/rexml/docs/tutorial.html
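One caveat worth sketching: delete_element with an XPath string removes only the first match. Assuming a document with more than one pencil node (the second one below is made up for illustration), a minimal way to drop them all and pretty-print what is left might look like this:

```ruby
require "rexml/document"
require "rexml/formatters/pretty"

# Sample data: the second <pencil> is an assumption added for illustration.
xml = <<~XML
  <product>
    <book><id>111</id><name>xxx</name></book>
    <pencil><id>333</id><name>zzz</name></pencil>
    <pencil><id>444</id><name>www</name></pencil>
  </product>
XML

doc = REXML::Document.new(xml)

# Collect every match first, then detach each node from its parent.
REXML::XPath.match(doc, "//pencil").each(&:remove)

# Pretty-print the remaining tree into a string (2-space indent).
formatter = REXML::Formatters::Pretty.new(2)
formatter.compact = true # keep short text-only elements on one line
output = String.new
formatter.write(doc, output)
puts output
```

If there is only ever one pencil node, the plain doc.delete_element('//pencil') from the answer is enough.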

Related

I am using Pig version 0.8. How can I extract specific elements of an XML document using XPath()? I tried several approaches but couldn't get it to work. Please suggest.

<CATALOG>
<BOOK>
<TITLE>Hadoop Defnitive Guide</TITLE>
<AUTHOR>Tom White</AUTHOR>
<COUNTRY>US</COUNTRY>
<COMPANY>CLOUDERA</COMPANY>
<PRICE>24.90</PRICE>
<YEAR>2012</YEAR>
</BOOK>
</CATALOG>
This is the XML I am using.
I want to extract only the TITLE and COMPANY elements. Is there any way to extract them using a regex or XPath()?
First thing you need to do is format your XML like so:
<CATALOG>
<BOOK>
<TITLE>Hadoop Defnitive Guide</TITLE>
<AUTHOR>Tom White</AUTHOR>
<COUNTRY>US</COUNTRY>
<COMPANY>CLOUDERA</COMPANY>
<PRICE>24.90</PRICE>
<YEAR>2012</YEAR>
</BOOK>
</CATALOG>
Then you can extract those elements like so:
/CATALOG/BOOK/*[self::TITLE or self::COMPANY]
More about axes you can find here: http://www.w3schools.com/xsl/xpath_axes.asp
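XPath name tests are case-sensitive, so the element names have to be written exactly as they appear in the document (TITLE and COMPANY, not title and company). This is not Pig code; it is only a small Ruby/REXML sketch, on a trimmed copy of the catalog, that shows the effect by picking the two wanted children by name and then trying a lowercase path:

```ruby
require "rexml/document"

# Trimmed copy of the catalog from the question.
xml = <<~XML
  <CATALOG>
    <BOOK>
      <TITLE>Hadoop Defnitive Guide</TITLE>
      <AUTHOR>Tom White</AUTHOR>
      <COMPANY>CLOUDERA</COMPANY>
    </BOOK>
  </CATALOG>
XML

doc = REXML::Document.new(xml)
wanted = %w[TITLE COMPANY]

# Take every child element of BOOK and keep only the wanted names.
children = REXML::XPath.match(doc, "/CATALOG/BOOK/*")
fields = children.select { |el| wanted.include?(el.name) }
                 .map { |el| [el.name, el.text] }
                 .to_h

# A lowercase name test matches nothing in this document.
lowercase_hits = REXML::XPath.match(doc, "/CATALOG/BOOK/title")

p fields
p lowercase_hits.size
```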

Xpath - multiple tags with same name starting with a substring

I have this XML file and I need to get the name and the author(s) of each book where the name of at least one author starts with "E".
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library SYSTEM "knihy.dtd">
<library>
<book isbn="0470114878">
<author hlavni="hlavni">David Hunter</author>
<author>Jeff Rafter</author>
<author>Joe Fawcett</author>
<author>Eric van der Vlist</author>
<author>Danny Ayers</author>
<name>Beginning XML</name>
<publisher>Wrox</publisher>
</book>
<book isbn="0596004206">
<author>Erik Ray</author>
<name>Learning XML</name>
<publisher>O'Reilly</publisher>
</book>
<book isbn="0764547593">
<author>Kay Ethier</author>
<author>Alan Houser</author>
<name>XML Weekend Crash Course</name>
<year>2001</year>
</book>
<book isbn="1590596765">
<author>Sas Jacobs</author>
<name>Beginning XML with DOM and Ajax</name>
<publisher>Apress</publisher>
<year>2006</year>
</book>
</library>
I tried this approach
for $book in /library/book[starts-with(author, "E")]
return $book
but it returns XPathException in invokeTransform: A sequence of more than one item is not allowed as the first argument of starts-with() ("David Hunter", "Jeff Rafter", ...). So how can I check this sequence?
As the error message suggests, apply starts-with() in a predicate on the individual author elements instead of passing all author child elements to starts-with() at once:
for $book in /library/book[author[starts-with(., "E")]]
return $book
The above returns all books where the name of at least one author starts with "E".
Output:
<book isbn="0470114878">
<author hlavni="hlavni">David Hunter</author>
<author>Jeff Rafter</author>
<author>Joe Fawcett</author>
<author>Eric van der Vlist</author>
<author>Danny Ayers</author>
<name>Beginning XML</name>
<publisher>Wrox</publisher>
</book>
<book isbn="0596004206">
<author>Erik Ray</author>
<name>Learning XML</name>
<publisher>O'Reilly</publisher>
</book>
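The nested predicate is an "at least one author" test: each author is checked on its own, and the book is kept if any check succeeds. As a rough illustration of that logic outside XQuery, here is the same filter written in plain Ruby over REXML elements, on a shortened copy of the library (this swaps the XPath predicate for a Ruby any? call, so it is a sketch of the idea rather than the query itself):

```ruby
require "rexml/document"

# Shortened copy of the library from the question.
xml = <<~XML
  <library>
    <book isbn="0470114878">
      <author>David Hunter</author>
      <author>Eric van der Vlist</author>
      <name>Beginning XML</name>
    </book>
    <book isbn="0596004206">
      <author>Erik Ray</author>
      <name>Learning XML</name>
    </book>
    <book isbn="0764547593">
      <author>Kay Ethier</author>
      <name>XML Weekend Crash Course</name>
    </book>
  </library>
XML

doc = REXML::Document.new(xml)

# Keep a book if ANY of its authors starts with "E"; each author is
# tested individually instead of passing the whole list to one check.
matching = doc.get_elements("/library/book").select do |book|
  book.get_elements("author").any? { |author| author.text.to_s.start_with?("E") }
end

matching.each do |book|
  authors = book.get_elements("author").map(&:text).join(", ")
  puts "#{book.elements['name'].text}: #{authors}"
end
```

Note that "Kay Ethier" does not qualify: the test is on the start of the whole author string, not on the surname.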

How to Use multiple conditions in Xpath?

I am new to XPath and am trying to use the XML Task in SSIS to load some values, using Microsoft's sample XML inventory shown below.
How can I load the first-name value in bookstore/book where style is novel and award = 'Pulitzer'?
//book[@style='novel' and ./author/award/text()='Pulitzer'] is what I am trying. It returns the whole book element. What should I change to get just the first-name value?
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="myfile.xsl" ?>
<bookstore specialty="novel">
<book style="autobiography">
<author>
<first-name>Joe</first-name>
<last-name>Bob</last-name>
<award>Trenton Literary Review Honorable Mention</award>
</author>
<price>12</price>
</book>
<book style="textbook">
<author>
<first-name>Mary</first-name>
<last-name>Bob</last-name>
<publication>Selected Short Stories of
<first-name>Mary</first-name>
<last-name>Bob</last-name>
</publication>
</author>
<editor>
<first-name>Britney</first-name>
<last-name>Bob</last-name>
</editor>
<price>55</price>
</book>
<magazine style="glossy" frequency="monthly">
<price>2.50</price>
<subscription price="24" per="year"/>
</magazine>
<book style="novel" id="myfave">
<author>
<first-name>Toni</first-name>
<last-name>Bob</last-name>
<degree from="Trenton U">B.A.</degree>
<degree from="Harvard">Ph.D.</degree>
<award>P</award>
<publication>Still in Trenton</publication>
<publication>Trenton Forever</publication>
</author>
<price intl="Canada" exchange="0.7">6.50</price>
<excerpt>
<p>It was a dark and stormy night.</p>
<p>But then all nights in Trenton seem dark and
stormy to someone who has gone through what
<emph>I</emph> have.</p>
<definition-list>
<term>Trenton</term>
<definition>misery</definition>
</definition-list>
</excerpt>
</book>
<my:book xmlns:my="uri:mynamespace" style="leather" price="29.50">
<my:title>Who's Who in Trenton</my:title>
<my:author>Robert Bob</my:author>
</my:book>
</bookstore>
I got an answer.
//book[@style='novel' and ./author/award/text()='Pulitzer']//first-name
Use:
/*/book[@style='novel']/author[award = 'Pulitzer']/first-name
This selects any first-name element whose author parent has an award child with the string value 'Pulitzer', where that author's parent is a book whose style attribute has the value "novel" and whose parent is the top element of the XML document.
A similar question in the same context: how can I do the reverse? Suppose I want to find the id of all those books whose price is greater than 20. I know I am being a nudge, but I really want to clear up my understanding.
Here is the needed XPath:
//book/price[text() > 20]/..
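To try both kinds of condition outside SSIS, here is a small Ruby/REXML sketch on a trimmed bookstore. The id attributes and the 'Pulitzer' award value in this sample are assumptions added for illustration, and the price filter is written as a predicate on book itself, which selects the same books as stepping to price and back up with /.. for this sample:

```ruby
require "rexml/document"

# Trimmed sample; the id values and the award text are illustrative.
xml = <<~XML
  <bookstore>
    <book style="autobiography" id="b1">
      <author><first-name>Joe</first-name><award>Honorable Mention</award></author>
      <price>12</price>
    </book>
    <book style="textbook" id="b2">
      <author><first-name>Mary</first-name></author>
      <price>55</price>
    </book>
    <book style="novel" id="b3">
      <author><first-name>Toni</first-name><award>Pulitzer</award></author>
      <price>6.50</price>
    </book>
  </bookstore>
XML

doc = REXML::Document.new(xml)

# Attribute condition on book, child-element condition on author,
# then step down to the element that is actually wanted.
first_names = REXML::XPath.match(
  doc, "/bookstore/book[@style='novel']/author[award='Pulitzer']/first-name"
).map(&:text)

# Numeric condition on a child element; read the id attribute in Ruby.
expensive_ids = REXML::XPath.match(doc, "/bookstore/book[price > 20]")
                            .map { |book| book.attributes["id"] }

p first_names
p expensive_ids
```

The general pattern is to put each condition in a predicate on the step it describes and to end the path at the node you want returned.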

How to recursively delete empty child elements at a specific xpath location in an XML using Nokogiri?

I have the XML below, in which a few child elements have empty text.
doc = <<'XML'
<Book>
<BookId>BK45647</BookId>
<BookName>The Client by John Grisham</BookName>
<BookAuthenticationCode></BookAuthenticationCode>
<BookCategory>Suspense</BookCategory>
<BookSequence></BookSequence>
<BookPublisherInfo>
<PublisherId>PBBK12345</PublisherId>
<PublisherName>Mc.GrawHill</PublisherName>
<PublisherIndex></PublisherIndex>
<PublisherCategoryQuota></PublisherCategoryQuota>
</BookPublisherInfo>
<BookPurchaselist>
<Customer>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
<MiddleName></MiddleName>
<NickName></NickName>
</Customer>
<Customer>
<FirstName>Winston</FirstName>
<LastName>Churchill</LastName>
<MiddleName></MiddleName>
<NickName></NickName>
</Customer>
</BookPurchaselist>
</Book>
XML
I tried the code below, but it is not working properly.
cust = doc.at_xpath("//Customer")
cust.each do |cust_obj|
if cust_obj.has_text? == false
cust_obj.delete
end
end
It gives the output below:
<Book>
<BookId>BK45647</BookId>
<BookName>The Client by John Grisham</BookName>
<BookAuthenticationCode></BookAuthenticationCode>
<BookCategory>Suspense</BookCategory>
<BookSequence></BookSequence>
<BookPublisherInfo>
<PublisherId>PBBK12345</PublisherId>
<PublisherName>Mc.GrawHill</PublisherName>
<PublisherIndex></PublisherIndex>
<PublisherCategoryQuota></PublisherCategoryQuota>
</BookPublisherInfo>
<BookPurchaselist>
<Customer>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
<MiddleName></MiddleName>
</Customer>
<Customer>
<FirstName>Winston</FirstName>
<LastName>Churchill</LastName>
<NickName></NickName>
</Customer>
</BookPurchaselist>
</Book>
Some of the elements with empty text are deleted, while others remain. How can I recursively delete the empty elements at a specific XPath location and rewrite the XML?
I am stuck here and need suggestions.
doc.xpath('//Customer/child::*[not(text())]').each do |node|
node.remove
end
Use not(node()) instead if you want to delete only elements that have no child nodes at all; not(text()) also matches elements that contain child elements but no direct text.
EDIT: Full working example (using the same code as above)
require 'nokogiri'
xml = <<-XML
<Book>
<BookId>BK45647</BookId>
<BookName>The Client by John Grisham</BookName>
<BookAuthenticationCode></BookAuthenticationCode>
<BookCategory>Suspense</BookCategory>
<BookSequence></BookSequence>
<BookPublisherInfo>
<PublisherId>PBBK12345</PublisherId>
<PublisherName>Mc.GrawHill</PublisherName>
<PublisherIndex></PublisherIndex>
<PublisherCategoryQuota></PublisherCategoryQuota>
</BookPublisherInfo>
<BookPurchaselist>
<Customer>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
<MiddleName></MiddleName>
</Customer>
<Customer>
<FirstName>Winston</FirstName>
<LastName>Churchill</LastName>
<NickName></NickName>
</Customer>
</BookPurchaselist>
</Book>
XML
doc = Nokogiri.parse(xml)
doc.xpath('//Customer/child::*[not(text())]').each do |node|
node.remove
end
puts doc.to_s
The output of this program is:
<?xml version="1.0"?>
<Book>
<BookId>BK45647</BookId>
<BookName>The Client by John Grisham</BookName>
<BookAuthenticationCode/>
<BookCategory>Suspense</BookCategory>
<BookSequence/>
<BookPublisherInfo>
<PublisherId>PBBK12345</PublisherId>
<PublisherName>Mc.GrawHill</PublisherName>
<PublisherIndex/>
<PublisherCategoryQuota/>
</BookPublisherInfo>
<BookPurchaselist>
<Customer>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
</Customer>
<Customer>
<FirstName>Winston</FirstName>
<LastName>Churchill</LastName>
</Customer>
</BookPurchaselist>
</Book>
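The XPath above handles direct children of Customer. If empty elements can also be nested deeper (an Address wrapping an empty Street, say), a depth-first pass is one way to get the "recursive" behaviour from the title. The sketch below uses REXML instead of Nokogiri so that it runs without extra gems; the helper name prune_empty_children! and the Address/Street elements are made up for illustration, and an element counts as empty here when it has no child elements and only blank text:

```ruby
require "rexml/document"

# Sample data: Address/Street are assumptions added to show nesting.
xml = <<~XML
  <Book>
    <BookSequence></BookSequence>
    <BookPurchaselist>
      <Customer>
        <FirstName>John</FirstName>
        <MiddleName></MiddleName>
        <Address><Street></Street></Address>
      </Customer>
    </BookPurchaselist>
  </Book>
XML

# Depth-first: prune each child's own children first, then drop the
# child if it is left with no element children and only blank text.
def prune_empty_children!(element)
  element.elements.to_a.each do |child|
    prune_empty_children!(child)
    blank = child.elements.empty? &&
            child.texts.all? { |text| text.value.strip.empty? }
    child.remove if blank
  end
end

doc = REXML::Document.new(xml)

# Only subtrees under Customer are pruned; BookSequence is untouched.
REXML::XPath.match(doc, "//Customer").each do |customer|
  prune_empty_children!(customer)
end

remaining = doc.get_elements("//Customer/*").map(&:name)
p remaining
puts doc
```

Iterating over elements.to_a (a copy) avoids modifying the child list while walking it, which is a common cause of "some nodes removed, some left behind" results.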

xpath selection of a title node within a resultset

I have an XML document like the one below. I am trying to select a Title node with a particular value in it, say "![CDATA[ 1234 ]]". That Title node may be in any Type node. I was using this XPath query:
/Results/ResultSet/Type[Title="![CDATA[ 1234 ]]"]
but nothing was selected. Can someone please help?
<Results>
<Info>...</Info>
<ResultSet num="4">
<Type type="A">
<Title>
<![CDATA[ 1234 ]]>
</Title>
<Description>
<![CDATA[ 1234 ]]>
</Description>
<Domain>
<![CDATA[1234 ]]>
</Domain>
<Target>
<![CDATA[]]>
</Target>
</Type>
<Type type="A">
<Title>
<![CDATA[ abcdef ]]>
</Title>
<Description>
<![CDATA[abcdef]]>
</Description>
<Domain>
<![CDATA[abcdef]]>
</Domain>
<Target>
<![CDATA[abcdef]]>
</Target>
</Type>
EDIT: included the ruby code that I am using
doc = Nokogiri::HTML(html)
Element = doc.xpath('/Results/ResultSet/Type/Title[text()=" 1234 "]')
if Element.empty?()
puts "not there "
else
Element.each do |node|
puts "Found Title: #{node.text}"
end
end
end
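The XPath is wrong: the CDATA markers are not part of the node's text value, so compare against the text content only, and apply the predicate to Title rather than Type.
Use this: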
/Results/ResultSet/Type/Title[text()=" 1234 "]
Based on the link OP posted for the XML, here is the working XPath:
/QuigoResults/ResultSet/Listing/Title[text()=" location in DYNAMICREGION "]
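An exact comparison such as text()=" 1234 " depends on the whitespace inside and around the CDATA section. If that is not reliable, one alternative is to select the Title elements and compare their trimmed content in Ruby. This is only a sketch using REXML (not the Nokogiri call from the question) on a cut-down copy of the XML, with the type="B" value changed for illustration:

```ruby
require "rexml/document"

# Cut-down copy of the XML; type="B" is an illustrative change.
xml = <<~XML
  <Results>
    <ResultSet num="2">
      <Type type="A">
        <Title>
          <![CDATA[ 1234 ]]>
        </Title>
      </Type>
      <Type type="B">
        <Title>
          <![CDATA[ abcdef ]]>
        </Title>
      </Type>
    </ResultSet>
  </Results>
XML

doc = REXML::Document.new(xml)

# Join all text and CDATA children of each Title, then trim before
# comparing, so surrounding newlines and spaces do not matter.
titles = REXML::XPath.match(doc, "/Results/ResultSet/Type/Title")
matches = titles.select do |title|
  title.texts.map(&:value).join.strip == "1234"
end

matches.each do |title|
  puts "Found Title in Type #{title.parent.attributes['type']}"
end
```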

Resources