I have a simple XML file, items.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<items>
<item>
<name>mouse</name>
<manufacturer>Logicteh</manufacturer>
</item>
<item>
<name>keyboard</name>
<manufacturer>Logitech - Inc.</manufacturer>
</item>
<item>
<name>webcam</name>
<manufacturer>Logistech</manufacturer>
</item>
</items>
I am trying to insert a new node with the following code:
require 'rubygems'
require 'nokogiri'
f = File.open('items.xml')
@items = Nokogiri::XML(f)
f.close
price = Nokogiri::XML::Node.new "price", @items
price.content = "10"
@items.xpath('//items/item/manufacturer').each do |node|
  node.add_next_sibling(price)
end
file = File.open("items_fixed.xml",'w')
file.puts @items.to_xml
file.close
However, this code adds a new node only after the last <manufacturer> node. items_fixed.xml:
<?xml version="1.0" encoding="UTF-8"?>
<items>
<item>
<name>mouse</name>
<manufacturer>Logitech</manufacturer>
</item>
<item>
<name>keyboard</name>
<manufacturer>Logitech</manufacturer>
</item>
<item>
<name>webcam</name>
<manufacturer>Logitech</manufacturer><price>10</price>
</item>
</items>
Why?
It would be helpful to distinguish between a Node (a particular piece of structured XML data at a particular place in a tree), and a "node template" which is the structure of the data.
Nokogiri (and most other XML libraries) only allow you to specify Nodes, not node templates. So when you created price = Nokogiri::XML::Node.new "price", @items, you had a particular piece of data that belongs in a particular place, but hadn't defined the place yet.
When you added it to the first <item>, you defined its place. When you added it to the second <item>, you uprooted it from its place and put it in a new place. At that point this Node appeared only in the second <item>. This continues when you add the same Node to each item, until you reach the last <item>, which is where the node stays.
Nokogiri doesn't have any way to specify a node template. What you need to do is:
@items.xpath('//items/item/manufacturer').each do |node|
  price = Nokogiri::XML::Node.new "price", @items
  price.content = "10"
  node.add_next_sibling(price)
end
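The single-parent rule described above can be observed without installing anything. The sketch below is only an illustration: it swaps in REXML (bundled with standard Ruby installs) rather than Nokogiri, on the assumption that a dependency-free demo is useful here. REXML elements also have exactly one parent, so re-adding the same object moves it. The tiny document and the variable names are made up for the demonstration.

```ruby
require 'rexml/document'

# Illustration with REXML (not Nokogiri): an element object has exactly one
# parent, so adding the same object to a second parent moves it there.
doc = REXML::Document.new('<items><item id="a"/><item id="b"/></items>')
first, second = doc.root.elements.to_a('item')

shared = REXML::Element.new('price')
shared.text = '10'
first.add_element(shared)
second.add_element(shared) # detaches the element from the first item

moved_from_first  = first.elements['price'].nil?
present_in_second = !second.elements['price'].nil?

# The fix mirrors the answer above: build a fresh element for each parent.
fresh_doc = REXML::Document.new('<items><item id="a"/><item id="b"/></items>')
fresh_doc.root.elements.each('item') do |item|
  price = REXML::Element.new('price')
  price.text = '10'
  item.add_element(price)
end
price_count = REXML::XPath.match(fresh_doc, '//price').size
```

In the first half only the second item ends up with a price; in the second half every item gets its own.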
I'd start with this:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<items>
<item>
<name>mouse</name>
<manufacturer>Logitech</manufacturer>
</item>
<item>
<name>keyboard</name>
<manufacturer>Logitech - Inc.</manufacturer>
</item>
</items>
EOT
doc.search('manufacturer').each { |n| n.after('<price>10</price>') }
Which results in:
puts doc.to_xml
# >> <?xml version="1.0" encoding="UTF-8"?>
# >> <items>
# >> <item>
# >> <name>mouse</name>
# >> <manufacturer>Logitech</manufacturer><price>10</price>
# >> </item>
# >> <item>
# >> <name>keyboard</name>
# >> <manufacturer>Logitech - Inc.</manufacturer><price>10</price>
# >> </item>
# >> </items>
It's easy to build upon this to insert different values for the price.
Related
This is what I'm trying to do:
xml = Nokogiri::XML::Builder.new do |x|
x.root do
x.book do
x.attribute('isbn', 12345) # Doesn't work!
x.text("Don Quixot")
end
end
end.doc
I know that I can do x.book(isbn: 12345), but this is not what I want. I want to add an attribute within the do/end block. Is it at all possible?
The XML expected:
<root>
<book isbn="12345">Don Quixot</book>
</root>
Add the attributes to the node like this:
xml = Nokogiri::XML::Builder.new do |x|
  x.root do
    x.book(isbn: 12345) do
      x.text('Don Quixot')
    end
  end
end.doc
Or, after re-reading your question, perhaps you wanted to add it to the parent from inside the do block. In that case, this works:
xml = Nokogiri::XML::Builder.new do |x|
x.root do
x.book do
x.parent.set_attribute('isbn', 12345)
x.text('Don Quixot')
end
end
end.doc
Generates:
<?xml version="1.0"?>
<root>
<book isbn="12345">Don Quixot</book>
</root>
I have an xml document full of nested item nodes. In most cases, each item has a name element. I want to check if an item has a name element, and return a default name if one doesn't exist.
<item>
<name>Item 1</name>
</item>
<item>
<items>
<item>
<name>Child Item 1</name>
</item>
<item>
<name>Child Item 2</name>
</item>
</items>
</item>
When I ask node.at('name') for the node with no name element, it picks the next one from the children further down the tree. In the case above, if I ask at('name') on the second item, I get "Child Item 1".
The problem is you're using at(), which can accept either a CSS selector or an XPath expression, and tries to guess which you gave it. In this case it thinks that name is a CSS selector, which is a descendant selector, selecting name elements anywhere below the current node.
Instead, you want to use an XPath expression to find only child <name> elements. You can do this either by making it clearly an XPath expression:
node.at('./name')
or you can do it by using the at_xpath method to be clear:
node.at_xpath('name')
Here's a simple working example:
require 'nokogiri'
doc = Nokogiri.XML '<r>
<item id="a">
<name>Item 1</name>
</item>
<item id="b">
<items>
<item id="c">
<name>Child Item 1</name>
</item>
<item id="d">
<name>Child Item 2</name>
</item>
</items>
</item>
</r>'
doc.css('item').each do |item|
name = item.at_xpath('name')
name = name ? name.text : "DEFAULT"
puts "#{item['id']} -- #{name}"
end
#=> a -- Item 1
#=> b -- DEFAULT
#=> c -- Child Item 1
#=> d -- Child Item 2
My code is supposed to "guess" the path(s) that lies before the relevant text nodes in my XML file. Relevant in this case means: text nodes nested within the recurring product/person/something tag, but not text nodes that are used outside of it.
This code:
@doc, items = Nokogiri.XML(@file), []
path = []
@doc.traverse do |node|
  if node.class.to_s == "Nokogiri::XML::Element"
    is_path_element = false
    node.children.each do |child|
      is_path_element = true if child.class.to_s == "Nokogiri::XML::Element"
    end
    path.push(node.name) if is_path_element == true && !path.include?(node.name)
  end
end
final_path = "/"+path.reverse.join("/")
works for simple XML files, for example:
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Some XML file title</title>
<description>Some XML file description</description>
<item>
<title>Some product title</title>
<brand>Some product brand</brand>
</item>
<item>
<title>Some product title</title>
<brand>Some product brand</brand>
</item>
</channel>
</rss>
puts final_path # => "/rss/channel/item"
But when it gets more complicated, how should I then approach the challenge? For example with this one:
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Some XML file title</title>
<description>Some XML file description</description>
<item>
<titles>
<title>Some product title</title>
</titles>
<brands>
<brand>Some product brand</brand>
</brands>
</item>
<item>
<titles>
<title>Some product title</title>
</titles>
<brands>
<brand>Some product brand</brand>
</brands>
</item>
</channel>
</rss>
If you are looking for a list of deepest "parent" paths in the XML, there is more than one way to view that.
Although I think your own code could be adjusted to achieve the same output, I was convinced the same thing could be achieved by using xpath. And my motivation is to get my XML skills unrusty (not used Nokogiri yet, but I will need to do so professionally soon). So here is how to get all parent paths that have just one child level beneath them, using xpath:
xml.xpath('//*[child::* and not(child::*/*)]').each { |node| puts node.path }
The output of this for your second example file is:
/rss/channel/item[1]/titles
/rss/channel/item[1]/brands
/rss/channel/item[2]/titles
/rss/channel/item[2]/brands
. . . if you took this list and gsub out the indexes, then make the array unique, then this looks a lot like the output of your loop . . .
paths = xml.xpath('//*[child::* and not(child::*/*)]').map { |node| node.path }
paths.map! { |path| path.gsub(/\[[0-9]+\]/,'') }.uniq!
=> ["/rss/channel/item/titles", "/rss/channel/item/brands"]
Or in one line:
paths = xml.xpath('//*[* and not(*/*)]').map { |node| node.path.gsub(/\[[0-9]+\]/,'') }.uniq
=> ["/rss/channel/item/titles", "/rss/channel/item/brands"]
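The index-stripping step is plain string processing, so it can be tried without parsing any XML. A minimal sketch, assuming the four paths printed above are already in an array; the `strip_indexes` helper name is made up for illustration:

```ruby
# Paths as reported by node.path for the second example file.
raw_paths = [
  '/rss/channel/item[1]/titles',
  '/rss/channel/item[1]/brands',
  '/rss/channel/item[2]/titles',
  '/rss/channel/item[2]/brands'
]

# Hypothetical helper: drop positional predicates such as [1] or [2].
def strip_indexes(path)
  path.gsub(/\[\d+\]/, '')
end

# uniq (non-bang) always returns an array; uniq! returns nil when nothing
# was removed, so the non-bang form is safer at the end of a chain.
unique_paths = raw_paths.map { |path| strip_indexes(path) }.uniq
p unique_paths
```

This is why the one-line version ends in `uniq` rather than `uniq!`: the result of the chain is what gets assigned.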
I've created a library to build XPath expressions.
xpath = Jini.new
.add_path('parent')
.add_path('child')
.add_all('toys')
.add_attr('name', 'plane')
.to_s
puts xpath # -> /parent/child//toys[@name="plane"]
I am using XmlSimple. The problem I am having is in parsing a list of entries and determining the number of entries that share the same XML tag.
<ItemList>
<Item>
<ItemId>123</ItemId>
<ItemName>abc</ItemName>
<ItemType>xyz</ItemType>
<Status>ok</Status>
</Item>
</ItemList>
Above gets parsed as this -
"ItemList"=> {
"Item"=>{ "ItemId"=>"123",
"ItemName"=>"abc",
"ItemType"=>"xyz",
"Status"=>"ok"
}
},
And I access it as ['ItemList']['Item']['ItemId'], without an index number anywhere.
But if ItemList has more than one entry, it messes up my application.
<ItemList>
<Item>
<ItemId>123</ItemId>
<ItemName>abc</ItemName>
<ItemType>xyz</ItemType>
<Status>bad</Status>
</Item>
<Item>
<ItemId>456</ItemId>
<ItemName>fgh</ItemName>
<ItemType>nbv</ItemType>
<Status>bad</Status>
</Item>
</ItemList>
Above gets parsed as this -
"ItemList"=> {
"Item"=>[ { "ItemId"=>"123",
"ItemName"=>"abc",
"ItemType"=>"xyz",
"Status"=>"bad"
},
{ "ItemId"=>"456",
"ItemName"=>"fgh",
"ItemType"=>"nbv",
"Status"=>"bad"
}
]
},
I can access it as ['ItemList']['Item'][0]['ItemId'] and ['ItemList']['Item'][1]['ItemId'], providing an index number manually.
But since I don't know how many items are in the list, I cannot hard-code an index number in the actual app. The XML might have no entries or it might have hundreds of them.
Thought of using Nokogiri, but it has the same parsing behavior.
How do I handle this?
Sample processing of your data using the xml-simple gem:
1.9.2p290 :013 > items = "<ItemList> <Item> <ItemId>123</ItemId> <ItemName>abc</ItemName> <ItemType>xyz</ItemType> <Status>bad</Status> </Item> <Item> <ItemId>456</ItemId> <ItemName>fgh</ItemName> <ItemType>nbv</ItemType> <Status>bad</Status> </Item> </ItemList>"
=> "<ItemList> <Item> <ItemId>123</ItemId> <ItemName>abc</ItemName> <ItemType>xyz</ItemType> <Status>bad</Status> </Item> <Item> <ItemId>456</ItemId> <ItemName>fgh</ItemName> <ItemType>nbv</ItemType> <Status>bad</Status> </Item> </ItemList>"
1.9.2p290 :014 > parsed_items = XmlSimple.xml_in(items, { 'KeyAttr' => 'name' })
=> {"Item"=>[{"ItemId"=>["123"], "ItemName"=>["abc"], "ItemType"=>["xyz"], "Status"=>["bad"]}, {"ItemId"=>["456"], "ItemName"=>["fgh"], "ItemType"=>["nbv"], "Status"=>["bad"]}]}
1.9.2p290 :015 > parsed_items.class
=> Hash
1.9.2p290 :016 > parsed_items["Item"].class
=> Array
1.9.2p290 :017 > parsed_items["Item"].length
=> 2
So your Item will be an array, and you can call the length method on it. With my example above you can always do parsed_items["Item"].length
If you are using Ruby 1.8 or later, REXML makes this easy. See the Accessing Elements section: http://www.germane-software.com/software/rexml/docs/tutorial.html
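As a rough sketch of the REXML route (assuming a trimmed version of the two-item document from the question; the variable names are illustrative), `REXML::XPath.match` returns an array whether there are zero, one, or many `Item` elements, so no special-casing is needed:

```ruby
require 'rexml/document'

xml = <<~XML
  <ItemList>
    <Item><ItemId>123</ItemId><Status>bad</Status></Item>
    <Item><ItemId>456</ItemId><Status>bad</Status></Item>
  </ItemList>
XML

doc = REXML::Document.new(xml)

# Always an Array, even when it is empty or holds a single element.
items = REXML::XPath.match(doc, '/ItemList/Item')
item_count = items.length

# Read a child element's text from each Item.
item_ids = items.map { |item| item.elements['ItemId'].text }

puts item_count
p item_ids
```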
If 'result' is what you get from parsing your XML doc, then you could test
result['ItemList']['Item']
to check whether it is an array (or enumerable). If it is, then there's more than 1 item, and you'll have to enumerate over the items.
Alternatively, you could do this (assuming ruby 1.9):
[*result['ItemList']['Item']].each do |item|
...
end
The splat operator is cool and when used like this lets you transparently handle a value that could be nil, a scalar, or a collection.
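One caveat worth checking before relying on the splat: it calls `to_a`, so a lone Hash (which is what a single `Item` becomes when arrays are not forced) is expanded into key/value pairs rather than wrapped. A minimal sketch of a safer normalization, where the `as_list` helper name is hypothetical:

```ruby
# Hypothetical helper: return [] for nil, an Array unchanged, and anything
# else (including a single Hash) wrapped in a one-element Array.
def as_list(value)
  case value
  when nil   then []
  when Array then value
  else            [value]
  end
end

single = { 'ItemId' => '123' }
many   = [{ 'ItemId' => '123' }, { 'ItemId' => '456' }]

splatted   = [*single] # the Hash is converted to [key, value] pairs
ids_single = as_list(single).map { |item| item['ItemId'] }
ids_many   = as_list(many).map { |item| item['ItemId'] }
ids_none   = as_list(nil).map { |item| item['ItemId'] }

p splatted
p ids_single
p ids_many
p ids_none
```

If you stay with XmlSimple, its documented `ForceArray` option (for example, `'ForceArray' => ['Item']`) is another way to get an array back regardless of how many items there are; that form is mentioned here from the gem's documentation and has not been run against your data.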
Hpricot + Ruby XML parsing and logical selection.
Objective: Find all titles written by author Bob.
My XML file:
<rss>
<channel>
<item>
<title>Book1</title>
<pubDate>march 1 2010</pubDate>
<author>Bob</author>
</item>
<item>
<title>book2</title>
<pubDate>october 4 2009</pubDate>
<author>Bill</author>
</item>
<item>
<title>book3</title>
<pubDate>June 5 2010</pubDate>
<author>Steve</author>
</item>
</channel>
</rss>
#my Hpricot, running this code returns no output, however the search pattern works on its own.
(doc % :rss % :channel / :item).each do |item|
a=item.search("author[text()*='Bob']")
#puts "FOUND" if a.include?"Bob"
puts item.at("title") if a.include?"Bob"
end
If you're not set on Hpricot, here's one way to do this with XPath in Nokogiri:
require 'nokogiri'
doc = Nokogiri::XML( my_rss_string )
bobs_titles = doc.xpath("//title[parent::item/author[text()='Bob']]")
p bobs_titles.map{ |node| node.text }
#=> ["Book1"]
Edit: @theTinMan's XPath also works well, is more readable, and may very well be faster:
bobs_titles = doc.xpath("//author[text()='Bob']/../title")
One of the ideas behind XPath is it allows us to navigate a DOM similarly to a disk directory:
require 'hpricot'
xml = <<EOT
<rss>
<channel>
<item>
<title>Book1</title>
<pubDate>march 1 2010</pubDate>
<author>Bob</author>
</item>
<item>
<title>book2</title>
<pubDate>october 4 2009</pubDate>
<author>Bill</author>
</item>
<item>
<title>book3</title>
<pubDate>June 5 2010</pubDate>
<author>Steve</author>
</item>
<item>
<title>Book4</title>
<pubDate>march 1 2010</pubDate>
<author>Bob</author>
</item>
</channel>
</rss>
EOT
doc = Hpricot(xml)
titles = (doc / '//author[text()="Bob"]/../title' )
titles # => #<Hpricot::Elements[{elem <title> "Book1" </title>}, {elem <title> "Book4" </title>}]>
That means: "find all the books by Bob, then look up one level and find the title tag".
I added an extra book by "Bob" to test getting all occurrences.
To get the item containing a book by Bob, just move back up a level:
items = (doc / '//author[text()="Bob"]/..' )
puts items # => nil
# >> <item>
# >> <title>Book1</title>
# >> <pubdate>march 1 2010</pubdate>
# >> <author>Bob</author>
# >> </item>
# >> <item>
# >> <title>Book4</title>
# >> <pubdate>march 1 2010</pubdate>
# >> <author>Bob</author>
# >> </item>
I also figured out what (doc % :rss % :channel / :item) is doing. It's equivalent to nesting the searches, minus the wrapping parentheses, and these should all be the same in Hpricot-ese:
(doc % :rss % :channel / :item).size # => 4
(((doc % :rss) % :channel) / :item).size # => 4
(doc / '//rss/channel/item').size # => 4
(doc / 'rss channel item').size # => 4
Because '//rss/channel/item' is how you'd normally see an XPath accessor, and 'rss channel item' is a CSS accessor, I'd recommend using those formats for maintenance and clarity.