Parse and read xml array - ruby

This is simple stuff, but it is driving me crazy. I have spent hours on something I have done many times before.
I am trying to read a parsed XmlSimple doc, but I can't access elements by index number. When I try this in the console it works, but not in the actual code. It gives me this error on the view page:
undefined method `[]' for nil:NilClass
Code:
@i = 0
list = ""
while @i <= 2
  puts xml
  a = parsed_items["Item"][@i]["ItemId"]
  list << a.to_s << ","
  @i += 1
end
puts list.to_s
If I supply an integer index by hand, it works:
a = parsed_items["Item"][0]["ItemId"] # works, with the rest of the code unchanged
If I change it to @i, it does not:
a = parsed_items["Item"][@i]["ItemId"] # fails, with the rest of the code unchanged
XML:
1.9.2p290 :013 > items = "<ItemList> <Item> <ItemId>123</ItemId> <ItemName>abc</ItemName> <ItemType>xyz</ItemType> <Status>bad</Status> </Item> <Item> <ItemId>456</ItemId> <ItemName>fgh</ItemName> <ItemType>nbv</ItemType> <Status>bad</Status> </Item> </ItemList>"
=> "<ItemList> <Item> <ItemId>123</ItemId> <ItemName>abc</ItemName> <ItemType>xyz</ItemType> <Status>bad</Status> </Item> <Item> <ItemId>456</ItemId> <ItemName>fgh</ItemName> <ItemType>nbv</ItemType> <Status>bad</Status> </Item> </ItemList>"
1.9.2p290 :014 > parsed_items = XmlSimple.xml_in(items, { 'KeyAttr' => 'name' })
=> {"Item"=>[{"ItemId"=>["123"], "ItemName"=>["abc"], "ItemType"=>["xyz"], "Status"=>["bad"]}, {"ItemId"=>["456"], "ItemName"=>["fgh"], "ItemType"=>["nbv"], "Status"=>["bad"]}]}
XML:
<ItemList>
<Item>
<ItemId>123</ItemId>
<ItemName>abc</ItemName>
<ItemType>xyz</ItemType>
<Status>bad</Status>
</Item>
<Item>
<ItemId>456</ItemId>
<ItemName>fgh</ItemName>
<ItemType>nbv</ItemType>
<Status>bad</Status>
</Item>
</ItemList>

Paraphrased, that error means "Hey, you put [] after something that was nil, but nil doesn't have that method!"
You only have 2 items in your array, so when @i gets to 2, which would be the third item in a 0-based list, parsed_items["Item"][@i] returns nil; when you then call ["ItemId"] on that nil you get the error you stated.
Simplest change to fix this:
while @i < 2 # instead of <= 2
Better change (let Ruby iterate for you):
list = ""
parsed_items["Item"].each do |item|
list << item["ItemId"].to_s << ","
end
puts list
Even better change (let Ruby do your work for you):
puts parsed_items["Item"].map{ |item| item["ItemId"] }.join(',')
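To see the failure and the fix side by side without installing XmlSimple, here is a minimal sketch. It assumes a hand-written hash with the same shape as the parsed output shown in the question; the variable names are illustrative only.

```ruby
# Hand-written stand-in for the XmlSimple result shown in the question.
parsed_items = {
  "Item" => [
    { "ItemId" => ["123"], "ItemName" => ["abc"] },
    { "ItemId" => ["456"], "ItemName" => ["fgh"] }
  ]
}

items = parsed_items["Item"]

# Index 2 is past the end of a two-element array, so Ruby returns nil.
third = items[2]

# Calling [] on that nil is what raises the NoMethodError from the question.
error = begin
  third["ItemId"]
  nil
rescue NoMethodError => e
  e
end

# Iterating avoids the index entirely; each ItemId is a one-element array.
list = items.map { |item| item["ItemId"].first }.join(",")
puts list
```

Because the loop bound comes from the data itself, the same code should keep working when the list has zero, one, or hundreds of items.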

For some reason you're defining an instance variable instead of a local one. Also, converting list into a string is completely unnecessary, since it's a string from the very beginning. Working code should look somewhat like this:
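i = 0
list = ""
# Bound the loop by the actual number of items; a fixed "<= 2" runs past
# the end of a two-item array and hits nil.
while i < parsed_items["Item"].length
  puts xml
  a = parsed_items["Item"][i]["ItemId"]
  list << a.to_s << ","
  i += 1
end
puts list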
I strongly suggest you read about Ruby's different variable types.

Related

Encode content as CDATA in generated RSS feed

I'm generating an RSS feed using Ruby's built-in RSS library, which seems to escape HTML when generating feeds. For certain elements I'd prefer that it preserve the original HTML by wrapping it in a CDATA block.
A minimal working example:
require 'rss/2.0'
feed = RSS::Rss.new("2.0")
feed.channel = RSS::Rss::Channel.new
feed.channel.title = "Title & Show"
feed.channel.link = "http://foo.net"
feed.channel.description = "<strong>Description</strong>"
item = RSS::Rss::Channel::Item.new
item.title = "Foo & Bar"
item.description = "<strong>About</strong>"
feed.channel.items << item
puts feed
...which generates the following RSS:
<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title>Title &amp; Show</title>
<link>http://foo.net</link>
<description>&lt;strong&gt;Description&lt;/strong&gt;</description>
<item>
<title>Foo &amp; Bar</title>
<description>&lt;strong&gt;About&lt;/strong&gt;</description>
</item>
</channel>
</rss>
Instead of HTML-encoding the channel and item descriptions, I'd like to keep the original HTML and wrap them in CDATA blocks, e.g.:
<description><![CDATA[<strong>Description</strong>]]></description>
Monkey-patching the element-generating method works for me:
require 'rss/2.0'
class RSS::Rss::Channel
  def description_element need_convert, indent
    markup = "#{indent}<description>"
    markup << "<![CDATA[#{@description}]]>"
    markup << "</description>"
    markup
  end
end
# ...
This bypasses the call to Utils.html_escape, which escapes a few special characters as entities.
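One caveat with wrapping raw text in CDATA: a CDATA section ends at the first "]]>", so a description that happens to contain that sequence would produce broken XML. A minimal sketch of a guard follows; the cdata helper is hypothetical and not part of Ruby's RSS library.

```ruby
# Hypothetical helper, not part of Ruby's RSS library.
def cdata(text)
  # A CDATA section ends at the first "]]>", so split that sequence
  # across two adjacent CDATA sections.
  "<![CDATA[" + text.to_s.gsub("]]>", "]]]]><![CDATA[>") + "]]>"
end

puts cdata("<strong>About</strong>")
puts cdata("a ]]> b")
```

In the patched method above, the interpolation could then go through such a helper instead of writing the CDATA delimiters by hand.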

How to override generating XML from a hash? [closed]

I have a hash like this:
{
12776=>["Item", "01:Antique", "fabric"],
12777=>["Item", "02:Priceless", "porcelain"],
12740=>["Item", "01:Antique", "silver"]
}
And I would like to generate XML like:
<items>
<item type="01:Antique" material="fabric">some other attribute</item>
<item type="02:Priceless" material="porcelain">some other attribute</item>
<item type="01:Antique" material="silver">some other attribute</item>
</items>
Please demonstrate how this is possible.
I would definitely recommend using a gem like Nokogiri to do that for you. Something like this should work:
xml = Nokogiri::XML::Builder.new
xml.items do
hash.values.each do |item_array|
xml.item(type: item_array[1], material: item_array[2]) #some_other_attribute
end
end
Which renders this XML:
1.9.3-p484 :019 > puts xml.to_xml
<?xml version="1.0"?>
<items>
<item type="01:Antique" material="fabric"/>
<item type="02:Priceless" material="porcelain"/>
<item type="01:Antique" material="silver"/>
</items>
This looks about right:
require 'nokogiri'
hash = {
12776 => ["Item", "01:Antique", "fabric"],
12777 => ["Item", "02:Priceless", "porcelain"],
12740 => ["Item", "01:Antique", "silver"]
}
xml = Nokogiri::XML::Builder.new
xml.items do
hash.each do |key, (_, _type, material)|
xml.item(type: _type, material: material) {
text "some_other_attribute"
}
end
end
puts xml.to_xml
# >> <?xml version="1.0"?>
# >> <items>
# >> <item type="01:Antique" material="fabric">some_other_attribute</item>
# >> <item type="02:Priceless" material="porcelain">some_other_attribute</item>
# >> <item type="01:Antique" material="silver">some_other_attribute</item>
# >> </items>
Hash's each sends the key/value pair into the block.
Using (_, _type, material) assigns each of the value's elements to the variables.
_ is a black-hole variable (not really, but it's sufficient to think of it that way for this use), and swallows the value passed to it; effectively, it means "ignore that".
I used _type to avoid potential confusion with type. Ruby would be happy with it, but I wouldn't be.
The rest should be pretty self-evident.
A dirty raw implementation. You may try some XML library such as Nokogiri to manipulate XML generation, if needed.
hash = {
12776=>["Item", "01:Antique", "fabric"],
12777=>["Item", "02:Priceless", "porcelain"],
12740=>["Item", "01:Antique", "silver"]
}
puts "<items>"
hash.sort_by{|k, _| k}.each do |_, array|
puts %{ <item type="#{array[1]}" material="#{array[2]}">some other attribute</item>}
# or maybe the following?
#puts %{ <item type="#{array[1]}" material="#{array[2]}">#{array[3..-1].join(" ")}</item>}
end
puts "</items>"
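The raw version above interpolates values straight into attributes, which breaks as soon as a value contains &, < or a double quote. A minimal sketch that escapes them with Ruby's built-in String#encode(xml: :attr) follows; the items_xml method name and the "silver & gold" value are assumptions added for illustration.

```ruby
# Same shape as the hash above, plus one hypothetical value that needs escaping.
hash = {
  12776 => ["Item", "01:Antique", "fabric"],
  12740 => ["Item", "01:Antique", "silver & gold"]
}

# Hypothetical helper: builds the <items> document as a string.
def items_xml(hash)
  lines = hash.sort_by { |key, _| key }.map do |_, (_, type, material)|
    # encode(xml: :attr) escapes &, <, > and " and adds the surrounding quotes.
    "  <item type=#{type.encode(xml: :attr)} material=#{material.encode(xml: :attr)}>" \
      "some other attribute</item>"
  end
  (["<items>"] + lines + ["</items>"]).join("\n")
end

puts items_xml(hash)
```

For anything beyond a quick script, a builder such as Nokogiri handles this escaping for you.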

Ruby rails - parse xml list of entries without knowing the length

I am using XmlSimple. The problem I am having is in parsing a list of entries and determining the number of entries that share the same XML tag.
<ItemList>
<Item>
<ItemId>123</ItemId>
<ItemName>abc</ItemName>
<ItemType>xyz</ItemType>
<Status>ok</Status>
</Item>
</ItemList>
Above gets parsed as this -
"ItemList"=> {
"Item"=>{ "ItemId"=>"123",
"ItemName"=>"abc",
"ItemType"=>"xyz",
"Status"=>"ok"
}
},
And I access it as ['ItemList']['Item']['ItemId'], without an index number anywhere.
But if ItemList has more than one entry, it messes up my application.
<ItemList>
<Item>
<ItemId>123</ItemId>
<ItemName>abc</ItemName>
<ItemType>xyz</ItemType>
<Status>bad</Status>
</Item>
<Item>
<ItemId>456</ItemId>
<ItemName>fgh</ItemName>
<ItemType>nbv</ItemType>
<Status>bad</Status>
</Item>
</ItemList>
Above gets parsed as this -
"ItemList"=> {
"Item"=>{ "ItemId"=>"123",
"ItemName"=>"abc",
"ItemType"=>"xyz",
"Status"=>"bad"
},
"Item"=>{ "ItemId"=>"456",
"ItemName"=>"fgh",
"ItemType"=>"nbv",
"Status"=>"bad"
}
},
I can access it as ['ItemList']['Item'][0]['ItemId'] and ['ItemList']['Item'][1]['ItemId'], providing the index number manually.
But since I don't know how many items are in the list, I cannot hard-code an index in the actual app; the XML might have no entries or might have hundreds of them.
I thought of using Nokogiri, but it has the same parsing behavior.
How do I handle this?
Sample processing of your data using xml-simple gem
1.9.2p290 :013 > items = "<ItemList> <Item> <ItemId>123</ItemId> <ItemName>abc</ItemName> <ItemType>xyz</ItemType> <Status>bad</Status> </Item> <Item> <ItemId>456</ItemId> <ItemName>fgh</ItemName> <ItemType>nbv</ItemType> <Status>bad</Status> </Item> </ItemList>"
=> "<ItemList> <Item> <ItemId>123</ItemId> <ItemName>abc</ItemName> <ItemType>xyz</ItemType> <Status>bad</Status> </Item> <Item> <ItemId>456</ItemId> <ItemName>fgh</ItemName> <ItemType>nbv</ItemType> <Status>bad</Status> </Item> </ItemList>"
1.9.2p290 :014 > parsed_items = XmlSimple.xml_in(items, { 'KeyAttr' => 'name' })
=> {"Item"=>[{"ItemId"=>["123"], "ItemName"=>["abc"], "ItemType"=>["xyz"], "Status"=>["bad"]}, {"ItemId"=>["456"], "ItemName"=>["fgh"], "ItemType"=>["nbv"], "Status"=>["bad"]}]}
1.9.2p290 :015 > parsed_items.class
=> Hash
1.9.2p290 :016 > parsed_items["Item"].class
=> Array
1.9.2p290 :017 > parsed_items["Item"].length
=> 2
So your Item will be an array and you can call the length method on it. With my example above you can always do parsed_items["Item"].length
If you are using Ruby 1.8 or later, REXML is another option that makes this easy. See the Accessing Elements section: http://www.germane-software.com/software/rexml/docs/tutorial.html
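As a rough sketch of the REXML approach, the following collects every ItemId regardless of how many Item elements there are. It assumes the rexml library is available (it shipped with Ruby's standard library through 2.x and comes as a bundled gem in Ruby 3); the item_ids method name is made up for this example.

```ruby
require 'rexml/document'

# Hypothetical helper: returns the text of every <ItemId> inside an <Item>.
def item_ids(xml)
  doc = REXML::Document.new(xml)
  # XPath.match returns an Array of matching elements (empty if there are none),
  # so zero, one, or many items are all handled by the same code path.
  REXML::XPath.match(doc, '//Item').map { |item| item.elements['ItemId'].text }
end

xml = <<EOT
<ItemList>
  <Item><ItemId>123</ItemId><Status>bad</Status></Item>
  <Item><ItemId>456</ItemId><Status>bad</Status></Item>
</ItemList>
EOT

p item_ids(xml)
p item_ids("<ItemList></ItemList>")
```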
If 'result' is what you get from parsing your XML doc, then you could test
result['ItemList']['Item']
to check whether it is an array (or enumerable). If it is, then there's more than 1 item, and you'll have to enumerate over the items.
Alternatively, you could do this (assuming ruby 1.9):
[*result['ItemList']['Item']].each do |item|
...
end
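The splat operator is cool, and when used like this it lets you treat nil and an array the same way. Be careful with a single item, though: if a lone <Item> is parsed as a Hash, as in the question, splatting it calls to_a on the Hash and yields an array of [key, value] pairs rather than a one-element array, so that case needs an explicit check.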
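A minimal sketch of a wrapper that normalizes all three cases (nil, a single Hash, an Array) is shown below. The as_list helper is hypothetical and not part of XmlSimple; the sample hashes are hand-written stand-ins for parsed output.

```ruby
# Hypothetical helper (not part of XmlSimple): always return an Array of items.
def as_list(value)
  case value
  when nil   then []        # no <Item> at all
  when Array then value     # several <Item> elements
  else            [value]   # a single <Item> parsed as one Hash (or a scalar)
  end
end

single = { "ItemId" => "123" }
many   = [{ "ItemId" => "123" }, { "ItemId" => "456" }]

# For comparison: splatting a Hash converts it to [key, value] pairs.
splatted = [*single]

ids = as_list(many).map { |item| item["ItemId"] }
puts ids.join(",")
```

XmlSimple also documents a ForceArray option intended to make repeated elements always come back as arrays, which may remove the need for a wrapper; check the option's behavior in the version you use.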

How to add a new node to XML

I have a simple XML file, items.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<items>
<item>
<name>mouse</name>
<manufacturer>Logicteh</manufacturer>
</item>
<item>
<name>keyboard</name>
<manufacturer>Logitech - Inc.</manufacturer>
</item>
<item>
<name>webcam</name>
<manufacturer>Logistech</manufacturer>
</item>
</items>
I am trying to insert a new node with the following code:
require 'rubygems'
require 'nokogiri'
f = File.open('items.xml')
@items = Nokogiri::XML(f)
f.close
price = Nokogiri::XML::Node.new "price", @items
price.content = "10"
@items.xpath('//items/item/manufacturer').each do |node|
  node.add_next_sibling(price)
end
file = File.open("items_fixed.xml",'w')
file.puts @items.to_xml
file.close
However this code adds a new node only after the last <manufacturer> node, items_fixed.xml:
<?xml version="1.0" encoding="UTF-8"?>
<items>
<item>
<name>mouse</name>
<manufacturer>Logitech</manufacturer>
</item>
<item>
<name>keyboard</name>
<manufacturer>Logitech</manufacturer>
</item>
<item>
<name>webcam</name>
<manufacturer>Logitech</manufacturer><price>10</price>
</item>
</items>
Why?
It would be helpful to distinguish between a Node (a particular piece of structured XML data at a particular place in a tree), and a "node template" which is the structure of the data.
Nokogiri (and most other XML libraries) only allow you to specify Nodes, not node templates. So when you created price = Nokogiri::XML::Node.new "price", @items, you had a particular piece of data that belongs in a particular place, but hadn't defined the place yet.
When you added it to the first <item>, you defined its place. When you added it to the second <item>, you uprooted it from its place and put it in a new place. At that point this Node appeared only in the second <item>. This continues when you add the same Node to each item, until you reach the last <item>, which is where the node stays.
Nokogiri doesn't have any way to specify a node template. What you need to do is:
@items.xpath('//items/item/manufacturer').each do |node|
  # Create a fresh node on every iteration so each <item> gets its own <price>.
  price = Nokogiri::XML::Node.new "price", @items
  price.content = "10"
  node.add_next_sibling(price)
end
I'd start with this:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<items>
<item>
<name>mouse</name>
<manufacturer>Logitech</manufacturer>
</item>
<item>
<name>keyboard</name>
<manufacturer>Logitech - Inc.</manufacturer>
</item>
</items>
EOT
doc.search('manufacturer').each { |n| n.after('<price>10</price>') }
Which results in:
puts doc.to_xml
# >> <?xml version="1.0" encoding="UTF-8"?>
# >> <items>
# >> <item>
# >> <name>mouse</name>
# >> <manufacturer>Logitech</manufacturer><price>10</price>
# >> </item>
# >> <item>
# >> <name>keyboard</name>
# >> <manufacturer>Logitech - Inc.</manufacturer><price>10</price>
# >> </item>
# >> </items>
It's easy to build upon this to insert different values for the price.

Hpricot XML text search

Hpricot + Ruby XML parsing and logical selection.
Objective: Find all titles written by author Bob.
My XML file:
<rss>
<channel>
<item>
<title>Book1</title>
<pubDate>march 1 2010</pubDate>
<author>Bob</author>
</item>
<item>
<title>book2</title>
<pubDate>october 4 2009</pubDate>
<author>Bill</author>
</item>
<item>
<title>book3</title>
<pubDate>June 5 2010</pubDate>
<author>Steve</author>
</item>
</channel>
</rss>
My Hpricot code is below. Running it returns no output, although the search pattern works on its own.
(doc % :rss % :channel / :item).each do |item|
a=item.search("author[text()*='Bob']")
#puts "FOUND" if a.include?"Bob"
puts item.at("title") if a.include?"Bob"
end
If you're not set on Hpricot, here's one way to do this with XPath in Nokogiri:
require 'nokogiri'
doc = Nokogiri::XML( my_rss_string )
bobs_titles = doc.xpath("//title[parent::item/author[text()='Bob']]")
p bobs_titles.map{ |node| node.text }
#=> ["Book1"]
Edit: @theTinMan's XPath also works well, is more readable, and may very well be faster:
bobs_titles = doc.xpath("//author[text()='Bob']/../title")
One of the ideas behind XPath is it allows us to navigate a DOM similarly to a disk directory:
require 'hpricot'
xml = <<EOT
<rss>
<channel>
<item>
<title>Book1</title>
<pubDate>march 1 2010</pubDate>
<author>Bob</author>
</item>
<item>
<title>book2</title>
<pubDate>october 4 2009</pubDate>
<author>Bill</author>
</item>
<item>
<title>book3</title>
<pubDate>June 5 2010</pubDate>
<author>Steve</author>
</item>
<item>
<title>Book4</title>
<pubDate>march 1 2010</pubDate>
<author>Bob</author>
</item>
</channel>
</rss>
EOT
doc = Hpricot(xml)
titles = (doc / '//author[text()="Bob"]/../title' )
titles # => #<Hpricot::Elements[{elem <title> "Book1" </title>}, {elem <title> "Book4" </title>}]>
That means: "find every author tag whose text is Bob, then go up one level and find the title tag".
I added an extra book by "Bob" to test getting all occurrences.
To get the item containing a book by Bob, just move back up a level:
items = (doc / '//author[text()="Bob"]/..' )
puts items # => nil
# >> <item>
# >> <title>Book1</title>
# >> <pubdate>march 1 2010</pubdate>
# >> <author>Bob</author>
# >> </item>
# >> <item>
# >> <title>Book4</title>
# >> <pubdate>march 1 2010</pubdate>
# >> <author>Bob</author>
# >> </item>
I also figured out what (doc % :rss % :channel / :item) is doing. It's equivalent to nesting the searches, minus the wrapping parentheses, and these should all be the same in Hpricot-ese:
(doc % :rss % :channel / :item).size # => 4
(((doc % :rss) % :channel) / :item).size # => 4
(doc / '//rss/channel/item').size # => 4
(doc / 'rss channel item').size # => 4
Because '//rss/channel/item' is how you'd normally see an XPath accessor, and 'rss channel item' is a CSS accessor, I'd recommend using those formats for maintenance and clarity.
