Ruby Nested Hashes - Determining separate events - ruby

I have many unique events in XML that I have converted into a large Hash of around 300 keys. The values of most of these keys are Hashes and again, the value of some of those keys are Hashes again. I do not know how deep the hash nesting will go.
I would like to write each Hash of the original 300 and all of its keys & values (no matter how many it may have) to a schema-less database.
I have managed to write a (messy) method that outputs the values of each Hash, no matter how many Hashes its values may contain.
The problem that I am now faced with is that I am unable to determine where one Hash starts, and one Hash ends. Therefore I am unable to write separate events to the database as I am just left with the output of all my Hashes.
How can I determine which are separate events?
Here is my code:
require 'crack'
require 'awesome_print'
def printingOutHash(inputHash)
#ap inputHash
if inputHash.kind_of?(Array)
puts "array"
inputHash.each do |x|
printingOutHash(x)
end
end
if inputHash.kind_of?(Hash)
inputHash.each do |k, v|
if v.kind_of?(Hash)
printingOutHash(v)
else
ap "#{k}: #{v}"
end
end
end
end
h = Crack::XML.parse("<Events><Event><System><Provider Name='Service Control Manager' Guid='{555908d1-a6d7-4695-8e1e-26931d2012f4}' EventSourceName='Service Control Manager'/><EventID Qualifiers='16384'>7036</EventID><Version>0</Version><Level>4</Level><Task>0</Task><Opcode>0</Opcode><Keywords>0x8080000000000000</Keywords><TimeCreated SystemTime='2013-03-25T05:00:38.021800000Z'/><EventRecordID>17629</EventRecordID><Correlation/><Execution ProcessID='476' ThreadID='3028'/><Channel>System</Channel><Computer>AMAZONA-ONIST5V</Computer><Security/></System><EventData><Data Name='param1'>Windows Modules Installer</Data><Data Name='param2'>stopped</Data><Binary>540072007500730074006500640049006E007300740061006C006C00650072002F0031000000</Binary></EventData></Event><Event><System><Provider Name='Service Control Manager' Guid='{555908d1-a6d7-4695-8e1e-26931d2012f4}' EventSourceName='Service Control Manager'/><EventID Qualifiers='16384'>7040</EventID><Version>0</Version><Level>4</Level><Task>0</Task><Opcode>0</Opcode><Keywords>0x8080000000000000</Keywords><TimeCreated SystemTime='2013-03-25T05:00:37.741000000Z'/><EventRecordID>17628</EventRecordID><Correlation/><Execution ProcessID='476' ThreadID='3028'/><Channel>System</Channel><Computer>AMAZONA-ONIST5V</Computer><Security UserID='S-1-5-18'/></System><EventData><Data Name='param1'>Windows Modules Installer</Data><Data Name='param2'>auto start</Data><Data Name='param3'>demand start</Data><Data Name='param4'>TrustedInstaller</Data></EventData></Event></Events>")
printingOutHash(h['Events']['Event'])

Converting XML to hashes is not a good idea when you're dealing with complex or large data files because walking a hash isn't very convenient in comparison to searching the XML. Parsing XML is really simple using the right tools like Nokogiri.
Basing off your XML:
require 'nokogiri'
xml = "
<Events>
<Event>
<System>
<Provider Name='Service Control Manager' Guid='{555908d1-a6d7-4695-8e1e-26931d2012f4}' EventSourceName='Service Control Manager'/>
<EventID Qualifiers='16384'>7036</EventID>
<Version>0</Version>
<Level>4</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x8080000000000000</Keywords>
<TimeCreated SystemTime='2013-03-25T05:00:38.021800000Z'/>
<EventRecordID>17629</EventRecordID>
<Correlation/>
<Execution ProcessID='476' ThreadID='3028'/>
<Channel>System</Channel>
<Computer>AMAZONA-ONIST5V</Computer>
<Security/>
</System>
<EventData>
<Data Name='param1'>Windows Modules Installer</Data>
<Data Name='param2'>stopped</Data>
<Binary>540072007500730074006500640049006E007300740061006C006C00650072002F0031000000</Binary>
</EventData>
</Event>
<Event>
<System>
<Provider Name='Service Control Manager' Guid='{555908d1-a6d7-4695-8e1e-26931d2012f4}' EventSourceName='Service Control Manager'/>
<EventID Qualifiers='16384'>7040</EventID>
<Version>0</Version>
<Level>4</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x8080000000000000</Keywords>
<TimeCreated SystemTime='2013-03-25T05:00:37.741000000Z'/>
<EventRecordID>17628</EventRecordID>
<Correlation/>
<Execution ProcessID='476' ThreadID='3028'/>
<Channel>System</Channel>
<Computer>AMAZONA-ONIST5V</Computer>
<Security UserID='S-1-5-18'/>
</System>
<EventData>
<Data Name='param1'>Windows Modules Installer</Data>
<Data Name='param2'>auto start</Data>
<Data Name='param3'>demand start</Data>
<Data Name='param4'>TrustedInstaller</Data>
</EventData>
</Event>
</Events>"
Here's the start of how I'd grab the data:
doc = Nokogiri::XML(xml)
pp doc.search('Event').map { |event|
system_provider_node = event.at('System Provider')
system = {
provider: {
name: system_provider_node['Name'],
guid: system_provider_node['Guid'],
event_source_name: system_provider_node['EventSourceName']
}
}
event_data = event.search('EventData Data').map{ |data|
{
name: data['Name'],
text: data.text
}
}
{
system: system,
event_data: event_data
}
}
Which results in:
[{:system=>
{:provider=>
{:name=>"Service Control Manager",
:guid=>"{555908d1-a6d7-4695-8e1e-26931d2012f4}",
:event_source_name=>"Service Control Manager"}},
:event_data=>
[{:name=>"param1", :text=>"Windows Modules Installer"},
{:name=>"param2", :text=>"stopped"}]},
{:system=>
{:provider=>
{:name=>"Service Control Manager",
:guid=>"{555908d1-a6d7-4695-8e1e-26931d2012f4}",
:event_source_name=>"Service Control Manager"}},
:event_data=>
[{:name=>"param1", :text=>"Windows Modules Installer"},
{:name=>"param2", :text=>"auto start"},
{:name=>"param3", :text=>"demand start"},
{:name=>"param4", :text=>"TrustedInstaller"}]}]
You don't have to build an array of hashes. For each <Event> you could peel off the elements and build separate rows in various tables in a database. It's really up to what works in your head.

Related

How to combine two XML files with Nokogiri

I am trying to combine two separate, but related, files with Nokogiri. I want to combine the "product" and "product pricing" if "ItemNumber" is the same.
I loaded the documents, but I have no idea how to combine the two.
Product File:
<Products>
<Product>
<Name>36-In. Homeowner Bent Single-Bit Axe Handle</Name>
<ProductTypeId>0</ProductTypeId>
<Description>This single bit curved grip axe handle is made for 3 to 5 pound axes. A good quality replacement handle made of American hickory with a natural wax finish. Hardwood handles do not conduct electricity and American Hickory is known for its strength, elasticity and ability to absorb shock. These handles provide exceptional value and economy for homeowners and other occasional use applications. Each Link handle comes with the required wedges, rivets, or epoxy needed for proper application of the tool head.</Description>
<ActiveFlag>Y</ActiveFlag>
<ImageFile>100024.jpg</ImageFile>
<ItemNumber>100024</ItemNumber>
<ProductVariants>
<ProductVariant>
<Sku>100024</Sku>
<ColorName></ColorName>
<SizeName></SizeName>
<SequenceNo>0</SequenceNo>
<BackOrderableFlag>N</BackOrderableFlag>
<InventoryLevel>0</InventoryLevel>
<ColorCode></ColorCode>
<SizeCode></SizeCode>
<TaxableFlag>Y</TaxableFlag>
<VariantPromoGroupCode></VariantPromoGroupCode>
<PricingGroupCode></PricingGroupCode>
<StartDate xsi:nil="true"></StartDate>
<EndDate xsi:nil="true"></EndDate>
<ActiveFlag>Y</ActiveFlag>
</ProductVariant>
</ProductVariants>
</Product>
</Products>
Product Pricing Fields:
<ProductPricing>
<ItemNumber>100024</ItemNumber>
<AcquisitionCost>8.52</AcquisitionCost>
<MemberCost>10.7</MemberCost>
<Price>14.99</Price>
<SalePrice xsi:nil="true"></SalePrice>
<SaleCode>0</SaleCode>
</ProductPricing>
I am looking to generate a file like this:
<Products>
<Product>
<Name>36-In. Homeowner Bent Single-Bit Axe Handle</Name>
<ProductTypeId>0</ProductTypeId>
<Description>This single bit curved grip axe handle is made for 3 to 5 pound axes. A good quality replacement handle made of American hickory with a natural wax finish. Hardwood handles do not conduct electricity and American Hickory is known for its strength, elasticity and ability to absorb shock. These handles provide exceptional value and economy for homeowners and other occasional use applications. Each Link handle comes with the required wedges, rivets, or epoxy needed for proper application of the tool head.</Description>
<ActiveFlag>Y</ActiveFlag>
<ImageFile>100024.jpg</ImageFile>
<ItemNumber>100024</ItemNumber>
<ProductVariants>
<ProductVariant>
<Sku>100024</Sku>
<ColorName></ColorName>
<SizeName></SizeName>
<SequenceNo>0</SequenceNo>
<BackOrderableFlag>N</BackOrderableFlag>
<InventoryLevel>0</InventoryLevel>
<ColorCode></ColorCode>
<SizeCode></SizeCode>
<TaxableFlag>Y</TaxableFlag>
<VariantPromoGroupCode></VariantPromoGroupCode>
<PricingGroupCode></PricingGroupCode>
<StartDate xsi:nil="true"></StartDate>
<EndDate xsi:nil="true"></EndDate>
<ActiveFlag>Y</ActiveFlag>
</ProductVariant>
</ProductVariants>
</Product>
<ProductPricing>
<ItemNumber>100024</ItemNumber>
<AcquisitionCost>8.52</AcquisitionCost>
<MemberCost>10.7</MemberCost>
<Price>14.99</Price>
<SalePrice xsi:nil="true"></SalePrice>
<SaleCode>0</SaleCode>
</ProductPricing>
</Products>
Here is the code I have so far:
require 'csv'
require 'nokogiri'
xml = File.read('lateApril-product-pricing.xml')
xml2 = File.read('lateApril-master-date')
doc = Nokogiri::XML(xml)
doc2 = Nokogiri::XML(xml2)
pricing_data = []
item_number = []
doc.xpath('//ProductsPricing/ProductPricing').each do |file|
itemNumber = file.xpath('./ItemNumber').first.text
variant_Price = file.xpath('./Price').first.text
pricing_data << [ itemNumber, variant_Price ]
item_number << [ itemNumber ]
end
puts item_number ## This prints all the item number but i have no idea how to loop through them and combine them with Product XML
doc2.xpath('//Products/Product').each do |file|
itemNumber = file.xpath('./ItemNumber').first.text #not sure how to write the conditions here since i don't have pricing fields available in this method
end
Try this on:
require 'nokogiri'
doc1 = Nokogiri::XML(<<EOT)
<Products>
<Product>
<Name>36-In. Homeowner Bent Single-Bit Axe Handle</Name>
</Product>
</Products>
EOT
doc2 = Nokogiri::XML(<<EOT)
<ProductPricing>
<ItemNumber>100024</ItemNumber>
</ProductPricing>
EOT
doc1.at('Product').add_next_sibling(doc2.at('ProductPricing'))
Which results in:
puts doc1.to_xml
# >> <?xml version="1.0"?>
# >> <Products>
# >> <Product>
# >> <Name>36-In. Homeowner Bent Single-Bit Axe Handle</Name>
# >> </Product><ProductPricing>
# >> <ItemNumber>100024</ItemNumber>
# >> </ProductPricing>
# >> </Products>
Please, when you ask, strip the example input and expected resulting output to the absolute, bare, minimum. Anything beyond that wastes space, eye-time and brain CPU.
This is untested code, but is where I'd start if I was going to merge two files containing multiple <ItemNumber> nodes:
require 'nokogiri'
doc1 = Nokogiri::XML(<<EOT)
<Products>
<Product>
<Name>36-In. Homeowner Bent Single-Bit Axe Handle</Name>
<ItemNumber>100024</ItemNumber>
</Product>
</Products>
EOT
doc2 = Nokogiri::XML(<<EOT)
<ProductPricing>
<ItemNumber>100024</ItemNumber>
</ProductPricing>
EOT
# build a hash containing the item numbers in doc1 for each product
doc1_products_by_item_numbers = doc1.search('Product').map { |product|
item_number = product.at('ItemNumber').value
[
item_number,
product
]
}.to_hash
# build a hash containing the item numbers in doc2 for each product pricing
doc2_products_by_item_numbers = doc2.search('ProductPricing').map { |pricing|
item_number = pricing.at('ItemNumber').value
[
item_number,
pricing
]
}.to_hash
# append doc2 entries to doc1 after each product based on item numbers
doc1_products_by_item_numbers.keys.each { |k|
doc1_products_by_item_numbers[k].add_next_sibling(doc2_products_by_item_numbers[k])
}

Parsing with ruby

I am new to ruby and I have a school project were I am parsing a xml file and need to get data after certain tags. I can only use core ruby. No gems
pFile = File.open("myfile.mzML", "r")
regmsLvl = "ms level\" value=\""
pFile.each_line { |line|
scn = line.scan(/#{regmsLvl}(\d)/)
#what I want to do but doesn't work
if scn == 1
puts("Got it!")
end
#what I have to do to compare if == 1
if scn != nil
scn.each do |val|
if val[0].to_i == 1
puts("Got it!")
end
end
end
}
# a sample line that I am parsing is:
<cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1" />
This seems silly.
line.scans out put makes scn a 2d array. How can I just have it be a string that gets overridden each pass. Or how should I change this whole thing. Any suggestions are appreciated.
puts(scn) prints out the 1 but if I do scn == 1 or scn.to_i == 1 it never gets into the if. I have tried scn.pop and scn.pop.pop
I have added a section to show what I am trying to do now.
I need to check the ms level: if 1 then get scan start time and then the binary. This is the code that I am now working with.
xmlfile = File.new("afile.mzML")
xmldoc = Document.new(xmlfile)
root = xmldoc.root
puts "Root element : " + root.attributes["xmlns"]
xmldoc.elements.each("mzML/run/spectrumList/spectrum/cvParam"){
|e| if e.attributes["value"].to_i ==1
# Now I need to get start time: #
["mzML/run/spectrumList/spectrum/cvParam/scanList/scan/value"]
# and then
["mzML/run/spectrumList/spectrum/cvParam/binaryDataArrayList/binaryDataArray/binary"]
end
}
<run id="ru_0" defaultInstrumentConfigurationRef="ic_0" sampleRef="sa_0" defaultSourceFileRef="sf_ru_0">
<spectrumList count="3310" defaultDataProcessingRef="dp_sp_0">
<spectrum id="scan=8839" index="0" defaultArrayLength="171" dataProcessingRef="dp_sp_0">
<cvParam cvRef="MS" accession="MS:1000525" name="spectrum representation" />
<cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1" />
<cvParam cvRef="MS" accession="MS:1000294" name="mass spectrum" />
<cvParam cvRef="MS" accession="MS:1000130" name="positive scan" />
<scanList count="1">
<cvParam cvRef="MS" accession="MS:1000795" name="no combination" />
<scan>
<cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="5429.47" unitAccession="UO:0000010" unitName="second" unitCvRef="UO" />
</scan>
</scanList>
<binaryDataArrayList count="2">
<binaryDataArray encodedLength="1824">
<cvParam cvRef="MS" accession="MS:1000514" name="m/z array" unitAccession="MS:1000040" unitName="m/z" unitCvRef="MS" />
<cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" />
<cvParam cvRef="MS" accession="MS:1000576" name="no compression" />
<binary>AAAAQBCdgkAAAACAP6KCQAAAAAA8pIJAAAAAYAWlgkAAAABgQ6aCQAAAAGCzp4JAAAAAQEaogkAAAACgDKqCQAAAAEAgqoJAAAAAwEOqgkAAAABAWKqCQAAAAGBErIJAAAAAIOetgkAAAABAMLCCQAAAAGDlsYJAAAAA4DeygkAAAACAw7SCQAAAACBauIJAAAAAwFC6gkAAAACAYb6CQAAAAIDnwYJAAAAAwDjHgkAAAAAATMyCQAAAAADnzIJAAAAAAArOgkAAAACgTc6CQAAAAKBqzoJAAAAAQJLPgkAAAACAVNCCQAAAAAAK0oJAAAAAIF7SgkAAAADABNSCQAAAAKAx1YJAAAAAYHXXgkAAAAAg3teCQAAAAOAf2oJAAAAAICbcgkAAAAAAx92CQAAAAKA03oJAAAAAIBXigkAAAABAO+KCQAAAAKCr5YJAAAAAYMnlgkAAAADgK+aCQAAAAKDq6YJAAAAAAC3qgkAAAACgNe6CQAAAAMCA74JAAAAAANL0gkAAAAAAUfiCQAAAAOCt+YJAAAAA4O75gkAAAACAPPqCQAAAAGBq/oJAAAAAwEQCg0AAAABAKAqDQAAAAAAoDoNAAAAA4G0Og0AAAADAZhKDQAAAACCBEoNAAAAAwIQWg0AAAABAjheDQAAAAMA+GoNAAAAAQIYag0AAAAAA7RyDQAAAAEB9HYNAAAAAwIseg0AAAADgbyKDQAAAAAAPJINAAAAAgEUlg0AAAACgYCaDQAAAAOBfKoNAAAAA4DAug0AAAADAZi+DQAAAAAA0MINAAAAAoFMwg0AAAAAgMjKDQAAAACA2NINAAAAAgDk2g0AAAAAg+DyDQAAAAOAfPoNAAAAAAKU/g0AAAAAgQUKDQAAAAKBVQoNAAAAAYNRHg0AAAAAgf0qDQAAAAICZSoNAAAAAIDFQg0AAAAAgM1KDQAAAAEBjUoNAAAAAoGNUg0AAAAAAZ1aDQAAAAABqWINAAAAAYHhZg0AAAACAfl2DQAAAAEAcXoNAAAAAICpfg0AAAADgw2GDQAAAAACmZ4NAAAAAQDRog0AAAABAiWqDQAAAAAAibYNAAAAAQHpug0AAAABAEnKDQAAAAABCcoNAAAAAoHxyg0AAAACgGXaDQAAAAMBDdoNAAAAAgJR2g0AAAAAgHHqDQAAAAEBGeoNAAAAAIHh6g0AAAABAl3qDQAAAAKCkfYNAAAAAYE5+g0AAAAAAm36DQAAAAEDigYNAAAAAQGWCg0AAAABAjYKDQAAAACClgoNAAAAA4ESGg0AAAABgYIaDQAAAAMDSh4NAAAAAYCqIg0AAAADAT4qDQAAAAACCioNAAAAAwJmOg0AAAABAnZKDQAAAAKDJlINAAAAAgHGWg0AAAABgl5eDQAAAAEB4mINAAAAA4B2eg0AAAADgKKCDQAAAAGAvooNAAAAAwJakg0AAAABAUaiDQAAAAGBgqoNAAAAAIBatg0AAAADAxa6DQAAAAKCosoNAAAAAICy6g0AAAAAAbrqDQAAAAACRuoNAAAAAAMa/g0AAAACgOsCDQAAAAABzwoNAAAAAIOTCg0AAAACADcWDQAAAAGB4xoNAAAAAQOfGg0AAAAAAvceDQAAAAEBZyoNAAAAA4OnKg0AAAAAgMs6DQAAAAOC/z4NAAAAAYInUg0AAAABgftaDQAAAAODC1oNAAAAAwJXXg0AAAAAAgdiDQAAAAKA/2oNAAAAAoILag0AAAABghtyDQAAAAGCm3INAAAAAAO7cg0AAAACgr9+DQAAAAGCY4oNAAAAAgDbkg0AAAABAN+WDQAAAAKBU5oNA</binary>
</binaryDataArray>
I think you were pretty close. Assuming you can use that REXML library (which looks like it's part of the core ruby library) you should be able to do this
require 'rexml/document'
xmlfile = File.new("afile.mzML")
xmldoc = REXML::Document.new(xmlfile)
root = xmldoc.root
start_time = nil
binary = nil
# get the ms level
ms_level = root.elements["spectrumList/spectrum/cvParam[#name='ms level']"].attributes["value"].to_i
if ms_level == 1
# get the scan start time
start_time = root.elements["spectrumList/spectrum/scanList/scan/cvParam[#name='scan start time']"].attributes["value"]
# get the binary
binary = root.elements["spectrumList/spectrum/binaryDataArrayList/binaryDataArray/binary"].text
end
p start_time # => "5429.47"
p binary # => that crazy long binary
This REXML tutorial is helpful: http://www.germane-software.com/software/rexml/docs/tutorial.html
Note, I made a few assumptions, like the elements would always exist, the ms level was always an int, the file structure is always the same. Those assumptions may not be true in your situation but this should be a start.

Ruby Hash parsed_response error

BACKGROUND
I am using HTTParty to parse an XML hash response. Unfortunately, when the hash response only has one entry(?), the resulting hash is not indexable. I have confirmed the resulting XML syntax is the same for single and multiple entry(?). I have also confirmed my code works when there are always multiple entries(?) in the hash.
QUESTION
How do I accommodate the single hash entry case and/or is there an easier way to accomplish what I am trying to do?
CODE
require 'httparty'
class Rest
include HTTParty
format :xml
end
def test_redeye
# rooms and devices
roomID = Hash.new
deviceID = Hash.new { |h,k| h[k] = Hash.new }
rooms = Rest.get(#reIp["theater"] + "/redeye/rooms/").parsed_response["rooms"]
puts "rooms #{rooms}"
rooms["room"].each do |room|
puts "room #{room}"
roomID[room["name"].downcase.strip] = "/redeye/rooms/" + room["roomId"]
puts "roomid #{roomID}"
devices = Rest.get(#reIp["theater"] + roomID[room["name"].downcase.strip] + "/devices/").parsed_response["devices"]
puts "devices #{devices}"
devices["device"].each do |device|
puts "device #{device}"
deviceID[room["name"].downcase.strip][device["displayName"].downcase.strip] = "/devices/" + device["deviceId"]
puts "deviceid #{deviceID}"
end
end
say "Done"
end
XML - SINGLE ENTRY
<?xml version="1.0" encoding="UTF-8" ?>
<devices>
<device manufacturerName="Philips" description="" portType="infrared" deviceType="0" modelName="" displayName="TV" deviceId="82" />
</devices>
XML - MULTIPLE ENTRY
<?xml version="1.0" encoding="UTF-8" ?>
<devices>
<device manufacturerName="Denon" description="" portType="infrared" deviceType="6" modelName="Avr-3311ci" displayName="AVR" deviceId="77" />
<device manufacturerName="Philips" description="" portType="infrared" deviceType="0" modelName="" displayName="TV" deviceId="82" />
</devices>
RESULTING ERROR
[Info - Plugin Manager] Matches, executing block
rooms {"room"=>[{"name"=>"Home Theater", "currentActivityId"=>"78", "roomId"=>"-1", "description"=>""}, {"name"=>"Living", "currentActivityId"=>"-1", "roomId"=>"81", "description"=>"2nd Floor"}, {"name"=>"Theater", "currentActivityId"=>"-1", "roomId"=>"80", "description"=>"1st Floor"}]}
room {"name"=>"Home Theater", "currentActivityId"=>"78", "roomId"=>"-1", "description"=>""}
roomid {"home theater"=>"/redeye/rooms/-1"}
devices {"device"=>[{"manufacturerName"=>"Denon", "description"=>"", "portType"=>"infrared", "deviceType"=>"6", "modelName"=>"Avr-3311ci", "displayName"=>"AVR", "deviceId"=>"77"}, {"manufacturerName"=>"Philips", "description"=>"", "portType"=>"infrared", "deviceType"=>"0", "modelName"=>"", "displayName"=>"TV", "deviceId"=>"82"}]}
device {"manufacturerName"=>"Denon", "description"=>"", "portType"=>"infrared", "deviceType"=>"6", "modelName"=>"Avr-3311ci", "displayName"=>"AVR", "deviceId"=>"77"}
deviceid {"home theater"=>{"avr"=>"/devices/77"}}
device {"manufacturerName"=>"Philips", "description"=>"", "portType"=>"infrared", "deviceType"=>"0", "modelName"=>"", "displayName"=>"TV", "deviceId"=>"82"}
deviceid {"home theater"=>{"avr"=>"/devices/77", "tv"=>"/devices/82"}}
room {"name"=>"Living", "currentActivityId"=>"-1", "roomId"=>"81", "description"=>"2nd Floor"}
roomid {"home theater"=>"/redeye/rooms/-1", "living"=>"/redeye/rooms/81"}
devices {"device"=>{"manufacturerName"=>"Philips", "description"=>"", "portType"=>"infrared", "deviceType"=>"0", "modelName"=>"", "displayName"=>"TV", "deviceId"=>"82"}}
device ["manufacturerName", "Philips"]
/usr/local/rvm/gems/ruby-1.9.3-p374#SiriProxy/gems/siriproxy-0.3.2/plugins/siriproxy-redeye/lib/siriproxy-redeye.rb:145:in `[]': can't convert String into Integer (TypeError)
There are a couple of options I see. If you control the endpoint, you could modify the XML being sent to accomodate HTTParty's underlying XML parser, Crack by putting a type="array" attribute on the devices XML element.
Otherwise, you could check to see what class the device is before indexing into it:
case devices["device"]
when Array
# act on the collection
else
# act on the single element
end
It's much less than ideal whenever you have to do type-checking in a dynamic language, so if you find yourself doing this more than once it may be worth introducing polymorphism or at the very least extracting a method to do this.

Should Nokogiri::XML.parse be creating separate Text nodes for linefeeds?

I have an XML document created by an outside tool:
<?xml version="1.0" encoding="UTF-8"?>
<suite>
<id>S1</id>
<name>First Suite</name>
<description></description>
<sections>
<section>
<name>section 1</name>
<cases>
<case>
<id>C1</id>
<title>Test 1.1</title>
<type>Other</type>
<priority>4 - Must Test</priority>
<estimate></estimate>
<milestone></milestone>
<references></references>
</case>
<case>
<id>C2</id>
<title>Test 1.2</title>
<type>Other</type>
<priority>4 - Must Test</priority>
<estimate></estimate>
<milestone></milestone>
<references></references>
</case>
</cases>
</section>
</sections>
</suite>
From irb, I do the following: (Output suppressed until final command)
> require('nokogiri')
> doc = Nokogiri::XML.parse(open('./test.xml'))
> test_case = doc.search('case').first
=> #<Nokogiri::XML::Element:0x3ff75851bc44 name="case" children=[#<Nokogiri::XML::Text:0x3ff75851b8fc "\n ">, #<Nokogiri::XML::Element:0x3ff75851b7bc name="id" children=[#<Nokogiri::XML::Text:0x3ff75851b474 "C1">]>, #<Nokogiri::XML::Text:0x3ff75851b1cc "\n ">, #<Nokogiri::XML::Element:0x3ff75851b078 name="title" children=[#<Nokogiri::XML::Text:0x3ff75851ad58 "Test 1.1">]>, #<Nokogiri::XML::Text:0x3ff75851aa9c "\n ">, #<Nokogiri::XML::Element:0x3ff75851a970 name="type" children=[#<Nokogiri::XML::Text:0x3ff75851a6c8 "Other">]>, #<Nokogiri::XML::Text:0x3ff7585191d8 "\n ">, #<Nokogiri::XML::Element:0x3ff7585190d4 name="priority" children=[#<Nokogiri::XML::Text:0x3ff758518d64 "4 - Must Test">]>, #<Nokogiri::XML::Text:0x3ff758518ad0 "\n ">, #<Nokogiri::XML::Element:0x3ff7585189a4 name="estimate">, #<Nokogiri::XML::Text:0x3ff758518670 "\n ">, #<Nokogiri::XML::Element:0x3ff758518558 name="milestone">, #<Nokogiri::XML::Text:0x3ff7585182b0 "\n ">, #<Nokogiri::XML::Element:0x3ff758518184 name="references">, #<Nokogiri::XML::Text:0x3ff758517ef0 "\n ">]>
This results in a number of children that look like the following:
#<Nokogiri::XML::Text:0x3ff758517ef0 "\n ">
I want to iterate through these XML nodes without having to do something like:
> real_nodes = test_case.children.reject{|n| n.node_name == 'text' && n.content.strip!.empty?}
I couldn't find a parse parameter in the Nokogiri docs to suppress the treating of newlines as separate nodes. Is there a way to do this during the parse instead of after?
Check the documentation. You can just do this:
doc = Nokogiri::XML.parse(open('./test.xml')) do |config|
config.noblanks
end
That will load the file without any empty nodes.
The text nodes are the result of pretty-printing the XML. The spec doesn't require whitespace between tags, and, for efficiency, a huge XML file could be stripped of inter-tag whitespace to save space and reduce transfer time, without sacrificing the data content.
This might show what's happening:
require 'nokogiri'
xml = '<foo></foo>'
Nokogiri::XML(xml).at('foo').child
=> nil
With no whitespace between the tags there's no text node either.
xml = '<foo>
</foo>'
Nokogiri::XML(xml).at('foo').child
=> #<Nokogiri::XML::Text:0x3fcee9436ff0 "\n">
doc.at('foo').child.class
=> Nokogiri::XML::Text
With whitespace for pretty-printing, the XML has a text node following the foo tag.

Trying to parse a XML using Nokogiri with Ruby

I am new to programming so bear with me. I have an XML document that looks like this:
File name: PRIDE1542.xml
<ExperimentCollection version="2.1">
<Experiment>
<ExperimentAccession>1015</ExperimentAccession>
<Title>**Protein complexes in Saccharomyces cerevisiae (GPM06600002310)**</Title>
<ShortLabel>GPM06600002310</ShortLabel>
<Protocol>
<ProtocolName>**None**</ProtocolName>
</Protocol>
<mzData version="1.05" accessionNumber="1015">
<cvLookup cvLabel="RESID" fullName="RESID Database of Protein Modifications" version="0.0" address="http://www.ebi.ac.uk/RESID/" />
<cvLookup cvLabel="UNIMOD" fullName="UNIMOD Protein Modifications for Mass Spectrometry" version="0.0" address="http://www.unimod.org/" />
<description>
<admin>
<sampleName>**GPM06600002310**</sampleName>
<sampleDescription comment="Ho, Y., et al., Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002 Jan 10;415(6868):180-3.">
<cvParam cvLabel="NEWT" accession="4932" name="Saccharomyces cerevisiae (Baker's yeast)" value="Saccharomyces cerevisiae" />
</sampleDescription>
</admin>
</description>
<spectrumList count="0" />
</mzData>
</Experiment>
</ExperimentCollection>
I want to take out the text in between <Title>, <ProtocolName>, and <SampleName> and put into a text file (I tried bolding them to making it easier to see). I have the following code so far (based on posts I saw on this site), but it seems not to work:
>> require 'rubygems'
>> require 'nokogiri'
>> doc = Nokogiri::XML(File.open("PRIDE_Exp_Complete_Ac_10094.xml"))
>> #ExperimentCollection = doc.css("ExperimentCollection Title").map {|node| node.children.text }
Can someone help me?
Try to access them using xpath expressions. You can enter the path through the parse tree using slashes.
puts doc.xpath( "/ExperimentCollection/Experiment/Title" ).text
puts doc.xpath( "/ExperimentCollection/Experiment/Protocol/ProtocolName" ).text
puts doc.xpath( "/ExperimentCollection/Experiment/mzData/description/admin/sampleName" ).text

Resources