Parsing XML to get the population of Albania? - ruby

I am trying to learn how to use Nokogiri to parse XML files, but I can't seem to get past this issue.
I have this XML file with information about countries such as population, name, religion, inflation etc.:
<cia>
<continent id='europe'
name='Europe'/>
<continent id='asia'
name='Asia'/>
<continent id='northAmerica'
name='North America'/>
<continent id='australia'
name='Australia/Oceania'/>
<continent id='southAmerica'
name='South America'/>
<continent id='africa'
name='Africa'/>
<country id='cid-cia-Albania'
continent='Europe'
name='Albania'
datacode='AL'
total_area='28750'
population='3249136'
population_growth='1.34'
infant_mortality='49.2'
gdp_agri='55'
inflation='16'
gdp_total='4100'
indep_date='28 11 1912'
government='emerging democracy'
capital='Tirane'>
<ethnicgroups name='Greeks'>3</ethnicgroups>
<ethnicgroups name='Albanian'>95</ethnicgroups>
<religions name='Muslim'>70</religions>
<religions name='Roman Catholic'>10</religions>
<religions name='Albanian Orthodox'>20</religions>
<borders country='cid-cia-Greece'>282</borders>
<borders country='cid-cia-Macedonia'>151</borders>
<borders country='cid-cia-Serbia-and-Montenegro'>287</borders>
<coasts>Adriatic Sea</coasts>
<coasts>Ionian Sea</coasts>
<coasts>Serbia</coasts>
<coasts>Montenegro</coasts>
</country>
.
.
.
</cia>
I am trying to find a country by passing in the name of the country as an argument, and, from there, trying to get the population of the country, but I can't for some reason. Here is my method:
@doc = Nokogiri::XML(File.read(file)) # get the file from the initialize method
def get_population(country)
  element = @doc.xpath("//country[@name='#{country}']")
end
So if I do:
get_population('Albania')
How can I get this method to get the population for Albania? Currently all I get is the XML for that country.
Thanks for all the help in advance!

Do it as below:
def get_population(country)
  element = @doc.at_xpath("//country[@name='#{country}']/@population")
  element.text
end
@doc.at_xpath("//country[@name='#{country}']/@population") will give you a Nokogiri::XML::Attr instance. Nokogiri::XML::Attr inherits from Nokogiri::XML::Node, so you can call the Nokogiri::XML::Node#text method on that Attr instance.

Using CSS selectors makes this very straightforward:
require 'nokogiri'
xml = "<cia>
<continent id='europe'
name='Europe'/>
<continent id='asia'
name='Asia'/>
<continent id='northAmerica'
name='North America'/>
<continent id='australia'
name='Australia/Oceania'/>
<continent id='southAmerica'
name='South America'/>
<continent id='africa'
name='Africa'/>
<country id='cid-cia-Albania'
continent='Europe'
name='Albania'
datacode='AL'
total_area='28750'
population='3249136'
population_growth='1.34'
infant_mortality='49.2'
gdp_agri='55'
inflation='16'
gdp_total='4100'
indep_date='28 11 1912'
government='emerging democracy'
capital='Tirane'>
<ethnicgroups name='Greeks'>3</ethnicgroups>
<ethnicgroups name='Albanian'>95</ethnicgroups>
<religions name='Muslim'>70</religions>
<religions name='Roman Catholic'>10</religions>
<religions name='Albanian Orthodox'>20</religions>
<borders country='cid-cia-Greece'>282</borders>
<borders country='cid-cia-Macedonia'>151</borders>
<borders country='cid-cia-Serbia-and-Montenegro'>287</borders>
<coasts>Adriatic Sea</coasts>
<coasts>Ionian Sea</coasts>
<coasts>Serbia</coasts>
<coasts>Montenegro</coasts>
</country>
</cia>
"
Here's the gist of the code:
doc = Nokogiri::XML(xml)
doc.at('country[name="Albania"]')['population']
# => "3249136"

Related

Ruby nokogiri attribute selector in XML file

This is the XML file:
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/">
<SOAP-ENV:Body>
<ns1:putResponse
xmlns:ns1="urn:DmsManagerClient">
<result xsi:type="xsd:string">
<?xml version="1.0" encoding="ISO-8859-1"?>
<MESSAGE ID="11c73b9e-687c-4300-baba-b743c26f7c83" TYPE="CUSDMS">
<DELIVERY>
<FROM>
<SENDER>0072000</SENDER>
<SERVICE>eService</SERVICE>
<DATE>2019-03-08T12:27:25</DATE>
</FROM>
<TO>
<DEALER DEALERCODE="0072000" MARKETCODE="1000"/>
</TO>
</DELIVERY>
<CONTENT>
<dms:ComplexResponse ErrorCode="430" ErrorDescription="null : PrivacyUE Mancante" Return="false"
xmlns:dms="http://dmsmanagerservice">
<dms:Element Name="DMSVERSION">2.7</dms:Element>
</dms:ComplexResponse>
</CONTENT>
</MESSAGE>
</result>
</ns1:putResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
I am coding in Ruby, and I used Nokogiri and its xpath method to extract the "CONTENT" of the file.
This is the code:
def extrapolate_error(xml)
doc = Nokogiri::XML(File.open(xml))
doc.xpath('//CONTENT')
end
and this is the result:
[#<Nokogiri::XML::Element:0x1c5ba78 name="CONTENT" children=[
#<Nokogiri::XML::Text:0x1c5b940 "\n">,
#<Nokogiri::XML::Element:0x1c5b8bc name="ComplexResponse" namespace=#<Nokogiri::XML::Namespace:0x1c5b88c prefix="dms" href="http://dmsmanagerservice">
attributes=[
#<Nokogiri::XML::Attr:0x1c5b874 name="ErrorCode" value="430">,
#<Nokogiri::XML::Attr:0x1c5b868 name="ErrorDescription" value="null : PrivacyUE Mancante">,
#<Nokogiri::XML::Attr:0x1c5b85c name="Return" value="false">]
children=[#<Nokogiri::XML::Text:0x1c5b118 "\n">,
#<Nokogiri::XML::Element:0x1c5b094 name="Element" namespace=#<Nokogiri::XML::Namespace:0x1c5b88c prefix="dms" href="http://dmsmanagerservice">
attributes=[#<Nokogiri::XML::Attr:0x1c5b058 name="Name" value="DMSVERSION">]
children=[#<Nokogiri::XML::Text:0x1c5abe4 "2.7">]>,
#<Nokogiri::XML::Text:0x1c5aaac "\n">]>,
#<Nokogiri::XML::Text:0x1c5a974 "\n">]>]
Now I need to go into it and select some attributes.
Specifically, I need these:
name="ErrorCode" value="430"
name="ErrorDescription" value="null : PrivacyUE Mancante"
I do not know how to proceed. Can you help me?
The following should work for you, assuming the dms namespace is always the same:
doc.xpath('//CONTENT/dms:ComplexResponse', dms: 'http://dmsmanagerservice')
  .xpath('@ErrorCode | @ErrorDescription')
  .each_with_object({}) do |e, obj|
    obj[e.name] = e.text
  end
#=> {"ErrorCode"=>"430", "ErrorDescription"=>"null : PrivacyUE Mancante"}
You already understand how you got to //CONTENT, so from there we use dms:ComplexResponse to navigate deeper. Since this element is namespaced, we have to provide the namespace reference, e.g. dms: 'http://dmsmanagerservice'.
Then we select the attributes we are interested in, @ErrorCode and @ErrorDescription.
In XPath the pipe | means union: the result contains every node matched by either expression, so both attributes are selected.
Then we are just building a Hash using the name as the key and the text as the value.
XPath Cheatsheet - Useful resource if you need additional reference
Update
You asked about conditionals, so this is what I would propose:
ndoc = Nokogiri::XML(doc)
namespaces = ndoc.collect_namespaces
response = ndoc.xpath("//CONTENT/dms:ComplexResponse", namespaces)
if response.xpath("self::node()[@ErrorCode != '' and @ErrorDescription != '']").any?
  response.xpath("@ErrorCode | @ErrorDescription")
    .each_with_object({}) do |e, obj|
      obj[e.name] = e.text
    end
else
  response.xpath('dms:Element/@Name | dms:Element/text()', namespaces)
    .each_slice(2)
    .map { |s| s.map(&:text) }.to_h
end
This checks whether there is an ErrorCode and an ErrorDescription; if so, it builds the Hash as originally proposed. If not, it returns all the dms:Elements as a Hash, which would be {"DMSVERSION"=>"2.7"} for the elements in this document.

How to add a comment before XML root node, in Nokogiri?

This is what I'm doing:
xml = Nokogiri::XML('<hello/>')
xml.root.add_previous_sibling(
Nokogiri::XML::Comment.new(
xml, '<!-- how are you? -->'
)
)
This is what I'm trying to achieve:
<?xml version="1.0"?>
<!-- how are you? -->
<hello/>
I'm getting:
ArgumentError: A document may not have multiple root nodes.
What is the right way?
The comment should be added to the xml.children NodeSet.
Here is an example:
xml = Nokogiri::XML('<hello/>')
=> #<Nokogiri::XML::Document:0x3fe1db8d0ed0 name="document" children=[#<Nokogiri::XML::Element:0x3fe1db8d0584 name="hello">]>
xml.children.before(Nokogiri::XML::Comment.new(xml, 'how are you?'))
=> #<Nokogiri::XML::Element:0x3fe1db8d0584 name="hello">
xml.to_s
=> "<?xml version=\"1.0\"?>\n<!--how are you?-->\n<hello/>\n"

Can't address XML attribute through XPath in Ruby (using Nokogiri)

I'm trying to filter an XML file to get nodes with a certain attribute. I can successfully filter by node (e.g. //top_manager), but when I try //top_manager[@salary='great'] I get nothing.
<?xml version= "1.0"?>
<employee xmlns="http://www.w3schools.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="employee.xsd">
<top_manager>
<ceo salary="great" respect="enormous" type="extra">
<fname>
Vasya
</fname>
<lname>
Pypkin
</lname>
<hire_date>
19
</hire_date>
<descr>
Big boss
</descr>
</ceo>
<cio salary="big" respect="great" type="intro">
<fname>
Petr
</fname>
<lname>
Pypkin
</lname>
<hire_date>
25
</hire_date>
<descr>
Resposible for information security
</descr>
</cio>
</top_manager>
......
How do I need to correct this code to get what I need?
require 'nokogiri'
f = File.open("employee.xml")
doc = Nokogiri::XML(f)
doc.xpath("//top_manager[@salary='great']").each do |node|
  puts node.text
end
Thank you.
That's because salary is not an attribute of the <top_manager> element; it is an attribute of <top_manager>'s child elements:
//xmlns:top_manager[*[@salary='great']]
The XPath above selects a <top_manager> element that has any child element whose salary attribute equals "great". Or, if you meant to select the children themselves (the <ceo> element in this case):
//xmlns:top_manager/*[@salary='great']

Ruby Hash parsed_response error

BACKGROUND
I am using HTTParty to parse an XML hash response. Unfortunately, when the hash response only has one entry(?), the resulting hash is not indexable. I have confirmed the resulting XML syntax is the same for single and multiple entry(?). I have also confirmed my code works when there are always multiple entries(?) in the hash.
QUESTION
How do I accommodate the single hash entry case and/or is there an easier way to accomplish what I am trying to do?
CODE
require 'httparty'
class Rest
include HTTParty
format :xml
end
def test_redeye
  # rooms and devices
  roomID = Hash.new
  deviceID = Hash.new { |h,k| h[k] = Hash.new }
  rooms = Rest.get(@reIp["theater"] + "/redeye/rooms/").parsed_response["rooms"]
  puts "rooms #{rooms}"
  rooms["room"].each do |room|
    puts "room #{room}"
    roomID[room["name"].downcase.strip] = "/redeye/rooms/" + room["roomId"]
    puts "roomid #{roomID}"
    devices = Rest.get(@reIp["theater"] + roomID[room["name"].downcase.strip] + "/devices/").parsed_response["devices"]
    puts "devices #{devices}"
    devices["device"].each do |device|
      puts "device #{device}"
      deviceID[room["name"].downcase.strip][device["displayName"].downcase.strip] = "/devices/" + device["deviceId"]
      puts "deviceid #{deviceID}"
    end
  end
  say "Done"
end
XML - SINGLE ENTRY
<?xml version="1.0" encoding="UTF-8" ?>
<devices>
<device manufacturerName="Philips" description="" portType="infrared" deviceType="0" modelName="" displayName="TV" deviceId="82" />
</devices>
XML - MULTIPLE ENTRY
<?xml version="1.0" encoding="UTF-8" ?>
<devices>
<device manufacturerName="Denon" description="" portType="infrared" deviceType="6" modelName="Avr-3311ci" displayName="AVR" deviceId="77" />
<device manufacturerName="Philips" description="" portType="infrared" deviceType="0" modelName="" displayName="TV" deviceId="82" />
</devices>
RESULTING ERROR
[Info - Plugin Manager] Matches, executing block
rooms {"room"=>[{"name"=>"Home Theater", "currentActivityId"=>"78", "roomId"=>"-1", "description"=>""}, {"name"=>"Living", "currentActivityId"=>"-1", "roomId"=>"81", "description"=>"2nd Floor"}, {"name"=>"Theater", "currentActivityId"=>"-1", "roomId"=>"80", "description"=>"1st Floor"}]}
room {"name"=>"Home Theater", "currentActivityId"=>"78", "roomId"=>"-1", "description"=>""}
roomid {"home theater"=>"/redeye/rooms/-1"}
devices {"device"=>[{"manufacturerName"=>"Denon", "description"=>"", "portType"=>"infrared", "deviceType"=>"6", "modelName"=>"Avr-3311ci", "displayName"=>"AVR", "deviceId"=>"77"}, {"manufacturerName"=>"Philips", "description"=>"", "portType"=>"infrared", "deviceType"=>"0", "modelName"=>"", "displayName"=>"TV", "deviceId"=>"82"}]}
device {"manufacturerName"=>"Denon", "description"=>"", "portType"=>"infrared", "deviceType"=>"6", "modelName"=>"Avr-3311ci", "displayName"=>"AVR", "deviceId"=>"77"}
deviceid {"home theater"=>{"avr"=>"/devices/77"}}
device {"manufacturerName"=>"Philips", "description"=>"", "portType"=>"infrared", "deviceType"=>"0", "modelName"=>"", "displayName"=>"TV", "deviceId"=>"82"}
deviceid {"home theater"=>{"avr"=>"/devices/77", "tv"=>"/devices/82"}}
room {"name"=>"Living", "currentActivityId"=>"-1", "roomId"=>"81", "description"=>"2nd Floor"}
roomid {"home theater"=>"/redeye/rooms/-1", "living"=>"/redeye/rooms/81"}
devices {"device"=>{"manufacturerName"=>"Philips", "description"=>"", "portType"=>"infrared", "deviceType"=>"0", "modelName"=>"", "displayName"=>"TV", "deviceId"=>"82"}}
device ["manufacturerName", "Philips"]
/usr/local/rvm/gems/ruby-1.9.3-p374@SiriProxy/gems/siriproxy-0.3.2/plugins/siriproxy-redeye/lib/siriproxy-redeye.rb:145:in `[]': can't convert String into Integer (TypeError)
There are a couple of options I see. If you control the endpoint, you could modify the XML being sent to accommodate HTTParty's underlying XML parser, Crack, by putting a type="array" attribute on the devices XML element.
Otherwise, you could check to see what class the device is before indexing into it:
case devices["device"]
when Array
# act on the collection
else
# act on the single element
end
It's much less than ideal whenever you have to do type-checking in a dynamic language, so if you find yourself doing this more than once it may be worth introducing polymorphism or at the very least extracting a method to do this.
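One way to extract that check into a method, as a sketch: the helper name as_list and the shortened sample hashes are assumptions for illustration. Kernel#Array is deliberately avoided here, because calling Array() on a Hash turns it into an array of key/value pairs, which is the same symptom seen in the error output above.

```ruby
# Hypothetical helper: always returns a list, whether the parser produced
# nil, a single Hash, or an Array of Hashes.
def as_list(value)
  case value
  when nil   then []
  when Array then value
  else            [value]
  end
end

# Shapes as seen in the question's parsed responses (fields shortened).
single   = { "device" => { "displayName" => "TV", "deviceId" => "82" } }
multiple = { "device" => [{ "displayName" => "AVR", "deviceId" => "77" },
                          { "displayName" => "TV",  "deviceId" => "82" }] }

[single, multiple].each do |devices|
  as_list(devices["device"]).each do |device|
    puts "#{device["displayName"].downcase.strip} => /devices/#{device["deviceId"]}"
  end
end
```

With this in place, the loop body in the question can stay the same for both response shapes.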

Trying to parse an XML using Nokogiri with Ruby

I am new to programming so bear with me. I have an XML document that looks like this:
File name: PRIDE1542.xml
<ExperimentCollection version="2.1">
<Experiment>
<ExperimentAccession>1015</ExperimentAccession>
<Title>**Protein complexes in Saccharomyces cerevisiae (GPM06600002310)**</Title>
<ShortLabel>GPM06600002310</ShortLabel>
<Protocol>
<ProtocolName>**None**</ProtocolName>
</Protocol>
<mzData version="1.05" accessionNumber="1015">
<cvLookup cvLabel="RESID" fullName="RESID Database of Protein Modifications" version="0.0" address="http://www.ebi.ac.uk/RESID/" />
<cvLookup cvLabel="UNIMOD" fullName="UNIMOD Protein Modifications for Mass Spectrometry" version="0.0" address="http://www.unimod.org/" />
<description>
<admin>
<sampleName>**GPM06600002310**</sampleName>
<sampleDescription comment="Ho, Y., et al., Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002 Jan 10;415(6868):180-3.">
<cvParam cvLabel="NEWT" accession="4932" name="Saccharomyces cerevisiae (Baker's yeast)" value="Saccharomyces cerevisiae" />
</sampleDescription>
</admin>
</description>
<spectrumList count="0" />
</mzData>
</Experiment>
</ExperimentCollection>
I want to extract the text inside <Title>, <ProtocolName>, and <sampleName> and put it into a text file (I marked them with ** to make them easier to see). I have the following code so far (based on posts I saw on this site), but it does not seem to work:
>> require 'rubygems'
>> require 'nokogiri'
>> doc = Nokogiri::XML(File.open("PRIDE_Exp_Complete_Ac_10094.xml"))
>> @ExperimentCollection = doc.css("ExperimentCollection Title").map {|node| node.children.text }
Can someone help me?
Try to access them using xpath expressions. You can enter the path through the parse tree using slashes.
puts doc.xpath( "/ExperimentCollection/Experiment/Title" ).text
puts doc.xpath( "/ExperimentCollection/Experiment/Protocol/ProtocolName" ).text
puts doc.xpath( "/ExperimentCollection/Experiment/mzData/description/admin/sampleName" ).text
