Converting xml into a native Ruby data structure - ruby
I'm grabbing data from an api that is returning xml like this:
<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>
I'm new to deserialization but what I think is appropriate is to parse this xml into a ruby object that I can then reference like objectFoo.seriess.series.frequency that would return 'Quarterly'.
From my searches here and on google there doesn't seem to be an obvious solution to this in Ruby (NOT rails) which makes me think I'm missing something rather obvious. Any ideas?
Edit
I set up a test case based on Winfield's suggestion.
class Exopenstruct
require 'ostruct'
def initialize()
hash = {"seriess"=>{"realtime_start"=>"2013-02-01", "realtime_end"=>"2013-02-01", "series"=>{"id"=>"GDPC1", "realtime_start"=>"2013-02-01", "realtime_end"=>"2013-02-01", "title"=>"Real Gross Domestic Product, 1 Decimal", "observation_start"=>"1947-01-01", "observation_end"=>"2012-10-01", "frequency"=>"Quarterly", "frequency_short"=>"Q", "units"=>"Billions of Chained 2005 Dollars", "units_short"=>"Bil. of Chn. 2005 $", "seasonal_adjustment"=>"Seasonally Adjusted Annual Rate", "seasonal_adjustment_short"=>"SAAR", "last_updated"=>"2013-01-30 07:46:54-06", "popularity"=>"93", "notes"=>"Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States.\n\nFor more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"}}}
object_instance = OpenStruct.new( hash )
end
end
In irb I loaded the rb file and instantiated the class. However, when I tried to access an attribute (e.g. instance.seriess) I received: NoMethodError: undefined method `seriess'
Again apologies if I'm missing something obvious.
You may be better off using standard XML to Hash parsing, such as included with Rails:
object_hash = Hash.from_xml(xml_string)
puts object_hash['seriess']
If you aren't using a Rails stack, you can use a library like Nokogiri for the same behavior.
EDIT: If you're looking for object behavior, using OpenStruct is a great way to wrap the hash for this:
object_instance = OpenStruct.new( Hash.from_xml(xml_string) )
puts object_instance.seriess
NOTE: For deeply nested data, you may need to recursively convert embedded hashes into OpenStruct instances as well. That is, if an attribute above is itself a hash of values, it will remain a Hash and not become an OpenStruct.
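The recursive conversion mentioned in the note can be sketched roughly as follows. This is a minimal sketch, not a library feature: `deep_ostruct` is a hypothetical helper name, and the hash is a trimmed-down copy of the one in the question's test case. It also sidesteps the NoMethodError from the question, where the OpenStruct was stored in a local variable inside `initialize` and so was never reachable from the instance.

```ruby
require 'ostruct'

# Hypothetical helper (not part of OpenStruct): wraps every nested Hash,
# including hashes inside arrays, in an OpenStruct.
def deep_ostruct(value)
  case value
  when Hash
    OpenStruct.new(value.transform_values { |v| deep_ostruct(v) })
  when Array
    value.map { |v| deep_ostruct(v) }
  else
    value
  end
end

# Trimmed-down version of the hash from the question.
hash = {
  "seriess" => {
    "realtime_start" => "2013-02-01",
    "series" => { "id" => "GDPC1", "frequency" => "Quarterly" }
  }
}

data = deep_ostruct(hash)
puts data.seriess.series.frequency   # prints "Quarterly"
```

`transform_values` needs Ruby 2.4 or newer; on older versions the same effect can be had by building the new hash with `each_with_object`.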
I've just started using Damien Le Berrigaud's fork of HappyMapper and I'm really pleased with it. You define simple Ruby classes and include HappyMapper. When you call parse, it uses Nokogiri to slurp in the XML and you get back a complete tree of bona-fide Ruby objects.
I've used it to parse multi-megabyte XML files and found it to be fast and dependable. Check out the README.
One hint: since XML file encoding strings sometimes lie, you may need to sanitize your XML like this:
def sanitize(xml)
xml.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end
before passing it to the #parse method in order to avoid Nokogiri's Input is not proper UTF-8, indicate encoding ! error.
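One caveat worth illustrating: because the call above treats the input as raw bytes, it drops every non-ASCII byte, including valid UTF-8 characters. The sketch below compares it with an alternative based on `String#scrub` (Ruby 2.1+), which removes only the invalid byte sequences. `sanitize_keeping_utf8` is a hypothetical name, and the sample string is made up for illustration.

```ruby
# The approach from the answer: reinterpret the input as binary and drop
# everything that is not plain ASCII.
def sanitize(xml)
  xml.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

# Alternative sketch (hypothetical helper): assume the bytes are meant to be
# UTF-8 and strip only the sequences that are not valid UTF-8.
def sanitize_keeping_utf8(xml)
  xml.dup.force_encoding(Encoding::UTF_8).scrub('')
end

# "café" encoded as UTF-8, followed by one stray invalid byte (0xFF).
raw = "caf\xC3\xA9\xFF!".b

puts sanitize(raw)               # caf!   (the é is lost as well)
puts sanitize_keeping_utf8(raw)  # café!  (only the stray byte is removed)
```

Which one is appropriate depends on whether the XML is expected to contain legitimate non-ASCII text.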
update
I went ahead and cast the OP's example into HappyMapper:
XML_STRING = '<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>'
class Series; end; # fwd reference
class Seriess
include HappyMapper
tag 'seriess'
attribute :realtime_start, Date
attribute :realtime_end, Date
has_many :seriess, Series, :tag => 'series'
end
class Series
include HappyMapper
tag 'series'
attribute 'id', String
attribute 'realtime_start', Date
attribute 'realtime_end', Date
attribute 'title', String
attribute 'observation_start', Date
attribute 'observation_end', Date
attribute 'frequency', String
attribute 'frequency_short', String
attribute 'units', String
attribute 'units_short', String
attribute 'seasonal_adjustment', String
attribute 'seasonal_adjustment_short', String
attribute 'last_updated', DateTime
attribute 'popularity', Integer
attribute 'notes', String
end
def test
Seriess.parse(XML_STRING, :single => true)
end
and here's what you can do with it:
>> a = test
>> a.class
Seriess
>> a.seriess.first.frequency
=> "Quarterly"
>> a.seriess.first.observation_start
=> #<Date: 1947-01-01 ((2432187j,0s,0n),+0s,2299161j)>
>> a.seriess.first.popularity
=> 93
Nokogiri solves the parsing. How you handle the data is up to you; here I use OpenStruct as an example:
require 'nokogiri'
require 'ostruct'
require 'open-uri'
doc = Nokogiri.parse URI.open('http://www.w3schools.com/xml/note.xml') # URI.open, because Kernel#open no longer opens URLs as of Ruby 3.0
note = OpenStruct.new
note.to = doc.at('to').text
note.from = doc.at('from').text
note.heading = doc.at('heading').text
note.body = doc.at('body').text
=> #<OpenStruct to="Tove", from="Jani", heading="Reminder", body="ToveJaniReminderDon't forget me this weekend!\r\n">
This is just a teaser; your problem may be many times bigger. I'm just giving you a starting point to work from.
Edit: searching Google and Stack Overflow, I ran into a possible hybrid between my answer and @Winfield's, using Rails' Hash.from_xml:
> require 'active_support/core_ext/hash/conversions'
> xml = Nokogiri::XML.parse(URI.open('http://www.w3schools.com/xml/note.xml'))
> Hash.from_xml(xml.to_s)
=> {"note"=>{"to"=>"Tove", "from"=>"Jani", "heading"=>"Reminder", "body"=>"Don't forget me this weekend!"}}
Then you can use this hash to, for example, initialize a new ActiveRecord::Base model instance or whatever else you decide to do with it.
http://nokogiri.org/
http://ruby-doc.org/stdlib-1.9.3/libdoc/ostruct/rdoc/OpenStruct.html
https://stackoverflow.com/a/7488299/1740079
If you wanted to convert the xml to a Hash, I've found the nori gem to be the simplest.
Example:
require 'nori'
xml = '<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>'
hash = Nori.new.parse(xml)
hash['seriess']
hash['seriess']['series']
puts hash['seriess']['series']['@frequency']
Note the '@' prefix on frequency: it's an attribute of 'series', not an element.
Related
Time object to String Object to XML and Back Again in Ruby
I am storing Time objects in XML as strings, and I am having trouble figuring out the best way to turn those strings back into Time objects so that I can subtract them. Here is how they are stored in the XML:
<time>
  <category> batin </category>
  <in>2014-10-29 18:20:47 -0400</in>
  <out>2014-10-29 18:20:55 -0400</out>
</time>
The values were created using t = Time.now. I am reading them back from the XML with:
doc = Nokogiri::XML(File.open("time.xml"))
nodes = doc.xpath("//time").each do |node|
  temp = TimeClock.new
  temp.category = node.xpath('category').inner_text
  temp.in = node.xpath('in').inner_text
  temp.out = node.xpath('out').inner_text
  @times << temp
end
What is the best way to convert them back to Time objects? I do not see a method on Time that does this. I found that it is possible to convert to a Date object, but that seems to give me only a format of mm/dd/yyyy, which is just part of what I want. I need to be able to subtract <out>2014-10-29 18:20:55 -0400</out> from <in>2014-10-29 18:20:47 -0400</in>. The XML will at some point be stored based on dates, but I also need the exact time "hh/mm/ss" to perform calculations. Any suggestions?
The time stdlib extends the Time class with parsing/conversion methods.
require 'time'
Time.parse('2014-10-29 18:20:47 -0400')
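Once both strings are parsed, the subtraction the question asks about is ordinary Time arithmetic. A minimal sketch using the two timestamps from the question (the variable names are only illustrative):

```ruby
require 'time'

time_in  = Time.parse('2014-10-29 18:20:47 -0400')
time_out = Time.parse('2014-10-29 18:20:55 -0400')

# Subtracting one Time from another yields the difference in seconds as a Float.
elapsed = time_out - time_in
puts elapsed   # 8.0
```

The parsed objects keep the full date and time, so the same values can still be grouped by date later with `time_in.to_date` (after `require 'date'`).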
I'd do something like:
require 'nokogiri'
require 'time'
doc = Nokogiri::XML(<<EOT)
<xml>
  <time>
    <category>
      <in>2014-10-29 18:20:47 -0400</in>
      <out>2014-10-29 18:20:55 -0400</out>
    </category>
  </time>
</xml>
EOT
times = doc.search('time category').map{ |category|
  in_time, out_time = %w[in out].map{ |n| Time.parse(category.at(n).text) }
  {
    in: in_time,
    out: out_time
  }
}
times # => [{:in=>2014-10-29 15:20:47 -0700, :out=>2014-10-29 15:20:55 -0700}]
Both the DateTime and Time classes allow parsing of a small variety of date/time formats. Some formats can cause explosions but this one is safe. Use DateTime if the date could be before the Unix epoch.
Breaking down this line:
in_time, out_time = %w[in out].map{ |n| Time.parse(category.at(n).text) }
Looking at that in IRB:
>> doc.search('time category').to_html
"<category>\n <in>2014-10-29 18:20:47 -0400</in>\n <out>2014-10-29 18:20:55 -0400</out>\n</category>"
doc.search('time category') returns a NodeSet of all <category> nodes.
>> %w[in out]
[ [0] "in", [1] "out" ]
returns an array of strings.
category.at(n) returns the n node under the <category> node, where n is first 'in', then 'out', and Time.parse converts that node's text into a Time.
Ruby RDF query - extracting simple data from Seq and Bag items
I am receiving xml-serialised RDF (as part of XMP media descriptions, in case that is relevant) and processing it in Ruby. I am trying to work with the rdf gem, although I am happy to look at other solutions. I have managed to load and query the most basic data, but am stuck when trying to build a query for items which contain sequences and bags.
Example XML RDF:
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
  <rdf:Description rdf:about='' xmlns:dc='http://purl.org/dc/elements/1.1/'>
    <dc:date>
      <rdf:Seq>
        <rdf:li>2013-04-08</rdf:li>
      </rdf:Seq>
    </dc:date>
  </rdf:Description>
</rdf:RDF>
My best attempt at putting together a query:
require 'rdf'
require 'rdf/rdfxml'
require 'rdf/vocab/dc11'
graph = RDF::Graph.load( 'test.rdf' )
date_query = RDF::Query.new( :subject => { RDF::DC11.date => :date } )
results = date_query.execute(graph)
results.map { |result| { result.subject.to_s => result.date.inspect } }
=> [{"test.rdf"=>"#<RDF::Node:0x3fc186b3eef8(_:g70100421177080)>"}]
I get the impression that my results at this stage ("query solutions"?) are a reference to the rdf:Seq container, but I am lost as to how to progress. For the example above, I'd expect to end up, eventually, with an array ["2013-04-08"].
When there is incoming data without the rdf:Seq and rdf:li containers, I am able to extract the strings I want using RDF::Query, following examples at http://rdf.rubyforge.org/RDF/Query.html - unfortunately I cannot find any examples of more complex queries or RDF structures processed in Ruby.
Edit: In addition, when I try to find appropriate methods to use with the RDF::Node object, I cannot see any way to explore any further relations it may have:
results[0].date.methods - Object.methods
=> [:original, :original=, :id, :id=, :node?, :anonymous?, :unlabeled?, :labeled?, :to_sym, :resource?, :constant?, :variable?, :between?, :graph?, :literal?, :statement?, :iri?, :uri?, :valid?, :invalid?, :validate!, :validate, :to_rdf, :inspect!, :type_error, :to_ntriples]
# None of the above leads AFAICS to more data in the graph
I know how to get the same data in xpath (well, at least provided we always get the same paths in the serialisation), but feel it is not the best query language to use in this case. It is my backup plan, however, if it turns out to be too complex to implement an RDF-query solution.
I think you're correct when saying "my results at this stage ("query solutions"?) are a reference to the rdf:Seq container". RDF/XML is a really horrible serialisation format; instead, think of the data as a graph. Here is a picture of an RDF:Bag. RDF:Seq works the same way, and the #students in the example is analogous to the #date in your case.
So to get to the date literal, you need to hop one node further in the graph. I'm not familiar with the syntax of this Ruby library, but something like:
require 'rdf'
require 'rdf/rdfxml'
require 'rdf/vocab/dc11'
graph = RDF::Graph.load( 'test.rdf' )
date_query = RDF::Query.new({
  :yourThing => { RDF::DC11.date => :dateSeq },
  :dateSeq => { RDF.type => RDF.Seq, RDF._1 => :dateLiteral }
})
date_query.execute(graph).each do |solution|
  puts "date=#{solution.dateLiteral}"
end
Of course, if you expect the Seq to actually contain multiple dates (otherwise it wouldn't make sense to have a Seq), you will have to match them with RDF._1 => :dateLiteral1, RDF._2 => :dateLiteral2, RDF._3 => :dateLiteral3 etc.
Or, for a more generic solution, match all the properties and objects on the dateSeq with:
:dateSeq => { :property => :dateLiteral }
and then filter out the case where :property ends up being RDF:type, in which case :dateLiteral isn't actually a date but RDF:Seq.
Maybe the library also has a special method to get all of a Seq's contents.
Can I avoid transposing an array in Ruby on Rails?
I have a Rails app that has a COUNTRIES list with full country names and abbreviations created inside the Company model. The array for the COUNTRIES list is used for a select tag on the input form to store abbreviations in the DB. See below. VALID_COUNTRIES is used for validations of abbreviations in the DB. FULL_COUNTRIES is used to display the full country name from the abbreviation.
class Company < ActiveRecord::Base
  COUNTRIES = [["Afghanistan","AF"],["Aland Islands","AX"],["Albania","AL"],...]
  COUNTRIES_TRANSPOSE = COUNTRIES.transpose
  VALID_COUNTRIES = COUNTRIES_TRANSPOSE[1]
  FULL_COUNTRIES = COUNTRIES_TRANSPOSE[0]
  validates :country, inclusion: { in: VALID_COUNTRIES, message: "enter a valid country" }
  ...
end
On the form:
<%= select_tag(:country, options_for_select(Company::COUNTRIES, 'US')) %>
And to convert back to the full country name:
full_country = FULL_COUNTRIES[VALID_COUNTRIES.index(:country)]
This seems like an excellent application for a hash, except the key/value order is wrong. For the select I need:
COUNTRIES = {"Afghanistan" => "AF", "Aland Islands" => "AX", "Albania" => "AL",...}
While to take the abbreviation from the DB and display the full country name I need:
COUNTRIES = {"AF" => "Afghanistan", "AX" => "Aland Islands", "AL" => "Albania",...}
Which is a shame, because COUNTRIES.keys or COUNTRIES.values would give me the validation list (depending on which hash layout is used).
I'm relatively new to Ruby/Rails and am looking for the more Ruby-like way to solve the problem. Here are the questions:
1) Does the transpose occur only once, and if so, when is it executed?
2) Is there a way to specify the FULL_ and VALID_ lists that does not require the transpose?
3) Is there a better or reasonable alternate way to do this? For instance, VALID_COUNTRIES is COUNTRIES[x][1] and FULL_COUNTRIES is COUNTRIES[x][0], but VALID_ must work with the validation.
4) Is there a way to make a hash work with just one hash, rather than one for the select_tag and one for converting the abbreviations in the DB back to full names for display?
1) Does the transpose occur only once, and if so, when is it executed?
Yes. It runs once, when the class body is evaluated (that is, when the class is loaded), because you are assigning the result to constants. If you want it to be evaluated every time, use a lambda:
FULL_COUNTRIES = lambda { COUNTRIES_TRANSPOSE[0] }
2) Is there a way to specify the FULL_ and VALID_ lists that do not require the transpose?
Yes, use map or collect (they are the same thing). Each pair is [full name, abbreviation], so:
VALID_COUNTRIES = COUNTRIES.map(&:last)
FULL_COUNTRIES = COUNTRIES.map(&:first)
3) Is there a better or reasonable alternate way to do this? For instance, VALID_COUNTRIES is COUNTRIES[x][1] and FULL_COUNTRIES is COUNTRIES[x][0], but VALID_ must work with the validation.
See above.
4) Is there a way to make the hash work?
Yes. I am not sure why a hash isn't working: the Rails docs say options_for_select will use hash.to_a.map &:first for the option text and hash.to_a.map &:last for the option value, so the first hash you give should work. If you can clarify why it does not, I can help you more.
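To illustrate point 4, here is a minimal sketch of the single-hash approach in plain Ruby. It assumes the name-to-abbreviation layout from the question; `COUNTRY_NAMES` is a hypothetical constant name, and in the real app these constants would live inside the Company model.

```ruby
# Name => abbreviation, the layout that options_for_select can use directly.
COUNTRIES = {
  "Afghanistan"   => "AF",
  "Aland Islands" => "AX",
  "Albania"       => "AL"
}.freeze

# Abbreviations only, for the inclusion validation.
VALID_COUNTRIES = COUNTRIES.values.freeze

# Abbreviation => name, derived once from the same hash (hypothetical name).
COUNTRY_NAMES = COUNTRIES.invert.freeze

puts COUNTRY_NAMES["AX"]              # Aland Islands
puts VALID_COUNTRIES.include?("AL")   # true
```

`Hash#invert` is safe here only because every abbreviation is unique; duplicate values would silently collapse into one key.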
How to use Koala Facebook Graph API?
I am a Rails newbie. I want to use Koala's Graph API. In my controller:
@graph = Koala::Facebook::API.new('myFacebookAccessToken')
@hello = @graph.get_object("my.Name")
When I do this, I get something like this:
{
"id"=>"123456", "name"=>"First Middle Last", "first_name"=>"First", "middle_name"=>"Middle", "last_name"=>"Last", "link"=>"http://www.facebook.com/MyName", "username"=>"my.name", "birthday"=>"12/12/1212",
"hometown"=>{"id"=>"115200305133358163", "name"=>"City, State"},
"location"=>{"id"=>"1054648928202133335", "name"=>"City, State"},
"bio"=>"This is my awesome Bio.",
"quotes"=>"I am the master of my fate; I am the captain of my soul. - William Ernest Henley\r\n\r\n\"Don't go around saying the world owes you a living. The world owes you nothing. It was here first.\" - Mark Twain",
"work"=>[{"employer"=>{"id"=>"100751133333", "name"=>"Company1"}, "position"=>{"id"=>"105763693332790962", "name"=>"Position1"}, "start_date"=>"2010-08", "end_date"=>"2011-07"}],
"sports"=>[{"id"=>"104019549633137", "name"=>"Sport1"}, {"id"=>"103992339636529", "name"=>"Sport2"}],
"favorite_teams"=>[{"id"=>"105467226133353743", "name"=>"Fav1"}, {"id"=>"19031343444432369133", "name"=>"Fav2"}, {"id"=>"98027790139333", "name"=>"Fav3"}, {"id"=>"104055132963393331", "name"=>"Fav4"}, {"id"=>"191744431437533310", "name"=>"Fav5"}],
"favorite_athletes"=>[{"id"=>"10836600585799922", "name"=>"Fava1"}, {"id"=>"18995689436787722", "name"=>"Fava2"}, {"id"=>"11156342219404022", "name"=>"Fava4"}, {"id"=>"11169998212279347", "name"=>"Fava5"}, {"id"=>"122326564475039", "name"=>"Fava6"}],
"inspirational_people"=>[{"id"=>"16383141733798", "name"=>"Fava7"}, {"id"=>"113529011990793335", "name"=>"fava8"}, {"id"=>"112032333138809855566", "name"=>"Fava9"}, {"id"=>"10810367588423324", "name"=>"Fava10"}],
"education"=>[{"school"=>{"id"=>"13478880321332322233663", "name"=>"School1"}, "type"=>"High School", "with"=>[{"id"=>"1401052755", "name"=>"Friend1"}]}, {"school"=>{"id"=>"11482777188037224", "name"=>"School2"}, "year"=>{"id"=>"138383069535219", "name"=>"2005"}, "type"=>"High School"}, {"school"=>{"id"=>"10604484633093514", "name"=>"School3"}, "year"=>{"id"=>"142963519060927", "name"=>"2010"}, "concentration"=>[{"id"=>"10407695629335773", "name"=>"c1"}], "type"=>"College"}, {"school"=>{"id"=>"22030497466330708", "name"=>"School4"}, "degree"=>{"id"=>"19233130157477979", "name"=>"c3"}, "year"=>{"id"=>"201638419856163", "name"=>"2011"}, "type"=>"Graduate School"}],
"gender"=>"male", "interested_in"=>["female"], "relationship_status"=>"Single", "religion"=>"Religion1", "political"=>"Political1", "email"=>"somename@somecompany.com", "timezone"=>-8, "locale"=>"en_US",
"languages"=>[{"id"=>"10605952233759137", "name"=>"English"}, {"id"=>"10337617475934611", "name"=>"L2"}, {"id"=>"11296944428713061", "name"=>"L3"}],
"verified"=>true, "updated_time"=>"2012-02-24T04:18:05+0000"
}
How do I show this entire hash in the view in a good format? This is what I did, based on what I have learnt so far. In my view:
<% @hello.each do |key, value| %>
  <li><%=h "#{key.to_s} : #{value.to_s}" %></li>
<% end %>
This converts the entire thing to a list. It works well when a key has a single value, but how do I handle nested keys and show only the useful information? For example, I would like it to output
hometown : City, State
rather than something like
hometown : {"id"=>"115200305133358163", "name"=>"City, State"}
Also, for education, can I just say education[school][name] to display the list of schools attended? The error I get is "can't convert String into Integer". I also tried to do this in my controller, but I get the same error:
@fav_teams = @hello["favorite_teams"]["name"]
Also, how can I save all of this to the database? For example, just the list of all schools, not their id numbers.
Update: The way I plan to save to my database is as follows. For a user model, I want to save :facebook_id, :facebook_name, :facebook_firstname, ...., :facebook_hometown. For hometown I only want to save the name. When it comes to education, I want to save school, concentration and type. I have no idea how to achieve this. Looking forward to your help, thanks!
To show the hash in a pretty-printed way, use the gem 'awesome_print'. Add this to your Gemfile:
gem 'awesome_print'
And then run:
bundle install
And then, in your view, you can add:
<%= ap @hello %>
The question of how to store it in the database requires a little more information on what you plan to do with it, but at minimum you could create a model, add a 'facebook_data' column (of type 'text') to that model, and then serialize it (add this line near the top of your model file: serialize :facebook_data). Then you could assign the hash (@hello in this case) to the model's 'facebook_data' property and save the model. But you won't be able to query your database for individual attributes of this Facebook data very easily this way.
You can just do @hello["name"], which will give you the value of name.
Your @hello object should be of the class Koala::Facebook::API::GraphCollection or something similar. You should be able to loop through this object, as your question demonstrates. As for what code to put inside your loop to save records to the database, assuming your Rails user model class name is User, try something like this:
@hello.each do |h|
  u = User.where(:facebook_id => h["id"]).first_or_initialize
  u.update_attributes(
    :name => h["name"],
    :first_name => h["first_name"],
    :hometown_city => h["hometown"]["name"].split(",").first,
    :hometown_state => h["hometown"]["name"].split(",").last.strip
    # ETC, ETC
  )
end
In the case of the hometown and education fields, you're just going to have to traverse the Ruby hash the proper way. See the docs for more info.
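Traversing the hash "the proper way" can be sketched in plain Ruby as follows. The "can't convert String into Integer" error from the question appears because "education" and "favorite_teams" are Arrays of hashes, so they must be iterated (or indexed with an integer) before a string key is used. The `hello` hash below is a trimmed-down stand-in for the real response, and the output key names (`school`, `type`, `concentration`) are only illustrative.

```ruby
# Trimmed-down stand-in for the hash returned by get_object in the question.
hello = {
  "name"      => "First Middle Last",
  "hometown"  => { "id" => "115200305133358163", "name" => "City, State" },
  "education" => [
    { "school" => { "id" => "13478880321332322233663", "name" => "School1" },
      "type"   => "High School" },
    { "school" => { "id" => "10604484633093514", "name" => "School3" },
      "concentration" => [{ "id" => "10407695629335773", "name" => "c1" }],
      "type"   => "College" }
  ]
}

# A nested Hash: chain the string keys.
hometown = hello["hometown"]["name"]

# An Array of Hashes: map over it, then use string keys on each entry.
# Array(nil) is [], which covers entries without a "concentration" key.
schools = hello["education"].map do |entry|
  {
    school:        entry["school"]["name"],
    type:          entry["type"],
    concentration: Array(entry["concentration"]).map { |c| c["name"] }
  }
end

puts hometown                                   # City, State
puts schools.map { |s| s[:school] }.join(", ")  # School1, School3
```

Each element of `schools` contains only the name-level values, which is roughly the shape the question's update asks to store.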
Ruby: How to get a class based on class name and how can I get object's field based on field name?
Question 1
How do I get a class given its class name as a string? For example, say the Product class has a do_something method:
str = "product"
<what should be here based on str?>.do_something
Question 2
How do I get an object's field given the field name as a string? For example, say the Product class has a price field:
str = "price"
product = Product.new
product.<what should be here based on str?> = 1200
Jacob's answer to the first question assumes that you're using Rails and will work fine if you are. If you're not, you can call Kernel::const_get(str) to find an existing constant by name.
send is pure Ruby. There's no need to intern your strings (convert them to symbols) for send, though; plain strings work fine.
Use capitalize and constantize:
str.capitalize.constantize.do_something
Use send:
product.send(str + '=', 1200)
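For reference, both lookups can be sketched in plain Ruby without Rails. The `Product` class below is a hypothetical stand-in for the one in the question, and `public_send` is used in place of `send` so that private methods are not called by accident.

```ruby
# Hypothetical stand-in for the Product class from the question.
class Product
  attr_accessor :price

  def self.do_something
    "did something"
  end
end

# Question 1: class from a string. capitalize only handles single-word
# names ("product" -> "Product"); multi-word names need their own conversion.
class_name = "product"
klass = Object.const_get(class_name.capitalize)
puts klass.do_something   # did something

# Question 2: set and read a field by name. Plain strings work as method names.
field = "price"
product = klass.new
product.public_send("#{field}=", 1200)
puts product.public_send(field)   # 1200
```

If the strings come from user input, it is worth checking them against a list of allowed names before calling `const_get` or `public_send`.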