Time object to String Object to XML and Back Again in Ruby - ruby

I am storing Time objects in XML as strings, and I am having trouble figuring out the best way to turn those strings back into Time objects so that I can subtract one from another.
Here is how they are stored in the XML:
<time>
<category> batin </category>
<in>2014-10-29 18:20:47 -0400</in>
<out>2014-10-29 18:20:55 -0400</out>
</time>
The values are generated using
t = Time.now
and I am reading them back from the XML with
doc = Nokogiri::XML(File.open("time.xml"))
nodes = doc.xpath("//time").each do |node|
temp = TimeClock.new
temp.category = node.xpath('category').inner_text
temp.in = node.xpath('in').inner_text
temp.out = node.xpath('out').inner_text
@times << temp
end
What is the best way to convert them back to Time objects? I do not see a method on Time that does this. I found that it is possible to convert to a Date object, but that only seems to give me mm/dd/yyyy, which is just part of what I want.
I need to be able to subtract
<out>2014-10-29 18:20:55 -0400</out>
from
<in>2014-10-29 18:20:47 -0400</in>
At some point the XML will be stored by date, but I also need the exact time (hh:mm:ss) to perform calculations.
Any suggestions?

The time stdlib extends the class with parsing/conversion methods.
require 'time'
Time.parse('2014-10-29 18:20:47 -0400')
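Once both strings are parsed, the subtraction the question asks about is ordinary Time arithmetic. The following is a minimal sketch using only the two timestamps from the question; the variable names clock_in and clock_out are illustrative, not part of the original code.

```ruby
require 'time'

# Parse the two strings exactly as they appear in the XML.
clock_in  = Time.parse('2014-10-29 18:20:47 -0400')
clock_out = Time.parse('2014-10-29 18:20:55 -0400')

# Subtracting one Time from another yields the difference in seconds as a Float.
elapsed = clock_out - clock_in
puts elapsed # => 8.0

# Time#to_s drops fractional seconds, so a value with no fractional part
# survives the round trip through a string unchanged.
puts Time.parse(clock_in.to_s) == clock_in # => true
```

If sub-second precision matters, storing the value with Time#iso8601(n) rather than Time#to_s keeps the fractional digits.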

I'd do something like:
require 'nokogiri'
require 'time'
doc = Nokogiri::XML(<<EOT)
<xml>
<time>
<category>
<in>2014-10-29 18:20:47 -0400</in>
<out>2014-10-29 18:20:55 -0400</out>
</category>
</time>
</xml>
EOT
times = doc.search('time category').map{ |category|
in_time, out_time = %w[in out].map{ |n| Time.parse(category.at(n).text) }
{
in: in_time,
out: out_time
}
}
times # => [{:in=>2014-10-29 15:20:47 -0700, :out=>2014-10-29 15:20:55 -0700}]
Both the DateTime and Time classes allow parsing of a small variety of date/time formats. Some formats can cause explosions but this one is safe. Use DateTime if the date could be before the Unix epoch.
in_time, out_time = %w[in out].map{ |n| Time.parse(category.at(n).text) }
Looking at that in IRB:
>> doc.search('time category').to_html
"<category>\n <in>2014-10-29 18:20:47 -0400</in>\n <out>2014-10-29 18:20:55 -0400</out>\n</category>"
doc.search('time category') returns a NodeSet of all <category> nodes.
>> %w[in out]
[
[0] "in",
[1] "out"
]
returns an array of strings.
Time.parse(category.at(n).text)
finds the n node under the <category> node, where n is first 'in', then 'out', and parses that node's text into a Time object.

Related

Overwrite issue in ruby Nokogiri

Below is my code. It only prints one record rather than all the records in the file.
file.xpath("//record").each do |node|
$records << {
"id" => node.xpath('id').text,
"first_name" => node.xpath('first_name').text,
"last_name" => node.xpath('last_name').text,
"email" => node.xpath('email').text,
"gender" => node.xpath('gender').text,
"ip_address" => node.xpath('ip_address').text,
"send_date" => node.xpath('send_date').text,
"email_body" => node.xpath('email_body').text,
"email_title" => node.xpath('email_title').text
}
puts $records
end
This is the XML file containing the records:
<record>
<id>1</id>
<first_name>Adiana</first_name>
<last_name>Paulat</last_name>
<email>apaulat0@technorati.com</email>
<gender>Female</gender>
<ip_address>216.250.245.57</ip_address>
<send_date>2017-05-17T23:04:27Z</send_date>
<email_body>​</email_body>
<email_title>Up-sized</email_title>
</record>
<record>
<id>2</id>
<first_name>Jaye</first_name>
<last_name>O'Donnelly</last_name>
<email>jodonnelly1@amazon.com</email>
<gender>Male</gender>
<ip_address>15.66.35.144</ip_address>
<send_date>2017-11-09T05:08:56Z</send_date>
<email_body><script>alert('hi')</script></email_body>
<email_title>real-time</email_title>
</record>
This is the output of the program:
{"id"=>"1", "first_name"=>"Adiana", "last_name"=>"Paulat", "email"=>"apaulat0#technorati.com", "gender"=>"Female", "ip_address"=>"216.250.245.57", "send_date"=>"2017-05-17T23:04:27Z", "email_body"=>"​", "email_title"=>"Up-sized"}
I asked my tutor, and he said that I have an overwrite issue in my code, but I couldn't find it. Can anyone help?
Thank you in advance.
When I write your XML into a file called yo.xml, and run this little program...
require 'nokogiri'
file = Nokogiri::XML(File.open('yo.xml').read())
p file.xpath('//record').size
...I get 1. One record.
This is probably because your XML has no single top-level node, so Nokogiri assumed when it found the first </record> that the XML ended there.
When I wrap your content with <records>...</records>, I get 2.
I believe the issue is that your XML has two root nodes, which isn't allowed (Nokogiri appears to parse the first one and then stop). What you have isn't an XML document but an XML fragment. Nokogiri lets you work with those as well; you just need to parse the input as:
file = Nokogiri::XML::DocumentFragment.parse(xml)
and then you can iterate both of the record elements using the xpath:
file.xpath("./record").each do |node|
I'm not the greatest at XPath, but this seems to work. I'm not sure why //record doesn't work on a fragment while ./record does.
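If you can't change the incoming data, you can parse it as a fragment and convert each record to a hash. Note that Hash.from_xml comes from ActiveSupport, so outside Rails this needs require 'active_support/core_ext/hash/conversions':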
Nokogiri::XML::DocumentFragment.parse(xml_string).search('record').each {|record| p Hash.from_xml(record.to_xml)}

Ruby: Extract and operate on partially extracted Nokogiri objects

require 'nokogiri'
xml = DATA.read
xml_nokogiri = Nokogiri::XML.parse xml
widgets = xml_nokogiri.xpath("//Widget")
dates = widgets.map { |widget| widget.xpath("//DateAdded").text }
puts dates
__END__
<Widgets>
<Widget>
<Price>42</Price>
<DateAdded>04/22/1989</DateAdded>
</Widget>
<Widget>
<Price>29</Price>
<DateAdded>02/05/2015</DateAdded>
</Widget>
</Widgets>
Notes:
This is a contrived example I put together because posting the actual code is impractical (too many dependencies). It can be tested by copy and paste.
widgets is a Nokogiri::XML::NodeSet containing two Nokogiri::XML::Element objects, each of which is the XML fragment for one Widget tag.
I intend to query each of those fragments with xpath again, but an XPath query that starts with // seems to search from the ROOT of the document again, not from the individual fragment.
Any idea why that is? I was expecting dates to hold the tag of each fragment alone.
EDIT: Assume the tags have such a complicated structure that relative addressing (like xpath("DateAdded")) is not practical.
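A leading // is an absolute path, so it always searches from the document root no matter which node you call xpath on. Use .//DateAdded for a relative XPath that matches any nested DateAdded node, or plain DateAdded without leading slashes for immediate children only: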
- dates = widgets.map { |widget| widget.xpath("//DateAdded").text }
# for immediate children use 'DateAdded'
+ dates = widgets.map { |widget| widget.xpath("DateAdded").text }
# for nested elements use './/DateAdded'
+ dates = widgets.map { |widget| widget.xpath(".//DateAdded").text }
#⇒ [
# [0] "04/22/1989",
# [1] "02/05/2015"
#]

Generate Timestamp in Ruby

I am getting a timestamp value in an XML request in the following format:
2014-06-27T12:41:13.0000617Z
I need to build the XML response with the same time format in Ruby. How do I produce this format for a given time?
I would also like to know the name of this format.
Try this:
require 'time'
t = Time.utc(2010,3,30, 5,43,"25.123456789".to_r)
t.iso8601(10)
This produces:
"2010-03-30T05:43:25.1234567890Z"
require 'date'
datetime = DateTime.parse('2014-06-27T12:41:13.0000617Z')
repr = datetime.strftime('%Y-%m-%dT%H:%M:%S.%7NZ')
puts repr
#=> 2014-06-27T12:41:13.0000617Z
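On the naming question: the format is ISO 8601 (the same shape as the XML Schema dateTime type), here in UTC with seven fractional digits. As a small sketch with the time stdlib, the request value can be parsed and written back out, and the current time can be emitted in the same shape; the variable names are illustrative.

```ruby
require 'time'

# Parse the timestamp from the request and emit it again with 7 fractional digits.
received = Time.iso8601('2014-06-27T12:41:13.0000617Z')
puts received.iso8601(7) # => 2014-06-27T12:41:13.0000617Z

# The current time in the same format. Time#utc makes the suffix "Z"
# instead of a numeric offset such as "-04:00".
now_stamp = Time.now.utc.iso8601(7)
puts now_stamp
```

The argument to iso8601 is the number of fractional-second digits, so iso8601(7) matches the sample exactly.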

Ruby RDF query - extracting simple data from Seq and Bag items

I am receiving XML-serialised RDF (as part of XMP media descriptions, in case that is relevant) and processing it in Ruby. I am trying to work with the rdf gem, although I am happy to look at other solutions.
I have managed to load and query the most basic data, but am stuck when trying to build a query for items which contain sequences and bags.
Example XML RDF:
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<rdf:Description rdf:about='' xmlns:dc='http://purl.org/dc/elements/1.1/'>
<dc:date>
<rdf:Seq>
<rdf:li>2013-04-08</rdf:li>
</rdf:Seq>
</dc:date>
</rdf:Description>
</rdf:RDF>
My best attempt at putting together a query:
require 'rdf'
require 'rdf/rdfxml'
require 'rdf/vocab/dc11'
graph = RDF::Graph.load( 'test.rdf' )
date_query = RDF::Query.new( :subject => { RDF::DC11.date => :date } )
results = date_query.execute(graph)
results.map { |result| { result.subject.to_s => result.date.inspect } }
=> [{"test.rdf"=>"#<RDF::Node:0x3fc186b3eef8(_:g70100421177080)>"}]
I get the impression that my results at this stage ("query solutions"?) are a reference to the rdf:Seq container. But I am lost as to how to progress. For the example above, I'd expect to end up, eventually, with an array ["2013-04-08"].
When there is incoming data without the rdf:Seq and rdf:li containers, I am able to extract the strings I want using RDF::Query, following examples at http://rdf.rubyforge.org/RDF/Query.html - unfortunately I cannot find any examples of more complex queries or RDF structures processed in Ruby.
Edit: In addition, when I try to find appropriate methods to use with the RDF::Node object, I cannot see any way to explore any further relations it may have:
results[0].date.methods - Object.methods
=> [:original, :original=, :id, :id=, :node?, :anonymous?, :unlabeled?, :labeled?, :to_sym, :resource?, :constant?, :variable?, :between?, :graph?, :literal?, :statement?, :iri?, :uri?, :valid?, :invalid?, :validate!, :validate, :to_rdf, :inspect!, :type_error, :to_ntriples]
# None of the above leads AFAICS to more data in the graph
I know how to get the same data in xpath (well, at least provided we always get the same paths in the serialisation), but feel it is not the best query language to use in this case (it's my backup plan, however, if it turns out too complex to implement an RDF-query solution)
I think you're correct when saying "my results at this stage ("query solutions"?) are a reference to the rdf:Seq container". RDF/XML is a really horrible serialisation format; instead, think of the data as a graph. Here is a picture of an RDF:Bag. RDF:Seq works the same way, and the #students in the example is analogous to the #date in your case.
So to get to the date literal, you need to hop one node further in the graph. I'm not familiar with the syntax of this Ruby library, but something like:
require 'rdf'
require 'rdf/rdfxml'
require 'rdf/vocab/dc11'
graph = RDF::Graph.load( 'test.rdf' )
date_query = RDF::Query.new({
:yourThing => {
RDF::DC11.date => :dateSeq
},
:dateSeq => {
RDF.type => RDF.Seq,
RDF._1 => :dateLiteral
}
})
date_query.execute(graph).each do |solution|
puts "date=#{solution.dateLiteral}"
end
Of course, if you expect the Seq to actually contain multiple dates (otherwise it wouldn't make sense to have a Seq), you will have to match them with RDF._1 => :dateLiteral1, RDF._2 => :dateLiteral2, RDF._3 => :dateLiteral3 etc.
Or for a more generic solution, match all the properties and objects on the dateSeq with:
:dateSeq => {
:property => :dateLiteral
}
and then filter out the case where :property ends up being RDF:type while :dateLiteral isn't actually the date but RDF:Seq. Maybe the library has also a special method to get all the Seq's contents.

Converting xml into a native Ruby data structure

I'm grabbing data from an api that is returning xml like this:
<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>
I'm new to deserialization but what I think is appropriate is to parse this xml into a ruby object that I can then reference like objectFoo.seriess.series.frequency that would return 'Quarterly'.
From my searches here and on google there doesn't seem to be an obvious solution to this in Ruby (NOT rails) which makes me think I'm missing something rather obvious. Any ideas?
Edit
I set up a test case based upon Winfield's suggestion.
class Exopenstruct
require 'ostruct'
def initialize()
hash = {"seriess"=>{"realtime_start"=>"2013-02-01", "realtime_end"=>"2013-02-01", "series"=>{"id"=>"GDPC1", "realtime_start"=>"2013-02-01", "realtime_end"=>"2013-02-01", "title"=>"Real Gross Domestic Product, 1 Decimal", "observation_start"=>"1947-01-01", "observation_end"=>"2012-10-01", "frequency"=>"Quarterly", "frequency_short"=>"Q", "units"=>"Billions of Chained 2005 Dollars", "units_short"=>"Bil. of Chn. 2005 $", "seasonal_adjustment"=>"Seasonally Adjusted Annual Rate", "seasonal_adjustment_short"=>"SAAR", "last_updated"=>"2013-01-30 07:46:54-06", "popularity"=>"93", "notes"=>"Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States.\n\nFor more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"}}}
object_instance = OpenStruct.new( hash )
end
end
In irb I loaded the rb file and instantiated the class. However, when I tried to access an attribute (e.g. instance.seriess) I received: NoMethodError: undefined method `seriess'
Again apologies if I'm missing something obvious.
You may be better off using standard XML to Hash parsing, such as included with Rails:
object_hash = Hash.from_xml(xml_string)
puts object_hash['seriess']
If you aren't using a Rails stack, you can use a library like Nokogiri for the same behavior.
EDIT: If you're looking for object behavior, using OpenStruct is a great way to wrap the hash for this:
object_instance = OpenStruct.new( Hash.from_xml(xml_string) )
puts object_instance.seriess
NOTE: For deeply nested data, you may need to recursively convert embedded hashes into OpenStruct instances as well. That is, if the seriess attribute above holds a hash of values, it will remain a Hash and not become an OpenStruct.
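The recursive conversion mentioned in the note can be sketched with the standard library alone. The helper name deep_ostruct is made up for this example, and the hash is a shortened version of the one in the question. Separately, the question's test case fails for a different reason: the OpenStruct is assigned to a local variable inside initialize and then discarded, so seriess is being called on the Exopenstruct instance itself.

```ruby
require 'ostruct'

# Hypothetical helper: turns nested hashes (and hashes inside arrays)
# into OpenStruct instances; any other value is returned unchanged.
def deep_ostruct(value)
  case value
  when Hash
    OpenStruct.new(value.transform_values { |v| deep_ostruct(v) })
  when Array
    value.map { |v| deep_ostruct(v) }
  else
    value
  end
end

hash = {
  'seriess' => {
    'realtime_start' => '2013-02-01',
    'series' => { 'id' => 'GDPC1', 'frequency' => 'Quarterly' }
  }
}

object_foo = deep_ostruct(hash)
puts object_foo.seriess.series.frequency # => Quarterly
```

OpenStruct.new accepts string keys and exposes them as methods, which is why the hash produced by an XML-to-Hash parser can be passed in directly.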
I've just started using Damien Le Berrigaud's fork of HappyMapper and I'm really pleased with it. You define simple Ruby classes and include HappyMapper. When you call parse, it uses Nokogiri to slurp in the XML and you get back a complete tree of bona-fide Ruby objects.
I've used it to parse multi-megabyte XML files and found it to be fast and dependable. Check out the README.
One hint: since XML file encoding strings sometimes lie, you may need to sanitize your XML like this:
def sanitize(xml)
xml.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end
before passing it to the #parse method in order to avoid Nokogiri's Input is not proper UTF-8, indicate encoding ! error.
update
I went ahead and cast the OP's example into HappyMapper:
XML_STRING = '<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>'
class Series; end; # fwd reference
class Seriess
include HappyMapper
tag 'seriess'
attribute :realtime_start, Date
attribute :realtime_end, Date
has_many :seriess, Series, :tag => 'series'
end
class Series
include HappyMapper
tag 'series'
attribute 'id', String
attribute 'realtime_start', Date
attribute 'realtime_end', Date
attribute 'title', String
attribute 'observation_start', Date
attribute 'observation_end', Date
attribute 'frequency', String
attribute 'frequency_short', String
attribute 'units', String
attribute 'units_short', String
attribute 'seasonal_adjustment', String
attribute 'seasonal_adjustment_short', String
attribute 'last_updated', DateTime
attribute 'popularity', Integer
attribute 'notes', String
end
def test
Seriess.parse(XML_STRING, :single => true)
end
and here's what you can do with it:
>> a = test
>> a.class
Seriess
>> a.seriess.first.frequency
=> "Quarterly"
>> a.seriess.first.observation_start
=> #<Date: 1947-01-01 ((2432187j,0s,0n),+0s,2299161j)>
>> a.seriess.first.popularity
=> 93
Nokogiri solves the parsing. How to handle the data, is up to you, here I use OpenStruct as an example:
require 'nokogiri'
require 'ostruct'
require 'open-uri'
doc = Nokogiri.parse URI.open('http://www.w3schools.com/xml/note.xml') # Kernel#open no longer accepts URLs on Ruby 3.0+
note = OpenStruct.new
note.to = doc.at('to').text
note.from = doc.at('from').text
note.heading = doc.at('heading').text
note.body = doc.at('body').text
=> #<OpenStruct to="Tove", from="Jani", heading="Reminder", body="ToveJaniReminderDon't forget me this weekend!\r\n">
This is just a teaser; your real problem may be many times bigger. It should give you a starting point to work from.
Edit: while searching Google and Stack Overflow, I ran into a possible hybrid between my answer and @Winfield's, using Rails' Hash.from_xml:
> require 'active_support/core_ext/hash/conversions'
> xml = Nokogiri::XML.parse(URI.open('http://www.w3schools.com/xml/note.xml'))
> Hash.from_xml(xml.to_s)
=> {"note"=>{"to"=>"Tove", "from"=>"Jani", "heading"=>"Reminder", "body"=>"Don't forget me this weekend!"}}
Then you can use this hash to, for example, initialize a new ActiveRecord::Base model instance or whatever else you decide to do with it.
http://nokogiri.org/
http://ruby-doc.org/stdlib-1.9.3/libdoc/ostruct/rdoc/OpenStruct.html
https://stackoverflow.com/a/7488299/1740079
If you wanted to convert the xml to a Hash, I've found the nori gem to be the simplest.
Example:
require 'nori'
xml = '<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>'
hash = Nori.new.parse(xml)
hash['seriess']
hash['seriess']['series']
puts hash['seriess']['series']['@frequency']
Note the '@' prefix on frequency: it is an attribute of 'series', not an element.

Resources