Overwrite issue in ruby Nokogiri - ruby

Below is my code. It prints only one record rather than all the records in the file:
file.xpath("//record").each do |node|
$records << {
"id" => node.xpath('id').text,
"first_name" => node.xpath('first_name').text,
"last_name" => node.xpath('last_name').text,
"email" => node.xpath('email').text,
"gender" => node.xpath('gender').text,
"ip_address" => node.xpath('ip_address').text,
"send_date" => node.xpath('send_date').text,
"email_body" => node.xpath('email_body').text,
"email_title" => node.xpath('email_title').text
}
puts $records
end
This is the XML file with the records:
<record>
<id>1</id>
<first_name>Adiana</first_name>
<last_name>Paulat</last_name>
<email>apaulat0@technorati.com</email>
<gender>Female</gender>
<ip_address>216.250.245.57</ip_address>
<send_date>2017-05-17T23:04:27Z</send_date>
<email_body>​</email_body>
<email_title>Up-sized</email_title>
</record>
<record>
<id>2</id>
<first_name>Jaye</first_name>
<last_name>O'Donnelly</last_name>
<email>jodonnelly1@amazon.com</email>
<gender>Male</gender>
<ip_address>15.66.35.144</ip_address>
<send_date>2017-11-09T05:08:56Z</send_date>
<email_body><script>alert('hi')</script></email_body>
<email_title>real-time</email_title>
</record>
This is the output of the program:
{"id"=>"1", "first_name"=>"Adiana", "last_name"=>"Paulat", "email"=>"apaulat0@technorati.com", "gender"=>"Female", "ip_address"=>"216.250.245.57", "send_date"=>"2017-05-17T23:04:27Z", "email_body"=>"​", "email_title"=>"Up-sized"}
I asked my tutor, and he said I have an overwrite issue in my code, but I couldn't find it. Can anyone help?
Thank you in advance.

When I write your XML into a file called yo.xml, and run this little program...
require 'nokogiri'
file = Nokogiri::XML(File.open('yo.xml').read())
p file.xpath('//record').size
...I get 1. One record.
This is probably because your XML has no single top-level node, so Nokogiri assumed when it found the first </record> that the XML ended there.
When I wrap your content with <records>...</records>, I get 2.

I believe the issue is that you have two root nodes in your XML document, which isn't allowed (Nokogiri appears to parse the first one and then stop). What you have isn't an XML document, it's an XML fragment. Nokogiri lets you work with those as well; you just need to parse the input as:
file = Nokogiri::XML::DocumentFragment.parse(xml)
and then you can iterate over both record elements using the XPath:
file.xpath("./record").each do |node|
I'm not the greatest at XPath, but this seems to work. I'm not sure why //record doesn't work on a fragment while this does.

If you can't change the incoming data, you can use this to get exactly what you are looking for (note that Hash.from_xml comes from ActiveSupport, so it has to be loaded):
Nokogiri::XML::DocumentFragment.parse(xml_string).search('record').each {|record| p Hash.from_xml(record.to_xml)}

Related

Time object to String Object to XML and Back Again in Ruby

I am storing Time objects in XML as strings, and I am having trouble figuring out the best way to turn those strings back into Time objects so that I can subtract them.
Here is how they are stored in the XML:
<time>
<category> batin </category>
<in>2014-10-29 18:20:47 -0400</in>
<out>2014-10-29 18:20:55 -0400</out>
</time>
They are created using
t = Time.now
I am reading them back from the XML with
doc = Nokogiri::XML(File.open("time.xml"))
nodes = doc.xpath("//time").each do |node|
temp = TimeClock.new
temp.category = node.xpath('category').inner_text
temp.in = node.xpath('in').inner_text
temp.out = node.xpath('out').inner_text
@times << temp
end
What is the best way to convert them back to Time objects? I do not see a method on Time that does this. I found that it is possible to convert to a Date object, but that seems to give me only a mm/dd/yyyy format, which is only part of what I want.
I need to be able to subtract
<out>2014-10-29 18:20:55 -0400</out>
from
<in>2014-10-29 18:20:47 -0400</in>
The XML will at some point be organized by date, but I also need the exact time (hh:mm:ss) to perform calculations.
Any suggestions?
The time stdlib extends the Time class with parsing/conversion methods.
require 'time'
Time.parse('2014-10-29 18:20:47 -0400')
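Since the goal is to subtract the two timestamps, here is a minimal sketch using the values from the question. The Time.strptime line is an optional stricter alternative, and the format string is my assumption about how the values are always written:

```ruby
require 'time'

in_time  = Time.parse('2014-10-29 18:20:47 -0400')
out_time = Time.parse('2014-10-29 18:20:55 -0400')

# Subtracting two Time objects yields the difference in seconds as a Float.
elapsed = out_time - in_time
puts elapsed  # prints 8.0

# Time.strptime is stricter: it raises ArgumentError if the string
# does not match the given format.
strict = Time.strptime('2014-10-29 18:20:47 -0400', '%Y-%m-%d %H:%M:%S %z')
puts strict == in_time  # prints true
```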
I'd do something like:
require 'nokogiri'
require 'time'
doc = Nokogiri::XML(<<EOT)
<xml>
<time>
<category>
<in>2014-10-29 18:20:47 -0400</in>
<out>2014-10-29 18:20:55 -0400</out>
</category>
</time>
</xml>
EOT
times = doc.search('time category').map{ |category|
in_time, out_time = %w[in out].map{ |n| Time.parse(category.at(n).text) }
{
in: in_time,
out: out_time
}
}
times # => [{:in=>2014-10-29 15:20:47 -0700, :out=>2014-10-29 15:20:55 -0700}]
Both the DateTime and Time classes allow parsing of a small variety of date/time formats. Some formats can cause explosions but this one is safe. Use DateTime if the date could be before the Unix epoch.
in_time, out_time = %w[in out].map{ |n| Time.parse(category.at(n).text) }
Looking at that in IRB:
>> doc.search('time category').to_html
"<category>\n <in>2014-10-29 18:20:47 -0400</in>\n <out>2014-10-29 18:20:55 -0400</out>\n</category>"
doc.search('time category') returns a NodeSet of all <category> nodes.
>> %w[in out]
[
[0] "in",
[1] "out"
]
returns an array of strings.
Time.parse(category.at(n).text)
finds the n node under the <category> node, where n is first 'in', then 'out', and parses its text into a Time object.

How to load the xml file from webpage and read particular nodes from xml?

I am planning to load the XML below from a web page and then read particular nodes from it. Filtering condition: if the displayName attribute contains "isc-asr901a", it should pick the first matching node and return the value of the id attribute of the ethernetProtocolEndpointExtendedDTO node.
<queryResponse type="EthernetProtocolEndpoint">
<entity >
<ethernetProtocolEndpointExtendedDTO id="2283315" displayName="4c2b8aa7[2275273_isc- asr901a,GigabitEthernet0/0]">
<name>GigabitEthernet0/0</name>
<adminStatus>UP</adminStatus>
</ethernetProtocolEndpointExtendedDTO>
</entity>
<entity >
<ethernetProtocolEndpointExtendedDTO id="2283315" displayName="4c2b8aa7[2275273_isc-asr901a,GigabitEthernet0/0]">
<name>GigabitEthernet0/0</name>
<adminStatus>UP</adminStatus>
</ethernetProtocolEndpointExtendedDTO>
</entity>
</queryResponse>
I am planning to do this in Ruby, but I am new to Ruby. Could someone help me with this? Which parser makes this easy? I am using the code below, but it does not return any value.
strurl = "https://.."
doc = Nokogiri::HTML(open(strurl))
doc.xpath('//queryResponse/entity/ethernetProtocolEndpointDTO[@displayName="[^"]*isc-asr901a[^"]*]').each do |node|
puts node['id']
end
Thanks,
Chandana
You need to use Nokogiri::XML, not Nokogiri::HTML, since this is XML. Furthermore, you had a typo in the element name: you wrote ethernetProtocolEndpointDTO instead of ethernetProtocolEndpointExtendedDTO.
Also, XPath 1.0 has no regex matching, so you should use contains() to find the display names that contain your string:
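require 'nokogiri'
require 'open-uri'
strurl = "https://.."
# URI.open comes from open-uri; older Rubies used Kernel#open here
doc = Nokogiri::XML(URI.open(strurl))
doc.xpath('//queryResponse/entity/ethernetProtocolEndpointExtendedDTO[contains(@displayName, "isc-asr901a")]').each do |node|
puts node['id']
end
# => 2283315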

Ruby RDF query - extracting simple data from Seq and Bag items

I am receiving XML-serialised RDF (as part of XMP media descriptions, in case that is relevant) and processing it in Ruby. I am trying to work with the rdf gem, although I am happy to look at other solutions.
I have managed to load and query the most basic data, but am stuck when trying to build a query for items which contain sequences and bags.
Example XML RDF:
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<rdf:Description rdf:about='' xmlns:dc='http://purl.org/dc/elements/1.1/'>
<dc:date>
<rdf:Seq>
<rdf:li>2013-04-08</rdf:li>
</rdf:Seq>
</dc:date>
</rdf:Description>
</rdf:RDF>
My best attempt at putting together a query:
require 'rdf'
require 'rdf/rdfxml'
require 'rdf/vocab/dc11'
graph = RDF::Graph.load( 'test.rdf' )
date_query = RDF::Query.new( :subject => { RDF::DC11.date => :date } )
results = date_query.execute(graph)
results.map { |result| { result.subject.to_s => result.date.inspect } }
=> [{"test.rdf"=>"#<RDF::Node:0x3fc186b3eef8(_:g70100421177080)>"}]
I get the impression that my results at this stage ("query solutions"?) are a reference to the rdf:Seq container. But I am lost as to how to progress. For the example above, I'd expect to end up, eventually, with an array ["2013-04-08"].
When there is incoming data without the rdf:Seq and rdf:li containers, I am able to extract the strings I want using RDF::Query, following examples at http://rdf.rubyforge.org/RDF/Query.html - unfortunately I cannot find any examples of more complex queries or RDF structures processed in Ruby.
Edit: In addition, when I try to find appropriate methods to use with the RDF::Node object, I cannot see any way to explore any further relations it may have:
results[0].date.methods - Object.methods
=> [:original, :original=, :id, :id=, :node?, :anonymous?, :unlabeled?, :labeled?, :to_sym, :resource?, :constant?, :variable?, :between?, :graph?, :literal?, :statement?, :iri?, :uri?, :valid?, :invalid?, :validate!, :validate, :to_rdf, :inspect!, :type_error, :to_ntriples]
# None of the above leads AFAICS to more data in the graph
I know how to get the same data in xpath (well, at least provided we always get the same paths in the serialisation), but feel it is not the best query language to use in this case (it's my backup plan, however, if it turns out too complex to implement an RDF-query solution)
I think you're correct when saying "my results at this stage ("query solutions"?) are a reference to the rdf:Seq container". RDF/XML is a really horrible serialisation format; instead, think of the data as a graph. Here is a picture of an RDF:Bag. RDF:Seq works the same way, and the #students in the example is analogous to the #date in your case.
So to get to the date literal, you need to hop one node further in the graph. I'm not familiar with the syntax of this Ruby library, but something like:
require 'rdf'
require 'rdf/rdfxml'
require 'rdf/vocab/dc11'
graph = RDF::Graph.load( 'test.rdf' )
date_query = RDF::Query.new({
:yourThing => {
RDF::DC11.date => :dateSeq
},
:dateSeq => {
RDF.type => RDF.Seq,
RDF._1 => :dateLiteral
}
})
date_query.execute(graph).each do |solution|
puts "date=#{solution.dateLiteral}"
end
Of course, if you expect the Seq to actually contain multiple dates (otherwise it wouldn't make sense to have a Seq), you will have to match them with RDF._1 => :dateLiteral1, RDF._2 => :dateLiteral2, RDF._3 => :dateLiteral3 etc.
Or for a more generic solution, match all the properties and objects on the dateSeq with:
:dateSeq => {
:property => :dateLiteral
}
and then filter out the case where :property ends up being RDF:type, in which :dateLiteral isn't actually a date but RDF:Seq. Maybe the library also has a special method to get all of a Seq's contents.

Any string to XML in Ruby

I am trying to convert an arbitrary string (which is built in XML format) into XML, so I can apply the "to_hash" function to it.
This is what I have:
model = live_requests[3]
parser = XML::Parser.string(model)
model_xml = parser.parse
puts model.to_hash
Now why am I getting an error when 'model_xml' should be an XML document?
I am using LibXML, by the way.
http://libxml.rubyforge.org/rdoc/index.html
LibXML does not support the to_hash method. If you are looking for a way to do this that doesn't require traversing XML nodes and building the hash manually, you should take a look at Nori.
Nori.parse("<tag>This is the contents</tag>")
# => { 'tag' => 'This is the contents' }
If you want to learn how to traverse Libxml's node trees take a look at the answer to this question.

How to retrieve the nokogiri processing instruction attributes?

I am parsing the XML using Nokogiri.
I am able to retrieve the stylesheets. But not the attributes of each stylesheet.
1.9.2p320 :112 >style = xml.xpath('//processing-instruction("xml-stylesheet")').first
=> #<Nokogiri::XML::ProcessingInstruction:0x5459b2e name="xml-stylesheet">
style.name
=> "xml-stylesheet"
style.content
=> "type=\"text/xsl\" href=\"CDA.xsl\""
Is there an easy way to get the type and href attribute values?
Or is the only way to parse the content (style.content) of the processing instruction?
I solved this problem by following the instructions in the answer below.
Can Nokogiri search for "?xml-stylesheet" tags?
I added a new to_element method to the Nokogiri::XML::ProcessingInstruction class:
class Nokogiri::XML::ProcessingInstruction
def to_element
document.parse("<#{name} #{content}/>")
end
end
style = xml.xpath('//processing-instruction("xml-stylesheet")').first
element = style.to_element
To retrieve the href attribute value:
element.attribute('href').value
Can't you do this?
style.content.attribute['type'] # or attr['type'] I am not sure
style.content.attribute['href'] # or attr['href'] I am not sure
Check this question: How to access attributes using Nokogiri.
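If parsing style.content directly turns out to be acceptable, a regex-based sketch in plain Ruby could look like this. It assumes the content is a simple run of name="value" pairs, as in the question; the pattern and the pseudo_attrs name are my own, and nothing here is a Nokogiri API:

```ruby
# The content string returned by style.content in the question.
content = 'type="text/xsl" href="CDA.xsl"'

# Match name="value" or name='value' pairs and collect them into a hash.
# The backreference \2 requires the closing quote to match the opening one.
pseudo_attrs = content.scan(/([\w:-]+)\s*=\s*(["'])(.*?)\2/)
                      .map { |name, _quote, value| [name, value] }
                      .to_h

puts pseudo_attrs['type']  # prints text/xsl
puts pseudo_attrs['href']  # prints CDA.xsl
```

This avoids reopening a Nokogiri class, at the cost of not handling escaped characters inside the values.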
