Transforming XML to ruby code - ruby

What's a good way to transform XML to Ruby code? I've got a GraphML file containing information about a graph structure. I want to instantiate a graph from that with Ruby objects.
Currently I use XPath to do this in a procedural way. I know there's also a more declarative way to do it with XSLT.
Do you know other ways? What would you suggest? Any experience with this?

I don't quite understand why you would want to transform the GraphML data into Ruby code, rather than using Ruby to parse the GraphML data into Ruby object instances?
I made this example as an exercise: https://github.com/endymion/GraphML-parsing-exercise
It uses Nokogiri to parse the XML, then XPath to select nodes, then it iterates through the nodes, instantiating Ruby object instances: https://github.com/endymion/GraphML-parsing-exercise/blob/master/parse.rb
Is that roughly what you're looking for?
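A minimal sketch of that parse-then-instantiate approach, for illustration only. It is not the code from the linked repository. It uses REXML, which ships with Ruby, as a stand-in for Nokogiri so that it runs without extra gems; with Nokogiri the same XPath expressions would go through doc.xpath with the same namespace hash. The GraphNode and GraphEdge classes, the load_graph helper and the sample document are all hypothetical.

```ruby
require "rexml/document"

# Hypothetical plain-Ruby graph classes; the names are illustrative only.
GraphNode = Struct.new(:id, :edges)
GraphEdge = Struct.new(:source, :target)

# Made-up sample input in the GraphML namespace.
GRAPHML = <<~XML
  <graphml xmlns="http://graphml.graphdrawing.org/xmlns">
    <graph id="G" edgedefault="directed">
      <node id="n0"/>
      <node id="n1"/>
      <node id="n2"/>
      <edge source="n0" target="n1"/>
      <edge source="n1" target="n2"/>
    </graph>
  </graphml>
XML

# Parses the document, then builds one GraphNode per <node> and one
# GraphEdge per <edge>. Returns a hash of node id => GraphNode.
def load_graph(xml)
  doc = REXML::Document.new(xml)
  ns = { "g" => "http://graphml.graphdrawing.org/xmlns" }

  nodes = {}
  REXML::XPath.each(doc, "//g:node", ns) do |el|
    id = el.attributes["id"]
    nodes[id] = GraphNode.new(id, [])
  end

  REXML::XPath.each(doc, "//g:edge", ns) do |el|
    # fetch raises KeyError if an edge refers to an unknown node id.
    source = nodes.fetch(el.attributes["source"])
    target = nodes.fetch(el.attributes["target"])
    source.edges << GraphEdge.new(source, target)
  end

  nodes
end

graph = load_graph(GRAPHML)
graph.each_value do |node|
  targets = node.edges.map { |edge| edge.target.id }
  puts "#{node.id} -> #{targets.join(', ')}"
end
```

Collecting all nodes before wiring up the edges means an edge can refer to a node that appears later in the file.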

Related

How to parse large xml file in ruby

I need to parse a large (4 GB) XML file in Ruby, preferably with Nokogiri. I've seen a lot of code examples using
File.open(path)
but this takes too much time in my case. Is there an option to read the XML node by node, to avoid loading the whole file at once? Or what would be the fastest way to parse such a large file?
Best,
Phil
You can try using Nokogiri::XML::SAX
The basic way a SAX style parser works is by creating a parser,
telling the parser about the events we’re interested in, then giving
the parser some XML to process. The parser will notify you when it
encounters events you said you would like to know about.
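The callback style described above can be sketched as follows. This is a hedged illustration, not a tuned solution for a 4 GB file: it uses REXML's stream listener, which ships with Ruby, as a stand-in so that it runs without extra gems. A Nokogiri handler follows the same pattern by subclassing Nokogiri::XML::SAX::Document and overriding start_element, characters and end_element. The RecordListener class and the record/name element names are assumptions made up for the example.

```ruby
require "rexml/document"
require "rexml/streamlistener"
require "stringio"

# Counts <record> elements and collects the text of their <name> children
# as parse events arrive, without building a document tree.
class RecordListener
  include REXML::StreamListener

  attr_reader :count, :names

  def initialize
    @count = 0
    @names = []
    @in_name = false
  end

  # Called for every opening tag.
  def tag_start(name, _attrs)
    @count += 1 if name == "record"
    return unless name == "name"

    @in_name = true
    @names << String.new
  end

  # Called for character data; text may arrive in more than one chunk,
  # so it is appended rather than assigned.
  def text(chars)
    @names.last << chars if @in_name
  end

  # Called for every closing tag.
  def tag_end(name)
    @in_name = false if name == "name"
  end
end

# A StringIO stands in for File.open(path) here.
io = StringIO.new(<<~XML)
  <records>
    <record><name>alpha</name></record>
    <record><name>beta</name></record>
  </records>
XML

listener = RecordListener.new
REXML::Document.parse_stream(io, listener)
puts "#{listener.count} records: #{listener.names.join(', ')}"
```

The handler keeps only the state it needs (a counter and a flag), which is what keeps memory use independent of the document's size.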
I do this kind of work with LibXML (http://xml4r.github.io/libxml-ruby/, require 'xml') and its LibXML::XML::Reader API. It's simpler than SAX and lets you do almost everything. REXML also includes a similar API, but it's quite buggy. Streaming APIs like the Reader or SAX shouldn't have any problem with huge files. I have not tested Nokogiri.
You may like to try this out: https://github.com/amolpujari/reading-huge-xml
HugeXML.read xml, elements_lookup do |element|
# => element{ :name, :value, :attributes}
end
I also tried using ox

Nokogiri: ids Vs hierarchy xpath performance

I have to write the XML schema for a hierarchically organized dataset. It will be parsed by Nokogiri for information retrieval. My question is: from a performance point of view, is it better to respect the hierarchy or to flatten it?
E.g.
<item_1 id="id_1">
<item_2 id="id_2">value</item_2>
</item_1>
or
<item id_1="id_1" id_2="id_2">value</item>
I know that multiple attributes should be avoided as far as readability and maintainability are concerned, but performance is my priority.
If you want the absolute fastest performance and the documents are large, you probably don't want to use XPath at all. A SAX (or Reader) filter will be the fastest.
But if you are going to have Nokogiri parse the document and create a DOM for XPath, I don't think it will make much difference whether you query using:
doc.xpath('/item1[@id=x]/item2[@id=y]') #first case
or
doc.xpath('/item[@id_1=x and @id_2=y]') #second case
Of course, benchmarking these two solutions against your real data is the only way to know for sure.
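A rough harness for such a benchmark might look like the sketch below. It is only a starting point: the documents are generated, the ids (a150, b150) and the item count are made up, and REXML is used as a stand-in so that the script runs with Ruby's bundled libraries. To measure what actually matters, replace the REXML calls with Nokogiri::XML(xml) and doc.xpath and load your real data.

```ruby
require "benchmark"
require "rexml/document"

item_count = 200 # made-up size; use your real data instead

# Hierarchical layout: ids live on nested elements.
nested_xml = "<root>" + (1..item_count).map { |i|
  %(<item_1 id="a#{i}"><item_2 id="b#{i}">v#{i}</item_2></item_1>)
}.join + "</root>"

# Flattened layout: both ids are attributes of one element.
flat_xml = "<root>" + (1..item_count).map { |i|
  %(<item id_1="a#{i}" id_2="b#{i}">v#{i}</item>)
}.join + "</root>"

nested_doc = REXML::Document.new(nested_xml)
flat_doc = REXML::Document.new(flat_xml)

nested_query = %(/root/item_1[@id="a150"]/item_2[@id="b150"])
flat_query = %(/root/item[@id_1="a150" and @id_2="b150"])

# Check that both queries find the same value before timing them.
nested_hit = REXML::XPath.first(nested_doc, nested_query)
flat_hit = REXML::XPath.first(flat_doc, flat_query)

Benchmark.bm(8) do |bm|
  bm.report("nested:") { 20.times { REXML::XPath.first(nested_doc, nested_query) } }
  bm.report("flat:")   { 20.times { REXML::XPath.first(flat_doc, flat_query) } }
end
```

Timings from REXML say nothing about Nokogiri, so only the shape of the harness carries over.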

Using Boost Property Tree to replace DOM Parser

I need to write an XML parser using Boost Property Tree which can replace an existing MSXML DOM parser. Basically, my code should return the list of child nodes, the number of child nodes, and so on. Can this be achieved using Property Tree? For example: GetfirstChild(), selectNodes(), Getlength(), etc.
I saw a lot of APIs related to Boost Property Tree, but the documentation seems to be bare minimum and confusing. As of now, I am able to parse the entire XML using BOOST_FOREACH. But the path to each node is hard coded which will not serve my purpose.
boost::property_tree can parse XML, and since it is a tree you can use it as a substitute for an XML DOM. However, the library is not intended to be a full-fledged XML parser and is not compliant with the XML standard: for instance, it can successfully parse non-well-formed XML input, and it doesn't support some XML features. So it's your choice: if you want a simple interface to simple XML configuration, then yes, you should use boost::property_tree.

Customisable, reversible XML serialisation in Ruby

Ox and similar libraries provide serialisation of Ruby objects to XML, but are there any libraries which allow you to define the form that serialisation takes?
Essentially, is there a library akin to (haml|slim|mustache) which allows you to define the mapping of a hash (let's say) to an XML document, but which can also parse the XML document and generate the original hash?
(assuming all the elements of the hash are mapped)
If the mapping is not too exotic, you might be able to do it with representable. It also supports JSON, by the way.

Is there an XPath equivalent for Linq to XML?

I have been using Linq to XML for a few hours and while it seems lovely and powerful when it comes to loops and complex selections, it doesn't seem so good for situations where I just want to select a single node value which XPath seems to be good at.
I may be missing something obvious here but is there a way to use XPath and Linq to XML together without having to parse the document twice?
You can still use XPath, with the XPathEvaluate, XPathSelectElement and XPathSelectElements extension methods. You can also call CreateNavigator to create an XPathNavigator.
