Parsing an XML document in Ruby

I am new to Ruby and XML. I have been given an XML file and asked to do some data manipulation on it.
For example, consider the XML file below:
<?xml version="1.0" encoding="UTF-8"?>
<note>
<to> Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
They are asking me to extract the strings inside the tags (for example "Tove" and "Jani"), do some manipulation on them (for example, replacing "Tove" with "John"), and write the data back to the same XML document.
I know Ruby has a lot of gems and utilities, so there must be a good one for this. If anyone knows a utility that makes this easy, please let me know.
If there is no such utility, some ideas on how to proceed would be helpful.

One way is to use REXML, which ships with Ruby.
Another way is to use Nokogiri (which I would recommend).
Here are some good tutorials that will definitely help you:
http://ruby.bastardsbook.com/chapters/html-parsing/
https://blog.engineyard.com/2010/getting-started-with-nokogiri/
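As a minimal sketch of the read, modify, write-back cycle, here is a REXML version (chosen so it runs without installing gems). It assumes the note above is saved as `note.xml`, a hypothetical file name; the XPath `//to` and the "Tove" to "John" replacement come from the question. The helper name `replace_text` is also just illustrative.

```ruby
require 'rexml/document'

# Replace the text of every element matched by `xpath` whose text (ignoring
# surrounding whitespace) equals `old_text`, and return the new XML string.
def replace_text(xml, xpath, old_text, new_text)
  doc = REXML::Document.new(xml)
  REXML::XPath.each(doc, xpath) do |element|
    element.text = new_text if element.text.to_s.strip == old_text
  end
  doc.to_s
end

# Hypothetical file name; the file is rewritten in place only if it exists.
path = 'note.xml'
if File.exist?(path)
  File.write(path, replace_text(File.read(path), '//to', 'Tove', 'John'))
end
```

With Nokogiri the same loop would use `doc.xpath('//to')`, assign `node.content =`, and write out `doc.to_xml`. Either way the rewritten file is equivalent XML but not necessarily byte-identical to the input (REXML, for instance, writes attribute values in single quotes).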

Related

XPath for Scrapy with Atom namespace

I am trying to scrape data from an XML file using Scrapy.
The file is structured as follows:
<feed xml:base="https://example.com/sap/...">
<entry><id>http://example.com/.../idset</id>
<m:properties>
<d:SubID>xyz</d:JobID>
<d:Posting>123456</d:Posting>
<d:Title>BoringTitle</d:Title>
</m:properties>
</entry>
</feed>
In Scrapy I register the Atom namespace:
xxs = XmlXPathSelector(response)
xxs.register_namespace("atom", "http://www.w3.org/2005/Atom")
And it is possible to extract some of the data with
xxs.xpath("//atom:entry").extract()
However, I cannot find a way to select elements whose names contain a colon (a namespace prefix), such as:
<d:Title>BoringTitle</d:Title>
What would be the right xpath to print the title?
Maybe there is a simple answer; I am a mechanical engineer doing this for a hobby project.
Any help would be appreciated!
Kind regards
John
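As mentioned in the question comments, you need to register a namespace for the d prefix as well (and for m, if your XPath goes through m:properties), using the URIs declared in the feed.
However, in your case it may be simpler to remove all namespaces (Scrapy selectors provide remove_namespaces() for this) and work without them.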
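The underlying rule (every prefix used in an XPath must be mapped to the namespace URI the document declares) is the same outside Scrapy. As an illustration in Ruby with REXML, here is a sketch that maps `atom`, `m` and `d` before querying. The `m` and `d` URIs below are an assumption (they are the usual OData v2 URIs, which an SAP feed like this one typically uses); the question's snippet omits the xmlns declarations, so copy the real values from the feed's root element. The sample also leaves out the mismatched `<d:SubID>` line from the question.

```ruby
require 'rexml/document'

# Assumed namespace URIs: check the real feed's root element for the actual values.
NAMESPACES = {
  'atom' => 'http://www.w3.org/2005/Atom',
  'm' => 'http://schemas.microsoft.com/ado/2007/08/dataservices/metadata',
  'd' => 'http://schemas.microsoft.com/ado/2007/08/dataservices'
}

# Return the text of every <d:Title> under <m:properties>. Each prefix in the
# XPath is resolved through NAMESPACES, not through the document's own prefixes.
def titles(feed_xml)
  doc = REXML::Document.new(feed_xml)
  REXML::XPath.match(doc, '//m:properties/d:Title', NAMESPACES).map(&:text)
end

SAMPLE_FEED = <<~XML
  <feed xmlns="#{NAMESPACES['atom']}" xmlns:m="#{NAMESPACES['m']}" xmlns:d="#{NAMESPACES['d']}">
    <entry>
      <m:properties>
        <d:Posting>123456</d:Posting>
        <d:Title>BoringTitle</d:Title>
      </m:properties>
    </entry>
  </feed>
XML

p titles(SAMPLE_FEED)
```

In the Scrapy code from the question, the equivalent step would be another register_namespace call for "d" (and "m") with the feed's URIs, after which the prefixed XPath matches.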

Apache NiFi - move a top-level element into children (JSON/XML)

I'm new to Apache NiFi and am trying to process XML that looks a bit like this:
<?xml version="1.0" encoding="iso-8859-1"?>
<productCatalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<version>CHANNELS-VERSION-100</version>
<channels>
<channel>
<id>1</id>
<name>Super Channel 1</name>
</channel>
<channel>
<id>2</id>
<name>Super Channel 2</name>
</channel>
</channels>
</productCatalog>
What I want is to read the "version" element, then include it in the "channel" children when I process them further down the pipeline, e.g. to produce something like this (in XML or JSON):
<processedChannel>
<catalogVersion>CHANNELS-VERSION-100</catalogVersion>
<id>2</id>
<name>Super Channel 2</name>
</processedChannel>
I've tried various permutations of XQuery, SplitXml, UpdateAttribute (to put the version in a flow-file attribute rather than the content), etc., but I cannot seem to make the "version" available to all the "channels" downstream. I can get either a flow that contains only the version or a flow with the channels, but I cannot find a way to combine them.
This seems like it should be easy, but I cannot find an obvious solution.
My real use case has a really big XML file, so I am trying to avoid loading it all in one go - I split it as early as possible so I can stream the children more easily. That's why I want to put the version onto the children if possible.
Any help gratefully received!
ForkRecord should do what you want. From your desired output I think you'll want "extract" as the mode, but you could try both and see what you get for output. ForkRecord and the XML Reader/Writer are available as of NiFi 1.7.0.
@mattyb: Thanks for your suggestions. ForkRecord looks really interesting, but it doesn't fit my current use case because it needs a schema. However, the EvaluateXPath and EvaluateXQuery options both seem to work now, even though I previously spent hours playing around with them.
Here's my flow now:
ListFile --> FetchFile --> Evaluate XPath (to get version as flow-file attribute) --> SplitXml --> etc - and now I have the version in my flow-file attributes for the downstream processing, which was what was wanted.
Not sure why it didn't work before, but thanks for prompting me to look at it again.
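Outside NiFi, the transformation being asked for (copy the catalog-level version into every channel) can be sketched in Ruby with REXML to make the target shape concrete. This is an illustration only: it loads the whole document, unlike the streaming EvaluateXPath + SplitXml flow above, and the method name `processed_channels` is hypothetical.

```ruby
require 'rexml/document'

# Copy the catalog-level <version> into every <channel>, returning one
# <processedChannel> element per channel. Loads the whole document, so this
# only illustrates the transformation, not NiFi's streaming behaviour.
def processed_channels(xml)
  doc = REXML::Document.new(xml)
  version = doc.elements['productCatalog/version']&.text
  doc.get_elements('productCatalog/channels/channel').map do |channel|
    out = REXML::Element.new('processedChannel')
    out.add_element('catalogVersion').text = version
    # Copy each child element (<id>, <name>, ...) across unchanged.
    channel.elements.each { |child| out.add_element(child.name).text = child.text }
    out
  end
end
```

Calling `processed_channels(xml).map(&:to_s)` on the catalog above yields one `<processedChannel>` string per channel, each carrying `<catalogVersion>CHANNELS-VERSION-100</catalogVersion>`.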

Ruby: parsing a message from a Confluence XML macro

I am trying to parse out the message that says "This is a test!"
<ac:structured-macro ac:name="warning"><ac:rich-text-body><strong>High</strong> This is a test!</ac:rich-text-body></ac:structured-macro>
I am using Nokogiri in Ruby, and this is as far as I have got. My code looks something like this:
xml = Nokogiri::XML(response)
body = xml.at("body").text
alert_body = alert[3]
I have spent too many hours searching the Confluence REST API documentation, and Google for general XML parsing.
The problems are:
There is no body tag in your example XML.
You're dealing with XML-Namespaces so your selector needs to change.
Your XML sample is incomplete because it's missing the line that would define the namespaces, so this is a bit of a hack, but it should give you an idea of what needs to be done:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<foo xmlns:ac="http://www.w3.org/2005/Atom">
<ac:structured-macro ac:name="warning"><ac:rich-text-body><strong>High</strong> This is a test!</ac:rich-text-body></ac:structured-macro>
</foo>
EOT
doc.at('ac|rich-text-body').text # => "High This is a test!"
Namespaces are useful but they can be a major pain in the neck. Nokogiri makes it pretty easy to deal with them, especially when using CSS selectors. Read Nokogiri's "Searching an HTML / XML Document" page's "Namespaces" section for more information.
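The call above returns "High This is a test!" because `.text` joins all descendant text, including the `<strong>` label. If only the message after the label is wanted, one option is to read just the text nodes that sit directly inside rich-text-body (in Nokogiri, roughly `xpath('text()')` on that node). Here is a sketch with REXML so it runs without gems. Like the answer, it wraps the fragment in a root element that declares `ac`; the URI is a placeholder, not Confluence's, and the method name is hypothetical.

```ruby
require 'rexml/document'

# Placeholder namespace URI (assumption): the fragment declares no xmlns:ac,
# so it is wrapped in a root element that does.
AC_NS = 'urn:example:confluence-ac'

# Return the <strong> label and the remaining message text of the macro's
# rich-text-body, or nil if the fragment has no such element.
def macro_message(fragment)
  doc = REXML::Document.new(%(<root xmlns:ac="#{AC_NS}">#{fragment}</root>))
  body = REXML::XPath.first(doc, '//ac:rich-text-body', 'ac' => AC_NS)
  return nil unless body

  {
    level: body.elements['strong']&.text,
    # Only the text nodes directly inside rich-text-body, i.e. excluding <strong>.
    message: body.texts.map(&:value).join.strip
  }
end
```

For the macro in the question this returns `{ level: "High", message: "This is a test!" }`.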

How to parse a large XML file in Ruby

I need to parse a large (4 GB) XML file in Ruby, preferably with Nokogiri. I've seen a lot of code examples using
File.open(path)
but this takes too much time in my case. Is there an option to read the XML node by node, to avoid loading the whole file at once? Or what would be the fastest way to parse such a large file?
Best,
Phil
You can try using Nokogiri::XML::SAX
The basic way a SAX style parser works is by creating a parser,
telling the parser about the events we’re interested in, then giving
the parser some XML to process. The parser will notify you when it
encounters events you said you would like to know about.
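To make the event-driven idea concrete, here is a minimal sketch using REXML's stream parser and its StreamListener callbacks, swapped in for Nokogiri's SAX parser only so that it runs without extra gems. Nokogiri::XML::SAX::Document follows the same pattern with start_element / characters / end_element callbacks and, being backed by libxml2, will be much faster than pure-Ruby REXML on a 4 GB file. The class name `TagCollector` is hypothetical, and it does not handle the target tag nested inside itself.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# SAX-style listener: REXML calls these methods while it reads the input,
# so no document tree is built in memory.
class TagCollector
  include REXML::StreamListener

  attr_reader :values

  def initialize(tag)
    @tag = tag
    @inside = false
    @buffer = +''
    @values = []
  end

  def tag_start(name, _attrs)
    return unless name == @tag
    @inside = true
    @buffer = +''
  end

  # Receives text with entities already expanded (e.g. "&amp;" becomes "&").
  def text(data)
    @buffer << data if @inside
  end

  def tag_end(name)
    return unless name == @tag
    @values << @buffer.strip
    @inside = false
  end
end
```

Usage with a file on disk (the path is hypothetical): `collector = TagCollector.new('name'); File.open('huge.xml') { |io| REXML::Document.parse_stream(io, collector) }`. For a really large file you would process each value inside tag_end instead of accumulating them all in `@values`.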
I do this kind of work with LibXML http://xml4r.github.io/libxml-ruby/ (require 'xml') and its LibXML::XML::Reader API. It's simpler than SAX and lets you do almost anything. REXML includes a similar API, but it's quite buggy. Stream APIs like the one I mention, or SAX, shouldn't have any problem with huge files. I have not tested Nokogiri.
You may like to try this: https://github.com/amolpujari/reading-huge-xml
HugeXML.read xml, elements_lookup do |element|
# => element{ :name, :value, :attributes}
end
I have also tried Ox.

How to create XML from data fetched from a database

I am fetching data from a database and need to send the response as XML, like the example below.
I want to fetch the data into an array or hash and then render it as XML, or create the XML directly.
Please refer to the XML example below:
<Response>
<Tolls>
<Toll>
<Id>123</Id>
<Name>Bradfield Highway</Name>
<Address>Bradfield Highway, New York</Address>
<Charge>5.95</Charge>
<Location lat="41.145556" lng="-73.995"/>
<EntryRects>
<EntryRect>
<Points>
<Point lat="41.145556" lng="-73.995"/>
<Point lat="41.145556" lng="-73.995"/>
<Point lat="41.145556" lng="-73.995"/>
<Point lat="41.145556" lng="-73.995"/>
</Points>
</EntryRect>
...
</EntryRects>
</Toll>
<Toll>
...
</Toll>
...
</Tolls>
</Response>
Please reply as soon as possible if anyone knows how.
You should use Builder::XmlMarkup, which provides a simple way to create XML markup and data structures.
You don't say what database you are using, but many can generate the XML for you as the result of a query, instead of returning a normal "select" statement's output. That would be the fastest/easiest path because the data is going to have to be returned to your app anyway, so let the DBM do the conversion on the fly.
Second easiest is to use something like Nokogiri, Builder or one of several other gems. They can handle the encoding of non-ASCII characters, specify the correct headers, and make sure the nesting and tag closure are correct. That's why people use those tools: they save a huge amount of coding.
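As a rough illustration of the builder approach, here is a sketch that uses REXML (shipped with Ruby) in place of the Builder or Nokogiri gems to produce the nested Response/Tolls/Toll structure from the question. The row hash keys (:id, :name, :address, :charge, :lat, :lng, :points) are hypothetical stand-ins for whatever columns your query returns, and `tolls_xml` is an illustrative name.

```ruby
require 'rexml/document'

# Build the <Response><Tolls>... document from rows shaped like database
# results. Row keys are hypothetical; map them from your real columns.
def tolls_xml(rows)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')
  tolls = doc.add_element('Response').add_element('Tolls')
  rows.each do |row|
    toll = tolls.add_element('Toll')
    # Simple text children: <Id>, <Name>, <Address>, <Charge>.
    %w[Id Name Address Charge].each do |field|
      toll.add_element(field).text = row.fetch(field.downcase.to_sym).to_s
    end
    toll.add_element('Location', 'lat' => row[:lat].to_s, 'lng' => row[:lng].to_s)
    points = toll.add_element('EntryRects').add_element('EntryRect').add_element('Points')
    row.fetch(:points, []).each do |lat, lng|
      points.add_element('Point', 'lat' => lat.to_s, 'lng' => lng.to_s)
    end
  end
  doc
end
```

`puts tolls_xml(rows).to_s` then prints the document. Note that the library escapes special characters such as `&` in text for you, which is exactly the kind of detail that falls on you when generating XML by hand.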
The last choice should be attempting to do it yourself. Simply because you asked the question, I suspect you don't really understand what goes into creating well-formed XML. It's possible to generate trivial XML output using something like ERB or maybe HAML to help with the nesting, but encoding will fall directly on you. If you insist on doing it, then start reading all the related links on the right side of the Stack Overflow page, plus any XML documentation you can find.
