Can't Read XML File Using SAX Parser with Nokogiri - Ruby

I am using Ruby 1.9.3 with Rails 3.1. I need to parse a file like the one below. When I open it in a browser, the tags are not laid out in order: after the <item> tag the data is run together, and there is an
<?xml version="1.0" encoding="utf-8"?>
declaration. When I open the file in Sublime Text, it shows this right after the <item>:
<![CDATA[<?xml version="1.0" encoding="utf-8"?>
and ]]> appears just before the </item>. The data that needs to be parsed is inside this <item></item>. When I parse the file with Nokogiri's SAX parse_file method, only start_element and end_element are called. If I manually edit the file to remove the statements above, the characters method is called and I can fetch the data. Below is a sample of the file. Is there any other way?
<batch transactionType="HC"><item><?xml version="1.0" encoding="utf-8"?><C><CI><Ve>00501</Ve></CI></C></item></batch>

You can do it easily using "xml-simple". Assuming your XML file name is "test.xml", first install the gem:
gem install xml-simple
Then, you can try:
require "xmlsimple"
abc = XmlSimple.xml_in File.read("test.xml")
puts abc['item']
The output should be:
{"C"=>[{"CI"=>[{"Ve"=>["00501"]}]}]}

Related

Ruby gem Diffy not returning differences

I need to compare two XML files and display the differences in an HTML report. To do this, I installed the Ruby gem Diffy (plus the rspec and diff-lcs gems, as directed by the Diffy documentation), but it does not seem to be working properly: no differences are returned.
I have two XML files I want to compare.
XML file one:
<?xml version="1.0" encoding="UTF-8"?>
<SourceDetails>
<Origin>Origin</Origin>
<Identifier>Identifier</Identifier>
<Timestamp>2001-12-31T12:00:00</Timestamp>
</SourceDetails>
<AsOfDate>2001-01-01</AsOfDate>
<Instrument>
<ASXExchangeSecurityIdentifier>ASX</ASXExchangeSecurityIdentifier>
</Instrument>
<Rate>0.0</Rate>
XML file two:
<?xml version="1.0" encoding="UTF-8"?>
<SourceDetails>
<Origin>FEED</Origin>
<Identifier>IR</Identifier>
<Timestamp>2017-01-01T02:11:01Z</Timestamp>
</SourceDetails>
<AsOfDate>2017-01-02</AsOfDate>
<Instrument>
<CommonCode>GB0</CommonCode>
</Instrument>
<Rate>0.69</Rate>
When I supply the two XML files as arguments to the Diffy function:
puts Diffy::Diff.new('xmldoc1', 'xmldoc2', :source => 'files').to_s(:html)
no differences are returned. When I store the two XML files in String variables and pass these variables to the Diffy function:
puts Diffy::Diff.new(doc1, doc2, :include_plus_and_minus_in_html => true).to_s(:html)
again no differences are returned. To find out whether my XML was causing the problem, I also tried passing two different strings to the Diffy function:
puts Diffy::Diff.new("Hello how are you", "Hello how are you\nI'm fine\nThat's great\n")
but this also returned nothing, even though there are clear differences.
Does anyone know what the problem may be?
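Diffy does not compute differences itself; it shells out to the system diff program. One thing worth ruling out, then, is whether that program is available and producing output in the environment where the script runs (a stock Windows install, for example, does not ship one). The helper below, raw_unified_diff, is a hypothetical diagnostic that writes two strings to temporary files and runs diff -u on them directly, roughly what Diffy does internally; it returns nil when no diff executable can be found.

```ruby
require "open3"
require "tempfile"

# Runs the system `diff -u` on two strings. Returns the diff text
# (empty when the inputs are identical), or nil if `diff` is not on the PATH.
def raw_unified_diff(a, b)
  Tempfile.create("left") do |left|
    Tempfile.create("right") do |right|
      left.write(a)
      left.flush
      right.write(b)
      right.flush
      # diff exits with status 1 when the files differ, so the status is ignored.
      out, _status = Open3.capture2("diff", "-u", left.path, right.path)
      out
    end
  end
rescue Errno::ENOENT
  nil # no `diff` executable found
end

result = raw_unified_diff("Hello how are you\n", "Hello how are you\nI'm fine\nThat's great\n")
puts(result.nil? ? "no diff executable on PATH" : result)
```

If this prints the "no diff executable" message, Diffy has nothing to call; if it prints a unified diff, the problem more likely lies in how Diffy is being invoked.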

How to turn a file into a Nokogiri::XML object?

I have a sample XML file (let's call it example.xml for the sake of this question) and want to turn it into a Nokogiri object.
According to the documentation and lots of other online sources, this should work:
xml = Nokogiri::XML(File.read("example.txt"))
But the value of xml.to_xml is only:
"<?xml version=\"1.0\"?>\n"
In other words, it's ignoring the rest of the file. There are many tags afterwards and none of them are in the xml object.
How do I get Nokogiri to get all the tags?
Here's the XML I'm using:
<? xml version="1.0" encoding="UTF-8" ?>
<Document>
<Test>Test</Test>
</Document>
It looks like you are trying to parse an invalid XML doc.
This can be fixed by removing the spaces in the XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
<Document>
<Test>Test</Test>
</Document>
How I figured this out
By default, when Nokogiri encounters errors while parsing a document, it records them in the document's errors array:
xml = Nokogiri::XML(File.read("example.txt"))
p xml.errors
# => [#<Nokogiri::XML::SyntaxError: xmlParsePI : no target name>, #<Nokogiri::XML::SyntaxError: Start tag expected, '<' not found>]
You can also configure Nokogiri to raise an exception if it has parsing errors:
xml = Nokogiri::XML(File.read("example.txt")) do |config|
config.strict
end
Both of these approaches show that there were issues parsing the document.

Why can't I get a result from an XPath with namespace in the root element? [duplicate]

This is probably an XML namespace newbie question, but I can't figure out how to get an XPath expression to work with the following truncated XML, which has this particular root element:
<?xml version="1.0" encoding="UTF-8"?>
<CreateOrUpdateEventsRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://dhamma.org" version="3-0-0">
<LanguageKey>
<IsoCode>en</IsoCode>
</LanguageKey>
<Publish>
<Value>true</Value>
</Publish>
<Events>
<Event>
<EventKey>
<LocationKey>
<SubDomain>rasmi</SubDomain>
</LocationKey>
<EventId>10DayPDFStdTag</EventId>
</EventKey>
</Event>
</Events>
</CreateOrUpdateEventsRequest>
Using Ruby and Nokogiri (with a just updated libxml2), it works fine with XPath only if I delete all the extra info in the root element, making it:
<CreateOrUpdateEventsRequest>
Otherwise nothing works:
doc.xpath("//CreateOrUpdateEventsRequest") # => [] with the original header, an array of nodes with the modified header
doc.xpath("//LanguageKey") # => [] with the original header, an array of nodes with the modified header
doc.xpath("//xmlns:LanguageKey") # => undefined namespace prefix with the original header
How do I address namespaces like this with XPath?
Many thanks for the help.
The answer seems to be that the XML re-declared XMLNS when it should have declared the namespace with a prefix as in xmlns:myns.
From www.w3.org:
The XML specification reserves all names beginning with the letters 'x', 'm', 'l' in any combination of upper- and lower-case for use by the W3C. To date three such names have been given definitions—although these names are not in the XML namespace, they are listed here as a convenience to readers and users:
xml: See http://www.w3.org/TR/xml/#NT-XMLDecl and http://www.w3.org/TR/xml-names/#xmlReserved
xmlns: See http://www.w3.org/TR/xml-names/#ns-decl
xml-stylesheet: See The xml-stylesheet processing instruction
I don't use Nokogiri or Ruby, but you need to register a prefix for the namespace http://dhamma.org.
From reading http://nokogiri.org/tutorials/searching_a_xml_html_document.html, I understand you must do something like:
doc.xpath('//dha:LanguageKey', 'dha' => 'http://dhamma.org')
Here's some code to consider. Starting with code to create a Nokogiri::XML::Document:
require 'nokogiri'
XML = <<EOT
<?xml version="1.0" encoding="UTF-8"?>
<CreateOrUpdateEventsRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://dhamma.org" version="3-0-0">
<LanguageKey>
<IsoCode>en</IsoCode>
</LanguageKey>
<Publish>
<Value>true</Value>
</Publish>
<Events>
<Event>
<EventKey>
<LocationKey>
<SubDomain>rasmi</SubDomain>
</LocationKey>
<EventId>10DayPDFStdTag</EventId>
</EventKey>
</Event>
</Events>
</CreateOrUpdateEventsRequest>
EOT
doc = Nokogiri::XML(XML)
Here's the root node's name:
doc.root.name # => "CreateOrUpdateEventsRequest"
The docs say:
When using CSS, if the namespace is called “xmlns”, you can even omit the namespace name.
doc.at('CreateOrUpdateEventsRequest').name # => "CreateOrUpdateEventsRequest"
doc.at('LanguageKey').to_xml # => "<LanguageKey>\n <IsoCode>en</IsoCode>\n </LanguageKey>"
Using XPath, we can specify the default namespace as:
doc.at('//xmlns:LanguageKey').to_xml # => "<LanguageKey>\n <IsoCode>en</IsoCode>\n </LanguageKey>"
Sometimes, if there are a lot of namespaces, it makes sense to use collect_namespaces and pass them in:
name_spaces = doc.collect_namespaces # =>
doc.at('//xmlns:LanguageKey', name_spaces).to_xml # => "<LanguageKey>\n <IsoCode>en</IsoCode>\n </LanguageKey>"
You'll need to look through the documentation for Nokogiri::XML::Node for more information on the various methods.
As a first try, I recommend CSS selectors over XPath for simplicity and readability. I think XPath has more functionality, but it makes my eyes bug out sometimes, so I prefer CSS.

fsresource - runmode config stored in crx via xml

I'm using the fsresource Sling extension to access the file system when working on JSP, JS, CSS and so on. When I just drop the bundle into the CRX and configure it via the OSGi console, everything works as expected. But when I try to add a new run mode configuration, the result is unsatisfying.
config/src/main/content/jcr_root/apps/samples/config/org.apache.sling.fsprovider.internal.FsResourceProvider.factory.config.xml
This is the path of the main configuration file, which I'm using on a local instance to work out how to achieve the desired result, but the best I could get was an unbound configuration displayed in
system/console/configMgr
The contents of the XML file:
<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
xmlns:jcr="http://www.jcp.org/jcr/1.0"
jcr:primaryType="sling:OsgiConfig"
provider.roots="/apps/ui-samples"
provider.file="/Volumes/samples/ui/src/main/content/jcr_root/apps/ui-samples"
provider.checkinterval="1000"/>
Apparently, I was overthinking the file name. Naming the file after the factory PID followed by a hyphen and an identifier, for instance
org.apache.sling.fsprovider.internal.FsResourceProvider-samples.xml
does the job.

Associate an XML-Stylesheet with an XML Document with Nokogiri

Is it possible to associate a stylesheet with an XML document using Nokogiri, to create this structure?
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="http://www.my-site.com/sitemap.xsl"?>
<root>
...
</root>
OMG, there is so much fail here that I am breaking the unofficial policy of Team Nokogiri and am providing the correct, sane answer to this question:
require "nokogiri"
doc = Nokogiri::XML "<root>foo</root>"
doc.root.add_previous_sibling Nokogiri::XML::ProcessingInstruction.new(doc, "xml-stylesheet", 'type="text/xsl" href="foo.xsl"')
puts doc.to_xml
# => <?xml version="1.0"?>
# <?xml-stylesheet type="text/xsl" href="foo.xsl"?>
# <root>foo</root>
In the future, please ask questions about Nokogiri on the nokogiri-talk mailing list (http://groups.google.com/group/nokogiri-talk), get the correct answer in a timely fashion, and save everyone a little effort.
There is not.
The way I did it:
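# xml: the serialized document string (e.g. from doc.to_xml)
# stylesheet: the href of the XSL file
# Note: this only matches a declaration without an encoding attribute.
xml.gsub!("<?xml version=\"1.0\"?>") do |head|
  result = head
  result << "\n"
  result << "<?xml-stylesheet type=\"text/xsl\" href=\"#{stylesheet}\"?>"
end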
Cheers.
