Get Nokogiri to not add "default" namespace when adding nodes - ruby

Background:
I want to take some XML from one file, put it into a template file, and then save the modified template as a new file. It works, but when I save the file, all the nodes that I added have a default namespace prefix prepended, i.e.
<default:ComponentRef Id="C__AD1817F9C64A42F0A14DDDDC82DFC8D9"/>
<default:ComponentRef Id="C__157DD41D70854617A3D6D1E4A39B589F"/>
<default:ComponentRef Id="C__2E6D8662F38FE62CAFA9F8842A28F510"/>
<default:ComponentRef Id="C__54E5E2181323D4A5F37293DAA87B4230"/>
Which I want to be just:
<ComponentRef Id="C__AD1817F9C64A42F0A14DDDDC82DFC8D9"/>
<ComponentRef Id="C__157DD41D70854617A3D6D1E4A39B589F"/>
<ComponentRef Id="C__2E6D8662F38FE62CAFA9F8842A28F510"/>
<ComponentRef Id="C__54E5E2181323D4A5F37293DAA87B4230"/>
The following is my Ruby code:
require 'nokogiri'
file = "wixmain/generated/DarkOutput.wxs"
template = "wixmain/generated/MsiComponentTemplate.wxs"
output = "wixmain/generated/MSIComponents.wxs"
dark_output = Nokogiri::XML(File.open(file))
template_file = Nokogiri::XML(File.open(template))
#get stuff from dark output
components = dark_output.at_css("Directory[Id='TARGETDIR']")
component_ref = dark_output.at_css("Feature[Id='DefaultFeature']")
#where to insert in template doc
template_component_insert_point = template_file.at_css("DirectoryRef[Id='InstallDir']")
template_ref_insert_point = template_file.at_css("ComponentGroup[Id='MSIComponentGroup']")
template_component_insert_point.children= components.children()
template_ref_insert_point.children= component_ref.children()
#write out filled template to output file
File.open(output, 'w') { |f| template_file.write_xml_to f }
Update
Example of my template file:
<?xml version="1.0" encoding="utf-8"?>
<Wix xmlns='http://schemas.microsoft.com/wix/2006/wi'>
<Fragment>
<ComponentGroup Id='MSIComponentGroup'>
</ComponentGroup>
</Fragment>
<Fragment Id='MSIComponents'>
<DirectoryRef Id='InstallDir'>
</DirectoryRef>
</Fragment>
</Wix>

One workaround is to remove the xmlns attribute from the input file.
Another is to call the remove_namespaces! method on the input document right after parsing it:
input_file = Nokogiri::XML(File.open(input))
input_file.remove_namespaces!

I think you are missing a sample of the template file. Also, is the sample from the input complete?
Nokogiri is either finding the default: namespace while parsing one of the two files, so the added nodes inherit it, or it cannot parse one of the files cleanly and somehow adds the default: namespace as a result. After parsing dark_output and template_file, check whether each document's errors array is empty to see if Nokogiri is happy.
dark_output = Nokogiri::XML(File.open(file))
template_file = Nokogiri::XML(File.open(template))
if (dark_output.errors.any? || template_file.errors.any?)
[... do something here ...]
end
For the fastest answer, you might want to take this question directly to the developers via the Nokogiri-Talk mailing list.

Related

Comment .XML file content using ruby code

Can anyone tell me how I can comment out the following line in my .xml file using Ruby?
I hope this can be done with Nokogiri.
<message group="1" sub_group="1" type="none" destination="mydata" remark="mylist" userOnly="true "/>
output should be:
<!-- <message group="1" sub_group="1" type="none" destination="mydata" remark="mylist" userOnly="true "/> -->
You can search your document with the search method, add a comment with Comment.new and then remove the original line with the remove method.
Nokogiri::XML::Comment.new(doc, node.to_s)
See the documentation for Class: Nokogiri::XML::Comment.
Edit:
I implemented an example, but used replace instead of remove:
require 'nokogiri'
# Parse the file, then replace each <message> element with a comment holding its markup
x = Nokogiri::XML(File.read('./config.xml'))
x.search('message').each do |el|
  puts(el.to_s)
  c = Nokogiri::XML::Comment.new(x, el.to_s)
  el.replace(c)
end
File.write('./config.xml', x.to_xml)

How to use Nokogiri to combine multiple like-formatted XML files into CSV

I want to parse multiple like-formatted XML files into a CSV file.
I searched on Google, nokogiri.org, and on SO but I haven't been able to find an answer.
I have ten XML files in identical format in terms of node/element structure, that reside in the current directory.
After combining the XML files into a single XML file, I need to pull out specific elements of the advisory node. I would like to output the link, title, location, os -> language -> name, and reference -> name data to the CSV file.
My code is only able to parse a single XML document and I'd like it to take into account 1:many:
# Parse the XML file into a Nokogiri::XML::Document object
@doc = Nokogiri::XML(File.open("file.xml"))
# Gather the 5 specific XML elements out of the 'advisory' top-level node
data = @doc.search('advisory').map { |adv|
[
adv.at('link').content,
adv.at('title').content,
adv.at('location').content,
adv.at('os > language > name').content,
adv.at('reference > name').content
]
}
# Loop through each array element in the object and write out as CSV row
CSV.open('output_file.csv', 'wb') do |csv|
# Explicitly set headers until you figure out how to get them programmatically
csv << ['Link', 'Title', 'Location', 'OS Name', 'Reference Name']
data.each do |row|
csv << row
end
end
I tried changing the code to support multiple XML files and get them into Nokogiri::XML::Document objects:
xml_docs = []
Dir.glob("*.xml").each do |file|
xml = Nokogiri::XML(File.new(file))
xml_docs << Nokogiri::XML::Document.new(xml)
end
This successfully creates an array xml_docs with the correct objects in it, but I don't know how to convert these six objects into a single object.
This is sample XML. All XML files use the same node/element structure:
<advisories>
<title> Not relevant </title>
<customer> N/A </customer>
<advisory id="12345">
<link> https://www.google.com </link>
<release_date>2016-04-07</release_date>
<title> The Short Description Would Go Here </title>
<location> Location Name Here </location>
<os>
<product>
<id>98765</id>
<name>Product Name</name>
</product>
<language>
<id>123</id>
<name>en</name>
</language>
</os>
<reference>
<id>00029</id>
<name>Full</name>
<area>Not Defined</area>
</reference>
</advisory>
<advisory id="98765">
<link> https://www.msn.com </link>
<release_date>2016-04-08</release_date>
<title> The Short Description Would Go Here </title>
<location> Location Name Here </location>
<os>
<product>
<id>12654</id>
<name>Product Name</name>
</product>
<language>
<id>126</id>
<name>fr</name>
</language>
</os>
<reference>
<id>00052</id>
<name>Partial</name>
<area>Defined</area>
</reference>
</advisory>
</advisories>
The code leverages Nokogiri::XML::Document but if Nokogiri::XML::Builder will work better for this, I am more than willing to adjust my code accordingly.
I'd handle the first part, of parsing one XML file, like this:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<advisories>
<advisory id="12345">
<link> https://www.google.com </link>
<title> The Short Description Would Go Here </title>
<location> Location Name Here </location>
<os>
<language>
<name>en</name>
</language>
</os>
<reference>
<name>Full</name>
</reference>
</advisory>
<advisory id="98765">
<link> https://www.msn.com </link>
<release_date>2016-04-08</release_date>
<title> The Short Description Would Go Here </title>
<location> Location Name Here </location>
<os>
<language>
<name>fr</name>
</language>
</os>
<reference>
<name>Partial</name>
</reference>
</advisory>
</advisories>
EOT
Note: This has nodes removed because they weren't important to the question. Please remove fluff when asking as it's distracting.
With this being the core of the code:
doc.search('advisory').map{ |advisory|
link = advisory.at('link').text
title = advisory.at('title').text
location = advisory.at('location').text
os_language_name = advisory.at('os > language > name').text
reference_name = advisory.at('reference > name').text
{
link: link,
title: title,
location: location,
os_language_name: os_language_name,
reference_name: reference_name
}
}
That could be DRY'd but was written as an example of what to do.
Running that results in an array of hashes, which would be easily output via CSV:
# => [
{:link=>" https://www.google.com ", :title=>" The Short Description Would Go Here ", :location=>" Location Name Here ", :os_language_name=>"en", :reference_name=>"Full"},
{:link=>" https://www.msn.com ", :title=>" The Short Description Would Go Here ", :location=>" Location Name Here ", :os_language_name=>"fr", :reference_name=>"Partial"}
]
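The repeated at(...).text lines can be driven from a table of field names and selectors. The following is a minimal sketch of that idea, not the answer's own code: it uses REXML and XPath from Ruby's standard distribution in place of Nokogiri so that it runs without extra gems, and fields, extract_advisories, and sample are hypothetical names chosen for illustration. With Nokogiri, the lookup inside the loop would be advisory.at(selector).text with the CSS selectors shown above.

```ruby
require 'rexml/document'

# Hypothetical table: output key => XPath relative to each <advisory> element.
fields = {
  link: 'link',
  title: 'title',
  location: 'location',
  os_language_name: 'os/language/name',
  reference_name: 'reference/name'
}

# Hypothetical helper: returns one hash per <advisory> element.
# REXML is a stand-in for Nokogiri here (an assumption of this sketch).
def extract_advisories(xml_text, fields)
  doc = REXML::Document.new(xml_text)
  REXML::XPath.match(doc, '//advisory').map do |advisory|
    fields.transform_values do |path|
      node = REXML::XPath.first(advisory, path)
      node ? node.text : nil # nil when the element is missing
    end
  end
end

sample = <<~XML
  <advisories>
    <title> Not relevant </title>
    <advisory id="12345">
      <link> https://www.google.com </link>
      <title> The Short Description Would Go Here </title>
      <location> Location Name Here </location>
      <os><language><name>en</name></language></os>
      <reference><name>Full</name></reference>
    </advisory>
  </advisories>
XML

advisories = extract_advisories(sample, fields)
p advisories
```

Adding a column then means adding one entry to the table rather than another pair of lines.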
Once you've got that working then fit it into a modified version of your loops to output CSV and read the XML files. This is untested but looks about right:
CSV.open('output_file.csv', 'w',
headers: ['Link', 'Title', 'Location', 'OS Name', 'Reference Name'],
write_headers: true
) do |csv|
Dir.glob("*.xml").each do |file|
xml = Nokogiri::XML(File.read(file))
# parse a file and get the array of hashes
end
# pass the array of hashes to CSV for output
end
Note that you were using a file mode of 'wb'. You rarely need b with CSV as CSV is supposed to be a text format. If you are sure you will encounter binary data then use 'b' also, but that could lead down a path containing dragons.
Also note that this is using read. read is not scalable: it doesn't care how big a file is and will try to read all of it into memory, whether or not it will actually fit. There are lots of reasons to avoid that, but the best is that it can bring your program to its knees. If your XML files could exceed the available free memory for your system then you'll want to rewrite using a SAX parser, which Nokogiri supports. How to do that is a different question.
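As a rough sketch of the "pass the array of hashes to CSV" step, the following turns hashes shaped like the ones above into CSV text. It uses CSV.generate instead of CSV.open so the result is easy to inspect; both accept the same headers and write_headers options. The advisories_to_csv helper, the keys list, and the strip call that trims the padded values are assumptions of this sketch, not part of the answer.

```ruby
require 'csv'

# Hypothetical helper: renders an array of hashes as CSV text.
# keys fixes the column order; strip trims the padded values (an assumption).
def advisories_to_csv(rows, headers, keys)
  CSV.generate(headers: headers, write_headers: true) do |csv|
    rows.each { |row| csv << row.values_at(*keys).map { |v| v.to_s.strip } }
  end
end

headers = ['Link', 'Title', 'Location', 'OS Name', 'Reference Name']
keys = %i[link title location os_language_name reference_name]

rows = [
  { link: ' https://www.google.com ', title: ' The Short Description Would Go Here ',
    location: ' Location Name Here ', os_language_name: 'en', reference_name: 'Full' },
  { link: ' https://www.msn.com ', title: ' The Short Description Would Go Here ',
    location: ' Location Name Here ', os_language_name: 'fr', reference_name: 'Partial' }
]

csv_text = advisories_to_csv(rows, headers, keys)
puts csv_text
```

To write a file instead, the same block body can be used inside CSV.open('output_file.csv', 'w', headers: headers, write_headers: true).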
It was actually an array of arrays of hashes. I'm not sure how I ended up there, but I was easily able to use array.flatten.
Meditate on this:
foo = [] # => []
foo << [{}] # => [[{}]]
foo.flatten # => [{}]
You probably wanted to do this:
foo = [] # => []
foo += [{}] # => [{}]
Any time I have to use flatten I look to see if I can create the array without it being an array of arrays of something. It's not that they're inherently bad, because sometimes they're very useful, but you really wanted an array of hashes so you knew something was wrong and flatten was a cheap way out, but using it also costs more CPU time. It's better to figure out the problem and fix it and end up with faster/more efficient code. (And some will say that's a wasted effort or is premature optimization, but writing efficient code is a very good trait and goal.)
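Applied to the multi-file loop, the difference looks like this. It is a small sketch with a hypothetical parse_file method standing in for the real per-file parsing; concat and flat_map are two standard ways to get the same result as += without building a nested array first.

```ruby
# Hypothetical stand-in for parsing one XML file: returns an array of hashes.
def parse_file(name)
  [{ file: name, id: 1 }, { file: name, id: 2 }]
end

files = %w[a.xml b.xml]

# << appends each per-file array as a single element: array of arrays of hashes.
nested = []
files.each { |f| nested << parse_file(f) }

# concat appends the elements themselves: array of hashes, no flatten needed.
flat = []
files.each { |f| flat.concat(parse_file(f)) }

# flat_map does the same in one expression.
also_flat = files.flat_map { |f| parse_file(f) }

p nested.length # number of files
p flat.length   # number of hashes across all files
```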

Boost read/write XML file: how to change the characters encoding?

I'm trying to read/write an XML file, using Boost functions read_xml and write_xml.
The XML file original encoding is "windows-1252", but after the read/write operations, the encoding became "utf-8".
This is the XML original file:
<?xml version="1.0" encoding="windows-1252" standalone="no" ?>
<lot>
<name>Lot1</name>
<lot_id>123</lot_id>
<descr></descr>
<job>
<name>TEST</name>
<num_items>2</num_items>
<item>
<label>Item1</label>
<descr>Item First Test</descr>
</item>
<item>
<label>Item2</label>
<descr>Item Second Test</descr>
</item>
</job>
</lot>
And this is the output one:
<?xml version="1.0" encoding="utf-8"?>
<lot>
<name>Lot1</name>
<lot_id>123</lot_id>
<descr></descr>
<job>
<name>TEST</name>
<num_items>2</num_items>
<item>
<label>Item1</label>
<descr>Item First Test</descr>
</item>
<item>
<label>Item2</label>
<descr>Item Second Test</descr>
</item>
</job>
</lot>
This is my C++ code (just a test code):
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>
#include <string>
using boost::property_tree::ptree;
ptree xmlTree;
read_xml(FILE_XML, xmlTree);
// Assumption: xmlTreeChild (not defined in the original snippet) is the <job> node that holds the <item> elements
ptree& xmlTreeChild = xmlTree.get_child("lot.job");
for (auto it = xmlTreeChild.begin(); it != xmlTreeChild.end();)
{
    // Erase every <item> whose <label> is not "item3"
    if (it->first == "item" && it->second.get_child("label").data() != "item3")
    {
        it = xmlTreeChild.erase(it); // erase() already returns the next element
    }
    else
    {
        ++it; // advance only when nothing was erased
    }
}
auto settings = boost::property_tree::xml_writer_make_settings<std::string>('\t', 1);
write_xml(FILE_XML, xmlTree, std::locale(), settings);
I need to read and re-write the file using the same encoding from the original file.
I've tried also to change the Locale settings, using:
std::locale newlocale1("English_USA.1252");
read_xml(FILE_XML, xmlTree, 0, newlocale1);
...
auto settings = boost::property_tree::xml_writer_make_settings<std::string>('\t', 1);
write_xml(FILE_XML, xmlTree, newlocale1, settings);
but I've got the same result.
How can I read and write the file in its original encoding using the Boost functions?
Thank you
You can pass an encoding via the writer settings:
auto settings = boost::property_tree::xml_writer_make_settings<std::string>(
'\t', 1, "windows-1252");
Of course, make sure key/values are in fact latin1/cp1252 compatible (this makes sense as long as you read all the information from the source file; however you have to take care when e.g. assigning user input to a property tree node; you might need to convert from the input encoding to cp1252 first).
To fix the problem you experience you have to replace this line:
read_xml(FILE_XML, xmlTree);
with
read_xml(FILE_XML,
xmlTree,
boost::property_tree::xml_parser::trim_whitespace);
As far as I know, your issue cannot be fixed only by modifying the settings of the write_xml function.
I tried it and it worked: when I compare the files ignoring whitespace, the input and output XML files are identical.
You can also write to a string stream as follows:
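#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>
#include <sstream>
#include <string>
boost::property_tree::ptree pt;
std::ostringstream oss;
// The template argument is the string type, as in the settings call shown earlier;
// older Boost releases took the character type (<char>) instead.
write_xml(
    oss, pt,
    boost::property_tree::xml_writer_make_settings<std::string>(
        '\t', 0, "ASCII"));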

I want to omit specifying default namespace using libxml-ruby

I have a couple of questions about libxml-ruby.
There is an XML file, "sample.xml".
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://***" xmlns:r="http://???">
<sheetData>
<row><v>1</v></row>
</sheetData>
</worksheet>
I want to work with nodes without specifying the default namespace, like below.
xml = XML::Document.file('sample.xml')
sheet_data = xml.find_first('sheetData')
Of course, I can do it like below.
NS = {
main: 'http://***',
r: 'http://???',
}
sheet_data = xml.find_first('main:sheetData', NS)
But I want to omit the default namespace prefix.
I tried some properties and methods belonging to XML::Namespace[s], but they had no effect.
There is one more problem when I save the XML file.
ns = XML::Namespace.new(xml.root, 'main', 'http://***')
row = XML::Node.new('row', nil, ns)
sheet_data << row
xml.save("sample.xml")
It is written out like below.
<row><v>1</v></row>
<main:row/>
I want the "main:" prefix to be omitted.
So I do this, but it's really ugly.
open('sample.xml', 'wb') do |f|
f.write(xml.to_s.gsub(/(<\/?)main:/, '\1'))
end
Do you have any good ideas?

How to replace a particular line in xml with the new one in ruby

I have a requirement where I need to replace an element's value with a new one, and I don't want any other modification to be made to the file.
<mtn:test-case title='Power-Consist-Message'>
<mtn:messages>
<mtn:message sequence='4' correlation-key='0x0F04'>
<mtn:header>
<mtn:protocol-version>0x4</mtn:protocol-version>
<mtn:message-type>0x0F04</mtn:message-type>
<mtn:message-version>0x01</mtn:message-version>
<mtn:gmt-time-switch>false</mtn:gmt-time-switch>
<mtn:crc-calc-switch>1</mtn:crc-calc-switch>
<mtn:encrypt-switch>false</mtn:encrypt-switch>
<mtn:compress-switch>false</mtn:compress-switch>
<mtn:ttl>999</mtn:ttl>
<mtn:qos-class-of-service>0</mtn:qos-class-of-service>
<mtn:qos-priority>2</mtn:qos-priority>
<mtn:qos-network-preference>1</mtn:qos-network-preference>
This is what the XML file looks like. I want to replace 999 (the value of the mtn:ttl element) with "some other value", but when I do that using the formatter in Ruby, some other unwanted modifications take place. The code I am using is as follows:
File.open(ENV['CadPath1']+ "conf\\cad-mtn-config.xml") do |config_file|
# Open the document and edit the file
config = Document.new(config_file)
testField=config.root.elements[4].elements[11].elements[1].elements[1].elements[1].elements[11]
if testField.to_s.match(/<mtn:qos-network-preference>/)
test=config.root.elements[4].elements[11].elements[1].elements[1].elements[1].elements[8].text="2"
# Write the result to a new file.
formatter = REXML::Formatters::Default.new
File.open(ENV['CadPath1']+ "conf\\cad-mtn-config.xml", 'w') do |result|
formatter.write(config, result)
end
end
end
When I write the modifications to the new file, the XML file size changes from 79 KB to 78 KB. Is there any way to replace just that particular line in the XML file and save the change without affecting the rest of the file?
Please let me know soon...
I prefer Nokogiri as my XML/HTML parser of choice:
require 'nokogiri'
xml =<<EOT
<mtn:test-case title='Power-Consist-Message'>
<mtn:messages>
<mtn:message sequence='4' correlation-key='0x0F04'>
<mtn:header>
<mtn:protocol-version>0x4</mtn:protocol-version>
<mtn:message-type>0x0F04</mtn:message-type>
<mtn:message-version>0x01</mtn:message-version>
<mtn:gmt-time-switch>false</mtn:gmt-time-switch>
<mtn:crc-calc-switch>1</mtn:crc-calc-switch>
<mtn:encrypt-switch>false</mtn:encrypt-switch>
<mtn:compress-switch>false</mtn:compress-switch>
<mtn:ttl>999</mtn:ttl>
<mtn:qos-class-of-service>0</mtn:qos-class-of-service>
<mtn:qos-priority>2</mtn:qos-priority>
<mtn:qos-network-preference>1</mtn:qos-network-preference>
EOT
Notice that the XML is malformed, i.e., it doesn't terminate correctly.
doc = Nokogiri::XML(xml)
I'm using CSS accessors to find the ttl node. Because of some magic, Nokogiri's CSS ignores XML name spaces, simplifying finding nodes.
doc.at('ttl').content = '1000'
puts doc.to_xml
# >> <?xml version="1.0"?>
# >> <test-case title="Power-Consist-Message">
# >> <messages>
# >> <message sequence="4" correlation-key="0x0F04">
# >> <header>
# >> <protocol-version>0x4</protocol-version>
# >> <message-type>0x0F04</message-type>
# >> <message-version>0x01</message-version>
# >> <gmt-time-switch>false</gmt-time-switch>
# >> <crc-calc-switch>1</crc-calc-switch>
# >> <encrypt-switch>false</encrypt-switch>
# >> <compress-switch>false</compress-switch>
# >> <ttl>1000</ttl>
# >> <qos-class-of-service>0</qos-class-of-service>
# >> <qos-priority>2</qos-priority>
# >> <qos-network-preference>1</qos-network-preference>
# >> </header></message></messages></test-case>
Notice that Nokogiri replaced the content of the ttl node. It also stripped the XML namespace info because the document didn't declare it correctly, and, finally, Nokogiri has added closing tags to make the document syntactically correct.
If you want the namespace to be declared in the output, you'll need to make sure it's there in the input.
If you need to just literally replace that value without affecting anything else about the XML file, even if (as pointed out by the Tin Man above) that would mean leaving the original XML file malformed, you can do that with direct string manipulation using a regular expression.
Assuming there is guaranteed to only be one <mtn:ttl> tag in your XML document, you could just do:
doc = IO.read("somefile.xml")
doc.sub!(%r{<mtn:ttl>.+?</mtn:ttl>}, "<mtn:ttl>some other value</mtn:ttl>")
File.open("somefile.xml", "w") {|fh| fh.write(doc)}
If there might be more than one <mtn:ttl> tag, then this is trickier; how much trickier depends on how you want to figure out which tag(s) to change.
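One possible way to handle the multiple-tag case, assuming you can identify the target by its position in the file, is to count matches inside a gsub block and rewrite only the chosen one. This is a sketch under that assumption; replace_nth_ttl is a hypothetical helper, and like the regex above it treats the XML as plain text rather than parsing it.

```ruby
# Hypothetical helper: replaces the text of the nth <mtn:ttl> element (1-based)
# and leaves every other match, and the rest of the text, untouched.
def replace_nth_ttl(xml_text, nth, new_value)
  count = 0
  xml_text.gsub(%r{<mtn:ttl>.*?</mtn:ttl>}m) do |match|
    count += 1
    count == nth ? "<mtn:ttl>#{new_value}</mtn:ttl>" : match
  end
end

xml_text = "<a><mtn:ttl>999</mtn:ttl><b/><mtn:ttl>999</mtn:ttl></a>"
updated = replace_nth_ttl(xml_text, 2, '1000')
puts updated
```

If the target is better identified by its current value or by a surrounding element, the condition inside the block is the part to change.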

Resources