Xml formatting using Node - ruby

Following is the method used to write an entry to xml file
def write_entry(entry)
node = Nokogiri::XML::Node.new("url", #xml_document)
node["loc"]= entry[:url]
node["lastmod"]= entry[:lastmod].to_s
node["changefreq"] = entry[:frequency].to_s
node["priority"] = entry[:priority].to_s
node.to_xml
end
The entry looks like this:
<urlset>
<url loc="http:`enter code here`//www.experteer.co.uk/vacaturebank/banen/vacatures/xing-ag" lastmod="2011-11-23 16:58:27 UTC" changefreq="0.8" priority="monthly"/>
</urlset>
I want the entry of xml to be like this
<urlset>
<url>
<loc> http://www.experteer.co.uk/vacaturebank/banen/vacatures/xing-ag </loc>
<lastmod> 2011-11-23 16:58:27 UTC </lastmod>
<changefreq> 0.8 </changefreq>
<priority> monthly </priority>
</url>
</urlset>
Is it possible with using Node or I have to use Builder?
If possible with Node Then how?
and If I have to use Builder it writes header for each entry how can I handle that it dont write header for each entry.

you can use << or add_child to append children nodes to a node.
def write_entry(entry)
url = Nokogiri::XML::Node.new( "url" , #xml_document )
%w{loc lastmod changefreq priority}.each do |node|
url << Nokogiri::XML::Node.new( node, #xml_document ).tap do |n|
n.content = entry[ node.to_sym ]
end
end
url.to_xml
end
For this to work correctly, you have to change entry[:url] to entry[:loc]. and entry[:frequency] to entry[:changefreq], which shouldn't be a bad thing (it's best to have the same name for the same thing everywhere, isn't it ?).
Alternatively, if your entry hash only contains what you need to convert to xml, use entry.each do |key,value| instead of the array.

Related

How to include the source line number everywhere in html output in Sphinx?

Let's say I'm writing a custom editor for my RestructuredText/Sphinx stuff, with "live" html output preview. Output is built using Sphinx.
The source files are pure RestructuredText. No code there.
One desirable feature would be that right-clicking on some part of the preview opens the editor at the correct line of the source file.
To achieve that, one way would be to put that line number in every tag of the html file, for example using classes (e.g., class = "... lineno-124"). Or use html comments.
Note that I don't want to add more content to my source files, just that the line number be included everywhere in the output.
An approximate line number would be enough.
Someone knows how to do this in Sphinx, my way or another?
I decided to add <a> tags with a specific class "lineno lineno-nnn" where nnn is the line number in the RestructuredText source.
The directive .. linenocomment:: nnn is inserted before each new block of unindented text in the source, before the actual parsing (using a 'source-read' event hook).
linenocomment is a custom directive that pushes the <a> tag at build time.
Half a solution is still a solution...
import docutils.nodes as dn
from docutils.parsers.rst import Directive
class linenocomment(dn.General,dn.Element):
pass
def visit_linenocomment_html(self,node):
self.body.append(self.starttag(node,'a',CLASS="lineno lineno-{}".format(node['lineno'])))
def depart_linenocomment_html(self,node):
self.body.append('</a>')
class LineNoComment(Directive):
required_arguments = 1
optional_arguments = 0
has_content = False
add_index = False
def run(self):
node = linenocomment()
node['lineno'] = self.arguments[0]
return [node]
def insert_line_comments(app, docname, source):
print(source)
new_source = []
last_line_empty = True
lineno = 0
for line in source[0].split('\n'):
if line.strip() == '':
last_line_empty = True
new_source.append(line)
elif line[0].isspace():
new_source.append(line)
last_line_empty = False
elif not last_line_empty:
new_source.append(line)
else:
last_line_empty = False
new_source.append('.. linenocomment:: {}'.format(lineno))
new_source.append('')
new_source.append(line)
lineno += 1
source[0] = '\n'.join(new_source)
print(source)
def setup(app):
app.add_node(linenocomment,html=(visit_linenocomment_html,depart_linenocomment_html))
app.add_directive('linenocomment', LineNoComment)
app.connect('source-read',insert_line_comments)
return {
'version': 0.1
}

How to use Nokogiri to combine multiple like-formatted XML files into CSV

I want to parse multiple like-formatted XML files into a CSV file.
I searched on Google, nokogiri.org, and on SO but I haven't been able to find an answer.
I have ten XML files in identical format in terms of node/element structure, that reside in the current directory.
After combining the XML files into a single XML file, I need to pull out specific elements of the advisory node. I would like to output the link, title, location, os -> language -> name, and reference -> name data to the CSV file.
My code is only able to parse a single XML document and I'd like it to take into account 1:many:
# Parse the XML file into a Nokogiri::XML::Document object
#doc = Nokogiri::XML(File.open("file.xml"))
# Gather the 5 specific XML elements out of the 'advisory' top-level node
data = #doc.search('advisory').map { |adv|
[
adv.at('link').content,
adv.at('title').content,
adv.at('location').content,
adv.at('os > language > name').content,
adv.at('reference > name').content
]
}
# Loop through each array element in the object and write out as CSV row
CSV.open('output_file.csv', 'wb') do |csv|
# Explicitly set headers until you figure out how to get them programatically
csv << ['Link', 'Title', 'Location', 'OS Name', 'Reference Name']
data.each do |row|
csv << row
end
end
I tried changing the code to support multiple XML files and get them into Nokogiri::XML::Document objects:
xml_docs = []
Dir.glob("*.xml").each do |file|
xml = Nokogiri::XML(File.new(file))
xml_docs << Nokogiri::XML::Document.new(xml)
end
This successfully creates an array xml_docs with the correct objects it in, but I don't know how to convert these six objects into a single object.
This is sample XML. All XML files use the same node/element structure:
<advisories>
<title> Not relevant </title>
<customer> N/A </customer>
<advisory id="12345">
<link> https://www.google.com </link>
<release_date>2016-04-07</release_date>
<title> The Short Description Would Go Here </title>
<location> Location Name Here </location>
<os>
<product>
<id>98765</id>
<name>Product Name</name>
</product>
<language>
<id>123</id>
<name>en</name>
</language>
</os>
<reference>
<id>00029</id>
<name>Full</name>
<area>Not Defined</area>
</reference>
</advisory>
<advisory id="98765">
<link> https://www.msn.com </link>
<release_date>2016-04-08</release_date>
<title> The Short Description Would Go Here </title>
<location> Location Name Here </location>
<os>
<product>
<id>12654</id>
<name>Product Name</name>
</product>
<language>
<id>126</id>
<name>fr</name>
</language>
</os>
<reference>
<id>00052</id>
<name>Partial</name>
<area>Defined</area>
</reference>
</advisory>
</advisories>
The code leverages Nokogiri::XML::Document but if Nokogiri::XML::Builder will work better for this, I am more than willing to adjust my code accordingly.
I'd handle the first part, of parsing one XML file, like this:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<advisories>
<advisory id="12345">
<link> https://www.google.com </link>
<title> The Short Description Would Go Here </title>
<location> Location Name Here </location>
<os>
<language>
<name>en</name>
</language>
</os>
<reference>
<name>Full</name>
</reference>
</advisory>
<advisory id="98765">
<link> https://www.msn.com </link>
<release_date>2016-04-08</release_date>
<title> The Short Description Would Go Here </title>
<location> Location Name Here </location>
<os>
<language>
<name>fr</name>
</language>
</os>
<reference>
<name>Partial</name>
</reference>
</advisory>
</advisories>
EOT
Note: This has nodes removed because they weren't important to the question. Please remove fluff when asking as it's distracting.
With this being the core of the code:
doc.search('advisory').map{ |advisory|
link = advisory.at('link').text
title = advisory.at('title').text
location = advisory.at('location').text
os_language_name = advisory.at('os > language > name').text
reference_name = advisory.at('reference > name').text
{
link: link,
title: title,
location: location,
os_language_name: os_language_name,
reference_name: reference_name
}
}
That could be DRY'd but was written as an example of what to do.
Running that results in an array of hashes, which would be easily output via CSV:
# => [
{:link=>" https://www.google.com ", :title=>" The Short Description Would Go Here ", :location=>" Location Name Here ", :os_language_name=>"en", :reference_name=>"Full"},
{:link=>" https://www.msn.com ", :title=>" The Short Description Would Go Here ", :location=>" Location Name Here ", :os_language_name=>"fr", :reference_name=>"Partial"}
]
Once you've got that working then fit it into a modified version of your loops to output CSV and read the XML files. This is untested but looks about right:
CSV.open('output_file.csv', 'w',
headers: ['Link', 'Title', 'Location', 'OS Name', 'Reference Name'],
write_headers: true
) do |csv|
Dir.glob("*.xml").each do |file|
xml = Nokogiri::XML(File.read(file))
# parse a file and get the array of hashes
end
# pass the array of hashes to CSV for output
end
Note that you were using a file mode of 'wb'. You rarely need b with CSV as CSV is supposed to be a text format. If you are sure you will encounter binary data then use 'b' also, but that could lead down a path containing dragons.
Also note that this is using read. read is not scalable, which means it doesn't care how big a file is, it's going to try to read it into memory, whether or not it'll actually fit. There are lots of reasons to avoid that, but the best is it'll take your program to its knees. If your XML files could exceed the available free memory for your system then you'll want to rewrite using a SAX parser, which Nokogiri supports. How to do that is a different question.
it was actually an Array of array of hashes. I'm not sure how I ended up there but I was easily able to use array.flatten
Meditate on this:
foo = [] # => []
foo << [{}] # => [[{}]]
foo.flatten # => [{}]
You probably wanted to do this:
foo = [] # => []
foo += [{}] # => [{}]
Any time I have to use flatten I look to see if I can create the array without it being an array of arrays of something. It's not that they're inherently bad, because sometimes they're very useful, but you really wanted an array of hashes so you knew something was wrong and flatten was a cheap way out, but using it also costs more CPU time. It's better to figure out the problem and fix it and end up with faster/more efficient code. (And some will say that's a wasted effort or is premature optimization, but writing efficient code is a very good trait and goal.)

Get value of XML attribute with namespace

I'm parsing a pptx file and ran into an issue. This is a sample of the source XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<p:presentation xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<p:sldMasterIdLst>
<p:sldMasterId id="2147483648" r:id="rId2"/>
</p:sldMasterIdLst>
<p:sldIdLst>
<p:sldId id="256" r:id="rId3"/>
</p:sldIdLst>
<p:sldSz cx="10080625" cy="7559675"/>
<p:notesSz cx="7772400" cy="10058400"/>
</p:presentation>
I need to to get the r:id attribute value in the sldMasterId tag.
doc = Nokogiri::XML(path_to_pptx)
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId').attr('id').value
returns 2147483648 but I need rId2, which is the r:id attribute value.
I found the attribute_with_ns(name, namespace) method, but
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId').attribute_with_ns('id', 'r')
returns nil.
You can reference the namespace of attributes in your xpath the same way you reference element namespaces:
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId/#r:id')
If you want to use attribute_with_ns, you need to use the actual namespace, not just the prefix:
doc.at_xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId')
.attribute_with_ns('id', "http://schemas.openxmlformats.org/officeDocument/2006/relationships")
http://nokogiri.org/Nokogiri/XML/Node.html#method-i-attributes
If you need to distinguish attributes with the same name, with different namespaces use attribute_nodes instead.
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId').each do |element|
element.attribute_nodes().select do |node|
puts node if node.namespace && node.namespace.prefix == "r"
end
end

How to use xpath from lxml on null namespaced nodes?

What is the best way to handle the lack of a namespace on some of the nodes in an xml document using lxml? Should I first modify all None named nodes to add the "gmd" name and then change the tree attributes to name http://www.isotc211.org/2005/gmd as "gmd"? If so, is there a clean way to do this with lxml or something else that would be relatively clean/safe?
from lxml import etree
nsmap = charts_tree.nsmap
nsmap.pop(None) # complains without this on the xpath with
# TypeError: empty namespace prefix is not supported in XPath
len (charts_tree.xpath('//*/gml:Polygon',namespaces=nsmap))
# 1180
len (charts_tree.xpath('//*/DS_DataSet',namespaces=nsmap))
# 0 ... Bummer!
len (charts_tree.xpath('//*/DS_DataSet'))
# 0 ... Also a bummer
e.g. http://www.charts.noaa.gov/ENCs/ENCProdCat_19115.xml
<DS_Series xmlns="http://www.isotc211.org/2005/gmd" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:gsr="http://www.isotc211.org/2005/gsr" xmlns:gss="http://www.isotc211.org/2005/gss" xmlns:gts="http://www.isotc211.org/2005/gts" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://schemas.opengis.net/iso/19139/20070417/gmd/gmd.xsd">
<composedOf>
<DS_DataSet>
<has>
<MD_Metadata>
<parentIdentifier>
<gco:CharacterString>NOAA ENC Product Catalog</gco:CharacterString>
</parentIdentifier>
...
<EX_BoundingPolygon>
<polygon>
<gml:Polygon gml:id="US1AK90M_P1">
<gml:exterior>
<gml:LinearRing>
<gml:pos>67.61505 -178.99979</gml:pos>
<gml:pos>73.99999 -178.99979</gml:pos>
...
<gml:pos>64.99997 -178.99979</gml:pos>
<gml:pos>67.61505 -178.99979</gml:pos>
</gml:LinearRing>
I believe your DS_DataSet is by virtue of being within the DS_Series (implying a default namespace of "http://www.isotc211.org/2005/gmd") carrying a namespace.
Try and map that into your namespace dictionary (you can probably first test through a print to see if it's already in there, otherwise add it and refer to the namespace by your new key).
nsmap['some_ns'] = "http://www.isotc211.org/2005/gmd"
len (charts_tree.xpath('//*/some_ns:DS_DataSet',namespaces=nsmap))
Which becomes:
nsmap['gmd'] = nsmap[None]
nsmap.pop(None)
len(charts_tree.xpath('//*/gmd:DS_DataSet',namespaces=nsmap))

How to replace a particular line in xml with the new one in ruby

I have a requirement where I need to replace the element value with the new one and I dont want any other modification to be done to the file.
<mtn:test-case title='Power-Consist-Message'>
<mtn:messages>
<mtn:message sequence='4' correlation-key='0x0F04'>
<mtn:header>
<mtn:protocol-version>0x4</mtn:protocol-version>
<mtn:message-type>0x0F04</mtn:message-type>
<mtn:message-version>0x01</mtn:message-version>
<mtn:gmt-time-switch>false</mtn:gmt-time-switch>
<mtn:crc-calc-switch>1</mtn:crc-calc-switch>
<mtn:encrypt-switch>false</mtn:encrypt-switch>
<mtn:compress-switch>false</mtn:compress-switch>
<mtn:ttl>999</mtn:ttl>
<mtn:qos-class-of-service>0</mtn:qos-class-of-service>
<mtn:qos-priority>2</mtn:qos-priority>
<mtn:qos-network-preference>1</mtn:qos-network-preference>
this is how the xml file looks like, I want to replace 999 with "some other value", under s section, but when am doing that using formatter in ruby some other unwanted modifications are taking place, the code that am using is as belows
File.open(ENV['CadPath1']+ "conf\\cad-mtn-config.xml") do |config_file|
# Open the document and edit the file
config = Document.new(config_file)
testField=config.root.elements[4].elements[11].elements[1].elements[1].elements[1].elements[11]
if testField.to_s.match(/<mtn:qos-network-preference>/)
test=config.root.elements[4].elements[11].elements[1].elements[1].elements[1].elements[8].text="2"
# Write the result to a new file.
formatter = REXML::Formatters::Default.new
File.open(ENV['CadPath1']+ "conf\\cad-mtn-config.xml", 'w') do |result|
formatter.write(config, result)
end
end
end
when am writting the modifications to the new file, the xml file size is getting changed from 79kb to 78kb, is there any way to just replace the particular line in xml file and save changes without affecting the xml file.
Please let me know soon...
I prefer Nokogiri as my XML/HTML parser of choice:
require 'nokogiri'
xml =<<EOT
<mtn:test-case title='Power-Consist-Message'>
<mtn:messages>
<mtn:message sequence='4' correlation-key='0x0F04'>
<mtn:header>
<mtn:protocol-version>0x4</mtn:protocol-version>
<mtn:message-type>0x0F04</mtn:message-type>
<mtn:message-version>0x01</mtn:message-version>
<mtn:gmt-time-switch>false</mtn:gmt-time-switch>
<mtn:crc-calc-switch>1</mtn:crc-calc-switch>
<mtn:encrypt-switch>false</mtn:encrypt-switch>
<mtn:compress-switch>false</mtn:compress-switch>
<mtn:ttl>999</mtn:ttl>
<mtn:qos-class-of-service>0</mtn:qos-class-of-service>
<mtn:qos-priority>2</mtn:qos-priority>
<mtn:qos-network-preference>1</mtn:qos-network-preference>
EOT
Notice that the XML is malformed, i.e., it doesn't terminate correctly.
doc = Nokogiri::XML(xml)
I'm using CSS accessors to find the ttl node. Because of some magic, Nokogiri's CSS ignores XML name spaces, simplifying finding nodes.
doc.at('ttl').content = '1000'
puts doc.to_xml
# >> <?xml version="1.0"?>
# >> <test-case title="Power-Consist-Message">
# >> <messages>
# >> <message sequence="4" correlation-key="0x0F04">
# >> <header>
# >> <protocol-version>0x4</protocol-version>
# >> <message-type>0x0F04</message-type>
# >> <message-version>0x01</message-version>
# >> <gmt-time-switch>false</gmt-time-switch>
# >> <crc-calc-switch>1</crc-calc-switch>
# >> <encrypt-switch>false</encrypt-switch>
# >> <compress-switch>false</compress-switch>
# >> <ttl>1000</ttl>
# >> <qos-class-of-service>0</qos-class-of-service>
# >> <qos-priority>2</qos-priority>
# >> <qos-network-preference>1</qos-network-preference>
# >> </header></message></messages></test-case>
Notice that Nokogiri replaced the content of the ttl node. It also stripped the XML namespace info because the document didn't declare it correctly, and, finally, Nokogiri has added closing tags to make the document syntactically correct.
If you want the namespace to be declared in the output, you'll need to make sure it's there in the input.
If you need to just literally replace that value without affecting anything else about the XML file, even if (as pointed by the Tin Man above) that would mean leaving the original XML file malformed, you can do that with direct string manipulation using a regular expression.
Assuming there is guaranteed to only be one <mtn:ttl> tag in your XML document, you could just do:
doc = IO.read("somefile.xml")
doc.sub! /<mtn:ttl>.+?<\/mtn:ttl>/, "<mtn:ttl>some other value<\/mtn:ttl>"
File.open("somefile.xml", "w") {|fh| fh.write(doc)}
If there might be more than one <mtn:ttl> tag, then this is trickier; how much trickier depends on how you want to figure out which tag(s) to change.

Resources