I'm parsing a pptx file and ran into an issue. This is a sample of the source XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<p:presentation xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<p:sldMasterIdLst>
<p:sldMasterId id="2147483648" r:id="rId2"/>
</p:sldMasterIdLst>
<p:sldIdLst>
<p:sldId id="256" r:id="rId3"/>
</p:sldIdLst>
<p:sldSz cx="10080625" cy="7559675"/>
<p:notesSz cx="7772400" cy="10058400"/>
</p:presentation>
I need to to get the r:id attribute value in the sldMasterId tag.
doc = Nokogiri::XML(path_to_pptx)
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId').attr('id').value
returns 2147483648 but I need rId2, which is the r:id attribute value.
I found the attribute_with_ns(name, namespace) method, but
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId').attribute_with_ns('id', 'r')
returns nil.
You can reference the namespace of attributes in your xpath the same way you reference element namespaces:
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId/#r:id')
If you want to use attribute_with_ns, you need to use the actual namespace, not just the prefix:
doc.at_xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId')
.attribute_with_ns('id', "http://schemas.openxmlformats.org/officeDocument/2006/relationships")
http://nokogiri.org/Nokogiri/XML/Node.html#method-i-attributes
If you need to distinguish attributes with the same name, with different namespaces use attribute_nodes instead.
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId').each do |element|
element.attribute_nodes().select do |node|
puts node if node.namespace && node.namespace.prefix == "r"
end
end
Related
I have an XML, that as I understand it has already been parsed by tags. My goal is to parse all the information that is in the <GetResidentsContactInfoResult> tag. In this tag of the sample xml below there are two records in here which begin each with the Lease PropertyId key. How can I iterate over the <GetResidentsContactInfoResult> tag and print out the key/value pairs for each record? I'm new to Ruby and working with XML files, is this something I can do with Nokogiri?
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soap:Body>
<GetResidentsContactInfoResponse xmlns="http://tempuri.org/">
<GetResidentsContactInfoResult><PropertyResidents><Lease PropertyId="21M" BldgID="00" UnitID="0903" ResiID="3" occustatuscode="P" occustatuscodedescription="Previous" MoveInDate="2016-01-07T00:00:00" MoveOutDate="2016-02-06T00:00:00" LeaseBeginDate="2016-01-07T00:00:00" LeaseEndDate="2017-01-31T00:00:00" MktgSource="DBY" PrimaryEmail="noemail1#fake.com"><Occupant PropertyId="21M" BldgID="00" UnitID="0903" ResiID="3" OccuSeqNo="3444755" OccuFirstName="Efren" OccuLastName="Cerda" Phone2No="(832) 693-9448" ResponsibleFlag="Responsible" /></Lease><Lease PropertyId="21M" BldgID="00" UnitID="0908" ResiID="2" occustatuscode="P" occustatuscodedescription="Previous" MoveInDate="2016-02-20T00:00:00" MoveOutDate="2016-04-25T00:00:00" LeaseBeginDate="2016-02-20T00:00:00" LeaseEndDate="2017-02-28T00:00:00" MktgSource="PW" PrimaryEmail="noemail1#fake.com"><Occupant PropertyId="21M" BldgID="00" UnitID="0908" ResiID="2" OccuSeqNo="3451301" OccuFirstName="Donna" OccuLastName="Mclean" Phone2No="(713) 785-4240" ResponsibleFlag="Responsible" /></Lease></PropertyResidents></GetResidentsContactInfoResult>
</GetResidentsContactInfoResponse>
</soap:Body>
</soap:Envelope>
This uses Nokogiri to find all the GetResidentsContactInfoResponse elements, and then Active Support to convert the inner text to a hash of key-value pairs.
Read "sparklemotion/nokogiri" and "Tutorials" regarding installing and using Nokogiri.
Read "Active Support Core Extensions" about more capabilities of Active Support (though the guide does not include Hash.from_xml). To install it simply do gem install activesupport.
I assume you're fine with Nokogiri as you mentioned it in your question.
If you don't want to use Active Support, consider looking into "Convert a Nokogiri document to a Ruby Hash" as an alternative to the line Hash.from_xml(elm.text):
# Needed in order to use the `Hash.from_xml`
require 'active_support/core_ext/hash/conversions'
def find_key_values(str)
doc = Nokogiri::XML(str)
# Ignore namespaces for easier traversal
doc.remove_namespaces!
doc.css('GetResidentsContactInfoResponse').map do |elm|
Hash.from_xml(elm.text)
end
end
Usage:
# Option 1: if your XML above is stored in a variable called `string`
find_key_values string
# Option 2: if your XML above is stored in a file
find_key_values File.open('/path/to/file')
Which returns:
[{"PropertyResidents"=>
{"Lease"=>
[{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0903",
"ResiID"=>"3",
"occustatuscode"=>"P",
"occustatuscodedescription"=>"Previous",
"MoveInDate"=>"2016-01-07T00:00:00",
"MoveOutDate"=>"2016-02-06T00:00:00",
"LeaseBeginDate"=>"2016-01-07T00:00:00",
"LeaseEndDate"=>"2017-01-31T00:00:00",
"MktgSource"=>"DBY",
"PrimaryEmail"=>"noemail1#fake.com",
"Occupant"=>
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0903",
"ResiID"=>"3",
"OccuSeqNo"=>"3444755",
"OccuFirstName"=>"Efren",
"OccuLastName"=>"Cerda",
"Phone2No"=>"(832) 693-9448",
"ResponsibleFlag"=>"Responsible"}},
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0908",
"ResiID"=>"2",
"occustatuscode"=>"P",
"occustatuscodedescription"=>"Previous",
"MoveInDate"=>"2016-02-20T00:00:00",
"MoveOutDate"=>"2016-04-25T00:00:00",
"LeaseBeginDate"=>"2016-02-20T00:00:00",
"LeaseEndDate"=>"2017-02-28T00:00:00",
"MktgSource"=>"PW",
"PrimaryEmail"=>"noemail1#fake.com",
"Occupant"=>
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0908",
"ResiID"=>"2",
"OccuSeqNo"=>"3451301",
"OccuFirstName"=>"Donna",
"OccuLastName"=>"Mclean",
"Phone2No"=>"(713) 785-4240",
"ResponsibleFlag"=>"Responsible"}}]}}]
This is my XML:
<?xml version="1.0"?>
<Revision>
<Revision Index="1"/>
</Revision>
When using DomDocument(ver 3) I could run the following(In VB6):
Set objNode = objDom.documentElement.selectSingleNode("//.[#Index = '1']")
This is to identify that this attribute exists with it's value.
Now after moving to DomDocument60(ver 6) this line fails.
How should I re-write it so it works with DomDocument60?
Thanks in advance
Doron
You could try using the XPath:
//Revision[#Index="1"]
This will select all Revision nodes in the document that have the attribute Index="1".
I have questions about libxml-ruby.
There is a xml file "sample.xml".
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://***" xmlns:r="http://???">
<sheetData>
<row><v>1</v></row>
</sheetData>
</worksheet>
I want to deal with nodes without specifying default namespace like below.
xml = XML::Document.file('sample.xml')
sheet_data = xml.find_first('sheetData')
Of course, I can do it like below.
NS = {
main: 'http://***',
r: 'http://???',
}
sheet_data = xml.find_first('main:sheetData', NS)
But I want to omit string of default namespace.
I tried some properties and methods belongs to XML::Namespace[s], but not effected.
And one more problem when I save a xml file.
ns = XML::Namespace.new(xml.root, 'main', 'http://***')
row = XML::Node.new('row', nil, ns)
sheet_data << row
xml.save("sample.xml")
Published like below.
<row><v>1</v></row>
<main:row/>
I want that it's omitted string of "main:".
So I do this, but it's really ugly.
open('sample.xml', 'wb') do |f|
f.write(xml.to_s.gsub(/(<\/?)main:/, '\1'))
end
Do you have any good idea?
I'm having a bit of trouble testing some XML generation using Nokogiri when the node is blank. I'm using Minitest to compare the generated XML string with a template fixture file. My test fails with the blank node as Minitest is comparing <Node></Node> with <Node />.
XML Generation
builder = Nokogiri::XML::Builder.new encoding: "UTF-8" do |xml|
xml.Header
xml.FileName #object.filename
end
Template file
This is the file I'm using as a fixture in my tests
<?xml version="1.0" encoding="UTF-8"?>
<Header/>
<FileName></FileName>
Minitest output
3) Failure:
--- expected
+++ actual
## -25,7 +25,7 ##
<Header />
- <FileName/>
+ <FileName></FileName>
As you can see, MiniTest is trying to compare a self-closing tag with a non-self-closing tag and making the test fail. Changing the fixture tag to a self-closing one results, strangely, in exactly the same error message.
It's because sometimes #object.filename is nil - if I have a blank XML node (as in xml.Header above) using a self-closing tag in my fixture works no problem.
I would use XML schema in this case:
def test_that_xml_data_conforms_to_schema
xml_data = ...
schema_data = ...
fragment = Nokogiri::XML.parse(xml_data)
schema = Nokogiri::XML::Schema(schema_data)
assert schema.valid?(fragment)
end
Following is the method used to write an entry to xml file
def write_entry(entry)
node = Nokogiri::XML::Node.new("url", #xml_document)
node["loc"]= entry[:url]
node["lastmod"]= entry[:lastmod].to_s
node["changefreq"] = entry[:frequency].to_s
node["priority"] = entry[:priority].to_s
node.to_xml
end
The entry looks like this:
<urlset>
<url loc="http:`enter code here`//www.experteer.co.uk/vacaturebank/banen/vacatures/xing-ag" lastmod="2011-11-23 16:58:27 UTC" changefreq="0.8" priority="monthly"/>
</urlset>
I want the entry of xml to be like this
<urlset>
<url>
<loc> http://www.experteer.co.uk/vacaturebank/banen/vacatures/xing-ag </loc>
<lastmod> 2011-11-23 16:58:27 UTC </lastmod>
<changefreq> 0.8 </changefreq>
<priority> monthly </priority>
</url>
</urlset>
Is it possible with using Node or I have to use Builder?
If possible with Node Then how?
and If I have to use Builder it writes header for each entry how can I handle that it dont write header for each entry.
you can use << or add_child to append children nodes to a node.
def write_entry(entry)
url = Nokogiri::XML::Node.new( "url" , #xml_document )
%w{loc lastmod changefreq priority}.each do |node|
url << Nokogiri::XML::Node.new( node, #xml_document ).tap do |n|
n.content = entry[ node.to_sym ]
end
end
url.to_xml
end
For this to work correctly, you have to change entry[:url] to entry[:loc]. and entry[:frequency] to entry[:changefreq], which shouldn't be a bad thing (it's best to have the same name for the same thing everywhere, isn't it ?).
Alternatively, if your entry hash only contains what you need to convert to xml, use entry.each do |key,value| instead of the array.