Testing Nokogiri XML generation with blank nodes - ruby

I'm having a bit of trouble testing some XML generation using Nokogiri when the node is blank. I'm using Minitest to compare the generated XML string with a template fixture file. My test fails with the blank node as Minitest is comparing <Node></Node> with <Node />.
XML Generation
builder = Nokogiri::XML::Builder.new encoding: "UTF-8" do |xml|
  xml.Header
  xml.FileName @object.filename
end
Template file
This is the file I'm using as a fixture in my tests
<?xml version="1.0" encoding="UTF-8"?>
<Header/>
<FileName></FileName>
Minitest output
3) Failure:
--- expected
+++ actual
@@ -25,7 +25,7 @@
<Header />
- <FileName/>
+ <FileName></FileName>
As you can see, Minitest is comparing a self-closing tag with a non-self-closing tag, which makes the test fail. Changing the fixture tag to a self-closing one results, strangely, in exactly the same error message.
This happens because @object.filename is sometimes nil. For a node that is always blank (such as xml.Header above), using a self-closing tag in my fixture works without a problem.

I would use XML schema in this case:
def test_that_xml_data_conforms_to_schema
xml_data = ...
schema_data = ...
fragment = Nokogiri::XML.parse(xml_data)
schema = Nokogiri::XML::Schema(schema_data)
assert schema.valid?(fragment)
end

Related

Parsing out contents of XML tag in Ruby

I have an XML document that, as I understand it, has already been parsed into tags. My goal is to extract all the information inside the <GetResidentsContactInfoResult> tag. In the sample XML below, this tag contains two records, each beginning with a Lease element whose first attribute is PropertyId. How can I iterate over the <GetResidentsContactInfoResult> tag and print out the key/value pairs for each record? I'm new to Ruby and to working with XML files; is this something I can do with Nokogiri?
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soap:Body>
<GetResidentsContactInfoResponse xmlns="http://tempuri.org/">
<GetResidentsContactInfoResult><PropertyResidents><Lease PropertyId="21M" BldgID="00" UnitID="0903" ResiID="3" occustatuscode="P" occustatuscodedescription="Previous" MoveInDate="2016-01-07T00:00:00" MoveOutDate="2016-02-06T00:00:00" LeaseBeginDate="2016-01-07T00:00:00" LeaseEndDate="2017-01-31T00:00:00" MktgSource="DBY" PrimaryEmail="noemail1@fake.com"><Occupant PropertyId="21M" BldgID="00" UnitID="0903" ResiID="3" OccuSeqNo="3444755" OccuFirstName="Efren" OccuLastName="Cerda" Phone2No="(832) 693-9448" ResponsibleFlag="Responsible" /></Lease><Lease PropertyId="21M" BldgID="00" UnitID="0908" ResiID="2" occustatuscode="P" occustatuscodedescription="Previous" MoveInDate="2016-02-20T00:00:00" MoveOutDate="2016-04-25T00:00:00" LeaseBeginDate="2016-02-20T00:00:00" LeaseEndDate="2017-02-28T00:00:00" MktgSource="PW" PrimaryEmail="noemail1@fake.com"><Occupant PropertyId="21M" BldgID="00" UnitID="0908" ResiID="2" OccuSeqNo="3451301" OccuFirstName="Donna" OccuLastName="Mclean" Phone2No="(713) 785-4240" ResponsibleFlag="Responsible" /></Lease></PropertyResidents></GetResidentsContactInfoResult>
</GetResidentsContactInfoResponse>
</soap:Body>
</soap:Envelope>
This uses Nokogiri to find all the GetResidentsContactInfoResponse elements, and then Active Support to convert the inner text to a hash of key-value pairs.
Read "sparklemotion/nokogiri" and "Tutorials" regarding installing and using Nokogiri.
Read "Active Support Core Extensions" about more capabilities of Active Support (though the guide does not include Hash.from_xml). To install it simply do gem install activesupport.
I assume you're fine with Nokogiri as you mentioned it in your question.
If you don't want to use Active Support, consider looking into "Convert a Nokogiri document to a Ruby Hash" as an alternative to the line Hash.from_xml(elm.text):
require 'nokogiri'
# Needed in order to use `Hash.from_xml`
require 'active_support/core_ext/hash/conversions'
def find_key_values(str)
doc = Nokogiri::XML(str)
# Ignore namespaces for easier traversal
doc.remove_namespaces!
doc.css('GetResidentsContactInfoResponse').map do |elm|
Hash.from_xml(elm.text)
end
end
Usage:
# Option 1: if your XML above is stored in a variable called `string`
find_key_values string
# Option 2: if your XML above is stored in a file
find_key_values File.open('/path/to/file')
Which returns:
[{"PropertyResidents"=>
{"Lease"=>
[{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0903",
"ResiID"=>"3",
"occustatuscode"=>"P",
"occustatuscodedescription"=>"Previous",
"MoveInDate"=>"2016-01-07T00:00:00",
"MoveOutDate"=>"2016-02-06T00:00:00",
"LeaseBeginDate"=>"2016-01-07T00:00:00",
"LeaseEndDate"=>"2017-01-31T00:00:00",
"MktgSource"=>"DBY",
"PrimaryEmail"=>"noemail1@fake.com",
"Occupant"=>
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0903",
"ResiID"=>"3",
"OccuSeqNo"=>"3444755",
"OccuFirstName"=>"Efren",
"OccuLastName"=>"Cerda",
"Phone2No"=>"(832) 693-9448",
"ResponsibleFlag"=>"Responsible"}},
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0908",
"ResiID"=>"2",
"occustatuscode"=>"P",
"occustatuscodedescription"=>"Previous",
"MoveInDate"=>"2016-02-20T00:00:00",
"MoveOutDate"=>"2016-04-25T00:00:00",
"LeaseBeginDate"=>"2016-02-20T00:00:00",
"LeaseEndDate"=>"2017-02-28T00:00:00",
"MktgSource"=>"PW",
"PrimaryEmail"=>"noemail1@fake.com",
"Occupant"=>
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0908",
"ResiID"=>"2",
"OccuSeqNo"=>"3451301",
"OccuFirstName"=>"Donna",
"OccuLastName"=>"Mclean",
"Phone2No"=>"(713) 785-4240",
"ResponsibleFlag"=>"Responsible"}}]}}]

Nokogiri removing xml encoding

I am using Nokogiri to decode some XML. This XML has some HTML as values. I am seeing some strange behavior when parsing it. It appears Nokogiri is removing some of the HTML-encoded tags, so when I parse the HTML I am unable to decode it properly. See the examples below:
doc = Nokogiri::XML '<?xml version="1.0"?><manifest
xmlns="http://www.imsglobal.org/xsd/imscp_v1p1"
identifier="Manifest-eaf97d26-aa83-4399-8e9b-ae9f6f5fc6a2"
xmlns="http://www.imsglobal.org/xsd/imscp_v1p1"
xmlns:imsmd="http://www.imsglobal.org/xsd/imsmd_v1p2"
xmlns:imsqti="http://www.imsglobal.org/xsd/imsqti_v2p1">
<imsmd:langstring><p>
 These are the<strong>instructions</strong> for the pool</p></imsmd:langstring>'
this yields the following value:
"<?xml version=\"1.0\"?>\n<manifest xmlns=\"http://www.imsglobal.org/xsd/imscp_v1p1\" xmlns:imsmd=\"http://www.imsglobal.org/xsd/imsmd_v1p2\" xmlns:imsqti=\"http://www.imsglobal.org/xsd/imsqti_v2p1\" identifier=\"Manifest-eaf97d26-aa83-4399-8e9b-ae9f6f5fc6a2\">\n<imsmd:langstring>p
 These are thestrong instructions/strong for the pool/p</imsmd:langstring></manifest>\n"
Notice how the < > tags are missing. However the following works as expected.
doc = Nokogiri::XML '<?xml version="1.0"?><imsmd:langstring><p>
 These are the<strong> instructions</strong> for the pool</p></imsmd:langstring>'
and gives the following result
"<?xml version=\"1.0\"?>\n<imsmd:langstring><p>
 These are the<strong> instructions</strong> for the pool</p></imsmd:langstring>\n"
I am sure I am missing something but can't figure out what is causing this.

Why do I get errors with XML modified using Nokogiri?

I am having problems understanding Net::HTTP and Nokogiri.
I have a large number of jobs on my Jenkins server. I have to periodically update the branch name on these jobs. Doing it from the UI is a cumbersome process so I decided to update the Jenkins config.xml.
I use Nokogiri to parse the XML, traverse the XPath and update the value of the node. However, when I try to post the updated XML back to Jenkins, I get a 500 error saying:
Caused by: javax.xml.transform.TransformerException: org.xml.sax.SAXParseExceptionpublicId: -//W3C//DTD HTML 4.0 Transitional//EN; systemId: http://www.w3.org/TR/REC-html40/loose.dtd; lineNumber: 31; columnNumber: 3; The declaration for the entity "HTML.Version" must end with '>'.
Here is what I am doing:
require "net/http"
require "nokogiri"
uri = URI.parse("http://jenkins.my.domain.web:8080")
http = Net::HTTP.new(uri.host, uri.port)
getQueueRequest = Net::HTTP::Get.new("http://jenkins.my.domain.web:8080/my/job/location/config.xml")
getQueue = http.request(getQueueRequest)
xml_doc = Nokogiri::HTML(getQueue.body)
# Get current branch name
branch_name=xml_doc.at_xpath('//hudson.plugins.git.branchspec/name')
# Get new branch name
print "Enter new branch name "
user_input = gets.chomp
new_branch_name = user_input.downcase
# Set branch name and create xml
branch_name.content=new_branch_name
new_config_xml=xml_doc.to_xml
puts "Logging into Jenkins"
update_branch = Net::HTTP::Post.new("http://jenkins.my.domain.web:8080/my/job/location/config.xml")
update_branch.basic_auth 'username', 'password'
update_branch.body = new_config_xml
response = http.request(update_branch)
puts response.body
I understand it might have to do something with the XML that is getting added to request body but I am not sure how to fix the issue.
Original XML:
<?xml version='1.0' encoding='UTF-8'?>
<maven2-moduleset plugin="maven-plugin@1.504">
<actions/>
<description></description>
<keepDependencies>false</keepDependencies>
<properties>
<hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
<maxConcurrentPerNode>0</maxConcurrentPerNode>
<maxConcurrentTotal>0</maxConcurrentTotal>
<categories/>
<throttleEnabled>false</throttleEnabled>
<throttleOption>project</throttleOption>
<configVersion>1</configVersion>
</hudson.plugins.throttleconcurrents.ThrottleJobProperty>
</properties>
<scm class="hudson.plugins.git.GitSCM" plugin="git@1.4.0">
<configVersion>2</configVersion>
<userRemoteConfigs>
<hudson.plugins.git.UserRemoteConfig>
<name></name>
<refspec></refspec>
<url>git@github.com:<ORG_NAME>/<REPO_NAME>.git</url>
</hudson.plugins.git.UserRemoteConfig>
</userRemoteConfigs>
<branches>
<hudson.plugins.git.BranchSpec>
<name>release</name>
</hudson.plugins.git.BranchSpec>
</branches>
<disableSubmodules>false</disableSubmodules>
<recursiveSubmodules>false</recursiveSubmodules>
<doGenerateSubmoduleConfigurations>false</doGenerateSubmoduleConfigurations>
<authorOrCommitter>false</authorOrCommitter>
<clean>false</clean>
<wipeOutWorkspace>false</wipeOutWorkspace>
<pruneBranches>false</pruneBranches>
<remotePoll>false</remotePoll>
<ignoreNotifyCommit>false</ignoreNotifyCommit>
<useShallowClone>false</useShallowClone>
<buildChooser class="hudson.plugins.git.util.DefaultBuildChooser"/>
<gitTool>Default</gitTool>
<submoduleCfg class="list"/>
<relativeTargetDir></relativeTargetDir>
<reference></reference>
<excludedRegions></excludedRegions>
<excludedUsers></excludedUsers>
<gitConfigName></gitConfigName>
<gitConfigEmail></gitConfigEmail>
<skipTag>false</skipTag>
<includedRegions></includedRegions>
<scmName></scmName>
</scm>
<canRoam>true</canRoam>
<disabled>false</disabled>
<blockBuildWhenDownstreamBuilding>false</blockBuildWhenDownstreamBuilding>
<blockBuildWhenUpstreamBuilding>false</blockBuildWhenUpstreamBuilding>
<triggers class="vector">
<hudson.triggers.TimerTrigger>
<spec>0 22 * * 4</spec>
</hudson.triggers.TimerTrigger>
</triggers>
<concurrentBuild>false</concurrentBuild>
<rootModule>
<groupId>com.org.project.test</groupId>
<artifactId>functest</artifactId>
</rootModule>
<goals>clean verify -Dtestsuite=<test_suite_name> -Dbrowser=chrome -Dipaddress=http://<IP_ADDRESS>:4444/wd/hub</goals>
<mavenName>apache-maven-3.0.4</mavenName>
<aggregatorStyleBuild>true</aggregatorStyleBuild>
<incrementalBuild>false</incrementalBuild>
<perModuleEmail>true</perModuleEmail>
<ignoreUpstremChanges>false</ignoreUpstremChanges>
<archivingDisabled>false</archivingDisabled>
<resolveDependencies>false</resolveDependencies>
<processPlugins>false</processPlugins>
<mavenValidationLevel>-1</mavenValidationLevel>
<runHeadless>false</runHeadless>
<disableTriggerDownstreamProjects>false</disableTriggerDownstreamProjects>
<settings class="jenkins.mvn.DefaultSettingsProvider"/>
<globalSettings class="jenkins.mvn.DefaultGlobalSettingsProvider"/>
<reporters/>
<publishers/>
<buildWrappers/>
<prebuilders/>
<postbuilders/>
<runPostStepsIfResult>
<name>FAILURE</name>
<ordinal>2</ordinal>
<color>RED</color>
</runPostStepsIfResult>
</maven2-moduleset>
After Editing and Massaging:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<?xml version="1.0" encoding="UTF-8"?>
<html>
<body>
<maven2-moduleset plugin="maven-plugin@1.504">
<actions />
<description />
<keepdependencies>false</keepdependencies>
<properties>
<hudson.plugins.throttleconcurrents.throttlejobproperty plugin="throttle-concurrents@1.7.2">
<maxconcurrentpernode>0</maxconcurrentpernode>
<maxconcurrenttotal>0</maxconcurrenttotal>
<categories />
<throttleenabled>false</throttleenabled>
<throttleoption>project</throttleoption>
<configversion>1</configversion>
</hudson.plugins.throttleconcurrents.throttlejobproperty>
</properties>
<scm class="hudson.plugins.git.GitSCM" plugin="git@1.4.0">
<configversion>2</configversion>
<userremoteconfigs>
<hudson.plugins.git.userremoteconfig>
<name />
<refspec />
<url>git@github.com:<ORG_NAME>/<REPO_NAME>.git</url>
</hudson.plugins.git.userremoteconfig>
</userremoteconfigs>
<branches>
<hudson.plugins.git.branchspec>
<name>master</name>
</hudson.plugins.git.branchspec>
</branches>
<disablesubmodules>false</disablesubmodules>
<recursivesubmodules>false</recursivesubmodules>
<dogeneratesubmoduleconfigurations>false</dogeneratesubmoduleconfigurations>
<authororcommitter>false</authororcommitter>
<clean>false</clean>
<wipeoutworkspace>false</wipeoutworkspace>
<prunebranches>false</prunebranches>
<remotepoll>false</remotepoll>
<ignorenotifycommit>false</ignorenotifycommit>
<useshallowclone>false</useshallowclone>
<buildchooser class="hudson.plugins.git.util.DefaultBuildChooser" />
<gittool>Default</gittool>
<submodulecfg class="list" />
<relativetargetdir />
<reference />
<excludedregions />
<excludedusers />
<gitconfigname />
<gitconfigemail />
<skiptag>false</skiptag>
<includedregions />
<scmname />
</scm>
<canroam>true</canroam>
<disabled>false</disabled>
<blockbuildwhendownstreambuilding>false</blockbuildwhendownstreambuilding>
<blockbuildwhenupstreambuilding>false</blockbuildwhenupstreambuilding>
<triggers class="vector">
<hudson.triggers.timertrigger>
<spec>0 22 * * 4</spec>
</hudson.triggers.timertrigger>
</triggers>
<concurrentbuild>false</concurrentbuild>
<rootmodule>
<groupid>com.org.project.test</groupid>
<artifactid>functest</artifactid>
</rootmodule>
<goals>clean verify -Dtestsuite=<test_suite_name> -Dbrowser=chrome -Dipaddress=http://<IP_ADDRESS>:4444/wd/hub</goals>
<mavenname>apache-maven-3.0.4</mavenname>
<aggregatorstylebuild>true</aggregatorstylebuild>
<incrementalbuild>false</incrementalbuild>
<permoduleemail>true</permoduleemail>
<ignoreupstremchanges>false</ignoreupstremchanges>
<archivingdisabled>false</archivingdisabled>
<resolvedependencies>false</resolvedependencies>
<processplugins>false</processplugins>
<mavenvalidationlevel>-1</mavenvalidationlevel>
<runheadless>false</runheadless>
<disabletriggerdownstreamprojects>false</disabletriggerdownstreamprojects>
<settings class="jenkins.mvn.DefaultSettingsProvider" />
<globalsettings class="jenkins.mvn.DefaultGlobalSettingsProvider" />
<reporters />
<publishers />
<buildwrappers />
<prebuilders />
<postbuilders />
<runpoststepsifresult>
<name>FAILURE</name>
<ordinal>2</ordinal>
<color>RED</color>
</runpoststepsifresult>
</maven2-moduleset>
</body>
</html>
When you use Nokogiri::HTML(some_html) or Nokogiri::XML(some_xml), Nokogiri will look to see if the content is valid. If it isn't, it will do fix-ups on the content in an attempt to make it so. For instance:
require 'nokogiri'
html_fragment = "<p>foo bar</p>"
Nokogiri::HTML(html_fragment).to_html
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><p>foo bar</p></body></html>\n"
If the document is partially correct Nokogiri still adds the DOCTYPE statement:
html = "<html><body><p>foo bar</p></body></html>"
Nokogiri::HTML(html).to_html
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><p>foo bar</p></body></html>\n"
If you want Nokogiri to leave the document alone, because it's supposed to be a fragment, tell it to do so:
Nokogiri::HTML::DocumentFragment.parse(html_fragment).to_html
# => "<p>foo bar</p>"
Or:
xml_fragment = "<x>foo bar</x>"
Nokogiri::XML::DocumentFragment.parse(xml_fragment).to_xml
# => "<x>foo bar</x>"
Nokogiri is pretty smart about handling XML and HTML. You can try to confuse it and it'll generally do the right thing:
xml_fragment = "<x>foo bar</x>"
Nokogiri::HTML::DocumentFragment.parse(xml_fragment).to_xml
# => "<x>foo bar</x>"
That's parsing XML as an HTML fragment and telling it to emit it as XML.
Now, that all said, it's pretty obvious Nokogiri isn't doing anything mysterious, so, here's how to fix the problem. First, parse it as XML so Nokogiri doesn't think it should add the HTML DOCTYPE declaration, then, if the XML is syntactically correct, tell Nokogiri it's OK to parse it as a complete document:
require 'nokogiri'
xml = %{<?xml version='1.0' encoding='UTF-8'?>
<maven2-moduleset plugin="maven-plugin@1.504">
<actions/>
<description></description>
<keepDependencies>false</keepDependencies>
<properties>
<hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
</hudson.plugins.throttleconcurrents.ThrottleJobProperty>
</properties>
</maven2-moduleset>
}
puts Nokogiri::XML.parse(xml).to_xml
# >> <?xml version="1.0" encoding="UTF-8"?>
# >> <maven2-moduleset plugin="maven-plugin@1.504">
# >> <actions/>
# >> <description/>
# >> <keepDependencies>false</keepDependencies>
# >> <properties>
# >> <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
# >> </hudson.plugins.throttleconcurrents.ThrottleJobProperty>
# >> </properties>
# >> </maven2-moduleset>
Or as a fragment, which, because it's complete, will result in the same thing:
puts Nokogiri::XML::DocumentFragment.parse(xml).to_xml
# >> <?xml version='1.0' encoding='UTF-8'?>
# >> <maven2-moduleset plugin="maven-plugin@1.504">
# >> <actions/>
# >> <description/>
# >> <keepDependencies>false</keepDependencies>
# >> <properties>
# >> <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
# >> </hudson.plugins.throttleconcurrents.ThrottleJobProperty>
# >> </properties>
# >> </maven2-moduleset>
Instead of using Net::HTTP, which provides only the bare building blocks for HTTP, I'd recommend looking at something a bit higher-level, like HTTPClient. Here's code that is similar to yours:
require 'httpclient'
require 'nokogiri'
URL = 'http://jenkins.my.domain.web:8080/my/job/location/config.xml'
http_client = HTTPClient.new
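# Parse as XML so Nokogiri keeps the tag case and adds no HTML wrapper
xml_doc = Nokogiri::XML(
  http_client.get_content(URL)
)
# Get current branch name. XPath is used because a CSS selector would
# read the dots in the element name as class selectors:
branch_name = xml_doc.at_xpath('//hudson.plugins.git.BranchSpec/name')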
# Get new branch name
print 'Enter new branch name '
new_branch_name = gets.chomp.downcase
# Set branch name and create xml
branch_name.content = new_branch_name
puts 'Logging into Jenkins'
# The first argument is the URL (or URL prefix) the credentials apply to
http_client.set_auth(URL, 'user', 'password')
response = http_client.post(URL, :body => xml_doc.to_xml)
I can't test it but it looks close.
I now find myself in another dilemma. The methods that allow moving to elements and editing values, like at_xpath and at_css, seem to work only with Nokogiri::HTML or Nokogiri::HTML::DocumentFragment. They don't work when I use Nokogiri::XML. Using Nokogiri::HTML changes the case of the tags: <keepDependencies>false</keepDependencies> becomes <keepdependencies>false</keepdependencies>. Jenkins does accept the xml with changed case of tags. The to_html and to_xml methods basically return a string, so I cannot use the xpath or css methods on their result to navigate the XML tree. Is there a way around this?
The at methods work with both XML and HTML and allow CSS and XPath selectors; everything inside Nokogiri is really XML-based.
Nokogiri folds HTML tags to lower-case because HTML is case-insensitive, so at expects a lower-case value when dealing with HTML. XML is case-sensitive, so Nokogiri leaves the tag case alone, and at requires you to use the correct case when using CSS.
This is documented in the Nokogiri docs:
Note that the CSS query string is case-sensitive with regards to your document type. That is, if you’re looking for “H1” in an HTML document, you’ll never find anything, since HTML tags will match only lowercase CSS queries. However, “H1” might be found in an XML document, where tags names are case-sensitive (e.g., “H1” is distinct from “h1”).
When you are parsing the XML you are receiving from the service, you are declaring it as HTML:
xml_doc = Nokogiri::HTML(getQueue.body)
And this appears to cause Nokogiri to add HTML nodes.
Try parsing it as XML instead:
xml_doc = Nokogiri::XML(getQueue.body)

Get value of XML attribute with namespace

I'm parsing a pptx file and ran into an issue. This is a sample of the source XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<p:presentation xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<p:sldMasterIdLst>
<p:sldMasterId id="2147483648" r:id="rId2"/>
</p:sldMasterIdLst>
<p:sldIdLst>
<p:sldId id="256" r:id="rId3"/>
</p:sldIdLst>
<p:sldSz cx="10080625" cy="7559675"/>
<p:notesSz cx="7772400" cy="10058400"/>
</p:presentation>
I need to get the r:id attribute value in the sldMasterId tag.
doc = Nokogiri::XML(path_to_pptx)
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId').attr('id').value
returns 2147483648 but I need rId2, which is the r:id attribute value.
I found the attribute_with_ns(name, namespace) method, but
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId').attribute_with_ns('id', 'r')
returns nil.
You can reference the namespace of attributes in your xpath the same way you reference element namespaces:
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId/@r:id')
If you want to use attribute_with_ns, you need to use the actual namespace, not just the prefix:
doc.at_xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId')
.attribute_with_ns('id', "http://schemas.openxmlformats.org/officeDocument/2006/relationships")
http://nokogiri.org/Nokogiri/XML/Node.html#method-i-attributes
If you need to distinguish attributes with the same name but different namespaces, use attribute_nodes instead.
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId').each do |element|
  # Print only the attributes that live in the namespace bound to the "r" prefix
  element.attribute_nodes.each do |node|
    puts node if node.namespace && node.namespace.prefix == "r"
  end
end

How to replace a particular line in xml with the new one in ruby

I have a requirement where I need to replace an element's value with a new one, and I don't want any other modification to be made to the file.
<mtn:test-case title='Power-Consist-Message'>
<mtn:messages>
<mtn:message sequence='4' correlation-key='0x0F04'>
<mtn:header>
<mtn:protocol-version>0x4</mtn:protocol-version>
<mtn:message-type>0x0F04</mtn:message-type>
<mtn:message-version>0x01</mtn:message-version>
<mtn:gmt-time-switch>false</mtn:gmt-time-switch>
<mtn:crc-calc-switch>1</mtn:crc-calc-switch>
<mtn:encrypt-switch>false</mtn:encrypt-switch>
<mtn:compress-switch>false</mtn:compress-switch>
<mtn:ttl>999</mtn:ttl>
<mtn:qos-class-of-service>0</mtn:qos-class-of-service>
<mtn:qos-priority>2</mtn:qos-priority>
<mtn:qos-network-preference>1</mtn:qos-network-preference>
This is what the XML file looks like. I want to replace 999 with "some other value" in the <mtn:ttl> element, but when I do that using a formatter in Ruby, some other unwanted modifications take place. The code I am using is as follows:
File.open(ENV['CadPath1']+ "conf\\cad-mtn-config.xml") do |config_file|
# Open the document and edit the file
config = Document.new(config_file)
testField=config.root.elements[4].elements[11].elements[1].elements[1].elements[1].elements[11]
if testField.to_s.match(/<mtn:qos-network-preference>/)
test=config.root.elements[4].elements[11].elements[1].elements[1].elements[1].elements[8].text="2"
# Write the result to a new file.
formatter = REXML::Formatters::Default.new
File.open(ENV['CadPath1']+ "conf\\cad-mtn-config.xml", 'w') do |result|
formatter.write(config, result)
end
end
end
When I write the modifications to the new file, the XML file size changes from 79kb to 78kb. Is there any way to replace just that particular line in the XML file and save the change without affecting the rest of the file?
Please let me know soon...
I prefer Nokogiri as my XML/HTML parser of choice:
require 'nokogiri'
xml =<<EOT
<mtn:test-case title='Power-Consist-Message'>
<mtn:messages>
<mtn:message sequence='4' correlation-key='0x0F04'>
<mtn:header>
<mtn:protocol-version>0x4</mtn:protocol-version>
<mtn:message-type>0x0F04</mtn:message-type>
<mtn:message-version>0x01</mtn:message-version>
<mtn:gmt-time-switch>false</mtn:gmt-time-switch>
<mtn:crc-calc-switch>1</mtn:crc-calc-switch>
<mtn:encrypt-switch>false</mtn:encrypt-switch>
<mtn:compress-switch>false</mtn:compress-switch>
<mtn:ttl>999</mtn:ttl>
<mtn:qos-class-of-service>0</mtn:qos-class-of-service>
<mtn:qos-priority>2</mtn:qos-priority>
<mtn:qos-network-preference>1</mtn:qos-network-preference>
EOT
Notice that the XML is malformed, i.e., it doesn't terminate correctly.
doc = Nokogiri::XML(xml)
I'm using CSS accessors to find the ttl node. Because of some magic, Nokogiri's CSS ignores XML namespaces, simplifying finding nodes.
doc.at('ttl').content = '1000'
puts doc.to_xml
# >> <?xml version="1.0"?>
# >> <test-case title="Power-Consist-Message">
# >> <messages>
# >> <message sequence="4" correlation-key="0x0F04">
# >> <header>
# >> <protocol-version>0x4</protocol-version>
# >> <message-type>0x0F04</message-type>
# >> <message-version>0x01</message-version>
# >> <gmt-time-switch>false</gmt-time-switch>
# >> <crc-calc-switch>1</crc-calc-switch>
# >> <encrypt-switch>false</encrypt-switch>
# >> <compress-switch>false</compress-switch>
# >> <ttl>1000</ttl>
# >> <qos-class-of-service>0</qos-class-of-service>
# >> <qos-priority>2</qos-priority>
# >> <qos-network-preference>1</qos-network-preference>
# >> </header></message></messages></test-case>
Notice that Nokogiri replaced the content of the ttl node. It also stripped the XML namespace info because the document didn't declare it correctly, and, finally, Nokogiri has added closing tags to make the document syntactically correct.
If you want the namespace to be declared in the output, you'll need to make sure it's there in the input.
If you just need to literally replace that value without affecting anything else about the XML file, even if (as pointed out by the Tin Man above) that means leaving the original XML file malformed, you can do that with direct string manipulation using a regular expression.
Assuming there is guaranteed to only be one <mtn:ttl> tag in your XML document, you could just do:
doc = IO.read("somefile.xml")
# %r{} avoids having to escape the slash in the closing tag
doc.sub!(%r{<mtn:ttl>.+?</mtn:ttl>}, "<mtn:ttl>some other value</mtn:ttl>")
File.open("somefile.xml", "w") { |fh| fh.write(doc) }
If there might be more than one <mtn:ttl> tag, then this is trickier; how much trickier depends on how you want to figure out which tag(s) to change.
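One possible way to handle several <mtn:ttl> tags, assuming you know the position of the one you want (counting from zero in document order), is to count matches while substituting. The method name replace_nth_ttl is made up for this sketch, and choosing the tag by position is only one of several strategies.

```ruby
# Replace the text of the nth <mtn:ttl> element (0-based) and leave every
# other character of the input untouched.
def replace_nth_ttl(xml_text, index, new_value)
  seen = -1
  xml_text.gsub(%r{(<mtn:ttl>)(.*?)(</mtn:ttl>)}m) do
    seen += 1
    # $& is the whole match; returning it keeps non-target tags as they were.
    seen == index ? "#{$1}#{new_value}#{$3}" : $&
  end
end

SAMPLE_TTL = "<mtn:header><mtn:ttl>999</mtn:ttl></mtn:header>\n" \
             "<mtn:header><mtn:ttl>999</mtn:ttl></mtn:header>\n"

puts replace_nth_ttl(SAMPLE_TTL, 1, '1000')
```

Because this works on the raw string, quoting style, whitespace, and any malformed parts of the file are written back exactly as they were read.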

Resources