Can anyone tell me how I can comment out the following line in my XML file using Ruby?
I hope this can be done by using "nokogiri".
<message group="1" sub_group="1" type="none" destination="mydata" remark="mylist" userOnly="true "/>
output should be:
<!-- <message group="1" sub_group="1" type="none" destination="mydata" remark="mylist" userOnly="true "/> -->
You can search your document with the search method, add a comment with Comment.new and then remove the original line with the remove method.
Nokogiri::XML::Comment.new(doc, node.to_s)
Class: Nokogiri::XML::Comment
Edit:
I implemented an example, but used replace instead of remove:
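require 'nokogiri'
# Parse the file; File.read avoids leaving an open file handle behind
x = Nokogiri::XML(File.read('./config.xml'))
x.search('message').each do |el|
  puts el.to_s
  # Wrap the serialized element in a comment node and swap it in
  c = Nokogiri::XML::Comment.new(x, el.to_s)
  el.replace(c)
end
File.write('./config.xml', x.to_xml)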
Related
I want to parse multiple like-formatted XML files into a CSV file.
I searched on Google, nokogiri.org, and on SO but I haven't been able to find an answer.
I have ten XML files in identical format in terms of node/element structure, that reside in the current directory.
After combining the XML files into a single XML file, I need to pull out specific elements of the advisory node. I would like to output the link, title, location, os -> language -> name, and reference -> name data to the CSV file.
My code is only able to parse a single XML document and I'd like it to take into account 1:many:
# Parse the XML file into a Nokogiri::XML::Document object
@doc = Nokogiri::XML(File.open("file.xml"))
# Gather the 5 specific XML elements out of the 'advisory' top-level node
data = @doc.search('advisory').map { |adv|
[
adv.at('link').content,
adv.at('title').content,
adv.at('location').content,
adv.at('os > language > name').content,
adv.at('reference > name').content
]
}
# Loop through each array element in the object and write out as CSV row
CSV.open('output_file.csv', 'wb') do |csv|
# Explicitly set headers until you figure out how to get them programatically
csv << ['Link', 'Title', 'Location', 'OS Name', 'Reference Name']
data.each do |row|
csv << row
end
end
I tried changing the code to support multiple XML files and get them into Nokogiri::XML::Document objects:
xml_docs = []
Dir.glob("*.xml").each do |file|
xml = Nokogiri::XML(File.new(file))
xml_docs << Nokogiri::XML::Document.new(xml)
end
This successfully creates an array xml_docs with the correct objects in it, but I don't know how to convert these six objects into a single object.
This is sample XML. All XML files use the same node/element structure:
<advisories>
<title> Not relevant </title>
<customer> N/A </customer>
<advisory id="12345">
<link> https://www.google.com </link>
<release_date>2016-04-07</release_date>
<title> The Short Description Would Go Here </title>
<location> Location Name Here </location>
<os>
<product>
<id>98765</id>
<name>Product Name</name>
</product>
<language>
<id>123</id>
<name>en</name>
</language>
</os>
<reference>
<id>00029</id>
<name>Full</name>
<area>Not Defined</area>
</reference>
</advisory>
<advisory id="98765">
<link> https://www.msn.com </link>
<release_date>2016-04-08</release_date>
<title> The Short Description Would Go Here </title>
<location> Location Name Here </location>
<os>
<product>
<id>12654</id>
<name>Product Name</name>
</product>
<language>
<id>126</id>
<name>fr</name>
</language>
</os>
<reference>
<id>00052</id>
<name>Partial</name>
<area>Defined</area>
</reference>
</advisory>
</advisories>
The code leverages Nokogiri::XML::Document but if Nokogiri::XML::Builder will work better for this, I am more than willing to adjust my code accordingly.
I'd handle the first part, of parsing one XML file, like this:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<advisories>
<advisory id="12345">
<link> https://www.google.com </link>
<title> The Short Description Would Go Here </title>
<location> Location Name Here </location>
<os>
<language>
<name>en</name>
</language>
</os>
<reference>
<name>Full</name>
</reference>
</advisory>
<advisory id="98765">
<link> https://www.msn.com </link>
<release_date>2016-04-08</release_date>
<title> The Short Description Would Go Here </title>
<location> Location Name Here </location>
<os>
<language>
<name>fr</name>
</language>
</os>
<reference>
<name>Partial</name>
</reference>
</advisory>
</advisories>
EOT
Note: This has nodes removed because they weren't important to the question. Please remove fluff when asking as it's distracting.
With this being the core of the code:
doc.search('advisory').map{ |advisory|
link = advisory.at('link').text
title = advisory.at('title').text
location = advisory.at('location').text
os_language_name = advisory.at('os > language > name').text
reference_name = advisory.at('reference > name').text
{
link: link,
title: title,
location: location,
os_language_name: os_language_name,
reference_name: reference_name
}
}
That could be DRY'd but was written as an example of what to do.
Running that results in an array of hashes, which would be easily output via CSV:
# => [
{:link=>" https://www.google.com ", :title=>" The Short Description Would Go Here ", :location=>" Location Name Here ", :os_language_name=>"en", :reference_name=>"Full"},
{:link=>" https://www.msn.com ", :title=>" The Short Description Would Go Here ", :location=>" Location Name Here ", :os_language_name=>"fr", :reference_name=>"Partial"}
]
Once you've got that working then fit it into a modified version of your loops to output CSV and read the XML files. This is untested but looks about right:
CSV.open('output_file.csv', 'w',
headers: ['Link', 'Title', 'Location', 'OS Name', 'Reference Name'],
write_headers: true
) do |csv|
Dir.glob("*.xml").each do |file|
xml = Nokogiri::XML(File.read(file))
# parse a file and get the array of hashes
end
# pass the array of hashes to CSV for output
end
Note that you were using a file mode of 'wb'. You rarely need b with CSV as CSV is supposed to be a text format. If you are sure you will encounter binary data then use 'b' also, but that could lead down a path containing dragons.
Also note that this is using read. read is not scalable, which means it doesn't care how big a file is; it's going to try to read it into memory, whether or not it'll actually fit. There are lots of reasons to avoid that, but the best is that it'll bring your program to its knees. If your XML files could exceed the available free memory for your system then you'll want to rewrite using a SAX parser, which Nokogiri supports. How to do that is a different question.
It was actually an array of arrays of hashes. I'm not sure how I ended up there but I was easily able to use array.flatten
Meditate on this:
foo = [] # => []
foo << [{}] # => [[{}]]
foo.flatten # => [{}]
You probably wanted to do this:
foo = [] # => []
foo += [{}] # => [{}]
Any time I have to use flatten I look to see if I can create the array without it being an array of arrays of something. It's not that they're inherently bad, because sometimes they're very useful, but you really wanted an array of hashes so you knew something was wrong and flatten was a cheap way out, but using it also costs more CPU time. It's better to figure out the problem and fix it and end up with faster/more efficient code. (And some will say that's a wasted effort or is premature optimization, but writing efficient code is a very good trait and goal.)
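A small sketch of the difference, using made-up data: appending each batch with << nests the arrays, while concat or flat_map builds the flat array of hashes directly, so no flatten is needed afterwards.

```ruby
# Hypothetical per-file results: each inner array stands for one parsed file
batches = [[{ id: 1 }, { id: 2 }], [{ id: 3 }]]

# << pushes each batch as a single element, giving an array of arrays
nested = []
batches.each { |batch| nested << batch }

# concat appends the elements of each batch instead
flat = []
batches.each { |batch| flat.concat(batch) }

# flat_map does the same in one step
mapped = batches.flat_map { |batch| batch }

p nested # => [[{:id=>1}, {:id=>2}], [{:id=>3}]] (hash inspect format varies by Ruby version)
p flat
p mapped
```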
How do I find a text() node in XmlConfig and use it to remove the parent node set? All the examples I have seen just find and remove the 'found' node, not the parent nodes.
My understanding is that VerifyPath is the XPath that finds the matching node and ElementPath is the path of the nodes to remove. However, it's not working at all.
Is text() supported? I have tried [\[]*='ATrigger'[\]] and [\[].='ATrigger'[\]] but still no luck.
<util:XmlConfig Id="RemoveATriggerCompletely" File="[#QuartzXmlJob]" Sequence="104" Action="delete" On ="install" Node="element"
ElementPath="//job-scheduling-data/schedule/"
VerifyPath="//job-scheduling-data/schedule/trigger/cron/name[\[]text()='ATrigger'[\]]"/>
Given the following XML
<job-scheduling-data xmlns="http://quartznet.sourceforge.net/JobSchedulingData" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.0">
<schedule>
<trigger>
<cron>
<name>ATrigger</name>
<group>default</group>
<description>Every 2 minutes</description>
<job-name>ATriggerJob</job-name>
<job-group>defaultGroup</job-group>
<misfire-instruction>SmartPolicy</misfire-instruction>
<!-- every 5mins -->
<cron-expression>2 * * * * ?</cron-expression>
</cron>
</trigger>
<trigger>
<cron>
<name>BTrigger</name>
<group>default</group>
<description>Every 2 minutes</description>
<job-name>BTriggerJob</job-name>
<job-group>defaultGroup</job-group>
<misfire-instruction>SmartPolicy</misfire-instruction>
<!-- every 5mins -->
<cron-expression>2 * * * * ?</cron-expression>
</cron>
</trigger>
The output i require is
<job-scheduling-data xmlns="http://quartznet.sourceforge.net/JobSchedulingData" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.0">
<schedule>
<trigger>
<cron>
<name>BTrigger</name>
<group>default</group>
<description>Every 2 minutes</description>
<job-name>BTriggerJob</job-name>
<job-group>defaultGroup</job-group>
<misfire-instruction>SmartPolicy</misfire-instruction>
<!-- every 5mins -->
<cron-expression>2 * * * * ?</cron-expression>
</cron>
</trigger>
I have been banging my head against a wall for hours now so any help whatsoever is very much appreciated.
I'm not familiar with that "XmlConfig" task.
But I see two issues.
You need to alias the namespace and use that alias for the xpath "select".
I show the XmlPeek version below. Note I use "peanut", you can use any alias name you want.
Note the query now has "peanut:" in all the element names.
<!-- you do not need a namespace for this example, but I left it in for future reference -->
<XmlPeek Namespaces="&lt;Namespace Prefix='peanut' Uri='http://quartznet.sourceforge.net/JobSchedulingData'/&gt;"
XmlInputPath=".\Parameters.xml"
Query="//peanut:job-scheduling-data/peanut:schedule/peanut:trigger/peanut:cron/peanut:name[text()='ATrigger']/../..">
<Output TaskParameter="Result" ItemName="Peeked" />
</XmlPeek>
Note my "../.."
Once you find the correct element, finding its parent(s) is as simple as the ".." notation.
You'll need to figure out how to add the namespace and alias to your Task (XmlConfig)
APPEND:
http://sourceforge.net/p/wix/bugs/2384/
Hmmm...dealing with a namespace isn't trivial. I think you're having a similar issue to what is reported there.
APPEND:
"Inline" Namespacing ... (Yuck)
<XmlPeek
XmlInputPath=".\Parameters.xml"
Query="//*[local-name()='job-scheduling-data' and namespace-uri()='http://quartznet.sourceforge.net/JobSchedulingData']/*[local-name()='schedule' and namespace-uri()='http://quartznet.sourceforge.net/JobSchedulingData']/*[local-name()='trigger' and namespace-uri()='http://quartznet.sourceforge.net/JobSchedulingData']/*[local-name()='cron' and namespace-uri()='http://quartznet.sourceforge.net/JobSchedulingData']/*[local-name()='name' and namespace-uri()='http://quartznet.sourceforge.net/JobSchedulingData'][text()='ATrigger']/../..">
<Output TaskParameter="Result" ItemName="Peeked" />
</XmlPeek>
Aka, try this Xpath:
"//*[local-name()='job-scheduling-data' and namespace-uri()='http://quartznet.sourceforge.net/JobSchedulingData']/*[local-name()='schedule' and namespace-uri()='http://quartznet.sourceforge.net/JobSchedulingData']/*[local-name()='trigger' and namespace-uri()='http://quartznet.sourceforge.net/JobSchedulingData']/*[local-name()='cron' and namespace-uri()='http://quartznet.sourceforge.net/JobSchedulingData']/*[local-name()='name' and namespace-uri()='http://quartznet.sourceforge.net/JobSchedulingData'][text()='ATrigger']/../.."
This is a XML code snippet:
<testcase name="T.3.03.02">
<cmd>CMD_EXPORT_RAM_KEY</cmd>
<sreg_pre>40</sreg_pre>
<sreg_pre_bitmask>ff</sreg_pre_bitmask>
<sreg_post>40</sreg_post>
<sreg_post_bitmask>ff</sreg_post_bitmask>
<erc>ERC_NO_ERROR</erc>
<testvector>
<parameter name="UID" type="info">000000000000000000000000000002</parameter>
<parameter name="UID'" type="info">000000000000000000000000000002</parameter>
<parameter name="KeyId" type="info">0e</parameter>
<parameter name="Key" type="info">0f0e0d0c0b0a09080706050403020100</parameter>
<parameter name="AuthId" type="info">00</parameter>
<parameter name="KeyAuth" type="info">2b7e151628aed2a6abf7158809cf4f3c</parameter>
<parameter name="Old counter value of updated key slot" type="info">0000000</parameter>
<parameter name="New counter value C'" type="info">0000000</parameter>
<parameter name="Protection flags F'" type="info">00</parameter>
<parameter name="M1" type="output">000000000000000000000000000002e0</parameter>
<parameter name="M2" type="output">152876f29dc7ca8d18e38d70374492b05d908c8c584a0409849a553c75254def</parameter>
<parameter name="M3" type="output">bc6e79bc4458339174fc80fb08b83188</parameter>
<parameter name="M4" type="output">000000000000000000000000000002e07783b86ae87b87e3ca12809c2df75fae</parameter>
<parameter name="M5" type="output">c8fcc8859c69c8bd840ce8e24c5114e9</parameter>
</testvector>
<precondition>RAM_KEY_PLAIN = 1; RAM_KEY_EMPTY = 0</precondition>
<description>Export plain RAM_KEY with external debugger attached; Note: The security flags SECURE_BOOT_PROTECTION and DEBUGGER_PROTECTION of the key SECRET_KEY are inherited from MASTER_ECU_KEY.</description>
</testcase>
I want to access all "parameter name="Key" type="info" values.
How do I access these values conditionally if the condition <cmd>CMD_EXPORT_RAM_KEY(second line in XML)</cmd> is valid.
In this XML file there are also other commands (<cmd> lines) also with the "Key" parameter,
but in these cases I don't want to get the key-values.
I didn't get it running.
Can anyone help me with some ideas?
Would something like this work?
doc = Nokogiri::XML(File.read("data.xml"))
# Collect the <cmd> elements whose text is the command of interest
check = doc.xpath("//cmd").select { |el| el.text == "CMD_EXPORT_RAM_KEY" }
puts "Check: %i" % check.size
unless check.empty?
  ## Do stuff here
end
Try the below XPath with Nokogiri:
//testcase/cmd[text()='CMD_EXPORT_RAM_KEY']/../testvector/parameter[@name="Key" and @type="info"]
Of course, you can parameterize the CMD_EXPORT_RAM_KEY and @name/@type values.
The trick is to locate the specific ones you want using a selector, then narrow down further if necessary.
Using the CSS selector 'parameter[name="Key"][type="info"]' Nokogiri easily finds the single occurrence in your sample. If there were more, then more would be returned:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<testcase name="T.3.03.02">
<testvector>
<parameter name="UID" type="info">000000000000000000000000000002</parameter>
<parameter name="Key" type="info">0f0e0d0c0b0a09080706050403020100</parameter>
</testvector>
</testcase>
EOT
doc.search('parameter[name="Key"][type="info"]').map(&:content)
# => ["0f0e0d0c0b0a09080706050403020100"]
I used CSS because it looks less like line noise than the equivalent XPath selector would.
Also, when supplying sample data, reduce it to the bare minimum necessary to test the code. Anything beyond that wastes our time, and, if it's too much, can actually cause you to get no answers because nobody wants to wade through that.
I am having problems understanding Net::HTTP and Nokogiri.
I have a large number of jobs on my Jenkins server. I have to periodically update the branch name on these jobs. Doing it from the UI is a cumbersome process so I decided to update the Jenkins config.xml.
I use Nokogiri to parse the XML, traverse the XPath and update the value of the node. However, when I try to post the updated XML back to Jenkins, I get a 500 error saying:
Caused by: javax.xml.transform.TransformerException: org.xml.sax.SAXParseExceptionpublicId: -//W3C//DTD HTML 4.0 Transitional//EN; systemId: http://www.w3.org/TR/REC-html40/loose.dtd; lineNumber: 31; columnNumber: 3; The declaration for the entity "HTML.Version" must end with '>'.
Here is what I am doing:
require "net/http"
require "nokogiri"
uri = URI.parse("http://jenkins.my.domain.web:8080")
http = Net::HTTP.new(uri.host, uri.port)
getQueueRequest = Net::HTTP::Get.new("http://jenkins.my.domain.web:8080/my/job/location/config.xml")
getQueue = http.request(getQueueRequest)
xml_doc = Nokogiri::HTML(getQueue.body)
# Get current branch name
branch_name=xml_doc.at_xpath('//hudson.plugins.git.branchspec/name')
# Get new branch name
print "Enter new branch name "
user_input = gets.chomp
new_branch_name = user_input.downcase
# Set branch name and create xml
branch_name.content=new_branch_name
new_config_xml=xml_doc.to_xml
puts "Logging into Jenkins"
update_branch = Net::HTTP::Post.new("http://jenkins.my.domain.web:8080/my/job/location/config.xml")
update_branch.basic_auth 'username', 'password'
update_branch.body = new_config_xml
response = http.request(update_branch)
puts response.body
I understand it might have to do something with the XML that is getting added to request body but I am not sure how to fix the issue.
Original XML:
<?xml version='1.0' encoding='UTF-8'?>
<maven2-moduleset plugin="maven-plugin@1.504">
<actions/>
<description></description>
<keepDependencies>false</keepDependencies>
<properties>
<hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
<maxConcurrentPerNode>0</maxConcurrentPerNode>
<maxConcurrentTotal>0</maxConcurrentTotal>
<categories/>
<throttleEnabled>false</throttleEnabled>
<throttleOption>project</throttleOption>
<configVersion>1</configVersion>
</hudson.plugins.throttleconcurrents.ThrottleJobProperty>
</properties>
<scm class="hudson.plugins.git.GitSCM" plugin="git@1.4.0">
<configVersion>2</configVersion>
<userRemoteConfigs>
<hudson.plugins.git.UserRemoteConfig>
<name></name>
<refspec></refspec>
<url>git@github.com:<ORG_NAME>/<REPO_NAME>.git</url>
</hudson.plugins.git.UserRemoteConfig>
</userRemoteConfigs>
<branches>
<hudson.plugins.git.BranchSpec>
<name>release</name>
</hudson.plugins.git.BranchSpec>
</branches>
<disableSubmodules>false</disableSubmodules>
<recursiveSubmodules>false</recursiveSubmodules>
<doGenerateSubmoduleConfigurations>false</doGenerateSubmoduleConfigurations>
<authorOrCommitter>false</authorOrCommitter>
<clean>false</clean>
<wipeOutWorkspace>false</wipeOutWorkspace>
<pruneBranches>false</pruneBranches>
<remotePoll>false</remotePoll>
<ignoreNotifyCommit>false</ignoreNotifyCommit>
<useShallowClone>false</useShallowClone>
<buildChooser class="hudson.plugins.git.util.DefaultBuildChooser"/>
<gitTool>Default</gitTool>
<submoduleCfg class="list"/>
<relativeTargetDir></relativeTargetDir>
<reference></reference>
<excludedRegions></excludedRegions>
<excludedUsers></excludedUsers>
<gitConfigName></gitConfigName>
<gitConfigEmail></gitConfigEmail>
<skipTag>false</skipTag>
<includedRegions></includedRegions>
<scmName></scmName>
</scm>
<canRoam>true</canRoam>
<disabled>false</disabled>
<blockBuildWhenDownstreamBuilding>false</blockBuildWhenDownstreamBuilding>
<blockBuildWhenUpstreamBuilding>false</blockBuildWhenUpstreamBuilding>
<triggers class="vector">
<hudson.triggers.TimerTrigger>
<spec>0 22 * * 4</spec>
</hudson.triggers.TimerTrigger>
</triggers>
<concurrentBuild>false</concurrentBuild>
<rootModule>
<groupId>com.org.project.test</groupId>
<artifactId>functest</artifactId>
</rootModule>
<goals>clean verify -Dtestsuite=<test_suite_name> -Dbrowser=chrome -Dipaddress=http://<IP_ADDRESS>:4444/wd/hub</goals>
<mavenName>apache-maven-3.0.4</mavenName>
<aggregatorStyleBuild>true</aggregatorStyleBuild>
<incrementalBuild>false</incrementalBuild>
<perModuleEmail>true</perModuleEmail>
<ignoreUpstremChanges>false</ignoreUpstremChanges>
<archivingDisabled>false</archivingDisabled>
<resolveDependencies>false</resolveDependencies>
<processPlugins>false</processPlugins>
<mavenValidationLevel>-1</mavenValidationLevel>
<runHeadless>false</runHeadless>
<disableTriggerDownstreamProjects>false</disableTriggerDownstreamProjects>
<settings class="jenkins.mvn.DefaultSettingsProvider"/>
<globalSettings class="jenkins.mvn.DefaultGlobalSettingsProvider"/>
<reporters/>
<publishers/>
<buildWrappers/>
<prebuilders/>
<postbuilders/>
<runPostStepsIfResult>
<name>FAILURE</name>
<ordinal>2</ordinal>
<color>RED</color>
</runPostStepsIfResult>
</maven2-moduleset>
After Editing and Massaging:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<?xml version="1.0" encoding="UTF-8"?>
<html>
<body>
<maven2-moduleset plugin="maven-plugin@1.504">
<actions />
<description />
<keepdependencies>false</keepdependencies>
<properties>
<hudson.plugins.throttleconcurrents.throttlejobproperty plugin="throttle-concurrents@1.7.2">
<maxconcurrentpernode>0</maxconcurrentpernode>
<maxconcurrenttotal>0</maxconcurrenttotal>
<categories />
<throttleenabled>false</throttleenabled>
<throttleoption>project</throttleoption>
<configversion>1</configversion>
</hudson.plugins.throttleconcurrents.throttlejobproperty>
</properties>
<scm class="hudson.plugins.git.GitSCM" plugin="git@1.4.0">
<configversion>2</configversion>
<userremoteconfigs>
<hudson.plugins.git.userremoteconfig>
<name />
<refspec />
<url>git@github.com:<ORG_NAME>/<REPO_NAME>.git</url>
</hudson.plugins.git.userremoteconfig>
</userremoteconfigs>
<branches>
<hudson.plugins.git.branchspec>
<name>master</name>
</hudson.plugins.git.branchspec>
</branches>
<disablesubmodules>false</disablesubmodules>
<recursivesubmodules>false</recursivesubmodules>
<dogeneratesubmoduleconfigurations>false</dogeneratesubmoduleconfigurations>
<authororcommitter>false</authororcommitter>
<clean>false</clean>
<wipeoutworkspace>false</wipeoutworkspace>
<prunebranches>false</prunebranches>
<remotepoll>false</remotepoll>
<ignorenotifycommit>false</ignorenotifycommit>
<useshallowclone>false</useshallowclone>
<buildchooser class="hudson.plugins.git.util.DefaultBuildChooser" />
<gittool>Default</gittool>
<submodulecfg class="list" />
<relativetargetdir />
<reference />
<excludedregions />
<excludedusers />
<gitconfigname />
<gitconfigemail />
<skiptag>false</skiptag>
<includedregions />
<scmname />
</scm>
<canroam>true</canroam>
<disabled>false</disabled>
<blockbuildwhendownstreambuilding>false</blockbuildwhendownstreambuilding>
<blockbuildwhenupstreambuilding>false</blockbuildwhenupstreambuilding>
<triggers class="vector">
<hudson.triggers.timertrigger>
<spec>0 22 * * 4</spec>
</hudson.triggers.timertrigger>
</triggers>
<concurrentbuild>false</concurrentbuild>
<rootmodule>
<groupid>com.org.project.test</groupid>
<artifactid>functest</artifactid>
</rootmodule>
<goals>clean verify -Dtestsuite=<test_suite_name> -Dbrowser=chrome -Dipaddress=http://<IP_ADDRESS>:4444/wd/hub</goals>
<mavenname>apache-maven-3.0.4</mavenname>
<aggregatorstylebuild>true</aggregatorstylebuild>
<incrementalbuild>false</incrementalbuild>
<permoduleemail>true</permoduleemail>
<ignoreupstremchanges>false</ignoreupstremchanges>
<archivingdisabled>false</archivingdisabled>
<resolvedependencies>false</resolvedependencies>
<processplugins>false</processplugins>
<mavenvalidationlevel>-1</mavenvalidationlevel>
<runheadless>false</runheadless>
<disabletriggerdownstreamprojects>false</disabletriggerdownstreamprojects>
<settings class="jenkins.mvn.DefaultSettingsProvider" />
<globalsettings class="jenkins.mvn.DefaultGlobalSettingsProvider" />
<reporters />
<publishers />
<buildwrappers />
<prebuilders />
<postbuilders />
<runpoststepsifresult>
<name>FAILURE</name>
<ordinal>2</ordinal>
<color>RED</color>
</runpoststepsifresult>
</maven2-moduleset>
</body>
</html>
When you use Nokogiri::HTML(some_html) or Nokogiri::XML(some_xml), Nokogiri will look to see if the content is valid. If it isn't, it will do fix-ups on the content in an attempt to make it so. For instance:
require 'nokogiri'
html_fragment = "<p>foo bar</p>"
Nokogiri::HTML(html_fragment).to_html
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><p>foo bar</p></body></html>\n"
If the document is partially correct Nokogiri still adds the DOCTYPE statement:
html = "<html><body><p>foo bar</p></body></html>"
Nokogiri::HTML(html).to_html
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><p>foo bar</p></body></html>\n"
If you want Nokogiri to leave the document alone, because it's supposed to be a fragment, tell it to do so:
Nokogiri::HTML::DocumentFragment.parse(html_fragment).to_html
# => "<p>foo bar</p>"
Or:
xml_fragment = "<x>foo bar</x>"
Nokogiri::XML::DocumentFragment.parse(xml_fragment).to_xml
# => "<x>foo bar</x>"
Nokogiri is pretty smart about handling XML and HTML. You can try to confuse it and it'll generally do the right thing:
xml_fragment = "<x>foo bar</x>"
Nokogiri::HTML::DocumentFragment.parse(xml_fragment).to_xml
# => "<x>foo bar</x>"
That's parsing XML as an HTML fragment and telling it to emit it as XML.
Now, that all said, it's pretty obvious Nokogiri isn't doing anything mysterious, so, here's how to fix the problem. First, parse it as XML so Nokogiri doesn't think it should add the HTML DOCTYPE declaration, then, if the XML is syntactically correct, tell Nokogiri it's OK to parse it as a complete document:
require 'nokogiri'
xml = %{<?xml version='1.0' encoding='UTF-8'?>
<maven2-moduleset plugin="maven-plugin@1.504">
<actions/>
<description></description>
<keepDependencies>false</keepDependencies>
<properties>
<hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
</hudson.plugins.throttleconcurrents.ThrottleJobProperty>
</properties>
</maven2-moduleset>
}
puts Nokogiri::XML.parse(xml).to_xml
# >> <?xml version="1.0" encoding="UTF-8"?>
# >> <maven2-moduleset plugin="maven-plugin@1.504">
# >> <actions/>
# >> <description/>
# >> <keepDependencies>false</keepDependencies>
# >> <properties>
# >> <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
# >> </hudson.plugins.throttleconcurrents.ThrottleJobProperty>
# >> </properties>
# >> </maven2-moduleset>
Or as a fragment, which, because it's complete, will result in the same thing:
puts Nokogiri::XML::DocumentFragment.parse(xml).to_xml
# >> <?xml version='1.0' encoding='UTF-8'?>
# >> <maven2-moduleset plugin="maven-plugin@1.504">
# >> <actions/>
# >> <description/>
# >> <keepDependencies>false</keepDependencies>
# >> <properties>
# >> <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
# >> </hudson.plugins.throttleconcurrents.ThrottleJobProperty>
# >> </properties>
# >> </maven2-moduleset>
Instead of using Net::HTTP, which provides the bare building blocks for HTTP, I'd recommend looking at something a bit higher-level, like HTTPClient. Here's code that is similar to yours:
require 'httpclient'
require 'nokogiri'
URL = 'http://jenkins.my.domain.web:8080/my/job/location/config.xml'
http_client = HTTPClient.new
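# Parse as XML so no HTML DOCTYPE is added and the tag case is preserved
xml_doc = Nokogiri::XML(
  http_client.get_content(URL)
)
# Get current branch name. XPath is used here because the dots in the tag name would be read as class selectors in CSS:
branch_name = xml_doc.at_xpath('//hudson.plugins.git.BranchSpec/name')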
# Get new branch name
print 'Enter new branch name '
new_branch_name = gets.chomp.downcase
# Set branch name and create xml
branch_name.content = new_branch_name
puts 'Logging into Jenkins'
# The first argument is the URL (or URL prefix) the credentials apply to
http_client.set_auth(URL, 'user', 'password')
response = http_client.post(URL, :body => xml_doc.to_xml)
I can't test it but it looks close.
I now find myself in another dilemma. I am seeing that the methods which allow moving to elements and editing values, like at_xpath and at_css, only work with Nokogiri::HTML or Nokogiri::HTML::DocumentFragment. They don't work when I use Nokogiri::XML. Using Nokogiri::HTML changes the case of the tags: <keepDependencies>false</keepDependencies> becomes <keepdependencies>false</keepdependencies>. Jenkins does accept the xml with changed case of tags. The to_html and to_xml methods basically return a string, so I cannot use the xpath or css methods on their result to navigate the XML tree. Is there a way around this?
The at methods work with both XML and HTML, and allow CSS and XPath selectors; everything inside Nokogiri is really XML-based.
Nokogiri folds HTML tags to lower-case because HTML is case-insensitive, so at expects a lower-case value when dealing with HTML. XML is case-sensitive, so Nokogiri leaves the tag case alone, and at requires you to use the correct case in your selector.
This is documented in the Nokogiri docs:
Note that the CSS query string is case-sensitive with regards to your document type. That is, if you’re looking for “H1” in an HTML document, you’ll never find anything, since HTML tags will match only lowercase CSS queries. However, “H1” might be found in an XML document, where tags names are case-sensitive (e.g., “H1” is distinct from “h1”).
When you are parsing the XML you are receiving from the service, you are declaring it as HTML:
xml_doc = Nokogiri::HTML(getQueue.body)
And this appears to cause Nokogiri to add HTML nodes.
Try parsing it as XML instead:
xml_doc = Nokogiri::XML(getQueue.body)
Background:
I want to take some XML from one file, put it in a template file and then save the modified template as a new file. It works, but when I save the file out, all the nodes that I added have a default namespace prepended, i.e.
<default:ComponentRef Id="C__AD1817F9C64A42F0A14DDDDC82DFC8D9"/>
<default:ComponentRef Id="C__157DD41D70854617A3D6D1E4A39B589F"/>
<default:ComponentRef Id="C__2E6D8662F38FE62CAFA9F8842A28F510"/>
<default:ComponentRef Id="C__54E5E2181323D4A5F37293DAA87B4230"/>
Which I want to be just:
<ComponentRef Id="C__AD1817F9C64A42F0A14DDDDC82DFC8D9"/>
<ComponentRef Id="C__157DD41D70854617A3D6D1E4A39B589F"/>
<ComponentRef Id="C__2E6D8662F38FE62CAFA9F8842A28F510"/>
<ComponentRef Id="C__54E5E2181323D4A5F37293DAA87B4230"/>
The following is my ruby code:
file = "wixmain/generated/DarkOutput.wxs"
template = "wixmain/generated/MsiComponentTemplate.wxs"
output = "wixmain/generated/MSIComponents.wxs"
dark_output = Nokogiri::XML(File.open(file))
template_file = Nokogiri::XML(File.open(template))
#get stuff from dark output
components = dark_output.at_css("Directory[Id='TARGETDIR']")
component_ref = dark_output.at_css("Feature[Id='DefaultFeature']")
#where to insert in template doc
template_component_insert_point = template_file.at_css("DirectoryRef[Id='InstallDir']")
template_ref_insert_point = template_file.at_css("ComponentGroup[Id='MSIComponentGroup']")
template_component_insert_point.children= components.children()
template_ref_insert_point.children= component_ref.children()
#write out filled template to output file
File.open(output, 'w') { |f| template_file.write_xml_to f }
Update
Example of my template file:
<?xml version="1.0" encoding="utf-8"?>
<Wix xmlns='http://schemas.microsoft.com/wix/2006/wi'>
<Fragment>
<ComponentGroup Id='MSIComponentGroup'>
</ComponentGroup>
</Fragment>
<Fragment Id='MSIComponents'>
<DirectoryRef Id='InstallDir'>
</DirectoryRef>
</Fragment>
</Wix>
Workaround was to remove the xmlns attribute in the input file.
Or to use the remove_namespaces! method when opening the input file
input_file = Nokogiri::XML(File.open(input))
input_file.remove_namespaces!
I think you are missing a sample of the template file. Also, is the sample from the input complete?
Nokogiri is either finding the default: namespace during its parsing of one of the two files, and you are inheriting it, or maybe it is not happy with the sample during parsing and is unable to parse cleanly, and as a result somehow adding the default: namespace. You can check the emptiness of the errors array after parsing the dark_output and template_file to see if Nokogiri is happy.
dark_output = Nokogiri::XML(File.open(file))
template_file = Nokogiri::XML(File.open(template))
if (dark_output.errors.any? || template_file.errors.any?)
[... do something here ...]
end
For the fastest answer, you might want to take this question directly to the developers via the Nokogiri-Talk mail-list.