Nokogiri XML Searching - ruby

I've tried reading the Nokogiri docs, but I've hit a roadblock.
I get XML output similar to this:
<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<ns1:getPoliciesResponse xmlns:ns1="http://policy.api.control.r1soft.com/">
<return>
<CDPId>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx</CDPId>
<description/>
<diskSafeID>bcb68765-a719-4291-912d-2e6af485ea24</diskSafeID>
<enabled>true</enabled>
<id>cdb65427-d6f4-4a89-9f77-8763e22dc74b</id>
<lastReplicationRunTime>2013-06-12T13:29:40.105-05:00</lastReplicationRunTime>
<name>pstueck-passenger ondemand</name>
<replicationScheduleFrequencyType>ON_DEMAND</replicationScheduleFrequencyType>
<state>OK</state>
</return>
<return>
<CDPId>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx</CDPId>
<description/>
<diskSafeID>e8e13555-f577-40d2-99c8-fa8a019d3b55</diskSafeID>
<enabled>true</enabled>
<id>7f55f8d6-92a9-4b14-bff4-631559d92259</id>
<lastReplicationRunTime>2013-06-16T22:00:04.918-05:00</lastReplicationRunTime>
<name>pstueck-mysql daily</name>
<nextReplicationRunTime>2013-06-17T22:00:00-05:00</nextReplicationRunTime>
<replicationScheduleFrequencyType>DAILY</replicationScheduleFrequencyType>
<state>ALERT</state>
<warnings>Policy last completed with alerts</warnings>
</return>
</ns1:getPoliciesResponse>
</soap:Body>
</soap:Envelope>
But a large number of 'return' sections come back. I'm trying to use .search at the end of the string. I only want it to return the entire 'return' section for a given 'name'. Anyone have any tips?
Current Code:
client = Savon::Client.new do
http.auth.basic "#{opts['api_username']}", "#{opts['api_password']}"
wsdl.document = "#{opts['api_url']}/Policy?wsdl"
end
getPolicyInformation = client.request :getPolicies
getPolicyInformation = Nokogiri::XML(getPolicyInformation.to_xml)
print getPolicyInformation
I want to return everything in the <return> section when I search for a specified <name>. Example: I only want to see the information relating to <name>pstueck-passenger ondemand</name>, but I want the entire <return> section that contains it.

You can use XPath to identify a node with a particular value and then specify that an ancestor element is of interest by doing something like the following:
require 'nokogiri'
document = <<-XML
<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<ns1:getPoliciesResponse xmlns:ns1="http://policy.api.control.r1soft.com/">
<return>
<CDPId>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx</CDPId>
<description/>
<diskSafeID>bcb68765-a719-4291-912d-2e6af485ea24</diskSafeID>
<enabled>true</enabled>
<id>cdb65427-d6f4-4a89-9f77-8763e22dc74b</id>
<lastReplicationRunTime>2013-06-12T13:29:40.105-05:00</lastReplicationRunTime>
<name>pstueck-passenger ondemand</name>
<replicationScheduleFrequencyType>ON_DEMAND</replicationScheduleFrequencyType>
<state>OK</state>
</return>
<return>
<CDPId>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx</CDPId>
<description/>
<diskSafeID>e8e13555-f577-40d2-99c8-fa8a019d3b55</diskSafeID>
<enabled>true</enabled>
<id>7f55f8d6-92a9-4b14-bff4-631559d92259</id>
<lastReplicationRunTime>2013-06-16T22:00:04.918-05:00</lastReplicationRunTime>
<name>pstueck-mysql daily</name>
<nextReplicationRunTime>2013-06-17T22:00:00-05:00</nextReplicationRunTime>
<replicationScheduleFrequencyType>DAILY</replicationScheduleFrequencyType>
<state>ALERT</state>
<warnings>Policy last completed with alerts</warnings>
</return>
</ns1:getPoliciesResponse>
</soap:Body>
</soap:Envelope>
XML
doc = Nokogiri::XML(document)
ns = { 'soap' => 'http://schemas.xmlsoap.org/soap/envelope/', 'ns1' => "http://policy.api.control.r1soft.com/" }
ret = doc.xpath('/soap:Envelope/soap:Body/ns1:getPoliciesResponse/return/name[text()="pstueck-passenger ondemand"]/ancestor::return', ns)
puts ret.count
puts ret.at('replicationScheduleFrequencyType').text
EDIT
Updated to reflect updated XML body in question. Now handles namespaces.

Using CSS to find the node:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<ns1:getPoliciesResponse xmlns:ns1="http://policy.api.control.r1soft.com/">
<return>
<CDPId>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx</CDPId>
<description/>
<diskSafeID>e8e13555-f577-40d2-99c8-fa8a019d3b55</diskSafeID>
<enabled>true</enabled>
<id>7f55f8d6-92a9-4b14-bff4-631559d92259</id>
<lastReplicationRunTime>2013-06-16T22:00:04.918-05:00</lastReplicationRunTime>
<name>pstueck-mysql daily</name>
<nextReplicationRunTime>2013-06-17T22:00:00-05:00</nextReplicationRunTime>
<replicationScheduleFrequencyType>DAILY</replicationScheduleFrequencyType>
<state>ALERT</state>
<warnings>Policy last completed with alerts</warnings>
</return>
<return>
<CDPId>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx</CDPId>
<description/>
<diskSafeID>bcb68765-a719-4291-912d-2e6af485ea24</diskSafeID>
<enabled>true</enabled>
<id>cdb65427-d6f4-4a89-9f77-8763e22dc74b</id>
<lastReplicationRunTime>2013-06-12T13:29:40.105-05:00</lastReplicationRunTime>
<name>pstueck-passenger ondemand</name>
<replicationScheduleFrequencyType>ON_DEMAND</replicationScheduleFrequencyType>
<state>OK</state>
</return>
</ns1:getPoliciesResponse>
</soap:Body>
</soap:Envelope>
EOT
return_tag = doc.at('return name[text()="pstueck-passenger ondemand"]').parent
puts return_tag.to_xml
Which outputs:
<return>
<CDPId>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx</CDPId>
<description/>
<diskSafeID>bcb68765-a719-4291-912d-2e6af485ea24</diskSafeID>
<enabled>true</enabled>
<id>cdb65427-d6f4-4a89-9f77-8763e22dc74b</id>
<lastReplicationRunTime>2013-06-12T13:29:40.105-05:00</lastReplicationRunTime>
<name>pstueck-passenger ondemand</name>
<replicationScheduleFrequencyType>ON_DEMAND</replicationScheduleFrequencyType>
<state>OK</state>
</return>
Nokogiri supports both XPath and CSS. I find CSS easier to read.
I used the at method to find the first matching occurrence, and to show that it was the first matching, I swapped the order of the two <return> blocks. at is the same as search(...).first so when you're looking for the first instance of something in a document at is the way to go.
Nokogiri is usually smart enough to know the difference between XPath and CSS selectors, so we can use the generic at and search. If you need to force CSS or XPath parsing because the selector is ambiguous, you can use the specific css or xpath, or at_css or at_xpath, respectively. They're all documented in the Nokogiri::XML::Node docs.
parent is necessary because we want the parent of the selected node, which was <name>. I just slammed it into reverse and backed up a block. That is easier to do in XPath, where we can use .. to point to the parent node.

Related

Read value from XML within another XML: Mule

I am making a SOAP web service call and I get the response below. I want to read a value from the inner XML: the 1234684 in the oa:ID element.
I was able to get the inner XML using #[xpath3('//*:processaResponse/return[2]')], store it in a flow variable, and then query it with #[xpath3('/AckReg/DataArea/PRegistration/PRDet/Person/IDSet/*:ID[@schemeName="aid"]/text()')].
This works when I try an online parser, but it doesn't read the value in Mule.
Is there any way to extract 1234684 from the oa:ID tag using a single XPath?
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Header>
<ns3:TXID xmlns:ns3="http://a.d.r.test.com/"></ns3:TXID>
<ns3:SESSIONID xmlns:ns3="http://a.d.r.test.com/"></ns3:SESSIONID>
</soapenv:Header>
<soapenv:Body>
<ns3:processaResponse xmlns:ns3="http://a.d.r.test.com/" xmlns:ns2="http://p.r.test.com/">
<return>Hi</return>
<return>
<?xml version="1.0" encoding="UTF-8"?>
<AckReg
xmlns="http://www.test.com/e/1" languageCode="en-US" releaseID="normalizedString" systemEnvironmentCode="test" versionID="normalizedString"
xmlns:oa="www.test.com/r/9"
xsi:schemaLocation="http://www.test.com/a/1 ../test/test.xsd">
<Apa>
<oa:CreationDateTime>2018-04-05</oa:CreationDateTime>
</Apa>
<DataArea>
<Ack>
<OArea>
<o:Sender>
<o:LID schemeAgencyName="testi" schemeName="Application ID">test</o:LID>
</o:Sender>
</OArea>
<OriginalActionVerb/>
</Ack>
<PRegistration>
<testids>
<IDSet schemeAgencyName="try">
<oa:ID schemeName="abcid">1234684</oa:ID>
</IDSet>
</testids>
<PRDet>
<Person>
<IDSet schemeAgencyName="try">
<oa:ID schemeName="aid">1364561</oa:ID>
</IDSet>
<IDSet schemeAgencyName="enada">
<oa:ID schemeName="Employee ID">adsad</oa:ID>
</IDSet>
</Person>
<User>
<oa:ID/>
</User>
</PRDet>
</PRegistration>
</DataArea>
</AckReg>
</return>
</ns3:processaResponse>
</soapenv:Body>
</soapenv:Envelope>
In your expressions you were missing namespace prefixes or namespace wildcards *: on some nodes - so your expressions failed.
"Is there any way to extract 1234684 in oa:ID tag using one XPath?"
Combining both of your partial expressions is possible with namespace wildcards:
//*:processaResponse/return[2]/*:AckReg/*:DataArea/*:PRegistration/*:testids/*:IDSet/*:ID[@schemeName='abcid']/text()
Or you can use an absolute path with namespace wildcards:
/*:Envelope/*:Body/*:processaResponse/return[2]/*:AckReg/*:DataArea/*:PRegistration/*:testids/*:IDSet/*:ID[@schemeName='abcid']/text()
Output in both cases:
1234684
You can also use Groovy's XmlSlurper class in a Groovy script to fetch that value:
root = new XmlSlurper( false, true).parseText(payload).declareNamespace('soapenv':"http://schemas.xmlsoap.org/soap/envelope/")

Parsing out contents of XML tag in Ruby

I have an XML document that, as I understand it, has already been parsed into tags. My goal is to extract all the information in the <GetResidentsContactInfoResult> tag. In the sample XML below, this tag holds two records, each beginning with the Lease PropertyId key. How can I iterate over the <GetResidentsContactInfoResult> tag and print out the key/value pairs for each record? I'm new to Ruby and to working with XML files; is this something I can do with Nokogiri?
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soap:Body>
<GetResidentsContactInfoResponse xmlns="http://tempuri.org/">
<GetResidentsContactInfoResult><PropertyResidents><Lease PropertyId="21M" BldgID="00" UnitID="0903" ResiID="3" occustatuscode="P" occustatuscodedescription="Previous" MoveInDate="2016-01-07T00:00:00" MoveOutDate="2016-02-06T00:00:00" LeaseBeginDate="2016-01-07T00:00:00" LeaseEndDate="2017-01-31T00:00:00" MktgSource="DBY" PrimaryEmail="noemail1#fake.com"><Occupant PropertyId="21M" BldgID="00" UnitID="0903" ResiID="3" OccuSeqNo="3444755" OccuFirstName="Efren" OccuLastName="Cerda" Phone2No="(832) 693-9448" ResponsibleFlag="Responsible" /></Lease><Lease PropertyId="21M" BldgID="00" UnitID="0908" ResiID="2" occustatuscode="P" occustatuscodedescription="Previous" MoveInDate="2016-02-20T00:00:00" MoveOutDate="2016-04-25T00:00:00" LeaseBeginDate="2016-02-20T00:00:00" LeaseEndDate="2017-02-28T00:00:00" MktgSource="PW" PrimaryEmail="noemail1#fake.com"><Occupant PropertyId="21M" BldgID="00" UnitID="0908" ResiID="2" OccuSeqNo="3451301" OccuFirstName="Donna" OccuLastName="Mclean" Phone2No="(713) 785-4240" ResponsibleFlag="Responsible" /></Lease></PropertyResidents></GetResidentsContactInfoResult>
</GetResidentsContactInfoResponse>
</soap:Body>
</soap:Envelope>
This uses Nokogiri to find all the GetResidentsContactInfoResponse elements, and then Active Support to convert the inner text to a hash of key-value pairs.
Read "sparklemotion/nokogiri" and "Tutorials" regarding installing and using Nokogiri.
Read "Active Support Core Extensions" about more capabilities of Active Support (though the guide does not include Hash.from_xml). To install it simply do gem install activesupport.
I assume you're fine with Nokogiri as you mentioned it in your question.
If you don't want to use Active Support, consider looking into "Convert a Nokogiri document to a Ruby Hash" as an alternative to the line Hash.from_xml(elm.text):
require 'nokogiri'
# Needed in order to use `Hash.from_xml`
require 'active_support/core_ext/hash/conversions'
def find_key_values(str)
  doc = Nokogiri::XML(str)
  # Ignore namespaces for easier traversal
  doc.remove_namespaces!
  doc.css('GetResidentsContactInfoResponse').map do |elm|
    Hash.from_xml(elm.text)
  end
end
Usage:
# Option 1: if your XML above is stored in a variable called `string`
find_key_values string
# Option 2: if your XML above is stored in a file
find_key_values File.open('/path/to/file')
Which returns:
[{"PropertyResidents"=>
{"Lease"=>
[{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0903",
"ResiID"=>"3",
"occustatuscode"=>"P",
"occustatuscodedescription"=>"Previous",
"MoveInDate"=>"2016-01-07T00:00:00",
"MoveOutDate"=>"2016-02-06T00:00:00",
"LeaseBeginDate"=>"2016-01-07T00:00:00",
"LeaseEndDate"=>"2017-01-31T00:00:00",
"MktgSource"=>"DBY",
"PrimaryEmail"=>"noemail1#fake.com",
"Occupant"=>
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0903",
"ResiID"=>"3",
"OccuSeqNo"=>"3444755",
"OccuFirstName"=>"Efren",
"OccuLastName"=>"Cerda",
"Phone2No"=>"(832) 693-9448",
"ResponsibleFlag"=>"Responsible"}},
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0908",
"ResiID"=>"2",
"occustatuscode"=>"P",
"occustatuscodedescription"=>"Previous",
"MoveInDate"=>"2016-02-20T00:00:00",
"MoveOutDate"=>"2016-04-25T00:00:00",
"LeaseBeginDate"=>"2016-02-20T00:00:00",
"LeaseEndDate"=>"2017-02-28T00:00:00",
"MktgSource"=>"PW",
"PrimaryEmail"=>"noemail1#fake.com",
"Occupant"=>
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0908",
"ResiID"=>"2",
"OccuSeqNo"=>"3451301",
"OccuFirstName"=>"Donna",
"OccuLastName"=>"Mclean",
"Phone2No"=>"(713) 785-4240",
"ResponsibleFlag"=>"Responsible"}}]}}]

Adding new child nodes to an existing XML file with Ruby & Nokogiri

I have this server project in Ruby, and I would like to keep track of events and user sessions in an XML file. I'm totally new to this, and after days of research, I'm hitting a wall.
Here's my current sample code, assuming there's already a file named "test.xml" that contains a root node called <server>:
$ cat test.xml
<server></server>
and the code :
require 'nokogiri'
require 'securerandom'
logintime = Time.now
sessionid = SecureRandom.hex(10)
file = File.open("test.xml",'a+')
doc = Nokogiri::XML.parse file
session_node = Nokogiri::XML::Node.new("session",doc)
session_node['id'] = sessionid
logintime_node = Nokogiri::XML::Node.new("logintime",doc)
logintime_node.content = logintime
session_node << logintime_node
doc.root << session_node
file.print doc.to_xml
file.close
and here's the test.xml file after 4 runs
<server></server>
<?xml version="1.0"?>
<server>
<session id="5ef27ade2afaf5c2162f">
<logintime>2015-07-07 17:27:20 +0200</logintime>
</session>
</server>
<?xml version="1.0"?>
<server>
<session id="637595bd0857c8af1cc0">
<logintime>2015-07-07 17:27:36 +0200</logintime>
</session>
</server>
<?xml version="1.0"?>
<?xml version="1.0"?>
<server>
<session id="41e6082c4db7d1dc8692">
<logintime>2015-07-07 17:27:37 +0200</logintime>
</session>
</server>
<?xml version="1.0"?>
<?xml version="1.0"?>
<server>
<session id="1cad6c3d38d4fb96632b">
<logintime>2015-07-07 17:27:38 +0200</logintime>
</session>
</server>
<?xml version="1.0"?>
And the desired output should be something like this :
<?xml version="1.0"?>
<server>
<session id="5ef27ade2afaf5c2162f">
<logintime>2015-07-07 17:27:20 +0200</logintime>
</session>
<session id="637595bd0857c8af1cc0">
<logintime>2015-07-07 17:27:36 +0200</logintime>
</session>
<session id="41e6082c4db7d1dc8692">
<logintime>2015-07-07 17:27:37 +0200</logintime>
</session>
<session id="1cad6c3d38d4fb96632b">
<logintime>2015-07-07 17:27:38 +0200</logintime>
</session>
</server>
And I really don't know what I should do to obtain that result.
First, if there's no existing file containing the root node, the script runs only once, then complains that there's already a root node when I try to run it a second time:
/System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/gems/2.0.0/gems/nokogiri-1.5.6/lib/nokogiri/xml/document.rb:232:in `add_child': Document already has a root node (RuntimeError)
from /Users/xxx/nokogiri.rb:13:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'
So... I'm kinda lost here. Any ideas ?
The problem is you're opening your file in append mode with File.open('test.xml', 'a+') and then writing the entire XML doc to it with file.print doc.to_xml. That's why you end up with the entire document written several times into the file.
If you read and write the file independently, the XML doc will replace the file the way you want. If you need to handle the file not existing yet, you can also check for it and initialize the data with your <server> root tag.
require 'nokogiri'
require 'securerandom'
logintime = Time.now
sessionid = SecureRandom.hex(10)
# Read or initialize the data
if File.exist?('test.xml')
  data = File.read("test.xml")
else
  data = '<server></server>'
end
doc = Nokogiri::XML.parse data
session_node = Nokogiri::XML::Node.new("session",doc)
session_node['id'] = sessionid
logintime_node = Nokogiri::XML::Node.new("logintime",doc)
logintime_node.content = logintime
session_node << logintime_node
doc.root << session_node
# Write the document to disk
File.open('test.xml', 'w') do |file|
  file.print doc.to_xml
end
I wouldn't recommend logging sessions this way for long. At any significant user load, writing the file will become very expensive. Also, if you have multiple servers running, they'll all be clobbering the file out from under one another. When you get to that point, you should at least convert your storage to a database, or even better use something like an ELK Stack that's built for this.
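To see the append-mode problem in isolation, here is a small sketch using only Ruby's standard library. The two strings are illustrative stand-ins for successive doc.to_xml dumps, and the temp-file names are arbitrary.

```ruby
require 'tempfile'

# Illustrative stand-ins for two successive serialisations of the whole document.
first_dump  = "<server><session id=\"a\"/></server>\n"
second_dump = "<server><session id=\"a\"/><session id=\"b\"/></server>\n"

appended = Tempfile.new('append')
overwritten = Tempfile.new('write')
appended.close
overwritten.close

[first_dump, second_dump].each do |dump|
  # 'a+' always writes at the end, so each full dump is stacked after the previous one.
  File.open(appended.path, 'a+') { |f| f.print dump }
  # 'w' truncates first, so the file only ever holds the latest dump.
  File.open(overwritten.path, 'w') { |f| f.print dump }
end

append_result = File.read(appended.path)
write_result  = File.read(overwritten.path)

puts append_result.scan('<server>').size  # root elements after appending
puts write_result.scan('<server>').size   # root elements after overwriting

appended.unlink
overwritten.unlink
```

The appended file ends up with more than one root element, which is no longer a well-formed XML document; the overwritten file holds a single, complete document.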

Why do I get errors with XML modified using Nokogiri?

I am having problems understanding Net::HTTP and Nokogiri.
I have a large number of jobs on my Jenkins server. I have to periodically update the branch name on these jobs. Doing it from the UI is a cumbersome process so I decided to update the Jenkins config.xml.
I use Nokogiri to parse the XML, traverse the XPath and update the value of the node. However, when I try to post the updated XML back to Jenkins, I get a 500 error saying:
Caused by: javax.xml.transform.TransformerException: org.xml.sax.SAXParseExceptionpublicId: -//W3C//DTD HTML 4.0 Transitional//EN; systemId: http://www.w3.org/TR/REC-html40/loose.dtd; lineNumber: 31; columnNumber: 3; The declaration for the entity "HTML.Version" must end with '>'.
Here is what I am doing:
require "net/http"
require "nokogiri"
uri = URI.parse("http://jenkins.my.domain.web:8080")
http = Net::HTTP.new(uri.host, uri.port)
getQueueRequest = Net::HTTP::Get.new("http://jenkins.my.domain.web:8080/my/job/location/config.xml")
getQueue = http.request(getQueueRequest)
xml_doc = Nokogiri::HTML(getQueue.body)
# Get current branch name
branch_name=xml_doc.at_xpath('//hudson.plugins.git.branchspec/name')
# Get new branch name
print "Enter new branch name "
user_input = gets.chomp
new_branch_name = user_input.downcase
# Set branch name and create xml
branch_name.content=new_branch_name
new_config_xml=xml_doc.to_xml
puts "Logging into Jenkins"
update_branch = Net::HTTP::Post.new("http://jenkins.my.domain.web:8080/my/job/location/config.xml")
update_branch.basic_auth 'username', 'password'
update_branch.body = new_config_xml
response = http.request(update_branch)
puts response.body
I understand it might have something to do with the XML that is getting added to the request body, but I am not sure how to fix the issue.
Original XML:
<?xml version='1.0' encoding='UTF-8'?>
<maven2-moduleset plugin="maven-plugin#1.504">
<actions/>
<description></description>
<keepDependencies>false</keepDependencies>
<properties>
<hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents#1.7.2">
<maxConcurrentPerNode>0</maxConcurrentPerNode>
<maxConcurrentTotal>0</maxConcurrentTotal>
<categories/>
<throttleEnabled>false</throttleEnabled>
<throttleOption>project</throttleOption>
<configVersion>1</configVersion>
</hudson.plugins.throttleconcurrents.ThrottleJobProperty>
</properties>
<scm class="hudson.plugins.git.GitSCM" plugin="git#1.4.0">
<configVersion>2</configVersion>
<userRemoteConfigs>
<hudson.plugins.git.UserRemoteConfig>
<name></name>
<refspec></refspec>
<url>git#github.com:<ORG_NAME>/<REPO_NAME>.git</url>
</hudson.plugins.git.UserRemoteConfig>
</userRemoteConfigs>
<branches>
<hudson.plugins.git.BranchSpec>
<name>release</name>
</hudson.plugins.git.BranchSpec>
</branches>
<disableSubmodules>false</disableSubmodules>
<recursiveSubmodules>false</recursiveSubmodules>
<doGenerateSubmoduleConfigurations>false</doGenerateSubmoduleConfigurations>
<authorOrCommitter>false</authorOrCommitter>
<clean>false</clean>
<wipeOutWorkspace>false</wipeOutWorkspace>
<pruneBranches>false</pruneBranches>
<remotePoll>false</remotePoll>
<ignoreNotifyCommit>false</ignoreNotifyCommit>
<useShallowClone>false</useShallowClone>
<buildChooser class="hudson.plugins.git.util.DefaultBuildChooser"/>
<gitTool>Default</gitTool>
<submoduleCfg class="list"/>
<relativeTargetDir></relativeTargetDir>
<reference></reference>
<excludedRegions></excludedRegions>
<excludedUsers></excludedUsers>
<gitConfigName></gitConfigName>
<gitConfigEmail></gitConfigEmail>
<skipTag>false</skipTag>
<includedRegions></includedRegions>
<scmName></scmName>
</scm>
<canRoam>true</canRoam>
<disabled>false</disabled>
<blockBuildWhenDownstreamBuilding>false</blockBuildWhenDownstreamBuilding>
<blockBuildWhenUpstreamBuilding>false</blockBuildWhenUpstreamBuilding>
<triggers class="vector">
<hudson.triggers.TimerTrigger>
<spec>0 22 * * 4</spec>
</hudson.triggers.TimerTrigger>
</triggers>
<concurrentBuild>false</concurrentBuild>
<rootModule>
<groupId>com.org.project.test</groupId>
<artifactId>functest</artifactId>
</rootModule>
<goals>clean verify -Dtestsuite=<test_suite_name> -Dbrowser=chrome -Dipaddress=http://<IP_ADDRESS>:4444/wd/hub</goals>
<mavenName>apache-maven-3.0.4</mavenName>
<aggregatorStyleBuild>true</aggregatorStyleBuild>
<incrementalBuild>false</incrementalBuild>
<perModuleEmail>true</perModuleEmail>
<ignoreUpstremChanges>false</ignoreUpstremChanges>
<archivingDisabled>false</archivingDisabled>
<resolveDependencies>false</resolveDependencies>
<processPlugins>false</processPlugins>
<mavenValidationLevel>-1</mavenValidationLevel>
<runHeadless>false</runHeadless>
<disableTriggerDownstreamProjects>false</disableTriggerDownstreamProjects>
<settings class="jenkins.mvn.DefaultSettingsProvider"/>
<globalSettings class="jenkins.mvn.DefaultGlobalSettingsProvider"/>
<reporters/>
<publishers/>
<buildWrappers/>
<prebuilders/>
<postbuilders/>
<runPostStepsIfResult>
<name>FAILURE</name>
<ordinal>2</ordinal>
<color>RED</color>
</runPostStepsIfResult>
</maven2-moduleset>
After Editing and Massaging:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<?xml version="1.0" encoding="UTF-8"?>
<html>
<body>
<maven2-moduleset plugin="maven-plugin#1.504">
<actions />
<description />
<keepdependencies>false</keepdependencies>
<properties>
<hudson.plugins.throttleconcurrents.throttlejobproperty plugin="throttle-concurrents#1.7.2">
<maxconcurrentpernode>0</maxconcurrentpernode>
<maxconcurrenttotal>0</maxconcurrenttotal>
<categories />
<throttleenabled>false</throttleenabled>
<throttleoption>project</throttleoption>
<configversion>1</configversion>
</hudson.plugins.throttleconcurrents.throttlejobproperty>
</properties>
<scm class="hudson.plugins.git.GitSCM" plugin="git#1.4.0">
<configversion>2</configversion>
<userremoteconfigs>
<hudson.plugins.git.userremoteconfig>
<name />
<refspec />
<url>git#github.com:<ORG_NAME>/<REPO_NAME>.git</url>
</hudson.plugins.git.userremoteconfig>
</userremoteconfigs>
<branches>
<hudson.plugins.git.branchspec>
<name>master</name>
</hudson.plugins.git.branchspec>
</branches>
<disablesubmodules>false</disablesubmodules>
<recursivesubmodules>false</recursivesubmodules>
<dogeneratesubmoduleconfigurations>false</dogeneratesubmoduleconfigurations>
<authororcommitter>false</authororcommitter>
<clean>false</clean>
<wipeoutworkspace>false</wipeoutworkspace>
<prunebranches>false</prunebranches>
<remotepoll>false</remotepoll>
<ignorenotifycommit>false</ignorenotifycommit>
<useshallowclone>false</useshallowclone>
<buildchooser class="hudson.plugins.git.util.DefaultBuildChooser" />
<gittool>Default</gittool>
<submodulecfg class="list" />
<relativetargetdir />
<reference />
<excludedregions />
<excludedusers />
<gitconfigname />
<gitconfigemail />
<skiptag>false</skiptag>
<includedregions />
<scmname />
</scm>
<canroam>true</canroam>
<disabled>false</disabled>
<blockbuildwhendownstreambuilding>false</blockbuildwhendownstreambuilding>
<blockbuildwhenupstreambuilding>false</blockbuildwhenupstreambuilding>
<triggers class="vector">
<hudson.triggers.timertrigger>
<spec>0 22 * * 4</spec>
</hudson.triggers.timertrigger>
</triggers>
<concurrentbuild>false</concurrentbuild>
<rootmodule>
<groupid>com.org.project.test</groupid>
<artifactid>functest</artifactid>
</rootmodule>
<goals>clean verify -Dtestsuite=<test_suite_name> -Dbrowser=chrome -Dipaddress=http://<IP_ADDRESS>:4444/wd/hub</goals>
<mavenname>apache-maven-3.0.4</mavenname>
<aggregatorstylebuild>true</aggregatorstylebuild>
<incrementalbuild>false</incrementalbuild>
<permoduleemail>true</permoduleemail>
<ignoreupstremchanges>false</ignoreupstremchanges>
<archivingdisabled>false</archivingdisabled>
<resolvedependencies>false</resolvedependencies>
<processplugins>false</processplugins>
<mavenvalidationlevel>-1</mavenvalidationlevel>
<runheadless>false</runheadless>
<disabletriggerdownstreamprojects>false</disabletriggerdownstreamprojects>
<settings class="jenkins.mvn.DefaultSettingsProvider" />
<globalsettings class="jenkins.mvn.DefaultGlobalSettingsProvider" />
<reporters />
<publishers />
<buildwrappers />
<prebuilders />
<postbuilders />
<runpoststepsifresult>
<name>FAILURE</name>
<ordinal>2</ordinal>
<color>RED</color>
</runpoststepsifresult>
</maven2-moduleset>
</body>
</html>
When you use Nokogiri::HTML(some_html) or Nokogiri::XML(some_xml), Nokogiri will look to see if the content is valid. If it isn't, it will do fix-ups on the content in an attempt to make it so. For instance:
require 'nokogiri'
html_fragment = "<p>foo bar</p>"
Nokogiri::HTML(html_fragment).to_html
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><p>foo bar</p></body></html>\n"
If the document is partially correct Nokogiri still adds the DOCTYPE statement:
html = "<html><body><p>foo bar</p></body></html>"
Nokogiri::HTML(html).to_html
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><p>foo bar</p></body></html>\n"
If you want Nokogiri to leave the document alone, because it's supposed to be a fragment, tell it to do so:
Nokogiri::HTML::DocumentFragment.parse(html_fragment).to_html
# => "<p>foo bar</p>"
Or:
xml_fragment = "<x>foo bar</x>"
Nokogiri::XML::DocumentFragment.parse(xml_fragment).to_xml
# => "<x>foo bar</x>"
Nokogiri is pretty smart about handling XML and HTML. You can try to confuse it and it'll generally do the right thing:
xml_fragment = "<x>foo bar</x>"
Nokogiri::HTML::DocumentFragment.parse(xml_fragment).to_xml
# => "<x>foo bar</x>"
That's parsing XML as an HTML fragment and telling it to emit it as XML.
Now, that all said, it's pretty obvious Nokogiri isn't doing anything mysterious, so, here's how to fix the problem. First, parse it as XML so Nokogiri doesn't think it should add the HTML DOCTYPE declaration, then, if the XML is syntactically correct, tell Nokogiri it's OK to parse it as a complete document:
require 'nokogiri'
xml = %{<?xml version='1.0' encoding='UTF-8'?>
<maven2-moduleset plugin="maven-plugin#1.504">
<actions/>
<description></description>
<keepDependencies>false</keepDependencies>
<properties>
<hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents#1.7.2">
</hudson.plugins.throttleconcurrents.ThrottleJobProperty>
</properties>
</maven2-moduleset>
}
puts Nokogiri::XML.parse(xml).to_xml
# >> <?xml version="1.0" encoding="UTF-8"?>
# >> <maven2-moduleset plugin="maven-plugin#1.504">
# >> <actions/>
# >> <description/>
# >> <keepDependencies>false</keepDependencies>
# >> <properties>
# >> <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents#1.7.2">
# >> </hudson.plugins.throttleconcurrents.ThrottleJobProperty>
# >> </properties>
# >> </maven2-moduleset>
Or as a fragment, which, because it's complete, will result in the same thing:
puts Nokogiri::XML::DocumentFragment.parse(xml).to_xml
# >> <?xml version='1.0' encoding='UTF-8'?>
# >> <maven2-moduleset plugin="maven-plugin#1.504">
# >> <actions/>
# >> <description/>
# >> <keepDependencies>false</keepDependencies>
# >> <properties>
# >> <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents#1.7.2">
# >> </hudson.plugins.throttleconcurrents.ThrottleJobProperty>
# >> </properties>
# >> </maven2-moduleset>
Instead of using Net::HTTP, which provides only the bare building blocks for HTTP, I'd recommend looking at something a bit higher-level, like HTTPClient. Here's code that is similar to yours:
require 'httpclient'
require 'nokogiri'
URL = 'http://jenkins.my.domain.web:8080/my/job/location/config.xml'
http_client = HTTPClient.new
# Parse as XML so tag case is preserved and no HTML wrapper is added
xml_doc = Nokogiri::XML(
  http_client.get_content(URL)
)
# Get current branch name; XML element names are case-sensitive
branch_name = xml_doc.at_xpath('//hudson.plugins.git.BranchSpec/name')
# Get new branch name
print 'Enter new branch name '
new_branch_name = gets.chomp.downcase
# Set branch name and create xml
branch_name.content = new_branch_name
puts 'Logging into Jenkins'
# set_auth takes the URL (or domain) the credentials apply to
http_client.set_auth(URL, 'user', 'password')
response = http_client.post(URL, :body => xml_doc.to_xml)
I can't test it but it looks close.
I now find myself in another dilemma. I am seeing that the methods which allow moving to elements and editing values, like at_xpath and at_css, only work with Nokogiri::HTML or Nokogiri::HTML::DocumentFragment. They don't work when I use Nokogiri::XML. Using Nokogiri::HTML changes the case of the tags: <keepDependencies>false</keepDependencies> becomes <keepdependencies>false</keepdependencies>. Jenkins does accept the xml with changed case of tags. Methods to_html, to_xml basically return a string, so I cannot use the xpath or css methods to navigate the xml tree. Is there a way around this?
The at methods work with both XML and HTML, and accept both CSS and XPath selectors; everything inside Nokogiri is really XML-based.
Nokogiri folds HTML tags to lower-case because HTML is case-insensitive, so at expects a lower-case value when dealing with HTML. XML is case-sensitive, so Nokogiri leaves the tag case alone, and at requires you to use the correct case when using CSS.
This is documented in the Nokogiri docs:
Note that the CSS query string is case-sensitive with regards to your document type. That is, if you’re looking for “H1” in an HTML document, you’ll never find anything, since HTML tags will match only lowercase CSS queries. However, “H1” might be found in an XML document, where tags names are case-sensitive (e.g., “H1” is distinct from “h1”).
When you are parsing the XML you are receiving from the service, you are declaring it as HTML:
xml_doc = Nokogiri::HTML(getQueue.body)
And this appears to cause Nokogiri to add HTML nodes.
Try parsing it as XML instead:
xml_doc = Nokogiri::XML(getQueue.body)

Read an XML node value using Linq

I have what should be a fairly simple requirement.
I want to use LINQ to XML in a VS2010 project to get the following node values:
GPO->Name
GPO->LinksTo->SOMName
GPO->LinksTo->SOMPath
Then:
If GPO->User->ExtensionDataCount node exists, I want to return count of all child nodes
Likewise:
If GPO->Computer->ExtensionData exists, return count of all child nodes
Here is the XML; those of you familiar with Group Policy exports will have seen this before:
<?xml version="1.0" encoding="utf-16"?>
<GPO xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.microsoft.com/GroupPolicy/Settings">
<Identifier>
<Domain xmlns="http://www.microsoft.com/GroupPolicy/Types">our.domain.fqdn</Domain>
</Identifier>
<Name>The Name I want to get</Name>
<Computer>
<VersionDirectory>1</VersionDirectory>
<VersionSysvol>1</VersionSysvol>
<Enabled>true</Enabled>
</Computer>
<User>
<VersionDirectory>4</VersionDirectory>
<VersionSysvol>4</VersionSysvol>
<Enabled>true</Enabled>
<ExtensionData>
<Extension xmlns:q1="http://www.microsoft.com/GroupPolicy/Settings/Scripts" xsi:type="q1:Scripts">
<q1:Script>
<q1:Command>Logon.cmd</q1:Command>
<q1:Type>Logon</q1:Type>
<q1:Order>0</q1:Order>
<q1:RunOrder>PSNotConfigured</q1:RunOrder>
</q1:Script>
</Extension>
<Name>Scripts</Name>
</ExtensionData>
</User>
<LinksTo>
<SOMName>an interesting data value</SOMName>
<SOMPath>some data value</SOMPath>
<Enabled>true</Enabled>
<NoOverride>false</NoOverride>
</LinksTo>
</GPO>
I have loaded the XML file into an XDocument, then tried to pull the Name value with:
XDoc.Elements("Name").FirstOrDefault.Value
XDoc.Descendants("Name").First().Value
But I'm getting errors: Object reference not set to an instance of an object and then: Sequence contains no elements.
I'm guessing I might have the path wrong, but I thought Descendants didn't require an exact path.
Where am I going wrong?
You have to use the declared namespace:
XNamespace ns = "http://www.microsoft.com/GroupPolicy/Settings";
var firstName = xdoc.Descendants(ns + "Name").First().Value;
