Ruby/Nokogiri nested loop in Nexpose XML results parser failing

I am trying to write a Ruby script that takes the Nexpose Simple XML results export, parses it, and writes the required results in a prettier format for easy review. I am using Nokogiri to parse the XML. My issue is with a nested loop that, for each device, iterates through each service section and pulls out the name, port, and protocol attributes. This will ultimately be written to a file, either plain text or CSV. However, my nested loop seems to pull those three attributes only from the first service section and print them repeatedly.
Sample Input (there will be more than one of these device blocks):
<device address="10.x.x.1" id="20xx">
<fingerprint certainty="0.85">
<description>Microsoft Windows</description>
<vendor>Microsoft</vendor>
<family>Windows</family>
<product>Windows</product>
<version/>
<device-class>General</device-class>
<architecture/>
</fingerprint>
<vulnerabilities>
</vulnerabilities>
<services>
<service name="NTP" port="123" protocol="udp">
<vulnerabilities>
</vulnerabilities>
</service>
<service name="HTTP" port="8080" protocol="tcp">
<fingerprint certainty="0.75">
<description>Apache</description>
</device>
<device address="10.x.x.2" id="20xx">
<fingerprint certainty="0.85">
<description>Microsoft Windows</description>
<vendor>Microsoft</vendor>
<family>Windows</family>
<product>Windows</product>
<version/>
<device-class>General</device-class>
<architecture/>
</fingerprint>
<vulnerabilities>
</vulnerabilities>
<services>
<service name="DNS" port="53" protocol="udp">
<vulnerabilities>
</vulnerabilities>
</service>
<service name="HTTP" port="80" protocol="tcp">
<fingerprint certainty="0.75">
<description>Apache</description>
</device>
Ruby Code:
#! /usr/bin/env ruby
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::XML(open('report.xml').read)
device = doc.xpath('//device')
device.each do |d|
service = d.xpath('//service')
puts d.attr('address')
service.each do |s|
name = s.attr('name')
port = s.attr('port')
protocol = s.attr('protocol')
puts port
puts protocol
puts name
end
end
Desired Output:
10.x.x.1
123
udp
NTP
8080
tcp
HTTP
10.x.x.2
53
udp
DNS
80
tcp
HTTP
Actual Output:
123
NTP
udp
123
NTP
udp
So the code should show a list of service port, name, and protocol for each service of each device. However, the current code seems to just print the set for the first service (which is 123, NTP, and udp) over and over and over.
Am I missing something in the logic of my loop? Or do you see anything wrong with the loops? Any help getting this working would be helpful. Thanks.

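Note that the XPath construct // means "find the element anywhere in the document". You don't want that in the inner loop: d.xpath('//service') searches the whole document again instead of just the current device, which you have already located. Use a path relative to the device instead, such as './/service'.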
Update
Based on the new input document, here is one way to extract the information you need. I took the liberty of using CSV, for a nice Excel-ready output file. Note that there is a single parsing loop. Code:
require 'nokogiri'
require 'csv'
doc = Nokogiri::XML(open('report.xml').read)
CSV.open("devices.csv", "wb") do |csv|
csv << ["Device", "Service", "Port", "Protocol"]
doc.search('//service').each do |s|
device = s.xpath('ancestor::device[1]/@address')
name = s.attr('name')
port = s.attr('port')
protocol = s.attr('protocol')
csv << [device, name, port, protocol]
end
end
Here's the contents of devices.csv:
Device,Service,Port,Protocol
10.x.x.1,NTP,123,udp
10.x.x.1,HTTP,8080,tcp
10.x.x.2,DNS,53,udp
10.x.x.2,HTTP,80,tcp

Related

Ruby Savon - Parse XML string

I have exhausted google on this subject and I just can't seem to get it right..
I have the following XML payload returned from Savon:
<?xml version='1.0' encoding='UTF-8'?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body>
<ns:listGFUsersResponse xmlns:ns="http://ws.fds.com">
<ns:return>
<responseCode>0000</responseCode><responseDescription>No Errors-DWI</responseDescription><user><login>aa1283</login><name>Andrew Alonzo</name><team>DIALER</team><secLev>-1</secLev><maxDiscount>0.00</maxDiscount><phoneSystemId></phoneSystemId></user><user><login>aaronc</login><name>Aaron Callison</name><team></team><secLev>-1</secLev><maxDiscount>0.00</maxDiscount><phoneSystemId></phoneSystemId></user>
</ns:return>
</ns:listGFUsersResponse>
</soapenv:Body>
</soapenv:Envelope>
I would like to parse out ALL values of <name> * </name> and <login> * </login>
A few of my attempts here:
response1 = client1.call(
:list_gf_users,
message: message)
doc = Nokogiri::XML(response1.to_s)
pp doc
p doc.search('/name').text
p doc.search('/login').text
Nothing returned...
doc = Nokogiri::XML(response1.to_s)
value = doc.xpath('/name').map(&:text)
puts value
Nada....
doc = Nokogiri::XML(response1.to_s)
value = doc.xpath('/user[name]').map(&:text)
puts value
Zilch...
would love to be able to see:
name: Andrew Alonzo
login: aa1283
or even better a Hash?
{"aa1283" => "Andrew Alonzo"}
Getting 0 results such as:
""
[]
nil
Figured it out... probably not most efficient but gets the job done:
Convert Savon response to string(can't use scan on Savon output)
doc = response1.to_s
subFile = doc.gsub("&lt;","<") #Replace the string convert characters
Run scan using regex capture groups:
@user = subFile.scan /<user><login>(.+?)<\/login><name>(.*?)<\/name>.+?><\/user>/
In your comments you have
doc = response1.doc
which gives you a Nokogiri document. With that you should be able to do the following:
doc.xpath("//user").each do |user|
login = user.at("login")&.text
name = user.at("name")&.text
puts "#{login}: #{name}"
end
The output is
aa1283: Andrew Alonzo
aaronc: Aaron Callison
I used the XML from your comment:
<root>
<responseCode>0000</responseCode>
<responseDescription>No Errors-DWI</responseDescription>
<user>
<login>aa1283</login>
<name>Andrew Alonzo</name>
<team>DIALER</team>
<secLev>-1</secLev>
<maxDiscount>0.00</maxDiscount>
<phoneSystemId></phoneSystemId>
</user>
<user>
<login>aaronc</login>
<name>Aaron Callison</name>
<team></team>
<secLev>-1</secLev>
<maxDiscount>0.00</maxDiscount>
<phoneSystemId></phoneSystemId>
</user>
</root>
Note that I had to convert this to plaintext. You have some non-printing unicode characters sprinkled throughout the document in seemingly random places (which makes me wonder if that's actually the cause of your problems).

How to use Nokogiri to combine multiple like-formatted XML files into CSV

I want to parse multiple like-formatted XML files into a CSV file.
I searched on Google, nokogiri.org, and on SO but I haven't been able to find an answer.
I have ten XML files in identical format in terms of node/element structure, that reside in the current directory.
After combining the XML files into a single XML file, I need to pull out specific elements of the advisory node. I would like to output the link, title, location, os -> language -> name, and reference -> name data to the CSV file.
My code is only able to parse a single XML document and I'd like it to take into account 1:many:
# Parse the XML file into a Nokogiri::XML::Document object
@doc = Nokogiri::XML(File.open("file.xml"))
# Gather the 5 specific XML elements out of the 'advisory' top-level node
data = @doc.search('advisory').map { |adv|
[
adv.at('link').content,
adv.at('title').content,
adv.at('location').content,
adv.at('os > language > name').content,
adv.at('reference > name').content
]
}
# Loop through each array element in the object and write out as CSV row
CSV.open('output_file.csv', 'wb') do |csv|
# Explicitly set headers until you figure out how to get them programmatically
csv << ['Link', 'Title', 'Location', 'OS Name', 'Reference Name']
data.each do |row|
csv << row
end
end
I tried changing the code to support multiple XML files and get them into Nokogiri::XML::Document objects:
xml_docs = []
Dir.glob("*.xml").each do |file|
xml = Nokogiri::XML(File.new(file))
xml_docs << Nokogiri::XML::Document.new(xml)
end
This successfully creates an array xml_docs with the correct objects in it, but I don't know how to convert these six objects into a single object.
This is sample XML. All XML files use the same node/element structure:
<advisories>
<title> Not relevant </title>
<customer> N/A </customer>
<advisory id="12345">
<link> https://www.google.com </link>
<release_date>2016-04-07</release_date>
<title> The Short Description Would Go Here </title>
<location> Location Name Here </location>
<os>
<product>
<id>98765</id>
<name>Product Name</name>
</product>
<language>
<id>123</id>
<name>en</name>
</language>
</os>
<reference>
<id>00029</id>
<name>Full</name>
<area>Not Defined</area>
</reference>
</advisory>
<advisory id="98765">
<link> https://www.msn.com </link>
<release_date>2016-04-08</release_date>
<title> The Short Description Would Go Here </title>
<location> Location Name Here </location>
<os>
<product>
<id>12654</id>
<name>Product Name</name>
</product>
<language>
<id>126</id>
<name>fr</name>
</language>
</os>
<reference>
<id>00052</id>
<name>Partial</name>
<area>Defined</area>
</reference>
</advisory>
</advisories>
The code leverages Nokogiri::XML::Document but if Nokogiri::XML::Builder will work better for this, I am more than willing to adjust my code accordingly.
I'd handle the first part, of parsing one XML file, like this:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<advisories>
<advisory id="12345">
<link> https://www.google.com </link>
<title> The Short Description Would Go Here </title>
<location> Location Name Here </location>
<os>
<language>
<name>en</name>
</language>
</os>
<reference>
<name>Full</name>
</reference>
</advisory>
<advisory id="98765">
<link> https://www.msn.com </link>
<release_date>2016-04-08</release_date>
<title> The Short Description Would Go Here </title>
<location> Location Name Here </location>
<os>
<language>
<name>fr</name>
</language>
</os>
<reference>
<name>Partial</name>
</reference>
</advisory>
</advisories>
EOT
Note: This has nodes removed because they weren't important to the question. Please remove fluff when asking as it's distracting.
With this being the core of the code:
doc.search('advisory').map{ |advisory|
link = advisory.at('link').text
title = advisory.at('title').text
location = advisory.at('location').text
os_language_name = advisory.at('os > language > name').text
reference_name = advisory.at('reference > name').text
{
link: link,
title: title,
location: location,
os_language_name: os_language_name,
reference_name: reference_name
}
}
That could be DRY'd but was written as an example of what to do.
Running that results in an array of hashes, which would be easily output via CSV:
# => [
{:link=>" https://www.google.com ", :title=>" The Short Description Would Go Here ", :location=>" Location Name Here ", :os_language_name=>"en", :reference_name=>"Full"},
{:link=>" https://www.msn.com ", :title=>" The Short Description Would Go Here ", :location=>" Location Name Here ", :os_language_name=>"fr", :reference_name=>"Partial"}
]
Once you've got that working then fit it into a modified version of your loops to output CSV and read the XML files. This is untested but looks about right:
CSV.open('output_file.csv', 'w',
headers: ['Link', 'Title', 'Location', 'OS Name', 'Reference Name'],
write_headers: true
) do |csv|
Dir.glob("*.xml").each do |file|
xml = Nokogiri::XML(File.read(file))
# parse a file and get the array of hashes
end
# pass the array of hashes to CSV for output
end
Note that you were using a file mode of 'wb'. You rarely need b with CSV as CSV is supposed to be a text format. If you are sure you will encounter binary data then use 'b' also, but that could lead down a path containing dragons.
Also note that this is using read. read is not scalable: it doesn't care how big a file is and will try to read all of it into memory, whether or not it will actually fit. There are lots of reasons to avoid that, but the best is that it can bring your program to its knees. If your XML files could exceed the available free memory on your system, you'll want to rewrite this using a SAX parser, which Nokogiri supports. How to do that is a different question.
it was actually an Array of array of hashes. I'm not sure how I ended up there but I was easily able to use array.flatten
Meditate on this:
foo = [] # => []
foo << [{}] # => [[{}]]
foo.flatten # => [{}]
You probably wanted to do this:
foo = [] # => []
foo += [{}] # => [{}]
Any time I have to use flatten I look to see if I can create the array without it being an array of arrays of something. It's not that they're inherently bad, because sometimes they're very useful, but you really wanted an array of hashes so you knew something was wrong and flatten was a cheap way out, but using it also costs more CPU time. It's better to figure out the problem and fix it and end up with faster/more efficient code. (And some will say that's a wasted effort or is premature optimization, but writing efficient code is a very good trait and goal.)
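Applied to a per-file loop, the difference looks like this. rows_for is a hypothetical stand-in for "parse one file and return an array of hashes":

```ruby
# Hypothetical stand-in for parsing one file into an array of hashes.
def rows_for(file)
  [{ file: file, n: 1 }, { file: file, n: 2 }]
end

files = ['a.xml', 'b.xml']

# Appending each result with << nests the arrays: [[{}, {}], [{}, {}]]
nested = []
files.each { |f| nested << rows_for(f) }

# concat adds the elements themselves, so the result is already flat.
flat = []
files.each { |f| flat.concat(rows_for(f)) }

# flat_map does the same in one expression.
same = files.flat_map { |f| rows_for(f) }
```

Both concat and flat_map give an array of hashes directly, so no flatten pass is needed afterwards.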

Parsing out contents of XML tag in Ruby

I have an XML document that, as I understand it, has already been parsed by tags. My goal is to parse all the information in the <GetResidentsContactInfoResult> tag. In the sample XML below, this tag holds two records, each beginning with the Lease PropertyId key. How can I iterate over the <GetResidentsContactInfoResult> tag and print out the key/value pairs for each record? I'm new to Ruby and to working with XML files; is this something I can do with Nokogiri?
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soap:Body>
<GetResidentsContactInfoResponse xmlns="http://tempuri.org/">
<GetResidentsContactInfoResult><PropertyResidents><Lease PropertyId="21M" BldgID="00" UnitID="0903" ResiID="3" occustatuscode="P" occustatuscodedescription="Previous" MoveInDate="2016-01-07T00:00:00" MoveOutDate="2016-02-06T00:00:00" LeaseBeginDate="2016-01-07T00:00:00" LeaseEndDate="2017-01-31T00:00:00" MktgSource="DBY" PrimaryEmail="noemail1#fake.com"><Occupant PropertyId="21M" BldgID="00" UnitID="0903" ResiID="3" OccuSeqNo="3444755" OccuFirstName="Efren" OccuLastName="Cerda" Phone2No="(832) 693-9448" ResponsibleFlag="Responsible" /></Lease><Lease PropertyId="21M" BldgID="00" UnitID="0908" ResiID="2" occustatuscode="P" occustatuscodedescription="Previous" MoveInDate="2016-02-20T00:00:00" MoveOutDate="2016-04-25T00:00:00" LeaseBeginDate="2016-02-20T00:00:00" LeaseEndDate="2017-02-28T00:00:00" MktgSource="PW" PrimaryEmail="noemail1#fake.com"><Occupant PropertyId="21M" BldgID="00" UnitID="0908" ResiID="2" OccuSeqNo="3451301" OccuFirstName="Donna" OccuLastName="Mclean" Phone2No="(713) 785-4240" ResponsibleFlag="Responsible" /></Lease></PropertyResidents></GetResidentsContactInfoResult>
</GetResidentsContactInfoResponse>
</soap:Body>
</soap:Envelope>
This uses Nokogiri to find all the GetResidentsContactInfoResponse elements, and then Active Support to convert the inner text to a hash of key-value pairs.
Read "sparklemotion/nokogiri" and "Tutorials" regarding installing and using Nokogiri.
Read "Active Support Core Extensions" about more capabilities of Active Support (though the guide does not include Hash.from_xml). To install it simply do gem install activesupport.
I assume you're fine with Nokogiri as you mentioned it in your question.
If you don't want to use Active Support, consider looking into "Convert a Nokogiri document to a Ruby Hash" as an alternative to the line Hash.from_xml(elm.text):
require 'nokogiri'
# Needed in order to use `Hash.from_xml`
require 'active_support/core_ext/hash/conversions'
def find_key_values(str)
doc = Nokogiri::XML(str)
# Ignore namespaces for easier traversal
doc.remove_namespaces!
doc.css('GetResidentsContactInfoResponse').map do |elm|
Hash.from_xml(elm.text)
end
end
Usage:
# Option 1: if your XML above is stored in a variable called `string`
find_key_values string
# Option 2: if your XML above is stored in a file
find_key_values File.open('/path/to/file')
Which returns:
[{"PropertyResidents"=>
{"Lease"=>
[{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0903",
"ResiID"=>"3",
"occustatuscode"=>"P",
"occustatuscodedescription"=>"Previous",
"MoveInDate"=>"2016-01-07T00:00:00",
"MoveOutDate"=>"2016-02-06T00:00:00",
"LeaseBeginDate"=>"2016-01-07T00:00:00",
"LeaseEndDate"=>"2017-01-31T00:00:00",
"MktgSource"=>"DBY",
"PrimaryEmail"=>"noemail1#fake.com",
"Occupant"=>
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0903",
"ResiID"=>"3",
"OccuSeqNo"=>"3444755",
"OccuFirstName"=>"Efren",
"OccuLastName"=>"Cerda",
"Phone2No"=>"(832) 693-9448",
"ResponsibleFlag"=>"Responsible"}},
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0908",
"ResiID"=>"2",
"occustatuscode"=>"P",
"occustatuscodedescription"=>"Previous",
"MoveInDate"=>"2016-02-20T00:00:00",
"MoveOutDate"=>"2016-04-25T00:00:00",
"LeaseBeginDate"=>"2016-02-20T00:00:00",
"LeaseEndDate"=>"2017-02-28T00:00:00",
"MktgSource"=>"PW",
"PrimaryEmail"=>"noemail1#fake.com",
"Occupant"=>
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0908",
"ResiID"=>"2",
"OccuSeqNo"=>"3451301",
"OccuFirstName"=>"Donna",
"OccuLastName"=>"Mclean",
"Phone2No"=>"(713) 785-4240",
"ResponsibleFlag"=>"Responsible"}}]}}]

Xml formatting using Node

Following is the method used to write an entry to xml file
def write_entry(entry)
node = Nokogiri::XML::Node.new("url", @xml_document)
node["loc"]= entry[:url]
node["lastmod"]= entry[:lastmod].to_s
node["changefreq"] = entry[:frequency].to_s
node["priority"] = entry[:priority].to_s
node.to_xml
end
The entry looks like this:
<urlset>
<url loc="http://www.experteer.co.uk/vacaturebank/banen/vacatures/xing-ag" lastmod="2011-11-23 16:58:27 UTC" changefreq="0.8" priority="monthly"/>
</urlset>
I want the entry of xml to be like this
<urlset>
<url>
<loc> http://www.experteer.co.uk/vacaturebank/banen/vacatures/xing-ag </loc>
<lastmod> 2011-11-23 16:58:27 UTC </lastmod>
<changefreq> 0.8 </changefreq>
<priority> monthly </priority>
</url>
</urlset>
Is this possible using Node, or do I have to use Builder?
If it is possible with Node, how?
And if I have to use Builder, it writes a header for each entry; how can I stop it from writing the header for every entry?
You can use << or add_child to append child nodes to a node.
def write_entry(entry)
  url = Nokogiri::XML::Node.new("url", @xml_document)
  %w{loc lastmod changefreq priority}.each do |node|
    url << Nokogiri::XML::Node.new(node, @xml_document).tap do |n|
      # content= expects a String, so convert Time and numeric values
      n.content = entry[node.to_sym].to_s
    end
  end
  url.to_xml
end
For this to work correctly, you have to change entry[:url] to entry[:loc] and entry[:frequency] to entry[:changefreq], which shouldn't be a bad thing (it's best to have the same name for the same thing everywhere, isn't it?).
Alternatively, if your entry hash only contains what you need to convert to XML, use entry.each do |key, value| instead of the array.

How to replace a particular line in xml with the new one in ruby

I have a requirement where I need to replace an element's value with a new one, and I don't want any other modification to be made to the file.
<mtn:test-case title='Power-Consist-Message'>
<mtn:messages>
<mtn:message sequence='4' correlation-key='0x0F04'>
<mtn:header>
<mtn:protocol-version>0x4</mtn:protocol-version>
<mtn:message-type>0x0F04</mtn:message-type>
<mtn:message-version>0x01</mtn:message-version>
<mtn:gmt-time-switch>false</mtn:gmt-time-switch>
<mtn:crc-calc-switch>1</mtn:crc-calc-switch>
<mtn:encrypt-switch>false</mtn:encrypt-switch>
<mtn:compress-switch>false</mtn:compress-switch>
<mtn:ttl>999</mtn:ttl>
<mtn:qos-class-of-service>0</mtn:qos-class-of-service>
<mtn:qos-priority>2</mtn:qos-priority>
<mtn:qos-network-preference>1</mtn:qos-network-preference>
This is what the XML file looks like. I want to replace 999 (the <mtn:ttl> value) with some other value, but when I do that using a formatter in Ruby, other unwanted modifications take place. The code I am using is below:
File.open(ENV['CadPath1']+ "conf\\cad-mtn-config.xml") do |config_file|
# Open the document and edit the file
config = Document.new(config_file)
testField=config.root.elements[4].elements[11].elements[1].elements[1].elements[1].elements[11]
if testField.to_s.match(/<mtn:qos-network-preference>/)
test=config.root.elements[4].elements[11].elements[1].elements[1].elements[1].elements[8].text="2"
# Write the result to a new file.
formatter = REXML::Formatters::Default.new
File.open(ENV['CadPath1']+ "conf\\cad-mtn-config.xml", 'w') do |result|
formatter.write(config, result)
end
end
end
When I write the modifications back to the file, its size changes from 79 KB to 78 KB. Is there any way to replace just that one line in the XML file and save the change without affecting the rest of the file?
Please let me know soon...
I prefer Nokogiri as my XML/HTML parser of choice:
require 'nokogiri'
xml =<<EOT
<mtn:test-case title='Power-Consist-Message'>
<mtn:messages>
<mtn:message sequence='4' correlation-key='0x0F04'>
<mtn:header>
<mtn:protocol-version>0x4</mtn:protocol-version>
<mtn:message-type>0x0F04</mtn:message-type>
<mtn:message-version>0x01</mtn:message-version>
<mtn:gmt-time-switch>false</mtn:gmt-time-switch>
<mtn:crc-calc-switch>1</mtn:crc-calc-switch>
<mtn:encrypt-switch>false</mtn:encrypt-switch>
<mtn:compress-switch>false</mtn:compress-switch>
<mtn:ttl>999</mtn:ttl>
<mtn:qos-class-of-service>0</mtn:qos-class-of-service>
<mtn:qos-priority>2</mtn:qos-priority>
<mtn:qos-network-preference>1</mtn:qos-network-preference>
EOT
Notice that the XML is malformed, i.e., it doesn't terminate correctly.
doc = Nokogiri::XML(xml)
I'm using CSS accessors to find the ttl node. Because of some magic, Nokogiri's CSS ignores XML namespaces, simplifying finding nodes.
doc.at('ttl').content = '1000'
puts doc.to_xml
# >> <?xml version="1.0"?>
# >> <test-case title="Power-Consist-Message">
# >> <messages>
# >> <message sequence="4" correlation-key="0x0F04">
# >> <header>
# >> <protocol-version>0x4</protocol-version>
# >> <message-type>0x0F04</message-type>
# >> <message-version>0x01</message-version>
# >> <gmt-time-switch>false</gmt-time-switch>
# >> <crc-calc-switch>1</crc-calc-switch>
# >> <encrypt-switch>false</encrypt-switch>
# >> <compress-switch>false</compress-switch>
# >> <ttl>1000</ttl>
# >> <qos-class-of-service>0</qos-class-of-service>
# >> <qos-priority>2</qos-priority>
# >> <qos-network-preference>1</qos-network-preference>
# >> </header></message></messages></test-case>
Notice that Nokogiri replaced the content of the ttl node. It also stripped the XML namespace info because the document didn't declare it correctly, and, finally, Nokogiri has added closing tags to make the document syntactically correct.
If you want the namespace to be declared in the output, you'll need to make sure it's there in the input.
If you need to literally replace just that value without affecting anything else about the XML file, even if (as the Tin Man pointed out above) that means leaving the original XML file malformed, you can do that with direct string manipulation using a regular expression.
Assuming there is guaranteed to only be one <mtn:ttl> tag in your XML document, you could just do:
doc = File.read("somefile.xml")
doc.sub!(/<mtn:ttl>.+?<\/mtn:ttl>/, "<mtn:ttl>some other value</mtn:ttl>")
File.write("somefile.xml", doc)
If there might be more than one <mtn:ttl> tag, then this is trickier; how much trickier depends on how you want to figure out which tag(s) to change.
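As one example of the trickier case, suppose the rule is "change the nth <mtn:ttl> in the file". That rule and the replace_nth_ttl name are assumptions for illustration; a different selection rule would need a different pattern.

```ruby
# Replaces the text of the nth <mtn:ttl> element (1-based) and leaves
# every other character of the input untouched.
def replace_nth_ttl(xml, n, new_value)
  seen = 0
  xml.gsub(%r{(<mtn:ttl>).*?(</mtn:ttl>)}m) do |match|
    seen += 1
    if seen == n
      "#{Regexp.last_match(1)}#{new_value}#{Regexp.last_match(2)}"
    else
      match # not the one we want; put it back unchanged
    end
  end
end
```

For example, replace_nth_ttl(File.read("somefile.xml"), 2, "1000") returns the file contents with only the second ttl value changed. The new value is inserted as-is, so it should not contain characters that need XML escaping.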

Resources