Adding new child nodes to an existing XML file with Ruby & Nokogiri - ruby

I have a server project in Ruby, and I would like to keep track of events and user sessions in an XML file. I'm totally new to this, and after days of research, I'm hitting a wall.
Here's my current sample code, assuming there's already a file named "test.xml" that contains a root node called <server>:
$ cat test.xml
<server></server>
and the code:
require 'nokogiri'
require 'securerandom'
logintime = Time.now
sessionid = SecureRandom.hex(10)
file = File.open("test.xml",'a+')
doc = Nokogiri::XML.parse file
session_node = Nokogiri::XML::Node.new("session",doc)
session_node['id'] = sessionid
logintime_node = Nokogiri::XML::Node.new("logintime",doc)
logintime_node.content = logintime
session_node << logintime_node
doc.root << session_node
file.print doc.to_xml
file.close
and here's the test.xml file after 4 runs:
<server></server>
<?xml version="1.0"?>
<server>
<session id="5ef27ade2afaf5c2162f">
<logintime>2015-07-07 17:27:20 +0200</logintime>
</session>
</server>
<?xml version="1.0"?>
<server>
<session id="637595bd0857c8af1cc0">
<logintime>2015-07-07 17:27:36 +0200</logintime>
</session>
</server>
<?xml version="1.0"?>
<?xml version="1.0"?>
<server>
<session id="41e6082c4db7d1dc8692">
<logintime>2015-07-07 17:27:37 +0200</logintime>
</session>
</server>
<?xml version="1.0"?>
<?xml version="1.0"?>
<server>
<session id="1cad6c3d38d4fb96632b">
<logintime>2015-07-07 17:27:38 +0200</logintime>
</session>
</server>
<?xml version="1.0"?>
And the desired output should be something like this:
<?xml version="1.0"?>
<server>
<session id="5ef27ade2afaf5c2162f">
<logintime>2015-07-07 17:27:20 +0200</logintime>
</session>
<session id="637595bd0857c8af1cc0">
<logintime>2015-07-07 17:27:36 +0200</logintime>
</session>
<session id="41e6082c4db7d1dc8692">
<logintime>2015-07-07 17:27:37 +0200</logintime>
</session>
<session id="1cad6c3d38d4fb96632b">
<logintime>2015-07-07 17:27:38 +0200</logintime>
</session>
</server>
And I really don't know what I should do to obtain that result.
First, if there's no existing file containing the root node, the script runs only once, then complains that there's already a root node when I try to run it a second time:
/System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/gems/2.0.0/gems/nokogiri-1.5.6/lib/nokogiri/xml/document.rb:232:in `add_child': Document already has a root node (RuntimeError)
from /Users/xxx/nokogiri.rb:13:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'
So... I'm kinda lost here. Any ideas?

The problem is you're opening your file in append mode with File.open('test.xml', 'a+') and then writing the entire XML doc to it with file.print doc.to_xml. That's why you end up with the entire document written several times into the file.
If you read and write the file independently, the XML doc will replace the file the way you want. If you need to handle the file not existing yet, you can also check for it and initialize the data with your <server> root tag.
require 'nokogiri'
require 'securerandom'
logintime = Time.now
sessionid = SecureRandom.hex(10)
# Read or initialize the data
if File.exist?('test.xml')
data = File.read("test.xml")
else
data = '<server></server>'
end
doc = Nokogiri::XML.parse data
session_node = Nokogiri::XML::Node.new("session",doc)
session_node['id'] = sessionid
logintime_node = Nokogiri::XML::Node.new("logintime",doc)
logintime_node.content = logintime.to_s
session_node << logintime_node
doc.root << session_node
# Write the document to disk
File.open('test.xml', 'w') do |file|
file.print doc.to_xml
end
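To see the difference between the two file modes in isolation, here is a minimal sketch that uses only the standard library. The helper name copies_after_two_writes and the temporary file are made up for the illustration; they are not part of the code above.

```ruby
require 'tmpdir'

# Hypothetical helper: writes `content` to a scratch file twice using the
# given mode, then returns how many <server> root tags the file contains.
def copies_after_two_writes(mode, content = "<server></server>\n")
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'demo.xml')
    2.times { File.open(path, mode) { |f| f.print content } }
    File.read(path).scan('<server>').size
  end
end

puts copies_after_two_writes('a+') # append mode keeps every copy
puts copies_after_two_writes('w')  # write mode truncates before each write
```

In append mode each run adds another full document to the end of the file, which is what produced the repeated <?xml ...?> blocks in the question.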
I wouldn't recommend logging sessions this way for long. At any significant user load, writing the file will become very expensive. Also, if you have multiple servers running, they'll all be clobbering the file out from under one another. When you get to that point, you should at least convert your storage to a database, or even better use something like an ELK Stack that's built for this.

Related

Nokogiri parsing through XML fails

The code:
response = Nokogiri::XML(open('https://geocode-maps.yandex.ru/1.x/?geocode=%D0%A1%D0%B0%D0%BD%D0%BA%D1%82-%D0%9F%D0%B5%D1%82%D0%B5%D1%80%D0%B1%D1%83%D1%80%D0%B3+%D0%A1%D0%B2%D0%B5%D1%80%D0%B4%D0%BB%D0%BE%D0%B2%D1%81%D0%BA%D0%B0%D1%8F+%D0%BD%D0%B0%D0%B1%D0%B5%D1%80%D0%B5%D0%B6%D0%BD%D0%B0%D1%8F+44%D0%A2'), nil, Encoding::UTF_8.to_s)
lowerCorner = response.xpath("//lowerCorner")
XML document I parse is like:
<?xml version="1.0" encoding="utf-8"?>
<ymaps xmlns="http://maps.yandex.ru/ymaps/1.x" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation=" http://maps.yandex.ru/business/1.x http://maps.yandex.ru/schemas/business/1.x/business.xsd http://maps.yandex.ru/geocoder/1.x http://maps.yandex.ru/schemas/geocoder/1.x/geocoder.xsd http://maps.yandex.ru/psearch/1.x http://maps.yandex.ru/schemas/psearch/1.x/psearch.xsd http://maps.yandex.ru/search/1.x http://maps.yandex.ru/schemas/search/1.x/search.xsd http://maps.yandex.ru/web/1.x http://maps.yandex.ru/schemas/web/1.x/web.xsd http://maps.yandex.ru/search/internal/1.x http://maps.yandex.ru/schemas/search/internal/1.x/internal.xsd">
<GeoObjectCollection>
<metaDataProperty xmlns="http://www.opengis.net/gml">
<GeocoderResponseMetaData xmlns="http://maps.yandex.ru/geocoder/1.x">
<request>Санкт-Петербург Свердловская набережная 44Т</request>
<found>1</found>
<results>10</results>
</GeocoderResponseMetaData>
</metaDataProperty>
<featureMember xmlns="http://www.opengis.net/gml">
<GeoObject xmlns="http://maps.yandex.ru/ymaps/1.x" xmlns:gml="http://www.opengis.net/gml" gml:id="1">
<metaDataProperty xmlns="http://www.opengis.net/gml">
<GeocoderMetaData xmlns="http://maps.yandex.ru/geocoder/1.x">
<kind>house</kind>
<text>Россия, Санкт-Петербург, Свердловская набережная, 44Т</text>
<precision>exact</precision>
</GeocoderMetaData>
</metaDataProperty>
<boundedBy>
<Envelope>
<lowerCorner>30.397902 59.959183</lowerCorner>
<upperCorner>30.406113 59.9633</upperCorner>
</Envelope>
</boundedBy>
<Point xmlns="http://www.opengis.net/gml">
<pos>30.402008 59.961242</pos>
</Point>
</GeoObject>
</featureMember>
</GeoObjectCollection>
</ymaps>
I'd like to get lowerCorner, but nothing from the official docs or other sources works:
response.xpath('//lowerCorner')
response.search('//lowerCorner')
response.xpath('xmlns:lowerCorner')
response.xpath('xmlns:lowerCorner', ns).text
response.css('lowerCorner')
The only result is: []
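So how do I parse lowerCorner's content?
The elements in this document belong to XML namespaces, so paths written without a namespace match nothing. Removing the namespaces (or using them in your path) should help.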
Try this:
require "nokogiri"
require "open-uri"
response = Nokogiri::XML(URI.open('https://geocode-maps.yandex.ru/1.x/?geocode=%D0%A1%D0%B0%D0%BD%D0%BA%D1%82-%D0%9F%D0%B5%D1%82%D0%B5%D1%80%D0%B1%D1%83%D1%80%D0%B3+%D0%A1%D0%B2%D0%B5%D1%80%D0%B4%D0%BB%D0%BE%D0%B2%D1%81%D0%BA%D0%B0%D1%8F+%D0%BD%D0%B0%D0%B1%D0%B5%D1%80%D0%B5%D0%B6%D0%BD%D0%B0%D1%8F+44%D0%A2'), nil, Encoding::UTF_8.to_s)
response.remove_namespaces! # <<<<<<<
lower_corner = response.xpath("/ymaps/GeoObjectCollection/featureMember/GeoObject/boundedBy/Envelope/lowerCorner").first
p lower_corner.text #> "30.397902 59.959183"

Nokogiri XML Searching

I've tried reading the Nokogiri docs, etc., but I've come to a roadblock.
I get an XML output similar to
<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<ns1:getPoliciesResponse xmlns:ns1="http://policy.api.control.r1soft.com/">
<return>
<CDPId>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx</CDPId>
<description/>
<diskSafeID>bcb68765-a719-4291-912d-2e6af485ea24</diskSafeID>
<enabled>true</enabled>
<id>cdb65427-d6f4-4a89-9f77-8763e22dc74b</id>
<lastReplicationRunTime>2013-06-12T13:29:40.105-05:00</lastReplicationRunTime>
<name>pstueck-passenger ondemand</name>
<replicationScheduleFrequencyType>ON_DEMAND</replicationScheduleFrequencyType>
<state>OK</state>
</return>
<return>
<CDPId>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx</CDPId>
<description/>
<diskSafeID>e8e13555-f577-40d2-99c8-fa8a019d3b55</diskSafeID>
<enabled>true</enabled>
<id>7f55f8d6-92a9-4b14-bff4-631559d92259</id>
<lastReplicationRunTime>2013-06-16T22:00:04.918-05:00</lastReplicationRunTime>
<name>pstueck-mysql daily</name>
<nextReplicationRunTime>2013-06-17T22:00:00-05:00</nextReplicationRunTime>
<replicationScheduleFrequencyType>DAILY</replicationScheduleFrequencyType>
<state>ALERT</state>
<warnings>Policy last completed with alerts</warnings>
</return>
</ns1:getPoliciesResponse>
</soap:Body>
</soap:Envelope>
But I get a large number of 'return' sections back. I'm trying to use .search on the parsed result, and I only want it to return the entire 'return' section for a given 'name'. Anyone have any tips?
Current Code:
client = Savon::Client.new do
http.auth.basic "#{opts['api_username']}", "#{opts['api_password']}"
wsdl.document = "#{opts['api_url']}/Policy?wsdl"
end
getPolicyInformation = client.request :getPolicies
getPolicyInformation = Nokogiri::XML(getPolicyInformation.to_xml)
print getPolicyInformation
I want to return everything in the <return> section when I search for a specified <name>. Example: I only want to see the information relating to <name>pstueck-passenger ondemand</name>, but as the entire <return> section that contains it.
You can use XPath to identify a node with a particular value and then specify that an ancestor element is of interest by doing something like the following:
require 'nokogiri'
document = <<-XML
<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<ns1:getPoliciesResponse xmlns:ns1="http://policy.api.control.r1soft.com/">
<return>
<CDPId>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx</CDPId>
<description/>
<diskSafeID>bcb68765-a719-4291-912d-2e6af485ea24</diskSafeID>
<enabled>true</enabled>
<id>cdb65427-d6f4-4a89-9f77-8763e22dc74b</id>
<lastReplicationRunTime>2013-06-12T13:29:40.105-05:00</lastReplicationRunTime>
<name>pstueck-passenger ondemand</name>
<replicationScheduleFrequencyType>ON_DEMAND</replicationScheduleFrequencyType>
<state>OK</state>
</return>
<return>
<CDPId>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx</CDPId>
<description/>
<diskSafeID>e8e13555-f577-40d2-99c8-fa8a019d3b55</diskSafeID>
<enabled>true</enabled>
<id>7f55f8d6-92a9-4b14-bff4-631559d92259</id>
<lastReplicationRunTime>2013-06-16T22:00:04.918-05:00</lastReplicationRunTime>
<name>pstueck-mysql daily</name>
<nextReplicationRunTime>2013-06-17T22:00:00-05:00</nextReplicationRunTime>
<replicationScheduleFrequencyType>DAILY</replicationScheduleFrequencyType>
<state>ALERT</state>
<warnings>Policy last completed with alerts</warnings>
</return>
</ns1:getPoliciesResponse>
</soap:Body>
</soap:Envelope>
XML
doc = Nokogiri::XML(document)
ns = { 'soap' => 'http://schemas.xmlsoap.org/soap/envelope/', 'ns1' => "http://policy.api.control.r1soft.com/" }
ret = doc.xpath('/soap:Envelope/soap:Body/ns1:getPoliciesResponse/return/name[text()="pstueck-passenger ondemand"]/ancestor::return', ns)
puts ret.count
puts ret.at('replicationScheduleFrequencyType').text
EDIT
Updated to reflect updated XML body in question. Now handles namespaces.
Using CSS to find the node:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<ns1:getPoliciesResponse xmlns:ns1="http://policy.api.control.r1soft.com/">
<return>
<CDPId>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx</CDPId>
<description/>
<diskSafeID>e8e13555-f577-40d2-99c8-fa8a019d3b55</diskSafeID>
<enabled>true</enabled>
<id>7f55f8d6-92a9-4b14-bff4-631559d92259</id>
<lastReplicationRunTime>2013-06-16T22:00:04.918-05:00</lastReplicationRunTime>
<name>pstueck-mysql daily</name>
<nextReplicationRunTime>2013-06-17T22:00:00-05:00</nextReplicationRunTime>
<replicationScheduleFrequencyType>DAILY</replicationScheduleFrequencyType>
<state>ALERT</state>
<warnings>Policy last completed with alerts</warnings>
</return>
<return>
<CDPId>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx</CDPId>
<description/>
<diskSafeID>bcb68765-a719-4291-912d-2e6af485ea24</diskSafeID>
<enabled>true</enabled>
<id>cdb65427-d6f4-4a89-9f77-8763e22dc74b</id>
<lastReplicationRunTime>2013-06-12T13:29:40.105-05:00</lastReplicationRunTime>
<name>pstueck-passenger ondemand</name>
<replicationScheduleFrequencyType>ON_DEMAND</replicationScheduleFrequencyType>
<state>OK</state>
</return>
</ns1:getPoliciesResponse>
</soap:Body>
</soap:Envelope>
EOT
return_tag = doc.at('return name[text()="pstueck-passenger ondemand"]').parent
puts return_tag.to_xml
Which outputs:
<return>
<CDPId>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx</CDPId>
<description/>
<diskSafeID>bcb68765-a719-4291-912d-2e6af485ea24</diskSafeID>
<enabled>true</enabled>
<id>cdb65427-d6f4-4a89-9f77-8763e22dc74b</id>
<lastReplicationRunTime>2013-06-12T13:29:40.105-05:00</lastReplicationRunTime>
<name>pstueck-passenger ondemand</name>
<replicationScheduleFrequencyType>ON_DEMAND</replicationScheduleFrequencyType>
<state>OK</state>
</return>
Nokogiri supports both XPath and CSS. I find CSS easier to read.
I used the at method to find the first matching occurrence, and to show that it was the first matching, I swapped the order of the two <return> blocks. at is the same as search(...).first, so when you're looking for the first instance of something in a document, at is the way to go.
Nokogiri is usually smart enough to know the difference between XPath and CSS selectors, so we can use the generic at and search. If you need to force CSS or XPath parsing because the selector is ambiguous, you can use the specific css or xpath, or at_css or at_xpath, respectively. They're all documented in the Nokogiri::XML::Node docs.
parent is necessary because we want the parent of the selected node, which was <name>. I just slammed it into reverse and backed up a block. That is easier to do in XPath, where we can use .. to point to the parent node.

Replace all occurrences except the first in Ruby. The regular expression spans multiple lines

I am trying to download my last 3200 tweets in groups of 200 (over multiple pages) using the restclient gem.
In the process, I end up adding the following lines multiple times to my file:
</statuses>
<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
To fix this (the repeated lines break XML parsing), after downloading the file I want to remove all occurrences of the above string except the first.
I am trying the following:
tweets_page = RestClient.get("#{GET_STATUSES_URL}&page=#{page_number}")
message = <<-MSG
</statuses>
<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
MSG
unless page_number == 1
tweets_page.gsub!(message,"")
end
What is wrong in the above? Is there a better way to do the same?
I believe it would be faster to download the whole response at once, split its body on message, and then put message back once, after the first chunk.
Something like this (I can't try it out, so treat it as an idea only):
tweets_page = RestClient.get("#{GET_STATUSES_URL}").body
tweets = tweets_page.split(message)
# Keep the separator once, after the first chunk; join the rest without it
tweets_page = tweets[0] + message + tweets[1..-1].join
You could also easily break them up into groups of 200 that way.
If you want to do it with a gsub on the whole text you could use the following
tweets_page = <<-MSG
first
</statuses>
<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
second
</statuses>
<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
rest
MSG
message = <<-MSG
</statuses>
<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
MSG
new_str = tweets_page.gsub message do |match|
if defined? @first
""
else
@first = true
message
end
end
p new_str
gives
"first\n</statuses>\n<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<statuses type=\"array\">\nsecond\nrest\n"

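Another way to keep only the first occurrence, sketched here with nothing but core String methods: cut the text at the first separator with partition, then delete the separator from the remainder. The method name keep_first_occurrence is hypothetical, and the one-character separator just keeps the demo short; the same call works with the three-line message string.

```ruby
# Hypothetical helper: removes every occurrence of `separator` from `text`
# except the first one. partition splits at the first match only, so the
# gsub on the tail never touches the occurrence we want to keep.
def keep_first_occurrence(text, separator)
  head, first, tail = text.partition(separator)
  head + first + tail.gsub(separator, '')
end

puts keep_first_occurrence('a|b|c|d', '|')
```

Because the separator is passed as a String rather than a Regexp, both partition and gsub treat it literally, so characters like ? in the XML declaration need no escaping.
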
Read an XML node value using Linq

I have what should be a fairly simple requirement.
I want to use LINQ to XML in a VS2010 project to get the following node values:
GPO->Name
GPO->LinksTo->SOMName
GPO->LinksTo->SOMPath
Then:
If the GPO->User->ExtensionData node exists, I want to return the count of all its child nodes
Likewise:
If GPO->Computer->ExtensionData exists, return the count of all its child nodes
Here is the XML; those of you familiar with Group Policy exports will have seen this before:
<?xml version="1.0" encoding="utf-16"?>
<GPO xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.microsoft.com/GroupPolicy/Settings">
<Identifier>
<Domain xmlns="http://www.microsoft.com/GroupPolicy/Types">our.domain.fqdn</Domain>
</Identifier>
<Name>The Name I want to get</Name>
<Computer>
<VersionDirectory>1</VersionDirectory>
<VersionSysvol>1</VersionSysvol>
<Enabled>true</Enabled>
</Computer>
<User>
<VersionDirectory>4</VersionDirectory>
<VersionSysvol>4</VersionSysvol>
<Enabled>true</Enabled>
<ExtensionData>
<Extension xmlns:q1="http://www.microsoft.com/GroupPolicy/Settings/Scripts" xsi:type="q1:Scripts">
<q1:Script>
<q1:Command>Logon.cmd</q1:Command>
<q1:Type>Logon</q1:Type>
<q1:Order>0</q1:Order>
<q1:RunOrder>PSNotConfigured</q1:RunOrder>
</q1:Script>
</Extension>
<Name>Scripts</Name>
</ExtensionData>
</User>
<LinksTo>
<SOMName>an interesting data value</SOMName>
<SOMPath>some data value</SOMPath>
<Enabled>true</Enabled>
<NoOverride>false</NoOverride>
</LinksTo>
</GPO>
I have loaded the XML file into an XDocument, then tried to pull the Name value with:
XDoc.Elements("Name").FirstOrDefault.Value
XDoc.Descendants("Name").First().Value
But I'm getting errors: Object reference not set to an instance of an object and then: Sequence contains no elements.
I'm guessing I might have the path wrong, but I thought Descendants didn't require an exact path.
Where am I going wrong?
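The root element declares a default namespace (xmlns="http://www.microsoft.com/GroupPolicy/Settings"), so every unprefixed element, including Name, belongs to it. You have to use that namespace in your queries: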
XNamespace ns = "http://www.microsoft.com/GroupPolicy/Settings";
var firstName = xdoc.Descendants(ns + "Name").First().Value;

Get node name with REXML

I have an XML, which can be like
<?xml version="1.0" encoding="utf-8"?>
<testnode type="1">123</testnode>
or like
<?xml version="1.0" encoding="utf-8"?>
<othernode attrib="true">other value</othernode>
or the root node can be something completely unexpected. (Theoretically anything.)
I'm using REXML to parse it. How can I find out what XML node is the root element?
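require 'rexml/document'
# Parse from a string here; the document could equally be loaded from a file
xml = REXML::Document.new('<?xml version="1.0" encoding="utf-8"?><testnode type="1">123</testnode>')
root_node = xml.root # same element as xml.elements[1]
root_node_name = root_node.name # "testnode" for this input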

Resources