Parse namespaced XML with Ruby Nokogiri

I have a section of XML:
<Environment
Name="test"
xmlns="http://schemas.dmtf.org/ovf/environment/1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:oe="http://schemas.dmtf.org/ovf/environment/1"
oe:id="123456789">
<PropertySection>
<Property oe:key="mykey" oe:value="test"/>
</PropertySection>
</Environment>
I'm using Ruby and Nokogiri to parse the document, i.e.:
file = File.open('/tmp/myxml.xml')
doc = Nokogiri::XML(file)
env = doc.at('Environment')
id = env['id']
printf("ID [%s]\n", id)
properties = env.at('PropertySection')
This works and successfully prints the id from the xml.
I now want to access the value of the Property element whose key attribute is 'mykey'. I tried the following:
value = properties.at('Property[@key="mykey"]')['value']
printf("Value %s\n", value)
Unfortunately, properties.at returns nil. When I modify the XML itself to remove the 'oe' namespace prefix from the 'key' attribute and re-run my script, it works.
How can I get Nokogiri to recognise the namespace when calling .at()?

You should use the Nokogiri namespace syntax: http://nokogiri.org/tutorials/searching_a_xml_html_document.html#namespaces.
First, make sure you have namespaces you can use:
ns = {
'xmlns' => 'http://schemas.dmtf.org/ovf/environment/1',
'oe' => 'http://schemas.dmtf.org/ovf/environment/1'
}
(I'm defining both even though they are the same in this example). You might also look into using the namespaces already available in doc.collect_namespaces.
Then you can just do:
value = properties.at('./xmlns:Property[@oe:key="mykey"]/@oe:value', ns).content
Note that I am using ./ here because, for this specific search, Nokogiri interprets the XPath as CSS without it. You may wish to use .//.

Related

Ignore namespaces on xmldocument in nokogiri

I'm trying to learn to parse XML with Nokogiri.
I don't have control over how the XML file is generated, and it seems the namespaces are causing issues because they are not defined.
I'm using the following test code to try to get this to work.
require 'nokogiri'
def getxml
xml_str = <<EOF
<root>
<THING1:things type="Container">
<PART1:Id type="Property">1234</PART1:Id>
<PART1:Name type="Property">The Name1</PART1:Name>
</THING1:things>
<THING2:things type="Container">
<PART2:Id type="Property">2234</PART2:Id>
<PART2:Name type="Property">The Name2</PART2:Name>
</THING2:things>
</root>
EOF
doc = Nokogiri::XML(xml_str)
puts(doc.errors())
doc.xpath('//Id').each do |thing|
puts(thing.inspect)
#puts "ID = " + thing.at_xpath('Id').content
#puts "Name = " + thing.at_xpath('Name').content
end
end
getxml()
I'm getting the following errors:
2:38: ERROR: Namespace prefix THING1 on things is not defined
3:34: ERROR: Namespace prefix PART1 on Id is not defined
4:36: ERROR: Namespace prefix PART1 on Name is not defined
6:38: ERROR: Namespace prefix THING2 on things is not defined
7:34: ERROR: Namespace prefix PART2 on Id is not defined
8:36: ERROR: Namespace prefix PART2 on Name is not defined
How am I supposed to deal with namespaces that are not defined? Is there a way to ignore namespaces?

Nokogiri does have the remove_namespaces! method, but it won't help in your case as your XML isn't actually using namespaces.
As there are no namespace declarations, your XML elements are just treated as non-namespaced elements that contain a : character in their name. This makes it difficult to use with XPath as XPath assumes a : indicates a namespace.
One way to get round this is to use the local-name() function to select elements. For example to select all elements named PART1:Id you could use this:
doc.xpath('//*[local-name()="PART1:Id"]')
If you want to select all elements where the final part is Id, regardless of what the prefix is, such as PART1:Id and PART2:Id, you could combine local-name() with substring-after():
doc.xpath('//*[substring-after(local-name(), ":")="Id"]')

XQuery/Xpath referring to xml elements with no namespace, in a namespace environment

In XQuery 3.1 (under eXist-DB 4.7) I receive XML data like this, with no namespace:
<edit-request id="TC9999">
<title-collection>foocolltitle</title-collection>
<title-exempla>fooextitle</title-exempla>
<title-short>fooshorttitle</title-short>
</edit-request>
This is assigned to a variable $content and this statement:
let $collid := $content/edit-request/@id
...correctly returns: TC9999
Now, I need to actually transform all the data in $content into a TEI xml document.
I first need to get some info from an existing TEI file, so I assigned another variable:
let $oldcontent := doc(concat($globalvar:URIdata,$collid,"/",$collid,".xml"))
And then I create the new TEI document, referring to both $content and $oldcontent:
let $xml := <listBibl xmlns="http://www.tei-c.org/ns/1.0"
type="collection"
xml:id="{$collid}">
<bibl>
<idno type="old_sql_id">{$oldcontent//tei:idno[@type="old_sql_id"]/text()}</idno>
<title type="collection">{$content//title-exempla/text()}</title>
</bibl>
</listBibl>
The references to the TEI namespace in $oldcontent come through, but to my surprise the references to $content (no namespace) don't show up:
<listBibl xmlns="http://www.tei-c.org/ns/1.0"
type="collection"
xml:id="TC9999">
<bibl>
<idno type="old_sql_id">1</idno>
<title type="collection"/>
</bibl>
</listBibl>
The question is: how do I refer to the non-namespace elements in $content in the context of let $xml=...?
NB: the XQuery document has a declaration at the top (as it is the principal namespace of virtually all the documents):
declare namespace tei = "http://www.tei-c.org/ns/1.0";
In essence you are asking how to write an XPath expression to select nodes in an empty namespace in a context where the default element namespace is non-empty. One of the most direct solutions is to use the "URI plus local-name syntax" for writing QNames. Here is an example:
xquery version "3.1";
let $x := <x><y>Jbrehr</y></x>
return
<p xmlns="foo">Hey there,
{ $x/Q{}y => string() }!</p>
If instead of $x/Q{}y the example had used the more common form of the path expression, $x/y, its result would have been an empty sequence, since the local name y used to select the <y> element specifies no namespace and thus inherits the foo element namespace from its context. By using the "URI plus local-name syntax", though, we are able to specify the empty namespace we are looking for.
For more information on this, see the XPath 3.1 specification's discussion of expanded QNames: https://www.w3.org/TR/xpath-31/#doc-xpath31-EQName.

Use Datasource Properties in XPath Expression of SoapUI

I need to know whether it is possible to use a datasource property in XPath Expression panel of XPath Match Configuration. For instance, if we have the following XML document:
<ns1:Ions>
<ns1:Ion>UI</ns1:Ion>
<ns1:IonType>X</ns1:IonType>
<ns1:StartDate>2010-05-10</ns1:StartDate>
</ns1:Ions>
<ns1:Ions>
<ns1:Ion>HH</ns1:Ion>
<ns1:IonType>RI</ns1:IonType>
<ns1:StartDate>1998-11-23</ns1:StartDate>
</ns1:Ions>
<ns1:Ions>
<ns1:Ion>CF</ns1:Ion>
<ns1:IonType>A</ns1:IonType>
<ns1:StartDate>2000-06-10</ns1:StartDate>
</ns1:Ions>
I need to check that the content of IonType is 'A', but only where its sibling node, Ion, has the value 'CF'. I was hoping to accomplish this by setting up the XPath Match Configuration as follows:
XPath Expression (DataSourceInput#ION is 'CF')
declare namespace ns1='http://my.namespace.com';
//ns1:Ions[ns1:Ion[text()=${DataSourceInput#ION}]]/ns1:IonType/text()
Expected Results (DataSourceInput#ION_TYPE is 'A')
${DataSourceInput#ION_TYPE}
Running the test makes SoapUI [Pro] fail with the error: Missing content for xpath declare. If I replace ${DataSourceInput#ION} with an actual value, i.e. 'CF', the test works as expected (I even tried placing single quotes around ${DataSourceInput#ION}, but it didn't work).
Is there another way to accomplish this in SoapUI?

I tried what you did, and it works for me if I put single quotes around the property:
declare namespace ns1='http://my.namespace.com';
//ns1:Ions[ns1:Ion[text()='${DataSourceInput#ION}']]/ns1:IonType/text()
Did you check that the TestStep name is exactly DataSourceInput? If there are spaces in the TestStep name (i.e. your TestStep is named Data Source Input), you have to write ${Data Source Input#ION}.
Alternatively, you can add a Groovy Script TestStep after the TestStep that returns the <Ions> response, and perform the assertion there as follows:
// get xml holder
def groovyUtils = new com.eviware.soapui.support.GroovyUtils(context);
def ionsHolder = groovyUtils.getXmlHolder("IonsTestStepName#response");
// generate xpath expression
def xpathExpression = "//*:Ions[*:Ion[text()='" + context.expand('${DataSourceInput#ION}') + "']]/*:IonType/text()";
log.info xpathExpression;
// get the node value
def nodeValue = ionsHolder.getNodeValue(xpathExpression);
// check expected value
assert nodeValue == context.expand('${DataSourceInput#ION_TYPE}'),'ERROR IONS VALUE';
Hope this helps.

How to retrieve the nokogiri processing instruction attributes?

I am parsing the XML using Nokogiri.
I am able to retrieve the stylesheets. But not the attributes of each stylesheet.
1.9.2p320 :112 >style = xml.xpath('//processing-instruction("xml-stylesheet")').first
=> #<Nokogiri::XML::ProcessingInstruction:0x5459b2e name="xml-stylesheet">
style.name
=> "xml-stylesheet"
style.content
=> "type=\"text/xsl\" href=\"CDA.xsl\""
Is there an easy way to get the values of the type and href attributes?
Or is the only way to parse the content (style.content) of the processing instruction?

I solved this problem by following the instructions in the answer to this question:
Can Nokogiri search for "?xml-stylesheet" tags?
I added a new to_element method to the Nokogiri::XML::ProcessingInstruction class:
class Nokogiri::XML::ProcessingInstruction
def to_element
document.parse("<#{name} #{content}/>")
end
end
style = xml.xpath('//processing-instruction("xml-stylesheet")').first
element = style.to_element
To retrieve the href attribute value:
element.attribute('href').value
Can't you do something like this?
style.content.attribute['type'] # or attr['type'], I am not sure
style.content.attribute['href'] # or attr['href'], I am not sure
See also the question "How to access attributes using Nokogiri".
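If patching Nokogiri::XML::ProcessingInstruction is not wanted, the pseudo-attributes can also be read straight from the content string shown in the question. The following is only a sketch: pseudo_attributes is a hypothetical helper, not part of Nokogiri, and it assumes the content consists of simple name="value" or name='value' pairs with no escaped quotes.

```ruby
# Content string as returned by style.content in the question.
content = 'type="text/xsl" href="CDA.xsl"'

# Hypothetical helper: collect name="value" / name='value' pairs into a Hash.
def pseudo_attributes(content)
  pairs = content.scan(/([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/)
  # Each match is [name, double_quoted_value, single_quoted_value];
  # exactly one of the two value groups is non-nil.
  pairs.map { |name, dq, sq| [name, dq || sq] }.to_h
end

attrs = pseudo_attributes(content)
puts attrs['type']
puts attrs['href']
```

With a real document, the same helper would be called as pseudo_attributes(style.content).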

How to create XML object from string using xml-mapping in Ruby

I'm using xml-mapping in Ruby (on Sinatra) for some XML stuff. Generally I follow this tutorial: http://xml-mapping.rubyforge.org/. I can create objects and write them to XML strings using
login.save_to_xml.to_s
But when I try
login = Login.load_from_xml(xml_string)
I get the following error:
XML::MappingError - no value, and no default value: Attribute username not set (XXPathError: path not found: username):
Here is the XML string I receive:
<login><username>ali</username><password>baba</password></login>
This is what the class looks like:
class Login
include XML::Mapping
text_node :username, "username"
text_node :password, "password"
end
So the class name is the same, the nodes are named the same. I actually get the exact same string when I create an instance of my object and fill it with ali/baba:
test = Login.new
test.username = "ali"
test.password = "baba"
p test.save_to_xml.to_s
<login><username>ali</username><password>baba</password></login>
What am I missing?
Thanks,
MrB
EDIT:
When I do
test = login.save_to_xml
And then
login = Login.load_from_xml(test)
it works. So the problem seems to be that I'm passing a string, while the method is expecting.. well, something else. There is definitely a load_from_xml(string) method in the rubydocs, so not sure what to pass here. I guess I need some kind of reverse to_s?
It looks like save_to_xml creates a REXML::Element. Since that works, you may want to try:
Login.load_from_xml(REXML::Document.new(xml_string).root)
See the section on "choice_node" for a more detailed example http://xml-mapping.rubyforge.org/
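To see what the suggested conversion hands to load_from_xml, the REXML part can be run on its own. This sketch uses only REXML and the XML string from the question; it does not load the xml-mapping gem, so the final load_from_xml call is left out.

```ruby
require 'rexml/document'

# XML string from the question.
xml_string = '<login><username>ali</username><password>baba</password></login>'

# Parse the string and take the document's root element.
# This is the "reverse to_s" step: String -> REXML::Element.
root = REXML::Document.new(xml_string).root

puts root.class
puts root.name
puts root.elements['username'].text
puts root.elements['password'].text
```

The root object here is the same kind of object that save_to_xml returns, which is why passing it to load_from_xml is expected to behave like the round trip described in the EDIT.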
