Using nokogiri xpath to access nested elements within an xmlns - ruby

I'm new to Nokogiri and am having trouble using XPath to access nested elements of an XML document that declares a specific xmlns.
Given the following code:
#!/opt/chef/embedded/bin/ruby
require 'nokogiri'
doc = Nokogiri::XML.parse <<-XML
<?xml version="1.0" encoding="UTF-8" ?>
<domain xmlns="urn:jboss:domain:1.8">
<profiles>
<profile name="full">
<subsystem xmlns="urn:jboss:domain:datasources:1.2">
<datasources>
<datasource jndi-name="java:/Paulstestjndi" pool-name="pauls_ds" enabled="false">
<connection-url>jdbc:oracle:thin:#testhost1:80001paulstestinstance|jdbc:oracle:thin:#testhost2:80001paulstestinstance</connection-url>
</datasource>
</datasources>
</subsystem>
</profile>
</profiles>
</domain>
XML
datasources = doc.xpath('//datasources:datasource', 'datasources' => "urn:jboss:domain:datasources:1.2")
datasources.each do |datasource|
conn_url = datasource.xpath("connection-url")
puts "CLASS = #{conn_url.class}"
puts "No of Entries = #{conn_url.length}"
end
I am able to retrieve the datasources using XPath, but I am unable to use XPath to access 'connection-url' for each datasource.
I have tried several XPath calls to achieve this; the following are examples:
conn_url = datasource.xpath("connection-url")
conn_url = datasource.xpath("//connection-url")
conn_url = datasource.xpath("//datasources:datasource/connection-url", 'datasources'=>"urn:jboss:domain:datasources:1.2")
But each seems to return an empty set of results.
What am I missing?

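It’s a namespacing issue: connection-url inherits the default namespace declared on the subsystem element, so the relative query has to qualify it with a prefix mapped to that namespace: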
datasource.xpath(
'subsystem:connection-url',
'subsystem' => 'urn:jboss:domain:datasources:1.2')
#⇒ [#<... name="connection-url" namespace=...

Related

Sitecore Custom Index - WARN Could not map index document (field: _uniqueid

I have created my custom Index in Sitecore with FlatDataCrawler.
The Index has been created and I can see my documents in Solr.
The problem is that whenever I try to get those documents in my code, I get an exception like this:
Object reference not set to an instance of an object.
And in sitecore log file I see this WARN:
ManagedPoolThread #4 14:29:09 INFO Solr Query - ?q=*:*&rows=1000000&fq=_indexname:(products_index)&wt=xml
ManagedPoolThread #4 14:29:09 WARN Could not map index document (field: _uniqueid; Value: fae308d2-233f-4f7f-a4fd-9d880e42ff13) - Object reference not set to an instance of an object.
This is my Index config:
<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/" xmlns:search="http://www.sitecore.net/xmlconfig/search/">
<sitecore role:require="Standalone or ContentManagement" search:require="solr">
<contentSearch>
<configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
<indexes hint="list:AddIndex">
<index id="products_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
<param desc="name">$(id)</param>
<param desc="core">$(id)</param>
<param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
<configuration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration">
<documentOptions type="Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilderOptions, Sitecore.ContentSearch.SolrProvider">
<indexAllFields>false</indexAllFields>
</documentOptions>
</configuration>
<strategies hint="list:AddStrategy">
<strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/manual" />
</strategies>
<locations hint="list:AddCrawler">
<crawler type="Feature.ProductsIndex.Crawlers.CustomOrderCrawler, Feature.ProductsIndex" />
</locations>
</index>
</indexes>
</configuration>
</contentSearch>
</sitecore>
</configuration>
This is my code:
using (var searchContext = ContentSearchManager.GetIndex("products_index").CreateSearchContext())
{
int count = searchContext.GetQueryable<SearchResultItem>().Count(); //This works
var results = searchContext.GetQueryable<SearchResultItem>().ToList(); //Exception here!
}
Check whether your Solr schema file contains the following:
<uniqueKey>_uniqueid</uniqueKey>
<field name="_uniqueid" type="string" indexed="true" required="true" stored="true"/>
If it does not, populate the Solr schema as described in the linked guide, restart the Solr service, and then try to rebuild the index.

JAXB Moxy getValueByXpath gives null

I want to see if a theme element exists with specified name in the following xml file.
input.xml
<data>
<artifacts>
<document id="efqw4eads">
<name>composite</name>
</document>
<theme id="1">
<name>Terrace</name>
</theme>
<theme id="2">
<name>Garage</name>
</theme>
<theme id="3">
<name>Reception</name>
</theme>
<theme id="4">
<name>Grade II</name>
</theme>
</artifacts>
</data>
I have the following code. The return true statement of the method is never executed; answer always contains a null value.
public boolean themeExists(String name, Data data){
String expression = "artifacts/theme[name='"+name+"']/name/text()";
String answer = jaxbContext.getValueByXPath(data, expression, null, String.class);
if(answer == null || answer.equals("")){
return false;
}
return true;
}
This use case isn't currently supported by EclipseLink JAXB (MOXy). I have opened the following enhancement you can use to track our progress:
http://bugs.eclipse.org/413823
There is no <artifacts/> element you're looking for in the first axis step. Your XPath expression should be something like
String expression = "data/theme[name='"+name+"']/name/text()";

Parsing an XML file with Nokogiri?

<DataSet xmlns="http://www.atcomp.cz/webservices">
<xs:schema xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" id="file_mame">...</xs:schema>
<diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
<alldata xmlns="">
<category diffgr:id="category1" msdata:rowOrder="0">
<category_code>P...</category_code>
<category_name>...</category_name>
<subcategory diffgr:id="subcategory1" msdata:rowOrder="0">
<category_code>...</category_code>
<subcategory_code>...</subcategory_code>
<subcategory_name>...</subcategory_name>
</subcategory>
....
How can I obtain all categories and subcategories data?
I am trying something like:
reader.xpath('//DataSet/diffgr:diffgram/alldata').each do |node|
But this gives me:
undefined method `xpath' for #<Nokogiri::XML::Reader:0x000001021d1750>
Nokogiri's Reader parser does not support XPath. Try using Nokogiri's in-memory Document parser instead.
On another note, to query namespaced elements with XPath, you need to provide a namespace mapping, like this:
doc = Nokogiri::XML(my_document_string_or_io)
namespaces = {
'default' => 'http://www.atcomp.cz/webservices',
'diffgr' => 'urn:schemas-microsoft-com:xml-diffgram-v1'
}
doc.xpath('//default:DataSet/diffgr:diffgram/alldata', namespaces).each do |node|
# ...
end
Or you can remove the namespaces:
doc.remove_namespaces!
doc.xpath('//DataSet/diffgram/alldata').each { |node| }

Use of text() function when using xPath in dom4j

I have inherited an application that parses xml using dom4j and xPath:
The xml being parsed is similar to the following:
<cache>
<content>
<transaction>
<page>
<widget name="PAGE_ID">WRK_REGISTRATION</widget>
<widget name="TRANS_DETAIL_ID">77145</widget>
<widget name="GRD_ERRORS" />
</page>
<page>
<widget name="PAGE_ID">WRK_REGISTRATION</widget>
<widget name="TRANS_DETAIL_ID">77147</widget>
<widget name="GRD_ERRORS" />
</page>
<page>
<widget name="PAGE_ID">WRK_PROCESSING</widget>
<widget name="TRANS_DETAIL_ID">77152</widget>
<widget name="GRD_ERRORS" />
</page>
</transaction>
</content>
</cache>
Individual Nodes are being searched using the following:
String xPathToGridErrorNode = "//cache/content/transaction/page/widget[@name='PAGE_ID'][text()='WRK_DNA_REGISTRATION']/../widget[@name='TRANS_DETAIL_ID'][text()='77147']/../widget[@name='GRD_ERRORS_TEMP']";
org.dom4j.Element root = null;
SAXReader reader = new SAXReader();
Document document = reader.read(new BufferedInputStream(new ByteArrayInputStream(xmlToParse.getBytes())));
root = document.getRootElement();
Node gridNode = root.selectSingleNode(xPathToGridErrorNode);
where xmlToParse is a String of xml similar to the excerpt provided above.
The code is trying to obtain the GRD_ERRORS node for the page with the PAGE_ID and TRANS_DETAIL_ID given in the XPath.
I am seeing an intermittent (~1-2%) failure (returned node is null) of this selectSingleNode request even though the requested node is in the xml being searched.
I know there are some gotchas associated with using text()= in xPath and was wondering if there was a better way to format the xPath string for this type of search.
From your snippets, there is a mismatch between GRD_ERRORS and GRD_ERRORS_TEMP, and between WRK_REGISTRATION and WRK_DNA_REGISTRATION.
Ignoring that, I would suggest rewriting
//cache/content/transaction/page
/widget[@name='PAGE_ID'][text()='WRK_DNA_REGISTRATION']
/../widget[@name='TRANS_DETAIL_ID'][text()='77147']
/../widget[@name='GRD_ERRORS_TEMP']
as
//cache/content/transaction/page
[widget[@name='PAGE_ID'][text()='WRK_REGISTRATION']]
[widget[@name='TRANS_DETAIL_ID'][text()='77147']]
/widget[@name='GRD_ERRORS']
Just because it makes the code, in my eyes, easier to read, and expresses what you seem to mean more clearly: “the page element that has children with these conditions, and then take the widget with this @name.” Or, if that is closer to how you think about it,
//cache/content/transaction/page/widget[@name='GRD_ERRORS']
[preceding-sibling::widget[@name='PAGE_ID'][text()='WRK_REGISTRATION']]
[preceding-sibling::widget[@name='TRANS_DETAIL_ID'][text()='77147']]

Nokogiri: controlling element prefix for new child elements

I have an xml document like this:
<?xml version="1.0" encoding="UTF-8"?>
<foo:root xmlns:foo="http://abc.com#" xmlns:bar="http://def.com" xmlns:ex="http://ex.com">
<foo:element foo:attribute="attribute_value">
<bar:otherElement foo:otherAttribute="otherAttributeValue"/>
</foo:element>
</foo:root>
I need to add child elements to the element so that it looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<foo:root xmlns:foo="http://abc.com#" xmlns:bar="http://def.com" xmlns:ex="http://ex.com">
<foo:element foo:attribute="attribute_value">
<bar:otherElement foo:otherAttribute="otherAttributeValue"/>
<bar:otherElement foo:otherAttribute="newAttributeValue"/>
<ex:yetAnotherElement foo:otherAttribute="yetANewAttributeValue"/>
</foo:element>
</foo:root>
I can add elements in the correct location using the following:
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::XML::Document.parse(File.open("myfile.xml"))
el = doc.at_xpath('//foo:element')
newEl = Nokogiri::XML::Node.new("otherElement", doc)
newEl["foo:otherAttribute"] = "newAttributeValue"
el.add_child(newEl)
newEl = Nokogiri::XML::Node.new("yetAnotherElement", doc)
newEl["foo:otherAttribute"] = "yetANewAttributeValue"
el.add_child(newEl)
However the prefix of the new elements is always "foo":
<foo:root xmlns:foo="http://abc.com#" xmlns:bar="http://def.com" xmlns:ex="http://ex.com">
<foo:element foo:attribute="attribute_value">
<bar:otherElement foo:otherAttribute="otherAttributeValue" />
<foo:otherElement foo:otherAttribute="newAttributeValue" />
<foo:yetAnotherElement foo:otherAttribute="yetANewAttributeValue" />
</foo:element>
</foo:root>
How can I set the prefix on the element name for these new child elements? Thanks,
Eoghan
(removed bit about defining namespace, orthogonal to question and fixed in edit)
Just add a few lines to your code, and you get the desired result:
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::XML::Document.parse(File.open("myfile.xml"))
el = doc.at_xpath('//foo:element')
newEl = Nokogiri::XML::Node.new("otherElement", doc)
newEl["foo:otherAttribute"] = "newAttributeValue"
# ADDITIONAL CODE
newEl.namespace = doc.root.namespace_definitions.find{|ns| ns.prefix=="bar"}
#
el.add_child(newEl)
newEl = Nokogiri::XML::Node.new("yetAnotherElement", doc)
newEl["foo:otherAttribute"] = "yetANewAttributeValue"
# ADDITIONAL CODE
newEl.namespace = doc.root.namespace_definitions.find{|ns| ns.prefix == "ex"}
#
el.add_child(newEl)
and the result:
<?xml version="1.0" encoding="UTF-8"?>
<foo:root xmlns:abc="http://abc.com#" xmlns:def="http://def.com" xmlns:ex="http://ex.com" xmlns:foo="http://foo.com" xmlns:bar="http://bar.com">
<foo:element foo:attribute="attribute_value">
<bar:otherElement foo:otherAttribute="otherAttributeValue"/>
<bar:otherElement foo:otherAttribute="newAttributeValue"/>
<ex:yetAnotherElement foo:otherAttribute="yetANewAttributeValue"/>
</foo:element>
</foo:root>
The namespace 'foo' is not defined.
See this for more details:
Nokogiri/Xpath namespace query
