When does the XPath property have to be set on an XML DOM object? - xpath

For example:
Set objXML = CreateObject("Microsoft.XMLDOM")
objXML.async = False
objXML.validateOnParse = False
objXML.resolveExternals = False
objXML.load("http://www.w3schools.com/dom/books.xml")
'objXML.setProperty "SelectionLanguage", "XPath"
For Each x In objXML.selectNodes("//book[@category='cooking' and @category='children']")
WScript.Echo x.text
Next
For Each y In objXML.selectNodes("//book[position()<3]")
WScript.Echo y.text
Next
When objXML.setProperty "SelectionLanguage", "XPath" is commented out, the first XPath expression (the x loop) is evaluated without error, but the second XPath expression (the y loop) raises an error:
msxml3.dll (14, 1) : Unknown method.
//book[-->position()<--<3]
If I uncomment objXML.setProperty "SelectionLanguage", "XPath", both expressions work.
My question is: when does the XPath property have to be set explicitly, and what kinds of expressions can be executed without setting it?

The default selection language is not XPath in older versions of MSXML.
You've created the DOMDocument instance using an old, "version-independent" ProgID. Microsoft.XMLDOM corresponds to MSXML 3.0 (if it is installed), the last version of MSXML that supported version-independent ProgIDs.
You can determine the default selection language like this:
WScript.Echo objXML.getProperty("SelectionLanguage")
It should return XSLPattern, a selection language that does not support functions like position().
XPath is the default selection language for MSXML 4.0 and later, so you have two choices for using XPath properly:
Use an older version and set the selection language to XPath explicitly.
Use a newer (less old?) version without specifying any selection language.
From an ancient article, one that smells like my teenage years, describing the difference between XSL Patterns and XPath:
MSXML 2.0 provides support for XSL Patterns, the precursor to XPath
1.0. The notion of an XML addressing language was introduced into the original W3C XSL Working Drafts
(http://www.w3.org/TR/1998/WD-xsl-19981216.html) and called XSL
Patterns. MSXML 2.0 implements the XSL Patterns language as described
in the original XSL specification with a few minor exceptions.
So I think you ran into one of those minor (!) exceptions.

Related

Microsoft Access isnumeric system language issue

We are having an issue with Microsoft Access and a function called "IsNumeric". When running our software (it uses Access) on an English Windows, 12.2 gives IsNumeric = False, but on a Swedish Windows, 12.2 gives IsNumeric = True.
I am by no means a developer; I'm just trying to find out why this problem occurs, since one of our developers is running into it right now.
First of all: IsNumeric() is locale aware, keep that in mind when developing for an international market. E.g. Debug.Print IsNumeric("$12.2") returns False for me, whereas Debug.Print IsNumeric("€12.2") returns True.
That said, I can see two possible causes: 1) the regional settings on the English Windows have been edited, or 2) you're using a self-written IsNumeric method.
When you create a public method with the same name as an intrinsic method, your method takes precedence over VB's method. If you now want to use VB's method instead of your own, you need to prefix that with its namespace, which in the case of IsNumeric is VBA: VBA.IsNumeric.

Find same-name siblings (SNS) using SQL2, SQL, XPath or QueryBuilder query in CQ5/AEM

Is it possible to find same-name siblings (SNS) using SQL2, SQL, XPath or QueryBuilder in Adobe CQ5/Adobe Experience Manager? We are trying to prepare the instance for an upgrade to AEM 6.x and, as is already known, Jackrabbit Oak has disabled support for SNS, which makes the upgrade impossible without solving this problem. The repository could be traversed recursively, but this is too slow, so I'm looking for better options using queries. SNS are defined as follows:
/a/b/c
/a/b/c[2]
/a/b/c[3]
/a/b[2]/c[2]
/a/b/c[3]
I would prefer SQL2, but any other option is also possible.
Note that no functions or XSLT are possible, because we are not talking about XML documents but about a Java Content Repository (JCR).
In XQuery 1.0 or XPath 3.0 the same-name siblings of the context node can be found as
let $n := node-name(.)
return (preceding-sibling::* | following-sibling::*)[node-name(.) = $n]
or as
let $n := node-name(.)
return ../*[node-name(.) = $n] except .
(the "except ." can be omitted if you want to include the original element in the result).
I don't think that a pure XPath 1.0 solution is possible, because of the absence of range variables, but with XPath 1.0 within XSLT 1.0 you can do
(preceding-sibling::* | following-sibling::*)
[local-name(.) = local-name(current()) and
namespace-uri(.) = namespace-uri(current())]
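Outside XSLT, the same comparison can be done in a host language. The following is only a minimal sketch in Python against a plain XML tree (not a JCR repository); the sample document and the helper name same_name_sibling_groups are made up for illustration. ElementTree stores each tag as {namespace-uri}local-name, so comparing tags compares both parts, as the expression above does.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def same_name_sibling_groups(root):
    """Return (parent tag, child tag, count) for every group of same-name siblings.

    Hypothetical helper: it visits every element once and counts how often
    each child tag occurs under it.
    """
    groups = []
    for parent in root.iter():
        counts = Counter(child.tag for child in parent)
        for tag, n in counts.items():
            if n > 1:
                groups.append((parent.tag, tag, n))
    return groups


# Made-up sample: two <b> siblings under <a>, three <c> siblings under the first <b>.
doc = ET.fromstring("<a><b><c/><c/><c/></b><b><c/></b><d/></a>")
groups = same_name_sibling_groups(doc)
for parent_tag, child_tag, n in groups:
    print(parent_tag, child_tag, n)
```

This still walks the whole tree, so it illustrates the logic only; it does not address the performance concern raised in the question.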

XPath tokenize() method not recognized by msxml3.dll

I'm attempting to use the tokenize method in a SelectNodes(" ") call, to filter some things out.
I have something along the lines of:
<nodes>
<node colors="RED,BLUE,YELLOW"/>
</nodes>
And my xpath is as such:
nodes/node[not(empty(tokenize("GREEN,YELLOW,PURPLE", ",") intersect tokenize(@colors, ",")))]
Simply put, I've got two comma-delimited lists, one as an attribute and one as a "filter" for the attributes. I want to select all nodes where @colors contains, somewhere, one of the words inside "GREEN,YELLOW,PURPLE".
I thought I had the solution for it with that XPath, but it seems either:
A: I did something wrong, or
B: The version of XML DOM document I am using does not support tokenize()
The XPath above, in a SelectNodes( ) call, throws an error message saying "msxml3.dll: Unknown method." and then pointing to the tokenize() method.
I tried doing setProperty("SelectionLanguage", "XPath"), but that did not seem to solve the issue either.
Is there any way for me to perform an equivalent XPath selection, without resorting to a bunch of and contains(@colors, "GREEN") and contains(@colors, "YELLOW")...?
As JLRishe says, msxml does not support XPath 2.0.
Depending on the environment that you are in there is probably third-party software you can use that supports either XPath 2.0 or XQuery 1.0 (which is a superset of XPath 2.0).
Microsoft's XML software is getting very dated and there has been little new development for 10 years now. It's time to consider alternatives.
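If switching XPath engines is not an option, one alternative is to select the candidate nodes with a simple path and do the tokenizing in the host language. A minimal sketch in Python, using the standard library rather than MSXML; the helper name nodes_with_any_color and the second node in the sample are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET


def nodes_with_any_color(root, wanted_csv):
    """Return the <node> children whose colors attribute shares a value with wanted_csv.

    Hypothetical helper: splitting on "," plays the role of tokenize(), and
    the set intersection plays the role of XPath 2.0's intersect.
    """
    wanted = set(wanted_csv.split(","))
    return [
        node
        for node in root.findall("node")
        if wanted & set(node.get("colors", "").split(","))
    ]


# The first node is from the question; the second is added to show a non-match.
doc = ET.fromstring(
    '<nodes>'
    '<node colors="RED,BLUE,YELLOW"/>'
    '<node colors="BLACK,WHITE"/>'
    '</nodes>'
)
matches = nodes_with_any_color(doc, "GREEN,YELLOW,PURPLE")
for node in matches:
    print(node.get("colors"))
```

Comparing whole tokens also avoids a pitfall of the contains() approach, which would match "YELLOWISH" when looking for "YELLOW".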

Select default namespace in XPath with HtmlUnit

I want to parse a Feedburner feed with HtmlUnit.
The feed is this one: http://feeds.feedburner.com/alcoanewsreleases
From this feed I want to read all item nodes, so normally a //item XPath should do the trick. Unfortunately that does not work in this case.
groovy code snippet:
def page = webClient.getPage("http://feeds.feedburner.com/alcoanewsreleases")
def elements = page.getByXPath("//item")
Sample of the XML feed:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss1full.xsl"?>
<?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://purl.org/rss/1.0/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
[...SNIP...]
<item rdf:about="http://www.alcoa.com/global/en/news/news_detail.asp?newsYear=2011&amp;pageID=20110518006002en">
<title>Chris L. Ayers Named President, Alcoa Global Primary Products</title>
<dc:date>2011-05-18</dc:date>
<link>http://feedproxy.google.com/~r/alcoanewsreleases/~3/PawvdhpJrkc/news_detail.asp</link>
<description>NEW YORK--(BUSINESS WIRE)--Alcoa (NYSE:AA) announced today that Chris L. Ayers has been named President of Alcoa’s Global Primary Products (GPP) business, effective May 18, 2011. Ayers, previously Chief Operating Officer of GPP, succeeds John Thuestad, who will be handling special projects for the Company. Ayers joined Alcoa in February 2010 as Chief Operating Officer of Alcoa Cast, Forged and Extruded Products, a new position. He was elected a Vice President of Alcoa in April 2010 and Executive</description>
<feedburner:origLink xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://www.alcoa.com/global/en/news/news_detail.asp?newsYear=2010&amp;pageID=20100104006194en</feedburner:origLink>
</item>
[...SNIP...]
</rdf:RDF>
I suspect this to be an issue with the namespaces because this document has 4 namespaces. The namespaces are
(this is the default) xmlns="http://purl.org/rss/1.0/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0"
I have tried this with Nokogiri (another XML parser that I use for Ruby scripts).
With Nokogiri I could just use the XPath //xmlns:item, which works and returns all nodes from the feed.
I have tried the same XPath with HtmlUnit but it does not work.
So I think I can phrase my question as:
How can I select a node from the default namespace with HtmlUnit?
Any ideas?
From this feed I want to read all item
nodes, so normally a //item XPath
should do the trick. Unfortunately
that does not work in this case.
In XPath, that means "select all elements whose local name is item that are in no namespace". In RSS, the item elements must be in a namespace. So the above should never work with a conforming XML parser and XPath engine.
What's confusing is that in XML, <item> means "an element named item that is in the default namespace, i.e. whatever default namespace is in scope at this place in the document;" whereas in XPath, "item" means an element in no namespace. (Or, you could say, it means an element in the default namespace, but unless you have a way to tell XPath what the default namespace is, the default namespace is no namespace. Usually (always?) in XPath 1.0 there is no way to declare the default namespace for XPath expressions.)
The other confusing thing to beginners is that the namespace prefix mappings in the source XML document are not considered significant by the XPath processor. When the XML document is parsed, a data structure is built that remembers the name and namespace of every element (and other nodes). The namespace prefixes used, including the empty prefix of the default namespace, are considered mere syntactic convenience. More on this below...
With Nokogiri I could just use the
XPath //xmlns:item which works and
returns all nodes from the feed.
Whatever that is, it's not XPath. Maybe it's a Nokogiri extension to it (a very convenient one, but its syntax is really counter-intuitive).
So I think I can phrase my question
as: How can I select a node from the
default namespace with HtmlUnit?
Let's phrase it as: How can I select the RSS item elements with HtmlUnit? I phrase it that way because the RSS spec (actually in general any conforming XML vocabulary spec) does not require that its elements will be in the default namespace. That happens to be true in the sample you received, but the service provider could change that tomorrow and still be perfectly conformant to RSS. Tomorrow, the service provider could use the "rss" namespace prefix for that namespace; or any other arbitrary prefix. What RSS does specify is what namespace its elements will be in: the namespace whose URI is http://purl.org/rss/1.0/.
It's kind of like asking, "How do I write a function (in Javascript, C, Java, etc.) that can tell me the value of the variable a?" Usually a function has no idea what variable name was used for what in the caller. All it knows are the values of its arguments. If you call sqrt(4), you'll get the same answer as with a = 4; sqrt(a) or rumpelstiltzkin = 4; sqrt(rumpelstiltzkin). Clearly, the name of the variable argument has no direct effect on the result of the function call. It just needs to be the name of a variable that holds the right value. If a compiler complained because you wrote b = 4; return sqrt(b) instead of using a, you'd think that compiler was nuts. It's not supposed to care about variable names as long as you use valid identifiers.
In the same way, when processing RSS, we're not supposed to care about what namespace prefix is used, as long as it's a prefix that identifies the right namespace. It could be no prefix (which identifies the default namespace).
In XPath 2.0, you can wildcard the namespace. This is very handy if you know you're not going to need namespaces for disambiguation. In that case you can select //*:item. However, I don't think HTMLUnit supports XPath 2.0. Also in XPath 2.0 environments like XSLT 2.0, you can specify a default namespace for XPath expressions, but that won't help you in HTMLUnit.
So you have a couple of choices:
Use an XPath expression that ignores namespaces, such as //*[local-name() = 'item'].
or
The robust way: Register a namespace prefix for http://purl.org/rss/1.0/ and use it in your XPath expression: //rss:item. The question then becomes, how do you register a namespace prefix in HTMLUnit and pass it to the XPath processor? I took a quick look in the docs and didn't find any facility for doing that.
Caveat: I should add that the above is in regard to conforming XPath processors. I have no idea what XPath processor HTMLUnit uses. There are some XPath processors out there that ignore the specs and make the world more confusing for everybody.
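To make the two choices concrete, here is a minimal sketch in Python's standard library rather than HtmlUnit (whose API for this I could not confirm). The feed below is a cut-down, made-up stand-in for the real one, and the prefix rss is an arbitrary choice made on the XPath side; only the namespace URI has to match the document.

```python
import xml.etree.ElementTree as ET

# Made-up, shortened stand-in for the feed: RSS 1.0 elements in the default namespace.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://example.com/1"><title>First</title></item>
  <item rdf:about="http://example.com/2"><title>Second</title></item>
</rdf:RDF>"""

root = ET.fromstring(FEED)

# An unprefixed name means "item in no namespace", so this matches nothing.
no_namespace = root.findall(".//item")

# Choice 1: ignore namespaces and compare local names only.
by_local_name = [e for e in root.iter() if e.tag.rsplit("}", 1)[-1] == "item"]

# Choice 2 (the robust way): bind a prefix of our own to the RSS namespace URI.
NS = {"rss": "http://purl.org/rss/1.0/"}
with_prefix = root.findall(".//rss:item", NS)
titles = [item.find("rss:title", NS).text for item in with_prefix]

print(len(no_namespace), len(by_local_name), titles)
```

The second choice keeps working if the feed later switches from a default namespace to an explicit prefix, because the lookup depends only on the URI.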
I saw here that someone used the following syntax for elements in the default namespace in HTMLUnit:
//:item
But I wouldn't recommend that, for three reasons:
It's not valid XPath, so you can't expect it to work with other programs.
It will only work on RSS feeds that declare the RSS namespace to be the default namespace. RSS feeds that use a namespace prefix will cause the above to fail.
It will hold you back from learning how XML namespaces really work, and it will help preserve the status quo of tools that don't adequately support namespaces.
HTMLUnit is primarily designed for HTML, so incomplete handling of XML is understandable. But claiming to support XPath and then not providing ways to declare namespace prefixes is a bug. HTMLUnit uses an XPath package that seems to be part of Xalan-J. That package has ways to provide namespace mappings to XPath, but I don't know if HTMLUnit exposes that functionality.
This sounds familiar enough that I'm quite sure I've used namespaces and XPath successfully with HtmlUnit in the past, but of course I can't find the code. I suspect it must have been with HTML pages only: the page reference in your example is an XmlPage which has a number of methods specific to namespaces, all of which throw a "not implemented yet" exception when used. :-(
The current version (2.8) of HtmlUnit is nearly a year old, so it may be that some work has been done in the meantime to support XML namespaces. The "HtmlUnit Users" mailing list would be the place to find out.
In the meantime, as always there is a workaround:
final XmlPage page = webClient.getPage("http://feeds.feedburner.com/alcoanewsreleases");
// no good: does not find the namespaced item elements
List<?> elements = page.getByXPath("//item");
System.out.println(elements.size());
// ugly, but it works: walk the children of the root element and compare local names
DomElement de = (DomElement) page.getFirstByXPath("//rdf:RDF");
List<DomNode> items = new ArrayList<DomNode>();
for (DomNode dn : de.getChildNodes()) {
    String name = dn.getLocalName();
    // "item".equals(name) is also false when name is null
    if ("item".equals(name)) {
        items.add(dn);
    }
}
System.out.println("found " + items.size());
Oh boy Java is painful after working in Scala... ;-)

Explain xpath and xquery in simple terms

I am new to programming. I know what XML is. Can anyone please explain in simple terms what XPath and XQuery do? Where are they used?
XPath is a way of locating specific elements in an XML tree.
For instance, given the following structure:
<myfarm>
<animal type="dog">
<name>Fido</name>
<color>Black</color>
</animal>
<animal type="cat">
<name>Mitsy</name>
<color>Orange</color>
</animal>
</myfarm>
XPath allows you to traverse the structure, such as:
/myfarm/animal[@type="dog"]/name/text()
which would give you "Fido"
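For a sense of how such an expression is used from a program, here is a small sketch with Python's standard library. ElementTree supports only a subset of XPath and paths are written relative to the root element, so the expression is animal[@type='dog']/name rather than the full form shown above; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

FARM = """<myfarm>
  <animal type="dog">
    <name>Fido</name>
    <color>Black</color>
  </animal>
  <animal type="cat">
    <name>Mitsy</name>
    <color>Orange</color>
  </animal>
</myfarm>"""

root = ET.fromstring(FARM)  # root is the <myfarm> element

# Select the <name> child of the <animal> whose type attribute is "dog"
# and read its text content.
dog_name = root.findtext("animal[@type='dog']/name")

# Collect the name of every animal, in document order.
all_names = [name.text for name in root.findall("animal/name")]

print(dog_name)   # Fido
print(all_names)  # ['Fido', 'Mitsy']
```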
XQuery is an XML query language that makes use of XPath to query XML structures. However it also allows for functions to be defined and called, as well as complex querying of data structures using FLWOR expressions. FLWOR allows for join functionality between data sets defined in XML.
FLWOR article from wikipedia
Sample XQuery (using some XPath) is:
declare function local:toggle-boolean($b as xs:string)
as xs:string
{
    if ($b = "Yes") then "true"
    else if ($b = "No") then "false"
    else if ($b = "true") then "Yes"
    else if ($b = "false") then "No"
    else "[ERROR] # local:toggle-boolean"
};
<ResultXML>
    <ChangeTrue>{ local:toggle-boolean(doc("file.xml")/article[@id="1"]/text()) }</ChangeTrue>
    <ChangeNo>{ local:toggle-boolean(doc("file.xml")/article[@id="2"]/text()) }</ChangeNo>
</ResultXML>
XPath is a simple query language used to search an XML DOM. I think it can be compared to SQL SELECT statements for databases. Many programs that work with XML can evaluate XPath, and it is widely used. I recommend that you learn it.
XQuery is much more powerful and more complicated. It also offers many options for transforming the result, it offers loops, and so on, but it is still a query language. It is also used as the query language for XML databases. I think this language has more specialized uses and is probably not necessary to learn at the beginning; it is enough to know that it exists and what it can do.
This is a simple explanation; I hope it is sufficient and understandable.
