Nodes with case-insensitive content using XOM and XPath

I want to query nodes from a XOM document that contain a certain value, but case-insensitively. Something like this:
doc.query("/root/book[contains(., 'case-insensitive-string')]")
But contains() is case-sensitive.
I tried to use regexes, but they are only available in XPath 2.0, and XOM does not seem to support it.
I tried
contains(translate(."ABCEDF...","abcdef..."),"case-insentive-string")]'
which failed too.
I tried to match subnodes and read the parent's attributes using getParent, but there is no method to read a parent's attributes.
Any suggestions?

If you are using XOM, you can use Saxon to run XPath or XQuery against it. That gives you access to the much larger function library of XPath 2.0, which includes the functions lower-case() and upper-case(), and also the ability (though in a somewhat product-specific way) to choose your own collations for functions such as contains(). This means you can, for example, do matching that ignores accents as well as case.

Regarding your second attempt:
I tried contains(translate(."ABCEDF...","abcdef..."),"case-insentive-string")]' failed too.
Apart from the missing comma after the dot, the proper way to write this is:
/root/book[contains(translate(., $vUpper, $vLower),
translate($vCaseInsentiveString, $vUpper, $vLower)
)
]
where $vUpper and $vLower are defined as (should be substituted by) the strings:
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
and
'abcdefghijklmnopqrstuvwxyz'
and $vCaseInsentiveString is defined as (should be substituted by) the specific case-insensitive string.
For example, given the following XML document:
<authors>
<author>
<name>Victor Hugo &amp; Co.</name>
<nationality>French</nationality>
</author>
<author period="classical" category="children">
<name>J.K.Rollings</name>
<nationality>British</nationality>
</author>
<author period="classical">
<name>Sophocles</name>
<nationality>Greek</nationality>
</author>
<author>
<name>Leo Tolstoy</name>
<nationality>Russian</nationality>
</author>
<author>
<name>Alexander Pushkin</name>
<nationality>Russian</nationality>
</author>
<author period="classical">
<name>Plato</name>
<nationality>Greek</nationality>
</author>
</authors>
the following XPath expression (with the variables replaced by the corresponding strings):
/*/author/name
[contains(translate(., $vUpper, $vLower),
translate('lEo', $vUpper, $vLower)
)
]
selects this element:
<name>Leo Tolstoy</name>
Explanation: Both arguments of the contains() function are converted to lower-case, and then the comparison is performed.
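A minimal sketch of the same technique in Java, under stated assumptions: it uses the JDK's built-in javax.xml.xpath (an XPath 1.0 engine) as a stand-in for XOM's query(), and binds $vUpper, $vLower and a hypothetical $vNeedle through an XPathVariableResolver instead of substituting the strings by hand. The class and method names are illustrative only. Like the expression above, translate() here folds only the ASCII letters A-Z, so accented letters are not handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: case-insensitive contains() via XPath 1.0 translate().
final class CaseInsensitiveContains {
    static final String UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static final String LOWER = "abcdefghijklmnopqrstuvwxyz";

    // Returns the text of every /*/author/name whose content contains needle,
    // ignoring the case of ASCII letters only.
    static List<String> findNames(String xml, String needle) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the variables instead of pasting strings into the expression.
            xpath.setXPathVariableResolver(name -> {
                switch (name.getLocalPart()) {
                    case "vUpper": return UPPER;
                    case "vLower": return LOWER;
                    case "vNeedle": return needle;
                    default: return null;
                }
            });
            String expr = "/*/author/name[contains(translate(., $vUpper, $vLower),"
                    + " translate($vNeedle, $vUpper, $vLower))]";
            NodeList hits = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                names.add(hits.item(i).getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate the query", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<authors>"
                + "<author><name>Leo Tolstoy</name></author>"
                + "<author><name>Plato</name></author>"
                + "</authors>";
        System.out.println(findNames(xml, "lEo")); // prints [Leo Tolstoy]
    }
}
```

Binding variables rather than concatenating the search string into the expression also avoids breaking the query when the user's input contains a quote character.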

Related

Difference between XPath & XQuery when selecting attribute value

There seems to be a difference in how XPath and XQuery select attributes.
Here is a toy example stolen from W3Schools:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
I want to retrieve the values of the lang attributes.
So, naively, I do:
//title/@lang, which works perfectly... on an XPath evaluator, but not on an XQuery evaluator.
What I need to know: How should I write my XPath expression to work on an XQuery evaluator?
What I want to know: What is going on?!
Here's the long version:
I'm on a legacy platform without these capabilities, so I send the XML and my query expression to an external service, which I believe is a Saxon-based XQuery evaluator. My syntax works as expected on CodeBeautify's XPath Tester.
I've also verified this difference on xpathtester.com: it works as expected in XPath mode but not in XQuery mode.
xpathtester.com returns the following error message: ERROR - Cannot create an attribute node (lang) whose parent is a document node
The expression //title/@lang is valid in both XPath and XQuery, and returns a sequence of two attribute nodes.
The difference you are seeing is in how different XPath and XQuery clients handle a result consisting of two attribute nodes.
If the tool tries to serialize the result as XML, it will fail, because XML serialization tries to construct a document node and attach the attributes to it.
So you need to look at what options your XPath or XQuery tool provides for displaying the results.
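To illustrate that point, here is a hedged Java sketch that uses the JDK's javax.xml.xpath as the client (an assumption; it is not the Saxon-based service from the question). It asks for a node-set and reads each attribute's value directly, so no serialization step is involved. The class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AttributeValues {
    // Evaluates //title/@lang and returns each attribute node's value as a string,
    // instead of trying to serialize the attribute nodes as an XML document.
    static List<String> langValues(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList attrs = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//title/@lang", doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                values.add(attrs.item(i).getNodeValue()); // an Attr node's value
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate //title/@lang", e);
        }
    }
}
```

For the bookstore sample this yields the two strings "en" and "en".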
After really understanding what @Martin Honnen and @Aaron were talking about, and some deep dives into tutorials, I think I've come up with a solution I'm happy with:
string-join((for $l in //title/@lang return string($l)), ',')
https://xqueryfiddle.liberty-development.net/3Nzd8bR/2

XPath and/or syntax: is there a shorter way to write this XPath?

I'm filtering a big file that contains types of shoes for children, men and women.
Now I want to filter out certain types of women's shoes. The following XPath works, but the program I'm using has a limit on XPath length, so I'm wondering if there is a shorter / more efficient way to construct this XPath:
/Products/Product[contains(CategoryPath/ProductCategoryPath,'Halbschuhe') and contains(CategoryPath/ProductCategoryPath,'Damen') or contains(CategoryPath/ProductCategoryPath,'Sneaker') and contains(CategoryPath/ProductCategoryPath,'Damen') or contains(CategoryPath/ProductCategoryPath,'Ballerinas') and contains(CategoryPath/ProductCategoryPath,'Damen')]
Edit: Added requested file sample
<Products>
<!-- snip -->
<Product ProgramID="4875" ArticleNumber="GO1-f05-0001-12">
<CategoryPath>
<ProductCategoryID>34857489</ProductCategoryID>
<ProductCategoryPath>Damen > Sale > Schuhe > Sneaker > Sneaker Low</ProductCategoryPath>
<AffilinetProductCategoryPath>Kleidung &amp; Accessoires?</AffilinetProductCategoryPath>
</CategoryPath>
<Price>
<DisplayPrice>40.95 EUR</DisplayPrice>
<Price>40.95</Price>
</Price>
</Product>
<!-- snip -->
</Products>
If you had XPath 2.0 available, you should try the matches() function or even tokenize() as suggested by Ranon in his great answer.
With XPath 1.0, one way to shorten the expression could be this:
/Products/Product[
CategoryPath/ProductCategoryPath[
contains(., 'Damen')
and ( contains(., 'Halbschuhe')
or contains(., 'Sneaker')
or contains(., 'Ballerinas') )] ]
A convenient one-liner for easier copy-paste:
/Products/Product[CategoryPath/ProductCategoryPath[contains(.,'Damen') and (contains(.,'Halbschuhe') or contains(.,'Sneaker') or contains(.,'Ballerinas'))]]
I tried to preserve your expression exactly as it was; none of the changes should alter its behaviour.
There are some even shorter solutions that would have to make assumptions about the XML structure, etc., but those could break in some hidden way we can't see without the full context, so we're not going that way.
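One way to check that the shortened expression selects the same products is to run both through an XPath 1.0 engine. This is a sketch assuming the JDK's javax.xml.xpath, a small hypothetical sample (article numbers A1 to A4), and a single ProductCategoryPath per Product as in the posted sample; with several paths per product the two forms could differ, because contains() on a node-set only looks at its first node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ShoeFilter {
    // The original expression from the question.
    static final String LONG = "/Products/Product["
            + "contains(CategoryPath/ProductCategoryPath,'Halbschuhe') and contains(CategoryPath/ProductCategoryPath,'Damen')"
            + " or contains(CategoryPath/ProductCategoryPath,'Sneaker') and contains(CategoryPath/ProductCategoryPath,'Damen')"
            + " or contains(CategoryPath/ProductCategoryPath,'Ballerinas') and contains(CategoryPath/ProductCategoryPath,'Damen')]";
    // The shortened XPath 1.0 one-liner from the answer.
    static final String SHORT = "/Products/Product[CategoryPath/ProductCategoryPath["
            + "contains(.,'Damen') and (contains(.,'Halbschuhe') or contains(.,'Sneaker')"
            + " or contains(.,'Ballerinas'))]]";

    // Returns the ArticleNumber attribute of every Product the expression selects.
    static List<String> articleNumbers(String xml, String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList products = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < products.getLength(); i++) {
                ids.add(((Element) products.item(i)).getAttribute("ArticleNumber"));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate the expression", e);
        }
    }
}
```

Because "and" binds more tightly than "or" in XPath, the long form reads as three "(type and Damen)" alternatives, which is exactly what factoring out contains(., 'Damen') preserves.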
If your XPath engine supports XPath 2.0, it can be done in an even more convenient (and probably efficient) way:
//Product[
CategoryPath/ProductCategoryPath[
tokenize(., '\s') = ('Halbschuhe', 'Sneaker', 'Ballerinas') and contains(., 'Damen')
]
]
fn:tokenize($input, $pattern) splits a string on a regex (here whitespace; you could also use a single space). The = operator uses set-based (existential) semantics: if any of the strings on the left side equals any of the strings on the right side, it returns true.
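The JDK does not ship an XPath 2.0 engine, so the following is only a plain-Java analogue (not an XPath evaluation) of what that predicate computes: split on whitespace, then test whether any token equals any of the wanted names. The class and method names are hypothetical. Note that this token equality is stricter than the XPath 1.0 contains() version: a path ending in "Sneakers" contains "Sneaker" but has no token equal to it.

```java
import java.util.Arrays;
import java.util.Set;

// Plain-Java analogue (not an XPath engine) of the XPath 2.0 predicate
// tokenize(., '\s') = ('Halbschuhe', 'Sneaker', 'Ballerinas') and contains(., 'Damen')
final class TokenizeAnalogue {
    private static final Set<String> WANTED = Set.of("Halbschuhe", "Sneaker", "Ballerinas");

    static boolean matches(String categoryPath) {
        // Like tokenize(., '\s'): split the string at each whitespace character.
        String[] tokens = categoryPath.split("\\s");
        // Like the general comparison '=': true if ANY token equals ANY wanted value.
        boolean anyEqual = Arrays.stream(tokens).anyMatch(WANTED::contains);
        return anyEqual && categoryPath.contains("Damen");
    }
}
```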

Problem running xpath query with namespaces

I'm trying to use an XPath expression to select a node-set in an XML document that has several namespaces defined.
The XML looks something like this:
<?POSTEN SND="SE00317644000" REC="5566420989" MSGTYPE="EPIX"?>
<ns:Msg xmlns:ns="http://www.noventus.se/epix1/genericheader.xsd">
<GenericHeader>
<SubsysId>1</SubsysId>
<SubsysType>30003</SubsysType>
<SendDateTime>2009-08-13T14:28:15</SendDateTime>
</GenericHeader>
<m:OrderStatus xmlns:m="http://www.noventus.se/epix1/orderstatus.xsd">
<Header>
<OrderSystemId>Soda SE</OrderSystemId>
<OrderNo>20090811</OrderNo>
<Status>0</Status>
</Header>
<Lines>...
I want to select only "Msg" nodes that have an "OrderStatus" child, so I want to use the following XPath expression: /Msg[count('OrderStatus') > 0]. But this won't work, since I get an error message saying: "Namespace Manager or XsltContext needed. This query has a prefix, variable, or user-defined function".
So I think I want to use an expression that looks something like this: /*[local-name()='Msg'][count('OrderStatus') > 0], but that doesn't seem to work either. Any ideas?
Br,
Andreas
You wrote:
I want to use the following XPath expression: /Msg[count('OrderStatus') > 0], but this won't work since I get an error message saying: "Namespace Manager or XsltContext needed."
This is a FAQ.
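In XPath, an unprefixed name is always considered to be in "no namespace".
However, the elements you want to select are in fact in namespaces: Msg is in the "http://www.noventus.se/epix1/genericheader.xsd" namespace and OrderStatus is in the "http://www.noventus.se/epix1/orderstatus.xsd" namespace.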
There are two possible ways to write your XPath expression:
1. Use the facilities of the hosting language to associate prefixes with all the namespaces that names in the expression belong to. You haven't indicated the hosting language in this case, so I can't help you with the specifics. A C# example can be found here.
If you have associated the prefix "xxx" with the namespace "http://www.noventus.se/epix1/genericheader.xsd" and the prefix "yyy" with the namespace "http://www.noventus.se/epix1/orderstatus.xsd", then your expression can be written as:
/xxx:Msg[yyy:OrderStatus]
2. If you don't want to use any prefixes at all, an XPath expression can still be constructed, though it will not be very readable:
/*[local-name() = 'Msg' and *[local-name() = 'OrderStatus']]
Finally, do note:
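To test whether an element x has a child y, it isn't necessary to test for a positive count(y). Just use: x[y]. (Also, count('OrderStatus') passes a string literal rather than a node-set, so it would not count OrderStatus elements at all.)
XPath positions are 1-based. This means that NodeSetExpression[0] never selects a node. You want: NodeSetExpression[1]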
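Since the hosting language isn't stated, here is a hedged sketch of option 1 in Java, assuming the JDK's javax.xml.xpath: a NamespaceContext maps the answer's "xxx" and "yyy" prefixes to the two namespace URIs. The class and method names are hypothetical. The sketch also shows that the prefix-free local-name() form (option 2) selects the same Msg, while the unprefixed /Msg[OrderStatus] selects nothing, because unprefixed names mean "no namespace".

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespacedQuery {
    static final String HEADER_NS = "http://www.noventus.se/epix1/genericheader.xsd";
    static final String ORDER_NS = "http://www.noventus.se/epix1/orderstatus.xsd";

    // Maps the expression's prefixes to namespace URIs; they need not match
    // the prefixes used in the document itself ("ns" and "m").
    private static final NamespaceContext PREFIXES = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) throw new IllegalArgumentException("null prefix");
            switch (prefix) {
                case "xxx": return HEADER_NS;
                case "yyy": return ORDER_NS;
                default: return XMLConstants.NULL_NS_URI;
            }
        }
        @Override
        public String getPrefix(String namespaceURI) {
            return null; // reverse lookup is not needed for evaluation in this sketch
        }
        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return Collections.emptyIterator();
        }
    };

    // Returns how many nodes the expression selects in the given document.
    static int count(String xml, String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // keep namespace URIs on the DOM nodes
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(PREFIXES);
            return ((NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET)).getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expr, e);
        }
    }
}
```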

Define keyref selector based on element type in XPath

Let's say I have an XML file that will look like this:
<a>
<b d="value1"/>
<c d="value2"/>
</a>
In the XSD file that defines the structure of this XML file, I defined the elements named 'b' and 'c' to be of the same type (and that type requires attribute 'd').
Let's say that I want to make a key reference to all elements of the type that both 'b' and 'c' have. Is there any way in XPath to do this?
At the definition of the type of 'a' I would expect something like this:
<xs:keyref name="myReferenceName" refer="keyToReferTo">
<xs:selector xpath="[@type='typenameof elements b and c?']"/>
<xs:field xpath="@d"/>
</xs:keyref>
Is something like this possible, or is XPath, even in XSD, schema-unaware?
XPath 1.0 is certainly not aware of any schemas, and version 1.0 of the W3C XML Schema specification uses only a subset of XPath 1.0.
I think there is work going on to define a new version of the W3C XML Schema language that uses XPath 2.0, but I have no idea about its details, or whether it would then allow a selector to pick elements based on schema types.
The XPath 2.0 test would be element(*, NameOfTypeGoesHere), I think; see http://www.w3.org/TR/xpath20/#id-element-test

XPath concat multiple nodes

I'm not very familiar with XPath, but I was working with XPath expressions and storing them in a database. Actually, it's just the BAM tool for BizTalk.
Anyway, I have an XML document that could look like this:
<File>
<Element1>element1</Element1>
<Element2>element2</Element2>
<Element3>
<SubElement>sub1</SubElement>
<SubElement>sub2</SubElement>
<SubElement>sub3</SubElement>
</Element3>
</File>
I was wondering if there is a way to use an XPath expression to get all the SubElements concatenated. At the moment, I am using:
/*[local-name()='File']/*[local-name()='Element3']/*[local-name()='SubElement']
This works if there is only one SubElement. But apparently my XML sometimes has more of them, so it gives NULL. I could just use
/*[local-name()='File']/*[local-name()='Element3']/*[local-name()='SubElement'][0]
but I need all the nodes. Is there a way to do this?
Thanks a lot!
Edit: I changed the XML. I was wrong, it's different; it should look like this:
<item>
<element1>el1</element1>
<element2>el2</element2>
<element3>el3</element3>
<element4>
<subEl1>subel1a</subEl1>
<subEl2>subel2a</subEl2>
</element4>
<element4>
<subEl1>subel1b</subEl1>
<subEl2>subel2b</subEl2>
</element4>
</item>
And I need a one-line expression to get a result like "subel2a subel2b".
I need it on one line because I set this XPath expression as an XML attribute (not my choice, it's specified). I tried string-join, but it's not really working:
string-join(/file/Element3/SubElement, ',')
/File/Element3/SubElement will match all of the SubElement elements in your sample XML. What are you using to evaluate it?
If your evaluation method is subject to the "first node rule", then it will only match the first one. If you are using a method that returns a nodeset, then it will return all of them.
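The "first node rule" is easy to see with the JDK's javax.xml.xpath, used here only as a stand-in evaluator (an assumption; BizTalk's BAM tool is a different engine). Asking for a string result converts the node-set to the string value of its first node, whereas asking for a node-set returns every match. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FirstNodeRule {
    private static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // String result: XPath 1.0 converts a node-set to the string value of its
    // first node in document order, so only one SubElement comes back.
    static String asString(String xml, String expr) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, parse(xml));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expr, e);
        }
    }

    // Node-set result: every matching node is returned.
    static List<String> asNodes(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expr, parse(xml), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expr, e);
        }
    }
}
```

With the first sample document and /File/Element3/SubElement, the string form gives "sub1" while the node-set form gives sub1, sub2 and sub3.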
You can get all SubElements by using:
//SubElement
But this won't keep them grouped together the way you want. You will want to query for all elements that contain a SubElement (basically, search for the parent of any SubElement):
//SubElement/parent::*
Once you have that, you could (depending on your programming language) loop through the parents and concatenate the SubElements.
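Here is a hedged sketch of that loop in Java for the edited XML, assuming the JDK's javax.xml.xpath (XPath 1.0): select each element4 parent, evaluate subEl2 relative to it, and join the results with spaces. The class and method names are hypothetical. On an engine that supports XPath 2.0, string-join(/item/element4/subEl2, ' ') should give the same string in a single expression; XPath 1.0 has no string-join(), which may be why the attempt above fails (note also that XPath is case-sensitive, so /file does not match File).

```java
import java.io.StringReader;
import java.util.StringJoiner;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class JoinSubElements {
    // Selects each element4 parent, then joins the subEl2 text of each one with spaces.
    static String joinSubEl2(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList parents = (NodeList) xpath.evaluate("/item/element4", doc, XPathConstants.NODESET);
            StringJoiner joined = new StringJoiner(" ");
            for (int i = 0; i < parents.getLength(); i++) {
                // Relative path, evaluated with the current element4 as context node.
                joined.add(xpath.evaluate("subEl2", parents.item(i)));
            }
            return joined.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate the query", e);
        }
    }
}
```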
