I have an XML document that will balloon in size as time goes on, and I would like to ensure that my XPath choice for an XSL select will be as efficient as possible.
The document contains the following types of elements:
<simple_instance>
<name>Class0</name>
<type>Business_Capability</type>
<own_slot_value>
<slot_reference>contained_business_capabilities</slot_reference>
<value value_type="simple_instance">Class1</value>
<value value_type="simple_instance">Class3</value>
<value value_type="simple_instance">Class4</value>
<value value_type="simple_instance">Class5</value>
</own_slot_value>
<own_slot_value>
<slot_reference>business_capability_level</slot_reference>
<value value_type="string">1</value>
</own_slot_value>
<own_slot_value>
<slot_reference>name</slot_reference>
<value value_type="string">Planning</value>
</own_slot_value>
</simple_instance>
Which of these two selectors (which find elements like the one above) will be more efficient in the long run?
/node()/simple_instance[type='Business_Capability']/own_slot_value/slot_reference[text()='business_capability_level']/following-sibling::value[text()='1']
or
/node()/simple_instance[type='Business_Capability' and (own_slot_value/slot_reference='business_capability_level') and (own_slot_value/value='1')]
My guess is that, if the XPath implementation short-circuits the and, the latter will be quicker.
Note: I'm using Protege's XML/XSL capabilities.
The two XPath expressions have different results, so asking which is faster seems irrelevant (the first selects a value element, the second a simple_instance element).
In addition, XPath is a specification, not an implementation. Implementations differ widely in their strategies for evaluating complex paths, so an answer that is true for one implementation may well not be true for another. Measure it and see (and tell us the answer).
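A minimal sketch of such a measurement, assuming Python's standard-library ElementTree rather than Protege's XSL engine, so the timings say nothing about Protege itself. ElementTree supports only a subset of XPath, so the second selector's conditions are emulated in plain Python. The root element name kb, the second instance, and both function names are made up for illustration. The sketch also shows that the two selections return different elements.

```python
import timeit
import xml.etree.ElementTree as ET

# Hypothetical wrapper element <kb>; the second instance is invented test data.
DOC = """<kb>
  <simple_instance>
    <name>Class0</name>
    <type>Business_Capability</type>
    <own_slot_value>
      <slot_reference>business_capability_level</slot_reference>
      <value value_type="string">1</value>
    </own_slot_value>
  </simple_instance>
  <simple_instance>
    <name>Class1</name>
    <type>Business_Capability</type>
    <own_slot_value>
      <slot_reference>business_capability_level</slot_reference>
      <value value_type="string">2</value>
    </own_slot_value>
  </simple_instance>
</kb>"""
root = ET.fromstring(DOC)


def select_values(root):
    """Counterpart of the first selector: returns <value> elements."""
    return root.findall(
        "./simple_instance[type='Business_Capability']"
        "/own_slot_value[slot_reference='business_capability_level']"
        "/value[.='1']"
    )


def select_instances(root):
    """Counterpart of the second selector: returns <simple_instance> elements."""
    hits = []
    for inst in root.findall("./simple_instance[type='Business_Capability']"):
        refs = [e.text for e in inst.findall("own_slot_value/slot_reference")]
        vals = [e.text for e in inst.findall("own_slot_value/value")]
        # The two conditions are independent, as in the original predicate.
        if "business_capability_level" in refs and "1" in vals:
            hits.append(inst)
    return hits


print([e.tag for e in select_values(root)])
print([e.tag for e in select_instances(root)])

# Timings depend on the engine and the data, so no expected numbers are given.
for fn in (select_values, select_instances):
    print(fn.__name__, timeit.timeit(lambda: fn(root), number=1000))
```

To get numbers that matter, run the same comparison in the real engine against a document of realistic size.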
Using XSD 1.0 schema validation, I want to verify that an element has at least one of a given set of attributes specified.
For example, a simple element like this:
<foo a="1" b="2" c="3" />
I want to verify that at least attribute b or c is specified. But note that both can also be specified--they're not mutually exclusive.
I tried using a key along the lines of:
<xs:key name="AttributeSpecified">
<xs:selector xpath="." />
<xs:field xpath="@b|@c" />
</xs:key>
but it fails when both attributes are specified (because multiple results are returned).
Can it be done?
This is not possible in XSD 1.0. It might be possible in XSD 1.1.
I am a fan of XML Schema, but I would not choose it for this type of validation. You might be able to make it work using XSD1.1 but if your requirements became just a little more complex you could end up with some horrible-looking constraints.
On the other hand, an XPath expression can elegantly express any constraint you can think of, and you would not need to bend the language to make it work.
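As an illustration, the constraint from the question can be written as the XPath test @b or @c, which is true when either attribute is present. The sketch below mirrors that test in Python with the standard-library ElementTree. The function name and the two extra sample elements are made up for illustration, and this is not a replacement for schema validation.

```python
import xml.etree.ElementTree as ET


def has_b_or_c(elem):
    """Mirror the XPath test `@b or @c`: true if either attribute is present."""
    return "b" in elem.attrib or "c" in elem.attrib


# First sample is from the question; the other two are hypothetical variants.
samples = [
    '<foo a="1" b="2" c="3" />',
    '<foo a="1" c="3" />',
    '<foo a="1" />',
]
results = [has_b_or_c(ET.fromstring(s)) for s in samples]
print(results)  # [True, True, False]
```

Having both attributes present passes, which is the case the xs:key attempt could not handle.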
I learned that every XPath expression is also a valid XQuery expression. I'm using Oxygen 16.1 with this sample XML:
<actors>
<actor filmcount="4" sex="m" id="15">Anderson, Jeff</actor>
<actor filmcount="9" sex="m" id="38">Bishop, Kevin</actor>
</actors>
My expression is:
//actor/@id
When I evaluate this expression in Oxygen with XPath 3.0, I get exactly what I expect:
15
38
However, when I evaluate this expression with XQuery 3.0 (also 1.0), I get the message: "Your query returned an empty sequence."
Can anyone provide any insight as to why this is, and how I can write the equivalent XQuery statement to get what the XPath statement did above?
Other XQuery implementations do support this query
If you want to validate that your query (as corrected per discussion in comments) does in fact work with other XQuery implementations when entered exactly as given in the question, you can run it as follows (tested in BaseX):
declare context item := document { <actors>
<actor filmcount="4" sex="m" id="15">Anderson, Jeff</actor>
<actor filmcount="9" sex="m" id="38">Bishop, Kevin</actor>
</actors> };
//actor/@id
Oxygen XQuery needs some extra help
Oxygen XML doesn't support serializing attributes, and consequently discards them from a result sequence when that sequence would otherwise be provided to the user.
Thus, you can work around this by returning the attribute values as strings, with either of the following queries:
//actor/@id/string(.)
data(//actor/@id)
Below applies to a historical version of the question.
Frankly, I would not expect //actors/@id to return anything against that data with any valid XPath or XQuery engine, ever.
The reason is that there's only one place you're recursing -- one // -- and that's looking for actors. The single / between the actors and the @id means that they need to be directly connected, but that's not the case in the data you give here -- there's an actor element between them.
Thus, you need to fix your query. There are numerous queries you could write that would find the data you wanted in this document -- knowing which one is appropriate would require more information than you've provided:
//actor/@id - Find actor elements anywhere, and take their id attribute values.
//actors/actor/@id - Find actors elements anywhere; look for actor elements directly under them, and take the id attribute of such actor elements.
//actors//@id - Find all id attributes in subtrees of actors elements.
//@id - Find id attributes anywhere in the document.
...etc.
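To make the difference concrete, here is a small sketch using Python's standard-library ElementTree. It is not Oxygen or BaseX, and it reads attributes through the element API rather than evaluating the XPath expressions themselves; the variable names are made up for illustration. It checks where id attributes actually sit in the sample data.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<actors>
<actor filmcount="4" sex="m" id="15">Anderson, Jeff</actor>
<actor filmcount="9" sex="m" id="38">Bishop, Kevin</actor>
</actors>""")

# Counterpart of //actors/@id: the actors element itself carries no id.
ids_on_actors = [e.get("id") for e in root.iter("actors") if "id" in e.attrib]

# Counterpart of //actor/@id: each actor element carries one.
ids_on_actor = [e.get("id") for e in root.iter("actor") if "id" in e.attrib]

# Counterpart of //@id: id attributes on any element in the document.
all_ids = [e.get("id") for e in root.iter() if "id" in e.attrib]

print(ids_on_actors)  # []
print(ids_on_actor)   # ['15', '38']
print(all_ids)        # ['15', '38']
```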
I'm trying to figure out how to fully specify a sliced element. If I'm reading the spec right, nameReference is the only place where a "sub element" of a slice can declare which slice it's "on".
So, if telecom is sliced by use and system and I want to specify a constraint on home phone, I have to fix use and system to those values and then add my constraints on that slice.
Consider:
Resource Example                    ElementDefinition attributes
================================    =====================================================================
<Patient>                           name="Patient"
  ... snip ...
  <telecom>                         name="HomePhone"
    <system value="phone" />        name="HomePhone.system", nameReference="HomePhone", fixedCode="phone"
    <use value="home" />            name="HomePhone.use", nameReference="HomePhone", fixedCode="home"
    <value value="5551234567" />    name="HomePhone.value", nameReference="HomePhone"
  </telecom>
  ... snip ...
</Patient>
In most examples, it appears that a dotted notation of Name has been used (as I've placed in the example). But the specification doesn't require this and provides no format that could be reliably parsed.
The problem is: nameReference and fixed[x] are mutually exclusive. What's the correct way to handle this?
Repetitions in an instance don't "declare" what slice they're part of. They simply declare the appropriate value for whatever element(s) are the discriminator for the slicing process. nameReference isn't involved at all. On the definition side, association is simply handled by name, so HomePhone.system is associated with HomePhone simply by the name and by sequential proximity. The dot-notation is required. We could probably be a bit more explicit about that though, so feel free to submit a change request.
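The matching described above can be sketched in plain Python. This is only an illustration of the idea, not FHIR tooling or any real library API. The SLICES table, the WorkEmail slice, and both function names are hypothetical. HomePhone and its fixed values come from the question.

```python
# Hypothetical slice table: slice name -> fixed discriminator values.
# "WorkEmail" is invented here purely to have a second slice to compare against.
SLICES = {
    "HomePhone": {"system": "phone", "use": "home"},
    "WorkEmail": {"system": "email", "use": "work"},
}


def slice_for(telecom):
    """Return the first slice whose discriminator values all match, else None."""
    for name, fixed in SLICES.items():
        if all(telecom.get(key) == value for key, value in fixed.items()):
            return name
    return None


def owning_slice(element_name):
    """Dotted names tie a child definition to its slice: 'HomePhone.use' -> 'HomePhone'."""
    return element_name.split(".", 1)[0]


repetition = {"system": "phone", "use": "home", "value": "5551234567"}
print(slice_for(repetition))             # HomePhone
print(owning_slice("HomePhone.system"))  # HomePhone
```

The repetition never names its slice. It is assigned to one only because its system and use values match that slice's fixed values.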
I'm filtering a big file that contains types of shoes for children, men, and women.
Now I want to filter out certain types of women's shoes. The following XPath works, but the program I'm using limits the length of an XPath expression, so I'm wondering if there is a shorter or more efficient way to construct it:
/Products/Product[contains(CategoryPath/ProductCategoryPath,'Halbschuhe') and contains(CategoryPath/ProductCategoryPath,'Damen') or contains(CategoryPath/ProductCategoryPath,'Sneaker') and contains(CategoryPath/ProductCategoryPath,'Damen') or contains(CategoryPath/ProductCategoryPath,'Ballerinas') and contains(CategoryPath/ProductCategoryPath,'Damen')]
Edit: Added requested file sample
<Products>
<!-- snip -->
<Product ProgramID="4875" ArticleNumber="GO1-f05-0001-12">
<CategoryPath>
<ProductCategoryID>34857489</ProductCategoryID>
<ProductCategoryPath>Damen > Sale > Schuhe > Sneaker > Sneaker Low</ProductCategoryPath>
<AffilinetProductCategoryPath>Kleidung &amp; Accessoires?</AffilinetProductCategoryPath>
</CategoryPath>
<Price>
<DisplayPrice>40.95 EUR</DisplayPrice>
<Price>40.95</Price>
</Price>
</Product>
<!-- snip -->
</Products>
If you had XPath 2.0 available, you should try the matches() function or even tokenize() as suggested by Ranon in his great answer.
With XPath 1.0, one way to shorten the expression could be this:
/Products/Product[
CategoryPath/ProductCategoryPath[
contains(., 'Damen')
and ( contains(., 'Halbschuhe')
or contains(., 'Sneaker')
or contains(., 'Ballerinas') )] ]
A convenient oneliner for easier copy-paste:
/Products/Product[CategoryPath/ProductCategoryPath[contains(.,'Damen') and (contains(.,'Halbschuhe') or contains(.,'Sneaker') or contains(.,'Ballerinas'))]]
I tried to preserve your expression exactly as it was; none of the changes should alter its behaviour in any way.
There are some even shorter solutions, but they would rely on assumptions about the XML structure that could break in ways we can't see without the full context, so we're not going that way.
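The claim that the shortened predicate behaves like the original can be checked mechanically. The sketch below is plain Python, with no XPath engine involved. It models each contains() call as a boolean and compares both forms over all 16 input combinations; the function names are made up for illustration. Python's and binds tighter than or, the same precedence XPath uses.

```python
from itertools import product


def original(halbschuhe, sneaker, ballerinas, damen):
    """Shape of the long predicate: three (type and Damen) terms joined by or."""
    return (halbschuhe and damen) or (sneaker and damen) or (ballerinas and damen)


def shortened(halbschuhe, sneaker, ballerinas, damen):
    """Shape of the short predicate: Damen and (any of the three types)."""
    return damen and (halbschuhe or sneaker or ballerinas)


# Exhaustive truth table over the four contains() results.
equivalent = all(
    original(*values) == shortened(*values)
    for values in product([False, True], repeat=4)
)
print(equivalent)  # True
```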
If your XPath engine supports XPath 2.0, it can be done in an even more convenient (and probably more efficient) way:
//Product[
CategoryPath/ProductCategoryPath[
tokenize(., '\s') = ('Halbschuhe', 'Sneaker', 'Ballerinas') and contains(., 'Damen')
]
]
fn:tokenize($string, $pattern) splits a string on a regular expression (here any whitespace character; you could also split on a single space). The = operator is a general comparison with existential semantics: it returns true if any of the strings on the left side equals any of the strings on the right side.
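A rough Python analogue of that predicate, for readers who want to see the semantics step by step. It uses re.split in place of fn:tokenize and a set intersection in place of the existential = comparison. The function name and the second and third category paths are made up for illustration. One behavioural difference from the contains() version is that token equality does not match a longer word such as Sneakers.

```python
import re

WANTED = {"Halbschuhe", "Sneaker", "Ballerinas"}


def matches(path):
    """Split on whitespace, then test whether any token equals any wanted value."""
    tokens = re.split(r"\s", path)
    return bool(set(tokens) & WANTED) and "Damen" in path


sample = "Damen > Sale > Schuhe > Sneaker > Sneaker Low"  # from the question
print(re.split(r"\s", sample))
# ['Damen', '>', 'Sale', '>', 'Schuhe', '>', 'Sneaker', '>', 'Sneaker', 'Low']

print(matches(sample))                        # True
print(matches("Herren > Schuhe > Sneaker"))   # False: no 'Damen'
print(matches("Damen > Schuhe > Sneakers"))   # False: 'Sneakers' is not an exact token
```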
I'm not very familiar with XPath, but I was working with XPath expressions and setting them in a database. Actually, it's just the BAM tool for BizTalk.
Anyway, I have an xml which could look like:
<File>
<Element1>element1</Element1>
<Element2>element2</Element2>
<Element3>
<SubElement>sub1</SubElement>
<SubElement>sub2</SubElement>
<SubElement>sub3</SubElement>
</Element3>
</File>
I was wondering if there is a way to use an XPath expression to get all the SubElements concatenated. At the moment, I am using:
/*[local-name()='File']/*[local-name()='Element3']/*[local-name()='SubElement']
This works if there is only one SubElement. But apparently my XML sometimes has more nodes, in which case it gives NULL. I could just use
/*[local-name()='File']/*[local-name()='Element3']/*[local-name()='SubElement'][1]
but I need all the nodes. Is there a way to do this?
Thanks a lot!
Edit: I changed the XML, I was wrong, it's different, it should look like this:
<item>
<element1>el1</element1>
<element2>el2</element2>
<element3>el3</element3>
<element4>
<subEl1>subel1a</subEl1>
<subEl2>subel2a</subEl2>
</element4>
<element4>
<subEl1>subel1b</subEl1>
<subEl2>subel2b</subEl2>
</element4>
</item>
And I need a one-line expression that gives a result like: "subel2a subel2b".
I need it in one line because I set this XPath expression as an XML attribute (not my choice, it's specified). I tried string-join, but it's not really working:
string-join(/file/Element3/SubElement, ',')
/File/Element3/SubElement will match all of the SubElement elements in your sample XML. What are you using to evaluate it?
If your evaluation method is subject to the "first node rule", then it will only match the first one. If you are using a method that returns a nodeset, then it will return all of them.
You can get all SubElements by using:
//SubElement
But this won't keep them grouped together the way you want. Instead, query for all elements that contain a SubElement (that is, the parent of any SubElement):
//*[SubElement]
Once you have that, you could (depending on your programming language) loop through the parents and concatenate the SubElements.
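A minimal sketch of that loop, assuming Python's standard-library ElementTree and the edited XML from the question. The BAM tool itself may behave differently, and the variable names are made up for illustration. It produces the requested string, then shows the same join done per parent.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<item>
<element1>el1</element1>
<element2>el2</element2>
<element3>el3</element3>
<element4>
<subEl1>subel1a</subEl1>
<subEl2>subel2a</subEl2>
</element4>
<element4>
<subEl1>subel1b</subEl1>
<subEl2>subel2b</subEl2>
</element4>
</item>""")

# All subEl2 values across every element4, joined with a space.
joined = " ".join(e.text for e in root.findall("./element4/subEl2"))
print(joined)  # subel2a subel2b

# Grouped per parent: one joined string for each element4.
grouped = [
    " ".join(child.text for child in parent)
    for parent in root.findall("./element4")
]
print(grouped)  # ['subel1a subel2a', 'subel1b subel2b']
```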