resilient searching of elements via xpath?

From my previous question, "how does this xpath behave?", I found that
html//p//table//tr//td/a
can cope with unexpected elements that show up between the steps of the path.
For instance the xpath above could handle:
html/p/div/table/tr/td/a
html/p/table/tr/td/b/div/a
However, how can I formulate an XPath that is fully resilient to missing/unexpected elements?
For instance, the xpath mentioned in the beginning cannot handle the following:
/html/table/tr/td/a (p is missing)
/html/div/span/table/tr/td/a (p is missing and position replaced with `div/span/`)
Does an XPath syntax exist to deal with the above cases? If not, what would be an alternative approach?
My gut tells me it's not possible with XPath alone, so I am using the following algorithm, written in pseudocode.
Essentially, it splits the given XPath into steps and, for each ancestor, looks for the next step as an immediate child. If the expected child doesn't exist or is some other element, it digs through the children of the current ancestor and attempts to discover the expected child.
function searchElement() {
    // Split the expected path into individual element names.
    elements[] = "/html/p/table/tr/td/a".split("/");
    thisElement = "";
    for (element in elements) {
        if (firstItem) {
            thisElement = findElementByXpath(element);
        } else {
            try {
                // Look for this element inside the element found in the previous iteration.
                thisElement = thisElement.findElementByXpath(element);
            } catch (NotFoundException e) {
                // Not found as an immediate child: search inside the previous element's children.
                foundElement = false;
                discoveredElement = thisElement.findElementByXpath("*");
                while (foundElement != true) {
                    if (discoveredElement.findElementByXpath(element) != null) {
                        // Successful, element found: continue from the discovered element.
                        thisElement = discoveredElement.findElementByXpath(element);
                        foundElement = true;
                    } else {
                        // Not successful, keep digging.
                        discoveredElement = discoveredElement.findElementByXpath("*");
                    }
                }
            }
        }
    }
    return thisElement;
}
Is this an optimal approach? I am worried that searching for "*" and digging through each element is rather inefficient.
I don't know what to tag this question besides "xpath"...feel free to edit.
Thank you.

If I understand you correctly, you want to select a elements whose ancestors are optional but must appear in a specific order.
In that case, instead of /html//p//table//tr//td/a, your expression
should be:
//a[(self::*|parent::td)[1]
[(self::*|ancestor::tr)[1]
[(self::*|ancestor::table)[1]
[(self::*|ancestor::p)[1]
[ancestor::html[not(parent::*)]]
]
]
]
]
But this is the same as:
/html//a |
/html//td/a |
/html//tr//a |
/html//tr//td/a |
/html//table//a |
/html//table//td/a |
/html//table//tr//a |
/html//table//tr//td/a |
/html//p//a |
/html//p//td/a |
/html//p//tr//a |
/html//p//tr//td/a |
/html//p//table//a |
/html//p//table//td/a |
/html//p//table//tr//a |
/html//p//table//tr//td/a
and /html//a is so general that it would select any a element.
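To make that last point concrete, here is a small sketch using Python's standard-library ElementTree, which supports the // steps used here. The three sample documents are made up for illustration; the strict path matches only the expected layout, while the loosest branch of the union matches every a element, including one in an unrelated list.

```python
import xml.etree.ElementTree as ET

# Paths are relative to the root <html> element, so ".//p//..." here
# corresponds to "/html//p//..." in the answer above.
STRICT = ".//p//table//tr//td/a"
LOOSEST = ".//a"

# Hypothetical sample documents, not taken from the question.
SAMPLES = {
    "expected layout": "<html><p><table><tr><td><a/></td></tr></table></p></html>",
    "p is missing": "<html><table><tr><td><a/></td></tr></table></html>",
    "unrelated link": "<html><ul><li><a/></li></ul></html>",
}


def count_matches(xml_text, path):
    """Return how many elements the path selects in the given document."""
    root = ET.fromstring(xml_text)
    return len(root.findall(path))


for label, xml_text in SAMPLES.items():
    print(label, count_matches(xml_text, STRICT), count_matches(xml_text, LOOSEST))
```

Once every ancestor is optional, nothing is left to distinguish the link you want from any other link on the page.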

It's possible, but a really bad idea.
The // construct means "skip any number of elements", so a path of //td finds a "td" element anywhere in the DOM.
That means you'll also pick up the element at /html/body/im/not/what/you/want/td

Related

XPath union parameter from parent

Here is XPath one:
/Document/Attributes/BlobContent/Property[@Name="FileName"]/parent::*/Reference/@Link
and XPath two:
Document/Attributes/BlobContent/Property[@Name="FileName"]/parent::*/Property[@Name="FileName"]/@Value
Both bring back the right result!
I would like to avoid the complete chaining [one | two], as that brought back only a list of alternating results.
I tried
/Document/Attributes/BlobContent/Property[@Name="FileName"]/parent::*/Reference/@Link | */Property[@Name="FileName"]/@Value
but that brings back only the latter one.
So how would I correctly bring back two child node attributes from a found parent?

For anyone interested, I didn't find any XPath solution. However, this Python code worked for me:
import xml.etree.ElementTree as ET

# file_xml is the path to the XML file (defined elsewhere).
tree = ET.parse(file_xml)
root = tree.getroot()
blobs = root.findall("*/Attributes[1]/BlobContent")
for blob in blobs:
    try:
        filename = blob.find('Property[@Name="FileName"]').attrib["Value"]
        exportname = blob.find('Reference[@Type="RELATIVEFILE"]').attrib["Link"]
        print(filename + "," + exportname)
    except (AttributeError, KeyError):
        # No FileName Property (or no matching Reference) in this BlobContent.
        pass
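For readers who want to try the approach without the original file, here is a self-contained sketch. The sample document, its file names, and the helper name filename_link_pairs are assumptions made for illustration; only the element and attribute names come from the question. It pairs the two attributes per BlobContent parent, which is what the union expression could not do.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real document layout may differ.
SAMPLE = """
<Document>
  <Attributes>
    <BlobContent>
      <Property Name="FileName" Value="report.pdf"/>
      <Reference Type="RELATIVEFILE" Link="blobs/0001.bin"/>
    </BlobContent>
    <BlobContent>
      <Reference Type="RELATIVEFILE" Link="blobs/0002.bin"/>
    </BlobContent>
  </Attributes>
</Document>
"""


def filename_link_pairs(xml_text):
    """Return (Value, Link) pairs, one per BlobContent that has both children."""
    root = ET.fromstring(xml_text)
    pairs = []
    for blob in root.findall("Attributes/BlobContent"):
        prop = blob.find('Property[@Name="FileName"]')
        ref = blob.find("Reference")
        if prop is None or ref is None:
            continue  # skip parents that lack either child
        pairs.append((prop.get("Value"), ref.get("Link")))
    return pairs


print(filename_link_pairs(SAMPLE))
```

Checking for None explicitly avoids relying on an exception to skip incomplete entries.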

What is the syntax for a sorted XPath query on atomic values?

I am trying to execute the following select in my Java code:
// initialized earlier
private XdmNode xmlDocument;
private XPathCompiler xPath;
// ... the code that's a problem:
XPathExecutable exec = xPath.compile("sort(distinct-values(/root/data/hasTransaction/element/hasAssets/element/associatedAttributes/element[(value != '') and (dataParamName != 'modelNomenclature')]/name), (), function($node) { $node/displaySeq })");
XPathSelector selector = exec.load();
selector.setContextItem(xmlDocument);
selector.evaluate();
The call to evaluate() throws the exception:
net.sf.saxon.trans.XPathException: The required item type of the first operand of '/' is node(); the supplied value u"Model Name" is an atomic value
What is wrong with the query? I know distinct-values() returns atomic values, but why is there a problem sorting those? Is it that $node makes no sense to sort atomic values? But the select (without the sort) is:
/root/data/hasTransaction/element/hasAssets/element/associatedAttributes/element[(value != '') and (dataParamName != 'modelNomenclature')]/name
And there is at /root/data/hasTransaction/element/hasAssets/element/associatedAttributes:
<associatedAttributes>
<element>
<uom>N/A</uom>
<name>Model Name</name>
<dataParamName>modelName</dataParamName>
<seq />
<value>A17-230P1A</value>
<displayLevel>0</displayLevel>
<displaySeq>5</displaySeq>
<displayLevelTitle />
<displayName>Model Name</displayName>
<dataGroup />
</element>
So it seems logical (to me) as it's "...[...]/name" that it can sort on displaySeq

The "/" operator requires a node (not an atomic value) on the LHS (as the error message says).
You're trying, I think, to eliminate element elements as duplicates if they have the same name child, and then to sort them by the value of displaySeq. Unfortunately distinct-values() only retains the (atomic) values; it loses knowledge of the nodes from which these values were derived. (And in principle at least, two elements with the same name can have different values for displaySeq, so it's not clear which one you want to retain.)
Ideally you would use XSLT or XQuery grouping for this, rather than distinct-values. If you have to use XPath, you could consider creating a map to do the deduplication:
let $index := map:merge(/root/data/hasTransaction/element/hasAssets
/element/associatedAttributes/element[
(value != '') and (dataParamName != 'modelNomenclature')
]!map{displaySeq : .}/name)
return sort(map:for-each($index, function($k, $v){$v}),
(), function($node) { $node/displaySeq })
Not tested.
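The deduplicate-then-sort idea can also be sketched outside XPath. The following Python sketch is not the Saxon API. It uses the standard-library ElementTree on a made-up sample, and it assumes that the first element seen for each name is the one to keep and that displaySeq always holds an integer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the excerpt in the question.
SAMPLE = """
<associatedAttributes>
  <element><name>Serial</name><value>X1</value><dataParamName>serial</dataParamName><displaySeq>7</displaySeq></element>
  <element><name>Model Name</name><value>A17-230P1A</value><dataParamName>modelName</dataParamName><displaySeq>5</displaySeq></element>
  <element><name>Model Name</name><value>A17-230P1A</value><dataParamName>modelName</dataParamName><displaySeq>5</displaySeq></element>
  <element><name>Hidden</name><value/><dataParamName>hidden</dataParamName><displaySeq>1</displaySeq></element>
</associatedAttributes>
"""


def sorted_distinct_names(xml_text):
    """Deduplicate elements by name (keeping nodes), then sort by displaySeq."""
    root = ET.fromstring(xml_text)
    first_by_name = {}
    for el in root.findall("element"):
        value = el.findtext("value") or ""
        if value == "" or el.findtext("dataParamName") == "modelNomenclature":
            continue  # same filter as the predicate in the question
        # Keep the whole element, not just its name, so displaySeq stays reachable.
        first_by_name.setdefault(el.findtext("name"), el)
    ordered = sorted(first_by_name.values(),
                     key=lambda el: int(el.findtext("displaySeq")))
    return [el.findtext("name") for el in ordered]


print(sorted_distinct_names(SAMPLE))
```

The key point is the same as in the map-based expression: the deduplication step has to keep nodes, because the sort key lives on the node rather than on the atomic name.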

Perhaps
distinct-values(sort(/root/data/hasTransaction/element/hasAssets/element/associatedAttributes/element[(value != '') and (dataParamName != 'modelNomenclature')], (), function($node) { $node/displaySeq })!name)
gives you the right result: it sorts the element elements by displaySeq, then selects the name child elements and computes the distinct ones.
You could also write it as
(/root/data/hasTransaction/element/hasAssets/element/associatedAttributes/element[(value != '') and (dataParamName != 'modelNomenclature')] => sort((), function($node) { $node/displaySeq })) ! name => distinct-values()

Sitecore item multilistfield XPATH builder

I'm trying to count with XPATH Builder in Sitecore, the number of items which have more than 5 values in a multilist field.
I cannot count the number of "|" from raw values, so I can say I am stuck.
Any info will be helpful.
Thank you.

It's been a long time since I used XPath in Sitecore - so I may have forgotten something important - but:
Sadly, I don't think this is possible. XPath Builder doesn't really run proper XPath. It understands a subset of things that would evaluate correctly in a full XPath parser.
One of the things it can't do (on the v8-initial-release instance I have to hand) is process XPath that returns things that are not Sitecore Items. A query like count(/sitecore/content/*) should return a number - but if you try to run that using either the Sitecore Query syntax or the XPath syntax options, you get an error.
If you could run such a query, then your answer would be based on an expression like this, to perform the count of GUIDs referenced by a specific field:
string-length( translate(/yourNodePath/@yourFieldName, "abcdefg1234567890{}-", "") ) + 1
(Typed from memory, as I can't run a test - so may not be entirely correct)
The translate() function replaces each character listed in its second argument with the corresponding character in its third; because the third argument here is empty, those characters are simply removed. Hence (if I've typed it correctly) that expression should remove all your GUIDs and just leave the pipe-separator characters, so one plus the length of the remaining string is your answer for each Item you need to process.
But, as I say, I don't think you can actually run that from Query Builder...
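To show the arithmetic behind that expression, here is a small Python sketch of the same counting trick, alongside the simpler split-based count. The raw value is a made-up example, and the character set also includes uppercase hex digits, which is an assumption about how the raw field value is stored.

```python
# Characters that can occur inside a braced GUID (assumption: either letter case).
GUID_CHARS = "0123456789abcdefABCDEF{}-"


def count_by_translate(raw_value):
    """Delete every GUID character so only '|' separators remain, then add one.

    Like the XPath expression, this reports 1 for an empty field.
    """
    separators = raw_value.translate(str.maketrans("", "", GUID_CHARS))
    return len(separators) + 1


def count_by_split(raw_value):
    """Count pipe-separated entries, treating an empty field as zero."""
    return len(raw_value.split("|")) if raw_value else 0


# Hypothetical raw multilist value with three selections.
raw = ("{11111111-1111-1111-1111-111111111111}|"
       "{22222222-2222-2222-2222-222222222222}|"
       "{33333333-3333-3333-3333-333333333333}")

print(count_by_translate(raw), count_by_split(raw))
```

Both functions agree on non-empty values; they differ only for an empty field, which is worth keeping in mind if some items have no selections.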
These days, people tend to use Sitecore PowerShell Extensions to write ad-hoc queries like this. It's much more flexible and powerful - so if you can use that, I'd recommend it.
Edited to add: This question got a bit stuck in my head - so if you are able to use PowerShell, here's how you might do it:
Assuming you have declared where you're searching, what MultiList field you're querying, and what number of selections Items must exceed:
$root = "/sitecore/content/Root"
$field = "MultiListField"
$targetNumber = 3
then the "easy to read" code might look like this:
foreach($item in Get-ChildItem $root)
{
    $currentField = Get-ItemField $item -ReturnType Field -Name $field
    if($currentField)
    {
        $count = $currentField.Value.Split('|').Count
        if($count -gt $targetNumber)
        {
            $item.Paths.Path
        }
    }
}
It iterates the children of the root item you specified, and gets the contents of your field. If that field has a value, it then splits that into GUIDs and counts them. If the result of that count is greater than your threshold, it outputs the item's path.
You can get the same answer out of a (harder to read) one-liner, which would look something like:
Get-ChildItem $root | Select-Object Paths, @{ Name="FieldCount"; Expression={ Get-ItemField $_ -ReturnType Field -Name $field | % { $_.Value.Split('|').Count } } } | Where-Object { $_.FieldCount -gt $targetNumber } | % { $_.Paths.Path }
(Not sure if that's the best way to write that - I'm no expert at PowerShell syntax - but it gives the same results as far as I can see)

xpath: check if current elements position is second in order

Background:
I have an XML document with the following structure:
<body>
<section>content</section>
<section>content</section>
<section>content</section>
<section>content</section>
</body>
Using xpath I want to check if a <section> element is the second element and if so apply some function.
Question:
How do I check if a <section> element is the second element in the body element?
../section[position()=2]

If you want to know if the second element in the body is named section then you can do this:
local-name(/body/child::element()[2]) eq "section"
That will return either true or false.
However, you then asked how you can check this and, if it is true, apply some function. In XPath you cannot author your own functions; you can only do that in XQuery or XSLT. So let me for a moment assume you wish to call a different XPath function on the value of the second element if it is a section. Here is an example of applying the lower-case function:
if(local-name(/body/child::element()[2]) eq "section")then
lower-case(/body/child::element()[2])
else()
However, this can be simplified, as lower-case and many other functions take a value with a minimum cardinality of zero. This means that you can just apply the function to a path expression, and if the path did not match anything then the function typically returns an empty sequence, in the same way as a path that did not match will. So, this is semantically equivalent to the above:
lower-case(/body/child::element()[2][local-name(.) eq "section"])
If you are in XQuery or XSLT and are writing your own functions, I would encourage you to write functions that will accept a minimum cardinality of zero, just like lower-case does. By doing this you can chain functions together, and if there is no input data (i.e. from a path expression that does not match anything), there is no output data. This leads to a very nice functional programming style.
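The same "no input, no output" style can be sketched in Python with the standard-library ElementTree. The helper names second_child_if and lower_text are hypothetical, and None stands in for "nothing matched"; this illustrates the chaining idea and is not a translation of XPath semantics.

```python
import xml.etree.ElementTree as ET

SAMPLE = "<body><section>ONE</section><section>TWO</section><section>THREE</section></body>"


def second_child_if(root, tag):
    """Return the second child element if it has the given tag, else None."""
    children = list(root)
    if len(children) >= 2 and children[1].tag == tag:
        return children[1]
    return None


def lower_text(elem):
    """Accept 'no element' (None) and pass it through instead of failing."""
    if elem is None:
        return None
    return (elem.text or "").lower()


root = ET.fromstring(SAMPLE)
print(lower_text(second_child_if(root, "section")))
print(lower_text(second_child_if(root, "div")))
```

Because lower_text tolerates a missing input, the two calls chain without an explicit if/else at the call site.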

Question: How do I check if a <section> element is the second element in the body element?
Using C#, you can use the XPathNodeIterator class to traverse the selected nodes, and use its CurrentPosition property to check the current node's position:
XPathNodeIterator.CurrentPosition
Example:
using System.IO;
using System.Xml.XPath;

const string xmlStr = @"<body>
<section>1</section>
<section>2</section>
<section>3</section>
<section>4</section>
</body>";
using (var stream = new StringReader(xmlStr))
{
    var document = new XPathDocument(stream);
    XPathNavigator navigator = document.CreateNavigator();
    XPathNodeIterator nodes = navigator.Select("/body/section");
    // Walk the selected <section> elements; CurrentPosition is 1-based.
    while (nodes.MoveNext())
    {
        if (nodes.CurrentPosition == 2)
        {
            // DO SOMETHING WITH THE VALUE AT THIS POSITION
            var currentValue = nodes.Current.Value;
        }
    }
}

Is there a DRYer XPath expression for union?

This works nicely for finding button-like HTML elements, (purposely simplified):
//button[text()='Buy']
| //input[@type='submit' and @value='Buy']
| //a/img[@title='Buy']
Now I need to constrain this to a context. For example, the Buy button that appears inside a labeled box:
//legend[text()='Flubber']
And this works, (.. gets us to the containing fieldset):
//legend[text()='Flubber']/..//button[text()='Buy']
| //legend[text()='Flubber']/..//input[@type='submit' and @value='Buy']
| //legend[text()='Flubber']/..//a/img[@title='Buy']
But is there any way to simplify this? Sadly, this sort of thing doesn't work:
//legend[text()='Flubber']/..//(
button[text()='Buy']
| input[@type='submit' and @value='Buy']
| a/img[@title='Buy'])
(Note that this is for XPath within the browser, so XSLT solutions will not help.)

Combine multiple conditions in a single predicate:
//legend[text()='Flubber']/..//*[self::button[text()='Buy'] or
self::input[@type='submit' and @value='Buy'] or
self::img[@title='Buy'][parent::a]]
In English:
Select all descendants of the parent of any legend element having the text "Flubber" that are any of: 1) a button element having the text "Buy", or 2) an input element having an attribute type whose value is "submit" and an attribute named value whose value is "Buy", or 3) an img having an attribute named title with a value of "Buy" and whose parent is an a element.
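As a rough cross-check of that English description, the same selection logic can be written in plain Python. ElementTree's limited XPath does not support the self:: axis or the or operator, so this sketch walks the tree instead; the sample markup and the helper names is_buy_control and buy_controls are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment with two labeled boxes.
SAMPLE = """
<form>
  <fieldset>
    <legend>Flubber</legend>
    <button>Buy</button>
    <input type="submit" value="Buy"/>
    <a><img title="Buy"/></a>
    <button>Cancel</button>
  </fieldset>
  <fieldset>
    <legend>Other</legend>
    <button>Buy</button>
  </fieldset>
</form>
"""


def is_buy_control(elem, parent):
    """One predicate combining the three alternatives from the expression."""
    if elem.tag == "button":
        return elem.text == "Buy"
    if elem.tag == "input":
        return elem.get("type") == "submit" and elem.get("value") == "Buy"
    if elem.tag == "img":
        return elem.get("title") == "Buy" and parent is not None and parent.tag == "a"
    return False


def buy_controls(root, legend_text):
    """Return matching descendants of every element that has the named legend child."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    matches = []
    for box in root.iter():
        if any(c.tag == "legend" and c.text == legend_text for c in box):
            for elem in box.iter():
                if elem is not box and is_buy_control(elem, parent_of.get(elem)):
                    matches.append(elem)
    return matches


root = ET.fromstring(SAMPLE)
print([e.tag for e in buy_controls(root, "Flubber")])
```

The context (the box with the legend) is located once, and the alternatives live in a single predicate, which mirrors how the XPath avoids repeating the legend step.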

From comments:
Adjusting slightly to obtain the A rather than the IMG: self::a[img[@title='Buy']]. (Now if only 'Buy' could be reduced

Use this XPath 1.0 expression:
//legend[text() = 'Flubber']/..
//*[
self::button/text()
| self::input[@type = 'submit']/@value
| self::a/img/@title
= 'Buy'
]
EDIT: I didn't see the parent accessor. Other way in one direction only:
//*[legend[text() = 'Flubber']]
//*[
self::button/text()
| self::input[@type = 'submit']/@value
| self::a/img/@title
= 'Buy'
]
