XPath :: running counter two levels - xpath

Using the count(preceding-sibling::*) XPath expression one can obtaining incrementing counters. However, can the same also be accomplished in a two-levels deep sequence?
example XML instance
<grandfather>
<father>
<child>a</child>
</father>
<father>
<child>b</child>
<child>c</child>
</father>
</grandfather>
code (with Saxon HE 9.4 jar on the CLASSPATH for XPath 2.0 features)
Trying to get an counter sequence of 1,2 and 3 for the three child nodes with different kinds of XPath expressions:
XPathExpression expr = xpath.compile("/grandfather/father/child");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0 ; i < nodes.getLength() ; i++) {
Node node = nodes.item(i);
System.out.printf("child's index is: %s %s %s, name is: %s\n"
,xpath.compile("count(preceding-sibling::*)").evaluate(node)
,xpath.compile("count(preceding-sibling::child)").evaluate(node)
,xpath.compile("//child/position()").evaluate(doc)
,xpath.compile(".").evaluate(node));
}
The above code prints:
child's index is: 0 0 1, name is: a
child's index is: 0 0 1, name is: b
child's index is: 1 1 1, name is: c
None of the three XPaths I tried managed to produce the correct sequence: 1,2,3. Clearly it can trivially be done using the i loop variable but I want to accomplish it with XPath if possible. Also I need to keep the basic framework of evaluating an XPath expression to get all the nodes to visit and then iterating on that set since that's the way the real application I work on is structured. Basically I visit each node and then need to evaluate a number of XPath expressions on it (node) or on the document (doc); one of these XPAth expressions is supposed to produce this incrementing sequence.

Use the preceding axis with a name test instead.
count(preceding::child)
Using XPath 2.0, there is a much better way to do this. Fetch all <child/> nodes and use the position() function to get the index:
//child/concat("child's index is: ", position(), ", name is: ", text())

You don't say efficiency is important, but I really hate to see this done with O(n^2) code! Jens' solution shows how to do that if you can use the result in the form of a sequence of (position, name) pairs. You could also return an alternating sequence of strings and numbers using //child/(string(.), position()): though you would then want to use the s9api API rather than JAXP, because JAXP can only really handle the data types that arise in XPath 1.0.
If you need to compute the index of each node as part of other processing, it might still be worth computing the index for every node in a single initial pass, and then looking it up in a table. But if you're doing that, the simplest way is surely to iterate over the result of //child and build a map from nodes to the sequence number in the iteration.

Related

XQuery/XPath "except" operation - Select part of sequence that is not in other sequence

I have a pretty simple example but I am just learning and can't find a solution for the following:
Given 2 sequences, being
<emp>10</emp>
<emp>42</emp>
<emp>100</emp>
and another sequence
<emp>10</emp>
<emp>42</emp>
Want i want to do is: Compare the sequences and return the part of sequences that is in the first, but not in the 2nd sequence, being <emp>100</emp> in this case.
I was thinking about an "except"-operation, but can't figure out how to make it working.
Help greatly appreciated.
The except expression operates on node identity, not node value. What I think you want is a value comparison over your sequences. For example:
let $seq1 :=
(<emp>10</emp>,
<emp>42</emp>,
<emp>100</emp>)
let $seq2 :=
(<emp>10</emp>,
<emp>42</emp>)
return $seq1[not(. = $seq2)]
=>
<emp>100</emp>

How to return X elements [Selenium]?

A page loads 35.000 elements, which only the first 10 are of interest to me. Returning all elements makes the scraping extremely slow.
I only succeeded in either returning the first element with:
driver.find_element_by
Or returning all, 35.000 elements, with:
driver.find_elements_by
Anyone knows a way to return x amount of elements found?
Selenium does not provide a facility that allows returning only a slice of the .find_elements... calls. A general solution if you want to optimize things so that you do not need to have Selenium return every single element is perform the slice operation on the browser side, in JavaScript. I present this solution in this answer here. If you want to use XPath for selecting the DOM nodes, you could adapt the answer here to that, or you could use the method in another answer I've submitted.
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("http://www.example.com")
# We add 35000 paragraphs with class `test` to the page so that we can
# later show how to get the first 10 paragraphs of this class. Each
# paragraph is uniquely numbered.
driver.execute_script("""
var html = [];
for (var i = 0; i < 35000; ++i) {
html.push("<p class='test'>"+ i + "</p>");
}
document.body.innerHTML += html.join("");
""")
elements = driver.execute_script("""
return Array.prototype.slice.call(document.querySelectorAll("p.test"), 0, 10);
""")
# Verify that we got the first 10 elements by outputting the text they
# contain to the console. The loop here is for illustration purposes
# to show that the `elements` array contains what we want. In real
# code, if I wanted to process the text of the first 10 elements, I'd
# do what I show next.
for element in elements:
print element.text
# A better way to get the text of the first 10 elements. This results
# in 1 round-trip between this script and the browser. The loop above
# would take 10 round-trips.
print driver.execute_script("""
return Array.prototype.slice.call(document.querySelectorAll("p.test"), 0, 10)
.map(function (x) { return x.textContent; });;
""")
driver.quit()
The Array.prototype.slice.call rigmarole is needed because what document.querySelectorAll returns looks like an Array but is not actually an Array object. (It is a NodeList.) So it does not have a .slice method but you can pass it to Array's slice method.
Here is a significantly different approach presented as a different answer because some people will prefer this one to the other one I gave, or the other one to this one.
This one relies on using XPath to slice the results:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("http://www.example.com")
# We add 35000 paragraphs with class `test` to the page so that we can
# later show how to get the first 10 paragraphs of this class. Each
# paragraph is uniquely numbered. These paragraphs are put into
# individual `div` to make sure they are not siblings of one
# another. (This prevents offering a naive XPath expression that would
# work only if they *are* siblings.)
driver.execute_script("""
var html = [];
for (var i = 0; i < 35000; ++i) {
html.push("<div><p class='test'>"+ i + "</p></div>");
}
document.body.innerHTML += html.join("");
""")
elements = driver.find_elements_by_xpath(
"(//p[#class='test'])[position() < 11]")
for element in elements:
print element.text
driver.quit()
Note that XPath uses 1-based indexes so < 11 is indeed the proper expression. The parentheses around the first part of the expression are absolutely necessary. With these parentheses, the [position() < 11] test checks the position each node has in the nodeset which is the result of the expression in parentheses. Without them, the position test would check the position of the nodes relative to their parents nodes, which would match all nodes because all <p> are at the first position in their respective <div>. (This is why I've added those <div> elements above: to show this problem.)
I would use this solution if I were already using XPath for my selection. Otherwise, if I were doing a search by CSS selector or by id I would not convert it to XPath only to perform the slice. I'd use the other method I've shown.

Need some explanation about getting max in XPath

I'm kinda new to XPath and I've found that to get the max attribute number I can use the next statement: //Book[not(#id > //Book/#id) and it works quite well.
I just can't understand why does it return max id instead of min id, because it looks like I'm checking whether id of a node greater than any other nodes ids and then return a Book where it's not.
I'm probably stupid, but, please, someone, explain :)
You're not querying for maximum values, but for minimum values. Your query
//Book[not(#id > //Book/#id)
could be translated to natural language as "Find all books, which do not have an #id that is larger than any other book's #id". You probably want to use
//Book[not(#id < //Book/#id)
For arbitrary input you might have wanted to use <= instead, so it only returns a single maximum value (or none if it is shared). As #ids must be unique, this does not matter here.
Be aware that //Book[#id > //Book/#id] is not equal to the query above, although math would suggest so. XPath's comparison operators adhere to a kind of set-semantics: if any value on the left side is larger than any value on the right side, the predicate would be true; thus it would include all books but the one with minimum #id value.
Besides XPath 1.0 your function is correct, in XPath 2.0:
/Books/Book[id = max(../Book/id)]
The math:max function returns the maximum value of the nodes passed as the argument. The maximum value is defined as follows. The node set passed as an argument is sorted in descending order as it would be by xsl:sort with a data type of number. The maximum is the result of converting the string value of the first node in this sorted list to a number using the number function.
If the node set is empty, or if the result of converting the string values of any of the nodes to a number is NaN, then NaN is returned.
The math:max template returns a result tree fragment whose string value is the result of turning the number returned by the function into a string.

How to select all nodes such that their group size is higher than a given value, in XPath

I would like to select all <mynode> elements that have a value that appears a certain number of times (say, x) in all the elements.
Example:
<root>
<mynode>
<attr1>value_1</attr1>
<attr2>value_2</attr2>
</mynode>
<mynode>
<attr1>value_3</attr1>
<attr2>value_3</attr2>
</mynode>
<mynode>
<attr1>value_4</attr1>
<attr2>value_5</attr2>
</mynode>
<mynode>
<attr1>value_6</attr1>
<attr2>value_5</attr2>
</mynode>
</root>
In this case, I want all the <mynode> elements that whose attr2 value occurs > 1 time (x = 1). So, the last two <mynode>s.
Which query I have to perform in order to achieve this target?
If you're using XPath 2.0 or greater, then the following will work:
for $value in distinct-values(/root/mynode/attr2)
return
if (count(/root/mynode[attr2 = $value]) > 1) then
/root/mynode[attr2 = $value]
else ()
For a more detailed discussion see: XPath/XSLT nested predicates: how to get the context of outer predicate?
This is also possible in plain XPath 1.0 (also works in newer versions of XPath); and probably easier to read. Think of your problem as you're looking for all <mynode/>s which have an <att2/> node that also occurs before or after the <mynode/>:
//mynode[attr2 = preceding::attr2 or attr2 = following::attr2]
If <att2/> nodes can also accour inside other elements and you do not want to test for those:
//mynode[attr2 = preceding::mynode/attr2 or attr2 = following::mynode/attr2]

simple method to keep last n elements in a queue for vb6?

I am trying to keep the last n elements from a changing list of x elements (where x >> n)
I found out about the deque method, with a fixed length, in other programming languages. I was wondering if there is something similar for VB6
Create a Class that extends an encapsulated Collection.
Add at the end (anonymous), retrieve & remove from the beginning (index 1). As part of adding check your MaxDepth property setting (or hard code it if you like) and if Collection.Count exceeds it remove the extra item.
Or just hard code it all inline if a Class is a stumper for you.
This is pretty routine.
The only thing I can think of is possibly looping through the last 5 values of the dynamic array using something like:
For UBound(Array) - 5 To UBound(Array)
'Code to store or do the desired with these values
Loop
Sorry I don't have a definite answer, but hopefully that might help.
Here's my simplest solution to this:
For i = n - 1 To 1 Step -1
arrayX(i) = arrayX(i - 1)
Next i
arrayX(0) = latestX
Where:
arrayX = array of values
n = # of array elements
latestX = latest value of interest (assumes entire code block is also
within another loop)

Resources