Consider a query
//item[value='testvalue']/ancestor::container[1]
If item appears several times inside a container, then we get several hits, which supposedly should appear several times in the results. The results are nodes, right? If I apply distinct-values to them, they stop being nodes: the function returns values, and the positional information is lost. Is there an operation (a refactoring or a function) that keeps the result as nodes while excluding duplicate hits?
By definition, the XPath path operator / removes duplicates from its result, so
//item[value='testvalue']/ancestor::container[1]
never selects the same node twice.
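The deduplication can be observed in a small sketch. The one below uses Python's standard xml.etree.ElementTree, which implements only a subset of XPath and has no ancestor axis, so the parent step .. stands in for ancestor::container[1]. That substitution is valid only because, in this made-up sample document, every item is a direct child of its container.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: container "a" holds two matching items.
doc = ET.fromstring("""
<root>
  <container id="a">
    <item><value>testvalue</value></item>
    <item><value>testvalue</value></item>
  </container>
  <container id="b">
    <item><value>other</value></item>
  </container>
  <container id="c">
    <item><value>testvalue</value></item>
  </container>
</root>
""")

# Three items match the predicate...
hits = doc.findall(".//item[value='testvalue']")

# ...but stepping up to the parent yields each container only once,
# and the results are still element nodes, not string values.
containers = doc.findall(".//item[value='testvalue']/..")
container_ids = [c.get("id") for c in containers]
```

Here hits has three entries while containers has two, and each entry of containers is still an Element that can be navigated further.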
I am trying to construct a JSONata query using the try.jsonata.org Invoice data.
The query I am trying to pose is: select distinct OrderID where Order.Product.Price < 50.
I have not been able to figure out how to do this using the predicate in square brackets notation ... my attempts have been thwarted when I try to get past the $.Account.Order.Product array.
Using $map and $reduce I was able to come up with this rather complex solution ... which still doesn't correctly handle duplicate OrderIDs. (I see that the issue of duplicate removal has been requested here)
Q: What is the proper way to express this query in JSONata?
I think this does what you need:
Account.Order[Product.Price.($ < 50)].OrderID
The expression in the predicate, which gets tested for each Order, will generate an array of Booleans (one for each Product.Price). The resulting predicate will evaluate to true if any of the Booleans within that array are true, due to the semantics of the $boolean function which is implicitly applied.
Overall, the expression will return the OrderID for every Order which has at least one Product whose Price is less than 50.
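To make the "at least one Product" semantics concrete, here is a rough Python equivalent of that selection. It is only an illustration: the data is a made-up stand-in (not the actual try.jsonata.org Invoice document), the function name cheap_order_ids is hypothetical, and the explicit duplicate removal is an extra step that the JSONata expression above does not perform.

```python
# Hypothetical data shaped like Account.Order[].Product[].Price.
invoice = {
    "Account": {
        "Order": [
            {"OrderID": "order1", "Product": [{"Price": 34.45}, {"Price": 21.67}]},
            {"OrderID": "order2", "Product": [{"Price": 107.99}]},
            {"OrderID": "order3", "Product": [{"Price": 120.0}, {"Price": 12.5}]},
            {"OrderID": "order1", "Product": [{"Price": 5.0}]},
        ]
    }
}


def cheap_order_ids(data, limit=50):
    """Return distinct OrderIDs of orders with at least one Price below limit."""
    result = []
    for order in data["Account"]["Order"]:
        # Mirrors the predicate: true if ANY product price passes the test.
        if any(product["Price"] < limit for product in order["Product"]):
            # Explicit duplicate removal, preserving first-seen order.
            if order["OrderID"] not in result:
                result.append(order["OrderID"])
    return result


order_ids = cheap_order_ids(invoice)
```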
In our current project we need to find the cheapest paths in an almost fully connected graph that can contain many edges per vertex pair.
We developed a plugin containing functions:
for a special traversal of this graph that reduces recurrences of similar paths during TRAVERSE execution. We will refer to it as search()
for efficient extraction of the desired information from the results of such traversals. We will refer to it as extract()
for extracting the best N records according to a target parameter without a costly ORDER BY. We will refer to it as best()
But the resulting query still has unsatisfactory performance on the full data.
So we decided to modify the search() function so that it examines the best edges first and prunes paths that definitely lead to an undesired result, using the current state of the best() function.
Overall, the solution is effectively a flexible implementation of the branch and bound method.
The resulting query (omitting the extract() step) should look like:
SELECT best(path, <limit>) FROM (
TRAVERSE search(<params>) FROM #<starting_point>
WHILE <conditions on intermediate vertices>
) WHERE <conditions on result elements>
This form is highly desirable because we can adapt the conditions under WHILE and WHERE to the task at hand. The path field is generated by search() and contains all the information best() needs to proceed.
The trouble is that the best() function is executed strictly after the search() function, so search() cannot prune non-optimal branches according to results already evaluated by best().
So the question is:
Is there a way to pipeline results from the TRAVERSE step to the SELECT step so that later paths are TRAVERSEd with search() only after earlier paths have been handled by SELECT with best()?
The query execution in this case will be streamed. If you add a
System.out.println()
or put a breakpoint in your functions, you'll see that the invocation sequence is
search
best
search
best
search
...
You can use a ThreadLocal object (http://docs.oracle.com/javase/7/docs/api/java/lang/ThreadLocal.html)
to store some context data and share it between the two functions, or you can use the OCommandContext (the last parameter of the OSQLFunction.execute() method) to store context information.
You can use context.getVariable() and context.setVariable() for this.
The contexts of the two queries (the parent and the inner query) are different, but they should be linked by a parent/child relationship, so you should be able to retrieve them using OCommandContext.getParent().
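The interplay of streaming and shared context can be sketched outside OrientDB. The following is a language-neutral illustration in Python, not the OrientDB API: a generator plays the role of search(), a consumer plays the role of best(), and a plain dict stands in for the shared context. As a simplification, each candidate is a complete path with a known cost; a real branch and bound would prune on partial costs.

```python
import heapq


def search(candidates, context, log):
    """Lazily yield paths, skipping any that cannot beat the current bound."""
    for cost, path in candidates:
        bound = context.get("bound")
        if bound is not None and cost >= bound:
            log.append(("prune", path))
            continue
        log.append(("search", path))
        yield cost, path


def best(stream, limit, context, log):
    """Keep the `limit` cheapest paths and publish the bound for search()."""
    kept = []  # max-heap emulated by storing negated costs
    for cost, path in stream:
        log.append(("best", path))
        heapq.heappush(kept, (-cost, path))
        if len(kept) > limit:
            heapq.heappop(kept)  # drop the most expensive kept path
        if len(kept) == limit:
            context["bound"] = -kept[0][0]  # worst cost still kept
    return sorted((-neg_cost, path) for neg_cost, path in kept)


context, log = {}, []
candidates = [(5, "A"), (3, "B"), (9, "C"), (4, "D"), (7, "E")]
result = best(search(candidates, context, log), 2, context, log)
```

Because the generator is consumed one element at a time, the log alternates between search and best entries, and paths C and E are pruned using the bound that best() published after seeing earlier paths.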
Can someone explain how to read these diagrams? I understand the flow from head to tail, but I am specifically wondering about how to read the field (bracket) transitions between ellipses (Pipes/Taps).
For example, take the Fields following the Every Pipe in the image. My interpretation is that the first field set, i.e. [{2}:'token', 'count'], is what goes into the next Pipe/Tap. But what is the significance of the second field set, [{1}: 'token']?
Is this the field set that went into the previous Pipe above? Does the second bracket have any programmatic significance, i.e. can we access it within that pipe with particular Cascading code (in the case where the second Fields set is larger than the first)?
(source: cascading.org)
The second field set represents which fields are available for subsequent operations in that map or reduce.
In your example above, in the reduce step, since you grouped by 'token', only 'token' is available for subsequent aggregations (Everys) in that reduce step. You could, for example, add another aggregation that outputs the average token length, but you could not yet use an aggregation that relies on 'count'.
The reason for this behaviour is that subsequent aggregations on the same group happen in parallel. Thus, the Count won't have completed in time to feed into any other aggregations you chain on.
I need to count how many times a particular node occurs in a document based on the values of two of its attributes. So, given the following small sample of XML:
<p:entry timestamp="2012-11-15T17:53:34.642-05:00" ticks="89709622449012" system="OSD" component="OSD5" marker=".\Launcher.cpp:1741" severity="Info" type="Driver" subtype="Start" tags="" sensitivity="false">
This can occur one or more times in the document with different attribute sets. I need to count how many show up with type="Driver" AND subtype="Start". I am able to count how many just have type="Driver" using:
count(//p:entry[@type="Driver"])
but haven't been able to combine them. This didn't work:
count(//p:entry[@type="Driver" and @subtype="Start"])
This works for the OP: specify two predicates in succession instead of using the and operator, which has the same effect:
count(//p:entry[@type="Driver"][@subtype="Start"])
As far as I know, the original expression count(//p:entry[@type="Driver" and @subtype="Start"]) should work as well.
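The chained-predicate form can be tried out with a minimal sketch such as the one below. It uses Python's standard xml.etree.ElementTree rather than whatever XPath engine the OP used; the namespace URI urn:example:log and the sample entries are made up for illustration, since the question does not show the real namespace declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace binding for the p: prefix.
NS = {"p": "urn:example:log"}

doc = ET.fromstring("""
<p:log xmlns:p="urn:example:log">
  <p:entry type="Driver" subtype="Start"/>
  <p:entry type="Driver" subtype="Stop"/>
  <p:entry type="Service" subtype="Start"/>
  <p:entry type="Driver" subtype="Start"/>
</p:log>
""")

# One predicate: every entry whose type is "Driver".
driver_count = len(doc.findall(".//p:entry[@type='Driver']", NS))

# Two predicates in succession: both conditions must hold.
driver_start_count = len(
    doc.findall(".//p:entry[@type='Driver'][@subtype='Start']", NS)
)
```

ElementTree has no count() function, so len() over the matched elements plays that role here.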
I use libxmljs to parse some HTML.
I have an XPath query that uses an "or" to retrieve, in effect, the results of two queries.
Example:
doc.find("//div[contains(@class,'important') or contains(@class,'overdue')]")
This returns all the divs with either important or overdue.
Can I tag the results, or otherwise see within my result set which one comes from which condition?
The result could be an array with an index for the match: 0 for the first condition and 1 for the second. Is this possible?
Or how else can I find out which result comes from which query condition?
Thanks for any help...
P.S.: This is a simplified example of a sequence of elements, each of which has an important item, an overdue item, both, or neither. So I cannot go by looking at every second entry, etc.
This is the result I want to get:
message:{},
message:{
.....
important: "some important text",
overdue: "overdue date",
.....
}
There is no way to know which clause of an or XPath query caused a particular result to be included. It's simply not information that's kept around.
You'll either need to do entirely separate queries for important and overdue, or do one large query to get the entire result set (as you are now) and then further test each result's class to find out which one it is.
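The second option (one query, then re-testing each result) might look like the sketch below. It is written in Python with the standard xml.etree.ElementTree instead of libxmljs, and the sample markup and the helper name classify are made up for illustration. The substring test mirrors what contains(@class, ...) does in the original query.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup with important, overdue, neither, and both.
doc = ET.fromstring("""
<body>
  <div class="note important">pay rent</div>
  <div class="overdue">2012-11-01</div>
  <div class="plain">nothing special</div>
  <div class="important overdue">both apply</div>
</body>
""")


def classify(div):
    """Report which of the two conditions this div satisfies."""
    css_class = div.get("class", "")
    return {
        "important": "important" in css_class,
        "overdue": "overdue" in css_class,
    }


# Fetch all candidates once, then tag each hit with the condition(s) it met.
tagged = []
for div in doc.iter("div"):
    flags = classify(div)
    if flags["important"] or flags["overdue"]:
        tagged.append((div.text, flags))
```

Each entry of tagged pairs a matching element's text with flags saying which clause matched, which also covers elements that satisfy both.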