What's the difference between name() condition and self:: axis - xpath

For input xml
<?xml version="1.0" encoding="UTF-8"?>
<root>
<a>1</a>
<b>2</b>
<b>3</b>
<c>4</c>
</root>
I wonder if there is any difference between the following XPath expressions:
//b
//*[name() = 'b']
//*[self::b] (or //self::b)
These expressions seem to return the same result, but is that always true? I tend to interchange them freely, but I have a feeling I shouldn't.

One difference is that the name() function uses the namespace declarations in effect on the passed-in node (usually from the XML source) while the other methods use the namespace declarations from the expression context. This means that the name() function can lead to unpredictable results if different input documents use different namespace prefixes.
Another difference is that the name() function also works for nodes other than elements. But for elements without namespaces, you can use all methods interchangeably.
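The namespace point can be illustrated with a minimal sketch. It assumes Python's standard xml.etree.ElementTree, which supports only a small XPath subset (no name() function), so only the name-test side is executed here. The two documents, the urn:example URI and the prefix p are hypothetical. The name test resolves its prefix through the mapping supplied with the expression, so it matches both documents even though they use different prefixes:

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents: same namespace URI, different prefixes.
DOC_X = '<x:root xmlns:x="urn:example"><x:b>2</x:b><x:b>3</x:b></x:root>'
DOC_Y = '<y:root xmlns:y="urn:example"><y:b>2</y:b><y:b>3</y:b></y:root>'

# The prefix "p" is bound in the expression context only;
# neither document declares it.
NS = {"p": "urn:example"}


def select_b(xml_text):
    """Return the text of every b element in the urn:example namespace."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//p:b", NS)]


print(select_b(DOC_X))
print(select_b(DOC_Y))
```

By contrast, a string test such as name() = 'x:b' in a full XPath engine would typically match only the first document, because the string returned by name() reflects the prefix used in the source.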

Julia: Abstract types vs Union of types

I am quite new to Julia, and I still have some doubts on which style is better when trying to do certain things... For instance, I have lots of doubts on the performance or style differences of using Abstract types vs defining Unions.
An example: Let's imagine we want to implement several types of units (Mob, Knight, ...) which should share most (if not all) of their attributes and most (if not all) of their methods.
I see two options to provide structure: First, one could declare an abstract type AbstractUnit from which the other types derive, and then create methods for the abstract type. It would look something like this:
abstract type AbstractUnit end
mutable struct Knight <: AbstractUnit
id :: Int
[...]
end
mutable struct Peasant <: AbstractUnit
id :: Int
[...]
end
id(u::T) where T <: AbstractUnit = u.id
[...]
Alternatively, one could define a union of types and create methods for the union. It would look something like this:
mutable struct Knight
id :: Int
[...]
end
mutable struct Peasant
id :: Int
[...]
end
const Unit = Union{Knight,Peasant}
id(u::Unit) = u.id
[...]
I understand some of the conceptual differences between these two approaches, and think the first one is more expandable. However, I have a lot of doubts in terms of performance. For instance, how bad would it be creating arrays of the AbstractUnit vs arrays of the union type in terms of memory allocation at runtime?
Thanks!
Absolutely use AbstractUnit instead of Unit in your methods' argument type annotations. Unions are abstract types too, actually, but as you pointed out, you can't add new types to them. In either case, the method is compiled to a specialization for each concrete type like Knight or Peasant, so your methods' performance won't be different.
As for Arrays' element type parameters, there is the isbits Union optimization, but as the name suggests it only works if all the types in your Union are isbitstypes (no pointers, immutable). Your structs are mutable so that's already not applicable. See, memory access is faster when instances are directly stored in Arrays, and the element type parameter (T in Vector{T}) must be a concrete isbitstype to allow this. When the element type parameter is abstract or mutable, generally Arrays only directly store pointers to the actual instances because multiple or mutable concrete types can have unknown and varying memory size. If the abstract type is an isbits Union though, instances could be stored directly in the Array: enough memory is allocated per element to contain the largest concrete type in the Union as well as a tag byte for each element specifying its type. A byte only has 256 values, so presumably this only works for Unions of at most 256 concrete types.
Another possible optimization from using a Vector{Unit} over a Vector{AbstractUnit} is Union-splitting of a type instability. I'm really not going to be able to make an example method that explains it better than the linked blog, so I'll just give the short version. When Julia's compiler fails to infer the type of a variable in your method at all (::Any annotations in @code_warntype), inner method calls involving the variable must do type checks and dispatch (selecting the specialization) at run-time, which can cost significant time. However, if Julia's compiler can infer the variable to be a Union of a few concrete types (in practice, at most 4), conditional branches for each type can be used to eliminate most type checks and do dispatch at compile-time. Vector{AbstractUnit} can contain any number of types <: AbstractUnit, so the compiler cannot use Union-splitting. Vector{Unit} however lets the compiler know the elements must be either Knight or Peasant, which allows Union-splitting.
P.S. This is often a source of confusion for beginners, but while Unit is an abstract type, Vector{Unit} is a concrete type, just with an abstract type parameter. After all, it can have instances, all Arrays directly containing pointers to Knight or Peasant instances.

What is the purpose/use of a map in XPath 3.1?

I understand the need for an array type in XPath 3.1 as they're fundamental to JSON. And yes I understand you can create a literal map() in an XPath query.
But is there a way XML or JSON can be structured where a query would naturally return a map on an XPath query against the underlying document? Or does it exist solely for the case where converting results into a map to then operate on is of benefit?
Probably the main use cases I've seen for maps are
(a) to capture the result of parsing JSON input, when the input data is in JSON
(b) to construct a structure that can be serialized as JSON, when JSON output is required.
(c) to provide complex input parameters to functions (like the fn:transform() or fn:serialize() functions)
(d) to capture multiple results or compound results from functions, e.g. a function that computes both the min and max of a sequence. If maps had been available at the time, they could have been used to get the namespace context of an element much more elegantly than the in-scope-prefixes/namespace-uri-for-prefix mechanism.
(e) a map whose entries are functions can be used like an object in OO languages, to achieve polymorphism -- especially useful in XQuery which lacks XSLT's template rule despatch mechanism. The fn:random-number-generator() function design illustrates the idea.
(f) a map can act as a simple struct for compound values, e.g. complex numbers. (It could have been used for date/time/duration/QName if available, or for the error information available in a catch clause)
"is there a way [..] JSON can be structured where a query would naturally return a map?": anything in JSON that is an "object" (https://www.json.org/json-en.html: "An object is an unordered set of name/value pairs. An object begins with {left brace and ends with }right brace") maps (pun intended) to an XDM map.
So in JSON both arrays and objects are fundamental and in the XDM you can represent a JSON array as an XDM array and a JSON object as an XDM map.

Anonymous struct as pipeline in template

Is there a way to do the following in a html/template?
{{template "mytemplate" struct{Foo1, Foo2 string}{"Bar1", "Bar2"}}}
Actually in the template, like above. Not via a function registered in FuncMap which returns the struct.
I tried it, but Parse panics, see Playground. Maybe just the syntax is wrong?
As noted by others, it's not possible. Templates are parsed at runtime, without the help of the Go compiler. So allowing arbitrary Go syntax would not be feasible (although note that it wouldn't be impossible, as the standard lib contains all the tools to parse Go source text, see packages "prefixed" with go/ in the standard lib). By design philosophy, complex logic should be outside of templates.
Back to your example:
struct{Foo1, Foo2 string}{"Bar1", "Bar2"}
This is a struct composite literal and it is not supported in templates, neither when invoking another template nor at other places.
Invoking another template with a custom "argument" has the following syntax (quoting from text/template: Actions):
{{template "name" pipeline}}
The template with the specified name is executed with dot set
to the value of the pipeline.
TL;DR; A pipeline may be a constant, an expression denoting a field or method of some value (where the method will be called and its return value will be used), it may be a call to some "template-builtin" function or a custom registered function, or a value in a map.
Where Pipeline is:
A pipeline is a possibly chained sequence of "commands". A command is a simple value (argument) or a function or method call, possibly with multiple arguments:
Argument
The result is the value of evaluating the argument.
.Method [Argument...]
The method can be alone or the last element of a chain but,
unlike methods in the middle of a chain, it can take arguments.
The result is the value of calling the method with the
arguments:
dot.Method(Argument1, etc.)
functionName [Argument...]
The result is the value of calling the function associated
with the name:
function(Argument1, etc.)
Functions and function names are described below.
And an Argument is:
An argument is a simple value, denoted by one of the following.
- A boolean, string, character, integer, floating-point, imaginary
or complex constant in Go syntax. These behave like Go's untyped
constants. Note that, as in Go, whether a large integer constant
overflows when assigned or passed to a function can depend on whether
the host machine's ints are 32 or 64 bits.
- The keyword nil, representing an untyped Go nil.
- The character '.' (period):
.
The result is the value of dot.
- A variable name, which is a (possibly empty) alphanumeric string
preceded by a dollar sign, such as
$piOver2
or
$
The result is the value of the variable.
Variables are described below.
- The name of a field of the data, which must be a struct, preceded
by a period, such as
.Field
The result is the value of the field. Field invocations may be
chained:
.Field1.Field2
Fields can also be evaluated on variables, including chaining:
$x.Field1.Field2
- The name of a key of the data, which must be a map, preceded
by a period, such as
.Key
The result is the map element value indexed by the key.
Key invocations may be chained and combined with fields to any
depth:
.Field1.Key1.Field2.Key2
Although the key must be an alphanumeric identifier, unlike with
field names they do not need to start with an upper case letter.
Keys can also be evaluated on variables, including chaining:
$x.key1.key2
- The name of a niladic method of the data, preceded by a period,
such as
.Method
The result is the value of invoking the method with dot as the
receiver, dot.Method(). Such a method must have one return value (of
any type) or two return values, the second of which is an error.
If it has two and the returned error is non-nil, execution terminates
and an error is returned to the caller as the value of Execute.
Method invocations may be chained and combined with fields and keys
to any depth:
.Field1.Key1.Method1.Field2.Key2.Method2
Methods can also be evaluated on variables, including chaining:
$x.Method1.Field
- The name of a niladic function, such as
fun
The result is the value of invoking the function, fun(). The return
types and values behave as in methods. Functions and function
names are described below.
- A parenthesized instance of one the above, for grouping. The result
may be accessed by a field or map key invocation.
print (.F1 arg1) (.F2 arg2)
(.StructValuedMethod "arg").Field
The proper solution would be to register a custom function that constructs the value you want to pass to the template invocation, as you can see in this related / possible duplicate: Golang pass multiple values from template to template?
Another, partial solution could be to use the builtin print or printf functions to concatenate the values you want to pass, but that would require splitting them again in the other template.
As mentioned by @icza, this is not possible.
However, you might want to provide a generic dict function to templates that builds a map[string]interface{} from a list of arguments. This is explained in this other answer: https://stackoverflow.com/a/18276968/328115

Refactoring Business Rule, Function Naming, Width, Height, Position X & Y

I am refactoring some business rule functions to provide a more generic version of the function.
The functions I am refactoring are:
DetermineWindowWidth
DetermineWindowHeight
DetermineWindowPositionX
DetermineWindowPositionY
All of them do string parsing, as it is a string parsing business rules engine.
My question is what would be a good name for the newly refactored function?
Obviously I want to shy away from a function name like:
DetermineWindowWidthHeightPositionXPositionY
I mean that would work, but it seems unnecessarily long when it could be something like:
DetermineWindowMoniker or something to that effect.
Function objective: Parse an input string like 1280x1024 or 200,100 and return either the first or second number. The use case is for data-driving test automation of a web browser window, but this should be irrelevant to the answer.
Question objective: I have the code to do this, so my question is not about code, but just the function name. Any ideas?
There are too few details; you should have specified at least the parameters and return values of the functions.
Have I understood correctly that you use strings of the format NxN for sizes and N,N for positions?
And that this generic function will have to parse both (and nothing else), and will return either the first or second part depending on a parameter of the function?
And that you'll then keep the various DetermineWindow* functions but make them all call this generic function?
If so:
Without knowing what parameters the generic function has it's even harder to help, but it's most likely impossible to give it a simple name.
Not all batches of code can be described by a simple name.
You'll most likely need to use a different construction if you want to have clear names. Here's an idea, in pseudo code:
ParseSize(string, outWidth, outHeight) {
ParsePair(string, "x", outWidth, outHeight)
}
ParsePosition(string, outX, outY) {
ParsePair(string, ",", outX, outY)
}
ParsePair(string, separator, outFirstItem, outSecondItem) {
...
}
And the various DetermineWindow would call ParseSize or ParsePosition.
You could also use just ParsePair directly, but I think it's cleaner to have the two other functions in the middle.
Objects
Note that you'd probably get cleaner code by using objects rather than strings (a Size and a Position one, and probably a Pair one too).
The ParsePair code (adapted appropriately) would be included in a constructor or factory method that gives you a Pair out of a string.
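As a rough illustration of the object-based idea, here is a minimal Python sketch. The names Pair, parse, parse_size and parse_position are hypothetical, chosen only to mirror the pseudo code above; the input formats 1280x1024 and 200,100 come from the question. The parsing lives in a factory method, and two thin wrappers fix the separator:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """Two integers parsed from a string such as '1280x1024' or '200,100'."""
    first: int
    second: int

    @classmethod
    def parse(cls, text: str, separator: str) -> "Pair":
        # Raises ValueError if the text does not contain exactly two
        # integer parts around the separator.
        left, right = text.split(separator)
        return cls(int(left.strip()), int(right.strip()))


def parse_size(text: str) -> Pair:
    """Sizes use 'x' as the separator: width x height."""
    return Pair.parse(text, "x")


def parse_position(text: str) -> Pair:
    """Positions use ',' as the separator: x, y."""
    return Pair.parse(text, ",")


size = parse_size("1280x1024")
position = parse_position("200,100")
print(size.first, size.second, position.first, position.second)
```

With this shape, a function like DetermineWindowWidth would reduce to returning parse_size(text).first, so no single function needs a name covering all four cases.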
---
Of course you can give other names to the various functions, objects and parameters, here I used the first that came to my mind.
It seems this question-answer provides a good starting point to answer this question:
Appropriate name for container of position, size, angle
A search on www.thesaurus.com for "Property" gives some interesting possible answers that provide enough meaningful context to the usage:
Aspect
Character
Characteristic
Trait
Virtue
Property
Quality
Attribute
Differentia
Frame
Constituent
I think ConstituentProperty is probably the most apt.

XPath to find all following siblings up until the next sibling of a particular type

Given this XML/HTML:
<dl>
<dt>Label1</dt><dd>Value1</dd>
<dt>Label2</dt><dd>Value2</dd>
<dt>Label3</dt><dd>Value3a</dd><dd>Value3b</dd>
<dt>Label4</dt><dd>Value4</dd>
</dl>
I want to find all <dt> and then, for each, find the following <dd> up until the next <dt>.
Using Ruby's Nokogiri I am able to accomplish this like so:
dl.xpath('dt').each do |dt|
ct = dt.xpath('count(following-sibling::dt)')
dds = dt.xpath("following-sibling::dd[count(following-sibling::dt)=#{ct}]")
puts "#{dt.text}: #{dds.map(&:text).join(', ')}"
end
#=> Label1: Value1
#=> Label2: Value2
#=> Label3: Value3a, Value3b
#=> Label4: Value4
However, as you can see I'm creating a variable in Ruby and then composing an XPath using it. How can I write a single XPath expression that does the equivalent?
I guessed at:
following-sibling::dd[count(following-sibling::dt)=count(self/following-sibling::dt)]
but apparently I don't understand what self means there.
This question is similar to XPath : select all following siblings until another sibling except there is no unique identifier for the 'stop' node.
This question is almost the same as xpath to find all following sibling adjacent nodes up til another type except that I'm asking for an XPath-only solution.
This is an interesting question. Most of the problems were already mentioned in @lwburk's answer and in its comments. To open up the complexity hidden in this question a bit more for a random reader, my answer is probably more elaborate and more verbose than the OP needed.
Features of XPath 1.0 related to this problem
In XPath each step, and each node in the set of selected nodes, work independently. This means that
a subexpression has no generic way to access data that was computed in a previous subexpression or share data computed in this subexpression to other subexpressions
a node has no generic way to refer to a node that was used as a context node in a previous subexpression
a node has no generic way to refer to other nodes that are currently selected.
if every one of the selected nodes must be compared to the same node, then that node must be uniquely definable in a way that is common to all selected nodes
(Well, in fact I'm not 100% sure if that list is absolutely correct in every case. If anyone has better knowledge of the quirks of XPath, please comment or correct this answer by editing it.)
Despite the lack of generic solutions some of these restrictions can be overcome if there is proper knowledge of the document structure, and/or the axis used previously can be "reverted" with another axis that serves as a backlink i.e. matches only nodes that were used as context node in the previous expression. A common example of this is when a parent axis is used after first using a child axis (the opposite case, from child to parent, is not uniquely revertible without additional information). In such cases, the information from previous steps is more precisely recreated at a later step (instead of accessing previously known information).
Unfortunately in this case I couldn't come up with any other solution to refer to previously known nodes except using XPath variables (that needs to be defined beforehand).
XPath specifies a syntax for referring to a variable, but it does not specify a syntax for defining variables; how variables are defined depends on the environment where XPath is used. Actually, since the recommendation states that "The variable bindings used to evaluate a subexpression are always the same as those used to evaluate the containing expression", you could also claim that XPath explicitly forbids defining variables inside an XPath expression.
Problem reformulated
In your question the problem would be, when given a <dt>, to identify the following <dd> elements or the initially given node after the context node has been switched. Identifying the originally given <dt> is crucial since for each node in the node-set to be filtered, the predicate expression is evaluated with that node as the context node; so one cannot refer to the original <dt> in a predicate, if there is no way to identify it after the context has changed. The same applies to <dd> elements that are following siblings of the given <dt>.
If you are using variables, one could debate whether there is a major difference between 1) using XPath variable syntax and a Nokogiri-specific way to declare that variable or 2) using Nokogiri extended XPath syntax that allows you to use Ruby variables in an XPath expression. In both cases the variable is defined in an environment-specific way, and the meaning of the XPath is clear only if the definition of the variable is also available. A similar case can be seen with XSLT, where in some cases you could choose between 1) defining a variable with <xsl:variable> prior to using your XPath expression or 2) using current() (inside your XPath expression), which is an XSLT extension.
Solution using nodeset variables and Kaysian method
You can select all the <dd> elements following the current <dt> element with following-sibling::dd (set A). Also you can select all the <dd> elements following the next <dt> element with following-sibling::dt[1]/following-sibling::dd (set B). Now a set difference A\B leaves the <dd> elements you actually wanted (elements that are in set A but not in set B). If variable $setA contains nodeset A and variable $setB contains nodeset B, the set difference can be obtained with (a modification of) Kaysian technique:
dds = $setA[count(.|$setB) != count($setB)]
A simple workaround without any variables
Currently your method is to select all the <dt> elements and then try to couple the value of each such element with values of corresponding <dd> elements in a single operation. Would it be possible to convert that coupling logic to work the other way round? So you would first select all <dd> elements and then for each <dd> find the corresponding <dt>. This would mean that you end up accessing same <dt> elements several times and with every operation you add only one new <dd> value. This could affect performance and the Ruby code could be more complicated.
The good side is the simplicity of the required XPath. When given a <dd> element, finding the corresponding <dt> is amazingly simple: preceding-sibling::dt[1]
As applied to your current Ruby code
dl.xpath('dd').each do |dd|
dt = dd.xpath("preceding-sibling::dt[1]")
## Insert new Ruby magic here ##
end
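The same "pair each <dd> with its nearest preceding <dt>" logic can be sketched in plain Python. This assumes the standard xml.etree.ElementTree module, which does not support the preceding-sibling axis, so the sketch walks the children in document order instead: the most recently seen <dt> is the node that preceding-sibling::dt[1] would select. The function name group_dd_by_dt is hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample markup from the question.
SOURCE = """<dl>
<dt>Label1</dt><dd>Value1</dd>
<dt>Label2</dt><dd>Value2</dd>
<dt>Label3</dt><dd>Value3a</dd><dd>Value3b</dd>
<dt>Label4</dt><dd>Value4</dd>
</dl>"""


def group_dd_by_dt(dl):
    """Return a list of (dt text, [dd texts]) in document order."""
    groups = []
    for child in dl:
        if child.tag == "dt":
            # Start a new group; later dd siblings belong to it.
            groups.append((child.text, []))
        elif child.tag == "dd" and groups:
            # The last group holds the nearest preceding dt.
            groups[-1][1].append(child.text)
    return groups


for label, values in group_dd_by_dt(ET.fromstring(SOURCE)):
    print(f"{label}: {', '.join(values)}")
```

Because groups are kept as a list rather than keyed by label text, duplicate <dt> labels would stay separate.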
One possible solution:
dl.xpath('dt').each_with_index do |dt, i|
dds = dt.xpath("following-sibling::dd[not(../dt[#{i + 2}]) or " +
"following-sibling::dt[1]=../dt[#{i + 2}]]")
puts "#{dt.text}: #{dds.map(&:text).join(', ')}"
end
This relies on a value comparison of dt elements and will fail when there are duplicates. The following (much more complicated) expression does not depend on unique dt values:
following-sibling::dd[not(../dt[$n]) or
(following-sibling::dt[1] and count(following-sibling::dt[1]|../dt[$n])=1)]
Note: Your use of self fails because you're not properly using it as an axis (self::). Also, self always contains just the context node, so it would refer to each dd inspected by the expression, not back to the original dt.
