How to combine two XPath expressions into one?

I can't figure out how to combine these two expressions into a single XPath. Both target elements under the same class, "pagination", and I need to use the result in a loop. I tried them separately like this:
//div[@class='pagination']//a/@href
//div[@class='pagination']//a[contains(@class,'next')]/@href
Elements for the expression:
<div class="pagination"><p><span>Showing</span>1-30
of 483<span>results</span></p><ul><li><span class="disabled">1</span></li><li>2</li><li>3</li><li>4</li><li>5</li><li>Next</li></ul></div>

There are many ways to combine two XPath expressions. One way, for example, is the union operator "|". Whether that is the right operator depends on what you want to achieve. Unfortunately you forgot to tell us that, so it might not be the right operator for your purposes.

Related

Do predicates concatenated in multiple brackets behave the same as the and operator?

I'm writing a DOM-selector-to-XPath converter, and it would suit me very much if I could concatenate multiple predicates like this:
//div[@id][@class]
instead of like this:
//div[@id and @class]
Although a cursory test appears to suggest they behave the same, I'm not entirely sure they will under all circumstances. Will they?
Using multiple predicates is 100% equivalent to using the and operator, provided that neither predicate is positional. A positional predicate is one whose value is numeric, or that explicitly uses position() or last(). For example, *[@x][1] is not the same as *[1][@x].
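The effect of predicate order can be modelled without an XPath engine. The following is only a sketch, assuming a node list represented as a Ruby array of hashes (the `NODES` data and both method names are made up for illustration): each predicate is a filter applied to the result of the previous one, so a positional predicate sees a different list depending on where it sits.

```ruby
# Hypothetical stand-in for a list of sibling elements; :x marks an "x" attribute.
NODES = [{ name: "a" }, { name: "b", x: "1" }, { name: "c", x: "2" }].freeze

# Models *[@x][1]: keep the nodes that have x, then take the first survivor.
def filter_then_first(nodes)
  nodes.select { |n| n.key?(:x) }.first(1)
end

# Models *[1][@x]: take the first node, then keep it only if it has x.
def first_then_filter(nodes)
  nodes.first(1).select { |n| n.key?(:x) }
end

p filter_then_first(NODES).map { |n| n[:name] }  # prints ["b"]
p first_then_filter(NODES).map { |n| n[:name] }  # prints []
```

Two non-positional filters, by contrast, give the same result in either order, which is why chaining them matches the and operator.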

Is there a short and elegant way to write an XPath 1.0 expression to get all HREF values containing at least one of many search values?

I was just wondering if there is a shorter way of writing an XPath query to find all HREF values containing at least one of many search values?
What I currently have is the following:
//a[contains(@href, 'value1') or contains(@href, 'value2')]
But it seems quite ugly, especially if I were to have more values.
First of all, in many cases you have to live with the "ugliness" or long-windedness of expressions if only XPath 1.0 is at your disposal. Elegance is something introduced with version 2.0, I daresay.
But there might be ways to improve your expression: is there a regularity to the href attributes you'd like to find? For instance, if it is sufficient as a rule to say that the href attribute values must start with "value", then the expression could be
//a[starts-with(@href,'value')]
I know that "value1" and "value2" are most probably not your actual attribute values but there might be something else that uniquely identifies the group of a elements you're after. Post your HTML input if this is something you want us to help you with.
Personally, I do not find your expression ugly. There is just one or operator and the expression is quite short and readable. I take "if I were to have more values" to mean that currently there are only two attribute values you are interested in, and that your question is therefore a theoretical one.
If you're using XPath 2 and want exact matches rather than matches that merely contain a search value, you can shorten this to
//a[@href = ('value1', 'value2')]
This syntax wouldn't work for contains(), because the second argument of contains() may only be a single value (or an empty sequence).
In XPath 2 you could also use
//a[some $s in ('value1', 'value2') satisfies contains(@href, $s)]
or
//a[matches(@href, "value1|value2")]
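If only XPath 1.0 is available and the list of values grows, the long or-chain can be generated instead of typed by hand. A minimal sketch in Ruby, assuming the values contain no single quotes (the helper name `href_contains_any` is hypothetical, not part of any library):

```ruby
# Hypothetical helper: builds //a[contains(@href, 'v1') or contains(@href, 'v2') ...]
def href_contains_any(values)
  raise ArgumentError, "need at least one value" if values.empty?

  tests = values.map do |v|
    # Assumption of this sketch: quoting of embedded single quotes is not handled.
    raise ArgumentError, "single quotes are not handled here" if v.include?("'")

    "contains(@href, '#{v}')"
  end
  "//a[#{tests.join(' or ')}]"
end

puts href_contains_any(%w[value1 value2 value3])
# prints //a[contains(@href, 'value1') or contains(@href, 'value2') or contains(@href, 'value3')]
```

The resulting string can be passed to whatever XPath 1.0 engine is in use.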

Regexp in ruby - can I use parenthesis without grouping?

I have a regexp of the form:
/(something complex and boring)?(something complex and interesting)/
I'm interested in the contents of the second parenthesis; the first ones are there only to ensure a correct match (since the boring part might or might not be present but if it is, I'll match it by accident with the regexp for the interesting part).
So I can access the second match using $2. However, for uniformity with other regexps I'm using, I want $1 to contain the contents of the second parenthesis. Is that possible?
Use a non-capturing group:
r = /(?:ab)?(cd)/
This is a general regexp feature, not specific to Ruby. Use /(?:something complex and boring)?(something complex and interesting)/ (note the ?:) to achieve this.
By the way, in Ruby 1.9, you can do /(something complex and boring)?(?<interesting>something complex and interesting)/ and access the group with $~[:interesting] ;)
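As a quick check of both variants, here is a small sketch using the short patterns ab and cd as stand-ins for the "boring" and "interesting" parts (the constant and method names are made up for illustration):

```ruby
# Stand-in patterns: "ab" plays the boring part, "cd" the interesting part.
NON_CAPTURING = /(?:ab)?(cd)/
NAMED = /(?:ab)?(?<interesting>cd)/

# Returns the contents of group 1, or nil when there is no match.
def first_group(text)
  m = NON_CAPTURING.match(text)
  m && m[1]
end

# Returns the contents of the named group, or nil when there is no match.
def interesting(text)
  m = NAMED.match(text)
  m && m[:interesting]
end

puts first_group("abcd")  # prints "cd"
puts first_group("cd")    # prints "cd"
puts interesting("abcd")  # prints "cd"
```

Because the optional prefix no longer captures, the interesting part is group 1 whether or not the prefix is present.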
Yup, use the ?: syntax:
/(?:something complex and boring)?(something complex and interesting)/
I'm not a Ruby developer, but I know other regex flavors, so I bet you can use a non-capturing group:
/(?:something complex and boring)?(something complex and interesting)/
There is only one capturing group, hence $1
HTH
Not really, no. But you can use a named group for uniformity, like this:
/(?<group1>something complex and boring)?(?<group2>something complex and interesting)/
You can change the names (the text in the angle brackets) for the uniformity that you want to achieve. You can then access the groups like this:
string.match(/(?<group1>something complex and boring)?(?<group2>something complex and interesting)/) do |m|
  # Do something with the match; m['group1'] and m['group2'] access the groups
end

Is this XPath technique reliable in all situations?

I am developing an application that accepts user-defined XPath expressions and employs them as part of its runtime operation.
However, I would like to be able to infer some additional data by programmatically manipulating the expression, and I am curious to know whether there are any situations in which this approach might fail.
Given any user-defined XPath expression that returns a node set, is it safe to wrap it in the XPath count() function to determine the number of nodes in the set:
count(user_defined_expression)
Similarly, is it safe to append an array index to the expression to extract one of the nodes in the set:
user_defined_expression[1]
Well, an XPath expression (in XPath 1.0) can yield a node-set, a string, a number, or a boolean, and count(expression) only makes sense for an expression yielding a node-set.
As for adding a positional predicate, you probably want parentheses around the expression, i.e. change /root/foo/bar into (/root/foo/bar)[1]. That selects the first bar element in the node-set selected by /root/foo/bar. Without the parentheses you get /root/foo/bar[1], which selects the first bar child element of every foo child element of the root element.
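The two wrappings can be applied mechanically as string operations. A minimal sketch, assuming the user expression is already known to yield a node-set (the helper names `count_expr` and `nth_of_result` are hypothetical):

```ruby
# Wraps a node-set expression so that it evaluates to the number of nodes.
def count_expr(expr)
  "count(#{expr})"
end

# Parenthesizes the expression first, so the index applies to the whole
# result rather than to the last step of the path.
def nth_of_result(expr, index)
  "(#{expr})[#{Integer(index)}]"
end

puts count_expr("/root/foo/bar")        # prints count(/root/foo/bar)
puts nth_of_result("/root/foo/bar", 1)  # prints (/root/foo/bar)[1]
puts nth_of_result("//a | //b", 1)      # prints (//a | //b)[1]
```

The parentheses also keep a union expression intact, since a bare `//a | //b[1]` would apply the predicate to the second operand only.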
Are you checking that such user-defined expressions always evaluate to node-set?
If yes, the first expression is OK: the datatype will be correct for fn:count.
The second one is a lot trickier. In many situations the predicate binds more tightly than the axis step, for example. Check this answer for a simple analysis. It will be difficult to say what a user really meant.
A more robust approach would be to convert the XPath expression to XQueryX, which is an XML representation of the abstract syntax tree; you can then do XQuery or XSLT transformations on this XML representation, and then convert back to a modified XPath (or XQuery) for evaluation.
However, this will still only give you the syntactic structure of the expression; if you want semantic information, such as the inferred static type of the result, you will probably have to poke inside an XPath processor that exposes this information.

What is the best way to match id's against a regular expression in Hpricot?

Using Hpricot, it is pretty easy to see how I can extract all elements with a given id or class using a CSS selector. Is it possible to extract elements from a document based on whether some attribute of those elements matches a regular expression?
If you mean do something like:
doc.search("//div[@id=/regex/]")
then I don't think it can be done. The alternative is to find all elements and then iterate through the results, deleting those that don't match a regex.
result = doc.search("//div")
result.delete_if { |x| x.to_s !~ /regex/ }
There are lots of alternative approaches. This thread has two other suggestions: Hpricot and Regular Expression.
Note, depending on exactly what it is you are trying to match you may be able to use the "Supported, but different" syntaxes available on the Hpricot Wiki, e.g:
E[@foo$="bar"]
Matches an E element whose "foo" attribute value ends exactly with the string "bar"
