How to match a case insensitive value with XPath - xpath

I have an XPath with which I'm trying to match meta tags that have a name attribute with a value that contains the word 'keyword' irrespective of case. Basically, I'm trying to match:
<meta name="KEYWORDS">
<meta name="Keywords">
<meta name="keywords">
with the XPath
'descendant::meta[contains(lower-case(@name), "keyword")]/@content'
I'm using Scrapy and its built-in Selectors, but when I try this XPath, I get an error "Invalid XPath:...".
What am I doing wrong and what's the right way to do what I want?

Scrapy Selectors are built on the libxml2 library, which implements only XPath 1.0, so XPath 2.0 functions such as lower-case() are not available. The same is true of libxslt.
You can use XPath 1.0 translate() to solve this. In general it will look like:
translate(yourString,
'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
'abcdefghijklmnopqrstuvwxyz')
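Applied to the expression from the question, the translate() approach might be assembled as below. This is a minimal Python sketch that only builds the XPath 1.0 string; the helper names lower_expr and ci_contains are made up for illustration, and it assumes ASCII-only text and a search word without single quotes.

```python
# Alphabets handed to XPath 1.0 translate(); ASCII letters only (assumption).
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"


def lower_expr(operand):
    """Wrap an XPath operand such as '@name' or '.' in a lower-casing translate()."""
    return f"translate({operand}, '{UPPER}', '{LOWER}')"


def ci_contains(operand, needle):
    """Build a case-insensitive contains() test (hypothetical helper)."""
    if "'" in needle:
        raise ValueError("needle must not contain a single quote")
    # The needle is lower-cased here so it matches the lower-cased operand.
    return f"contains({lower_expr(operand)}, '{needle.lower()}')"


# XPath 1.0 replacement for the lower-case() version in the question.
META_KEYWORDS_XPATH = (
    f"descendant::meta[{ci_contains('@name', 'keyword')}]/@content"
)
print(META_KEYWORDS_XPATH)
```

The resulting string can then be handed to any XPath 1.0 engine, for example the xpath() method of a Scrapy selector.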

Related

xpath - how to use contains to filter the xpath?

I got this xpath
//td[@id='datepicker-54143-1002-23']/button/span
The numbers 54143-1002 will change; 23 is the date. How can I use contains() to build the XPath? I would like it to match on "datepicker" and "23" in this case.
You can try
//td[starts-with(@id, 'datepicker-') and ends-with(@id, '-23')]/button/span
But note that ends-with() is only available from XPath 2.0.
If ends-with() doesn't work for you, as @JaSON noted in his answer, it's because you're using XPath 1.0; ends-with() is available only in XPath 2.0.
Try this instead:
//td[starts-with(@id,'datepicker-') and substring(@id,string-length(@id)-2)='-23']
Please note that if "23" changes to a string of a different length, you'll have to adjust the "-2" part accordingly.
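To avoid adjusting the offset by hand, the substring() test can be generated from the suffix itself. This is a rough Python sketch that only builds the XPath 1.0 string; ends_with_expr and datepicker_xpath are hypothetical helper names, and the values are assumed to contain no single quotes.

```python
def ends_with_expr(operand, suffix):
    """Build an XPath 1.0 stand-in for ends-with(operand, suffix)."""
    if not suffix or "'" in suffix:
        raise ValueError("suffix must be non-empty and free of single quotes")
    # substring() is 1-based, so the last N characters start at length - (N - 1).
    offset = len(suffix) - 1
    start = f"string-length({operand})"
    if offset:
        start += f" - {offset}"
    return f"substring({operand}, {start}) = '{suffix}'"


def datepicker_xpath(day):
    """XPath for the date cell from the question, for a given day string."""
    suffix = f"-{day}"
    return (
        "//td[starts-with(@id, 'datepicker-') and "
        f"{ends_with_expr('@id', suffix)}]/button/span"
    )


print(datepicker_xpath("23"))
print(datepicker_xpath("5"))
```

With a one-digit day the generated offset becomes 1 instead of 2, which is the adjustment described above.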

XPath concat function does not work

I am trying to get the first and second <td> of every <tr> in this table with XPath, but I am doing something wrong. It returns [INVALID XPATH EXPRESSION]
//table[@id='thetable']/tbody/tr/concat(td[1],'-',td[2])
Try:
string-join(//table[@id='thetable']/tbody/tr/td[position() = (1 to 2)]/string(), "-")
Using concat() on the right hand side of "/" requires an XPath 2.0 engine. The error message suggests you are trying to run this using an XPath 1.0 engine. The string-join version also needs XPath 2.0.
In fact any expression that returns a sequence of strings is going to need XPath 2.0 because the XPath 1.0 type system doesn't have any such data type.
If you want an XPath 2.0 implementation that runs in the browser you could try Saxon-JS. (In fact that will give you XPath 3.1).
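If only an XPath 1.0 engine is available, one common workaround is to select the rows with XPath and build the strings in the host language. Here is a minimal Python sketch using the standard library's ElementTree, which supports only a small XPath subset; the sample markup and the function name first_two_cells are invented for illustration, and the input is assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET

# Invented sample table; real HTML would need an HTML-aware parser.
SAMPLE = """
<html><body>
<table id="thetable"><tbody>
<tr><td>a</td><td>b</td><td>x</td></tr>
<tr><td>c</td><td>d</td><td>y</td></tr>
</tbody></table>
</body></html>
"""


def first_two_cells(markup):
    """Return 'td1-td2' for every row of the table with id 'thetable'."""
    root = ET.fromstring(markup.strip())
    # The path selects the rows; the concatenation happens in Python.
    rows = root.findall(".//table[@id='thetable']/tbody/tr")
    joined = []
    for row in rows:
        cells = row.findall("td")
        if len(cells) >= 2:
            joined.append(f"{cells[0].text}-{cells[1].text}")
    return joined


print(first_two_cells(SAMPLE))
```

This yields one string per row, which is the sequence-of-strings result that XPath 1.0 cannot return on its own.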

How do I write a CSS selector that looks for an element starting with text in a case-insensitive way?

I'm using Rails 5.0.1 with Nokogiri. How do I select a CSS element whose text starts with a certain string in a case insensitive way? Right now I can search for something in a case-sensitive way using
doc.css("#select_id option:starts-with('ABC')")
but I would like to know how to disregard case when looking for an option that starts with certain text?
Summary: It's ugly. You're better off just using Ruby:
doc.css('select#select_id > option').select{ |opt| opt.text =~ /^ABC/i }
Details
Nokogiri uses libxml2, which uses XPath to search XML and HTML documents. Nokogiri transforms ~CSS expressions into XPath. For example, for your ~CSS selector, this is what Nokogiri actually searches for:
Nokogiri::CSS.xpath_for("#select_id option:starts-with('ABC')")
#=> ["//*[@id = 'select_id']//option[starts-with(., 'ABC')]"]
The expression you wrote is not actually CSS. There is no :starts-with() pseudo-class in CSS, not even proposed in Selectors 4. What there is is the starts-with() function in XPath, and Nokogiri is (somewhat surprisingly) allowing you to mix XPath functions into your CSS and carrying them over to the XPath it uses internally.
The libxml2 library is limited to XPath 1.0, and in XPath 1.0 case-insensitive searches are done by translating all characters to lowercase. The XPath expression you'd want is thus:
//select[@id='select_id']/option[starts-with(translate(.,'ABC','abc'),'abc')]
(Assuming you only care about those characters!)
I'm not sure that you CAN write CSS+XPath in a way that Nokogiri would produce that expression. You'd need to use the xpath method and feed it that query.
Finally, you can create your own custom CSS pseudo-classes and implement them in Ruby. For example:
class MySearch
  def insensitive_starts_with(nodes, str)
    nodes.find_all{ |n| n.text =~ /^#{Regexp.escape(str)}/i }
  end
end
doc.css( "select#select_id > option:insensitive_starts_with('ABC')", MySearch.new )
...but all this gives you is re-usability of your search code.

Test cases of XPath 2.0 expressions that can check whether a parser supports XPath 2.0

I am trying to test whether a parser supports XPath 2.0. Can someone give me some basic XPath 2.0 expressions that will pass if the parser supports XPath 2.0 and fail if it only supports 1.0?
They do not have to be fancy, just 4-5 basic XPath expressions.
XSLT provides system-property('xsl:version') for identifying version information. Unfortunately, XPath has no such function to distinguish between XPath 1.0 vs XPath 2.0. You can probe the library by calling a function that's only defined in XPath 2.0 and see if it fails. Here are a few such functions:
current-dateTime()
lower-case(), upper-case(), ends-with()
matches(), replace(), tokenize()
Wikipedia has a more extensive list of library XPath functions, grouped by category, and marked per XPath 1.0 vs XPath 2.0.
The simplest I can think of are
()
1 eq 1
abs(3)
''''
*:x

xpath to check '#' present

I want to write an XPath expression to check whether a node contains '#'
<node1>
<node11>Some text</node11>
<node11>#2o11 PickMe</node12>
</node1>
I want to write an XPath like "//node11[contains(,'#\d+')]". What's the correct way to check for #?
The correct XPath expression is:
//node11[contains(., '#')]
In your XML, the closing tag of the second subnode should be </node11> instead of </node12>.
If you are using xpath 2.0 you should be able to use something like:
"//node11[matches(.,'#\d+')]"
However, if you aren't using 2.0 you won't have regex support directly. If you are using 1.0 then you won't be able to match using \d+. But this will work:
"//node11[contains(.,'#')]"
Or even:
"//node11[starts-with(.,'#')]"
Use:
/*/node11[contains(., '#')]
Note: It is recommended to avoid using the // pseudo-operator because this most often leads to very slow evaluation of the XPath expression.
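If the '#\d+' pattern matters and only XPath 1.0 is available, the regex part can be done in the host language after selecting the nodes. Here is a minimal Python sketch with the standard library; the sample XML is the corrected version of the question's snippet with an invented value, and the function name nodes_with_hash_number is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Corrected sample: both children are closed with </node11>.
SAMPLE = (
    "<node1>"
    "<node11>Some text</node11>"
    "<node11>#2011 PickMe</node11>"
    "</node1>"
)

HASH_NUMBER = re.compile(r"#\d+")


def nodes_with_hash_number(markup):
    """Return the text of every node11 whose own text matches '#' plus digits."""
    root = ET.fromstring(markup)
    return [
        el.text
        for el in root.iter("node11")
        # el.text is only the text before any child element; may be None.
        if HASH_NUMBER.search(el.text or "")
    ]


print(nodes_with_hash_number(SAMPLE))
```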
