CQ JCR XPath query contains square bracket - xpath

I have properties in the repository with values that contain double square brackets.
I'd like to find these using a JCR XPath query (assuming it's possible).
I've tried the following in the Query tool in CRXDE, but there are "no results to display":
/jcr:root/content//*[jcr:contains(., '[[')] order by @jcr:score
Do I have to escape these characters and if so how?
Thanks, Rick.

Text.escapeIllegalXpathSearchChars(searchTerm) should do the trick for the jcr:contains() term.
See https://wiki.apache.org/jackrabbit/EncodingAndEscaping
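Here is a minimal Java sketch of the escaping idea. It assumes the search term comes from user input. The helper escapeContainsTerm is hypothetical and is not Jackrabbit's Text.escapeIllegalXpathSearchChars, and the list of characters it backslash-escapes is an assumption. It also doubles single quotes, because the term ends up inside a '...' XPath string literal. Whether a node then matches still depends on how the repository's full-text analyzer indexes the brackets.

```java
public class JcrContainsQuery {
    // ASSUMPTION: characters treated here as full-text query operators.
    // This is a hypothetical list, not Jackrabbit's own escaping rules.
    private static final String ASSUMED_SPECIAL = "!():^[]{}?";

    // Hypothetical helper (not Text.escapeIllegalXpathSearchChars):
    // 1) backslash-escapes each assumed operator character,
    // 2) doubles single quotes so the term fits inside an XPath '...' literal.
    public static String escapeContainsTerm(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (ASSUMED_SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString().replace("'", "''");
    }

    // Wraps the escaped term in the query from the question.
    public static String buildQuery(String term) {
        return "/jcr:root/content//*[jcr:contains(., '" + escapeContainsTerm(term)
            + "')] order by @jcr:score";
    }

    public static void main(String[] args) {
        // Prints: /jcr:root/content//*[jcr:contains(., '\[\[')] order by @jcr:score
        System.out.println(buildQuery("[["));
    }
}
```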

Related

XPath expression from a website to extract price information

I am trying to extract the price from this type of website using XPath, but I don't have any experience. Using an add-on I got the following XPath expression, which is not working:
//div[@class='teaser--product-prices public-product']/div[@class='ui-table' and 1]/div[@class='ui-table-cell' and 1]/div[@class='teaser--product-final-price large sell-price' and 1]
The website also uses a dot (.) as the thousands separator and doesn't use a dot (.) for decimals, so I would really appreciate a way to remove the first dot and add one for the decimals through the XPath expression.
The expression is to be used in the Content Egg WordPress plugin to feed price information to a website.
The website is https://www.public-cyprus.com.cy/product/tileoraseis/tileoraseis/tileorasi-samsung-65-smart-8k-qled-qe65q950t/prod10634476pp/
You could use the following XPath expression :
concat(translate(//div[@class="product-main-container product-page"]/@data-price,".",""),".",translate(//div[@class="product-main-container product-page"]/@data-decimals,"€",""))
We use the concat function to join the results of two XPath expressions. In the first, we remove the thousands-separator dot with the translate function; in the second, we remove the euro symbol, also with translate. The decimal separator is added during the concat step.
Output :
6599.00
Side note: this might not work, since I don't know whether your plugin supports these XPath functions.
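To check the expression outside the plugin, the sketch below evaluates it with the JDK's built-in XPath 1.0 engine. It assumes a well-formed snippet with hypothetical data-price and data-decimals values chosen to mirror the output above; the live page is HTML and may need an HTML-tolerant parser instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PriceXPath {
    // The answer's expression, unchanged apart from Java string escaping
    // (the escape in the last translate() call is the euro sign).
    public static final String EXPR =
        "concat(translate(//div[@class=\"product-main-container product-page\"]/@data-price,\".\",\"\"),"
        + "\".\","
        + "translate(//div[@class=\"product-main-container product-page\"]/@data-decimals,\"\u20AC\",\"\"))";

    // HYPOTHETICAL markup: attribute values are assumptions, not data from the real page.
    public static final String SAMPLE = "<html><body>"
        + "<div class=\"product-main-container product-page\""
        + " data-price=\"6.599\" data-decimals=\"00\u20AC\"/>"
        + "</body></html>";

    // Parses well-formed markup and evaluates EXPR as a string.
    public static String extractPrice(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(EXPR, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractPrice(SAMPLE)); // 6599.00
    }
}
```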

elasticsearch - fulltext search for words with special/reserved characters

I am indexing documents that may contain any special/reserved characters in their fulltext body. For example
"PDF/A is an ISO-standardized version of the Portable Document Format..."
I would like to be able to search for pdf/a without having to escape the forward slash.
How should I analyze my query string, and what type of query should I use?
The default standard analyzer will tokenize a string like that so that "PDF" and "A" are separate tokens. The "A" token might get removed by the stop token filter (see Standard Analyzer). So without a custom analyzer, you will typically match any document that contains just "PDF".
You can try creating your own analyzer, modeled on the standard analyzer, that includes a Mapping Char Filter. The idea would be that "PDF/A" gets transformed into something like "pdf_a" at both index and query time, so a simple match query will work just fine. But this is a very simplistic approach: you might want to consider how '/' characters are used in your content and use slightly more complex regex filters, which are not perfect solutions either.
Sorry, I completely missed your point about having to escape the character. Can you elaborate on your use case if this turns out to not be helpful at all?
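A toy Java model of the char-filter idea (not Elasticsearch code): charFilter stands in for a mapping char filter with the single assumed rule / => _, and the regex split is a crude stand-in for a tokenizer plus lowercase filter. Because the same analysis runs on the document text and on the query, "pdf/a" becomes the single token pdf_a on both sides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SlashAnalyzerSketch {
    // Stand-in for a mapping char filter with the single rule "/ => _".
    public static String charFilter(String text) {
        return text.replace("/", "_");
    }

    // Crude stand-in for tokenizer + lowercase filter: split on any run of
    // characters that are not letters, digits or underscores.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : charFilter(text).toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}_]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> doc = analyze("PDF/A is an ISO-standardized version of the Portable Document Format...");
        List<String> query = analyze("pdf/a");
        System.out.println(query);                  // [pdf_a]
        System.out.println(doc.containsAll(query)); // true
    }
}
```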
To support queries containing reserved characters, I now use the Simple Query String Query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html)
Because it does not use the full query parser, it is a bit limited (e.g. no field queries like id:5), but it serves the purpose.

Is it safe to concatenate two XPath 1.0 queries?

If I have two XPath queries where the second one is meant to further drill down into the result of the first, can I safely let my script combine them into a single query by...
placing parentheses around the first query,
prefixing the second query with a slash, and then
simply concatenating the two strings?
Context
The concrete use case that sparked this question involves extracting information from XML/XHTML documents according to externally supplied pairs of "CSS selector + attribute name", using XPath behind the scenes.
For example the script may get the following as input:
selector: a#home, a.chapter
attribute: href
It then compiles the selector to an XPath query using the HTML::Selector::XPath Perl module, and the attribute by simply prefixing an @ ... which in this case would yield:
XPath query 1: //a[@id='home'] | //a[contains(concat(' ', @class, ' '), ' chapter ')]
XPath query 2: @href
And then it repeatedly passes those queries to libxml2's XPath engine to extract the requested information (in this example, a list of URLs) from the XML documents in question.
It works, but I would prefer to combine the two queries into a single one, which would simplify the code for invoking them and reduce the performance overhead:
XPath query: (//a[@id='home'] | //a[contains(concat(' ', @class, ' '), ' chapter ')])/@href
(note the added parentheses and slash)
But is this safe to do programmatically, for arbitrary input queries?
In general, no, you can't concatenate two arbitrary XPath expressions in this way, especially not in XPath 1.0. It's easy to find counter-examples: in XPath 1.0 you can't even have a union expression on the RHS of '/', so concatenating "/a" and "(b|c)" would fail.
In XPath 2.0, the result will always be syntactically valid, but it may contain type errors, e.g. if the expressions are "count(a)" and "b": the LHS operand of "/" must evaluate to a sequence of nodes.
Sure, this should work. However, you will always have to respect the correct context. If the elements in your example in the first query have no href attribute, you will get an empty result set.
Also, you will have to take care of e.g. a leading slash in front of your second query, so that you don't end up with a descendant-or-self axis step (//), which might not be what you want. Apart from that, this should always work. The worst that can happen is that it is not logically correct (i.e. you don't get the expected result), but it should always be valid XPath.
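Here is a minimal Java sketch of the combining step, using the JDK's XPath 1.0 engine as a stand-in for libxml2 (the question uses Perl, so this is an illustration, not the asker's code). The combine helper and the sample markup are hypothetical. As the first answer notes, the scheme only holds when the second query is usable as a step after "/", such as @href.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CombineXPath {
    public static final String FIRST =
        "//a[@id='home'] | //a[contains(concat(' ', @class, ' '), ' chapter ')]";
    public static final String SECOND = "@href";

    // Hypothetical sample document.
    public static final String SAMPLE = "<html><body>"
        + "<a id='home' href='index.html'>Home</a>"
        + "<a class='chapter first' href='ch1.html'>1</a>"
        + "<a class='appendix' href='appx.html'>A</a>"
        + "</body></html>";

    // Hypothetical helper: "(" + first + ")/" + second, exactly as in the question.
    public static String combine(String first, String second) {
        return "(" + first + ")/" + second;
    }

    // Evaluates a node-set expression and returns each node's value
    // (for attribute nodes, the attribute value), in document order.
    public static List<String> values(String expr, String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getNodeValue());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String query = combine(FIRST, SECOND);
        System.out.println(query);
        System.out.println(values(query, SAMPLE)); // [index.html, ch1.html]
    }
}
```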

Finding an exact match of an inner text using XPath

What XPath syntax can I use to find an anchor tag whose inner text is exactly "abc"? The closest I can get to this is:
SelectSingleNode(".//a[starts-with(., \"abc\")]");
I couldn't find any "equals" function to use.
Try the following
SelectSingleNode("//a[.='abc']");
// usually means you intend to search the whole tree; why would you add . before that?
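To see the difference between the two predicates, here is a small sketch evaluated with the JDK's XPath engine rather than .NET's SelectSingleNode. The sample document is hypothetical. starts-with() also matches the longer text, while .='abc' compares the element's whole string value.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExactTextMatch {
    // Hypothetical sample: one anchor whose text merely starts with "abc",
    // and one whose text is exactly "abc".
    public static final String SAMPLE = "<root><a>abcdef</a><a>abc</a></root>";

    // Returns how many nodes the given expression selects in SAMPLE.
    public static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
            Double n = (Double) XPathFactory.newInstance().newXPath()
                .evaluate("count(" + expr + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//a[starts-with(., 'abc')]")); // 2
        System.out.println(count("//a[.='abc']"));               // 1
    }
}
```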

Trouble using Xpath "starts with" to parse xhtml

I'm trying to parse a webpage to get posts from a forum.
The start of each message starts with the following format
<div id="post_message_somenumber">
and I only want to get the first one
I tried xpath='//div[starts-with(@id, '"post_message_')]' in yql without success.
I'm still learning this; does anyone have suggestions?
I think I have a solution that does not require dealing with namespaces.
Here is one that selects all matching div elements:
//div[@id[starts-with(.,"post_message")]]
But you said you wanted just the "first one" (I assume you mean the first "hit" in the whole page?). Here is a slight modification that selects just the first matching result:
(//div[@id[starts-with(.,"post_message")]])[1]
These use the dot to represent the id's value within the starts-with() function. You may have to escape special characters in your language.
It works great for me in PowerShell:
# Load a sample xml document
$xml = [xml]'<root><div id="post_message_somenumber"/><div id="not_post_message"/><div id="post_message_somenumber2"/></root>'
# Run the xpath selection of all matching div's
$xml.selectnodes('//div[@id[starts-with(.,"post_message")]]')
Result:
id
--
post_message_somenumber
post_message_somenumber2
Or, for just the first match:
# Run the xpath selection of the first matching div
$xml.selectnodes('(//div[@id[starts-with(.,"post_message")]])[1]')
Result:
id
--
post_message_somenumber
I tried xpath='//div[starts-with(@id, '"post_message_')]' in yql without success I'm still learning this, anyone have suggestions
If the problem isn't due to the many nested apostrophes and the unclosed double-quote, then the most likely cause (we can only guess without being shown the XML document) is that a default namespace is used.
Specifying names of elements that are in a default namespace is the most frequently asked XPath question. If you search for "XPath default namespace" on SO or elsewhere on the internet, you'll find many sources with the correct solution.
Generally, a special method must be called that binds a prefix (say "x:") to the default namespace. Then, in the XPath expression, every element name "someName" must be replaced by "x:someName".
Here is a good answer on how to do this in C#.
Read the documentation of your language/XPath engine to see how something similar should be done in your specific environment.
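As one example of the prefix-binding approach described above, here is a Java sketch using javax.xml.xpath (not the YQL engine from the question). The XHTML sample and the prefix name "x" are assumptions. The unprefixed //div finds nothing because the divs are in the default namespace; binding "x" and writing //x:div finds them.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DefaultNamespaceXPath {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical XHTML sample: every element is in the default namespace.
    public static final String SAMPLE = "<html xmlns='" + XHTML_NS + "'><body>"
        + "<div id='post_message_1'>first</div>"
        + "<div id='other'>skip</div>"
        + "<div id='post_message_2'>second</div>"
        + "</body></html>";

    // Binds the prefix "x" to the namespace the document uses as its default.
    static final NamespaceContext XHTML_PREFIX = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "x".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return XHTML_NS.equals(namespaceURI) ? "x" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return XHTML_NS.equals(namespaceURI)
                ? Collections.singletonList("x").iterator()
                : Collections.<String>emptyIterator();
        }
    };

    // Evaluates expr against SAMPLE, optionally with the "x" prefix bound.
    public static Object eval(String expr, boolean bindPrefix, QName type) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // keep namespace information in the DOM
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            if (bindPrefix) {
                xpath.setNamespaceContext(XHTML_PREFIX);
            }
            return xpath.evaluate(expr, doc, type);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed name test: only matches <div> in no namespace, so 0.0 here.
        System.out.println(eval("count(//div[starts-with(@id,'post_message_')])", false, XPathConstants.NUMBER));
        // Prefixed name test: matches both post_message divs, so 2.0.
        System.out.println(eval("count(//x:div[starts-with(@id,'post_message_')])", true, XPathConstants.NUMBER));
        // First match only: post_message_1
        System.out.println(eval("string((//x:div[starts-with(@id,'post_message_')])[1]/@id)", true, XPathConstants.STRING));
    }
}
```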
@FindBy(xpath = "//div[starts-with(@id,'expiredUserDetails') and contains(text(), 'Details')]")
private WebElementFacade ListOfExpiredUsersDetails;
This one matches all div elements on the page whose id starts with expiredUserDetails and whose text contains Details.
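The locator string can be checked outside Selenium/Serenity with the JDK's own XPath 1.0 engine; the sample markup below is hypothetical. Note the third div: in XPath 1.0, contains(text(), 'Details') converts the text() node-set to the string value of its first node only, so a div whose first text node is "Name" is not matched even though "Details" appears later.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FindByXPathCheck {
    // The XPath from the @FindBy annotation.
    public static final String LOCATOR =
        "//div[starts-with(@id,'expiredUserDetails') and contains(text(), 'Details')]";

    // Hypothetical markup chosen to exercise both conditions.
    public static final String SAMPLE = "<body>"
        + "<div id='expiredUserDetails1'>Details</div>"
        + "<div id='expiredUserDetails2'><b>User</b>Details</div>"
        + "<div id='expiredUserDetails3'>Name<br/>Details</div>"
        + "<div id='activeUserDetails'>Details</div>"
        + "</body>";

    // Returns the id of every element LOCATOR selects, in document order.
    public static List<String> matchedIds() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(LOCATOR, doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // expiredUserDetails3 is skipped: its first text node is "Name".
        System.out.println(matchedIds()); // [expiredUserDetails1, expiredUserDetails2]
    }
}
```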
