Special characters in Solr filter fq

I'm trying to filter with fq for fields having special characters, particularly parentheses. For example, given the document:
<result name="response" numFound="1" start="0">
<doc>
<arr name="town_snc">
<str>Hartford (Connecticut)</str>
</arr>
</doc>
</result>
I want to do e.g. ?fq:town_snc=Hartford (Connecticut)
I'm not getting any results; I presume that the parentheses need to be escaped, but I was not able to find the escaping method.
Thank you!

The "field" query parser lets you avoid escaping altogether:
fq={!field f=town_snc}Hartford (Connecticut)
Or you can use the normal Lucene query parser with double quotes (you must then still escape a few things, such as quotes inside the value):
fq=town_snc:"Hartford (Connecticut)"
Or you could use backslash escaping too (just remember to also escape the space).
http://wiki.apache.org/solr/SolrQuerySyntax
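The backslash option can be sketched as a small helper. This is only an illustration, assuming the usual list of Lucene query parser special characters; the names SOLR_SPECIAL and solr_escape are made up for this example and are not part of Solr or any client library.

```ruby
# Hypothetical helper: backslash-escape characters that the Lucene/Solr
# query parser treats specially, plus whitespace (as the answer notes,
# the space must be escaped too when no quotes are used).
SOLR_SPECIAL = /([+\-!(){}\[\]^"~*?:\\\/\s]|&&|\|\|)/

def solr_escape(value)
  # Prefix every special character (or && / ||) with a backslash.
  value.gsub(SOLR_SPECIAL) { |m| "\\#{m}" }
end

escaped = solr_escape('Hartford (Connecticut)')
fq = "town_snc:#{escaped}"
puts fq
```

The resulting fq value still has to be URL-encoded before it goes into a request URL.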

Related

How do I "regex-quote" a string in XPath

I have an XSL template that takes 2 parameters (text and separator) and calls tokenize($text, $separator).
The problem is, the separator is supposed to be a regex. I have no idea what I get passed as separator string.
In Java I would call Pattern.quote(separator) to get a pattern that matches the exact string, no matter what weird characters it might contain, but I could not find anything like that in XPath.
I could iterate through the string and escape any character that I recognize as a special character with regard to regex.
I could build an iteration with substring-before and do the tokenize that way.
I was wondering if there is an easier, more straightforward way?

You could escape your separator tokens using the XPath replace function to find any character that requires escaping, and precede each occurrence with a \. Then you could pass such an escaped token to the XPath tokenize function.
Alternatively you could just implement your own tokenize stylesheet function, and use the substring-before and substring-after functions to extract substrings, and recursion to process the full string.
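The recursive alternative can be sketched as follows. It is written in Ruby rather than XSLT purely to show the shape of the recursion: String#partition stands in for the substring-before / substring-after pair, and literal_tokenize is a hypothetical name. The separator is treated as plain text and never as a regex.

```ruby
# Recursive literal tokenizer (illustrative sketch).
def literal_tokenize(text, separator)
  return [text] if separator.empty?            # nothing to split on
  head, sep, tail = text.partition(separator)  # head ~ substring-before, tail ~ substring-after
  return [head] if sep.empty?                  # separator not found: this is the last token
  [head] + literal_tokenize(tail, separator)   # recurse on the remainder
end

p literal_tokenize('a.*b.*c', '.*')
```

Here the separator '.*' would match everything if it were interpreted as a regex; treated literally, it simply splits the three letters apart.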

What characters are never used in xpath?

I'm trying to build a DSL which will contain a number of XPaths as parameters. I'm new to XPath, and I need a character which is never used in the XPath syntax so I can delimit n number of XPaths on a single line of a script. My question: what characters are NOT part of the XPath syntax?

The null character.
Seriously. Because an XPath is supposed to support any XML document, it must be capable of matching text nodes that contain any allowed Unicode character. However, XML disallows one character: the null character.
OK, that is not entirely true, but it is the simplest answer. XML 1.1 allows control characters (as character references), with the exception of Unicode NULL (U+0000). Under the XML 1.0 production for Char, however, there are a few other characters you can choose from: surrogate code points (as characters in their own right, not as correctly encoded octets representing a non-BMP character), and anything below 0x20 except line feed, carriage return and tab.
Another good guess is any Private Use character, since it is unlikely to appear in your input documents. However, this is not guaranteed, and you asked for "never".

I'm trying to build a DSL which will contain a number of XPaths as parameters.
Well, many people use XML for DSLs, and this is how you would do it in XML:
<paths>
<path>/a/b/c/d</path>
<path>/w/x/y/z</path>
</paths>
So how do we reconcile this with the fact that "<" can appear in an XPath expression? Answer: if it does appear, we escape it:
<paths>
<path>/a/b/c/d[e &lt; 3]</path>
<path>/w/x/y/z[v &lt; 2]</path>
</paths>
So: don't try to find a character that can't appear in an XPath expression. Use a character that can appear, and escape it if it does.
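A minimal sketch of that approach: generate the XML container and escape the two characters that cannot appear literally in element text. The helper names xml_escape_text and paths_document are invented for this illustration; a real project would more likely rely on an XML library for serialization.

```ruby
# Escape the characters that are not allowed as-is in XML element text.
# '&' must be handled first so the '&' in '&lt;' is not escaped again.
def xml_escape_text(s)
  s.gsub('&', '&amp;').gsub('<', '&lt;')
end

# Wrap each XPath expression in a <path> element inside one <paths> root.
def paths_document(xpaths)
  items = xpaths.map { |x| "  <path>#{xml_escape_text(x)}</path>" }
  (['<paths>'] + items + ['</paths>']).join("\n")
end

puts paths_document(['/a/b/c/d[e < 3]', '/w/x/y/z[v < 2]'])
```

Any XML parser reading this document hands back the original expressions with "<" restored, so the DSL never needs a delimiter that XPath cannot contain.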

What does Elasticsearch's auto_generate_phrase_queries do?

In the docs for query string query, auto_generate_phrase_queries is listed as a parameter but the only description is "defaults to false." So what does this parameter do exactly?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html

This maps directly to Lucene's org.apache.lucene.queryparser.classic.QueryParserSettings#autoGeneratePhraseQueries. When the analyzer applied to the query string splits a single unquoted piece of text into several tokens, this setting lets Lucene generate a quoted phrase query from them rather than separate keyword queries.
Quoting:
SOLR-2015: Add a boolean attribute autoGeneratePhraseQueries to TextField.
autoGeneratePhraseQueries="true" (the default) causes the query parser to
generate phrase queries if multiple tokens are generated from a single
non-quoted analysis string. For example WordDelimiterFilter splitting text:pdp-11
will cause the parser to generate text:"pdp 11" rather than (text:PDP OR text:11).
Note that autoGeneratePhraseQueries="true" tends to not work well for non whitespace
delimited languages.
where the word delimiter behaves as documented in WordDelimiterFilter.html.
The important part is "single non-quoted analysis string": the setting only matters when the text in your query string is not quoted. If you are already searching for a quoted phrase, it has no effect.
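The difference can be mimicked with a toy model. This is not Lucene or Elasticsearch code; build_clause is a hypothetical function, and tokens stands for whatever the analyzer produced from one unquoted chunk of the query string (for example pdp-11 split by a word delimiter filter).

```ruby
# Toy model of the parser's choice between a phrase query and an OR of terms.
def build_clause(field, tokens, auto_generate_phrase_queries:)
  # A single token is always a plain term query.
  return "#{field}:#{tokens.first}" if tokens.length == 1

  if auto_generate_phrase_queries
    %(#{field}:"#{tokens.join(' ')}")                            # phrase query
  else
    '(' + tokens.map { |t| "#{field}:#{t}" }.join(' OR ') + ')'  # separate terms
  end
end

tokens = %w[pdp 11]
puts build_clause('text', tokens, auto_generate_phrase_queries: true)
puts build_clause('text', tokens, auto_generate_phrase_queries: false)
```

The first call prints a phrase query on "pdp 11", the second an OR of the two terms, which mirrors the pdp-11 example in the quoted change note.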

Spring Expression Language equivalent for \r, \n, \t etc

I am using Spring Integration. I get a string (payload) like below:
<Element>
<Sub-Element>5</Sub-Element>
</Element>
I need to test if above string starts with <Element><Sub-Element> which is actually <Element>\r\n <Sub-Element>.
<int:recipient-list-router id="customRouter" input-channel="routingChannel">
<int:recipient channel="channel1" selector-expression="payload.startsWith('<Element><Sub-Element>')"/>
<int:recipient channel="channel2" selector-expression="!payload.startsWith('<Element><Sub-Element>')"/>
</int:recipient-list-router>
Ideally the first router should pass the test, but in this case it's failing. Can anyone help me find out what the SpEL equivalent of \r, \n etc. is?

SpEL doesn't have escapes for those, but you can use regular expressions to do the selection...
<recipient selector-expression="payload matches '<Element>\r\n<Sub-Element>.*'" channel="channel1"/>
<recipient selector-expression="!(payload matches '<Element>\r\n<Sub-Element>.*')" channel="channel2"/>
If you are not familiar with regex, the .* at the end matches anything (hence this regex is the equivalent of startsWith()).
EDIT:
While this will work, I feel I should point out that relying on specific values in insignificant white space in XML documents is brittle - if the client changes to use, say \n instead, or even no whitespace, your application will break. You should consider using something like an <int-xml:xpath-router/> instead.

Thanks Gary.
So the working list-recipient-router looks like
Either
<recipient selector-expression="payload matches '(?s)<Element>(\s*)<Sub>(.*)'" channel="channel1"/>
<recipient selector-expression="!(payload matches '(?s)<Element>(\s*)<Sub>(.*)')" channel="channel2"/>
Or
<recipient selector-expression="payload matches '(?s)<Element>(.*)<Sub>(.*)'" channel="channel1"/>
<recipient selector-expression="!(payload matches '(?s)<Element>(.*)<Sub>(.*)')" channel="channel2"/>
The capture groups () are optional; both variants work.
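Why the (?s) flag matters can be shown outside SpEL. The sketch below uses Ruby, where the /m option plays the role of Java's (?s) (it lets . match newlines), and \A ... \z imitates the whole-string semantics of matches. The payload value is an assumed sample modelled on the question.

```ruby
# Sample payload with CRLF line breaks and indentation, as in the question.
payload = "<Element>\r\n  <Sub-Element>5</Sub-Element>\r\n</Element>"

# Without dot-all, '.' stops at "\n", so the pattern cannot reach the end.
without_dotall = /\A<Element>\s*<Sub-Element>.*\z/
# With /m (Ruby's counterpart of Java's (?s)), '.' also matches newlines.
with_dotall = /\A<Element>\s*<Sub-Element>.*\z/m

puts without_dotall.match?(payload)
puts with_dotall.match?(payload)
```

Using \s* between the two tags also keeps the test working if the line break changes to \n or disappears.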

Another way instead of escaping regex patterns?

Usually when my regex patterns look like this:
http://www.microsoft.com/
Then I have to escape them like this:
string.match(/http:\/\/www\.microsoft\.com\//)
Is there another way instead of escaping it like that?
I want to be able to use the pattern as written, http://www.microsoft.com, because I don't want to escape all the special characters in all my patterns.

Regexp.new(Regexp.quote('http://www.microsoft.com/'))
Regexp.quote simply escapes any characters that have special regexp meaning; it takes and returns a string. Note that . is also special. After quoting, you can append to the regexp as needed before passing to the constructor. A simple example:
Regexp.new(Regexp.quote('http://www.microsoft.com/') + '(.*)')
This adds a capturing group for the rest of the path.

You can also use arbitrary delimiters in Ruby for regular expressions by using %r and defining a character before the regular expression, for example:
%r!http://www.microsoft.com/!

Regexp.quote or Regexp.escape can be used to automatically escape things for you:
https://ruby-doc.org/core/Regexp.html#method-c-escape
The result can be passed to Regexp.new to create a Regexp object, and then you can call the object's .match method and pass it the string to match against (the opposite order from string.match(/regex/)).

You can also pass a single-quoted string, which saves you from escaping the slashes:
string.match('http://www.microsoft.com/')
Note that String#match converts the string to a Regexp, so . and other metacharacters still have their regex meaning. You can also use %q{} if you need single quotes in the text itself. If you need variables interpolated inside the string, use %Q{}, which is equivalent to double quotes ".
If you are writing an actual pattern with regex syntax (e.g. .*?()[]^$), use // or %r{}.

For convenience I just define
def regexcape(s)
  Regexp.new(Regexp.escape(s))
end
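A short sketch of what the escaping buys you, using only Regexp.escape and Regexp.new from the answers above; the variable names and the test strings are made up for the illustration.

```ruby
url = 'http://www.microsoft.com/'

literal = Regexp.new(Regexp.escape(url))  # every '.' must be a real dot
loose   = Regexp.new(url)                 # '.' still means "any character"

lookalike = 'http://wwwXmicrosoftXcom/'

puts literal.match?(url)        # the exact URL matches
puts literal.match?(lookalike)  # the lookalike is rejected
puts loose.match?(lookalike)    # the unescaped pattern accepts it
```

The unescaped version is the same pattern that string.match('http://www.microsoft.com/') would build, which is why escaping matters whenever the text should be matched literally.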
