Case-insensitive substring LDAP search on OpenLDAP 2.4.33 - filter

The current question is not the same as this one.
I have an LDAP entry with the content "This is a SimpleTest indeed" in the "supName" field.
I need to write a filter so that when the user of my software introduces any substring of this content in any case (upper, lower or mixed case), it finds the entry. It must work even if the user does not input a complete word ("impletes", for example).
The supName field follows the DirectoryString syntax. This means that the default matching rule is exact and case-sensitive ("caseExactMatch"). But in theory, this syntax should also allow the "caseIgnoreMatch" and "caseIgnoreSubstringsMatch" matching rules. I thought I just needed to force the use of the last one ("caseIgnoreSubstringsMatch"), so I tried this filter:
(supName:caseIgnoreSubstringsMatch:=*impletes*)
But this does not work. I run my tests using Apache Directory Studio, and that tool refuses to accept the above filter. It complains about the asterisks, and I don't understand why, since I am using a substring match (so asterisks should be allowed). If I run the filter from the command line (using ldapsearch), I get this error message:
ldap_search_ext: Bad search filter (-7)
Therefore this is not an issue with Apache Directory Studio.
So my question is: What is the correct way of defining a case-insensitive substring filter on a field that is case-sensitive by default?
Further tests:
What follows are some other filters I have tested, and the reasons they do not suit me.
Test #1 filter:
(supName=*impleTes*)
This operator (=) returns my test entry, but it's not case-insensitive. If I replace "impleTes" with "impletes", it does not return anything.
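To make the gap between Test #1 and the behaviour I'm after concrete, here is a rough Python model of how a substring assertion such as *impleTes* is evaluated under a case-exact versus a case-ignore substrings rule. It is only a sketch: substring_match is a made-up helper, and a real server also applies RFC 4518 string preparation (for example, insignificant-space handling), which is not modelled here.

```python
SUPNAME = "This is a SimpleTest indeed"


def substring_match(value: str, pattern: str, ignore_case: bool) -> bool:
    """Evaluate an LDAP-style substring pattern like '*impleTes*' (simplified).

    The pattern is split on '*' into an initial piece, any middle pieces and
    a final piece, which must appear in the value in that order.
    Only case folding is modelled, not full RFC 4518 string preparation.
    """
    if "*" not in pattern:
        raise ValueError("a substring pattern needs at least one '*'")
    if ignore_case:
        value, pattern = value.casefold(), pattern.casefold()
    pieces = pattern.split("*")
    initial, middle, final = pieces[0], pieces[1:-1], pieces[-1]

    if not value.startswith(initial):
        return False
    pos = len(initial)
    for piece in middle:
        idx = value.find(piece, pos)  # each "any" piece must follow the previous one
        if idx < 0:
            return False
        pos = idx + len(piece)
    # The final piece must end the value without overlapping what was matched.
    return value.endswith(final) and len(value) - len(final) >= pos


print(substring_match(SUPNAME, "*impleTes*", ignore_case=False))  # True  (Test #1)
print(substring_match(SUPNAME, "*impletes*", ignore_case=False))  # False (case-exact)
print(substring_match(SUPNAME, "*impletes*", ignore_case=True))   # True  (what I want)
```

The last call is the behaviour caseIgnoreSubstringsMatch should give; the open question is how to request it in an OpenLDAP filter.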
Test #2 filter:
(supName~=simpletest)
This operator (~=) works, but:
It needs a complete word. If I replace "simpletest" with "impletes", it does not return anything.
As it is an "approximate" search operator, it may return unwanted results. For example, the filter above returns also a second entry: "This is a SimpleTast indeed" (notice the "a" instead of "e" in "SimpleTast"). I don't want approximate results.
Test #3 filter:
(supName:caseIgnoreMatch:=this is a simpletest indeed)
This returns the entry I was expecting, and only that entry. It is also case-insensitive. But it forces the user to write the whole content of the field: it is not a substring search, but a case-insensitive exact-match search.
Test #4 filter:
(supName:caseIgnoreMatch:=*impletes*)
This returns a "Bad search filter (-7)" error, which is expected since I am not allowed to use substring syntax with an exact matching rule.
And finally, the Test #5 filter:
(supName:caseIgnoreSubstringsMatch:=*impletes*)
Which I expected to work, but returns a "Bad search filter (-7)" error.
Additional info - Opposite example
I have found here (see the "Extensible Matching" section at the end) examples of the opposite case. In the example, the field "sn" uses the "caseIgnoreMatch" matching rule by default (making it case-insensitive). So what they want in the example is to do a case-sensitive substring search. This is the filter they use:
(sn:caseExactSubstringMatch:=*S*)
But I doubt if this example is correct, because if I try exactly the same filter on my side:
(supName:caseExactSubstringMatch:=*S*)
I get a "Bad search filter (-7)" error.
So maybe my issue is due to a limitation of OpenLDAP 2.4.33 and the filter would work with other LDAP servers, although the example comes from a guide that is supposed to cover OpenLDAP 2.x (?)

If I'm reading RFC 4515 §3 correctly, an extensible match can only be done with an assertion value (read: a fixed string) and not with a substring filter. If that were permitted, I would expect your original example (supName:caseIgnoreSubstringsMatch:=*impletes*) to work.
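A rough Python sketch of that reading, assuming only the value part of an extensible item needs checking: RFC 4515 defines it as an assertionvalue, whose unescaped characters may not include '*', so a value like *impletes* is rejected before any matching rule is considered. An escaped \2a is accepted, but it stands for a literal asterisk, not a wildcard. The function names are hypothetical.

```python
# Characters RFC 4515 does not allow unescaped in an assertion value
# (its "normal" rule excludes NUL, '(', ')', '*' and '\').
EXCLUDED = {"\x00", "(", ")", "*", "\\"}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def is_assertion_value(value: str) -> bool:
    """True if value is a valid RFC 4515 assertionvalue."""
    i = 0
    while i < len(value):
        if value[i] == "\\":
            # An escape must be a backslash followed by exactly two hex digits.
            pair = value[i + 1:i + 3]
            if len(pair) != 2 or not set(pair) <= HEX_DIGITS:
                return False
            i += 3
        elif value[i] in EXCLUDED:
            return False
        else:
            i += 1
    return True


def extensible_item_ok(item: str) -> bool:
    """Check only the value part of an extensible item '(attr:rule:=value)'.

    The attribute description and rule name are not validated.
    """
    if not (item.startswith("(") and item.endswith(")")) or ":=" not in item:
        return False  # not an extensible match item at all
    _, value = item[1:-1].split(":=", 1)
    return is_assertion_value(value)


for f in [
    "(supName:caseIgnoreMatch:=this is a simpletest indeed)",  # Test #3: accepted
    "(supName:caseIgnoreSubstringsMatch:=*impletes*)",         # Test #5: rejected
    r"(supName:caseIgnoreMatch:=\2aimpletes\2a)",              # accepted, but literal '*'
]:
    print(f, extensible_item_ok(f))
```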

Another option is
(Attribute~=value)
-- http://publib.boulder.ibm.com/tividd/td/IBMDS/IDSprogref52/en_US/HTML/progref.htm#HDRLDAPSRCH

Related

How do I escape the word "And" in Elasticsearch if I want to search by the literal "And"?

I'm trying to search over an index that includes constellation code names, and the code name for the Andromeda constellation is And.
Unfortunately, if I search using And, all results are returned. This is the only one that doesn't work, across dozens of constellation code names, and I assume it's because it's interpreted as the logical operator AND.
(constellation:(And)) returns my entire result set, regardless of the value of constellation.
Is there a way to fix this without doing tricks like indexing with an underscore in front?
Thanks!
I went for a bit of a hack, indexing the constellation as __Foo__ and then changing my search query accordingly by adding the __ prefix and suffix to the selected constellation.
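The workaround amounts to applying one wrapping function on both sides, at index time and at query time. A minimal Python sketch, assuming the query is sent as a query-string clause shaped like the one in the question; wrap_code and constellation_query are hypothetical names, not Elasticsearch APIs.

```python
# Hypothetical helpers for the "__Foo__" workaround: the same wrapping is
# applied at index time and at query time, so a code such as "And" is never
# sent to the query parser as a bare word.
MARKER = "__"


def wrap_code(code: str) -> str:
    """Value to store in the constellation field when indexing."""
    return f"{MARKER}{code}{MARKER}"


def constellation_query(code: str) -> str:
    """Query-string clause in the same shape as (constellation:(And))."""
    return f"constellation:({wrap_code(code)})"


print(wrap_code("And"))            # __And__
print(constellation_query("And"))  # constellation:(__And__)
```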

Matching two words as a single word

Consider that I have a document which has a field with the following content: 5W30 QUARTZ INEO MC 3 5L
A user wants to be able to search for MC3 (no space) and get the document; however, search for MC 3 (with spaces) should also work. Moreover, there can be documents that have the content without spaces and that should be found when querying with a space.
I tried indexing without spaces (e.g. 5W30QUARTZINEOMC35L), but that does not really work as using a wildcard search I would match too much, e.g. MC35 would also match, and I only want to match two exact words concatenated together (as well as exact single word).
So far I'm thinking of additionally indexing all combinations of two words, e.g. 5W30QUARTZ, QUARTZINEO, INEOMC, MC3, 35L. However, does Elasticsearch have a native solution for this?
I'm pretty sure what you want can be done with the shingle token filter. Depending on your mapping, I would imagine you'd need to add a filter looking something like this to your content field to get your tokens indexed in pairs:
"filter_shingle": {
    "type": "shingle",
    "max_shingle_size": 2,
    "min_shingle_size": 2,
    "output_unigrams": "true"
}
Note that this is also already the default configuration, I just added it for clarity.
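To see which tokens a two-word shingle filter yields for the example field, here is a small Python model of unigrams plus adjacent bigrams, assuming a plain whitespace split before the filter. One hedged caveat: Elasticsearch's shingle filter joins words using its token_separator setting, which defaults to a single space, so with the configuration above the bigram would likely be indexed as "MC 3"; producing "MC3" probably also needs "token_separator": "". The sketch joins with an empty separator for that reason; shingle_tokens is a hypothetical name.

```python
def shingle_tokens(text: str, separator: str = "") -> list[str]:
    """Unigrams plus two-word shingles, in position order.

    Roughly mimics min_shingle_size=2, max_shingle_size=2 and
    output_unigrams=true on top of a whitespace split. The empty
    separator is an assumption: it is what turns "MC 3" into "MC3".
    """
    words = text.split()
    tokens = []
    for i, word in enumerate(words):
        tokens.append(word)  # the unigram
        if i + 1 < len(words):
            tokens.append(separator.join((word, words[i + 1])))  # the bigram
    return tokens


print(shingle_tokens("5W30 QUARTZ INEO MC 3 5L"))
```

Because only adjacent pairs are emitted, MC3 is indexed but MC35 is not, and a query for "MC 3" analysed with the same filter also produces MC3.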

Getting an exact match to the string `#deprecated` in Kibana/ELK

I'm using Kibana to find all logs containing an exact match of the string #deprecated.
For a reason I don't understand, it also matches strings containing the word "deprecated" without the # sign.
I tried escaping the # according to the Lucene documentation, i.e. message:"\\#deprecated", without any change in the results.
How can I query for an exact match of #deprecated only?
Why is this happening?
Your problem isn't an issue with query syntax, which is what escaping is for; it's with analysis. Your analyzer removes punctuation, because it's parsing the field as full text. It removes # in much the same way that it removes periods and commas.
So, after analysis (assuming standard analysis) of something like "Class is #deprecated", the generated token stream will contain the following tokens: "class", "deprecated" ("is" is a stop word). The indexed forms of "#deprecated" and "deprecated" are identical, so as the data is currently indexed, no query can differentiate between them.
To fix this, you would have to change your analyzer. WhitespaceAnalyzer may be a good choice, and should fix this issue. However, be careful that you aren't doing more harm than good. If you use WhitespaceAnalyzer, you are going to have to contend with other punctuation as well, and a search for "sentence" would not find "match at the end of this sentence.", because of the period. So, if you are searching full text, this will certainly cause far more problems than it solves.
By the way, if you want to know the full rules of standard analysis, it is an implementation of the UAX #29 word boundaries.
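The difference between the two analyses can be seen with a toy Python model. This is an approximation under stated assumptions, not Lucene's implementation: the stop-word list is made up, and the "standard-like" tokenizer simply keeps runs of letters and digits instead of applying the UAX #29 rules.

```python
import re

# Assumed, deliberately tiny stop-word list for the illustration.
STOP_WORDS = {"a", "at", "is", "of", "the", "this"}


def standardish(text: str) -> list[str]:
    """Crude stand-in for standard analysis: keep runs of letters/digits,
    lowercase them and drop stop words. Punctuation such as '#' or '.'
    is discarded (real UAX #29 rules are richer)."""
    tokens = [w.lower() for w in re.findall(r"[^\W_]+", text)]
    return [t for t in tokens if t not in STOP_WORDS]


def whitespace(text: str) -> list[str]:
    """Whitespace analysis: split on whitespace only, keep case and punctuation."""
    return text.split()


print(standardish("Class is #deprecated"))  # ['class', 'deprecated']
print(whitespace("Class is #deprecated"))   # ['Class', 'is', '#deprecated']
print(whitespace("match at the end of this sentence."))  # last token is 'sentence.'
```

The first call shows why "#deprecated" and "deprecated" become indistinguishable; the last shows the trailing-period problem that whitespace analysis introduces.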

How to use a DN containing commas as the attribute value in an LDAP search filter?

I was attempting to search our directory based on an attribute whose value is a DN. However, our user RDNs are of the form CN=Surname, GivenName, which requires that the comma be quoted in the full DN. So given an attribute like manager, whose value is the DN of another user, I was unable to search for all users having a specific manager. I tried (manager=CN=Surname\, GivenName,CN=users,DC=mydomain,DC=com), but got a "Bad search filter" syntax error. I tried various options for quoting the DN, but all of them either gave me a syntax error or failed to match any objects. What am I doing wrong?
(Note that if I were looking for user objects directly, I could search for simply (CN=Surname, GivenName), with no quoting required, but I was searching for users having a specific manager. The comma-containing attribute value only becomes a problem when part of a Distinguished Name.)
The problem is that quoting the comma in the Common Name is not for the benefit of the filter parser, but for the benefit of the DN parser; the attribute value passed to that by the filter has to literally contain the backslash character. Unfortunately, the backslash is also (differently) special in LDAP filters, thus the syntax errors.
The solution is simple, but it isn't as obvious as doubling the backslash; backslash in LDAP filters works like % in URIs, so you have to use a literal backslash followed by the 2-digit hexadecimal code point for a backslash:
(manager=CN=Surname\5c, Givenname,OU=org,DC=mydomain,DC=com)
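The same escaping can be done programmatically before a value is placed in a filter. A minimal Python sketch (escape_filter_value is a hypothetical helper name) that replaces each character RFC 4515 treats as special with a backslash and two hex digits; if python-ldap is available, its ldap.filter.escape_filter_chars serves the same purpose.

```python
# RFC 4515 special characters and their "\XX" hex escapes.
FILTER_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\x00": r"\00"}


def escape_filter_value(value: str) -> str:
    """Escape a string for use as an assertion value in an LDAP filter."""
    return "".join(FILTER_ESCAPES.get(ch, ch) for ch in value)


# The DN already contains "\," (DN-level escaping); the filter must carry
# that backslash literally, so it becomes "\5c,".
manager_dn = r"CN=Surname\, GivenName,CN=users,DC=mydomain,DC=com"
print(f"(manager={escape_filter_value(manager_dn)})")
# (manager=CN=Surname\5c, GivenName,CN=users,DC=mydomain,DC=com)
```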
It turns out there's an example of this specific use case at the very bottom of https://docs.oracle.com/cd/E19424-01/820-4811/gdxpo/index.html#6ng8i269q.

Is there a way to search FHIR resources on a text search parameter using wildcards?

I'm trying to search for all Observations where "blood" is associated with the code using:
GET [base]/Observation?code:text=blood
It appears that the search is matching Observations where the associated text starts with "blood" but not matching on associated text that contains "blood".
Using the following, I get results with a Coding.display of "Systolic blood pressure" but I'd like to also get these Observations by searching using the text "blood".
GET [base]/Observation?code:text=sys
Is there a different modifier I should be using or wildcards I should use?
The servers seem to do as the spec requests: when using the modifier :text on a token search parameter (like code here), the spec says:
":text The search parameter is processed as a string that searches
text associated with the code/value"
If we look at how a server is supposed to search a string, we find:
"By default, a field matches a string query if the value of the field
equals or starts with the supplied parameter value, after both have
been normalized by case and accent."
Now, if code were a true string search parameter, we could apply the :contains modifier. However, modifiers cannot be stacked, so code:text:contains, while logical, is not part of the current specification.
So, I am afraid that there is currently no "standard" way to do what you want.
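To illustrate why "sys" matches and "blood" does not, here is a small Python model of the default string matching quoted above (equals or starts with, after case and accent normalization), next to a contains-style check. The function names are hypothetical, accent folding is approximated by stripping Unicode combining marks, and real servers may tokenize display text differently.

```python
import unicodedata


def normalize(s: str) -> str:
    """Approximate case and accent normalization: decompose (NFD), drop
    combining marks, then case-fold."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def default_string_match(field: str, param: str) -> bool:
    # Spec default: the field equals or starts with the parameter value.
    return normalize(field).startswith(normalize(param))


def contains_match(field: str, param: str) -> bool:
    # What a :contains-style search would do instead.
    return normalize(param) in normalize(field)


display = "Systolic blood pressure"
print(default_string_match(display, "sys"))    # True
print(default_string_match(display, "blood"))  # False
print(contains_match(display, "blood"))        # True
```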
