Why is this space breaking my text filter?
I have JSON logs that start like this, and I want to be able to use the value of this field to determine the destination of the log file output.
{"LogType": "Status",
This works, meaning logs match the filter and I'm getting output in my target file:
if ($hostname contains "10.1.2.3" and $rawmsg-after-pri startswith '{\"LogType\":') then {
action(type="omfile" dynaFile="myFile" template="myTemplate") }
This does not work, meaning no messages are matched, after adding a space into the startswith filter:
if ($hostname contains "10.1.2.3" and $rawmsg-after-pri startswith '{\"LogType\": "') then {
action(type="omfile" dynaFile="myFile" template="myTemplate") }
For now I will filter for just "status" with the quotes around it, but I really want to filter on both the field and value together to be sure it's not picking up the value of some other field. Beyond that, I also wanted to use the startswith filter over the contains filter for performance reasons, but the same issue occurs if I try to use a contains filter that includes the space.
Secondarily, if anyone cares to indulge, I don't understand why $rawmsg-after-pri startswith worked, but $msg and $rawmsg didn't, completely regardless of the space issue:
$rawmsg-after-pri startswith '{'
vs.
$msg startswith '{'
I've spent hours searching and reading the documentation. At first, I thought it was something about escaping characters, or maybe it was using regex.
So I tried every possible combination of these, and more:
'{\"LogType\": \"Status\"'
"{\"LogType\": \"Status\""
"{\\\"LogType\\\":\\s\\\"Status\\\""
Finally I went back to the beginning and tested all of these until I realized it's the space character causing the issue.
matched: '{'
matched: '{\"'
matched: '{\"LogType'
matched: '{\"LogType\"'
matched: '{\"LogType\":'
UN-matched: '{\"LogType\": '
Edit: also, I can confirm it is not a tab. This is the character set making it through the working filters: {"LogType": "Status"
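One way to double-check which whitespace character actually follows the colon is to dump the code points of a sample line. This is only a diagnostic sketch: `describe_chars` is a hypothetical helper, and the sample string with a no-break space is made up for illustration; it is not taken from the real logs.

```python
import unicodedata

def describe_chars(text):
    """Return (character, code point, Unicode name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]

# Hypothetical sample: replace it with the start of a real log line.
sample = '{"LogType":\u00a0"Status"'

# Inspect the colon, the character after it, and the opening quote.
for ch, code, name in describe_chars(sample[10:13]):
    print(repr(ch), code, name)
```

If the middle character is reported as anything other than U+0020 SPACE, a plain space in the filter string would not match it.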
Related
I'm trying to search over an index that includes constellation code names, and the code name for the Andromeda constellation is And.
Unfortunately, if I search using And, all results are returned. This is the only one that doesn't work, across dozens of constellation code names, and I assume it's because it's interpreted as the logical operator AND.
(constellation:(And)) returns my entire result set, regardless of the value of constellation.
Is there a way to fix this without doing tricks like indexing with an underscore in front?
Thanks!
I went for a bit of a hack, indexing the constellation as __Foo__ and then changing my search query accordingly by adding the __ prefix and suffix to the selected constellation.
Consider that I have a document which has a field with the following content: 5W30 QUARTZ INEO MC 3 5L
A user wants to be able to search for MC3 (no space) and get the document; however, search for MC 3 (with spaces) should also work. Moreover, there can be documents that have the content without spaces and that should be found when querying with a space.
I tried indexing without spaces (e.g. 5W30QUARTZINEOMC35L), but that does not really work as using a wildcard search I would match too much, e.g. MC35 would also match, and I only want to match two exact words concatenated together (as well as exact single word).
So far I'm thinking of additionally indexing all combinations of two words, e.g. 5W30QUARTZ, QUARTZINEO, INEOMC, MC3, 35L. However, does Elasticsearch have a native solution for this?
I'm pretty sure what you want can be done with the shingle token filter. Depending on your mapping, I would imagine you'd need to add a filter looking something like this to your content field to get your tokens indexed in pairs:
"filter_shingle":{
"type":"shingle",
"max_shingle_size":2,
"min_shingle_size":2,
"output_unigrams":"true"
}
Note that this is also already the default configuration, I just added it for clarity.
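To see what such a filter would put in the index, here is a rough Python approximation. It is a sketch, not Elasticsearch code: `shingles` is a hypothetical helper that assumes whitespace tokenization and lowercasing. As far as I know, the shingle filter joins tokens with a single space by default (its `token_separator` setting), so the separator probably has to be set to an empty string for MC3 to become a token.

```python
def shingles(text, size=2, separator=" ", output_unigrams=True):
    """Approximate a shingle filter over lowercased, whitespace-split tokens."""
    tokens = text.lower().split()
    out = []
    for i, tok in enumerate(tokens):
        if output_unigrams:
            out.append(tok)
        # Emit a shingle only when a full window of `size` tokens is available.
        if i + size <= len(tokens):
            out.append(separator.join(tokens[i:i + size]))
    return out

doc = "5W30 QUARTZ INEO MC 3 5L"
with_space = shingles(doc)               # assumed default: tokens joined by a space
no_space = shingles(doc, separator="")   # tokens concatenated directly

print(with_space)
print(no_space)
print("mc3" in with_space, "mc3" in no_space)
```

With the empty separator, `mc3` is produced but `mc35` is not, which matches the requirement of only joining two adjacent words.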
I'm using Kibana to find all logs containing an exact match of the string #deprecated.
For a reason I don't understand, it also matches strings containing the word "deprecated" without the # sign.
I tried to use escaping for # according to the Lucene Documentation. i.e. message:"\\#deprecated" - without change in results.
How can I write a query that matches only the exact text #deprecated?
Why is this happening?
Your problem isn't an issue with query syntax, which is what escaping is for; it's with analysis. Your analyzer removes punctuation because it's parsing the field as full text. It removes #, in much the same way that it removes periods and commas.
So, after analysis (assuming standard analysis) of something like: "Class is #deprecated" the token stream generated will have the following tokens: "class", "deprecated" ("is" is a stop word). The indexed form of "#deprecated" and "deprecated" are identical, so it is impossible to have a query that can differentiate between them as it is currently indexed.
To fix this you would have to change your analyzer. WhitespaceAnalyzer may be a good choice, and should fix this issue. However, be careful you aren't doing more harm than good. If you use WhitespaceAnalyzer, you are going to have to contend with other punctuation as well, and a search for "sentence" would not find "match at the end of this sentence.", because of the period. So, if you are searching full text, this will certainly cause far more problems than it solves.
If you want to know the full rules of standard analysis, by the way, it's an implementation of the UAX #29 word boundary rules.
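The difference can be illustrated with a rough Python approximation. This is a sketch only: `standard_like` and `whitespace_like` are hypothetical stand-ins, not the real Lucene analyzers, and stop-word removal is not modeled.

```python
import re

def standard_like(text):
    # Rough stand-in for standard analysis: keep runs of word characters, lowercase them.
    return [t.lower() for t in re.findall(r"\w+", text)]

def whitespace_like(text):
    # Rough stand-in for whitespace analysis: split on whitespace only, keep case.
    return text.split()

sample = "Class is #deprecated"
print(standard_like(sample))    # the '#' is dropped
print(whitespace_like(sample))  # '#deprecated' survives as one token

# The downside: trailing punctuation also stays attached to the token.
print(whitespace_like("match at the end of this sentence."))
```

In the last line the final token is `sentence.` with the period attached, so a search for `sentence` would miss it.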
The current question is not the same as this one.
I have an LDAP entry which has the content "This is a SimpleTest indeed" in the "supName" field.
I need to write a filter so that when the user of my software enters any substring of this content in any case (upper, lower or mixed case), it finds the entry. It must work even if the user does not input a complete word ("impletes", for example).
The supName field follows DirectoryString syntax. This means that the default matching rule is exact and case sensitive ("caseExactMatch"). In theory, though, this syntax should also allow the "caseIgnoreMatch" and "caseIgnoreSubstringsMatch" matching rules. I thought I just needed to force the use of the last one ("caseIgnoreSubstringsMatch"), so I tried this filter:
(supName:caseIgnoreSubstringsMatch:=*impletes*)
But this does not work. I run my tests using Apache Directory Studio, and that tool refuses to accept the above filter. It complains about the asterisks, and I don't understand why, since I am using a substring match (and thus asterisks should be allowed). If I run the filter from the command line (using ldapsearch), I get this error message:
ldap_search_ext: Bad search filter (-7)
Therefore this is not an issue with Apache Directory Studio.
So my question is: What is the correct way of defining a case-insensitive substring filter on a field that is case-sensitive by default?
Further tests:
What follows are some other filters I have tested, and the reasons they do not suit me.
Test #1 filter:
(supName=*impleTes*)
This operator (=) returns my test entry, but it's not case-insensitive. If I replace "impleTes" with "impletes" it does not return anything.
Test #2 filter:
(supName~=simpletest)
This operator (~=) works, but:
It needs a complete word. If I replace "simpletest" with "impletes" it does not return anything.
As it is an "approximate" search operator, it may return unwanted results. For example, the filter above also returns a second entry: "This is a SimpleTast indeed" (notice the "a" instead of "e" in "SimpleTast"). I don't want approximate results.
Test #3 filter:
(supName:caseIgnoreMatch:=this is a simpletest indeed)
This returns the entry I was expecting, and only that entry. It is also case-insensitive. But it forces the user to write the whole content of the field: it is not a substring search, but a case-insensitive exact-match search.
Test #4 filter:
(supName:caseIgnoreMatch:=*impletes*)
This returns a "Bad search filter (-7)" error, which is expected since I am not allowed to use substring syntax in an exact matching rule.
And finally, the Test #5 filter:
(supName:caseIgnoreSubstringsMatch:=*impletes*)
Which I expected to work, but returns a "Bad search filter (-7)" error.
Additional info - Opposite example
I have found here (see the "Extensible Matching" section at the end) examples of the opposite case. In the example, the field "sn" uses the "caseIgnoreMatch" matching rule by default (making it case-insensitive). So what they want in the example is to do a case-sensitive substring search. This is the filter they use:
(sn:caseExactSubstringMatch:=*S*)
But I doubt this example is correct, because if I try exactly the same filter on my side:
(supName:caseExactSubstringMatch:=*S*)
I get a "Bad search filter (-7)" error.
So maybe my issue is due to a limitation in OpenLDAP 2.4.33 and the filter would work with other LDAP servers, although the example comes from a guide that is supposed to cover OpenLDAP 2.x ... (?)
If I'm reading RFC 4515 §3 correctly, an extensible match can only be done with an assertion value (read: a fixed string) and not with a substring filter. If that were permitted, I would expect your original example (supName:caseIgnoreSubstringsMatch:=*impletes*) to work.
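As a quick sanity check before sending a filter to the server, something like the following could flag the problematic combination. It is a minimal sketch under my reading of the RFC: `mixes_extensible_and_substring` is a hypothetical helper, not part of any LDAP library, and it only handles simple single-item filters of the form (attr:rule:=value).

```python
import re

# Matches only the simple form "(attr:matchingRule:=value)".
EXTENSIBLE = re.compile(r"^\((?P<attr>[\w-]+):(?P<rule>[\w.-]+):=(?P<value>.*)\)$")

def mixes_extensible_and_substring(ldap_filter):
    """True if an extensible-match filter carries an unescaped '*' in its value."""
    m = EXTENSIBLE.match(ldap_filter)
    # A literal asterisk would be escaped as \2a, so any raw '*' is a wildcard attempt.
    return bool(m) and "*" in m.group("value")

print(mixes_extensible_and_substring("(supName:caseIgnoreSubstringsMatch:=*impletes*)"))
print(mixes_extensible_and_substring("(supName:caseIgnoreMatch:=this is a simpletest indeed)"))
print(mixes_extensible_and_substring("(supName=*impleTes*)"))
```

Only the first filter is flagged, which lines up with the tests in the question that returned "Bad search filter (-7)".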
Another option is
(Attribute~=value)
-- http://publib.boulder.ibm.com/tividd/td/IBMDS/IDSprogref52/en_US/HTML/progref.htm#HDRLDAPSRCH
Is it possible to find (search) in Dynamics AX 2009 for an exact match?
For example, when I am searching in the AOT for "AddressRelationship", I don't want to see DirPartyAddressRelationship in the results.
Okay, it took me a while, but I have figured this out: it is possible.
Adding a breakpoint to the find form shows that it uses a class called SysUtilScanSource to find your string within the AX source code.
In SysUtilScanSource.do() the method match is used to find a match against the specific source code. You can read more about match here:
http://msdn.microsoft.com/en-us/library/aa886279(v=ax.10).aspx
The match method allows you to use expressions.
The expression you require is as follows:
:SPACE
Where SPACE is the character ' '. Sets the match to blanks, tabulations, and control characters such as Enter (new line).
For example:
match("ab: cd","ab cd"); //returns 1
match("ab: cd","ab\ncd"); //returns 1
match("ab: cd","ab\tcd"); //returns 1
match("ab: cd","ab  cd"); //returns 0 - two spaces in the text, but only the first space is matched
Therefore, in your example you need to enter the following string in the "containing text" field:
: AddressRelationship:
Note that in the above string there are spaces in the following locations;
:SPACEAddressRelationship:SPACE
Try it. I did, it works a treat.
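For readers more familiar with regular expressions, the same idea can be sketched in Python. This is an analogy only: Python's `\s` stands in for the X++ `:SPACE` expression, and the two source lines are made up for illustration.

```python
import re

# Require whitespace on both sides of the name, like ":SPACE" before and after it.
pattern = re.compile(r"\sAddressRelationship\s")

lines = [
    "public AddressRelationship find()",   # hypothetical source line
    "DirPartyAddressRelationship rel;",    # hypothetical source line
]

hits = [line for line in lines if pattern.search(line)]
print(hits)
```

Only the first line is kept, because in the second one the name is preceded by a letter rather than whitespace. In this Python version, a name at the very start or end of a line would also be missed, since there is no whitespace character next to it.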
When you do the find, look at "properties" tab at the end of the find form window. This allows you to scale down the search based on properties. I do not believe there is a way to use an exact match but you can narrow your search down using the properties.