So I have a certain query running in (Yelp's) Elastalert and I am trying to filter out logs containing one of several keywords. If I use the any rule type, I get a set of 30 matches to the certain query I have. When I change the ruletype to whitelist:
type: whitelist
compare_key: message
ignore_null: true
whitelist: ["exclude_strings"...]
I still get the same 30 matches, even when I know the message field contains the listed strings. I've also tried changing the compare key or the strings, using strings that exactly match the entire field, I've changed the formatting to
whitelist:
- "string"
...
and nothing has made a difference. The same thing happens also with the blacklist type.
What am I missing?
After further testing, it turns out that either of the above formats will work correctly. The reason I thought it was not working is that I was looking at the hits term in the Elastalert status. Instead I should have been looking at the matches term. The search returned the same number of hits because the query was the same each time, but it seems that the matches term comes, not from ElasticSearch, but from Elastalert itself.
That is, Elastalert sends the full query to ElasticSearch, and then does the filtering on the returned data based on the whitelist terms. hits will be the same every time, but matches depends on the whitelist. If you set realert to zero, you will see that the number of alerts generated is the same as the number of matches.
Related
Consider that I have a document which has a field with the following content: 5W30 QUARTZ INEO MC 3 5L
A user wants to be able to search for MC3 (no space) and get the document; however, search for MC 3 (with spaces) should also work. Moreover, there can be documents that have the content without spaces and that should be found when querying with a space.
I tried indexing without spaces (e.g. 5W30QUARTZINEOMC35L), but that does not really work as using a wildcard search I would match too much, e.g. MC35 would also match, and I only want to match two exact words concatenated together (as well as exact single word).
So far I'm thinking of additionally indexing all combinations of two words, e.g. 5W30QUARTZ, QUARTZINEO, INEOMC, MC3, 35L. However, does Elasticsearch have a native solution for this?
I'm pretty sure what you want can be done with the shingle token filter. Depending on your mapping, I would imagine you'd need to add a filter looking something like this to your content field to get your tokens indexed in pairs:
"filter_shingle":{
"type":"shingle",
"max_shingle_size":2,
"min_shingle_size":2,
"output_unigrams":"true"
}
Note that this is also already the default configuration, I just added it for clarity.
I'm trying to search for all Observations where "blood" is associated with the code using:
GET [base]/Observation?code:text=blood
It appears that the search is matching Observations where the associated text starts with "blood" but not matching on associated text that contains "blood".
Using the following, I get results with a Coding.display of "Systolic blood pressure" but I'd like to also get these Observations by searching using the text "blood".
GET [base]/Observation?code:text=sys
Is there a different modifier I should be using or wildcards I should use?
The servers seem to do as the spec requests: when using the modifier :text on a token search parameter (like code here), the spec says:
":text The search parameter is processed as a string that searches
text associated with the code/value"
If we look at how a server is supposed to search a string, we find:
"By default, a field matches a string query if the value of the field
equals or starts with the supplied parameter value, after both have
been normalized by case and accent."
Now, if code would have been a true string search parameter, we could have applied the modifier contains, however we cannot stack modifiers, so in this case code:text:containts would may logical, but is not part of the current specification.
So, I am afraid that there is currently no "standard" way to do what you want.
I'm testing Amazon Cloudsearch for my web application and i'm running into some strange issues.
I have the following domain indexes: name, email, id.
For example, I have data such as: John Doe, John#example.com, 1
When I search for jo I get nothing. If I search for joh I still get nothing, But if I search for john then I get the above document as a hit. Why is it not getting when I put partial strings? I even put suggestors on name and email with fuzzy matching enabled. Is there something else i'm missing? I read the below on this:
http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-text.html
http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching.html
http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-compound-queries.html
I'm doing the searches using boto as well as with the form on AWS page.
What you're trying to do -- finding "john" by searching "jo" -- is a called a prefix search.
You can accomplish this either by searching
(prefix field=name 'jo')
or
q=jo*
Note that if you use the q=jo* method of appending * to all your queries, you may want to do something like q=jo* |jo because john* will not match john.
This can seem a little confusing but imagine if google gave back results for prefix matches: if you searched for tort and got back a mess of results about tortoises and torture instead of tort (a legal term), you would be very confused (and frustrated).
A suggester is also a viable approach but that's going to give you back suggestions (like john, jordan and jostle rather than results) that you would then need to search for; it does not return matching documents to you.
See "Searching for Prefixes in Amazon CloudSearch" at http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-text.html
Are your index field types "Text"? If they are just "Literals", they have to be an exact match.
I think you must have your name and email fields set as the literal type instead of the text type, otherwise a simple text search of 'jo' or 'Joh' should've found the example document.
While using a prefix search may have solved your problem (and that makes sense if the fields are set as the literal type), the accepted answer isn't really correct. The notion that it's "like a google search" isn't based on anything in the documentation. It actually contradicts the example they use, and in general muddies up what's possible with the service. From the docs:
When you search text and text-array fields for individual terms, Amazon CloudSearch finds all documents that contain the search terms anywhere within the specified field, in any order. For example, in the sample movie data, the title field is configured as a text field. If you search the title field for star, you will find all of the movies that contain star anywhere in the title field, such as star, star wars, and a star is born. This differs from searching literal fields, where the field value must be identical to the search string to be considered a match.
The current question is not the same as this one.
I have an LDAP entry which the content "This is a SimpleTest indeed" in the "supName" field.
I need to write a filter so that when the user of my software introduces any substring of this content in any case (upper, lower or mixed case), it finds the entry. It must work even if the user does not input a complete word ("impletes", for example).
The supName field follows DirectoryString syntax. This means that the default matching rule is exact and case sensitive ("caseExactMatch"). But this syntax in theory, should allow also "caseIgnoreMatch" and "caseIgnoreSubstringsMatch" matching rules. I though I just needed to force to use the last one ("caseIgnoreSubstringsMatch"), so I tried this filter:
(supName:caseIgnoreSubstringsMatch:=*impletes*)
But this does not work. I make my tests using Apache Directory Studio, and that tool refuses to accept the above filter. It complains on the asterisks, and I don't understand why, since I am using a Substring match (and thus asterisks should be allowed). If I run the filter from command line (using ldapsearch), I get this error message:
ldap_search_ext: Bad search filter (-7)
Therefore this is not an issue with Apache Directory Studio.
So my question is: What is the correct way of defining a case-insensitive substring filter on a field that is case-sensitive by default?
Futher tests:
What follows are some other filters I have tested, and the reasons they do not suit me.
Test #1 filters:
(supName=*impleTes*)
This operator (=) returns my test entry, but it's not case-insensitive. If I replace "impleTes" for "impletes" it does not return anything.
Test #2 filter:
(supName~=simpletest)
This operator (~=) works, but:
It needs a complete word. If I replace "simpletest" for "impletes" it does not return anything.
As it is an "approximate" search operator, it may return unwanted results. For example, the filter above returns also a second entry: "This is a SimpleTast indeed" (notice the "a" instead of "e" in "SimpleTast"). I don't want approximate results.
Test #3 filter:
(supName:caseIgnoreMatch:=this is a simpletest indeed)
This returns the entry I was expecting, and only that entry. It is also case-insensitive. But it forces the user to write the whole content of the field: it is not a substring search, but a case-insensitive exact-match search.
Test #4 filter:
(supName:caseIgnoreMatch:=*impletes*)
This returns a "Bad search filter (-7)" error, with is expected since I am not allowed to use substring syntax in an exact matching rule.
And finally, the Test #5 filter:
(supName:caseIgnoreSubstringsMatch:=*impletes*)
Which I expected to work, but returns a "Bad search filter (-7)" error.
Additional info - Opposite example
I have found here (see the "Extensible Matching" section at the end) examples of the opposite case. In the example, the field "sn" uses the "caseIgnoreMatch" matching rule by default (making it case-insensitive). So what they want in the example is to do a case-sensitive substring search. This is the filter they use:
(sn:caseExactSubstringMatch:=*S*)
But I doubt if this example is correct, because if I try exactly the same filter on my side:
(supName:caseExactSubstringMatch:=*S*)
I get a "Bad search filter (-7)" error.
So maybe my issue is due to limitation on OpenLDAP 2.4.33 but would work with other LDAP servers, although the example comes from a guide that is supposed to cover OpenLDAP 2.x ... (?)
If I'm reading RFC 4515ยง3 correctly, an extensible match can only be done with an assertion value (read: a fixed string) and not with a substring filter. If that were permitted, I would expect your original example (supName:caseIgnoreSubstringsMatch:=*impletes*) to work.
Other option is
(Attribute~=value)
-- http://publib.boulder.ibm.com/tividd/td/IBMDS/IDSprogref52/en_US/HTML/progref.htm#HDRLDAPSRCH
I want to fetch data from different counters from graphite in one single request like:-
summarize(site.testing_server_2.triggers_unknown.count,'1hour','sum')&format=json
summarize(site.testing_server_2.requests_failed.count,'1hour','sum')&format=json
summarize(site.testing_server_2.core_network_bad_soap.count,'1hour','sum')&format=json
and so on.. 20 more.
But I don't want to fetch
summarize(site.testing_server_2.module_xyz_abc.count,'1hour','sum')&format=json
in that request how can i do that?
This is what I tried:
summarize(site.testing_server_2.*.count,'1hour','sum')&format=json&from=-24hour
It gets json data for 'module_xyz_abc' too, but that i don't want.
You can't use regular expressions per se, but you can use some similar (in concept and somewhat in format) matching techniques available within the Graphite Render URL API. There are a few ways you can "match" within a target's "bucket" (i.e. between the dots).
Target Matching
Asterisk * match
The asterisk can be used to match ANY -zero or more- character(s). It can be used to replace the entire bucket (site.*.test) or within the bucket (site.w*t.test). Here is an example:
site.testing_server_2.requests_*.count
This would match site.testing_server_2.requests_failed.count, site.testing_server_2.requests_success.count, site.testing_server_2.requests_blah123.count, and so forth.
Character range [a-z0-9] match
The character range match is used to match on a single character (site.w[0-9]t.test) in the target's bucket and is specified as a range or list. For example:
site.testing_server_[0-4].requests_failed.count
This would match on site.testing_server_0.requests_failed.count, site.testing_server_1.requests_failed.count, site.testing_server_2.requests_failed.count, and so forth.
Value list (group capture) {blah, test, ...} match
The value list match can be used to match anything in the list of values, in the specified portion of the target's bucket.
site.testing_server_2.{triggers_unknown,requests_failed,core_network_bad_soap}.count
This would match site.testing_server_2.triggers_unknown.count, site.testing_server_2.requests_failed.count, and site.testing_server_2.core_network_bad_soap.count. But nothing else, so site.testing_server_2.module_xyz_abc.count would not match.
Answer
Without knowing all of your bucket values it is difficult to be surgical with the approach (perhaps with a combination of the matching options), so I'll recommend just going with a value list match. This should allow you to get all of the values in one -somewhat long- request. For example (and keep in mind you'd need to include all of your values):
summarize(site.testing_server_2.{triggers_unknown,requests_failed,core_network_bad_soap}.count,'1hour','sum')&format=json&from=-24hour
For more, see Graphite Paths and Wildcards