Case insensitivity does not work - elasticsearch

I can't figure out why my searches are case sensitive. Everything I've read says that ES is case-insensitive by default. I have mappings that specify the standard analyzer for indexing and search, but it seems like some things are still case sensitive, e.g. the wildcard query:
"query": {
"bool": {
"must": [
{
"wildcard": {
"name": {
"value": "Rae*"
}
}
}
]
}
This fails, but "rae*" works as expected. I need to use wildcard for 'starts-with' type searches (I presume).
I'm using NEST from a .NET app and am specifying the analyzers when I create the index thus:
var settings = new IndexSettings();
settings.NumberOfReplicas = _configuration.Replicas;
settings.NumberOfShards = _configuration.Shards;
settings.Add("index.refresh_interval", "10s");
settings.Analysis.Analyzers.Add(new KeyValuePair<string, AnalyzerBase>("keyword", new KeywordAnalyzer()));
settings.Analysis.Analyzers.Add(new KeyValuePair<string, AnalyzerBase>("simple", new SimpleAnalyzer()));
In this case it's using the simple analyzer, but the standard one gives the same result.
The mapping looks like this:
"name": {
  "type": "string",
  "analyzer": "simple",
  "store": "yes"
}
Has anyone got any ideas what's wrong here?
Thanks

From the documentation,
"[The wildcard query] matches documents that have fields matching a wildcard expression (not analyzed)".
Because the search term is not analyzed, you essentially need to run the analysis yourself before generating the search query. In this case, that just means your search term needs to be lowercase. Alternatively, you could use query_string:
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "name:Rae*"
          }
        }
      ]
    }
  }
}
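If you want to keep using wildcard, a lowercased version of your original query (matching the behaviour you already observed with "rae*") would look like this:
"query": {
  "bool": {
    "must": [
      {
        "wildcard": {
          "name": {
            "value": "rae*"
          }
        }
      }
    ]
  }
}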

Related

Elasticsearch wildcard query crashes cluster

I run the query below on a large Elasticsearch cluster. The cluster becomes unresponsive:
{
  "size": 10000,
  "query": {
    "bool": {
      "must": [
        {
          "regexp": {
            "message": {
              "value": ".*exception.*"
            }
          }
        },
        {
          "bool": {
            "should": [
              {
                "term": {
                  "beat.hostname": "ip-xxx-xx-xx-xx"
                }
              }
            ]
          }
        },
        {
          "range": {
            "@timestamp": {
              "lt": 1518459660000,
              "format": "epoch_millis",
              "gte": 1518459600000
            }
          }
        }
      ]
    }
  }
}
When I remove the wildcarded .*exception.* and replace it with a non-wildcarded string like xyz, it returns quickly. Though the query uses a wildcarded expression, it also looks at a small time range and a specific host, so I would think this is a very simple query. Any reason why the Elasticsearch cluster can't handle it? The cluster has 10 nodes and 20 TB of data.
See the documentation for Regexp Query. It clearly states the following:
Note: The performance of a regexp query heavily depends on the regular
expression chosen. Matching everything like .* is very slow
What would be ideal is to change the text analysis on the message field with a WordDelimiterTokenFilter and set split_on_case_change to true. Then something like NullPointerException will get indexed as three separate tokens [Null, Pointer, Exception]. This lets you search on exception without using a regex. The caveat is that you need to reindex all your documents.
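A minimal sketch of what that analysis setup could look like; the index, filter, and analyzer names here are invented, and the mapping assumes the 5.x/6.x single-type syntax:
PUT /logs-v2
{
  "settings": {
    "analysis": {
      "filter": {
        "case_splitter": {
          "type": "word_delimiter",
          "split_on_case_change": true
        }
      },
      "analyzer": {
        "message_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["case_splitter", "lowercase"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "message": {
          "type": "text",
          "analyzer": "message_analyzer"
        }
      }
    }
  }
}
With that in place, a plain match query on message for exception would replace the regexp entirely.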
Another quick thing to try might be to keep your filter conditions on the hostname and timestamp in a filter context, which will prefilter documents before running your regexp query. This may be a short-term solution for you until you fix the text analysis.
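For instance, here is a sketch of your query with the hostname and timestamp conditions moved into a filter clause:
{
  "size": 10000,
  "query": {
    "bool": {
      "must": [
        {
          "regexp": {
            "message": {
              "value": ".*exception.*"
            }
          }
        }
      ],
      "filter": [
        {
          "term": {
            "beat.hostname": "ip-xxx-xx-xx-xx"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": 1518459600000,
              "lt": 1518459660000,
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  }
}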

How do I not match a bare hyphen in Elasticsearch?

I am querying apache logs stored in Elasticsearch. I want to return log entries from a given hostname that has a hyphen and with a populated auth field.
These strings should be an exact match: "hostname": "example-dev" and not "auth": "-".
My questions are:
- How do I correctly remap a type in Elasticsearch to allow a hyphen to be part of the matched string?
- How do I correctly query a type in Elasticsearch with a bare hyphen?
The hyphen is a reserved character in Elasticsearch, so I understand it takes special effort. However, I'm having what seems like a lot of trouble figuring out how to include it in my query.
I have tried to remap the type to be not_analyzed. It looks like the mapping format has recently changed. The old way of defining the index option ("analyzed", "not_analyzed", and "no") makes sense to me. The new way (true or false) does not. In either case, I cannot seem to get remapping to work.
Here is my attempt at remapping:
DELETE /search
PUT search
{
  "mappings" : {
    "beat" : {
      "properties" : {
        "hostname" : {
          "type" : "text",
          "norms" : false,
          "index" : false
        }
      }
    }
  }
}
I have not included the remapping of the auth field because it only returns a mapper_parsing_exception.
I am using JSON to query Elasticsearch. Here is my query:
GET _search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "match": {
                "beat.hostname": "example-dev"
              }
            }
          ],
          "must_not": [
            {
              "match": {
                "auth.keyword": "-"
              }
            }
          ]
        }
      }
    }
  }
}
I have tried escaping the hyphen with \\-, but that still returns results that match "auth": "-", and the hostname still does not match exactly. The hostname query also matches something like "example-prod".
I have tried using "term" rather than "match"; that returns no results.
I can match a specific string for "auth"; for example, "must": { "match": { "auth": "foo" } } returns all entries for auth = "foo". That is the opposite of what I need, but it does work. The hostname is still not exactly matched if it includes a hyphen.
The log entries are parsed into Elasticsearch using the ELK stack; however, this report will be generated outside of Kibana for legacy reasons.
I have read the documentation and examples, but there is a lot to dig through. Many of the examples I have found are for older versions of Elasticsearch, which is understandable, but confusing.
I am new to Elasticsearch. It feels like I am just overlooking something, but the problem might stem from a basic misunderstanding of how Elasticsearch does things.
After spending some more time with Elasticsearch queries, I think I have it figured out.
Splitting the hostname into two separate strings and matching on both filters the hostname as expected. Using an empty string for the negative match also seems to work as expected.
Here is the updated query:
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "match": {
                "beat.hostname": "example"
              }
            },
            {
              "match": {
                "beat.hostname": "dev"
              }
            }
          ],
          "must_not": [
            {
              "match_phrase": {
                "auth.keyword": ""
              }
            }
          ]
        }
      }
    }
  }
}
I will do a bit more testing to make sure this is actually returning what I need.
I was trying too hard to make Elasticsearch fit what I expected. Instead of working with Elasticsearch, I was fighting against it.
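As an aside: if your index template also created a keyword subfield for the hostname (the way auth.keyword exists above), an exact term filter on it should match the hyphenated value directly. The beat.hostname.keyword field below is an assumption about your mapping:
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "beat.hostname.keyword": "example-dev"
              }
            }
          ],
          "must_not": [
            {
              "match_phrase": {
                "auth.keyword": ""
              }
            }
          ]
        }
      }
    }
  }
}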

How to use multifield search in Elasticsearch combining should and must clauses

This may be a repeated question, but I'm not finding a good solution.
I'm trying to search Elasticsearch to get documents that contain one of:
- "event":"myevent1"
- "event":"myevent2"
- "event":"myevent3"
A document does not need to contain all of them at once, but the results should include only documents with those event types.
And this is simple, because Elasticsearch helps me with the should clause, which returns exactly what I want.
But then I want every document to also satisfy another condition: the field result.example.example = 200 must be present in every single document, PLUS the document must have one of the previously described "event" values.
So, for example, one matching document has "event":"myevent1" and result.example.example = 200, another has "event":"myevent2" and result.example.example = 200, and so on.
I've tried this configuration:
{
  "query": {
    "bool": {
      "must": { "match": { "operation.result.http_status": 200 } },
      "should": [
        {
          "match": {
            "event": "bank.account.patch"
          }
        },
        {
          "match": {
            "event": "bank.account.add"
          }
        },
        {
          "match": {
            "event": "bank.user.patch"
          }
        }
      ]
    }
  }
}
but it is not working, because I also get documents that do not contain any of the should events.
Hope I explained well,
Thanks in advance!
As is, your query tells ES to look for documents that must have "operation.result.http_status":200 and to boost those that have a matching event type.
You're looking to combine two must queries:
- one that matches one of your event types,
- one for your other condition.
The event clause accepts multiple values, and those values are exact matches: you're looking for a terms query.
Try
{
  "query": {
    "bool": {
      "must": [
        { "match": { "operation.result.http_status": 200 } },
        {
          "terms": {
            "event": [
              "bank.account.patch",
              "bank.account.add",
              "bank.user.patch"
            ]
          }
        }
      ]
    }
  }
}
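An equivalent formulation keeps your original should clauses and simply forces at least one of them to match via minimum_should_match:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "operation.result.http_status": 200 } }
      ],
      "should": [
        { "match": { "event": "bank.account.patch" } },
        { "match": { "event": "bank.account.add" } },
        { "match": { "event": "bank.user.patch" } }
      ],
      "minimum_should_match": 1
    }
  }
}
Note that terms does not analyze its values while match does, so the terms form is preferable when the event values are stored verbatim.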

Elasticsearch: filter by any field

I am playing with filters in Elasticsearch (we use the old version 1.3.1), and I need to filter my search results by any field. With a query, this can be done like this:
"query": {
"query_string": {
"query": "_all:test"
}
}
But filters do not seem to work with the _all field. What can I do? Would a newer Elasticsearch version solve my problem?
Thanks in advance!
PS: I need to search for exact values, so I cannot use queries. There is a difference between queries and filters: if you search for my brown, then you can expect results like:
my brown
This is my brown dog.
someone stolen my brown wallet
But a filter will return only my brown, and that is what I need.
You might want to read up a little on the distinction between queries and filters. What you're doing there is a query string query.
If you do actually want to filter against exact text tokens (read up on analysis if you don't know what I mean by "tokens"), AND you have your mapping set up such that the "_all" field behaves as you're expecting, then try something like this:
POST /test_index/_search
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "_all": "test"
        }
      }
    }
  }
}
If, on the other hand, you want to allow some analysis (so that "Test" is tokenized to "test", for example), you may want this instead:
POST /test_index/_search
{
  "query": {
    "match": {
      "_all": "Test"
    }
  }
}
Here is some code I used to play around with it:
http://sense.qbox.io/gist/44adf2c2ade8abd6758f0e08ed2e40434850fc1c

elasticsearch - confused about how to search for items where a field contains a string

This query works fine, returning only one item, "steve_jobs":
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "name": "steve_jobs"
        }
      }
    }
  }
}
So, now I want to get all people with the name prefix steve_. So I try this:
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "name": "steve_"
        }
      }
    }
  }
}
This is returning nothing. Why?
I'm confused about when to use a term query / term filter / terms filter / query_string query.
What you need is the Prefix Query.
If you are indexing your document like so:
POST /testing_nested_query/class/
{
"name": "my name is steve_jobs"
}
And you are using the default analyzer, then the problem is that the term steve_jobs will be indexed as one term. So your term query will never be able to find any docs matching the term steve_, as there is no such term in the index. The Prefix Query helps you solve this by searching for a prefix across all the indexed terms.
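For example, a prefix version of your constant_score query might look like this (note the prefix must be lowercase, since prefix queries are not analyzed):
{
  "query": {
    "constant_score": {
      "filter": {
        "prefix": {
          "name": "steve_"
        }
      }
    }
  }
}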
You can also solve the same problem by creating your own custom analyzers (read this and this) so that steve_jobs is stored as the two terms steve and jobs.
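A minimal sketch of one such analyzer, using a pattern tokenizer that also splits on underscores; the tokenizer and analyzer names here are invented, and the mapping assumes a 5.x-style text field:
PUT /testing_nested_query
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "split_on_underscore": {
          "type": "pattern",
          "pattern": "[_\\W]+"
        }
      },
      "analyzer": {
        "underscore_analyzer": {
          "type": "custom",
          "tokenizer": "split_on_underscore",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "class": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "underscore_analyzer"
        }
      }
    }
  }
}
With that mapping, "my name is steve_jobs" is indexed as [my, name, is, steve, jobs], so a plain term filter for steve matches.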
