Elasticsearch return unique values for a field

I am trying to build an Elasticsearch query that will return only unique values for a particular field.
I do not want to return all the values for that field nor count them.
For example, if the field currently contains 50 different values and I run a search that returns only 20 hits (size=20), I want each of the 20 results to have a distinct value for that field; I don't care about the 30 other values not represented in the result.
For example with the following search (pseudo code - not checked):
{
from: 0,
size: 20,
query: {
bool: {
must: {
range: { field1: { gte: 50 }},
term: { field2: 'salt' },
/**
* I want to return only unique values for "field3", but I
* don't want to return all of them or count them.
*
* How do I specify this in my query?
**/
unique: 'field3',
},
must_not: {
match: { field4: 'pepper'},
}
}
}
}

You should be able to do this pretty easily with a terms aggregation.
Here's an example. I defined a simple index containing a field with "index": "not_analyzed", so that the full text of each field value is treated as a single unique value rather than being split into terms by tokenization.
DELETE /test_index
PUT /test_index
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"doc": {
"properties": {
"title": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Then I add a few docs with the bulk API.
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"title":"first doc"}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"title":"second doc"}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"title":"third doc"}
{"index":{"_index":"test_index","_type":"doc","_id":4}}
{"title":"third doc"}
Now we can run our terms aggregation:
POST /test_index/_search?search_type=count
{
"aggs": {
"unique_vals": {
"terms": {
"field": "title"
}
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0,
"hits": []
},
"aggregations": {
"unique_vals": {
"buckets": [
{
"key": "third doc",
"doc_count": 2
},
{
"key": "first doc",
"doc_count": 1
},
{
"key": "second doc",
"doc_count": 1
}
]
}
}
}
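On the client side, the distinct values are simply the bucket keys. Here is a minimal Python sketch of that step, assuming the response has already been parsed into a dict. The helper names build_unique_values_request and extract_unique_values are made up for illustration; the body uses "size": 0 (as a later answer in this thread does) rather than search_type=count, and the terms aggregation's own "size" parameter caps how many distinct values come back.

```python
def build_unique_values_request(field, max_values=20):
    """Build a search body that asks only for distinct values of `field`."""
    return {
        "size": 0,  # return no hits, only aggregation results
        "aggs": {
            "unique_vals": {
                # "size" here limits the number of buckets (distinct values)
                "terms": {"field": field, "size": max_values}
            }
        },
    }


def extract_unique_values(response, agg_name="unique_vals"):
    """Return the bucket keys (the distinct values) of a terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# A response shaped like the one shown above.
sample_response = {
    "aggregations": {
        "unique_vals": {
            "buckets": [
                {"key": "third doc", "doc_count": 2},
                {"key": "first doc", "doc_count": 1},
                {"key": "second doc", "doc_count": 1},
            ]
        }
    }
}

print(build_unique_values_request("title"))
print(extract_unique_values(sample_response))
```

Note that this gives you the values themselves, not one full document per value.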

I'm very surprised a filter aggregation hasn't been suggested. It goes back all the way to ES version 1.3.
The filter aggregation works like a regular filter query, but it is nested into an aggregation chain: documents that don't meet the criteria are excluded, and the sub-aggregations are computed only over the documents that do.
First, we'll put our mapping.
curl --request PUT \
--url http://localhost:9200/items \
--header 'content-type: application/json' \
--data '{
"mappings": {
"item": {
"properties": {
"field1" : { "type": "integer" },
"field2" : { "type": "keyword" },
"field3" : { "type": "keyword" },
"field4" : { "type": "keyword" }
}
}
}
}
'
Then let's load some data.
curl --request PUT \
--url http://localhost:9200/items/_bulk \
--header 'content-type: application/json' \
--data '{"index":{"_index":"items","_type":"item","_id":1}}
{"field1":50, "field2":["salt", "vinegar"], "field3":["garlic", "onion"], "field4":"paprika"}
{"index":{"_index":"items","_type":"item","_id":2}}
{"field1":40, "field2":["salt", "pepper"], "field3":["onion"]}
{"index":{"_index":"items","_type":"item","_id":3}}
{"field1":100, "field2":["salt", "vinegar"], "field3":["garlic", "chives"], "field4":"pepper"}
{"index":{"_index":"items","_type":"item","_id":4}}
{"field1":90, "field2":["vinegar"], "field3":["chives", "garlic"]}
{"index":{"_index":"items","_type":"item","_id":5}}
{"field1":900, "field2":["salt", "vinegar"], "field3":["garlic", "chives"], "field4":"paprika"}
'
Notice that only the documents with ids 1 and 5 pass the criteria, so we are left to aggregate on two field3 arrays with four values in total: ["garlic", "onion"] and ["garlic", "chives"]. Also notice that field3 can hold an array or a single value; I'm using arrays to illustrate how the counts work.
curl --request POST \
--url http://localhost:9200/items/item/_search \
--header 'content-type: application/json' \
--data '{
"size": 0,
"aggregations": {
"top_filter_agg" : {
"filter" : {
"bool": {
"must":[
{
"range" : { "field1" : { "gte":50} }
},
{
"term" : { "field2" : "salt" }
}
],
"must_not":[
{
"term" : { "field4" : "pepper" }
}
]
}
},
"aggs" : {
"field3_terms_agg" : { "terms" : { "field" : "field3" } }
}
}
}
}
'
After running the combined filter/terms aggregation, we have four field3 values in total across the two matching documents, and three unique terms.
{
"took": 46,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 0.0,
"hits": []
},
"aggregations": {
"top_filter_agg": {
"doc_count": 2,
"field3_terms_agg": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "garlic",
"doc_count": 2
},
{
"key": "chives",
"doc_count": 1
},
{
"key": "onion",
"doc_count": 1
}
]
}
}
}
}
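To double-check those counts without a cluster, here is a small in-memory Python sketch that mimics the filter and then the terms aggregation over the five sample documents. It is only an approximation of what Elasticsearch does (the function names passes_filter and terms_counts are made up), but it shows why an array value contributes one count per distinct term per document.

```python
from collections import Counter

docs = [
    {"field1": 50, "field2": ["salt", "vinegar"], "field3": ["garlic", "onion"], "field4": "paprika"},
    {"field1": 40, "field2": ["salt", "pepper"], "field3": ["onion"]},
    {"field1": 100, "field2": ["salt", "vinegar"], "field3": ["garlic", "chives"], "field4": "pepper"},
    {"field1": 90, "field2": ["vinegar"], "field3": ["chives", "garlic"]},
    {"field1": 900, "field2": ["salt", "vinegar"], "field3": ["garlic", "chives"], "field4": "paprika"},
]


def passes_filter(doc):
    """Mirror the bool filter: field1 >= 50, field2 has 'salt', field4 is not 'pepper'."""
    return (
        doc["field1"] >= 50
        and "salt" in doc["field2"]
        and doc.get("field4") != "pepper"
    )


def terms_counts(documents, field):
    """Count, per distinct value, how many documents contain that value."""
    counts = Counter()
    for doc in documents:
        # set() so a document counts at most once per value
        counts.update(set(doc.get(field, [])))
    return counts


matching = [doc for doc in docs if passes_filter(doc)]
field3_counts = terms_counts(matching, "field3")

print(len(matching))        # documents that survive the filter
print(dict(field3_counts))  # per-term document counts for field3
```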

Related

Fielddata is disabled on text fields by default in elasticsearch

I have a problem: I upgraded from Elasticsearch 2.x to 5.1, and some of my data no longer works in the newer version because of the error "Fielddata is disabled on text fields by default" (https://www.elastic.co/guide/en/elasticsearch/reference/5.1/fielddata.html). In 2.x and earlier it seems to have been enabled.
Is there a way to enable fielddata automatically for text fields?
I tried code like this:
curl -XPUT http://localhost:9200/_template/template_1 -d '
{
"template": "*",
"mappings": {
"_default_": {
"properties": {
"fielddata-*": {
"type": "text",
"fielddata": true
}
}
}
}
}'
but it looks like Elasticsearch does not understand a wildcard in the field name there. My temporary solution is a Python script that runs every 30 minutes, scans all indices, and adds fielddata=true to any new fields.
The problem is that I have string data like "this is cool" in elasticsearch.
curl -XPUT 'http://localhost:9200/example/exampleworking/1' -d '
{
"myfield": "this is cool"
}'
when trying to aggregate that:
curl 'http://localhost:9200/example/_search?pretty=true' -d '
{
"aggs": {
"foobar": {
"terms": {
"field": "myfield"
}
}
}
}'
"Fielddata is disabled on text fields by default. Set fielddata=true on [myfield]"
The Elasticsearch documentation suggests using .keyword instead of enabling fielddata. However, that does not return the data I want.
curl 'http://localhost:9200/example/_search?pretty=true' -d '
{
"aggs": {
"foobar": {
"terms": {
"field": "myfield.keyword"
}
}
}
}'
returns:
"buckets" : [
{
"key" : "this is cool",
"doc_count" : 1
}
]
which is not correct. Then I add fielddata true and everything works:
curl -XPUT 'http://localhost:9200/example/_mapping/exampleworking' -d '
{
"properties": {
"myfield": {
"type": "text",
"fielddata": true
}
}
}'
and then aggregate
curl 'http://localhost:9200/example/_search?pretty=true' -d '
{
"aggs": {
"foobar": {
"terms": {
"field": "myfield"
}
}
}
}'
returns the correct result:
"buckets" : [
{
"key" : "cool",
"doc_count" : 1
},
{
"key" : "is",
"doc_count" : 1
},
{
"key" : "this",
"doc_count" : 1
}
]
How can I add fielddata=true automatically to all text fields in all indices? Is that even possible? In Elasticsearch 2.x this works out of the box.
I will answer my own question:
curl -XPUT http://localhost:9200/_template/template_1 -d '
{
"template": "*",
"mappings": {
"_default_": {
"dynamic_templates": [
{
"strings2": {
"match_mapping_type": "string",
"mapping": {
"type": "text",
"fielddata": true
}
}
}
]
}
}
}'
This does what I want. Now all indices default to fielddata=true for string fields.
Adding "fielddata": true allows the text field to be aggregated, but this has performance problems at scale. A better solution is to use a multi-field mapping.
Unfortunately, this is buried a bit deep in Elasticsearch's documentation, in a warning under the fielddata mapping parameter: https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html#before-enabling-fielddata
Here's a complete example of how this helps with a terms aggregation, tested on Elasticsearch 7.12 as of 2021-04-24:
Mapping (in ES7, under the mappings property of the body of a "put index template" request etc):
{
"properties": {
"bio": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
Four documents indexed:
{
"bio": "Dogs are the best pet."
}
{
"bio": "Cats are cute."
}
{
"bio": "Cats are cute."
}
{
"bio": "Cats are the greatest."
}
Aggregation query:
{
"size": 0,
"aggs": {
"bios_with_cats": {
"filter": {
"match": {
"bio": "cats"
}
},
"aggs": {
"bios": {
"terms": {
"field": "bio.keyword"
}
}
}
}
}
}
Aggregation query results:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 2,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"bios_with_cats": {
"doc_count": 3,
"bios": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Cats are cute.",
"doc_count": 2
},
{
"key": "Cats are the greatest.",
"doc_count": 1
}
]
}
}
}
}
Basically, this aggregation says "Of the documents whose bios are like 'cats', how many of each distinct bio are there?" The one document without "cats" in its bio property is excluded, and then the remaining documents are grouped into buckets, one of which has one document and the other has two documents.
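For anyone unsure why bio.keyword and a fielddata-enabled bio give such different buckets, here is a rough Python sketch over the same four bios. The rough_tokens function is only a crude stand-in for the standard analyzer (lowercase, split on non-word characters), not the real thing, so treat the token list as an approximation.

```python
import re
from collections import Counter

bios = [
    "Dogs are the best pet.",
    "Cats are cute.",
    "Cats are cute.",
    "Cats are the greatest.",
]


def rough_tokens(text):
    """Crude stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return re.findall(r"\w+", text.lower())


# Filter step: keep bios whose analyzed text contains the token "cats".
with_cats = [bio for bio in bios if "cats" in rough_tokens(bio)]

# Terms aggregation on the keyword sub-field: group by the whole, untouched string.
keyword_buckets = Counter(with_cats)

# Terms aggregation on the analyzed text field (what fielddata exposes):
# group by individual tokens, counting each document once per token.
token_buckets = Counter(
    token for bio in with_cats for token in set(rough_tokens(bio))
)

print(dict(keyword_buckets))
print(dict(token_buckets))
```

The keyword buckets match the response above; the token buckets are what you would get by aggregating on the tokenized field instead.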

Elasticsearch Histogram of visits

I'm quite new to Elasticsearch and I'm failing to build a histogram based on ranges of visits. I am not even sure it's possible to create this kind of chart with a single Elasticsearch query, but I have the feeling it could be done with a pipeline aggregation or maybe a scripted aggregation.
Here is a test dataset with which I'm working:
PUT /test_histo
{ "settings": { "number_of_shards": 1 }}
PUT /test_histo/_mapping/visit
{
"properties": {
"user": {"type": "string" },
"datevisit": {"type": "date"},
"page": {"type": "string"}
}
}
POST test_histo/visit/_bulk
{"index":{"_index":"test_histo","_type":"visit"}}
{"user":"John","page":"home.html","datevisit":"2015-11-25"}
{"index":{"_index":"test_histo","_type":"visit"}}
{"user":"Jean","page":"productXX.hmtl","datevisit":"2015-11-25"}
{"index":{"_index":"test_histo","_type":"visit"}}
{"user":"Robert","page":"home.html","datevisit":"2015-11-25"}
{"index":{"_index":"test_histo","_type":"visit"}}
{"user":"Mary","page":"home.html","datevisit":"2015-11-25"}
{"index":{"_index":"test_histo","_type":"visit"}}
{"user":"Mary","page":"media_center.html","datevisit":"2015-11-25"}
{"index":{"_index":"test_histo","_type":"visit"}}
{"user":"John","page":"home.html","datevisit":"2015-11-25"}
{"index":{"_index":"test_histo","_type":"visit"}}
{"user":"John","page":"media_center.html","datevisit":"2015-11-26"}
If we consider the ranges [1,2[, [2,3[, [3, inf.[
The expected result should be :
[1,2[ = 2
[2,3[ = 1
[3, inf.[ = 1
All my efforts to build a histogram showing customer visit frequency have so far been unsuccessful. I would be grateful for any tips, tricks or ideas.
There are two ways you can do it.
The first is to do it in Elasticsearch, which requires a Scripted Metric Aggregation. You can read more about it here.
Your query would look like this
{
"size": 0,
"aggs": {
"visitors_over_time": {
"date_histogram": {
"field": "datevisit",
"interval": "week"
},
"aggs": {
"no_of_visits": {
"scripted_metric": {
"init_script": "_agg['values'] = new java.util.HashMap();",
"map_script": "if (_agg.values[doc['user'].value]==null) {_agg.values[doc['user'].value]=1} else {_agg.values[doc['user'].value]+=1;}",
"combine_script": "someHashMap = new java.util.HashMap();for(x in _agg.values.keySet()) {value=_agg.values[x];if(value<3){key='[' + value +',' + (value + 1) + '[';}else{key='[' + value +',inf[';}; if(someHashMap[key]==null){someHashMap[key] = 1}else{someHashMap[key] += 1}}; return someHashMap;"
}
}
}
}
}
}
You can change the time period through the interval field of the date_histogram object, using values like day, week, or month.
Your response would look like this
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 7,
"max_score": 0,
"hits": []
},
"aggregations": {
"visitors_over_time": {
"buckets": [
{
"key_as_string": "2015-11-23T00:00:00.000Z",
"key": 1448236800000,
"doc_count": 7,
"no_of_visits": {
"value": [
{
"[2,3[": 1,
"[3,inf[": 1,
"[1,2[": 2
}
]
}
}
]
}
}
}
The second method is to do the work of the scripted_metric on the client side, using the result of a Terms Aggregation. You can read more about it here.
Your query will look like this
GET test_histo/visit/_search
{
"size": 0,
"aggs": {
"visitors_over_time": {
"date_histogram": {
"field": "datevisit",
"interval": "week"
},
"aggs": {
"no_of_visits": {
"terms": {
"field": "user",
"size": 10
}
}
}
}
}
}
and the response will be
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 7,
"max_score": 0,
"hits": []
},
"aggregations": {
"visitors_over_time": {
"buckets": [
{
"key_as_string": "2015-11-23T00:00:00.000Z",
"key": 1448236800000,
"doc_count": 7,
"no_of_visits": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "john",
"doc_count": 3
},
{
"key": "mary",
"doc_count": 2
},
{
"key": "jean",
"doc_count": 1
},
{
"key": "robert",
"doc_count": 1
}
]
}
}
]
}
}
}
From this response you can count, for each period, how many users have each doc_count.
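That client-side counting step could look roughly like this in Python. It is a sketch, assuming the response has been parsed into a dict shaped like the one above; the function names range_label and visit_frequency_histogram are made up, and the range labels follow the [1,2[, [2,3[, [3,inf[ convention from the question.

```python
from collections import Counter


def range_label(visits):
    """Map a visit count to one of the ranges from the question."""
    if visits >= 3:
        return "[3,inf["
    return "[{},{}[".format(visits, visits + 1)


def visit_frequency_histogram(response):
    """For each date_histogram bucket, count how many users fall in each visit range."""
    result = {}
    for period in response["aggregations"]["visitors_over_time"]["buckets"]:
        user_buckets = period["no_of_visits"]["buckets"]
        result[period["key_as_string"]] = dict(
            Counter(range_label(user["doc_count"]) for user in user_buckets)
        )
    return result


# Trimmed copy of the response shown above.
sample_response = {
    "aggregations": {
        "visitors_over_time": {
            "buckets": [
                {
                    "key_as_string": "2015-11-23T00:00:00.000Z",
                    "doc_count": 7,
                    "no_of_visits": {
                        "sum_other_doc_count": 0,
                        "buckets": [
                            {"key": "john", "doc_count": 3},
                            {"key": "mary", "doc_count": 2},
                            {"key": "jean", "doc_count": 1},
                            {"key": "robert", "doc_count": 1},
                        ],
                    },
                }
            ]
        }
    }
}

histogram = visit_frequency_histogram(sample_response)
print(histogram)
```

Keep in mind that the terms aggregation only returns the top "size" users per period, so if sum_other_doc_count is non-zero the histogram is incomplete and the size needs to be raised.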
Have a look at:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html
If you want to show it in a ready-made UI, use Kibana.
A query like this:
GET _search
{
"query": {
"match_all": {}
},
"aggs" : {
"visits" : {
"date_histogram" : {
"field" : "datevisit",
"interval" : "month"
}
}
}
}
should give you a histogram. I don't have Elasticsearch at hand at the moment, so there may be some fat-fingered typos.
Then you could add query terms to show the histogram only for a specific page, or you could use an outer aggregation bucket which aggregates per page or per user.
Something like this:
GET _search
{
"query": {
"match_all": {}
},
"aggs" : {
"users" : {
"terms" : {
"field" : "user"
},
"aggs" : {
"visits" : {
"date_histogram" : {
"field" : "datevisit",
"interval" : "month"
}
}
}
}
}
}
Have a look at this solution:
{
"query": {
"match_all": {}
},
"aggs": {
"periods": {
"filters": {
"filters": {
"1-2": {
"range": {
"datevisit": {
"gte": "2015-11-25",
"lt": "2015-11-26"
}
}
},
"2-3": {
"range": {
"datevisit": {
"gte": "2015-11-26",
"lt": "2015-11-27"
}
}
},
"3-": {
"range": {
"datevisit": {
"gte": "2015-11-27"
}
}
}
}
},
"aggs": {
"users": {
"terms": {"field": "user"}
}
}
}
}
}
Step by step:
Filters aggregation: defines the named buckets for the next aggregation; in this case we define 3 periods, each based on a date range filter.
Nested users aggregation: this terms aggregation runs once per filter bucket, so here you get 3 sets of user buckets, one per date range.
You'll get a result like this:
{
...
"aggregations" : {
"periods" : {
"buckets" : {
"1-2" : {
"users" : {
"buckets" : [
{"key" : XXX,"doc_count" : NNN},
{"key" : YYY,"doc_count" : NNN},
]
}
},
"2-3" : {
"users" : {
"buckets" : [
{"key" : XXX1,"doc_count" : NNN1},
{"key" : YYY1,"doc_count" : NNN1},
]
}
},
"3-" : {
"users" : {
"buckets" : [
{"key" : XXX2,"doc_count" : NNN2},
{"key" : YYY2,"doc_count" : NNN2},
]
}
},
}
}
}
}
Try it and let me know if it works.

Why is my query ignoring my filter aggregation?

Preface
I have 4 days experience of Elasticsearch 1.7.2.
Setup
I have a collection of documents; each document is a User. A User has a number of Answers, linked through UserAnswers, which gives a document reference of user_answers.answer[], where the answers array is an array of objects.
The user_answers.answer[].correct is a boolean field which tells me if the answer given by the user is correct or not.
Objective
I would like to list the users and also display the total number of correct and incorrect answers they have.
Approach
So far I have tried a number of different approaches and the one I'll include here is as close as I've got so far in 1.5 days of trying.
Use a terms aggregation to create a bucket for each User by username.
Filter each bucket to leave only correct or incorrect answers.
Count the number of filtered answers.
Query
{
"size": 0,
"filter": {
"bool": {
"must_not": {
// Remove users who already have this award
"term": {"awards_users.award_id": 2}
}
}
},
"aggs": {
"users": {
"terms": {"field": "username"},
"aggs": {
"correct": {
"filter": {
"term": {"user_answers.answer.correct": true}
},
"aggs": {
"count": {
"value_count": {
"field": "user_answers.answer.id"
}
}
}
},
// Same for incorrect, but inverted correct value
}
}
}
}
Sample response
{
"key": "neon1024",
"doc_count": 1,
"correct": {
"doc_count": 1,
"count": {
"value": 7 // Expected 1 correct & 6 incorrect
}
}
},
This is the record which I am testing against, and I am expecting that 1 is returned instead of 7. There are 7 answers in total, 6 incorrect and 1 correct. This I have verified in my document index.
The problem
For some reason the actual filter seems to be being ignored, and leaving all possible related answers in the bucket. Hence the aggregation is seeing them all, rather than showing the expected value.
Question
How can I use an aggregation to segregate my counts based on the value of the related answers values?
Thanks for reading my long question!
As suggested, you probably have your answers mapped as object, while you should be using nested type.
Using nested type, elasticsearch will store your answers as individual documents linked to the root one and will let you do expected aggregations on them. You'll have to use nested type aggregation in your query to achieve that.
So I'd say it would be best to map your document like this:
PUT /test
{
"mappings" : {
"your_type" : {
"properties" : {
"username" : {
"type" : "string",
"index" : "not_analyzed"
},
"user_answers" : {
"type" : "nested",
"properties" : {
"id" : {
"type" : "integer"
},
"answer" : {
"type" : "string"
},
"correct" : {
"type" : "boolean"
}
}
}
}
}
}
}
Test document:
PUT /test/your_type/1
{
"username": "neon1024",
"user_answers": [
{
"id": 1,
"answer": "answer1",
"correct": true
},
{
"id": 2,
"answer": "answer2",
"correct": true
},
{
"id": 3,
"answer": "answer3",
"correct": false
}
]
}
Query:
POST /test/_search?search_type=count
{
"aggs": {
"users": {
"terms": {
"field": "username"
},
"aggs": {
"DiveIn": {
"nested": {
"path": "user_answers"
},
"aggs": {
"CorrectVsIncorrect": {
"terms": {
"field": "user_answers.correct",
"size": 2
}
}
}
}
}
}
}
}
And Final result:
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"users": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "neon1024",
"doc_count": 1,
"DiveIn": {
"doc_count": 3,
"CorrectVsIncorrect": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "T",
"doc_count": 2
},
{
"key": "F",
"doc_count": 1
}
]
}
}
}
]
}
}
}
Here "key": "T" represents the correct answers and "doc_count": 2 is how many of them there are.
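If it helps to see why the object mapping gave 7 in the question, here is a rough Python sketch of the difference, using the three-answer test document from this answer. With an object mapping, an array of objects is flattened into one multi-valued list per sub-field, so the link between each id and its correct flag is lost; the flatten_object_field helper is made up and only approximates that behaviour.

```python
doc = {
    "username": "neon1024",
    "user_answers": [
        {"id": 1, "answer": "answer1", "correct": True},
        {"id": 2, "answer": "answer2", "correct": True},
        {"id": 3, "answer": "answer3", "correct": False},
    ],
}


def flatten_object_field(document, field):
    """Approximate `object` indexing: one multi-valued list per sub-field."""
    flat = {}
    for item in document[field]:
        for key, value in item.items():
            flat.setdefault("{}.{}".format(field, key), []).append(value)
    return flat


flat = flatten_object_field(doc, "user_answers")

# Object mapping: the filter only asks whether ANY value equals true on the
# whole document, and value_count then counts every id on that document.
object_filter_matches = True in flat["user_answers.correct"]
object_correct_count = len(flat["user_answers.id"]) if object_filter_matches else 0

# Nested mapping: each answer is evaluated on its own, so id and correct
# stay paired and only the correct answers are counted.
nested_correct_count = sum(1 for answer in doc["user_answers"] if answer["correct"])

print(flat["user_answers.correct"])
print(object_correct_count, nested_correct_count)
```

The object-style count is the total number of answers on the document, which is the same effect as the 7 reported in the question.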

How to apply default post filter with ElasticSearch?

I would like to implement a backtesting engine using Elasticsearch. To do that I need to filter the hits, excluding those dated after the testing date, and I would like this to happen by default, because the algorithm that I want to backtest is not supposed to know about the backtesting.
In other words, is it possible to apply a default post filter to ElasticSearch queries?
For example, let's say that those documents are in ES:
{ name: 'Jean', weight: 70, date: 2012-01-01 }
{ name: 'Jules', weight: 70, date: 2010-01-01 }
{ name: 'David', weight: 80, date: 2010-01-01 }
I want to apply a default post filter that excludes documents later than 2011, so that if I query for every person with a weight of 70, the only result is Jules.
You can do that with Filtered Aliases. When you query through the alias, the filter is automatically applied to your query...which hides it from your application:
// Insert the data
curl -XPOST "http://localhost:9200/people/data/" -d'
{ "name": "Jean", "weight" : 70, "date": "2012-01-01" }'
curl -XPOST "http://localhost:9200/people/ata" -d'
{ "name": "Jules", "weight" : 70, "date": "2010-01-01" }'
curl -XPOST "http://localhost:9200/people/data/" -d'
{ "name": "David", "weight" : 80, "date": "2010-01-01" }'
# Add a filtered alias that only exposes documents dated before 2011
curl -XPOST "http://localhost:9200/_aliases" -d'
{
"actions" : [
{
"add" : {
"index" : "people",
"alias" : "filtered_people",
"filter" : {
"range" : {
"date" : { "lt" : "2011-01-01"}
}
}
}
}
]
}'
Now you execute the search against filtered_people instead of the underlying people index:
curl -XGET "http://localhost:9200/filtered_people/_search" -d'
{
"query": {
"filtered": {
"filter": {
"term": {
"weight": 70
}
}
}
}
}'
Which will return just the doc you are interested in:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "people",
"_type": "ata",
"_id": "AUudZPUfCSiheYJkTW-h",
"_score": 1,
"_source": {
"name": "Jules",
"weight": 70,
"date": "2010-01-01"
}
}
]
}
}
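To make the behaviour concrete, here is a tiny in-memory Python sketch of what querying through the alias amounts to: the alias filter is applied first, invisibly, and then the caller's own query. It assumes a cutoff of 2011-01-01 with everything before it kept, which is what the question asks for; the function names alias_filter and search_through_alias are made up.

```python
people = [
    {"name": "Jean", "weight": 70, "date": "2012-01-01"},
    {"name": "Jules", "weight": 70, "date": "2010-01-01"},
    {"name": "David", "weight": 80, "date": "2010-01-01"},
]


def alias_filter(doc, cutoff="2011-01-01"):
    """Stand-in for the alias filter: keep only documents dated before the cutoff."""
    # Dates in yyyy-mm-dd form sort correctly as plain strings.
    return doc["date"] < cutoff


def search_through_alias(documents, query):
    """Apply the hidden alias filter, then the caller's own query predicate."""
    return [doc for doc in documents if alias_filter(doc) and query(doc)]


# The backtested algorithm only knows about its own query (weight == 70).
hits = search_through_alias(people, lambda doc: doc["weight"] == 70)
print([doc["name"] for doc in hits])
```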

Search query for elasticsearch when child element is array of string

I created a document in Elasticsearch in the following format:
curl -XPUT "http://localhost:9200/my_base.main_candidate/" -d'
{
"specific_location": {
"location_name": "Mumbai",
"location_tags": [
"Mumbai"
],
"tags": [
"Mumbai"
]
}
}'
My requirement is to search for location_tags containing one of the given options like ["Mumbai", "Pune"]. How do I do this?
I tried:
curl -XGET "http://localhost:9200/my_base.main_candidate/_search" -d '
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"terms": {
"specific_location.location_tags" : ["Mumbai"]
}
}
}
}
}'
which didn't work.
I got this output :
{
"took": 72,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
There are several ways to solve this. Perhaps the most immediate one is to search for mumbai instead of Mumbai.
If I create the index with no mapping,
curl -XDELETE "http://localhost:9200/my_base.main_candidate/"
curl -XPUT "http://localhost:9200/my_base.main_candidate/"
then add a doc:
curl -XPUT "http://localhost:9200/my_base.main_candidate/doc/1" -d'
{
"specific_location": {
"location_name": "Mumbai",
"location_tags": [
"Mumbai"
],
"tags": [
"Mumbai"
]
}
}'
then run your query with the lower-case term
curl -XPOST "http://localhost:9200/my_base.main_candidate/_search" -d'
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"terms": {
"specific_location.location_tags": [
"mumbai"
]
}
}
}
}
}'
I get back the expected doc:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "my_base.main_candidate",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"specific_location": {
"location_name": "Mumbai",
"location_tags": [
"Mumbai"
],
"tags": [
"Mumbai"
]
}
}
}
]
}
}
This is because, since no explicit mapping was used, Elasticsearch uses defaults, which means the location_tags field will be analyzed with the standard analyzer, which will convert terms to lower-case. So the term Mumbai does not exist, but mumbai does.
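A rough Python sketch of that mismatch: the indexed terms go through analysis, while a terms filter looks up its values exactly as written. The rough_standard_analyzer function below is only a crude approximation of the standard analyzer (lowercase, split on non-word characters), and the other names are made up for illustration.

```python
import re


def rough_standard_analyzer(text):
    """Crude approximation of the standard analyzer: lowercase and split into words."""
    return re.findall(r"\w+", text.lower())


def build_term_set(values, analyzed=True):
    """Collect the terms that would end up in the index for a field."""
    terms = set()
    for value in values:
        if analyzed:
            terms.update(rough_standard_analyzer(value))
        else:
            terms.add(value)  # not_analyzed: the value is stored as a single term
    return terms


def term_matches(indexed_terms, term):
    """A term/terms filter does an exact lookup; the query value is not analyzed."""
    return term in indexed_terms


location_tags = ["Mumbai"]
analyzed_terms = build_term_set(location_tags, analyzed=True)
raw_terms = build_term_set(location_tags, analyzed=False)

print(term_matches(analyzed_terms, "Mumbai"))  # lookup against lowercased terms
print(term_matches(analyzed_terms, "mumbai"))
print(term_matches(raw_terms, "Mumbai"))       # lookup against the untouched value
```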
If you want to be able to use upper-case terms in your query, you will need to set up an explicit mapping that tells Elasticsearch not to analyze the location_tags field. Maybe something like this:
curl -XDELETE "http://localhost:9200/my_base.main_candidate/"
curl -XPUT "http://localhost:9200/my_base.main_candidate/" -d'
{
"mappings": {
"doc": {
"properties": {
"specific_location": {
"properties": {
"location_tags": {
"type": "string",
"index": "not_analyzed"
},
"tags": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}'
curl -XPUT "http://localhost:9200/my_base.main_candidate/doc/1" -d'
{
"specific_location": {
"location_name": "Mumbai",
"location_tags": [
"Mumbai"
],
"tags": [
"Mumbai"
]
}
}'
curl -XPOST "http://localhost:9200/my_base.main_candidate/_search" -d'
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"terms": {
"specific_location.location_tags": [
"Mumbai"
]
}
}
}
}
}'
Here is all the above code in a handy place:
http://sense.qbox.io/gist/74844f4d779f7c2b94a9ab65fd76eb0ffe294cbb
[EDIT: by the way, I used Elasticsearch 1.3.4 when testing the above code]
