Enabled fielddata on a text field in Elasticsearch, but the aggregation is not working

According to the documentation, you can run Elasticsearch aggregations on fields of type keyword, on other non-text fields, or on text fields that have fielddata set to true in the index mapping.
I am trying to count city_name values in an nginx log. The aggregation works fine with the int field result, but it does not work with the field city_name, even after I updated the index mapping to set fielddata=true. That should not have been required, as the field was of type keyword.
By "does not work" I mean that the buckets come back empty:
"aggregations" : {
  "cities" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [ ]
  }
}
Here is the field mapping:
"city_name" : {
  "type" : "text",
  "fielddata" : true
}
And here is the aggregation query:
curl -XGET --user $pwd --header 'Content-Type: application/json' 'https://58571402f5464923883e7be42a037917.eu-central-1.aws.cloud.es.io:9243/logstash/_search?pretty' -d '{
  "aggs" : {
    "cities": {
      "terms" : { "field": "city_name"}
    }
  }
}'

If the search runs without any error, the problem is more likely in the data. Are you sure that at least one document has the field city_name populated?
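One way to act on this suggestion is to count the documents in which city_name actually has a value. The sketch below only builds the request body for an exists query. The index name logstash comes from the question, while the helper name build_exists_count_body and the idea of sending the body to the _count endpoint are illustrative assumptions, not part of the original answer.

```python
import json


def build_exists_count_body(field):
    """Return a body for `GET <index>/_count` that matches only documents
    in which `field` has at least one indexed value."""
    return {"query": {"exists": {"field": field}}}


body = build_exists_count_body("city_name")

# Paste the printed JSON under `GET logstash/_count` in Kibana Dev Tools,
# or pass it to curl with -d. A count of 0 would mean that no document has
# city_name populated, which would explain the empty buckets.
print(json.dumps(body, indent=2))
```

If the count is greater than zero and the buckets are still empty, the data is probably not the cause and the mapping deserves a second look.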
I tried to reproduce your issue with ElasticSearch 6.6.2.
I created an index
PUT cities
{
  "mappings": {
    "city": {
      "dynamic": "true",
      "properties": {
        "id": {
          "type": "long"
        },
        "city_name": {
          "type": "text",
          "fielddata": true
        }
      }
    }
  }
}
I added one document without city_name:
PUT cities/city/1
{
  "id": "1"
}
When I performed the search:
GET cities/_search
{
  "aggs": {
    "cities": {
      "terms" : { "field": "city_name"}
    }
  }
}
I got no buckets in the cities aggregation. But when I added one document with the city name filled in:
PUT cities/city/2
{
  "id": "2",
  "city_name": "London"
}
I got the expected result:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cities",
        "_type" : "city",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "id" : "2",
          "city_name" : "london"
        }
      },
      {
        "_index" : "cities",
        "_type" : "city",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "id" : "1"
        }
      }
    ]
  },
  "aggregations" : {
    "cities" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "london",
          "doc_count" : 1
        }
      ]
    }
  }
}


Why can't Elasticsearch find a document that contains a given word?

I am using the default settings for an index. The following DSL shows how I create the documents and search them.
### create index
PUT /mk_test
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "_doc": {
      "properties": {
        "nickName": {
          "type": "text"
        }
      }
    }
  }
}
### get index
GET /mk_test/_mapping
### create document
POST /mk_test/_doc
{
  "nickName": "C.BP"
}
### create document
POST /mk_test/_doc
{
  "nickName": "BP"
}
### create document
POST /mk_test/_doc
{
  "nickName": "C.B"
}
### create document
POST /mk_test/_doc
{
  "nickName": "你好,中国"
}
Now I have 4 documents in the mk_test index, and I have 2 search queries that give me different results.
First, I want to find the documents that contain "中国":
GET /mk_test/_search
{
  "query": {
    "bool": {
      "must": [
        {"match_phrase": {"nickName": "中国"}}
      ]
    }
  }
}
The server responds:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.5779729,
    "hits" : [
      {
        "_index" : "mk_test",
        "_type" : "_doc",
        "_id" : "c2gwwX0BTkUG9klh1b8k",
        "_score" : 1.5779729,
        "_source" : {
          "nickName" : "你好,中国"
        }
      }
    ]
  }
}
Next, I want to find the documents that contain "BP", but I can't get "C.BP":
GET /mk_test/_search
{
  "query": {
    "bool": {
      "must": [
        {"match_phrase": {"nickName": "BP"}}
      ]
    }
  }
}
The server returns only "BP"; "C.BP" is not found:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.4599355,
    "hits" : [
      {
        "_index" : "mk_test",
        "_type" : "_doc",
        "_id" : "TmguwX0BTkUG9klhAJ_S",
        "_score" : 1.4599355,
        "_source" : {
          "nickName" : "BP"
        }
      }
    ]
  }
}
How can I find both "BP" and "C.BP"?
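A likely explanation, offered here as a hedged sketch rather than a confirmed answer: with the default mapping, nickName is analyzed by the standard analyzer, which follows Unicode word-boundary rules. Under those rules a period between two letters does not split a word, so "C.BP" is indexed as the single token c.bp and a phrase query for "BP" (token bp) cannot match it. Han characters, by contrast, are emitted one per token, so "中国" matches as the adjacent tokens 中 and 国. The Python below imitates that behaviour with a simplified regular expression. rough_standard_tokens and phrase_matches are hypothetical helpers and not Elasticsearch code.

```python
import re

# Simplified stand-in for the standard tokenizer (an approximation only):
# - runs of ASCII letters/digits, optionally joined by "." or "'" when the
#   punctuation sits between two alphanumeric runs, form a single token
# - each CJK ideograph in the basic block becomes its own token
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:[.'][A-Za-z0-9]+)*|[\u4e00-\u9fff]")


def rough_standard_tokens(text):
    """Tokenize and lowercase, roughly like the default analyzer."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def phrase_matches(query, document):
    """True if the query tokens occur consecutively in the document tokens."""
    q = rough_standard_tokens(query)
    d = rough_standard_tokens(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


for name in ["C.BP", "BP", "C.B", "你好,中国"]:
    print(name, rough_standard_tokens(name))

print(phrase_matches("BP", "C.BP"))         # False: the index holds "c.bp"
print(phrase_matches("BP", "BP"))           # True
print(phrase_matches("中国", "你好,中国"))   # True
```

Before changing the mapping, it is worth confirming the real tokens with the _analyze API on the nickName field. If they match this sketch, finding both names would require a different analysis for the field, for example an analyzer that also splits on the period.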

Why does an Elasticsearch wildcard query return no results?

Query #1 in Kibana returns results, however Query #2 returns no results. I search for only "bob" and get results, but when searching for "bob smith", no results, even though "Bob Smith" exists in the index. Any reason why?
Query #1: returns results
GET people/_search
{
  "query": {
    "wildcard" : {
      "name" : "*bob*"
    }
  }
}
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
"hits" : {
"total" : {
"value" : 23,
"relation" : "eq"
"max_score" : 1.0,
"hits" : [
"_index" : "people",
"_type" : "_doc",
"_id" : "xxxxx",
"_score" : 1.0,
"_source" : {
"name" : "Bob Smith",
Query #2: returns nothing. Why?
GET people/_search
{
  "query": {
    "wildcard" : {
      "name" : "*bob* *smith*"
    }
  }
}
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
It looks like the reason for the empty result is your index mapping. If the field is of type "text", you are actually searching the inverted index, which (with the standard analyzer) holds the separate tokens "bob" and "smith" rather than the whole string "Bob Smith". A wildcard pattern is matched against one token at a time, so a pattern that contains a space cannot match either token. If you want to search "Bob Smith" as one token, you need to use the "keyword" type (perhaps with a lowercase normalizer, if you want the search to be case-insensitive).
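The difference between the two field types can be imitated in a few lines of Python. This is only a sketch: fnmatch stands in for the wildcard query, a whitespace split plus lowercasing stands in for the standard analyzer, and the helper names are made up for illustration.

```python
from fnmatch import fnmatchcase


def text_field_terms(value):
    """Rough stand-in for a `text` field: one lowercased term per word."""
    return value.lower().split()


def keyword_field_terms(value):
    """Rough stand-in for a `keyword` field with a lowercase normalizer:
    the whole value is kept as a single term."""
    return [value.lower()]


def wildcard_hits(pattern, terms):
    """A wildcard pattern must match one whole term; it never spans terms."""
    return any(fnmatchcase(term, pattern) for term in terms)


name = "Bob Smith"
print(wildcard_hits("*bob*", text_field_terms(name)))             # True
print(wildcard_hits("*bob* *smith*", text_field_terms(name)))     # False
print(wildcard_hits("*bob* *smith*", keyword_field_terms(name)))  # True
```

The second call fails because neither "bob" nor "smith" contains a space, which mirrors the empty result of Query #2.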
For example:
PUT test
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword",
        "ignore_above": 256,
        "normalizer": "lowercase_normalizer"
      }
    }
  }
}
PUT test/_doc/1
{
  "name" : "Bob Smith"
}
GET test/_search
{
  "query": {
    "wildcard": {
      "name": "*bob* *Smith*"
    }
  }
}

Delete all documents whose _id starts with a number in Elasticsearch

What is the fastest way to get all _ids?
I need a query to delete all documents whose _id starts with a number in Elasticsearch.
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
"max_score" : 1.0,
"hits" : [
"_index" : "myindex",
"_type" : "_doc",
"_id" : "_2432475",
"_score" : 1.0,
"_source" : {
"name" : "999",
"file" : null,
"age" : null,
Your best bet is to first copy the internal _id into a doc-level field (let's call it internal_id):
POST myindex/_update_by_query
{
  "query": {
    "match_all": {}
  },
  "script": {
    "source": "ctx._source.internal_id = ctx._id",
    "lang": "painless"
  }
}
and then use a match_phrase_prefix query like so:
GET myindex/_search
{
  "query": {
    "match_phrase_prefix": {
      "internal_id": "_24"
    }
  }
}
POST /myindex/_delete_by_query
{
  "query": {
    "terms": {
      "_id": [ "1", "2" ]
    }
  }
}
Wildcards on _id are not supported in Elasticsearch. Either you have to index a similar key explicitly into the document, or you can update the documents using _update_by_query and add the _id value to them as a field.
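If the _id values have already been collected on the client (for example by paging through the index, which is assumed here and not shown), the terms approach above can be driven by a small script. The sketch below keeps the ids that start with a digit and groups them into _delete_by_query bodies. The function name, the batch size and the sample ids are illustrative assumptions.

```python
import json

DIGITS = "0123456789"


def delete_bodies_for_numeric_ids(ids, batch_size=1000):
    """Yield _delete_by_query bodies for ids whose first character is a digit."""
    numeric = [i for i in ids if i and i[0] in DIGITS]
    for start in range(0, len(numeric), batch_size):
        batch = numeric[start:start + batch_size]
        yield {"query": {"terms": {"_id": batch}}}


sample_ids = ["_2432475", "1", "2abc", "x9", "42"]  # made-up ids
bodies = list(delete_bodies_for_numeric_ids(sample_ids, batch_size=2))

for body in bodies:
    # Each body would be sent as: POST myindex/_delete_by_query
    print(json.dumps(body))
```

Batching keeps each terms list at a manageable size. The right batch size depends on the cluster and is left as a parameter here.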

ElasticSearch join data within the same index

I am quite new to Elasticsearch and I am collecting some application logs in a single index. The documents have this format:
{
  "_index" : "app_logs",
  "_type" : "_doc",
  "_id" : "JVMYi20B0a2qSId4rt12",
  "_source" : {
    "username" : "mapred",
    "app_id" : "application_1569623930006_490200",
    "event_type" : "STARTED",
    "ts" : "2019-10-02T08:11:53Z"
  }
}
I can have different event types. In this case I am interested in STARTED and FINISHED. I would like to query ES to get all the apps that started on a certain day and enrich them with their end time. Basically, I want to create start/end pairs (an end might also be missing, but that's fine).
I have realized that SQL-style joins cannot be used in ES, and I was wondering whether I can exploit some other feature to get this result in one query.
Edit: these are the details of the index mapping:
"app_logs" : {
  "mappings" : {
    "_doc" : {
      "properties" : {
        "event_type" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "app_id" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "ts" : {
          "type" : "date"
        },
        "event_type" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
As I understand it, you want to collate the documents that share the same app_id and whose status is either STARTED or FINISHED.
Elasticsearch is not really meant to perform JOIN operations. You can do it, but then you have to design your documents as mentioned in this link.
What you need is an aggregation query.
Below are the sample mapping, the documents, the aggregation query and the response as it appears, which should help you get the desired result.
PUT mystatusindex
{
  "mappings": {
    "properties": {
      "event_type": { "type": "keyword" },
      "app_id": { "type": "keyword" },
      "ts": { "type": "date" }
    }
  }
}
Sample Documents
POST mystatusindex/_doc/1
{
  "username" : "mapred",
  "app_id" : "application_1569623930006_490200",
  "event_type" : "STARTED",
  "ts" : "2019-10-02T08:11:53Z"
}
POST mystatusindex/_doc/2
{
  "username" : "mapred",
  "app_id" : "application_1569623930006_490200",
  "event_type" : "FINISHED",
  "ts" : "2019-10-02T08:12:53Z"
}
POST mystatusindex/_doc/3
{
  "username" : "mapred",
  "app_id" : "application_1569623930006_490201",
  "event_type" : "STARTED",
  "ts" : "2019-10-02T09:30:53Z"
}
POST mystatusindex/_doc/4
{
  "username" : "mapred",
  "app_id" : "application_1569623930006_490202",
  "event_type" : "STARTED",
  "ts" : "2019-10-02T09:45:53Z"
}
POST mystatusindex/_doc/5
{
  "username" : "mapred",
  "app_id" : "application_1569623930006_490202",
  "event_type" : "FINISHED",
  "ts" : "2019-10-02T09:45:53Z"
}
POST mystatusindex/_doc/6
{
  "username" : "mapred",
  "app_id" : "application_1569623930006_490203",
  "event_type" : "STARTED",
  "ts" : "2019-10-03T09:30:53Z"
}
POST mystatusindex/_doc/7
{
  "username" : "mapred",
  "app_id" : "application_1569623930006_490203",
  "event_type" : "FINISHED",
  "ts" : "2019-10-03T09:45:53Z"
}
POST mystatusindex/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "ts": {
              "gte": "2019-10-02T00:00:00Z",
              "lte": "2019-10-02T23:59:59Z"
            }
          }
        }
      ],
      "should": [
        { "match": { "event_type": "STARTED" } },
        { "match": { "event_type": "FINISHED" } }
      ],
      "minimum_should_match": 1
    }
  },
  "aggs": {
    "application_IDs": {
      "terms": {
        "field": "app_id"
      },
      "aggs": {
        "ids": {
          "top_hits": {
            "size": 10,
            "_source": ["event_type", "app_id"],
            "sort": [
              { "event_type": { "order": "desc"}}
            ]
          }
        }
      }
    }
  }
}
Notice that for filtering I've used a Range Query, since you only want documents for that date, and added bool should clauses to filter on STARTED and FINISHED. Because the bool query also contains a must clause, the should clauses are optional by default, so minimum_should_match is set to 1 to require that at least one of them matches.
Once I have the documents, I use a Terms Aggregation and a Top Hits Aggregation to get the desired result.
"took" : 12,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
"max_score" : null,
"hits" : [ ]
"aggregations" : {
"application_IDs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
"key" : "application_1569623930006_490200", <----- APP ID
"doc_count" : 2,
"ids" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
"max_score" : null,
"hits" : [
"_index" : "mystatusindex",
"_type" : "_doc",
"_id" : "1", <--- Document with STARTED status
"_score" : null,
"_source" : {
"event_type" : "STARTED",
"app_id" : "application_1569623930006_490200"
"sort" : [
"_index" : "mystatusindex",
"_type" : "_doc",
"_id" : "2", <--- Document with FINISHED status
"_score" : null,
"_source" : {
"event_type" : "FINISHED",
"app_id" : "application_1569623930006_490200"
"sort" : [
"key" : "application_1569623930006_490202",
"doc_count" : 2,
"ids" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
"max_score" : null,
"hits" : [
"_index" : "mystatusindex",
"_type" : "_doc",
"_id" : "4",
"_score" : null,
"_source" : {
"event_type" : "STARTED",
"app_id" : "application_1569623930006_490202"
"sort" : [
"_index" : "mystatusindex",
"_type" : "_doc",
"_id" : "5",
"_score" : null,
"_source" : {
"event_type" : "FINISHED",
"app_id" : "application_1569623930006_490202"
"sort" : [
"key" : "application_1569623930006_490201",
"doc_count" : 1,
"ids" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
"max_score" : null,
"hits" : [
"_index" : "mystatusindex",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"event_type" : "STARTED",
"app_id" : "application_1569623930006_490201"
"sort" : [
Note that the last document with only STARTED appears in the aggregation result as well.
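For comparison, the same pairing can be done on the client once the matching documents have been fetched. The sketch below works on plain dictionaries shaped like the sample documents (the username field is left out for brevity). pair_start_end is a hypothetical helper, and unlike the query above it only requires the STARTED event to fall on the given day. It is meant to show what the buckets represent, not to replace the aggregation.

```python
from collections import defaultdict

raw = [
    ("application_1569623930006_490200", "STARTED", "2019-10-02T08:11:53Z"),
    ("application_1569623930006_490200", "FINISHED", "2019-10-02T08:12:53Z"),
    ("application_1569623930006_490201", "STARTED", "2019-10-02T09:30:53Z"),
    ("application_1569623930006_490202", "STARTED", "2019-10-02T09:45:53Z"),
    ("application_1569623930006_490202", "FINISHED", "2019-10-02T09:45:53Z"),
    ("application_1569623930006_490203", "STARTED", "2019-10-03T09:30:53Z"),
    ("application_1569623930006_490203", "FINISHED", "2019-10-03T09:45:53Z"),
]
events = [{"app_id": a, "event_type": e, "ts": t} for a, e, t in raw]


def pair_start_end(events, day):
    """Map each app_id that STARTED on `day` (YYYY-MM-DD) to (start, end).

    `end` is None when no FINISHED event exists for that app.
    """
    by_app = defaultdict(dict)
    for event in events:
        if event["event_type"] in ("STARTED", "FINISHED"):
            by_app[event["app_id"]][event["event_type"]] = event["ts"]
    return {
        app: (seen["STARTED"], seen.get("FINISHED"))
        for app, seen in by_app.items()
        if "STARTED" in seen and seen["STARTED"].startswith(day)
    }


pairs = pair_start_end(events, "2019-10-02")
for app, (start, end) in sorted(pairs.items()):
    print(app, start, end)
```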
Updated Answer
"event_type.keyword":"STARTED" <----- Changed this
"event_type.keyword":"FINISHED" <----- Changed this
"field":"app_id.keyword" <----- Changed this
"event_type.keyword":{ <----- Changed this
Note the changes I've made. Whenever you need exact matches or want to run aggregations, you need to use a field of type keyword.
In the mapping you've shared, there is no username field but two event_type fields. I'm assuming that's just a human error and that one of the fields should be username.
Now, if you look carefully, the field event_type is of type text and has a sibling keyword field. I've modified the query to use the keyword field, and for that I'm using a Term Query.
Try this out and let me know if it helps!

Find list of Distinct string Values stored in a field in ElasticSearch

I have stored my data in Elasticsearch as shown below. My aggregation returns only the distinct words in the given field, not the entire distinct phrases.
"_index" : "test01",
"_type" : "whatever01",
"_id" : "1234",
"_score" : 1.0,
"_source" : {
"company_name" : "State Bank of India",
"user" : ""
"_index" : "test01",
"_type" : "whatever01",
"_id" : "5678",
"_score" : 1.0,
"_source" : {
"company_name" : "State Bank of India",
"user" : ""
"_index" : "test01",
"_type" : "whatever01",
"_id" : "8901",
"_score" : 1.0,
"_source" : {
"company_name" : "Kotak Mahindra Bank",
"user" : ""
I tried using a Terms Aggregation:
GET /test01/_search/
{
  "aggs" : {
    "genres" : {
      "terms" : { "field": "company_name"}
    }
  }
}
I get the following output:
"aggregations" : {
  "genres" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 10531,
    "buckets" : [
      {
        "key" : "bank",
        "doc_count" : 2818
      },
      {
        "key" : "mahindra",
        "doc_count" : 1641
      },
      {
        "key" : "state",
        "doc_count" : 1504
      }
    ]
  }
}
How can I get the entire string in the field "company_name", with only distinct values, as shown below?
"aggregations" : {
  "genres" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 10531,
    "buckets" : [
      {
        "key" : "Kotak Mahindra Bank",
        "doc_count" : 2818
      },
      {
        "key" : "State Bank of India",
        "doc_count" : 1641
      }
    ]
  }
}
It appears that you've set "fielddata": "true" on your field company_name, which is of type text. This is not a good idea, as it can end up consuming a lot of heap space, as mentioned in this link.
Furthermore, the values of fields of type text are broken down into tokens and saved in the inverted index by a process called analysis. With fielddata enabled on a text field, the aggregation runs on those tokens, which produces the per-word buckets you describe in your question.
What you need to do is create a sibling field of type keyword, as mentioned in this link, and perform the aggregation on that field.
Basically, modify your mapping for company_name as below:
PUT <your_index_name>
{
  "mappings": {
    "mydocs": {
      "properties": {
        "company_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
Run the aggregation query below on the company_name.keyword field and you'd get what you are looking for.
POST <your_index_name>/_search
{
  "aggs": {
    "unique_names": {
      "terms": {
        "field": "company_name.keyword",    <----- Run on this field
        "size": 10
      }
    }
  }
}
Hope this helps!
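The effect described in this answer can be reproduced offline. The following sketch counts, per term, how many documents contain it: once for an analyzed text field (approximated here by lowercasing and splitting on whitespace) and once for a keyword field that keeps the whole value. The helper name terms_agg is illustrative, and the three company names are the ones from the question.

```python
from collections import Counter

companies = ["State Bank of India", "State Bank of India", "Kotak Mahindra Bank"]


def terms_agg(values, analyze):
    """Count documents per term, similar to a terms aggregation's doc_count."""
    counts = Counter()
    for value in values:
        counts.update(set(analyze(value)))  # a document counts once per term
    return counts


# text field with fielddata: buckets are individual lowercased words
as_text = terms_agg(companies, lambda v: v.lower().split())

# keyword sub-field: buckets are the complete, unmodified strings
as_keyword = terms_agg(companies, lambda v: [v])

print(dict(as_text))
print(dict(as_keyword))
```

The keyword variant yields one bucket per complete company name, which is the output the question asks for.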
