ElasticSearch 5.1: Fielddata is disabled on text fields by default [ERROR when trying to use an aggregation on a field]

Having this field in my mapping:
"answer": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
},
I try to execute this aggregation:
"aggs": {
  "answer": {
    "terms": {
      "field": "answer"
    }
  }
}
but I get this error:
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [answer] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
Do I have to change my mapping or am I using the wrong aggregation? (I just updated from 2.x to 5.1)

You need to aggregate on the keyword sub-field, like this:
"aggs": {
"answer": {
"terms": {
"field": "answer.keyword"
}
},
That will work.

In the aggregation, just add .keyword to answer. It worked for me. For text fields we need to use the .keyword sub-field:
"field": "answer.keyword"

Adding to @Val's answer, you can also set fielddata to true in the mapping itself:
"answer": {
"type": "text",
"fielddata": true, <-- add this line
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
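If the index already exists, the same flag can also be enabled afterwards through the put-mapping API. A minimal sketch, assuming ES 5.x and placeholder index/type names (my_index, my_type):
PUT my_index/_mapping/my_type
{
  "properties": {
    "answer": {
      "type": "text",
      "fielddata": true     <-- enables fielddata on the existing text field
    }
  }
}
Keep in mind the warning in the error message, though: fielddata lives in heap memory, so aggregating on the keyword sub-field is usually the cheaper option.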

Related

Elasticsearch match string with spaces, colons, dashes exactly

I'm using Elasticsearch 6.8 and trying to write a query in a Python notebook. Here is the mapping used for the index I'm working with:
{ "mapping": { "news": { "properties": { "dateCreated": { "type": "date", "format": "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis" }, "itemId": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "market": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "timeWindow": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "title": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } } } } } }
I'm trying to search for the exact string "[2020-08-16 10:00:00.0,2020-08-16 11:00:00.0]" in the "timeWindow" field (which is a "text" type, not a "date" field), and also select by market="en-us" (market is a "text" field too). This string has spaces, colons, commas, and a lot of other whitespace characters, and I don't know how to build the right query.
At the moment I have this query:
res = es.search(index='my_index',
                doc_type='news',
                body={
                    'size': size,
                    'query': {
                        "bool": {
                            "must": [
                                {
                                    "simple_query_string": {
                                        "query": "[2020-08-17 00:00:00.0,2020-08-17 01:00:00.0]",
                                        "default_operator": "and",
                                        "minimum_should_match": "100%"
                                    }
                                },
                                {"match": {"market": "en-us"}}
                            ]
                        }
                    }
                })
The problem is that it doesn't match my "simple_query_string" for the timeWindow string exactly (I understand that this string gets tokenized, split into parts like "2020", "08", "17", "00", "01", etc., and each token is analyzed separately), and I'm getting different values for timeWindow that I want to exclude, like
['[2020-08-17 00:00:00.0,2020-08-17 01:00:00.0]'
'[2020-08-17 00:05:00.0,2020-08-17 01:05:00.0]'
...
'[2020-08-17 00:50:00.0,2020-08-17 01:50:00.0]'
'[2020-08-17 00:55:00.0,2020-08-17 01:55:00.0]'
'[2020-08-17 01:00:00.0,2020-08-17 02:00:00.0]']
Is there a way to do what I want?
UPD (and answer):
My current query uses "term" and "timeWindow.keyword"; this combination allows me to do an exact search for a string with spaces and other whitespace characters:
res = es.search(index='msn_click_events', doc_type='news', body={
    'size': size,
    'query': {
        "bool": {
            "must": [
                {
                    "term": {
                        "timeWindow.keyword": tw
                    }
                },
                {"match": {"market": "en-us"}}
            ]
        }
    }
})
And this query selects only the right timeWindow values (strings):
['[2020-08-17 00:00:00.0,2020-08-17 01:00:00.0]'
'[2020-08-17 01:00:00.0,2020-08-17 02:00:00.0]'
'[2020-08-17 02:00:00.0,2020-08-17 03:00:00.0]'
...
'[2020-08-17 22:00:00.0,2020-08-17 23:00:00.0]'
'[2020-08-17 23:00:00.0,2020-08-18 00:00:00.0]']
On your timeWindow field you need a keyword, i.e. exact, search, but you are using a full-text query. As you defined this field as a text field (and as you already guessed correctly), it gets analyzed at index time, hence you are not getting the correct results.
If you are using dynamic mapping, then a .keyword sub-field is generated for each text field in the mapping, so you can simply use timeWindow.keyword in your query and it will work.
If you defined the mapping yourself, then you need to add a keyword field to store the timeWindow, reindex the data, and use that keyword field in the query to get the expected results; a rough sketch follows below.
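The sketch assumes ES 6.8, the news mapping type from the question, and my_index as a placeholder index name: add the keyword sub-field to the existing mapping, then run an update-by-query so existing documents are re-indexed in place and the new sub-field gets populated.
PUT my_index/_mapping/news
{
  "properties": {
    "timeWindow": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword", "ignore_above": 256 }     <-- new sub-field
      }
    }
  }
}

POST my_index/_update_by_query?conflicts=proceed
After that, the term query on timeWindow.keyword from the UPD above should only return exact matches.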

How to set elasticsearch index mapping as not_analysed for all the fields

I want my Elasticsearch index to match the exact value for all the fields. How do I map my index as "not_analysed" for all the fields?
I'd suggest making use of multi-fields in your mapping (which is the default behaviour if you aren't creating the mapping yourself, i.e. dynamic mapping).
That way you can switch between full-text search and exact-match search when required.
Note that for exact matches you need the keyword datatype plus a Term Query; a minimal sketch is shown below.
Hope it helps!
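A minimal sketch of that combination (the index, type and field names here are made up for illustration): a text field with a keyword sub-field, and a term query against the sub-field for the exact match.
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "color": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        }
      }
    }
  }
}

GET my_index/_search
{
  "query": {
    "term": {
      "color.keyword": "Dark Red"     <-- exact, case-sensitive match on the keyword sub-field
    }
  }
}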
You can use a dynamic_templates mapping for this. By default, Elasticsearch maps string fields as text with index: true, like below:
{
  "products2": {
    "mappings": {
      "product": {
        "properties": {
          "color": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "type": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
As you can see, it also creates a keyword multi-field. This keyword field is indexed, but not analyzed like text. If you want to drop this default behaviour, you can use the configuration below while creating the index:
PUT products
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "product": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword",
              "index": false
            }
          }
        }
      ]
    }
  }
}
After doing this, the index mapping will look like below:
{
  "products": {
    "mappings": {
      "product": {
        "dynamic_templates": [
          {
            "strings": {
              "match_mapping_type": "string",
              "mapping": {
                "type": "keyword",
                "index": false
              }
            }
          }
        ],
        "properties": {
          "color": {
            "type": "keyword",
            "index": false
          },
          "type": {
            "type": "keyword",
            "index": false
          }
        }
      }
    }
  }
}
Note: I don't know your exact use case, but you can use the multi-field feature as mentioned by @Kamal. Otherwise, you cannot search on fields that are not indexed at all. You can also use a dynamic_templates mapping to keep some fields searchable; see the template sketch below.
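For example, a variant of the template above (a sketch; the index name is made up) maps every string to a plain keyword field but leaves it indexed, so exact-value term queries still work even though nothing is analyzed as full text:
PUT products_exact
{
  "mappings": {
    "product": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"     <-- indexed (the default), but not analyzed
            }
          }
        }
      ]
    }
  }
}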
Please check the documentation for more information :
https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html
I also explained this behaviour in an article; sorry about that, but it is in Turkish. You can check the example code samples with Google Translate if you want.

ElasticSearch Reindex API not analyzing the new field

I have an existing index named "Docs" which has documents in it.
I am creating a new index named "Docs1", exactly the same as "Docs", with only one extra sub-field (with an analyzer) on one property, which I want to use for autocomplete purposes.
Property in "Docs" index
"name": {
"type": "text",
"analyzer": "text_standard_analyzer",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
Property in the "Docs1" index going to be
{
  "name": {
    "type": "text",
    "analyzer": "text_standard_analyzer",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      },
      "pmatch": {
        "type": "text",
        "analyzer": "text_partialmatching_analyzer"
      }
    }
  }
}
I am using the Reindex API to copy records from "Docs" to "Docs1":
POST _reindex
{
  "source": {
    "index": "Docs"
  },
  "dest": {
    "index": "Docs1"
  }
}
When I reindex, I expect the older documents to contain the new sub-field with the information in that field.
I am noticing that the new sub-field in my destination index "Docs1" is not analyzed for existing data, but it is analyzed for any new documents I add.
Please suggest.
Reindexing after adding "type" to the destination worked:
POST _reindex
{
  "source": {
    "index": "sourceindex"
  },
  "dest": {
    "index": "destindex",
    "type": "desttype"
  }
}
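As a quick sanity check after the reindex (a sketch reusing the name.pmatch sub-field from the question's mapping; whether a given partial term matches depends on how text_partialmatching_analyzer is defined), query the new sub-field directly and confirm that the old documents come back:
GET destindex/_search
{
  "query": {
    "match": {
      "name.pmatch": "doc"     <-- sample partial input, adjust to your data
    }
  }
}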

Elasticsearch 5.2.2: terms aggregation case insensitive

I am attempting to do a case-insensitive aggregation on a keyword type field, but I'm having issues in getting this to work.
What I have tried so far is to add a custom analyzer called "lowercase", which uses the "keyword" tokenizer and the "lowercase" filter. I then added a field to the mapping called "use_lowercase" for the field I want to work with. I wanted to retain the existing "text" and "keyword" field components as well, since I may want to search for terms within the field.
Here is the index definition, including the custom analyzer:
PUT authors
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "lowercase"
        }
      }
    }
  },
  "mappings": {
    "famousbooks": {
      "properties": {
        "Author": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
            "use_lowercase": {
              "type": "text",
              "analyzer": "lowercase"
            }
          }
        }
      }
    }
  }
}
Now I add 2 records with the same Author, but with different case:
POST authors/famousbooks/1
{
  "Book": "The Mysterious Affair at Styles",
  "Year": 1920,
  "Price": 5.92,
  "Genre": "Crime Novel",
  "Author": "Agatha Christie"
}

POST authors/famousbooks/2
{
  "Book": "And Then There Were None",
  "Year": 1939,
  "Price": 6.99,
  "Genre": "Mystery Novel",
  "Author": "Agatha christie"
}
So far so good. Now if I do a terms aggregation based on Author,
GET authors/famousbooks/_search
{
  "size": 0,
  "aggs": {
    "authors-aggs": {
      "terms": {
        "field": "Author.use_lowercase"
      }
    }
  }
}
I get the following result:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [Author.use_lowercase] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "authors",
        "node": "yxcoq_eKRL2r6JGDkshjxg",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [Author.use_lowercase] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [Author.use_lowercase] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
    }
  },
  "status": 400
}
So it seems to me that the aggregation thinks the field is text instead of keyword, and hence gives me the fielddata error. I would have thought that ES would be sophisticated enough to recognize that the terms field is in fact a keyword (via the custom analyzer) and therefore aggregatable, but that doesn't appear to be the case.
If I add "fielddata":true to the mapping for Author, the aggregation then works fine, but I'm hesitant to do this given the dire warnings of high heap usage when setting this value.
Is there a best practice for doing this type of case-insensitive keyword aggregation? I was hoping I could just say "type": "keyword", "filter": "lowercase" in the mappings section, but that doesn't seem to be available.
It feels like I'm having to use too big of a stick to get this to work if I go the "fielddata":true route. Any help on this would be appreciated!
Turns out the solution is to use a custom normalizer instead of a custom analyzer.
PUT authors
{
  "settings": {
    "analysis": {
      "normalizer": {
        "myLowercase": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "famousbooks": {
      "properties": {
        "Author": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
            "use_lowercase": {
              "type": "keyword",
              "normalizer": "myLowercase",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
This then allows terms aggregation using field Author.use_lowercase without issue.
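For reference, the aggregation from the question then runs unchanged against the normalized sub-field; with the two sample documents above, both spellings should land in a single lowercased bucket (agatha christie):
GET authors/famousbooks/_search
{
  "size": 0,
  "aggs": {
    "authors-aggs": {
      "terms": {
        "field": "Author.use_lowercase"
      }
    }
  }
}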
It seems this is not possible by default (without the "lowercase" normalizer), but without it you can use a trick: translate the string into a case-insensitive regex match.
E.g. for the string "bar", a case-insensitive regex would be "[bB][aA][rR]".
I used a Python helper for doing this:
import itertools

def case_insensitive_regex_from_string(v):
    # "bar" -> "[bB][aA][rR]": wrap every character in a class that
    # contains both its original and its case-swapped form.
    if not v:
        return v
    zip_obj = zip(itertools.cycle('['), v, v.swapcase(), itertools.cycle(']'))
    return ''.join(''.join(x) for x in zip_obj)
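One place such a pattern can be plugged in (a sketch; the field name is taken from the question's mapping, and the pattern shown is the helper's output for "Agatha Christie") is a regexp query against the raw keyword sub-field:
GET authors/famousbooks/_search
{
  "query": {
    "regexp": {
      "Author.keyword": "[Aa][gG][aA][tT][hH][aA][  ][Cc][hH][rR][iI][sS][tT][iI][eE]"
    }
  }
}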
Well you did define use_lowercase as text:
"use_lowercase": {
"type": "text",
"analyzer": "lowercase"
}
Try defining it as type: keyword; it helped me with a similar problem I had with sorting.

elasticsearch 5.2 sorting with ICU plugin needs fielddata = true?

I want to sort Elasticsearch result documents with the icu_collation filter. So I have these settings for the index:
"settings": {
"analysis": {
"analyzer": {
"ducet_sort": {
"tokenizer": "keyword",
"filter": [ "icu_collation" ]
}
}
}
}
and mappings:
"mappings": {
  "card": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "sort": {
            "type": "text",
            "analyzer": "ducet_sort",
            "index": false
          }
        }
      }
    }
  }
}
and query:
{
"sort": ["title.sort"]
}
But the query failed:
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title.sort] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
In the documentation the suggested data type for sorting is keyword, but the keyword data type doesn't support an analyzer. In addition, fielddata is not recommended (see the documentation).
So is there a way to sort documents in Elasticsearch with a specific collation, e.g. icu_collation, without fielddata=true?
Thank you.
In Kibana, open the Dev Tools option from the left menu and execute the query below, after updating it according to your settings.
PUT INDEX_NAME/_mapping/TYPE_NAME?update_all_types
{
  "properties": {
    "FIELD_NAME": {
      "type": "text",
      "fielddata": true
    }
  }
}
Or, through curl or a terminal like Cygwin (for Windows), execute the query below, after updating it according to your settings.
curl -XPUT http://DOCKER_MACHINE_IP:9200/INDEX_NAME -d '{
  "mappings": {
    "type": {
      "properties": {
        "FIELD_NAME": {
          "type": "text",
          "fielddata": true
        }
      }
    }
  }
}'
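Once fielddata is enabled on the field, sorting on it should work again; a minimal sketch with the same placeholders as above:
GET INDEX_NAME/_search
{
  "sort": ["FIELD_NAME"]
}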
