FIELDDATA Data is too large - elasticsearch

I open Kibana, run a search, and get an error saying shards failed. I looked in the elasticsearch.log file and saw this error:
org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [#timestamp] would be larger than limit of [622775500/593.9mb]
Is there any way to increase that limit of 593.9mb?

You can try to increase the fielddata circuit breaker limit to 75% (default is 60%) in your elasticsearch.yml config file and restart your cluster:
indices.breaker.fielddata.limit: 75%
Or if you prefer to not restart your cluster you can change the setting dynamically using:
curl -XPUT localhost:9200/_cluster/settings -d '{
"persistent" : {
"indices.breaker.fielddata.limit" : "40%"
}
}'
Give it a try.
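Before changing any limits, it can also help to see the current breaker settings and how often each breaker has already tripped. Depending on your version, the node stats API exposes a breaker section (a quick check, assuming the node is reachable on localhost:9200):
curl -XGET 'localhost:9200/_nodes/stats/breaker?pretty'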

I ran into this problem, too.
Then I checked the fielddata memory
using the following request:
GET /_stats/fielddata?fields=*
The output shows:
"logstash-2016.04.02": {
"primaries": {
"fielddata": {
"memory_size_in_bytes": 53009116,
"evictions": 0,
"fields": {
}
}
},
"total": {
"fielddata": {
"memory_size_in_bytes": 53009116,
"evictions": 0,
"fields": {
}
}
}
},
"logstash-2016.04.29": {
"primaries": {
"fielddata": {
"memory_size_in_bytes":0,
"evictions": 0,
"fields": {
}
}
},
"total": {
"fielddata": {
"memory_size_in_bytes":0,
"evictions": 0,
"fields": {
}
}
}
},
You can see that my index names are date-based and that evictions are all 0. In addition, the 2016.04.02 index holds 53009116 bytes of fielddata memory, while 2016.04.29 holds 0.
So I conclude that the old data has occupied all the memory, the new data cannot use it, and when I run an aggregation query on the new data it raises the CircuitBreakingException.
You can set this in config/elasticsearch.yml:
indices.fielddata.cache.size: 20%
This lets ES evict fielddata when it reaches the memory limit.
But the real solution may be to add more memory in the future, and monitoring fielddata memory usage is a good habit.
More detail: https://www.elastic.co/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html
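For ongoing monitoring, the cat API gives a compact per-node, per-field view of fielddata memory (a minimal check; it lists every field currently holding fielddata):
GET /_cat/fielddata?v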

An alternative solution for the CircuitBreakingException: [FIELDDATA] Data too large error is to clean up the old/unused fielddata cache.
I found out that the fielddata limit is shared across indices, so clearing the cache of an unused index/field can solve the problem.
curl -X POST "localhost:9200/MY_INDICE/_cache/clear?fields=foo,bar"
For more info https://www.elastic.co/guide/en/elasticsearch/reference/7.x/indices-clearcache.html
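If you are not sure which index holds the stale fielddata, the same API can be called without an index to clear caches cluster-wide; note that this drops all fielddata, which is simply rebuilt on the next aggregation (a sketch, same localhost assumption as above):
curl -X POST "localhost:9200/_cache/clear?fielddata=true"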

I think it is important to understand why this is happening in the first place.
In my case, I had this error because I was running aggregations on "analyzed" fields. If you really need your string field to be analyzed, you should consider using multi-fields: keep it analyzed for searches and not_analyzed for aggregations, as sketched below.
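As a rough illustration, such a multi-field mapping could look like this in the ES 2.x-style string syntax (my_index, my_type and my_field are placeholders; in ES 5+ the equivalent is a text field with a keyword sub-field):
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "my_field": {
          "type": "string",
          "fields": {
            "raw": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}
Searches then target my_field, while aggregations target my_field.raw and avoid loading fielddata for the analyzed part.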

I ran into this issue the other day. In addition to checking the fielddata memory, I'd also consider checking the JVM and OS memory. In my case, the admin forgot to modify ES_HEAP_SIZE and left it at 1 GB.

Just use:
ES_JAVA_OPTS="-Xms10g -Xmx10g" ./bin/elasticsearch
Since the default heap is 1 GB, if your data is big you should set it bigger.
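For reference, depending on the Elasticsearch version the heap is more commonly configured through the environment or the jvm.options file rather than on the command line; two equivalent sketches (the 10g value is only an example):
# ES 2.x and earlier: environment variable
ES_HEAP_SIZE=10g ./bin/elasticsearch
# ES 5.x and later: set both values in config/jvm.options
-Xms10g
-Xmx10g
Whichever way you set it, keep Xms and Xmx equal and leave room for the OS filesystem cache (the usual guidance is no more than about half of the physical RAM).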

Related

cluster.routing.allocation.enable: none seems not working

I disabled shard allocation with the following snippet:
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.enable": "none"
}
}
I double-checked with GET _cluster/settings and confirmed it has been set to none.
But when I use the following snippet to move a shard between nodes, the move succeeds.
It looks like "cluster.routing.allocation.enable": "none" doesn't take effect?
POST /_cluster/reroute
{
"commands": [
{
"move": {
"index": "lib38",
"shard": 0,
"from_node": "node-1",
"to_node": "node-3"
}
}
]
}
Manually rerouting a shard will always take precedence over configuration.
cluster.routing.allocation.enable is only giving a hint to the cluster that automatic reallocation should not take place.
In your other question, you were concerned about automatic rebalancing, it seems.
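For completeness, once the maintenance is done the same setting accepts "all" to turn automatic allocation back on (the counterpart of the snippet above):
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}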

Fielddata is disabled on text fields by default

I've encountered a classic problem; however, no page on SO or any other Q&A site or forum has helped me.
I need to extract the numerical value of the parameter "wsProcessingElapsedTimeMS" out of a string like the following (where the parameter is contained in the message field):
2018-07-31 07:37:43,740|DEBUG|[ACTIVE] ExecuteThread: '43' for queue:
'weblogic.kernel.Default (self-tuning)'
|LoggerHandler|logMessage|sessionId=9AWTu
wsOperationName=FindBen wsProcessingEndTime=2018-07-31 07:37:43.738
wsProcessingElapsedTimeMS=6 httpStatus=200 outgoingAddress=172.xxx.xxx.xxx
and keep getting this error:
"type":"illegal_argument_exception","reason":"Fielddata is disabled on text
fields by default. Set fielddata=true on [message] in order to load fielddata
in memory by uninverting the inverted index. Note that this can however use
significant memory. Alternatively use a keyword field instead."
The point is, I've already run the query (via Dev Tools in the Kibana GUI, if that matters) to enable fielddata on a field in the following way:
PUT my_index/_mapping/message
{
"message": {
"properties": {
"publisher": {
"type": "text",
"fielddata": true
}
}
}
}
which returned a brief confirmation:
{
"acknowledged": true
}
After that, I tried to rebuild the index like:
POST _reindex?wait_for_completion=false
{
"source": {
"index": "my_index"
},
"dest": {
"index": "my_index"
}
}
(the ?wait_for_completion=false flag is set because otherwise it timed out; there's a lot of data in the system now).
Finally, having performed the above steps, I also tried restarting the Kibana and Elasticsearch services (processes) to force reindexing (which took really long).
Also, using "message.keyword" instead of "message" (as suggested in the official documentation) does not help - it's just empty in most cases.
I'm using the Kibana to access the ElasticSearch engine.
ElasticSearch v. 5.6.3
Kibana v. 5.6.3
Logstash v. 5.5.0
Any suggestion will be appreciated, even regarding the use of additional plugins (provided they have a release compatible with the above Kibana/Elasticsearch/Logstash versions, as I can't update them right now).
PUT your_index/_mapping/your_type
{
"your_type": {
"properties": {
"publisher": {
"type": "text",
"fielddata": true
}
}
}
}
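Whichever field you enable fielddata on, it may help to verify afterwards that the flag really landed on the field you aggregate on, for example by reading the mapping back (index name as in the snippet above):
GET your_index/_mapping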

Reindex fail due to SearchContextMissingException

My company is using elasticsearch 2.3.4.
We have a cluster that contains 38 ES nodes, and we've been having a problem with reindexing some of our data lately...
We've reindexed very large indexes before and had no problems, but recently, when trying to reindex much smaller indexes (less than 10 GB), we get: "SearchContextMissingException [No search context found for id [XXX]]".
We have no idea what's causing this problem or how to fix it. We'd like some guidance.
Has anyone seen this exception before?
From GitHub comments on related issues, I think this can be avoided by changing the batch size:
From documentation:
By default _reindex uses scroll batches of 1000. You can change the batch size with the size field in the source element:
POST _reindex
{
"source": {
"index": "source",
"size": 100
},
"dest": {
"index": "dest",
"routing": "=cat"
}
}
I had the same problem with an index that holds many huge documents. I had to reduce the batch size down to 10 (neither 100 nor 50 worked).
This was the request that worked in the end:
POST _reindex?slices=5&refresh
{
"source": {
"index": "source_index",
"size": 10
},
"dest": {
"index": "dest_index"
}
}
You should also set the slices to the number of shards you have in your index.
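To pick a sensible value for slices, you can look up how many primary shards the source index has, for example with the cat shards API (source_index is the placeholder name from the snippet above):
GET _cat/shards/source_index?v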

Elasticsearch memory usage of fielddata

I created a simple autocomplete using ElasticSearch following the article http://www.bilyachat.com/2015/07/search-like-google-with-elasticsearch.html.
The below is a simple example.
https://gist.github.com/jinkyou/ac92c0d9fc53860b703ac773af03b0da
At first, I didn't set "fielddata": true on the autocomplete property.
Then ES returned an error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [autocomplete] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
      }
    ]
  }
}
So I added "fielddata": true to the autocomplete property and it worked fine.
But after the number of documents increased, a [fielddata] Data too large, data for [autocomplete] would be larger than limit of [249416908/237.8mb]] error occurred.
Here are my questions:
Is this the right way to implement 'autocomplete'?
I thought it was a common feature of search engines, but there is no suggester for it in ES. Am I right?
Is 'fielddata' necessary for this function?
I think that even if it's not strictly necessary for the function, I need it to respond quickly.
How can I reduce the fielddata size? Or increase the fielddata limit?
I'm going to try decreasing max_shingle_size. If there is any other good way, please let me know.
What is the real memory usage of ES?
I want to figure out how memory consumption grows with document size, but the result of GET _nodes/stats contains something strange:
{
"os": {
"mem": {
"total_in_bytes": 128922271744,
"free_in_bytes": 2966560768,
"used_in_bytes": 125955710976,
"free_percent": 2,
"used_percent": 98
}
}
}
I just started ES and there is no index yet, but it shows that almost all memory is in use.
Thank you so much for reading this.
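Regarding the last question: as far as I understand, os.mem in the node stats describes memory used by the whole operating system (on Linux this includes the filesystem cache, which is why it can look nearly full even with no indices), while fielddata and the circuit breakers are sized against the JVM heap. Comparing the jvm section is usually more telling (a quick check, not specific to this setup):
GET _nodes/stats/jvm,os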

elasticsearch in memory speed

I'm trying to test how much faster the in-memory solution would be with Elasticsearch.
For this, I wrote a test in which I'm generating ~10 million records and then performing a text search. Results come back in 3-20 ms, but there is no difference (at all) between searching in memory and without this setting. Is that possible? Are 10 million records too few to see any difference? I'm not even 100% sure whether I enabled the in-memory mode correctly. I'm loading the settings from a JSON file, into which I put some settings I found on the internet that were supposed to improve the overall solution, but it seems like it's not working at all.
The settings regarding the index look like this:
"index": {
"store": {
"type":"memory"
},
"merge": {
"policy": {
"use_compound_file": false
}
},
"translog": {
"flush_threshold": 50000
},
"engine": {
"robin": {
"refresh_interval": 2
}
},
"cache": {
"field": {
"max_size": 500000,
"expire": "30m"
}
}
},
"indices": {
"memory": {
"index_buffer_size": 256
}
},
I don't know whether you are using in-memory storage wisely or not; you can check which store type you actually need here.
But you have to provide the store setting while creating the index (make sure the index doesn't already exist).
Try this:
curl -XPUT "http://localhost:9200/my_index/" -d'
{
"settings": {
"index.store.type": "memory"
}
}'
This will create an index that is stored in main memory, using Lucene's RamIndexStore.
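To double-check that the store type was actually applied at creation time, you can read the index settings back (same index name as above):
curl -XGET "http://localhost:9200/my_index/_settings?pretty"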
