buffer flush took longer time than slow_flush_log_threshold - elasticsearch

I am getting the following error in Kibana:
buffer flush took longer time than slow_flush_log_threshold
Does anyone know how to solve this?
Things done so far:
The ES data nodes were 95% full, so I freed up some space; they are now at 70%.
Cluster health was yellow; after freeing up space it turned green.
However, I am still not able to get data into Kibana.
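For reference, the per-node disk usage and the overall cluster status can be checked from Kibana Dev Tools with the standard cluster and cat APIs:
GET _cluster/health
GET _cat/allocation?v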

In Kibana Dev Tools, run this to clear the read-only block that Elasticsearch puts on indices once the flood-stage watermark is hit:
PUT /_all/_settings
{
  "index" : {
    "blocks" : {
      "read_only_allow_delete" : "false"
    }
  }
}
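To verify the block is gone, you can read the setting back (the GET index settings API lets you filter by setting name):
GET /_all/_settings/index.blocks.*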

Related

Elasticsearch posting documents: FORBIDDEN/12/index read-only / allow delete (api)

Running Elasticsearch version 7.3.0, I posted 50 million documents to my index. When trying to post more documents to Elasticsearch, I keep getting this message:
response code: 403 cluster_block_exception [FORBIDDEN/12/index read-only / allow delete (api)];
Disk watermark exceeded
I have 40 GB of free space and have extended the disk, but I still keep getting this error.
Any thoughts on what could be causing this?
You must have hit the flood-stage watermark at 95%. Where to go from here:
Free up disk space (you seem to have done that already).
Optionally change the default settings. Blocking write operations when only 5% of 400 GB is left might be a bit too aggressive. You can use either percentages or absolute values for this; the following is just a sample, and you might want to pick different values:
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "5gb",
    "cluster.routing.allocation.disk.watermark.high": "2gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "1gb",
    "cluster.info.update.interval": "1m"
  }
}
You must reset the index block (either per affected index or globally with _all):
PUT /_all/_settings
{
  "index.blocks.read_only_allow_delete": null
}
BTW, in 7.4 this will change: once you go back below the high watermark, the index will be unlocked automatically.
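If you are unsure which thresholds are currently in effect, you can dump the cluster settings including defaults and look for the watermark keys (include_defaults and flat_settings are standard query parameters):
GET _cluster/settings?include_defaults=true&flat_settings=true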

Performance when using NFS storage for Elasticsearch

I have a server with 32 cores and 62 GB of RAM, but we use NFS storage and I think it's starting to bottleneck our daily work. In Kibana, errors like queue_size are appearing more frequently. We just got a new (identical) server to use as a replica and share the load; will this help? What other recommendations do you have? We have multiple dashboards with around 20 different variables each; will the load be evenly distributed between the primary node and the replica? Unfortunately, local storage is not an option.
Are you actively indexing data on these nodes? If yes, you can increase refresh_interval:
PUT /myindex/_settings
{
  "index" : {
    "refresh_interval" : "30s"
  }
}
See https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html. A longer refresh interval makes the system less demanding on IO. You can also disable the refresh functionality completely and trigger it manually:
PUT /myindex/_settings
{
  "index" : {
    "refresh_interval" : "-1"
  }
}
POST /myindex/_refresh
Take a look at the Bulk API; it significantly decreases load during the indexing stage.
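As a rough sketch, a bulk request against the same myindex could look like this (the message field is just a placeholder; this assumes a 7.x cluster where mapping types are no longer required):
POST /myindex/_bulk
{"index":{}}
{"message":"first log line"}
{"index":{}}
{"message":"second log line"}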
Adding new servers to the cluster helps too; Elasticsearch is designed to scale horizontally. From my experience you can run 6-8 virtual nodes on a server like the one you described. Use more shards to distribute the load evenly.
Do you know what your bottleneck is (LAN, IO, CPU)?
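If you are not sure, the node stats API exposes OS, filesystem and thread pool metrics; rejected counts in the write/bulk thread pools usually point at the queue_size errors mentioned above:
GET _nodes/stats/os,fs,thread_pool?human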

Elasticsearch - How to remove stuck persistent setting after upgrade

I have just upgraded my cluster from 5.6 to 6.1. I did a rolling upgrade as the documentation specified. It looks like a setting that I was using isn't available anymore in 6.1. That would've been fine, but now I can't even re-enable shard allocation, so my last node won't allocate its shards. Doing something as simple as this:
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
  "persistent" : {
    "cluster.routing.allocation.enable" : "all"
  }
}
'
results in this:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "remote_transport_exception",
        "reason" : "[inoreader-es4][92.247.179.253:9300][cluster:admin/settings/update]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "unknown setting [indices.store.throttle.max_bytes_per_sec] did you mean [indices.recovery.max_bytes_per_sec]?"
  },
  "status" : 400
}
No matter what setting I try to change, I always get this error.
Yes, I did set indices.store.throttle.max_bytes_per_sec as a persistent setting once in 5.x, and I'm OK with having to set it under a new name now, but how can I even remove it? It's not in elasticsearch.yml.
You'll need to unset that value. If you are still on the old version, you can use the following command (unsetting with null was added in 5.0):
PUT _cluster/settings
{
  "persistent": {
    "indices.store.throttle.max_bytes_per_sec": null
  }
}
This will however fail with a "persistent setting [indices.store.throttle.max_bytes_per_sec], not recognized" in your cluster if you have already upgraded.
At the moment (Elasticsearch version 6.1.1) the removed setting will be archived under archived.indices.store.throttle.max_bytes_per_sec. You can remove this and any other archived setting with:
PUT _cluster/settings
{
  "persistent": {
    "archived.*": null
  }
}
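To see what actually got archived before wiping it, you can dump the cluster settings in flat form and look for keys starting with archived.:
GET _cluster/settings?flat_settings=true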
However, there is a bug that only lets you unset archived settings before you change any other settings.
If you have already changed other settings and are affected by this bug, the only solution is to downgrade to 5.6 again, unset the setting (with the command at the top of this answer), and then do the upgrade again. It's probably enough to do this on one node (stop all others) as long as it's the master and all other nodes join that master and accept its corrected cluster state. In any case, be sure to take a snapshot first.
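A minimal snapshot sketch, assuming a filesystem repository (the repository name, snapshot name and location are placeholders; the location must be listed under path.repo in elasticsearch.yml):
PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/my_backup"
  }
}
PUT _snapshot/my_backup/pre_upgrade_fix?wait_for_completion=true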
For future versions the archived.* behavior will probably change as stated in the ticket (though it's just in the planning phase right now):
[...] we should not archive unknown and broken cluster settings.
Instead, we should fail to recover the cluster state. The solution for
users in an upgrade case would be to rollback to the previous version,
address the settings that would be unknown or broken in the next major
version, and then proceed with the upgrade.
Manually editing or even deleting the cluster state on disk sounds very risky: the cluster state includes a lot of information (check for yourself with GET /_cluster/state) like templates, indices, the routing table,... Even if you still have the data on the data nodes, losing the cluster state means you cannot access that data, because the "map" describing how to form indices out of the shards is missing. If I remember correctly, in more recent ES versions the data nodes cache the cluster state and will try to restore from it, but that's a last resort and I wouldn't want to rely on it. Also, I'm not sure whether that might not bring back your bad setting as well.
PS: I can highly recommend the free upgrade assistant going from 5.6 to 6.x.

Best practices for ElasticSearch initial bulk import

I'm running a Docker setup with ElasticSearch, Logstash, Filebeat and Kibana inspired by the Elastic Docker Compose. I need to do an initial load of 15 GB of logfiles into the system (Filebeat->Logstash->ElasticSearch), but I'm having some issues with performance.
It seems that Filebeat/Logstash are pushing work to ElasticSearch faster than it can handle. After some time I begin to see a bunch of errors in ElasticSearch like this:
[INFO ][o.e.i.IndexingMemoryController] [f8kc50d] now throttling indexing for shard [log-2017.06.30]: segment writing can't keep up
I've found this old documentation article on how to disable merge throttling: https://www.elastic.co/guide/en/elasticsearch/guide/master/indexing-performance.html#segments-and-merging.
PUT /_cluster/settings
{
  "transient" : {
    "indices.store.throttle.type" : "none"
  }
}
But in the current version (ElasticSearch 6) it gives me this error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "transient setting [indices.store.throttle.type], not dynamically updateable"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "transient setting [indices.store.throttle.type], not dynamically updateable"
  },
  "status": 400
}
How can I solve the above issue?
The VM has 4 CPU cores (Intel Xeon E5-2650). ElasticSearch is assigned 4 GB of RAM, Logstash and Kibana 1 GB each. Swapping is disabled using "swapoff -a". X-Pack and monitoring are enabled. I only have one ES node for this log server. Do I need multiple nodes for this initial bulk import?
EDIT1:
Changing the number_of_replicas and refresh_interval seems to make it perform better. Still testing.
PUT /log-*/_settings
{
  "index.number_of_replicas" : "0",
  "index.refresh_interval" : "-1"
}
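Once the initial import has finished, you will likely want to restore the replicas and the refresh interval (the values below are just the usual defaults, so adjust them to your setup) and trigger a refresh:
PUT /log-*/_settings
{
  "index.number_of_replicas" : "1",
  "index.refresh_interval" : "1s"
}
POST /log-*/_refresh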
Most likely the bottleneck is IO (you can confirm this by running iostat; it would also be useful if you posted ES monitoring screenshots), so you need to reduce the pressure on it.
The default ES configuration generates many index segments during a bulk load. To fix this, increase index.refresh_interval (or set it to -1) for the duration of the bulk load; see the docs. The default value is 1 second, which causes a new segment to be created every second. Also try increasing the batch size and see if it helps.
Also, if you use spinning disks, set index.merge.scheduler.max_thread_count to 1, as shown below. This will allow only one thread to perform segment merging and will reduce IO contention between merging and indexing.
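A sketch of that setting applied to the same log-* indices; index.merge.scheduler.max_thread_count is a dynamic index setting, so it can be changed on a live index:
PUT /log-*/_settings
{
  "index.merge.scheduler.max_thread_count" : 1
}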

High disk watermark exceeded even when there is not much data in my index

I'm using elasticsearch on my local machine. The data directory is only 37MB in size but when I check logs, I can see:
[2015-05-17 21:31:12,905][WARN ][cluster.routing.allocation.decider] [Chrome] high disk watermark [10%] exceeded on [h9P4UqnCR5SrXxwZKpQ2LQ][Chrome] free: 5.7gb[6.1%], shards will be relocated away from this node
Quite confused about what might be going wrong. Any help?
From Index Shard Allocation:
... watermark.high controls the high watermark. It defaults to 90%, meaning ES will attempt to relocate shards to another node if the node disk usage rises above 90%.
The size of your actual index doesn't matter; it's the free space left on the device which matters.
If the defaults are not appropriate for you, you have to change them.
To resolve the issue where the log shows:
high disk watermark [90%] exceeded on
[ytI5oTyYSsCVfrB6CWFL1g][ytI5oTy][/var/lib/elasticsearch/nodes/0]
free: 552.2mb[4.3%], shards will be relocated away from this node
You can update the threshold limit by executing the following curl request:
curl -XPUT "http://localhost:9200/_cluster/settings" \
 -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation.disk.threshold_enabled": false
      }
    }
  }
}'
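Note that this disables the disk-based allocation threshold entirely, which is a blunt workaround. Once you have freed up space, you can restore the default behaviour by unsetting it again (null resets the setting to its default):
curl -XPUT "http://localhost:9200/_cluster/settings" \
 -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.disk.threshold_enabled": null
  }
}'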
This slightly modified curl command from the Elasticsearch 6.4 docs worked for me:
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "2gb",
    "cluster.routing.allocation.disk.watermark.high": "1gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "500mb",
    "cluster.info.update.interval": "1m"
  }
}
'
If the curl -XPUT command succeeds, you should see log lines like these in the Elasticsearch terminal window:
[2018-08-24T07:16:05,584][INFO ][o.e.c.s.ClusterSettings ] [bhjM1bz] updating [cluster.routing.allocation.disk.watermark.low] from [85%] to [2gb]
[2018-08-24T07:16:05,585][INFO ][o.e.c.s.ClusterSettings ] [bhjM1bz] updating [cluster.routing.allocation.disk.watermark.high] from [90%] to [1gb]
[2018-08-24T07:16:05,585][INFO ][o.e.c.s.ClusterSettings ] [bhjM1bz] updating [cluster.routing.allocation.disk.watermark.flood_stage] from [95%] to [500mb]
[2018-08-24T07:16:05,585][INFO ][o.e.c.s.ClusterSettings ] [bhjM1bz] updating [cluster.info.update.interval] from [30s] to [1m]
https://www.elastic.co/guide/en/elasticsearch/reference/current/disk-allocator.html
It's a warning and won't affect anything. The storage processors (SPs) use high and low watermarks to determine when to flush their write caches.
A possible solution is to free some disk space, and the warning will disappear. Even while it is showing, replicas will simply not be assigned to the node, which is okay; Elasticsearch will keep working fine.
Instead of percentages I use absolute values, and I raise the values for better space use (in pre-prod):
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.threshold_enabled": true,
    "cluster.routing.allocation.disk.watermark.low": "1g",
    "cluster.routing.allocation.disk.watermark.high": "500m",
    "cluster.info.update.interval": "5m"
  }
}
I also poll the disk usage less frequently (cluster.info.update.interval of 5m instead of the 30s default) to keep the ES logs shorter.
Clear up some space on your hard drive; that should fix the issue. It should also change the health of your ES cluster from yellow to green (if you hit the above issue, you are most likely facing the yellow cluster health issue as well).
