Cannot set thread_pool setting in Elastic Cloud user settings - elasticsearch

Using ES v6.4.3
I'm getting a bunch of TransportService errors when writing a high volume of transactions. The exact error is:
StatusCodeError: 429 - {"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[instance-0000000002][10.44.0.71:19428][indices:data/write/bulk[s][p]]"}],"type":"es_rejected_execution_exception","reason":"rejected execution of org.elasticsearch.transport.TransportService$7#35110df8 on EsThreadPoolExecutor[name = instance-0000000002/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor#3fd60b4f[Running, pool size = 2, active threads = 2, queued tasks = 200, completed tasks = 1705133]]"},"status":429}
The general consensus seems to be to bump the queue_size so requests don't get dropped. As you can see in the error, my queue_size is the default 200 and has filled up. (I know simply bumping the queue_size is not a magic solution, but it is exactly what I need in this case.)
So, following the doc on how to change elasticsearch.yml settings, I tried to add the queue_size bump there:
thread_pool.write.queue_size: 2000
And when I save I get this error:
'thread_pool.write.queue_size': is not allowed
I understand that the user settings overrides blacklist certain settings. If my problem is truly that thread_pool.write.queue_size is blacklisted, how can I access my elasticsearch.yml file to change it?
Thank you!

Related

Reached max children process limit: 2, extra: 0, current: 2, busy: 2, please increase LSAPI_CHILDREN

We are using 2 load balancers + 2 API servers + 3 MySQL servers to run our burger website. During peak order times the website gets slow and is sometimes unavailable. We checked the logs in detail and saw the following in the API LiteSpeed error logs:
[STDERR] [24627] Reached max children process limit: 2, extra: 0,
current: 2, busy: 2, please increase LSAPI_CHILDREN
We tried raising LSAPI_CHILDREN and other limits via the LiteSpeed admin URL, but the setting does not take effect on the back end. We get the same error again, and when we tried an API cluster restart, the settings reverted to their previous values.
I am attaching a screenshot of the changes we made; the above error is logged continuously even after the change and a LiteSpeed restart. Due to the continuous downtime we are moving to nginx for now. We need a proper solution so that we can use LiteSpeed again.
You can add and set the variable LSWS_MAX_CHILDREN in the LSWS layer (in your case) or in the LLSMP layer (when an LLSMP layer is used) to set the maximum child process limit for the server via the Dashboard.
Access the variables list, add and set the variable, then restart to apply the changes (a sketch of the entry follows).
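For illustration, a minimal sketch of the entry as it might appear in the variables list (the value 8 is only an example; by default the limit equals the number of CPU cores, as noted below):
LSWS_MAX_CHILDREN=8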
To ensure the best operability, Jelastic sets this value equal to the number of available CPU cores by default, which is why the variable is not visible in the variables list. For more details, please follow the LiteSpeed Web Server link.
To read more about configuring environment variables, the following links may also be useful:
Variables
Container Configuration

messages lost due to rate-limiting

We are testing the capacity of a mail relay based on RHEL 7.6.
We observe issues when sending a large number of messages (e.g. ~1000 messages in 60 seconds).
Although all the messages were sent and the recipient received them all, log entries are missing from /var/log/maillog_rfc5424.
We see the following message in /var/log/messages:
rsyslogd: imjournal: XYZ messages lost due to rate-limiting
We adapted /etc/rsyslog.conf with the following settings, but to no effect:
$SystemLogRateLimitInterval 0 # turn off rate limit
$SystemLogRateLimitBurst 0 # turn rate limit off
Any ideas?
The error is from imjournal, but your configuration settings are for imuxsock.
According to the rsyslog configuration page, you need to set:
$imjournalRatelimitInterval 0
$imjournalRatelimitBurst 0
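On rsyslog versions that support the newer RainerScript syntax, the same settings can be expressed as module parameters instead; a sketch, replacing the existing imjournal load line (the state file name is the RHEL 7 default):
# load imjournal with its rate limiting disabled
module(load="imjournal"
       StateFile="imjournal.state"
       Ratelimit.Interval="0"
       Ratelimit.Burst="0")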
Note that for very high message rates you might want to switch to imuxsock, as the documentation says:
this module may be notably slower than when using imuxsock. The journal provides imuxsock with a copy of all “classical” syslog messages, however, it does not provide structured data. Only if that structured data is needed, imjournal must be used. Otherwise, imjournal may simply be replaced by imuxsock, and we highly suggest doing so.
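A minimal sketch of that switch, assuming journald forwards to the syslog socket (ForwardToSyslog=yes in /etc/systemd/journald.conf) and the imjournal load line is removed:
# read from the local syslog socket instead of the journal,
# with socket rate limiting turned off
module(load="imuxsock"
       SysSock.RateLimit.Interval="0"
       SysSock.RateLimit.Burst="0")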

elasticsearch es_rejected_execution_exception

I'm trying to index a 12 MB log file containing 50,000 log entries.
After indexing around 30,000 of them, I get the following error:
[2018-04-17T05:52:48,254][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of org.elasticsearch.transport.TransportService$7#560f63a9 on EsThreadPoolExecutor[name = EC2AMAZ-1763048/bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor#7d6ae98b[Running, pool size = 2, active threads = 2, queued tasks = 200, completed tasks = 3834]]"})
I've gone through the documentation and the Elasticsearch forum, which suggested increasing the Elasticsearch bulk queue size. I tried doing so with curl, but it didn't work:
curl -XPUT localhost:9200/_cluster/settings -d '{"persistent" : {"threadpool.bulk.queue_size" : 100}}'
Is increasing the queue size a good option? I can't scale up the hardware because I have relatively little data.
Is the error I'm facing due to the queue size or something else? If it's the queue size, how do I update it in elasticsearch.yml, and do I need to restart Elasticsearch after updating it?
Please let me know. Thanks for your time
Once Elasticsearch can't keep up with incoming indexing requests, it enqueues them in the bulk thread pool queue and starts rejecting them when the number of queued requests exceeds threadpool.bulk.queue_size.
It's a good idea to consider throttling your indexing. The thread pool defaults are generally good; while you can increase them, you may not have enough resources (memory, CPU) available.
This blog post from elastic.co explains the problem really well.
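For reference, in recent Elasticsearch versions (5.x and later) thread pool settings are static node-level settings, which is why the _cluster/settings call above is rejected; they belong in each node's elasticsearch.yml, and the nodes must be restarted for them to take effect. A minimal sketch, assuming a pre-6.3 version where the pool is still named bulk (from 6.3 on it is called write; 2000 is only an illustrative value):
# elasticsearch.yml, on every node; requires a node restart
thread_pool.bulk.queue_size: 2000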
Reducing the batch size resolved my problem:
POST _reindex
{
  "source": {
    "index": "sourceIndex",
    "size": 100
  },
  "dest": {
    "index": "destIndex"
  }
}
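For what it's worth, source.size here caps how many documents each reindex batch pulls (the default is 1000), so lowering it to 100 keeps each bulk request small enough that the write queue is far less likely to overflow.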

es_rejected_execution_exception rejected execution

I'm getting the following error while indexing:
es_rejected_execution_exception rejected execution of org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$1#16248886
on EsThreadPoolExecutor[bulk, queue capacity = 50,
org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor#739e3764[Running,
pool size = 16, active threads = 16, queued tasks = 51, completed
tasks = 407667]
My current setup:
Two nodes: one is master-eligible (data: true, master: true) while the other is data-only (data: true, master: false). Both are EC2 i2.4xlarge instances (16 cores, 122 GB RAM, 320 GB instance storage), with 2 shards and 1 replica.
Those two nodes are fed by our aggregation server, which has 20 separate workers. Each worker makes bulk indexing requests to our ES cluster with 50 items to index per request. Each item is between 1000 and 4000 characters.
Current server setup: 4x client-facing servers -> aggregation server -> Elasticsearch.
Now, the issue is that this error only started occurring when we introduced the second node. Before, with one machine, we got a consistent indexing throughput of 20k requests per second. With two machines, once it hits the 10k mark (~20% CPU usage), we start getting some of the errors outlined above.
But here is the interesting thing I have noticed. We have a mock item generator which produces random documents to be indexed. These documents are generally the same size but have random parameters. We use it for stress testing and stability checks. The mock item generator sends requests to the aggregation server, which in turn passes them to Elasticsearch. With it, we are able to index around 40-45k items per second (~80% CPU usage) without getting this error. So it is really puzzling why we get the error at all. Has anyone seen it or know what could be causing it?
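A rough back-of-envelope from the numbers in the error itself: each node can hold at most pool size + queue capacity = 16 + 50 = 66 bulk tasks at once, and the snippet above shows 16 active threads with 51 queued tasks, i.e. the queue had just overflowed its capacity of 50. Any burst from the 20 workers that puts more than 66 concurrent bulk tasks on one node is rejected with this error, even at modest average CPU usage.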

administrative limit exceeded, REST

Using REST, I got this exception:
http://localhost:8080/customgroups?_queryFilter=(members/uid+co+%22test%22)
{"code":413,"reason":"Request Entity Too Large","message":"Administrative Limit Exceeded"}
I turned all limits off:
ds-cfg-lookthrough-limit: 0
ds-cfg-size-limit: 0
Is there another constraint? The result should be 1-3 entries. Other requests, like getting all customGroups (83 entries) or all users (1300 entries), work fine, so why is the _queryFilter causing problems?
Thank you!
There are a few things you might try:
Check that the ds-rlim-lookthrough-limit operational attribute is correctly set, especially for cn=Directory Manager if you are using it to make requests.
I can see there is a special configuration for collective attributes, ds-rlim-lookthrough-limit;collective: 0. Maybe it applies to your request?
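As a sketch of the first check, the operational attribute can be set with the standard OpenDJ tools (port, password, and the limit value 0 are placeholders; the Directory Manager entry lives under cn=Root DNs,cn=config):
ldapmodify -p 1389 -D "cn=Directory Manager" -w password <<EOF
dn: cn=Directory Manager,cn=Root DNs,cn=config
changetype: modify
replace: ds-rlim-lookthrough-limit
ds-rlim-lookthrough-limit: 0
EOF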
References:
http://ludopoitou.com/2012/04/10/tips-resource-limits-in-opendj/
http://opendj.forgerock.org/opendj-server/configref/global.html#lookthrough-limit
http://docs.forgerock.org/en/opendj/2.6.0/admin-guide/index/chap-resource-limits.html
