What does the term `requests` in `requests_per_second` actually mean? - elasticsearch

In the Delete By Query API docs, under the "Throttling delete requests" section:
To control the rate at which delete by query issues batches of delete operations, you can set requests_per_second to any positive decimal number. This pads each batch with a wait time to throttle the rate. Set requests_per_second to -1 to disable throttling.
Throttling uses a wait time between batches so that the internal scroll requests can be given a timeout that takes the request padding into account. The padding time is the difference between the batch size divided by the requests_per_second and the time spent writing. By default the batch size is 1000, so if requests_per_second is set to 500:
target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
Q1: From the above example, it seems that the word requests in requests_per_second refers to the documents in the ES index, not to the internal scroll/batch requests; that is, this parameter actually controls how many docs are handled per second, right? If so, I think docs_per_second might be a better name.
Since the batch is issued as a single _bulk request, large batch sizes cause Elasticsearch to create many requests and wait before starting the next set. This is "bursty" instead of "smooth".
Q2: I don't quite understand why the batch is issued as a single _bulk request.

A1: No, actually it is scroll_size that determines the batch size of the scrolled results used by the _delete_by_query method.
A2: The batch here is the set of delete operations generated for the documents returned by the internal _search/scroll (whose size is set by scroll_size) that _delete_by_query runs; every selected batch is sent to _bulk to be deleted in one _bulk request.
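To make the relationship concrete, here is a minimal sketch (assuming a local cluster at localhost:9200, a hypothetical index named my-index, and a made-up match query) that issues a throttled delete by query over HTTP: scroll_size sets how many documents each internal batch holds, and requests_per_second paces how quickly those per-document delete operations are allowed to proceed.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ThrottledDeleteByQuery {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // scroll_size=1000: each internal scroll batch holds 1000 documents.
        // requests_per_second=500: the batch is padded with a wait time so that,
        // on average, 500 delete operations (documents) are processed per second,
        // i.e. target_time = 1000 / 500 = 2 seconds per batch.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/my-index/_delete_by_query"
                        + "?requests_per_second=500&scroll_size=1000"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"query\":{\"match\":{\"status\":\"stale\"}}}"))
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```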

Related

SignalFx - Count number of requests taking time > a threshold value

I have a metric named http.server.request that stores the duration it took to respond to an HTTP request. I want to get the number of requests for which this duration is above a threshold (say 1000 ms). I can see various percentiles (p99, p90, mean, etc.) in the options, but nothing that could give me this number. Is it possible to get this?

How to adjust JMeter repeat requests?

I'm testing the performance of my website using JMeter.
1000 users are made to log in sequentially and then call a sub-page periodically (using a Loop Controller).
The sub-page is called every 10 seconds using a Constant Timer.
Because all 1000 users access DB data at the same moment in every cycle, the DB's CPU load hits 100%.
Is it possible to have only 50 of the 1000 logged-in users access the sub-page each second?
You can put the "sub-page" request under a Throughput Controller and specify the desired frequency of request execution there:
More information: Running JMeter Samplers with Defined Percentage Probability
However, you should realize that artificially limiting the load will lead to false-positive results, so you might want to optimize the queries or increase the DB server resources instead.
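As a back-of-the-envelope check of the numbers (my own arithmetic, not part of the answer above): with 1000 looping users each calling the sub-page every 10 seconds, the unthrottled rate is about 100 requests per second, so allowing only 50 per second corresponds to roughly a 50% execution rate for a Throughput Controller in percent mode, or a target of 3000 samples per minute if you use a Constant Throughput Timer instead.

```java
// Assumed numbers taken from the question, not from the answer:
int users = 1000;                 // concurrent logged-in users
double intervalSeconds = 10.0;    // Constant Timer between sub-page calls
double targetPerSecond = 50.0;    // desired: at most 50 sub-page hits per second

double unthrottledPerSecond = users / intervalSeconds;                    // ~100 req/s
double percentExecutions = targetPerSecond / unthrottledPerSecond * 100;  // ~50% (Throughput Controller, percent mode)
double samplesPerMinute = targetPerSecond * 60;                           // 3000 (Constant Throughput Timer)
```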

Rate limiting WebClient requests after a certain number of retries?

Is it possible to rate limit WebClient requests for a URL, or any requests after a defined number of retries, using resilience4j or otherwise?
Following is a sample code.
webClient.post()
    .uri(...)
    .header(...)
    .bodyValue(...)
    .retrieve()
    .bodyToMono(Void.class)
    .retryWhen(...) // every 15 mins, for 1 day
    .subscribe();
Example use case: say 10,000 requests a day need to be sent to 100 different URLs. There are retries for the case when a URL is temporarily unavailable.
But if a URL comes back up after a few hours, it would be hit with the large number of accumulated requests, which I would like to rate limit.
Effectively, I don't want to rate limit the whole operation, but only specific URLs that have been unavailable for a long time, or rate limit requests that have already been retried 'x' times. Is there a way to achieve this?
This is not to be confused with a circuit-breaker requirement; it's not really an issue to keep retrying every few minutes, at least in this context.
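One option (a sketch only, assuming resilience4j-reactor is on the classpath; the class and method names below are mine, and the limits are made up) is to keep one RateLimiter per URL in a RateLimiterRegistry and apply it upstream of retryWhen, so that every attempt, including the retries that would otherwise pile up while a URL was down, has to acquire a permit first:

```java
import java.time.Duration;

import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import io.github.resilience4j.ratelimiter.RateLimiterRegistry;
import io.github.resilience4j.reactor.ratelimiter.operator.RateLimiterOperator;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.util.retry.Retry;

public class PerUrlRateLimitedClient {

    private final WebClient webClient = WebClient.create();

    // One limiter per target URL: at most 10 requests per second each (assumed numbers).
    private final RateLimiterRegistry registry = RateLimiterRegistry.of(
            RateLimiterConfig.custom()
                    .limitForPeriod(10)
                    .limitRefreshPeriod(Duration.ofSeconds(1))
                    .timeoutDuration(Duration.ofMinutes(5)) // how long a queued call may wait for a permit
                    .build());

    public void send(String url, Object body) {
        RateLimiter limiter = registry.rateLimiter(url); // keyed by URL, created lazily

        webClient.post()
                .uri(url)
                .bodyValue(body)
                .retrieve()
                .bodyToMono(Void.class)
                // The limiter sits *inside* the retried sequence, so each retry attempt
                // must also acquire a permit and cannot flood the URL once it is back up.
                .transformDeferred(RateLimiterOperator.of(limiter))
                .retryWhen(Retry.fixedDelay(96, Duration.ofMinutes(15))) // every 15 mins, for 1 day
                .subscribe();
    }
}
```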

Elasticsearch timeout true but still get result

I'm setting the timeout to 10 ms on my search query, so I expect the Elasticsearch search query to time out in 10 ms.
In the response I do get "timed_out": true, but the query doesn't seem to time out. It still runs for a few hundred milliseconds.
Sample response:
{
"took": 460,
"timed_out": true,
....
Is this the expected behavior or am I missing something here? My goal is to terminate the query if it's taking too long, so that it doesn't put load on the cluster.
What to expect from query timeout?
An Elasticsearch query running with a timeout set may return partial or empty results (if the timeout has expired); from the Elasticsearch Guide:
The timeout parameter tells shards how long they are allowed to
process data before returning a response to the coordinating node. If
there was not enough time to process all data, results for this shard
will be partial, even possibly empty.
The documentation of the Request Body Search parameters also tells this:
timeout
A search timeout, bounding the search request to be executed within
the specified time value and bail with the hits accumulated up to that
point when expired. Defaults to no timeout.
For further details please consult this page in the guide.
How to terminate queries that run too long?
Looks like Elasticsearch does not have an ultimate answer, rather several workarounds for particular cases. Here they are.
There isn't a way to protect the system from DoS attacks (as of 2015). Long-running queries can be limited with the timeout or terminate_after query parameters. terminate_after is like timeout, but it counts the number of documents per shard. Both of these parameters are more like recommendations to Elasticsearch, meaning that some long-running queries (a script query, for instance) can still exceed the desired max execution time.
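A sketch of that first workaround (the index name, the threshold values and the localhost:9200 address are assumptions): both timeout and terminate_after can be passed as query-string parameters on _search.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BoundedSearch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // timeout=100ms: ask each shard to stop after ~100 ms and return what it has so far.
        // terminate_after=10000: additionally stop a shard once it has collected 10,000 documents.
        // Both are best-effort bounds, as the answer above notes.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/my-index/_search?timeout=100ms&terminate_after=10000"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"query\":{\"match_all\":{}}}"))
                .build();

        System.out.println(client.send(request, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```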
Since then, the Task Management API has been introduced, and monitoring and cancelling long-running tasks has become possible. This means that you will have to write some additional code that checks the health of the cluster and cancels the offending tasks.
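And a sketch of that second workaround (again assuming localhost:9200; cancelling by the *search* action pattern is deliberately broad and would normally be narrowed to specific task ids taken from the listing):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CancelLongSearches {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // List currently running search tasks, including how long each has been running.
        HttpRequest list = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/_tasks?actions=*search*&detailed=true"))
                .GET()
                .build();
        System.out.println(client.send(list, HttpResponse.BodyHandlers.ofString()).body());

        // Cancel a specific task by id (taken from the listing above),
        // or, as done here, cancel all tasks whose action matches *search*.
        HttpRequest cancel = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/_tasks/_cancel?actions=*search*"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        System.out.println(client.send(cancel, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```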

ElasticSearch gives error about queue size

RemoteTransportException[[Death][inet[/172.18.0.9:9300]][bulk/shard]]; nested: EsRejectedExecutionException[rejected execution (queue capacity 50) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1#12ae9af];
Does this mean I'm doing too many operations in one bulk at one time, or too many bulks in a row, or what? Is there a setting I should be increasing or something I should be doing differently?
One thread suggests "I think you need to increase your 'threadpool.bulk.queue_size' (and possibly 'threadpool.index.queue_size') setting due to recent defaults." However, I don't want to arbitrarily increase a setting without understanding the fault.
I lack the reputation to reply to the comment as a comment.
It's not exactly the number of bulk requests made, it is actually the total number of shards that will be updated on a given node by the bulk calls. This means the contents of the actual bulk operations inside the bulk request actually matter. For instance, if you have a single node, with a single index, running on an 8 core box, with 60 shards and you issue a bulk request that has indexing operations that affects all 60 shards, you will get this error message with a single bulk request.
If anyone wants to change this, you can see the splitting happening inside of org.elasticsearch.action.bulk.TransportBulkAction.executeBulk() near the comment "go over all the request and create a ShardId". The individual requests happen a few lines down around line 293 on version 1.2.1.
You want to up the number of bulk threads available in the thread pool. ES sets aside threads in several named pools for use on various tasks. These pools have a few settings: type, size, and queue size.
from the docs:
The queue_size allows to control the size of the queue of pending
requests that have no threads to execute them. By default, it is set
to -1 which means its unbounded. When a request comes in and the queue
is full, it will abort the request.
To me that means you have more bulk requests queued up, waiting for a thread from the pool to execute one of them, than your current queue size allows. The documentation seems to indicate the queue size defaults to both -1 (the text above says that) and 50 (the callout for bulk in the doc says that). You could take a look at the source to be sure for your version of ES, or set a higher number and see if your bulk issues simply go away.
ES thread pool settings doco
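Before or after raising queue_size, it is worth watching the pool's rejection counters; a small sketch using the _cat/thread_pool API (localhost:9200 is assumed, and on newer Elasticsearch versions the relevant pool is named write rather than bulk):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ThreadPoolCheck {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Output format differs by version: either one row per node with bulk.* columns,
        // or one row per node and pool. Either way it shows active threads, queued
        // requests and the cumulative rejection count for the bulk/write pool.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/_cat/thread_pool?v"))
                .GET()
                .build();

        System.out.println(client.send(request, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```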
elasticsearch 1.3.4
our system: 8 cores * 2
4 bulk workers, each inserting 300,000 messages per minute => 20,000 per sec
I also hit that exception! Then I set this config:
elasticsearch.yml
threadpool.bulk.type: fixed
threadpool.bulk.size: 8 # availableProcessors
threadpool.bulk.queue_size: 500
source
BulkRequestBuilder bulkRequest = es.getClient().prepareBulk();
bulkRequest.setReplicationType(ReplicationType.ASYNC).setConsistencyLevel(WriteConsistencyLevel.ONE);
// add one index operation per document to the bulk request
for (String document : documents) {
    bulkRequest.add(es.getClient().prepareIndex(esIndexName, esTypeName).setSource(document.getBytes("UTF-8")));
}
BulkResponse bulkResponse = bulkRequest.execute().actionGet();
On a 4-core box => bulk.size 4.
Then there were no more errors.
I was having this issue and my solution ended up being increasing ulimit -Sn and ulimit -Hn for the elasticsearch user. I went from 1024 (the default) to 99999 and things cleaned right up.
