I am using Elasticsearch 6.2.1 in a single-node cluster. It works fine with my small indices and medium traffic. But when I test with a large number of concurrent requests using Apache JMeter, ES goes down with an error message like the one below.
My requirement is that ES should not crash even under such high traffic. It should discard requests after a certain point, but keep working. Is there any option by which I can achieve this? Please advise.
If the spike in requests lasts only a few seconds, you can increase the queue size of the affected thread pool (for example, the search thread pool). Otherwise, you should add more nodes to the cluster.
(Please add some logs of the Elasticsearch crash. Do you have an OutOfMemory exception?)
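A sketch of what the queue-size change could look like in elasticsearch.yml on 6.x (the value 2000 here is only an illustration, not a recommendation):
thread_pool.search.queue_size: 2000
This is a static setting, so the node needs a restart for it to take effect. Once the queue is full, Elasticsearch rejects further requests with an EsRejectedExecutionException / HTTP 429 instead of crashing, which is exactly the "discard but keep working" behavior asked about above.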
Are you sure Elasticsearch is crashing? Here, it's saying the search thread pool is full.
Read more at https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html.
My OpenSearch sometimes returns the error "429 Too Many Requests" when writing data. I know there is a queue, and when the queue is full it shows that error. So is there any API to check the bulk queue status and its current size? Example: queue 150/200 (nearly full)
Yes, you can use the following API call
GET _cat/thread_pool?v
You will get something like this, where you can see the node name, the thread pool name (look for write), the number of active requests currently being carried out, the number of requests waiting in the queue and finally the number of rejected requests.
node_name name active queue rejected
node01 search 0 0 0
node01 write 8 2 0
The write thread pool has as many threads as the node has allocated processors, i.e. that many requests can be active at the same time. If all threads are busy and new requests come in, they go into the queue (default size 10000). If both the threads and the queue are full, requests start to be rejected.
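To see the configured limit next to the current depth (so you can read it the way you described, e.g. 150/10000), you can restrict the call to the write pool and ask the cat API for extra columns; queue_size is one of the supported column names:
GET _cat/thread_pool/write?v&h=node_name,name,active,queue,queue_size,rejected
which returns something like this (illustrative values):
node_name name  active queue queue_size rejected
node01    write      8   150      10000        0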
Your mileage may vary, but when optimizing this, you're looking at:
keeping rejected at 0
minimizing the number of requests in the queue
making sure that active requests get carried out as fast as possible.
Instead of increasing the queue, it's usually preferable to increase the number of CPUs. If you have heavy ingest pipelines kicking in, it's often a good idea to add dedicated ingest nodes whose job is to execute those pipelines instead of running them on the data nodes.
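If you do decide to raise the write queue anyway, it is a static node setting; a sketch for opensearch.yml (20000 is just an illustrative value, and the node must be restarted for it to apply):
thread_pool.write.queue_size: 20000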
I am building an application with InfluxDB. I need to monitor sensor data from many thousands of sensors, and I need to set up alerts and notifications for each one of them. This results in several thousand checks.
When I started creating checks, Influx started choking at about 500 running checks. Is this expected? Am I working against how Influx is supposed to be used here?
Additional info:
My series cardinality is low, <50.
The database in question had no actual data in it, so the checks ran on an empty database.
Influx runs in Docker, and the container went from about 2% CPU usage to closer to 400% when the checks were saved.
When I call an API in NiFi, more than one response comes back, and the content of these responses is the same. If I don't stop the processor, they keep coming, so I keep turning the processor on and off quickly. Is there a way to restrict this?
Can I limit the API to a certain number of calls, no matter how long the processor runs? For example, only 3 requests.
NiFi flows are intended to be always-on streams. If you go to the Scheduling tab of a processor's config, you'll see that, by default, it is scheduled to run continuously (0 ms).
If you don't want this style of streaming behaviour, you need to change the Scheduling of the processor.
You can change it to only schedule the processor every X seconds, or you can change it to run based on a CRON expression.
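For example, NiFi's CRON-driven strategy uses Quartz-style expressions with a leading seconds field, so scheduling the processor to fire once every five minutes would look like:
0 0/5 * * * ?
With the timer-driven strategy you would instead set the Run Schedule to something like 5 mins. Note that both of these only throttle how often the processor runs; there is no built-in setting that stops it after exactly 3 executions.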
I have been running my JMeter script with the following setup
Users: 100
Loop Controller: 5
I used the Loop Controller on the HTTP request whose transactions need to iterate.
My question is about a particular search request: after 5 successful searches, the subsequent searches displayed “Customer API. Unable to establish connection”.
Please see the images below: the first image displays a lower load time while the second image displays a higher load time.
It looks like you discovered a bottleneck in your application, or at least it is not properly configured for high loads. Almost a minute of response time for transferring 600 kilobytes is not something I would expect from a well-behaved application.
The error message you're getting is very specific, so I would recommend checking your application logs as the very first step. The remaining steps would be inspecting the application and middleware configuration and ensuring that everything is properly set up for high performance and has enough headroom in terms of CPU, RAM, network, disk, etc.
Context:
My Spring Boot app runs as expected on Cloud Run when I deploy it with max-instances set to 1: it receives a constant stream of Pub/Sub messages via push and makes anywhere from 0 to 5 writes to an associated CloudSQL instance, depending on the message payload. Typically it handles between 20 and 40 messages per second. Latency/response time varies between 50 ms and 60 s, probably due to some resource contention.
In order to increase throughput and decrease resource contention, I'm looking to experiment with the connection pool size per app instance, as well as the concurrency and max-instances parameters for my Cloud Run app.
I understand that due to Spring Boot, my app has a relatively high cold-start time of about 30-40 seconds. This is acceptable for how this service is used.
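For the connection-pool experiments, Spring Boot's default HikariCP pool can be capped per instance via application properties; a minimal sketch (the values are assumptions for illustration, not recommendations):
spring.datasource.hikari.maximum-pool-size=5
spring.datasource.hikari.minimum-idle=0
Since each Cloud Run instance opens its own pool, maximum-pool-size × max-instances has to stay below the CloudSQL connection limit.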
Problem:
I'm experiencing problems when deploying a Spring Boot app to Cloud Run with max-instances set to a value greater than 1:
Instances start, handle a single request successfully, and then produce no more logs.
This happens a few times per minute, leading me to believe that instances get started (cold start), handle a single request, die, and are then started again. They are not being reused as described in the official docs on concurrency, and as happens when I set max-instances to 1.
Instead, I expect 3 container instances to be started, each of which then handles requests according to the concurrency setting.
Billable container time at max-instances=3:
As shown in the graph, the number of instances fluctuates wildly once the new revision with max-instances=3 is deployed.
The graphs for CPU- and memory-usage also look like this.
There are no error logs. As before at max-instances=1, there are warnings indicating that there are not enough instances available to handle requests (HTTP 429).
The connection limit of the CloudSQL instance has not been exceeded.
Requests are handled at less than 10/s
Finally, this is the command used to deploy:
gcloud beta run deploy my-service --project=[...] --image=[...] --add-cloudsql-instances=[...] --region=[...] --platform=managed --memory=1Gi --max-instances=3 --concurrency=3 --no-allow-unauthenticated
What could cause this behavior?
Some months ago, during the private alpha, I performed tests and observed the same behavior. After discussion with the Google team, I understood that instances are over-provisioned "just in case": an instance crashes, an instance is preempted, the traffic suddenly increases, ...
The trade-off is that you will have more cold starts than your max-instances value implies. Worse, you will be charged for these over-provisioned cold starts; in practice this is not an issue because Cloud Run has a generous free tier that covers this kind of glitch.
Going deeper into the logs (you can do this by creating a sink of the Cloud Run logs into BigQuery and then querying them there), even if there are more instances up than your max-instances value, only max-instances of them are active at the same time. To be clear: with your parameters, that means that if you have 5 instances up at the same time, only 3 serve traffic at any point in time.
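A sketch of such a sink, assuming a BigQuery dataset named cloud_run_logs already exists in the project (the sink's writer identity then needs write access to that dataset):
gcloud logging sinks create cloud-run-sink bigquery.googleapis.com/projects/[PROJECT_ID]/datasets/cloud_run_logs --log-filter='resource.type="cloud_run_revision"'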
This part is not documented because it evolves constantly to find the best balance between over-provisioning and lack of resources (and 429 errors).
#Steren #AhmetB can you confirm or correct me?
When Cloud Run receives and processes requests rapidly, it predicts how many instances it needs and tries to scale to that amount. If a sudden burst of requests occurs, Cloud Run will instantiate a larger number of instances in response, in order to adapt to a possibly higher number of requests beyond what it is currently serving, taking into account how long existing instances will take to finish their current requests. Per the documentation, the number of container instances can go above the max-instances value when traffic spikes.
You mentioned that with max-instances set to 1 it was running fine, but later you mentioned it was in fact producing 429s with that setting as well. Seeing 429s as well as spiking instance counts could indicate that the amount of traffic is not being handled fluidly.
It is also worth noting that, because of the cold-start time you mention, while an instance is serving its first request(s) the number of concurrent requests is by design hard-set to 1. Only once the instance is fully ready is the concurrency setting you have chosen applied.
Was there a specific reason you chose 3 for both max instances and concurrency? Also, how was concurrency set when you had max instances set to 1? Perhaps you could try raising the concurrency (max 80) and/or max instances (up to 1000) and see if that removes the 429s.
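For example, to try a higher ceiling on the existing service (the values here are just an illustration):
gcloud run services update my-service --platform=managed --region=[...] --concurrency=80 --max-instances=10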