Retry individual action error of logstash-output-elasticsearch - elasticsearch

I am using the twitter input to fetch data from Twitter and the elasticsearch output to store it in Elasticsearch.
I am using Logstash 5.2.1 on Ubuntu, and when I run it I get the following error:
[2017-03-02T08:18:45,576][ERROR][logstash.outputs.elasticsearch] Retrying individual actions
[2017-03-02T08:18:45,576][ERROR][logstash.outputs.elasticsearch] Action
[2017-03-02T08:18:50,796][INFO][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[twitter_news][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [twitter_news] containing [1] requests]"})
[2017-03-02T08:18:50,796][ERROR][logstash.outputs.elasticsearch] Retrying individual actions
[2017-03-02T08:18:50,796][ERROR][logstash.outputs.elasticsearch] Action
[2017-03-02T08:18:55,840][INFO][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[twitter_news][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [twitter_news] containing [1] requests]"})
[2017-03-02T08:18:55,840][ERROR][logstash.outputs.elasticsearch] Retrying individual actions
[2017-03-02T08:18:55,841][ERROR][logstash.outputs.elasticsearch] Action

This isn't really anything you should have to care about in the first place; it's unclear why it was ever logged as an error.
In any case, it has since been fixed and is now logged at the INFO level.
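That said, the 503 itself means Elasticsearch could not find an active primary shard for the twitter_news index within the one-minute timeout; the retries are the symptom, not the cause. A minimal way to check (a sketch, assuming Elasticsearch listens on localhost:9200):

curl -XGET 'http://localhost:9200/_cluster/health?pretty'
curl -XGET 'http://localhost:9200/_cat/shards/twitter_news?v'

A red cluster status, or UNASSIGNED shards in the second listing, points to an allocation problem on the Elasticsearch side rather than anything in Logstash.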

Related

How to stop Kafka producer messages (Debezium + Azure EventHub)

I have set up Debezium and Azure Event Hub as a CDC engine from PostgreSQL,
exactly as in this tutorial: https://dev.to/azure/tutorial-set-up-a-change-data-capture-architecture-on-azure-using-debezium-postgres-and-kafka-49h6
Everything was working well until I changed something (I don't know exactly what).
Now my kafka-connect log is spammed with the WARN entries below and CDC has stopped working...
[2022-03-03 08:31:28,694] WARN [dbz-ewldb-connector|task-0] [Producer clientId=connector-producer-dbz-ewldb-connector-0] Got error produce response with correlation id 2027 on topic-partition ewldb-0, retrying (2147481625 attempts left). Error: REQUEST_TIMED_OUT (org.apache.kafka.clients.producer.internals.Sender:616)
[2022-03-03 08:31:28,775] WARN [dbz-cmddb-connector|task-0] [Producer clientId=connector-producer-dbz-cmddb-connector-0] Got error produce response with correlation id 1958 on topic-partition cmddb-0, retrying (2147481694 attempts left). Error: REQUEST_TIMED_OUT (org.apache.kafka.clients.producer.internals.Sender:616)
[2022-03-03 08:31:28,800] WARN [dbz-ewldb-connector|task-0] [Producer clientId=connector-producer-dbz-ewldb-connector-0] Got error produce response with correlation id 2028 on topic-partition ewldb-0, retrying (2147481624 attempts left). Error: REQUEST_TIMED_OUT (org.apache.kafka.clients.producer.internals.Sender:616)
[2022-03-03 08:31:28,880] WARN [dbz-cmddb-connector|task-0] [Producer clientId=connector-producer-dbz-cmddb-connector-0] Got error produce response with correlation id 1959 on topic-partition cmddb-0, retrying (2147481693 attempts left). Error: REQUEST_TIMED_OUT (org.apache.kafka.clients.producer.internals.Sender:616)
These messages appear even when I delete the Kafka connectors.
Restarting Kafka and Kafka Connect does not help.
How can I stop these retries?
The only workaround that helps is to:
1. Delete the connector via the Debezium API
2. Stop Kafka Connect
3. Delete the Event Hub
4. Start Kafka Connect
5. Add the connector back via the Debezium API
To permanently change how retrying works, change the following producer parameter (see the sketch below):
producer.retries=10 (by default it is set to over 2 billion, which causes the spam in kafka-connect.log)
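As a sketch of where this goes: in the Kafka Connect worker configuration (the file name and path depend on your installation; connect-distributed.properties is typical), producer.-prefixed keys are passed through to the connectors' producers:

# Assumption: worker-level override applied to all connectors on this worker.
producer.retries=10

Restart the Connect worker afterwards for the change to take effect.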

How to replicate Connection reset by peer in Spring Boot?

In my production environment, I got the following error on my server:
Cannot forward to error page for request [/api/validation] as the response has already been committed. As a result, the response may have the wrong status code. If your application is running on WebSphere Application Server you may be able to resolve this problem by setting com.ibm.ws.webcontainer.invokeFlushAfterService to false
org.apache.catalina.connector.ClientAbortException: java.io.IOException: Connection reset by peer
Then I created a client that spawns 1000 threads every second to call [/api/validation].
The error I got was:
Exception in thread "Thread-9954" org.springframework.web.client.ResourceAccessException: I/O error on POST request for "http://localhost:7080/v1/name/validate": Timeout waiting for connection from pool; nested exception is org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool.
What I want to know is what causes Connection reset by peer.
As far as I know, this error occurs when the client aborts the connection by sending an RST packet.
I set the socket timeout of my client's RestTemplate to 9000 ms and made the server sleep for about 15000 ms. Shouldn't the server get Connection reset by peer, since it tries to send the response after 15 seconds while my client only waits for about 9? Shouldn't I get the error?
Also, in the production environment the client's wait time (RestTemplate socket timeout) is set to about 90 seconds, more than the time the server needs to respond. Why is the error being produced in production?
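For context, a client setup like the one described might look like this (a sketch; the timeout values and endpoint mirror the question, everything else is illustrative):

import org.springframework.http.client.SimpleClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

public class ValidationClient {
    public static void main(String[] args) {
        // Connect/read (socket) timeouts of 9000 ms, as in the test above.
        SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
        factory.setConnectTimeout(9000);
        factory.setReadTimeout(9000);
        RestTemplate restTemplate = new RestTemplate(factory);

        // If the server sleeps past the read timeout, this call throws
        // ResourceAccessException on the client; when the server later tries
        // to write its response, it sees the aborted socket as
        // "Connection reset by peer".
        String body = restTemplate.postForObject(
                "http://localhost:7080/v1/name/validate", "{}", String.class);
        System.out.println(body);
    }
}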

Some Postgres connections timing out while others don't

I have an AWS EC2 machine running a Laravel 5.2 application that connects to a Postgres 9.6 database running in RDS. While most of the connections work, some of them are rejected when being established, which causes a timeout and consequently an error in my API. I don't know what is causing them to be rejected. It is also very random: when it happens, it may be on any API endpoint, and within the endpoint on any query.
When the timeout is handled by PHP, it shows a message like:
SQLSTATE[08006] [7] timeout expired (SQL: ...)
Sometimes Nginx handles the timeout and replies with a 504 error. When Nginx handles the timeout, I get an error like:
2019/04/24 09:48:18 [error] 20657#20657: *3236 upstream timed out (110: Connection timed out) while reading response header from upstream, client: {client-ip-here}, server: {my-url-here}, request: "GET {my-endpoint-here} HTTP/2.0", upstream: "fastcgi://unix:/var/run/php/php7.0-fpm.sock", host: "{}", referrer: "https://app.cartoriovirtual.com/"
All usage charts on RDS and EC2 seem OK; I have plenty of RAM, storage, CPU, and available connections for RDS. I also checked the inner VPC flow logs and they seem all right, although I have many IPs (listed as attackers) scanning my network interfaces, most of them rejected. Some (to port 22) are accepted but stopped at authentication; I use a .pem key file for auth.
The RDS network interface only accepts requests from machines inside the VPC. In its logs, every 5 minutes there is a checkpoint like this:
2019-04-25 01:05:29 UTC::#:[22595]:LOG: checkpoint starting: time
2019-04-25 01:05:34 UTC::#:[22595]:LOG: checkpoint complete: wrote 43 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=4.393 s, sync=0.001 s, total=4.404 s; sync files=19, longest=0.001 s, average=0.000 s; distance=16515 kB, estimate=16515 kB
Does anyone have tips on how to find a solution? I looked at every log that came to mind and fixed a few small issues, but the error persists. I am running out of ideas.

Connection refused: connect Aborting action - session using JMeter

I have a 50-thread test in JMeter with multiple sessions, but when I run it, half of the threads fail and I get this error:
Response code: 500
Response message: Connection refused: connect Aborting action - session 656255658 was closed
Check two things:
1. Are you sure you're not reusing the same session across threads? Are you correctly correlating the session id? (See the sketch below.)
2. If the issue only happens above some limit (not at 25 users, for example, but at 50), then it's a load issue or a configuration limit on the server side.
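For the correlation check, a typical setup is a Regular Expression Extractor attached to the login request (a sketch; the reference name and pattern are illustrative and depend on how your server returns the session id):

Reference Name: sessionId
Regular Expression: session=(\d+)
Template: $1$
Match No.: 1

Later requests then reference ${sessionId}, so each thread carries its own session instead of sharing a recorded one.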

Connection/Response timeout values don't seem to take effect in JMeter

I am getting 'Non HTTP response message: Connection timed out: connect' for some HTTP requests, so I tried setting the connection/response timeout values to 2 minutes (more than the connect time required by the failing requests). To do this, I updated "HTTP Request Defaults" and entered 120000 as the Connect and Response Timeouts.
[Screenshot: HTTP Request Defaults timeouts]
However, when I run the test again, the HTTP requests still give the same error. The sample result is as follows:
Load time: 21007
Connect Time: 21007
Latency: 0
Size in bytes: 2212
Sent bytes:0
Headers size in bytes: 0
Body size in bytes: 2212
Sample Count: 1
Error Count: 1
Data type ("text"|"bin"|""): text
Response code: Non HTTP response code: java.net.ConnectException
Response message: Non HTTP response message: Connection timed out: connect
It looks like the timeout value I set in HTTP Request Defaults is not being used here. I also tried setting httpclient.timeout=120000 in jmeter.properties, but there was no change. Have I missed something?
Can somebody please help me with this?
Thanks.
Edit: I have multiple HTTP requests, and on each run different requests time out. Here is one of the HTTP requests:
[Screenshot: HTTP Request sampler]
Updates:
I tried changing the timeout values in HTTP Request Defaults to something very low (2000) to see how the HTTP requests behave. In this case, I got a different error for requests exceeding the 2000 ms connect time:
Non HTTP response code: org.apache.http.NoHttpResponseException/Non HTTP response message: : failed to respond
So I think changing the timeout values is not affecting my original error:
Non HTTP response code: java.net.ConnectException/Non HTTP response message: Connection timed out: connect
What is the difference between these two messages?
The issue seems to be more a matter of the server-side connection timeout configuration than the client-side one, though both must be configured appropriately.
The default connectionTimeout in a Tomcat server is 20 seconds, and your request fails due to a connection timeout at 21 seconds. So although you configured the client side (120000), you must configure the server side appropriately as well; otherwise the server forcibly closes the connection attempt and a connect timeout exception is raised.
Reference:
The HTTP Connector (refer connectionTimeout attribute)
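For reference, the attribute is set on the Connector element in Tomcat's conf/server.xml (a sketch; the port, protocol, and redirectPort shown are the stock defaults):

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="120000"
           redirectPort="8443" />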
I recently faced the same problem and found that it comes from the default configuration of my OS (Windows). Check the following links for details:
Where does the socket timeout of 21000 ms come from?
Which is the default TCP connect timeout in Windows?
In short, based on the articles behind the links above, Windows uses a 3000 ms initial timeout (the InitialRto setting) and performs 2 retransmissions, each doubling the previous timeout (the MaxSynRetransmissions setting): 3 s + 2*3 s + 4*3 s = 21 s.
In order to increase this timeout, you can allow more retransmissions with the following command:
netsh interface tcp set global MaxSynRetransmissions=3
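The current values can be inspected with the standard companion command (run in an elevated prompt):

netsh interface tcp show global

With MaxSynRetransmissions=3 and the same doubling rule, the effective connect timeout grows to 3 + 6 + 12 + 24 = 45 seconds.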
