504 Gateway timeout for ELB - spring-boot

I have an AWS Elastic Load Balancer with two healthy instances. If I make a POST request, it gets accepted, but subsequent requests throw a 504 gateway timeout error. After 5-10 minutes, it accepts 2-4 requests and then starts throwing 504 errors again. The application I am trying to reach is a Spring Boot application hosted on these two instances. There are no application-level timeouts. Furthermore, the time between failed and accepted requests varies, so I believe no fixed timeout setting is causing the issue. How can I resolve this?

Related

gateway timeout error "The gateway did not receive a timely response from the upstream server or application."

I'm getting the error "gateway timeout. The gateway did not receive a timely response from the upstream server or application." I'm fetching data from sticky CRM through its API and storing it in my database, which takes a long time to execute. I noticed that the error appears right after one minute, so I want to increase that time. I have installed the web server on an EC2 instance and the database on RDS. I want to increase my gateway timeout from one minute to unlimited, or to 15-20 minutes.
Help will be appreciated.
Thanks.

AWS ALB returning 502 without any log entries

We're using Node.js backend servers running in AWS ECS, behind an ALB. We then have AWS API Gateway with a proxy Lambda calling the ALB. This had been running in production for months when, a few days ago, we suddenly started seeing 502 errors from some API calls.
I've checked the proxy lambda logs to see that the 502 is returned from the ALB. However, when I check my node application logs, there are no failing requests, in fact no requests seem to have reached the application at these timestamps. I then enabled access logs on the ALB, which only shows 200/201 responses - no 5xx whatsoever. I'm now a bit confused as to where to look next. What could cause my ALB to return 502 without this being present in the ALB access logs? And what could cause the requests to not reach my node app in ECS? Does anyone have any idea on what logs to check next or what to do to pinpoint the errors? Could some layer within ECS cause those symptoms? I can't see any errors in my docker containers or anything.
It seems to happen in bursts, up to 50 failed requests within a period of time, then all ok for several hours.
It could be due to a number of reasons. The following may apply to you:
The load balancer received a TCP RST from the target when attempting to establish a connection.
The load balancer received an unexpected response from the target, such as "ICMP Destination unreachable (Host unreachable)", when attempting to establish a connection. Check whether traffic is allowed from the load balancer subnets to the targets on the target port.
The target closed the connection with a TCP RST or a TCP FIN while the load balancer had an outstanding request to the target. Check whether the keep-alive duration of the target is shorter than the idle timeout value of the load balancer.
The target response is malformed or contains HTTP headers that are not valid.
The load balancer encountered an SSL handshake error or SSL handshake timeout (10 seconds) when connecting to a target.
reference docs
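As a rough illustration of the keep-alive check mentioned in the list above, here is a minimal Python sketch. The function name, the target names, and the numbers are all made up for the example; substitute the keep-alive value from your own server configuration and the idle timeout from your load balancer attributes.

```python
def targets_with_short_keepalive(target_keepalive_s, lb_idle_timeout_s):
    """Return the targets whose keep-alive ends before the load balancer's idle timeout.

    target_keepalive_s: mapping of (hypothetical) target name -> keep-alive in seconds.
    lb_idle_timeout_s:  idle timeout configured on the load balancer, in seconds.

    A target in the returned list may close an idle connection that the load
    balancer still considers reusable, which is one of the causes listed above.
    """
    return [
        name
        for name, keepalive_s in target_keepalive_s.items()
        if keepalive_s <= lb_idle_timeout_s
    ]


if __name__ == "__main__":
    # Illustrative values only, not taken from the question.
    targets = {"api": 5, "worker": 75}
    print(targets_with_short_keepalive(targets, lb_idle_timeout_s=60))
```

If a target shows up in the result, either raise its keep-alive above the load balancer's idle timeout or lower the idle timeout.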
This turned out to be memory leaks in my container applications. RAM usage grew with every request until the container crashed. At that point it took a while for ECS and the ALB to react, so a bunch of requests were routed to the dead instance.
The problem was resolved by fixing the leak, but I would have liked better built-in support in ECS/CloudWatch for alarms on high memory usage, with triggers to gracefully replace instances when usage gets high. It seems I have to build that from scratch.

Jersey client gets 504 when server keeps processing request

I have a Jersey client and server. And I see this behavior:
In client I post a request
In the server I see the request and start to handle it
Then all of a sudden the client receives an empty response with status 504 while the server is still processing the request
I've set the client's read and connect timeouts much higher than the time after which I get the empty response
After further analysis, the gateway timeout turned out to be caused by a load balancer between the client and the server.
Reconfiguring the timeout on the load balancer solved the issue.
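To illustrate why raising the client timeouts made no difference: the shortest timeout anywhere on the request path is the one that fires. Below is a minimal Python sketch of that reasoning. The hop names and values are hypothetical and only serve the example.

```python
def first_timeout_to_fire(hops):
    """Return (hop_name, seconds) for the shortest finite timeout on the path.

    hops maps a hop name to its timeout in seconds; None means "no timeout".
    Returns None when no hop has a finite timeout.
    """
    finite = {name: secs for name, secs in hops.items() if secs is not None}
    if not finite:
        return None
    name = min(finite, key=finite.get)
    return name, finite[name]


if __name__ == "__main__":
    # Hypothetical values: a generous client, a stricter load balancer,
    # and a server without its own request timeout.
    path = {
        "client read timeout": 300,
        "load balancer idle timeout": 60,
        "server request timeout": None,
    }
    print(first_timeout_to_fire(path))
```

Listing every hop this way makes it easier to spot an intermediary, such as a load balancer or proxy, that you did not configure yourself.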

Increase Timeouts To Avoid 504 Gateway Timeouts

I've taken over this Rails app that is hosted on EC2 using Passenger. After 1 minute, requests are stopped with a 504 gateway timeout error. Where can this be increased? I've set TimeOut: 100000000 in Apache's /etc/http/http.conf, but that doesn't work.

What raises HTTP 503 and how to change timeout?

I have inherited an application (internal to my company) that uses JavaScript running in Internet Explorer, which makes Ajax calls to a Struts-based application running in WebLogic Server v10.
Certain server-side operations in the system are taking longer than 3 minutes. Users consistently noticed that the Ajax call returns 503 error at the 3 minute mark. My users can wait longer than 3 minutes, but 503 errors interrupt their work.
This application needs to be performance tuned, but we badly need a temporary workaround to extend how much time can occur before a 503 error is returned.
The current theory is that the 503 error is being raised by the IE XMLHttpRequest object. A team of supposed WebLogic experts pored over our code and WebLogic logs, and declared that there's no timeout occurring on the server side. But I have my doubts.
My question is, which piece of software is responsible for raising the 503 error: the browser, the Ajax JavaScript, or the server? And can this timeout period be changed?
A 503 error is kind of a catch-all for a lot of different types of errors, usually on the server side. In your case it could be that the server is just rejecting the connection after a certain timeout, and responding back with a 503 to indicate that the server is overloaded or cannot process your request.
A lot of times with web services, a 503 will be returned when the server code throws an exception or error. If the server code doesn't properly handle the error, it will bubble up to the server, which will just respond back with a generic 503.
http://www.checkupdown.com/status/E503.html
Error code 5xx (alternate definition)
RFC 2616
503 is a server error. XMLHttpRequest will happily wait longer than 3 minutes. The first thing you should do is satisfy yourself of that by requesting the problem URL with telnet, netcat, or similar and seeing the 503 with JavaScript out of the picture.
Then you can proceed to find the timeout on the server side.
Your web server has a request reply timeout which is being tripped by long-running service requests. It could be the WebLogic server or a proxy. It is certainly not the client.
Have you considered submitting an asynchronous HTTP request that will be responded to immediately, and then polling another location for the eventual results? Three minutes is about 170 seconds too long.
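A minimal Python sketch of that submit-then-poll idea follows. It is an in-process stand-in, not a web application: in a real system `submit_job` and `poll_job` would sit behind two HTTP endpoints, and the job store would need to survive restarts. All names here, including `slow_report`, are hypothetical.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

# Background workers plus an in-memory job table (assumption: single process).
_executor = ThreadPoolExecutor(max_workers=2)
_jobs = {}


def submit_job(work, *args):
    """Start the long-running work in the background and return a job id at once."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = _executor.submit(work, *args)
    return job_id


def poll_job(job_id):
    """Report the current state of a job without blocking the caller."""
    future = _jobs.get(job_id)
    if future is None:
        return {"status": "unknown"}
    if not future.done():
        return {"status": "running"}
    error = future.exception()
    if error is not None:
        return {"status": "failed", "error": str(error)}
    return {"status": "done", "result": future.result()}


def slow_report(n):
    """Stand-in for the slow server-side operation."""
    time.sleep(0.2)
    return n * 2


if __name__ == "__main__":
    job = submit_job(slow_report, 21)
    while poll_job(job)["status"] == "running":
        time.sleep(0.05)  # the client would poll at a sensible interval
    print(poll_job(job))
```

Because each individual request returns quickly, no proxy or server timeout on the path is hit, however long the underlying operation runs.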
503 is most likely due to a timeout on the server. If you can tune your Apache server, read about the Timeout directive that you can set in httpd.conf.
Look in the httpd/logs/error_log to see if timeouts are occurring.
Refer also to this answer: Mod cluster proxy timeout in apache error logs.
