I have a load balancer that forwards requests to an EC2 server running a MuleSoft application on a port; the load balancer forwards whatever comes in to that port.
I get a 502 Bad Gateway every time I try to POST data through Postman, but as soon as I POST again the request goes through.
1st attempt: fails with 502 Bad Gateway
2nd attempt (immediate): goes through
I increased the timeout on the load balancer to 5 minutes, but I still get a 502 Bad Gateway on the first try.
Any help/suggestions appreciated.
Related
We're using Node.js backend servers running in AWS ECS, behind an ALB. We then have AWS API Gateway with a proxy Lambda calling the ALB. This had been running in production for months when, a few days ago, we suddenly started seeing 502 errors from some API calls.
I've checked the proxy Lambda logs and can see that the 502 is returned from the ALB. However, when I check my Node application logs there are no failing requests; in fact, no requests seem to have reached the application at those timestamps. I then enabled access logs on the ALB, which only show 200/201 responses, no 5xx whatsoever. I'm now a bit confused as to where to look next. What could cause my ALB to return a 502 without it appearing in the ALB access logs? And what could cause the requests to not reach my Node app in ECS? Does anyone have an idea of what logs to check next, or how to pinpoint the errors? Could some layer within ECS cause those symptoms? I can't see any errors in my Docker containers or anything.
It seems to happen in bursts, up to 50 failed requests within a short period, then everything is OK for several hours.
It could be due to a number of reasons. The following may apply to you:
- The load balancer received a TCP RST from the target when attempting to establish a connection.
- The load balancer received an unexpected response from the target, such as "ICMP Destination unreachable (Host unreachable)", when attempting to establish a connection. Check whether traffic is allowed from the load balancer subnets to the targets on the target port.
- The target closed the connection with a TCP RST or a TCP FIN while the load balancer had an outstanding request to the target. Check whether the keep-alive duration of the target is shorter than the idle timeout value of the load balancer (see the keep-alive sketch below).
- The target response is malformed or contains HTTP headers that are not valid.
- The load balancer encountered an SSL handshake error or SSL handshake timeout (10 seconds) when connecting to a target.
reference docs
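For the keep-alive vs. idle-timeout case above, the fix is to make the target keep idle connections open longer than the load balancer's idle timeout, so the target never closes a connection the ALB still considers reusable. The backends in this thread vary (MuleSoft, Node.js, Spring Boot), so purely as one hedged illustration, here is roughly what that looks like for a Spring Boot / embedded Tomcat target. It assumes the ALB is on its default 60-second idle timeout; the 65-second value is an arbitrary choice, not something taken from the question.

```java
import org.apache.coyote.http11.AbstractHttp11Protocol;
import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class KeepAliveConfig {

    // Illustrative values: assumes the ALB idle timeout is the default 60 s.
    // The target's keep-alive timeout should be LONGER than the ALB idle timeout,
    // otherwise the target closes idle connections the ALB still considers
    // usable, and the next request on that connection surfaces as a 502.
    @Bean
    public WebServerFactoryCustomizer<TomcatServletWebServerFactory> keepAliveCustomizer() {
        return factory -> factory.addConnectorCustomizers(connector -> {
            AbstractHttp11Protocol<?> protocol =
                    (AbstractHttp11Protocol<?>) connector.getProtocolHandler();
            protocol.setKeepAliveTimeout(65_000);  // 65 s, in milliseconds
            protocol.setMaxKeepAliveRequests(-1);  // no per-connection request cap
        });
    }
}
```

The equivalent knob exists in most HTTP servers (for example server.keepAliveTimeout in Node.js); whatever the stack, the target-side value just needs to exceed the load balancer's idle timeout.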
This turned out to be memory leaks in my container applications. RAM usage grew with every request until the container crashed. At that point it took a while for ECS and the ALB to react, so a bunch of requests were routed to the dead instance.
The problem was resolved by fixing the leak, but I'd have liked better built-in support in ECS/CloudWatch for alarms on high memory usage, with triggers to gracefully replace instances when usage gets too high. It seems I have to build that from scratch.
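For the "build that from scratch" part, a rough sketch of just the alarm half, using the AWS SDK for Java v2, is below. The cluster name, service name, threshold and SNS topic ARN are all placeholders; gracefully replacing tasks would additionally require wiring the alarm to an auto scaling or automation action, which is not shown.

```java
import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.ComparisonOperator;
import software.amazon.awssdk.services.cloudwatch.model.Dimension;
import software.amazon.awssdk.services.cloudwatch.model.PutMetricAlarmRequest;
import software.amazon.awssdk.services.cloudwatch.model.Statistic;

public class MemoryAlarm {
    public static void main(String[] args) {
        // All names and the ARN below are placeholders.
        try (CloudWatchClient cw = CloudWatchClient.create()) {
            cw.putMetricAlarm(PutMetricAlarmRequest.builder()
                    .alarmName("ecs-service-high-memory")
                    .namespace("AWS/ECS")                 // built-in ECS service metrics
                    .metricName("MemoryUtilization")
                    .dimensions(
                            Dimension.builder().name("ClusterName").value("my-cluster").build(),
                            Dimension.builder().name("ServiceName").value("my-service").build())
                    .statistic(Statistic.AVERAGE)
                    .period(60)                           // evaluate one-minute datapoints
                    .evaluationPeriods(3)                 // ...for three consecutive periods
                    .threshold(85.0)                      // percent memory utilization
                    .comparisonOperator(ComparisonOperator.GREATER_THAN_THRESHOLD)
                    .alarmActions("arn:aws:sns:eu-west-1:123456789012:ops-alerts")
                    .build());
        }
    }
}
```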
I have an AWS Elastic Load Balancer with two healthy instances. If I make a POST request, it gets accepted, but subsequent requests throw a 504 Gateway Timeout error. After 5-10 minutes it accepts 2-4 requests and then starts throwing 504 errors again. The service I'm trying to reach is a Spring Boot application hosted on these two instances. There are no application-level timeouts, and the time between failed and accepted requests varies, so I don't believe a fixed timeout configuration setting is causing the issue. How can I resolve this?
I have a Jersey client and server, and I see this behavior:
- From the client I POST a request.
- On the server I see the request arrive and start being handled.
- Then, all of a sudden, the client receives an empty response with status 504 while the server is still processing the request.
I've set the client properties so that the read and connect timeouts are much higher than the time at which I get the empty response.
After further analysis, the gateway timeout turned out to be caused by a load balancer sitting between the client and the server.
Reconfiguring the timeout on the load balancer solved the issue.
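For completeness, the client-side timeouts mentioned above would be set roughly like this in Jersey 2.x (package names assume javax.ws.rs; the URL and values are placeholders). The point of the answer stands: these alone are not enough, because a load balancer in the path will return 504 once its own idle/response timeout expires, regardless of how long the client is willing to wait, so that timeout has to be raised as well.

```java
import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import javax.ws.rs.client.Entity;
import javax.ws.rs.core.Response;
import org.glassfish.jersey.client.ClientProperties;

public class SlowCallClient {
    public static void main(String[] args) {
        // Placeholder URL; timeout values are in milliseconds.
        Client client = ClientBuilder.newClient()
                .property(ClientProperties.CONNECT_TIMEOUT, 5_000)
                .property(ClientProperties.READ_TIMEOUT, 600_000); // wait up to 10 min for a response

        Response response = client.target("http://example.internal/api/slow-operation")
                .request()
                .post(Entity.json("{\"payload\":true}"));

        // Even with these client timeouts, an intermediate load balancer will
        // still cut the request off with a 504 when ITS timeout expires.
        System.out.println(response.getStatus());
        response.close();
        client.close();
    }
}
```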
I have configured Spring with WebSockets, including RabbitMQ on the back end, and I can confirm that I can send push messages to the browser. I am using SockJS on the front end.
Up until now I have been using the Classic Load Balancer.
I am trying to get WebSockets to work on AWS. I have upgraded to the Application Load Balancer, but I still get a Bad Request response when I try to make the WebSocket connection to:
ws://XXXX.eu-west-1.elasticbeanstalk.com/spring/hello/870/sbmdv5tn/websocket
That call still gives a 400 Bad Request response, and I see

Handshake failed due to invalid Upgrade header: null

errors on the back end.
It has to do with the fact that a connection upgrade is requested, and these upgrade requests occur "per hop".
In my scenario I am running Apache in front of Tomcat, and in order for Tomcat to receive the upgrade headers I need to enable WebSocket tunnelling on the Apache proxy so that Apache simply passes the upgrade request through.
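A hedged sketch of that Apache change, assuming mod_proxy, mod_proxy_http, mod_proxy_wstunnel and mod_rewrite are loaded and Tomcat listens on localhost:8080 (host, port and patterns are placeholders, not taken from the question):

```apache
RewriteEngine On

# Requests asking for a connection upgrade go through the WebSocket tunnel...
RewriteCond %{HTTP:Upgrade} =websocket [NC]
RewriteRule ^/(.*)$ ws://localhost:8080/$1 [P,L]

# ...everything else (including SockJS's plain-HTTP fallback transports)
# keeps going through the normal HTTP proxy.
RewriteCond %{HTTP:Upgrade} !=websocket [NC]
RewriteRule ^/(.*)$ http://localhost:8080/$1 [P,L]
```

Checking the Upgrade header means only the actual WebSocket handshakes are tunnelled, so SockJS's HTTP fallbacks are unaffected.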
UPDATE:
A better solution, though, is to bypass Apache altogether and go straight to Tomcat: configure the load balancer to route to port 8080 rather than port 80. I suspect the reason Elastic Beanstalk does not do this by default is that it then requires a load balancer, and if you only want a single instance you don't need one.
I have inherited an application (internal to my company) that uses JavaScript running in Internet Explorer to make Ajax calls to a Struts-based application running on WebLogic Server v10.
Certain server-side operations in the system take longer than 3 minutes. Users consistently see the Ajax call return a 503 error at the 3-minute mark. My users can wait longer than 3 minutes, but the 503 errors interrupt their work.
This application needs to be performance tuned, but we badly need a temporary workaround to extend how much time can pass before a 503 error is returned.
The current theory is that the 503 error is being raised by the IE XMLHttpRequest object. A team of supposed WebLogic experts pored over our code and the WebLogic logs and declared that there is no timeout occurring on the server side. But I have my doubts.
My question is: which piece of software is responsible for raising the 503 error: the browser, the Ajax JavaScript, or the server? And can this timeout period be changed?
A 503 error is kind of a catch-all for a lot of different kinds of errors, usually on the server side. In your case it could be that the server is simply rejecting the connection after a certain timeout and responding with a 503 to indicate that it is overloaded or cannot process your request.
A lot of times with web services, a 503 is returned when the server code throws an exception or error. If the server code doesn't handle the error properly, it bubbles up to the server, which responds with a generic 503.
http://www.checkupdown.com/status/E503.html
Error code 5xx (alternate definition)
RFC 2616
503 is a server error. XMLHttpRequest will happily wait longer than 3 minutes. The first thing you should do is satisfy yourself of that by visiting the problem URL with telnet or netcat or similar and seeing the 503 with JavaScript out of the picture.
Then you can proceed to find the timeout on the server side.
Your web server has a request reply timeout which is being tripped by long-running service requests. It could be the WebLogic server or a proxy. It is certainly not the client.
Have you considered submitting an asynchronous HTTP request that will be responded to immediately, and then polling another location for the eventual results? Three minutes is about 170 seconds too long.
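To make that suggestion concrete, a minimal sketch of the accept-now, poll-later pattern is below. It uses Spring Web purely for brevity (the application in the question is Struts on WebLogic); every class, endpoint and name here is hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
public class LongJobController {

    private final Map<String, CompletableFuture<String>> jobs = new ConcurrentHashMap<>();

    @PostMapping("/jobs")
    public ResponseEntity<String> submit(@RequestBody String payload) {
        String id = UUID.randomUUID().toString();
        // Start the slow work in the background and return immediately,
        // well inside any proxy or load-balancer timeout.
        jobs.put(id, CompletableFuture.supplyAsync(() -> runSlowOperation(payload)));
        return ResponseEntity.accepted().header("Location", "/jobs/" + id).build();
    }

    @GetMapping("/jobs/{id}")
    public ResponseEntity<String> poll(@PathVariable String id) {
        CompletableFuture<String> job = jobs.get(id);
        if (job == null) {
            return ResponseEntity.notFound().build();
        }
        return job.isDone()
                ? ResponseEntity.ok(job.getNow(""))           // finished: return the result
                : ResponseEntity.status(202).body("pending");  // still running: poll again later
    }

    private String runSlowOperation(String payload) {
        // Placeholder for the 3+ minute server-side operation.
        return "done";
    }
}
```

The browser then polls /jobs/{id} every few seconds; each individual request completes quickly, so no intermediary ever hits its timeout.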
503 is most likely due to a timeout on the server. If you can tune your Apache server, read about the Timeout directive that you can set in httpd.conf.
Look in httpd/logs/error_log to see if timeouts are occurring.
Refer also to this answer: Mod cluster proxy timeout in apache error logs.
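If the timeout does turn out to be in Apache, the httpd.conf knobs would look roughly like this; the values are illustrative, and if the WebLogic proxy plug-in is in front of the application it has its own timeout parameter (e.g. WLIOTimeoutSecs) to check as well.

```apache
Timeout 300          # core request/response timeout, in seconds
ProxyTimeout 300     # mod_proxy timeout, if Apache is proxying to the app server
```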