Getting error 504 on test server but not locally - Spring

I've been trying to fix a really annoying problem in a Java web project. We use Spring (3.1.4) with Web Flow (2.3.1), running on WildFly 11.0 on an Amazon server. In short, the remote server returns a 504 Gateway Timeout error for a task that completes fine in the local environment.
A specific page exports an .xls report, which internally does a lot of work querying the database and other applications' REST APIs. This functionality lives inside a managed bean like so:
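import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;

@Component
@Scope(value = "request")
public class ReportMB {
    ...
    public void export() {
        try {
            // code goes here
            // generates HttpServletResponse containing the report file
        } catch (Exception e) {
            // actual exception types and handling elided in the original
        }
    }
}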
The report is generated successfully on my development machine even when I include a lot of data, although it can take a few minutes to complete. When it runs on the remote server, with the same WildFly version and project deployment, we get a 504 Gateway Timeout with far less data, in under 100 seconds.
I've already examined the logs and even debugged the process (locally), and neither environment (local or remote) throws any exception. Do you have any idea why the local server waits until the method finishes and the file is ready but the remote server doesn't? Or at least, do you know how to increase the WildFly/Spring timeout for this request? (I have seen many examples of changing the deployment timeout, but nothing related to the request timeout.)
Thank you in advance.

The 504 is being generated by the load balancer, not the app server. Increase the timeout on the load balancer (on an AWS Elastic Load Balancer this is the idle timeout, which defaults to 60 seconds).
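To check whether the export really outlives the load balancer's timeout, one option is to time it on the remote server and compare the result with the configured timeout. The sketch below is a hypothetical helper, not part of the original project; the 60-second threshold is an assumption based on the AWS ELB default and should be replaced with the value actually configured.

```java
import java.time.Duration;
import java.util.logging.Logger;

/**
 * Hypothetical helper (not part of the original project): times a unit of work,
 * such as the body of ReportMB.export(), and logs a warning when it runs longer
 * than an assumed load-balancer idle timeout.
 */
final class ExportTimer {
    private static final Logger LOG = Logger.getLogger(ExportTimer.class.getName());

    // Assumption: 60 s, the default idle timeout of an AWS Elastic Load Balancer.
    // Replace with whatever timeout is actually configured in front of WildFly.
    static final Duration ASSUMED_LB_IDLE_TIMEOUT = Duration.ofSeconds(60);

    /** Runs the work, logs how long it took (even if it fails) and returns the duration. */
    static Duration timed(String label, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } catch (RuntimeException e) {
            logElapsed(label + " (failed)", start);
            throw e;
        }
        return logElapsed(label, start);
    }

    private static Duration logElapsed(String label, long startNanos) {
        Duration elapsed = Duration.ofNanos(System.nanoTime() - startNanos);
        if (elapsed.compareTo(ASSUMED_LB_IDLE_TIMEOUT) > 0) {
            LOG.warning(label + " took " + elapsed.toMillis()
                    + " ms, longer than the assumed load-balancer timeout");
        } else {
            LOG.info(label + " took " + elapsed.toMillis() + " ms");
        }
        return elapsed;
    }
}
```

Inside ReportMB.export() the existing body would be wrapped as ExportTimer.timed("xls export", () -> { /* existing export code */ }). If the logged durations regularly exceed the load balancer's timeout, that is consistent with the answer above: the setting to change lives on the load balancer rather than in WildFly or Spring.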

Related

How do I solve JMeter Error Connection reset?

I am running a performance test on a site using JMeter. With a load of up to 100 simultaneous users (threads) the tests pass perfectly, but when I raise the load to 300 users (threads) I get the following error:
Non HTTP response code: java.net.SocketException / Non HTTP response message: Connection reset
The error occurs in only a small fraction of requests: out of 2412 requests made by 300 users (threads), only 2 generated this error (about 0.08%).
I thought it was the maximum number of connections allowed on my server, so I went to my application's web.config and entered the following: "Min Pool Size = 5; Max Pool Size = 500;", but that still did not solve the problem.
Does anyone know what I can do to not generate these errors?
Most probably it indicates a problem with your application. Try checking:
application logs
application/web server logs and configuration
the underlying operating system logs and networking configuration. Also pay attention to the number of open ports/sockets/handles (which can be checked using either built-in OS monitoring tools or the JMeter PerfMon Plugin)
If you're absolutely sure that there is nothing wrong with your test script and application, and JMeter is configured to behave exactly like a real browser, you can follow the instructions from the JMeterSocketClosed wiki page
More information: The Mysteries of Connection Close
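For the "open ports/sockets/handles" check, a JVM process (JMeter itself, or a Java-based server under test) can report its own descriptor usage, because sockets count as file descriptors on Unix-like systems. A minimal sketch, assuming a HotSpot/OpenJDK runtime on a Unix-like OS, where the platform MXBean implements com.sun.management.UnixOperatingSystemMXBean; elsewhere it returns null, and the OS tools or the PerfMon plugin remain the way to go. The class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Sketch: reports how many file descriptors (sockets included) this JVM has open. */
final class OpenHandles {
    /** Returns {open, max}, or null if the platform MXBean does not expose the counts. */
    static long[] openAndMaxDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        // Not available (e.g. on Windows): fall back to OS monitoring tools.
        return null;
    }
}
```

Sampling this periodically during the 300-thread run and watching whether the open count approaches the maximum is one way to rule descriptor exhaustion in or out.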

TFS Build server unable to communicate with controller for load testing

What steps can I take to test the connectivity of a controller?
How can I ensure that my TFS build server is able to communicate with a controller in order to perform a load test?
My company is trying to automate load testing, and to accomplish this we are using a TFS build. I have a command line task to actually start the load tests. To visualize, it looks like this:
Command Line Task
Tool: MSTest.exe
Arguments: /testcontainer:"some load test.loadtest" /testsettings:"some test settings.testsettings"
When I run the build, it times out after about 3-4.5 minutes and gives me this error:
Failed to queue test run 'some load test': A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
I've tried a few things to test the connectivity of the controller, but I'm not sure I've reached the answer I'm looking for yet.
The first thing I tried was to restart the controller, however that didn't change anything.
I then went into VS 2017 and opened the load test editor. From there I went into Manage Test Controllers. The controller dialog box is set to the correct controller, and the agents it controls are in the Ready state.
Then I looked at the test settings and verified that <RemoteController/> was also set correctly. I did this mostly to confirm that the error isn't in the files themselves. So the timeout is presumably caused by some connection problem between the build server and the controller.
I know that the controller uses a specific port for incoming traffic, port 6901, so I checked the port connection using a .NET TcpClient in PS:
$server = 'mycontroller'
$port = 6901
$client = New-Object Net.Sockets.TcpClient
try {
    $client.Connect($server, $port)
    $msg = "Connected to $server on Port $port"
} catch {
    $msg = "Could not connect to $server on Port $port"
} finally {
    $client.Dispose()
}
write-host $msg
As long as the controller is turned on, the check always reports that $client can connect to $server. Does a TcpClient object have any properties besides Connected that could give me more information about the connection? The build server still times out even though $client seemingly connects. Do I need to give the build server the specific port to connect to? Shouldn't it do that automatically?
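As a cross-check of the PowerShell test, here is a sketch of the same TCP reachability check in Java, with an explicit connect timeout and a report of the local and remote endpoints and the connect time. The class name and message format are illustrative. Note that a successful connect only proves that something accepts a TCP handshake on that port; it says nothing about whether the controller's own protocol exchange with the build server succeeds.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Sketch of a TCP reachability check with an explicit connect timeout. */
final class PortCheck {
    /** Returns a human-readable result; never throws. */
    static String check(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            // Fails with SocketTimeoutException if no handshake completes in time.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            long ms = (System.nanoTime() - start) / 1_000_000;
            return "Connected to " + socket.getRemoteSocketAddress()
                    + " from " + socket.getLocalSocketAddress() + " in " + ms + " ms";
        } catch (IOException e) {
            return "Could not connect to " + host + " on port " + port + ": " + e;
        }
    }
}
```

For example, PortCheck.check("mycontroller", 6901, 5000) run from the build server mirrors the PowerShell test, with the timeout chosen explicitly rather than left to the OS default.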
I tried tracert through another command line task to see if I could pinpoint where the server was getting timed out:
2019-07-10T20:28:32.9051572Z C:\Windows\system32\cmd.exe /c "TRACERT.exe atsvstctstvm400"
2019-07-10T20:28:32.9271264Z Tracing route to mycontroller.local [10.208.3.56]
2019-07-10T20:28:32.9281250Z over a maximum of 30 hops:
2019-07-10T20:28:37.7253994Z 1 1 ms <1 ms <1 ms 10.200.0.5
2019-07-10T20:28:38.7269952Z 2 <1 ms <1 ms <1 ms 10.50.0.2
2019-07-10T20:28:39.8534160Z 3 38 ms 39 ms 38 ms 172.20.0.2
2019-07-10T20:28:52.3918376Z 4 * * * Request timed out.
2019-07-10T20:28:52.5396304Z 5 48 ms 48 ms 48 ms 10.208.3.56
2019-07-10T20:28:52.5396304Z Trace complete.
Could the 4th hop be where the build server is getting hung up on? It seems like despite the timeout on hop 4, the agent was still able to ping through to the controller. If it is the 4th hop I need to be concerned with, how do I even go about fixing it? It doesn't give me any information about where the 4th hop wanted to go.
So, in sum, I'm trying to run a load test through a command line task in TFS using MSTest.exe. When I run the build, I get a timeout error, which I believe means that the build server is unable to communicate with the controller.
I've tried a few ideas for troubleshooting, but I haven't been able to resolve the issue or shed any light on how to proceed.
How can I pinpoint where the communication error is?
What other troubleshooting solutions are there to resolve this issue?
For future load test runs, how can I test to ensure that the build server is able to communicate with the controller?

PostgreSQL: No connection could be made because the target machine actively refused it

Running PostgreSQL 9.5 on Windows Server 2012 R2 in Azure
While running some load tests on my application, I get errors about not being able to connect to the Postgres server. In the Postgres logs I get the following message:
could not receive data from client: No connection could be made because the target machine actively refused it.
This only happens when the load test moves to the next scenario, hitting a different part of the code, so new connections to the database are required. But after 10-20 seconds the rest of the scenario works flawlessly without any other hiccups. So the problem seems to be the TCP connections. (My code retries a couple of times, but it is not feasible to let it retry for 20 seconds.)
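For reference, a retry loop with exponential backoff, like the one the question alludes to, might look like the following sketch. Retry and withBackoff are hypothetical names, and the question does not say which backoff scheme the original code uses.

```java
import java.util.concurrent.Callable;

/**
 * Sketch (hypothetical helper): retries an operation with exponential backoff,
 * giving up after maxAttempts and rethrowing the last failure.
 */
final class Retry {
    static <T> T withBackoff(Callable<T> operation, int maxAttempts, long initialDelayMillis)
            throws Exception {
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last error
                }
                Thread.sleep(delay); // wait before the next attempt
                delay *= 2;          // double the wait each time
            }
        }
    }
}
```

With an initial delay of 100 ms, the total waiting time across n retries is 100 × (2^n − 1) ms: 5 attempts (4 retries) wait only 1.5 s in total, and bridging a 10-20 second window takes 8 retries (25.5 s), which illustrates why retrying alone is not a practical fix here.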
I'm using the following settings in the config files
postgresql.conf
listen_addresses = '*'
max_connections = 500
shared_buffers = 1024MB
temp_buffers = 2MB
work_mem = 2MB
maintenance_work_mem = 128MB
pg_hba.conf
host all all 0.0.0.0/0 trust
host all all ::/0 trust
I know, I know... It is not safe to accept connections from everyone, but this is just for testing purposes, to make sure these settings are not blocking any connections. So this answer does not apply.
I've been monitoring the number of connections on the server, and under load it is stable at 75. Postgres is using around 350 MB of RAM, so given the config and the VM specs (7 GB RAM) there should be plenty of room to create more connections. However, when the next scenario is spinning up, the number of connections does not increase; it stays level, and these "no connection could be made" log messages start appearing.
What could be the problem here?
It does sound like this isn't really a Postgres problem (hence no changes in the DB stats you're checking), but rather that the traffic is being stopped by the server, possibly because traffic on that port is saturated while handling your load-testing queries.
It doesn't sound like you're hitting any of the Azure resource limits (including the database limits if that applies to your setup?), but without more detail on your load tests it's hard to say exactly what is needed.
Solutions from around the web and other SO answers suggest:
Disable TCP autotuning and tweak the TCP/IP registry keys on the server, e.g. set TcpAckFrequency - see this article for details
Make TCP setting adjustments (like WinsockListenBacklog), which may be affected by whether or not connection pooling is in use; see this MS support article, which is for SQL Server 2005 but has some great tips on troubleshooting rejected TCP/IP connections (it uses Network Monitor, but the approach applies to newer tools)
Faster request processing if you have enough control of the server - source
Disabling network proxying (in your load testing app): <defaultProxy> <proxy usesystemdefault="False"/> </defaultProxy> - source
The most likely reason is a firewall/anti-virus:
Software/Personal Firewall Settings
Multiple Software/Personal Firewalls
Anti-virus Software
LSP Layer
(Virtual) Router Firmware
Does your current Azure infrastructure include a firewall or anti-virus?
Additionally, after some further searching, it looks like this is a standard Windows "connection refused" message, which suggests that PostgreSQL is trying to connect to something and being refused.
It's also possible that one element in your network (assuming you are still connected to the server) delays or drops some DB login/authentication network packets (for example, treating them as a fake authentication replay).
You may also use a packet analyzer (like Wireshark) to record/inspect network flow when the error appear.
Regards
I was facing the same issue in my ASP.NET Core application while trying to connect to PostgreSQL from my application. The error was thrown in Program.cs when I called the Migrate function.
using System;
using Microsoft.EntityFrameworkCore;             // Database.Migrate()
using Microsoft.Extensions.DependencyInjection;  // CreateScope(), GetService<T>()

public static void Main(string[] args) {
    try {
        var host = BuildWebHost(args);
        using (var scope = host.Services.CreateScope()) {
            // Migrate once after the app is started.
            scope.ServiceProvider.GetService<MyDatabaseContext>().Database.Migrate();
        }
        host.Run();
    }
    catch (Exception e) {
        // NLog: catch setup errors
        _logger?.Error(e, "Stopped program because of exception: ");
        throw;
    }
}
To fix this problem I did the following steps.
Checked whether the PostgreSQL service is running by going to services.msc
Tried to log in to pgAdmin with the user and password I provided in the database context
Everything was fine. As you know, 5432 is the default port of PostgreSQL, and somehow I was using a different port in my application's connection string; changing it to 5432 fixed this issue for me.
"ConnectionString": "User Id=postgres;Password=mypwd;Host=localhost;Port=5432;Database=mydb;"
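A quick way to catch this kind of mismatch is to read the Port value back out of the connection string and compare it with the port the server actually listens on. A simplified sketch with a hypothetical class name: it ignores quoting rules, so values containing ';' are not handled, and it treats an omitted Port as 5432, the PostgreSQL default.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch: split a "Key=Value;Key=Value;" connection string and inspect its Port entry. */
final class ConnStringCheck {
    static final int POSTGRES_DEFAULT_PORT = 5432;

    /** Parses key/value pairs; keys are lower-cased so lookups are case-insensitive. */
    static Map<String, String> parse(String connectionString) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String part : connectionString.split(";")) {
            int eq = part.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed segments
            }
            entries.put(part.substring(0, eq).trim().toLowerCase(Locale.ROOT),
                        part.substring(eq + 1).trim());
        }
        return entries;
    }

    /** Returns the port the connection string targets (5432 if Port is omitted). */
    static int port(String connectionString) {
        String p = parse(connectionString).get("port");
        return p == null ? POSTGRES_DEFAULT_PORT : Integer.parseInt(p);
    }
}
```

Logging ConnStringCheck.port(...) at startup next to the port PostgreSQL reports in postgresql.conf makes a mismatch like the one described above visible immediately.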
I came across a similar issue whilst load testing my API, where I was seeing Npgsql.NpgsqlException No connection could be made because the target machine actively refused it.
However, my issue was down to the fact that I was re-creating my NpgsqlConnection for each query rather than re-using it and keeping it alive.
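The difference between opening a connection per query and reusing connections can be sketched with a minimal generic pool. This is an illustration with hypothetical names, not Npgsql's implementation; in practice a driver's or library's built-in pool (for example HikariCP on the JVM) is preferable to a hand-rolled one.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/**
 * Minimal object-pool sketch: connections are created lazily up to maxSize
 * and handed back for reuse instead of being opened for every query.
 */
final class SimplePool<C> {
    private final BlockingQueue<C> idle;
    private final Supplier<C> factory;
    private final int maxSize;
    private int created; // guarded by "this"

    SimplePool(int maxSize, Supplier<C> factory) {
        this.maxSize = maxSize;
        this.factory = factory;
        this.idle = new ArrayBlockingQueue<>(maxSize);
    }

    /** Reuses an idle connection, creates one if under the limit, otherwise waits. */
    C borrow() throws InterruptedException {
        C c = idle.poll();
        if (c != null) {
            return c;
        }
        synchronized (this) {
            if (created < maxSize) {
                C fresh = factory.get(); // if this throws, the count is not incremented
                created++;
                return fresh;
            }
        }
        return idle.take(); // pool exhausted: block until someone releases
    }

    /** Returns a connection to the pool so the next borrower can reuse it. */
    void release(C c) {
        idle.offer(c);
    }
}
```

With sequential borrow/release, 100 queries reuse a single connection instead of opening 100 new ones, which avoids the per-query connection churn the answer identifies as the cause.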

Performance - High Context Switch

I have an application which exposes a web service on which I am trying to do a load test.
It works for a few concurrent users without any issue.
When I increase the user count to 30, I get this error in JMeter within 100 milliseconds.
Non HTTP response code: java.net.SocketException - Non HTTP response message: Connection reset
[I thought my JMeter config was wrong, but one of the web applications that uses this web service also failed consistently around that time, saying the service was unavailable. So the server itself has some issue.]
I checked the web service's application log: no exceptions, very clean.
CPU and memory utilization on the server machine are also normal.
However, context switches and device interrupts increase under load.
Context switches average 1500/sec under heavy load, compared with about 500/sec normally.
Is this bad? Is it what makes my application perform badly? I have no idea how to resolve this issue.
Note: it is a JBoss server.
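Since JBoss runs on a JVM, one quick diagnostic is to check whether the thread count climbs along with the context-switch rate as load increases. A minimal in-process sketch using the standard ThreadMXBean; the class name is hypothetical, and for a running server the same counters are visible over JMX (for example in JConsole's Threads tab) without code changes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Sketch: snapshot of thread counts for the current JVM. */
final class ThreadStats {
    /** Number of live threads right now. */
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** One-line summary suitable for logging at intervals during a load test. */
    static String snapshot() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return "live=" + threads.getThreadCount()
                + " peak=" + threads.getPeakThreadCount()
                + " daemon=" + threads.getDaemonThreadCount()
                + " startedTotal=" + threads.getTotalStartedThreadCount();
    }
}
```

Logging snapshot() every few seconds while ramping from a few users to 30 shows whether the request-handling thread pool is growing, saturating, or staying flat when the resets begin.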

Connecting two WebSocket servers to each other

I have two WebSocket servers that communicate wonderfully with a client. They are on two separate machines, implemented in Java and running inside WildFly 8 web servers. What I need now is for them to communicate with each other. That means: the client sends a message to server 1, server 1 sends a message to server 2, receives the reply and sends it back to the client.
The servers run on different apps in OpenShift and I need them to use websockets. Or some other type of communication, but I haven't managed to find anything that actually works so far (RMI or normal socket connections won't work).
What I basically tried to do is use the same code from the client within the onMessage method of the first server. Something like this:
@OnMessage
public void message(Session session, String msg) {
    ...
    WebSocketContainer container = ContainerProvider.getWebSocketContainer();
    Session newSession = container.connectToServer(Client.class, URI.create(URL));
    newSession.getBasicRemote().sendText("Routed :" + input);
    ...
}
However, the server does not connect to the other server and I don't know why.
Any suggestions?
Thank you!
Put the connectToServer call inside a try/catch block; you might be getting an exception. Log it.
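The suggestion amounts to wrapping the connect call and logging whatever it throws. The question uses the JSR 356 ContainerProvider API, which needs the server's WebSocket implementation on the classpath; to keep the sketch self-contained it instead uses the JDK 11+ java.net.http.WebSocket client, so treat it as an illustration of the pattern rather than a drop-in replacement. RelayConnect is a hypothetical name.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionException;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Sketch using the JDK 11+ java.net.http WebSocket client (not the JSR 356
 * API from the question): connect to the second server and log the failure
 * instead of losing it.
 */
final class RelayConnect {
    private static final Logger LOG = Logger.getLogger(RelayConnect.class.getName());

    /** Returns the open WebSocket, or null after logging why the connection failed. */
    static WebSocket tryConnect(URI target) {
        try {
            return HttpClient.newHttpClient()
                    .newWebSocketBuilder()
                    // Listener has only default methods, so an empty one is valid.
                    .buildAsync(target, new WebSocket.Listener() {})
                    .join();
        } catch (CompletionException e) {
            // The real cause (e.g. ConnectException, handshake failure) is wrapped.
            LOG.log(Level.SEVERE, "Could not connect to " + target, e.getCause());
            return null;
        }
    }
}
```

With the JSR 356 API used in the question, the equivalent is to catch DeploymentException and IOException, both declared by WebSocketContainer.connectToServer, around container.connectToServer(...) and log them.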
I'm struggling to do exactly the same thing (2 WebSocket servers, WildFly 8), and I get a permission denied error. See my post here:
https://stackoverflow.com/questions/30966757/java-server-to-server-communication-with-websockets-permission-denied
