Postgresql: No connection could be made because the target machine actively refused it - windows

I'm running PostgreSQL 9.5 on Windows Server 2012 R2 in Azure.
While running load tests on my application, I get errors about not being able to connect to the Postgres server. The Postgres logs show the following message:
could not receive data from client: No connection could be made
because the target machine actively refused it.
This only happens when the load test moves to the next scenario, which hits a different part of the code, so new connections to the database are required. After 10-20 seconds the rest of the scenario runs flawlessly without any other hiccups, so the problem seems to be the TCP connections. (My code retries a couple of times, but it is not feasible to let it retry for 20 seconds.)
I'm using the following settings in the config files
postgresql.conf
listen_addresses = '*'
max_connections = 500
shared_buffers = 1024MB
temp_buffers = 2MB
work_mem = 2MB
maintenance_work_mem = 128MB
pg_hba.conf
host all all 0.0.0.0/0 trust
host all all ::/0 trust
I know, I know: it is not safe to accept connections from everyone, but this is just for testing purposes and to make sure these settings are not blocking any connections. So this answer is void.
I've been monitoring the number of connections on the server, and under load it is stable at 75. Postgres is using around 350 MB of RAM, so given the config and the VM specs (7 GB RAM) there should be plenty of room to create more connections. However, when the next scenario spins up, the number of connections does not increase; it stays level and Postgres starts logging these messages about no connection could be made.
What could be the problem here?

It does sound like this isn't really a Postgres problem (hence no changes in DB stats you're checking), rather that the traffic is being stopped by the server. Possibly because traffic on that port is saturated while handling your load testing queries?
It doesn't sound like you're hitting any of the Azure resource limits (including the database limits if that applies to your setup?), but without more detail on your load tests it's hard to say exactly what is needed.
Solutions from around the web and other SO answers suggest:
Disable TCP autotuning and tweak the TCP/IP registry keys on the server, e.g. set TcpAckFrequency - see this article for details
Make TCP setting adjustments (like WinsockListenBacklog) - which may be affected by whether connection pooling is in use or not - see this MS support article, which is for SQL Server 2005 but has some great tips on troubleshooting rejected TCP/IP connections (using Network Monitor, but applies to newer tools)
Faster request processing if you have enough control of the server - source
Disabling network proxying (in your load testing app): <defaultProxy> <proxy usesystemdefault="False"/> </defaultProxy> - source

The most likely cause is a firewall or anti-virus:
Software/Personal Firewall Settings
Multiple Software/Personal Firewalls
Anti-virus Software
LSP Layer
(Virtual) Router Firmware
Does your current Azure infrastructure contain a firewall or anti-virus?
Additionally, after some further searching, it looks like this is a standard Windows "connection refused" message, which suggests that PostgreSQL is trying to connect to something and being refused.
It is also possible that a network element in your network (assuming you are still connected to the server) delays or drops some DB login/authentication packets (treating them, for example, as a fake auth replay).
You may also use a packet analyzer (like Wireshark) to record and inspect the network flow when the error appears.
Regards

I was facing the same issue in my ASP.NET Core application while trying to connect to PostgreSQL. The error was thrown in the Program.cs file when I was calling the Migrate function.
public static void Main(string[] args) {
    try {
        var host = BuildWebHost(args);
        using (var scope = host.Services.CreateScope()) {
            // Migrate once after app is started.
            scope.ServiceProvider.GetService<MyDatabaseContext>().Migrate();
        }
        host.Run();
    }
    catch (Exception e) {
        // NLog: catch setup errors
        _logger?.Error(e, "Stopped program because of exception: ");
        throw;
    }
}
To fix this problem, I went through the following steps:
Checked whether the PostgreSQL service was running by opening services.msc
Tried to log in to pgAdmin with the user and password I provided in the database context
Everything was fine. As you know, 5432 is the default PostgreSQL port, and somehow I was using a different port in my application's connection string. Changing it to 5432 fixed the issue for me.
"ConnectionString": "User Id=postgres;Password=mypwd;Host=localhost;Port=5432;Database=mydb;"

I came across a similar issue whilst trying to beast my API, where I was seeing Npgsql.NpgsqlException: No connection could be made because the target machine actively refused it.
However, my issue was down to the fact that I was re-creating my NpgsqlConnection for each query rather than re-using it and keeping it alive.
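The difference between re-creating a connection per query and re-using one can be illustrated with a small pool. This is only a sketch in Python rather than Npgsql code: ConnectionPool and open_fake_connection are hypothetical names, and the "connection" is a placeholder object standing in for a real driver connection.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Keeps a fixed set of open connections and hands them out for reuse."""

    def __init__(self, factory, size):
        # factory is any zero-argument callable that opens one connection.
        self._idle = queue.LifoQueue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection to the pool instead of closing it.
            self._idle.put(conn)


opened = []


def open_fake_connection():
    # Hypothetical stand-in for a real driver call; records each open.
    conn = object()
    opened.append(conn)
    return conn


pool = ConnectionPool(open_fake_connection, size=3)
for _ in range(100):
    with pool.connection() as conn:
        pass  # run one query on conn here

print(len(opened))  # 3 connections opened for 100 queries
```

With this shape, the number of TCP connections the server sees is bounded by the pool size, no matter how many queries run.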

Related

gRPC stopped working with APO on Windows 11

I have a Windows application (APP) and Audio Processing Object (APO) loaded by AudioDG.exe that communicate via gRPC:
APP part that is written in C# creates server via Grpc.Core.
APO part creates client via grpc++.
Server is on 127.0.0.1:20000 (I can see it's up and listening with netstat -ano).
I can confirm that APO is loaded into audio device graph by inspecting it with process explorer.
Everything worked like a charm on Windows 8 and 10, but on 11 it cannot communicate at all - I get either Error Code 14, Unavailable, failed to connect to all addresses or 4, Deadline Exceeded.
After enabling debug traces, I now see "socket is null" description for "connect failed" error:
I0207 16:20:59.916447 0 ..\..\..\src\core\ext\filters\client_channel\subchannel.cc:950: subchannel 000001D8B9B01E20 {address=ipv4:127.0.0.1:10000, args=grpc.client_channel_factory=0x1d8bb660460, grpc.default_authority=127.0.0.1:10000, grpc.internal.subchannel_pool=0x1d8b8c291b0, grpc.primary_user_agent=grpc-csharp/2.43.0 (.NET Framework 4.8.4470.0; CLR 4.0.30319.42000; net45; x64), grpc.resource_quota=0x1d8b8c28d90, grpc.server_uri=dns:///127.0.0.1:10000}: connect failed: {"created":"#1644240059.916000000","description":"socket is null","file":"..\..\..\src\core\lib\iomgr\tcp_client_windows.cc","file_line":112}
What I've tried so far:
Updating both parts to the latest grpc versions.
Using "no proxy", "Http2UnencryptedSupport" and other env variables.
Using "localhost" or "0.0.0.0" instead of "127.0.0.1".
Updating connection to use self signed SSL certificates (root CA, server cert + key, client cert + key).
Adding inbound / outbound rules for my port, and then disabling firewall completely.
Creating server on APO side and trying to connect with the client in APP.
Everything works (both insecure and SSL creds) if I create both client and server in C# part, but as soon as it's APP-APO communication it feels blocked or sandboxed.
What has been changed in Windows 11 that can "block" gRPC?
Thanks in advance!
In your input you write:
Server is at 127.0.0.1:20000
Further looking at the logs, you can see that:
The server is located at
grpc.server_uri=dns:///127.0.0.1:10000
Based on the question posed and the amount of data provided, I would check which port the server is really using and which port the client is looking for a connection on.
The easiest way to do this is to use the built-in Resource Monitor application. On the Network tab, in the TCP Connections list, you can find the application and the port it uses.
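You can also use the PowerShell cmdlet Test-NetConnection. Pass -ComputerName explicitly, because without it the cmdlet tests a default Microsoft internet host rather than your local server:
Test-NetConnection -ComputerName 127.0.0.1 -Port 10000 -InformationLevel "Detailed"
Test-NetConnection -ComputerName 127.0.0.1 -Port 20000 -InformationLevel "Detailed"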
At least this is the first thing I would check based on what you described.
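The same check can be scripted. Below is a minimal sketch in Python that tries a plain TCP connect to both ports mentioned in the question; port_accepts_connections is a hypothetical helper name, and a successful connect only shows that something is listening, not that it is the gRPC server.

```python
import socket


def port_accepts_connections(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 20000 is the port named in the question, 10000 the one in the log.
    for port in (10000, 20000):
        state = "open" if port_accepts_connections("127.0.0.1", port) else "closed"
        print(f"127.0.0.1:{port} is {state}")
```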
Regarding your question about the changes in Windows 11, I do not think that this is what is causing problems for you. However, Windows 11 has additional security features compared to Windows 10, so try disabling them completely as a test. Perhaps this will help solve the problem.
As for ASP.NET Core 6.0 itself (if I understood the version correctly), there is a possibility that the server part, when not running in the sandbox of the development environment, still does not accept the client certificate. At the program level, you can try to work around this by adding the following switch to the code:
// This switch must be set before creating the GrpcChannel/HttpClient.
AppContext.SetSwitch(
"System.Net.Http.SocketsHttpHandler.Http2UnencryptedSupport", true);
// The port number(5000) must match the port of the gRPC server.
var channel = GrpcChannel.ForAddress("http://localhost:5000");
var client = new Greet.GreeterClient(channel);
Microsoft describes more troubleshooting steps for gRPC on ASP.NET Core 6.0 in detail here:
https://learn.microsoft.com/en-us/aspnet/core/grpc/troubleshoot?view=aspnetcore-6.0
I hope it was useful and at least one of the solutions I suggested will help solve your problem. In any case, if I had more information, I think I could help you more accurately.

TCPListener in golang: error when number of connections is above 60 / 260

I am building a TCP proxy: client <-> proxy <-> Vertica
I have a net.TCPListener that takes incoming requests with AcceptTCP() and creates connections, then connects to the destination socket with net.DialTCP("tcp", nil, raddr). It works like a bridge: the default proxy model.
In the first version I had a problem: with 59 parallel incoming requests everything is fine, but with one more (60), connections 1-59 are OK while 60 and later fail. I can't catch the error properly; it looks like some socket closes unexpectedly.
Next, I tried to set up a queue for the listener. It helped a lot, but with more than 258 requests I get the error again.
My question: is there any limit on connections in the net package? Or might it be a system limitation?
Additional info: Vertica is running in a Docker container; hardware/system: MacBook; Vertica connection pool limit: 5, but the pool logic is implemented in the proxy.
I also tried a "raw" proxy without the pool logic (that's why I set up a queue for the listener: I must not exceed the threshold of the Vertica user's pool); the result is 258 requests.
UPDATED: (05.04.2020)
It looks like system limitations are at fault. Did I mention anywhere that I was trying to run the whole system on one PC?
So, what I had:
300 parallel processes as requests (made with Python's multiprocessing.Pool) (300 sockets)
A listener that creates 300 connections (300 more sockets)
A series of rapidly created and closed sockets deep in the proxy (according to the queue and the Vertica pool)
What I have now:
300 Python requests made from another PC in my local network (on Windows)
The proxy works fine
But I get several errors on the Windows PC that creates the requests to my proxy, such as low memory in the "swap file".
I still need to make some stress test for proxy. Adding less memory for swap file didn't solve my problem on Windows PC. I will be grateful for any suggestions and ideas. Thanks!
How does the proxy connect to Vertica?
There is by default a maximum of 50 ordinary mortal users to be connected to one Vertica node at any one time. The superuser "dbadmin" always has 5 connections in addition to that.
So if I try to connect 60 times as dbadmin, I get this on a default Vertica configuration:
Connection attempt failed: FATAL 4060: New session rejected due to limit, already 55 sessions active
You can increase the Vertica config item MaxClientSessions from its default of 50 per node.
The command is, for example: ALTER NODE <_node_name_> SET MaxClientSessions = 100
I suppose you are always connecting to the same Vertica node, and that you have set ConnectionLoadBalancing to FALSE. So you always connect to the same node, and soon reach the default maximum of 50.
I hope that is the cause.
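Since the update in the question points at a system limit, one limit worth ruling out on macOS or Linux is the per-process cap on open file descriptors, which sockets count against (a default macOS shell often has a soft limit of 256). The sketch below is in Python, the language the question uses for its load generator, and relies on the Unix-only resource module. descriptors_needed is a hypothetical helper, and its two-sockets-per-client and overhead figures are assumptions about a simple bridge-style proxy, not measurements of this one.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def descriptors_needed(clients, per_client=2, overhead=4):
    # Assumption: each proxied client holds two sockets (client side and
    # upstream side); overhead covers stdin/stdout/stderr and the listener.
    return clients * per_client + overhead


soft, hard = open_file_limits()
needed = descriptors_needed(300)
print(f"soft limit {soft}, hard limit {hard}, needed for 300 clients: {needed}")
```

If the soft limit is below the estimate, raising it (for example with ulimit -n in the shell that starts the process) is a cheap experiment before looking further.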

Websocket server stops accepting after ~600 connections

I'm running a websocket server (command line program) off port 9000 on a Windows 2008 server. I can't seem to figure out why it will not accept more than about 600 concurrent connections. Testing on my local machine, I can create thousands of concurrent connections. But on the server, I get the following error after about 600:
No connection could be made because the target machine actively refused it
I have tried adjusting registry entries for the max port number, and turning off the firewall to no avail. I have also tried a different websocket server implementation. Is there some other setting I need to change?
edit: I tried this on a Linux server as well with the same problem.
I found the problem:
It seems to be my client side internet connection. By running the same tests on a different network from the client side, I can create thousands of connections.

Stale connection with Pheanstalk

I'm using beanstalkd to offload some work to other machines. The setup is a bit unusual: the server is on the internet (public IP), but the consumers are behind ADSL lines in some people's homes. So there is a Linux server acting as a client, going out through a dynamic IP and connecting to the server to get a job. It's all PHP and I'm using the pheanstalk library.
Everything runs smoothly for some time, but when the ADSL line changes its IP (every 24 hours the provider forces a disconnect-reconnect) the client just hangs, never to come out of "reserve".
I thought that putting a timeout on the reserve would help, but it didn't. As it seems, the client issues a command and blocks; it never checks the timeout itself. It just issues a reserve-with-timeout (instead of a simple reserve), and it is the server's responsibility to return TIMED_OUT when the timeout occurs. The problem is that the connection is broken (but TCP/IP doesn't know about that until one of the sides tries to talk to the other), and if the client is blocked reading, it will never return.
The library seems to have support for some kind of timeouts locally (for example when trying to connect to server), but it does not seem to contemplate this scenario.
How could I detect the stale connection and force a reconnect? Is there some kind of keepalive on the protocol (and on the pheanstalk itself)?
Thanks!
You could try to close each connection right after the request is answered and open a new connection each time.
There is no close() function, but deleting the Pheanstalk object with unset($pheanstalk) will close the connection.
This explanation is quite helpful:
Pheanstalk (PHP client for beanstalk) - how do connections work?
I haven't tried it yet, but I came up with the idea of connecting to the beanstalk server through an SSH tunnel. We can enable the ServerAliveCountMax and ServerAliveInterval options on the tunnel, so that a network or server failure will cause the tunnel to close. This should then cause the pheanstalk client to report an error.
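The underlying idea, a local read deadline plus TCP keepalive so that a blocked read cannot hang forever, can be sketched with plain sockets. This is Python rather than PHP and does not use pheanstalk's API; connect_with_deadline, read_reply, and StaleConnection are hypothetical names. The local deadline is assumed to be somewhat longer than the reserve-with-timeout value, so a healthy server always answers first.

```python
import socket


class StaleConnection(Exception):
    """Raised when the peer stays silent past the local deadline or closes."""


def connect_with_deadline(host, port, deadline_seconds):
    # The timeout passed here also applies to later recv() calls on the socket.
    sock = socket.create_connection((host, port), timeout=deadline_seconds)
    # Ask the OS to probe an idle connection; probe timing is an OS setting.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


def read_reply(sock, max_bytes=4096):
    """Read one chunk, or raise StaleConnection so the caller can reconnect."""
    try:
        data = sock.recv(max_bytes)
    except socket.timeout as exc:
        raise StaleConnection("no reply before the local deadline") from exc
    if not data:
        raise StaleConnection("peer closed the connection")
    return data
```

A worker loop would catch StaleConnection, close the socket, and open a fresh connection before reserving again.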

TCP: Address already in use exception - possible causes for client port? NO PORT EXHAUSTION

Stupid problem: I get those from a client connecting to a server. Sadly, the setup is complicated, which makes debugging complex, and we are running out of options.
The environment:
* Client/server system, both running on the same machine. The client is actually a service doing some database manipulation at specific times.
* The connection comes from C# going through OleDb to an EasySoft JDBC driver to a custom-written JDBC server that then hosts logic in C++. Yeah, complex, but the third-party supplier decided to expose the extension mechanisms for their server through a JDBC interface. Not a lot can be done here ;)
The Symptom:
At (ir)regular intervals we get an "Address already in use: connect" error from the JDBC driver. They seem to come from one particular service we run.
Now, I did read all the stuff about port exhaustion. This is why we have a little tool running now that counts ports and their states every minute. Last time this happened, we had an astonishing 370 ports in use, with the count rising to about 900 AFTER the error. We already patched the registry (it is a Windows machine) to allow more than the standard 5000 client ports, but even then, we are far, far from that limit to start with.
Which is why I am asking here: does anyone have an idea what ELSE could cause this?
It is a Windows 2003 Server machine, 64 bit. The only other thing I can see that may cause it (but this functionality is supposedly disabled) is Symantec Endpoint Protection, which is installed on the server. Being capable of acting as a firewall, it could possibly intercept network traffic. I don't want to open a can of worms by pointing to Symantec prematurely (if pointing to Symantec can ever be seen as such). So, does anyone have an idea what else may be the cause?
Thanks
"Address already in use", aka WSAEADDRINUSE (10048), means that when the client socket prepared to connect to the server socket, it first tried to bind itself to a specific local IP/Port pair that was already in use by another socket, either an active one or one that has been closed but is still in the TIME_WAIT state. This has nothing to do with the number of ports that are available.
I'm having the same issue on a Windows 2000 Server with a .Net application connecting to a SQL Server 7.0. There's like 10 servers with the same configuration and only one is showing this error several times a day. With a small test program I'm able to reproduce the error by just establishing a TCP connection on the SQL Server listening port. Running CurrPorts (http://www.nirsoft.net/utils/cports.html) shows there's still plenty of available ports in range 1024-5000.
I'm out of ideas and would like to know if you've found a solution since you've posted your question.
Edit: I finally found the solution: a worm was present on the server (WORM_DOWNAD.A) and exhausted local ports without being noticed.
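The bind conflict described in the WSAEADDRINUSE answer can be reproduced in a few lines. This is a sketch in Python, not the JDBC driver's code: bind_error is a hypothetical helper, and it shows only the simplest case, where a second socket binds to a local address and port that another socket currently holds.

```python
import errno
import socket


def bind_error(host, port):
    """Try to bind a TCP socket to host:port; return the errno, or None on success."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()
    return None


# Hold one local port, then try to bind a second socket to the same pair.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
port = holder.getsockname()[1]
in_use = bind_error("127.0.0.1", port) == errno.EADDRINUSE
holder.close()
print(in_use)
```

The point is that the error depends on one specific local address and port pair being taken, not on how many ports remain free overall.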
