Some Postgres connections time out while others don't - Laravel

I have an AWS EC2 machine running a Laravel 5.2 application that connects to a Postgres 9.6 database running in RDS. While most connections work, some are rejected during connection establishment, which causes a timeout and consequently an error in my API. I don't know what is causing them to be rejected. It also happens at random: any API endpoint can be affected, and within an endpoint, any query.
When the timeout is handled by PHP, it shows a message like:
SQLSTATE[08006] [7] timeout expired (SQL: ...)
Sometimes Nginx handles the timeout instead and replies with a 504 error. When Nginx handles the timeout, I get an error like:
2019/04/24 09:48:18 [error] 20657#20657: *3236 upstream timed out (110: Connection timed out) while reading response header from upstream, client: {client-ip-here}, server: {my-url-here}, request: "GET {my-endpoint-here} HTTP/2.0", upstream: "fastcgi://unix:/var/run/php/php7.0-fpm.sock", host: "{}", referrer: "https://app.cartoriovirtual.com/"
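Which side reports the timeout depends on which limit fires first: PHP's database timeout or Nginx's FastCGI read timeout. For reference, the Nginx knobs involved look roughly like this (the directive names are standard; the values are illustrative, not taken from my actual config):

location ~ \.php$ {
    fastcgi_pass unix:/var/run/php/php7.0-fpm.sock;
    fastcgi_connect_timeout 5s;   # time allowed to reach PHP-FPM
    fastcgi_read_timeout 60s;     # 504 if PHP-FPM sends no response header within this window
}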
All usage charts on RDS and EC2 look fine: I have plenty of RAM, storage, CPU, and available connections on RDS. I also checked the inner VPC flow logs and they seem alright; however, many IPs (listed as attackers) are scanning my network interfaces, most of them being rejected. Some (to port 22) are accepted but stopped at authentication, as I use a .pem key file for auth.
The RDS network interface only accepts requests from machines inside the VPC. In its logs, every 5 minutes there is a checkpoint like this:
2019-04-25 01:05:29 UTC::#:[22595]:LOG: checkpoint starting: time
2019-04-25 01:05:34 UTC::#:[22595]:LOG: checkpoint complete: wrote 43 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=4.393 s, sync=0.001 s, total=4.404 s; sync files=19, longest=0.001 s, average=0.000 s; distance=16515 kB, estimate=16515 kB
Does anyone have tips on how to find a solution? I have looked at every log that came to mind and fixed a few small issues, but the error persists. I am running out of ideas.
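A way to narrow this down might be to take PHP out of the picture and hit the RDS endpoint from the EC2 box in a loop; libpq's connect_timeout makes failed attempts show up quickly. A rough sketch, with host and credentials as placeholders (and assuming a .pgpass or PGPASSWORD is set up):

for i in $(seq 1 1000); do
  psql "host=mydb.example.rds.amazonaws.com dbname=mydb user=myuser connect_timeout=5" \
       -c 'SELECT 1' >/dev/null 2>&1 || echo "attempt $i failed at $(date)"
  sleep 1
done

If the failures reproduce here too, the problem is on the network/RDS side rather than in Laravel.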

Related

I see segment errors when issuing ddev commands (pi-hole?)

I see errors like this when issuing ddev commands:
segment 2020/03/31 11:30:15 ERROR: sending request - Post https://api.segment.io/v1/batch: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
segment 2020/03/31 11:30:15 ERROR: 2 messages dropped because they failed to be sent and the client was closed
Does it matter? What can I do about it?
This is usually a result of either really bad internet or pi-hole (or a similar DNS interceptor) being active and preventing proper lookup of api.segment.io (it returns 0.0.0.0 as the IP address instead of the real address).
It does no harm but it's certainly annoying.
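You can confirm the interception by comparing what your default resolver (pi-hole) returns against a public one; 8.8.8.8 here is just an example:

dig +short api.segment.io             # via pi-hole: returns 0.0.0.0 when blocked
dig +short api.segment.io @8.8.8.8    # via a public resolver: returns the real address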
There are at least two solutions if pi-hole is the culprit:
Whitelist api.segment.io in pi-hole; use this command: pihole -w api.segment.io
Tell ddev not to send instrumentation messages via segment: ddev config global --instrumentation-opt-in=false
I have a slightly different error message:
segment 2020/08/17 09:39:08 ERROR: sending request - Post "https://api.segment.io/v1/batch": x509: certificate is valid for *.ddev.local, *.ddev.site, localhost, ddev-router, ddev-router.ddev_default, not api.segment.io
segment 2020/08/17 09:39:08 ERROR: 2 messages dropped because they failed to be sent and the client was closed
But it has the same reason: pi-hole is blocking segment.io.
I can find the blocked requests in the pi-hole log (pihole -t). I also found the domains segment.io and segment.com in one of the pi-hole default blocklists on GitHub. That list is generated automatically, and the segment.io entry comes from adaway.org; it seems the lines were added ~8 months ago.
As described in the answer above, it helps to whitelist segment.io in pi-hole or to disable the reporting feature in ddev.
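If you want to see which blocklist an entry comes from before whitelisting, pi-hole's query mode can do that; roughly:

pihole -q segment.io       # shows which gravity list(s) contain the domain
pihole -w api.segment.io   # whitelist it so ddev's reporting works again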

Do I need to open port 8300 for Consul servers in different DCs?

I have created a Consul architecture that spans multiple datacenters.
When I now open the UI on one of the Consul servers and switch to a different datacenter via the little dropdown menu, the request times out. In the log I can see this error message:
2016/05/03 06:26:08 [ERR] http: Request GET /v1/internal/ui/nodes?dc=dc1-live&token=<hidden>, error: rpc error: failed to get conn: dial tcp xx.xxx.xxx.xxx:8300: i/o timeout from=xxx.xxx.xxx.xxx:53174
Does this mean I need to open port 8300 in addition to port 8302 between the servers of the different datacenters?
I ended up opening port 8300, and that made the error messages go away, so I conclude that it is necessary.
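That matches the port layout: 8300 is the server RPC port (TCP) that servers use to forward requests such as the UI's cross-DC queries, while 8302 only carries WAN gossip. As an illustrative firewall sketch (the CIDR stands in for the other datacenter's server subnet):

iptables -A INPUT -p tcp -s 10.1.0.0/16 --dport 8300 -j ACCEPT   # Consul server RPC
iptables -A INPUT -p tcp -s 10.1.0.0/16 --dport 8302 -j ACCEPT   # WAN gossip over TCP
iptables -A INPUT -p udp -s 10.1.0.0/16 --dport 8302 -j ACCEPT   # WAN gossip over UDP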

FTP Connection Refused (Using FileZilla)

I have googled and searched all over, but I am still having trouble connecting to a site with FileZilla.
I get this readout when I try to connect to the server using the network connection wizard:
Connecting to probe.filezilla-project.org
Response: 220 FZ router and firewall tester ready
USER FileZilla
Response: 331 Give any password.
PASS 3.9.0.6
Response: 230 logged on.
Checking for correct external IP address
Retrieving external IP address from
http://ip.filezilla-project.org/ip.php
Checking for correct external IP address IP 173.56.114.112
bhd-fg-bbe-bbc
Response: 200 OK
PREP 60010
Response: 200 Using port 60010, data token 1063172065
PORT 173,56,114,112,234,106
Response: 200 PORT command successful
LIST
Response: 150 opening data connection
Response: 503 Failure of data connection.
Server sent unexpected reply.
Connection closed
The weird thing is that I only get this error for this particular server; the server I use for my personal site (namecheap.com) gives no such error. Does anyone know why this may be happening? And please try not to point me to the network configuration wiki, because I have read through it and I am still stuck at this point.
PORT 173,56,114,112,234,106
....
Response: 503 Failure of data connection.
...
please try not to point me to the network configuration wiki
You are using active mode, i.e. the FTP client (FileZilla) waits for a data connection from the server. Evidently the server cannot connect back to the client, which indicates that something like a firewall is restricting the connection.
Since, according to your description, this happens only with a few servers, either you only use these particular servers in active mode, or these servers sit behind firewalls that do not allow active mode. Have you tried passive mode?
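For comparison, in passive mode the roles are reversed: the server announces an address and the client opens the data connection itself, so no inbound connection has to pass your firewall. A passive transcript would look roughly like this (addresses invented):

PASV
Response: 227 Entering Passive Mode (203,0,113,10,195,80)
LIST
Response: 150 Opening data connection

Since the client dials out for the data channel, a client-side firewall or NAT that breaks active mode usually does not affect it.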
I had a similar issue connecting, made the following change, and had success.
Go to File > Site Manager.
For my site, I changed the encryption setting to "Only use plain FTP (insecure)" and had success. May you find the same success.

Redis server reports "Reading from client: Connection reset" on Amazon EC2 c1.medium instance

I run Redis 2.4.16 on an EC2 medium instance, with standard EBS for persistence. Checking the Redis log, I found the message "Reading from client: Connection reset" every few hours. All my clients and the server are in the same zone (ap-northeast-1a), and the operating system is Ubuntu Server 12.04. The client is JRedis + Spring Data Redis 1.0.0.M4. Can anyone figure this out or give some advice? Thanks!
Below is the result of the Redis INFO command:
redis_version:2.4.16
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.5.2
process_id:3265
uptime_in_seconds:2658600
uptime_in_days:30
lru_clock:561139
used_cpu_sys:29421.34
used_cpu_user:10731.37
used_cpu_sys_children:20022.24
used_cpu_user_children:75702.79
connected_clients:44
connected_slaves:1
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:1111572800
used_memory_human:1.04G
used_memory_rss:1133101056
used_memory_peak:1112071512
used_memory_peak_human:1.04G
mem_fragmentation_ratio:1.02
mem_allocator:jemalloc-3.0.0
loading:0
aof_enabled:0
changes_since_last_save:1343
bgsave_in_progress:0
last_save_time:1368760178
bgrewriteaof_in_progress:0
total_connections_received:904643
total_commands_processed:592333133
expired_keys:0
evicted_keys:0
keyspace_hits:443393839
keyspace_misses:30383206
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:359082
vm_enabled:0
role:master
slave0:xxx,online
db0:keys=364558,expires=0
As you can see from the logs, Redis tries to communicate with a client that has closed its connection.
That's probably because some of your clients are not closing their connection to Redis after they are done with it.
This can eventually lead Redis to run out of connections (depending on your connection limit and the amount of traffic you have).
An easy solution is to set a connection timeout in redis.conf (the default is 0, meaning no timeout) so that Redis closes open connections after X seconds of idleness.
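For example, either in redis.conf or at runtime via redis-cli (300 seconds is just an illustrative value):

timeout 300                        # redis.conf: close clients idle for more than 300 s

redis-cli config set timeout 300   # or change it at runtime
redis-cli config get timeout       # verify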
Note: you should include the output of CONFIG GET * when asking this kind of question ;)

How can I set the timeout on OCILogon2?

When the Oracle 10 databases are up and running fine, OCILogon2() connects immediately. When the databases are turned off or inaccessible due to network issues, it fails immediately.
However, when our DBAs go into emergency maintenance and block incoming connections, it can take 5 to 10 minutes to time out.
This is problematic for me, since I've found that OCILogon2 isn't thread-safe and we can only use it serially, and I connect to quite a few Oracle DBs: 3 blocked servers × 5-10 minutes = 15 to 30 minutes of lockup time.
Does anyone know how to set the OCILogon2 connection timeout?
Thanks.
I'm currently playing with OCI, and it seems to me that it's impossible.
The only way I can think of is to use non-blocking mode. You'll need OCIServerAttach() and OCISessionBegin() instead of OCILogon() in this case. But when I tried this, OCISessionBegin() constantly returned OCI_ERROR with the following error code:
ORA-03123 operation would block
Cause: The attempted operation cannot complete now.
Action: Retry the operation later.
It looks strange and I don't yet know how to deal with it.
A possible workaround is to run your logon in another process, which you can kill after a timeout...
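That workaround can be sketched in plain POSIX C: fork a child to perform the blocking logon and kill it from the parent if it does not return in time. The OCI calls themselves are elided; this only shows the timeout scaffolding. Note that a connection established in the child cannot be handed back to the parent, so the child must either do the database work itself or merely serve as a reachability probe:

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a blocking logon attempt with a hard timeout by isolating it
 * in a child process. Returns 0 on success, -1 on failure/timeout. */
static int logon_with_timeout(unsigned timeout_sec)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */
    if (pid == 0) {
        /* Child: the blocking OCILogon2() call would go here, e.g.  */
        /* _exit(do_logon() == 0 ? 0 : 1);  -- hypothetical helper   */
        _exit(0);
    }
    for (unsigned waited = 0; waited < timeout_sec; waited++) {
        int status;
        if (waitpid(pid, &status, WNOHANG) == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
        sleep(1);                  /* poll once per second */
    }
    kill(pid, SIGKILL);            /* timed out: kill the stuck logon */
    waitpid(pid, NULL, 0);         /* reap the child */
    return -1;
}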
We think we found the right config file setting, but it's one of those problems where we have to wait until something rare and horrible occurs before we can verify it :-/
[sqlnet.ora]
SQLNET.OUTBOUND_CONNECT_TIMEOUT=60
From the Oracle docs:
http://download.oracle.com/docs/cd/B28359_01/network.111/b28317/sqlnet.htm#BIIFGFHI
5.2.35 SQLNET.OUTBOUND_CONNECT_TIMEOUT
Purpose
Use the SQLNET.OUTBOUND_CONNECT_TIMEOUT parameter to specify the time, in seconds, for a client to establish an Oracle Net connection to the database instance.
If an Oracle Net connection is not established in the time specified, the connect attempt is terminated. The client receives an ORA-12170: TNS:Connect timeout occurred error.
The outbound connect timeout interval is a superset of the TCP connect timeout interval, which specifies a limit on the time taken to establish a TCP connection. Additionally, the outbound connect timeout interval includes the time taken to be connected to an Oracle instance providing the requested service.
Without this parameter, a client connection request to the database server may block for the default TCP connect timeout duration (approximately 8 minutes on Linux) when the database server host system is unreachable.
The outbound connect timeout interval is only applicable for TCP, TCP with SSL, and IPC transport connections.
Default
None
Example
SQLNET.OUTBOUND_CONNECT_TIMEOUT=10
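For what it's worth, later clients also expose a separate knob for just the TCP handshake portion; whether your client version supports it is something to verify rather than assume:

[sqlnet.ora]
SQLNET.OUTBOUND_CONNECT_TIMEOUT=60
TCP.CONNECT_TIMEOUT=10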
