dial tcp remote_ip:6379: connect: connection timed out - go

I'm using redigo for both regular commands as well as subscribing. Every few days I get this error which causes a panic.
dial tcp IP:6379: connect: connection timed out
I'm guessing there is some lag or a minor disturbance on the network that causes the connection to time out.
How can I avoid this? I'm OK with the program waiting a few seconds until the problem is resolved, rather than panicking.
Should I define timeouts for Dial? Such as
DialReadTimeout
DialWriteTimeout

Use DialConnectTimeout to specify a timeout for dialing a network connection or DialNetDial for complete control over dialing a network connection.
The application supplied NetDial function can set timeouts, throttle connect attempts on failure, and more.
Panics related to a dial failure are probably due to a lack of error checking in the application.
DialWriteTimeout and DialReadTimeout are dial options for specifying the timeout when writing a command to the network connection and reading a reply from the network connection respectively. These options have no bearing on timeouts during connect.

Related

No connection could be made because the target machine actively refused it. - connect(2)

require 'watir-webdriver'

begin
  url = 'http://localhost/test/test.php'
  ie = Watir::Browser.new :chrome
  ie.goto url
rescue Timeout::Error
  puts "time out"
  ie.close
  retry
end
This is my PHP file, http://localhost/test/test.php:
<?php
set_time_limit(90);
sleep(60);
echo "hello"
?>
Output: a timeout, followed by this error:
Errno::ECONNREFUSED (No connection could be made because the target machine actively refused it. - connect(2)).
Basically, it should close the browser after the timeout, then open it again, and so on.
When you try to connect to any box, there are multiple ways the connection can be handled. If a firewall is blocking the connection, it can either DROP (as in the DROP target for netfilter) or REJECT the incoming connection.
The difference:
DROP means that the incoming packet is dropped (as in on the floor). There is no reply from the target. The source does not get any information as to what happened to the packet. It can only make assumptions, but not say with certainty that the packet hasn't been swallowed by a network component en route.
REJECT means that for the incoming packet (like a SYN request for opening a connection) a reply will be generated, stating that on the port at the target server no application is listening. This means that the packet reached its destination and has been processed (interpreted) successfully, yet there is no application to hand the packet to.
You get connection refused, meaning that the target replied, but said that a connection cannot or will not be established (actively refused). The expected timeout only occurs when the target machine does not answer and DROPs the packet.
You can see here how connections are built and established or rejected.
It's probably not a problem with Watir. From this answer (https://stackoverflow.com/a/2972662/131051):
If this happens always, it literally means that the machine
exists but that it has no services listening on the specified
port, or there is a firewall stopping you.

Event machine connection time out handling?

What is the standard way of handling TCP connection timeout issues in EventMachine?
The EventMachine::Connection.send_data method does not return a deferrable (future), so there is no way to check whether send_data failed due to a TCP connection timeout or for some other reason.

Winsock error codes 10054 and 10053

I have an application that is listening for data received from GPRS units in the field on a normal TCP connection. I'm getting Winsock 10054 and 10053 errors.
As explained by Microsoft:
10053 : Software caused connection abort. An established connection was aborted by the software in your host machine, possibly due to a data transmission time-out or protocol error.
and
10054 : Connection reset by peer. An existing connection was forcibly closed by the remote host. This normally results if the peer application on the remote host is suddenly stopped, the host is rebooted, or the remote host uses a hard close (see setsockopt (Windows Sockets) for more information on the SO_LINGER option on the remote socket.) This error may also result if a connection was broken due to keep-alive activity detecting a failure while one or more operations are in progress. Operations that were in progress fail with WSAENETRESET. Subsequent operations fail with WSAECONNRESET.
I'm not sure how to interpret this. How do I determine if the error is caused on the server or by the client sending the information?

keepalive timeout on unix/windows

What is the error returned on aix/linux when a connection breaks down due to keepalive activity? Is it a unique error code which can be distinguished from other socket errors?
On windows this can be either WSAECONNRESET or WSAENETRESET.
Is there a way to differentiate the error due to keepalive activity when WSAECONNRESET is returned?
WSAECONNRESET
10054
Connection reset by peer.
An existing connection was forcibly closed by the remote host. This normally results if the peer application on the remote host is suddenly stopped, the host is rebooted, the host or remote network interface is disabled, or the remote host uses a hard close (see setsockopt for more information on the SO_LINGER option on the remote socket). This error may also result if a connection was broken due to keep-alive activity detecting a failure while one or more operations are in progress. Operations that were in progress fail with WSAENETRESET. Subsequent operations fail with WSAECONNRESET.
Is there a way to differentiate the error due to keepalive activity when WSAECONNRESET is returned ?
No. The underlying condition is a 'connection reset' in all cases.

How can I set the timeout on OCILogon2?

When the Oracle 10 databases are up and running fine, OCILogon2() will connect immediately. When the databases are turned off or inaccessible due to network issues - it will fail immediately.
However, when our DBAs go into emergency maintenance and block incoming connections, it can take 5 to 10 minutes to time out.
This is problematic for me since I've found that OCILogon2 isn't thread safe and we can only use it serially, and I connect to quite a few Oracle DBs: 3 blocked servers x 5-10 minutes = 15 to 30 minutes of lockup time.
Does anyone know how to set the OCILogon2 connection timeout?
Thanks.
I'm currently playing with OCI and it seems to me that it's impossible.
The only way I can think of is to use non-blocking mode. You'll need OCIServerAttach() and OCISessionBegin() instead of OCILogon() in this case. But when I tried this, OCISessionBegin() constantly returns OCI_ERROR with the following error code:
ORA-03123 operation would block
Cause: The attempted operation cannot complete now.
Action: Retry the operation later.
It looks strange and I don't yet know how to deal with it.
Possible workaround is to run your logon in another process, which you can kill after timeout...
We think we found the right file setting - but it's one of those problems where we have to wait until something rare and horrible occurs before we can verify it :-/
[sqlnet.ora]
SQLNET.OUTBOUND_CONNECT_TIMEOUT=60
From the Oracle docs:
http://download.oracle.com/docs/cd/B28359_01/network.111/b28317/sqlnet.htm#BIIFGFHI
5.2.35 SQLNET.OUTBOUND_CONNECT_TIMEOUT
Purpose
Use the SQLNET.OUTBOUND_CONNECT_TIMEOUT parameter to specify the time, in seconds, for a client to establish an Oracle Net connection to the database instance.
If an Oracle Net connection is not established in the time specified, the connect attempt is terminated. The client receives an ORA-12170: TNS:Connect timeout occurred error.
The outbound connect timeout interval is a superset of the TCP connect timeout interval, which specifies a limit on the time taken to establish a TCP connection. Additionally, the outbound connect timeout interval includes the time taken to be connected to an Oracle instance providing the requested service.
Without this parameter, a client connection request to the database server may block for the default TCP connect timeout duration (approximately 8 minutes on Linux) when the database server host system is unreachable.
The outbound connect timeout interval is only applicable for TCP, TCP with SSL, and IPC transport connections.
Default
None
Example
SQLNET.OUTBOUND_CONNECT_TIMEOUT=10

Resources