I'm thinking about a simple scheme to transfer data over a very unreliable network: simply disable all TCP timeouts, or set them to extreme values such as a day.
If the network fails temporarily (router unplugged, etc.), will the TCP connection eventually recover? How quickly will that happen?
I'm not concerned about transfer speed consistency. Very long pauses are OK.
If this is OS specific then I'd like to know for Windows.
Is there a way to distinguish between a lost connection and an expected timeout when reading from a serial port (/dev/tty... file descriptor) in Go 1.16 on a Linux host?
All I get from two different implementations of "serial" for Go is an error string holding only "EOF" in both scenarios. I tried os.IsTimeout(), but there is no type associated with that error from io.file.
I need to know whether I lost the connection so I can reconnect (close and reopen the port). Without that, I end up reconnecting on every timeout.
Can someone point me to a way to distinguish these two scenarios, or to a different approach for handling a disconnected device?
I am working on an FTP client for kicks and I am trying to understand the workflow of data connections.
As I understand it, the initial (command) connection is permanent until you quit. However, I am unsure about the data connection: is it re-initiated per command? So you call PORT ... or PASV, get a second connection, do a LIST, get the results, the connection closes, and you start over?
Also, do you need to call PASV (or PORT ...) again after each connection closes? It seems that when I try to test some things out using a passive connection, I cannot re-connect to the same port after the first command has returned the results and closed the data connection. I can keep calling PASV -> Data Connect -> Run Command -> Get Results -> Data Connection closed -> PASV, but it seems like it's not how it's meant to run?
Also, if someone has good material on FTP that is more terse than the RFC, I would really appreciate it.
You have to open a new connection every time. The closing of the connection is the only way you (or the server) can tell that the transfer completed (at least in the common "stream mode").
You cannot even reuse the local/remote port number combination: when a TCP connection is closed, it enters the TIME_WAIT state, and that port number combination cannot be used for some time. So for two immediately consecutive transfers you have to use a different port number combination anyway.
Refer to RFC 959, section 3.3. Data management:
Reuse of the Data Connection: When using the stream mode of data
transfer the end of the file must be indicated by closing the
connection. This causes a problem if multiple files are to be
transfered in the session, due to need for TCP to hold the
connection record for a time out period to guarantee the reliable
communication. Thus the connection can not be reopened at once.
There are two solutions to this problem. The first is to
negotiate a non-default port. The second is to use another
transfer mode.
A comment on transfer modes. The stream transfer mode is
inherently unreliable, since one can not determine if the
connection closed prematurely or not. The other transfer modes
(Block, Compressed) do not close the connection to indicate the
end of file. They have enough FTP encoding that the data
connection can be parsed to determine the end of the file.
Thus using these modes one can leave the data connection open
for multiple file transfers.
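As a rough sketch of the stream-mode cycle described above (PASV, connect, run the command, read until the server closes the data connection), here is one way it could look in Python. The names `parse_pasv`, `transfer` and the `send_command` callable are hypothetical helpers for illustration, not part of any FTP library; `send_command` is assumed to write one command on the control connection and return the reply line. Reading the final 226 reply on the control connection is left out.

```python
import re
import socket


def parse_pasv(reply):
    """Extract (host, port) from a 227 reply such as
    '227 Entering Passive Mode (192,168,1,2,19,137)'."""
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if match is None:
        raise ValueError("not a PASV reply: %r" % reply)
    h1, h2, h3, h4, p1, p2 = (int(g) for g in match.groups())
    # The port is sent as two bytes: high byte first, then low byte.
    return "%d.%d.%d.%d" % (h1, h2, h3, h4), p1 * 256 + p2


def transfer(send_command, command):
    """Run one stream-mode transfer over a fresh data connection.

    send_command is a hypothetical callable (an assumption of this
    sketch) that sends a command on the control connection and returns
    the server's reply line.
    """
    # A new PASV is issued for every transfer; the old port is not reused.
    host, port = parse_pasv(send_command("PASV"))
    with socket.create_connection((host, port)) as data:
        send_command(command)
        chunks = []
        while True:
            chunk = data.recv(4096)
            if not chunk:
                break  # server closed the data connection: end of transfer
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling `transfer(send_command, "LIST")` and then `transfer(send_command, "RETR name")` would issue PASV twice and open two separate data connections, which matches the behaviour observed in the question.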
See also:
Why does FTP passive mode require a port range as opposed to only one port?
How many data channel ports do I need for an FTPS server running behind NAT?
I am new to Comet and have two questions:
I think Comet keeps the TCP connection between client and server open much longer than a normal request/response does. Will this reduce server performance, given that the server has a limit on the number of TCP connections?
Sometimes the nature of the device or network prevents an application from maintaining a long-lived TCP connection to a server. How does Comet avoid this issue?
On Linux (epoll) or BSD (kqueue), you can have hundreds of thousands of idle connections without a performance penalty (apart from memory usage). The same is not true on other systems, which hit the wall much earlier: because of the limited pool of Windows handles allocated for this purpose in the kernel, your applications will suffer (unless you invest in an 'unlimited' Windows Server license).
Proxy servers in particular (and low-end routers too) will cut idle connections after a short delay, but the usual workaround is to use connection keep-alives.
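One possible reading of "connection keep-alives" is TCP-level keep-alive probes. The sketch below, in Python, turns them on for a socket; the helper name `enable_keepalive` and the timing values are illustrative assumptions. The per-socket tuning options are platform specific, so each is applied only if the running Python build exposes it. Many Comet setups send periodic application-level heartbeat messages instead, which this sketch does not cover.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keep-alive probes for an already created socket.

    The timing values are illustrative defaults, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These tuning constants do not exist on every platform, so look
    # each one up by name and skip it when it is missing.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
```

The idle time has to be shorter than the proxy's or router's idle cutoff for the probes to keep the connection alive.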
Hope it helps.
I am working in an environment where we get production issues from time to time related to Oracle connections. We use ODP.NET from ASP.NET applications, and we suspect the firewall closes connections that have been in the connection pool too long.
Sometimes we get an "ORA-12571: TNS packet writer failure" error, and sometimes we get "ORA-03135: connection lost contact."
I was wondering if someone has run into this and/or has an understanding of the difference between the 2 errors.
Using a mobile phone analogy:
ORA-12571 (Failure): the call was dropped.
ORA-03135 (Connection Lost): the other party hung up.
My understanding is that 3135 occurs when a connection is lost. This doesn't tell you why the connection was lost, though. It may have been terminated by the server because the server failed to receive a response to a probe for a certain amount of time and assumed that the connection was dead. Or (I'm not sure about this) the exact reverse: the client failed to receive a probe response from the server for a certain amount of time, so it assumed the connection was lost. The "certain amount of time" is controlled by SQLNET.EXPIRE_TIME=[minutes] in sqlnet.ora.
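For reference, the setting is a single line in sqlnet.ora. The value of 10 minutes below is only an illustrative assumption; a value shorter than the firewall's idle timeout would be the natural choice if the firewall is indeed the cause.

```
# sqlnet.ora
# Probe connections every 10 minutes (illustrative value).
SQLNET.EXPIRE_TIME = 10
```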
As for 12571, my (again vague) understanding is that there was a sudden failure to send a packet during communication with the server, and that this is typically caused by some software or hardware interfering with the connection (either by design, or by error). For instance, if you pull out your ethernet cable and then try to execute a query, you'll probably get this. Or if a firewall or anti-malware application decides to block the traffic.
I am trying to simulate a scenario where connection to the server of one process is down while the connection to another server is up. Just pulling the network cable won't work in my case since I need another process connection to stay up.
Is there any tool for this kind of job? I am on Windows. Thanks!
There are a few layers at which you can simulate this. The easiest case is when your two servers listen on two distinct TCP ports. Then you can run two TCP proxies and stop or pause one when you want to simulate a failure. On Windows I would suggest using tcpTrace for this.
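If a ready-made proxy is not at hand, the idea can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not a replacement for a real tool: the class name `PausableProxy` is hypothetical, it listens only on 127.0.0.1, and it tears down both sides as soon as either side closes. While paused it simply holds data back, which to the client looks like a silent outage rather than a closed socket.

```python
import socket
import threading


class PausableProxy:
    """Forward TCP connections from a local port to a target server."""

    def __init__(self, target_host, target_port, listen_port=0):
        self.target = (target_host, target_port)
        self.running = threading.Event()
        self.running.set()  # start in the "network is up" state
        self.listener = socket.create_server(("127.0.0.1", listen_port))
        self.port = self.listener.getsockname()[1]
        threading.Thread(target=self._accept_loop, daemon=True).start()

    def pause(self):
        self.running.clear()

    def resume(self):
        self.running.set()

    def _accept_loop(self):
        while True:
            client, _ = self.listener.accept()
            upstream = socket.create_connection(self.target)
            # One pump thread per direction.
            for src, dst in ((client, upstream), (upstream, client)):
                threading.Thread(target=self._pump, args=(src, dst),
                                 daemon=True).start()

    def _pump(self, src, dst):
        try:
            while True:
                data = src.recv(4096)
                if not data:
                    break
                self.running.wait()  # blocks here while the proxy is paused
                dst.sendall(data)
        except OSError:
            pass  # the opposite pump already shut the sockets down
        finally:
            for sock in (src, dst):
                try:
                    sock.shutdown(socket.SHUT_RDWR)  # wakes the other pump
                except OSError:
                    pass
                sock.close()
```

Point the process under test at `proxy.port` instead of the real server port, then call `proxy.pause()` to simulate the failure for that one server while the other connection stays up.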
Another option would be to have the two servers bound to two virtual NICs, which are bridged to the physical NIC. Of course if you have two physical NICs, you could bind each server process to a different physical NIC.
At a lower level, you can run a WAN simulator. Most simulators allow you to impair specific types of traffic or specific ports. One such simulator is Packetstorm.
One other method I would suggest is attaching a debugger to one process and halting all threads in that process. Often a process doesn't die, but gets stuck in garbage collection or in a loop. Because the sockets don't close, many 'high availability' solutions won't fail over automatically.
One approach would be to mock the relevant network connection code for the purposes of testing. In this case you would probably want to mock it returning whatever it usually would if the connection was down.
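A minimal sketch of that idea in Python, using `unittest.mock` from the standard library. `fetch_status` stands in for the code under test and `simulate_outage` is a hypothetical helper; both names and the host names in the usage note are assumptions for illustration. The patch makes connections to one host fail while connections to any other host go through the real function. It raises a refused-connection error; a real outage would more often surface as a timeout.

```python
import socket
from unittest import mock


def fetch_status(host, port):
    """Hypothetical code under test: reports 'down' if the server is unreachable."""
    try:
        with socket.create_connection((host, port), timeout=2) as conn:
            conn.sendall(b"PING\n")
            return conn.recv(16).decode().strip()
    except OSError:
        return "down"


def simulate_outage(down_host):
    """Return a patcher that makes connections to down_host fail."""
    real_connect = socket.create_connection  # captured before patching

    def fake_connect(address, *args, **kwargs):
        if address[0] == down_host:
            raise ConnectionRefusedError("simulated outage for %s" % down_host)
        return real_connect(address, *args, **kwargs)

    return mock.patch("socket.create_connection", side_effect=fake_connect)
```

Inside `with simulate_outage("server-a.example"):`, a call such as `fetch_status("server-a.example", 9000)` takes the failure path, while connections to the second server are untouched.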
A poor man's approach, if you can use sleep/hibernate mode on your machine:
Set an Outbound rule in the Windows Firewall to disallow connection for a particular Program.
Already connected sockets stay connected: put the machine in sleep/hibernate mode for a brief moment to force those sockets to disconnect.
When the system is restored, the program cannot establish new connections.
New connections are made possible as soon as you disable the firewall rule.
Note that this does not simulate a network outage, because each connection attempt fails immediately with a permission error. It does, however, prevent a process from establishing connections.