DNS resolution timeouts after upgrade to go1.5 - go

After upgrading from go1.3 to go1.5.2, I have been experiencing DNS request timeouts on connections that go through the ResolveTCPAddr() API in the net library. I dug a little deeper to see what's going wrong in go1.5, because there was a mention of a DNS resolution implementation change in the go1.5 release notes.
https://golang.org/doc/go1.5#net
In go1.5, for DNS resolution of an unknown address, there is first a UDP dial and then a TCP dial to make the DNS request.
https://github.com/golang/go/blob/release-branch.go1.5/src/net/dnsclient_unix.go#L134
Sometimes the read on the UDP connection for a DNS request times out (which could happen for any number of reasons, UDP being UDP), and my connection establishment randomly slows down terribly (1 ms sometimes becomes 5 s).
This made me force the netcgo build tag at compile time to revert to the old behavior mentioned in the release notes. But I want to resolve this without having to force anything that isn't the default.
Is this a known issue? Did anybody else run into this? I also want to understand why it makes a UDP connection first, what it did earlier, and what changed for it to behave differently now.
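For what it's worth, the same release notes also describe a run-time override, the GODEBUG=netdns environment variable, which selects the resolver without rebuilding (GODEBUG=netdns=cgo forces the cgo resolver, and adding a debug level such as netdns=go+1 should also print which resolver was chosen). A minimal timing probe along those lines, assuming Go 1.5+ and with example.com as a placeholder host, might look like this:

package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Run as: GODEBUG=netdns=go+1 ./probe   (pure Go resolver, verbose)
	//     or: GODEBUG=netdns=cgo+1 ./probe  (cgo/libc resolver, verbose)
	for i := 0; i < 5; i++ {
		start := time.Now()
		addr, err := net.ResolveTCPAddr("tcp", "example.com:80")
		fmt.Printf("lookup %d: addr=%v err=%v took=%v\n", i, addr, err, time.Since(start))
	}
}

If the multi-second lookups disappear under netdns=cgo but not netdns=go, that points at the pure Go resolver's UDP exchange (or how the local DNS server handles it) rather than the network path itself.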

Related

Enable TCP keepalive on port open by another program

On a Debian machine I'm using an OPC UA server, https://github.com/FreeOpcUa/opcua-asyncio. The server does not provide a way to enable TCP keepalive on the port it opens.
Basically, I want to know whether it's possible to start the server and then, in another script, enable TCP keepalive on that port.
I also found some other information from Red Hat, https://access.redhat.com/solutions/19029 and https://access.redhat.com/solutions/25773 (you have to sign up to see the articles), but I'm still lost as to what to do exactly.
I'll keep reading up on this, but so far I've spent about 10 hours trying to figure out whether it's even possible. So I thought I should ask for some help.
Any advice is welcome, thanks!
To operate on another process's socket, that socket has to be shared from the owning process (https://docs.python.org/3/library/socket.html#socket.socket.share) or duplicated.
It's easier to patch your server to enable keepalive.
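The reason is that SO_KEEPALIVE is a per-socket option: only the process that owns the socket can set it on each accepted connection, which is why a separate script can't easily do it from the outside. As a rough sketch of what the owning server has to do (shown in Go; the Python server would need the equivalent setsockopt calls on its own sockets, and port 4840 here is just a placeholder):

package main

import (
	"log"
	"net"
	"time"
)

func main() {
	// Placeholder listen address; the real server would use its own port.
	ln, err := net.Listen("tcp", ":4840")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue // transient accept error; a real server would log it
		}
		// Keepalive is a per-socket option, set by the process that owns
		// the socket at (or after) accept time.
		if tc, ok := conn.(*net.TCPConn); ok {
			tc.SetKeepAlive(true)
			tc.SetKeepAlivePeriod(60 * time.Second)
		}
		go handle(conn)
	}
}

func handle(conn net.Conn) {
	// The real protocol handling would go here; the sketch just closes.
	conn.Close()
}

Roughly the same calls exist on the Python side (socket.SO_KEEPALIVE plus the Linux-specific TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options), which is what "patching the server" amounts to.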

What does "Blocked" really mean in the Firefox developer tools Network monitoring?

In the timing section of the Firefox Network Monitor documentation, "Blocked" is explained as:
Time spent in a queue waiting for a network connection.
The browser imposes a limit on the number of simultaneous connections that can be made to a single server. In Firefox this defaults to 6
Is the limit on the number of connections the only limitation? Or does time the browser spends waiting to get a connection from the OS count as blocked too?
In a fresh browser, on a first connection, before any other connection is made (so the limit should not apply here), I get blocked for 195 ms.
Is this the browser waiting for the OS? What does "Blocked" mean here?
We changed the Firefox setting (about:config) 'network.http.max-persistent-connections-per-server' to 64 and the blocks went away; then we changed it back to 6. We changed our design/development approach to a more 'asynchronous' loading method so as not to open a large number of simultaneous connections. The blocks were mostly from loading a lot of PNG flags for locale settings.
I have a server that takes several seconds to respond, which allowed me to cross-reference the Firefox measurement with a Wireshark trace. I can see that the first SYN is sent out immediately. The end of the "Blocked" time corresponds to when the Server Hello comes back.
I couldn't relate the end of "TLS setup" to any Wireshark packet; it extends a few seconds beyond the last data exchanged on the initial TLS connection.
Bottom line: it doesn't look like the time spent in "Blocked" and "TLS setup" is very reliable, at least in some cases.
My setup has a TLS reverse proxy that forwards the connection with SNI. I'm not sure if that might be related.
Time spent in a queue waiting for a network connection.
The browser imposes a limit on the number of simultaneous connections
that can be made to a single server. In Firefox this defaults to 6,
but can be changed using the
network.http.max-persistent-connections-per-server preference. If all
connections are in use, the browser can't download more resources
until a connection is released.
Source : https://developer.mozilla.org/en-US/docs/Tools/Network_Monitor
It's very clear that the browser fixes the limit at 6 concurrent connections per server (domain/IP); the OS question is not very relevant.
In my case both the waiting-for-network-connection and DNS lookup times were pretty high, up to 2 seconds each, which caused significant page load times when a page was loaded for the first time. Firefox was freshly installed without add-ons and had just been started, with no other tabs open. I tried on both Ubuntu 18.04 LTS and Ubuntu 19.04 with the same results. Although my ISP doesn't provide IPv6 support, my router assigns IPv6 addresses. As it turned out, the problem was the broken IPv6 network, which forced Firefox to fall back to IPv4 (of course only after a timeout). After I turned off IPv6 support in Linux, the requests sped up significantly.
Here is a relevant discussion: https://bugzilla.mozilla.org/show_bug.cgi?id=1452028
I encountered this error while using an Angular 9 'dist' deployment. I discovered that the error appeared because I was trying to access an API that was unreachable at the specified IP address and port.
Therefore, to solve it, I just had to point at a valid and accessible API.

packet_write_wait: Connection to xxx.xxx.xxx.xxx: Broken pipe

What does it mean when the terminal throws this error, and how do I solve it?
packet_write_wait: Connection to xxx.xxx.xxx.xxx: Broken pipe
It just happened today, after working normally for years.
My terminal keeps disconnecting after a certain time. I have already searched on Google, but most of what I found is about "Write failed: Broken pipe.",
which I already solved years ago. I only ran into this new, annoying problem today.
I experienced this problem as well and spent a few days trying to bisect it.
As mentioned elsewhere, playing with SSH keepalive parameters (ClientAliveInterval, ClientAliveCountMax, ServerAliveInterval and ServerAliveCountMax) or kernel TCP parameters (TCPKeepAlive on/off) does not solve the problem.
After playing with USB to Ethernet drivers and tcpdump, I realized the issue was due to the kernel 4.8 I was using. I switched the source (sending side) to 4.4 LTS and the problem disappeared (rsync via ssh and scp were working nicely again). The destination side can remain on 4.8 if you want, in my use case this was working (tested).
On the technical side, we can narrow the issue down a little thanks to a Wireshark dump I made. It shows the TCP channel of the SSHv2 protocol being reset (the TCP RST flag set to 1), causing the connection to abort. I don't know the cause of that RST yet; I need to bisect from 4.8.1 to 4.8.11 to find it.
I'm not saying your problem is specifically due to kernel 4.8, but given the date you posted your question, chances are high that you are currently running a kernel more recent than 4.4.
If that is an ssh connection, then you might want to make sure you send a keepalive message to the server.
ServerAliveInterval seems to be the most common strategy to keep a connection alive. To prevent the broken pipe problem, here is the ssh config I use in my ~/.ssh/config file (the system-wide client config is /etc/ssh/ssh_config; sshd_config is the server-side file):
Host myhostshortcut
HostName myhost.com
User barthelemy
ServerAliveInterval 60
ServerAliveCountMax 10
Connect through another Wi-Fi network.
I don't know why or how it works, but it does.
The original poster sthapaun already mentioned this solution in a comment, but I want to add that the solution works for me, too.

Stale connection with Pheanstalk

I'm using beanstalkd to offload some work to other machines. The setup is a bit unusual: the server is on the internet (public IP), but the consumers are behind ADSL lines in people's homes. So there is a Linux machine acting as a client, going out through a dynamic IP and connecting to the server to get a job. It's all PHP and I'm using the Pheanstalk library.
Everything runs smoothly for some time, but when the ADSL line changes its IP (every 24 hours the provider forces a disconnect/reconnect), the client just hangs, never returning from "reserve".
I thought that putting a timeout on the reserve would help, but it didn't. It seems the client issues a command and then blocks; it never checks the timeout itself. It issues a reserve-with-timeout (instead of a simple reserve), and it is the server's responsibility to return TIMED_OUT when the timeout expires. The problem is that the connection is broken (but TCP doesn't know about it yet, since neither side has tried to talk to the other), so if the client is blocked reading, it will never return.
The library seems to have support for some local timeouts (for example when trying to connect to the server), but it does not seem to cover this scenario.
How could I detect the stale connection and force a reconnect? Is there some kind of keepalive on the protocol (and on the pheanstalk itself)?
Thanks!
You could try closing the connection right after each request is answered and opening a new one each time.
There is no close() function, but deleting the Pheanstalk object with unset($pheanstalk) will close it.
This explanation is quite helpful:
Pheanstalk (PHP client for beanstalk) - how do connections work?
I haven't tried it yet, but I came up with the idea of connecting to the beanstalk server through an SSH tunnel. We can enable the ServerAliveCountMax and ServerAliveInterval options on the tunnel, so that a network or server failure will cause the tunnel to close. This should then cause the pheanstalk client to report an error.
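At the socket level, independent of any particular client library, the usual pattern for not hanging forever on a dead link is to combine TCP keepalive with a hard local read deadline slightly longer than the reserve-with-timeout you ask the server for, and to reconnect whenever the read fails. A rough sketch of that pattern (in Go rather than PHP; host, port and timeouts are placeholders, and a Pheanstalk-based client would need the equivalent stream/socket options):

package main

import (
	"fmt"
	"net"
	"time"
)

// reserveOnce sketches the guard: keepalive probes plus a local read
// deadline, so a broken ADSL link surfaces as an error instead of a hang.
func reserveOnce(addr string, reserveTimeout time.Duration) ([]byte, error) {
	conn, err := net.DialTimeout("tcp", addr, 10*time.Second)
	if err != nil {
		return nil, err
	}
	defer conn.Close()

	if tc, ok := conn.(*net.TCPConn); ok {
		tc.SetKeepAlive(true)
		tc.SetKeepAlivePeriod(30 * time.Second)
	}
	// Local deadline a bit longer than the server-side reserve-with-timeout.
	conn.SetReadDeadline(time.Now().Add(reserveTimeout + 15*time.Second))

	// beanstalkd text protocol: ask the server to time the reserve out too.
	fmt.Fprintf(conn, "reserve-with-timeout %d\r\n", int(reserveTimeout.Seconds()))

	buf := make([]byte, 4096)
	n, err := conn.Read(buf) // first chunk only; enough for the sketch
	if err != nil {
		return nil, err // caller should reconnect and retry
	}
	return buf[:n], nil
}

func main() {
	job, err := reserveOnce("beanstalk.example.com:11300", 120*time.Second)
	fmt.Println(string(job), err)
}

Closing and reopening the connection for each request, as suggested above, achieves much the same thing at the cost of a reconnect per job.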

What is the difference between "ORA-12571: TNS packet writer failure" and "ORA-03135: connection lost contact"?

I am working in an environment where we get production issues from time to time related to Oracle connections. We use ODP.NET from ASP.NET applications, and we suspect the firewall closes connections that have been in the connection pool too long.
Sometimes we get an "ORA-12571: TNS packet writer failure" error, and sometimes we get "ORA-03135: connection lost contact."
I was wondering if someone has run into this and/or has an understanding of the difference between the 2 errors.
Using a mobile phone analogy:
ORA-12571 (Failure): the call was dropped.
ORA-03135 (Connection Lost): the other party hung up.
My understanding is that 3135 occurs when a connection is lost. This doesn't tell you why the connection was lost, though. It may have been terminated by the server because the server failed to receive a response to a probe for a certain amount of time and assumed the connection was dead. Or (I'm not sure about this) the exact reverse: the client failed to receive a probe response from the server for a certain amount of time, so it assumed the connection was lost. The "certain amount of time" is controlled by SQLNET.EXPIRE_TIME=[minutes] in sqlnet.ora.
As for 12571, my (again vague) understanding is that there was a sudden failure to send a packet during communication with the server, and that this is typically caused by some software or hardware interfering with the connection (either by design, or by error). For instance, if you pull out your ethernet cable and then try to execute a query, you'll probably get this. Or if a firewall or anti-malware application decides to block the traffic.

Resources