SSH session open request eventually returns EOF on an SSH connection - Go

I work on an agent daemon that runs on a node. When the agent starts, it opens SSH connections to all the other nodes. Later, at fixed intervals (once a week), it creates new sessions on these existing connections to communicate with the other nodes. After roughly a month, session creation starts failing with an EOF error.
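sshConfig := &ssh.ClientConfig{
	User: config.Username,
	Auth: []ssh.AuthMethod{
		ssh.Password(config.Password),
	},
	// Accepts any host key, like ssh.InsecureIgnoreHostKey().
	HostKeyCallback: func(hostname string, remote net.Addr, key ssh.PublicKey) error {
		return nil
	},
}
// Done once, when the agent starts.
client, err := ssh.Dial("tcp", config.Host+":22", sshConfig)
// Done later, on the long-lived client; this is the call that returns EOF.
session, err := client.NewSession()
defer session.Close()
b, err := session.CombinedOutput(command)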
When session creation returns an EOF error, is there any way to get the SSH connection back into a working state?
If I redial or reconnect only when a read/write returns an error, which design pattern would be useful here?
Thanks

I think the connection is being broken after some time. A few things you can try:
Keep the connection alive, either through an API or ping that the server provides to extend the connection's expiry time, or with TCP keepalives, for example github.com/felixge/tcpkeepalive.
Or try redialing the connection at a fixed time interval.
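On the design pattern question, one option is a small wrapper that owns the connection and redials when an operation fails. The sketch below uses only the standard library so it can run on its own; reconnector, do, and fakeConn are hypothetical names. In real code, dial would wrap ssh.Dial and the operation would call client.NewSession.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// reconnector owns one connection and redials it when an operation fails.
// All names in this sketch are hypothetical.
type reconnector struct {
	mu   sync.Mutex
	dial func() (io.Closer, error) // in real code: wraps ssh.Dial
	conn io.Closer
}

// do runs op on the current connection. If op fails, the connection is
// dropped, a fresh one is dialed, and op is retried exactly once.
func (r *reconnector) do(op func(io.Closer) error) error {
	r.mu.Lock()
	defer r.mu.Unlock()

	var err error
	for attempt := 0; attempt < 2; attempt++ {
		if r.conn == nil {
			c, derr := r.dial()
			if derr != nil {
				return fmt.Errorf("dial: %w", derr)
			}
			r.conn = c
		}
		if err = op(r.conn); err == nil {
			return nil
		}
		r.conn.Close() // treat the connection as dead
		r.conn = nil
	}
	return err
}

// fakeConn stands in for *ssh.Client in this demo.
type fakeConn struct{ broken bool }

func (c *fakeConn) Close() error { return nil }

// demoReconnect starts with a stale connection and returns how many
// times dial was called, plus the final error.
func demoReconnect() (dials int, err error) {
	r := &reconnector{
		conn: &fakeConn{broken: true},
		dial: func() (io.Closer, error) {
			dials++
			return &fakeConn{}, nil
		},
	}
	err = r.do(func(c io.Closer) error {
		if c.(*fakeConn).broken {
			return io.EOF // the error reported in the question
		}
		return nil
	})
	return dials, err
}

func main() {
	dials, err := demoReconnect()
	fmt.Println(dials, err)
}
```

In a real agent you would probably redial only on connection-level errors such as io.EOF, because a remote command that exits with a non-zero status also returns an error from session.CombinedOutput and does not mean the connection is broken.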

Related

Why is tls.Client failing with the message: first record does not look like a TLS handshake

I'm trying to run the buildlet at https://github.com/golang/build/tree/master/cmd/coordinator
The connection to a locally hosted server keeps failing with the error:
first record does not look like a TLS handshake
The piece of code that fails is from build/cmd/buildlet/reverse.go and it is:
tcpConn.SetDeadline(time.Now().Add(30 * time.Second))
config := &tls.Config{
	ServerName:         serverName,
	InsecureSkipVerify: devMode,
}
conn := tls.Client(tcpConn, config)
if err := conn.Handshake(); err != nil {
	return nil, fmt.Errorf("failed to handshake with coordinator: %v", err)
}
My understanding is that the connection should be established while ignoring TLS issues, because the server is on localhost.
I can't figure out how to fix this issue.
Instructions for reproducing my problem are at the link above. The only change I recommend is using
go run . -mode=dev -listen-http=localhost:8119
for the first command.
InsecureSkipVerify only means that TLS certificate validation is relaxed (to the point where your connection is insecure and prone to MITM attacks).
From the documentation:
If InsecureSkipVerify is true, crypto/tls accepts any certificate presented by the server and any host name in that certificate.
You still need a peer that speaks TLS at the other end. The error you're getting means that the other side of the connection doesn't speak TLS.
If you don't want to use TLS in devMode, then you should use the tcpConn directly while in dev mode, without wrapping it with a *tls.Conn. *tls.Conn implements net.Conn so after the handshake there shouldn't be any difference in how you use the connection, whether it has TLS or not.
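A minimal sketch of that suggestion, assuming the server really does speak plaintext in dev mode. secureConn and demoDevMode are hypothetical names, not part of the buildlet code, and clearing the deadline after the handshake is my own addition.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"time"
)

// secureConn returns the connection to use for talking to the server.
// In dev mode it hands back tcpConn unchanged (no TLS); otherwise it
// wraps tcpConn in a TLS client and performs the handshake.
func secureConn(tcpConn net.Conn, serverName string, devMode bool) (net.Conn, error) {
	if devMode {
		return tcpConn, nil
	}
	tcpConn.SetDeadline(time.Now().Add(30 * time.Second))
	conn := tls.Client(tcpConn, &tls.Config{ServerName: serverName})
	if err := conn.Handshake(); err != nil {
		return nil, fmt.Errorf("failed to handshake with coordinator: %v", err)
	}
	tcpConn.SetDeadline(time.Time{}) // clear the handshake deadline
	return conn, nil
}

// demoDevMode reports whether dev mode returns the original connection.
func demoDevMode() bool {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()
	got, err := secureConn(client, "localhost", true)
	return err == nil && got == client
}

func main() {
	fmt.Println(demoDevMode())
}
```

Because both branches return a net.Conn, the rest of the code does not need to know whether TLS is in use.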

Check for server reachability in Golang conn.Write

I am working on an application that sends data to a remote server. Once I have the hostname, I obtain a connection with net.Dialer.DialContext, which resolves the hostname. After that, I keep calling conn.Write to write data to the connection.
conn, err := d.DialContext(ctx, string(transport), addr)
_, err := client.conn.Write([]byte(msg))
Problem: at some point, due to network issues, I could no longer ping my server. Surprisingly, the conn obtained from DialContext did not complain when I called conn.Write; it kept writing to the same connection.
Can someone help me modify my write path so that I get an error when the destination server is not reachable?
From this UDP connection example:
the best a "connected" UDP socket can do to simulate a send failure is to save the ICMP response, and return it as an error on the next write.
So, for testing, try making a second conn.Write to confirm that you do get an error this time.
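If waiting for the next write is not good enough, a different technique is an application-level acknowledgement: write a probe and wait for a reply under a read deadline. This is only a sketch and it assumes the server answers probes, which the question does not say; probe and demoProbe are hypothetical names.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// probe writes msg and waits up to timeout for any reply.
// Assumption: the server answers each probe (hypothetical protocol).
func probe(conn net.Conn, msg []byte, timeout time.Duration) error {
	if _, err := conn.Write(msg); err != nil {
		return fmt.Errorf("write: %w", err)
	}
	if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	buf := make([]byte, 64)
	if _, err := conn.Read(buf); err != nil {
		// Either the deadline expired or a saved ICMP error surfaced.
		return fmt.Errorf("no reply: %w", err)
	}
	return nil
}

// demoProbe probes a local UDP echo server, then probes the same
// address again after the server has gone away.
func demoProbe() (upErr, downErr error) {
	pc, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return err, nil
	}
	done := make(chan struct{})
	go func() { // echo a single datagram back to its sender
		defer close(done)
		buf := make([]byte, 64)
		if n, from, err := pc.ReadFrom(buf); err == nil {
			pc.WriteTo(buf[:n], from)
		}
	}()

	conn, err := net.Dial("udp", pc.LocalAddr().String())
	if err != nil {
		pc.Close()
		return err, nil
	}
	defer conn.Close()

	upErr = probe(conn, []byte("ping"), time.Second)
	pc.Close() // the server goes away
	<-done
	downErr = probe(conn, []byte("ping"), 300*time.Millisecond)
	return upErr, downErr
}

func main() {
	upErr, downErr := demoProbe()
	fmt.Println("server up:", upErr)
	fmt.Println("server down:", downErr)
}
```

The second probe fails either with a timeout or with the ICMP error the quote describes, depending on the platform and on whether ICMP replies get through.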

Server error: "rpc error: code = Unavailable desc = transport is closing" in gRPC

I have a gRPC server and a client (in my blog project). When I run the server, everything seems OK. When I run the client, I get this error, and both the server and the client close.
rpc error: code = Unavailable desc = transport is closing
I think the error is related to this piece of code:
func newPost(c proto_blog.BlogServiceClient) {
	fmt.Println("Starting to do a Unary RPC")
	req := &proto_blog.ReqNewPost{
		Title:   "How can we make an gRPC server?",
		Content: "First You have to.....\nAt the end, you have to....",
		Author:  "Arsham Ahora",
		Date:    fmt.Sprint(time.Now()),
	}
	res, err := c.NewPost(context.Background(), req)
	if err != nil {
		log.Fatalf("Error calling greet server: %v", err)
	}
	log.Printf("Response from Greet: %v", res.Id)
}
Note: this error occurs regardless of whether the RPC is unary or streaming.
I want to list some possible causes of code = Unavailable desc = transport is closing in gRPC, per the gRPC FAQ, in case someone runs into them.
This error means the connection the RPC is using was closed, and there are many possible reasons, including:
mis-configured transport credentials, connection failed on handshaking
bytes disrupted, possibly by a proxy in between
server shutdown
Keepalive parameters caused connection shutdown, for example if you have configured your server to terminate connections regularly to trigger DNS lookups. If this is the case, you may want to increase your MaxConnectionAgeGrace, to allow longer RPC calls to finish.
It can be tricky to debug this because the error happens on the client side but the root cause of the connection being closed is on the server side. Turn on logging on both client and server, and see if there are any transport errors.
The default logger is controlled by environment variables. Turn everything on like this:
$ export GRPC_GO_LOG_VERBOSITY_LEVEL=99
$ export GRPC_GO_LOG_SEVERITY_LEVEL=info
I found that my server code had a line like this before returning the response:
log.Fatalf("Error happened: %v", e)
I changed it to:
if e != nil {
	log.Fatalf("Error happened: %v", e)
}
The error no longer occurs on the normal path. It was the unconditional log.Fatalf() that had been breaking my app, because it exits the process.
In other words, the error did not come from gRPC itself; my app was exiting before it returned any response to the gRPC client.
I think you posted the wrong piece of code. In any case, as the error says, "transport is closing" means your connection was closed. You have to find where your server is exiting and handle that.
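One way to handle it is to return the error from the handler instead of calling log.Fatalf, which exits the whole server process. The sketch below uses only the standard library; ReqNewPost, ResNewPost, server, and save are hypothetical stand-ins for the generated proto_blog types and the real storage code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the generated proto_blog messages.
type ReqNewPost struct{ Title string }
type ResNewPost struct{ Id string }

type server struct {
	save func(*ReqNewPost) (string, error) // hypothetical storage call
}

// NewPost reports failures to the caller instead of calling log.Fatalf,
// so one failed request does not take the server down.
func (s *server) NewPost(ctx context.Context, req *ReqNewPost) (*ResNewPost, error) {
	id, err := s.save(req)
	if err != nil {
		// In a real gRPC handler, status.Errorf(codes.Internal, ...) from
		// google.golang.org/grpc/status would attach a proper status code.
		return nil, fmt.Errorf("saving post: %w", err)
	}
	return &ResNewPost{Id: id}, nil
}

// demoHandler calls the handler with a failing store, then a working one.
func demoHandler() (failed bool, id string) {
	bad := &server{save: func(*ReqNewPost) (string, error) {
		return "", errors.New("db down")
	}}
	_, err := bad.NewPost(context.Background(), &ReqNewPost{Title: "t"})

	good := &server{save: func(*ReqNewPost) (string, error) {
		return "42", nil
	}}
	res, _ := good.NewPost(context.Background(), &ReqNewPost{Title: "t"})
	return err != nil, res.Id
}

func main() {
	failed, id := demoHandler()
	fmt.Println(failed, id)
}
```

The client then receives an error for that one call, and the connection stays open for later requests.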

Does redigo reconnect to the server?

I am using Redigo to connect to a Redis server from Go.
redisConnection, err = redis.Dial("tcp", "...")
redisConnection.Do(..., ...)
If I restart my server, I am unable to execute any command using the same redisConnection. Shouldn't it reconnect when I execute Do again?
No, your assumption is not correct. The Dial function returns a single connection; when the server terminates that connection, the client does not reconnect on its own.
You should use redis.Pool, which can reconnect automatically when you ask for a new connection with pool.Get().
redisConnection.Err() returns a non nil value if the connection is not usable. We can Dial again in that case.
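The "check Err() and Dial again" idea can be sketched as follows. To keep the example runnable without Redis, conn is a hypothetical interface listing only the three redis.Conn methods used here, and client, fakeConn, and demoRedial are hypothetical names; in real code dial would wrap redis.Dial.

```go
package main

import (
	"errors"
	"fmt"
)

// conn is the small subset of redigo's redis.Conn that this sketch uses.
type conn interface {
	Err() error
	Do(cmd string, args ...interface{}) (interface{}, error)
	Close() error
}

// client redials whenever the connection it holds reports an error.
type client struct {
	dial func() (conn, error) // in real code: wraps redis.Dial
	c    conn
}

// do replaces an unusable connection before running the command.
func (cl *client) do(cmd string, args ...interface{}) (interface{}, error) {
	if cl.c == nil || cl.c.Err() != nil {
		if cl.c != nil {
			cl.c.Close()
			cl.c = nil
		}
		c, err := cl.dial()
		if err != nil {
			return nil, err
		}
		cl.c = c
	}
	return cl.c.Do(cmd, args...)
}

// fakeConn simulates a connection that may already be broken.
type fakeConn struct{ err error }

func (f *fakeConn) Err() error   { return f.err }
func (f *fakeConn) Close() error { return nil }
func (f *fakeConn) Do(cmd string, args ...interface{}) (interface{}, error) {
	if f.err != nil {
		return nil, f.err
	}
	return "PONG", nil
}

// demoRedial starts with a broken connection and returns the reply
// plus the number of times dial was called.
func demoRedial() (reply interface{}, dials int) {
	cl := &client{
		c: &fakeConn{err: errors.New("connection closed")},
		dial: func() (conn, error) {
			dials++
			return &fakeConn{}, nil
		},
	}
	reply, _ = cl.do("PING")
	return reply, dials
}

func main() {
	reply, dials := demoRedial()
	fmt.Println(reply, dials)
}
```

As far as I understand, Err() only becomes non-nil after an operation has already failed on that connection, so the first command after a server restart can still return an error. A redis.Pool is likely the simpler choice for most programs.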

Connectex: Error connecting to a physical device

I am trying to communicate with a device (connected over Ethernet) using a TCP/IP connection. When a connection request is sent, I get an error:
dial tcp 192.168.137.10:502: connectex: A connection attempt failed because
the connected party did not properly respond after a period of time,
or established connection failed because connected host has failed to respond
But if I connect to the simulator (which acts as the device), the connection succeeds and I get a response.
I am using Go. This is my code to connect to the device:
conn, err := net.Dial("tcp", "192.168.137.10:502")
if err != nil {
	return nil, err
}
return conn, nil
Hardware Info:
Windows 10, 64 bit machine
PLC device connected over TCP/IP
I suspect that there is a problem with the server and not your client code. The fact that you aren't just getting a "connection refused" error tells me that the remote port is probably open. Chances are that the server is not performing an accept() on the incoming connection within a reasonable time.
Things that might cause this:
The maximum number of connections configured on the server has been exceeded, or the service is too busy.
The server has crashed.
Funny firewall or another routing issue between you and the server. Some deep packet inspection firewalls sometimes cause these types of issues.
I suggest you try and do troubleshooting on the server side.
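While troubleshooting, it can help to dial with an explicit timeout and tell a timeout apart from other failures such as a refused connection. This is only a diagnostic sketch; dialStatus and demoDial are hypothetical names, and the demo uses a loopback listener rather than the PLC address from the question.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// dialStatus dials addr with a timeout and classifies the outcome.
func dialStatus(addr string, timeout time.Duration) string {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err == nil {
		conn.Close()
		return "reachable"
	}
	var ne net.Error
	if errors.As(err, &ne) && ne.Timeout() {
		return "timeout" // no answer within the deadline
	}
	return "failed" // any other error, e.g. connection refused
}

// demoDial checks a port with a listener on it, then the same port
// after the listener has been closed.
func demoDial() (open, closed string) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "listen error", ""
	}
	addr := ln.Addr().String()
	open = dialStatus(addr, time.Second)
	ln.Close()
	closed = dialStatus(addr, time.Second)
	return open, closed
}

func main() {
	open, closed := demoDial()
	fmt.Println(open, closed)
}
```

A "timeout" result against the device matches the connectex message in the question and points at the network path or the device itself. A "failed" result would suggest the host answered but nothing accepted the connection on that port.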

Resources