Golang - instability with Gob over TCP

I am having an issue with instability when using gob for TCP communication.
The main problem is that if I transfer data "too fast", I either get errors where the server terminates the connection, or the packet simply doesn't arrive at the server. However, if I add a delay of 20ms between packets, everything runs as expected.
Sadly I cannot link this to the playground, as the code is spread across three different libraries, but I have inserted the essential/culprit code below.
My guess is that without the delay timer I am somehow overwriting the stream. Any ideas?
Update: by swapping the receiver out for bufio.NewReader(c.socket), I am actually able to get an error: "gob: unknown type id or corrupted data", which looks like the same issue as https://github.com/golang/go/issues/1238#event-242248991
//client -> server, running as a goroutine
func (c *Client) transmitter() {
    d := 20 * time.Millisecond
    timer := time.NewTimer(d) //silly hack
    for {
        select {
        case gd := <-c.broadcast:
            <-timer.C
            Write(c.socket, gd.Data, gd.NetworkDataType)
            timer.Reset(d) //refreshing timer
        }
    }
}
//server <- client, running as a goroutine
func (c *client) receiver(s *Server) {
    for {
        in, err := Read(c.socket)
        if err != nil {
            log.Println(err)
        }
        //doing stuff
    }
}
func Read(conn net.Conn) (GeneralData, error) {
    in := GeneralData{}
    if err := gob.NewDecoder(conn).Decode(&in); err != nil {
        return in, err
    }
    return in, nil
}

func Write(conn net.Conn, data interface{}, ndt NetworkDataType) {
    err := gob.NewEncoder(conn).Encode(GeneralData{ndt, data})
    if err != nil {
        log.Println("error in write: ", err)
    }
}
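A likely explanation, hinted at by the update above: Read builds a brand-new gob.Decoder (and, in the updated version, a brand-new bufio.Reader) for every message. A gob decoder buffers its reads, so it can consume bytes belonging to the next message; those bytes are lost when the decoder is thrown away, which can produce exactly the "unknown type id or corrupted data" error once messages arrive back-to-back. Likewise, each new gob.Encoder re-sends type information. A minimal sketch of reusing one encoder and one decoder per connection follows (the field and method names are assumptions, not from the original code):

// Sketch (not the original code): create the encoder and decoder once per
// connection and reuse them, instead of constructing new ones in Read/Write.
// The enc/dec fields are hypothetical additions to the question's Client type.
type Client struct {
    socket net.Conn
    enc    *gob.Encoder
    dec    *gob.Decoder
}

func NewClient(conn net.Conn) *Client {
    return &Client{
        socket: conn,
        enc:    gob.NewEncoder(conn), // reused for every write
        dec:    gob.NewDecoder(conn), // reused for every read; keeps its read-ahead buffer
    }
}

func (c *Client) WriteData(data interface{}, ndt NetworkDataType) error {
    return c.enc.Encode(GeneralData{ndt, data})
}

func (c *Client) ReadData() (GeneralData, error) {
    var in GeneralData
    err := c.dec.Decode(&in)
    return in, err
}

With a single encoder/decoder pair there should be no need for the 20ms delay, since the decoder's internal buffer is never discarded between messages.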

Related

automatic gRPC unix reconnect after EOF

I have an application (let's call it client) connecting to another process (let's call it server) on the same machine via gRPC. The communication goes over a unix socket.
If server is restarted, my client gets an EOF and does not re-establish the connection, although I expected the clientConn to handle the reconnection automatically.
Why isn't the dialer taking care of the reconnection?
I expect it to do so with the backoff params I passed.
Below is a pseudo-MWE:
- Run establishes the initial connection, then spawns goroutineOne
- goroutineOne waits for the connection to be ready and delegates the send to fooUpdater
- fooUpdater streams the data, or returns in case of errors
- for waitUntilReady I used the pseudo-code referenced by this answer to get a new stream
func main() {
    ctx, ctxCancel := context.WithCancel(context.Background())
    go func() {
        if err := Run(ctx); err != nil {
            log.Errorf("connection error: %v", err)
        }
        ctxCancel()
    }()
    // some wait logic
}
func Run(ctx context.Context) error {
    backoffConfig := backoff.Config{
        BaseDelay:  time.Duration(1 * time.Second),
        Multiplier: backoff.DefaultConfig.Multiplier,
        Jitter:     backoff.DefaultConfig.Jitter,
        MaxDelay:   time.Duration(120 * time.Second),
    }
    myConn, err := grpc.DialContext(ctx,
        "/var/run/foo.bar",
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithConnectParams(grpc.ConnectParams{Backoff: backoffConfig, MinConnectTimeout: time.Duration(1 * time.Second)}),
        grpc.WithContextDialer(func(ctx context.Context, addr string) (net.Conn, error) {
            d := net.Dialer{}
            c, err := d.DialContext(ctx, "unix", addr)
            if err != nil {
                return nil, fmt.Errorf("connection to unix://%s failed: %w", addr, err)
            }
            return c, nil
        }),
    )
    if err != nil {
        return fmt.Errorf("could not establish socket for foo: %w", err)
    }
    defer myConn.Close()
    return goroutineOne()
}
func goroutineOne() error {
    reconnect := make(chan struct{})
    for {
        if ready := waitUntilReady(ctx, myConn, time.Duration(2*time.Minute)); !ready {
            return fmt.Errorf("myConn: %w, timeout: %s", ErrWaitReadyTimeout, "2m")
        }
        go func() {
            if err := fooUpdater(ctx, dataBuffer, myConn); err != nil {
                log.Errorf("foo updater: %v", err)
            }
            reconnect <- struct{}{}
        }()
        select {
        case <-ctx.Done():
            return nil
        case <-reconnect:
        }
    }
}
func fooUpdater(ctx context.Context, dataBuffer custom.CircularBuffer, myConn *grpc.ClientConn) error {
    clientStream, err := myConn.Stream(ctx) // custom pb code, returns grpc.ClientConn.NewStream(...)
    if err != nil {
        return fmt.Errorf("could not obtain stream: %w", err)
    }
    for {
        select {
        case <-ctx.Done():
            return nil
        case data := <-dataBuffer:
            if err := clientStream.Send(data); err != nil {
                return fmt.Errorf("could not send data: %w", err)
            }
        }
    }
}
func waitUntilReady(ctx context.Context, conn *grpc.ClientConn, maxTimeout time.Duration) bool {
    ctx, cancel := context.WithTimeout(ctx, maxTimeout)
    defer cancel()
    currentState := conn.GetState()
    timeoutValid := true
    for currentState != connectivity.Ready && timeoutValid {
        timeoutValid = conn.WaitForStateChange(ctx, currentState)
        currentState = conn.GetState()
        // debug print currentState -> prints IDLE
    }
    return currentState == connectivity.Ready
}
Debugging hints also welcome :)
Based on the provided code and information, there might be an issue with how ctx.Done is being utilized.
ctx.Done() is used in both the fooUpdater and goroutineOne functions. When the connection breaks, I believe ctx.Done() fires in both functions, with the following execution order:
The connection breaks, the ctx.Done case in the fooUpdater function is taken, exiting the function. The select statement in the goroutineOne function also takes the ctx.Done case, which exits the function, and the client doesn't reconnect.
Try debugging it to check whether both select case blocks get executed, but I believe that is the issue here.
According to the gRPC documentation, the connection is re-established if there is a transient failure; otherwise it fails immediately. You can try to verify that the failure is transient by printing the connectivity state.
You should also print the error code to understand why the RPC failed.
Maybe what you have tried is not considered a transient failure.
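A minimal sketch of that kind of debugging, as a fragment dropped into fooUpdater from the question (status, connectivity, GetState and Connect are standard grpc-go APIs; everything else is the question's own code):

// Log the connectivity state before each send attempt.
log.Printf("connectivity state: %s", myConn.GetState())

if err := clientStream.Send(data); err != nil {
    // status.FromError extracts the gRPC status code from the RPC error.
    st, _ := status.FromError(err) // google.golang.org/grpc/status
    log.Printf("send failed: code=%s msg=%q", st.Code(), st.Message())
    return err
}

// A ClientConn that has dropped to IDLE will not redial on its own until an RPC
// is attempted or Connect() is called, so waitUntilReady can nudge it explicitly:
if myConn.GetState() == connectivity.Idle {
    myConn.Connect()
}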
Also, according to the following entry, retry logic does not work with streams: grpc-java: Proper handling of retry on client for service streaming call
Here are the links to the corresponding docs:
https://grpc.github.io/grpc/core/md_doc_connectivity-semantics-and-api.html
https://pkg.go.dev/google.golang.org/grpc#section-readme
Also, check the following entry:
Ways to wait if server is not available in gRPC from client side

Setting idletimeout in Go

I have a function in Go which handles connections that come in over TCP and are handled via SSH. I am trying to set an idle timeout by creating a struct in the connection function.
Use case - a customer should be able to make a connection and upload/download multiple files
Reference - IdleTimeout in tcp server
Function code:
type Conn struct {
    net.Conn
    idleTimeout time.Duration
}

func HandleConn(conn net.Conn) {
    var err error
    rAddr := conn.RemoteAddr()
    session := shortuuid.New()
    config := LoadSSHServerConfig(session)
    blocklistItem := blocklist.GetBlockListItem(rAddr)
    if blocklistItem.IsBlocked() {
        conn.Close()
        atomic.AddInt64(&stats.Stats.BlockedConnections, 1)
        return
    }
    func (c *Conn) Read(b []byte) (int, error) {
        err := c.Conn.SetReadDeadline(time.Now().Add(c.idleTimeout))
        if err != nil {
            return 0, err
        }
        return c.Conn.Read(b)
    }
    sConn, chans, reqs, err := ssh.NewServerConn(conn, config)
    if err != nil {
        if err == io.EOF {
            log.Errorw("SSH: Handshaking was terminated", log.Fields{
                "address": rAddr,
                "error":   err,
                "session": session})
        } else {
            log.Errorw("SSH: Error on handshaking", log.Fields{
                "address": rAddr,
                "error":   err,
                "session": session})
        }
        atomic.AddInt64(&stats.Stats.AuthorizationFailed, 1)
        return
    }
    log.Infow("connection accepted", log.Fields{
        "user": sConn.User(),
    })
    if user, ok := users[session]; ok {
        log.Infow("SSH: Connection accepted", log.Fields{
            "user":          user.LogFields(),
            "clientVersion": string(sConn.ClientVersion())})
        atomic.AddInt64(&stats.Stats.AuthorizationSucceeded, 1)
        // The incoming Request channel must be serviced.
        go ssh.DiscardRequests(reqs)
        // Key ID: sConn.Permissions.Extensions["key-id"]
        handleServerConn(user, chans)
        log.Infow("connection finished", log.Fields{"user": user.LogFields()})
        log.Infow("checking connections", log.Fields{
            //"cc": Stats.AcceptedConnections,
            "cc2": &stats.Stats.AcceptedConnections})
        // Remove connection from local cache
        delete(users, session)
    } else {
        log.Infow("user not found from memory", log.Fields{"username": sConn.User()})
    }
}
This code is called from the Listen function:
func Listen() {
    listener, err := net.Listen("tcp", sshListen)
    if err != nil {
        panic(err)
    }
    if useProxyProtocol {
        listener = &proxyproto.Listener{
            Listener:           listener,
            ProxyHeaderTimeout: time.Second * 10,
        }
    }
    for {
        // Once a ServerConfig has been configured, connections can be accepted.
        conn, err := listener.Accept()
        if err != nil {
            log.Errorw("SSH: Error accepting incoming connection", log.Fields{"error": err})
            atomic.AddInt64(&stats.Stats.FailedConnections, 1)
            continue
        }
        // Before use, a handshake must be performed on the incoming net.Conn.
        // It must be handled in a separate goroutine,
        // otherwise one user could easily block entire loop.
        // For example, user could be asked to trust server key fingerprint and hangs.
        go HandleConn(conn)
    }
}
Is it even possible to set a deadline only for connections which have been idle for 20 seconds (no uploads/downloads)?
EDIT 1: Following @LiamKelly's suggestions, I have made the changes in the code. Now the code looks like this:
type SshProxyConn struct {
    net.Conn
    idleTimeout time.Duration
}

func (c *SshProxyConn) Read(b []byte) (int, error) {
    err := c.Conn.SetReadDeadline(time.Now().Add(c.idleTimeout))
    if err != nil {
        return 0, err
    }
    return c.Conn.Read(b)
}

func HandleConn(conn net.Conn) {
    //lines of code as above
    sshproxyconn := &SshProxyConn{nil, time.Second * 20}
    Conn, chans, reqs, err := ssh.NewServerConn(sshproxyconn, config)
    //lines of code
}
But now the issue is that the SSH handshake is not happening. I get the error "Connection closed" when I try to ssh. Is it still waiting for the "conn" variable in the function call?
Is that even possible to set a deadline for only the connections which have been idle for 20 [seconds]
OK, so first a general disclaimer: I am going to assume go-proxyproto implements the Conn interface as we would expect. Also, as you hinted at before, I don't think you can put a struct method inside another function (I also recommend renaming it to something unique to prevent Conn vs net.Conn confusion).
type SshProxyConn struct {
    net.Conn
    idleTimeout time.Duration
}

func (c *SshProxyConn) Read(b []byte) (int, error) {
    err := c.Conn.SetReadDeadline(time.Now().Add(c.idleTimeout))
    if err != nil {
        return 0, err
    }
    return c.Conn.Read(b)
}

func HandleConn(conn net.Conn) {
This makes it clearer what your primary issue is: you passed the normal net.Conn to your SSH server, not your wrapper type. So
sConn, chans, reqs, err := ssh.NewServerConn(conn, config)
should instead be (EDIT):
sshproxyconn := &SshProxyConn{conn, time.Second * 20}
Conn, chans, reqs, err := ssh.NewServerConn(sshproxyconn, config)
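One follow-up note: once the wrapper is fed the real conn, an idle connection should surface as a read-deadline error from the SSH layer after roughly 20 seconds of silence. A hedged sketch of checking for that (assumes Go 1.15+, where net deadline errors match os.ErrDeadlineExceeded; err stands for whatever ssh.NewServerConn or a later read returns, and the error may be wrapped by the SSH layer):

// Sketch: distinguishing an idle-timeout disconnect from other failures.
if err != nil {
    if errors.Is(err, os.ErrDeadlineExceeded) {
        log.Println("connection closed after being idle for 20 seconds")
    } else {
        log.Println("connection error:", err)
    }
    return
}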

Cancelling a net.Listener via Context in Golang

I'm implementing a TCP server application that accepts incoming TCP connections in an infinite loop.
I'm trying to use Context throughout the application to allow shutting down, which is generally working great.
The one thing I'm struggling with is cancelling a net.Listener that is waiting on Accept(). I'm using a ListenConfig which, I believe, has the advantage of taking a Context when creating a Listener. However, cancelling this Context does not have the intended effect of aborting the Accept call.
Here's a small app that demonstrates the same problem:
package main

import (
    "context"
    "fmt"
    "net"
    "time"
)

func main() {
    lc := net.ListenConfig{}
    ctx, cancel := context.WithCancel(context.Background())
    go func() {
        time.Sleep(2 * time.Second)
        fmt.Println("cancelling context...")
        cancel()
    }()
    ln, err := lc.Listen(ctx, "tcp", ":9801")
    if err != nil {
        fmt.Println("error creating listener:", err)
    } else {
        fmt.Println("listen returned without error")
        defer ln.Close()
    }
    conn, err := ln.Accept()
    if err != nil {
        fmt.Println("accept returned error:", err)
    } else {
        fmt.Println("accept returned without error")
        defer conn.Close()
    }
}
I expect that, if no clients connect, when the Context is cancelled 2 seconds after startup, the Accept() should abort. However, it just sits there until you Ctrl-C out.
Is my expectation wrong? If so, what is the point of the Context passed to ListenConfig.Listen()?
Is there another way to achieve the same goal?
I believe you should close the listener when your context is cancelled. Then, when Accept returns an error, check whether it was intentional (e.g. the listener was closed because the context was cancelled).
This blog post shows how to do a safe shutdown of a TCP server without a context. The interesting part of the code is:
type Server struct {
    listener net.Listener
    quit     chan interface{}
    wg       sync.WaitGroup
}

func NewServer(addr string) *Server {
    s := &Server{
        quit: make(chan interface{}),
    }
    l, err := net.Listen("tcp", addr)
    if err != nil {
        log.Fatal(err)
    }
    s.listener = l
    s.wg.Add(1)
    go s.serve()
    return s
}

func (s *Server) Stop() {
    close(s.quit)
    s.listener.Close()
    s.wg.Wait()
}

func (s *Server) serve() {
    defer s.wg.Done()
    for {
        conn, err := s.listener.Accept()
        if err != nil {
            select {
            case <-s.quit:
                return
            default:
                log.Println("accept error", err)
            }
        } else {
            s.wg.Add(1)
            go func() {
                s.handleConnection(conn)
                s.wg.Done()
            }()
        }
    }
}

func (s *Server) handleConnection(conn net.Conn) {
    defer conn.Close()
    buf := make([]byte, 2048)
    for {
        n, err := conn.Read(buf)
        if err != nil && err != io.EOF {
            log.Println("read error", err)
            return
        }
        if n == 0 {
            return
        }
        log.Printf("received from %v: %s", conn.RemoteAddr(), string(buf[:n]))
    }
}
In your case you should call Stop when the context is cancelled.
If you look at the source code of TCPListener.Accept, you'll see it basically calls the underlying socket accept, and the context is not piped through there. But Accept is simple to cancel by closing the listener, so piping the context all the way down isn't strictly necessary.
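For the original snippet, a minimal sketch of that approach: close the listener from a goroutine when the context is cancelled, and treat the resulting Accept error as an intentional shutdown (assumes Go 1.16+ for net.ErrClosed):

// Sketch: unblock Accept by closing the listener when ctx is cancelled.
ln, err := lc.Listen(ctx, "tcp", ":9801")
if err != nil {
    fmt.Println("error creating listener:", err)
    return
}
defer ln.Close()

go func() {
    <-ctx.Done() // fires when cancel() is called
    ln.Close()   // makes the blocked Accept return an error
}()

conn, err := ln.Accept()
if err != nil {
    if errors.Is(err, net.ErrClosed) {
        fmt.Println("accept aborted: listener closed after context cancellation")
        return
    }
    fmt.Println("accept returned error:", err)
    return
}
defer conn.Close()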

Go webserver performance drastically drops as number of requests increases

I'm benchmarking a simple webserver written in Go using wrk. The server is running on a machine with 4GB RAM. At the beginning of the test, the performance is very good with the code serving up to 2000 requests/second. But as time goes on, memory utilized by the process creeps up and once it reaches 85% (I'm checking this using top), the throughput drops to ~100 requests/second. The throughput again increases to optimal amounts once I restart the server.
Is the degradation of performance due to memory issues? Why isn't Go releasing this memory? My Go server looks like this:
func main() {
    defer func() {
        // Wait for all messages to drain out before closing the producer
        p.Flush(1000)
        p.Close()
    }()
    http.HandleFunc("/endpoint", handler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}
In the handler, I convert the incoming Protobuf message to a Json and write it to Kafka using confluent Kafka Go library.
var p, err = kafka.NewProducer(&kafka.ConfigMap{
    "bootstrap.servers":          "abc-0.com:6667,abc-1.com:6667",
    "message.timeout.ms":         "30000",
    "sasl.kerberos.keytab":       "/opt/certs/TEST.KEYTAB",
    "sasl.kerberos.principal":    "TEST#TEST.ABC.COM",
    "sasl.kerberos.service.name": "kafka",
    "security.protocol":          "SASL_PLAINTEXT",
})

var topic = "test"

func handler(w http.ResponseWriter, r *http.Request) {
    body, _ := ioutil.ReadAll(r.Body)
    // Deserialize byte[] to Protobuf message
    protoMessage := &tutorial.REALTIMEGPS{}
    _ = proto.Unmarshal(body, protoMessage)
    // Convert Protobuf to Json
    realTimeJson, _ := convertProtoToJson(protoMessage)
    _, err := fmt.Fprintf(w, "")
    if err != nil {
        log.Fatal(err)
    }
    // Send to Kafka
    produceMessage([]byte(realTimeJson))
}
func produceMessage(message []byte) {
    // Delivery report
    go func() {
        for e := range p.Events() {
            switch ev := e.(type) {
            case *kafka.Message:
                if ev.TopicPartition.Error != nil {
                    log.Println("Delivery failed: ", ev.TopicPartition)
                } else {
                    log.Println("Delivered message to ", ev.TopicPartition)
                }
            }
        }
    }()
    // Send message
    _ = p.Produce(&kafka.Message{
        TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
        Value:          message,
    }, nil)
}
func convertProtoToJson(pb proto.Message) (string, error) {
    marshaler := jsonpb.Marshaler{}
    json, err := marshaler.MarshalToString(pb)
    return json, err
}
The problem is that at the end of each of your requests you call produceMessage(), which sends a message to Kafka and launches a goroutine to receive events for checking errors.
Your code launches goroutines non-stop as incoming requests come in, and they don't end until something goes wrong with your Kafka client. This requires more and more memory, and likely more and more CPU as more goroutines have to be scheduled.
Don't do this. A single goroutine is sufficient for delivery report purposes. Launch a single goroutine when your p variable is set, and you're good to go.
For example:
var p *kafka.Producer

func init() {
    var err error
    p, err = kafka.NewProducer(&kafka.ConfigMap{
        // ...
    })
    if err != nil {
        // Handle error
    }
    // Delivery report
    go func() {
        for e := range p.Events() {
            switch ev := e.(type) {
            case *kafka.Message:
                if ev.TopicPartition.Error != nil {
                    log.Println("Delivery failed: ", ev.TopicPartition)
                } else {
                    log.Println("Delivered message to ", ev.TopicPartition)
                }
            }
        }
    }()
}
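With the delivery-report goroutine started once in init, produceMessage shrinks to just the send; a sketch under that assumption (Produce's error is now returned so the handler can log it):

// Sketch: no per-call goroutine; delivery reports are drained by the
// single goroutine started in init above.
func produceMessage(message []byte) error {
    return p.Produce(&kafka.Message{
        TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
        Value:          message,
    }, nil)
}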

Accept a persistent tcp connection in Golang Server

I am experimenting with Go - and would like to create a TCP server which I can telnet to, send commands and receive responses.
const (
    CONN_HOST = "localhost"
    CONN_PORT = "3333"
    CONN_TYPE = "tcp"
)

func main() {
    listener, err := net.Listen(CONN_TYPE, fmt.Sprintf("%s:%s", CONN_HOST, CONN_PORT))
    if err != nil {
        log.Panicln(err)
    }
    defer listener.Close()
    for {
        conn, err := listener.Accept()
        if err != nil {
            log.Panicln(err)
        }
        go handleRequest(conn)
    }
}

func handleRequest(conn net.Conn) {
    buffer := make([]byte, 1024)
    length, err := conn.Read(buffer)
    if err != nil {
        log.Panicln(err)
    }
    str := string(buffer[:length])
    fmt.Println(conn.RemoteAddr().String())
    fmt.Printf("Received command %d\t:%s\n", length, str)
    switch str {
    case "PING\r\n":
        sendResponse("PONG", conn)
    case "PUSH\r\n":
        sendResponse("GOT PUSH", conn)
    default:
        conn.Write([]byte(fmt.Sprintf("UNKNOWN_COMMAND: %s\n", str)))
    }
    conn.Close() // closes the connection
}

func sendResponse(res string, conn net.Conn) {
    conn.Write([]byte(res + "\n"))
}
The above snippet will close the connection every time, kicking me out of the terminal session. But what I actually want is to keep the connection open for more I/O operations. If I simply remove the conn.Close(), then the server appears to hang somewhere, as it does not get any more responses.
The way I have resolved this is to have my handleRequest method loop endlessly, so that it never exits until it receives a QUIT\r\n message. Is this appropriate, or is there a better way of achieving this?
func handleRequest(conn net.Conn) {
    for {
        log.Println("Handling Request")
        buffer := make([]byte, 1024)
        length, err := conn.Read(buffer)
        if err != nil {
            log.Panicln(err)
        }
        str := string(buffer[:length])
        fmt.Println(conn.RemoteAddr().String())
        fmt.Printf("Received command %d\t:%s\n", length, str)
        switch str {
        case "PING\r\n":
            sendResponse("PONG", conn)
        case "PUSH\r\n":
            sendResponse("GOT PUSH", conn)
        case "QUIT\r\n":
            sendResponse("Goodbye", conn)
            conn.Close()
            return // stop reading once the connection is closed
        default:
            conn.Write([]byte(fmt.Sprintf("UNKNOWN_COMMAND: %s\n", str)))
        }
    }
}
Your second example with the loop is already what you want. You simply loop and read as long as you want (or probably until some read/write timeout or an external cancellation signal).
However, it still has an error in it:
TCP gives you a stream of bytes, and it is not guaranteed that one write from one side will yield exactly one read on the other side with the same data length. This means that if the client writes PING\r\n, you could still receive only PI in the first read. You could fix that by using a bufio.Scanner and always reading up to the first newline.
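A minimal sketch of that suggestion, assuming the same commands as in the question (bufio.ScanLines strips the trailing \r\n, so the switch matches bare command names):

// Sketch: read line-delimited commands with bufio.Scanner instead of raw conn.Read.
func handleRequest(conn net.Conn) {
    defer conn.Close()
    scanner := bufio.NewScanner(conn)
    for scanner.Scan() { // returns false on error or EOF
        cmd := scanner.Text() // one full line, without the trailing \r\n
        fmt.Printf("Received command: %s\n", cmd)
        switch cmd {
        case "PING":
            sendResponse("PONG", conn)
        case "PUSH":
            sendResponse("GOT PUSH", conn)
        case "QUIT":
            sendResponse("Goodbye", conn)
            return
        default:
            conn.Write([]byte(fmt.Sprintf("UNKNOWN_COMMAND: %s\n", cmd)))
        }
    }
    if err := scanner.Err(); err != nil {
        log.Println("read error:", err)
    }
}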
Not sure if this is what you're looking for. Taken from the net/http implementation, wrapping your net.TCPListener's Accept method.
tcpKeepAliveListener{listener.(*net.TCPListener)}

type tcpKeepAliveListener struct {
    *net.TCPListener
}

func (ln tcpKeepAliveListener) Accept() (c net.Conn, err error) {
    tc, err := ln.AcceptTCP()
    if err != nil {
        return
    }
    tc.SetKeepAlive(true)
    tc.SetKeepAlivePeriod(3 * time.Minute)
    return tc, nil
}
Refer: Link 1 & Link 2
