Websocket waiting for message with Timeout - go

I want to create a WebSocket connection in Go. The connection follows a clearly defined pattern: the client must "authenticate" (submit data) immediately after the connection is created. If the client does not, the connection is closed after a short period.
My current code contains this initial timeout (initTimeout) and a maximum timeout for all connections. While those timers can easily be checked, I am not sure how to combine them with waiting for a message, which blocks execution.
ws, err := upgrader.Upgrade(w, r, nil)
initTimeout := time.NewTicker(time.Duration(30) * time.Second)
maxTimeout := time.NewTicker(time.Duration(45) * time.Minute)
for {
    select {
    case <-initTimeout.C:
        ws.WriteMessage(websocket.TextMessage, []byte("No input received"))
        ws.Close()
    case <-maxTimeout.C:
        ws.WriteMessage(websocket.TextMessage, []byte("Maximum timeout"))
        ws.Close()
    default:
        mt, message, err := ws.ReadMessage()
        // will this block the timers?
    }
}

Use the read deadline to implement the timeouts:
ws, err := upgrader.Upgrade(w, r, nil)
if err != nil {
    // handle error
}

// Read the initial message with a deadline of 30 seconds.
ws.SetReadDeadline(time.Now().Add(30 * time.Second))
mt, message, err := ws.ReadMessage()
if err != nil {
    // Handle the error, which might be a deadline exceeded error.
}

// process the initial message
// ...

for {
    // Read the next message with a deadline of 45 minutes.
    ws.SetReadDeadline(time.Now().Add(45 * time.Minute))
    mt, message, err = ws.ReadMessage()
    if err != nil {
        // Handle the error, which might be a deadline exceeded error.
    }
    // process message
    // ....
}
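If you still want to send the "No input received" notification from the question before closing, the error handling hinted at above could look roughly like this minimal sketch. It assumes gorilla/websocket, where the error returned after a read deadline expires should satisfy net.Error with Timeout() reporting true; the write deadline is separate, so the final write can still go out.
// Sketch: read the initial message with a 30-second deadline and tell the
// client why the connection is being closed if the deadline is exceeded.
ws.SetReadDeadline(time.Now().Add(30 * time.Second))
_, message, err := ws.ReadMessage()
if err != nil {
    var netErr net.Error
    if errors.As(err, &netErr) && netErr.Timeout() {
        ws.WriteMessage(websocket.TextMessage, []byte("No input received"))
    }
    ws.Close()
    return
}
// authenticate using message ...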

Related

automatic gRPC unix reconnect after EOF

I have an application (let's call it client) connecting to another process (let's call it server) on the same machine via gRPC. The communication goes over unix socket.
If server is restarted, my client gets an EOF and does not re-establish the connection, although I expected the clientConn to handle the reconnection automatically.
Why isn't the dialer taking care of the reconnection?
I expect it to do so with the backoff params I passed.
Below some pseudo-MWE.
Run establishes the initial connection, then spawns goroutineOne
goroutineOne waits for the connection to be ready and delegates the send to fooUpdater
fooUpdater streams the data, or returns in case of errors
for waitUntilReady I used the pseudo-code referenced by this answer to get a new stream.
func main() {
    go func() {
        if err := Run(ctx); err != nil {
            log.Errorf("connection error: %v", err)
        }
        ctxCancel()
    }()
    // some wait logic
}

func Run(ctx context.Context) error {
    backoffConfig := backoff.Config{
        BaseDelay:  time.Duration(1 * time.Second),
        Multiplier: backoff.DefaultConfig.Multiplier,
        Jitter:     backoff.DefaultConfig.Jitter,
        MaxDelay:   time.Duration(120 * time.Second),
    }
    myConn, err := grpc.DialContext(ctx,
        "/var/run/foo.bar",
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithConnectParams(grpc.ConnectParams{Backoff: backoffConfig, MinConnectTimeout: time.Duration(1 * time.Second)}),
        grpc.WithContextDialer(func(ctx context.Context, addr string) (net.Conn, error) {
            d := net.Dialer{}
            c, err := d.DialContext(ctx, "unix", addr)
            if err != nil {
                return nil, fmt.Errorf("connection to unix://%s failed: %w", addr, err)
            }
            return c, nil
        }),
    )
    if err != nil {
        return fmt.Errorf("could not establish socket for foo: %w", err)
    }
    defer myConn.Close()
    return goroutineOne()
}

func goroutineOne() error {
    reconnect := make(chan struct{})
    for {
        if ready := waitUntilReady(ctx, myConn, time.Duration(2*time.Minute)); !ready {
            return fmt.Errorf("myConn: %w, timeout: %s", ErrWaitReadyTimeout, "2m")
        }
        go func() {
            if err := fooUpdater(ctx, dataBuffer, myConn); err != nil {
                log.Errorf("foo updater: %v", err)
            }
            reconnect <- struct{}{}
        }()
        select {
        case <-ctx.Done():
            return nil
        case <-reconnect:
        }
    }
}

func fooUpdater(ctx context.Context, dataBuffer custom.CircularBuffer, myConn *grpc.ClientConn) error {
    clientStream, err := myConn.Stream(ctx) // custom pb code, returns grpc.ClientConn.NewStream(...)
    if err != nil {
        return fmt.Errorf("could not obtain stream: %w", err)
    }
    for {
        select {
        case <-ctx.Done():
            return nil
        case data := <-dataBuffer:
            if err := clientStream.Send(data); err != nil {
                return fmt.Errorf("could not send data: %w", err)
            }
        }
    }
}

func waitUntilReady(ctx context.Context, conn *grpc.ClientConn, maxTimeout time.Duration) bool {
    ctx, cancel := context.WithTimeout(ctx, maxTimeout)
    defer cancel()
    currentState := conn.GetState()
    timeoutValid := true
    for currentState != connectivity.Ready && timeoutValid {
        timeoutValid = conn.WaitForStateChange(ctx, currentState)
        currentState = conn.GetState()
        // debug print currentState -> prints IDLE
    }
    return currentState == connectivity.Ready
}
Debugging hints also welcome :)
Based on the provided code and information, there might be an issue with how ctx.Done is being utilized.
ctx.Done() is used in both the fooUpdater and goroutineOne functions. When the connection breaks, I believe ctx.Done() fires in both functions, in the following order:
The connection breaks, the ctx.Done case in the fooUpdater function is selected, exiting the function. The select statement in the goroutineOne function then also executes its ctx.Done case, which exits the function, and the client doesn't reconnect.
Try debugging it to check whether both select case blocks get executed, but I believe that is the issue here.
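If that turns out to be the case, one option is to make the reconnect loop return only when the parent context is really done. A minimal sketch of goroutineOne along those lines, reusing the hypothetical names from the question and calling fooUpdater directly instead of via the reconnect channel:
for {
    if ready := waitUntilReady(ctx, myConn, 2*time.Minute); !ready {
        return fmt.Errorf("myConn: %w, timeout: %s", ErrWaitReadyTimeout, "2m")
    }
    if err := fooUpdater(ctx, dataBuffer, myConn); err != nil {
        log.Errorf("foo updater: %v", err) // stream broke; fall through and re-stream
    }
    // Only stop when the parent context is actually cancelled.
    select {
    case <-ctx.Done():
        return ctx.Err()
    default:
    }
}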
According to the gRPC documentation, the connection is re-established if there is a transient failure; otherwise it fails immediately. You can verify that the failure is transient by printing the connectivity state.
You should also print the error code to understand why the RPC failed.
Maybe what you have tried is not considered a transient failure.
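A minimal debugging sketch of that suggestion, placed in fooUpdater's send path (status is google.golang.org/grpc/status; the other names come from the question's code):
if err := clientStream.Send(data); err != nil {
    if st, ok := status.FromError(err); ok {
        log.Errorf("send failed: code=%s msg=%q", st.Code(), st.Message())
    }
    log.Errorf("connectivity state: %s", myConn.GetState())
    return fmt.Errorf("could not send data: %w", err)
}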
Also, according to the following entry, retry logic does not work with streams: grpc-java: Proper handling of retry on client for service streaming call
Here are the links to the corresponding docs:
https://grpc.github.io/grpc/core/md_doc_connectivity-semantics-and-api.html
https://pkg.go.dev/google.golang.org/grpc#section-readme
Also, check the following entry:
Ways to wait if server is not available in gRPC from client side

PubSub isn't acknowledging messages

I have a pubsub subscription (all default settings except the number of go-routines, which is 1000), and for some reason messages never get acknowledged and are therefore redelivered. Redelivery takes between 1 and 2 minutes. I'm calling message.Ack() less than 1 second after the message is received, so I don't understand what is happening. It shouldn't be because of latency between the app and pubsub itself, because after publishing a message to the topic, the message is delivered practically immediately.
The subscription has an acknowledgement deadline of 10 seconds. I tried increasing this to 120, but the same problem still occurred. I can't think of any reason why these messages aren't being acknowledged, and therefore being redelivered.
Code for reference:
if err := pubsubSubscription(client).Receive(ctx, func(lctx context.Context, message *pubsub.Message) {
    log.Println("Received message") // occurs < 1s after publishing
    ack := message.Ack
    if err := adapters.Handle(conn, id, gatewayAddr, message.Data); err != nil {
        log.Println("Will nack message")
        ack = message.Nack // not reached (in this context/example)
        cancel()
    }
    log.Println("Will ack message") // occurs ~200µs after message receipt
    ack()
}); err != nil {
    return fmt.Errorf("unable to subscribe to PubSub messages: %s", err)
}
To clarify, I've only published 1 message to the topic, but that callback is called every 1 or 2 minutes infinitely.
EDIT
This only occurs when the number of go-routines in the subscription receive settings is set to a number higher than runtime.NumCPU(). Is this the expected behaviour? If so, how does this work with Kubernetes (which I'm using)?
EDIT 2 -- request for full code for reproduction
const (
    DefaultMaxOutstandingMessages = 1000000
    DefaultMaxOutstandingBytes    = 1e9
)

func SubscribeToTables(id int) error {
    var opts []option.ClientOption
    if sa := os.Getenv("SERVICE_ACCOUNT"); sa != "" {
        opts = append(opts, option.WithCredentialsJSON([]byte(sa)))
    }
    ctx := context.Background()
    projectID := os.Getenv("PROJECT_ID")
    client, err := pubsub.NewClient(ctx, projectID, opts...)
    if err != nil {
        return fmt.Errorf("error creating GCP PubSub client: %s", err)
    }
    cctx, cancel := context.WithCancel(ctx)
    go func() {
        qch := make(chan os.Signal, 1) // buffered so signal.Notify does not drop the signal
        signal.Notify(qch, os.Interrupt, syscall.SIGTERM)
        <-qch
        cancel()
    }()
    mch := make(chan *pubsub.Message)
    gatewayAddr := os.Getenv("GATEWAY_ADDRESS")
    conn, err := adapters.GetGatewayConn(gatewayAddr)
    if err != nil {
        return fmt.Errorf("unable to connect to Gateway: %s", err)
    }
    go func() {
        for {
            select {
            case message := <-mch:
                if err := adapters.Handle(conn, id, gatewayAddr, message.Data); err != nil {
                    cancel()
                    return
                }
                message.Ack()
            case <-ctx.Done():
                return
            }
        }
    }()
    if err := pubsubSubscription(client).Receive(cctx, func(_ context.Context, message *pubsub.Message) {
        mch <- message
    }); err != nil {
        return fmt.Errorf("unable to subscribe to PubSub messages: %s", err)
    }
    return nil
}

func pubsubSubscription(client *pubsub.Client) *pubsub.Subscription {
    sub := client.Subscription(os.Getenv("SUBSCRIPTION_ID"))
    sub.ReceiveSettings = pubsub.ReceiveSettings{
        MaxExtension:       pubsub.DefaultReceiveSettings.MaxExtension,
        MaxExtensionPeriod: pubsub.DefaultReceiveSettings.MaxExtensionPeriod,
        MaxOutstandingMessages: parsePubSubReceiveSetting(
            "MAX_OUTSTANDING_MESSAGES",
            "max outstanding messages",
            DefaultMaxOutstandingMessages,
        ),
        MaxOutstandingBytes: parsePubSubReceiveSetting(
            "MAX_OUTSTANDING_BYTES",
            "max outstanding bytes",
            DefaultMaxOutstandingBytes,
        ),
        NumGoroutines: parsePubSubReceiveSetting( // if this is higher than runtime.NumCPU(), the aforementioned issue occurs
            "NUM_GO_ROUTINES",
            "Go-routines",
            1000,
        ),
    }
    return sub
}

func parsePubSubReceiveSetting(env, name string, defaultValue int) int {
    e := os.Getenv(env)
    i, err := strconv.Atoi(e)
    if err != nil {
        log.Printf("Unable to parse number of GCP PubSub %s. Can't parse '%s' as int", name, e)
        log.Printf("Using default number of %s (%d)", name, defaultValue)
        return defaultValue
    }
    return i
}
I suspect that your code exits too quickly. You have to cancel() the context to stop the Receive loop and flush the data back to PubSub.
Try adding cancel() just after your ack().
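A minimal sketch of that suggestion, based on the first snippet in the question (conn, id, gatewayAddr, cctx and cancel are assumed to be in scope as they are there, and stopping after one message only makes sense for this single-message test):
if err := pubsubSubscription(client).Receive(cctx, func(_ context.Context, message *pubsub.Message) {
    if err := adapters.Handle(conn, id, gatewayAddr, message.Data); err != nil {
        message.Nack()
        cancel()
        return
    }
    message.Ack()
    cancel() // stop the Receive loop so the ack is flushed back to PubSub before the process exits
}); err != nil {
    return fmt.Errorf("unable to subscribe to PubSub messages: %s", err)
}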

How to timeout a RabbitMQ consumer if it didn't receive a response

I have code that consumes messages from a RabbitMQ queue. It should fail if it doesn't receive a message for some time.
msgs, err := ch.Consume(
    c.ResponseQueue, // queue
    "",              // consumer
    false,           // auto-ack
    false,           // exclusive
    false,           // no-local
    false,           // no-wait
    nil,             // args
)
failOnError(err, "Failed to register a consumer")
...
loop:
for timeout := time.After(time.Second); ; {
    select {
    case <-timeout:
        log.Printf("Failed to receive response for action %+v\n Payload: %+v\nError: %+v\n", action, body, err)
        return errors.New("Failed to receive response for action")
    default:
        for d := range msgs {
            if corrID == d.CorrelationId {
                err = json.Unmarshal([]byte(uncompress(d.Body)), &v)
                if err != nil {
                    return err
                }
                ch.Ack(d.DeliveryTag, false)
                break loop
            }
        }
    }
}
I took the consume code from the RabbitMQ manual and tried some advice for implementing a timeout. I know how to do it in Java, but can't repeat it in Golang.
Thanks in advance.
Update:
Changed select to this:
c1 := make(chan error, 1)
go func() {
    for d := range msgs {
        if corrID == d.CorrelationId {
            err = json.Unmarshal([]byte(uncompress(d.Body)), &v)
            if err != nil {
                c1 <- err
            }
            ch.Ack(d.DeliveryTag, false)
            c1 <- nil
        }
    }
}()
select {
case <-time.After(defaultTimeout * time.Second):
    log.Printf("Failed to receive response for action %+v\n Payload: %+v\nError: %+v\n", action, body, err)
    return errors.New("Failed to receive response in time for action")
case err := <-c1:
    failOnError(err, "Failed to process response")
}
return err
Now it works as expected - if it doesn't receive a message with the proper correlationId, it fails with a timeout. Thanks for the help, everyone.
Your loop has a select with 2 cases: a timeout and a default branch. Upon entering the loop the timeout will not fire, so the default branch is executed.
The default branch contains a for range over the msgs channel which keeps receiving from the channel until it is closed (and all values have been received from it). Normally this shouldn't happen, so the timeout case will not be revisited (only if some error occurs and msgs is closed).
Instead, inside the loop use a select with 2 cases: one for the timeout and one that receives only a single value from msgs. If a message is received, restart the timeout. For a restartable timer, use time.Timer.
timeout := time.Second
timer := time.NewTimer(timeout)

for {
    select {
    case <-timer.C:
        fmt.Println("timeout, returning")
        return
    case msg := <-msgs:
        fmt.Println("received message:", msg)
        // Reset timer: it must be stopped first
        // (and its channel drained if Stop reports false)
        if !timer.Stop() {
            <-timer.C
        }
        timer.Reset(timeout)
    }
}
Check this Go Playground example to see it in action.
Note that if you don't need to reset the timer once a message is received, just comment out the resetter code. Also, if no reset is needed, time.After() is simpler:
timeout := time.After(time.Second)
for {
    select {
    case <-timeout:
        fmt.Println("timeout, returning")
        return
    case msg := <-msgs:
        fmt.Println("received message:", msg, time.Now())
    }
}
Try this one on the Go Playground.
One final note: if you break out of the loop before the timeout fires, the timer in the background is not freed immediately (only when the timeout happens). If you need this operation frequently, you may use context.WithTimeout() to obtain a context.Context and a cancel function, which you can call before returning to free up the timer resource (preferably deferred).
This is how it would look:
ctx, cancel := context.WithTimeout(context.Background(), time.Second)
defer cancel()

for {
    select {
    case <-ctx.Done():
        fmt.Println("timeout, returning")
        return
    case msg := <-msgs:
        fmt.Println("received message:", msg, time.Now())
    }
}
Try this one on the Go Playground.

How to check if an error is "deadline exceeded" error?

I'm sending a request with a context that has a 10-second timeout:
ctx, cancel := context.WithTimeout(context.Background(), time.Second*10)
defer cancel()

_, err := client.SendRequest(ctx)
if err != nil {
    return 0, err
}
Now when I hit that timeout, the error message is confusing:
context deadline exceeded
Is it possible to check if the err is the timeout error so that I can print a nicer error message?
ctx, cancel := context.WithTimeout(context.Background(), time.Second*10)
defer cancel()

_, err := client.SendRequest(ctx)
if err != nil {
    if isTimeoutError(err) {
        return nil, fmt.Errorf("the request timed out after 10 seconds")
    }
    return nil, err
}
How to implement such isTimeoutError function?
The cleanest way to do this in Go 1.13+ is using the new errors.Is function.
// Create a context with a very short timeout.
ctx, cancel := context.WithTimeout(context.Background(), time.Millisecond)
defer cancel()

// Create the request with it.
r, _ := http.NewRequest("GET", "http://example.com", nil)
r = r.WithContext(ctx)

// Do it; it will fail because the request will take longer than 1ms.
_, err := http.DefaultClient.Do(r)
log.Println(err) // Get http://example.com: context deadline exceeded

// This prints false, because the http client wraps the context.DeadlineExceeded
// error into another one with extra information.
log.Println(err == context.DeadlineExceeded)

// This prints true, because errors.Is checks all the errors in the wrap chain,
// and returns true if any of them matches.
log.Println(errors.Is(err, context.DeadlineExceeded))
You can determine if an error is the result of a context timeout by comparing the error to context.DeadlineExceeded:
if err == context.DeadlineExceeded {
    // context deadline exceeded
}
You can determine if an error is any timeout error using the following function:
func isTimeoutError(err error) bool {
    e, ok := err.(net.Error)
    return ok && e.Timeout()
}
This function returns true for all timeout errors, including the value context.DeadlineExceeded. That value satisfies the net.Error interface and has a Timeout method that always returns true.
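If the timeout error may be wrapped (as with the http client example above), a variant that also walks the wrap chain could look like this minimal sketch, assuming Go 1.13+ and only the standard library:
func isTimeoutError(err error) bool {
    // context.DeadlineExceeded, possibly wrapped.
    if errors.Is(err, context.DeadlineExceeded) {
        return true
    }
    // Any (possibly wrapped) error implementing net.Error with Timeout() == true.
    var netErr net.Error
    return errors.As(err, &netErr) && netErr.Timeout()
}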

Unusually High Amount of TCP Connection Timeout Errors

I am using a Go TCP Client to connect to our Go TCP Server.
I am able to connect to the Server and run commands properly, but every so often there is an unusually high number of consecutive TCP connection errors reported by my TCP Client when trying to either connect to our TCP Server or send a message once connected:
dial tcp kubernetes_node_ip:exposed_kubernetes_port:
connectex: A connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed because connected
host has failed to respond.
read tcp unfamiliar_ip:unfamiliar_port->kubernetes_node_ip:exposed_kubernetes_port
wsarecv: A connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed because connected
host has failed to respond.
I say "unusually high" because I assume that the number of times these errors occur should be very minimal (about 5 or less within the hour). Note that I am not dismissing the possibility of this being caused by connection instabilities, as I have also noticed that it is possible to run several commands in rapid succession without any errors.
However, I am still going to post my code in case I am doing something wrong.
Below is the code that my TCP Client uses to connect to our server:
serverAddress, err := net.ResolveTCPAddr("tcp", kubernetes_ip+":"+kubernetes_port)
if err != nil {
    fmt.Println(err)
    return
}

// Never stop asking for commands from the user.
for {
    // Connect to the server.
    serverConnection, err := net.DialTCP("tcp", nil, serverAddress)
    if err != nil {
        fmt.Println(err)
        continue
    }
    defer serverConnection.Close()

    // Added to prevent connection timeout errors, but doesn't seem to be helping
    // because said errors happen within just 1 or 2 minutes.
    err = serverConnection.SetDeadline(time.Now().Add(10 * time.Minute))
    if err != nil {
        fmt.Println(err)
        continue
    }

    // Ask for a command from the user and convert to JSON bytes...

    // Send message to server.
    _, err = serverConnection.Write(clientMsgBytes)
    if err != nil {
        err = merry.Wrap(err)
        fmt.Println(merry.Details(err))
        continue
    }

    err = serverConnection.CloseWrite()
    if err != nil {
        err = merry.Wrap(err)
        fmt.Println(merry.Details(err))
        continue
    }

    // Wait for a response from the server and print...
}
Below is the code that our TCP Server uses to accept client requests:
// We only supply the port so the IP can be dynamically assigned:
serverAddress, err := net.ResolveTCPAddr("tcp", ":"+server_port)
if err != nil {
    return err
}

tcpListener, err := net.ListenTCP("tcp", serverAddress)
if err != nil {
    return err
}
defer tcpListener.Close()

// Never stop listening for client requests.
for {
    clientConnection, err := tcpListener.AcceptTCP()
    if err != nil {
        fmt.Println(err)
        continue
    }
    go func() {
        // Add client connection to Job Queue.
        // Note that `clientConnections` is a buffered channel with a size of 1500.
        // Since I am the only user connecting to our server right now, I do not think
        // this is a channel blocking issue.
        clientConnections <- clientConnection
    }()
}
Below is the code that our TCP Server uses to process client requests:
defer clientConnection.Close()

// Added to prevent connection timeout errors, but doesn't seem to be helping
// because said errors happen within just 1 or 2 minutes.
err := clientConnection.SetDeadline(time.Now().Add(10 * time.Minute))
if err != nil {
    return err
}

// Read full TCP message.
// Does not stop until an EOF is reported by `CloseWrite()`
clientMsgBytes, err := ioutil.ReadAll(clientConnection)
if err != nil {
    err = merry.Wrap(err)
    return nil, err
}

// Process the message bytes...
My questions are:
Am I doing something wrong in the above code, or is the above decent enough for basic TCP Client-Server operations?
Is it okay that both the TCP Client and TCP Server have code that defers closing their one connection?
I seem to recall that calling defer inside a loop does nothing. How do I properly close Client connections before starting new ones?
Some extra information:
Said errors are not logged by the TCP Server, so aside from connection instabilities, this might also be a Kubernetes/Docker-related issue.
It seems this piece of code does not act as you think it does. The defer statement on the connection close only runs when the function returns, not when an iteration ends. So as far as I can see, you are creating a lot of connections on the client side, which could be the problem.
serverAddress, err := net.ResolveTCPAddr("tcp", kubernetes_ip+":"+kubernetes_port)
if err != nil {
    fmt.Println(err)
    return
}

// Never stop asking for commands from the user.
for {
    // Connect to the server.
    serverConnection, err := net.DialTCP("tcp", nil, serverAddress)
    if err != nil {
        fmt.Println(err)
        continue
    }
    defer serverConnection.Close()

    // Added to prevent connection timeout errors, but doesn't seem to be helping
    // because said errors happen within just 1 or 2 minutes.
    err = serverConnection.SetDeadline(time.Now().Add(10 * time.Minute))
    if err != nil {
        fmt.Println(err)
        continue
    }

    // Ask for a command from the user and send to the server...

    // Wait for a response from the server and print...
}
I suggest writing it this way:
func start() {
    serverAddress, err := net.ResolveTCPAddr("tcp", kubernetes_ip+":"+kubernetes_port)
    if err != nil {
        fmt.Println(err)
        return
    }
    for {
        if err := listen(serverAddress); err != nil {
            fmt.Println(err)
        }
    }
}

func listen(serverAddress *net.TCPAddr) error {
    // Connect to the server.
    serverConnection, err := net.DialTCP("tcp", nil, serverAddress)
    if err != nil {
        return err
    }
    defer serverConnection.Close()

    // Never stop asking for commands from the user.
    for {
        // Added to prevent connection timeout errors, but doesn't seem to be helping
        // because said errors happen within just 1 or 2 minutes.
        err = serverConnection.SetDeadline(time.Now().Add(10 * time.Minute))
        if err != nil {
            return err
        }

        // Ask for a command from the user and send to the server...

        // Wait for a response from the server and print...
    }
}
Also, you should keep a single connection open, or a pool of connections, instead of opening and closing a connection right away. When you send a message, you get a connection from the pool (or use the single connection), write the message, wait for the response, and then release the connection back to the pool.
Something like this:
res, err := c.Send([]byte(`my message`))
if err != nil {
    // handle err
}

// the implementation of Send
func (c *Client) Send(msg []byte) ([]byte, error) {
    conn, err := c.pool.Get() // returns a connection from the pool or starts a new one
    if err != nil {
        return nil, err
    }
    // send your message and wait for the response
    // ...
    return response, nil
}
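For illustration, a minimal sketch of how such a pool could be backed by a buffered channel (the Pool type and its methods are hypothetical, not part of any library):
// Hypothetical connection pool backed by a buffered channel.
type Pool struct {
    conns chan net.Conn
    addr  string
}

func NewPool(addr string, size int) *Pool {
    return &Pool{conns: make(chan net.Conn, size), addr: addr}
}

func (p *Pool) Get() (net.Conn, error) {
    select {
    case c := <-p.conns:
        return c, nil // reuse an idle connection
    default:
        return net.Dial("tcp", p.addr) // no idle connection available, dial a new one
    }
}

func (p *Pool) Put(c net.Conn) {
    select {
    case p.conns <- c:
    default:
        c.Close() // pool is full, drop the connection
    }
}
The buffered channel doubles as the free list and the size limit: Get never blocks, and Put simply closes connections that no longer fit.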
