Golang grpc: how to recover the grpc server from panic?

I have a handler like this:
func (daemon *Daemon) List(item string) (map[string][]string, error) {
panic("this is a panic")
...
I just want to recover this potential panic, so in my grpc server I wrote:
// Serve serves gRPC request by goroutines
func (s *ServerRPC) Serve(addr string) error {
defer func() {
if err := recover(); err != nil {
glog.Errorf("Recovered from err: %v", err)
}
}()
l, err := net.Listen("tcp", addr)
if err != nil {
glog.Fatalf("Failed to listen %s: %v", addr, err)
return err
}
return s.server.Serve(l)
}
But it does not catch the panic. I suspect that's because the panic happens in a child goroutine that handles the request? If so, how can I properly keep my gRPC server from crashing?

Take a look at the recovery interceptor in the go-grpc-middleware package. Your deferred recover() in Serve never fires because gRPC handles each RPC in its own goroutine, and recover only catches a panic raised in the goroutine where the deferred function runs. The recovery interceptor runs inside the request goroutine, recovers from panics there, and returns them to the client as an internal gRPC error:
myServer := grpc.NewServer(
grpc.StreamInterceptor(grpc_middleware.ChainStreamServer(
grpc_zap.StreamServerInterceptor(zapLogger),
grpc_auth.StreamServerInterceptor(myAuthFunction),
grpc_recovery.StreamServerInterceptor(),
)),
grpc.UnaryInterceptor(grpc_middleware.ChainUnaryServer(
grpc_zap.UnaryServerInterceptor(zapLogger),
grpc_auth.UnaryServerInterceptor(myAuthFunction),
grpc_recovery.UnaryServerInterceptor(),
)),
)
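If you also want to control what the client sees when a handler panics, the recovery interceptor accepts options. Here is a minimal, self-contained sketch; it assumes the v1 import path github.com/grpc-ecosystem/go-grpc-middleware/recovery, and newRecoveringServer is an illustrative name rather than code from the question:
import (
	grpc_recovery "github.com/grpc-ecosystem/go-grpc-middleware/recovery"
	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)
// newRecoveringServer builds a gRPC server whose handlers may panic without
// taking the whole process down.
func newRecoveringServer() *grpc.Server {
	opts := []grpc_recovery.Option{
		// Convert the panic value into a gRPC Internal error returned to the client.
		grpc_recovery.WithRecoveryHandler(func(p interface{}) error {
			return status.Errorf(codes.Internal, "recovered from panic: %v", p)
		}),
	}
	return grpc.NewServer(
		grpc.UnaryInterceptor(grpc_recovery.UnaryServerInterceptor(opts...)),
		grpc.StreamInterceptor(grpc_recovery.StreamServerInterceptor(opts...)),
	)
}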

Related

Cancelling a net.Listener via Context in Golang

I'm implementing a TCP server application that accepts incoming TCP connections in an infinite loop.
I'm trying to use Context throughout the application to allow shutting down, which is generally working great.
The one thing I'm struggling with is cancelling a net.Listener that is waiting on Accept(). I'm using a ListenConfig which, I believe, has the advantage of taking a Context when creating a Listener. However, cancelling this Context does not have the intended effect of aborting the Accept call.
Here's a small app that demonstrates the same problem:
package main
import (
"context"
"fmt"
"net"
"time"
)
func main() {
lc := net.ListenConfig{}
ctx, cancel := context.WithCancel(context.Background())
go func() {
time.Sleep(2*time.Second)
fmt.Println("cancelling context...")
cancel()
}()
ln, err := lc.Listen(ctx, "tcp", ":9801")
if err != nil {
fmt.Println("error creating listener:", err)
return
}
fmt.Println("listen returned without error")
defer ln.Close()
conn, err := ln.Accept()
if err != nil {
fmt.Println("accept returned error:", err)
} else {
fmt.Println("accept returned without error")
defer conn.Close()
}
}
I expect that, if no clients connect, when the Context is cancelled 2 seconds after startup, the Accept() should abort. However, it just sits there until you Ctrl-C out.
Is my expectation wrong? If so, what is the point of the Context passed to ListenConfig.Listen()?
Is there another way to achieve the same goal?
I believe you should be closing the listener when your timeout runs out. Then, when Accept returns an error, check that it's intentional (e.g. the timeout elapsed).
This blog post shows how to do a safe shutdown of a TCP server without a context. The interesting part of the code is:
type Server struct {
listener net.Listener
quit chan interface{}
wg sync.WaitGroup
}
func NewServer(addr string) *Server {
s := &Server{
quit: make(chan interface{}),
}
l, err := net.Listen("tcp", addr)
if err != nil {
log.Fatal(err)
}
s.listener = l
s.wg.Add(1)
go s.serve()
return s
}
func (s *Server) Stop() {
close(s.quit)
s.listener.Close()
s.wg.Wait()
}
func (s *Server) serve() {
defer s.wg.Done()
for {
conn, err := s.listener.Accept()
if err != nil {
select {
case <-s.quit:
return
default:
log.Println("accept error", err)
}
} else {
s.wg.Add(1)
go func() {
s.handleConnection(conn)
s.wg.Done()
}()
}
}
}
func (s *Server) handleConnection(conn net.Conn) {
defer conn.Close()
buf := make([]byte, 2048)
for {
n, err := conn.Read(buf)
if err != nil && err != io.EOF {
log.Println("read error", err)
return
}
if n == 0 {
return
}
log.Printf("received from %v: %s", conn.RemoteAddr(), string(buf[:n]))
}
}
In your case, call Stop when the context is cancelled.
If you look at the source code of TCPListener.Accept, you'll see it basically calls the underlying socket accept, and the context is not plumbed through. But Accept is simple to cancel by closing the listener, so piping the context all the way down isn't strictly necessary.
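To connect this back to the question's context, here is a minimal sketch built on the Server type above; RunUntilCancelled is an illustrative name and assumes the NewServer/Stop definitions from the snippet:
// RunUntilCancelled starts the server and shuts it down when ctx is cancelled.
func RunUntilCancelled(ctx context.Context, addr string) {
	s := NewServer(addr) // begins accepting connections in a background goroutine
	<-ctx.Done()         // context cancelled or deadline exceeded
	s.Stop()             // closes quit and the listener, which unblocks Accept
}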

Golang grpc: how to determine when the server has started listening?

So I have the following:
type Node struct {
Table map[string]string
thing.UnimplementedGreeterServer
address string
}
func (n *Node) Start() {
lis, err := net.Listen("tcp", port)
if err != nil {
log.Fatalf("failed to listen: %v", err)
}
s := grpc.NewServer()
thing.RegisterGreeterServer(s, n)
if err := s.Serve(lis); err != nil {
log.Fatalf("failed to serve: %v", err)
}
}
In my main function I'll spin up multiple nodes like so:
func main() {
n := Node{Table: map[string]string{}}
go n.Start()
conn, err := grpc.Dial("localhost:50051", grpc.WithInsecure(), grpc.WithBlock())
}
The problem is that, because I'm spinning up the node concurrently, the dial might fail because the node might not have started listening yet.
Ideally, I'd like a done channel that tells me when the grpc server has actually started listening. How do I accomplish this?
This is essentially the same problem as How to add hook on golang grpc server start?, which doesn't have an answer.
s.Serve(listener) blocks, so you can't achieve this with a done channel alone; instead, implement health checking and readiness for your service, and have the client check those before performing any request.
The server should implement the following proto:
syntax = "proto3";
package grpc.health.v1;
message HealthCheckRequest {
string service = 1;
}
message HealthCheckResponse {
enum ServingStatus {
UNKNOWN = 0;
SERVING = 1;
NOT_SERVING = 2;
SERVICE_UNKNOWN = 3; // Used only by the Watch method.
}
ServingStatus status = 1;
}
service Health {
rpc Check(HealthCheckRequest) returns (HealthCheckResponse);
rpc Watch(HealthCheckRequest) returns (stream HealthCheckResponse);
}
For example, the envoy proxy grpc_health_check works with the above proto.
Read GRPC Health Checking Protocol for more information.
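On the server side you do not have to hand-roll that proto: the stock google.golang.org/grpc/health package implements it. A rough sketch of wiring it up (newServerWithHealth is an illustrative name):
import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)
// newServerWithHealth returns a gRPC server with the standard health service
// registered alongside your own services.
func newServerWithHealth() *grpc.Server {
	s := grpc.NewServer()
	healthSrv := health.NewServer()
	healthpb.RegisterHealthServer(s, healthSrv)
	// An empty service name marks the server as a whole as serving.
	healthSrv.SetServingStatus("", healthpb.HealthCheckResponse_SERVING)
	return s
}
The client can then call grpc.health.v1.Health/Check over the same connection before issuing real requests.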
The server can be Dialed as soon as net.Listen returns a nil error. Dial will block until the server calls Accept (which will happen somewhere in s.Serve in this case).
Either move creation of the listener into the caller and pass it as an argument:
func (n *Node) Start(lis net.Listener) {
s := grpc.NewServer()
thing.RegisterGreeterServer(s, n)
if err := s.Serve(lis); err != nil {
log.Fatalf("failed to serve: %v", err)
}
}
func main() {
lis, err := net.Listen("tcp", port)
if err != nil {
log.Fatalf("failed to listen: %v", err)
}
n := Node{Table: map[string]string{}}
go n.Start(lis)
conn, err := grpc.Dial("localhost:50051", grpc.WithInsecure(), grpc.WithBlock())
}
Or signal that the listener is up after Listen returns:
func (n *Node) Start(up chan struct{}) {
lis, err := net.Listen("tcp", port)
if err != nil {
log.Fatalf("failed to listen: %v", err)
}
if up != nil {
close(up)
}
s := grpc.NewServer()
thing.RegisterGreeterServer(s, n)
if err := s.Serve(lis); err != nil {
log.Fatalf("failed to serve: %v", err)
}
}
func main() {
n := Node{Table: map[string]string{}}
up := make(chan struct{})
go n.Start(up)
<-up
conn, err := grpc.Dial("localhost:50051", grpc.WithInsecure(), grpc.WithBlock())
}
For all those who are still looking for an answer to this, here is another simple way to do it: start the server in a child goroutine. Here is a code snippet:
// Start the server in a child goroutine
go func() {
if err := s.Serve(listener); err != nil {
log.Fatalf("Failed to serve: %v", err)
}
}()
fmt.Println("Server succesfully started on port :50051")
In my case I am using MongoDB as well, so when you run it, you get:
grpc-go-mongodb-cobra>go run server/main.go
Starting server on port :50051...
Connecting to MongoDB...
Connected to MongoDB
Server successfully started on port :50051
I have also written a blog post on this, with working code on GitHub. Here is the link: https://softwaredevelopercentral.blogspot.com/2021/03/golang-grpc-microservice.html

Log from within gRPC/Go handlers

I am trying to debug code by printing/logging from within a gRPC/Go function, but the output does not print to the terminal.
fmt.Println() works in my main() method, but not in the server callbacks. How can I print from within these gRPC handlers? I'm open to other methods for debugging if there's a different approach too.
Here's the related code from my main() function. Logging and printing works here:
func main() {
...
lis, err := net.Listen("tcp", fmt.Sprintf("0.0.0.0:%s", os.Getenv("APP_PORT")))
if err != nil {
logger.Error(fmt.Sprintf("Failed start on port: %s", os.Getenv("APP_PORT")), err)
log.Fatalf("Failed to start: %v", err)
}
opts := []grpc.ServerOption{}
s := grpc.NewServer(opts...)
pb.RegisterAuthServiceServer(s, &server{})
// Register reflection service on gRPC server.
reflection.Register(s)
go func() {
fmt.Println("Starting Server...")
if err := s.Serve(lis); err != nil {
log.Fatalf("failed to server: %v", err)
}
}()
// Wait for Control C to exit
ch := make(chan os.Signal, 1)
signal.Notify(ch, os.Interrupt)
// Run until a signal is received
<-ch
fmt.Println("")
fmt.Println("Stopping the server")
s.Stop()
fmt.Println("Closing the listener")
lis.Close()
Then here is one of my handlers, where logging or printing does not output to the terminal where I started the server.
func (*server) AuthUser(ctx context.Context, req *pb.AuthUserRequest) (*pb.AccessToken, error) {
fmt.Println("AuthUser***")
...

Synchronizing a Test Server During Tests

Summary: I'm running into a race condition during testing where my server is not reliably ready to serve requests before making client requests against it. How can I block only until the listener is ready, and still maintain composable public APIs without requiring users to BYO net.Listener?
We see the following error as the goroutine that spins up our (blocking) server in the background isn't listening before we call client.Do(req) in the TestRun test function.
--- FAIL: TestRun/Server_accepts_HTTP_requests (0.00s)
/home/matt/repos/admission-control/server_test.go:64: failed to make a request: Get https://127.0.0.1:37877: dial tcp 127.0.0.1:37877: connect: connection refused
I'm not using httptest.Server directly as I'm attempting to test the blocking & cancellation characteristics of my own server component.
I create an httptest.NewUnstartedServer, clone its *tls.Config into a new http.Server after starting it with StartTLS(), and then close it, before calling *AdmissionServer.Run(). This also has the benefit of giving me a *http.Client with the matching RootCAs configured.
Testing TLS is important here as the daemon this exposes lives in a TLS-only environment.
func newTestServer(ctx context.Context, t *testing.T) *httptest.Server {
testHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
fmt.Fprintln(w, "OK")
})
testSrv := httptest.NewUnstartedServer(testHandler)
admissionServer, err := NewServer(nil, &noopLogger{})
if err != nil {
t.Fatalf("admission server creation failed: %s", err)
return nil
}
// We start the test server, copy its config out, and close it down so we can
// start our own server. This is because httptest.Server only generates a
// self-signed TLS config after starting it.
testSrv.StartTLS()
admissionServer.srv = &http.Server{
Addr: testSrv.Listener.Addr().String(),
Handler: testHandler,
TLSConfig: testSrv.TLS.Clone(),
}
testSrv.Close()
// We need a better synchronization primitive here that doesn't block
// but allows the underlying listener to be ready before
// serving client requests.
go func() {
if err := admissionServer.Run(ctx); err != nil {
t.Fatalf("server returned unexpectedly: %s", err)
}
}()
return testSrv
}
// Test that we can start a minimal AdmissionServer and handle a request.
func TestRun(t *testing.T) {
testSrv := newTestServer(context.TODO(), t)
t.Run("Server accepts HTTP requests", func(t *testing.T) {
client := testSrv.Client()
req, err := http.NewRequest(http.MethodGet, testSrv.URL, nil)
if err != nil {
t.Fatalf("request creation failed: %s", err)
}
resp, err := client.Do(req)
if err != nil {
t.Fatalf("failed to make a request: %s", err)
}
defer resp.Body.Close()
// Later sub-tests will test cancellation propagation, signal handling, etc.
})
}
For posterity, this is our composable Run function, which listens in a goroutine and then blocks on our cancellation & error channels in a for-select:
type AdmissionServer struct {
srv *http.Server
logger log.Logger
GracePeriod time.Duration
}
func (as *AdmissionServer) Run(ctx context.Context) error {
sigChan := make(chan os.Signal, 1)
defer close(sigChan)
signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)
// run in goroutine
errs := make(chan error)
defer close(errs)
go func() {
as.logger.Log(
"msg", fmt.Sprintf("admission control listening on '%s'", as.srv.Addr),
)
if err := as.srv.ListenAndServeTLS("", ""); err != nil && err != http.ErrServerClosed {
errs <- err
as.logger.Log(
"err", err.Error(),
"msg", "the server exited",
)
return
}
return
}()
// Block indefinitely until we receive an interrupt, cancellation or error
// signal.
for {
select {
case sig := <-sigChan:
as.logger.Log(
"msg", fmt.Sprintf("signal received: %s", sig),
)
return as.shutdown(ctx, as.GracePeriod)
case err := <-errs:
as.logger.Log(
"msg", fmt.Sprintf("listener error: %s", err),
)
// We don't need to explicitly call shutdown here, as
// *http.Server.ListenAndServe closes the listener when returning an error.
return err
case <-ctx.Done():
as.logger.Log(
"msg", fmt.Sprintf("cancellation received: %s", ctx.Err()),
)
return as.shutdown(ctx, as.GracePeriod)
}
}
}
Notes:
There is a (simple) constructor for an *AdmissionServer: I've left it out for brevity. The AdmissionServer is composable and accepts a *http.Server so that it can be plugged into existing applications easily.
The wrapped http.Server type that we create a listener from doesn't itself expose any way to tell if it's listening; at best we can try to listen again and catch the error (e.g. the port is already bound to another listener), which does not seem robust, as the net package doesn't expose a useful typed error for this.
You can just attempt to connect to the server before starting the test suite, as part of the initialization process.
For example, I usually have a function like this in my tests:
// waitForServer attempts to establish a TCP connection to localhost:<port>
// in a given amount of time. It returns upon a successful connection;
// otherwise it exits with an error.
func waitForServer(port string) {
backoff := 50 * time.Millisecond
for i := 0; i < 10; i++ {
conn, err := net.DialTimeout("tcp", ":"+port, 1*time.Second)
if err != nil {
time.Sleep(backoff)
continue
}
err = conn.Close()
if err != nil {
log.Fatal(err)
}
return
}
log.Fatalf("Server on port %s not up after 10 attempts", port)
}
Then in my TestMain() I do:
func TestMain(m *testing.M) {
go startServer()
waitForServer(serverPort)
// run the suite
os.Exit(m.Run())
}

writing a client over websocket in golang

I'm writing a client that dials a websocket and waits to receive information. I can successfully dial the websocket, but I cannot figure out how to implement some kind of asynchronous callback using the "golang.org/x/net/websocket" package. Is this even possible, or should I use the Gorilla package?
Use a for loop to read from the websocket with the Gorilla package:
for {
_, message, err := c.ReadMessage()
if err != nil {
log.Println("read:", err)
c.Close()
break
}
handleMessage(message)
}
Run the loop in a goroutine to make it asynchronous:
go func() {
for {
_, message, err := c.ReadMessage()
if err != nil {
log.Println("read:", err)
c.Close()
break
}
handleMessage(message)
}
}()
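For context, here is a minimal sketch of how the connection c used above might be established with the Gorilla package; the URL is a placeholder:
package main

import (
	"log"

	"github.com/gorilla/websocket"
)

func main() {
	// Dial the websocket endpoint (placeholder URL).
	c, _, err := websocket.DefaultDialer.Dial("ws://localhost:8080/ws", nil)
	if err != nil {
		log.Fatal("dial:", err)
	}
	defer c.Close()

	// Read messages until the connection closes or errors.
	for {
		_, message, err := c.ReadMessage()
		if err != nil {
			log.Println("read:", err)
			return
		}
		log.Printf("received: %s", message)
	}
}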
