redigo: getting dial tcp: connect: cannot assign requested address

I have an application that makes about 400 reads per second and 100 writes per second to Redis (hosted on Redis Labs). The application uses the github.com/garyburd/redigo package as its Redis client.
I have two functions which are the only ones being used to read and write:
func getCachedVPAIDConfig(key string) chan *cachedVPAIDConfig {
    c := make(chan *cachedVPAIDConfig)
    go func() {
        p := pool.Get()
        defer p.Close()
        switch p.Err() {
        case nil:
            item, err := redis.Bytes(p.Do("GET", key))
            if err != nil {
                c <- &cachedVPAIDConfig{nil, err}
                return
            }
            c <- &cachedVPAIDConfig{item, nil}
        default:
            c <- &cachedVPAIDConfig{nil, p.Err()}
            return
        }
    }()
    return c
}
func setCachedVPAIDConfig(key string, j []byte) chan error {
    c := make(chan error)
    go func() {
        p := pool.Get()
        defer p.Close()
        switch p.Err() {
        case nil:
            _, err := p.Do("SET", key, j)
            if err != nil {
                c <- err
                return
            }
            c <- nil
        default:
            c <- p.Err()
            return
        }
    }()
    return c
}
As you can see, I'm using the recommended connection pooling mechanism (http://godoc.org/github.com/garyburd/redigo/redis#Pool).
I'm calling these functions on every HTTP request that an endpoint on the application receives. The problem is: once the application starts getting requests, it immediately starts throwing the error
dial tcp 54.160.xxx.xx:yyyy: connect: cannot assign requested address
(54.160.xxx.xx:yyyy is the redis host)
I see on redis that there are only about 600 connections when this starts to happen, which doesn't sound like a lot.
I tried playing with the MaxActive setting of the pool, setting it anywhere between 1000 and 50K, but the result is the same.
Any ideas?
EDIT
Here's my pool initialization code (doing this in func init):
pool = redis.Pool{
    MaxActive: 1000, // note: I tried changing this to 50K, result the same
    Dial: func() (redis.Conn, error) {
        c, err := redis.Dial("tcp", redisHost)
        if err != nil {
            return nil, err
        }
        if _, err := c.Do("AUTH", redisPassword); err != nil {
            c.Close()
            return nil, err
        }
        return c, err
    },
}
Edit 2:
Issue solved by applying the stuff suggested in the answer below!
New code for pool init:
pool = redis.Pool{
    MaxActive:   500,
    MaxIdle:     500,
    IdleTimeout: 5 * time.Second,
    Dial: func() (redis.Conn, error) {
        c, err := redis.DialTimeout("tcp", redisHost, 100*time.Millisecond, 100*time.Millisecond, 100*time.Millisecond)
        if err != nil {
            return nil, err
        }
        if _, err := c.Do("AUTH", redisPassword); err != nil {
            c.Close()
            return nil, err
        }
        return c, err
    },
}
This new init makes it so that the get and set timeouts are handled by redigo internally, so I no longer need to return a channel on the getCachedVPAIDConfig and setCachedVPAIDConfig funcs. This is how they look now:
func setCachedVPAIDConfig(key string, j []byte) error {
    p := pool.Get()
    switch p.Err() {
    case nil:
        _, err := p.Do("SET", key, j)
        p.Close()
        return err
    default:
        p.Close()
        return p.Err()
    }
}

func getCachedVPAIDConfig(key string) ([]byte, error) {
    p := pool.Get()
    switch p.Err() {
    case nil:
        item, err := redis.Bytes(p.Do("GET", key))
        p.Close()
        return item, err
    default:
        p.Close()
        return nil, p.Err()
    }
}

You're closing the connection after sending on the channel; if the channel send blocks, the connection is never closed. The leaked connections exhaust local ports, which produces exactly the "cannot assign requested address" error you're seeing. So don't just defer: close the connection explicitly before sending.
I don't think it's the problem here, but it's a good idea regardless: set a timeout on your connections with DialTimeout.
Make sure you have a proper TestOnBorrow function to get rid of dead connections, especially if you have a timeout. I usually do a PING if the connection has been idle for more than 3 seconds (the function receives the time the connection was last used as a parameter).
Try setting MaxIdle to a larger number as well; I remember having pooling problems that were resolved by increasing that parameter. Putting these together, see the sketch below.
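A minimal sketch of all three suggestions combined (redisHost and redisPassword come from the question; the timeout and size values are illustrative assumptions, not tuned recommendations):

    // assumes: import "time" and "github.com/garyburd/redigo/redis"
    pool = redis.Pool{
        MaxActive:   500,
        MaxIdle:     500,
        IdleTimeout: 5 * time.Second,
        Dial: func() (redis.Conn, error) {
            // connect/read/write timeouts keep a stuck dial or command
            // from holding a connection forever
            c, err := redis.DialTimeout("tcp", redisHost,
                100*time.Millisecond, 100*time.Millisecond, 100*time.Millisecond)
            if err != nil {
                return nil, err
            }
            if _, err := c.Do("AUTH", redisPassword); err != nil {
                c.Close()
                return nil, err
            }
            return c, nil
        },
        // PING connections that have sat idle for a while and drop dead ones
        TestOnBorrow: func(c redis.Conn, t time.Time) error {
            if time.Since(t) < 3*time.Second {
                return nil
            }
            _, err := c.Do("PING")
            return err
        },
    }

And the explicit-close pattern for the goroutines, releasing the connection before the potentially blocking channel send:

    go func() {
        p := pool.Get()
        item, err := redis.Bytes(p.Do("GET", key))
        p.Close() // return the connection before the send, so a slow receiver can't leak it
        c <- &cachedVPAIDConfig{item, err}
    }()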

Related

Setting idletimeout in Go

I have a function in Go that handles connections coming in over TCP and served via SSH. I am trying to set an idle timeout by creating a struct in the connection function.
Use case - a customer should be able to make a connection and upload/download multiple files
Reference - IdleTimeout in tcp server
Function code:
type Conn struct {
    net.Conn
    idleTimeout time.Duration
}

func HandleConn(conn net.Conn) {
    var err error
    rAddr := conn.RemoteAddr()
    session := shortuuid.New()
    config := LoadSSHServerConfig(session)
    blocklistItem := blocklist.GetBlockListItem(rAddr)
    if blocklistItem.IsBlocked() {
        conn.Close()
        atomic.AddInt64(&stats.Stats.BlockedConnections, 1)
        return
    }
    // (this is where I tried to add the Read method; a method declaration
    // inside another function does not compile)
    func (c *Conn) Read(b []byte) (int, error) {
        err := c.Conn.SetReadDeadline(time.Now().Add(c.idleTimeout))
        if err != nil {
            return 0, err
        }
        return c.Conn.Read(b)
    }
    sConn, chans, reqs, err := ssh.NewServerConn(conn, config)
    if err != nil {
        if err == io.EOF {
            log.Errorw("SSH: Handshaking was terminated", log.Fields{
                "address": rAddr,
                "error":   err,
                "session": session})
        } else {
            log.Errorw("SSH: Error on handshaking", log.Fields{
                "address": rAddr,
                "error":   err,
                "session": session})
        }
        atomic.AddInt64(&stats.Stats.AuthorizationFailed, 1)
        return
    }
    log.Infow("connection accepted", log.Fields{
        "user": sConn.User(),
    })
    if user, ok := users[session]; ok {
        log.Infow("SSH: Connection accepted", log.Fields{
            "user":          user.LogFields(),
            "clientVersion": string(sConn.ClientVersion())})
        atomic.AddInt64(&stats.Stats.AuthorizationSucceeded, 1)
        // The incoming Request channel must be serviced.
        go ssh.DiscardRequests(reqs)
        // Key ID: sConn.Permissions.Extensions["key-id"]
        handleServerConn(user, chans)
        log.Infow("connection finished", log.Fields{"user": user.LogFields()})
        log.Infow("checking connections", log.Fields{
            //"cc": Stats.AcceptedConnections,
            "cc2": &stats.Stats.AcceptedConnections})
        // Remove connection from local cache
        delete(users, session)
    } else {
        log.Infow("user not found from memory", log.Fields{"username": sConn.User()})
    }
}
This code is coming from the Listen function:
func Listen() {
    listener, err := net.Listen("tcp", sshListen)
    if err != nil {
        panic(err)
    }
    if useProxyProtocol {
        listener = &proxyproto.Listener{
            Listener:           listener,
            ProxyHeaderTimeout: time.Second * 10,
        }
    }
    for {
        // Once a ServerConfig has been configured, connections can be accepted.
        conn, err := listener.Accept()
        if err != nil {
            log.Errorw("SSH: Error accepting incoming connection", log.Fields{"error": err})
            atomic.AddInt64(&stats.Stats.FailedConnections, 1)
            continue
        }
        // Before use, a handshake must be performed on the incoming net.Conn.
        // It must be handled in a separate goroutine,
        // otherwise one user could easily block entire loop.
        // For example, user could be asked to trust server key fingerprint and hangs.
        go HandleConn(conn)
    }
}
Is it even possible to set a deadline only for connections that have been idle for 20 seconds (no uploads/downloads)?
EDIT 1: Following @LiamKelly's suggestions, I have made changes in the code. Now the code looks like:
type SshProxyConn struct {
    net.Conn
    idleTimeout time.Duration
}

func (c *SshProxyConn) Read(b []byte) (int, error) {
    err := c.Conn.SetReadDeadline(time.Now().Add(c.idleTimeout))
    if err != nil {
        return 0, err
    }
    return c.Conn.Read(b)
}

func HandleConn(conn net.Conn) {
    //lines of code as above
    sshproxyconn := &SshProxyConn{nil, time.Second * 20}
    Conn, chans, reqs, err := ssh.NewServerConn(sshproxyconn, config)
    //lines of code
}
But now the issue is that the SSH handshake is not happening. I get the error "Connection closed" when I try to ssh in. Is it still waiting for the "conn" variable in the function call?
Is it even possible to set a deadline only for connections that have been idle for 20 [seconds]
OK, first a general disclaimer: I am going to assume go-proxyproto implements the Conn interface as we would expect. Also, as you hinted at before, I don't think you can put a struct method inside another function (I also recommend renaming it to something unique to prevent Conn vs net.Conn confusion):
type SshProxyConn struct {
    net.Conn
    idleTimeout time.Duration
}

func (c *SshProxyConn) Read(b []byte) (int, error) {
    err := c.Conn.SetReadDeadline(time.Now().Add(c.idleTimeout))
    if err != nil {
        return 0, err
    }
    return c.Conn.Read(b)
}

func HandleConn(conn net.Conn) {
This makes it clearer what your primary issue is: you passed the plain net.Conn to your SSH server, not your wrapper type. So
    sConn, chans, reqs, err := ssh.NewServerConn(conn, config)
should be:
    sshproxyconn := &SshProxyConn{conn, time.Second * 20}
    Conn, chans, reqs, err := ssh.NewServerConn(sshproxyconn, config)
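If you also want writes to count as activity, the same wrapping trick extends to Write; a minimal sketch under the same assumptions (the 20-second value mirrors the question):

    func (c *SshProxyConn) Write(b []byte) (int, error) {
        // push the deadline forward on every write too, so the connection
        // is only considered idle when no data moves in either direction
        if err := c.Conn.SetWriteDeadline(time.Now().Add(c.idleTimeout)); err != nil {
            return 0, err
        }
        return c.Conn.Write(b)
    }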

redigo error log: write: connection reset by peer

At almost the same point in time as the redigo "write: connection reset by peer" error, the Redis server error log shows:
Client id=45183 addr=127.0.0.1:40420 fd=39 name= age=39706 idle=46 flags=N db=0 sub=8 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=16114 oll=528 omem=8545237 events=rw cmd=ping scheduled to be closed ASAP for overcoming of output buffer limits.
Go error log:
write tcp 127.0.0.1:40806->127.0.0.1:6379: write: connection reset by peer
Before that, the Go program hadn't received any subscription messages for about 7 minutes. I presume the output buffer overflowed because messages were not being consumed.
The Redis client-output-buffer-limit is at its default configuration.
The Linux file-descriptor and connection counts are normal, and I can't find a reason why the messages stopped being consumed.
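For reference, the default pub/sub limit is client-output-buffer-limit pubsub 32mb 8mb 60 (32 MB hard limit, 8 MB soft limit over 60 seconds). If slow consumers are expected, the limit can be raised at runtime; a hedged sketch using redigo (the 64 MB/16 MB values are illustrative only, and CONFIG may be disabled on managed Redis offerings):

    c := pool.Get()
    defer c.Close()
    // 67108864 = 64 MB hard limit, 16777216 = 16 MB soft limit over 60 s
    _, err := c.Do("CONFIG", "SET", "client-output-buffer-limit", "pubsub 67108864 16777216 60")
    if err != nil {
        fmt.Println("CONFIG SET failed:", err)
    }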
Here is my code:
server.go
func WaitFroMsg(ctx context.Context, pool *redis.Pool, onMessage func(channel string, data []byte) error, channel ...string) (err error) {
    conn := pool.Get()
    psc := redis.PubSubConn{Conn: conn}
    if err := psc.Subscribe(redis.Args{}.AddFlat(channel)...); err != nil {
        return err
    }
    done := make(chan error, 1)
    go func() {
        for {
            switch n := psc.Receive().(type) {
            case error:
                done <- fmt.Errorf("redis pubsub receive err: %v", n)
                return
            case redis.Message:
                if err = onMessage(n.Channel, n.Data); err != nil {
                    done <- err
                    return
                }
            case redis.Subscription:
                if n.Count == 0 {
                    fmt.Println("all channels are unsubscribed", channel)
                    done <- nil
                    return
                }
            }
        }
    }()
    const healthCheck = time.Minute
    ticker := time.NewTicker(healthCheck)
    defer ticker.Stop()
    for {
        select {
        case <-ticker.C:
            if err = psc.Ping(""); err != nil {
                fmt.Println("healthCheck ", err, channel)
                return err
            }
        case err := <-done:
            return err
        case <-ctx.Done():
            if err := psc.Unsubscribe(); err != nil {
                return fmt.Errorf("redis unsubscribe failed: %v", err)
            }
            return nil
        }
    }
}
pool.go
func NewPool(addr string, db int) *redis.Pool {
    return &redis.Pool{
        MaxIdle:     3,
        IdleTimeout: 240 * time.Second,
        Dial: func() (redis.Conn, error) {
            c, err := redis.Dial("tcp", addr)
            if err != nil {
                return nil, err
            }
            if _, err = c.Do("SELECT", db); err != nil {
                c.Close()
                return nil, err
            }
            return c, nil
        },
        TestOnBorrow: func(c redis.Conn, t time.Time) error {
            if time.Since(t) < time.Minute {
                return nil
            }
            _, err := c.Do("PING")
            fmt.Println("PING error", err)
            return err
        },
    }
}
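For completeness, a minimal way to drive WaitFroMsg from the code above (the address, DB number, and channel name are placeholders):

    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    pool := NewPool("127.0.0.1:6379", 0)
    err := WaitFroMsg(ctx, pool, func(channel string, data []byte) error {
        fmt.Printf("got %d bytes on %s\n", len(data), channel)
        return nil // returning an error stops the subscriber loop
    }, "news")
    if err != nil {
        fmt.Println("subscriber exited:", err)
    }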

Too many open files Websocket

My process leaves too many open file descriptors.
When I run lsof -p $pid, most of the results (about 80%) look like:
points2 28360 root 911u sock 0,9 0t0 42082509
protocol: TCPv6
Before the FD type turns to 'sock', it stays in CLOSE_WAIT for a while. What I noticed is that some of these 'sock' FDs disappear, while some stay forever.
The number of open files increases gradually, with small fluctuations, until it reaches the maximum of 1024. For now I have set the allowed maximum number of open files to 4096 to keep the process working longer.
srvTLS := &http.Server{
    Addr:         utils.PortSocketTLS,
    ReadTimeout:  10 * time.Second,
    WriteTimeout: 15 * time.Second,
}
srvTLS.SetKeepAlivesEnabled(false)
Handler:
func WsRoom(w http.ResponseWriter, r *http.Request) {
    ws, err := websocket.Upgrade(w, r, nil, 1024, 1024)
    if _, ok := err.(websocket.HandshakeError); ok {
        http.Error(w, "Not a websocket handshake", 400)
        return
    } else if err != nil {
        return
    }
    // ...other stuff
}
Writer to conn
func PlayerWriter(pc *model.PlayerConn) {
    ticker := time.NewTicker(utils.PingPeriod)
    defer func() {
        ticker.Stop()
        pc.WS.Close()
    }()
    for {
        select {
        case message, ok := <-pc.Ch:
            pc.WS.SetWriteDeadline(time.Now().Add(utils.WriteWait))
            if !ok {
                pc.WS.WriteMessage(websocket.CloseMessage, []byte{})
                return
            }
            err := pc.WS.WriteMessage(websocket.TextMessage, message)
            if err != nil {
                return
            }
            inst := &model.PlayerLeftInstruction{}
            _ = json.Unmarshal(message, inst)
            if inst.Instruction == utils.UtilAFK || inst.Instruction == utils.RoomMoneyLess {
                return
            }
        case <-ticker.C:
            pc.WS.SetWriteDeadline(time.Now().Add(utils.WriteWait))
            if err := pc.WS.WriteMessage(websocket.PingMessage, nil); err != nil {
                return
            }
        }
    }
}
Conn Listener:
func PlayerListener(pc *model.PlayerConn) {
    defer func() {
        if r := recover(); r != nil {
        }
        close(pc.Ch)
        pc.Room.Leave <- pc
        pc.WS.Close()
    }()
    pc.WS.SetReadLimit(utils.MaxMessageSize)
    pc.WS.SetReadDeadline(time.Now().Add(utils.PongWait))
    pc.WS.SetPongHandler(func(string) error { pc.WS.SetReadDeadline(time.Now().Add(utils.PongWait)); return nil })
    for {
        _, command, err := pc.WS.ReadMessage()
        if err != nil {
            break
        }
        si := model.StatusInstruction{}
        json.Unmarshal(command, &si)
        z := make([]byte, len(command))
        copy(z, command)
        switch si.Status {
        case utils.PlayerLeft:
            goto Exit
        case utils.MoveTurn, utils.LocalTurn, utils.LocalBetFold, utils.LocalBet, utils.MoveTurnBet:
            pc.Room.RoomData.GameBridger.InstCh <- z
        // ...some stuff
        default:
            log.Printf("Unexpected command is responsed it is: %s", string(command))
            goto Exit
        }
    }
Exit:
}
If needed, I can share more code. I think the problem is with timeouts or something related to them, but I don't know exactly what I'm missing.
In lsof, TYPE 'sock' usually means a socket connection that didn't receive anything.
I had a logic error in my code: in some cases I was NOT closing the established websocket connection.
I also had goroutine leaks (although they didn't affect the lsof numbers); pprof helped to detect the leaks.
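If it helps anyone hunting similar leaks, here is a minimal sketch of exposing pprof so goroutine counts can be inspected over time (the port is arbitrary):

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
    )

    func init() {
        go func() {
            // browse http://localhost:6060/debug/pprof/goroutine?debug=1
            // or run: go tool pprof http://localhost:6060/debug/pprof/goroutine
            log.Println(http.ListenAndServe("localhost:6060", nil))
        }()
    }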

Gorilla websocket disconnect is called two times

I'm writing a Go websocket server and I want to gracefully stop the connections when my server goes down.
I have a map of active connections stored in the following variable:
var connections = make(map[string]*websocket.Conn)
My main function looks like this:
func main() {
    // ... stuff ....
    gracefulStop := make(chan os.Signal)
    signal.Notify(gracefulStop, syscall.SIGTERM)
    signal.Notify(gracefulStop, syscall.SIGINT)
    signal.Notify(gracefulStop, syscall.SIGQUIT)
    signal.Notify(gracefulStop, syscall.SIGKILL) // note: SIGKILL cannot actually be caught
    signal.Notify(gracefulStop, syscall.SIGHUP)
    go func() {
        sig := <-gracefulStop
        log.Printf("Exiting from process due to %+v", sig)
        log.Println("Closing all websocket connections")
        for id, conn := range connections {
            closeConnection(id, conn)
        }
        os.Exit(0)
    }()
    r := mux.NewRouter()
    r.HandleFunc("/{id}", wsHandler)
    err := http.ListenAndServe(fmt.Sprintf(":%d", *argPort), r)
    if err != nil {
        log.Println("Could not start http server")
        log.Println(err)
    }
}
closeConnection does 4 things:
conn.Close()
sets conn as nil
removes the id from the map
calls an AWS Lambda function
The same function is called as a deferred function inside the wsHandler function, so if a client disconnects on its own, I execute the function in the handler.
It's all working nicely, except that when I Ctrl+C the server, my closeConnection function is called twice per client: once in the graceful-stop handler and once in the wsHandler defer.
I tried checking in my closeConnection function whether the connection is still defined in connections, but it returns true both times.
I thought they were called twice because they run in different goroutines, so I replaced the for loop above with just a time.Sleep(2 * time.Second), but in this case nothing happens (the closeConnection inside the wsHandler defer is not even called).
This is what I mean:
go func() {
    sig := <-gracefulStop
    log.Printf("Exiting from process due to %+v", sig)
    log.Println("Closing all websocket connections")
    // for chargeboxIdentity, conn := range connections {
    //     chargeboxDisconnected(chargeboxIdentity, conn)
    // }
    time.Sleep(2 * time.Second)
    os.Exit(0)
}()
EDIT: Here is the closeConnection function:
func closeConnection(id string, conn *websocket.Conn) {
    _, ok := connections[id]
    log.Println(ok)
    log.Printf("%s (%s) disconnected", id, conn.RemoteAddr())
    conn.WriteMessage(websocket.CloseMessage, websocket.FormatCloseMessage(websocket.CloseNormalClosure, ""))
    time.Sleep(300 * time.Millisecond)
    conn.Close()
    conn = nil
    delete(connections, id)
    request := LambdaPayload{ID: id}
    payload, err := json.Marshal(request)
    if err != nil {
        log.Println("Could not create payload for lambda call")
        log.Println(err)
        return
    }
    _, err = client.Invoke(&lambda.InvokeInput{FunctionName: aws.String(lambdaPrefix + "MainDisconnect"), Payload: payload})
    if err != nil {
        log.Println("Disconnect Lambda returned an error")
        log.Println(err)
    }
}
EDIT: Here's the wsHandler function:
func wsHandler(w http.ResponseWriter, r *http.Request) {
    conn, err := upgrader.Upgrade(w, r, nil)
    if err != nil {
        log.Println("Could not upgrade websocket connection")
        log.Println(err)
        return
    }
    vars := mux.Vars(r)
    if !clientConnected(vars["id"], conn) {
        return
    }
    defer closeConnection(vars["id"], conn)
    for {
        msgType, msg, err := conn.ReadMessage()
        if err != nil {
            break
        }
        log.Printf("%s sent: %s", vars["id"], string(msg))
        // ... stuff ...
    }
}
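One common way to stop the double call, sketched here as an assumption rather than a drop-in fix: make the close idempotent by letting only the caller that removes the id from the map proceed (the mutex and the closeConnectionOnce name are hypothetical additions; the connections map is touched from several goroutines, so it needs locking anyway):

    var connMu sync.Mutex

    func closeConnectionOnce(id string) {
        connMu.Lock()
        conn, ok := connections[id]
        if ok {
            delete(connections, id) // the first caller claims the connection
        }
        connMu.Unlock()
        if !ok {
            return // already closed by the other path (signal handler or defer)
        }
        conn.WriteMessage(websocket.CloseMessage,
            websocket.FormatCloseMessage(websocket.CloseNormalClosure, ""))
        conn.Close()
        // ... invoke the disconnect Lambda as before ...
    }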

Golang: Selecting DB on a RedisPool in Redigo

Using redigo, I create a pool, something like so:
&redis.Pool{
    MaxIdle:   80,
    MaxActive: 12000, // max number of connections
    Dial: func() (redis.Conn, error) {
        c, err := redis.Dial("tcp", host+":"+port)
        if err != nil {
            panic(err.Error())
        }
        return c, err
    },
}
The problem I have is that each time I get a new connection, I need to select the DB, as I use different Redis DBs since I host a number of sites on the VPS.
So, something like this:
conn := pool.Get()
defer conn.Close()
conn.Do("SELECT", dbNumber) // this is the call I want to avoid
Having to select the DB each time I work with Redis seems redundant, and it also poses a problem since I use it for sessions: code that is not mine works with connections from my pool, which makes it "impossible" to set the correct DB for them.
What I would like to do is set the DB number on the pool, so that whenever somebody asks for a new connection from the pool, it comes with the correct DB already selected, i.e. without setting it explicitly each time.
How did you solve this in your applications?
Thanks.
You can use redis.DialOptions such as redis.DialDatabase and redis.DialPassword:
conn, err := redis.Dial("tcp", "127.0.0.1:6379", redis.DialDatabase(1))
if err != nil {
    panic(err)
}
defer conn.Close()
Select the database in your dial function:
&redis.Pool{
    MaxIdle:   80,
    MaxActive: 12000, // max number of connections
    Dial: func() (redis.Conn, error) {
        c, err := redis.Dial("tcp", host+":"+port)
        if err != nil {
            return nil, err
        }
        if _, err = c.Do("SELECT", dbNum); err != nil {
            c.Close()
            return nil, err
        }
        return c, nil
    },
}
Also, return the error from dial instead of panicking.
If these libs don't support it, then you have two options:
Submit a patch to automate this (the Python lib does that, but be careful about how the state is kept).
Wrap your redis pool with your own custom pool that automates this, something like (untested code, but you'll get the idea):
// a pool embedding the original pool and adding a dbno state
type DbnoPool struct {
    redis.Pool
    dbno int
}

// "overriding" the Get method
func (p *DbnoPool) Get() redis.Conn {
    conn := p.Pool.Get()
    conn.Do("SELECT", p.dbno)
    return conn
}

pool := &DbnoPool{
    Pool: redis.Pool{
        MaxIdle:   80,
        MaxActive: 12000, // max number of connections
        Dial: func() (redis.Conn, error) {
            c, err := redis.Dial("tcp", host+":"+port)
            if err != nil {
                return nil, err
            }
            return c, nil
        },
    },
    dbno: 3, // the db number
}
// now you call it normally
conn := pool.Get()
defer conn.Close()
The best way is to use DialOptions like DialDatabase:
redisPool = &redis.Pool{
    MaxIdle:     AppConfig.DefaultInt("RedisMaxPool", 10),
    IdleTimeout: 240 * time.Second,
    Dial: func() (redis.Conn, error) {
        c, err := redis.Dial(
            "tcp",
            AppConfig.DefaultString("RedisPath", ":6379"),
            redis.DialDatabase(AppConfig.DefaultInt("RedisDB", 1)),
        )
        if err != nil {
            return nil, err
        }
        return c, err
    },
    TestOnBorrow: func(c redis.Conn, t time.Time) error {
        _, err := c.Do("PING")
        return err
    },
}
