Hypothetically speaking, is it good practice to connect to a database for each request and close it when the request has completed?
I'm using mongodb with mgo for the database.
In my project, I would like to connect to a certain database by getting the database name from the request header (of course, this is combined with an authentication mechanism, e.g. JWT in my app). The flow goes something like:
User authentication:
POST to http://api.app.com/authenticate
// which checks the user in a "global" database,
// authenticates them and returns a signed JWT token
// The token is stored in bolt.db for the authentication mechanism
Some RESTful operations
POST to http://api.app.com/v1/blog/posts
// JWT middleware for each request to /v1* is set up
// `Client-Domain` in header is set to a database's name, e.g 'app-com'
// so we open a connection to that database and close when
// request finishes
So my questions are:
Is this feasible? I've heard about connection pools and reusing connections, but I haven't read much about them yet
Is there a better way of achieving the desired functionality?
How do I ensure the session is only closed when the request has completed?
The reason I need to do this is that we have multiple vendors whose databases have the same collections but different entries, and each vendor's access is restricted to its own database.
Update / Solution
I ended up using Go's built-in Context: I copy a session for each request and use it anywhere I need to do CRUD operations.
Something like:
func main() {
...
// Configure connection and set in global var
model.DBSession, err = mgo.DialWithInfo(mongoDBDialInfo)
if err != nil {
log.Fatal(err) // without a session there is nothing to Close or Copy
}
defer model.DBSession.Close()
...
n := negroni.Classic()
n.Use(negroni.HandlerFunc(Middleware))
...
}
func Middleware(res http.ResponseWriter, req *http.Request, next http.HandlerFunc) {
...
db := NewDataStore(clientDomain)
// db.Close() is an alias for ds.session.Close(); the code for this function is not included in this post
// I'm still experimenting with this: I need to make sure the session is only closed after a request has completed, and currently that is not always the case
defer db.Close()
ctx := req.Context()
ctx = context.WithValue(ctx, auth.DataStore, db)
req = req.WithContext(ctx)
...
}
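func NewDataStore(db string) *DataStore {
// Copy once and share the copy, so that closing ds.session also
// releases the session that ds.db uses.
session := DBSession.Copy()
store := &DataStore{
db: session.DB(db),
session: session,
}
return store
}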
And then use it in a HandlerFunc, example /v1/system/users:
func getUsers(res http.ResponseWriter, req *http.Request) {
db := req.Context().Value(auth.DataStore).(*model.DataStore)
users := make([]SystemUser, 0)
// db.C() is an alias for ds.db.C(), code for this function is not included in this post
db.C("system_users").Find(nil).All(&users)
}
This gave a 40% decrease in response time compared with the original method I experimented with.
Hypothetically speaking, it is not good practice because:
The database logic is scattered among several packages.
It's difficult to test
You can't apply DI (mainly it will be hard to maintain the code)
Replying to your questions:
Yes, it is feasible, BUT you will not benefit from the connection pool inside the mgo package (take a look at the code here if you want to know more about the connection pool)
A better way is to create a global variable that holds the database connection and to close it when the application is about to stop (rather than closing the connection on every request)
How do I ensure the session is only closed when the request has completed? You should check the answer from your db query and then close the connection (but I don't recommend closing the connection after a request, because you'll need to open it again for the next request, close it again, and so on)
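For the third question, one common approach is to defer the cleanup in the middleware itself, before calling the next handler: the deferred call then runs only once the downstream handler has returned. The following is a minimal sketch of that idea using only the standard library. The `store` type, `storeKey`, `withStore` and `handle` are hypothetical names; `store` stands in for a copied session and is not the mgo API.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// store is a hypothetical stand-in for a copied database session.
type store struct {
	name   string
	closed bool
}

func (s *store) Close() { s.closed = true }

// storeKey is an unexported context key type, so other packages cannot collide with it.
type storeKey struct{}

// withStore creates one store per request and closes it only after the
// wrapped handler has returned.
func withStore(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		s := &store{name: r.Header.Get("Client-Domain")}
		defer s.Close() // deferred until next.ServeHTTP below has returned
		ctx := context.WithValue(r.Context(), storeKey{}, s)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// handle sends one request through the middleware and reports whether the
// store was open inside the handler and closed once the request finished.
func handle(domain string) (openInHandler, closedAfter bool) {
	var seen *store
	h := withStore(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen = r.Context().Value(storeKey{}).(*store)
		openInHandler = !seen.closed
		fmt.Fprintf(w, "using database %s", seen.name)
	}))
	req := httptest.NewRequest(http.MethodPost, "/v1/blog/posts", nil)
	req.Header.Set("Client-Domain", domain)
	h.ServeHTTP(httptest.NewRecorder(), req)
	return openInHandler, seen.closed
}

func main() {
	inHandler, after := handle("app-com")
	fmt.Println("open in handler:", inHandler, "closed afterwards:", after)
}
```

Note that this only covers work done synchronously inside the handler. A goroutine started by the handler that outlives it would see an already closed store.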
I’m currently maintaining a few HTTP APIs based on the standard library and gorilla mux and running in kubernetes (GKE).
We’ve adopted the http.TimeoutHandler as our “standard” way to have a consistent timeout error management.
A typical endpoint implementation will use the following “chain”:
MonitoringMiddleware => TimeoutMiddleware => … => handler
so that we can monitor a few key metrics per endpoint.
One of our APIs is typically used in a “fire and forget” mode, meaning that clients will push some data and not care about the API response. We are facing the issue that
the Golang standard HTTP server will cancel a request context when the client connection is no longer active (godoc)
the TimeoutHandler will return a “timeout” response whenever the request context is done (see code)
This means that we are not processing requests to completion when the client disconnects which is not what we want and I’m therefore looking for solutions.
The only discussion I could find that somewhat relates to my issue is https://github.com/golang/go/issues/18527; however
The workaround is your application can ignore the Handler's Request.Context()
would mean that the monitoring middleware would not report the "proper" status since the Handler would perform the request processing in its goroutine but the TimeoutHandler would be enforcing the status and observability would be broken.
For now, I’m not considering removing our middlewares as they’re helpful to have consistency across our APIs both in terms of behaviours and observability. My conclusion so far is that I need to “fork” the TimeoutHandler and use a custom context for when a handler should not depend on whether the client is waiting for the response.
The gist of my current idea is to have:
type TimeoutHandler struct {
handler Handler
body string
dt time.Duration
// BaseContext optionally specifies a function that returns
// the base context for controlling the server request processing.
// If BaseContext is nil, the default is req.Context().
// If non-nil, it must return a non-nil context.
BaseContext func(*http.Request) context.Context
}
func (h *TimeoutHandler) ServeHTTP(w ResponseWriter, r *Request) {
reqCtx := r.Context()
if h.BaseContext != nil {
reqCtx = h.BaseContext(r)
}
ctx, cancelCtx := context.WithTimeout(reqCtx, h.dt)
defer cancelCtx()
r = r.WithContext(ctx)
...
case <-reqCtx.Done():
tw.mu.Lock()
defer tw.mu.Unlock()
w.WriteHeader(499) // write status for monitoring;
// no need to write a body since no client is listening.
case <-ctx.Done():
tw.mu.Lock()
defer tw.mu.Unlock()
w.WriteHeader(StatusServiceUnavailable)
io.WriteString(w, h.errorBody())
tw.timedOut = true
}
The middleware BaseContext callback would return context.Background() for requests to the “fire and forget” endpoint.
One thing I don’t like is that in doing so I’m losing any context keys written so this new middleware would have strong usage constraints. Overall I feel like this is more complex than it should be.
Am I completely missing something obvious?
Any feedback on API instrumentation (maybe our middlewares are an antipattern) /fire and forget implementations would be welcomed!
EDIT: as most comments are that a request for which the client does not wait for a response has unspecified behavior, I checked for more information on typical clients for which this happens.
From our logs, this happens for user agents that seem to be mobile devices. I can imagine that connections can be much more unstable and the problem will likely not disappear.
I would therefore not conclude that I shouldn't find a solution since this is currently creating false-positive alerts.
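On the concern about losing context keys: instead of returning `context.Background()`, the `BaseContext` callback could return a context that still delegates `Value` to the request context but never reports cancellation. Below is a minimal sketch of such a wrapper. The names `detached`, `requestIDKey` and `demo` are hypothetical. Since Go 1.21 the standard library's `context.WithoutCancel` provides the same behaviour, so the hand-written type is only needed on older versions.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// detached keeps the parent's values but ignores its cancellation and deadline.
type detached struct{ parent context.Context }

func (detached) Deadline() (time.Time, bool)         { return time.Time{}, false }
func (detached) Done() <-chan struct{}               { return nil }
func (detached) Err() error                          { return nil }
func (d detached) Value(key interface{}) interface{} { return d.parent.Value(key) }

type requestIDKey struct{}

// demo cancels the parent (as the server does when a client disconnects) and
// reports what a timeout context derived from the detached wrapper still sees.
func demo() (id string, parentCanceled, workCanceled bool) {
	parent, cancel := context.WithCancel(
		context.WithValue(context.Background(), requestIDKey{}, "req-1"))

	// The server-side timeout still applies on top of the detached context.
	work, stop := context.WithTimeout(detached{parent}, time.Minute)
	defer stop()

	cancel() // simulate the client going away

	id, _ = work.Value(requestIDKey{}).(string)
	return id, parent.Err() != nil, work.Err() != nil
}

func main() {
	id, parentCanceled, workCanceled := demo()
	fmt.Println(id, parentCanceled, workCanceled)
}
```

With this, a callback such as `func(r *http.Request) context.Context { return detached{r.Context()} }` would keep the request-scoped values while leaving the timeout as the only source of cancellation.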
There is this excellent blog post by Jack Lindamood How to correctly use context.Context in Go 1.7 which boils down to the following money quote:
Context.Value should inform, not control. This is the primary mantra that I feel should guide if you are using context.Value correctly. The
content of context.Value is for maintainers not users. It should never
be required input for documented or expected results.
Currently, I am using Context to transport the following information:
RequestID which is generated on the client-side passed to the Go backend and it solely travels through the command-chain and is then inserted in the response again. Without the RequestID in the response, the client-side would break though.
SessionID identifies the WebSocket session. This is important when certain responses are generated in asynchronous computations (e.g. worker queues), in order to identify the WebSocket session on which the response should be sent.
When taking the definition very seriously I would say both violate the intention of context.Context but then again their values do not change any behavior while the whole request is made, it's only relevant when generating the response.
What's the alternative? Having the context.Context for metadata in the server API actually helps to maintain lean method signatures because this data is really irrelevant to the API but only important for the transport layer which is why I am reluctant to create something like a request struct:
type Request struct {
RequestID string
SessionID string
}
and make it part of every API method which solely exists to be passed through before sending a response.
Based on my understanding, context should be limited to passing things like a request or session ID. In my application, I do something like the following in one of my middlewares, which helps with observability:
if next != nil {
if requestID != "" {
b := context.WithValue(r.Context(), "requestId", requestID)
r = r.WithContext(b)
}
next.ServeHTTP(w, r)
}
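The `context.WithValue` documentation advises against using a built-in type such as `string` as the key, to avoid collisions between packages. A minimal sketch of the usual alternative, an unexported key type with accessor functions, is shown here. The names `ctxKey`, `WithRequestID`, `RequestID` and `demo` are hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is unexported, so no other package can construct an equal key.
type ctxKey int

const requestIDKey ctxKey = iota

// WithRequestID returns a copy of ctx that carries the request ID.
func WithRequestID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, requestIDKey, id)
}

// RequestID extracts the request ID, reporting whether one was set.
func RequestID(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(requestIDKey).(string)
	return id, ok
}

// demo stores an ID and checks that a plain string key does not see it.
func demo(id string) (got string, found, stringKeySees bool) {
	ctx := WithRequestID(context.Background(), id)
	got, found = RequestID(ctx)
	stringKeySees = ctx.Value("requestId") != nil
	return got, found, stringKeySees
}

func main() {
	fmt.Println(demo("abc-123"))
}
```

The accessors also keep the type assertion in one place, so callers never touch `ctx.Value` directly.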
Given that I want to use a different proxy per request I did the following:
var proxies chan *url.URL
var anonymousClient = &http.Client{Transport: &http.Transport{Proxy: func(r *http.Request) (*url.URL, error) {
fmt.Println("Called")
p := <-proxies
proxies <- p
return p, nil
}}}
If I make 10 GET requests using the above client, Called gets printed only once. Shouldn't it be printed for every request?
It looks to me as if the result of the first call to that function gets cached and it's called only once, but I could be wrong. Any ideas?
From the net/http package documentation:
By default, Transport caches connections for future re-use. This may leave many open connections when accessing many hosts. This behavior can be managed using Transport's CloseIdleConnections method and the MaxIdleConnsPerHost and DisableKeepAlives fields.
Transports should be reused instead of created as needed. Transports are safe for concurrent use by multiple goroutines.
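Independently of connection caching, the `proxies` channel in the question is worth checking: as declared it is nil unless it is created with `make` elsewhere, and a receive from a nil channel blocks forever. A rotation that cannot block can be sketched with a mutex-protected index instead of a channel. This is only an illustration: `rotator`, `newRotator`, `newClient`, `picks` and the proxy URLs are hypothetical, and the sketch calls the selector directly rather than making claims about how often the transport calls it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"sync"
)

// rotator hands out proxies in round-robin order and is safe for concurrent use.
type rotator struct {
	mu      sync.Mutex
	proxies []*url.URL
	next    int
}

func newRotator(raw ...string) (*rotator, error) {
	r := &rotator{}
	for _, s := range raw {
		u, err := url.Parse(s)
		if err != nil {
			return nil, err
		}
		r.proxies = append(r.proxies, u)
	}
	return r, nil
}

// Proxy has the signature expected by the http.Transport Proxy field.
func (r *rotator) Proxy(*http.Request) (*url.URL, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.proxies) == 0 {
		return nil, nil // a nil URL means no proxy is used
	}
	p := r.proxies[r.next]
	r.next = (r.next + 1) % len(r.proxies)
	return p, nil
}

// newClient wires the rotator into a single, reusable transport.
func newClient(r *rotator) *http.Client {
	return &http.Client{Transport: &http.Transport{Proxy: r.Proxy}}
}

// picks calls the selector n times and returns the chosen proxy hosts.
func picks(n int) []string {
	r, err := newRotator("http://proxy-a.example:8080", "http://proxy-b.example:8080")
	if err != nil {
		panic(err)
	}
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		u, _ := r.Proxy(nil)
		out = append(out, u.Host)
	}
	return out
}

func main() {
	fmt.Println(picks(3))
}
```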
I am currently working on an application relaying data sent from a mobile phone via a server to a browser using WebSockets. I am writing the server in go and I have a one-to-one relation between the mobile phones and the browsers as shown by the following illustration.
However, I want multiple sessions to work simultaneously.
I have read that go provides concurrency models that follow the principle "share memory by communicating" using goroutines and channels. I would prefer using the mentioned principle rather than locks using the sync.Mutex primitive.
Nevertheless, I have not been able to map this information to my issue and wanted to ask you if you could suggest a solution.
I had a problem similar to yours: I needed multiple connections that each send data to each other through multiple servers.
I went with the WAMP protocol
WAMP is an open standard WebSocket subprotocol that provides two application messaging patterns in one unified protocol:
Remote Procedure Calls + Publish & Subscribe.
You can also take a look at a project of mine which is written in go and uses the protocol at hand: github.com/neutrinoapp/neutrino
There's nothing wrong with using a mutex in Go. Here's a solution using a mutex.
Declare a map of endpoints. I assume that a string key is sufficient to identify an endpoint:
type endpoint struct {
c *websocket.Conn
sync.Mutex // protects write to c
}
var (
endpoints = map[string]*endpoint{}
endpointsMu sync.Mutex // protects endpoints
)
func addEndpoint(key string, c *websocket.Conn) {
endpointsMu.Lock()
endpoints[key] = &endpoint{c:c}
endpointsMu.Unlock()
}
func removeEndpoint(key string) {
endpointsMu.Lock()
delete(endpoints, key)
endpointsMu.Unlock()
}
func sendToEndpoint(key string, message []byte) error {
endpointsMu.Lock()
e := endpoints[key]
endpointsMu.Unlock()
if e == nil {
return errors.New("no endpoint")
}
e.Lock()
defer e.Unlock()
return e.c.WriteMessage(websocket.TextMessage, message)
}
Add the connection to the map with addEndpoint when the client connects. Remove the connection from the map with removeEndpoint when closing the connection. Send messages to a named endpoint with sendToEndpoint.
The Gorilla chat example can be adapted to solve this problem. Change the hub map to connections map[string]*connection, update channels to send a type with connection and key and change the broadcast loop to send to a single connection.
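For readers who prefer the "share memory by communicating" style, the hub idea can be sketched with the standard library alone: a single goroutine owns the map of endpoints, and every other goroutine talks to it over channels. This is a simplified illustration rather than the Gorilla example itself. The `writer` interface is a hypothetical stand-in for a WebSocket connection, and `hub`, `registration`, `delivery`, `recorder` and `demo` are made-up names.

```go
package main

import (
	"errors"
	"fmt"
)

// writer is a hypothetical stand-in for a WebSocket connection.
type writer interface{ Write(msg []byte) error }

type registration struct {
	key string
	w   writer // nil means "remove this key"
}

type delivery struct {
	key    string
	msg    []byte
	result chan error
}

type hub struct {
	register chan registration
	send     chan delivery
}

func newHub() *hub {
	h := &hub{register: make(chan registration), send: make(chan delivery)}
	go h.run()
	return h
}

// run owns the endpoints map; no other goroutine ever touches it.
func (h *hub) run() {
	endpoints := map[string]writer{}
	for {
		select {
		case r := <-h.register:
			if r.w == nil {
				delete(endpoints, r.key)
			} else {
				endpoints[r.key] = r.w
			}
		case d := <-h.send:
			if w, ok := endpoints[d.key]; ok {
				d.result <- w.Write(d.msg)
			} else {
				d.result <- errors.New("no endpoint")
			}
		}
	}
}

func (h *hub) Add(key string, w writer) { h.register <- registration{key: key, w: w} }
func (h *hub) Remove(key string)        { h.register <- registration{key: key} }

// Send routes msg to the endpoint registered under key.
func (h *hub) Send(key string, msg []byte) error {
	d := delivery{key: key, msg: msg, result: make(chan error, 1)}
	h.send <- d
	return <-d.result
}

// recorder is a fake endpoint that remembers what it was sent.
type recorder struct{ got []string }

func (r *recorder) Write(msg []byte) error { r.got = append(r.got, string(msg)); return nil }

func demo() (received []string, firstOK, secondFailed bool) {
	h := newHub()
	browser := &recorder{}
	h.Add("session-1", browser)
	firstOK = h.Send("session-1", []byte("hello")) == nil
	h.Remove("session-1")
	secondFailed = h.Send("session-1", []byte("late")) != nil
	return browser.got, firstOK, secondFailed
}

func main() {
	fmt.Println(demo())
}
```

One simplification to be aware of: the hub goroutine performs the write itself, so a slow endpoint delays every other session. Giving each connection its own outgoing channel and writer goroutine avoids that.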
As the title says, I don't know whether having multiple sql.Open statements is a good or a bad thing, or whether I should instead have a file with just an init, something like:
var db *sql.DB
func init() {
var err error
db, err = sql.Open
}
Just wondering what the best practice would be. Thanks!
You should at least check the error.
As mentioned in "Connecting to a database":
Note that Open does not directly open a database connection: this is deferred until a query is made. To verify that a connection can be made before making a query, use the Ping function:
if err := db.Ping(); err != nil {
log.Fatal(err)
}
After use, the database is closed using Close.
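The deferred connection described in that quote can be observed without a real database by registering a driver that merely counts connection attempts. The sketch below does that. `fakeDriver`, the driver name "fake", the DSN and `openAndPing` are all hypothetical and exist only for this illustration.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"sync/atomic"
)

// fakeDriver is a hypothetical driver that counts connection attempts and
// always fails, so the example needs no real database.
type fakeDriver struct{ attempts int32 }

func (d *fakeDriver) Open(dsn string) (driver.Conn, error) {
	atomic.AddInt32(&d.attempts, 1)
	return nil, errors.New("fake: no server at " + dsn)
}

var drv = &fakeDriver{}

// Register exactly once; registering the same name twice panics.
func init() { sql.Register("fake", drv) }

// openAndPing reports how many connection attempts were made during
// sql.Open and during Ping, plus the error returned by Ping.
func openAndPing() (duringOpen, duringPing int32, pingErr error) {
	start := atomic.LoadInt32(&drv.attempts)

	db, err := sql.Open("fake", "example-dsn")
	if err != nil {
		return 0, 0, err
	}
	defer db.Close()
	afterOpen := atomic.LoadInt32(&drv.attempts)

	pingErr = db.Ping() // the first real connection attempt happens here
	afterPing := atomic.LoadInt32(&drv.attempts)

	return afterOpen - start, afterPing - afterOpen, pingErr
}

func main() {
	duringOpen, duringPing, err := openAndPing()
	fmt.Println("attempts during Open:", duringOpen)
	fmt.Println("attempts during Ping:", duringPing)
	fmt.Println("ping error:", err)
}
```

This is why an unreachable database goes unnoticed at startup unless Ping (or a first query) is checked.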
If possible, limit the number of opened connections to a database to a minimum.
See "Go/Golang sql.DB reuse in functions":
You shouldn't need to open database connections all over the place.
The database/sql package does connection pooling internally, opening and closing connections as needed, while providing the illusion of a single connection that can be used concurrently.
As elithrar points out in the comment, the documentation for database/sql#Open does mention:
The returned DB is safe for concurrent use by multiple goroutines and maintains its own pool of idle connections.
Thus, the Open function should be called just once.
It is rarely necessary to close a DB.
As mentioned here
Declaring *sql.DB globally also has some additional benefits, such as SetMaxIdleConns (regulating connection pool size) or preparing SQL statements across your application.
You can use a function init, which will run even if you don't have a main():
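var db *sql.DB
func init() {
var err error // declared here so that db is assigned, not shadowed
db, err = sql.Open(DBparms....)
if err != nil {
log.Fatal(err)
}
}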
init() is always called, regardless of whether there's a main or not, so if you import a package that has an init function, it will be executed.
You can have multiple init() functions per package. They will be executed in the order they show up in the code (after all variables are initialized, of course).
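The ordering described above can be checked with a small program that records each step. This is a minimal single-file sketch, and `order`, `ready` and `record` are hypothetical names. When a package spans several files, the order between files depends on the order in which they are presented to the compiler.

```go
package main

import "fmt"

// order collects the initialization steps as they happen.
var order []string

// Package-level variables are initialized before any init function runs.
var ready = record("var")

func record(step string) bool {
	order = append(order, step)
	return true
}

// Several init functions may exist in one file; they run in source order.
func init() { record("init 1") }

func init() { record("init 2") }

func main() {
	record("main")
	fmt.Println(order) // prints [var init 1 init 2 main]
}
```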