Reload tensorflow model in Golang app server - go

I have a Golang app server in which I reload a saved TensorFlow model every 15 minutes. Every API call that uses the model takes a read lock on a mutex, and whenever I reload the model I take the write lock. Functionally this works fine, but during the model load my API response time increases because the request goroutines wait for the write lock to be released. Could you please suggest a better approach to keep the loaded model up to date?
Edit, Code updated
Model Load Code:
tags := []string{"serve"}
// load from updated saved model
var m *tensorflow.SavedModel
var err error
m, err = tensorflow.LoadSavedModel("/path/to/model", tags, nil)
if err != nil {
log.Errorf("Exception caught while reloading saved model %v", err)
destroyTFModel(m)
}
if err == nil {
ModelLoadMutex.Lock()
defer ModelLoadMutex.Unlock()
// destroy existing model
destroyTFModel(TensorModel)
TensorModel = m
}
Model Use Code(Part of the API request):
config.ModelLoadMutex.RLock()
defer config.ModelLoadMutex.RUnlock()
scoreTensorList, err = TensorModel.Session.Run(map[tensorflow.Output]*tensorflow.Tensor{
UserOp.Output(0): uT,
DataOp.Output(0): nT},
[]tensorflow.Output{config.SumOp.Output(0)},
nil,
)

Presumably destroyTFModel takes a long time. You could try this:
ModelLoadMutex.Lock()
old := TensorModel
TensorModel = m // m is the freshly loaded model
ModelLoadMutex.Unlock()
go destroyTFModel(old)
So destroy after assigning, and/or destroy on another goroutine if the cleanup takes long enough to block responses. I'd also look into what destroyTFModel does and why it is slow: does it make network requests to a database or touch the file system? Are you sure there isn't another lock, external to your app, that you're not aware of (for example, a file that is opened and locked for reads while the model is destroyed)?
Instead of wrapping the swap in if err == nil { ... }, consider returning early on error.
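The idea above can be sketched as follows. This is a minimal sketch, not the asker's actual code: the model type, its Close method, modelHolder and reload are hypothetical stand-ins for *tensorflow.SavedModel, destroyTFModel and the global TensorModel. The point it illustrates is that the write lock is held only for the pointer assignment, while loading and destroying happen outside it.

```go
package main

import (
	"fmt"
	"sync"
)

// model is a hypothetical stand-in for *tensorflow.SavedModel.
type model struct {
	version int
	closed  bool
}

// Close stands in for destroyTFModel.
func (m *model) Close() { m.closed = true }

// modelHolder guards the current model with a read/write mutex.
type modelHolder struct {
	mu      sync.RWMutex
	current *model
}

// Use runs fn with the current model while holding the read lock,
// so the model cannot be swapped out and destroyed mid-call.
func (h *modelHolder) Use(fn func(*model)) {
	h.mu.RLock()
	defer h.mu.RUnlock()
	fn(h.current)
}

// Swap installs next and returns the previous model. The write lock is
// held only for the pointer assignment, not for loading or destroying.
func (h *modelHolder) Swap(next *model) *model {
	h.mu.Lock()
	old := h.current
	h.current = next
	h.mu.Unlock()
	return old
}

// reload loads a new model outside the lock, swaps it in, then cleans up.
func reload(h *modelHolder, load func() (*model, error)) error {
	next, err := load() // slow part, no lock held
	if err != nil {
		return err // keep serving the old model
	}
	if old := h.Swap(next); old != nil {
		// Swap waited for all readers, and new readers see next,
		// so nothing that goes through Use can still touch old.
		old.Close()
	}
	return nil
}

func main() {
	h := &modelHolder{current: &model{version: 1}}
	err := reload(h, func() (*model, error) { return &model{version: 2}, nil })
	h.Use(func(m *model) { fmt.Println(m.version, err) }) // prints "2 <nil>"
}
```

This only stays safe as long as callers never keep the model pointer after Use returns; if they do, the old model needs reference counting before it can be destroyed.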

Related

How to start & stop heartbeat per session using context.WithCancel?

I'm currently implementing a Golang client for TypeDB and am struggling with its session-based heartbeat convention. Usually you implement a heartbeat per client, which is relatively easy: just run a goroutine in the background and send a heartbeat every few seconds.
TypeDB, however, chose to implement the heartbeat (they call it a pulse) on a per-session basis, which means that every time a new session gets created I have to start monitoring that session with a separate goroutine. Conversely, if the client closes a session, I have to stop the monitoring. What's particularly ugly is that I also have to check for stalled sessions every once in a while. There is a GH issue to switch over to a per-client heartbeat, but there is no ETA, so I have to make the session heartbeat work to prevent server-side session termination.
So far, my solution:
Create a new session
Open that session & check for error
If no error, add session to a hashmap keyed by session ID
This seems to work for now. The code, just for context, is here:
https://github.com/marvin-hansen/typedb-client-go/blob/main/src/client/v2/manager_session.go
For monitoring each session, I am mulling over two issues:
Closing a channel across multiple goroutines is a bit tricky and may lead to race conditions.
I would need some kind of error group to catch heartbeat failures i.e. in case the server shuts down or a network link error.
With all that in mind, I believe context.WithCancel might be a safe and sane solution.
What I came up with so far is this:
Pass the global context as parameter to the heartbeat function
Create a new context WithCancel for each session calling heartbeat
Run heartbeat in a goroutine until either cancel gets called (by stopMonitoring) or an error occurs
What's not so clear to me is how to track all the cancel functions returned for the tracked sessions, so that I stop the right goroutine for the session being closed.
Thank you for any hint to solve this.
The code:
func (s SessionManager) startMonitorSession(sessionID []byte) {
// How do I track each goRoutine per session
}
func (s SessionManager) stopMonitorSession(sessionID []byte) {
// How do I call the correct cancel function to stop the GoRoutine matching the session?
}
func (s SessionManager) runHeartbeat(ctx context.Context, sessionID []byte) context.CancelFunc {
// Create a new context, with its cancellation function from the original context
ctx, cancel := context.WithCancel(ctx)
go func() {
select {
case <-ctx.Done():
fmt.Println("Stopped monitoring session: ")
default:
err := s.sendPulseRequest(sessionID)
// If this operation returns an error
// cancel all operations using this local context created above
if err != nil {
cancel()
}
fmt.Println("done")
}
}()
// return cancel function for call site to close at a later stage
return cancel
}
func (s SessionManager) sendPulseRequest(sessionID []byte) error {
mtd := "sendPulse: "
req := requests.GetSessionPulseReq(sessionID)
res, pulseErr := s.client.client.SessionPulse(s.client.ctx, req)
if pulseErr != nil {
dbgPrint(mtd, "Heartbeat error. Close session")
return pulseErr
}
if res.Alive == false {
dbgPrint(mtd, "Server not alive anymore. Close session")
closeErr := s.CloseSession(sessionID)
if closeErr != nil {
return closeErr
}
}
// no error
return nil
}
Update:
Thanks to the comment(s) I managed to solve the bulk of the issue by wrapping session & CancelFunc in a dedicated struct, called TypeDBSession.
That way, the stop function simply pulls the CancelFunc from the struct, calls it, and stops the monitoring goroutine. With some more tweaking, the tests seem to pass, although this is not concurrency-safe for the time being.
That being said, this was a non-trivial issue to solve. Again, thanks for the comments!
If anyone is open to suggesting code improvements, especially with respect to making this concurrency-safe, feel free to comment here or file a GH issue / PR.
SessionType:
https://github.com/marvin-hansen/typedb-client-go/blob/main/src/client/v2/manager_session_type.go
SessionMonitoring:
https://github.com/marvin-hansen/typedb-client-go/blob/main/src/client/v2/manager_session_monitor.go
Tests:
https://github.com/marvin-hansen/typedb-client-go/tree/main/test/client/session
My two cents:
You may need to run the heartbeat repeatedly. Use a for loop with a time.Ticker around the select.
Store a map from session ID to cancel function (func()) to track all cancellable contexts. Since a []byte cannot be used as a map key, convert the ID to a string.

Should there be a new datastore.Client per HTTP request?

The official Go documentation on the datastore package (client library for the GCP datastore service) has the following code snippet for demonstration:
type Entity struct {
Value string
}
func main() {
ctx := context.Background()
// Create a datastore client. In a typical application, you would create
// a single client which is reused for every datastore operation.
dsClient, err := datastore.NewClient(ctx, "my-project")
if err != nil {
// Handle error.
}
k := datastore.NameKey("Entity", "stringID", nil)
e := new(Entity)
if err := dsClient.Get(ctx, k, e); err != nil {
// Handle error.
}
old := e.Value
e.Value = "Hello World!"
if _, err := dsClient.Put(ctx, k, e); err != nil {
// Handle error.
}
fmt.Printf("Updated value from %q to %q\n", old, e.Value)
}
As one can see, it states that the datastore.Client should ideally be instantiated only once in an application. Given that datastore.NewClient requires a context.Context, does that mean it should be instantiated once per HTTP request, or can it safely be instantiated once globally with context.Background()?
Each operation requires a context.Context object again (e.g. dsClient.Get(ctx, k, e)) so is that the point where the HTTP request's context should be used?
I'm new to Go and can't really find any online resources which explain something like this very well with real world examples and actual best practice patterns.
You may use any context.Context for the datastore client creation, including context.Background(); that's completely fine. Client creation may be lengthy: it may require connecting to a remote server, authenticating, fetching configuration and so on. If your use case has limited time, you may pass a context with a timeout to abort the operation. If creation takes longer than the time you have, you may also use a context with cancel and abort it at will. These are just options which you may or may not use, but the "tools" are provided via context.Context.
Later, when you use the datastore.Client while serving (HTTP) client requests, using the request's context is reasonable: if a request gets cancelled, so will its context, and so will the datastore operation you issued. That is the right outcome, because if the client cannot see the result there is no point in completing the query. By terminating the query early you may avoid consuming certain resources (e.g. datastore reads), and you may lower the server's load (by aborting jobs whose result will not be sent back to the client).

Measure upload speed when using http.ResponseWriter

Is there a way to measure a client's download speed when uploading a large quantity of data using an http.ResponseWriter?
Update for context: I'm writing a streaming download endpoint for blob storage which stores blobs in chunks. The files are very large, so loading and buffering whole blobs is not feasible. Being able to monitor the buffer state, bytes written or similar would allow better scheduling of the chunk downloads.
E.g. when Write()ing to the response, is there a way to check how much data is already queued?
An example of the context, but not using a file object.
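func downloadHandler(w http.ResponseWriter, req *http.Request, ps httprouter.Params) {
// Open some file.
f, err := os.Open("somefile.txt")
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
defer f.Close()
// Adjust the iteration speed of this loop to the client's download speed.
data := make([]byte, 1000)
for {
count, err := f.Read(data)
if count > 0 {
// Upload data chunk to client.
if _, werr := w.Write(data[:count]); werr != nil {
return
}
}
if err != nil {
// io.EOF marks the end of the file; any other read error also ends the stream.
break
}
}
}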
You could implement a custom http.ResponseWriter that measures bytes sent, and calculates throughput.
There are likely packages to do similar things already. Google found this one (which I haven't used).

Basic web tweaks that all applications should have

Currently my web app is just a router and handlers.
What are some important things I am missing to make this production worthy?
I believe I have to set the # of procs to ensure this uses maximum goroutines?
Should I be using output buffering?
Anything else you see missing that is best practice?
var (
templates = template.Must(template.ParseFiles("templates/home.html"))
)
func main() {
r := mux.NewRouter()
r.HandleFunc("/", WelcomeHandler)
http.ListenAndServe(":9000", r)
}
func WelcomeHandler(w http.ResponseWriter, r *http.Request) {
homePage, err := api.LoadHomePage()
if err != nil {
}
tmpl := "home"
renderTemplate(w, tmpl, homePage)
}
func renderTemplate(w http.ResponseWriter, tmpl string, hp *HomePage) {
err := templates.ExecuteTemplate(w, tmpl+".html", hp)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
}
}
You don't need to set/change runtime.GOMAXPROCS() as since Go 1.5 it defaults to the number of available CPU cores.
Buffering output? From the performance point of view, you don't need to. But there may be other considerations for which you may.
For example, your renderTemplate() function may send a broken response. Once template execution starts writing to the output, the HTTP status code and headers are sent before the data. If a template execution error occurs after that, ExecuteTemplate returns an error and your code attempts to send back an error response. At this point the headers are already written, so http.Error() cannot change the status code: the client receives a 200 response with partial page content followed by the error text, and the server logs a superfluous WriteHeader call.
One way to avoid this is to first render the template into a buffer (e.g. bytes.Buffer), and if no error is returned by the template execution, then you can write the content of the buffer to the response writer. If error occurs, then of course you won't write the content of the buffer, but send back an error response just like you did.
To sum it up, your code is production ready performance-wise (excluding the way you handle template execution errors).
WelcomeHandler should return when err != nil is true.
Log the error when one is hit to help investigation.
Place templates = template.Must(template.ParseFiles("templates/home.html")) in an init function and split it into separate lines. If template.ParseFiles returns an error, log it as fatal. If you have multiple templates to initialize, you can initialize them in goroutines with a common WaitGroup to speed up startup.
Since you are using mux, the question "HTTP Server is too clean with its URLs" might also be good to know about.
You might also want to reconsider the decision to let users know why they got the http.StatusInternalServerError response.
Setting GOMAXPROCS > 1 if you have more than one core would definitely be a good idea, but I would keep it below the number of cores available.

Running Multiple GTK WebKitWebViews via Goroutines

I'm using Go with the gotk3 and webkit2 libraries to try and build a web crawler that can parse JavaScript in the context of a WebKitWebView.
Thinking of performance, I'm trying to figure out what would be the best way to have it crawl concurrently (if not in parallel, with multiple processors), using all available resources.
GTK and everything with threads and goroutines are pretty new to me. Reading from the gotk3 goroutines example, it states:
Native GTK is not thread safe, and thus, gotk3's GTK bindings may not be used from other goroutines. Instead, glib.IdleAdd() must be used to add a function to run in the GTK main loop when it is in an idle state.
Go will panic and show a stack trace when I try to run a function, which creates a new WebView, in a goroutine. I'm not exactly sure why this happens, but I think it has something to do with this comment. An example is shown below.
Current Code
Here's my current code, which has been adapted from the webkit2 example:
package main
import (
"fmt"
"github.com/gotk3/gotk3/glib"
"github.com/gotk3/gotk3/gtk"
"github.com/sourcegraph/go-webkit2/webkit2"
"github.com/sqs/gojs"
)
func crawlPage(url string) {
web := webkit2.NewWebView()
web.Connect("load-changed", func(_ *glib.Object, i int) {
loadEvent := webkit2.LoadEvent(i)
switch loadEvent {
case webkit2.LoadFinished:
fmt.Printf("Load finished for: %v\n", url)
web.RunJavaScript("window.location.hostname", func(val *gojs.Value, err error) {
if err != nil {
fmt.Println("JavaScript error.")
} else {
fmt.Printf("Hostname (from JavaScript): %q\n", val)
}
//gtk.MainQuit()
})
}
})
glib.IdleAdd(func() bool {
web.LoadURI(url)
return false
})
}
func main() {
gtk.Init(nil)
crawlPage("https://www.google.com")
crawlPage("https://www.yahoo.com")
crawlPage("https://github.com")
crawlPage("http://deelay.me/2000/http://deelay.me/img/1000ms.gif")
gtk.Main()
}
It seems that creating a new WebView for each URL allows them to load concurrently. Having glib.IdleAdd() running in a goroutine, as per the gotk3 example, doesn't seem to have any effect (although I'm only doing a visual benchmark):
go glib.IdleAdd(func() bool { // Works
web.LoadURI(url)
return false
})
However, trying to create a goroutine for each crawlPage() call ends in a panic:
go crawlPage("https://www.google.com") // Panics and shows stack trace
I can run web.RunJavaScript() in a goroutine without issue:
switch loadEvent {
case webkit2.LoadFinished:
fmt.Printf("Load finished for: %v\n", url)
go web.RunJavaScript("window.location.hostname", func(val *gojs.Value, err error) { // Works
if err != nil {
fmt.Println("JavaScript error.")
} else {
fmt.Printf("Hostname (from JavaScript): %q\n", val)
}
//gtk.MainQuit()
})
}
Best Method?
The current methods I can think of are:
1. Spawn new WebViews to crawl each page, as shown in the current code. Track how many WebViews are opened and either continually delete and create new ones, or reuse a set number created initially, to where all available resources on the machine are used. Would this be limited in terms of processor cores being used?
2. Basic idea of #1, but running the binary multiple times (instead of one gocrawler process running on the machine, have four) to utilize all cores/resources.
3. Run the GUI (gtk3) portion of the app in its own goroutine. I could then pass data to other goroutines which do their own heavy processing, such as searching through content.
What would actually be the best way to run this code concurrently, if possible, and max out performance?
Update
Methods 1 and 2 are probably out of the picture, as I ran a test by spawning ~100 WebViews and they seem to load synchronously.

Resources