I have a list of strings that can contain anywhere from 1 to 100,000 elements. I want to verify each string and see whether it is stored in a database, which requires a network call.
To maximize efficiency, I want to spawn a goroutine for each element.
The goal is to return false if any of the verifications inside the goroutine function returns an error, and true if none do. So as soon as we find one error we can stop, since we already know the function will return false.
This is the basic idea, and the code below is the structure I've been thinking about using so far. I'd like to know if there is a better way (perhaps using channels?).
for _, id := range userIdList {
    go func(id string) {
        user, err := verifyId(id)
        if err != nil {
            return err
        }
        // ...
        // few more calls to other APIs for verifications
        if err != nil {
            return err
        }
    }(id)
}
I've written a small function that might be helpful to you.
Please take a look at limited parallel operations.
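In the same spirit, here is a minimal sketch of bounded, fail-fast verification using golang.org/x/sync/errgroup (verifyId is the function from the question; the limit of 100 is an arbitrary placeholder, not a recommendation):

import (
    "context"

    "golang.org/x/sync/errgroup"
)

// verifyAll reports whether every id verifies without error. The first
// failure cancels the shared context, so goroutines that haven't started
// their network call yet can bail out early.
func verifyAll(ctx context.Context, userIdList []string) bool {
    g, ctx := errgroup.WithContext(ctx)
    g.SetLimit(100) // cap in-flight verifications instead of one goroutine per element

    for _, id := range userIdList {
        id := id // capture the loop variable (required before Go 1.22)
        g.Go(func() error {
            if err := ctx.Err(); err != nil {
                return err // another verification already failed; stop early
            }
            _, err := verifyId(id)
            return err
        })
    }
    return g.Wait() == nil
}

errgroup cancels the shared context on the first error, which gives you the "stop at the first failure" behavior, and SetLimit keeps you from opening 100,000 network calls at once.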
I'm new to Go, so sorry for the silly question in advance!
I'm using the Gin framework and want to make multiple queries to the database within the same handler (database/sql + lib/pq):
userIds := []int{}
bookIds := []int{}
var id int

/* Handling first query here */
rows, err := pgClient.Query(getUserIdsQuery)
defer rows.Close()
if err != nil {
    return
}
for rows.Next() {
    err := rows.Scan(&id)
    if err != nil {
        return
    }
    userIds = append(userIds, id)
}

/* Handling second query here */
rows, err = pgClient.Query(getBookIdsQuery)
defer rows.Close()
if err != nil {
    return
}
for rows.Next() {
    err := rows.Scan(&id)
    if err != nil {
        return
    }
    bookIds = append(bookIds, id)
}
I have a few questions regarding this code (any improvements and best practices would be appreciated):
Does Go properly handle defer rows.Close() in such a case? I reassign the rows variable later in the code, so will the compiler track both values and properly close them at the end of the function?
Is it OK to reuse the shared id variable, or should I redeclare it inside each rows.Next() loop?
What's a better approach for having even more queries within one handler? Should I have some kind of Writer that accepts a query and a slice and populates the slice with the retrieved ids?
Thanks.
I've never worked with the go-pg library, and my answer mostly focuses on the other points, which are generic and not specific to Go or go-pg.
About defer rows.Close(): a deferred call evaluates its function value and receiver at the point of the defer statement, so the first defer closes the first result set and the second defer closes the second, and both run when the function returns. The real hazard is that you defer before checking err: if a query fails, rows is nil and the deferred Close panics, so check the error first. Regardless, defining two variables, like userRows and bookRows, is cleaner.
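A standalone sketch illustrating that evaluation rule:

package main

import "fmt"

func main() {
    s := "first"
    defer fmt.Println(s) // the argument is evaluated now, so this prints "first"
    s = "second"
    fmt.Println(s) // prints "second"; the deferred call then prints "first"
}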
Although I said I haven't worked with go-pg, I don't believe you need to iterate through the rows and scan each id manually; based on a quick look at the documentation, the library provides an API along these lines:
userIds := []int{}
err := pgClient.Query(&userIds, "select id from users where ...", args...)
Regarding your second question, it depends on what you mean by "OK". Since you're iterating synchronously, I don't think it would result in bugs, but stylistically, I personally wouldn't do it.
I think the best thing to do in your case is this:
// repo layer
func getUserIds(args whatever) ([]int, error) { ... }

// these can be exposed, based on your packaging logic
func getBookIds(args whatever) ([]int, error) { ... }

// service layer, or wherever you want to aggregate both queries
func getUserAndBookIds() ([]int, []int, error) {
    userIds, err := getUserIds(...)
    // potential error handling
    bookIds, err := getBookIds(...)
    // potential error handling
    return userIds, bookIds, nil // error handling was already done above
}
I think this code is easier to read/maintain. You won't face the variable reassignment and other issues.
You can take a look at the go-pg documentation for more details on how to improve your queries.
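Since the question actually uses database/sql with lib/pq rather than go-pg, here is a minimal sketch of the kind of helper the third question asks about (queryInts is a made-up name):

import "database/sql"

// queryInts runs a query that returns a single int column and collects
// the values into a slice.
func queryInts(db *sql.DB, query string, args ...interface{}) ([]int, error) {
    rows, err := db.Query(query, args...)
    if err != nil {
        return nil, err
    }
    defer rows.Close() // safe here: rows is non-nil once err == nil

    var ids []int
    for rows.Next() {
        var id int // a fresh variable per iteration; nothing shared
        if err := rows.Scan(&id); err != nil {
            return nil, err
        }
        ids = append(ids, id)
    }
    return ids, rows.Err() // surface any error that ended the iteration
}

The handler then shrinks to two calls, userIds, err := queryInts(...) and bookIds, err := queryInts(...), with no reassigned rows variable at all.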
Am I correct to assume that with the Go language, these two formulations are always equivalent?
func f() {
    // Do stuff
}

go f()

and

func f() {
    go func() {
        // do stuff
    }()
}
The question was basically answered in the comments, but although both examples do the same thing in the simple case, one may be preferred over the other depending on the actual goal.
One point the comments mention is letting the user of your code decide on concurrency rather than you (the writer) deciding. I think this rule of thumb is generally preferred, especially for people writing packages for others to use (even if the others are on your own team). I've also seen it espoused elsewhere on the internet, and I think it arose because in the early days of Go, people used (and abused) concurrency features just because they were available; for example, returning a channel from which you'd receive a value instead of just returning the value.
Another difference is that in the top example, f() may not be able to close over variables you want accessible when it runs as a goroutine; you'd have to pass everything into f() as a parameter. In the second example, the anonymous function in go func() {...} can close over anything in f().
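A contrived sketch of that difference (fetch, its URL parameter, and the results channel are invented here for illustration):

import (
    "fmt"
    "net/http"
)

func fetch(url string, results chan<- string) {
    // With `go fetch(url, results)` at the call site, url and results must
    // be passed in as parameters. The anonymous variant below instead
    // closes over them from fetch's scope.
    go func() {
        resp, err := http.Get(url)
        if err != nil {
            results <- err.Error()
            return
        }
        defer resp.Body.Close()
        results <- fmt.Sprintf("%s: %s", url, resp.Status)
    }()
}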
One example where I prefer the second style is starting servers. For example:
func (app *Application) start() {
    if app.HttpsServer != nil {
        go func() {
            err := app.HttpsServer.ListenAndServeTLS(
                app.Config.TLSCertificateFile,
                app.Config.TLSKeyFile)
            if err != nil && err != http.ErrServerClosed {
                // unexpected error
                log.Printf(log.Critical, "error with https server: %s", err)
            }
        }()
    }

    go func() {
        err := app.HttpServer.ListenAndServe()
        if err != nil && err != http.ErrServerClosed {
            // unexpected error
            log.Printf(log.Critical, "error with http server: %s", err)
        }
    }()
}
Here the intention is that Application is configured and controlled in main(), the servers (one on https, one on http) are started and program flow returns to main(). In my specific case, main() waits for a signal from the OS then shuts down the servers and exits. Both goroutines close over app and have access to the data it contains. Is this "good" or "bad"...who knows, but it works well for me.
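For context, the surrounding main() might look roughly like this (an assumed shape for illustration; newApplication and the shutdown timeout are placeholders, not the author's actual code):

import (
    "context"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    app := newApplication() // hypothetical constructor
    app.start()             // returns immediately; the servers run in goroutines

    // Block until the OS asks us to stop.
    stop := make(chan os.Signal, 1)
    signal.Notify(stop, os.Interrupt, syscall.SIGTERM)
    <-stop

    // Graceful shutdown is what makes the goroutines see
    // http.ErrServerClosed instead of a real error.
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()
    app.HttpServer.Shutdown(ctx)
}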
So essentially... "It depends".
I saw the errgroup example in the godoc, and it confuses me that it simply assigns each result into a shared results slice instead of using a channel in each search goroutine. Here's the code:
Google := func(ctx context.Context, query string) ([]Result, error) {
    g, ctx := errgroup.WithContext(ctx)
    searches := []Search{Web, Image, Video}
    results := make([]Result, len(searches))
    for i, search := range searches {
        i, search := i, search // https://golang.org/doc/faq#closures_and_goroutines
        g.Go(func() error {
            result, err := search(ctx, query)
            if err == nil {
                results[i] = result
            }
            return err
        })
    }
    if err := g.Wait(); err != nil {
        return nil, err
    }
    return results, nil
}
I'm not sure: is there any reason or implied rule that guarantees this is correct? Thanks!
The intent here is to make searches and results congruent. The result for the Web search is always at results[0], the result for the Image search always at results[1], etc. It also makes for a simpler example, because there is no need for an additional goroutine that consumes a channel.
If the goroutines sent their results into a channel instead, the result order would be unpredictable. If predictable result order is not a property you care about, feel free to use a channel.
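For comparison, here is a sketch of the same function rewritten to collect results through a channel (reusing the example's Search, Result, Web, Image, and Video); the behavioral difference is that results now arrive in completion order:

googleViaChannel := func(ctx context.Context, query string) ([]Result, error) {
    g, ctx := errgroup.WithContext(ctx)
    searches := []Search{Web, Image, Video}
    ch := make(chan Result, len(searches)) // buffered, so senders never block

    for _, search := range searches {
        search := search
        g.Go(func() error {
            result, err := search(ctx, query)
            if err == nil {
                ch <- result
            }
            return err
        })
    }
    if err := g.Wait(); err != nil {
        return nil, err
    }
    close(ch) // safe: all senders finished once Wait returns

    results := make([]Result, 0, len(searches))
    for r := range ch {
        results = append(results, r) // completion order, not searches order
    }
    return results, nil
}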
There is secret sauce in the original example's code that creates siloing:
results := make([]Result, len(searches))
^^^^ ^^^^^^^^^^^^^
for i, search := ... {
i, search := i, search
^^^^^^^^^^
g.Go {
results[i] = result
^^^^^^^^^^
}
We know how big the result set is going to be, so we pre-allocate all the slots before starting any goroutines. This eliminates any contention over the slice object itself:
make(.., len(searches))
^^^^ ^^^^^^^^^^^^^
We then shadow the index and search variables for each iteration, so there is no contention over the variables shared between the loop and the goroutines:
i, search := i, search
And finally, each worker operates on a single slot in the pre-sized slice:
results[i] = result
The workers never modify the slice header itself; each one only reads it to locate its own element and writes exclusively to that slot (results[i]).
This particular pattern is limiting: you can't use any of the results until all the workers have completed. So when deciding between this and a channel-based pipeline workflow, ask yourself what you're going to do next.
results := getSearchResults(searches)
statistics := analyzeResults(results)
for stats := range statistics {
    fmt.Fprintf(out, "{%s}\n", stats.String())
}
If the analysis of a given result is independent of any other, this is a good candidate for a channel-based workflow.
But if the analysis depends on ordering, or the results depend on each other, then you may have no choice but to serialize the flow.
I want to write a test for a function that includes a call to fmt.Scanf(), but I'm having trouble passing the required input to it.
Is there a better way to do this, or do I need to mock fmt.Scanf()?
The function to be tested is given here:
https://github.com/apsdehal/Konsoole/blob/master/parser.go#L28
// Initializes the network interface by finding all the available devices,
// displays them to the user, and finally opens the one the user selects
func Init() *pcap.Pcap {
    devices, err := pcap.Findalldevs()
    if err != nil {
        fmt.Fprintf(errWriter, "[-] Error, pcap failed to initialize")
    }
    if len(devices) == 0 {
        fmt.Fprintf(errWriter, "[-] No devices found, quitting!")
        os.Exit(1)
    }
    fmt.Println("Select one of the devices:")
    var i int = 1
    for _, x := range devices {
        fmt.Println(i, x.Name)
        i++
    }
    var index int
    fmt.Scanf("%d", &index)
    handle, err := pcap.Openlive(devices[index-1].Name, 65535, true, 0)
    if err != nil {
        fmt.Fprintf(errWriter, "Konsoole: %s\n", err)
        errWriter.Flush()
    }
    return handle
}
It's theoretically possible to change the behavior of Scanf by hot-swapping the value of os.Stdin with some other *os.File. I wouldn't particularly recommend it just for testing purposes, though.
A better option would be to make your Init take an io.Reader that you pass to fmt.Fscanf.
Overall, however, it would likely be better to separate your device-initialization code from your input handling as much as possible. That probably means having one function that returns the device list and another that opens a device; you only need to prompt for a selection in live/main code.
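A sketch of that refactor (selectDevice is a hypothetical function split out of Init's prompting code):

import (
    "fmt"
    "io"
)

// selectDevice prompts on out, reads the chosen index from in, and
// validates it. Tests can drive it without touching os.Stdin.
func selectDevice(in io.Reader, out io.Writer, names []string) (int, error) {
    fmt.Fprintln(out, "Select one of the devices:")
    for i, name := range names {
        fmt.Fprintln(out, i+1, name)
    }
    var index int
    if _, err := fmt.Fscanf(in, "%d", &index); err != nil {
        return 0, err
    }
    if index < 1 || index > len(names) {
        return 0, fmt.Errorf("device index %d out of range", index)
    }
    return index, nil
}

A test can then feed it canned input, e.g. selectDevice(strings.NewReader("2\n"), io.Discard, []string{"eth0", "wlan0"}).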
The problem is this: there is a web server, and I figured it would be beneficial to use goroutines for page loading, so I called the loadPage function as a goroutine. However, when I do this, the server simply stops working without errors and prints a blank, white page. The problem has to be in the function itself; something there conflicts with the goroutine somehow.
These are the relevant functions:
func loadPage(w http.ResponseWriter, path string) {
    s := GetFileContent(path)
    w.Header().Add("Content-Type", getHeader(path))
    w.Header().Add("Content-Length", GetContentLength(path))
    fmt.Fprint(w, s)
}

func GetFileContent(path string) string {
    cont, err := ioutil.ReadFile(path)
    e(err)
    aob := len(cont)
    s := string(cont[:aob])
    return s
}
func getHeader(path string) string {
    images := []string{".jpg", ".jpeg", ".gif", ".png"}
    readable := []string{".htm", ".html", ".php", ".asp", ".js", ".css"}
    if ArrayContainsSuffix(images, path) {
        return "image/jpeg"
    }
    if ArrayContainsSuffix(readable, path) {
        return "text/html"
    }
    return "file/downloadable"
}

func ArrayContainsSuffix(arr []string, c string) bool {
    length := len(arr)
    for i := 0; i < length; i++ {
        s := arr[i]
        if strings.HasSuffix(c, s) {
            return true
        }
    }
    return false
}
The reason this happens is that your HandlerFunc, which calls loadPage, runs synchronously with the request. When you call loadPage in a goroutine, the handler actually returns immediately, which causes the response to be sent immediately. That's why you get a blank page.
You can see this in server.go (line 1096):
serverHandler{c.server}.ServeHTTP(w, w.req)
if c.hijacked() {
    return
}
w.finishRequest()
The ServeHTTP function calls your handler, and as soon as it returns it calls finishRequest. So your handler function must block for as long as it wants to fulfill the request.
Using a goroutine will not actually make your page any faster. Synchronizing a single goroutine with a channel, as Philip suggests, will also not help you in this case, as that would be the same as not having the goroutine at all.
The root of your problem is actually ioutil.ReadFile, which buffers the entire file into memory before sending it.
If you want to stream the file, you need os.Open. You can then use io.Copy to stream the contents of the file to the browser, which will use chunked encoding.
That would look something like this:
f, err := os.Open(path)
if err != nil {
    http.Error(w, "Not Found", http.StatusNotFound)
    return
}
defer f.Close() // close the file once the copy is done

n, err := io.Copy(w, f)
if n == 0 && err != nil {
    // nothing was written yet, so an error status can still be sent
    http.Error(w, "Error", http.StatusInternalServerError)
    return
}
If for some reason you need to do work in multiple goroutines, take a look at sync.WaitGroup. Channels can also work.
If you are just trying to serve a file, there are options optimized for this, such as http.FileServer or http.ServeFile.
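For the simple case, that would be a sketch like this (the path is a placeholder):

import "net/http"

func fileHandler(w http.ResponseWriter, r *http.Request) {
    // ServeFile sets Content-Type and Content-Length, and handles
    // range requests and If-Modified-Since for you.
    http.ServeFile(w, r, "static/index.html")
}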
In typical Go web framework implementations, the route handlers are invoked as goroutines, i.e. at some point the framework will say go loadPage(...).
So if you start a goroutine from inside loadPage, you have two levels of goroutines.
The Go scheduler is really lazy and will not execute the second level if it's not forced to. So you need to force it through synchronization events, e.g. by using channels or the sync package. Example:
func loadPage(w http.ResponseWriter, path string) {
    s := make(chan string)
    go GetFileContent(path, s) // GetFileContent would need to send its result on s
    fmt.Fprint(w, <-s)         // receiving from s blocks until the goroutine delivers
}
The Go documentation says this:
If the effects of a goroutine must be observed by another goroutine,
use a synchronization mechanism such as a lock or channel
communication to establish a relative ordering.
Why is this actually a smart thing to do? In larger projects you may deal with a large number of Goroutines that need to be coordinated somehow efficiently. So why call a Goroutine if it's output is used nowhere? A fun fact: I/O operations like fmt.Printf do trigger synchronization events too.