I'm reading values from a channel in a loop like this:
for {
    capturedFrame := <-capturedFrameChan
    remoteCopy(capturedFrame)
}
To make it more efficient, I would like to read these values in a batch, with something like this (pseudo-code):
for {
    capturedFrames := <-capturedFrameChan
    multipleRemoteCopy(capturedFrames)
}
But I'm not sure how to do that. If I call capturedFrames := <-capturedFrameChan multiple times it's going to block.
Basically, what I would like is to read all the available values in capturedFrameChan and, if none is available, block as usual.
What would be the way to accomplish this in Go?
Something like this should work:
for {
    // We initialize our slice. You may want to add a larger cap to avoid
    // multiple memory allocations on `append`.
    capturedFrames := make([]Frame, 1)
    // We block waiting for a first frame.
    capturedFrames[0] = <-capturedFrameChan
forLoop:
    for {
        select {
        case buf := <-capturedFrameChan:
            // If more frames are immediately available, we add them to our slice.
            capturedFrames = append(capturedFrames, buf)
        default:
            // Else we move on without blocking.
            break forLoop
        }
    }
    multipleRemoteCopy(capturedFrames)
}
Try this (for a channel ch with element type T):
for firstItem := range ch { // Ranging over ch ensures that no batch is empty
    var itemsBatch []T
    itemsBatch = append(itemsBatch, firstItem)
Remaining:
    for len(itemsBatch) < BATCHSIZE { // Controls the maximum size of a batch
        select {
        case item := <-ch:
            itemsBatch = append(itemsBatch, item)
        default:
            break Remaining
        }
    }
    // Consume itemsBatch here...
}
But, if BATCHSIZE is constant, this code would be more efficient:
var i int
itemsBatch := [BATCHSIZE]T{}
for firstItem := range ch { // Ranging over ch ensures that no batch is empty
    itemsBatch[0] = firstItem
Remaining:
    for i = 1; i < BATCHSIZE; i++ { // Controls the maximum size of a batch
        select {
        case itemsBatch[i] = <-ch:
        default:
            break Remaining
        }
    }
    // Now you have itemsBatch with length i <= BATCHSIZE;
    // Consume that here...
}
By using len(capturedFrames), you can do it like below:
for {
    select {
    case frame := <-capturedFrames:
        frames := []Frame{frame}
        // Snapshot the backlog size once: re-evaluating len(capturedFrames)
        // in the loop condition shrinks with every receive, so it would
        // drain only about half of the buffered frames.
        itemCount := len(capturedFrames)
        for i := 0; i < itemCount; i++ {
            frames = append(frames, <-capturedFrames)
        }
        multipleRemoteCopy(frames)
    }
}
It seems you could also benchmark just
for {
    capturedFrame := <-capturedFrameChan
    go remoteCopy(capturedFrame)
}
without any codebase refactoring to see if it increases efficiency.
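If remoteCopy is not safe to run with unbounded concurrency, a minimal sketch of a bounded variant uses a semaphore channel (the limit of 8 is an assumption to tune; Frame, remoteCopy and capturedFrameChan are from the question):
sem := make(chan struct{}, 8) // allow at most 8 copies in flight (assumed limit)
for {
    capturedFrame := <-capturedFrameChan
    sem <- struct{}{} // acquire a slot; blocks while 8 copies are running
    go func(f Frame) {
        defer func() { <-sem }() // release the slot when the copy finishes
        remoteCopy(f)
    }(capturedFrame)
}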
I've ended up doing it as below. Basically I've used len(capturedFrames) to know how many frames are available, then retrieved them in a loop:
for {
    var paths []string
    itemCount := len(capturedFrames)
    if itemCount <= 0 {
        time.Sleep(50 * time.Millisecond)
        continue
    }
    for i := 0; i < itemCount; i++ {
        f := <-capturedFrames
        paths = append(paths, f)
    }
    err := multipleRemoteCopy(paths, opts)
    if err != nil {
        fmt.Printf("Error: could not remote copy \"%s\": %s\n", paths, err)
    }
}
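A variation of this (a sketch; multipleRemoteCopy, opts and the channel's string elements are from the question) blocks on the first frame instead of polling with a 50 ms sleep:
for {
    // Block until at least one frame is available, instead of sleeping.
    paths := []string{<-capturedFrames}
    // Snapshot how many more frames are already buffered, then drain them.
    itemCount := len(capturedFrames)
    for i := 0; i < itemCount; i++ {
        paths = append(paths, <-capturedFrames)
    }
    if err := multipleRemoteCopy(paths, opts); err != nil {
        fmt.Printf("Error: could not remote copy \"%s\": %s\n", paths, err)
    }
}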
I have the following piece of code. I'm trying to run three goroutines at the same time, never exceeding three. This works as expected, but the code is supposed to be running updates against a table in the DB.
So the first routine processes the first 50 rows, the second the next 50, the third the next 50, and it repeats. I don't want two routines processing the same rows at the same time, but due to how long the update takes, this happens almost every time.
To solve this, I started flagging the rows with a new boolean column, processing. I set it to true for all rows to be updated when the routine starts, and sleep the script for 6 seconds to allow the flag to be updated.
This works for a random amount of time, but every now and then I'll see 2-3 jobs processing the same rows again. I feel like the method I'm using to prevent duplicate updates is a bit janky and was wondering if there is a better way.
stopper := make(chan struct{}, 3)
var counter int
for {
    counter++
    stopper <- struct{}{}
    go func(db *sqlx.DB, c int) {
        fmt.Println("start")
        updateTables(db)
        fmt.Println("stop")
        <-stopper
    }(db, counter)
    time.Sleep(6 * time.Second)
}
In updateTables:
var ids []string
err := sqlx.Select(db, &data, `select * from table_data where processing = false`)
if err != nil {
    panic(err)
}
for _, row := range data {
    ids = append(ids, row.Id)
}
if len(ids) == 0 {
    return
}
for _, row := range data {
    _, err = db.Exec(`update table_data set processing = true where id = $1`, row.Id)
    if err != nil {
        panic(err)
    }
}
// Additional row processing
I think there's a misunderstanding about how to approach goroutines in this case.
Goroutines for this kind of work should be approached like worker threads, using channels as the communication method between the main routine (which does the synchronization) and the worker goroutines (which do the actual job).
package main

import (
    "log"
    "sync"
    "time"
)

type record struct {
    id int
}

func main() {
    const WORKER_COUNT = 10
    recordschan := make(chan record)
    var wg sync.WaitGroup
    for k := 0; k < WORKER_COUNT; k++ {
        wg.Add(1)
        // Create the worker which will be doing the updates
        go func(workerID int) {
            defer wg.Done() // Marking the worker as done
            for record := range recordschan {
                updateRecord(record)
                log.Printf("req %d processed by worker %d", record.id, workerID)
            }
        }(k)
    }
    // Feeding the records channel
    for _, record := range fetchRecords() {
        recordschan <- record
    }
    // Closing our channel as we're not using it anymore
    close(recordschan)
    // Waiting for all the goroutines to finish
    wg.Wait()
    log.Println("we're done!")
}

func fetchRecords() []record {
    result := []record{}
    for k := 0; k < 100; k++ {
        result = append(result, record{k})
    }
    return result
}

func updateRecord(req record) {
    time.Sleep(200 * time.Millisecond)
}
You can even batch things in the main goroutine if you need to update all 50 rows at once.
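A minimal sketch of that idea (the batch size of 50, the feedBatches name, and the []record channel are assumptions; record is the type from the example above): the feeder sends slices, and each worker updates one whole batch per receive.
// feedBatches slices the fetched records into batches of batchSize and
// sends each batch over the channel, closing it when done.
func feedBatches(batches chan<- []record, all []record, batchSize int) {
    for len(all) > 0 {
        n := batchSize
        if n > len(all) {
            n = len(all)
        }
        batches <- all[:n]
        all = all[n:]
    }
    close(batches)
}
Workers would then range over batches and update the rows of each received batch in a single statement.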
I'm trying to add multiple plots by using a loop, but I can't seem to figure out how to put the lines in. Here is the code I'm working on:
func plot_stochastic_processes(processes [][]float64, title string) {
    p, err := plot.New()
    if err != nil {
        panic(err)
    }
    p.Title.Text = title
    p.X.Label.Text = "X"
    p.Y.Label.Text = "Y"
    err = plotutil.AddLinePoints(p,
        "Test", getPoints(processes[1]),
        // Need to figure out how to loop through processes
    )
    if err != nil {
        panic(err)
    }
    // Save the plot to a PNG file.
    if err := p.Save(4*vg.Inch, 4*vg.Inch, "points.png"); err != nil {
        panic(err)
    }
}
My getPoints function looks like this:
func getPoints(line []float64) plotter.XYs {
    pts := make(plotter.XYs, len(line))
    for j, k := range line {
        pts[j].X = float64(j)
        pts[j].Y = k
    }
    return pts
}
I get an error when trying to put a loop where the commented section is. I know this should be fairly straightforward. Perhaps a loop prior to this to get the list of lines?
Something like
for i, process := range processes {
    return "title", getPoints(process),
}
Obviously I know that isn't correct, but I'm not sure how to go about it.
I think you want to first extract your data into a []interface{}, and then call into AddLinePoints. Roughly (I didn't test):
lines := make([]interface{}, 0)
for i, v := range processes {
    lines = append(lines, "Title"+strconv.Itoa(i))
    lines = append(lines, getPoints(v))
}
plotutil.AddLinePoints(p, lines...)
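Putting that into the question's function, a rough, untested sketch might look like this (it assumes the same plot, plotutil, plotter and vg packages the question imports, plus strconv for the titles; the "Process i" titles are made up):
func plot_stochastic_processes(processes [][]float64, title string) {
    p, err := plot.New()
    if err != nil {
        panic(err)
    }
    p.Title.Text = title
    p.X.Label.Text = "X"
    p.Y.Label.Text = "Y"
    // Collect alternating title/points arguments for every process.
    lines := make([]interface{}, 0, 2*len(processes))
    for i, process := range processes {
        lines = append(lines, "Process "+strconv.Itoa(i), getPoints(process))
    }
    if err := plotutil.AddLinePoints(p, lines...); err != nil {
        panic(err)
    }
    // Save the plot to a PNG file.
    if err := p.Save(4*vg.Inch, 4*vg.Inch, "points.png"); err != nil {
        panic(err)
    }
}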
How can I make this simple for loop break after exactly one second has passed since it started?
var i int
for {
    i++
}
By checking the elapsed time since the start:
var i int
for start := time.Now(); time.Since(start) < time.Second; {
    i++
}
Or use a "timeout" channel, acquired by calling time.After(), and a select to check whether the time is up. You must add a default branch so the check is non-blocking. If time is up, break out of the loop. It is also very important to use a label and break out of the for loop, else break would only break out of the select and the loop would be endless.
loop:
for timeout := time.After(time.Second); ; {
    select {
    case <-timeout:
        break loop
    default:
    }
    i++
}
Note: If the loop body also performs communication operations (like send or receive), using a timeout channel may be the only viable option! (You can list the timeout check and the loop's communication op in the same select.)
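For example, a sketch with a hypothetical jobs channel and process function (neither is part of the question) that checks the timeout and performs the loop's receive in the same select:
loop:
for timeout := time.After(time.Second); ; {
    select {
    case <-timeout:
        break loop // time is up
    case job := <-jobs: // the loop's own communication op
        process(job)
    }
}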
We may rewrite the timeout channel solution to not use a label:
for stay, timeout := true, time.After(time.Second); stay; {
    i++
    select {
    case <-timeout:
        stay = false
    default:
    }
}
Optimization
I know your loop is just an example, but if the loop is doing only a tiny bit of work, it is not worth checking the timeout in every iteration. We may rewrite the first solution to check the timeout only in, e.g., every 10th iteration, like this:
var i int
for start := time.Now(); ; {
    if i%10 == 0 {
        if time.Since(start) > time.Second {
            break
        }
    }
    i++
}
We may choose a check interval that is a power of 2 and then use a bitmask, which should be even faster than the remainder check:
var i int
for start := time.Now(); ; {
    if i&0x0f == 0 { // Check in every 16th iteration
        if time.Since(start) > time.Second {
            break
        }
    }
    i++
}
We may also calculate the end time once (when the loop must end), and then just compare the current time to it:
var i int
for end := time.Now().Add(time.Second); ; {
    if i&0x0f == 0 { // Check in every 16th iteration
        if time.Now().After(end) {
            break
        }
    }
    i++
}
I know the question is a bit old, but the snippet below might be useful for someone looking at a similar scenario:
func keepCheckingSomething() (bool, error) {
    timeout := time.NewTimer(10 * time.Second)
    ticker := time.NewTicker(500 * time.Millisecond)
    defer timeout.Stop()
    defer ticker.Stop()
    // Keep trying until we're timed out or get a result/error.
    for {
        select {
        // Got a timeout! Fail with a timeout error.
        // (Note: we must receive from the timer's C channel, not the timer value itself.)
        case <-timeout.C:
            // Maybe check one last time.
            ok, err := checkSomething()
            if !ok {
                return false, errors.New("timed out")
            }
            return ok, err
        // Got a tick, we should check on checkSomething().
        case <-ticker.C:
            ok, err := checkSomething()
            if err != nil {
                // We may return, or ignore the error.
                return false, err
                // checkSomething() done! Let's return.
            } else if ok {
                return true, nil
            }
            // checkSomething() isn't done yet, but it didn't fail either; let's try again.
        }
    }
}
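To try the helper out, a hypothetical checkSomething (purely an assumption, not part of the original) could report success after a few polls:
var attempts int

// checkSomething pretends the condition becomes true on the fifth poll.
func checkSomething() (bool, error) {
    attempts++
    return attempts >= 5, nil
}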
I have a Go server that takes input from a number of TCP clients that stream in data. The format is custom, and the end delimiter could appear within the byte stream, so it uses byte stuffing to get around this issue.
I am looking for hotspots in my code, and this throws up a HUGE one. I'm sure it could be made more efficient, but I'm not quite sure how at the moment, given the provided Go functions.
The code is below, and pprof shows the hotspot to be the popPacketFromBuffer function. It looks at the current buffer after each byte has been received and looks for the endDelimiter on its own; if there are two of them in a row, the delimiter is within the packet itself.
I did look at using ReadBytes() instead of ReadByte(), but it looks like I need to specify a delimiter, and I'm fearful that this will cut off a packet mid-stream. And in any case, would it be more efficient than what I am doing anyway?
Within the popPacketFromBuffer function it is the for loop that is the hotspot.
Any ideas?
// Read client data from channel
func (c *Client) listen() {
    reader := bufio.NewReader(c.conn)
    clientBuffer := new(bytes.Buffer)
    for {
        c.conn.SetDeadline(time.Now().Add(c.timeoutDuration))
        b, err := reader.ReadByte()
        if err != nil {
            c.conn.Close()
            c.server.onClientConnectionClosed(c, err)
            return
        }
        wrErr := clientBuffer.WriteByte(b)
        if wrErr != nil {
            log.Println("Write Error:", wrErr)
        }
        packet := popPacketFromBuffer(clientBuffer)
        if packet != nil {
            c.receiveMutex.Lock()
            packetSize := uint64(len(packet))
            c.bytesReceived += packetSize
            c.receiveMutex.Unlock()
            packetBuffer := bytes.NewBuffer(packet)
            b, err := uncompress(packetBuffer.Bytes())
            if err != nil {
                log.Println("Unzip Error:", err)
            } else {
                c.server.onNewMessage(c, b)
            }
        }
    }
}
func popPacketFromBuffer(buffer *bytes.Buffer) []byte {
    bufferLength := buffer.Len()
    if bufferLength >= 125000 { // Cap the buffer (125,000 bytes ≈ 1 megabit)
        log.Println("Buffer is too large ", bufferLength)
        buffer.Reset()
        return nil
    }
    tempBuffer := buffer.Bytes()
    length := len(tempBuffer)
    // Return on zero length buffer submission
    if length == 0 {
        return nil
    }
    endOfPacket := -1
    // Determine the endOfPacket position by looking for an instance of our delimiter
    for i := 0; i < length-1; i++ {
        if tempBuffer[i] == endDelimiter {
            if tempBuffer[i+1] == endDelimiter {
                i++
            } else {
                // We found a single delimiter, so consider this the end of a packet
                endOfPacket = i - 2
                break
            }
        }
    }
    if endOfPacket != -1 {
        // Grab the contents of the provided packet
        extractedPacket := buffer.Bytes()
        // Check the start delimiter and copy the payload out *before* resetting:
        // Bytes() aliases the buffer's storage, so the Reset/WriteByte below
        // would otherwise clobber what we are about to return.
        validStart := extractedPacket[0] == startDelimiter
        // Remove the start and end caps
        payload := make([]byte, length-3)
        copy(payload, extractedPacket[1:length-2])
        // Extract the last byte as we were super greedy with the read operation to check for stuffing
        carryByte := extractedPacket[length-1]
        // Clear the main buffer now we have extracted a packet from it
        buffer.Reset()
        // Add the carryByte over to our new buffer
        buffer.WriteByte(carryByte)
        if !validStart {
            log.Println("Popped a packet without a valid start delimiter")
            return nil
        }
        return deStuffPacket(payload)
    }
    return nil
}
It looks like you call popPacketFromBuffer() every time a single byte is received from the connection. However, popPacketFromBuffer() grabs the whole buffer and inspects it byte by byte for delimiters every time, so the work grows with the buffer on every received byte. That is probably what is overwhelming the profile. In my view, you don't need the loop
for i := 0; i < length-1; i++ {
    if tempBuffer[i] == endDelimiter {
        if tempBuffer[i+1] == endDelimiter {
            i++
        } else {
            // We found a single delimiter, so consider this the end of a packet
            endOfPacket = i - 2
            break
        }
    }
}
in popPacketFromBuffer(). Since the function runs after every single received byte, maybe just testing the last two bytes

if length >= 2 && tempBuffer[length-2] == endDelimiter && tempBuffer[length-1] != endDelimiter {
    // It's the end of a packet
}

would be enough for the purpose.
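Applied to the function from the question, that idea might look roughly like this (an untested sketch; it keeps the question's startDelimiter, endDelimiter and deStuffPacket names and the original slicing, and adds a parity check so stuffed, doubled delimiters followed by another byte are not mistaken for a packet end):
func popPacketFromBuffer(buffer *bytes.Buffer) []byte {
    tempBuffer := buffer.Bytes()
    length := len(tempBuffer)
    if length < 2 || tempBuffer[length-1] == endDelimiter {
        return nil
    }
    // Count the consecutive delimiters ending at length-2. Since this function
    // runs after every appended byte, only the newest bytes can complete a
    // packet; an odd run means a real (unstuffed) end delimiter, an even run
    // means only stuffed pairs.
    run := 0
    for i := length - 2; i >= 0 && tempBuffer[i] == endDelimiter; i-- {
        run++
    }
    if run%2 == 0 {
        return nil
    }
    // Copy the packet out, since Bytes() aliases the buffer's storage.
    packet := make([]byte, length)
    copy(packet, tempBuffer)
    buffer.Reset()
    buffer.WriteByte(packet[length-1]) // carry the greedily read byte over
    if packet[0] != startDelimiter {
        log.Println("Popped a packet without a valid start delimiter")
        return nil
    }
    // Strip the start delimiter, the end delimiter and the carry byte,
    // exactly like the original slicing.
    return deStuffPacket(packet[1 : length-2])
}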
This is the code:
func main() {
    ...
    pool := createPool(*redis_server, *redis_pass)
    defer pool.Close()
    c := pool.Get()
    var i int64
    st := tickSec()
    for i = 0; i < *total; i++ {
        r := time.Now().Unix() - rand.Int63n(60*60*24*31*12)
        score, _ := strconv.Atoi(time.Unix(r, 0).Format("2006010215"))
        id := utee.PlainMd5(uuid.NewUUID().String())
        c.Send("ZADD", "app_a_5512", score, id)
        if i%10000 == 0 {
            c.Flush()
            log.Println("current sync to redis", i)
        }
    }
    //c.Flush()
    c.Close()
    ...
}
If I use c.Close(), with total set to 100000, the real sorted-set count is 100000.
But if I use c.Flush() instead, with total also set to 100000, the real sorted-set count is less than 100000 (96932). If I add a time.Sleep() at the end of main, the count is 100000 too.
When main exits, is the flush not complete? And why? Thank you!
The reason that the program works when Close() is called after the loop is that the pooled connection's Close() method reads and discards all pending responses.
The application should Receive the responses for all commands instead of letting the responses back up and consume memory on the server. There's no need to flush in the loop.
go func() {
    // One goroutine writes (Send/Flush) while the main goroutine reads (Receive).
    // Use a local loop variable: sharing i with the receiving loop below
    // would be a data race.
    for i := int64(0); i < *total; i++ {
        r := time.Now().Unix() - rand.Int63n(60*60*24*31*12)
        score, _ := strconv.Atoi(time.Unix(r, 0).Format("2006010215"))
        id := utee.PlainMd5(uuid.NewUUID().String())
        c.Send("ZADD", "app_a_5512", score, id)
    }
    c.Flush()
}()
for i = 0; i < *total; i++ {
    c.Receive()
}
c.Close()
Also, the application should check and handle the errors returned from Send, Flush and Receive.
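For instance, the receiving loop might become something like this (a sketch against the redigo-style Send/Flush/Receive API used above; how to handle the error is up to the application):
for i = 0; i < *total; i++ {
    if _, err := c.Receive(); err != nil {
        log.Fatal("receive:", err)
    }
}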