Simple concurrent queue - go

Could someone point out the flaws and performance drawbacks in this Queue implementation?
type Queue struct {
    sync.Mutex
    Items []interface{}
}

func (q *Queue) Push(item interface{}) {
    q.Lock()
    defer q.Unlock()
    q.Items = append(q.Items, item)
}

func (q *Queue) Pop() interface{} {
    q.Lock()
    defer q.Unlock()
    if len(q.Items) == 0 {
        return nil
    }
    item := q.Items[0]
    q.Items = q.Items[1:]
    return item
}
I also have methods like PopMany and PushMany, and what I am concerned about is: Is too much re-slicing that bad?

You could simply use a buffered channel.
var queue = make(chan interface{}, 100)
The size of the buffer should be determined empirically: large enough to cover the high-water mark of pushes outpacing pops, but ideally not much larger than that, to avoid wasting memory.
Indeed, a smaller buffer size will also work, provided the interacting goroutines don't deadlock for other reasons. With a smaller buffer you are effectively queueing through the run queue of the Go runtime's goroutine scheduler. (Quite possibly, a buffer size of zero could work in many circumstances.)
Channels allow many reader goroutines and many writer goroutines; the Go runtime handles the concurrency of their access automatically. All writes into the channel are interleaved into a sequential stream, and all reads extract values sequentially, in the same order they were enqueued.
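A minimal sketch of the channel-as-queue idea (the buffer size of 100 is an arbitrary placeholder):

package main

import "fmt"

func main() {
    queue := make(chan interface{}, 100) // buffered: sends don't block until full

    // Push is a send; Pop is a receive.
    queue <- "first"
    queue <- "second"

    fmt.Println(<-queue) // "first": FIFO order is preserved
    fmt.Println(<-queue) // "second"
}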

The re-slicing is not an issue here, and it makes no difference whether you have a thread-safe or unsafe version: this is pretty much how slice resizing is meant to be done.
You can alleviate some of the re-sizing overhead by initializing the queue with a capacity:
func NewQueue(capacity int) *Queue {
    return &Queue{
        Items: make([]interface{}, 0, capacity),
    }
}
This initializes the queue. It can still grow beyond the capacity, but there will be no unnecessary copying or re-allocation until that capacity is reached.
What may cause problems under many concurrent accesses is the mutex lock. At some point you will be spending more time waiting for locks to be released than actually doing work. This is the general problem of lock contention, and it can be solved by implementing the queue as a lock-free data structure.
There are a few third-party packages out there which provide lock free implementations of basic data structures.
Whether this will actually be useful to you can only be determined with some benchmarking. Lock-free structures can have a higher base cost, but they scale much better when you get many concurrent users. There is a cutoff point at which mutex locks become more expensive than the lock-free approach.
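One way to set up that benchmark is with testing.B.RunParallel. A sketch (a hypothetical queue_test.go in the same package as Queue, assuming import "testing"):

// Run with: go test -bench=. -cpu=1,4,8
// Compare ns/op as the CPU count grows to see the cost of contention.
func BenchmarkQueueParallel(b *testing.B) {
    q := NewQueue(1024)
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            q.Push(1)
            q.Pop()
        }
    })
}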

I think the best way to approach this is to use a linked list; the standard library already provides one in the container/list package.
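A sketch of a mutex-guarded FIFO built on container/list:

package queue

import (
    "container/list"
    "sync"
)

type Queue struct {
    mu    sync.Mutex
    items *list.List
}

func NewQueue() *Queue {
    return &Queue{items: list.New()}
}

func (q *Queue) Push(v interface{}) {
    q.mu.Lock()
    defer q.mu.Unlock()
    q.items.PushBack(v)
}

func (q *Queue) Pop() interface{} {
    q.mu.Lock()
    defer q.mu.Unlock()
    front := q.items.Front()
    if front == nil {
        return nil // empty queue
    }
    return q.items.Remove(front) // Remove returns the element's value
}

Note that each element costs a separate allocation, so for high-throughput queues a slice or ring buffer is often faster in practice.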

The answer marked correct says re-slicing is not an issue. That is not correct: the popped element stays reachable through the backing array. What Dave is suggesting is right: set the vacated element to nil so the garbage collector can reclaim it.
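Concretely, Pop can clear the slot before re-slicing. A sketch:

func (q *Queue) Pop() interface{} {
    q.Lock()
    defer q.Unlock()
    if len(q.Items) == 0 {
        return nil
    }
    item := q.Items[0]
    q.Items[0] = nil // drop the reference so the GC can free the element
    q.Items = q.Items[1:]
    return item
}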
Read more about slices here: https://go.dev/blog/slices-intro

Related

Golang assignment safety with single reader and single writer

Say I have two go routines:
var sequence int64

// writer
for i := sequence; i < max; i++ {
    doSomethingWithSequence(i)
    sequence = i
}

// reader
for {
    doSomeOtherThingWithSequence(sequence)
}
So can I get by without atomic?
Some potential risks I can think of:
1. Reordering: for the writer, the store to sequence could become visible before doSomethingWithSequence(i) completes. I can live with that.
2. sequence is not properly aligned in memory, so the reader might observe a partially updated value. Running on Linux (recent kernel) on x86_64, can we rule that out?
3. The Go compiler 'cleverly optimizes' the reader, so that the access to sequence never goes to memory but stays cached in a register. Is that possible in Go?
Anything else?
Go's motto: "Do not communicate by sharing memory; instead, share memory by communicating." That is an effective best practice most of the time. Taking your points in order:
1. If you care about ordering, you care about synchronizing the two goroutines.
2. I don't think that is possible here. In any case, it is not something you should worry about if you design the synchronization properly.
3. The same as above.
Luckily, Go has an integrated data race detector. Try running your example with go run -race; you will probably see the race condition reported on the sequence variable.
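If you do want to share the counter directly rather than communicate over a channel, sync/atomic makes the load and store explicit. A minimal sketch (the loop bodies stand in for the question's doSomething* calls):

package main

import (
    "fmt"
    "sync/atomic"
    "time"
)

func main() {
    var sequence int64

    // writer: publish each value with an atomic store
    go func() {
        for i := int64(0); i < 1000; i++ {
            atomic.StoreInt64(&sequence, i)
            time.Sleep(time.Millisecond)
        }
    }()

    // reader: an atomic load can never observe a torn value,
    // and the compiler will not cache it in a register
    for j := 0; j < 5; j++ {
        fmt.Println("observed:", atomic.LoadInt64(&sequence))
        time.Sleep(100 * time.Millisecond)
    }
}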

Possibility to NOT hook all available CPU power?

I know most Go beginners ask how to make their goroutines and concurrency more performant; I passed that point a few weeks ago. :-)
I have a real fast trans-coder that uses every cycle available of my 4+4 (i7 HT) CPU. It reads a file into a slice of pointers to structs, does calculations on these and writes the result back to disk. I am using bufio. I am coming from VB so the performance of Go is unbelievable.
I tried to add minimal sleeps (via time.Sleep()), but that drastically decreased performance.
While my trans-coder is working the whole system is lagging. I must change the go task's priority to low or idle to be able to work again.
How could I implement something that keeps the system responsive?
Right now I start thousands of go-routines (loop over a slice of pointers). Should I limit the number of routines?
Lowering the process priority is arguably the correct way to do this. Use your OS's scheduler; that's what it's for. On Windows, for example, you can start your process with a specified priority like so:
start "MyApp" /low "C:\myapp.exe"
You may also be able to set the process priority from within the application; on Unix-like systems, for example:
err := syscall.Setpriority(syscall.PRIO_PROCESS, syscall.Getpid(), 19) // 19 = lowest priority
Lastly, you can use GOMAXPROCS to configure how many CPUs the process is allowed to use. You can pass it in as an environment variable at runtime, or call runtime.GOMAXPROCS() within your code to override it.
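A sketch of the in-code variant, leaving one core free so the desktop stays responsive (transcode() is a hypothetical placeholder for the real work):

package main

import "runtime"

func main() {
    n := runtime.NumCPU() - 1 // use all but one CPU
    if n < 1 {
        n = 1
    }
    runtime.GOMAXPROCS(n)

    transcode()
}

func transcode() {
    // ... the CPU-heavy work goes here ...
}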
Probably the simplest solution is to limit the number of concurrent goroutines using a semaphore, for example:
var sem = make(chan int, 10) // allow at most 10 goroutines at a time

for {
    sem <- 1 // blocks while 10 goroutines are already running
    go doThis()
}

func doThis() {
    // do the work
    <-sem // free a slot when done
}
The "sem <- 1" inside the for loop blocks until a goroutine "slot" is freed up by doThis extracting something from the sem channel/

Queue implementation using slices in Go

I have seen some implementations of FIFO Queues using slices in Go. As items exit the queue can this memory be freed up without reallocating the underlying array? If this doesn't occur it would seem to me that the queue would leak a ton of memory. This is what I mean:
type queue struct {
    items []int
    head  int
}

func (q *queue) enqueue(val int) {
    q.items = append(q.items, val)
}

func (q *queue) dequeue() int {
    val := q.items[q.head]
    q.head++
    return val
}
After calling enqueue/dequeue a bunch of times, the low indexes of the array underlying the slice are no longer usable, but I am not sure how they can be freed either. Can someone point me to a proper queue implementation that does not use pointers, doesn't leak memory like this, and doesn't have performance issues? Alternatively, a description of how this might work would also be appreciated.
Thank you,
Plamen
You can use a circular buffer. From wikipedia:
A circular buffer, circular queue, cyclic buffer or ring buffer is a data structure that uses a single, fixed-size buffer as if it were connected end-to-end. This structure lends itself easily to buffering data streams.
...
Circular buffering makes a good implementation strategy for a queue that has fixed maximum size. Should a maximum size be adopted for a queue, then a circular buffer is a completely ideal implementation; all queue operations are constant time. However, expanding a circular buffer requires shifting memory, which is comparatively costly. For arbitrarily expanding queues, a linked list approach may be preferred instead.
Here's a package that implements this: https://github.com/eapache/queue.
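For an arbitrarily growing queue you can also grow the ring itself: wrap the indices around a slice and double it when full. A minimal sketch:

type RingQueue struct {
    buf        []int
    head, size int
}

func (q *RingQueue) Enqueue(v int) {
    if q.size == len(q.buf) { // full (or nil): grow and unwrap
        newBuf := make([]int, 2*q.size+1)
        for i := 0; i < q.size; i++ {
            newBuf[i] = q.buf[(q.head+i)%len(q.buf)]
        }
        q.buf, q.head = newBuf, 0
    }
    q.buf[(q.head+q.size)%len(q.buf)] = v
    q.size++
}

func (q *RingQueue) Dequeue() (int, bool) {
    if q.size == 0 {
        return 0, false // empty
    }
    v := q.buf[q.head]
    q.head = (q.head + 1) % len(q.buf)
    q.size--
    return v, true
}

Dequeued slots are simply reused on the next wrap-around, so nothing accumulates.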
Depending on the use case, a channel is also a good way to implement a queue. Receiving blocks, but by using a select with a default case you can avoid that behavior:
select {
case msg := <-queue:
    // handle msg
default:
    // queue is empty; don't block
}

Are goroutines appropriate for large, parallel, compute-bound problems?

Are go-routines pre-emptively multitasked for numerical problems?
I am very intrigued by the lean design of Go, the speed, but most by the fact that channels are first-class objects. I hope the last point may enable a whole new class of deep-analysis algorithms for big data, via the complex interconnection patterns which they should allow.
My problem domain requires real-time compute-bound analysis of streaming incoming data. The data can be partitioned into between 100-1000 "problems" each of which will take between 10 and 1000 seconds to compute (ie their granularity is highly variable). Results must however all be available before the output makes sense, ie, say 500 problems come in, and all 500 must be solved before I can use any of them. The application must be able to scale, potentially to thousands (but unlikely 100s of thousands) problems.
Given that I am less worried about numerical library support (most of this stuff is custom), Go seems ideal, as I can map each problem to a goroutine. Before I invest in learning Go rather than, say, Julia, Rust, or a functional language (none of which, as far as I can see, have first-class channels, so for me they are at an immediate disadvantage), I need to know if goroutines are properly pre-emptively multitasked. That is, if I run 500 compute-bound goroutines on a powerful multicore computer, can I expect reasonable load balancing across all the "problems", or will I have to cooperatively "yield" all the time, 1995-style? This issue is particularly important given the variable granularity of the problem and the fact that, during compute, I usually will not know how much longer a problem will take.
If another language would serve me better, I am happy to hear about it, but I require that the threads (or go/coroutines) of execution be lightweight. Python's multiprocessing module, for example, is far too resource-intensive for my scaling ambitions. Just to pre-empt: I do understand the difference between parallelism and concurrency.
The Go runtime has a model where multiple Go routines are mapped onto multiple threads automatically. No Go routine is bound to a certain thread; the scheduler may (and will) schedule Go routines onto the next available thread. The number of threads that execute Go code simultaneously is taken from the GOMAXPROCS environment variable and can be overridden with runtime.GOMAXPROCS(). This is a simplified description which is sufficient for understanding.
Go routines may yield in the following cases:
On any operation that might block, i.e. any operation that cannot return a result on the spot, because it is either a (possibly) blocking system call (e.g. a Read() on a file or network connection) or an operation that might require waiting for other Go routines, like acquiring a mutex or sending to or receiving from a channel
On various runtime operations
On function call if the scheduler detects that the preempted Go routine took a lot of CPU time (this is new in Go 1.2)
On call to runtime.Gosched()
On panic()
As of Go 1.14, tight loops can be preempted by the runtime. As a result, loops without function calls no longer potentially deadlock the scheduler or significantly delay garbage collection. This is not supported on all platforms - be sure to review the release notes. Also see issue #36365 for future plans in this area.
On various other occasions
The following things prevent a Go routine from yielding:
Executing C code. A Go routine can't yield while it's executing C code via cgo.
Calling runtime.LockOSThread(), until runtime.UnlockOSThread() has been called.
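For the compute-bound workload in the question, one goroutine per problem works, but a worker pool sized to the machine gives predictable balancing and bounded overhead. A minimal sketch (solve() is a hypothetical placeholder for one of the question's problems):

package main

import (
    "fmt"
    "runtime"
    "sync"
)

func solve(p int) int { // placeholder for a compute-bound problem
    sum := 0
    for i := 0; i < 1e7; i++ {
        sum += i % (p + 1)
    }
    return sum
}

func main() {
    problems := make([]int, 500)
    results := make([]int, len(problems))

    workers := runtime.NumCPU() // one worker per core
    jobs := make(chan int)

    var wg sync.WaitGroup
    wg.Add(workers)
    for w := 0; w < workers; w++ {
        go func() {
            defer wg.Done()
            for i := range jobs {
                results[i] = solve(problems[i])
            }
        }()
    }
    for i := range problems {
        jobs <- i
    }
    close(jobs)
    wg.Wait() // all 500 results are now available before any is used
    fmt.Println("solved", len(results), "problems")
}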
Not sure I fully understand you; however, you can set runtime.GOMAXPROCS to scale to all processors, then use channels (or locks) to synchronize the data. Example:
const N = 100

func main() {
    runtime.GOMAXPROCS(runtime.NumCPU()) // scale to all processors
    var stuff [N]bool
    var wg sync.WaitGroup
    ch := make(chan int, runtime.NumCPU())
    done := make(chan struct{}, runtime.NumCPU())
    go func() {
        for i := range ch {
            stuff[i] = true
        }
    }()
    wg.Add(N)
    for i := range stuff {
        go func(i int) {
            for { // cpu-bound loop
                select {
                case <-done:
                    fmt.Println(i, "is done")
                    ch <- i
                    wg.Done()
                    return
                default:
                }
            }
        }(i)
    }
    go func() {
        for range stuff {
            time.Sleep(time.Microsecond)
            done <- struct{}{}
        }
        close(done)
    }()
    wg.Wait()
    close(ch)
    for i, v := range stuff { // false-positive data race
        if !v {
            panic(fmt.Sprintf("%d != true", i))
        }
    }
    fmt.Println("All done")
}
EDIT: Information about the scheduler is at http://tip.golang.org/src/pkg/runtime/proc.c:
Goroutine scheduler
The scheduler's job is to distribute ready-to-run goroutines over worker threads.
The main concepts are:
G - goroutine.
M - worker thread, or machine.
P - processor, a resource that is required to execute Go code. An M must have an associated P to execute Go code; however, it can be blocked or in a syscall without an associated P.
Design doc at http://golang.org/s/go11sched.

When to use a buffered channel?

What are the use cases for buffered channels? If I want multiple parallel actions I could just use the default, synchronous channel, e.g.:
package main

import (
    "fmt"
    "time"
)

func longLastingProcess(c chan string) {
    time.Sleep(2000 * time.Millisecond)
    c <- "tadaa"
}

func main() {
    c := make(chan string)
    go longLastingProcess(c)
    go longLastingProcess(c)
    go longLastingProcess(c)
    fmt.Println(<-c)
}
What would be the practical cases for increasing the buffer size ?
To give a single, slightly-more-concrete use case:
Suppose you want your channel to represent a task queue, so that a task scheduler can send jobs into the queue, and a worker thread can consume a job by receiving it in the channel.
Suppose further that, though in general you expect each job to be handled in a timely fashion, it takes longer for a worker to complete a task than it does for the scheduler to schedule it.
Having a buffer allows the scheduler to deposit jobs in the queue and still remain responsive to user input (or network traffic, or whatever) because it does not have to sleep until the worker is ready each time it schedules a task. Instead, it goes about its business, and trusts the workers to catch up during a quieter period.
If you want an EVEN MORE CONCRETE example dealing with a specific piece of software then I'll see what I can do, but I hope this meets your needs.
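A sketch of that shape (the Job type, the buffer size of 64, and the timings are placeholders):

package main

import (
    "fmt"
    "time"
)

type Job struct{ ID int }

func main() {
    tasks := make(chan Job, 64) // the scheduler can queue 64 jobs ahead

    done := make(chan struct{})
    go func() { // worker: deliberately slower than the scheduler
        for job := range tasks {
            time.Sleep(50 * time.Millisecond) // "handle" the job
            fmt.Println("done:", job.ID)
        }
        close(done)
    }()

    for i := 0; i < 10; i++ {
        tasks <- Job{ID: i} // returns immediately while the buffer has room
    }
    close(tasks)
    <-done // wait for the worker to drain the queue
}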
Generally, buffering in channels is beneficial for performance reasons.
If a program is designed using an event-flow or data-flow approach, channels provide the means for the events to pass between one process and another (I use the term process in the same sense as in Tony Hoare's Communicating Sequential Processes (CSP), ie. effectively synonymous with the goroutine).
There are times when a program needs its components to remain in lock-step synchrony. In this case, unbuffered channels are required.
Otherwise, it is typically beneficial to add buffering to the channels. This should be seen as an optimisation step (deadlock may still be possible if not designed out).
There are novel throttle structures made possible by using channels with small buffers (example).
There are special overwriting or lossy forms of channels used in occam and jcsp for fixing the special case of a cycle (or loop) of processes that would otherwise probably deadlock. This is also possible in Go by writing an overwriting goroutine buffer (example).
You should never add buffering merely to fix a deadlock. If your program deadlocks, it's far easier to fix by starting with zero buffering and thinking through the dependencies, then adding buffering once you know it won't deadlock.
You can construct goroutines compositionally - that is, a goroutine may itself contain goroutines. This is a feature of CSP and benefits scalability greatly. The internal channels between a group of goroutines are not of interest when designing the external use of the group as a self-contained component. This principle can be applied repeatedly at increasingly-larger scales.
If the receiver of the channel is always slower than the sender, a buffer of any size will eventually fill up. That will leave you with a channel that pauses your goroutine as often as an unbuffered channel would, so you might as well use an unbuffered channel.
If the receiver is typically faster than the sender except for an occasional burst a buffered channel may be helpful and the buffer should be set to the size of the typical burst which you can arrive at by measurement at runtime.
As an alternative to a buffered channel, it may be better to just send an array, or a struct containing an array, over the channel to deal with bursts/batches.
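A sketch of that batching idea (the Item type, the batch size of 256, and the function shape are placeholders):

// Accumulate items and send the whole slice in one channel operation.
type Item struct{ N int }

func sender(src <-chan Item, out chan<- []Item) {
    batch := make([]Item, 0, 256)
    for it := range src {
        batch = append(batch, it)
        if len(batch) == cap(batch) {
            out <- batch // one send covers the whole burst
            batch = make([]Item, 0, 256)
        }
    }
    if len(batch) > 0 {
        out <- batch // flush the remainder
    }
    close(out)
}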
Buffered channels are non-blocking for the sender as long as there's still room. This can increase responsiveness and throughput.
Sending several items on one buffered channel makes sure they are processed in the order in which they are sent.
From Effective Go (with example): "A buffered channel can be used like a semaphore, for instance to limit throughput."
In general, there are many use cases and patterns of channel usage, so this is not an exhaustive answer.
It's a hard question because the program is incorrect: it exits after receiving a value from one goroutine, but three were started. Buffering the channel would make no difference.
