I have seen some implementations of FIFO queues using slices in Go. As items exit the queue, can their memory be freed without reallocating the underlying array? If it can't, it seems to me that the queue would leak a great deal of memory. This is what I mean:
type queue struct {
    items []int
    head  int
}

func (q *queue) enqueue(val int) {
    q.items = append(q.items, val)
}

func (q *queue) dequeue() int {
    val := q.items[q.head]
    q.head++ // the slot at the old head is now dead, but still referenced by the array
    return val
}
After calling enqueue/dequeue a bunch of times, the low indexes of the array underlying the slice are no longer usable, but I am not sure how they can be freed either. Can someone point me to a proper queue implementation that does not use pointers and doesn't leak memory like this or have performance issues? Alternatively, a description of how this might work would also be appreciated.
You can use a circular buffer. From Wikipedia:
A circular buffer, circular queue, cyclic buffer or ring buffer is a data structure that uses a single, fixed-size buffer as if it were connected end-to-end. This structure lends itself easily to buffering data streams.
...
Circular buffering makes a good implementation strategy for a queue that has fixed maximum size. Should a maximum size be adopted for a queue, then a circular buffer is a completely ideal implementation; all queue operations are constant time. However, expanding a circular buffer requires shifting memory, which is comparatively costly. For arbitrarily expanding queues, a linked list approach may be preferred instead.
Here's a package that implements this: https://github.com/eapache/queue.
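To make the idea concrete, here is a minimal sketch of a growable ring-buffer queue (a sketch only; the names are illustrative and not taken from the linked package). Dequeued slots are reused by later enqueues, so nothing leaks; growing the buffer reallocates and copies, as the Wikipedia quote warns:

// ringQueue is a growable ring buffer of ints. head is the index of the
// front element; the logical contents wrap around modulo len(buf).
type ringQueue struct {
    buf   []int
    head  int
    count int
}

func (q *ringQueue) enqueue(v int) {
    if q.count == len(q.buf) { // full: reallocate and unwrap into a bigger buffer
        bigger := make([]int, 2*len(q.buf)+1)
        for i := 0; i < q.count; i++ {
            bigger[i] = q.buf[(q.head+i)%len(q.buf)]
        }
        q.buf, q.head = bigger, 0
    }
    q.buf[(q.head+q.count)%len(q.buf)] = v
    q.count++
}

// dequeue assumes the caller has checked that the queue is non-empty.
func (q *ringQueue) dequeue() int {
    v := q.buf[q.head]
    q.head = (q.head + 1) % len(q.buf) // the old slot will be reused, not leaked
    q.count--
    return v
}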
Depending on the use case, a channel is also a good way to implement a queue. It blocks, but by using a select with a default case you can avoid that behavior:
select {
case msg := <-queue:
    fmt.Println(msg) // handle the dequeued value
default:
    // queue is empty: don't block
}
Say I have two goroutines:
var sequence int64

// writer
for i := sequence; i < max; i++ {
    doSomethingWithSequence(i)
    sequence = i
}

// reader
for {
    doSomeOtherThingWithSequence(sequence)
}
So can I get by without atomic?
Some potential risks I can think of:
Reordering (for the writer, updating sequence could happen before doSomething) is possible, but I can live with that.
sequence is not properly aligned in memory, so the reader might observe a partially updated i. Running on Linux (recent kernel) with x86_64, can we rule that out?
The Go compiler 'cleverly optimizes' the reader, so the access to i never goes to memory but is cached in a register. Is that possible in Go?
Anything else?
Go's motto: Do not communicate by sharing memory; instead, share memory by communicating. This is an effective best practice most of the time.
If you care about ordering, you care about synchronizing the two goroutines.
I don't think they are possible. In any case, these are not things you should worry about if you design the synchronization properly.
The same as above.
Luckily, Go has an integrated data race detector. Try to run your example with go run -race. You will probably see the race condition happening on the sequence variable.
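If you do want to share the counter directly rather than use a channel, here is a minimal runnable sketch using sync/atomic (the max constant and the loop bodies stand in for the functions in the question):

package main

import (
    "sync/atomic"
    "time"
)

const max = 1000

var sequence int64

func main() {
    // writer
    go func() {
        for i := int64(0); i < max; i++ {
            // doSomethingWithSequence(i) would go here
            atomic.StoreInt64(&sequence, i) // race-free write
        }
    }()

    // reader
    for j := 0; j < 10; j++ {
        s := atomic.LoadInt64(&sequence) // race-free read
        _ = s // doSomeOtherThingWithSequence(s) would go here
        time.Sleep(time.Millisecond)
    }
}

This version should pass go run -race cleanly, whereas the original will be flagged.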
A circular queue is obviously better in that it reuses the empty space left by popped elements. It also saves the time that might otherwise be spent shifting the remaining elements left after each pop.
But is there any use case where a plain queue would be preferred over a circular queue?
Definition of queue: the linear-array implementation. Follows FIFO, no overwrites.
Definition of circular queue: the ring-buffer implementation. Follows FIFO, no overwrites.
Note: In many languages a queue is just an interface and doesn't say anything about the implementation.
When using an array-based circular queue, a.k.a. a ring buffer, you must handle the situation where you push to a full buffer. You could:
1. Ignore the insertion
2. Overwrite the oldest entry
3. Block until there's space again
4. (Re)allocate memory and copy all the content
5. Use an oversized buffer so this situation never happens
Each of these options has downsides. If you can live with them, or you know that you will never fill the buffer, then a ring buffer is the way to go.
Options 3 and 4 will induce stuttering. Depending on your use case, you might prefer longer but stable access times and reliability over occasional spikes, and therefore opt for a linked list or some other sort of dynamic implementation, like a deque, instead.
Example use cases are tasks where you have to achieve a stable frame/sampling rate or throughput and can't tolerate stutters, like:
Realtime video and audio processing
Realtime rendering
Networking
Thread pools when you don't want the threads to block for too long when pushing new jobs.
However, a queue based on a linear array will suffer from the same downsides. I don't see a reason for choosing a linear queue over a circular queue (besides the latter's slightly higher implementation complexity).
std::queue in C++ uses a deque as the underlying container by default. A deque is essentially a dynamic array of arrays, which seems like a good base for most use cases because it allocates memory in small chunks and hence induces less stuttering.
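To make option 2 from the list above concrete, here is a minimal sketch of a fixed-size ring buffer that overwrites the oldest entry when full (the names are illustrative; size must be positive):

// ring is a fixed-size ring buffer of ints. head indexes the oldest element.
type ring struct {
    buf   []int
    head  int
    count int
}

func newRing(size int) *ring { return &ring{buf: make([]int, size)} }

func (r *ring) push(v int) {
    if r.count == len(r.buf) { // full: overwrite the oldest entry (option 2)
        r.buf[r.head] = v
        r.head = (r.head + 1) % len(r.buf)
        return
    }
    r.buf[(r.head+r.count)%len(r.buf)] = v
    r.count++
}

func (r *ring) pop() (int, bool) {
    if r.count == 0 {
        return 0, false // empty
    }
    v := r.buf[r.head]
    r.head = (r.head + 1) % len(r.buf)
    r.count--
    return v, true
}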
As in the title: are read and write operations on a uint8 atomic?
Logically, reading or writing an 8-bit variable should be a single CPU instruction. But two cores could still simultaneously read and write the same memory; is it possible to end up with stale data this way?
There's no guarantee that accesses to native types are atomic on any platform. This is why sync/atomic exists. See also the advice in the memory model documentation.
Example of a generic way to atomically set a value (Play):
var ax atomic.Value // may be globally accessible
x := uint8(5)
// set atomically
ax.Store(x)
x = ax.Load().(uint8)
Probably more efficient solution for uint8 (Play):
var ax int64 // may be globally accessible
x := uint8(5)
atomic.StoreInt64(&ax, int64(x)) // store atomically
x = uint8(atomic.LoadInt64(&ax)) // load atomically
fmt.Printf("%T %v\n", x, x)      // uint8 5
No. If you want atomic operations, you can use the sync/atomic package.
If you mean "would 8-bit operations be atomic even if I ignore the Go memory model?", then the answer is still: it depends, and probably not.
If the hardware guarantees atomicity of read/write operations, then it might be atomic. But that still doesn't guarantee cache coherence, or prevent compiler optimizations from reordering operations. You need to serialize the operations somehow, with the primitives Go provides in the sync/atomic package, and use the sync package and channels to coordinate between goroutines.
Could someone please point out the flaws and performance drawbacks in the following Queue implementation?
type Queue struct {
    sync.Mutex
    Items []interface{}
}

func (q *Queue) Push(item interface{}) {
    q.Lock()
    defer q.Unlock()
    q.Items = append(q.Items, item)
}

func (q *Queue) Pop() interface{} {
    q.Lock()
    defer q.Unlock()
    if len(q.Items) == 0 {
        return nil
    }
    item := q.Items[0]
    q.Items = q.Items[1:]
    return item
}
I also have methods like PopMany and PushMany, and what I am concerned about is: Is too much re-slicing that bad?
You could simply use a buffered channel.
var queue = make(chan interface{}, 100)
The size of the buffer could be determined empirically to be large enough for the high-water mark of the rate of pushes versus the rate of pops. It should ideally not be much larger than this, to avoid wasting memory.
Indeed, a smaller buffer size will also work, provided the interacting goroutines don't deadlock for other reasons. If you use a smaller buffer size, you are effectively getting queueing via the run queue of the goroutine scheduler, part of the Go runtime. (Quite possibly, a buffer size of zero could work in many circumstances.)
Channels allow many reader goroutines and many writer goroutines. The concurrency of their access is handled automatically by the Go runtime. All writes into the channel are interleaved so as to be a sequential stream. All the reads are also interleaved to extract values sequentially in the same order they were enqueued. Here's further discussion on this topic.
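As a minimal sketch of this approach (the names and sizes here are illustrative, not prescriptive):

package main

import (
    "fmt"
    "sync"
)

func main() {
    queue := make(chan int, 100) // the buffered channel is the queue
    var wg sync.WaitGroup

    // two producer goroutines; their sends interleave into one sequential stream
    for p := 0; p < 2; p++ {
        wg.Add(1)
        go func(p int) {
            defer wg.Done()
            for i := 0; i < 5; i++ {
                queue <- p*10 + i // push
            }
        }(p)
    }

    // close the channel once all producers are done
    go func() { wg.Wait(); close(queue) }()

    // a consumer drains the queue in FIFO order
    for v := range queue {
        fmt.Println("popped:", v)
    }
}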
The re-slicing is not an issue here. It also makes no difference whether you have a thread-safe or unsafe version, as this is pretty much how re-sizing is meant to be done.
You can alleviate some of the re-sizing overhead by initializing the queue with a capacity:
func NewQueue(capacity int) *Queue {
    return &Queue{
        Items: make([]interface{}, 0, capacity),
    }
}
This will initialize the queue. It can still grow beyond the capacity, but you will not incur any unnecessary copying/re-allocation until that capacity is reached.
What may potentially cause problems with many concurrent accesses is the mutex lock. At some point, you will be spending more time waiting for locks to be released than actually doing work. This is the general problem of lock contention, and it can be solved by implementing the queue as a lock-free data structure.
There are a few third-party packages out there which provide lock-free implementations of basic data structures.
Whether this will actually be useful to you can only be determined with some benchmarking. Lock-free structures can have a higher base cost, but they scale much better when you get many concurrent users. There is a cutoff point at which mutex locks become more expensive than the lock-free approach.
I think the best way to approach this is to use a linked list; the standard library already provides one in the container/list package.
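A minimal sketch of using container/list as a FIFO queue (wrapping it with a mutex, as in the question, is left out here):

package main

import (
    "container/list"
    "fmt"
)

func main() {
    q := list.New()
    q.PushBack(1) // enqueue
    q.PushBack(2)

    front := q.Front()       // peek at the head
    q.Remove(front)          // dequeue
    fmt.Println(front.Value) // 1
    fmt.Println(q.Len())     // 1
}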
The answer marked correct says re-slicing is not an issue. That is not correct: it is an issue. What Dave is suggesting is right: we should set the popped element to nil, because the re-sliced slice still shares the backing array and so keeps the popped value reachable by the garbage collector.
More about slices here: https://go.dev/blog/slices-intro
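A sketch of Pop with that fix applied, assuming the Queue type from the question (note this releases the popped values to the GC, but still doesn't shrink the backing array itself):

func (q *Queue) Pop() interface{} {
    q.Lock()
    defer q.Unlock()
    if len(q.Items) == 0 {
        return nil
    }
    item := q.Items[0]
    q.Items[0] = nil // drop the reference so the GC can reclaim the value
    q.Items = q.Items[1:]
    return item
}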
What are the use cases for buffered channels? If I want multiple parallel actions I could just use the default, synchronous channel, e.g.:
package main

import (
    "fmt"
    "time"
)

func longLastingProcess(c chan string) {
    time.Sleep(2000 * time.Millisecond)
    c <- "tadaa"
}

func main() {
    c := make(chan string)
    go longLastingProcess(c)
    go longLastingProcess(c)
    go longLastingProcess(c)
    fmt.Println(<-c)
}
What would be the practical cases for increasing the buffer size ?
To give a single, slightly-more-concrete use case:
Suppose you want your channel to represent a task queue, so that a task scheduler can send jobs into the queue, and a worker thread can consume a job by receiving it in the channel.
Suppose further that, though in general you expect each job to be handled in a timely fashion, it takes longer for a worker to complete a task than it does for the scheduler to schedule it.
Having a buffer allows the scheduler to deposit jobs in the queue and still remain responsive to user input (or network traffic, or whatever) because it does not have to sleep until the worker is ready each time it schedules a task. Instead, it goes about its business, and trusts the workers to catch up during a quieter period.
If you want an EVEN MORE CONCRETE example dealing with a specific piece of software then I'll see what I can do, but I hope this meets your needs.
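For illustration, a minimal sketch of that scheduler/worker pattern (the names, buffer size, and timings are all made up):

package main

import (
    "fmt"
    "time"
)

func main() {
    jobs := make(chan int, 8) // the buffer lets the scheduler stay responsive

    // worker: deliberately slower than the scheduler
    go func() {
        for j := range jobs {
            time.Sleep(100 * time.Millisecond) // simulate slow work
            fmt.Println("done:", j)
        }
    }()

    // scheduler: deposits jobs without blocking while the buffer has room
    for i := 0; i < 8; i++ {
        jobs <- i
        fmt.Println("scheduled:", i)
    }
    close(jobs)

    time.Sleep(time.Second) // crude wait for the worker; a sketch only
}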
Generally, buffering in channels is beneficial for performance reasons.
If a program is designed using an event-flow or data-flow approach, channels provide the means for the events to pass between one process and another (I use the term process in the same sense as in Tony Hoare's Communicating Sequential Processes (CSP), i.e. effectively synonymous with the goroutine).
There are times when a program needs its components to remain in lock-step synchrony. In this case, unbuffered channels are required.
Otherwise, it is typically beneficial to add buffering to the channels. This should be seen as an optimisation step (deadlock may still be possible if not designed out).
There are novel throttle structures made possible by using channels with small buffers (example).
There are special overwriting or lossy forms of channels used in occam and jcsp for fixing the special case of a cycle (or loop) of processes that would otherwise probably deadlock. This is also possible in Go by writing an overwriting goroutine buffer (example).
You should never add buffering merely to fix a deadlock. If your program deadlocks, it's far easier to fix by starting with zero buffering and thinking through the dependencies, then adding buffering once you know it won't deadlock.
You can construct goroutines compositionally - that is, a goroutine may itself contain goroutines. This is a feature of CSP and benefits scalability greatly. The internal channels between a group of goroutines are not of interest when designing the external use of the group as a self-contained component. This principle can be applied repeatedly at increasingly-larger scales.
If the receiver of the channel is always slower than the sender, a buffer of any size will eventually fill. That will leave you with a channel that pauses your goroutine as often as an unbuffered channel would, so you might as well use an unbuffered channel.
If the receiver is typically faster than the sender, except for an occasional burst, a buffered channel may be helpful; the buffer should be set to the size of the typical burst, which you can arrive at by measurement at runtime.
As an alternative to a buffered channel, it may be better to just send an array, or a struct containing an array, over the channel to deal with bursts/batches.
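A minimal sketch of that batching idea (the names are illustrative):

package main

import "fmt"

func main() {
    batches := make(chan []int, 1) // each send carries a whole burst
    done := make(chan struct{})

    go func() {
        for b := range batches {
            for _, v := range b {
                fmt.Println("processing", v)
            }
        }
        close(done)
    }()

    batches <- []int{1, 2, 3, 4} // one channel operation per burst
    close(batches)
    <-done
}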
Buffered channels are non-blocking for the sender as long as there's still room. This can increase responsiveness and throughput.
Sending several items on one buffered channel makes sure they are processed in the order in which they are sent.
From Effective Go (with example): "A buffered channel can be used like a semaphore, for instance to limit throughput." (Sketched below.)
In general, there are many use cases and patterns of channel usage, so this is not an exhaustive answer.
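A minimal sketch of the semaphore idea mentioned above (handle and process are illustrative names, not taken from Effective Go):

var sem = make(chan struct{}, 3) // at most 3 handlers run concurrently

func handle(task int) {
    sem <- struct{}{}        // acquire a slot; blocks when 3 are active
    defer func() { <-sem }() // release the slot
    process(task)
}

func process(task int) { /* do the actual work */ }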
It's a hard question, because the program is incorrect: it exits after receiving a value from one goroutine, but three were started. Buffering the channel makes no difference to that.
EDIT: For example, here is a bit of general discussion about channel buffers. And some exercise. And a book chapter about the same.