I have a Go server that takes input from a number of TCP clients that stream in data. The format is a custom one, and because the end delimiter could appear within the byte stream, it uses byte stuffing to get around this issue.
I am looking for hotspots in my code, and profiling throws up a huge one. I'm sure it could be made more efficient, but I'm not quite sure how at the moment given the functions Go provides.
The code is below, and pprof shows the hotspot to be the popPacketFromBuffer function. After each byte has been received, it looks at the current buffer and searches for an endDelimiter on its own. If there are two of them in a row, the delimiter is part of the packet data itself.
I did look at using ReadBytes() instead of ReadByte(), but it looks like I need to specify a delimiter, and I fear that this would cut off a packet mid-stream. In any case, would it be more efficient than what I am doing now?
Within the popPacketFromBuffer function it is the for loop that is the hotspot.
Any ideas?
// Read client data from channel
func (c *Client) listen() {
	reader := bufio.NewReader(c.conn)
	clientBuffer := new(bytes.Buffer)
	for {
		byte, err := reader.ReadByte()
		if err != nil {
			c.server.onClientConnectionClosed(c, err)
			return
		}
		wrErr := clientBuffer.WriteByte(byte)
		if wrErr != nil {
			log.Println("Write Error:", wrErr)
		}
		packet := popPacketFromBuffer(clientBuffer)
		if packet != nil {
			packetSize := uint64(len(packet))
			c.bytesReceived += packetSize
			packetBuffer := bytes.NewBuffer(packet)
			b, err := uncompress(packetBuffer.Bytes())
			if err != nil {
				log.Println("Unzip Error:", err)
			} else {
				c.server.onNewMessage(c, b)
			}
		}
	}
}
func popPacketFromBuffer(buffer *bytes.Buffer) []byte {
	bufferLength := buffer.Len()
	if bufferLength >= 125000 { // 125,000 bytes is 1 megabit (about 122 KiB), not 1 MB
		log.Println("Buffer is too large ", bufferLength)
		return nil
	}
	tempBuffer := buffer.Bytes()
	length := len(tempBuffer)
	// Return on zero length buffer submission
	if length == 0 {
		return nil
	}
	endOfPacket := -1
	// Determine the endOfPacket position by looking for an instance of our delimiter
	for i := 0; i < length-1; i++ {
		if tempBuffer[i] == endDelimiter {
			if tempBuffer[i+1] == endDelimiter {
				// Stuffed pair: skip the second delimiter
				i++
			} else {
				// We found a single delimiter, so consider this the end of a packet
				endOfPacket = i - 2
				break
			}
		}
	}
	if endOfPacket != -1 {
		// Grab the contents of the provided packet
		extractedPacket := buffer.Bytes()
		// Extract the last byte as we were super greedy with the read operation to check for stuffing
		carryByte := extractedPacket[len(extractedPacket)-1]
		// Clear the main buffer now we have extracted a packet from it
		buffer.Reset()
		// Add the carryByte over to our new buffer
		buffer.WriteByte(carryByte)
		// Ensure packet begins with a valid startDelimiter
		if extractedPacket[0] != startDelimiter {
			log.Println("Popped a packet without a valid start delimiter")
			return nil
		}
		// Remove the start and end caps
		slice := extractedPacket[1 : len(extractedPacket)-2]
		return deStuffPacket(slice)
	}
	return nil
}
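It looks like you call popPacketFromBuffer() after every single byte received from the connection, and each call rescans the whole buffer for delimiters from the start. (buffer.Bytes() itself does not copy; it returns a slice that shares the buffer's memory, so the repeated scan is the expensive part.) That is probably what overwhelms it. In my opinion you don't need the loop
for i := 0; i < length-1; i++ {
	if tempBuffer[i] == endDelimiter {
		if tempBuffer[i+1] == endDelimiter {
		} else {
			// We found a single delimiter, so consider this the end of a packet
			endOfPacket = i - 2
		}
	}
}
in popPacketFromBuffer(). Because the function runs after every byte, a packet can only have been completed by the most recent bytes, so instead of the loop, just testing the last two bytes
if (buffer[len(buffer)-2] == endDelimiter) && (buffer[len(buffer)-1] != endDelimiter) {
	// It's a packet
}
may be enough for the purpose. Note that the buffer must hold at least two bytes for this test, and that the test on its own also fires when a stuffed pair is followed by an ordinary byte, so you still need to remember whether the delimiter was the second half of a pair.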
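Another way to avoid the per-byte rescan is to let bufio.Scanner do the framing with a custom split function. The following is only a sketch under assumptions: endDelimiter is set to a placeholder value (the question does not show the real one), readPackets is a hypothetical helper that reads from a string where the server would pass c.conn, and the returned tokens still contain the start delimiter and the stuffed pairs, so they would still go through the start-delimiter check and deStuffPacket.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// endDelimiter is a placeholder; the question does not show the real value.
const endDelimiter byte = '~'

// splitStuffed is a bufio.SplitFunc that yields one packet per token.
// A doubled endDelimiter is payload data; a single one ends the packet.
func splitStuffed(data []byte, atEOF bool) (advance int, token []byte, err error) {
	for i := 0; i < len(data); i++ {
		if data[i] != endDelimiter {
			continue
		}
		if i+1 == len(data) && !atEOF {
			return 0, nil, nil // need the next byte to tell a stuffed pair from a real end
		}
		if i+1 < len(data) && data[i+1] == endDelimiter {
			i++ // stuffed pair: skip both bytes
			continue
		}
		return i + 1, data[:i], nil // single delimiter: the packet ends here
	}
	if atEOF {
		return len(data), nil, nil // discard an unterminated trailing fragment
	}
	return 0, nil, nil // no delimiter yet: ask for more data
}

// readPackets is a hypothetical helper; in the server the Scanner would wrap c.conn.
func readPackets(stream string) []string {
	sc := bufio.NewScanner(strings.NewReader(stream))
	sc.Split(splitStuffed)
	var packets []string
	for sc.Scan() {
		packets = append(packets, sc.Text())
	}
	return packets
}

func main() {
	fmt.Println(readPackets("ab~~c~de~")) // prints [ab~~c de]
}
```

The Scanner reads from the connection in larger blocks and only calls the split function when new data has arrived, so the scan no longer runs once per byte. By default a Scanner rejects tokens larger than 64 KiB; Scanner.Buffer can raise that limit if packets may be bigger.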
I'm trying to implement a function that skips lines containing a pattern in a long text file (ASCII guaranteed) in Go.
The functions below, withoutIgnore and withIgnore, both take a filename argument and return a *bytes.Buffer, which can subsequently be written to an io.Writer.
The withIgnore function takes an additional pattern argument and excludes lines containing the pattern from the output. The function works, but benchmarking shows it to be 5x slower than withoutIgnore. Is there a way it could be improved?
package main

import (
	"bufio"
	"bytes"
	"io"
	"log"
	"os"
)

func withoutIgnore(f string) (*bytes.Buffer, error) {
	rfd, err := os.Open(f)
	if err != nil {
		log.Fatal(err)
	}
	defer func() {
		if err := rfd.Close(); err != nil {
			log.Fatal(err)
		}
	}()
	inputBuffer := make([]byte, 1048576)
	var bytesRead int
	var bs []byte
	opBuffer := bytes.NewBuffer(bs)
	for {
		bytesRead, err = rfd.Read(inputBuffer)
		if err == io.EOF {
			return opBuffer, nil
		}
		if err != nil {
			return nil, err
		}
		_, err = opBuffer.Write(inputBuffer[:bytesRead])
		if err != nil {
			return nil, err
		}
	}
	return opBuffer, nil
}

func withIgnore(f, pattern string) (*bytes.Buffer, error) {
	rfd, err := os.Open(f)
	if err != nil {
		log.Fatal(err)
	}
	defer func() {
		if err := rfd.Close(); err != nil {
			log.Fatal(err)
		}
	}()
	scanner := bufio.NewScanner(rfd)
	var bs []byte
	buffer := bytes.NewBuffer(bs)
	for scanner.Scan() {
		if !bytes.Contains(scanner.Bytes(), []byte(pattern)) {
			_, err := buffer.WriteString(scanner.Text() + "\n")
			if err != nil {
				return nil, err
			}
		}
	}
	return buffer, nil
}

func main() {
	// buff, err := withoutIgnore("base64dump.log")
	buff, err := withIgnore("base64dump.log", "AUDIT")
	if err != nil {
		log.Fatal(err)
	}
	_, err = buff.WriteTo(os.Stdout)
	if err != nil {
		log.Fatal(err)
	}
}
Benchmark test
package main

import "testing"

func BenchmarkTestWithoutIgnore(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_, err := withoutIgnore("base64dump.log")
		if err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkTestWithIgnore(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_, err := withIgnore("base64dump.log", "AUDIT")
		if err != nil {
			b.Fatal(err)
		}
	}
}
and the "base64dump.log" can be generated in the command line using
base64 /dev/urandom | head -c 10000000 > base64dump.log
Since ASCII is guaranteed, one can work directly at byte level.
Still if one checks each byte for line breaks when reading the input and then searches for the pattern again within the line, operations are applied to each byte.
If, on the other hand, one reads chunks of the input and performs an optimized search for the pattern in the text, not even examining each input byte, one minimizes the operations per input byte.
For example, there is the Boyer-Moore string search algorithm. Go's built-in bytes.Index function is also optimized. The achieved speed depends, of course, on the input data and the actual pattern. For the input specified in the question, bytes.Index turned out to be significantly more performant when measured.
The procedure is as follows:
- Read in a chunk. The chunk size should be significantly longer than the maximum line length; a value >= 64KB should probably be good. In the test, 1MB was used, as in the question.
- A chunk usually doesn't end at a linefeed, so search backwards from the end of the chunk for the last linefeed, limit the pattern search to the slice up to that point, and remember the remaining data for the next pass.
- The last chunk does not necessarily end in a linefeed, so it is processed in full.
- With the help of the performant Go function bytes.Index, find the places where the pattern occurs in the chunk.
- From each location found, search for the preceding and the following linefeed.
- Output the block up to the beginning of that line.
- Continue the search from the end of the line in which the pattern occurred.
- If the search does not find another location, output the rest.
- Read the next chunk and apply the described steps again until the end of the file is reached.
A read operation may return less data than the chunk size, so it makes sense to repeat the read operation until a full chunk of data has been read.
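As a side note on repeating the read until the chunk is full: the standard library's io.ReadFull performs that repetition. The sketch below is only an illustration of that one step, not the code measured in this answer; fillChunk and chunkSizes are hypothetical names, and iotest.OneByteReader simulates a reader that hands out a single byte per Read call.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"testing/iotest"
)

// fillChunk reads until chunk is full or the input ends and reports how many
// bytes were stored. A short final chunk is reported together with io.EOF.
func fillChunk(r io.Reader, chunk []byte) (int, error) {
	n, err := io.ReadFull(r, chunk)
	if err == io.ErrUnexpectedEOF {
		err = io.EOF // the input ended part-way through the chunk
	}
	return n, err
}

// chunkSizes splits s into chunks of chunkSize bytes, reading through a
// reader that returns only one byte per Read, and returns the chunk lengths.
func chunkSizes(s string, chunkSize int) []int {
	r := iotest.OneByteReader(strings.NewReader(s))
	chunk := make([]byte, chunkSize)
	var sizes []int
	for {
		n, err := fillChunk(r, chunk)
		if n > 0 {
			sizes = append(sizes, n)
		}
		if err != nil {
			return sizes
		}
	}
}

func main() {
	fmt.Println(chunkSizes("0123456789", 4)) // prints [4 4 2]
}
```

Even though every underlying Read delivers one byte, each chunk except the last comes back full.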
Optimized code is often significantly more complicated, but the performance is also significantly better, as we will see in a moment.
BenchmarkTestWithoutIgnore-8 270 4137267 ns/op
BenchmarkTestWithIgnore-8 54 22403931 ns/op
BenchmarkTestFilter-8 150 7947454 ns/op
Here, the optimized code BenchmarkTestFilter-8 is only about 1.9x slower than the operation without filtering while the BenchmarkTestWithIgnore-8 method is 5.4x slower than the comparison value without filtering.
Looked at another way: the optimized code is 2.8 times faster than the unoptimized one.
Of course, here is the code for your own tests:
func filterFile(f, pattern string) (*bytes.Buffer, error) {
	rfd, err := os.Open(f)
	if err != nil {
		log.Fatal(err)
	}
	defer func() {
		if err := rfd.Close(); err != nil {
			log.Fatal(err)
		}
	}()
	reader := bufio.NewReader(rfd)
	return filter(reader, []byte(pattern), 1024*1024)
}

// chunkSize must be larger than the longest line
// a reasonable size is probably >= 64K
func filter(reader io.Reader, pattern []byte, chunkSize int) (*bytes.Buffer, error) {
	var bs []byte
	buffer := bytes.NewBuffer(bs)
	chunk := make([]byte, chunkSize)
	var remaining []byte
	for lastChunk := false; !lastChunk; {
		n, err := readChunk(reader, chunk, remaining, chunkSize)
		if err != nil {
			if err == io.EOF {
				lastChunk = true
			} else {
				return nil, err
			}
		}
		// Cut the chunk at its last linefeed and carry the tail over to the next pass
		remaining = remaining[:0]
		if !lastChunk {
			for i := n - 1; i > 0; i-- {
				if chunk[i] == '\n' {
					remaining = append(remaining, chunk[i+1:n]...)
					n = i + 1
					break
				}
			}
		}
		s := 0
		for s < n {
			hit := bytes.Index(chunk[s:n], pattern)
			if hit < 0 {
				break
			}
			hit += s
			// Find the start of the line containing the hit
			startOfLine := hit
			for ; startOfLine > 0; startOfLine-- {
				if chunk[startOfLine] == '\n' {
					startOfLine++
					break
				}
			}
			// Find the position just past the end of that line
			endOfLine := hit + len(pattern)
			for ; endOfLine < n; endOfLine++ {
				if chunk[endOfLine] == '\n' {
					endOfLine++
					break
				}
			}
			_, err = buffer.Write(chunk[s:startOfLine])
			if err != nil {
				return nil, err
			}
			s = endOfLine
		}
		if s < n {
			_, err = buffer.Write(chunk[s:n])
			if err != nil {
				return nil, err
			}
		}
	}
	return buffer, nil
}

func readChunk(reader io.Reader, chunk, remaining []byte, chunkSize int) (int, error) {
	copy(chunk, remaining)
	r := len(remaining)
	for r < chunkSize {
		n, err := reader.Read(chunk[r:])
		r += n
		if err != nil {
			return r, err
		}
	}
	return r, nil
}
And the benchmark part might look something like this:
func BenchmarkTestFilter(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_, err := filterFile("base64dump.log", "AUDIT")
		if err != nil {
			b.Fatal(err)
		}
	}
}
The filter function was split, and the actual work is done in func filter(reader io.Reader, pattern []byte, chunkSize int) (*bytes.Buffer, error).
Injecting a reader and a chunkSize makes the function easy to unit test. Such tests are not included here, but they are definitely recommended for code that does this much index arithmetic.
However, the main point here was to find a way to significantly improve the performance.
I need to hash very large files (>10TB files). So I decided to hash 128KB per MB.
My idea is to divide the file into 1MB blocks and hash only the first 128KB of each block.
The following code works, but it uses insane amounts of memory and I can't tell why...
func partialMD5Hash(filePath string) string {
	var blockSize int64 = 1024 * 1024
	var sampleSize int64 = 1024 * 128
	file, err := os.Open(filePath)
	if err != nil {
		return "ERROR"
	}
	defer file.Close()
	fileInfo, _ := file.Stat()
	fileSize := fileInfo.Size()
	hash := md5.New()
	var i int64
	for i = 0; i < fileSize / blockSize; i++ {
		sample := make([]byte, sampleSize)
		_, err = file.Read(sample)
		if err != nil {
			return "ERROR"
		}
		hash.Write(sample)
		_, err := file.Seek(blockSize-sampleSize, 1)
		if err != nil {
			return "ERROR"
		}
	}
	return hex.EncodeToString(hash.Sum(nil))
}
Any help will be appreciated!
There are several problems with the approach, and with the program.
If you want to hash a large file, you have to hash all of it. Sampling parts of the file will not detect modifications to the parts you didn't sample.
You are allocating a new buffer for every iteration. Instead, allocate one buffer outside the for-loop, and reuse it.
Also, you seem to be ignoring how many bytes were actually read. So:
block := make([]byte, blockSize)
for {
	n, err := file.Read(block)
	if n > 0 {
		// Hash only the bytes that were actually read
		hash.Write(block[:n])
	}
	if err == io.EOF {
		break
	}
	if err != nil {
		return "ERROR"
	}
}
However, the following would be much more concise:
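A concise form, presumably along the lines meant here, is to let io.Copy move the data into the hash, since a hash.Hash is an io.Writer. This is a minimal sketch; hashReader and hashString are hypothetical names, and for a file you would pass the *os.File returned by os.Open to hashReader.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// hashReader returns the hex-encoded MD5 of everything r yields.
// io.Copy uses its own fixed-size buffer, so memory use does not grow with the input.
func hashReader(r io.Reader) (string, error) {
	h := md5.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// hashString is a small convenience wrapper used for the demonstration.
func hashString(s string) (string, error) {
	return hashReader(strings.NewReader(s))
}

func main() {
	sum, err := hashString("hello")
	fmt.Println(sum, err) // prints 5d41402abc4b2a76b9719d911017c592 <nil>
}
```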
I want to use inotify to watch for file changes, and epoll to monitor whether any inotify event has occurred. However, I am having some trouble receiving inotify events: how do I know whether I have read every event that occurred?
If the inotify fd is set to non-blocking mode, read() returns EAGAIN to indicate that there is nothing left to read, which is nice. But if the inotify fd is in blocking mode, read() blocks indefinitely.
For example, if you
call read(2) by asking to read a certain amount of data and
read(2) returns a lower number of bytes, you can be sure of
having exhausted the read I/O space for the file descriptor.
Ref: epoll(7)
According to epoll(7), I can be sure of a complete read if read() returns fewer bytes than requested. But how should I deal with read() returning exactly the number of bytes requested?
Here's the code I've tried.
func main() {
fd, _ := unix.InotifyInit()
epollfd, _ := unix.EpollCreate(1)
unix.EpollCtl(epollfd, unix.EPOLL_CTL_ADD, fd, &unix.EpollEvent{
Fd: int32(fd),
Events: unix.EPOLLIN,
unix.InotifyAddWatch(fd, os.Args[1], unix.IN_ALL_EVENTS)
epollevents := make([]unix.EpollEvent, 8)
for {
// epoll wait
var nepoll int
if nepoll, err = unix.EpollWait(epollfd, epollevents, -1); err == nil {
// nothing happens
} else if errors.Is(err, unix.EINTR) {
continue // ignore interrupt
} else {
// process inotify events
for i := 0; i < nepoll; i += 1 {
var nr int
// buff for receiving
// eventBuf for processing
// two buff to prevent a large amount of data
eventBuff := make([]byte, (unix.SizeofInotifyEvent+unix.PathMax)*2)
buff := make([]byte, unix.SizeofInotifyEvent+unix.PathMax)
offset, nr, lastUnread := 0, 0, 0
for {
// read loop in case of a large amount of data
if nr, err = unix.Read(int(epollevents[i].Fd), buff); err == nil {
// nothing happens
} else if errors.Is(err, unix.EAGAIN) {
} else {
copy(eventBuff[lastUnread:lastUnread+nr], buff[:nr])
lastUnread += nr
for offset < lastUnread &&
(nr < len(buff) || lastUnread-offset >= unix.PathMax+unix.SizeofInotifyEvent) {
event := (*unix.InotifyEvent)(unsafe.Pointer(&eventBuff[offset]))
switch event.Mask {
case unix.IN_ACCESS:
// ...
case unix.IN_OPEN:
// ...
case unix.IN_CLOSE_WRITE:
// ...
// ...
offset += unix.SizeofInotifyEvent + int(event.Len)
copy(eventBuff[offset:lastUnread], eventBuff[:lastUnread-offset])
lastUnread -= offset
offset = 0
fmt.Println("Over") // NEVER got print out in BLOCKING mode
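For the non-blocking case described above, the usual pattern is to keep calling read() until it reports EAGAIN, which sidesteps the question of short versus full reads. The following is only a sketch under assumptions: it is Linux-only, it uses the standard syscall package rather than golang.org/x/sys/unix, and a non-blocking pipe stands in for the inotify descriptor so that the example is self-contained. drain and drainPipeDemo are hypothetical names.

```go
package main

import (
	"fmt"
	"syscall"
)

// drain reads from a non-blocking descriptor until the kernel reports EAGAIN,
// passing every chunk to handle. It returns the total number of bytes read.
func drain(fd int, buf []byte, handle func([]byte)) (int, error) {
	total := 0
	for {
		n, err := syscall.Read(fd, buf)
		switch {
		case err == syscall.EINTR:
			continue // interrupted by a signal: retry
		case err == syscall.EAGAIN:
			return total, nil // nothing left to read right now
		case err != nil:
			return total, err
		case n == 0:
			return total, nil // end of file
		}
		handle(buf[:n])
		total += n
	}
}

// drainPipeDemo writes msg into a non-blocking pipe and drains it again
// through a deliberately small buffer.
func drainPipeDemo(msg string) (string, error) {
	var p [2]int
	if err := syscall.Pipe2(p[:], syscall.O_NONBLOCK); err != nil {
		return "", err
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])
	if _, err := syscall.Write(p[1], []byte(msg)); err != nil {
		return "", err
	}
	var got []byte
	_, err := drain(p[0], make([]byte, 4), func(b []byte) { got = append(got, b...) })
	return string(got), err
}

func main() {
	fmt.Println(drainPipeDemo("hello world"))
}
```

With a real inotify descriptor the buffer has to be large enough for at least one complete event (the event header plus the longest possible name), because inotify hands out only whole events and rejects a read into a buffer that is too small for the next one.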
I'm reading values from a channel in a loop like this:
for {
	capturedFrame := <-capturedFrameChan
	// process capturedFrame
}
To make it more efficient, I would like to read these values in a batch, with something like this (pseudo-code):
for {
	capturedFrames := <-capturedFrameChan
	// process the whole batch
}
But I'm not sure how to do that. If I call capturedFrames := <-capturedFrameChan multiple times, it's going to block.
Basically, what I would like is to read all the available values in capturedFrameChan and, if none is available, block as usual.
What would be the way to accomplish this in Go?
Something like this should work:
for {
	// We initialize our slice. You may want to give it a larger capacity to avoid multiple memory allocations on `append`
	capturedFrames := make([]Frame, 1)
	// We block waiting for a first frame
	capturedFrames[0] = <-capturedFrameChan
forLoop:
	for {
		select {
		case buf := <-capturedFrameChan:
			// if more frames are immediately available, we add them to our slice
			capturedFrames = append(capturedFrames, buf)
		default:
			// else we move on without blocking
			break forLoop
		}
	}
	// process capturedFrames here
}
Try this (for a channel ch of type T):
for firstItem := range ch { // Ensures that no batch can be empty
	var itemsBatch []T
	itemsBatch = append(itemsBatch, firstItem)
Remaining:
	for len(itemsBatch) < BATCHSIZE { // Limits the maximum size of a batch
		select {
		case item := <-ch:
			itemsBatch = append(itemsBatch, item)
		default:
			break Remaining
		}
	}
	// Consume itemsBatch here...
}
But if BATCHSIZE is a constant, this code would be more efficient:
var i int
itemsBatch := [BATCHSIZE]T{}
for firstItem := range ch { // Ensures that no batch can be empty
	itemsBatch[0] = firstItem
Remaining:
	for i = 1; i < BATCHSIZE; i++ { // Limits the maximum size of a batch
		select {
		case itemsBatch[i] = <-ch:
		default:
			break Remaining
		}
	}
	// Now itemsBatch[:i] holds the batch, with i <= BATCHSIZE;
	// consume it here...
}
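The block-then-drain pattern from these answers can also be wrapped in a reusable function. The following is a sketch, assuming Go 1.18 or later for generics; readBatch and batchSizes are hypothetical names, and unlike the snippets above it also checks whether the channel has been closed, so zero values are not appended after closure.

```go
package main

import "fmt"

// readBatch blocks until one value is available, then collects whatever else
// is immediately ready, up to max values. ok is false once ch is closed and empty.
func readBatch[T any](ch <-chan T, max int) (batch []T, ok bool) {
	first, open := <-ch
	if !open {
		return nil, false
	}
	batch = append(make([]T, 0, max), first)
	for len(batch) < max {
		select {
		case v, open := <-ch:
			if !open {
				return batch, true // deliver what we have; the next call reports closure
			}
			batch = append(batch, v)
		default:
			return batch, true // nothing more is ready right now
		}
	}
	return batch, true
}

// batchSizes queues n values, closes the channel and returns the size of
// every batch that readBatch produces.
func batchSizes(n, max int) []int {
	ch := make(chan int, n)
	for i := 0; i < n; i++ {
		ch <- i
	}
	close(ch)
	var sizes []int
	for {
		b, ok := readBatch(ch, max)
		if !ok {
			return sizes
		}
		sizes = append(sizes, len(b))
	}
}

func main() {
	fmt.Println(batchSizes(5, 3)) // prints [3 2]
}
```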
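By using len(capturedFrames), which reports how many values are queued in a buffered channel, you can do it like below. Read the length once before the loop, because it shrinks with every receive, and note that this is only safe when there is a single consumer:
for {
	select {
	case frame := <-capturedFrames:
		frames := []Frame{frame}
		n := len(capturedFrames)
		for i := 0; i < n; i++ {
			frames = append(frames, <-capturedFrames)
		}
		// process frames here
	}
}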
You could also simply benchmark
for {
	capturedFrame := <-capturedFrameChan
	go remoteCopy(capturedFrame)
}
without any refactoring of the codebase, to see whether it increases efficiency.
I ended up doing it as below. Basically, I use len(capturedFrames) to find out how many frames are available, then retrieve them in a loop:
for {
	var paths []string
	itemCount := len(capturedFrames)
	if itemCount <= 0 {
		time.Sleep(50 * time.Millisecond)
		continue
	}
	for i := 0; i < itemCount; i++ {
		f := <-capturedFrames
		paths = append(paths, f)
	}
	err := multipleRemoteCopy(paths, opts)
	if err != nil {
		fmt.Printf("Error: could not remote copy \"%s\": %s", paths, err)
	}
}
I'm trying to compress a file from a buffered reader and pass the compressed bytes through a byte channel, but with poor results :). Here's what I have come up with so far; obviously this doesn't work...
func Compress(r io.Reader) (<-chan byte) {
	c := make(chan byte)
	go func(){
		var wBuff bytes.Buffer
		rBuff := make([]byte, 1024)
		writer := zlib.NewWriter(*wBuff)
		for {
			n, err := r.Read(rBuff)
			if err != nil && err != io.EOF { panic(err) }
			if n == 0 { break }
			writer.Write(rBuff) // Compress and write compressed data
			// How to send written compressed bytes through channel?
			// as far as I understand wBuff will eventually contain
			// whole compressed data?
		}
		close(c) // Indicate that no more data follows
	}()
	return c
}
Please bear with me, as I'm very new to Go
I suggest using []byte instead of byte as the channel's element type. It is more efficient. Because of concurrent memory accesses, it may be necessary to send a copy of the buffer through the channel rather than sending the []byte buffer itself.
You can define a type ChanWriter chan []byte and let it implement the io.Writer interface. Then pass the ChanWriter to zlib.NewWriter.
You can create a goroutine for doing the compression and then immediately return the ChanWriter's channel from your Compress function. If there is no goroutine then there is no reason for the function to return a channel and the preferred return type is io.Reader.
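To be able to report errors to the receiver, the return type of the Compress function should be changed into something like <-chan BytesWithError, where BytesWithError would be a type you define yourself that carries a []byte together with an error. In this case ChanWriter can be defined as type ChanWriter chan BytesWithError.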
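The ChanWriter idea could look roughly like this. It is a sketch under assumptions: it uses the plain chan []byte variant, reduces error handling to panics to stay short, and roundTrip is a hypothetical helper that exists only to show that the chunks decompress back to the input.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
	"strings"
)

// ChanWriter adapts a channel of byte slices to io.Writer.
type ChanWriter chan []byte

// Write sends a copy of p, because the caller may reuse p after Write returns.
func (w ChanWriter) Write(p []byte) (int, error) {
	b := make([]byte, len(p))
	copy(b, p)
	w <- b
	return len(p), nil
}

// Compress streams zlib-compressed chunks of r over the returned channel
// and closes the channel when the input is exhausted.
func Compress(r io.Reader) <-chan []byte {
	c := make(ChanWriter)
	go func() {
		defer close(c)
		zw := zlib.NewWriter(c)
		if _, err := io.Copy(zw, r); err != nil {
			panic(err)
		}
		if err := zw.Close(); err != nil { // flushes the remaining compressed data
			panic(err)
		}
	}()
	return c
}

// roundTrip compresses s through the channel and decompresses the result.
func roundTrip(s string) (string, error) {
	var compressed bytes.Buffer
	for chunk := range Compress(strings.NewReader(s)) {
		compressed.Write(chunk)
	}
	zr, err := zlib.NewReader(&compressed)
	if err != nil {
		return "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	return string(out), err
}

func main() {
	fmt.Println(roundTrip("hello, hello, hello"))
}
```

The receiver has to drain the channel until it is closed; otherwise the compressing goroutine stays blocked on its send.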
Sending bytes one by one down a channel is not going to be particularly efficient. Another approach that may be more useful would be to return an object implementing the io.Reader interface, implementing the Read() method by reading a block from a original io.Reader and compressing its output before returning it.
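One way to get such an io.Reader without writing a Read method by hand is io.Pipe, which is a different technique from the block-by-block Read described above: a goroutine compresses into the write end of the pipe and the caller reads from the other end. A minimal sketch, with CompressReader and roundTrip as hypothetical names:

```go
package main

import (
	"compress/zlib"
	"fmt"
	"io"
	"strings"
)

// CompressReader returns a reader that yields the zlib-compressed form of r.
func CompressReader(r io.Reader) io.Reader {
	pr, pw := io.Pipe()
	go func() {
		zw := zlib.NewWriter(pw)
		_, err := io.Copy(zw, r)
		if err == nil {
			err = zw.Close() // flush the buffered compressed data
		}
		// With a nil err the read side sees io.EOF; otherwise it sees err.
		pw.CloseWithError(err)
	}()
	return pr
}

// roundTrip compresses s via CompressReader and decompresses it again.
func roundTrip(s string) (string, error) {
	zr, err := zlib.NewReader(CompressReader(strings.NewReader(s)))
	if err != nil {
		return "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	return string(out), err
}

func main() {
	fmt.Println(roundTrip("hello, hello, hello"))
}
```

Errors from the source reader reach the consumer through the pipe instead of a panic. If the consumer stops reading early, the goroutine stays blocked unless the read end is closed, so a production version would probably return an io.ReadCloser.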
Your writer.Write(rBuff) statement always writes len(rBuff) bytes, even when n != len(rBuff).
Also, your Read loop is
for {
	n, err := r.Read(rBuff)
	if err != nil && err != io.EOF {
		panic(err)
	}
	if n == 0 {
		break
	}
	// ...
}
which is equivalent to
for {
	n, err := r.Read(rBuff)
	if err != nil && err != io.EOF {
		panic(err)
	}
	// !(err != nil && err != io.EOF)
	// !(err != nil) || !(err != io.EOF)
	// err == nil || err == io.EOF
	if err == nil || err == io.EOF {
		if n == 0 {
			break
		}
	}
	// ...
}
The loop exits prematurely if err == nil && n == 0.
Instead, write
for {
	n, err := r.Read(rBuff)
	if err != nil {
		if err != io.EOF {
			panic(err)
		}
		if n == 0 {
			break
		}
	}
	// ...
}
OK, I've found a working solution (feel free to point out where it can be improved, or whether I'm doing something wrong):
func Compress(r io.Reader) (<-chan byte) {
	c := make(chan byte)
	go func(){
		var wBuff bytes.Buffer
		rBuff := make([]byte, 1024)
		writer := zlib.NewWriter(&wBuff)
		for {
			n, err := r.Read(rBuff)
			if err != nil {
				if err != io.EOF {
					panic(err)
				}
				if n == 0 {
					break
				}
			}
			writer.Write(rBuff[:n]) // Compress only the bytes actually read
			// Send whatever compressed output is available so far
			for _, v := range wBuff.Bytes() {
				c <- v
			}
			wBuff.Truncate(0)
		}
		writer.Close() // Flush the remaining compressed data
		for _, v := range wBuff.Bytes() {
			c <- v
		}
		close(c) // Indicate that no more data follows
	}()
	return c
}