Efficient way to use a csv.Reader() for a "chan string" - go

I have a "chan string", where each entry is a CSV log line that I would like to convert to columns "[]string", currently I am (un-efficiently) creating a csv.NewReader(strings.NewReader(i)) for each item, which looks a lot more work than it really needs to be:
for i := range feederChan {
    r := csv.NewReader(strings.NewReader(i))
    a, err := r.Read()
    if err != nil {
        // log error...
        continue
    }
    // then do stuff with 'a'
    // ...
}
So I'd really appreciate it if someone could share a more efficient way to do this, like creating the csv.Reader once and then feeding it the chan content somehow (streaming the chan content to something that implements the io.Reader interface?).

Use the following to convert a channel of strings to a reader:
type chanReader struct {
    c   chan string
    buf string
}

func (r *chanReader) Read(p []byte) (int, error) {
    // Fill the buffer when we have no data to return to the caller
    if len(r.buf) == 0 {
        var ok bool
        r.buf, ok = <-r.c
        if !ok {
            // Return EOF on channel closed
            return 0, io.EOF
        }
    }
    n := copy(p, r.buf)
    r.buf = r.buf[n:]
    return n, nil
}
Use it like this:
r := csv.NewReader(&chanReader{c: feederChan})
for {
    a, err := r.Read()
    if err != nil {
        // handle error, break out of loop
    }
    // do something with a
}
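For illustration, a minimal runnable sketch of the adapter in action, assuming the values sent on the channel already end in a newline so the CSV reader can find record boundaries:

package main

import (
    "encoding/csv"
    "fmt"
    "io"
)

type chanReader struct {
    c   chan string
    buf string
}

func (r *chanReader) Read(p []byte) (int, error) {
    if len(r.buf) == 0 {
        var ok bool
        r.buf, ok = <-r.c
        if !ok {
            return 0, io.EOF
        }
    }
    n := copy(p, r.buf)
    r.buf = r.buf[n:]
    return n, nil
}

func main() {
    feederChan := make(chan string)
    go func() {
        defer close(feederChan)
        feederChan <- "a,b,c\n"
        feederChan <- "1,2,3\n"
    }()
    r := csv.NewReader(&chanReader{c: feederChan})
    for {
        a, err := r.Read()
        if err == io.EOF {
            break
        }
        if err != nil {
            fmt.Println("error:", err)
            break
        }
        fmt.Println(a) // prints [a b c], then [1 2 3]
    }
}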
If the application assumes that newlines separate the values received from the channel, then append a newline to each value received:
...
var ok bool
r.buf, ok = <-r.c
if !ok {
    // Return EOF on channel closed
    return 0, io.EOF
}
r.buf += "\n"
...
The += "\n" copies the string. If this does not meet the application's efficiency requirements, then introduce a new field to manage line separators.
type chanReader struct {
    c   chan string // source of lines
    buf string      // the current line
    nl  bool        // true if line separator is pending
}

func (r *chanReader) Read(p []byte) (int, error) {
    // Fill the buffer when we have no data to return to the caller
    if len(r.buf) == 0 && !r.nl {
        var ok bool
        r.buf, ok = <-r.c
        if !ok {
            // Return EOF on channel closed
            return 0, io.EOF
        }
        r.nl = true
    }
    // Return data if we have it
    if len(r.buf) > 0 {
        n := copy(p, r.buf)
        r.buf = r.buf[n:]
        return n, nil
    }
    // No data, return the line separator
    n := copy(p, "\n")
    r.nl = n == 0
    return n, nil
}
Another approach is to use an io.Pipe and a goroutine to convert the channel to an io.Reader, as suggested in a comment to the question. A first pass at this approach is:
var nl = []byte("\n")
func createChanReader(c chan string) io.Reader {
r, w := io.Pipe()
go func() {
defer w.Close()
for s := range c {
io.WriteString(w, s)
w.Write(nl)
}
}
}()
return r
}
Use it like this:
r := csv.NewReader(createChanReader(feederChan))
for {
    a, err := r.Read()
    if err != nil {
        // handle error, break out of loop
    }
    // do something with a
}
This first pass at the io.Pipe solution leaks a goroutine when the application exits the loop before reading the pipe to EOF. The application might break out early because the CSV reader detected a syntax error, the application panicked because of a programmer error, or any number of other reasons.
To fix the goroutine leak, exit the writing goroutine on write error and close the pipe reader when done reading.
var nl = []byte("\n")

func createChanReader(c chan string) *io.PipeReader {
    r, w := io.Pipe()
    go func() {
        defer w.Close()
        for s := range c {
            if _, err := io.WriteString(w, s); err != nil {
                return
            }
            if _, err := w.Write(nl); err != nil {
                return
            }
        }
    }()
    return r
}
Use it like this:
cr := createChanReader(feederChan)
defer cr.Close() // Required for goroutine cleanup
r := csv.NewReader(cr)
for {
    a, err := r.Read()
    if err != nil {
        // handle error, break out of loop
    }
    // do something with a
}

Even though "ThunderCat's" answer was really useful and appreciated, I ended up using io.Pipe() "as mh-cbon mentioned" which is much simpler and looks like more efficient (explained below):
rp, wp := io.Pipe()
go func() {
    defer wp.Close()
    for i := range feederChan {
        fmt.Fprintln(wp, i)
    }
}()
r := csv.NewReader(rp)
for { // keep reading
    a, err := r.Read()
    if err == io.EOF {
        break
    }
    // (non-EOF errors should also be handled here)
    // do stuff with 'a'
    // ...
}
io.Pipe() is synchronous and should be fairly efficient: it pipes data from a writer to a reader. I fed csv.NewReader() the reader end and created a goroutine that drains the chan, writing to the writer end.
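To see that synchronous hand-off in isolation, a small self-contained sketch: each write to the pipe blocks until the reader consumes it (the sleeps only make the ordering visible).

package main

import (
    "fmt"
    "io"
    "time"
)

func main() {
    r, w := io.Pipe()
    go func() {
        defer w.Close()
        fmt.Println("writing...")
        io.WriteString(w, "hello") // blocks here until the Read below happens
        fmt.Println("write completed")
    }()
    time.Sleep(time.Second) // simulate a slow reader; the writer stays blocked
    buf := make([]byte, 8)
    n, _ := r.Read(buf)
    fmt.Println("read:", string(buf[:n]))
    time.Sleep(100 * time.Millisecond) // give the goroutine time to print
}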
Thanks a lot.
EDIT: ThunderCat added the io.Pipe approach to his answer (after I posted this I guess) ... his answer is much more comprehensive and was accepted as such.

Related

Ignore a line containing a pattern from a long text file in Go

I'm trying to implement a function that ignores lines containing a pattern from a long text file (ASCII guaranteed) in Go.
The two functions below, withoutIgnore and withIgnore, both take a filename argument and return a *bytes.Buffer, which can subsequently be written to an io.Writer.
The withIgnore function takes an additional argument, pattern, to exclude lines containing the pattern from the file. The function works, but benchmarking found it to be 5x slower than withoutIgnore. Is there a way it could be improved?
package main

import (
    "bufio"
    "bytes"
    "io"
    "log"
    "os"
)

func withoutIgnore(f string) (*bytes.Buffer, error) {
    rfd, err := os.Open(f)
    if err != nil {
        log.Fatal(err)
    }
    defer func() {
        if err := rfd.Close(); err != nil {
            log.Fatal(err)
        }
    }()
    inputBuffer := make([]byte, 1048576)
    var bytesRead int
    var bs []byte
    opBuffer := bytes.NewBuffer(bs)
    for {
        bytesRead, err = rfd.Read(inputBuffer)
        if err == io.EOF {
            return opBuffer, nil
        }
        if err != nil {
            return nil, err
        }
        _, err = opBuffer.Write(inputBuffer[:bytesRead])
        if err != nil {
            return nil, err
        }
    }
}

func withIgnore(f, pattern string) (*bytes.Buffer, error) {
    rfd, err := os.Open(f)
    if err != nil {
        log.Fatal(err)
    }
    defer func() {
        if err := rfd.Close(); err != nil {
            log.Fatal(err)
        }
    }()
    scanner := bufio.NewScanner(rfd)
    var bs []byte
    buffer := bytes.NewBuffer(bs)
    for scanner.Scan() {
        if !bytes.Contains(scanner.Bytes(), []byte(pattern)) {
            _, err := buffer.WriteString(scanner.Text() + "\n")
            if err != nil {
                return nil, err
            }
        }
    }
    return buffer, nil
}

func main() {
    // buff, err := withoutIgnore("base64dump.log")
    buff, err := withIgnore("base64dump.log", "AUDIT")
    if err != nil {
        log.Fatal(err)
    }
    _, err = buff.WriteTo(os.Stdout)
    if err != nil {
        log.Fatal(err)
    }
}
Benchmark test
package main

import "testing"

func BenchmarkTestWithoutIgnore(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _, err := withoutIgnore("base64dump.log")
        if err != nil {
            b.Fatal(err)
        }
    }
}

func BenchmarkTestWithIgnore(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _, err := withIgnore("base64dump.log", "AUDIT")
        if err != nil {
            b.Fatal(err)
        }
    }
}
and the "base64dump.log" can be generated in the command line using
base64 /dev/urandom | head -c 10000000 > base64dump.log
Since ASCII is guaranteed, one can work directly at byte level.
Still, if one checks each byte for line breaks while reading the input and then searches for the pattern again within the line, operations are applied to every byte.
If, on the other hand, one reads chunks of the input and performs an optimized search for the pattern in the text, not even examining each input byte, one minimizes the operations per input byte.
For example, there is the Boyer-Moore string search algorithm, and Go's built-in bytes.Index function is also optimized. The achieved speed depends, of course, on the input data and the actual pattern. For the input as specified in the question, bytes.Index turned out to be significantly more performant when measured.
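To make this concrete before the procedure below, a tiny sketch (made-up data, no handling for a missing pattern) of locating the line that contains the pattern within a chunk:

package main

import (
    "bytes"
    "fmt"
)

func main() {
    chunk := []byte("keep this line\ndrop the AUDIT line\nkeep this too\n")
    pattern := []byte("AUDIT")

    // bytes.Index uses an optimized search instead of
    // examining every input byte in our own loop.
    hit := bytes.Index(chunk, pattern)

    // Find the enclosing line by searching for the surrounding linefeeds.
    start := bytes.LastIndexByte(chunk[:hit], '\n') + 1
    end := hit + bytes.IndexByte(chunk[hit:], '\n')
    fmt.Printf("line to drop: %q\n", chunk[start:end])
}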
Procedure
read in a chunk; the chunk size should be significantly larger than the maximum line length. A value >= 64KB is probably good; in the test, 1MB was used, as in the question
a chunk usually doesn't end at a linefeed, so search backwards from the end of the chunk to the nearest linefeed, limit processing to this slice, and remember the remaining data for the next pass
the last chunk does not necessarily end in a linefeed
with the help of the performant Go function bytes.Index, find the places where the pattern occurs in the chunk
from each found location, search for the preceding and the following linefeed
then output the block up to the corresponding beginning of the line
and continue the search from the end of the line in which the pattern occurred
if the search does not find another location, output the rest
read the next chunk and apply the described steps again until the end of the file is reached
Noteworthy
A read operation may return less data than the chunk size, so it makes sense to repeat the read operation until chunkSize bytes have been read.
Benchmark
Optimized code is often significantly more complicated, but the performance is also significantly better, as we will see in a moment.
BenchmarkTestWithoutIgnore-8 270 4137267 ns/op
BenchmarkTestWithIgnore-8 54 22403931 ns/op
BenchmarkTestFilter-8 150 7947454 ns/op
Here, the optimized code BenchmarkTestFilter-8 is only about 1.9x slower than the operation without filtering while the BenchmarkTestWithIgnore-8 method is 5.4x slower than the comparison value without filtering.
Looked at another way: the optimized code is 2.8 times faster than the unoptimized one.
Code
Of course, here is the code for your own tests:
func filterFile(f, pattern string) (*bytes.Buffer, error) {
    rfd, err := os.Open(f)
    if err != nil {
        log.Fatal(err)
    }
    defer func() {
        if err := rfd.Close(); err != nil {
            log.Fatal(err)
        }
    }()
    reader := bufio.NewReader(rfd)
    return filter(reader, []byte(pattern), 1024*1024)
}

// chunkSize must be larger than the longest line;
// a reasonable size is probably >= 64K
func filter(reader io.Reader, pattern []byte, chunkSize int) (*bytes.Buffer, error) {
    var bs []byte
    buffer := bytes.NewBuffer(bs)
    chunk := make([]byte, chunkSize)
    var remaining []byte
    for lastChunk := false; !lastChunk; {
        n, err := readChunk(reader, chunk, remaining, chunkSize)
        if err != nil {
            if err == io.EOF {
                lastChunk = true
            } else {
                return nil, err
            }
        }
        remaining = remaining[:0]
        if !lastChunk {
            for i := n - 1; i > 0; i-- {
                if chunk[i] == '\n' {
                    remaining = append(remaining, chunk[i+1:n]...)
                    n = i + 1
                    break
                }
            }
        }
        s := 0
        for s < n {
            hit := bytes.Index(chunk[s:n], pattern)
            if hit < 0 {
                break
            }
            hit += s
            startOfLine := hit
            for ; startOfLine > 0; startOfLine-- {
                if chunk[startOfLine] == '\n' {
                    startOfLine++
                    break
                }
            }
            endOfLine := hit + len(pattern)
            for ; endOfLine < n; endOfLine++ {
                if chunk[endOfLine] == '\n' {
                    break
                }
            }
            endOfLine++
            _, err = buffer.Write(chunk[s:startOfLine])
            if err != nil {
                return nil, err
            }
            s = endOfLine
        }
        if s < n {
            _, err = buffer.Write(chunk[s:n])
            if err != nil {
                return nil, err
            }
        }
    }
    return buffer, nil
}

func readChunk(reader io.Reader, chunk, remaining []byte, chunkSize int) (int, error) {
    copy(chunk, remaining)
    r := len(remaining)
    for r < chunkSize {
        n, err := reader.Read(chunk[r:])
        r += n
        if err != nil {
            return r, err
        }
    }
    return r, nil
}
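As an aside, readChunk's fill loop is close to what io.ReadFull already provides; the difference is that io.ReadFull reports a partial read as io.ErrUnexpectedEOF rather than io.EOF. A sketch of an equivalent using the standard helper, under the same calling conventions as above:

func readChunk(reader io.Reader, chunk, remaining []byte, chunkSize int) (int, error) {
    copy(chunk, remaining)
    n, err := io.ReadFull(reader, chunk[len(remaining):chunkSize])
    if err == io.ErrUnexpectedEOF {
        err = io.EOF // the caller expects io.EOF on a short final chunk
    }
    return len(remaining) + n, err
}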
And the benchmark part might look something like this:
func BenchmarkTestFilter(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _, err := filterFile("base64dump.log", "AUDIT")
        if err != nil {
            b.Fatal(err)
        }
    }
}
The filter function was split, and the actual job is done in func filter(reader io.Reader, pattern []byte, chunkSize int) (*bytes.Buffer, error).
By injecting a reader and a chunkSize, the code is already prepared for unit tests. They are missing here, but are definitely recommended when dealing with indexes.
However, the main point here was to find a way to significantly improve performance.

Program goes into deadlock using waitgroup

I'm writing a program that reads a list of order numbers in a file called orders.csv and compares it with the other csv files that are present in the folder.
The problem is that it goes into a deadlock even though I am using a WaitGroup, and I don't know why.
package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "path/filepath"
    "strings"
    "sync"
)

type Files struct {
    filenames []string
}

type Orders struct {
    ID []string
}

var ordersFilename string = "orders.csv"

func main() {
    var (
        ordersFile *os.File
        files      Files
        orders     Orders
        err        error
    )
    mu := new(sync.Mutex)
    wg := &sync.WaitGroup{}
    wg.Add(1)
    if ordersFile, err = os.Open(ordersFilename); err != nil {
        log.Fatalln("Could not open file: " + ordersFilename)
    }
    orders = getOrderIDs(ordersFile)
    files.filenames = getCSVsFromCurrentDir()
    var filenamesSize = len(files.filenames)
    var ch = make(chan map[string][]string, filenamesSize)
    var done = make(chan bool)
    for i, filename := range files.filenames {
        go func(currentFilename string, ch chan<- map[string][]string, i int, orders Orders, wg *sync.WaitGroup, filenamesSize *int, mu *sync.Mutex, done chan<- bool) {
            wg.Add(1)
            defer wg.Done()
            checkFile(currentFilename, orders, ch)
            mu.Lock()
            *filenamesSize--
            mu.Unlock()
            if i == *filenamesSize {
                done <- true
                close(done)
            }
        }(filename, ch, i, orders, wg, &filenamesSize, mu, done)
    }
    select {
    case str := <-ch:
        fmt.Printf("%+v\n", str)
    case <-done:
        wg.Done()
        break
    }
    wg.Wait()
    close(ch)
}

// getCSVsFromCurrentDir returns a string slice
// with the filenames of csv files inside the
// current directory that are not "orders.csv"
func getCSVsFromCurrentDir() []string {
    var filenames []string
    err := filepath.Walk(".", func(path string, info os.FileInfo, err error) error {
        if path != "." && strings.HasSuffix(path, ".csv") && path != ordersFilename {
            filenames = append(filenames, path)
        }
        return nil
    })
    if err != nil {
        log.Fatalln("Could not read file names in current dir")
    }
    return filenames
}

// getOrderIDs returns an Orders struct filled
// with order IDs retrieved from the file
func getOrderIDs(file *os.File) Orders {
    var (
        orders      Orders
        err         error
        fileContent string
    )
    reader := bufio.NewReader(file)
    if fileContent, err = readLine(reader); err != nil {
        log.Fatalln("Could not read file: " + ordersFilename)
    }
    for err == nil {
        orders.ID = append(orders.ID, fileContent)
        fileContent, err = readLine(reader)
    }
    return orders
}

func checkFile(filename string, orders Orders, ch chan<- map[string][]string) {
    var (
        err           error
        file          *os.File
        fileContent   string
        orderFilesMap map[string][]string
        counter       int
    )
    orderFilesMap = make(map[string][]string)
    if file, err = os.Open(filename); err != nil {
        log.Fatalln("Could not read file: " + filename)
    }
    reader := bufio.NewReader(file)
    if fileContent, err = readLine(reader); err != nil {
        log.Fatalln("Could not read file: " + filename)
    }
    for err == nil {
        if containedInSlice(fileContent, orders.ID) && !containedInSlice(fileContent, orderFilesMap[filename]) {
            orderFilesMap[filename] = append(orderFilesMap[filename], fileContent)
            // fmt.Println("Found: ", fileContent, " in ", filename)
        } else {
            // fmt.Printf("Could not find: '%s' in '%s'\n", fileContent, filename)
        }
        counter++
        fileContent, err = readLine(reader)
    }
    ch <- orderFilesMap
}

// containedInSlice returns true or false
// based on whether the string is contained
// in the slice
func containedInSlice(str string, slice []string) bool {
    for _, ID := range slice {
        if ID == str {
            return true
        }
    }
    return false
}

// readLine returns a line from the passed reader
func readLine(r *bufio.Reader) (string, error) {
    var (
        isPrefix bool  = true
        err      error = nil
        line, ln []byte
    )
    for isPrefix && err == nil {
        line, isPrefix, err = r.ReadLine()
        ln = append(ln, line...)
    }
    return string(ln), err
}
The first issue is that wg.Add must always be called outside of the goroutine(s) it stands for. If it isn't, the wg.Wait call might run before the goroutine(s) have actually started (and called wg.Add) and will therefore "think" that there is nothing to wait for.
The second issue is that the code waits for the goroutines in two different ways: there is the WaitGroup and there is the done channel. Use only one of them. Which one depends on how the results of the goroutines are used, which brings us to the next problem.
The third issue is with gathering the results. Currently the code only prints/uses a single result from the goroutines. Put a for { ... } loop around the select and use return to break out of the loop when the done channel is closed. (Note that you don't need to send anything on the done channel; closing it is enough.)
Improved Version 0.0.1
So here is the first version (including some other "code cleanup"), with a done channel used for closing and the WaitGroup removed:
func main() {
    ordersFile, err := os.Open(ordersFilename)
    if err != nil {
        log.Fatalln("Could not open file: " + ordersFilename)
    }
    orders := getOrderIDs(ordersFile)
    files := Files{
        filenames: getCSVsFromCurrentDir(),
    }
    var (
        mu            = new(sync.Mutex)
        filenamesSize = len(files.filenames)
        ch            = make(chan map[string][]string, filenamesSize)
        done          = make(chan bool)
    )
    for i, filename := range files.filenames {
        go func(currentFilename string, ch chan<- map[string][]string, i int, orders Orders, filenamesSize *int, mu *sync.Mutex, done chan<- bool) {
            checkFile(currentFilename, orders, ch)
            mu.Lock()
            *filenamesSize--
            mu.Unlock()
            // TODO: This also accesses filenamesSize, so it also needs to be protected with the mutex:
            if i == *filenamesSize {
                done <- true
                close(done)
            }
        }(filename, ch, i, orders, &filenamesSize, mu, done)
    }
    // Note: closing a channel is not really needed, so you can omit this:
    defer close(ch)
    for {
        select {
        case str := <-ch:
            fmt.Printf("%+v\n", str)
        case <-done:
            return
        }
    }
}
Improved Version 0.0.2
In your case, however, we have an advantage: we know exactly how many goroutines we started, and therefore how many results we expect (of course, this only works if each goroutine sends exactly one result, which this code currently does). That gives us another option, as we can collect the results with another for loop with the same number of iterations:
func main() {
    ordersFile, err := os.Open(ordersFilename)
    if err != nil {
        log.Fatalln("Could not open file: " + ordersFilename)
    }
    orders := getOrderIDs(ordersFile)
    files := Files{
        filenames: getCSVsFromCurrentDir(),
    }
    var (
        // Note: a buffered channel helps speed things up. The size does not
        // need to match the number of items that will be passed through the
        // channel. A fixed, small size is perfect here.
        ch = make(chan map[string][]string, 5)
    )
    for _, filename := range files.filenames {
        go func(filename string) {
            // orders and ch are not variables of the loop and can be used without copying
            checkFile(filename, orders, ch)
        }(filename)
    }
    for range files.filenames {
        str := <-ch
        fmt.Printf("%+v\n", str)
    }
}
A lot simpler, isn't it? Hope that helps!
There is a lot wrong with this code.
You're using the WaitGroup wrong. Add has to be called in the main goroutine, else there is a chance that Wait is called before all Add calls complete.
There's an extraneous Add(1) call right after initializing the WaitGroup that isn't matched by a Done() call, so Wait will never return (assuming the point above is fixed).
You're using both a WaitGroup and a done channel to signal completion. This is redundant at best.
You're reading filenamesSize while not holding the lock (in the if i == *filenamesSize statement). This is a race condition.
The i == *filenamesSize condition makes no sense in the first place. Goroutines execute in an arbitrary order, so you can't be sure that the goroutine with i == 0 is the last one to decrement filenamesSize.
This can all be simplified by getting rid of most of the synchronization primitives and simply closing the ch channel when all goroutines are done:
func main() {
    ch := make(chan map[string][]string)
    var wg sync.WaitGroup
    for _, filename := range getCSVsFromCurrentDir() {
        filename := filename // capture loop var
        wg.Add(1)
        go func() {
            checkFile(filename, orders, ch)
            wg.Done()
        }()
    }
    go func() {
        wg.Wait() // after all goroutines are done...
        close(ch) // let range loop below exit
    }()
    for str := range ch {
        // ...
    }
}
Not an answer, but some comments that do not fit the comment box.
In this part of the code
func main() {
    var (
        ordersFile *os.File
        files      Files
        orders     Orders
        err        error
    )
    mu := new(sync.Mutex)
    wg := &sync.WaitGroup{}
    wg.Add(1)
the last statement is a call to wg.Add that appears dangling. By that I mean it is hard to tell what will trigger the required wg.Done counterpart. It is a mistake to call wg.Add without its wg.Done; not writing them in a way that makes the pair immediately visible is prone to errors.
This part of the code is clearly wrong:
go func(currentFilename string, ch chan<- map[string][]string, i int, orders Orders, wg *sync.WaitGroup, filenamesSize *int, mu *sync.Mutex, done chan<- bool) {
    wg.Add(1)
    defer wg.Done()
Consider that by the time the goroutine is executed and has added 1 to the waitgroup, the parent routine has already continued executing. See this example: https://play.golang.org/p/N9Chaqkv4bd
The main routine does not wait for the waitgroup because the goroutine did not have time to increment it.
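A minimal sketch of that timing problem, in the spirit of the linked playground example (the sleep only makes the race visible):

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var wg sync.WaitGroup
    go func() {
        wg.Add(1) // too late: main may already have passed wg.Wait()
        defer wg.Done()
        time.Sleep(100 * time.Millisecond)
        fmt.Println("work done")
    }()
    wg.Wait() // the counter is most likely still 0 here, so this returns at once
    fmt.Println("main exits, possibly before the work is done")
}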
There is more to say, but I find it hard to understand the purpose of your code, so I am not sure how to help you further without basically rewriting it.

How to efficiently close two goroutines?

I'm using two concurrent goroutines to copy stdin/stdout from my terminal to a net.Conn target. For some reason, I can't manage to completely stop the two goroutines without getting a panic error (for trying to close a closed channel). This is my code:
func interact(c net.Conn, sessionMap map[int]net.Conn) {
    quit := make(chan bool) // the channel to quit
    copy := func(r io.ReadCloser, w io.WriteCloser) {
        defer func() {
            r.Close()
            w.Close()
            close(quit) // this is how I'm trying to close it
        }()
        _, err := io.Copy(w, r)
        if err != nil {
            //
        }
    }
    go func() {
        for {
            select {
            case <-quit:
                return
            default:
                copy(c, os.Stdout)
            }
        }
    }()
    go func() {
        for {
            select {
            case <-quit:
                return
            default:
                copy(os.Stdin, c)
            }
        }
    }()
}
This fails with panic: close of closed channel.
I want to terminate the two goroutines, and then normally proceed to another function. What am I doing wrong?
You can't call close on a channel more than once, there's no reason to call copy in a for loop (since it can only operate once), and you're copying in the wrong direction: writing to stdin and reading from stdout.
Quitting two goroutines is simple, but that's not the only thing you need to do here. Since io.Copy is blocking, you don't need the extra synchronization to determine when the call is complete. This lets you simplify the code significantly, which will make it a lot easier to reason about.
func interact(c net.Conn) {
    go func() {
        // You want to close this outside the goroutine if you
        // expect to send data back over a half-closed connection
        defer c.Close()
        // Optionally close stdout here if you need to signal the
        // end of the stream in a pipeline.
        defer os.Stdout.Close()
        _, err := io.Copy(os.Stdout, c)
        if err != nil {
            //
        }
    }()
    _, err := io.Copy(c, os.Stdin)
    if err != nil {
        //
    }
}
Also note that you may not be able to break out of the io.Copy from stdin, so you can't expect the interact function to return. Manually doing the io.Copy in the function body and checking for a half-closed connection on every loop may be a good idea, then you can break out sooner and ensure that you fully close the net.Conn.
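One possible shape for that manual loop, sketched under the assumption that the connection is a *net.TCPConn so the write side can be half-closed with CloseWrite (copyStdinTo is a hypothetical helper, not part of the code above):

import (
    "io"
    "net"
    "os"
)

// copyStdinTo copies stdin to the connection and half-closes the
// write side on EOF so the peer sees end-of-stream while the read
// side stays open. (Hypothetical helper for illustration.)
func copyStdinTo(c *net.TCPConn) error {
    buf := make([]byte, 32*1024)
    for {
        n, err := os.Stdin.Read(buf)
        if n > 0 {
            if _, werr := c.Write(buf[:n]); werr != nil {
                return werr // connection broken: stop copying early
            }
        }
        if err == io.EOF {
            return c.CloseWrite() // signal EOF without closing the read side
        }
        if err != nil {
            return err
        }
    }
}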
It could also be done like this:
func scanReader(quit chan int, r io.Reader) chan string {
    line := make(chan string)
    go func(quit chan int) {
        defer close(line)
        scan := bufio.NewScanner(r)
        for scan.Scan() {
            select {
            case <-quit:
                return
            default:
                s := scan.Text()
                line <- s
            }
        }
    }(quit)
    return line
}

stdIn := scanReader(quit, os.Stdin)
conIn := scanReader(quit, c)
for {
    select {
    case <-quit:
        return
    case l := <-stdIn:
        _, e := fmt.Fprintln(c, l)
        if e != nil {
            quit <- 1
            return
        }
    case l := <-conIn:
        fmt.Println(l)
    }
}

Return if there is no error Golang

In Golang, is it ok to run the function
err, value := function()
if err == nil {
    return value
}
instead of doing this:
err, value := function()
if err != nil {
    panic(err)
}
return value
If so, are there any time advantages / bonuses?
This is not a non-fatal error. I am trying to convert something to different types, and I'm not sure which I should use.
A panic is similar to an exception, but doesn't get passed to the caller (i.e. when you call panic, it happens then and there; you don't get to wait). You should go with the first sample of your code, where you can attempt an action, fail, and continue.
func main() {
    s1 := rand.NewSource(time.Now().UnixNano())
    r1 := rand.New(s1)
    // Generate some random numbers, and call into add()
    for i := 0; i < 10; i++ {
        s, err := add(r1.Intn(100), r1.Intn(100))
        if err != nil {
            fmt.Println(err)
            continue
        }
        fmt.Println(s)
    }
}

// Error if we get a sum over 100
func add(a int, b int) (int, error) {
    s := a + b
    if s > 100 {
        return s, errors.New("Hey doofus, error!")
    }
    return s, nil
}
If you were to panic in this example, you'd be done (try it: instead of returning an error, do panic("Some error")). But instead, we determine there's an error, and we can try to generate another random number.
Like others have said, if you have a use case where you just can't recover (say you were trying to read from a file, but the file isn't there), you might decide it's better to panic. But if you have a long running process (like an API), you'll want to keep churning, despite any errors.
GoPlay here: http://play.golang.org/p/ThXTxVfM6R
OP has updated his post with a use case: he's trying to convert to a type. If you were to panic in this function, you would be dead in the water. Instead, we want to return an error and let the caller decide what to do with it. Take this as an example:
func interfaceToString(i interface{}) (string, error) {
    if i == nil {
        return "", errors.New("nil interface")
    }
    switch i.(type) {
    case string:
        return i.(string), nil
    case float64:
        return strconv.Itoa(int(i.(float64))), nil
    case int:
        return strconv.Itoa(i.(int)), nil
    }
    return "", errors.New(fmt.Sprintf("Unable to convert %v", i))
}
GoPlay here: http://play.golang.org/p/7y7v151EH4
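And a short hypothetical caller, just to show the error being handled at the call site (assumes the interfaceToString function above and the fmt package):

func main() {
    for _, v := range []interface{}{"hello", 3.14, 42, true} {
        s, err := interfaceToString(v)
        if err != nil {
            fmt.Println("skipping:", err) // true is not convertible here
            continue
        }
        fmt.Println(s)
    }
}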

How to pass compressed bytes through channel?

I'm trying to compress a file from a buffered reader and pass the compressed bytes through a byte channel, but with poor results :). Here's what I came up with till now; obviously this doesn't work...
func Compress(r io.Reader) <-chan byte {
    c := make(chan byte)
    go func() {
        var wBuff bytes.Buffer
        rBuff := make([]byte, 1024)
        writer := zlib.NewWriter(*wBuff)
        for {
            n, err := r.Read(rBuff)
            if err != nil && err != io.EOF {
                panic(err)
            }
            if n == 0 {
                break
            }
            writer.Write(rBuff) // Compress and write compressed data
            // How to send written compressed bytes through channel?
            // As far as I understand, wBuff will eventually contain
            // the whole compressed data?
        }
        writer.Close()
        close(c) // Indicate that no more data follows
    }()
    return c
}
Please bear with me, as I'm very new to Go.
I suggest using []byte instead of byte; it is more efficient. Because of concurrent memory accesses, it may be necessary to send a copy of the buffer through the channel rather than sending the []byte buffer itself.
You can define a type ChanWriter chan []byte and let it implement the io.Writer interface. Then pass the ChanWriter to zlib.NewWriter.
You can create a goroutine for doing the compression and then immediately return the ChanWriter's channel from your Compress function. If there is no goroutine then there is no reason for the function to return a channel and the preferred return type is io.Reader.
The return type of the Compress function should then be changed into something like <-chan BytesWithError. In this case, ChanWriter can be defined as type ChanWriter chan BytesWithError.
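A rough sketch of that suggestion, with error propagation elided (the BytesWithError variant would carry errors through the channel as well); ChanWriter and Compress here are assumed names, not an established API:

import (
    "compress/zlib"
    "io"
)

// ChanWriter implements io.Writer by sending a copy of every
// written block down the channel.
type ChanWriter chan []byte

func (w ChanWriter) Write(p []byte) (int, error) {
    buf := make([]byte, len(p))
    copy(buf, p) // copy, since the zlib writer may reuse p
    w <- buf
    return len(p), nil
}

func Compress(r io.Reader) <-chan []byte {
    c := make(ChanWriter)
    go func() {
        defer close(c)
        zw := zlib.NewWriter(c)
        io.Copy(zw, r) // a real version should propagate this error
        zw.Close()
    }()
    return c
}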
Sending bytes one by one down a channel is not going to be particularly efficient. Another approach that may be more useful would be to return an object implementing the io.Reader interface, whose Read() method reads a block from the original io.Reader and compresses its output before returning it.
Your writer.Write(rBuff) statement always writes len(rBuff) bytes, even when n != len(rBuff).
writer.Write(rBuff[:n])
Also, your Read loop is
for {
    n, err := r.Read(rBuff)
    if err != nil && err != io.EOF {
        panic(err)
    }
    if n == 0 {
        break
    }
    writer.Write(rBuff[:n])
    // ...
}
which is equivalent to
for {
    n, err := r.Read(rBuff)
    if err != nil && err != io.EOF {
        panic(err)
    }
    // !(err != nil && err != io.EOF)
    // !(err != nil) || !(err != io.EOF)
    // err == nil || err == io.EOF
    if err == nil || err == io.EOF {
        if n == 0 {
            break
        }
    }
    writer.Write(rBuff[:n])
    // ...
}
The loop exits prematurely if err == nil and n == 0.
Instead, write
for {
    n, err := r.Read(rBuf)
    if err != nil {
        if err != io.EOF {
            panic(err)
        }
        if n == 0 {
            break
        }
    }
    writer.Write(rBuf[:n])
    // ...
}
OK, I've found a working solution (feel free to indicate where it can be improved, or whether I'm doing something wrong):
func Compress(r io.Reader) <-chan byte {
    c := make(chan byte)
    go func() {
        var wBuff bytes.Buffer
        rBuff := make([]byte, 1024)
        writer := zlib.NewWriter(&wBuff)
        for {
            n, err := r.Read(rBuff)
            if err != nil {
                if err != io.EOF {
                    panic(err)
                }
                if n == 0 {
                    break
                }
            }
            writer.Write(rBuff[:n])
            for _, v := range wBuff.Bytes() {
                c <- v
            }
            wBuff.Truncate(0)
        }
        writer.Close()
        for _, v := range wBuff.Bytes() {
            c <- v
        }
        close(c) // Indicate that no more data follows
    }()
    return c
}
