How to read a data file using golang?

I have a txt file:
2
Data 5 1.32
DataSecond 4 5.41
4
...
And so on. How do I read the first line to get the count and then split the following lines to get the individual parameters? I tried the following, but it is obviously wrong.
f, err := os.Open("DATA.txt")
check(err)
s := bufio.NewScanner(f)
for s.Scan() {
    line := s.Text()
    count, err := strconv.Atoi(line)
    check(err)
    for i := 0; i < count; i++ {
        testArray := strings.Fields(s.Text())
        for _, v := range testArray {
            fmt.Println(v)
        }
    }
}

You just forgot to call Scan() inside the inner loop, so every iteration re-reads the same line.
f, err := os.Open("DATA.txt")
check(err)
s := bufio.NewScanner(f)
for s.Scan() {
    line := s.Text()
    count, err := strconv.Atoi(line)
    check(err)
    for i := 0; i < count && s.Scan(); i++ {
        testArray := strings.Fields(s.Text())
        for _, v := range testArray {
            fmt.Println(v)
        }
    }
}

You could do something like this: read each line into a count int and use it as state. While count is > 0, use Sscanf to parse the other values.
func main() {
    f, err := os.Open("DATA.txt")
    check(err)
    s := bufio.NewScanner(f)
    count := 0
    for s.Scan() {
        line := s.Text()
        if count < 1 {
            count, err = strconv.Atoi(line)
            check(err)
            continue
        }
        count--
        var tag string
        var n int
        var f float64
        fmt.Sscanf(line, "%s %d %f", &tag, &n, &f)
        // not sure what you really want to do with the data!
        fmt.Println(n, f, tag)
    }
}

Related

Slice automatically be sorted?

While creating my own pipeline to practice with goroutines, I hit something particularly weird.
I use rand.Perm to generate some random ints, write them to an io.Writer, then read them back from an io.Reader as binary data. When I print them, they come out sorted!
Here's the code:
func RandomSource(tally int) chan int {
    out := make(chan int)
    sli := rand.Perm(tally)
    fmt.Println(sli)
    go func() {
        for num := range sli {
            out <- num
        }
        close(out)
    }()
    return out
}

func ReaderSource(reader io.Reader) chan int {
    out := make(chan int)
    go func() {
        buffer := make([]byte, 8)
        for {
            n, err := reader.Read(buffer)
            if n > 0 {
                v := int(binary.BigEndian.Uint64(buffer))
                out <- v
            }
            if err != nil {
                break
            }
        }
        close(out)
    }()
    return out
}

func WriterSink(writer io.Writer, in chan int) {
    for v := range in {
        buffer := make([]byte, 8)
        binary.BigEndian.PutUint64(buffer, uint64(v))
        writer.Write(buffer)
    }
}

func main() {
    fileName := "small.in"
    file, err := os.Create(fileName)
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()
    p := RandomSource(500)
    WriterSink(file, p)
    file, err = os.Open(fileName)
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()
    p = ReaderSource(file)
    for v := range p {
        fmt.Println(v)
    }
}
range yields the index as the first value for an array or slice, which always runs from 0 up to len-1, so your pipeline was sending the indices rather than the values. Use for _, num := range sli { if you want to iterate over the values themselves rather than the set of indices.

for range vs static channel length golang

I have a channel taking events parsed from a log file and another one which is used for synchronization. There were 8 events for the purpose of my test.
When using the for range syntax, I get 4 events. When using the known number (8), I can get all of them.
func TestParserManyOpinit(t *testing.T) {
    ch := make(chan event.Event, 1000)
    done := make(chan bool)
    go parser.Parse("./test_data/many_opinit", ch, done)
    count := 0
    exp := 8
    evtList := []event.Event{}
    <-done
    close(ch)
    // This gets all the events
    for i := 0; i < 8; i++ {
        evtList = append(evtList, <-ch)
        count++
    }
    // This only gives me four
    //for range ch {
    //    evtList = append(evtList, <-ch)
    //    count++
    //}
    if count != exp || count != len(evtList) {
        t.Errorf("Not proper length, got %d, exp %d, evtList %d", count, exp, len(evtList))
    }
}
func Parse(filePath string, evtChan chan event.Event, done chan bool) {
    log.Info(fmt.Sprintf("(thread) Parsing file %s", filePath))
    file, err := os.Open(filePath)
    defer file.Close()
    if err != nil {
        log.Error("Cannot read file " + filePath)
    }
    count := 0
    scan := bufio.NewScanner(file)
    scan.Split(splitFunc)
    scan.Scan() // Skip log file header
    for scan.Scan() {
        text := scan.Text()
        text = strings.Trim(text, "\n")
        splitEvt := strings.Split(text, "\n")
        // Some parsing ...
        count++
        evtChan <- evt
    }
    fmt.Println("Done ", count) // gives 8
    done <- true
}
I must be missing something related to for loops on a channel.
I've tried adding a time.Sleep just before the done <- true part. It didn't change the result.
When you use for range, each loop iteration already receives one value from the channel, and you're discarding it; the <-ch inside the body then receives a second one. Hence, half the values are lost. It should be:
for ev := range ch {
    evtList = append(evtList, ev)
    count++
}
so that the value received by the loop itself is the one you use.
Ranging over channels is demonstrated in the Tour of Go and detailed in the Go spec.

how to limit goroutine

I'm developing a gmail client based on google api.
I have a list of labels obtained through this call
r, err := s.gClient.Service.Users.Labels.List(s.gClient.User).Do()
Then, for every label I need to get details
for _, l := range r.Labels {
    d, err := s.gClient.Service.Users.Labels.Get(s.gClient.User, l.Id).Do()
}
I'd like to handle the loop in a more powerful way so I have implemented a goroutine in the loop:
ch := make(chan label.Label)
for _, l := range r.Labels {
    go func(gmailLabels *gmailclient.Label, gClient *gmail.Client, ch chan<- label.Label) {
        d, err := s.gClient.Service.Users.Labels.Get(s.gClient.User, l.Id).Do()
        if err != nil {
            panic(err)
        }
        // Performs some operation with the label `d`
        preparedLabel := ....
        ch <- preparedLabel
    }(l, s.gClient, ch)
}
for i := 0; i < len(r.Labels); i++ {
    lab := <-ch
    fmt.Printf("Processed %v\n", lab.LabelID)
}
The problem with this code is that gmail api has a rate limit, so, I get this error:
panic: googleapi: Error 429: Too many concurrent requests for user, rateLimitExceeded
What is the correct way to handle this situation?
How about starting only, say, 10 worker goroutines, and feeding them the values from a for loop in another goroutine? The channels have a small buffer to decrease synchronisation time.
chIn := make(chan label.Label, 20)
chOut := make(chan label.Label, 20)
for i := 0; i < 10; i++ {
    go func(gClient *gmail.Client, chIn chan label.Label, chOut chan<- label.Label) {
        for gmailLabels := range chIn {
            d, err := gClient.Service.Users.Labels.Get(gClient.User, gmailLabels.Id).Do()
            if err != nil {
                panic(err)
            }
            // Performs some operation with the label `d`
            preparedLabel := ....
            chOut <- preparedLabel
        }
    }(s.gClient, chIn, chOut)
}
go func(chIn chan label.Label) {
    defer close(chIn)
    for _, l := range r.Labels {
        chIn <- l
    }
}(chIn)
for i := 0; i < len(r.Labels); i++ {
    lab := <-chOut
    fmt.Printf("Processed %v\n", lab.LabelID)
}
EDIT:
Here is a playground sample.
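If you'd rather not manage a fixed worker pool, a buffered channel used as a semaphore is another common way to cap concurrency. Below is a self-contained sketch of that pattern; the function name, label IDs, and the stand-in work are all made up, since the real call would be the rate-limited Labels.Get:

```go
package main

import (
	"fmt"
	"sync"
)

// processAll runs fn for every id, with at most limit calls in flight at once.
func processAll(ids []string, limit int, fn func(string) string) []string {
	sem := make(chan struct{}, limit) // semaphore: capacity = max concurrency
	var wg sync.WaitGroup
	results := make(chan string, len(ids))

	for _, id := range ids {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it when done
			results <- fn(id)
		}(id)
	}
	wg.Wait()
	close(results)

	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	ids := []string{"Label_1", "Label_2", "Label_3", "Label_4"}
	out := processAll(ids, 2, func(id string) string {
		return "processed " + id // stand-in for the API call
	})
	fmt.Println(len(out)) // 4
}
```

The trade-off versus the worker pool: one goroutine per item is spawned up front, but only `limit` of them run the guarded section at a time.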

write string to file in goroutine

I am using goroutines in my code as follows:
c := make(chan string)
work := make(chan string, 1000)
clvl := runtime.NumCPU()
for i := 0; i < clvl; i++ {
    go func(i int) {
        f, err := os.Create(fmt.Sprintf("/tmp/sample_match_%d.csv", i))
        if nil != err {
            panic(err)
        }
        defer f.Close()
        w := bufio.NewWriter(f)
        for jdId := range work {
            for _, itemId := range itemIdList {
                w.WriteString("test")
            }
            w.Flush()
            c <- fmt.Sprintf("done %s", jdId)
        }
    }(i)
}
go func() {
    for _, jdId := range jdIdList {
        work <- jdId
    }
    close(work)
}()
for resp := range c {
    fmt.Println(resp)
}
This is OK, but can all goroutines write to just one file? Like this:
c := make(chan string)
work := make(chan string, 1000)
clvl := runtime.NumCPU()
f, err := os.Create("/tmp/sample_match_%d.csv")
if nil != err {
    panic(err)
}
defer f.Close()
w := bufio.NewWriter(f)
for i := 0; i < clvl; i++ {
    go func(i int) {
        for jdId := range work {
            for _, itemId := range itemIdList {
                w.WriteString("test")
            }
            w.Flush()
            c <- fmt.Sprintf("done %s", jdId)
        }
    }(i)
}
This does not work; I get the error: panic: runtime error: slice bounds out of range
The bufio.Writer type does not support concurrent access. Protect it with a mutex.
Because the short strings are flushed on every write, there's no point in using a bufio.Writer. Write to the file directly (and protect it with a mutex).
There's no code to ensure that the goroutines complete before the file is closed or the program exits. Use a sync.WaitGroup.

limitation on bytes.Buffer?

I am trying to gzip a slice of bytes using the package compress/gzip. I write 45976 bytes to a bytes.Buffer, but when I try to uncompress the content using a gzip.Reader and its Read function, I find that not all of the content is recovered. Is there some limitation to bytes.Buffer, and is there a way to bypass or alter this? Here is my code (edit):
func compress_and_uncompress() {
    var buf bytes.Buffer
    w := gzip.NewWriter(&buf)
    i, err := w.Write([]byte(long_string))
    if err != nil {
        log.Fatal(err)
    }
    w.Close()

    b2 := make([]byte, 80000)
    r, _ := gzip.NewReader(&buf)
    j, err := r.Read(b2)
    if err != nil {
        log.Fatal(err)
    }
    r.Close()
    fmt.Println("Wrote:", i, "Read:", j)
}
Output from testing (with a chosen string as long_string):
Wrote: 45976 Read: 32768
Continue reading to get the remaining 13208 bytes. The first read returns 32768 bytes, the second read returns 13208 bytes, and the third read returns zero bytes and EOF.
For example,
package main

import (
    "bytes"
    "compress/gzip"
    "fmt"
    "io"
    "log"
)

func compress_and_uncompress() {
    var buf bytes.Buffer
    w := gzip.NewWriter(&buf)
    i, err := w.Write([]byte(long_string))
    if err != nil {
        log.Fatal(err)
    }
    w.Close()

    b2 := make([]byte, 80000)
    r, _ := gzip.NewReader(&buf)
    j := 0
    for {
        n, err := r.Read(b2[:cap(b2)])
        b2 = b2[:n]
        j += n
        if err != nil {
            if err != io.EOF {
                log.Fatal(err)
            }
            if n == 0 {
                break
            }
        }
        fmt.Println(len(b2))
    }
    r.Close()
    fmt.Println("Wrote:", i, "Read:", j)
}

var long_string string

func main() {
    long_string = string(make([]byte, 45976))
    compress_and_uncompress()
}
Output:
32768
13208
Wrote: 45976 Read: 45976
Use ioutil.ReadAll. The contract for io.Reader says it doesn't have to return all the data in one call, and there is a good reason for it not to, related to the sizes of internal buffers. ioutil.ReadAll reads from an io.Reader until EOF.
E.g. (untested):
import "io/ioutil"

func compress_and_uncompress() {
    var buf bytes.Buffer
    w := gzip.NewWriter(&buf)
    i, err := w.Write([]byte(long_string))
    if err != nil {
        log.Fatal(err)
    }
    w.Close()

    r, _ := gzip.NewReader(&buf)
    b2, err := ioutil.ReadAll(r)
    if err != nil {
        log.Fatal(err)
    }
    r.Close()
    fmt.Println("Wrote:", i, "Read:", len(b2))
}
If the read from gzip.NewReader does not return the whole expected slice, you can just keep re-reading until you have received all the data in the buffer.
Regarding your problem where the subsequent reads did not land at the end of the slice but instead at the beginning: Read always fills the slice you pass starting at its index 0, so to accumulate the output you must pass a sub-slice that starts at the current offset.
This can be solved in this manner:
func compress_and_uncompress(long_string string) {
    // Writer
    var buf bytes.Buffer
    w := gzip.NewWriter(&buf)
    i, err := w.Write([]byte(long_string))
    if err != nil {
        log.Fatal(err)
    }
    w.Close()

    // Reader
    var j, k int
    b2 := make([]byte, 80000)
    r, _ := gzip.NewReader(&buf)
    for {
        k, err = r.Read(b2[j:]) // Read into the slice at the current offset
        j += k
        if err != nil {
            if err != io.EOF {
                log.Fatal(err)
            }
            break
        }
    }
    r.Close()
    fmt.Println("Wrote:", i, "Read:", j)
}
The result will be:
Wrote: 45976 Read: 45976
Also, after testing with a string of 45976 characters, I can confirm that the output is exactly the same as the input, with the second part correctly appended after the first part.
Source for gzip.Read: http://golang.org/src/pkg/compress/gzip/gunzip.go?s=4633:4683#L189