How does a Reader in Golang automatically iterate in a loop?

I'm reading about how to use CSVs in Golang and came across this code:
csvFile, _ := os.Open("people.csv")
reader := csv.NewReader(bufio.NewReader(csvFile))
var people []Person
for {
line, error := reader.Read()
if error == io.EOF {
break
} else if error != nil {
log.Fatal(error)
}
people = append(people, Person{
Firstname: line[0],
Lastname: line[1],
})
}
Located here: https://www.thepolyglotdeveloper.com/2017/03/parse-csv-data-go-programming-language/
What I find confusing here is that, with the infinite for-loop, each iteration grabs the next line, but there's no lineNum++ type logic being passed into the Reader. How does the reader know which iteration it's on? How can I change this? E.g. grab just the first line.

How does a Reader in Golang automatically iterate in a loop?
How does the reader know which iteration it's on?
The Read method returns the next record by consuming more data from the underlying io.Reader. The Read method returns io.EOF when there are no more records in the underlying reader.
The application is responsible for calling Read in a loop as shown in the example.
The Reader does not need to know the line number to read the next record, but the Reader does maintain a line counter in its internal state for annotating errors.
If the application needs to know the line number, the application can declare a counter and increment the counter on each read.
How can I change this? E.g. grab just the first line.
Call Read once:
f, err := os.Open("people.csv")
if err != nil {
// handle error
}
defer f.Close()
r := csv.NewReader(f)
firstLine, err := r.Read()
if err != nil {
// handle error
}


why *(*string)(unsafe.Pointer(&b)) doesn't work with bufio.Reader

I have a file that contains some IP ranges:
1.1.1.0/24
1.1.2.0/24
2.2.1.0/24
2.2.2.0/24
I read this file into a slice and used *(*string)(unsafe.Pointer(&b)) to convert []byte to string, but it doesn't work:
func TestInitIpRangeFromFile(t *testing.T) {
filepath := "/tmp/test"
file, err := os.Open(filepath)
if err != nil {
t.Errorf("failed to open ip range file:%s, err:%s", filepath, err)
}
reader := bufio.NewReader(file)
ranges := make([]string, 0)
for {
ip, _, err := reader.ReadLine()
if err != nil {
if err == io.EOF {
break
}
logger.Fatalf("failed to read ip range file, err:%s", err)
}
t.Logf("ip:%s", *(*string)(unsafe.Pointer(&ip)))
ranges = append(ranges, *(*string)(unsafe.Pointer(&ip)))
}
t.Logf("%v", ranges)
}
result:
task_test.go:71: ip:1.1.1.0/24
task_test.go:71: ip:1.1.2.0/24
task_test.go:71: ip:2.2.1.0/24
task_test.go:71: ip:2.2.2.0/24
task_test.go:75: [2.2.2.0/24 1.1.2.0/24 2.2.1.0/24 2.2.2.0/24]
Why did 1.1.1.0/24 change to 2.2.2.0/24?
If I change
*(*string)(unsafe.Pointer(&ip))
to string(ip), it works.
So, while reinterpreting a slice-header as a string-header the way you did is absolutely bonkers and has no guarantee whatsoever of working correctly, it's only indirectly the cause of your problem.
The real problem is that you're retaining a pointer to the return value of bufio.Reader.ReadLine(), but the docs for that method say "The returned buffer is only valid until the next call to ReadLine." This means the reader is free to reuse that memory later on, and that's what's happening.
When you do the cast in the proper way, string(ip), Go copies the contents of the buffer into the newly-created string, which remains valid in the future. But when you type-pun the slice into a string, you keep the exact same pointer, which stops working as soon as the reader refills its buffer.
If you decided to do the pointer trickery as a performance hack to avoid copying and allocation... too bad. The reader interface is going to force you to copy the data out anyway, and since it does, you should just use string().

Passing a pointer to bufio.Scanner()

Lest I provide an XY problem, my goal is to share a memory-mapped file between multiple goroutines as recommended. Each goroutine needs to iterate over the file line by line so I had hoped to store the complete contents in memory first to speed things up.
The method I tried is passing a pointer to a bufio.Scanner, but that is not working. I thought it might be related to needing to set the seek position back to the beginning of the file but it is not even working the very first time and I can find no such parameter in the documentation. My attempt was to create this function then pass the result by reference to the function I intend to run in a goroutine (for right now, I am not using goroutines just to make sure this works outright, which it does not).
Here is a MWE:
// ... package declaration; imports; yada yada
func main() {
// ... validate path to file stored in filePath variable
filePath := "/path/to/file.txt"
// get word list scanner to be shared between goroutines
scanner := getScannerPtr(&filePath)
// pass to function (no goroutine for now, I try to solve one problem at a time)
myfunc(scanner)
}
func getScannerPtr(filePath *string) *bufio.Scanner {
f, err := os.Open(*filePath)
if err != nil {
fmt.Fprint(os.Stderr, "Error opening file\n")
panic(err)
}
defer f.Close()
scanner := bufio.NewScanner(f)
scanner.Split(bufio.ScanLines)
return scanner
}
func myfunc(scanner *bufio.Scanner) {
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
// ... do something with line
}
}
I'm not receiving any errors, it just is not iterating over the file when I call Scan() so it never makes it inside that block to do anything with each line of the file. Keep in mind I am not even using concurrency yet, that is just my eventual goal which I want to point out in case that impacts the method I need to take.
Why is Scan() not working?
Is this a viable approach if I intend to call go myfunc(scanner) in the future?
You're closing the file before you ever use the Scanner:
func getScannerPtr(filePath *string) *bufio.Scanner {
f, err := os.Open(*filePath)
if err != nil {
fmt.Fprint(os.Stderr, "Error opening file\n")
panic(err)
}
defer f.Close() // <--- Here
scanner := bufio.NewScanner(f)
scanner.Split(bufio.ScanLines)
return scanner // <-- File gets closed, then Scanner that tries to read it is returned for further use, which won't work
}
Because Scanner does not expose Close, you'll need to work around this; the quickest is probably to make a simple custom type with a couple of embedded fields:
type FileScanner struct {
io.Closer
*bufio.Scanner
}
func getScannerPtr(filePath *string) *FileScanner {
f, err := os.Open(*filePath)
if err != nil {
fmt.Fprint(os.Stderr, "Error opening file\n")
panic(err)
}
scanner := bufio.NewScanner(f)
return &FileScanner{f, scanner}
}
func myfunc(scanner *FileScanner) {
defer scanner.Close()
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
// ... do something with line
}
}

golang - bufio read multiline until (CRLF) \r\n delimiter

I am trying to implement my own beanstalkd client as a way of learning go. https://github.com/kr/beanstalkd/blob/master/doc/protocol.txt
At the moment, I am using bufio to read in a line of data delimited by \n.
res, err := this.reader.ReadString('\n')
This is fine when I send a single command and read a single-line response like: INSERTED %d\r\n, but I run into difficulties when I try to reserve a job, because the job body could span multiple lines and as such I cannot use the \n delimiter.
Is there a way to read into the buffer until CRLF?
e.g. when I send the reserve command. My expected response is as follows:
RESERVED <id> <bytes>\r\n
<data>\r\n
But data could contain \n, so I need to read until the \r\n.
Alternatively - is there a way of reading a specific number of bytes as specified in <bytes> in example response above?
At the moment, I have (err handling removed):
func (this *Bean) receiveLine() (string, error) {
res, err := this.reader.ReadString('\n')
return res, err
}
func (this *Bean) receiveBody(numBytesToRead int) ([]byte, error) {
res, err := this.reader.ReadString('\r\n') // What to do here to read to CRLF / up to number of expected bytes?
return res, err
}
func (this *Bean) Reserve() (*Job, error) {
this.send("reserve\r\n")
res, err := this.receiveLine()
var jobId uint64
var bodylen int
_, err = fmt.Sscanf(res, "RESERVED %d %d\r\n", &jobId, &bodylen)
body, err := this.receiveBody(bodylen)
job := new(Job)
job.Id = jobId
job.Body = body
return job, nil
}
res, err := this.reader.Read('\n')
Does not make any sense to me. Did you mean ReadBytes/ReadSlice/ReadString?
You need bufio.Scanner.
Define your bufio.SplitFunc (example is a copy of bufio.ScanLines with modifications to look for '\r\n'). Modify it to match your case.
// dropCR drops a terminal \r from the data.
func dropCR(data []byte) []byte {
if len(data) > 0 && data[len(data)-1] == '\r' {
return data[0 : len(data)-1]
}
return data
}
func ScanCRLF(data []byte, atEOF bool) (advance int, token []byte, err error) {
if atEOF && len(data) == 0 {
return 0, nil, nil
}
if i := bytes.Index(data, []byte("\r\n")); i >= 0 {
// We have a full CRLF-terminated line; data[0:i] already excludes the "\r\n".
return i + 2, data[0:i], nil
}
// If we're at EOF, we have a final, non-terminated line. Return it.
if atEOF {
return len(data), dropCR(data), nil
}
// Request more data.
return 0, nil, nil
}
Now, wrap your io.Reader with your custom scanner.
scanner := bufio.NewScanner(this.reader)
// Set the split function for the scanning operation.
scanner.Split(ScanCRLF)
// Validate the input
for scanner.Scan() {
fmt.Printf("%s\n", scanner.Text())
}
if err := scanner.Err(); err != nil {
fmt.Printf("Invalid input: %s", err)
}
Read bufio package's source code about Scanner.
Alternatively - is there a way of reading a specific number of bytes as specified in <bytes> in the example response above?
First you need to read the "RESERVED <id> <bytes>\r\n" line somehow.
And then you can use
nr_of_bytes := read_number_of_bytes_somehow(this.reader)
buf := make([]byte, nr_of_bytes)
_, err := io.ReadFull(this.reader, buf) // a plain Read may return fewer bytes than len(buf)
or io.LimitedReader.
But I don't like this approach.
Thanks for this - reader.Read('\n') was a typo - I corrected question. I have also attached example code of where I have got so far. As you can see, I can get the number of expected bytes of the body. Could you elaborate on why you don't like the idea of reading a specific number of bytes? This seems most logical?
I'd like to see Bean's definition, especially the reader part.
Imagine this counter is wrong somehow.
1. It's too short: do you need to find the following "\r\n" and discard everything up to that point, or not? Why do you need the counter in the first place, then?
2. It's bigger than it should be (or even worse, it's huge!).
2.1 No next message in the reader: fine, the read is shorter than expected, but that's fine.
2.2 There is a next message waiting: bah, you read part of it and there is no easy way to recover.
2.3 It's huge: you can't allocate the memory even if the message is only 1 byte.
Byte counters like this are generally designed to verify the message.
And it looks like that is the case with the beanstalkd protocol.
Use Scanner, parse the message, check the length against the expected number ... profit
UPD
Be warned, the default bufio.Scanner can't read tokens longer than 64k; set the max length with scanner.Buffer first. And that's bad, because you can't change this option on the fly and some data may already have been "pre"-read by the scanner.
UPD2
Thinking about my last update: take a look at how net/textproto implements dotReader as a simple state machine. You could do something similar, reading the command first and then checking the "expected bytes" on the payload.

How to read a text file line-by-line in Go when some lines are long enough to cause "bufio.Scanner: token too long" errors?

I have a text file where each line represents a JSON object. I am processing this file in Go with a simple for loop like this:
scanner := bufio.NewScanner(file)
for scanner.Scan() {
jsonBytes := scanner.Bytes()
var jsonObject interface{}
err := json.Unmarshal(jsonBytes, &jsonObject)
// do stuff with "jsonObject"...
}
if err := scanner.Err(); err != nil {
log.Fatal(err)
}
When this code reaches a line with a particularly large JSON string (~67kb), I get the error message, "bufio.Scanner: token too long".
Is there an easy way to increase the max line size readable by NewScanner? Or is there another approach you can take altogether, when needing to read lines that are too large for NewScanner but are known to not be of unsafe size generally?
You can also do:
scanner := bufio.NewScanner(file)
buf := make([]byte, 0, 64*1024)
scanner.Buffer(buf, 1024*1024)
for scanner.Scan() {
// do your stuff
}
The second argument to scanner.Buffer() sets the maximum token size. In the above example you will be able to scan the file as long as none of the lines is larger than 1MB.
From the package docs:
Programs that need more control over error handling or large tokens,
or must run sequential scans on a reader, should use bufio.Reader
instead.
It looks like the preferred solution is bufio.Reader.ReadLine.
You surely don't want to be reading line-by-line in the first place. Why don't you just do this:
d := json.NewDecoder(file)
for {
var ob whateverType
err := d.Decode(&ob)
if err == io.EOF {
break
}
if err != nil {
log.Fatalf("Error decoding: %v", err)
}
// do stuff with "ob"...
}

How to find EOF while reading from a file

I am using the following code to read a file in Go:
spoon , err := ioutil.ReadFile(os.Args[1])
if err!=nil {
panic ("File reading error")
}
Now I check for every byte I pick for what character it is. For example:
spoon[i]==' ' //for checking space
Likewise I read the whole file (I know there may be other ways of reading it),
but keeping this approach intact, how can I know that I have reached the EOF of the file and should stop reading further?
Please don't suggest to find the length of spoon and start a loop. I want a sure shot way of finding EOF.
Use io.EOF to test for end-of-file. For example, to count spaces in a file:
package main
import (
"fmt"
"io"
"os"
)
func main() {
if len(os.Args) <= 1 {
fmt.Println("Missing file name argument")
return
}
f, err := os.Open(os.Args[1])
if err != nil {
fmt.Println(err)
return
}
defer f.Close()
data := make([]byte, 100)
spaces := 0
for {
data = data[:cap(data)]
n, err := f.Read(data)
if err != nil {
if err == io.EOF {
break
}
fmt.Println(err)
return
}
data = data[:n]
for _, b := range data {
if b == ' ' {
spaces++
}
}
}
fmt.Println(spaces)
}
ioutil.ReadFile() reads the entire contents of the file into a byte slice. You don't need to be concerned with EOF. EOF is a construct that is needed when you read a file one chunk at a time. You need to know which chunk has reached the end of the file when you're reading one chunk at a time.
The length of the byte slice returned by ioutil.ReadFile() is all you need.
data, err := ioutil.ReadFile(os.Args[1])
if err != nil {
// handle error
}
// Do we need to know the data size?
sliceSize := len(data)
// Do we need to look at each byte?
for _, b := range data {
// do something with each byte
}
This is what you need to look for to detect end of file (EOF):
if err != nil {
if errors.Is(err, io.EOF) { // the preferred way according to the Go docs
fmt.Println("Reading file finished...")
}
break
}
When you use ioutil.ReadFile(), you don't ever see io.EOF, by design, because ReadFile will read the whole file until EOF is reached. So the slice it returns is the whole file. From the doc:
ReadFile reads the file named by filename and returns the contents. A successful call returns err == nil, not err == EOF. Because ReadFile reads the whole file, it does not treat an EOF from Read as an error to be reported.
From your question, you explicitly mention that you are aware there are other ways to read the file, and some of those ways require you to test the error for io.EOF, but not ReadFile.
Then, with the slice you have, you can read the file using the for...range construct, as others have mentioned. This is a sure way to read the whole file and nothing more (again, ReadFile takes care of that). Or iterating from 0 to len(spoon) - 1 would work too, but range is more idiomatic and basically does the same.
In other words: when you reach the end of the slice, you reach the end of the file (provided ReadFile did not return an error).
A slice has no concept of end of file. The slice returned by ioutil.ReadFile has a specific length, which reflects the size of the file it was read from. A common idiom, though only one of several possible in this case, is to range over the slice, effectively "consuming" all of the bytes originally sitting in the file:
for i, b := range spoon {
// At index 'i' is byte 'b'
// At file's offset 'i', 'b' was read
// ... do something useful here
}
