I am using the following code to read a file in Go:
spoon, err := ioutil.ReadFile(os.Args[1])
if err != nil {
	panic("File reading error")
}
Now I check for every byte I pick for what character it is. For example:
spoon[i]==' ' //for checking space
Likewise, I read the whole file (I know there may be other ways of reading it).
Keeping this approach intact, how can I know that I have reached the end of the file and should stop reading further?
Please don't suggest finding the length of spoon and looping over it. I want a sure-fire way of detecting EOF.
Use io.EOF to test for end-of-file. For example, to count spaces in a file:
package main
import (
"fmt"
"io"
"os"
)
func main() {
if len(os.Args) <= 1 {
fmt.Println("Missing file name argument")
return
}
f, err := os.Open(os.Args[1])
if err != nil {
fmt.Println(err)
return
}
defer f.Close()
data := make([]byte, 100)
spaces := 0
for {
data = data[:cap(data)]
n, err := f.Read(data)
if err != nil {
if err == io.EOF {
break
}
fmt.Println(err)
return
}
data = data[:n]
for _, b := range data {
if b == ' ' {
spaces++
}
}
}
fmt.Println(spaces)
}
ioutil.ReadFile() reads the entire contents of the file into a byte slice. You don't need to be concerned with EOF. EOF is a construct that is needed when you read a file one chunk at a time. You need to know which chunk has reached the end of the file when you're reading one chunk at a time.
The length of the byte slice returned by ioutil.ReadFile() is all you need.
data, err := ioutil.ReadFile(os.Args[1])
if err != nil {
	// handle error
}
// Do we need to know the data size?
sliceSize := len(data)
// Do we need to look at each byte?
for _, b := range data {
	// do something with each byte b
}
This is what to check for to detect end of file (EOF):
if err != nil {
	if errors.Is(err, io.EOF) { // preferred way according to the Go documentation
		fmt.Println("Reading file finished...")
	}
	break
}
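To put that fragment in context, here is a minimal sketch of a complete chunked read loop, assuming the input is any io.Reader. The names countSpaces and countSpacesInString are made up for this illustration. It handles the bytes returned by each Read before looking at the error, because a Reader is allowed to return data and an error from the same call.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// countSpaces reads r in fixed-size chunks and counts the space bytes.
// (Hypothetical helper for this sketch.)
func countSpaces(r io.Reader) (int, error) {
	buf := make([]byte, 100)
	spaces := 0
	for {
		n, err := r.Read(buf)
		// Handle the n bytes first: Read may return data together with an error.
		for _, b := range buf[:n] {
			if b == ' ' {
				spaces++
			}
		}
		if errors.Is(err, io.EOF) {
			return spaces, nil // normal end of input, not a failure
		}
		if err != nil {
			return spaces, err
		}
	}
}

// countSpacesInString is a convenience wrapper for in-memory input.
func countSpacesInString(s string) (int, error) {
	return countSpaces(strings.NewReader(s))
}

func main() {
	n, err := countSpacesInString("a b  c")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(n)
}
```

For a real file, pass the *os.File returned by os.Open to countSpaces instead of a string reader.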
When you use ioutil.ReadFile(), you don't ever see io.EOF, by design, because ReadFile will read the whole file until EOF is reached. So the slice it returns is the whole file. From the doc:
ReadFile reads the file named by filename and returns the contents. A successful call returns err == nil, not err == EOF. Because ReadFile reads the whole file, it does not treat an EOF from Read as an error to be reported.
From your question, you explicitly mention that you are aware there are other ways to read the file, and some of those ways require you to test the error for io.EOF, but not ReadFile.
Then, with the slice you have, you can read the file using the for...range construct, as others have mentioned. This is a sure way to read the whole file and nothing more (again, ReadFile takes care of that). Or iterating from 0 to len(spoon) - 1 would work too, but range is more idiomatic and basically does the same.
In other words: when you reach the end of the slice, you reach the end of the file (provided ReadFile did not return an error).
A slice has no concept of end of file. The slice returned by ioutil.ReadFile has a specific length, which equals the size of the file it was read from. A common idiom, though only one of several possible here, is to range over the slice, effectively "consuming" all of the bytes that originally sat in the file:
for i, b := range spoon {
	// At index 'i' is byte 'b'
	// At file's offset 'i', 'b' was read
	// ... do something useful here
}
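As a small illustration of the point made in these answers, here is a sketch that reads a whole file and counts spaces by ranging over the slice. The helper name countSpaces is invented for the example, and it uses os.ReadFile, which in Go 1.16 and later replaces ioutil.ReadFile. The loop needs no EOF test: it ends when the slice ends.

```go
package main

import (
	"fmt"
	"os"
)

// countSpaces walks the whole slice; the loop stops when the data stops.
// (Hypothetical helper for this sketch.)
func countSpaces(data []byte) int {
	spaces := 0
	for _, b := range data {
		if b == ' ' {
			spaces++
		}
	}
	return spaces
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: spaces <file>")
		return
	}
	// A nil error here means the whole file was read; io.EOF is never returned.
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(countSpaces(data))
}
```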
Related
I have a file that contains some IP ranges:
1.1.1.0/24
1.1.2.0/24
2.2.1.0/24
2.2.2.0/24
I read this file into a slice and used *(*string)(unsafe.Pointer(&b)) to convert []byte to string, but it doesn't work:
func TestInitIpRangeFromFile(t *testing.T) {
filepath := "/tmp/test"
file, err := os.Open(filepath)
if err != nil {
t.Errorf("failed to open ip range file:%s, err:%s", filepath, err)
}
reader := bufio.NewReader(file)
ranges := make([]string, 0)
for {
ip, _, err := reader.ReadLine()
if err != nil {
if err == io.EOF {
break
}
logger.Fatalf("failed to read ip range file, err:%s", err)
}
t.Logf("ip:%s", *(*string)(unsafe.Pointer(&ip)))
ranges = append(ranges, *(*string)(unsafe.Pointer(&ip)))
}
t.Logf("%v", ranges)
}
result:
task_test.go:71: ip:1.1.1.0/24
task_test.go:71: ip:1.1.2.0/24
task_test.go:71: ip:2.2.1.0/24
task_test.go:71: ip:2.2.2.0/24
task_test.go:75: [2.2.2.0/24 1.1.2.0/24 2.2.1.0/24 2.2.2.0/24]
Why did 1.1.1.0/24 change to 2.2.2.0/24?
If I change
*(*string)(unsafe.Pointer(&ip))
to string(ip), it works.
So, while reinterpreting a slice-header as a string-header the way you did is absolutely bonkers and has no guarantee whatsoever of working correctly, it's only indirectly the cause of your problem.
The real problem is that you're retaining a pointer to the return value of bufio.Reader.ReadLine(), but the docs for that method say "The returned buffer is only valid until the next call to ReadLine." This means that the reader is free to reuse that memory later on, and that's what's happening.
When you do the cast in the proper way, string(ip), Go copies the contents of the buffer into the newly-created string, which remains valid in the future. But when you type-pun the slice into a string, you keep the exact same pointer, which stops working as soon as the reader refills its buffer.
If you decided to do the pointer trickery as a performance hack to avoid copying and allocation... too bad. The reader interface is going to force you to copy the data out anyway, and since it does, you should just use string().
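A minimal sketch of the copying approach, assuming the input is any io.Reader: each line is copied out of the reader's buffer before the next ReadLine call, so later refills cannot change strings that were already stored. The function names readLines and readLinesFromString are made up for this illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readLines returns every line of r as an independent string.
// (Hypothetical helper for this sketch.)
func readLines(r io.Reader) ([]string, error) {
	br := bufio.NewReader(r)
	var lines []string
	var cur []byte // pieces of the line currently being assembled
	for {
		chunk, isPrefix, err := br.ReadLine()
		if err != nil {
			if err == io.EOF {
				if len(cur) > 0 {
					lines = append(lines, string(cur))
				}
				return lines, nil
			}
			return lines, err
		}
		// Copy out of the reader's buffer before the next ReadLine call.
		cur = append(cur, chunk...)
		if !isPrefix {
			lines = append(lines, string(cur)) // string() allocates its own copy
			cur = cur[:0]
		}
	}
}

// readLinesFromString is a convenience wrapper for in-memory input.
func readLinesFromString(s string) ([]string, error) {
	return readLines(strings.NewReader(s))
}

func main() {
	lines, err := readLinesFromString("1.1.1.0/24\n1.1.2.0/24\n2.2.1.0/24\n2.2.2.0/24\n")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(lines)
}
```

bufio.Scanner with scanner.Text() is another way to get a fresh string per line.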
I am trying to read a file in a buffered manner because my files are very large. I want to apply a text replacement to the file: for each read, I search for the word 'foo' and replace it with another word, 'bar'. If I read with a 5 MB buffer, 'foo' may be split across two reads (one read ending in 'fo' and the next starting with 'o'), and then I will not find the word. Is there a way to make a buffered read stop at the last newline, or to read multiple whole lines into the buffer?
I tried the code below, but it does not align the reads with line boundaries:
file, err := os.Open(filename)
if err != nil {
panic(err)
}
defer file.Close()
byteSlice := make([]byte, 5*1024*1024) // read 5 MB
bufioreader := bufio.NewReaderSize(file, bufferSize)
for {
n, err := bufioreader.Read(byteSlice)
if n > 0 {
fmt.Println(byteSlice[:n])
} else if err == io.EOF {
break
} else {
panic(err)
}
}
Since you're using the bufio reader, you shouldn't really work on aligning the input with buffer boundaries yourself. Use one of the high-level read functions, such as bufioreader.ReadString('\n'), which reads a whole line (including its trailing newline) using the underlying buffer, so you won't have to deal with buffer boundaries yourself.
You don't need a bufio reader if you have your own buffer. Your code makes a useless copy of the data from bufio's buffer into byteSlice.
Regarding the split "foo" problem, the solution is to move the last 2 characters of the buffer to the front before the next read.
More precisely, if the word to replace has length m, then copy the last m-1 bytes of the buffer to the front of the buffer, fill the remainder of the buffer, and search for the word in the buffer.
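// assume we want to find word, with len(word) >= 1
file, err := os.Open(filename)
if err != nil {
	panic(err)
}
defer file.Close()
trailingLen := len(word) - 1
data := make([]byte, trailingLen+5*1024*1024) // carried-over bytes + 5 MB of new data
carried := 0                                  // bytes kept from the previous read
for {
	n, err := file.Read(data[carried:])
	valid := carried + n
	if n > 0 {
		// search for word in data[:valid]; a match split across two reads is now visible
		// keep the last trailingLen bytes for the next iteration
		carried = trailingLen
		if valid < carried {
			carried = valid
		}
		copy(data, data[valid-carried:valid])
	}
	if err != nil {
		if err == io.EOF {
			break
		}
		panic(err)
	}
}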
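The carry-over idea described in this answer can be sketched as a self-contained function. This is only an illustration under stated assumptions: it counts matches instead of replacing them, the names countOccurrences and countInString are invented here, and the word is assumed not to overlap with itself (for a word like "aa", occurrences that straddle a chunk boundary could be counted twice).

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

// countOccurrences counts word in r while reading about chunkSize bytes at a time.
// The last len(word)-1 bytes of each window are carried into the next one,
// so a match split across two reads is still found.
func countOccurrences(r io.Reader, word []byte, chunkSize int) (int, error) {
	if len(word) == 0 || chunkSize <= 0 {
		return 0, errors.New("word must be non-empty and chunkSize positive")
	}
	keep := len(word) - 1
	buf := make([]byte, keep+chunkSize)
	carried, total := 0, 0
	for {
		n, err := r.Read(buf[carried:])
		valid := carried + n
		if n > 0 {
			total += bytes.Count(buf[:valid], word)
			// Carry fewer bytes than a full word, so no match is counted twice.
			carried = keep
			if valid < carried {
				carried = valid
			}
			copy(buf, buf[valid-carried:valid])
		}
		if err == io.EOF {
			return total, nil
		}
		if err != nil {
			return total, err
		}
	}
}

// countInString is a convenience wrapper for in-memory input.
func countInString(s, word string, chunkSize int) (int, error) {
	return countOccurrences(strings.NewReader(s), []byte(word), chunkSize)
}

func main() {
	// A tiny chunk size forces "foo" to straddle read boundaries.
	n, err := countInString("a foo b foo c", "foo", 4)
	fmt.Println(n, err)
}
```

For an actual replacement, only the part of each window before the carried bytes should be written out, since the carried tail is examined again in the next iteration.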
I'm reading about how to use CSVs in Golang and came across this code:
csvFile, _ := os.Open("people.csv")
reader := csv.NewReader(bufio.NewReader(csvFile))
var people []Person
for {
line, error := reader.Read()
if error == io.EOF {
break
} else if error != nil {
log.Fatal(error)
}
people = append(people, Person{
Firstname: line[0],
Lastname: line[1],
})
}
Located here: https://www.thepolyglotdeveloper.com/2017/03/parse-csv-data-go-programming-language/
What I find confusing here is that, in the infinite for-loop, each iteration grabs the next line, but no lineNum++ type of logic is passed into the Reader. How does the reader know which iteration it's on? How can I change this? E.g. grab just the first line.
How does a Reader in Golang automatically iterate in a loop?
How does the reader know which iteration it's on?
The Read method returns the next record by consuming more data from the underlying io.Reader. The Read method returns io.EOF when there are no more records in the underlying reader.
The application is responsible for calling Read in a loop as shown in the example.
The Reader does not need to know the line number to read the next record, but the Reader does maintain a line counter in its internal state for annotating errors.
If the application needs to know the line number, the application can declare a counter and increment the counter on each read.
How can I change this? E.g. grab just the first line.
Call Read once:
f, err := os.Open("people.csv")
if err != nil {
// handle error
}
defer f.Close()
r := csv.NewReader(f)
firstLine, err := r.Read()
if err != nil {
// handle error
}
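To make the statefulness concrete, here is a hedged sketch in which the application keeps its own record counter while calling Read in a loop. The Person type mirrors the one in the question; readPeople and readPeopleFromString are names invented for this example.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// Person mirrors the struct used in the question.
type Person struct {
	Firstname string
	Lastname  string
}

// readPeople parses every record and reports how many records were read.
// (Hypothetical helper for this sketch.)
func readPeople(r io.Reader) ([]Person, int, error) {
	cr := csv.NewReader(r)
	var people []Person
	records := 0
	for {
		// Each call continues where the previous one stopped; the position
		// lives inside the reader, not in the loop.
		rec, err := cr.Read()
		if err == io.EOF {
			return people, records, nil
		}
		if err != nil {
			return people, records, err
		}
		records++ // the application's own counter
		if len(rec) < 2 {
			return people, records, fmt.Errorf("record %d: want 2 fields, got %d", records, len(rec))
		}
		people = append(people, Person{Firstname: rec[0], Lastname: rec[1]})
	}
}

// readPeopleFromString is a convenience wrapper for in-memory input.
func readPeopleFromString(s string) ([]Person, int, error) {
	return readPeople(strings.NewReader(s))
}

func main() {
	people, n, err := readPeopleFromString("Ada,Lovelace\nAlan,Turing\n")
	fmt.Println(people, n, err)
}
```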
I am trying to implement my own beanstalkd client as a way of learning go. https://github.com/kr/beanstalkd/blob/master/doc/protocol.txt
At the moment, I am using bufio to read in a line of data delimited by \n.
res, err := this.reader.ReadString('\n')
This is fine when I send a single command and read a single-line response like INSERTED %d\r\n, but I run into difficulties when I try to reserve a job, because the job body could span multiple lines and so I cannot use the \n delimiter.
Is there a way to read into the buffer until CRLF?
e.g. when I send the reserve command. My expected response is as follows:
RESERVED <id> <bytes>\r\n
<data>\r\n
But data could contain \n, so I need to read until the \r\n.
Alternatively - is there a way of reading a specific number of bytes as specified in <bytes> in example response above?
At the moment, I have (err handling removed):
func (this *Bean) receiveLine() (string, error) {
res, err := this.reader.ReadString('\n')
return res, err
}
func (this *Bean) receiveBody(numBytesToRead int) ([]byte, error) {
res, err := this.reader.ReadString('\r\n') // What to do here to read to CRLF / up to number of expected bytes?
return res, err
}
func (this *Bean) Reserve() (*Job, error) {
this.send("reserve\r\n")
res, err := this.receiveLine()
var jobId uint64
var bodylen int
_, err = fmt.Sscanf(res, "RESERVED %d %d\r\n", &jobId, &bodylen)
body, err := this.receiveBody(bodylen)
job := new(Job)
job.Id = jobId
job.Body = body
return job, nil
}
res, err := this.reader.Read('\n')
Does not make any sense to me. Did you mean ReadBytes/ReadSlice/ReadString?
You need bufio.Scanner.
Define your bufio.SplitFunc (example is a copy of bufio.ScanLines with modifications to look for '\r\n'). Modify it to match your case.
// dropCR drops a terminal \r from the data.
func dropCR(data []byte) []byte {
if len(data) > 0 && data[len(data)-1] == '\r' {
return data[0 : len(data)-1]
}
return data
}
func ScanCRLF(data []byte, atEOF bool) (advance int, token []byte, err error) {
if atEOF && len(data) == 0 {
return 0, nil, nil
}
if i := bytes.Index(data, []byte{'\r','\n'}); i >= 0 {
// We have a full newline-terminated line.
return i + 2, dropCR(data[0:i]), nil
}
// If we're at EOF, we have a final, non-terminated line. Return it.
if atEOF {
return len(data), dropCR(data), nil
}
// Request more data.
return 0, nil, nil
}
Now, wrap your io.Reader with your custom scanner.
scanner := bufio.NewScanner(this.reader)
// Set the split function for the scanning operation.
scanner.Split(ScanCRLF)
// Validate the input
for scanner.Scan() {
fmt.Printf("%s\n", scanner.Text())
}
if err := scanner.Err(); err != nil {
fmt.Printf("Invalid input: %s", err)
}
Read bufio package's source code about Scanner.
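Put together, a runnable sketch of the Scanner approach might look like the following. It assumes tokens are terminated by "\r\n" only; scanCRLF and splitCRLF are names made up for this illustration, and the 4096-byte initial buffer is an arbitrary choice. The call to Buffer raises the maximum token size above the Scanner's default.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanCRLF is a bufio.SplitFunc that ends a token only at "\r\n",
// so a lone '\n' inside a token is preserved.
func scanCRLF(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.Index(data, []byte("\r\n")); i >= 0 {
		return i + 2, data[:i], nil
	}
	if atEOF {
		return len(data), data, nil // final token without a terminator
	}
	return 0, nil, nil // request more data
}

// splitCRLF collects all CRLF-terminated tokens of input.
// maxToken bounds the size of a single token.
func splitCRLF(input string, maxToken int) ([]string, error) {
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Buffer(make([]byte, 0, 4096), maxToken)
	sc.Split(scanCRLF)
	var tokens []string
	for sc.Scan() {
		tokens = append(tokens, sc.Text())
	}
	return tokens, sc.Err()
}

func main() {
	tokens, err := splitCRLF("INSERTED 5\r\nline1\nline2\r\n", 1<<20)
	fmt.Printf("%q %v\n", tokens, err)
}
```

Note that a job body containing "\r\n" itself would still be split by this function, so it does not replace checking the byte count from the protocol.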
Alternatively - is there a way of reading a specific number of bytes as specified in <bytes> in example response above?
First you need to read the "RESERVED <id> <bytes>\r\n" line somehow.
And then you can use
nr_of_bytes := read_number_of_bytes_somehow(this.reader)
buf := make([]byte, nr_of_bytes)
_, err := io.ReadFull(this.reader, buf) // a single Read may return fewer bytes
or io.LimitedReader.
But I don't like this approach.
Thanks for this - reader.Read('\n') was a typo - I corrected question. I have also attached example code of where I have got so far. As you can see, I can get the number of expected bytes of the body. Could you elaborate on why you don't like the idea of reading a specific number of bytes? This seems most logical?
I'd like to see Bean's definition, especially reader's part.
Imagine this counter is wrong somehow.
1. It's too short: do you need to find the following "\r\n" and discard everything up to that point, or not? Why do you need the counter in the first place, then?
2. It's bigger than it should be (or, even worse, it's huge!).
2.1 No next message in the reader: fine, the read is shorter than expected, but that's fine.
2.2 There is a next message waiting: bah, you read part of it and there is no easy way to recover.
2.3 It's huge: you can't allocate the memory even if the message is only 1 byte.
Byte counters like this are in general designed to verify the message.
And it looks like that is the case with the beanstalkd protocol.
Use Scanner, parse the message, check the length against the expected number ... profit
UPD
Be warned: by default bufio.Scanner can't read a token longer than 64 KB; set the maximum length with scanner.Buffer first. And that's bad, because you can't change this option on the fly, and some data may already have been "pre"-read by the scanner.
UPD2
Thinking about my last update: take a look at how net/textproto implements dotReader as a simple state machine. You could do something similar, reading the command first and then checking the "expected bytes" on the payload.
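For comparison, the byte-count approach the asker proposed could be sketched as below, with the concerns raised in this thread addressed by bounding the length and verifying the trailing CRLF. Everything here is an assumption made for illustration: the function names, the maxBody limit (an arbitrary cap, not taken from the protocol document), and the error messages.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// maxBody is an assumed upper bound that guards against a bogus, huge length.
const maxBody = 1 << 20

// readReserved parses "RESERVED <id> <bytes>\r\n" and then reads exactly
// <bytes> bytes of body plus the trailing "\r\n".
func readReserved(r *bufio.Reader) (uint64, []byte, error) {
	line, err := r.ReadString('\n')
	if err != nil {
		return 0, nil, err
	}
	var id uint64
	var n int
	if _, err := fmt.Sscanf(strings.TrimRight(line, "\r\n"), "RESERVED %d %d", &id, &n); err != nil {
		return 0, nil, err
	}
	if n < 0 || n > maxBody {
		return 0, nil, errors.New("unreasonable body length")
	}
	buf := make([]byte, n+2) // body plus trailing CRLF
	if _, err := io.ReadFull(r, buf); err != nil {
		return 0, nil, err
	}
	// Use the terminator to verify that the count was right.
	if buf[n] != '\r' || buf[n+1] != '\n' {
		return 0, nil, errors.New("body is not terminated by CRLF")
	}
	return id, buf[:n], nil
}

// readReservedFromString is a convenience wrapper for in-memory input.
func readReservedFromString(s string) (uint64, []byte, error) {
	return readReserved(bufio.NewReader(strings.NewReader(s)))
}

func main() {
	id, body, err := readReservedFromString("RESERVED 42 11\r\nhello\nworld\r\n")
	fmt.Printf("%d %q %v\n", id, body, err)
}
```

io.ReadFull keeps reading until the buffer is full or an error occurs, which a single Read call does not guarantee.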
I'm new to go, so don't know a whole lot about the language specific constructs.
My use case is first to read into memory an input file containing JSON blobs that are newline delimited. From this "array" of JSON source, I'd like to unmarshal each array element to deal with it in golang. The expected structure mapping is already defined.
I typically like to read all lines at once, so ioutil.ReadFile() as mentioned in How can I read a whole file into a string variable in Golang? seems like a good choice. And json.Unmarshal appears to take byte array as the source. But if I'm using ReadFile(), I have a single array of bytes for the whole file. How might I extract slices of this byte array such that the newline bytes (as delimiters) are skipped and each slice is one of those JSON blobs? I'd assume the best technique is one that doesn't do or minimizes data type conversions. As the easy hack would be something like convert the byte array to string, split the newline delimited string to array then cast each string array element back to bytes to pass to json.Unmarshal. I'd prefer the optimized approach but not sure how to tackle the implementation algorithm details in go, could use some tips here.
Ideally, I'd like the preprocessing done beforehand, so that I'm not dealing with the content of the JSON byte array from file as I'm iterating over the slices, etc. Rather I'd like to preprocess the single byte array read from file into an array of byte array slices, with all the newline bytes removed, each slice being the segments that were delimited by newline.
Use bufio.Scanner to read a line at a time:
f, err := os.Open(fname)
if err != nil {
// handle error
}
s := bufio.NewScanner(f)
for s.Scan() {
var v ValueTypeToUnmarshalTo
if err := json.Unmarshal(s.Bytes(), &v); err != nil {
//handle error
}
// do something with v
}
if s.Err() != nil {
// handle scan error
}
or use ioutil.ReadFile to slurp up the entire file and bytes.Split to break the file into lines:
p, err := ioutil.ReadFile(fname)
if err != nil {
// handle error
}
for _, line := range bytes.Split(p, []byte{'\n'}) {
var v ValueTypeToUnmarshalTo
if err := json.Unmarshal(line, &v); err != nil {
//handle error
}
// do something with v
}
or use the json.Decoder built-in streaming feature to read multiple values from the file:
f, err := os.Open(fname)
if err != nil {
// handle error
}
d := json.NewDecoder(f)
for {
var v ValueTypeToUnmarshalTo
if err := d.Decode(&v); err == io.EOF {
break // done decoding file
} else if err != nil {
// handle error
}
// do something with v
}
The ioutil.ReadFile approach uses more memory than the other approaches (one byte for each byte in file plus one slice header for each line).
Because the decoder ignores whitespace following a JSON value, the three approaches handle \r\n line terminators.
There are no data conversions in any of these approaches other than those inherent to unmarshalling JSON bytes to Go values.
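One practical detail with the bytes.Split variant: a file that ends with a newline produces a final empty slice, and json.Unmarshal returns an error for empty input. A minimal sketch that skips blank lines follows; the Record type and the decodeLines name are invented for this illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Record is a stand-in for the asker's real target type.
type Record struct {
	Name string `json:"name"`
}

// decodeLines unmarshals one JSON value per non-empty line of p.
func decodeLines(p []byte) ([]Record, error) {
	var out []Record
	for i, line := range bytes.Split(p, []byte{'\n'}) {
		line = bytes.TrimSpace(line) // also drops a trailing '\r'
		if len(line) == 0 {
			continue // blank line, e.g. after the final newline
		}
		var r Record
		if err := json.Unmarshal(line, &r); err != nil {
			return out, fmt.Errorf("line %d: %w", i+1, err)
		}
		out = append(out, r)
	}
	return out, nil
}

func main() {
	recs, err := decodeLines([]byte("{\"name\":\"a\"}\n{\"name\":\"b\"}\r\n"))
	fmt.Println(recs, err)
}
```

The slices returned by bytes.Split share memory with p, so no per-line copies of the file contents are made.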