I was trying to read a CSV file in Golang line by line, using a for loop that required an if statement with a break to check whether the error from reading the file was EOF. I find this syntax rather unnecessary when, in Java for example, I could read the line inside a while loop's condition and simultaneously check for the EOF error. I thought that declaring a variable inside a for loop was possible, and I know for sure that you can do this with if statements in Golang. For example:
if v := 2; v > 1 {
    fmt.Println("2 is better than 1")
}
The first snippet of code I have here is what I know to work in my program.
reader := csv.NewReader(some_file)
for {
    line, err := reader.Read()
    if err == io.EOF {
        break
    }
    // do data parsing from your line here
}
I do not know whether or not this second snippet is conceptually possible or just syntactically incorrect.
reader := csv.NewReader(some_file)
for line, err := reader.Read(); err != io.EOF {
    // do data parsing from your line here
}
I would like some clarification on the benefits/conventions of doing it one way over the other. Thanks :)
It is conventional to write simpler statements rather than lengthy, complex ones, isn't it?
So I consider the 1st version more conventional than the 2nd. Moreover, the for loop in your 2nd version isn't right: a three-clause for statement needs both semicolons, and without a post statement Read would only ever be called once, so the loop would never advance. If you want to use that form, fix it like the following, or however you wish:
for line, err := reader.Read(); err != io.EOF; line, err = reader.Read() {
    // do data parsing from your line here
}
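For reference, here is the 1st (conventional) form as a complete, minimal program. This is only a sketch: data.csv is a placeholder file name, and non-EOF errors are handled with log.Fatal for brevity.
package main

import (
    "encoding/csv"
    "fmt"
    "io"
    "log"
    "os"
)

func main() {
    f, err := os.Open("data.csv") // placeholder file name
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    reader := csv.NewReader(f)
    for {
        line, err := reader.Read()
        if err == io.EOF {
            break
        }
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(line) // do data parsing from your line here
    }
}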
What is the best way to remove a line (which contains a specific substring) from a file?
I have tried loading the whole file into a slice, modifying that slice and then printing it to a file, which worked well, but when I want to do this with big files (e.g. 50GB+) it won't work because I don't have that much memory.
I think this should be possible with streams, but I couldn't figure out how to read and write at the same time (because I have to find the line via a substring and then remove it).
Is this even possible, or do I have to read the whole file and save the index? If so, what is the best way of doing so?
This reads from standard input and writes to standard output. Note that I adapted it from code in the 2nd answer at reading file line by line in go (not tested).
scanner := bufio.NewScanner(os.Stdin)
for scanner.Scan() {
    line := scanner.Text()
    if line != "unwanted" {
        fmt.Println(line)
    }
}
if err := scanner.Err(); err != nil {
    log.Fatal(err)
}
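If you want to go file to file instead of stdin/stdout, roughly the same loop works with os.Open and os.Create plus a buffered writer. This is only an untested sketch; input.txt, output.txt and the substring are placeholders:
package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "strings"
)

func main() {
    in, err := os.Open("input.txt") // placeholder name
    if err != nil {
        log.Fatal(err)
    }
    defer in.Close()

    out, err := os.Create("output.txt") // placeholder name
    if err != nil {
        log.Fatal(err)
    }
    defer out.Close()

    w := bufio.NewWriter(out)
    scanner := bufio.NewScanner(in)
    for scanner.Scan() {
        line := scanner.Text()
        if !strings.Contains(line, "unwanted") { // drop lines containing the substring
            fmt.Fprintln(w, line)
        }
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }
    if err := w.Flush(); err != nil {
        log.Fatal(err)
    }
}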
I am porting some Ruby code to Go. I'm having difficulty finding a good equivalent for the line below and wondered if someone knew of a better solution than what I have. The basic premise is to find a line in a file that has a lot of spaces and remove that line.
I also thought of just using exec to call sed -i, but when I tried that it didn't work; the code below finally did.
Ruby:
File.write(filename, File.read(filename).gsub(/^\s*$/,""))
Golang:
b, err := ioutil.ReadFile(filename)
if err != nil {
    return
}
// I happen to know that there will be at least 30 spaces,
// but I would really prefer to not use a hardcoded value here.
// I was just never able to make using '^\s*$' work in the regex.
r := regexp.MustCompile(`[ ]{30,}`) // there's a space in the []
newb := r.ReplaceAll(b, []byte(""))
err = ioutil.WriteFile(filename, newb, 0666)
if err != nil {
fmt.Printf("Unable to write to file (%+v)\n", err)
return
}
Turn on multiline mode, and your original pattern will work:
r := regexp.MustCompile(`(?m)^\s*$`)
Demo using strings: https://play.golang.org/p/6TsfgB83WgX
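Plugged into your original read/replace/write flow, that would look roughly like this (an untested sketch, with the same error handling as your snippet):
b, err := ioutil.ReadFile(filename)
if err != nil {
    return
}
// (?m) turns on multiline mode so ^ and $ match at line boundaries
r := regexp.MustCompile(`(?m)^\s*$`)
newb := r.ReplaceAll(b, []byte(""))
err = ioutil.WriteFile(filename, newb, 0666)
if err != nil {
    fmt.Printf("Unable to write to file (%+v)\n", err)
    return
}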
I'm reading about how to use CSVs in Golang and came across this code:
csvFile, _ := os.Open("people.csv")
reader := csv.NewReader(bufio.NewReader(csvFile))
var people []Person
for {
    line, error := reader.Read()
    if error == io.EOF {
        break
    } else if error != nil {
        log.Fatal(error)
    }
    people = append(people, Person{
        Firstname: line[0],
        Lastname:  line[1],
    })
}
Located here: https://www.thepolyglotdeveloper.com/2017/03/parse-csv-data-go-programming-language/
What I find confusing here is that with the infinite for loop, each iteration grabs the next line, but there's no lineNum++ type logic being passed into the Reader. How does the reader know which iteration it's on? How can I change this? E.g. grab just the first line.
How does a Reader in Golang automatically iterate in a loop?
How does the reader know which iteration it's on?
The Read method returns the next record by consuming more data from the underlying io.Reader. The Read method returns io.EOF when there are no more records in the underlying reader.
The application is responsible for calling Read in a loop as shown in the example.
The Reader does not need to know the line number to read the next record, but the Reader does maintain a line counter in its internal state for annotating errors.
If the application needs to know the line number, the application can declare a counter and increment the counter on each read.
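For example, a minimal sketch of such a counter around the loop from your snippet:
lineNum := 0
for {
    line, err := reader.Read()
    if err == io.EOF {
        break
    } else if err != nil {
        log.Fatal(err)
    }
    lineNum++
    fmt.Println(lineNum, line) // lineNum is maintained by the application, not the Reader
}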
How can I change this? E.g. grab just the first line.
Call Read once:
f, err := os.Open("people.csv")
if err != nil {
    // handle error
}
defer f.Close()
r := csv.NewReader(f)
firstLine, err := r.Read()
if err != nil {
    // handle error
}
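Read returns the record as a []string of fields, so for the first line you might then do something like:
fmt.Println(firstLine)    // all fields of the first record
fmt.Println(firstLine[0]) // just the first field, e.g. the Firstname column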
I'm writing a program which opens a named pipe for reading, and then processes any lines written to this pipe:
err = syscall.Mkfifo("/tmp/myfifo", 0666)
if err != nil {
    panic(err)
}
pipe, err := os.OpenFile("/tmp/myfifo", os.O_RDONLY, os.ModeNamedPipe)
if err != nil {
    panic(err)
}
reader := bufio.NewReader(pipe)
scanner := bufio.NewScanner(reader)
for scanner.Scan() {
    line := scanner.Text()
    process(line)
}
This works fine as long as the writing process does not restart or otherwise send an EOF. When this happens, the loop terminates (as expected from the specification of Scanner).
However, I want to keep the pipe open to accept further writes. I could just reinitialize the scanner of course, but I believe this would create a race condition where the scanner might not be ready while a new process has begun writing to the pipe.
Are there any other options? Do I need to work directly with the File type instead?
From the bufio GoDoc:
Scan ... returns false when the scan stops, either by reaching the end of the input or an error.
So you could possibly leave the file open and read until EOF, then trigger scanner.Scan() again when the file has changed or at a regular interval (e.g. in a goroutine), and make sure the pipe variable doesn't go out of scope so you can reference it again.
If I understand your concern about a race condition correctly, it wouldn't be an issue (unless write and read operations must be synchronized), but when the scanner is re-initialized it will end up back at the beginning of the file.
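A rough, untested sketch of the interval idea; note that a Scanner that has stopped won't resume, so each pass creates a fresh one over the still-open pipe (time, bufio and log imports assumed):
for {
    scanner := bufio.NewScanner(pipe) // pipe stays open across passes
    for scanner.Scan() {
        process(scanner.Text())
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }
    // Writer closed (EOF); wait a bit before scanning again.
    time.Sleep(100 * time.Millisecond)
}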
I am using the following code to read a file in Go:
spoon, err := ioutil.ReadFile(os.Args[1])
if err != nil {
    panic("File reading error")
}
Now I check every byte I pick to see what character it is. For example:
spoon[i] == ' ' // for checking space
Likewise I read the whole file (I know there may be other ways of reading it), but keeping this approach intact, how can I know that I have reached the EOF of the file and should stop reading it further?
Please don't suggest finding the length of spoon and starting a loop; I want a sure-shot way of detecting EOF.
Use io.EOF to test for end-of-file. For example, to count spaces in a file:
package main

import (
    "fmt"
    "io"
    "os"
)

func main() {
    if len(os.Args) <= 1 {
        fmt.Println("Missing file name argument")
        return
    }
    f, err := os.Open(os.Args[1])
    if err != nil {
        fmt.Println(err)
        return
    }
    defer f.Close()
    data := make([]byte, 100)
    spaces := 0
    for {
        data = data[:cap(data)]
        n, err := f.Read(data)
        if err != nil {
            if err == io.EOF {
                break
            }
            fmt.Println(err)
            return
        }
        data = data[:n]
        for _, b := range data {
            if b == ' ' {
                spaces++
            }
        }
    }
    fmt.Println(spaces)
}
ioutil.ReadFile() reads the entire contents of the file into a byte slice, so you don't need to be concerned with EOF. EOF is a construct that is needed when you read a file one chunk at a time: you need to know which chunk has reached the end of the file.
The length of the byte slice returned by ioutil.ReadFile() is all you need.
data, err := ioutil.ReadFile(os.Args[1])
if err != nil {
    // handle the error
}
// Do we need to know the data size?
sliceSize := len(data)
// Do we need to look at each byte?
for _, b := range data {
    // do something with each byte b
}
This is what you need to check for to detect end of file (EOF):
if err != nil {
    if errors.Is(err, io.EOF) { // preferred way per the Go documentation
        fmt.Println("Reading file finished...")
    }
    break
}
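That check applies when you read in chunks yourself (ReadFile never returns io.EOF, as the next answer notes). A rough sketch of where it would sit, assuming f is an *os.File you opened:
buf := make([]byte, 4096)
for {
    n, err := f.Read(buf)
    for _, b := range buf[:n] {
        if b == ' ' {
            // found a space
        }
    }
    if err != nil {
        if errors.Is(err, io.EOF) {
            fmt.Println("Reading file finished...")
            break
        }
        log.Fatal(err)
    }
}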
When you use ioutil.ReadFile(), you don't ever see io.EOF, by design, because ReadFile will read the whole file until EOF is reached. So the slice it returns is the whole file. From the doc:
ReadFile reads the file named by filename and returns the contents. A successful call returns err == nil, not err == EOF. Because ReadFile reads the whole file, it does not treat an EOF from Read as an error to be reported.
In your question you explicitly mention that you are aware there are other ways to read the file; some of those ways require you to test the error for io.EOF, but ReadFile does not.
Then, with the slice you have, you can read the file using the for...range construct, as others have mentioned. This is a sure way to read the whole file and nothing more (again, ReadFile takes care of that). Or iterating from 0 to len(spoon) - 1 would work too, but range is more idiomatic and basically does the same.
In other words: when you reach the end of the slice, you reach the end of the file (provided ReadFile did not return an error).
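Concretely, for your space check this might look like the following (a minimal sketch based on your own snippet):
spoon, err := ioutil.ReadFile(os.Args[1])
if err != nil {
    panic("File reading error")
}
spaces := 0
for _, b := range spoon {
    if b == ' ' {
        spaces++
    }
}
// The loop ends exactly when every byte of the file has been seen.
fmt.Println(spaces)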
A slice has no concept of end of file. The slice returned by ioutil.ReadFile has a specific length, which reflects the size of the file it was read from. A common idiom, though only one of several possible here, is to range over the slice, effectively "consuming" all of the bytes that were originally sitting in the file:
for i, b := range spoon {
    // At index 'i' is byte 'b'
    // At the file's offset 'i', byte 'b' was read
    // ... do something useful here
}