Read lines from a text file in Go with bufio.Reader

for {
v, err = nextNum(reader, ' ')
if err != nil {
break
}
w, err = nextNum(reader, ' ')
if err != nil {
break
}
cost, err = nextNum(reader, '\n')
if err != nil {
break
}
fmt.Println(v, w, cost)
}
My text file consists of three columns and n rows. The first time nextNum is called, the number in the first row and first column is returned; the next time, the number in the first row and second column; and so on. My problem is that when I get to the end and call nextNum for the last time, I receive an EOF error and the last line never gets printed, because break is called first. Any suggestions on how to solve the problem?
Cheers

I guess there is no newline after the last row in your file and it simply ends with EOF. Is this correct? As a result, the very last column is not parsed correctly, because it doesn't end with the expected character (\n).
You didn't show us exactly how you're using bufio.Reader, but either way you will need to account for a missing newline at the end of the file (it's up to you whether to treat it as an error or not). Methods like bufio.Reader.ReadString with a \n delimiter won't treat EOF as the end of the line automatically, but will return the valid content along with EOF (i.e. you can get both data and an error from the same call; note that this differs from the behaviour of bufio.Reader.Read).
That said, it might be beneficial for you to use the csv package instead. It will solve the EOF problem, and you could also benefit from some nicer error messages on an unexpected number of columns. The additional features like comments or quotes might be good or bad for your purposes.
Example:
// No line break at the end, pure EOF (still works)
data := "one 1\ntwo 2\nthree 3\nfour 4"
// You can wrap your file reader with bufio.Reader here
cr := csv.NewReader(bytes.NewReader([]byte(data)))
cr.Comma = ' '
cr.FieldsPerRecord = 2
var err error
for err == nil {
var columns []string
if columns, err = cr.Read(); err == nil {
fmt.Println(columns)
// err = processRow(columns)
}
}
if err != io.EOF {
// Parse error
panic(err)
}
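If you would rather keep the bufio.Reader approach, the ReadString behaviour described above can be sketched roughly as follows. The question does not show nextNum, so the version here is a hypothetical reconstruction, and readRows is an illustrative helper; both are assumptions made only for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// nextNum is a hypothetical stand-in for the helper in the question.
// It reads up to delim and parses the text before it as an integer.
// Data returned together with io.EOF is still parsed, so a last line
// without a trailing newline is not lost.
func nextNum(r *bufio.Reader, delim byte) (int, error) {
	s, err := r.ReadString(delim)
	if err != nil && (err != io.EOF || len(s) == 0) {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(s))
}

// readRows parses "v w cost" rows and stops at the first error,
// which is io.EOF once the input is exhausted.
func readRows(input string) [][3]int {
	r := bufio.NewReader(strings.NewReader(input))
	var rows [][3]int
	for {
		var row [3]int
		var err error
		if row[0], err = nextNum(r, ' '); err != nil {
			break
		}
		if row[1], err = nextNum(r, ' '); err != nil {
			break
		}
		if row[2], err = nextNum(r, '\n'); err != nil {
			break
		}
		rows = append(rows, row)
	}
	return rows
}

func main() {
	// Note: no newline after the last row.
	for _, row := range readRows("1 2 3\n4 5 6") {
		fmt.Println(row[0], row[1], row[2])
	}
}
```

The key point is that EOF only counts as an error when no data came back with it.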

From the bufio docs:
At EOF, the count will be zero and err will be io.EOF
So you can simply test for that. Like change your if err != nil to if err != nil && err != io.EOF
or
if err == io.EOF {
fmt.Println(v, w, cost)
break
}
if err != nil {
break
}
fmt.Println(v, w, cost)
Though you really should do something with the error and not just ignore it.

Related

How this error handling works. CSV file reading and writing case

I'm learning how to read and write CSV files and error handling in Go.
I found a question whose answer I do not understand.
Using Golang to read csv, reorder columns then write result to a new csv with Concurrency
The answer is
for line, err: = reader.Read (); err == nil; line, err = reader.Read () {
if err = writer.Write ([] string {line [col_index [0]], line [col_index [1]], line [col_index [2]]}); err! = nil {
fmt.Println ("Error:", err)
break
}
writer.Flush ()
}
Why error equals the return of writer.Write()?
I'm used to seeing only
err! = nil {
fmt.Println ("Error:", err)
break
}
Could you explain it to me?
Thanks!
Read the statement through to the end of the line: this is an if statement with an assignment that is executed before the comparison.
Shortening the Write call, this is:
if err = writer.Write(...); err != nil {
This first assigns the return value of writer.Write to err, then checks it against nil.
An important note: the equal operator is ==, not =.
Another note: you should run gofmt on your code, != and := should not have a space in the middle. In fact, your spacing is all over the place.
If you look at the components of the first line:
for line, err: = reader.Read (); // Declare err and assign to the result of Read
err == nil; // Continue looping while err==nil
line, err = reader.Read () // Read again, and assign err
{
The above for-loop will continue looping reading lines while err==nil.
Then:
if err = writer.Write (...); err! = nil {
This is using the same err created in the for-loop. It will simply break the loop if Write returns an error.
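For comparison, here is a small gofmt-style sketch of the same pattern. The reorder function, its string-in/string-out signature and the sample data are assumptions made for illustration; they are not from the original question.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// reorder reads CSV text and returns it with only the columns listed in
// colIndex, in that order.
func reorder(in string, colIndex []int) (string, error) {
	reader := csv.NewReader(strings.NewReader(in))
	var out strings.Builder
	writer := csv.NewWriter(&out)
	for {
		line, err := reader.Read()
		if err == io.EOF {
			break // normal end of input
		}
		if err != nil {
			return "", err
		}
		row := make([]string, len(colIndex))
		for i, c := range colIndex {
			if c < 0 || c >= len(line) {
				return "", fmt.Errorf("column %d out of range", c)
			}
			row[i] = line[c]
		}
		// Assignment first, comparison second; this err only exists
		// inside the if statement.
		if err := writer.Write(row); err != nil {
			return "", err
		}
	}
	writer.Flush()
	if err := writer.Error(); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	out, err := reorder("a,b,c\n1,2,3\n", []int{2, 0, 1})
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Print(out)
}
```

Unlike the loop in the question, this version flushes once after the loop and distinguishes io.EOF from real read errors.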

Read file and display its contents in Go

I'm new to Go. I want to write a simple program that reads a filename from the user and displays its contents back to the user. This is what I have so far:
fname := "D:\myfolder\file.txt"
f, err := os.Open(fname)
if err != nil {
fmt.Println(err)
}
var buff []byte
defer f.Close()
buff = make([]byte, 1024)
for {
n, err := f.Read(buff)
if n > 0 {
fmt.Println(string(buff[:n]))
}
if err == io.EOF {
break
}
}
but I get error:
The filename, directory name, or volume label syntax is incorrect.
I suspect the backslashes in fname are the reason. Try double backslashes (\\).
Put the filename in backquotes. This makes it a raw string literal. With raw string literals, no escape sequences such as \f will be processed.
fname := `D:\myfolder\file.txt`
You can also use Unix-style '/' path separators instead; that does the job:
fname := "D:/myfolder/file.txt"
Congrats on learning Go! Though the question was about a specific error in the example, let's break it down line by line and learn a bit about some of the other issues that may be encountered:
fname := "D:\myfolder\file.txt"
Like C and many other languages, Go uses the backslash character for an "escape sequence". That is, certain characters that start with a backslash get translated into other characters that would be hard to see otherwise (eg. \t becomes a tab character, which may otherwise be indistinguishable from a space).
The fix is to use a raw string literal (use backticks instead of quotes) where no escape sequences are processed:
fname := `D:\myfolder\file.txt`
This fixes the initial error you were seeing, because backslashes are no longer interpreted: in the original literal, \m is not a valid escape sequence and \f is read as a form feed character. A full list of escape sequences and more explanation can be found in the String literals section of the Go spec.
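A quick way to see the difference between the two kinds of literal is the sketch below; samePath is a name made up for this example.

```go
package main

import "fmt"

// samePath reports whether the escaped and the raw spelling of the path
// produce the same string.
func samePath() bool {
	escaped := "D:\\myfolder\\file.txt" // interpreted literal, backslashes doubled
	raw := `D:\myfolder\file.txt`       // raw literal, nothing is interpreted
	return escaped == raw
}

func main() {
	fmt.Println(samePath())  // true
	fmt.Println(len("a\fb")) // 3: \f is a single form feed character
	fmt.Println(len(`a\fb`)) // 4: backslash and f are two separate characters
}
```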
f, err := os.Open(fname)
if err != nil {
fmt.Println(err)
}
The first line of this chunk is good, but it can be improved. If an error occurs, there is no reason for our program to continue executing since we couldn't even open the file, so we should both print it (probably to standard error) and exit, preferably with a non-zero exit status to indicate that something bad happened. Also, as a matter of good habit we probably want to close the file at the end of the function if opening it was successful. Putting it right below the Open call is conventional and makes it easier when someone else is reading your code. I would rewrite this as:
f, err := os.Open(fname)
if err != nil {
fmt.Fprintln(os.Stderr, err)
os.Exit(2)
// It is also common to replace these two lines with a call to log.Fatal
}
defer f.Close()
The last chunk is a bit complicated, and we could rewrite it in multiple ways. Right now it looks like this:
var buff []byte
defer f.Close()
buff = make([]byte, 1024)
for {
n, err := f.Read(buff)
if n > 0 {
fmt.Println(string(buff[:n]))
}
if err == io.EOF {
break
}
}
But we don't need to define our own buffering, because the standard library provides us with the bufio and bytes packages which can do this for us. In this case though, we probably don't need them because we can also replace the iteration with a call to io.Copy which does its own internal buffering. We could also use one of the other copy variants such as io.CopyBuffer if we wanted to use our own buffer. It's also missing some error handling, so we'll add that. Now this entire chunk becomes:
_, err = io.Copy(os.Stdout, f) // plain assignment: err was already declared by os.Open above
if err != nil {
fmt.Fprintf(os.Stderr, "Error reading from file: `%s'\n", err)
os.Exit(2)
}
// We're done!

io.Reader and Line Break issue involving a CSV file

I have an application which deals with CSV's being delivered via RabbitMQ from many different upstream applications - typically 5000-15,000 rows per file. Most of the time it works great. However a couple of these upstream applications are old (12-15 years) and the people who wrote them are long gone.
I'm unable to read CSV files from these older applications due to the line breaks. I'm finding this a bit weird, as the line breaks seem to map to UTF-8 Carriage Returns (http://www.fileformat.info/info/unicode/char/000d/index.htm). Typically the app reads in only the headers from those older files and nothing else.
If I open one of these files in a text editor and save it with UTF-8 encoding, overwriting the existing file, then it works with no issues at all.
Things I've tried I expected to work:
-Using a Reader:
ba := make([]byte, 262144000)
if _, err := file.Read(ba); err != nil {
return nil, err
}
ba = bytes.Trim(ba, "\x00")
bb := bytes.NewBuffer(ba)
reader := csv.NewReader(bb)
records, err := reader.ReadAll()
if err != nil {
return nil, err
}
-Using the Scanner to read line by line (get a bufio.Scanner: token too long)
scanner := bufio.NewScanner(file)
var bb bytes.Buffer
for scanner.Scan() {
bb.WriteString(fmt.Sprintf("%s\n", scanner.Text()))
}
// check for errors
if err = scanner.Err(); err != nil {
return nil, err
}
reader := csv.NewReader(&bb)
records, err := reader.ReadAll()
if err != nil {
return nil, err
}
Things I tried that I expected not to work (and they didn't):
-Writing the file contents to a new file (.txt) and reading the file back in (including running dos2unix against the created txt file)
-Reading the file into a standard string (hoping Go's UTF-8 encoding would magically kick in, which of course it doesn't)
-Reading the file into a rune slice, then transforming it to a string via a byte slice
I'm aware of the https://godoc.org/golang.org/x/text/transform package but not too sure of a viable approach - it looks like the src encoding needs to be known to transform.
Am I stupidly overlooking something? Are there any suggestions how to transform these files into UTF-8 or update the line endings without knowing the file encoding whilst keeping the application working for all the other valid CSV files being delivered? Are there any options that don't involve me going byte to byte and doing a bytes.Replace I've not considered?
I'm hoping there's something really obvious I've overlooked.
Apologies - I can't share the CSV files for obvious reasons.
For anyone who's stumbled on this and wants an answer that doesn't involve strings.Replace, here's a method that wraps an io.Reader to replace solo carriage returns. It could probably be more efficient, but works better with huge files than a strings.Replace-based solution.
https://gist.github.com/b5/78edaae9e6a4248ea06b45d089c277d6
// ReplaceSoloCarriageReturns wraps an io.Reader. On every call of Read it
// looks for instances of a lonely \r and replaces them with \r\n before
// returning the data to the caller. Lots of files in the wild come without
// "proper" line breaks, which irritates Go's standard csv package. This fixes
// that by wrapping the reader passed to csv.NewReader:
// rdr := csv.NewReader(ReplaceSoloCarriageReturns(r))
func ReplaceSoloCarriageReturns(data io.Reader) io.Reader {
	return &crlfReplaceReader{rdr: bufio.NewReader(data)}
}

// crlfReplaceReader wraps a reader
type crlfReplaceReader struct {
	rdr     *bufio.Reader
	pending bool // true when a '\n' still has to be emitted after a lone '\r'
}

// Read implements io.Reader for crlfReplaceReader
func (c *crlfReplaceReader) Read(p []byte) (n int, err error) {
	for n < len(p) {
		// emit the '\n' owed from a previous lone '\r', possibly from an
		// earlier call that ran out of space in p
		if c.pending {
			p[n] = '\n'
			c.pending = false
			n++
			continue
		}
		p[n], err = c.rdr.ReadByte()
		if err != nil {
			return n, err
		}
		// any time we encounter \r, check whether \n follows;
		// if it does not (or the input ends here), add it in manually
		if p[n] == '\r' {
			pk, perr := c.rdr.Peek(1)
			if (perr == nil && pk[0] != '\n') || perr == io.EOF {
				c.pending = true
			}
		}
		n++
	}
	return n, nil
}
Have you tried to replace all line endings from \r\n or \r to \n ?
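A minimal sketch of that in-memory replacement, assuming the whole file fits comfortably in memory; parseCSV is a hypothetical helper name. One caveat: a carriage return inside a quoted field is rewritten as well, which may or may not matter for your data.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// parseCSV normalizes \r\n and lone \r line endings to \n and then parses
// the result with encoding/csv.
func parseCSV(data []byte) ([][]string, error) {
	// Handle \r\n first so that it does not turn into two newlines.
	data = bytes.ReplaceAll(data, []byte("\r\n"), []byte("\n"))
	data = bytes.ReplaceAll(data, []byte("\r"), []byte("\n"))
	return csv.NewReader(bytes.NewReader(data)).ReadAll()
}

func main() {
	// Rows separated by lone carriage returns, as in the older files.
	records, err := parseCSV([]byte("id,name\r1,alpha\r2,beta\r"))
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	for _, rec := range records {
		fmt.Println(rec)
	}
}
```

For very large files a streaming wrapper around the reader avoids holding two copies of the data in memory.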

Golang - why is string slice element not included in exec cat unless I sort it

I have a slightly funky issue in golang. Essentially I have a slice of strings which represent file paths. I then run a cat against those filepaths to combine the files before sorting, deduping, etc.
here is the section of code (where 'applicableReductions' is the string slice):
applicableReductions := []string{}
for _, fqFromListName := range fqFromListNames {
filePath := GetFilePath()
//BROKE CODE GOES HERE
applicableReductions = append(applicableReductions, filePath)
}
fileOut, err := os.Create(toListWriteTmpFilePath)
if err != nil {
return err
}
cat := exec.Command("cat", applicableReductions...)
catStdOut, err := cat.StdoutPipe()
if err != nil {
return err
}
go func(cat *exec.Cmd) error {
if err := cat.Start(); err != nil {
return fmt.Errorf("File reduction error (cat) : %s", err)
}
return nil
}(cat)
// Init Writer & write file
writer := bufio.NewWriter(fileOut)
defer writer.Flush()
_, err = io.Copy(writer, catStdOut)
if err != nil {
return err
}
if err = cat.Wait(); err != nil {
return err
}
fDiff.StandardiseData(fileOut, toListUpdateFolderPath, list.Name)
The above works fine. The problem comes when I try to append a new element to the slice. I have a separate function which creates a new file from DB content, and that file is then added to the applicableReductions slice.
func RetrieveDomainsFromDB(collection *Collection, listName, outputPath string) error {
domains, err := domainReviews.GetDomainsForList(listName)
if err != nil {
return err
}
if len(domains) < 1 {
return ErrNoDomainReviewsForList
}
fh, err := os.OpenFile(outputPath, os.O_RDWR, 0774)
if err != nil {
fh, err = os.Create(outputPath)
if err != nil {
return err
}
}
defer fh.Close()
_, err = fh.WriteString(strings.Join(domains, "\n"))
if err != nil {
return err
}
return nil
}
If I call the above function and append the filePath to the applicableReductions slice, it is in there but doesn't get picked up by cat.
To clarify, when I put the following where it says BROKE CODE GOES HERE:
if dbSource {
err = r.RetrieveDomainsFromDB(collection, ToListName, filePath)
if err != nil {
return err
continue
}
}
The file path can be seen when doing fmt.Println(applicableReductions), but the contents of that file do not appear in the cat output file.
I thought there might be a delay in the file being written, so I tried adding a wait; this didn't help. However, the solution I found was to sort the slice, e.g. this code above the call to exec cat solves the problem, but I don't know why:
sort.Strings(applicableReductions)
I have confirmed that all files are present on both successful and unsuccessful runs; the only difference is that without the sort, the content of the final appended file is missing.
An explanation from a Go pro out there would be very much appreciated. Let me know if you need more info or debug output; I'm happy to oblige in order to understand this.
UPDATE
It has been suggested that this is the same issue as here: Golang append an item to a slice. I think I understand the issue there, and I'm not saying this isn't the same, but I cannot see the same thing happening. The slice in question is not touched from outside the main function (e.g. there is no editing of the slice in the RetrieveDomainsFromDB function): I create the slice before a loop, append to it within the loop and then use it after the loop. I've added an example at the top to show how the slice is built. Please could someone clarify where this slice is being copied, if that is the case?
UPDATE AND CLOSE
Please close this question. The issue was unrelated to the use of a string slice. It turns out that I was reading from the final output file before the bufio.Writer had been flushed (at the end of the function, before the deferred Flush kicked in on function return).
I think the sorting was just rearranging the problem so that I didn't notice it persisted, or possibly giving the buffer some time to flush. Either way, it is sorted now with a manual call to Flush.
Thanks for all the help provided.
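The flushing behaviour behind this can be shown with a small sketch. A bytes.Buffer stands in for the output file, and visibleBytes is a name invented for this example.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// visibleBytes writes one line through a bufio.Writer and reports how many
// bytes have reached the underlying sink, with or without an explicit Flush.
func visibleBytes(flush bool) int {
	var sink bytes.Buffer // stands in for the output file
	w := bufio.NewWriter(&sink)
	w.WriteString("example.com\n") // error ignored for brevity in this sketch
	if flush {
		w.Flush()
	}
	return sink.Len()
}

func main() {
	fmt.Println(visibleBytes(false)) // 0: the data is still in the writer's buffer
	fmt.Println(visibleBytes(true))  // 12: Flush pushed it to the sink
}
```

A deferred Flush only runs when the surrounding function returns, so anything that reads the file earlier in the same function can see incomplete data.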

Golang: Skipping Whitespace in a file

In reading a file in Go I am attempting to skip all of the white spaces; however, I am having issues finding the correct method to do this. Any assistance would be appreciated
file, err := os.Open(filename) // For read access.
this.file = file
if err != nil {
log.Fatal(err)
}
//skip white space
c := make([]byte, 1)
char, err := this.file.Read(c)
//skip white space
for {
//catch unintended errors
if err != nil && err != io.EOF {
panic(err)
}
if err == io.EOF || !unicode.IsSpace(int(c)) {
break
}
//get next
char, err := this.file.Read(c)
}
I am just simply attempting to create a scanner for a file to read a single character at a time and ignore whitespace
EDIT
I changed a few things around to make use of bufio.Reader; however, I have still run into an issue. What is the correct way to read a file character by character so that each character can be compared to a specific symbol such as 'A', while also ignoring whitespace, i.e. unicode.IsSpace(rune)?
char, size, err := this.reader.ReadRune()
//skip white space and comments
for {
//catch unintended errors
if err != nil && err != io.EOF {
panic(err)
}
if err == io.EOF {
break
}
//skip it when there is no data or a space
if size != 0 && char == '{' {
//Ignore Comments
//Documentation specifies no nested comments
for char != '}' {
char, size, err = this.reader.ReadRune()
}
} else if !unicode.IsSpace(char) {
break
}
// Do something with the byte
fmt.Print(char)
//get next
char, size, err = this.reader.ReadRune()
}
Unless I'm misunderstanding your question, it would seem that you'd want a continue statement when encountering a space.
c := make([]byte, 100)
n, err := this.file.Read(c)
//skip white space
for {
//catch unintended errors
if err != nil && err != io.EOF {
panic(err)
}
if err == io.EOF {
break
}
for i := 0; i < n; i++ {
ch := c[i]
switch ch {
case '{': // Do something
case '}': // Do something else
default:
if unicode.IsSpace(rune(ch)) { // IsSpace takes a rune, not an int
continue
}
// Do whatever
}
}
//get next
n, err = this.file.Read(c)
}
I don't know why you're reading one byte at a time, but I left it that way in case it's intentional. At the very least, I'd think you'd want to read full unicode characters instead of individual bytes.
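Reading full Unicode characters could look roughly like the sketch below. It uses bufio.Reader.ReadRune, skips whitespace and skips {...} comments, which are assumed to be non-nested as the question states. The function name significantRunes and the string-based signature are assumptions made for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"unicode"
)

// significantRunes returns the input with whitespace and {...} comments
// removed, reading one rune at a time.
func significantRunes(input string) (string, error) {
	r := bufio.NewReader(strings.NewReader(input))
	var out strings.Builder
	inComment := false
	for {
		ch, _, err := r.ReadRune()
		if err == io.EOF {
			return out.String(), nil // an unterminated comment simply ends here
		}
		if err != nil {
			return "", err
		}
		switch {
		case inComment:
			if ch == '}' {
				inComment = false
			}
		case ch == '{':
			inComment = true
		case unicode.IsSpace(ch):
			// skip whitespace
		default:
			out.WriteRune(ch) // a real scanner would act on the rune here
		}
	}
}

func main() {
	s, err := significantRunes("A {note} B\n\tπ")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println(s)
}
```

For a file, the strings.NewReader call would be replaced by the opened *os.File.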

Resources