Porting from Ruby to Go: find and replace in a file

I am porting some Ruby code to Go. I'm having difficulty finding a good equivalent for the line below and wondered if someone knows a better solution than what I have. The basic premise is to find a line in a file that consists of a lot of spaces and remove it.
I also thought of just using exec to call sed -i, but when I tried that it didn't work; the code below finally did.
Ruby:
File.write(filename, File.read(filename).gsub(/^\s*$/,""))
Golang:
b, err := ioutil.ReadFile(filename)
if err != nil {
    return
}
// I happen to know that there will be at least 30 spaces,
// but I would really prefer to not use a hardcoded value here.
// I was just never able to make using '^\s*$' work in the regex.
r := regexp.MustCompile(`[ ]{30,}`) // there's a space in the []
newb := r.ReplaceAll(b, []byte(""))
err = ioutil.WriteFile(filename, newb, 0666)
if err != nil {
    fmt.Printf("Unable to write to file (%+v)\n", err)
    return
}

Turn on multiline mode, and your original pattern will work:
r := regexp.MustCompile(`(?m)^\s*$`)
Demo using strings: https://play.golang.org/p/6TsfgB83WgX
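Since the demo works on strings, here is a minimal sketch of the same idea on a byte slice, as you would get from reading the file. The function names blankOut and dropBlankLines are hypothetical. Note that `(?m)^\s*$` empties a whitespace-only line but leaves its newline behind (the same as the Ruby gsub). The second pattern, `(?m)^[ \t]*\r?\n`, is an assumption about what "remove the line" should mean: it deletes the newline as well.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Multiline mode: ^ and $ match at line boundaries, not only at text boundaries.
	blankRe = regexp.MustCompile(`(?m)^\s*$`)
	// Assumed variant: also consume the line terminator so the line disappears.
	blankLineRe = regexp.MustCompile(`(?m)^[ \t]*\r?\n`)
)

// blankOut empties whitespace-only lines but keeps their newline.
func blankOut(b []byte) []byte { return blankRe.ReplaceAll(b, nil) }

// dropBlankLines removes whitespace-only lines together with their newline.
func dropBlankLines(b []byte) []byte { return blankLineRe.ReplaceAll(b, nil) }

func main() {
	in := []byte("a\n     \nb\n")
	fmt.Printf("%q\n", blankOut(in))       // "a\n\nb\n"
	fmt.Printf("%q\n", dropBlankLines(in)) // "a\nb\n"
}
```

Either function can take the place of r.ReplaceAll in the question's read/replace/write sequence.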

Related

What is the difference between os.Stdout and syscall.Stdout?

I have been trying to use ForkExec() and I am not able to get it to work. Is there a difference between syscall.Stdout and os.Stdout?
Here is a small example of the code I am trying to run.
command := "/usr/bin/echo"
args := []string{"Hello there."}
attr := new(syscall.ProcAttr)
attr.Env = os.Environ()
attr.Files = []uintptr{uintptr(syscall.Stdin), uintptr(syscall.Stdout), uintptr(syscall.Stderr)}
pid, err := syscall.ForkExec(command, args, attr)
if err != nil {
    log.Fatal(err)
}
fmt.Println(pid)
The output is not showing up on the screen.
Thanks a lot for your help in advance.
os.Stdout is a *os.File. It works with Go functions that want an io.Writer or similar interfaces. syscall.Stdout is an integer constant. It's the file descriptor number of stdout, which is useful for low-level syscalls.
syscall.ForkExec does indeed want file descriptor numbers, but it's unclear why you're using that instead of os/exec.Cmd, which is much more straightforward.
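A minimal sketch of the os/exec route, assuming an echo binary is on PATH; the helper name run is hypothetical. One likely reason the ForkExec version prints nothing visible: the argv slice passed to ForkExec goes to the new process as-is, so by the usual exec convention its first element should be the program name. With only "Hello there." in the slice, echo sees that string as its own name and has no arguments to print. exec.Command builds argv[0] for you.

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
)

// run executes the named program with the given arguments and returns
// whatever it wrote to standard output.
func run(name string, args ...string) (string, error) {
	out, err := exec.Command(name, args...).Output()
	return string(out), err
}

func main() {
	out, err := run("echo", "Hello there.")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(out)
}
```

To stream the child's output straight to the terminal instead of capturing it, set the Stdout and Stderr fields of the exec.Cmd to os.Stdout and os.Stderr and call Run.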

What is the fastest way to remove a specific line from a big file?

What is the best way to remove a line (which contains a specific substring) from a file?
I have tried loading the whole file into a slice, modifying that slice, and then writing the slice back to a file. That worked well, but it won't work with big files (e.g. 50GB+) because I don't have that much memory.
I think this would be possible with streams, but I couldn't figure out how to read and write at the same time (because I have to find the line via a substring and then remove it).
Is this even possible, or do I have to read the whole file and save the index? If so, what is the best way of doing so?
This reads from standard input and writes to standard output, so only one line is held in memory at a time. Note that I adapted it from code in the second answer to "Reading a file line by line in Go" (not tested).
scanner := bufio.NewScanner(os.Stdin)
for scanner.Scan() {
    line := scanner.Text()
    // Skip any line that contains the unwanted substring.
    if !strings.Contains(line, "unwanted") {
        fmt.Println(line)
    }
}
if err := scanner.Err(); err != nil {
    log.Fatal(err)
}

Golang one line for loop

I was trying to read a CSV file in Go line by line with a for loop that required an if statement with a break to check whether the error from reading the file was EOF. I find this syntax rather unnecessary when in Java, for example, I could read the line inside a while loop's condition and check for EOF at the same time. I thought that declaring a variable inside a for loop header was possible, and I know for sure that you can do this with if statements in Go, for example:
if v := 2; v > 1 {
    fmt.Println("2 is better than 1")
}
The first snippet of code I have here is what I know to work in my program.
reader := csv.NewReader(some_file)
for {
    line, err := reader.Read()
    if err == io.EOF {
        break
    }
    //do data parsing from your line here
}
I do not know whether or not this second snippet is conceptually possible or just syntactically incorrect.
reader := csv.NewReader(some_file)
for line, err := reader.Read(); err != io.EOF {
    //do data parsing from your line here
}
Would like some clarification/benefits/conventions of doing it one way over another, Thanks :)
It is conventional to write simpler statements rather than lengthy, complex ones, isn't it?
So I consider the first version more conventional than the second. Moreover, the for loop in your second version is not syntactically valid: a three-clause for needs an init statement, a condition, and a post statement. If you want to use that form, fix it like the following:
for line, err := reader.Read(); err != io.EOF; line, err = reader.Read() {
    //do data parsing from your line here
}

Read file and display its contents in Go

I'm new to Go. I want to write a simple program that reads a filename from the user and displays its contents back to the user. This is what I have so far:
fname := "D:\myfolder\file.txt"
f, err := os.Open(fname)
if err != nil {
    fmt.Println(err)
}
var buff []byte
defer f.Close()
buff = make([]byte, 1024)
for {
    n, err := f.Read(buff)
    if n > 0 {
        fmt.Println(string(buff[:n]))
    }
    if err == io.EOF {
        break
    }
}
but I get error:
The filename, directory name, or volume label syntax is incorrect.
I suspect the backslashes in fname are the reason. Try with double backslashes (\\).
Put the filename in backquotes. This makes it a raw string literal. With raw string literals, no escape sequences such as \f will be processed.
fname := `D:\myfolder\file.txt`
You can also use Unix-style '/' path separators instead, which does the job:
fname := "D:/myfolder/file.txt"
Congrats on learning Go! Though the question was about a specific error in the example, let's break it down line by line and learn a bit about some of the other issues that may be encountered:
fname := "D:\myfolder\file.txt"
Like C and many other languages, Go uses the backslash character for an "escape sequence". That is, certain characters that start with a backslash get translated into other characters that would be hard to see otherwise (eg. \t becomes a tab character, which may otherwise be indistinguishable from a space).
The fix is to use a raw string literal (use backticks instead of quotes) where no escape sequences are processed:
fname := `D:\myfolder\file.txt`
This fixes the initial error you were seeing, because \m (which is not a valid escape sequence) and \f (which is a form feed) are no longer interpreted as escapes. A full list of escape sequences and more explanation can be found in the String Literals section of the Go spec.
f, err := os.Open(fname)
if err != nil {
    fmt.Println(err)
}
The first line of this chunk is good, but it can be improved. If an error occurs, there is no reason for our program to continue executing since we couldn't even open the file, so we should both print it (probably to standard error) and exit, preferably with a non-zero exit status to indicate that something bad happened. Also, as a matter of good habit we probably want to close the file at the end of the function if opening it was successful. Putting it right below the Open call is conventional and makes it easier when someone else is reading your code. I would rewrite this as:
f, err := os.Open(fname)
if err != nil {
    fmt.Fprintln(os.Stderr, err)
    os.Exit(2)
    // It is also common to replace these two lines with a call to log.Fatal
}
defer f.Close()
The last chunk is a bit complicated, and we could rewrite it in multiple ways. Right now it looks like this:
var buff []byte
defer f.Close()
buff = make([]byte, 1024)
for {
    n, err := f.Read(buff)
    if n > 0 {
        fmt.Println(string(buff[:n]))
    }
    if err == io.EOF {
        break
    }
}
But we don't need to define our own buffering, because the standard library provides us with the bufio and bytes packages which can do this for us. In this case though, we probably don't need them because we can also replace the iteration with a call to io.Copy which does its own internal buffering. We could also use one of the other copy variants such as io.CopyBuffer if we wanted to use our own buffer. It's also missing some error handling, so we'll add that. Now this entire chunk becomes:
// err is already declared by the os.Open call above, so scope a new one to the if.
if _, err := io.Copy(os.Stdout, f); err != nil {
    fmt.Fprintf(os.Stderr, "Error reading from file: %v\n", err)
    os.Exit(2)
}
// We're done!

io.Reader and Line Break issue involving a CSV file

I have an application which deals with CSV's being delivered via RabbitMQ from many different upstream applications - typically 5000-15,000 rows per file. Most of the time it works great. However a couple of these upstream applications are old (12-15 years) and the people who wrote them are long gone.
I'm unable to read CSV files from these older applications due to the line breaks. I'm finding this a bit weird, as the line breaks seem to map to UTF-8 Carriage Returns (http://www.fileformat.info/info/unicode/char/000d/index.htm). Typically the app reads in only the headers from those older files and nothing else.
If I open one of these files in a text editor and save it with UTF-8 encoding, overwriting the existing file, then it works with no issues at all.
Things I've tried I expected to work:
-Using a Reader:
ba := make([]byte, 262144000)
if _, err := file.Read(ba); err != nil {
    return nil, err
}
ba = bytes.Trim(ba, "\x00")
bb := bytes.NewBuffer(ba)
reader := csv.NewReader(bb)
records, err := reader.ReadAll()
if err != nil {
    return nil, err
}
-Using the Scanner to read line by line (get a bufio.Scanner: token too long)
scanner := bufio.NewScanner(file)
var bb bytes.Buffer
for scanner.Scan() {
    bb.WriteString(fmt.Sprintf("%s\n", scanner.Text()))
}
// check for errors
if err = scanner.Err(); err != nil {
    return nil, err
}
reader := csv.NewReader(&bb)
records, err := reader.ReadAll()
if err != nil {
    return nil, err
}
Things I tried I expected not to work (and didn't):
Writing file contents to a new file (.txt) and reading the file back in (including running dos2unix against the created txt file)
Reading file into a standard string (hoping Go's UTF-8 encoding would magically kick in which of course it doesn't)
Reading file to Rune slice, then transforming to a string via byte slice
I'm aware of the https://godoc.org/golang.org/x/text/transform package but not too sure of a viable approach - it looks like the src encoding needs to be known to transform.
Am I stupidly overlooking something? Are there any suggestions how to transform these files into UTF-8 or update the line endings without knowing the file encoding whilst keeping the application working for all the other valid CSV files being delivered? Are there any options that don't involve me going byte to byte and doing a bytes.Replace I've not considered?
I'm hoping there's something really obvious I've overlooked.
Apologies - I can't share the CSV files for obvious reasons.
For anyone who's stumbled on this and wants an answer that doesn't involve strings.Replace, here's a method that wraps an io.Reader to replace solo carriage returns. It could probably be more efficient, but works better with huge files than a strings.Replace-based solution.
https://gist.github.com/b5/78edaae9e6a4248ea06b45d089c277d6
// ReplaceSoloCarriageReturns wraps an io.Reader. On every call of Read it checks
// for instances of a lonely \r and replaces them with \r\n before returning to the caller.
// Lots of files in the wild come without "proper" line breaks, which irritates Go's
// standard csv package. This fixes it by wrapping the reader passed to csv.NewReader:
// rdr := csv.NewReader(ReplaceSoloCarriageReturns(r))
//
func ReplaceSoloCarriageReturns(data io.Reader) io.Reader {
    return crlfReplaceReader{
        rdr: bufio.NewReader(data),
    }
}
// crlfReplaceReader wraps a reader
type crlfReplaceReader struct {
    rdr *bufio.Reader
}
// Read implements io.Reader for crlfReplaceReader
func (c crlfReplaceReader) Read(p []byte) (n int, err error) {
    if len(p) == 0 {
        return
    }
    for {
        if n == len(p) {
            return
        }
        p[n], err = c.rdr.ReadByte()
        if err != nil {
            return
        }
        // Any time we encounter \r and still have room for one more byte, check whether
        // \n follows; if it does not, add it manually. A lone \r that lands in the last
        // slot of p has no room for the extra byte and is passed through unchanged.
        if p[n] == '\r' && n < len(p)-1 {
            if pk, err := c.rdr.Peek(1); (err == nil && pk[0] != '\n') || err == io.EOF {
                n++
                p[n] = '\n'
            }
        }
        n++
    }
}
Have you tried replacing all line endings, \r\n or \r, with \n?

Resources