Strip consecutive empty lines in a golang writer - go

I've got a Go text/template that renders a file, however I've found it difficult to structure the template cleanly while preserving the line breaks in the output.
I'd like to have additional, unnecessary newlines in the template to make it more readable, but strip them from the output. Any group of newlines more than a normal paragraph break should be condensed to a normal paragraph break, e.g.
lines with
too many breaks should become lines with
normal paragraph breaks.
The string is potentially too large to store safely in memory, so I want to keep it as an output stream.
My first attempt:
type condensingWriter struct {
writer io.Writer
lastLineIsEmpty bool
}
func (c condensingWriter) Write(b []byte) (n int, err error){
thisLineIsEmpty := strings.TrimSpace(string(b)) == ""
defer func(){
c.lastLineIsEmpty = thisLineIsEmpty
}()
if c.lastLineIsEmpty && thisLineIsEmpty{
return 0, nil
} else {
return c.writer.Write(b)
}
}
This doesn't work because I naively assumed that it would buffer on newline characters, but it doesn't.
Any suggestions on how to get this to work?

Inspired by zmb's approach, I've come up with the following package:
//Package striplines strips runs of consecutive empty lines from an output stream.
package striplines
import (
"io"
"strings"
)
// Striplines wraps an output stream, stripping runs of consecutive empty lines.
// You must call Close before the output stream will be complete.
// Implements io.WriteCloser.
type Striplines struct {
Writer io.Writer
lastLine []byte
currentLine []byte
}
func (w *Striplines) Write(p []byte) (int, error) {
s := string(p)
if !strings.Contains(s, "\n") {
w.currentLine = append(w.currentLine, p...)
// the bytes are buffered, so report them as consumed (io.Writer contract)
return len(p), nil
}
cur := string(append(w.currentLine, p...))
lastN := strings.LastIndex(cur, "\n")
s = cur[:lastN]
for _, line := range strings.Split(s, "\n") {
_, err := w.writeLn(line + "\n")
w.lastLine = []byte(line)
if err != nil {
return 0, err
}
}
rem := cur[(lastN + 1):]
w.currentLine = []byte(rem)
// report all of p as consumed, even when some lines were dropped
return len(p), nil
}
// Close flushes the last of the output into the underlying writer.
func (w *Striplines) Close() error {
_, err := w.writeLn(string(w.currentLine))
return err
}
func (w *Striplines) writeLn(line string) (n int, err error) {
if strings.TrimSpace(string(w.lastLine)) == "" && strings.TrimSpace(line) == "" {
return 0, nil
} else {
return w.Writer.Write([]byte(line))
}
}
See it in action here: http://play.golang.org/p/t8BGPUMYhb

The general idea is you'll have to look for consecutive newlines anywhere in the input slice and if such cases exist, skip over all but the first newline character.
Additionally, you have to track whether the last byte written was a newline, so the next call to Write will know to eliminate a newline if necessary. You were on the right track by adding a bool to your writer type. However, you'll want to use a pointer receiver instead of a value receiver here, otherwise you'll be modifying a copy of the struct.
You would want to change
func (c condensingWriter) Write(b []byte)
to
func (c *condensingWriter) Write(b []byte)
You could try something like this. You'll have to test with larger inputs to make sure it handles all cases correctly.
package main
import (
"bytes"
"io"
"os"
)
var Newline byte = byte('\n')
type ReduceNewlinesWriter struct {
w io.Writer
lastByteNewline bool
}
func (r *ReduceNewlinesWriter) Write(b []byte) (int, error) {
// if the previous call to Write ended with a \n
// then we have to skip over any starting newlines here
skipped := 0
if r.lastByteNewline {
for skipped < len(b) && b[skipped] == Newline {
skipped++
}
b = b[skipped:]
}
// nothing left to write; this also avoids indexing an empty slice below
if len(b) == 0 {
return skipped, nil
}
r.lastByteNewline = b[len(b)-1] == Newline
i := bytes.IndexByte(b, Newline)
if i == -1 {
// no newlines - just write the entire thing
n, err := r.w.Write(b)
return skipped + n, err
}
// write up to the newline
i++
n, err := r.w.Write(b[:i])
if err != nil {
return skipped + n, err
}
// skip over immediate newline and recurse
i++
for i < len(b) && b[i] == Newline {
i++
}
i--
m, err := r.Write(b[i:])
// count dropped newlines as consumed so callers see the full length on success
return skipped + i + m, err
}
func main() {
r := ReduceNewlinesWriter{
w: os.Stdout,
}
io.WriteString(&r, "this\n\n\n\n\n\n\nhas\nmultiple\n\n\nnewline\n\n\n\ncharacters")
}
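The idea of tracking state across Write calls can also be sketched with a counter instead of a bool. The following is a minimal sketch, not the code from either answer: the names condenser, maxNewlines and condense are made up for illustration, and it assumes that "a normal paragraph break" means at most two consecutive '\n' bytes. It does not treat whitespace-only lines as empty and does not handle "\r\n".

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// maxNewlines is an assumed limit: two newlines form one paragraph break.
const maxNewlines = 2

// condenser is a hypothetical io.Writer wrapper that drops every newline
// beyond maxNewlines in a row, even when the run spans several Write calls.
type condenser struct {
	w   io.Writer
	run int // consecutive newlines seen at the end of the stream so far
}

func (c *condenser) Write(p []byte) (int, error) {
	out := make([]byte, 0, len(p))
	for _, b := range p {
		if b == '\n' {
			c.run++
			if c.run > maxNewlines {
				continue // surplus newline: drop it
			}
		} else {
			c.run = 0
		}
		out = append(out, b)
	}
	if _, err := c.w.Write(out); err != nil {
		return 0, err
	}
	// Report all of p as consumed, as the io.Writer contract expects.
	return len(p), nil
}

// condense is a demonstration helper: it feeds the chunks one Write at a time.
func condense(chunks ...string) string {
	var buf bytes.Buffer
	c := &condenser{w: &buf}
	for _, s := range chunks {
		io.WriteString(c, s)
	}
	return buf.String()
}

func main() {
	// prints "this\n\nhas\nmultiple\n\nnewlines\n"
	fmt.Printf("%q\n", condense("this\n\n\n", "\n\nhas\nmultiple\n\n\n\nnewlines\n"))
}
```

Only one chunk is held in memory at a time, so the output stays a stream. Passing a pointer to such a writer to the template's Execute method should work the same way as with the writers above.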

Related

How to recursively capture user input

I'm trying to capture the input of a bunch of numbers in Go. I am not allowed to do for loops. User input is multi-lined. However the function below is not returning the expected results of an []int, it instead returns with an empty array. Why is this? Or is there another way to capture multi-lined user input without for loops?
func input_to_list() []int {
fmt.Print("continuously enter text: ")
reader := bufio.NewReader(os.Stdin)
user_input, _ := reader.ReadString('\n')
print(user_input)
var result []int
if user_input == "\n" {
return result
}
return append(result, input_to_list()...)
}
How to recursively capture user input?
I am not allowed to do for loops.
For example,
package main
import (
"bufio"
"fmt"
"io"
"os"
"strconv"
"strings"
)
func readInt(rdr *bufio.Reader, n []int) []int {
line, err := rdr.ReadString('\n')
line = strings.TrimSpace(line)
if i, err := strconv.Atoi(line); err == nil {
n = append(n, i)
}
if err == io.EOF || strings.ToLower(line) == "end" {
return n
}
return readInt(rdr, n)
}
func ReadInts() []int {
fmt.Print("enter integers:\n")
var n []int
rdr := bufio.NewReader(os.Stdin)
return readInt(rdr, n)
}
func main() {
n := ReadInts()
fmt.Println(n)
}
Output:
enter integers:
42
7
end
[42 7]
Your function never assigns any value to result.
func input_to_list() []int {
/* ... */
var result []int // Create empty `result` slice
if user_input == "\n" {
return result // Return empty result slice
}
return append(result, input_to_list()...) // Combine two empty slices, and return the (still) empty slice
}
Let's step through:
You create an empty slice called result
If user_input is empty, you return the result immediately.
If user_input is not empty, you call input_to_list() recursively, and add its (empty) result to your empty result, then return that (still) empty result.
To get your desired behavior, you should be doing something (other than just checking for empty) with user_input. Probably something related to strconv.Atoi or similar, then adding that to result.

Tour of Go exercise #22: Reader, what does the question mean?

Exercise: Readers
Implement a Reader type that emits an infinite stream of the ASCII character 'A'.
I don't understand the question. How do I emit the character 'A'? Into which variable should I put that character?
Here's what I tried:
package main
import "golang.org/x/tour/reader"
type MyReader struct{}
// TODO: Add a Read([]byte) (int, error) method to MyReader.
func main() {
reader.Validate(MyReader{}) // what did this function expect?
}
func (m MyReader) Read(b []byte) (i int, e error) {
b = append(b, 'A') // this is wrong..
return 1, nil // this is also wrong..
}
Ah I understand XD
I think it would be better to say: "rewrite all values in []byte into 'A's"
package main
import "golang.org/x/tour/reader"
type MyReader struct{}
// TODO: Add a Read([]byte) (int, error) method to MyReader.
func (m MyReader) Read(b []byte) (i int, e error) {
for x := range b {
b[x] = 'A'
}
return len(b), nil
}
func main() {
reader.Validate(MyReader{})
}
The role of an io.Reader's Read method is to fill the memory it is given with data read from its source.
To implement a stream of 'A', the function must write 'A' values into the slice it receives.
It is not required to fill the entire slice; it can decide how many bytes to write (Read reads up to len(p) bytes into p), and it must return that number so the consumer knows how much data to process.
By convention an io.Reader signals its end by returning an io.EOF error. If the reader never returns an error, it behaves as an infinite source of data, and its consumer can never detect an exit condition.
Note that a call to Read may return 0 bytes read without that meaning anything in particular: callers should treat a return of 0 and nil as indicating that nothing happened. That is why this non-solution https://play.golang.org/p/aiUyc4UDYi2 fails with a timeout.
In that regard, the solution provided here https://stackoverflow.com/a/68077578/4466350, return copy(b, "A"), nil, is really just right. It writes the minimum required, with an elegant use of built-ins and syntax facilities, and it never returns an error.
The alleged answer didn't work for me, even without the typos.
Try as I did, that string would not go into b.
func (r MyReader) Read(b []byte) (int, error) {
return copy(b, "A"), nil
}
My solution: just add one byte at a time, store the index i using closure.
package main
import (
"golang.org/x/tour/reader"
)
type MyReader struct{}
func (mr MyReader) Read(b []byte) (int, error) {
i := 0
p := func () int {
b[i] = 'A'
i += 1
return i
}
return p(), nil
}
func main() {
reader.Validate(MyReader{})
}
Simplest one:
func (s MyReader) Read(b []byte) (int, error) {
b[0] = byte('A')
return 1, nil
}
You can generalize the idea to create an eternal reader, alwaysReader, from which you always read the same byte value over and over (it never results in EOF):
package readers
type alwaysReader struct {
value byte
}
func (r alwaysReader) Read(p []byte) (n int, err error) {
for i := range p {
p[i] = r.value
}
return len(p), nil
}
func NewAlwaysReader(value byte) alwaysReader {
return alwaysReader { value }
}
NewAlwaysReader() is the constructor for alwaysReader (which isn't exported). The result of NewAlwaysReader('A') is a reader from which you will always read 'A'.
A clarifying unit test for alwaysReader:
package readers_test
import (
"bytes"
"io"
"readers"
"testing"
)
func TestEmptyReader(t *testing.T) {
const numBytes = 128
const value = 'A'
buf := bytes.NewBuffer(make([]byte, 0, numBytes))
reader := io.LimitReader(readers.NewAlwaysReader(value), numBytes)
n, err := io.Copy(buf, reader)
if err != nil {
t.Fatalf("copy failed: %v", err)
}
if n != numBytes {
t.Errorf("%d bytes read but %d expected", n, numBytes)
}
for i, elem := range buf.Bytes() {
if elem != value {
t.Errorf("byte at position %d has not the value %v but %v", i, value, elem)
}
}
}
Since we can read from the alwaysReader forever, we need to decorate it with a io.LimitReader so that we end up reading at most numBytes from it. Otherwise, the bytes.Buffer will eventually run out of memory for reallocating its internal buffer because of io.Copy().
Note that the following implementation of Read() for alwaysReader is also valid:
func (r alwaysReader) Read(p []byte) (n int, err error) {
if len(p) > 0 {
p[0] = r.value
return 1, nil
}
return 0, nil
}
The former Read() implementation fills the whole byte slice with the byte value, whereas the latter writes a single byte.

How to read a file line by line and return how many bytes have been read?

The case is:
I want to read a log like "tail -f" on *NIX.
When I kill the program, I want to know how many bytes I have already read, so that I can use Seek.
When the program starts again, it should continue reading the log line by line from the offset saved in step 2.
I want to get the byte count when I use bufio.NewScanner as a line reader to read a line.
eg:
import ...
func main() {
f, err := os.Open("111.txt")
if err != nil {
log.Fatal(err)
}
f.Seek(0,os.SEEK_SET)
scan := bufio.NewScanner(f)
for scan.Scan() {
log.Println(scan.Text())
// what I want here is the number of bytes read so far, after each line
} // this loop reads the file line by line
}
thx!
==================================update==========================================
@twotwotwo this is close to what I want, but I want to change the io.Reader to an io.ReaderAt. Here is a demo I wrote that uses io.Reader:
import (
"os"
"log"
"io"
)
type Reader struct {
reader io.Reader
count int
}
func (r *Reader) Read(b []byte) (int, error) {
n, err := r.reader.Read(b)
r.count += n
return n, err
}
func (r *Reader) Count() int {
return r.count
}
func NewReader(r io.Reader) *Reader {
return &Reader{reader: r}
}
func ReadLine(r *Reader) (ln int,line []byte,err error) {
line = make([]byte,0,4096)
for {
b := make([]byte,1)
n,er := r.Read(b)
if er == io.EOF {
err = er
break
}
if n > 0{
c := b[0]
if c == '\n' {
break
}
line = append(line, c)
}
if er != nil{
err = er
break // stop on any other read error instead of looping forever
}
}
ln = r.Count()
return ln,line,err
}
func main() {
f, err := os.Open("111.txt")
if err != nil {
log.Fatal(err)
}
fi,_:=os.Stat("111.txt")
log.Printf("the file have %v bytes",fi.Size())
co := NewReader(f)
for {
count,line,er := ReadLine(co)
if er == io.EOF {
break
}
log.Printf("now read the line :%v",string(line))
log.Printf("in all we have read %v bytes",count)
}
}
This program can tell me how many bytes I have already read, but it cannot start reading from an arbitrary position, so I think using io.ReaderAt should make that possible.
thanks again!
You could consider another approach based on os.File.
See ActiveState/tail, which monitors the state of a file and uses os.File#Seek() to resume tailing a file from a certain point.
See tail.go.
Consider composition.
We know that bufio.NewScanner is interacting with its input through the io.Reader interface. So we may wrap an io.Reader with something else that counts how many bytes have been read so far.
package main
import (
"bufio"
"bytes"
"io"
"log"
)
type ReadCounter struct {
io.Reader
BytesRead int
}
func (r *ReadCounter) Read(p []byte) (int, error) {
n, err := r.Reader.Read(p)
r.BytesRead += n
return n, err
}
func main() {
b := &ReadCounter{Reader: bytes.NewBufferString("hello\nworld\ntesting\n")}
scan := bufio.NewScanner(b)
for scan.Scan() {
log.Println(scan.Text())
log.Println("Read", b.BytesRead, "bytes so far")
}
}
But note that bufio.Scanner is buffered: it reads its input in chunks, so BytesRead can run ahead of the lines returned so far. For your purposes, this might not be as useful as you want.
An alternative is to take the content of scan.Text() and add up the lengths, compensating in your own count for the newline bytes that the scanner strips.

Is there a ReadLine equivalent for a file in Go?

I want to ReadBytes until "\n" for a text file, not a bufio.
Is there a way to do this without converting to a bufio?
There are many ways to do it, but wrapping with bufio is what I would suggest. But if that doesn't work for you (why not?), you can go ahead and read single bytes like this:
Full working example:
package main
import (
"bytes"
"fmt"
"io"
)
// ReadLine reads a line delimited by \n from the io.Reader
// Unlike bufio, it does so rather inefficiently by reading one byte at a time
func ReadLine(r io.Reader) (line []byte, err error) {
b := make([]byte, 1)
var l int
for err == nil {
l, err = r.Read(b)
if l > 0 {
if b[0] == '\n' {
return
}
line = append(line, b...)
}
}
return
}
var data = `Hello, world!
I will write
three lines.`
func main() {
b := bytes.NewBufferString(data)
for {
line, err := ReadLine(b)
fmt.Println("Line: ", string(line))
if err != nil {
return
}
}
}
Output:
Line: Hello, world!
Line: I will write
Line: three lines.
Playground: http://play.golang.org/p/dfb0GHPpnm

How to flush Stdin after fmt.Scanf() in Go?

Here's an issue that's bedeviling me at the moment. When getting input from the user, I want to employ a loop to ask the user to retry until they enter valid input:
// user_input.go
package main
import (
"fmt"
)
func main() {
fmt.Println("Please enter an integer: ")
var userI int
for {
_, err := fmt.Scanf("%d", &userI)
if err == nil {
break
}
fmt.Println("Sorry, invalid input. Please enter an integer: ")
}
fmt.Println(userI)
}
Running the above, if the user enters valid input, no problem:
Please enter an integer:
3
3

exit code 0, process exited normally.
But try inputting a string instead?
Please enter an integer:

what?
Sorry, invalid input. Please enter an integer:

Sorry, invalid input. Please enter an integer:

Sorry...
Etc, and it keeps looping character by character until the string is exhausted.
Even inputting a single character loops twice, I assume as it parses the newline.
Anyways, there must be a way to flush Stdin in Go?
P.S. In the absence of such a feature, how would you work around it to provide equivalent functionality? I've failed even at that...
I would fix this by reading until the end of the line after each failure. This clears the rest of the text.
package main
import (
"bufio"
"fmt"
"os"
)
func main() {
stdin := bufio.NewReader(os.Stdin)
fmt.Println("Please enter an integer: ")
var userI int
for {
_, err := fmt.Fscan(stdin, &userI)
if err == nil {
break
}
stdin.ReadString('\n')
fmt.Println("Sorry, invalid input. Please enter an integer: ")
}
fmt.Println(userI)
}
Is it bad to wake up an old question?
I prefer to use fmt.Scanln because A) it doesn't require importing another library (e.g. reader) and B) it doesn't involve an explicit for loop.
func someFunc() {
fmt.Printf("Please enter an integer: ")
// Read in an integer
var i int
_, err := fmt.Scanln(&i)
if err != nil {
fmt.Printf("Error: %s", err.Error())
// If int read fails, read as string and forget
var discard string
fmt.Scanln(&discard)
return
}
fmt.Printf("Input contained %d", i)
}
However, it seems like there ought to be a more elegant solution. Particularly in the case of fmt.Scanln it seems odd that the read stops after the first non-number byte rather than "scanning the line".
I ran into a similar problem for getting user input but solved it in a slightly different way. Adding to the thread in case someone else finds this useful:
package main
import (
"bufio"
"fmt"
"os"
"strings"
)
// Get first word from stdin
func getFirstWord() (string) {
input := bufio.NewScanner(os.Stdin)
input.Scan()
ans := strings.Fields(input.Text())
if len(ans) == 0 {
return ""
} else {
return ans[0]
}
}
func main() {
fmt.Printf("Would you like to play a game?\n> ")
ans := getFirstWord()
fmt.Printf("Your answer: %s\n", ans)
}
I know this has already been answered but this was my implementation:
func flush(reader *bufio.Reader) {
// Buffered() shrinks as bytes are read, so loop until the buffer is empty
for reader.Buffered() > 0 {
reader.ReadByte()
}
}
This discards whatever is already buffered, including in situations where "stdin.ReadString('\n')" cannot be used.
Sorry for digging this back up, but I ran into this today and wanted to improve on the existing answers by using new standard library functionality.
import (
"bufio"
"fmt"
"os"
)
func discardBuffer(r *bufio.Reader) {
r.Discard(r.Buffered())
}
stdin := bufio.NewReader(os.Stdin)
var i int
for {
if _, err := fmt.Fscanln(stdin, &i); err != nil {
discardBuffer(stdin)
// Handle error, display message, etc.
continue
}
// Do your other value checks and validations
break
}
The basic idea is to always buffer your reads from stdin. When you encounter an error while scanning, just discard the buffer contents. That way you start with an empty buffer for your next scan.
Alternatively, you can discard the buffer before you scan, so any stray inputs by the user before then won't get picked up.
func fscanln(r *bufio.Reader, a ...interface{}) error {
r.Discard(r.Buffered())
_, err := fmt.Fscanln(r, a...)
return err
}
stdin := bufio.NewReader(os.Stdin)
var i int
if err := fscanln(stdin, &i); err != nil {
// Handle error
}
I use this snippet to skip unnecessary leading whitespace and empty lines:
in := bufio.NewReader(os.Stdin)
result, err := in.ReadString('\n')
// stop on a read error (such as io.EOF) so the loop cannot spin forever
for err == nil && len(strings.TrimSpace(result)) == 0 {
result, err = in.ReadString('\n')
}
I usually use bufio.Scanner since the fmt.Scan funcs always split on whitespace.
func promptYN(msg string) bool {
s := bufio.NewScanner(os.Stdin)
for {
fmt.Printf("%s [y/n]: ", msg)
s.Scan()
input := strings.ToLower(s.Text())
if input == "y" || input == "n" {
return input == "y"
}
fmt.Println("Error: expected Y or N.")
}
}
func promptInt(msg string) int {
s := bufio.NewScanner(os.Stdin)
for {
fmt.Printf("%s [int]: ", msg)
s.Scan()
output, err := strconv.Atoi(s.Text())
if err == nil {
return output
}
fmt.Println("Error: expected an integer.")
}
}
Or you could make something more universal:
func prompt(msg string, check func(string) bool) {
s := bufio.NewScanner(os.Stdin)
for {
fmt.Printf("%s: ", msg)
s.Scan()
if check(s.Text()) {
return
}
}
}
Example:
var f float64
prompt("Enter a float", func(s string) bool {
var err error
f, err = strconv.ParseFloat(s, 64)
return err == nil
})
