After calling the Peek method, the original data has changed - go

package main
import (
"bufio"
"io"
"golang.org/x/net/html/charset"
"golang.org/x/text/encoding"
"net/http"
"fmt"
"golang.org/x/text/transform"
"io/ioutil"
)
// main
func main() {
resp, err := http.Get("http://www.baidu.com")
if err != nil {
panic(err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
fmt.Println("Error: status code", resp.StatusCode)
return
}
e := determineEncoding(resp.Body)
utf8Reader := transform.NewReader(resp.Body, e.NewDecoder())
all, err := ioutil.ReadAll(utf8Reader)
if err != nil {
panic(err)
}
fmt.Printf("%s\n", all)
}
// determine
func determineEncoding(r io.Reader) encoding.Encoding {
reader := bufio.NewReader(r)
// The start position was not correct
bytes, err := reader.Peek(1024)
if err != nil {
panic(err)
}
e, _, _ := charset.DetermineEncoding(bytes, "")
return e
}
The result is not the correct data; the output does not start at position zero.
As the documentation describes: 'Peek returns the next n bytes without advancing the reader. The bytes stop being valid at the next read call. If Peek returns fewer than n bytes, it also returns an error explaining why the read is short. The error is ErrBufferFull if n is larger than b's buffer size.'

Peek returns the next n bytes without advancing the reader.
This refers to the *bufio.Reader, not the underlying reader. The buffered reader will read from the underlying reader if necessary. How else would it return the bytes?
In your case, you have to stop using the response body directly after calling determineEncoding and use the *bufio.Reader instead.
For instance:
func determineEncoding(r *bufio.Reader) encoding.Encoding {
bytes, err := r.Peek(1024)
// as before
}
func main() {
// as before
defer resp.Body.Close()
r := bufio.NewReader(resp.Body)
e := determineEncoding(r)
utf8Reader := transform.NewReader(r, e.NewDecoder())
// as before
}

Related

Read exactly n bytes unless EOF?

I'm using a function that returns an io.Reader to download a file from the Internet.
I want to process the file in chunks of exactly 2048 bytes until EOF makes that impossible.
The io.ReadFull function is almost what I want:
buf := make([]byte, 2048)
for {
if _, err := io.ReadFull(reader, buf); err == io.EOF {
return io.ErrUnexpectedEOF
} else if err != nil {
return err
}
// Do processing on buf
}
The problem with this is that not all files are a multiple of 2048 bytes, so the last chunk may be only, say, 500 bytes; io.ReadFull will therefore return ErrUnexpectedEOF and the last chunk is discarded.
A name that summarizes what I want could be io.ReadFullUnlessLastChunk: ErrUnexpectedEOF is not returned if the reason buf cannot be filled with 2048 bytes is that the file hits EOF after, say, 500 bytes. In any other case, however, ErrUnexpectedEOF should be returned, since a problem has occurred.
What could I do to accomplish this?
Another problem is that reading only 2048 bytes at a time directly from the network seems to carry a lot of overhead. If I could read 256 KB from the network into a buffer and then take the 2048 bytes I need from that buffer instead, that would be better.
For example,
package main
import (
"bufio"
"fmt"
"io"
"os"
)
func readChunks(r io.Reader) error {
if _, ok := r.(*bufio.Reader); !ok {
r = bufio.NewReader(r)
}
buf := make([]byte, 0, 2048)
for {
n, err := io.ReadFull(r, buf[:cap(buf)])
buf = buf[:n]
if err != nil {
if err == io.EOF {
break
}
if err != io.ErrUnexpectedEOF {
return err
}
}
// Process buf
fmt.Println(len(buf))
}
return nil
}
func main() {
fName := `test.file`
f, err := os.Open(fName)
if err != nil {
fmt.Println(err)
return
}
defer f.Close()
err = readChunks(f)
if err != nil {
fmt.Println(err)
return
}
}

Go bufio ReadString in loop is infinite

I have the following code:
resp, err := http.Get("https://www.google.com")
if err != nil{
panic(err)
}
r := bufio.NewReader(resp.Body)
for v, e := r.ReadString('\n'); e == nil; {
fmt.Println(v)
}
So I want to read the response body in a loop, but the reader r reads the first line of the body infinitely.
Meanwhile, this code works fine:
v, e := r.ReadString('\n')
for e == nil {
fmt.Println(v)
v, e = r.ReadString('\n')
}
Can someone explain why the first solution has such behaviour?
Package bufio
import "bufio"
func (*Reader) ReadString
func (b *Reader) ReadString(delim byte) (string, error)
ReadString reads until the first occurrence of delim in the input,
returning a string containing the data up to and including the
delimiter. If ReadString encounters an error before finding a
delimiter, it returns the data read before the error and the error
itself (often io.EOF). ReadString returns err != nil if and only if
the returned data does not end in delim. For simple uses, a Scanner
may be more convenient.
This is an XY problem: The XY problem is asking about your attempted solution rather than your actual problem.
Why didn't you take the advice, "For simple uses, a Scanner may be more convenient", given in the bufio.ReadString documentation?
Proper use of bufio.ReadString is complicated, even when you know how to use for loops. See function reader.
Proper use of bufio.Scanner is simple, even if you don't know how to use for loops. See function scanner.
For example,
package main
import (
"bufio"
"fmt"
"io"
"net/http"
"os"
"strings"
)
func reader(url string) error {
resp, err := http.Get(url)
if err != nil {
return err
}
defer resp.Body.Close()
// ReadString
r := bufio.NewReader(resp.Body)
for {
line, err := r.ReadString('\n')
if len(line) == 0 && err != nil {
if err == io.EOF {
break
}
return err
}
line = strings.TrimSuffix(line, "\n")
fmt.Println(line)
if err != nil {
if err == io.EOF {
break
}
return err
}
}
return nil
}
func scanner(url string) error {
resp, err := http.Get(url)
if err != nil {
return err
}
defer resp.Body.Close()
// Scanner
s := bufio.NewScanner(resp.Body)
for s.Scan() {
line := s.Text()
fmt.Println(line)
}
if err := s.Err(); err != nil {
return err
}
return nil
}
func main() {
url := "https://www.example.com"
fmt.Println("\nReader:\n")
err := reader(url)
if err != nil {
fmt.Fprintln(os.Stderr, err)
}
fmt.Println("\nScanner:\n")
err = scanner(url)
if err != nil {
fmt.Fprintln(os.Stderr, err)
}
fmt.Println("\n")
}
Playground: https://play.golang.org/p/e0WY_aNxW8
The structure of the for loop is:
for init; condition; post { }
The init part of the loop is called only once, at the beginning. That means that the...
v, e := r.ReadString('\n')
...part from your loop is called only once, which explains why your loop implementation reads only the first line from r and why e is always nil, resulting in an infinite loop.
You may want to do something like this instead:
for v, e := "", (error)(nil); e == nil; {
v, e = r.ReadString('\n')
fmt.Println(v)
}
Or if that looks weird to you, something like this:
var v string
var e error
for ; e == nil; {
v, e = r.ReadString('\n')
fmt.Println(v)
}
More info here:
https://golang.org/doc/effective_go.html#for
https://golang.org/ref/spec#For_statements

How to measure both before and after byte size and time to compress

I want to gzip a string (it is actually a JSON response)
var b bytes.Buffer
gz := gzip.NewWriter(&b)
if _, err := gz.Write([]byte("YourDataHere")); err != nil {
panic(err)
}
How can I easily output the size in bytes before and after compression, and more importantly, how can I time how long it takes to compress and then decompress back to a string?
You can calculate the size as per Nipun Talukdar's comment.
len([]byte("YourDataHere"))
b.Len()
And use time.Now() and time.Since() to get the time taken.
var b bytes.Buffer
input := []byte("YourDataHere")
fmt.Println("Input size : ", len(input))
gz := gzip.NewWriter(&b)
start := time.Now()
gz.Write(input)
if err := gz.Close(); err != nil { // Close flushes buffered data and writes the gzip footer, so b now holds the complete stream
panic(err)
}
totalTime := time.Since(start)
fmt.Println("Compressed size : ", b.Len(), "\nTime taken : ", totalTime)
The same method can be applied to unzipping.
You can also create a support function that can do the timing.
func timer(startTime time.Time) {
totalTime := time.Since(startTime)
log.Println("Time taken : ",totalTime)
}
Usage : defer timer(time.Now()) at the start of the function.
There are examples of how to do this in Go here
https://golang.org/test/bench/go1/gzip_test.go
Thankfully it's BSD licensed...
// Copyright 2011 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// This benchmark tests gzip and gunzip performance.
package go1
import (
"bytes"
gz "compress/gzip"
"io"
"io/ioutil"
"testing"
)
var (
jsongunz = bytes.Repeat(jsonbytes, 10)
jsongz []byte
)
func init() {
var buf bytes.Buffer
c := gz.NewWriter(&buf)
c.Write(jsongunz)
c.Close()
jsongz = buf.Bytes()
}
func gzip() {
c := gz.NewWriter(ioutil.Discard)
if _, err := c.Write(jsongunz); err != nil {
panic(err)
}
if err := c.Close(); err != nil {
panic(err)
}
}
func gunzip() {
r, err := gz.NewReader(bytes.NewBuffer(jsongz))
if err != nil {
panic(err)
}
if _, err := io.Copy(ioutil.Discard, r); err != nil {
panic(err)
}
r.Close()
}
func BenchmarkGzip(b *testing.B) {
b.SetBytes(int64(len(jsongunz)))
for i := 0; i < b.N; i++ {
gzip()
}
}
func BenchmarkGunzip(b *testing.B) {
b.SetBytes(int64(len(jsongunz)))
for i := 0; i < b.N; i++ {
gunzip()
}
}
It's unfortunate that gzip.Writer can't report the number of compressed bytes written to the underlying stream. It gets more complicated when that underlying stream is not in-memory.
To solve this, I wrote a "counting io.Writer" that I place in between gzip.Writer and the underlying stream, so I can count and extract the number of compressed bytes written.
Try out the following code in the Go Playground.
package main
import (
"compress/gzip"
"fmt"
"io"
"os"
)
// countingWriter is an io.Writer that counts the total bytes written to it.
type countingWriter struct {
w io.Writer
Count int
}
var _ io.Writer = &countingWriter{}
func newCountingWriter(w io.Writer) *countingWriter {
return &countingWriter{w: w}
}
func (cw *countingWriter) Write(p []byte) (int, error) {
n, err := cw.w.Write(p)
cw.Count += n
return n, err
}
func ExampleUse(w io.Writer) (int, error) {
cw := newCountingWriter(w)
zw, err := gzip.NewWriterLevel(cw, gzip.BestCompression)
if err != nil {
return 0, err
}
if _, err := zw.Write([]byte("hello world")); err != nil {
return cw.Count, err
}
err = zw.Close()
return cw.Count, err
}
func main() {
n, err := ExampleUse(os.Stderr)
if err != nil {
panic(err)
}
fmt.Printf("wrote %d bytes\n", n)
}

Go file downloader

I have the following code, which is supposed to download a file by splitting it into multiple parts. But right now it only works on images; when I try downloading other files, like tar files, the output is an invalid file.
UPDATED:
Used (*os.File).WriteAt instead of Write and removed the os.O_APPEND file mode.
package main
import (
"errors"
"flag"
"fmt"
"io/ioutil"
"log"
"net/http"
"os"
"strconv"
)
var file_url string
var workers int
var filename string
func init() {
flag.StringVar(&file_url, "url", "", "URL of the file to download")
flag.StringVar(&filename, "filename", "", "Name of downloaded file")
flag.IntVar(&workers, "workers", 2, "Number of download workers")
}
func get_headers(url string) (map[string]string, error) {
headers := make(map[string]string)
resp, err := http.Head(url)
if err != nil {
return headers, err
}
if resp.StatusCode != 200 {
return headers, errors.New(resp.Status)
}
for key, val := range resp.Header {
headers[key] = val[0]
}
return headers, err
}
func download_chunk(url string, out string, start int, stop int) {
client := new(http.Client)
req, _ := http.NewRequest("GET", url, nil)
req.Header.Add("Range", fmt.Sprintf("bytes=%d-%d", start, stop))
resp, _ := client.Do(req)
defer resp.Body.Close()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
log.Fatalln(err)
return
}
file, err := os.OpenFile(out, os.O_WRONLY, 0600)
if err != nil {
if file, err = os.Create(out); err != nil {
log.Fatalln(err)
return
}
}
defer file.Close()
if _, err := file.WriteAt(body, int64(start)); err != nil {
log.Fatalln(err)
return
}
fmt.Println(fmt.Sprintf("Range %d-%d: %d", start, stop, resp.ContentLength))
}
func main() {
flag.Parse()
headers, err := get_headers(file_url)
if err != nil {
fmt.Println(err)
} else {
length, _ := strconv.Atoi(headers["Content-Length"])
bytes_chunk := length / workers
fmt.Println("file length: ", length)
for i := 0; i < workers; i++ {
start := i * bytes_chunk
stop := start + (bytes_chunk - 1)
go download_chunk(file_url, filename, start, stop)
}
var input string
fmt.Scanln(&input)
}
}
Basically, it just reads the length of the file, divides it by the number of workers, and then each worker downloads its part using HTTP's Range header; after downloading, it writes that chunk at its position in the file.
If you ignore as many errors as shown above, then your code cannot be expected to work reliably for any file type.
However, I can see one problem in your code. I think that mixing O_APPEND and Seek is probably a mistake (Seek is effectively ignored in this mode). I suggest using (*os.File).WriteAt instead.
IIRC, O_APPEND forces every write to happen at the [current] end of the file. However, your download_chunk function instances for the file parts can execute in an unpredictable order, thus "reordering" the file parts. The result is then a corrupted file.
1. The order in which the goroutines run is not guaranteed. For example, the output may look like this:
...
file length: 20902
Range 10451-20901: 10451
Range 0-10450: 10451
...
so the chunks cannot simply be appended.
2. When writing the chunk data, you need synchronization, e.g. a sync.Mutex.

Read text file into string array (and write)

The ability to read (and write) a text file into and out of a string array is, I believe, a fairly common requirement. It is also quite useful when starting out with a language, removing the initial need to access a database. Does one exist in Go?
e.g.
func ReadLines(sFileName string, iMinLines int) ([]string, bool) {
and
func WriteLines(saBuff []string, sFilename string) bool {
I would prefer to use an existing one rather than duplicate it.
As of Go1.1 release, there is a bufio.Scanner API that can easily read lines from a file. Consider the following example from above, rewritten with Scanner:
package main
import (
"bufio"
"fmt"
"log"
"os"
)
// readLines reads a whole file into memory
// and returns a slice of its lines.
func readLines(path string) ([]string, error) {
file, err := os.Open(path)
if err != nil {
return nil, err
}
defer file.Close()
var lines []string
scanner := bufio.NewScanner(file)
for scanner.Scan() {
lines = append(lines, scanner.Text())
}
return lines, scanner.Err()
}
// writeLines writes the lines to the given file.
func writeLines(lines []string, path string) error {
file, err := os.Create(path)
if err != nil {
return err
}
defer file.Close()
w := bufio.NewWriter(file)
for _, line := range lines {
fmt.Fprintln(w, line)
}
return w.Flush()
}
func main() {
lines, err := readLines("foo.in.txt")
if err != nil {
log.Fatalf("readLines: %s", err)
}
for i, line := range lines {
fmt.Println(i, line)
}
if err := writeLines(lines, "foo.out.txt"); err != nil {
log.Fatalf("writeLines: %s", err)
}
}
Note: ioutil is deprecated as of Go 1.16.
If the file isn't too large, this can be done with the ioutil.ReadFile and strings.Split functions like so:
content, err := ioutil.ReadFile(filename)
if err != nil {
//Do something
}
lines := strings.Split(string(content), "\n")
You can read the documentation on ioutil and strings packages.
I cannot update my first answer.
Anyway, after the Go 1 release there were some breaking changes, so I updated it as shown below:
package main
import (
"os"
"bufio"
"bytes"
"io"
"fmt"
"strings"
)
// Read a whole file into the memory and store it as array of lines
func readLines(path string) (lines []string, err error) {
var (
file *os.File
part []byte
prefix bool
)
if file, err = os.Open(path); err != nil {
return
}
defer file.Close()
reader := bufio.NewReader(file)
buffer := bytes.NewBuffer(make([]byte, 0))
for {
if part, prefix, err = reader.ReadLine(); err != nil {
break
}
buffer.Write(part)
if !prefix {
lines = append(lines, buffer.String())
buffer.Reset()
}
}
if err == io.EOF {
err = nil
}
return
}
func writeLines(lines []string, path string) (err error) {
var (
file *os.File
)
if file, err = os.Create(path); err != nil {
return
}
defer file.Close()
//writer := bufio.NewWriter(file)
for _,item := range lines {
//fmt.Println(item)
_, err := file.WriteString(strings.TrimSpace(item) + "\n");
//file.Write([]byte(item));
if err != nil {
//fmt.Println("debug")
fmt.Println(err)
break
}
}
/*content := strings.Join(lines, "\n")
_, err = writer.WriteString(content)*/
return
}
func main() {
lines, err := readLines("foo.txt")
if err != nil {
fmt.Println("Error: %s\n", err)
return
}
for _, line := range lines {
fmt.Println(line)
}
//array := []string{"7.0", "8.5", "9.1"}
err = writeLines(lines, "foo2.txt")
fmt.Println(err)
}
You can use os.File (which implements the io.Reader interface) with the bufio package for that. However, those packages are built with fixed memory usage in mind (no matter how large the file is) and are quite fast.
Unfortunately, this makes reading the whole file into memory a bit more complicated. You can use a bytes.Buffer to join the parts of a line if they exceed the line limit. Anyway, I recommend you try to use the line reader directly in your project (especially if you do not know how large the text file is!). But if the file is small, the following example might be sufficient for you:
package main
import (
"os"
"bufio"
"bytes"
"fmt"
)
// Read a whole file into the memory and store it as array of lines
func readLines(path string) (lines []string, err os.Error) {
var (
file *os.File
part []byte
prefix bool
)
if file, err = os.Open(path); err != nil {
return
}
reader := bufio.NewReader(file)
buffer := bytes.NewBuffer(make([]byte, 1024))
for {
if part, prefix, err = reader.ReadLine(); err != nil {
break
}
buffer.Write(part)
if !prefix {
lines = append(lines, buffer.String())
buffer.Reset()
}
}
if err == os.EOF {
err = nil
}
return
}
func main() {
lines, err := readLines("foo.txt")
if err != nil {
fmt.Println("Error: %s\n", err)
return
}
for _, line := range lines {
fmt.Println(line)
}
}
Another alternative might be to use ioutil.ReadAll to read in the complete file at once and do the splitting into lines afterwards. I don't give you an explicit example of how to write the lines back to the file, but that's basically an os.Create() followed by a loop similar to the one in the example (see main()).
func readToDisplayUsingFile1(f *os.File) {
defer f.Close()
reader := bufio.NewReader(f)
contents, _ := ioutil.ReadAll(reader)
lines := strings.Split(string(contents), "\n") // Split takes a string separator, not a rune
fmt.Println(lines)
}
or
func readToDisplayUsingFile1(f *os.File) {
defer f.Close()
slice := make([]string, 0)
reader := bufio.NewReader(f)
for {
str, err := reader.ReadString('\n')
if len(str) > 0 {
slice = append(slice, str) // keep a final line that has no trailing '\n'
}
if err != nil { // io.EOF or any other error ends the loop
break
}
}
fmt.Println(slice)
}
