I am building a Go application that takes an http.Response object and saves it (response headers and body) to a redis hash. When the application receives an http.Response.Body that is not gzipped, I want to gzip it before saving it to the cache.
My confusion stems from my inability to make clear sense of Go's io interfaces, and how to negotiate between http.Response.Body's io.ReadCloser and the gzip Writer. I imagine there is an elegant, streaming solution here, but I can't quite get it to work.
If you've already determined that the body is uncompressed, and you need a []byte of the compressed data (as opposed to already having an io.Writer you could write to; for example, if you wanted to save the body to a file, you'd stream into the file rather than into a buffer), then something like this should work:
func getCompressedBody(r *http.Response) ([]byte, error) {
    var buf bytes.Buffer
    gz := gzip.NewWriter(&buf)
    if _, err := io.Copy(gz, r.Body); err != nil {
        return nil, err
    }
    err := gz.Close()
    return buf.Bytes(), err
}
(This is just an example and would probably be inlined instead of written as a function; if you did want it as a function, it should probably take an io.Reader instead of an *http.Response.)
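For contrast, a rough sketch of the streaming case mentioned above, where you already have an io.Writer (such as a file) and don't need the compressed bytes in memory; the compressTo name is made up for illustration:

// compressTo streams body through gzip straight into w, so the whole
// compressed result never has to sit in a buffer.
func compressTo(w io.Writer, body io.Reader) error {
    gz := gzip.NewWriter(w)
    if _, err := io.Copy(gz, body); err != nil {
        gz.Close()
        return err
    }
    // Close flushes any remaining data and writes the gzip footer.
    return gz.Close()
}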
Related
I have a file. It contains some IP ranges:
1.1.1.0/24
1.1.2.0/24
2.2.1.0/24
2.2.2.0/24
I read this file into a slice, and used *(*string)(unsafe.Pointer(&b)) to convert []byte to string, but it doesn't work:
func TestInitIpRangeFromFile(t *testing.T) {
    filepath := "/tmp/test"
    file, err := os.Open(filepath)
    if err != nil {
        t.Errorf("failed to open ip range file:%s, err:%s", filepath, err)
    }
    reader := bufio.NewReader(file)
    ranges := make([]string, 0)
    for {
        ip, _, err := reader.ReadLine()
        if err != nil {
            if err == io.EOF {
                break
            }
            logger.Fatalf("failed to read ip range file, err:%s", err)
        }
        t.Logf("ip:%s", *(*string)(unsafe.Pointer(&ip)))
        ranges = append(ranges, *(*string)(unsafe.Pointer(&ip)))
    }
    t.Logf("%v", ranges)
}
result:
task_test.go:71: ip:1.1.1.0/24
task_test.go:71: ip:1.1.2.0/24
task_test.go:71: ip:2.2.1.0/24
task_test.go:71: ip:2.2.2.0/24
task_test.go:75: [2.2.2.0/24 1.1.2.0/24 2.2.1.0/24 2.2.2.0/24]
Why did 1.1.1.0/24 change to 2.2.2.0/24?
If I change
*(*string)(unsafe.Pointer(&ip))
to string(ip), it works.
So, while reinterpreting a slice-header as a string-header the way you did is absolutely bonkers and has no guarantee whatsoever of working correctly, it's only indirectly the cause of your problem.
The real problem is that you're retaining a pointer to the return value of bufio/Reader.ReadLine(), but the docs for that method say "The returned buffer is only valid until the next call to ReadLine." Which means that the reader is free to reuse that memory later on, and that's what's happening.
When you do the cast in the proper way, string(ip), Go copies the contents of the buffer into the newly-created string, which remains valid in the future. But when you type-pun the slice into a string, you keep the exact same pointer, which stops working as soon as the reader refills its buffer.
If you decided to do the pointer trickery as a performance hack to avoid copying and allocation... too bad. The reader interface is going to force you to copy the data out anyway, and since it does, you should just use string().
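For reference, the fixed loop from the question simply copies each line out of the reader's buffer before keeping it; error handling is simplified to t.Fatalf here:

for {
    ip, _, err := reader.ReadLine()
    if err != nil {
        if err == io.EOF {
            break
        }
        t.Fatalf("failed to read ip range file, err:%s", err)
    }
    // string(ip) copies the bytes out of ReadLine's internal buffer,
    // so the value stays valid after the next ReadLine call.
    ranges = append(ranges, string(ip))
}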
This is how I am using the gzip writer.
var b bytes.Buffer
gz := gzip.NewWriter(&b)
if _, err := gz.Write([]byte(data)); err != nil {
    panic(err)
}
/*
    if err := gz.Flush(); err != nil {
        panic(err)
    }
*/
if err := gz.Close(); err != nil {
    panic(err)
}
playground link https://play.golang.org/p/oafHItGOlDN
Clearly, Flush + Close and just Close are giving different results.
Docs for the compress/gzip package says:
func (z *Writer) Close() error
Close closes the Writer by flushing any unwritten data to the underlying io.Writer and writing the GZIP footer. It does not close the underlying io.Writer.
What flushing is this doc talking about? Why do you need the Flush function at all if Close is enough? Why doesn't Close just call Flush?
Closing does cause a flush. When you call Flush and then Close, the stream is flushed twice, which causes an additional chunk to be output; that chunk uses 5 bytes to encode 0 bytes of data. Both streams encode the same data, but one of them is wasteful.
As for why you would use Flush, the explanation is right there in the documentation for Flush. Sometimes you're not done writing, but you need to ensure that all of the data that you've written up to this point is readable by the client, before additional data is available. At those points, you flush the stream. You only close when there will be no more data.
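For example, Flush matters when you produce compressed output incrementally and a reader on the other end needs to see each piece as soon as it is written. This is a sketch; streamCompressed and chunks are illustrative names, not from the question:

// streamCompressed writes each chunk to the gzip stream and flushes
// after each one, so the reader can make progress before the stream
// is finished.
func streamCompressed(w io.Writer, chunks [][]byte) error {
    gz := gzip.NewWriter(w)
    for _, chunk := range chunks {
        if _, err := gz.Write(chunk); err != nil {
            return err
        }
        // Flush makes everything written so far available to the
        // reader without ending the stream.
        if err := gz.Flush(); err != nil {
            return err
        }
    }
    // Close only when there is no more data; it writes the gzip footer.
    return gz.Close()
}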
I have a file struct that holds a body, which is just a *bytes.Reader. I have two methods on the struct, Zip() error and UnZip() error. When I call Zip it should zip the file, storing the zipped data in the body, and I should be able to call UnZip on the same file and store the unzipped data in the body.
The minimal example I have is below in the playground. https://play.golang.org/p/WmZtqtvnyN
I'm able to zip the file just fine, and it looks like it's doing what it's supposed to do; however, when I try to unzip the file I get unexpected EOF.
I've been going at this for hours now. Any help is greatly appreciated.
I believe you should close the gzip writer before getting the bytes from the underlying buffer.
func (f *File) Zip() error {
    buff := bytes.NewBuffer(nil)
    writer := gzip.NewWriter(buff)
    defer writer.Close()
    _, err := f.Body.WriteTo(writer)
    if err != nil {
        return err
    }
    writer.Close() // I have added this line
    f.Body = bytes.NewReader(buff.Bytes())
    f.Name = fmt.Sprintf("%s.gz", f.Name)
    return nil
}
As per the documentation for gzip.NewReader: "If r does not also implement io.ByteReader, the decompressor may read more data than necessary from r."
For bytes.Reader: "A Reader implements the io.Reader, io.ReaderAt, io.WriterTo, io.Seeker, io.ByteScanner, and io.RuneScanner interfaces by reading from a byte slice."
Note, however, that io.ByteScanner embeds io.ByteReader, so bytes.Reader does satisfy it; the unexpected EOF here comes from the gzip writer not being closed, not from the reader.
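For completeness, the matching UnZip could look roughly like this, assuming the same File struct with Body *bytes.Reader and Name string fields (io.ReadAll needs Go 1.16+; use ioutil.ReadAll on older versions):

func (f *File) UnZip() error {
    reader, err := gzip.NewReader(f.Body)
    if err != nil {
        return err
    }
    defer reader.Close()
    // Read the whole decompressed stream back into memory.
    data, err := io.ReadAll(reader)
    if err != nil {
        return err
    }
    f.Body = bytes.NewReader(data)
    f.Name = strings.TrimSuffix(f.Name, ".gz")
    return nil
}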
I'm currently saving a struct to a file, using gob, so it can be loaded and used later, as follows:
func (t *Object) Load(filename string) error {
    fi, err := os.Open(filename)
    if err != nil {
        return err
    }
    defer fi.Close()
    fz, err := gzip.NewReader(fi)
    if err != nil {
        return err
    }
    defer fz.Close()
    decoder := gob.NewDecoder(fz)
    err = decoder.Decode(&t)
    if err != nil {
        return err
    }
    return nil
}

func (t *Object) Save(filename string) error {
    fi, err := os.Create(filename)
    if err != nil {
        return err
    }
    defer fi.Close()
    fz := gzip.NewWriter(fi)
    defer fz.Close()
    encoder := gob.NewEncoder(fz)
    err = encoder.Encode(t)
    if err != nil {
        return err
    }
    return nil
}
My concern is that Go might be updated in a way that changes how gobs of data are encoded and decoded. If this happens, then the version of my app compiled with the new version of Go would not be able to load files saved by the previous version. This would be a major issue, but I'm not sure if it's a realistic concern or not.
So does anyone know if I can consider it safe to save and load gob encoding data like this and expect it to still work when Go is updated?
If not, what would be the best alternative? Would my function still work if I changed gob.NewDecoder and gob.NewEncoder to xml.NewDecoder and xml.NewEncoder? (Does the XML encoder encode and decode structs in the same way as gob, i.e. without me having to tell it what they look like?)
The documentation for the type GobEncoder does mention:
Note: Since gobs can be stored permanently, it is good design to guarantee the encoding used by a GobEncoder is stable as the software evolves.
For instance, it might make sense for GobEncode to include a version number in the encoding.
But that applies to custom encoders.
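If you do write a custom GobEncoder, the version-number suggestion from the docs could look roughly like this; the Point type and the text format are illustrative, not from the question:

type Point struct{ X, Y int }

const pointVersion byte = '1'

func (p Point) GobEncode() ([]byte, error) {
    // Prefix the payload with a version byte so a future release can
    // recognise (and migrate) data written by an older one.
    return []byte(fmt.Sprintf("%c%d %d", pointVersion, p.X, p.Y)), nil
}

func (p *Point) GobDecode(data []byte) error {
    if len(data) == 0 || data[0] != pointVersion {
        return fmt.Errorf("Point: unsupported encoding version")
    }
    _, err := fmt.Sscanf(string(data[1:]), "%d %d", &p.X, &p.Y)
    return err
}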
For the encoding provided with Go itself, compatibility is guaranteed at the source level by the Go 1 compatibility promise: "Backwards-incompatible changes will not be made to any Go 1 point release."
That should mean gob will continue to work as it does now.
A different and robust solution exists with projects like "ugorji/go/codec":
High Performance and Feature-Rich Idiomatic Go Library providing encode/decode support for different serialization formats.
Supported Serialization formats are:
msgpack: https://github.com/msgpack/msgpack
binc: http://github.com/ugorji/binc
But unless you need those specific formats, gob should be enough.
I have an io.ReadCloser object (from an http.Response object).
What's the most efficient way to convert the entire stream to a string object?
EDIT:
Since Go 1.10, strings.Builder exists. Example:
buf := new(strings.Builder)
n, err := io.Copy(buf, r)
// check errors
fmt.Println(buf.String())
OUTDATED INFORMATION BELOW
The short answer is that it will not be efficient, because converting to a string requires doing a complete copy of the byte array. Here is the proper (non-efficient) way to do what you want:
buf := new(bytes.Buffer)
buf.ReadFrom(yourReader)
s := buf.String() // Does a complete copy of the bytes in the buffer.
This copy is done as a protection mechanism. Strings are immutable; if converting a []byte to a string didn't copy, you could change the contents of the string. However, Go allows you to bypass the type safety mechanisms using the unsafe package. Use the unsafe package at your own risk. Hopefully the name alone is a good enough warning. Here is how I would do it using unsafe:
buf := new(bytes.Buffer)
buf.ReadFrom(yourReader)
b := buf.Bytes()
s := *(*string)(unsafe.Pointer(&b))
There we go, you have now efficiently converted your byte array to a string. Really, all this does is trick the type system into calling it a string. There are a couple of caveats to this method:
There are no guarantees this will work in all Go compilers. While this works with the plan-9 gc compiler, it relies on "implementation details" not mentioned in the official spec. You cannot even guarantee that this will work on all architectures or that it will not be changed in gc. In other words, this is a bad idea.
That string is mutable! If you make any calls on that buffer it will change the string. Be very careful.
My advice is to stick to the official method. Doing a copy is not that expensive and it is not worth the evils of unsafe. If the string is too large to do a copy, you should not be making it into a string.
Answers so far haven't addressed the "entire stream" part of the question. I think a good way to do this is ioutil.ReadAll. With your io.ReadCloser named rc, I would write:
Go >= v1.16
if b, err := io.ReadAll(rc); err == nil {
    return string(b)
} ...
Go <= v1.15
if b, err := ioutil.ReadAll(rc); err == nil {
    return string(b)
} ...
data, _ := ioutil.ReadAll(response.Body)
fmt.Println(string(data))
func copyToString(r io.Reader) (res string, err error) {
    var sb strings.Builder
    if _, err = io.Copy(&sb, r); err == nil {
        res = sb.String()
    }
    return
}
The most efficient way would be to always use []byte instead of string.
In case you need to print data received from the io.ReadCloser, the fmt package can handle []byte, but it isn't efficient because the fmt implementation will internally convert []byte to string. In order to avoid this conversion, you can implement the fmt.Formatter interface for a type like type ByteSlice []byte.
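A minimal sketch of that idea; the ByteSlice type and its Format method are illustrative, not from a library:

type ByteSlice []byte

// Format writes the bytes straight to the fmt output, so fmt never has
// to convert the slice to a string first.
func (b ByteSlice) Format(s fmt.State, verb rune) {
    s.Write(b)
}

With that in place, fmt.Printf("%s\n", ByteSlice(data)) prints the data without the intermediate conversion.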
var b bytes.Buffer
b.ReadFrom(r)
// b.String()
I like the bytes.Buffer struct. I see it has ReadFrom and String methods. I've used it with a []byte but not an io.Reader.
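For example, the same pattern with an io.Reader; strings.NewReader stands in for the response body here:

package main

import (
    "bytes"
    "fmt"
    "strings"
)

func main() {
    r := strings.NewReader("hello from an io.Reader")
    var b bytes.Buffer
    // ReadFrom drains the reader into the buffer.
    if _, err := b.ReadFrom(r); err != nil {
        panic(err)
    }
    fmt.Println(b.String())
}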