Golang un-gzip from bytes.Reader

I have a file struct that holds a body, which is just a *bytes.Reader. The struct has two methods, Zip() error and UnZip() error. When I call Zip it should zip the file, storing the zipped data in the body, and I should then be able to call UnZip on the same file to store the unzipped data back in the body.
The minimal example I have is below in the playground. https://play.golang.org/p/WmZtqtvnyN
I'm able to zip the file just fine, and it looks like it's doing what it's supposed to do; however, when I try to unzip the file I get unexpected EOF.
I've been going at this for hours now. Any help is greatly appreciated.

I believe you should close the gzip writer before getting the bytes from the underlying buffer.
func (f *File) Zip() error {
    buff := bytes.NewBuffer(nil)
    writer := gzip.NewWriter(buff)
    defer writer.Close()
    _, err := f.Body.WriteTo(writer)
    if err != nil {
        return err
    }
    // I have added this line: closing flushes the remaining compressed
    // data and writes the gzip footer, so buff holds a complete stream.
    // The deferred Close above then becomes a harmless no-op.
    writer.Close()
    f.Body = bytes.NewReader(buff.Bytes())
    f.Name = fmt.Sprintf("%s.gz", f.Name)
    return nil
}

As per the documentation for gzip.NewReader: "If r does not also implement io.ByteReader, the decompressor may read more data than necessary from r."
For bytes.Reader: "A Reader implements the io.Reader, io.ReaderAt, io.WriterTo, io.Seeker, io.ByteScanner, and io.RuneScanner interfaces by reading from a byte slice."
Note that io.ByteScanner embeds io.ByteReader, so bytes.Reader does satisfy it; in any case that note only affects how much data the decompressor reads, not whether decompression succeeds. The unexpected EOF here comes from reading the buffer before the gzip writer has been closed, as explained above.
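For completeness, here is a minimal sketch of what the matching UnZip method could look like. This is an assumption built from the question's description (a File struct with Body *bytes.Reader and Name string fields), not code from the original post; it assumes "strings" is imported alongside "bytes" and "compress/gzip":
func (f *File) UnZip() error {
    reader, err := gzip.NewReader(f.Body)
    if err != nil {
        return err
    }
    defer reader.Close()
    buff := bytes.NewBuffer(nil)
    // ReadFrom drains the decompressed stream into the buffer.
    if _, err := buff.ReadFrom(reader); err != nil {
        return err
    }
    f.Body = bytes.NewReader(buff.Bytes())
    f.Name = strings.TrimSuffix(f.Name, ".gz")
    return nil
}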

Related

Getting `panic: os: invalid use of WriteAt on file opened with O_APPEND`

I am a newbie to Go and was starting to write my first program, in which I have to download a bunch of CSVs from AWS. I don't understand why it gives me the error below when I use the O_APPEND mode. If I remove os.O_APPEND, I only get the last file's data, which is not the objective.
The objective is to download all the CSV files into one local file. I'd like to understand what I'm doing incorrectly.
package main

import (
    "fmt"
    "os"
    "path/filepath"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/credentials"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
)

const (
    AccessKeyId     = "xxxxxxxxx"
    SecretAccessKey = "xxxxxxxxxxxxxxxxxxxx"
    Region          = "eu-central-1"
    Bucket          = "dexter-reports"
    bucketKey       = "Jenkins/pluginVersions/"
)

func main() {
    // Load the shared AWS configuration
    os.Setenv("AWS_ACCESS_KEY_ID", AccessKeyId)
    os.Setenv("AWS_SECRET_ACCESS_KEY", SecretAccessKey)
    filename := "JenkinsPluginDetais.txt"
    cred := credentials.NewStaticCredentials(AccessKeyId, SecretAccessKey, "")
    config := aws.Config{Credentials: cred, Region: aws.String(Region), Endpoint: aws.String("s3.amazonaws.com")}
    file, err := os.OpenFile(filename, os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0666)
    if err != nil {
        panic(err)
    }
    defer file.Close()
    sess, err := session.NewSession(&config)
    if err != nil {
        fmt.Println(err)
    }
    // List the bucket objects
    ObjectList := listBucketObjects(sess)
    // Loop over the object list. First initialize the s3 downloader via s3manager
    downloader := s3manager.NewDownloader(sess)
    for _, item := range ObjectList.Contents {
        csvFile := filepath.Base(*item.Key)
        if csvFile != "pluginVersions" {
            downloadBucketObjects(downloader, file, csvFile)
        }
    }
}

func listBucketObjects(sess *session.Session) *s3.ListObjectsV2Output {
    // Create a new s3 client
    svc := s3.New(sess)
    resp, err := svc.ListObjectsV2(&s3.ListObjectsV2Input{
        Bucket: aws.String(Bucket),
        Prefix: aws.String(bucketKey),
    })
    if err != nil {
        panic(err)
    }
    return resp
}

func downloadBucketObjects(downloader *s3manager.Downloader, file *os.File, keyobj string) {
    fileToDownload := bucketKey + keyobj
    numBytes, err := downloader.Download(file,
        &s3.GetObjectInput{
            Bucket: aws.String(Bucket),
            Key:    aws.String(fileToDownload),
        })
    if err != nil {
        panic(err)
    }
    fmt.Println("Downloaded", file.Name(), numBytes, "bytes")
}
First, I don't get why you even need the os.O_APPEND flag in the first place. As per my understanding, you can simply omit it.
Now, let's get to the actual problem and why it's happening.
Doc for O_APPEND (Ref: https://man7.org/linux/man-pages/man2/open.2.html):
O_APPEND
The file is opened in append mode. Before each write(2),
the file offset is positioned at the end of the file, as
if with lseek(2). The modification of the file offset and
the write operation are performed as a single atomic step.
So for every call to write, the file offset is positioned at the end of the file.
But (*s3manager.Downloader).Download is presumably using the WriteAt method, i.e.:
Doc for WriteAt:
$ go doc os WriteAt
package os // import "os"
func (f *File) WriteAt(b []byte, off int64) (n int, err error)
WriteAt writes len(b) bytes to the File starting at byte offset off. It
returns the number of bytes written and an error, if any. WriteAt returns a
non-nil error when n != len(b).
If file was opened with the O_APPEND flag, WriteAt returns an error.
Notice the last line: if the file is opened with the O_APPEND flag, WriteAt returns an error. That restriction makes sense, because WriteAt's second argument is an explicit offset, and mixing O_APPEND's always-append behaviour with explicit offset seeking could produce unexpected results, so the call errors out instead.
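A tiny standalone repro of that restriction, sketched here for illustration (t.txt is a hypothetical filename):
f, err := os.OpenFile("t.txt", os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0666)
if err != nil {
    panic(err)
}
defer f.Close()
_, err = f.WriteAt([]byte("x"), 0)
fmt.Println(err) // os: invalid use of WriteAt on file opened with O_APPEND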
Consider the definition of s3manager.Downloader:
func (d Downloader) Download(w io.WriterAt, input *s3.GetObjectInput, options ...func(*Downloader)) (n int64, err error)
The first argument is an io.WriterAt; this interface is:
type WriterAt interface {
    WriteAt(p []byte, off int64) (n int, err error)
}
This means that the Download function is going to call the WriteAt method in the File you are passing it. As per the documentation for File.WriteAt
If file was opened with the O_APPEND flag, WriteAt returns an error.
So this explains why you are getting the error but raises the question "why is Download using WriteAt and not accepting an io.Writer (and calling Write)?"; the answer can be found in the documentation:
The w io.WriterAt can be satisfied by an os.File to do multipart concurrent downloads, or in memory []byte wrapper using aws.WriteAtBuffer
So, to increase performance, Downloader might make multiple simultaneous requests for parts of the file and then write these out as they are received (meaning it may not write the data in order). This also explains why calling the function multiple times with the same File results in overwritten data (when Downloader retrieves each chunk of the file, it writes it out at the appropriate position in the output file; this overwrites any data already there).
The above quote from the documentation also points to a possible solution; use an aws.WriteAtBuffer and, once the download is finished, write the data to your file (which could then be opened with O_APPEND) - something like this:
buf := aws.NewWriteAtBuffer([]byte{})
numBytes, err := downloader.Download(buf,
    &s3.GetObjectInput{
        Bucket: aws.String(Bucket),
        Key:    aws.String(fileToDownload),
    })
if err != nil {
    panic(err)
}
_, err = file.Write(buf.Bytes())
if err != nil {
    panic(err)
}
An alternative would be to download into a temporary file and then append that to your output file (you may need to do this if the files are large); a rough sketch follows.
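This is not from the original answer, just a sketch of that alternative under the same assumptions (the downloader, file, Bucket and fileToDownload variables from the code above; os.CreateTemp needs Go 1.16+, older code would use ioutil.TempFile):
tmp, err := os.CreateTemp("", "s3download-*")
if err != nil {
    panic(err)
}
defer os.Remove(tmp.Name()) // clean up the temporary file afterwards
defer tmp.Close()
_, err = downloader.Download(tmp, &s3.GetObjectInput{
    Bucket: aws.String(Bucket),
    Key:    aws.String(fileToDownload),
})
if err != nil {
    panic(err)
}
// Rewind, then append the downloaded bytes to the output file.
if _, err := tmp.Seek(0, io.SeekStart); err != nil {
    panic(err)
}
if _, err := io.Copy(file, tmp); err != nil {
    panic(err)
}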

How do I write a gota dataframe to a csv?

I have found many code examples of writing to a CSV by passing in a [][]string, like the following:
package main

import (
    "encoding/csv"
    "log"
    "os"
)

var data = [][]string{
    {"Row 1", "30"},
    {"Row 2", "60"},
    {"Row 3", "90"},
}

func main() {
    file, err := os.Create("tutorials_technology.csv")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()
    w := csv.NewWriter(file)
    for _, value := range data {
        if err := w.Write(value); err != nil {
            log.Fatalln("Error writing record to csv: ", err)
        }
    }
    w.Flush()
}
However, I haven't found any code examples that show how to use the gota dataframe.WriteCSV() function to write to a CSV. In the gota dataframe documentation, there isn't an example for writing to a csv, but there is an example for reading from a csv.
The dataframe function WriteCSV() takes an io.Writer as input, and I wasn't sure how to satisfy that.
The following didn't work:
writer := csv.NewWriter(f)
df.WriteCSV(writer) // TODO This writer needs to be a []byte writer
I've been working on this for quite a while. Does anyone have any clues?
I have looked into turning my gota dataframe into a [][]string type, but that's a little inconvenient because I put my data into a gota dataframe with the package's LoadStructs() function, and I had read in some CSVs in a semi-custom way before putting them into structs.
So I could write a function to turn my structs into a [][]string format, but I feel like that is pretty tedious and I'm sure there has got to be a better way. In fact, I'm sure there is, because the dataframe type has the WriteCSV() method; I just haven't figured out how to use it.
Here are my structs
type CsvLine struct {
    Index  int
    Date   string
    Symbol string
    Open   float64
    High   float64
    Low    float64
    Close  float64
    // Volume     float64
    // Market_Cap float64
}

type File struct {
    Rows []CsvLine
}
Disclaimer: I am a little bit of a golang newbie. I've only been using Go for a couple months, and this is the first time I've tried to write to a file. I haven't interacted much with the io.Writer interface, but I hear that it's very useful.
And yes, I frequently look at the Golang.org blog and I've read "Effective Go" and I keep referencing it.
So it turns out I had misunderstood the io.Writer interface and what the os.Create() function returns: os.Create returns an *os.File, which itself implements io.Writer, so it can be passed to WriteCSV directly.
The code is even simpler and easier than I thought it would be.
Here is the working code example:
df := dataframe.LoadStructs(file.Rows)
f, err := os.Create(outFileName)
if err != nil {
    log.Fatal(err)
}
defer f.Close()
// WriteCSV returns an error, which is worth checking.
if err := df.WriteCSV(f); err != nil {
    log.Fatal(err)
}
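Since WriteCSV accepts any io.Writer, the same call also works with an in-memory buffer, which can be handy in tests. A small sketch (assuming the usual gota import path github.com/go-gota/gota/dataframe, plus "bytes", "fmt" and "log"):
var buf bytes.Buffer
if err := df.WriteCSV(&buf); err != nil {
    log.Fatal(err)
}
fmt.Println(buf.String()) // the CSV text, without touching the filesystem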

can't understand code about go's print function

I am new to Go. While reading the code example for the package "archive/tar", I came across some code like this:
// Iterate through the files in the archive.
for {
    hdr, err := tr.Next()
    if err == io.EOF {
        // end of tar archive
        break
    }
    if err != nil {
        log.Fatalln(err)
    }
    fmt.Printf("Contents of %s:\n", hdr.Name)
    if _, err := io.Copy(os.Stdout, tr); err != nil {
        log.Fatalln(err)
    }
    fmt.Println()
}
The output looks like this:
Contents of readme.txt:
This archive contains some text files.
Contents of gopher.txt:
Gopher names:
George
Geoffrey
Gonzo
Contents of todo.txt:
Get animal handling license.
Can anyone tell me how the program prints the body of each file? Thank you.
You left out a vital piece of the example: the two lines preceding what you posted.
// Open the tar archive for reading.
r := bytes.NewReader(buf.Bytes())
tr := tar.NewReader(r)
This creates a tar.Reader, which implements io.Reader. The io.Copy(os.Stdout, tr) call inside the if statement copies the contents of the current archive entry from the reader to Stdout.
Godoc for tar.Reader
It might also be useful to note that the code example in the package documentation never writes the tar it creates to disk; it is all done in memory using a bytes.Buffer. Examples of writing to disk would be in the io package; a minimal sketch follows.
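Not part of the original answer, just a sketch of how the same archive could be written to disk instead: point tar.NewWriter at an os.File rather than a bytes.Buffer (archive.tar is a hypothetical filename):
f, err := os.Create("archive.tar")
if err != nil {
    log.Fatalln(err)
}
defer f.Close()
tw := tar.NewWriter(f)
// ... add entries with tw.WriteHeader(hdr) followed by tw.Write(body) ...
if err := tw.Close(); err != nil { // Close flushes the trailing blocks
    log.Fatalln(err)
}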

Gzip uncompressed http.Response.Body

I am building a Go application that takes an http.Response object and saves it (response headers and body) to a redis hash. When the application receives an http.Response.Body that is not gzipped, I want to gzip it before saving it to the cache.
My confusion stems from my inability to make clear sense of Go's io interfaces and how to negotiate between http.Response.Body's io.ReadCloser and the gzip writer. I imagine there is an elegant, streaming solution here, but I can't quite get it to work.
If you've already determined the body is uncompressed, and if you need a []byte of the compressed data, then something like this should work. (If instead you already have an io.Writer you could write to, for example because you want to save the body to a file, then you'd want to stream into the file rather than into a buffer.)
func getCompressedBody(r *http.Response) ([]byte, error) {
    var buf bytes.Buffer
    gz := gzip.NewWriter(&buf)
    if _, err := io.Copy(gz, r.Body); err != nil {
        return nil, err
    }
    err := gz.Close()
    return buf.Bytes(), err
}
(This is just an example and would probably be written inline instead of as a function; if you did want it as a function, it should probably take an io.Reader instead of an *http.Response.)
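Following that suggestion, a streaming variant taking an io.Reader and an io.Writer might look like the sketch below (compressTo is a hypothetical name; this covers the save-to-file case mentioned above rather than the []byte case):
func compressTo(dst io.Writer, src io.Reader) error {
    gz := gzip.NewWriter(dst)
    if _, err := io.Copy(gz, src); err != nil {
        gz.Close() // best effort; the copy error is the one worth reporting
        return err
    }
    // Close flushes the remaining data and writes the gzip footer.
    return gz.Close()
}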

From io.Reader to string in Go

I have an io.ReadCloser object (from an http.Response object).
What's the most efficient way to convert the entire stream to a string object?
EDIT:
Since Go 1.10, strings.Builder exists. Example:
buf := new(strings.Builder)
_, err := io.Copy(buf, r)
// check errors
fmt.Println(buf.String())
OUTDATED INFORMATION BELOW
The short answer is that it will not be efficient, because converting to a string requires a complete copy of the byte slice. Here is the proper (non-efficient) way to do what you want:
buf := new(bytes.Buffer)
buf.ReadFrom(yourReader)
s := buf.String() // Does a complete copy of the bytes in the buffer.
This copy is done as a protection mechanism. Strings are immutable; if you could convert a []byte to a string, you could change the contents of the string. However, Go allows you to disable the type-safety mechanisms using the unsafe package. Use the unsafe package at your own risk; hopefully the name alone is a good enough warning. Here is how I would do it using unsafe:
buf := new(bytes.Buffer)
buf.ReadFrom(yourReader)
b := buf.Bytes()
s := *(*string)(unsafe.Pointer(&b))
There we go, you have now efficiently converted your byte slice to a string. Really, all this does is trick the type system into calling it a string. There are a couple of caveats to this method:
There are no guarantees this will work with all Go compilers. While it works with the standard gc compiler, it relies on "implementation details" not mentioned in the official spec. You cannot even guarantee that it will work on all architectures, or that it will not change in gc. In other words, this is a bad idea.
That string is mutable! If you make any calls on that buffer, it will change the string. Be very careful.
My advice is to stick to the official method. Doing a copy is not that expensive and it is not worth the evils of unsafe. If the string is too large to do a copy, you should not be making it into a string.
Answers so far haven't addressed the "entire stream" part of the question. I think the good way to do this is ReadAll. With your io.ReadCloser named rc, I would write:
Go >= v1.16
if b, err := io.ReadAll(rc); err == nil {
    return string(b)
} ...
Go <= v1.15
if b, err := ioutil.ReadAll(rc); err == nil {
    return string(b)
} ...
data, _ := ioutil.ReadAll(response.Body)
fmt.Println(string(data))
func copyToString(r io.Reader) (res string, err error) {
    var sb strings.Builder
    if _, err = io.Copy(&sb, r); err == nil {
        res = sb.String()
    }
    return
}
The most efficient way would be to always use []byte instead of string.
In case you need to print data received from the io.ReadCloser, the fmt package can handle []byte, but it isn't efficient because the fmt implementation will internally convert []byte to string. To avoid this conversion, you can implement the fmt.Formatter interface for a type like type ByteSlice []byte; a sketch follows.
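A minimal sketch of that idea (the ByteSlice name comes from this answer; the implementation deliberately ignores the verb and width flags for brevity):
type ByteSlice []byte

// Format implements fmt.Formatter. fmt.State embeds io.Writer, so the
// bytes can be written out directly without a []byte-to-string copy.
func (b ByteSlice) Format(f fmt.State, verb rune) {
    f.Write(b)
}
Usage would then be fmt.Printf("%s\n", ByteSlice(data)).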
var b bytes.Buffer
b.ReadFrom(r)
// b.String()
I like the bytes.Buffer struct. I see it has ReadFrom and String methods. I've used it with a []byte but not an io.Reader.
