How can I efficiently download a large file using Go?

Is there a way to download a large file in Go that streams the content directly into a file instead of buffering it all in memory before writing it out? The file is so big that holding it entirely in memory would exhaust available memory.

I'll assume you mean download via http (error checks omitted for brevity):
import ("net/http"; "io"; "os")
...
out, err := os.Create("output.txt")
defer out.Close()
...
resp, err := http.Get("http://example.com/")
defer resp.Body.Close()
...
n, err := io.Copy(out, resp.Body)
The http.Response's Body is an io.Reader, so you can use any function that accepts a Reader and, e.g., read a chunk at a time rather than all at once. In this specific case, io.Copy() does the grunt work for you: it streams the body to the file through a small fixed-size buffer (32 KB by default), so the whole download is never held in memory.
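If you want explicit control over the chunk size, here is a minimal sketch using io.CopyBuffer with the same out and resp as above (the 1 MiB size is an arbitrary choice, not something the original answer specifies):

buf := make([]byte, 1<<20) // 1 MiB scratch buffer, reused for every chunk
_, err = io.CopyBuffer(out, resp.Body, buf)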

A more descriptive version of Steve M's answer.
import (
    "fmt"
    "io"
    "net/http"
    "os"
)

func downloadFile(filepath string, url string) (err error) {
    // Create the file
    out, err := os.Create(filepath)
    if err != nil {
        return err
    }
    defer out.Close()

    // Get the data
    resp, err := http.Get(url)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    // Check server response
    if resp.StatusCode != http.StatusOK {
        return fmt.Errorf("bad status: %s", resp.Status)
    }

    // Write the body to file
    _, err = io.Copy(out, resp.Body)
    if err != nil {
        return err
    }

    return nil
}
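Hypothetical usage (the URL and filename are placeholders, and "log" would need to be imported alongside the imports above):

func main() {
    if err := downloadFile("example.dat", "https://example.com/example.dat"); err != nil {
        log.Fatal(err)
    }
}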

The answer selected above using io.Copy is exactly what you need, but if you are interested in additional features like resuming broken downloads, auto-naming files, checksum validation, or monitoring the progress of multiple downloads, check out the grab package.
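For instance, a minimal sketch with grab (this assumes the github.com/cavaliergopher/grab/v3 import path and its grab.Get helper, plus "fmt" and "log"; check the package docs for the current API):

resp, err := grab.Get(".", "http://example.com/example.zip")
if err != nil {
    log.Fatal(err)
}
fmt.Println("download saved to", resp.Filename)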

Here is a sample: https://github.com/thbar/golang-playground/blob/master/download-files.go. And here is some code that might help you:
// Note: unlike the io.Copy answers above, this reads the entire
// download into memory before writing it out.
func HTTPDownload(uri string) ([]byte, error) {
    fmt.Printf("HTTPDownload From: %s.\n", uri)
    res, err := http.Get(uri)
    if err != nil {
        log.Fatal(err)
    }
    defer res.Body.Close()
    d, err := ioutil.ReadAll(res.Body)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("ReadFile: Size of download: %d\n", len(d))
    return d, err
}

func WriteFile(dst string, d []byte) error {
    fmt.Printf("WriteFile: Size of download: %d\n", len(d))
    err := ioutil.WriteFile(dst, d, 0444)
    if err != nil {
        log.Fatal(err)
    }
    return err
}

func DownloadToFile(uri string, dst string) {
    fmt.Printf("DownloadToFile From: %s.\n", uri)
    if d, err := HTTPDownload(uri); err == nil {
        fmt.Printf("downloaded %s.\n", uri)
        if WriteFile(dst, d) == nil {
            fmt.Printf("saved %s as %s\n", uri, dst)
        }
    }
}
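A hypothetical call tying the functions together (URL and destination path are placeholders):

DownloadToFile("https://example.com/big.iso", "/tmp/big.iso")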

Related

Zip a Directory and not Have the Result Saved in File System

I am able to zip a file using logic similar to the zip writer seen here.
This results in an array of bytes ([]byte) being created within the bytes.Buffer object that is returned. I would just like to know if there is any way I can upload this 'zipped' array of bytes to an API endpoint that expects a 'multipart/form-data' request body (without having to save it locally).
Supplementary information:
I have code that utilizes this when compressing a folder. I am able to successfully execute an HTTP POST request with the zip file to the endpoint with this logic.
However, this unfortunately saves zipped files in a user's local file system. I would like to try to avoid this :)
You can create a multipart writer and write the zipped []byte into a form field, with whatever field name and file name you like, as below.
func addZipFileToReq(zipped []byte) (*http.Request, error) {
    body := bytes.NewBuffer(nil)
    writer := multipart.NewWriter(body)
    part, err := writer.CreateFormFile("fileField", "filename")
    if err != nil {
        return nil, err
    }
    _, err = part.Write(zipped)
    if err != nil {
        return nil, err
    }
    err = writer.Close()
    if err != nil {
        return nil, err
    }
    r, err := http.NewRequest(http.MethodPost, "https://example.com", body)
    if err != nil {
        return nil, err
    }
    r.Header.Set("Content-Type", writer.FormDataContentType())
    return r, nil
}
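Hypothetical usage inside a function returning error, assuming buf is the bytes.Buffer produced by your zip logic:

req, err := addZipFileToReq(buf.Bytes())
if err != nil {
    return err
}
res, err := http.DefaultClient.Do(req)
if err != nil {
    return err
}
defer res.Body.Close()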
If you want to stream-upload the zip, you should be able to do so with io.Pipe. The following is an incomplete and untested example to demonstrate the general idea. To make it work you'll need to modify it and potentially fix whatever bugs you encounter.
func UploadReader(r io.Reader) error {
    req, err := http.NewRequest("POST", "<UPLOAD_URL>", r)
    if err != nil {
        return err
    }
    // TODO set necessary headers (content type, auth, etc)
    res, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer res.Body.Close()
    if res.StatusCode != 200 {
        return errors.New("not ok")
    }
    return nil
}
func ZipDir(dir string, w io.Writer) error {
    zw := zip.NewWriter(w)
    defer zw.Close()
    return filepath.Walk(dir, func(path string, fi os.FileInfo, err error) error {
        if err != nil {
            return err
        }
        if !fi.Mode().IsRegular() {
            return nil
        }
        header, err := zip.FileInfoHeader(fi)
        if err != nil {
            return err
        }
        header.Name = path
        header.Method = zip.Deflate
        fw, err := zw.CreateHeader(header)
        if err != nil {
            return err
        }
        f, err := os.Open(path)
        if err != nil {
            return err
        }
        defer f.Close()
        if _, err := io.Copy(fw, f); err != nil {
            return err
        }
        return nil
    })
}
func UploadDir(dir string) error {
    r, w := io.Pipe()
    // Buffered so that a second error cannot block its goroutine forever
    // after the first error has already been returned below.
    ch := make(chan error, 2)
    wg := sync.WaitGroup{}
    wg.Add(1)
    go func() {
        defer wg.Done()
        defer w.Close()
        if err := ZipDir(dir, w); err != nil {
            ch <- err
        }
    }()
    wg.Add(1)
    go func() {
        defer wg.Done()
        defer r.Close()
        if err := UploadReader(r); err != nil {
            ch <- err
        }
    }()
    go func() {
        wg.Wait()
        close(ch)
    }()
    return <-ch
}
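If the endpoint requires multipart/form-data rather than a raw body, the same pipe idea should combine with multipart.Writer. Another untested sketch (field name, file name, and URL are placeholders; it needs fmt, io, mime/multipart, and net/http, plus ZipDir from above):

func UploadDirMultipart(dir, url string) error {
    pr, pw := io.Pipe()
    mw := multipart.NewWriter(pw)
    go func() {
        // Any error is propagated to the reading side via CloseWithError.
        part, err := mw.CreateFormFile("fileField", "archive.zip")
        if err != nil {
            pw.CloseWithError(err)
            return
        }
        if err := ZipDir(dir, part); err != nil {
            pw.CloseWithError(err)
            return
        }
        pw.CloseWithError(mw.Close()) // Close writes the trailing boundary
    }()
    req, err := http.NewRequest(http.MethodPost, url, pr)
    if err != nil {
        return err
    }
    req.Header.Set("Content-Type", mw.FormDataContentType())
    res, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer res.Body.Close()
    if res.StatusCode != http.StatusOK {
        return fmt.Errorf("bad status: %s", res.Status)
    }
    return nil
}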

Convert protobuf serialized messages to JSON without precompiling Go code

I want to convert protobuf serialized messages into a human readable JSON format. The major problem I face is that I need to do this without compiling the proto descriptor into Go code beforehand. I have access to the .proto files at runtime, but not at compile time.
I had the impression that the new Protobuf API v2 (https://github.com/protocolbuffers/protobuf-go) supports dynamic deserialization (see package types/dynamicpb), but I couldn't figure out how to use it:
func readDynamically(in []byte) {
    // How do I load the required descriptor (for NewMessage()) from my `addressbook.proto` file?
    descriptor := ??
    msg := dynamicpb.NewMessage(descriptor)
    err := protojson.Unmarshal(in, msg)
    if err != nil {
        panic(err)
    }
}
Above code is annotated with my problem: How can I get the required descriptor for the dynamicpb.NewMessage() from a .proto file?
It should work like this with the dynamicpb package.
func readDynamically(in []byte) {
    registry, err := createProtoRegistry(".", "addressbook.proto")
    if err != nil {
        panic(err)
    }
    desc, err := registry.FindFileByPath("addressbook.proto")
    if err != nil {
        panic(err)
    }
    fd := desc.Messages()
    addressBook := fd.ByName("AddressBook")
    msg := dynamicpb.NewMessage(addressBook)
    err = proto.Unmarshal(in, msg)
    if err != nil {
        panic(err)
    }
    jsonBytes, err := protojson.Marshal(msg)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(jsonBytes))
}
func createProtoRegistry(srcDir string, filename string) (*protoregistry.Files, error) {
    // Create descriptors using the protoc binary.
    // Imported dependencies are included so that the descriptors are self-contained.
    tmpFile := filename + "-tmp.pb"
    cmd := exec.Command("./protoc/protoc",
        "--include_imports",
        "--descriptor_set_out="+tmpFile,
        "-I"+srcDir,
        path.Join(srcDir, filename))
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    err := cmd.Run()
    if err != nil {
        return nil, err
    }
    defer os.Remove(tmpFile)

    marshalledDescriptorSet, err := ioutil.ReadFile(tmpFile)
    if err != nil {
        return nil, err
    }
    descriptorSet := descriptorpb.FileDescriptorSet{}
    err = proto.Unmarshal(marshalledDescriptorSet, &descriptorSet)
    if err != nil {
        return nil, err
    }
    files, err := protodesc.NewFiles(&descriptorSet)
    if err != nil {
        return nil, err
    }
    return files, nil
}
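If you would rather not shell out to protoc at all, a pure-Go parser such as github.com/jhump/protoreflect can build the descriptors in-process. A sketch under that assumption (verify the protoparse API against the library's docs; this works for self-contained files, while imported dependencies would need their descriptors added to the set as well):

func createProtoRegistryInProcess(srcDir string, filename string) (*protoregistry.Files, error) {
    parser := protoparse.Parser{ImportPaths: []string{srcDir}}
    fds, err := parser.ParseFiles(filename)
    if err != nil {
        return nil, err
    }
    descriptorSet := descriptorpb.FileDescriptorSet{}
    for _, fd := range fds {
        descriptorSet.File = append(descriptorSet.File, fd.AsFileDescriptorProto())
    }
    return protodesc.NewFiles(&descriptorSet)
}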
This question is interesting. I have done some work on protoc plugins, and as far as I can tell an additional CLI step is needed because we don't want to reinvent the wheel.
Step one: we need protoc to translate the .proto file into a form from which we can obtain a protoreflect.MessageDescriptor easily.
The following plugin simply captures the raw bytes that protoc sends to plugins as input.
package main

import (
    "fmt"
    "io/ioutil"
    "os"
)

func main() {
    if len(os.Args) == 2 && os.Args[1] == "--version" {
        // fmt.Fprintf(os.Stderr, "%v %v\n", filepath.Base(os.Args[0]), version.String())
        os.Exit(0)
    }
    in, err := ioutil.ReadAll(os.Stdin)
    if err != nil {
        fmt.Printf("error: %v", err)
        return
    }
    ioutil.WriteFile("./out.pb", in, 0755)
}
Build it and rename the binary protoc-gen-raw, then run protoc --raw_out=./pb ./server.proto, and you will get out.pb. You can forget the .proto file from now on and put this out.pb wherever you would have put the .proto: the .pb file is what the official API supports.
Step 2: Deserialize a protobuf serialized message into JSON.
package main

import (
    "fmt"
    "io/ioutil"

    "google.golang.org/protobuf/compiler/protogen"
    "google.golang.org/protobuf/encoding/protojson"
    "google.golang.org/protobuf/proto"
    "google.golang.org/protobuf/types/dynamicpb"
    "google.golang.org/protobuf/types/pluginpb"
)

func main() {
    in, err := ioutil.ReadFile("./out.pb")
    if err != nil {
        fmt.Printf("failed to read proto file: %v", err)
        return
    }
    req := &pluginpb.CodeGeneratorRequest{}
    if err := proto.Unmarshal(in, req); err != nil {
        fmt.Printf("failed to unmarshal proto: %v", err)
        return
    }
    gen, err := protogen.Options{}.New(req)
    if err != nil {
        fmt.Printf("failed to create new plugin: %v", err)
        return
    }
    // Serialize a protobuf message "ServerConfig" (a type generated from the
    // same schema) so we have raw bytes to decode dynamically below.
    data := &ServerConfig{
        GameType: 1,
        ServerId: 105,
        Host:     "host.host.host",
        Port:     10024,
    }
    raw, err := proto.Marshal(data)
    if err != nil {
        fmt.Printf("failed to marshal protobuf: %v", err)
        return
    }
    for _, f := range gen.Files {
        for _, m := range f.Messages {
            // "ServerConfig" is the message name of the serialized message
            if m.GoIdent.GoName == "ServerConfig" {
                // m.Desc is a protoreflect.MessageDescriptor
                msg := dynamicpb.NewMessage(m.Desc)
                // Unmarshal []byte into the dynamic proto message
                err := proto.Unmarshal(raw, msg)
                if err != nil {
                    fmt.Printf("failed to Unmarshal protobuf data: %v", err)
                    return
                }
                // Marshal the dynamic message into JSON
                jsondata, err := protojson.Marshal(msg)
                if err != nil {
                    fmt.Printf("failed to Marshal to json: %v", err)
                    return
                }
                fmt.Printf("out: %v", string(jsondata))
            }
        }
    }
}

// the output is:
// out: {"gameType":1, "serverId":105, "host":"host.host.host", "port":10024}

Editing a zip file in memory

I am trying to edit a zip file in memory in Go and return the zipped file through an HTTP response.
The goal is to add a few files to a path in the zip file. For example, I add a log.txt file under the path/to/file route in the zipped folder.
All this should be done without saving the file to disk or editing the original file.
I have implemented a simple version of real-time stream compression, which correctly compresses a single file; making it run efficiently would take considerably more optimization.
This is only for reference. If you need more, set additional useful HTTP headers before compressing so that the client can correctly process the response data.
package main

import (
    "archive/zip"
    "io"
    "net/http"
    "os"

    "github.com/gin-gonic/gin"
)

func main() {
    engine := gin.Default()
    engine.GET("/log.zip", func(c *gin.Context) {
        f, err := os.Open("./log.txt")
        if err != nil {
            c.String(http.StatusInternalServerError, err.Error())
            return
        }
        defer f.Close()
        info, err := f.Stat()
        if err != nil {
            c.String(http.StatusInternalServerError, err.Error())
            return
        }
        z := zip.NewWriter(c.Writer)
        defer z.Close()
        head, err := zip.FileInfoHeader(info)
        if err != nil {
            c.String(http.StatusInternalServerError, err.Error())
            return
        }
        w, err := z.CreateHeader(head)
        if err != nil {
            c.String(http.StatusInternalServerError, err.Error())
            return
        }
        _, err = io.Copy(w, f)
        if err != nil {
            c.String(http.StatusInternalServerError, err.Error())
            return
        }
    })
    engine.Run("127.0.0.1:8080")
}
After hours of tireless work I figured out my approach was bad, or maybe not possible at my level of knowledge, so here is a not-so-optimal solution that works; if your file is not large it should be fine for you.
You have a file template.zip and you want to add extra files. My initial approach was to copy the whole file into memory and edit it there, but I ran into complications.
My next approach was to recreate the archive in memory, file by file. To do that I need to know every file in the directory, and I used the code below to collect them all into a list:
root := "template"
err = filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
if info.IsDir() {
return nil
}append(files,path)}
now i have all my files and i can create a buffer to hold all this files
buf := new(bytes.Buffer)
// Create a new zip archive.
zipWriter := zip.NewWriter(buf)
With the zip writer in place, I can write all the old files into it, copying their contents as I go:
for _, file := range files {
    zipFile, err := zipWriter.Create(file)
    if err != nil {
        fmt.Println(err)
    }
    content, err := ioutil.ReadFile(file)
    if err != nil {
        log.Fatal(err)
    }
    _, err = zipFile.Write(content)
    if err != nil {
        fmt.Println(err)
    }
}
At this point, everything written so far is accumulating in buf (it is finalized once zipWriter.Close() is called).
The remaining code adds the new files and sends the response back to the client:
for _, appCode := range appPageCodeText {
    // filepath.fileextension is the original post's placeholder for the
    // destination path of this file inside the archive
    f, err := zipWriter.Create(filepath.fileextension)
    if err != nil {
        log.Fatal(err)
    }
    _, err = f.Write([]byte(appCode.Content))
    if err != nil {
        log.Fatal(err)
    }
}
err = zipWriter.Close()
if err != nil {
    fmt.Println(err)
}
w.Header().Set("Content-Disposition", "attachment; filename="+"template.zip")
w.Header().Set("Content-Type", "application/zip")
w.Write(buf.Bytes()) // 'Copy' the file to the client

How would I optimize code that reads when doing a Hash and seeks to the beginning to re-read it again?

How do I make this not require a file seek? Basically, I compute a hash and then re-read the file to upload it, which is not optimal. How can I optimize this, using TeeReader or another chunked-read method, so that hashing and writing happen without reading the file twice?
Also, do I need to specify content length myself?
// PUT method
func (c *Client) PutFileOld(filename string, noLen bool) error {
    file, err := os.Open(filename)
    if err != nil {
        return err
    }
    defer file.Close()
    hasher := md5.New()
    if _, err := io.Copy(hasher, file); err != nil {
        log.Fatal("Could not compute MD5")
    }
    // Lazy way to go back to the beginning since the reader has consumed our bytes
    // and we have to compute the hash
    file.Seek(0, 0)
    c.MD5 = hex.EncodeToString(hasher.Sum(nil)[:16])
    log.Printf("Uploading to: %s", fmt.Sprintf("%s/%s", c.baseURL, filename))
    baseURL, err := url.Parse(fmt.Sprintf("%s/%s", c.baseURL, filename))
    if err != nil {
        return err
    }
    log.Printf("MD5: %s - file: %s\n", c.MD5, filename)
    req, err := http.NewRequest(http.MethodPut, baseURL.String(), bufio.NewReader(file))
    if err != nil {
        return err
    }
    req.Header.Set("Content-Type", "application/octet-stream")
    req.Header.Set("Content-Md5", c.MD5)
    fi, _ := file.Stat()
    // Not sure if this is needed, or if Go sets it automatically
    req.ContentLength = fi.Size()
    res, err := c.httpClient.Do(req)
    if err != nil {
        return err
    }
    dump, err := httputil.DumpResponse(res, true)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%q\n", dump)
    c.StatusCode = res.StatusCode
    defer res.Body.Close()
    return nil
}
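A minimal sketch of the TeeReader idea the question mentions (untested; the method name is invented here, and the fields follow the question's Client). The trade-off: every byte the HTTP client reads from the file during the upload is also written into the hasher, so the file is read exactly once, but the MD5 only becomes available after the body has been sent. It therefore cannot populate the Content-Md5 request header up front; this single-pass pattern fits when the hash is verified after the transfer (e.g. against the server's response) rather than sent ahead of it.

func (c *Client) PutFileOnePass(filename string) (string, error) {
    file, err := os.Open(filename)
    if err != nil {
        return "", err
    }
    defer file.Close()
    fi, err := file.Stat()
    if err != nil {
        return "", err
    }
    hasher := md5.New()
    // Everything the HTTP client reads from the body also goes to the hasher.
    body := io.TeeReader(file, hasher)
    req, err := http.NewRequest(http.MethodPut, fmt.Sprintf("%s/%s", c.baseURL, filename), body)
    if err != nil {
        return "", err
    }
    req.Header.Set("Content-Type", "application/octet-stream")
    req.ContentLength = fi.Size() // see note below
    res, err := c.httpClient.Do(req)
    if err != nil {
        return "", err
    }
    defer res.Body.Close()
    return hex.EncodeToString(hasher.Sum(nil)), nil
}

As for Content-Length: http.NewRequest only infers it for *bytes.Buffer, *bytes.Reader, and *strings.Reader bodies. For a file reader you either set req.ContentLength yourself, as the original code does, or the request is sent with chunked transfer encoding.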

How to write RIFF chunk header when store image from url?

I just tried to download a WebP image from a URL, but something is off when I try to process the stored image.
If I download the image from the browser, it can be decoded using the x/image/webp package, but if I store the image using http.Get(), create a new file, and io.Copy() the response into it, decoding fails with:
"missing RIFF chunk header"
I assume that I need to write some RIFF chunk header when I store it using Go code.
func main() {
    response, e := http.Get(URL)
    if e != nil {
        log.Fatal(e)
    }
    defer response.Body.Close()

    // open a file for writing
    file, err := os.Create("tv.webp")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    // Use io.Copy to just dump the response body to the file. This supports huge files
    _, err = io.Copy(file, response.Body)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Success!")

    imgData, err := os.Open("tv.webp")
    if err != nil {
        fmt.Println(err)
        return
    }
    log.Printf("%+v", imgData)
    image, err := webp.Decode(imgData)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(image.Bounds())
}
Here is the URL: IMG URL
The downloaded file is not WebP; it's PNG.
package main

import (
    "fmt"
    "image"
    _ "image/png"
    "io"
    "log"
    "net/http"
    "os"
)

func main() {
    response, e := http.Get("https://www.sony.com/is/image/gwtprod/0abe7672ff4c6cb4a0a4d4cc143fd05b?fmt=png-alpha")
    if e != nil {
        log.Fatal(e)
    }
    defer response.Body.Close()
    file, err := os.Create("dump")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()
    _, err = io.Copy(file, response.Body)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Success!")
    imageFile, err := os.Open("dump")
    if err != nil {
        panic(err)
    }
    m, name, err := image.Decode(imageFile)
    if err != nil {
        panic(err)
    }
    fmt.Println("image type is ", name, m.Bounds())
}
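If you are unsure what a server actually returned, you can sniff the payload before choosing a decoder. A small hypothetical snippet using http.DetectContentType, which looks at no more than the first 512 bytes:

f, err := os.Open("dump")
if err != nil {
    log.Fatal(err)
}
defer f.Close()
buf := make([]byte, 512)
n, err := io.ReadFull(f, buf)
if err != nil && err != io.ErrUnexpectedEOF {
    log.Fatal(err)
}
fmt.Println(http.DetectContentType(buf[:n])) // e.g. "image/png"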
