golang unzip Response.Body - go

I wrote a small web crawler and know that the response is a zip file.
With my limited experience in Go programming, I only know how to unzip an existing file on disk.
Can I unzip the Response.Body in memory, without first saving it to the hard disk?

Updated answer: handling a zip file response body in memory.
Note: the whole archive is read into memory, so make sure you have enough memory for it.
package main
import (
"archive/zip"
"bytes"
"fmt"
"io/ioutil"
"log"
"net/http"
)
func main() {
resp, err := http.Get("zip file url")
if err != nil {
log.Fatal(err)
}
defer resp.Body.Close()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
log.Fatal(err)
}
zipReader, err := zip.NewReader(bytes.NewReader(body), int64(len(body)))
if err != nil {
log.Fatal(err)
}
// Read all the files from zip archive
for _, zipFile := range zipReader.File {
fmt.Println("Reading file:", zipFile.Name)
unzippedFileBytes, err := readZipFile(zipFile)
if err != nil {
log.Println(err)
continue
}
_ = unzippedFileBytes // this is unzipped file bytes
}
}
func readZipFile(zf *zip.File) ([]byte, error) {
f, err := zf.Open()
if err != nil {
return nil, err
}
defer f.Close()
return ioutil.ReadAll(f)
}
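The note about memory can be made concrete by capping how much of the body is read before unzipping. The following is a minimal sketch, not part of the original answer: readZipEntries, buildZip, demo and errTooLarge are hypothetical names, and the small archive built in memory only stands in for resp.Body.

```go
package main

import (
	"archive/zip"
	"bytes"
	"errors"
	"fmt"
	"io"
	"log"
)

// errTooLarge is a hypothetical sentinel error used by this sketch.
var errTooLarge = errors.New("archive exceeds size limit")

// readZipEntries reads at most maxBytes from r (for example resp.Body)
// and unpacks every entry of the archive into memory.
func readZipEntries(r io.Reader, maxBytes int64) (map[string][]byte, error) {
	// Read one byte past the limit so an oversized body can be detected.
	body, err := io.ReadAll(io.LimitReader(r, maxBytes+1))
	if err != nil {
		return nil, err
	}
	if int64(len(body)) > maxBytes {
		return nil, errTooLarge
	}
	zr, err := zip.NewReader(bytes.NewReader(body), int64(len(body)))
	if err != nil {
		return nil, err
	}
	entries := make(map[string][]byte)
	for _, zf := range zr.File {
		rc, err := zf.Open()
		if err != nil {
			return nil, err
		}
		data, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		entries[zf.Name] = data
	}
	return entries, nil
}

// buildZip creates a one-file archive in memory.
func buildZip(name, content string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create(name)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write([]byte(content)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// demo unpacks a sample archive with the given size limit.
func demo(maxBytes int64) (map[string][]byte, error) {
	archive, err := buildZip("hello.txt", "hello")
	if err != nil {
		return nil, err
	}
	return readZipEntries(bytes.NewReader(archive), maxBytes)
}

func main() {
	entries, err := demo(1 << 20)
	if err != nil {
		log.Fatal(err)
	}
	for name, data := range entries {
		fmt.Printf("%s: %s\n", name, data)
	}
}
```

The limit applies only to the compressed download. The uncompressed entries can still be much larger, so a stricter version would also bound what is read from each entry.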
By default, the Go HTTP client handles gzip responses automatically, so you can just read and close the response body as usual.
However, there is a catch.
// Reference https://github.com/golang/go/blob/master/src/net/http/transport.go
//
// DisableCompression, if true, prevents the Transport from
// requesting compression with an "Accept-Encoding: gzip"
// request header when the Request contains no existing
// Accept-Encoding value. If the Transport requests gzip on
// its own and gets a gzipped response, it's transparently
// decoded in the Response.Body. However, if the user
// explicitly requested gzip it is not automatically
// uncompressed.
DisableCompression bool
This means that if you add the header Accept-Encoding: gzip to the request yourself, you have to decompress the gzip response body yourself.
For example:
reader, err := gzip.NewReader(resp.Body)
if err != nil {
log.Fatal(err)
}
defer reader.Close()
body, err := ioutil.ReadAll(reader)
if err != nil {
log.Fatal(err)
}
fmt.Println(string(body))
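A slightly more defensive variant checks the Content-Encoding header before wrapping the body, since a server may still answer uncompressed. This is only a sketch: readBody, fakeGzipResponse and demo are hypothetical names, and the hand-built http.Response stands in for what a server would return after an explicit Accept-Encoding: gzip request.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"log"
	"net/http"
)

// readBody returns the response body, decompressing it only when the
// response is marked as gzip-encoded.
func readBody(resp *http.Response) ([]byte, error) {
	defer resp.Body.Close()
	var r io.Reader = resp.Body
	if resp.Header.Get("Content-Encoding") == "gzip" {
		gz, err := gzip.NewReader(resp.Body)
		if err != nil {
			return nil, err
		}
		defer gz.Close()
		r = gz
	}
	return io.ReadAll(r)
}

// fakeGzipResponse builds a gzip-encoded response without any network access.
func fakeGzipResponse(text string) (*http.Response, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(text)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return &http.Response{
		Header: http.Header{"Content-Encoding": []string{"gzip"}},
		Body:   io.NopCloser(&buf),
	}, nil
}

// demo decodes a sample gzip response and returns its text.
func demo() (string, error) {
	resp, err := fakeGzipResponse("hello, gzip")
	if err != nil {
		return "", err
	}
	body, err := readBody(resp)
	return string(body), err
}

func main() {
	text, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(text)
}
```

With a real request, readBody would be called on the *http.Response returned by the client.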

Related

Golang bufio from websocket breaking after first read

I am trying to stream JSON text from a websocket. However after an initial read I noticed that the stream seems to break/disconnect. This is from a Pleroma server (think: Mastodon). I am using the default Golang websocket library.
package main
import (
"bufio"
"fmt"
"log"
"golang.org/x/net/websocket"
)
func main() {
origin := "https://poa.st/"
url := "wss://poa.st/api/v1/streaming/?stream=public"
ws, err := websocket.Dial(url, "", origin)
if err != nil {
log.Fatal(err)
}
s := bufio.NewScanner(ws)
for s.Scan() {
line := s.Text()
fmt.Println(line)
}
}
After the initial JSON text response, the for-loop breaks. I would expect it to send a new message every few seconds.
What might be causing this? I am willing to switch to the Gorilla websocket library if I can use it with bufio.
Thanks!
Although the x/net/websocket connection has a Read method with the same signature as the Read method in io.Reader, the connection does not behave like an io.Reader, so it will not work as you expect when wrapped with a bufio.Scanner.
The poa.st endpoint sends a stream of messages where each message is a JSON document. Use the following code to read the messages using the Gorilla package:
url := "wss://poa.st/api/v1/streaming/?stream=public"
ws, _, err := websocket.DefaultDialer.Dial(url, nil)
if err != nil {
log.Fatal(err)
}
defer ws.Close()
for {
_, p, err := ws.ReadMessage()
if err != nil {
log.Fatal(err)
}
// p is a []byte containing the JSON document.
fmt.Printf("%s\n", p)
}
The Gorilla package has a helper method for decoding JSON messages. Here's an example of how to use that method.
url := "wss://poa.st/api/v1/streaming/?stream=public"
ws, _, err := websocket.DefaultDialer.Dial(url, nil)
if err != nil {
log.Fatal(err)
}
defer ws.Close()
for {
// The JSON documents are objects containing two fields,
// the event type and the payload. The payload is a JSON
// document itself.
var e struct {
Event string
Payload string
}
err := ws.ReadJSON(&e)
if err != nil {
log.Fatal(err)
}
// TODO: decode e.Payload based on e.Event
}
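The TODO above, decoding the payload based on the event type, could look roughly like the sketch below. It uses only encoding/json so it runs without a websocket connection. The "update" event name and the id and content fields are assumptions about what the server sends, and envelope, status and decodeMessage are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// envelope mirrors the outer message: an event type plus a payload that is
// itself a JSON document encoded as a string.
type envelope struct {
	Event   string `json:"event"`
	Payload string `json:"payload"`
}

// status holds the assumed fields of an "update" payload.
type status struct {
	ID      string `json:"id"`
	Content string `json:"content"`
}

// decodeMessage decodes the envelope first and then the payload,
// choosing the payload type from the event name.
func decodeMessage(msg []byte) (string, error) {
	var e envelope
	if err := json.Unmarshal(msg, &e); err != nil {
		return "", err
	}
	switch e.Event {
	case "update":
		var s status
		if err := json.Unmarshal([]byte(e.Payload), &s); err != nil {
			return "", err
		}
		return "status " + s.ID + ": " + s.Content, nil
	default:
		return "unhandled event " + e.Event, nil
	}
}

func main() {
	msg := []byte(`{"event":"update","payload":"{\"id\":\"1\",\"content\":\"hi\"}"}`)
	out, err := decodeMessage(msg)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}
```

In the Gorilla loop, the bytes returned by ws.ReadMessage() could be passed to decodeMessage directly.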

Editing a zip file in memory

I am trying to edit a zip file in memory in Go and return the zipped file through an HTTP response.
The goal is to add a few files at a given path inside the zip file.
For example, I add a log.txt file under path/to/file in the archive.
All of this should be done without saving the file to disk or modifying the original file.
I have implemented a simple version of real-time stream compression, which can correctly compress a single file. If you want it to run efficiently, you need a lot of optimization.
This is for reference only. In real use you should also set appropriate HTTP headers before compressing, so that the client can process the response data correctly.
package main
import (
"archive/zip"
"io"
"net/http"
"os"
"github.com/gin-gonic/gin"
)
func main() {
engine := gin.Default()
engine.GET("/log.zip", func(c *gin.Context) {
f, err := os.Open("./log.txt")
if err != nil {
c.String(http.StatusInternalServerError, err.Error())
return
}
defer f.Close()
info, err := f.Stat()
if err != nil {
c.String(http.StatusInternalServerError, err.Error())
return
}
z := zip.NewWriter(c.Writer)
head, err := zip.FileInfoHeader(info)
if err != nil {
c.String(http.StatusInternalServerError, err.Error())
return
}
defer z.Close()
w, err := z.CreateHeader(head)
if err != nil {
c.String(http.StatusInternalServerError, err.Error())
return
}
_, err = io.Copy(w, f)
if err != nil {
c.String(http.StatusInternalServerError, err.Error())
return
}
})
engine.Run("127.0.0.1:8080")
}
After hours of work I concluded that my original approach was bad, or at least not feasible at my level of knowledge. Here is a less-than-optimal solution that works; if your file is not large, it should be fine for you.
Say you have a file template.zip and you want to add extra files to it. My initial approach was to copy the whole file into memory and edit it from there, but I ran into complications.
My next approach was to recreate the archive in memory, file by file. To do that I need to know every file in the directory, so I used the code below to collect all the file paths into a list:
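root := "template"
var files []string
err = filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
// Propagate walk errors; skip directories (err is nil for them).
if err != nil || info.IsDir() {
return err
}
files = append(files, path)
return nil
})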
Now I have all my files, and I can create a buffer to hold them:
buf := new(bytes.Buffer)
// Create a new zip archive.
zipWriter := zip.NewWriter(buf)
With the zip archive created, I can write all the old files into it, copying their contents as I go:
for _, file := range files {
zipFile, err := zipWriter.Create(file)
if err != nil {
// Stop here: zipFile is not usable after a failed Create.
log.Fatal(err)
}
content, err := ioutil.ReadFile(file)
if err != nil {
log.Fatal(err)
}
// Copy the file's contents into the archive entry.
_, err = zipFile.Write(content)
if err != nil {
log.Fatal(err)
}
}
At this point the existing files have been written to the archive that is being built in buf.
The remaining code adds the new files, closes the archive, and sends the response back to the client:
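for _, appCode := range appPageCodeText {
// appCode.Path is a hypothetical field holding the entry's path inside the archive.
f, err := zipWriter.Create(appCode.Path)
if err != nil {
log.Fatal(err)
}
if _, err = f.Write([]byte(appCode.Content)); err != nil {
log.Fatal(err)
}
}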
err = zipWriter.Close()
if err != nil {
fmt.Println(err)
}
w.Header().Set("Content-Disposition", "attachment; filename="+"template.zip")
w.Header().Set("Content-Type", "application/zip")
w.Write(buf.Bytes()) //'Copy' the file to the client
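The same rebuild idea also works when starting from the bytes of an existing archive rather than from a directory on disk: read it with zip.NewReader, copy each entry into a new zip.Writer, then append the new files. The following is a sketch under those assumptions. addFiles, writeEntries and demo are hypothetical names, and the one-file archive built in demo stands in for template.zip.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"log"
)

// writeEntries adds each name/content pair to zw as a new entry.
func writeEntries(zw *zip.Writer, files map[string][]byte) error {
	for name, content := range files {
		w, err := zw.Create(name)
		if err != nil {
			return err
		}
		if _, err := w.Write(content); err != nil {
			return err
		}
	}
	return nil
}

// addFiles copies every entry of the original archive into a new in-memory
// archive and then appends the extra files. The original bytes are not modified.
func addFiles(original []byte, extra map[string][]byte) ([]byte, error) {
	zr, err := zip.NewReader(bytes.NewReader(original), int64(len(original)))
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, f := range zr.File {
		hdr := f.FileHeader // copy, so the reader's header is left untouched
		w, err := zw.CreateHeader(&hdr)
		if err != nil {
			return nil, err
		}
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		_, err = io.Copy(w, rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
	}
	if err := writeEntries(zw, extra); err != nil {
		return nil, err
	}
	// Close writes the central directory; buf is not a valid zip before this.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// demo builds a sample archive, adds log.txt under path/to/file, and
// returns the entry names of the result in stored order.
func demo() ([]string, error) {
	var tmpl bytes.Buffer
	zw := zip.NewWriter(&tmpl)
	if err := writeEntries(zw, map[string][]byte{"index.html": []byte("<h1>hi</h1>")}); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	out, err := addFiles(tmpl.Bytes(), map[string][]byte{"path/to/file/log.txt": []byte("log line\n")})
	if err != nil {
		return nil, err
	}
	zr, err := zip.NewReader(bytes.NewReader(out), int64(len(out)))
	if err != nil {
		return nil, err
	}
	var names []string
	for _, f := range zr.File {
		names = append(names, f.Name)
	}
	return names, nil
}

func main() {
	names, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(names)
}
```

In an HTTP handler, the slice returned by addFiles could be written to the response after setting the Content-Type and Content-Disposition headers as above.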

How to write RIFF chunk header when store image from url?

I just tried to download a WebP image from a URL, but I found something different when I tried to process the stored image.
If I download the image from the browser, it can be decoded using the x/image/webp package, but if I fetch it using http.Get(), create a new file, and io.Copy() the image into it, decoding fails with:
"missing RIFF chunk header"
I assume that I need to write some RIFF chunk header when I store it using Go code.
func main(){
response, e := http.Get(URL)
if e != nil {
log.Fatal(e)
}
defer response.Body.Close()
//open a file for writing
file, err := os.Create("tv.webp")
if err != nil {
log.Fatal(err)
}
defer file.Close()
// Use io.Copy to just dump the response body to the file. This supports huge files
_, err = io.Copy(file, response.Body)
if err != nil {
log.Fatal(err)
}
fmt.Println("Success!")
imgData, err := os.Open("tv.webp")
if err != nil {
fmt.Println(err)
return
}
log.Printf("%+v", imgData)
image, err := webp.Decode(imgData)
if err != nil {
fmt.Println(err)
return
}
fmt.Println(image.Bounds())
}
Here is the URL IMG URL
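The downloaded file is not a WebP image; it is a PNG (note the fmt=png-alpha parameter in the URL). Decode it with the image package and the PNG decoder registered: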
package main
import (
"fmt"
"image"
"io"
"log"
"net/http"
"os"
_ "image/png"
)
func main() {
response, e := http.Get("https://www.sony.com/is/image/gwtprod/0abe7672ff4c6cb4a0a4d4cc143fd05b?fmt=png-alpha")
if e != nil {
log.Fatal(e)
}
defer response.Body.Close()
file, err := os.Create("dump")
if err != nil {
log.Fatal(err)
}
defer file.Close()
_, err = io.Copy(file, response.Body)
if err != nil {
log.Fatal(err)
}
fmt.Println("Success!")
imageFile, err := os.Open("dump")
if err != nil {
panic(err)
}
m, name, err := image.Decode(imageFile)
if err != nil {
panic(err)
}
fmt.Println("image type is ", name, m.Bounds())
}
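One way to check what was actually downloaded before choosing a decoder is to sniff the first bytes with http.DetectContentType from the standard library. This is a small sketch, not part of the original answer: sniffType is a hypothetical helper, and the two byte strings are just the file signatures of each format rather than real images.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniffType reports the MIME type guessed from the leading bytes of data.
// http.DetectContentType considers at most the first 512 bytes.
func sniffType(data []byte) string {
	return http.DetectContentType(data)
}

func main() {
	pngSignature := []byte("\x89PNG\r\n\x1a\n")
	webpSignature := []byte("RIFF\x00\x00\x00\x00WEBPVP8 ")
	fmt.Println(sniffType(pngSignature))
	fmt.Println(sniffType(webpSignature))
}
```

For a saved download, the first 512 bytes of the file can be passed to sniffType. A PNG result explains the "missing RIFF chunk header" error, because a WebP file starts with a RIFF header and a PNG file does not.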

Requesting multiple URLs in Go

I have the following Go program: https://play.golang.org/p/-TUtJ7DIhi
package main
import (
"encoding/json"
"fmt"
"io/ioutil"
"net/http"
"strconv"
)
func main() {
body, err := get("https://hacker-news.firebaseio.com/v0/topstories.json")
if err != nil {
panic(err)
}
var ids [500]int
if err = json.Unmarshal(body, &ids); err != nil {
panic(err)
}
var contents []byte
for _, value := range ids[0:10] {
body, err := get("https://hacker-news.firebaseio.com/v0/item/" + strconv.Itoa(value) + ".json")
if err != nil {
fmt.Println(err)
} else {
contents = append(contents, body...)
}
}
fmt.Println(contents)
}
func get(url string) ([]byte, error) {
res, err := http.Get(url)
if err != nil {
return nil, err
}
body, err := ioutil.ReadAll(res.Body)
res.Body.Close()
return body, err
}
When run it throws EOF json errors on the iterative get requests, but when I hit the URLs individually they do not appear to be malformed.
What am I missing?
It looks like there's something wrong with their server, and it's closing connections without sending a Connection: close header. The client therefore tries to reuse the connection per the HTTP/1.1 specification.
You can work around this by creating your own request and setting Close = true, or by using a custom Transport with DisableKeepAlives = true. The first option looks like this:
req, err := http.NewRequest("GET", url, nil)
if err != nil {
return nil, err
}
req.Close = true
res, err := http.DefaultClient.Do(req)
if err != nil {
return nil, err
}
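The second option mentioned above, a custom Transport with DisableKeepAlives set, might be sketched as follows. newNoKeepAliveClient, keepAlivesDisabled and get are hypothetical names. Cloning http.DefaultTransport is a design choice of this sketch so that the default proxy and timeout settings are kept.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// newNoKeepAliveClient returns a client that opens a fresh connection for
// every request instead of reusing idle ones.
func newNoKeepAliveClient() *http.Client {
	tr := http.DefaultTransport.(*http.Transport).Clone()
	tr.DisableKeepAlives = true
	return &http.Client{Transport: tr}
}

// keepAlivesDisabled reports whether c uses an *http.Transport with
// keep-alives turned off.
func keepAlivesDisabled(c *http.Client) bool {
	tr, ok := c.Transport.(*http.Transport)
	return ok && tr.DisableKeepAlives
}

// get fetches url with the given client and returns the whole body.
func get(c *http.Client, url string) ([]byte, error) {
	res, err := c.Get(url)
	if err != nil {
		return nil, err
	}
	defer res.Body.Close()
	return io.ReadAll(res.Body)
}

func main() {
	client := newNoKeepAliveClient()
	fmt.Println("keep-alives disabled:", keepAlivesDisabled(client))
	// In the program from the question, each request would then become:
	// body, err := get(client, "https://hacker-news.firebaseio.com/v0/topstories.json")
}
```

Disabling keep-alives costs a new connection per request, so it is a workaround for a misbehaving server rather than a general default.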

How can I efficiently download a large file using Go?

Is there a way to download a large file using Go that will store the content directly into a file instead of storing it all in memory before writing it to a file? Because the file is so big, storing it all in memory before writing it to a file is going to use up all the memory.
I'll assume you mean download via http (error checks omitted for brevity):
import ("net/http"; "io"; "os")
...
out, err := os.Create("output.txt")
defer out.Close()
...
resp, err := http.Get("http://example.com/")
defer resp.Body.Close()
...
n, err := io.Copy(out, resp.Body)
The http.Response's Body is a Reader, so you can use any functions that take a Reader, to, e.g. read a chunk at a time rather than all at once. In this specific case, io.Copy() does the gruntwork for you.
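To illustrate what "a chunk at a time" means, here is a rough sketch of the kind of loop that io.Copy saves you from writing. copyChunks and demo are hypothetical names, and a strings.Reader and a bytes.Buffer stand in for resp.Body and the output file so the example runs without network or disk access.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"strings"
)

// copyChunks moves data from src to dst through a fixed-size buffer, so
// memory use stays at chunkSize bytes no matter how large the source is.
func copyChunks(dst io.Writer, src io.Reader, chunkSize int) (int64, error) {
	buf := make([]byte, chunkSize)
	var total int64
	for {
		n, rerr := src.Read(buf)
		if n > 0 {
			// Write whatever was read before looking at the read error.
			if _, werr := dst.Write(buf[:n]); werr != nil {
				return total, werr
			}
			total += int64(n)
		}
		if rerr == io.EOF {
			return total, nil
		}
		if rerr != nil {
			return total, rerr
		}
	}
}

// demo copies a short string in 4-byte chunks and returns the byte count
// and the copied content.
func demo() (int64, string, error) {
	src := strings.NewReader("0123456789abcdef") // stands in for resp.Body
	var dst bytes.Buffer                         // stands in for the output file
	n, err := copyChunks(&dst, src, 4)
	return n, dst.String(), err
}

func main() {
	n, content, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(n, content)
}
```

In practice io.Copy is the better choice; the loop is shown only to make the streaming behaviour visible.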
A more descriptive version of Steve M's answer.
import (
"fmt"
"io"
"net/http"
"os"
)
func downloadFile(filepath string, url string) (err error) {
// Create the file
out, err := os.Create(filepath)
if err != nil {
return err
}
defer out.Close()
// Get the data
resp, err := http.Get(url)
if err != nil {
return err
}
defer resp.Body.Close()
// Check server response
if resp.StatusCode != http.StatusOK {
return fmt.Errorf("bad status: %s", resp.Status)
}
// Write the body to the file
_, err = io.Copy(out, resp.Body)
if err != nil {
return err
}
return nil
}
The answer selected above using io.Copy is exactly what you need, but if you are interested in additional features like resuming broken downloads, auto-naming files, checksum validation or monitoring progress of multiple downloads, check out the grab package.
Here is a sample. https://github.com/thbar/golang-playground/blob/master/download-files.go
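Here is some additional code that might help. Note that it reads the whole body into memory with ioutil.ReadAll, so it is not suited to very large files.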
func HTTPDownload(uri string) ([]byte, error) {
fmt.Printf("HTTPDownload From: %s.\n", uri)
res, err := http.Get(uri)
if err != nil {
log.Fatal(err)
}
defer res.Body.Close()
d, err := ioutil.ReadAll(res.Body)
if err != nil {
log.Fatal(err)
}
fmt.Printf("ReadFile: Size of download: %d\n", len(d))
return d, err
}
func WriteFile(dst string, d []byte) error {
fmt.Printf("WriteFile: Size of download: %d\n", len(d))
err := ioutil.WriteFile(dst, d, 0444)
if err != nil {
log.Fatal(err)
}
return err
}
func DownloadToFile(uri string, dst string) {
fmt.Printf("DownloadToFile From: %s.\n", uri)
if d, err := HTTPDownload(uri); err == nil {
fmt.Printf("downloaded %s.\n", uri)
if WriteFile(dst, d) == nil {
fmt.Printf("saved %s as %s\n", uri, dst)
}
}
}
