How to efficiently store html response to a file in golang - go

I'm trying to build a crawler in Go. I'm using the net/http library to download the HTML from a URL, and I want to save the http.Response and its http.Header to a file.
How can I convert these two values into strings so that they can be written to a text file?
I also saw an earlier question about parsing a stored HTTP response file: Parse HTTP requests and responses from text file in Go. Is there a way to save the URL's response in that format?

Go has an httputil package with a response dump.
https://golang.org/pkg/net/http/httputil/#DumpResponse.
The second argument of response dump is a bool of whether or not to include the body. So if you want to save just the header to a file, set that to false.
An example function that would dump the response to a file could be:
import (
"io/ioutil"
"net/http"
"net/http/httputil"
)
func dumpResponse(resp *http.Response, filename string) error {
dump, err := httputil.DumpResponse(resp, true)
if err != nil {
return err
}
return ioutil.WriteFile(filename, dump, 0644)
}
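As a minimal sketch of the header-only case described above: passing false as the second argument makes DumpResponse return just the status line and headers. The sampleResponse helper and its contents are hypothetical and exist only so the example runs without network access.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httputil"
	"strings"
)

// sampleResponse builds an in-memory response (hypothetical data) so the
// sketch does not need a real server.
func sampleResponse() *http.Response {
	body := "<html></html>"
	return &http.Response{
		Status:        "200 OK",
		StatusCode:    http.StatusOK,
		Proto:         "HTTP/1.1",
		ProtoMajor:    1,
		ProtoMinor:    1,
		Header:        http.Header{"Content-Type": []string{"text/html"}},
		Body:          io.NopCloser(strings.NewReader(body)),
		ContentLength: int64(len(body)),
	}
}

// dumpHeaders returns the status line and headers only; the false argument
// tells DumpResponse to leave the body out of the dump.
func dumpHeaders(resp *http.Response) (string, error) {
	dump, err := httputil.DumpResponse(resp, false)
	if err != nil {
		return "", fmt.Errorf("dump response: %w", err)
	}
	return string(dump), nil
}

func main() {
	out, err := dumpHeaders(sampleResponse())
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The returned string can be passed to a file writer in the same way as the full dump in dumpResponse.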

Edit: Thanks to @JimB for pointing to the http.Response.Write method, which makes this a lot easier than what I proposed in the beginning:
resp, err := http.Get("http://google.com/")
if err != nil {
log.Panic(err)
}
defer resp.Body.Close()
f, err := os.Create("output.txt")
if err != nil {
log.Panic(err)
}
defer f.Close()
if err := resp.Write(f); err != nil {
log.Panic(err)
}
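The question also asks whether the saved file can be parsed again later. A sketch of that round trip, assuming the file was produced by Response.Write: the wire format it writes can be read back with http.ReadResponse. A bytes.Buffer stands in for the file here, and sampleResponse is a hypothetical helper so the example needs no network.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// sampleResponse builds an in-memory response (hypothetical data).
func sampleResponse() *http.Response {
	body := "<html></html>"
	return &http.Response{
		Status:        "200 OK",
		StatusCode:    http.StatusOK,
		Proto:         "HTTP/1.1",
		ProtoMajor:    1,
		ProtoMinor:    1,
		Header:        http.Header{"Content-Type": []string{"text/html"}},
		Body:          io.NopCloser(strings.NewReader(body)),
		ContentLength: int64(len(body)),
	}
}

// saveAndReload writes resp in HTTP/1.x wire format, parses it back, and
// returns the status code and body of the re-read response.
func saveAndReload(resp *http.Response) (int, string, error) {
	var buf bytes.Buffer // stands in for the file on disk
	if err := resp.Write(&buf); err != nil {
		return 0, "", fmt.Errorf("write response: %w", err)
	}
	// A nil request makes ReadResponse assume the response answers a GET.
	parsed, err := http.ReadResponse(bufio.NewReader(&buf), nil)
	if err != nil {
		return 0, "", fmt.Errorf("read response: %w", err)
	}
	defer parsed.Body.Close()
	body, err := io.ReadAll(parsed.Body)
	if err != nil {
		return 0, "", fmt.Errorf("read body: %w", err)
	}
	return parsed.StatusCode, string(body), nil
}

func main() {
	code, body, err := saveAndReload(sampleResponse())
	if err != nil {
		panic(err)
	}
	fmt.Println(code, body)
}
```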
This was my first answer:
You could do something like this:
resp, err := http.Get("http://google.com/")
if err != nil {
panic(err)
}
defer resp.Body.Close()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
panic(err)
}
// write the whole body
err = ioutil.WriteFile("body.txt", body, 0644)
if err != nil {
panic(err)
}
This was the edit to my first answer:
Thanks to @Hector Correa, who added the header part. Here is a more comprehensive snippet that targets your whole question. It writes the headers followed by the body of the response to output.txt:
//get the response
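resp, err := http.Get("http://google.com/")
if err != nil {
panic(err)
}
defer resp.Body.Close()
//body
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
panic(err)
}
//header, one "Name: value" line per value
var header string
for name, values := range resp.Header {
for _, value := range values {
header += fmt.Sprintf("%s: %s\n", name, value)
}
}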
//append all to one slice
var write []byte
write = append(write, []byte(header)...)
write = append(write, body...)
//write it to a file
err = ioutil.WriteFile("output.txt", write, 0644)
if err != nil {
panic(err)
}

Following on from the answer by @Riscie, you could also pick up the headers from the response with something like this:
for header, values := range resp.Header {
for _, value := range values {
log.Printf("\t\t %s %s", header, value)
}
}
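As an alternative to looping by hand, http.Header has a Write method that serializes all headers in wire format, one "Name: value" line per value with keys sorted. A small sketch follows; the sampleHeader contents are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// sampleHeader returns a hypothetical header set, including a repeated key.
func sampleHeader() http.Header {
	h := http.Header{}
	h.Add("Content-Type", "text/html")
	h.Add("Set-Cookie", "a=1")
	h.Add("Set-Cookie", "b=2")
	return h
}

// headerText serializes h with Header.Write, which emits "Name: value\r\n"
// for every value of every key.
func headerText(h http.Header) (string, error) {
	var sb strings.Builder
	if err := h.Write(&sb); err != nil {
		return "", fmt.Errorf("write headers: %w", err)
	}
	return sb.String(), nil
}

func main() {
	text, err := headerText(sampleHeader())
	if err != nil {
		panic(err)
	}
	fmt.Print(text)
}
```

Because Write accepts any io.Writer, the same call can target an *os.File directly instead of a strings.Builder.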

Related

HTTP Post containing binary data in golang [duplicate]

This question already has answers here:
POST data using the Content-Type multipart/form-data
I hope I can explain this right. I'm trying to make an HTTP post request that contains binary data (a file). This is for DeepStack image processing. In Python I have the following working:
image_data = open(file,"rb").read()
try:
    response = requests.post("http://deepstack.local:82/v1/vision/detection",files={"image":image_data},timeout=15).json()
In Go, I started with the basic example from here: https://golangtutorial.dev/tips/http-post-json-go/
Modifying this a bit for my use, the relevant lines are:
data, err := ioutil.ReadFile(tempPath + file.Name())
if err != nil {
log.Print(err)
}
httpposturl := "http://deepstack.local:82/v1/vision/custom/combined"
fmt.Println("HTTP JSON POST URL:", httpposturl)
var jsonData = []byte(`{"image": ` + data + `}`)
request, error := http.NewRequest("POST", httpposturl, bytes.NewBuffer(jsonData))
request.Header.Set("Content-Type", "application/json; charset=UTF-8")
This results in an error:
invalid operation: `{"image": ` + data (mismatched types untyped string and []byte)
the "data" variable at this point is []uint8 ([]byte). I realize, at a high level, what is wrong. I'm trying to join two data types that are not the same. That's about it though. I've tried a bunch of stuff that I'm pretty sure anyone familiar with Go would immediately realize was wrong (declaring jsonData as a byte, converting data to a string, using os.Open instead of ioutil.ReadFile, etc.). I'm just kind of stumbling around blind though. I can't find an example that doesn't use a plain string as the JSON data.
I would appreciate any thoughts.
--- ANSWER ---
I'm marking Dietrich Epp's answer as accepted, because he gave me what I asked for. However, RedBlue in the comments gave me what I actually needed. Thank you both. The code below is modified just a bit from this answer: https://stackoverflow.com/a/56696333/2707357
Change the url variable to your DeepStack server, and the file name to one that actually exists, and the response body should return the necessary information.
package main
import (
"bytes"
"fmt"
"io"
"io/ioutil"
"mime/multipart"
"net/http"
"os"
)
func createMultipartFormData(fieldName, fileName string) (bytes.Buffer, *multipart.Writer) {
var b bytes.Buffer
var err error
w := multipart.NewWriter(&b)
var fw io.Writer
file := mustOpen(fileName)
if fw, err = w.CreateFormFile(fieldName, file.Name()); err != nil {
fmt.Println("Error: ", err)
}
if _, err = io.Copy(fw, file); err != nil {
fmt.Println("Error: ", err)
}
w.Close()
return b, w
}
func mustOpen(f string) *os.File {
r, err := os.Open(f)
if err != nil {
pwd, _ := os.Getwd()
fmt.Println("PWD: ", pwd)
panic(err)
}
return r
}
func main() {
url := "http://deepstack.local:82/v1/vision/custom/combined"
b, w := createMultipartFormData("image", "C:\\go_sort\\temp\\person.jpg")
req, err := http.NewRequest("POST", url, &b)
if err != nil {
return
}
// Don't forget to set the content type, this will contain the boundary.
req.Header.Set("Content-Type", w.FormDataContentType())
client := &http.Client{}
response, err := client.Do(req)
if err != nil {
panic(err)
}
defer response.Body.Close()
fmt.Println("response Status:", response.Status)
fmt.Println("response Headers:", response.Header)
body, _ := ioutil.ReadAll(response.Body)
fmt.Println("response Body:", string(body))
}
It's really such a small error. This is all your question boils down to, as far as I can tell:
var data []byte // with some value
jsonData := []byte(`{"image": ` + data + `}`)
All you have to do is change this to use append() or something similar:
jsonData := append(
append([]byte(`{"image": `), data...),
'}')
The reason is that you can't use + to concatenate []byte in Go. You can use append(), though.
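A short sketch of the two concatenation styles: the chained append from the answer above, and a bytes.Buffer, which may read more easily when there are many pieces. The function names and the sample payload are hypothetical. Note that the result is only valid JSON if data is itself a valid JSON value; raw image bytes would not be.

```go
package main

import (
	"bytes"
	"fmt"
)

// wrapWithAppend builds {"image": <data>} by chaining append calls.
func wrapWithAppend(data []byte) []byte {
	out := append([]byte(`{"image": `), data...)
	return append(out, '}')
}

// wrapWithBuffer builds the same bytes by writing the pieces to a buffer.
func wrapWithBuffer(data []byte) []byte {
	var buf bytes.Buffer
	buf.WriteString(`{"image": `)
	buf.Write(data)
	buf.WriteByte('}')
	return buf.Bytes()
}

func main() {
	// Hypothetical payload; it is a JSON string so the output is valid JSON.
	data := []byte(`"placeholder"`)
	fmt.Printf("%s\n", wrapWithAppend(data))
	fmt.Printf("%s\n", wrapWithBuffer(data))
}
```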

Golang bufio from websocket breaking after first read

I am trying to stream JSON text from a websocket. However after an initial read I noticed that the stream seems to break/disconnect. This is from a Pleroma server (think: Mastodon). I am using the default Golang websocket library.
package main
import (
"bufio"
"fmt"
"log"
"golang.org/x/net/websocket"
)
func main() {
origin := "https://poa.st/"
url := "wss://poa.st/api/v1/streaming/?stream=public"
ws, err := websocket.Dial(url, "", origin)
if err != nil {
log.Fatal(err)
}
s := bufio.NewScanner(ws)
for s.Scan() {
line := s.Text()
fmt.Println(line)
}
}
After the initial JSON text response, the for-loop breaks. I would expect it to send a new message every few seconds.
What might be causing this? I am willing to switch to the Gorilla websocket library if I can use it with bufio.
Thanks!
Although the x/net/websocket connection has a Read method with the same signature as the Read method in io.Reader, the connection does not behave like an io.Reader. It will not work as you expect when wrapped with a bufio.Scanner.
The poa.st endpoint sends a stream of messages where each message is a JSON document. Use the following code to read the messages using the Gorilla package:
url := "wss://poa.st/api/v1/streaming/?stream=public"
ws, _, err := websocket.DefaultDialer.Dial(url, nil)
if err != nil {
log.Fatal(err)
}
defer ws.Close()
for {
_, p, err := ws.ReadMessage()
if err != nil {
log.Fatal(err)
}
// p is a []byte containing the JSON document.
fmt.Printf("%s\n", p)
}
The Gorilla package has a helper method for decoding JSON messages. Here's an example of how to use that method.
url := "wss://poa.st/api/v1/streaming/?stream=public"
ws, _, err := websocket.DefaultDialer.Dial(url, nil)
if err != nil {
log.Fatal(err)
}
defer ws.Close()
for {
// The JSON documents are objects containing two fields,
// the event type and the payload. The payload is a JSON
// document itself.
var e struct {
Event string
Payload string
}
err := ws.ReadJSON(&e)
if err != nil {
log.Fatal(err)
}
// TODO: decode e.Payload based on e.Event
}
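For the TODO above, one possible approach is a second json.Unmarshal on the payload string once the event type is known. This sketch uses only encoding/json, so it applies to a message read with either ReadMessage or ReadJSON. The "update" event name and the id and content fields are assumptions about what the server sends; adjust them to the actual payload.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope mirrors the two-field message described above.
type envelope struct {
	Event   string `json:"event"`
	Payload string `json:"payload"`
}

// status holds the payload fields this sketch cares about.
// The field names are assumptions, not confirmed by the source.
type status struct {
	ID      string `json:"id"`
	Content string `json:"content"`
}

// decodeUpdate parses one raw message. ok is false when the event is not
// an "update" and the payload was therefore not decoded.
func decodeUpdate(raw []byte) (s status, ok bool, err error) {
	var e envelope
	if err := json.Unmarshal(raw, &e); err != nil {
		return s, false, fmt.Errorf("decode envelope: %w", err)
	}
	if e.Event != "update" {
		return s, false, nil
	}
	// The payload is a JSON document stored inside a JSON string.
	if err := json.Unmarshal([]byte(e.Payload), &s); err != nil {
		return s, false, fmt.Errorf("decode payload: %w", err)
	}
	return s, true, nil
}

func main() {
	raw := []byte(`{"event":"update","payload":"{\"id\":\"1\",\"content\":\"hello\"}"}`)
	s, ok, err := decodeUpdate(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(ok, s.ID, s.Content)
}
```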

Handle Http Upload Zip file in Golang

I'm using the Go net/http package to receive a zip file uploaded via Postman.
The attachment file link. It is not a dangerous file; feel free to check it out.
Development environment:
Local machine: M1 MacBook Pro, Go 1.17.2 (no issue).
Server: Docker image golang:1.17.5-stretch (has the issue).
Code to read the transSourceFile file from the POST form:
func HandleFileReqTest(w http.ResponseWriter, req *http.Request, params map[string]string) error {
if err := req.ParseMultipartForm(32 << 20); err != nil {
return err
}
file, header, err := req.FormFile("transSourceFile")
if err != nil {
return err
}
defer file.Close()
fmt.Println("header.Size:", header.Size)
return nil
}
I also tried the code below, with no luck:
func HandleFileReqTest(w http.ResponseWriter, req *http.Request, params map[string]string) error {
if err := req.ParseForm(); err != nil {
return err
}
req.ParseMultipartForm(32 << 20)
file, header, err := req.FormFile("transSourceFile")
if err != nil {
return err
}
defer file.Close()
fmt.Println("header.Size:", header.Size)
return nil
}
Result:
The local machine reports the same file size as the original file.
The server running golang:1.17.5-stretch reports a different file size from the original file.
As a result, I'm unable to unzip the file on the server. Can anyone help?
You need to copy the form file to an actual file:
f, err := os.Create("some.zip")
if err != nil {
return err
}
defer f.Close()
n, err := io.Copy(f, file)
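Data isn't being flushed to the file completely. Close the file first to ensure that the data is fully flushed, then check its size.
// create a local file
dst, err := os.Create("filename.zip")
if err != nil {
return err
}
// save it (src is the uploaded form file)
if _, err = io.Copy(dst, src); err != nil {
return err
}
// close the file
if err = dst.Close(); err != nil {
return err
}
// Stat by name: dst.Stat() returns an error once dst is closed.
stat, err := os.Stat("filename.zip")
// Now compare stat.Size() with header.Size.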
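The copy, close, and size check can be sketched as one helper. The names saveUpload and demo, the temporary directory, and the sample content are all hypothetical; in a real handler, src would be the multipart file returned by req.FormFile, and the returned size would be compared with header.Size.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// saveUpload copies src into a new file at path, closes the file, and
// returns the size reported by the filesystem afterwards.
func saveUpload(path string, src io.Reader) (int64, error) {
	dst, err := os.Create(path)
	if err != nil {
		return 0, fmt.Errorf("create: %w", err)
	}
	if _, err := io.Copy(dst, src); err != nil {
		dst.Close()
		return 0, fmt.Errorf("copy: %w", err)
	}
	// Close before measuring, and check its error as well.
	if err := dst.Close(); err != nil {
		return 0, fmt.Errorf("close: %w", err)
	}
	info, err := os.Stat(path)
	if err != nil {
		return 0, fmt.Errorf("stat: %w", err)
	}
	return info.Size(), nil
}

// demo saves the given content into a temporary directory and returns the
// resulting file size.
func demo(content string) (int64, error) {
	dir, err := os.MkdirTemp("", "upload")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	return saveUpload(filepath.Join(dir, "upload.zip"), strings.NewReader(content))
}

func main() {
	n, err := demo("not really a zip")
	if err != nil {
		panic(err)
	}
	fmt.Println("bytes on disk:", n)
}
```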

How to extract files from multipart-form

I'm writing a Go client to create backups via a REST API. The REST API responds to a GET request with multipart form data, so the body of the response (type *http.Response) looks like this:
--1ceb25134a5967272c26c9f3f543e7d26834a5967272c26c9f3f595caf08
Content-Disposition: form-data; name="configuration"; filename="test.gz"
Content-Type: application/x-gzip
...
--1ceb25134a5967272c26c9f3f543e7d26834a5967272c26c9f3f595caf08--
How can I extract the zip file from the response body?
I tried to use the built-in net/http methods, but these require a Request struct.
Use the mime/multipart package. Assuming that resp is the *http.Response, use the following code to iterate through the parts.
contentType := resp.Header.Get("Content-Type")
mediaType, params, err := mime.ParseMediaType(contentType)
if err != nil {
log.Fatal(err)
}
if strings.HasPrefix(mediaType, "multipart/") {
mr := multipart.NewReader(resp.Body, params["boundary"])
for {
p, err := mr.NextPart()
if err == io.EOF {
return
}
if err != nil {
log.Fatal(err)
}
// p.FormName() is the name of the element.
// p.FileName() is the name of the file (if it's a file)
// p is an io.Reader on the part
// The following code prints the part for demonstration purposes.
slurp, err := ioutil.ReadAll(p)
if err != nil {
log.Fatal(err)
}
fmt.Printf("Part %q, %q: %q\n", p.FormName(), p.FileName(), slurp)
}
}
The code in this answer handles errors by calling log.Fatal. Adjust the error handling to meet the needs of your application.
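A variation that returns errors instead of exiting, and collects only the parts that carry a filename, might look like the sketch below. The buildSample helper and its contents are hypothetical and stand in for resp.Body and the Content-Type header of a real response.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime"
	"mime/multipart"
)

// buildSample creates a multipart body in memory (hypothetical content) and
// returns it together with its Content-Type header value.
func buildSample() (*bytes.Buffer, string, error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)
	part, err := w.CreateFormFile("configuration", "test.gz")
	if err != nil {
		return nil, "", err
	}
	if _, err := part.Write([]byte("fake gzip bytes")); err != nil {
		return nil, "", err
	}
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	return &buf, w.FormDataContentType(), nil
}

// extractFiles reads every part that has a filename and returns a map from
// filename to content.
func extractFiles(body io.Reader, contentType string) (map[string][]byte, error) {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return nil, fmt.Errorf("parse content type: %w", err)
	}
	files := make(map[string][]byte)
	mr := multipart.NewReader(body, params["boundary"])
	for {
		p, err := mr.NextPart()
		if err == io.EOF {
			return files, nil
		}
		if err != nil {
			return nil, fmt.Errorf("next part: %w", err)
		}
		if p.FileName() == "" {
			continue // a plain form field, not a file
		}
		data, err := io.ReadAll(p)
		if err != nil {
			return nil, fmt.Errorf("read part: %w", err)
		}
		files[p.FileName()] = data
	}
}

func main() {
	body, contentType, err := buildSample()
	if err != nil {
		panic(err)
	}
	files, err := extractFiles(body, contentType)
	if err != nil {
		panic(err)
	}
	for name, data := range files {
		fmt.Printf("%s: %d bytes\n", name, len(data))
	}
}
```

For large files, copying each part to disk with io.Copy instead of io.ReadAll avoids holding the whole file in memory.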

Multipart file field is unreadable

I am trying to upload photos to Twitter. I created a multipart writer and used it to create a file field named media, but when I send my request, Twitter keeps responding that the media field is missing.
Am I missing something?
Here is my code:
f, err := os.Open("/Users/nikos/Desktop/test.png")
errored:
if nil != err {
fmt.Println(err)
return
}
var img = new(bytes.Buffer)
enc := base64.NewEncoder(base64.StdEncoding, img)
_, err = io.Copy(enc, f)
if nil != err {
goto errored
}
body := new(bytes.Buffer)//Multipart body
writer := multipart.NewWriter(body)
cl, err := twitter.OauthClient.MakeHttpClient(&oauth.AccessToken{
Token: "xxx",
Secret: "yyy",
})
err = writer.WriteField("media_data", img.String())//base64 version of the image (i tried both binary and base64 versions neither will work)
if nil != err {
goto errored
}
part, err := writer.CreateFormFile("media", "test.png")//actual binary file multiparted and it is named media.
if nil != err {
goto errored
}
_, err = io.Copy(part, f)
if nil != err {
goto errored
}
req, err := http.NewRequest("POST",
"https://upload.twitter.com/1.1/media/upload.json",
body)
if nil != err {
goto errored
}
res, err := cl.Do(req)
if nil != err {
goto errored
}
//and twitter responds that there is no field attached named media
_, err = io.Copy(os.Stdout, res.Body)
fmt.Println(res)
if nil != err {
goto errored
}
Update: I just checked the Twitter API upload parameters. Your code snippet uses both the media and media_data fields. You have to use only one of them:
Upload using base64 -> field name is media_data
Upload using raw binary -> field name is media
You also have to add the Content-Type header.
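// Close the multipart writer before building the request, so the closing
// boundary is part of body when its length is measured.
if err := writer.Close(); err != nil {
log.Println(err)
}
req, err := http.NewRequest("POST",
"https://upload.twitter.com/1.1/media/upload.json",
body)
req.Header.Set("Content-Type", writer.FormDataContentType())
// Now fire the http request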
PS: While I was composing this answer, @cerise-limón added a comment within 30 seconds; as mentioned there, also close the multipart writer.
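Putting the two points together (set the Content-Type header, close the writer), a minimal sketch of a request builder could look like this. The function name, URL, and data are hypothetical. The writer is closed before http.NewRequest is called because NewRequest takes the Content-Length from the buffer at that moment, so the terminating boundary has to be in the buffer already.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// newUploadRequest builds a multipart POST with a single file field.
func newUploadRequest(url, field, filename string, data []byte) (*http.Request, error) {
	body := new(bytes.Buffer)
	writer := multipart.NewWriter(body)
	part, err := writer.CreateFormFile(field, filename)
	if err != nil {
		return nil, fmt.Errorf("create form file: %w", err)
	}
	if _, err := part.Write(data); err != nil {
		return nil, fmt.Errorf("write part: %w", err)
	}
	// Close writes the terminating boundary; do this before NewRequest so
	// the Content-Length covers the complete body.
	if err := writer.Close(); err != nil {
		return nil, fmt.Errorf("close writer: %w", err)
	}
	req, err := http.NewRequest(http.MethodPost, url, body)
	if err != nil {
		return nil, fmt.Errorf("new request: %w", err)
	}
	// The content type carries the boundary chosen by the writer.
	req.Header.Set("Content-Type", writer.FormDataContentType())
	return req, nil
}

func main() {
	// Hypothetical endpoint and payload; the request is built but not sent.
	req, err := newUploadRequest("http://example.invalid/upload", "media", "test.png", []byte("fake png"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.ContentLength, req.Header.Get("Content-Type"))
}
```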
Asked in the comments:
Twitter accepts application/octet-stream, so you may not need the approach below.
To add a multipart file part with a user-supplied Content-Type instead of application/octet-stream, you basically have to reimplement the CreateFormFile convenience wrapper with your own content type.
writer := multipart.NewWriter(body)
h := make(textproto.MIMEHeader)
h.Set("Content-Disposition", fmt.Sprintf(`form-data; name="%s"; filename="%s"`,
escapeQuotes(fieldname), escapeQuotes(filename)))
h.Set("Content-Type", "image/png")
part, err := writer.CreatePart(h)
// use part same as before
Definition of escapeQuotes from the mime/multipart package:
var quoteEscaper = strings.NewReplacer("\\", "\\\\", `"`, "\\\"")
func escapeQuotes(s string) string {
return quoteEscaper.Replace(s)
}
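The CreatePart approach can be checked end to end by writing one part and reading it back. In this sketch, writeTypedFile and firstPartInfo are hypothetical helper names, and the field name, filename, and data are placeholders.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/textproto"
	"strings"
)

var quoteEscaper = strings.NewReplacer("\\", "\\\\", `"`, "\\\"")

func escapeQuotes(s string) string {
	return quoteEscaper.Replace(s)
}

// writeTypedFile adds a file part whose Content-Type is chosen by the
// caller instead of the application/octet-stream default.
func writeTypedFile(w *multipart.Writer, field, filename, contentType string, data []byte) error {
	h := make(textproto.MIMEHeader)
	h.Set("Content-Disposition", fmt.Sprintf(`form-data; name="%s"; filename="%s"`,
		escapeQuotes(field), escapeQuotes(filename)))
	h.Set("Content-Type", contentType)
	part, err := w.CreatePart(h)
	if err != nil {
		return err
	}
	_, err = part.Write(data)
	return err
}

// firstPartInfo builds a one-part body, reads it back, and returns the
// Content-Type and filename that a receiver would see.
func firstPartInfo() (string, string, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	if err := writeTypedFile(w, "media", "test.png", "image/png", []byte("fake png")); err != nil {
		return "", "", err
	}
	if err := w.Close(); err != nil {
		return "", "", err
	}
	p, err := multipart.NewReader(&body, w.Boundary()).NextPart()
	if err != nil {
		return "", "", err
	}
	return p.Header.Get("Content-Type"), p.FileName(), nil
}

func main() {
	contentType, filename, err := firstPartInfo()
	if err != nil {
		panic(err)
	}
	fmt.Println(contentType, filename)
}
```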
