Can I stream data from a writer to a reader in golang?

I want to process a number of files whose contents don't fit in the memory of my worker. The solution I have so far involves saving the results of the processing to the /tmp directory before uploading them to S3.
import (
    "bufio"
    "bytes"
    "context"
    "fmt"
    "log"
    "os"
    "runtime"
    "strings"
    "sync"

    "github.com/aws/aws-sdk-go-v2/service/s3"
    "github.com/korovkin/limiter"
    "github.com/xitongsys/parquet-go/parquet"
    "github.com/xitongsys/parquet-go/writer"
)
func DownloadWarc(
    ctx context.Context,
    s3Client *s3.Client,
    warcs []*types.Warc,
    path string,
) error {
    key := fmt.Sprintf("parsed_warc/%s.parquet", path)
    filename := fmt.Sprintf("/tmp/%s", path)
    file, err := os.Create(filename)
    if err != nil {
        return fmt.Errorf("error creating file: %s", err)
    }
    defer file.Close()
    bytesWriter := bufio.NewWriter(file)

    pw, err := writer.NewParquetWriterFromWriter(bytesWriter, new(Page), 4)
    if err != nil {
        return fmt.Errorf("can't create parquet writer: %s", err)
    }
    pw.RowGroupSize = 128 * 1024 * 1024 // 128M
    pw.CompressionType = parquet.CompressionCodec_SNAPPY

    mutex := sync.Mutex{}
    numWorkers := runtime.NumCPU() * 2
    fmt.Printf("Using %d workers\n", numWorkers)
    limit := limiter.NewConcurrencyLimiter(numWorkers)
    for i, warc := range warcs {
        i, warc := i, warc // shadow the loop variables so each closure gets its own copy (needed before Go 1.22)
        limit.Execute(func() {
            log.Printf("%d: %+v", i, warc)
            body, err := GetWarc(ctx, s3Client, warc)
            if err != nil {
                fmt.Printf("error getting warc: %s", err)
                return
            }
            page, err := Parse(body)
            if err != nil {
                key := fmt.Sprintf("unparsed_warc/%s.warc", path)
                s3Client.PutObject(
                    ctx,
                    &s3.PutObjectInput{
                        Body:   bytes.NewReader(body),
                        Bucket: &s3Record.Bucket.Name,
                        Key:    &key,
                    },
                )
                fmt.Printf("error parsing page %s: %s", key, err)
                return
            }
            mutex.Lock()
            err = pw.Write(page)
            pw.Flush(true)
            mutex.Unlock()
            if err != nil {
                fmt.Printf("error writing page: %s", err)
                return
            }
        })
    }
    limit.WaitAndClose()

    err = pw.WriteStop()
    if err != nil {
        return fmt.Errorf("error writing stop: %s", err)
    }
    bytesWriter.Flush()
    file.Seek(0, 0)
    _, err = s3Client.PutObject(
        ctx,
        &s3.PutObjectInput{
            Body:   file,
            Bucket: &s3Record.Bucket.Name,
            Key:    &key,
        },
    )
    if err != nil {
        return fmt.Errorf("error uploading warc: %s", err)
    }
    return nil
}
Is there a way to avoid saving the contents to a temp file and use only a limited-size byte buffer between the writer and the upload function?
In other words, can I begin to stream data to a reader while still writing to the same buffer?

Yes, there is a way to write the same content to multiple writers: io.MultiWriter might allow you to avoid the temp file. However, it might still be good to use one.
I often use io.MultiWriter to write to a list of checksum (sha256, ...) calculators. In fact, the last time I read the S3 client code, I noticed it does this under the hood to calculate checksums. MultiWriter is pretty useful for piping big files between cloud services.
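A minimal sketch of that pattern (the output path and payload here are stand-ins, not from the question):

package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "io"
    "log"
    "os"
    "strings"
)

func main() {
    f, err := os.Create("/tmp/out.bin")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    h := sha256.New()
    // Every byte written to w goes to both the file and the hasher.
    w := io.MultiWriter(f, h)

    if _, err := io.Copy(w, strings.NewReader("some large payload")); err != nil {
        log.Fatal(err)
    }
    fmt.Println("sha256:", hex.EncodeToString(h.Sum(nil)))
}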
Also, if you do end up using temp files, you may want to use os.CreateTemp to create them. Otherwise you may run into file-name collisions when your code is running in two processes or two of your inputs share the same name.
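For example, a small sketch (the name pattern is illustrative):

package main

import (
    "fmt"
    "log"
    "os"
)

func main() {
    // CreateTemp picks a unique name, so concurrent processes don't collide.
    tmp, err := os.CreateTemp("", "parsed_warc_*.parquet")
    if err != nil {
        log.Fatal(err)
    }
    defer os.Remove(tmp.Name()) // clean up when done
    defer tmp.Close()
    fmt.Println("writing to", tmp.Name())
}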
Feel free to clarify your question. I can try to answer again :)

Related

Golang: Facing error while creating .tar.gz file having large name

I am trying to create a .tar.gz file from a folder that contains multiple files/folders. Once the .tar.gz file is created, the files are not extracted properly. I think it's mostly because of large names or paths exceeding some n characters, because the same thing works when the filename/path is small. I referred to https://github.com/golang/go/issues/17630 and tried to add the code below, but it did not help.
header.Uid = 0
header.Gid = 0
I am using the simple code seen below to create the .tar.gz. The approach is: I create a temp folder, do some processing on the files, and from that temp path I create the .tar.gz file; hence in the path below I am using a pre-defined temp folder path.
package main

import (
    "archive/tar"
    "compress/gzip"
    "fmt"
    "io"
    "log"
    "os"
    fp "path/filepath"
)
func main() {
    // Create output file
    out, err := os.Create("output.tar.gz")
    if err != nil {
        log.Fatalln("Error writing archive:", err)
    }
    defer out.Close()
    // Create the archive and write the output to the "out" Writer
    tmpDir := "C:/Users/USERNAME~1/AppData/Local/Temp/temp-241232063"
    err = createArchive1(tmpDir, out)
    if err != nil {
        log.Fatalln("Error creating archive:", err)
    }
    fmt.Println("Archive created successfully")
}
func createArchive1(path string, targetFile *os.File) error {
    gw := gzip.NewWriter(targetFile)
    defer gw.Close()
    tw := tar.NewWriter(gw)
    defer tw.Close()
    // walk through every file in the folder
    err := fp.Walk(path, func(filePath string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }
        // ensure the src actually exists before trying to tar it
        if _, err := os.Stat(filePath); err != nil {
            return err
        }
        if info.IsDir() {
            return nil
        }
        file, err := os.Open(filePath)
        if err != nil {
            return err
        }
        defer file.Close()
        // generate tar header (check the error before touching the header)
        header, err := tar.FileInfoHeader(info, info.Name())
        if err != nil {
            return err
        }
        header.Uid = 0
        header.Gid = 0
        header.Name = filePath //strings.TrimPrefix(filePath, fmt.Sprintf("%s/", fp.Dir(path))) //info.Name()
        // write header
        if err := tw.WriteHeader(header); err != nil {
            return err
        }
        if _, err := io.Copy(tw, file); err != nil {
            return err
        }
        return nil
    })
    return err
}
Please let me know what I am doing wrong.
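One thing worth checking (an editor's sketch, not a confirmed fix): tar entry names should be relative, slash-separated paths, and names longer than the 100-byte USTAR limit need the PAX or GNU header format (supported since Go 1.10). Inside the Walk callback, something like this may help:

// Store a relative, slash-separated name instead of the absolute OS path,
// and ask for PAX so long names survive extraction.
rel, err := fp.Rel(path, filePath)
if err != nil {
    return err
}
header.Name = fp.ToSlash(rel)
header.Format = tar.FormatPAX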

Uploading a file to an internet site

With the code below I can download a file from the internet while monitoring the download percentage.
How can I upload a file to the internet while similarly monitoring the upload progress? I want to upload an executable file to GitHub assets.
package main

import (
    "fmt"
    "io"
    "net/http"
    "os"
    "strings"

    "github.com/dustin/go-humanize"
)
// WriteCounter counts the number of bytes written to it. It implements the io.Writer interface
// and we can pass it into io.TeeReader(), which will report progress on each write cycle.
type WriteCounter struct {
    Total uint64
}

func (wc *WriteCounter) Write(p []byte) (int, error) {
    n := len(p)
    wc.Total += uint64(n)
    wc.PrintProgress()
    return n, nil
}

func (wc WriteCounter) PrintProgress() {
    // Clear the line by using a carriage return to go back to the start and remove
    // the remaining characters by filling the line with spaces
    fmt.Printf("\r%s", strings.Repeat(" ", 35))
    // Return again and print the current status of the download
    // We use the humanize package to print the bytes in a meaningful way (e.g. 10 MB)
    fmt.Printf("\rDownloading... %s complete", humanize.Bytes(wc.Total))
}
func main() {
    fmt.Println("Download Started")
    fileUrl := "https://upload.wikimedia.org/wikipedia/commons/d/d6/Wp-w4-big.jpg"
    err := DownloadFile("avatar.jpg", fileUrl)
    if err != nil {
        panic(err)
    }
    fmt.Println("Download Finished")
}

// DownloadFile will download a url to a local file. It's efficient because it
// writes as it downloads and doesn't load the whole file into memory. We pass an io.TeeReader
// into Copy() to report progress on the download.
func DownloadFile(filepath string, url string) error {
    // Create the file with a .tmp extension, so we won't overwrite a
    // file until it's fully downloaded; we remove the .tmp extension once done.
    out, err := os.Create(filepath + ".tmp")
    if err != nil {
        return err
    }
    // Get the data
    resp, err := http.Get(url)
    if err != nil {
        out.Close()
        return err
    }
    defer resp.Body.Close()
    // Create our progress reporter and pass it to be used alongside our writer
    counter := &WriteCounter{}
    if _, err = io.Copy(out, io.TeeReader(resp.Body, counter)); err != nil {
        out.Close()
        return err
    }
    // The progress output uses the same line, so print a new line once the download is finished
    fmt.Print("\n")
    // Close the file without defer so it can happen before Rename()
    out.Close()
    if err = os.Rename(filepath+".tmp", filepath); err != nil {
        return err
    }
    return nil
}
I just modified your code. It works for my file server.
func UploadFile(filepath string, url string) error {
    // Open the file we want to upload
    out, err := os.Open(filepath)
    if err != nil {
        return err
    }
    // Create our progress reporter and pass it to be used alongside our reader
    counter := &WriteCounter{}
    // Post the data; the TeeReader lets the counter see every byte as it is read
    resp, err := http.Post(url, "multipart/form-data", io.TeeReader(out, counter))
    if err != nil {
        out.Close()
        log.Println(err.Error())
        return err
    }
    defer resp.Body.Close()
    // The progress output uses the same line, so print a new line once the upload is finished
    fmt.Print("\n")
    // Close the file without defer
    out.Close()
    return nil
}

Can't upload data to google cloud storage from a chaincode instance in hyperledger fabric

I tried to write a chaincode such that when it's executed in a peer instance, it uploads data to a Google Cloud Storage bucket. The file I'll be uploading is actually stored as small chunks in a folder, so that different peers upload different chunks to the GCS bucket. I'm using the fabcar blueprint to develop this chaincode, and the test-network scripts to execute it. The function I use to upload data works well when executed locally, but when I try to use it in the chaincode, it shows:
Error: endorsement failure during invoke. response: status:500 message:"error in simulation: failed to execute transaction 49a9b96088ff2f32906a6b6c9ba1f4ac0a530779bf8d506b176fcdfb8818afe2: error sending: chaincode stream terminated"
(What I'm doing might sound crazy, but I'm new to Hyperledger Fabric.)
Below is the code sample I'm executing (I think the problem is with the uploadGCS or InitLedger function; FYI: the chaincode execution runs only the InitLedger function, which of course uses uploadGCS).
package main

import (
    "context"
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "os"
    "path/filepath"
    "strconv"
    "strings"
    "time"

    "cloud.google.com/go/storage"
    "github.com/hyperledger/fabric-contract-api-go/contractapi"
    "golang.org/x/oauth2/google"
    "google.golang.org/api/option"
)
type SmartContract struct {
    contractapi.Contract
}

type Data struct {
    Owner           string `json:"owner"`
    File            string `json:"file"`
    FileChunkNumber string `json:"filechunknumber"`
    SHA256          string `json:"sha256"`
}

func uploadGCS(owner, filechunklocation, uploadlocation string) error {
    ct := context.Background()
    creds, err := google.FindDefaultCredentials(ct, storage.ScopeReadOnly)
    if err != nil {
        log.Fatalf("got an error: %s", err)
    }
    client, err := storage.NewClient(ct, option.WithCredentials(creds))
    if err != nil {
        return fmt.Errorf("storage.NewClient: %v", err)
    }
    defer client.Close()
    // Open local file.
    f, err := os.Open(filechunklocation)
    if err != nil {
        return fmt.Errorf("os.Open: %v", err)
    }
    defer f.Close()
    ct, cancel := context.WithTimeout(ct, time.Second*50)
    defer cancel()
    // Upload an object with storage.Writer.
    wc := client.Bucket("btp2016bcs0015-cloud-storage").Object(uploadlocation).NewWriter(ct)
    if _, err = io.Copy(wc, f); err != nil {
        return fmt.Errorf("io.Copy: %v", err)
    }
    if err := wc.Close(); err != nil {
        return fmt.Errorf("Writer.Close: %v", err)
    }
    return nil
}
func (s *SmartContract) InitLedger(ctx contractapi.TransactionContextInterface) error {
    filelocation := "/home/busyfriend/go/src/github.com/hyperledger/fabric-samples/test-network/samplefile---pdf"
    data := []Data{
        {Owner: "ID126859", File: "samplefile.pdf", FileChunkNumber: "1", SHA256: "eb73a20d61c1fb294b0eba4d35568d10c8ddbfe2544a3cacc959d640077673f5"},
        {Owner: "ID126859", File: "samplefile.pdf", FileChunkNumber: "2", SHA256: "92dd8ea8aa0da4a48a2cb45ae38f70f17526b6b50ef80c44367a56de6ec9abf9"},
        {Owner: "ID126859", File: "samplefile.pdf", FileChunkNumber: "3", SHA256: "b97027d261d01f86d1e514a52886add096ddc4e66d15d01e53516dd9d5cfb20b"},
        {Owner: "ID126859", File: "samplefile.pdf", FileChunkNumber: "4", SHA256: "377582f5e62dc3b34e40741f2d70d8f37a029856f75cbe68a6659328258e23a3"},
        {Owner: "ID126859", File: "samplefile.pdf", FileChunkNumber: "5", SHA256: "afb6c6d112d446ac07d78b13957bb440105038411095032de444bf08e3bbdba8"},
        {Owner: "ID126859", File: "samplefile.pdf", FileChunkNumber: "6", SHA256: "e43b885c2bfb47130c54fa70528fb2a91d9d1af1417a0f7c5a4c22d8f16efb01"},
    }
    for i := range data {
        _, dir := filepath.Split(filelocation)
        dir_1 := strings.Split(dir, "---")
        filechunk := dir_1[0] + "_" + data[i].FileChunkNumber
        filechunklocation := filepath.Join(filelocation, filechunk)
        uploadlocation := data[i].Owner + "/" + dir + "/" + filechunk
        err := uploadGCS(data[i].Owner, filechunklocation, uploadlocation)
        if err != nil {
            return fmt.Errorf("Got an error %s", err.Error())
        }
    }
    for i, putdata := range data {
        dataAsBytes, _ := json.Marshal(putdata)
        err := ctx.GetStub().PutState("DATA"+strconv.Itoa(i), dataAsBytes)
        if err != nil {
            return fmt.Errorf("Failed to put to world state. %s", err.Error())
        }
    }
    return nil
}
// uploadData uploads new data to the world state with the given details
func (s *SmartContract) uploadData(ctx contractapi.TransactionContextInterface, dataID string, owner string, filelocation string, filechunknumber string) error {
    // Uploads the file chunk to cloud storage
    _, dir := filepath.Split(filelocation)
    dir_1 := strings.Split(dir, "---")
    filechunk := dir_1[0] + "_" + filechunknumber
    filechunklocation := filepath.Join(filelocation, filechunk)
    uploadlocation := owner + "/" + dir + "/" + filechunk
    err := uploadGCS(owner, filechunklocation, uploadlocation)
    if err != nil {
        fmt.Println(err.Error())
        return err
    }
    // Creates a SHA256 hash of the file chunk
    f, err := os.Open(filechunklocation)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    h := sha256.New()
    if _, err := io.Copy(h, f); err != nil {
        log.Fatal(err)
    }
    data := Data{
        Owner:           owner,
        File:            dir_1[0] + "." + dir_1[1],
        FileChunkNumber: filechunknumber,
        SHA256:          hex.EncodeToString(h.Sum(nil)),
    }
    dataAsBytes, _ := json.Marshal(data)
    return ctx.GetStub().PutState(dataID, dataAsBytes)
}
func main() {
    chaincode, err := contractapi.NewChaincode(new(SmartContract))
    if err != nil {
        fmt.Printf("Error creating cloud chaincode: %s", err.Error())
        return
    }
    if err := chaincode.Start(); err != nil {
        fmt.Printf("Error starting cloud chaincode: %s", err.Error())
    }
}
Can you check the chaincode containers' logs? You will find extra containers created whose names include your chaincode's name and version. If you want to see the logs generated by the chaincode, you need to look into those containers (with docker logs <container>).
In your code, you are logging some errors with Fatal. Instead of the log & continue approach, it'd be better to return errors and fail.
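For instance, the credentials lookup in uploadGCS could return the error instead of calling log.Fatal (a sketch based on the question's own lines):

creds, err := google.FindDefaultCredentials(ct, storage.ScopeReadOnly)
if err != nil {
    // Return the error to the caller instead of killing the chaincode process.
    return fmt.Errorf("google.FindDefaultCredentials: %v", err)
}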
Please note that, according to the endorsement policy (e.g. AND(Org1.member, Org2.member)), one invoke request can be executed on multiple peers. All those executions have to return the same result and put/get exactly the same data to/from the ledger. This is by design, to ensure trust between the different organizations running the same chaincode. (E.g., it would fail if your different peers serialized the same object with the attributes in a different order before putting it on the ledger.)
My opinion is to keep the I/O operations separate and just put hashes on the ledger. Your approach has something in common with the off-chain data concept; please have a look at it before reconsidering your design.

Editing a zip file in memory

I am trying to edit a zip file in memory in Go and return the zipped file through an HTTP response.
The goal is to add a few files at a given path inside the zip file. For example, I add a log.txt file under the path/to/file route in the zipped folder.
All this should be done without saving the file to disk or editing the original file.
I have implemented a simple version of real-time stream compression, which can correctly compress a single file. If you want it to run efficiently, it still needs a lot of optimization.
This is only for reference. If you build on it, you should set more useful HTTP headers before compressing so that the client can correctly process the response data (see the sketch after the code).
package main

import (
    "archive/zip"
    "io"
    "net/http"
    "os"

    "github.com/gin-gonic/gin"
)

func main() {
    engine := gin.Default()
    engine.GET("/log.zip", func(c *gin.Context) {
        f, err := os.Open("./log.txt")
        if err != nil {
            c.String(http.StatusInternalServerError, err.Error())
            return
        }
        defer f.Close()
        info, err := f.Stat()
        if err != nil {
            c.String(http.StatusInternalServerError, err.Error())
            return
        }
        z := zip.NewWriter(c.Writer)
        head, err := zip.FileInfoHeader(info)
        if err != nil {
            c.String(http.StatusInternalServerError, err.Error())
            return
        }
        defer z.Close()
        w, err := z.CreateHeader(head)
        if err != nil {
            c.String(http.StatusInternalServerError, err.Error())
            return
        }
        _, err = io.Copy(w, f)
        if err != nil {
            c.String(http.StatusInternalServerError, err.Error())
            return
        }
    })
    engine.Run("127.0.0.1:8080")
}
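A sketch of the headers the answer mentions (the filename is illustrative); set them inside the handler before the first byte is written to c.Writer:

// Before zip.NewWriter(c.Writer) produces any output:
c.Header("Content-Type", "application/zip")
c.Header("Content-Disposition", `attachment; filename="log.zip"`)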
So after hours of tireless work I figured out my approach was bad, or maybe not possible at my level of knowledge, so here is a not-so-optimal solution. But it works, and if your file is not large it should be okay for you.
So you have a file template.zip and you want to add extra files. My initial approach was to copy the whole file into memory and edit it from there, but I was having complications.
My next approach was to recreate the file in memory, file by file. To do that I need to know every file in the directory, so I used the code below to get all my files into a list:
root := "template"
err = filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
if info.IsDir() {
return nil
}append(files,path)}
Now I have all my files, and I can create a buffer to hold them:
buf := new(bytes.Buffer)
// Create a new zip archive.
zipWriter := zip.NewWriter(buf)
Now, with the zip archive, I can write all my old files to it, copying their contents as I go:
for _, file := range files {
    zipFile, err := zipWriter.Create(file)
    if err != nil {
        fmt.Println(err)
    }
    content, err := ioutil.ReadFile(file)
    if err != nil {
        log.Fatal(err)
    }
    _, err = zipFile.Write(content)
    if err != nil {
        fmt.Println(err)
    }
}
At this point, we have our archive in buf.Bytes(). The remaining code adds the new files and sends the response back to the client:
for _, appCode := range appPageCodeText {
    // appPageCodeText and its fields are app-specific; the original snippet had a
    // placeholder here for the new entry's path inside the archive.
    f, err := zipWriter.Create(appCode.Path)
    if err != nil {
        log.Fatal(err)
    }
    if _, err = f.Write([]byte(appCode.Content)); err != nil {
        log.Fatal(err)
    }
}
err = zipWriter.Close()
if err != nil {
    fmt.Println(err)
}
w.Header().Set("Content-Disposition", "attachment; filename="+"template.zip")
w.Header().Set("Content-Type", "application/zip")
w.Write(buf.Bytes()) // 'Copy' the file to the client

Go file downloader

I have the following code, which is supposed to download a file by splitting it into multiple parts. But right now it only works on images; when I try downloading other files, like tar files, the output is an invalid file.
UPDATED:
Used (*os.File).WriteAt instead of Write and removed the os.O_APPEND file mode.
package main

import (
    "errors"
    "flag"
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
    "os"
    "strconv"
)
var file_url string
var workers int
var filename string

func init() {
    flag.StringVar(&file_url, "url", "", "URL of the file to download")
    flag.StringVar(&filename, "filename", "", "Name of downloaded file")
    flag.IntVar(&workers, "workers", 2, "Number of download workers")
}

func get_headers(url string) (map[string]string, error) {
    headers := make(map[string]string)
    resp, err := http.Head(url)
    if err != nil {
        return headers, err
    }
    if resp.StatusCode != 200 {
        return headers, errors.New(resp.Status)
    }
    for key, val := range resp.Header {
        headers[key] = val[0]
    }
    return headers, err
}
func download_chunk(url string, out string, start int, stop int) {
    client := new(http.Client)
    req, _ := http.NewRequest("GET", url, nil)
    req.Header.Add("Range", fmt.Sprintf("bytes=%d-%d", start, stop))
    resp, _ := client.Do(req)
    defer resp.Body.Close()
    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        log.Fatalln(err)
        return
    }
    file, err := os.OpenFile(out, os.O_WRONLY, 0600)
    if err != nil {
        if file, err = os.Create(out); err != nil {
            log.Fatalln(err)
            return
        }
    }
    defer file.Close()
    if _, err := file.WriteAt(body, int64(start)); err != nil {
        log.Fatalln(err)
        return
    }
    fmt.Println(fmt.Sprintf("Range %d-%d: %d", start, stop, resp.ContentLength))
}
func main() {
    flag.Parse()
    headers, err := get_headers(file_url)
    if err != nil {
        fmt.Println(err)
    } else {
        length, _ := strconv.Atoi(headers["Content-Length"])
        bytes_chunk := length / workers
        fmt.Println("file length: ", length)
        for i := 0; i < workers; i++ {
            start := i * bytes_chunk
            stop := start + (bytes_chunk - 1)
            go download_chunk(file_url, filename, start, stop)
        }
        var input string
        fmt.Scanln(&input)
    }
}
Basically, it just reads the length of the file, divides it by the number of workers, then each worker downloads its part using HTTP's Range header; after downloading, the chunk is written at its offset in the file.
If you really ignore as many errors as seen above, then your code cannot be expected to work reliably for any file type.
However, I think I can see one problem in your code: mixing O_APPEND and seeking is probably a mistake (Seek is effectively ignored in that mode). I suggest using (*os.File).WriteAt instead.
IIRC, O_APPEND forces every write to happen at the [current] end of the file. However, your download_chunk function instances for the file parts can execute in unpredictable order, thus "reordering" the parts. The result is then a corrupted file.
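A minimal sketch of that suggestion (the HTTP ranging from the question is omitted; the length and offsets echo the sample output in the next answer, and Truncate pre-sizes the file once so workers can write at independent offsets):

package main

import (
    "log"
    "os"
    "sync"
)

func main() {
    const length = 20902 // e.g. taken from the Content-Length header
    file, err := os.Create("out.bin")
    if err != nil {
        log.Fatalln(err)
    }
    defer file.Close()
    // Pre-size the file once, instead of letting each worker create/truncate it.
    if err := file.Truncate(length); err != nil {
        log.Fatalln(err)
    }

    var wg sync.WaitGroup
    for _, off := range []int64{0, 10451} {
        wg.Add(1)
        go func(off int64) {
            defer wg.Done()
            chunk := make([]byte, 1) // stand-in for a downloaded byte range
            // WriteAt is safe for concurrent use when the ranges don't overlap.
            if _, err := file.WriteAt(chunk, off); err != nil {
                log.Println(err)
            }
        }(off)
    }
    wg.Wait()
}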
1. The order in which the goroutines run is not guaranteed. E.g., the execution result may be as follows:
...
file length: 20902
Range 10451-20901: 10451
Range 0-10450: 10451
...
So the chunks can't just be appended.
2. Writes of the chunk data must be guarded by a sync.Mutex.
