I am trying to deploy Go code on Heroku. The code needs a text file as input, and I have to fetch this text file from an S3 bucket. My Go code takes the filename as an argument. Can someone provide a code snippet for reading a file from S3 and storing its contents in a local file?
My Go code:
func getDomains(path string) (lines []string, Error error) {
    file, err := os.Open(path)
    if err != nil {
        log.Fatalln(err)
    }
    defer file.Close()

    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        lines = append(lines, scanner.Text())
    }
    return lines, scanner.Err()
}

func Process(w http.ResponseWriter, r *http.Request) {
    urls := make(chan *Http, Threads*10)
    list, err := getDomains("**NEED A TEXT FILE FROM S3 HERE as an argument**")
    if err != nil {
        log.Fatalln(err)
    }

    var wg sync.WaitGroup
    for i := 0; i < Threads; i++ {
        wg.Add(1)
        go func() {
            for url := range urls {
                url.DNS()
            }
            wg.Done()
        }()
    }

    for i := 0; i < len(list); i++ {
        Progress := fmt.Sprintln(w, len(list))
        urls <- &Http{Url: list[i], Num: Progress}
    }
    close(urls)
    wg.Wait()

    fmt.Printf("\r%s", strings.Repeat(" ", 100))
    fmt.Fprintln(w, "\rTask completed.\n")
}
Can someone suggest a good library for reading a file from S3 into a text file? I cannot download the file beforehand, because the code has to be deployed on Heroku.
An example code snippet would be highly appreciated!
The code snippet below should work (given that you have installed the proper dependencies):
package main

import (
    "fmt"
    "log"
    "os"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
    // NOTE: you need to store your AWS credentials in ~/.aws/credentials

    // 1) Define your bucket and item names
    bucket := "<YOUR_BUCKET_NAME>"
    item := "<YOUR_ITEM_NAME>"

    // 2) Create an AWS session
    sess, err := session.NewSession(&aws.Config{
        Region: aws.String("us-west-2"),
    })
    if err != nil {
        log.Fatalf("Unable to create session, %v", err)
    }

    // 3) Create a new AWS S3 downloader
    downloader := s3manager.NewDownloader(sess)

    // 4) Download the item from the bucket. If an error occurs, log it and exit.
    //    Otherwise, notify the user that the download succeeded.
    file, err := os.Create(item)
    if err != nil {
        log.Fatalf("Unable to create file %q, %v", item, err)
    }
    defer file.Close()

    numBytes, err := downloader.Download(file,
        &s3.GetObjectInput{
            Bucket: aws.String(bucket),
            Key:    aws.String(item),
        })
    if err != nil {
        log.Fatalf("Unable to download item %q, %v", item, err)
    }

    fmt.Println("Downloaded", file.Name(), numBytes, "bytes")
}
For more details you can check the AWS Go SDK documentation and the GitHub example.
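If you would rather not touch the local filesystem at all (your handler only needs the lines), here is a minimal sketch that downloads the object into memory with aws.NewWriteAtBuffer and scans it line by line, so it could replace the getDomains(path) call in Process. The bucket, key and region values are placeholders you would substitute:

import (
    "bufio"
    "bytes"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// getDomainsFromS3 downloads the object into memory and splits it into lines,
// mirroring what getDomains does for a local file.
func getDomainsFromS3(bucket, key string) ([]string, error) {
    sess, err := session.NewSession(&aws.Config{Region: aws.String("us-west-2")})
    if err != nil {
        return nil, err
    }

    // WriteAtBuffer satisfies io.WriterAt, so the downloader can write into memory.
    buf := aws.NewWriteAtBuffer([]byte{})
    downloader := s3manager.NewDownloader(sess)
    if _, err := downloader.Download(buf, &s3.GetObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    }); err != nil {
        return nil, err
    }

    var lines []string
    scanner := bufio.NewScanner(bytes.NewReader(buf.Bytes()))
    for scanner.Scan() {
        lines = append(lines, scanner.Text())
    }
    return lines, scanner.Err()
}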
Using the current stable AWS library for Go:
sess := session.Must(session.NewSession(&aws.Config{
    ....
}))

svc := s3.New(sess)

rawObject, err := svc.GetObject(
    &s3.GetObjectInput{
        Bucket: aws.String("toto"),
        Key:    aws.String("toto.txt"),
    })
if err != nil {
    log.Fatal(err)
}

buf := new(bytes.Buffer)
buf.ReadFrom(rawObject.Body)
myFileContentAsString := buf.String()
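If what you actually need is a slice of lines (as in the getDomains function above), you can scan the object body directly instead of buffering it into one string first. A small sketch using the rawObject from the snippet above:

// Sketch: scan the S3 object body into lines. rawObject comes from the
// GetObject call above; the body has to be closed when you are done.
defer rawObject.Body.Close()

var lines []string
scanner := bufio.NewScanner(rawObject.Body)
for scanner.Scan() {
    lines = append(lines, scanner.Text())
}
if err := scanner.Err(); err != nil {
    log.Fatal(err)
}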
Here is a function for getting an object using V2 of the SDK (adapted from examples in https://github.com/aws/aws-sdk-go-v2):
Note: No Error handling - demo code only.
package s3demo

import (
    "context"
    "fmt"
    "io/ioutil"
    "os"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/aws/external"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

func GetObjectWithV2SDKDemo() {
    bucket := "YOUR_BUCKET"
    key := "YOUR_OBJECT_KEY"
    fileName := "YOUR_FILE_PATH"

    // may need AWS_PROFILE and AWS_REGION populated as environment variables
    cfg, err := external.LoadDefaultAWSConfig()
    if err != nil {
        panic("failed to load config, " + err.Error())
    }

    svc := s3.New(cfg)
    ctx := context.Background()
    req := svc.GetObjectRequest(&s3.GetObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    })

    resp, err := req.Send(ctx)
    if err != nil {
        panic(err)
    }

    s3objectBytes, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }

    // create file
    f, err := os.Create(fileName)
    if err != nil {
        panic(err)
    }
    defer f.Close()

    bytesWritten, err := f.Write(s3objectBytes)
    if err != nil {
        panic(err)
    }

    fmt.Printf("Fetched %d bytes for S3Object\n", bytesWritten)
    fmt.Printf("successfully downloaded data from %s/%s to file %s\n", bucket, key, fileName)
}
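Note that the snippet above targets an early preview of the V2 SDK; in the GA releases the external package and the GetObjectRequest/Send pattern were replaced by config.LoadDefaultConfig and plain client methods. A rough sketch of the equivalent against the GA API (names here are illustrative, not tested against your setup):

package s3demo

import (
    "context"
    "io"
    "os"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

// GetObjectToFile downloads bucket/key into fileName using the GA v2 API.
func GetObjectToFile(ctx context.Context, bucket, key, fileName string) error {
    cfg, err := config.LoadDefaultConfig(ctx)
    if err != nil {
        return err
    }

    client := s3.NewFromConfig(cfg)
    resp, err := client.GetObject(ctx, &s3.GetObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    })
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    f, err := os.Create(fileName)
    if err != nil {
        return err
    }
    defer f.Close()

    _, err = io.Copy(f, resp.Body)
    return err
}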
Related
I want to process a number of files whose contents don't fit in the memory of my worker. The solution I found so far involves saving the results of the processing to the /tmp directory before uploading them to S3.
import (
    "bufio"
    "bytes"
    "context"
    "fmt"
    "log"
    "os"
    "runtime"
    "strings"
    "sync"

    "github.com/aws/aws-sdk-go-v2/service/s3"
    "github.com/korovkin/limiter"
    "github.com/xitongsys/parquet-go/parquet"
    "github.com/xitongsys/parquet-go/writer"
)

func DownloadWarc(
    ctx context.Context,
    s3Client *s3.Client,
    warcs []*types.Warc,
    path string,
) error {
    key := fmt.Sprintf("parsed_warc/%s.parquet", path)
    filename := fmt.Sprintf("/tmp/%s", path)

    file, err := os.Create(filename)
    if err != nil {
        return fmt.Errorf("error creating file: %s", err)
    }
    defer file.Close()

    bytesWriter := bufio.NewWriter(file)

    pw, err := writer.NewParquetWriterFromWriter(bytesWriter, new(Page), 4)
    if err != nil {
        return fmt.Errorf("Can't create parquet writer: %s", err)
    }

    pw.RowGroupSize = 128 * 1024 * 1024 //128M
    pw.CompressionType = parquet.CompressionCodec_SNAPPY

    mutex := sync.Mutex{}

    numWorkers := runtime.NumCPU() * 2
    fmt.Printf("Using %d workers\n", numWorkers)
    limit := limiter.NewConcurrencyLimiter(numWorkers)

    for i, warc := range warcs {
        limit.Execute(func() {
            log.Printf("%d: %+v", i, warc)

            body, err := GetWarc(ctx, s3Client, warc)
            if err != nil {
                fmt.Printf("error getting warc: %s", err)
                return
            }

            page, err := Parse(body)
            if err != nil {
                key := fmt.Sprintf("unparsed_warc/%s.warc", path)
                s3Client.PutObject(
                    ctx,
                    &s3.PutObjectInput{
                        Body:   bytes.NewReader(body),
                        Bucket: &s3Record.Bucket.Name,
                        Key:    &key,
                    },
                )
                fmt.Printf("error getting page %s: %s", key, err)
                return
            }

            mutex.Lock()
            err = pw.Write(page)
            pw.Flush(true)
            mutex.Unlock()
            if err != nil {
                fmt.Printf("error writing page: %s", err)
                return
            }
        })
    }

    limit.WaitAndClose()

    err = pw.WriteStop()
    if err != nil {
        return fmt.Errorf("error writing stop: %s", err)
    }

    bytesWriter.Flush()

    file.Seek(0, 0)
    _, err = s3Client.PutObject(
        ctx,
        &s3.PutObjectInput{
            Body:   file,
            Bucket: &s3Record.Bucket.Name,
            Key:    &key,
        },
    )
    if err != nil {
        return fmt.Errorf("error uploading warc: %s", err)
    }

    return nil
}
Is there a way to avoid saving the contents to a temporary file and use only a limited-size byte buffer between the writer and the upload function?
In other words, can I begin to stream data to a reader while still writing to the same buffer?
Yes, there is a way to write the same content to multiple writers: io.MultiWriter might let you avoid the temp file, although it can still be a good idea to use one.
I often use io.MultiWriter to write to a list of checksum (sha256...) calculators. Actually, last time I read the S3 client code, I noticed it does this under the hood to calculate the checksum. MultiWriter is pretty useful for piping big files between cloud places.
Also, if you end up using temp files, you may want to create them with os.CreateTemp; otherwise you can run into file-name collisions if your code is running in two processes or your files have the same name.
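For illustration, here is a minimal sketch of that pattern: a temp file created with os.CreateTemp plus a sha256 hash, both fed through io.MultiWriter. The source reader is just a placeholder for whatever you are piping:

package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "io"
    "log"
    "os"
    "strings"
)

func main() {
    // Placeholder source; in the real code this would be the parquet writer's
    // output or the S3 object body.
    src := strings.NewReader("example payload")

    // os.CreateTemp avoids file-name collisions between processes.
    tmp, err := os.CreateTemp("", "upload-*.parquet")
    if err != nil {
        log.Fatal(err)
    }
    defer os.Remove(tmp.Name())
    defer tmp.Close()

    // Every byte written goes to both the temp file and the checksum.
    sum := sha256.New()
    w := io.MultiWriter(tmp, sum)

    if _, err := io.Copy(w, src); err != nil {
        log.Fatal(err)
    }

    fmt.Println("sha256:", hex.EncodeToString(sum.Sum(nil)))
    // tmp can now be rewound with Seek(0, 0) and handed to the S3 uploader.
}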
Feel free to clarify your question. I can try to answer again :)
We have an account on GCP that contains a valid Cloud NAT. Now we want to get those values via the
GCP SDK. I've tried the following and get an empty response (maybe I'm using the wrong API and it's not ListExternalVpnGatewaysRequest):
package main

import (
    "context"
    "fmt"

    compute "cloud.google.com/go/compute/apiv1"
    "google.golang.org/api/iterator"
    computepb "google.golang.org/genproto/googleapis/cloud/compute/v1"
)

func main() {
    ctx := context.Background()
    c, err := compute.NewExternalVpnGatewaysRESTClient(ctx)
    if err != nil {
        fmt.Println(err)
    }
    defer c.Close()

    proj := "dev-proj"
    req := &computepb.ListExternalVpnGatewaysRequest{
        //Filter: new(string),
        //MaxResults: new(uint32),
        //OrderBy: new(string),
        //PageToken: new(string),
        Project: proj,
        //ReturnPartialSuccess: new(bool),
    }
    it := c.List(ctx, req)
    for {
        resp, err := it.Next()
        if err == iterator.Done {
            break
        }
        if err != nil {
            fmt.Println(err)
        }
        // TODO: Use resp.
        _ = resp
        fmt.Println(resp)
    }
}
I need to get the following values using the GCP Go SDK.
Update:
I tried the following as-is and got an error:
package main

import (
    "context"
    "fmt"
    "log"

    "google.golang.org/api/compute/v1"
)

func main() {
    project := "my-proj"
    region := "my-region"
    ctx := context.Background()

    computeService, err := compute.New(ctx)
    if err != nil {
        log.Fatal(err)
    }

    req := computeService.Routers.List(project, region)
    if err := req.Pages(ctx, func(page *compute.RouterList) error {
        for _, router := range page.Items {
            // process each `router` resource:
            fmt.Printf("%#v\n", router)
            // NAT Gateways are found in router.nats
        }
        return nil
    }); err != nil {
        log.Fatal(err)
    }
}
Error is: ./main.go:16:36: cannot use ctx (type context.Context) as type *http.Client in argument to compute.New
A VPN Gateway is not the same as a NAT Gateway.
Use this code to list the routers; the NAT gateways are found within each router.
import (
    "context"
    "fmt"
    "log"

    "golang.org/x/oauth2/google"
    "google.golang.org/api/compute/v1"
)

// Replace with valid values for your project
project := "my-project"
region := "my-region"

ctx := context.Background()

c, err := google.DefaultClient(ctx, compute.CloudPlatformScope)
if err != nil {
    log.Fatal(err)
}

computeService, err := compute.New(c)
if err != nil {
    log.Fatal(err)
}

req := computeService.Routers.List(project, region)
if err := req.Pages(ctx, func(page *compute.RouterList) error {
    for _, router := range page.Items {
        // process each `router` resource:
        fmt.Printf("%#v\n", router)
        // NAT Gateways are found in router.nats
    }
    return nil
}); err != nil {
    log.Fatal(err)
}
SDK Documentation
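As a side note on the compilation error in the update: compute.New expects an *http.Client, which is why passing a context.Context fails. Newer releases of google.golang.org/api also provide compute.NewService, which does take a context directly. A minimal sketch, assuming a reasonably recent version of the package:

// Sketch: NewService resolves credentials via Application Default Credentials.
computeService, err := compute.NewService(ctx)
if err != nil {
    log.Fatal(err)
}
// computeService.Routers.List(project, region) and req.Pages then work
// exactly as in the snippet above.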
I have been struggling for the past few hours trying to upload an image to Firebase Storage, but I can't make it work... The image seems to be corrupted once it is in Firebase.
func (fs *FS) Upload(fileInput []byte, fileName string) error {
    ctx, cancel := context.WithTimeout(context.Background(), fs.defaultTransferTimeout)
    defer cancel()

    bucket, err := fs.client.DefaultBucket()
    if err != nil {
        return err
    }

    object := bucket.Object(fileName)
    writer := object.NewWriter(ctx)
    defer writer.Close()

    if _, err := io.Copy(writer, bytes.NewReader(fileInput)); err != nil {
        return err
    }

    if err := object.ACL().Set(context.Background(), storage.AllUsers, storage.RoleReader); err != nil {
        return err
    }

    return nil
}
I get no error but once uploaded... I get this:
Meanwhile on Google Cloud Storage:
Any thoughts?
The upload is probably fine. The Firebase console is just known to be unable to show previews of content that was not uploaded by a web or mobile client. Try downloading the file locally to verify that it's the same as what you uploaded.
Feel free to also file a feature request with Firebase support about the console.
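If you want to verify from Go rather than by hand, here is a small sketch of such a check using the same bucket handle as the Upload method (the Verify name and the comparison against the original bytes are just for illustration):

// Sketch: read the object back and compare it with the originally uploaded bytes.
func (fs *FS) Verify(ctx context.Context, fileName string, original []byte) error {
    bucket, err := fs.client.DefaultBucket()
    if err != nil {
        return err
    }

    r, err := bucket.Object(fileName).NewReader(ctx)
    if err != nil {
        return err
    }
    defer r.Close()

    downloaded, err := io.ReadAll(r)
    if err != nil {
        return err
    }

    if !bytes.Equal(downloaded, original) {
        return fmt.Errorf("object %q differs from the uploaded content", fileName)
    }
    return nil
}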
You need to send an additional metadata attribute: firebaseStorageDownloadTokens
Like:
import (
    "github.com/google/uuid"
)

func (fs *FS) Upload(fileInput []byte, fileName string) error {
    // create an id
    id := uuid.New()

    ctx, cancel := context.WithTimeout(context.Background(), fs.defaultTransferTimeout)
    defer cancel()

    bucket, err := fs.client.DefaultBucket()
    if err != nil {
        return err
    }

    object := bucket.Object(fileName)
    writer := object.NewWriter(ctx)

    // Set the attribute
    writer.ObjectAttrs.Metadata = map[string]string{"firebaseStorageDownloadTokens": id.String()}
    defer writer.Close()

    if _, err := io.Copy(writer, bytes.NewReader(fileInput)); err != nil {
        return err
    }

    if err := object.ACL().Set(context.Background(), storage.AllUsers, storage.RoleReader); err != nil {
        return err
    }

    return nil
}
The image should appear after these changes.
I'm looking for a way to get logs from a pod in a Kubernetes cluster using Go. I've looked at "https://github.com/kubernetes/client-go" and "https://godoc.org/sigs.k8s.io/controller-runtime/pkg/client", but couldn't understand how to use them for this purpose. I have no issues getting information about a pod or any other object in K8S, except for logs.
For example, I'm using Get() from "https://godoc.org/sigs.k8s.io/controller-runtime/pkg/client#example-Client--Get" to get K8S job info:
found := &batchv1.Job{}
err = r.client.Get(context.TODO(), types.NamespacedName{Name: job.Name, Namespace: job.Namespace}, found)
Please share how you get a pod's logs nowadays.
Any suggestions would be appreciated!
Update:
The solution provided in Kubernetes go client api for log of a particular pod is out of date. It has some tips, but it is not up to date with the current libraries.
Here is what we came up with eventually, using the client-go library:
func getPodLogs(pod corev1.Pod) string {
    podLogOpts := corev1.PodLogOptions{}
    config, err := rest.InClusterConfig()
    if err != nil {
        return "error in getting config"
    }

    // creates the clientset
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        return "error in getting access to K8S"
    }

    req := clientset.CoreV1().Pods(pod.Namespace).GetLogs(pod.Name, &podLogOpts)
    podLogs, err := req.Stream()
    if err != nil {
        return "error in opening stream"
    }
    defer podLogs.Close()

    buf := new(bytes.Buffer)
    _, err = io.Copy(buf, podLogs)
    if err != nil {
        return "error in copy information from podLogs to buf"
    }
    str := buf.String()

    return str
}
I hope it will help someone. Please share your thoughts or your own solutions for getting logs from pods in Kubernetes.
And if you want to read the stream in client-go v11.0.0+, the code looks like this (creating the clientset is left to you):
func GetPodLogs(namespace string, podName string, containerName string, follow bool) error {
    count := int64(100)
    podLogOptions := v1.PodLogOptions{
        Container: containerName,
        Follow:    follow,
        TailLines: &count,
    }

    podLogRequest := clientSet.CoreV1().
        Pods(namespace).
        GetLogs(podName, &podLogOptions)
    stream, err := podLogRequest.Stream(context.TODO())
    if err != nil {
        return err
    }
    defer stream.Close()

    for {
        buf := make([]byte, 2000)
        numBytes, err := stream.Read(buf)
        if numBytes > 0 {
            message := string(buf[:numBytes])
            fmt.Print(message)
        }
        if err == io.EOF {
            break
        }
        if err != nil {
            return err
        }
    }
    return nil
}
The controller-runtime client library does not yet support subresources other than /status, so you would have to use client-go as shown in the other question.
Combining some answers found elsewhere and here to stream (tailing) logs for all containers (init included):
// Note: ctx, clientSet and worker are assumed to be defined in the surrounding scope.
func GetPodLogs(namespace string, podName string) error {
    pod, err := clientSet.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
    if err != nil {
        return err
    }

    wg := &sync.WaitGroup{}
    functionList := []func(){}

    for _, container := range append(pod.Spec.InitContainers, pod.Spec.Containers...) {
        podLogOpts := v1.PodLogOptions{}
        podLogOpts.Follow = true
        podLogOpts.TailLines = &[]int64{int64(100)}[0]
        podLogOpts.Container = container.Name

        podLogs, err := clientSet.CoreV1().Pods(namespace).GetLogs(podName, &podLogOpts).Stream(ctx)
        if err != nil {
            return err
        }
        defer podLogs.Close()

        functionList = append(functionList, func() {
            defer wg.Done()
            reader := bufio.NewScanner(podLogs)
            for reader.Scan() {
                select {
                case <-ctx.Done():
                    return
                default:
                    line := reader.Text()
                    fmt.Println(worker+"/"+podLogOpts.Container, line)
                }
            }
            log.Printf("INFO log EOF " + reader.Err().Error() + ": " + worker + "/" + podLogOpts.Container)
        })
    }

    wg.Add(len(functionList))
    for _, f := range functionList {
        go f()
    }
    wg.Wait()
    return nil
}
The answer by anon_coword got me interested in getting logs in a slightly more complicated case:
I want to perform the action multiple times, and check the logs multiple times.
I want to have many pods that will react the same way.
Here are a few examples: https://github.com/nwaizer/GetPodLogsEfficiently
One example is:
package main

import (
    "bufio"
    "context"
    "fmt"
    "time"

    "GetPodLogsEfficiently/client"
    "GetPodLogsEfficiently/utils"
    corev1 "k8s.io/api/core/v1"
)

func GetPodLogs(cancelCtx context.Context, PodName string) {
    PodLogsConnection := client.Client.Pods(utils.Namespace).GetLogs(PodName, &corev1.PodLogOptions{
        Follow:    true,
        TailLines: &[]int64{int64(10)}[0],
    })
    LogStream, _ := PodLogsConnection.Stream(context.Background())
    defer LogStream.Close()

    reader := bufio.NewScanner(LogStream)
    var line string
    for {
        select {
        case <-cancelCtx.Done():
            // "break" here would only exit the select, so return instead.
            return
        default:
            for reader.Scan() {
                line = reader.Text()
                fmt.Printf("Pod: %v line: %v\n", PodName, line)
            }
        }
    }
}

func main() {
    ctx := context.Background()
    cancelCtx, endGofunc := context.WithCancel(ctx)
    for _, pod := range utils.GetPods().Items {
        fmt.Println(pod.Name)
        go GetPodLogs(cancelCtx, pod.Name)
    }
    time.Sleep(10 * time.Second)
    endGofunc()
}
@Emixam23
I believe you will find this snippet useful.
How to get the dynamic name of a pod?
import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/labels"
)

labelSelector := metav1.LabelSelector{MatchLabels: map[string]string{<LABEL_KEY>: <LABEL_VALUE>}}
listOptions := metav1.ListOptions{
    LabelSelector: labels.Set(labelSelector.MatchLabels).String(),
}

pods, err := k8sClient.CoreV1().Pods(<NAMESPACE>).List(listOptions)
podName := pods.Items[0].ObjectMeta.Name
I have the following code, which is supposed to download a file by splitting it into multiple parts. But right now it only works on images; when I try downloading other files, like tar files, the output is an invalid file.
UPDATED:
Used (*os.File).WriteAt instead of Write and removed the os.O_APPEND file mode.
package main

import (
    "errors"
    "flag"
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
    "os"
    "strconv"
)

var file_url string
var workers int
var filename string

func init() {
    flag.StringVar(&file_url, "url", "", "URL of the file to download")
    flag.StringVar(&filename, "filename", "", "Name of downloaded file")
    flag.IntVar(&workers, "workers", 2, "Number of download workers")
}

func get_headers(url string) (map[string]string, error) {
    headers := make(map[string]string)
    resp, err := http.Head(url)
    if err != nil {
        return headers, err
    }

    if resp.StatusCode != 200 {
        return headers, errors.New(resp.Status)
    }

    for key, val := range resp.Header {
        headers[key] = val[0]
    }
    return headers, err
}

func download_chunk(url string, out string, start int, stop int) {
    client := new(http.Client)
    req, _ := http.NewRequest("GET", url, nil)
    req.Header.Add("Range", fmt.Sprintf("bytes=%d-%d", start, stop))
    resp, _ := client.Do(req)
    defer resp.Body.Close()

    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        log.Fatalln(err)
        return
    }

    file, err := os.OpenFile(out, os.O_WRONLY, 0600)
    if err != nil {
        if file, err = os.Create(out); err != nil {
            log.Fatalln(err)
            return
        }
    }
    defer file.Close()

    if _, err := file.WriteAt(body, int64(start)); err != nil {
        log.Fatalln(err)
        return
    }

    fmt.Println(fmt.Sprintf("Range %d-%d: %d", start, stop, resp.ContentLength))
}

func main() {
    flag.Parse()

    headers, err := get_headers(file_url)
    if err != nil {
        fmt.Println(err)
    } else {
        length, _ := strconv.Atoi(headers["Content-Length"])
        bytes_chunk := length / workers

        fmt.Println("file length: ", length)

        for i := 0; i < workers; i++ {
            start := i * bytes_chunk
            stop := start + (bytes_chunk - 1)
            go download_chunk(file_url, filename, start, stop)
        }

        var input string
        fmt.Scanln(&input)
    }
}
Basically, it just reads the length of the file, divides it by the number of workers, then each worker downloads its chunk using HTTP's Range header and, after downloading, writes it at the corresponding offset in the file.
If you really ignore as many errors as shown above, then your code cannot be expected to work reliably for any file type.
However, I can see one problem in your code: mixing O_APPEND and Seek is probably a mistake (Seek is ignored in this mode). I suggest using (*os.File).WriteAt instead.
IIRC, O_APPEND forces any write to happen at the [current] end of the file. However, your download_chunk function instances for the file parts can execute in an unpredictable order, thus "reordering" the file parts. The result is then a corrupted file.
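For illustration, here is a minimal sketch of that approach: the output file is created once up front, every worker writes its chunk with WriteAt at its own offset, and a sync.WaitGroup replaces the Scanln wait. It uses the same imports as the snippet above plus sync; error handling is kept minimal:

// Sketch: pre-create the output file, let each worker WriteAt its own offset,
// and wait for all of them with a WaitGroup.
func downloadRanged(url, out string, length, workers int) error {
    file, err := os.Create(out)
    if err != nil {
        return err
    }
    defer file.Close()

    chunk := length / workers
    var wg sync.WaitGroup

    for i := 0; i < workers; i++ {
        start := i * chunk
        stop := start + chunk - 1
        if i == workers-1 {
            stop = length - 1 // last worker picks up the remainder
        }

        wg.Add(1)
        go func(start, stop int) {
            defer wg.Done()

            req, _ := http.NewRequest("GET", url, nil)
            req.Header.Add("Range", fmt.Sprintf("bytes=%d-%d", start, stop))
            resp, err := http.DefaultClient.Do(req)
            if err != nil {
                log.Println(err)
                return
            }
            defer resp.Body.Close()

            body, err := ioutil.ReadAll(resp.Body)
            if err != nil {
                log.Println(err)
                return
            }
            // WriteAt places the chunk at its own offset, so ordering no longer matters.
            if _, err := file.WriteAt(body, int64(start)); err != nil {
                log.Println(err)
            }
        }(start, stop)
    }

    wg.Wait()
    return nil
}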
1. The order in which the goroutines execute is not guaranteed.
For example, the output may look like this:
...
file length: 20902
Range 10451-20901: 10451
Range 0-10450: 10451
...
so the chunks can't just be appended.
2. Writes of the chunk data must be synchronized (e.g. with a sync.Mutex).
(Apologies for my poor English.)