Merge two Text files in Golang

I'm attempting to merge two text files by merging fileACopy.txt to tmp1.txt. The error I get when trying to do this is:
Cannot use 'fileACopy' (type *File) as type []byte
Both text files have multiple lines of strings and I want to maintain the line breaks. I have imported io, log and os.
How does my code need to be modified or what code should I use?
// Append fileACopy.txt to tmp1.txt
fileACopy, err := os.Open("./fileACopy.txt")
if err != nil {
    log.Fatal(err)
}
defer fileACopy.Close()
append, err := os.OpenFile("tmp1.txt", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
if err != nil {
    log.Fatal(err)
}
defer append.Close()
if _, err := append.Write(fileACopy); err != nil {
    log.Fatal(err)
}
err = os.Remove("fileACopy.txt")
if err != nil {
    log.Fatal(err)
}

Consider the definitions of Open and Write:
func Open(name string) (*File, error)
func (f *File) Write(b []byte) (n int, err error)
So the line:
fileACopy, err := os.Open("./fileACopy.txt")
Gives you fileACopy (a *File), which you then pass as an argument to append.Write(fileACopy). As per the definitions above, append.Write takes a []byte and you are passing it something different (fileACopy, a *File), hence the error "Cannot use 'fileACopy' (type *File) as type []byte".
The simplest way to achieve what you want is probably to use io.Copy:
func Copy(dst Writer, src Reader) (written int64, err error)
In your case io.Copy(append, fileACopy) should do the trick. A less efficient alternative would be to read the file contents using ioutil.ReadAll (amongst other options), which gives you a []byte that you can then pass to append.Write (this works less well for large files, because the entire file is read into memory and then written).
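Putting that together, a minimal sketch of the io.Copy approach (same file names as the question) might look like:
// Stream fileACopy.txt onto the end of tmp1.txt without
// holding the whole file in memory.
src, err := os.Open("./fileACopy.txt")
if err != nil {
    log.Fatal(err)
}
defer src.Close()

dst, err := os.OpenFile("tmp1.txt", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
if err != nil {
    log.Fatal(err)
}
defer dst.Close()

if _, err := io.Copy(dst, src); err != nil {
    log.Fatal(err)
}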


Getting `panic: os: invalid use of WriteAt on file opened with O_APPEND`

I am a newbie to Go and was starting to write my first program, in which I have to download a bunch of CSVs from AWS. I don't understand why it gives me the below error with O_APPEND mode. If I remove os.O_APPEND, I only get the last file's data, which is not the objective.
The objective is to download all CSV files into one file locally. I'd like to understand what I'm doing incorrectly.
package main

import (
    "fmt"
    "os"
    "path/filepath"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/credentials"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
)

const (
    AccessKeyId     = "xxxxxxxxx"
    SecretAccessKey = "xxxxxxxxxxxxxxxxxxxx"
    Region          = "eu-central-1"
    Bucket          = "dexter-reports"
    bucketKey       = "Jenkins/pluginVersions/"
)

func main() {
    // Load the shared AWS configuration
    os.Setenv("AWS_ACCESS_KEY_ID", AccessKeyId)
    os.Setenv("AWS_SECRET_ACCESS_KEY", SecretAccessKey)
    filename := "JenkinsPluginDetais.txt"
    cred := credentials.NewStaticCredentials(AccessKeyId, SecretAccessKey, "")
    config := aws.Config{Credentials: cred, Region: aws.String(Region), Endpoint: aws.String("s3.amazonaws.com")}

    file, err := os.OpenFile(filename, os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0666)
    if err != nil {
        panic(err)
    }
    defer file.Close()

    sess, err := session.NewSession(&config)
    if err != nil {
        fmt.Println(err)
    }

    // List bucket objects
    ObjectList := listBucketObjects(sess)

    // Loop over the object list. First initialize the s3 downloader via s3manager
    downloader := s3manager.NewDownloader(sess)
    for _, item := range ObjectList.Contents {
        csvFile := filepath.Base(*item.Key)
        if csvFile != "pluginVersions" {
            downloadBucketObjects(downloader, file, csvFile)
        }
    }
}

func listBucketObjects(sess *session.Session) *s3.ListObjectsV2Output {
    // Create a new s3 client
    svc := s3.New(sess)
    resp, err := svc.ListObjectsV2(&s3.ListObjectsV2Input{
        Bucket: aws.String(Bucket),
        Prefix: aws.String(bucketKey),
    })
    if err != nil {
        panic(err)
    }
    return resp
}

func downloadBucketObjects(downloader *s3manager.Downloader, file *os.File, keyobj string) {
    fileToDownload := bucketKey + keyobj
    numBytes, err := downloader.Download(file,
        &s3.GetObjectInput{
            Bucket: aws.String(Bucket),
            Key:    aws.String(fileToDownload),
        })
    if err != nil {
        panic(err)
    }
    fmt.Println("Downloaded", file.Name(), numBytes, "bytes")
}
Firstly, I don't get why you even need the os.O_APPEND flag in the first place. As per my understanding, you can omit os.O_APPEND.
Now, let's come to the actual problem of why it's happening:
Doc for O_APPEND (Ref: https://man7.org/linux/man-pages/man2/open.2.html):
O_APPEND
    The file is opened in append mode. Before each write(2), the file offset is positioned at the end of the file, as if with lseek(2). The modification of the file offset and the write operation are performed as a single atomic step.
So for every call to write the file offset is positioned at the end of the file.
But (*s3manager.Downloader).Download is presumably using the WriteAt method, i.e.,
Doc for WriteAt:
$ go doc os WriteAt
package os // import "os"
func (f *File) WriteAt(b []byte, off int64) (n int, err error)
WriteAt writes len(b) bytes to the File starting at byte offset off. It
returns the number of bytes written and an error, if any. WriteAt returns a
non-nil error when n != len(b).
If file was opened with the O_APPEND flag, WriteAt returns an error.
Notice the last line: if the file is opened with the O_APPEND flag, WriteAt returns an error. This is deliberate, because WriteAt's second argument is an offset, and mixing O_APPEND's behaviour with WriteAt's offset seeking could produce unexpected results, so it errors out instead.
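A tiny standalone demonstration of that restriction (my own sketch, with a hypothetical demo.txt; not from the original post):
package main

import (
    "fmt"
    "os"
)

func main() {
    // Open with O_APPEND, as the question does.
    f, err := os.OpenFile("demo.txt", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
    if err != nil {
        panic(err)
    }
    defer f.Close()

    // WriteAt is rejected for files opened in append mode.
    _, err = f.WriteAt([]byte("hello"), 0)
    fmt.Println(err) // os: invalid use of WriteAt on file opened with O_APPEND
}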
Consider the definition of s3manager.Downloader:
func (d Downloader) Download(w io.WriterAt, input *s3.GetObjectInput, options ...func(*Downloader)) (n int64, err error)
The first argument is an io.WriterAt; this interface is:
type WriterAt interface {
WriteAt(p []byte, off int64) (n int, err error)
}
This means that the Download function is going to call the WriteAt method on the File you are passing it. As per the documentation for File.WriteAt:
If file was opened with the O_APPEND flag, WriteAt returns an error.
So this explains why you are getting the error but raises the question "why is Download using WriteAt and not accepting an io.Writer (and calling Write)?"; the answer can be found in the documentation:
The w io.WriterAt can be satisfied by an os.File to do multipart concurrent downloads, or in memory []byte wrapper using aws.WriteAtBuffer
So, to increase performance, Downloader may make multiple simultaneous requests for parts of the file and then write these out as they are received (meaning it may not write the data in order). This also explains why calling the function multiple times with the same File results in overwritten data (when Downloader retrieves each chunk of the file, it writes it out at the appropriate position in the output file; this overwrites any data already there).
The above quote from the documentation also points to a possible solution; use an aws.WriteAtBuffer and, once the download is finished, write the data to your file (which could then be opened with O_APPEND) - something like this:
buf := aws.NewWriteAtBuffer([]byte{})

numBytes, err := downloader.Download(buf,
    &s3.GetObjectInput{
        Bucket: aws.String(Bucket),
        Key:    aws.String(fileToDownload),
    })
if err != nil {
    panic(err)
}

_, err = file.Write(buf.Bytes())
if err != nil {
    panic(err)
}
An alternative would be to download into a temporary file and then append that to your output file (you may need to do this if the files are large).
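A rough sketch of that temporary-file alternative, reusing the downloader, file, Bucket and fileToDownload names from the question (the temp-file handling itself is my assumption, not part of the original answer):
tmp, err := ioutil.TempFile("", "s3download")
if err != nil {
    panic(err)
}
defer os.Remove(tmp.Name())
defer tmp.Close()

// Download may write to the temp file concurrently and out of order;
// that's fine here because it was not opened with O_APPEND.
_, err = downloader.Download(tmp, &s3.GetObjectInput{
    Bucket: aws.String(Bucket),
    Key:    aws.String(fileToDownload),
})
if err != nil {
    panic(err)
}

// Rewind the temp file, then append its contents to the output file.
if _, err = tmp.Seek(0, io.SeekStart); err != nil {
    panic(err)
}
if _, err = io.Copy(file, tmp); err != nil {
    panic(err)
}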

Base64 encode/decode results in corrupted output

I'm trying to write some convenience wrapper funcs that base64-encode and -decode byte slices. (I can't understand why this is not conveniently provided in the stdlib.)
However this code (in playground):
func b64encode(b []byte) []byte {
    encodedData := &bytes.Buffer{}
    encoder := base64.NewEncoder(base64.URLEncoding, encodedData)
    defer encoder.Close()
    encoder.Write(b)
    return encodedData.Bytes()
}

func b64decode(b []byte) ([]byte, error) {
    dec := base64.NewDecoder(base64.URLEncoding, bytes.NewReader(b))
    buf := &bytes.Buffer{}
    _, err := io.Copy(buf, dec)
    if err != nil {
        return nil, err
    }
    return buf.Bytes(), nil
}

func main() {
    b := []byte("hello")
    e := b64encode(b)
    d, err := b64decode(e)
    if err != nil {
        log.Fatalf("could not decode: %s", err)
    }
    fmt.Println(string(d))
}
generates truncated output when I try to print it:
hel
What's going on?
The defer executes when the function ends. That is AFTER the return statement has been evaluated.
The following works: https://play.golang.org/p/sYn-W6fZh1
func b64encode(b []byte) []byte {
    encodedData := &bytes.Buffer{}
    encoder := base64.NewEncoder(base64.URLEncoding, encodedData)
    encoder.Write(b)
    encoder.Close()
    return encodedData.Bytes()
}
That being said, if it really is all in memory, you can avoid creating an encoder entirely. Instead, you can do something like:
func b64encode(b []byte) []byte {
    ret := make([]byte, base64.URLEncoding.EncodedLen(len(b)))
    base64.URLEncoding.Encode(ret, b)
    return ret
}
An added benefit of doing it this way is that it is more efficient, since it only needs to allocate once. It also means you are no longer ignoring the errors returned by the Write and Close methods.
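For symmetry, the decode side can be done the same way with DecodedLen and Decode (a sketch of mine, not from the original answer); Decode reports how many bytes it actually wrote, so the result is trimmed to that count:
func b64decode(b []byte) ([]byte, error) {
    // DecodedLen gives an upper bound; Decode returns the exact count.
    ret := make([]byte, base64.URLEncoding.DecodedLen(len(b)))
    n, err := base64.URLEncoding.Decode(ret, b)
    if err != nil {
        return nil, err
    }
    return ret[:n], nil
}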

using io.Pipe() for sending and receiving messages

I am using os.Pipe() in my program, but for some reason it gives a bad file descriptor error each time I try to write or read data from it.
Is there something I am doing wrong?
Below is the code:
package main

import (
    "fmt"
    "os"
)

func main() {
    writer, reader, err := os.Pipe()
    if err != nil {
        fmt.Println(err)
    }

    _, err = writer.Write([]byte("hello"))
    if err != nil {
        fmt.Println(err)
    }

    var data []byte
    _, err = reader.Read(data)
    if err != nil {
        fmt.Println(err)
    }
    fmt.Println(string(data))
}
Output:
write |0: Invalid argument
read |1: Invalid argument
You are using an os.Pipe, which returns a pair of FIFO-connected files from the OS. This is different from an io.Pipe, which is implemented in Go.
The invalid argument errors are because you are reading and writing to the wrong files. The signature of os.Pipe is
func Pipe() (r *File, w *File, err error)
which shows that the return values are in the order "reader, writer, error". Compare io.Pipe:
func Pipe() (*PipeReader, *PipeWriter)
which also returns in the order "reader, writer".
When you check the error from the os.Pipe function, you are only printing the value. If there was an error, the files are invalid. You need to return or exit on that error.
Pipes are also blocking (though an os.Pipe has a small, hard coded buffer), so you need to read and write asynchronously. If you swapped this for an io.Pipe it would deadlock immediately. Dispatch the Read method inside a goroutine and wait for it to complete.
Finally, you are reading into a nil slice, which will read nothing. You need to allocate space to read into, and you need to record the number of bytes read to know how much of the buffer is used.
A more correct version of your example would look like:
reader, writer, err := os.Pipe()
if err != nil {
    log.Fatal(err)
}

var wg sync.WaitGroup
wg.Add(1)
go func() {
    defer wg.Done()
    data := make([]byte, 1024)
    n, err := reader.Read(data)
    if n > 0 {
        fmt.Println(string(data[:n]))
    }
    if err != nil && err != io.EOF {
        fmt.Println(err)
    }
}()

_, err = writer.Write([]byte("hello"))
if err != nil {
    fmt.Println(err)
}

wg.Wait()
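For comparison, a minimal sketch of the same exchange using io.Pipe (my own variation, not from the original answer); io.Pipe has no internal buffer at all, so the Write blocks until the reader consumes the data, making the goroutine mandatory:
pr, pw := io.Pipe()

var wg sync.WaitGroup
wg.Add(1)
go func() {
    defer wg.Done()
    data := make([]byte, 1024)
    n, err := pr.Read(data)
    if n > 0 {
        fmt.Println(string(data[:n]))
    }
    if err != nil && err != io.EOF {
        fmt.Println(err)
    }
}()

if _, err := pw.Write([]byte("hello")); err != nil {
    fmt.Println(err)
}
pw.Close() // signals EOF to any further reads
wg.Wait()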

Handling nested zip files with archive/zip

I'm struggling to handle nested zip files in Go (where a zip file contains another zip file). I'm trying to recurse a zip file and list all of the files it contains.
archive/zip gives you two methods for handling a zip file:
zip.NewReader
zip.OpenReader
OpenReader opens a file on disk. NewReader accepts an io.ReaderAt and a file size. As you iterate through the zipped files with either of these, you get out a zip.File for each file inside the zip. To get the file contents of file f, you call f.Open which gives you a zip.ReadCloser. To open a nested zip file, I'd need to use NewReader, but zip.File and zip.ReadCloser do not satisfy the io.ReaderAt interface.
zip.File has a private field zipr (an io.ReaderAt), and zip.ReadCloser has a private field f (an os.File), either of which should satisfy the requirements for NewReader.
My question: is there any way to open a nested zip file without first writing the contents to a file on disk, or reading the whole thing into memory?
It looks like everything that is needed is available in zip.File, but isn't exported. I'm hoping I missed something.
How about an io.ReaderAt built from an io.Reader that reinitializes if you decide to go backwards? (This code is largely untested, but hopefully you get the idea.)
package main

import (
    "io"
    "io/ioutil"
    "os"
    "strings"
)

type inefficientReaderAt struct {
    rdr    io.ReadCloser
    cur    int64
    initer func() (io.ReadCloser, error)
}

func newInefficentReaderAt(initer func() (io.ReadCloser, error)) *inefficientReaderAt {
    return &inefficientReaderAt{
        initer: initer,
    }
}

func (r *inefficientReaderAt) Read(p []byte) (n int, err error) {
    n, err = r.rdr.Read(p)
    r.cur += int64(n)
    return n, err
}

func (r *inefficientReaderAt) ReadAt(p []byte, off int64) (n int, err error) {
    // reset on rewind
    if off < r.cur || r.rdr == nil {
        r.cur = 0
        r.rdr, err = r.initer()
        if err != nil {
            return 0, err
        }
    }
    // discard forward to the requested offset, keeping cur in sync
    if off > r.cur {
        sz, err := io.CopyN(ioutil.Discard, r.rdr, off-r.cur)
        n = int(sz)
        r.cur += sz
        if err != nil {
            return n, err
        }
    }
    return r.Read(p)
}

func main() {
    r := newInefficentReaderAt(func() (io.ReadCloser, error) {
        return ioutil.NopCloser(strings.NewReader("ABCDEFG")), nil
    })
    io.Copy(os.Stdout, io.NewSectionReader(r, 0, 3))
    io.Copy(os.Stdout, io.NewSectionReader(r, 1, 3))
}
If you mostly move forwards, this probably works OK, especially if you use a buffered reader. I should note that this violates the io.ReaderAt guarantees (https://godoc.org/io#ReaderAt): it doesn't allow parallel calls to ReadAt, and it doesn't block until a full read completes, so this may not even work properly.
I ran into the exact same need and came up with the following approach, not sure if it's any help to you:
// NewZipFromReader ...
func NewZipFromReader(file io.ReadCloser, size int64) (*zip.Reader, error) {
    in := file.(io.Reader)
    if _, ok := in.(io.ReaderAt); !ok {
        // no io.ReaderAt available: buffer the whole contents in memory
        buffer, err := ioutil.ReadAll(in)
        if err != nil {
            return nil, err
        }
        in = bytes.NewReader(buffer)
        size = int64(len(buffer))
    }
    reader, err := zip.NewReader(in.(io.ReaderAt), size)
    if err != nil {
        return nil, err
    }
    return reader, nil
}
So if file doesn't implement io.ReaderAt, it reads the whole contents into a buffer.
It's probably not safe against ZIP bombs, and it will definitely fail with OOM for files larger than RAM.
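As an illustration of how the helper might be used to recurse into nested zips (my own sketch; listFiles and the use of UncompressedSize64 as the size hint are assumptions, not part of the original answer):
func listFiles(r *zip.Reader, prefix string) error {
    for _, f := range r.File {
        fmt.Println(prefix + f.Name)
        if strings.HasSuffix(f.Name, ".zip") {
            rc, err := f.Open()
            if err != nil {
                return err
            }
            // NewZipFromReader buffers the contents in memory here, because
            // the io.ReadCloser returned by f.Open is not an io.ReaderAt.
            inner, err := NewZipFromReader(rc, int64(f.UncompressedSize64))
            rc.Close()
            if err != nil {
                return err
            }
            if err := listFiles(inner, prefix+f.Name+"/"); err != nil {
                return err
            }
        }
    }
    return nil
}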

Go: zlib uncompressing a slice of bytes

I am trying to parse a file that annoyingly consists of many separately zipped segments. I have parsed these segments one at a time into a slice of bytes, and I want to uncompress them as I go.
Here is my current code that does the decompressing, which doesn't work. from and to are just set at the top as an example; in reality they are set by the code. data is the byte slice containing the entire file. I don't want to seek the file while it's on disk, because it's located on another server, so it's only realistic for me to load the entire file into a []byte first and then parse it.
from, to := 0, 1000
b := bytes.NewReader(data[from : from+to])
z, err := zlib.NewReader(b)
CheckErr(err)
defer z.Close()
p := make([]byte, 0, 1024)
z.Read(p)
fmt.Println(string(p))
So how is it so massively difficult just to unzip a slice of bytes? Anyway...
The problem appears to be with how I am reading it out. Where it says z.Read, that doesn't seem to do anything.
How can I read the entire thing in one go into a slice of bytes?
Here's an outline for you. Note: In Go, CHECK FOR ERRORS!
package main

import (
    "bytes"
    "compress/zlib"
    "fmt"
    "io/ioutil"
)

func readSegment(data []byte, from, to int) ([]byte, error) {
    b := bytes.NewReader(data[from : from+to])
    z, err := zlib.NewReader(b)
    if err != nil {
        return nil, err
    }
    defer z.Close()
    p, err := ioutil.ReadAll(z)
    if err != nil {
        return nil, err
    }
    return p, nil
}

func main() {
    from, to := 0, 1000
    data := make([]byte, from+to)
    // ** parse input segments into data **
    p, err := readSegment(data, from, to)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(string(p))
}
Use ReadAll(r io.Reader) ([]byte, error) from the io/ioutil package, passing it the zlib reader:
p, err := ioutil.ReadAll(z)
fmt.Println(string(p))
Read only reads up to the length of the given slice, and your slice has a length of 0 (its capacity of 1024 doesn't matter here), so it reads nothing. To read in chunks of 1024 bytes:
p := make([]byte, 1024)
for {
    numBytes, err := z.Read(p)
    // do what you want with p[:numBytes]
    if err == io.EOF {
        // you are done; the final read may return fewer than len(p) bytes
        break
    }
    if err != nil {
        // handle the error
        break
    }
}
If you are getting the data from a webserver, you might even do
import (
    "compress/zlib"
    "io/ioutil"
    "net/http"
)
...
resp, errGet := http.Get("http://example.com/somefile")
// do error handling
defer resp.Body.Close()
z, errZ := zlib.NewReader(resp.Body)
// do error handling
p, err := ioutil.ReadAll(z)
// do error handling
since resp.Body happens to be an io.Reader, as are most io-related types.
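Tying this back to the original multi-segment file, a loop over the segments using readSegment from the outline above might look like this (the offsets are hypothetical; note that to is a length, matching data[from : from+to]):
segments := []struct{ from, to int }{
    {0, 1000},    // hypothetical offsets; in reality the
    {1000, 1000}, // file format determines these
}
for _, s := range segments {
    p, err := readSegment(data, s.from, s.to)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(string(p))
}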
