Get offset/position of bytes written to file - go

I'm writing a string to a file, and I'd like to get the offset of the string which was just written.
Here is the code writing the file:
package main

import (
    "os"
)

func main() {
    path := "test_file.txt"
    byteString := []byte("string to write")

    f, err := os.OpenFile(path, os.O_APPEND|os.O_WRONLY, 0600)
    if err != nil {
        panic(err)
    }
    defer f.Close()

    if _, err = f.Write(byteString); err != nil {
        panic(err)
    }
}
How can I get the offset after having written the line?

File.Write only returns the number of bytes written. If you want the offset, you can either:
Call File.Stat before writing, and then use File.WriteAt to write at the end-of-file offset provided by the FileInfo structure (note that WriteAt requires the file not to be opened with O_APPEND).
Call File.Stat after writing, and subtract the number of bytes written from the new size.
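For example, a minimal sketch of the second option, reusing f and byteString from the question (File.Stat works on the open handle; "fmt" would need to be added to the imports):
fi, err := f.Stat()
if err != nil {
    panic(err)
}
// The write went to the end of the file, so it started at the new size
// minus the number of bytes just written.
offset := fi.Size() - int64(len(byteString))
fmt.Println("write started at offset", offset)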

Getting `panic: os: invalid use of WriteAt on file opened with O_APPEND`

I am a newbie to Go, and I was starting to write my first program, in which I have to download a bunch of CSVs from AWS. I don't understand why it gives me the error below when the file is opened with os.O_APPEND. If I remove os.O_APPEND, I only get the last file's data, which is not the objective.
The objective is to download all the CSV files into one local file. I'd like to understand what I'm doing incorrectly.
package main

import (
    "fmt"
    "os"
    "path/filepath"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/credentials"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
)

const (
    AccessKeyId     = "xxxxxxxxx"
    SecretAccessKey = "xxxxxxxxxxxxxxxxxxxx"
    Region          = "eu-central-1"
    Bucket          = "dexter-reports"
    bucketKey       = "Jenkins/pluginVersions/"
)

func main() {
    // Load the shared AWS configuration
    os.Setenv("AWS_ACCESS_KEY_ID", AccessKeyId)
    os.Setenv("AWS_SECRET_ACCESS_KEY", SecretAccessKey)

    filename := "JenkinsPluginDetais.txt"
    cred := credentials.NewStaticCredentials(AccessKeyId, SecretAccessKey, "")
    config := aws.Config{Credentials: cred, Region: aws.String(Region), Endpoint: aws.String("s3.amazonaws.com")}

    file, err := os.OpenFile(filename, os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0666)
    if err != nil {
        panic(err)
    }
    defer file.Close()

    sess, err := session.NewSession(&config)
    if err != nil {
        fmt.Println(err)
    }

    // List bucket objects
    ObjectList := listBucketObjects(sess)

    // Loop over the object list. First initialize the s3 downloader via s3manager
    downloader := s3manager.NewDownloader(sess)
    for _, item := range ObjectList.Contents {
        csvFile := filepath.Base(*item.Key)
        if csvFile != "pluginVersions" {
            downloadBucketObjects(downloader, file, csvFile)
        }
    }
}

func listBucketObjects(sess *session.Session) *s3.ListObjectsV2Output {
    // Create a new s3 client
    svc := s3.New(sess)
    resp, err := svc.ListObjectsV2(&s3.ListObjectsV2Input{
        Bucket: aws.String(Bucket),
        Prefix: aws.String(bucketKey),
    })
    if err != nil {
        panic(err)
    }
    return resp
}

func downloadBucketObjects(downloader *s3manager.Downloader, file *os.File, keyobj string) {
    fileToDownload := bucketKey + keyobj
    numBytes, err := downloader.Download(file,
        &s3.GetObjectInput{
            Bucket: aws.String(Bucket),
            Key:    aws.String(fileToDownload),
        })
    if err != nil {
        panic(err)
    }
    fmt.Println("Downloaded", file.Name(), numBytes, "bytes")
}
Firstly, I don't see why you even need the os.O_APPEND flag in the first place. As far as I can tell, you can simply omit it.
Now, let's come to the actual problem of why it's happening:
Doc for O_APPEND (Ref: https://man7.org/linux/man-pages/man2/open.2.html):
O_APPEND
The file is opened in append mode. Before each write(2),
the file offset is positioned at the end of the file, as
if with lseek(2). The modification of the file offset and
the write operation are performed as a single atomic step.
So for every write, the file offset is positioned at the end of the file.
But (*s3manager.Downloader).Download is presumably using the WriteAt method, i.e.,
Doc for WriteAt:
$ go doc os WriteAt
package os // import "os"
func (f *File) WriteAt(b []byte, off int64) (n int, err error)
WriteAt writes len(b) bytes to the File starting at byte offset off. It
returns the number of bytes written and an error, if any. WriteAt returns a
non-nil error when n != len(b).
If file was opened with the O_APPEND flag, WriteAt returns an error.
Notice the last line: if the file is opened with the O_APPEND flag, WriteAt returns an error. This is deliberate, because WriteAt takes an explicit offset as its second argument; mixing O_APPEND's end-of-file positioning with WriteAt's offset seeking could produce unexpected results, so it simply errors out.
Consider the definition of s3manager.Downloader:
func (d Downloader) Download(w io.WriterAt, input *s3.GetObjectInput, options ...func(*Downloader)) (n int64, err error)
The first argument is an io.WriterAt; this interface is:
type WriterAt interface {
    WriteAt(p []byte, off int64) (n int, err error)
}
This means that the Download function is going to call the WriteAt method on the File you are passing it. As per the documentation for File.WriteAt:
If file was opened with the O_APPEND flag, WriteAt returns an error.
So this explains why you are getting the error, but it raises the question "why is Download using WriteAt and not accepting an io.Writer (and calling Write)?". The answer can be found in the documentation:
The w io.WriterAt can be satisfied by an os.File to do multipart concurrent downloads, or in memory []byte wrapper using aws.WriteAtBuffer
So, to increase performance, the Downloader may make multiple simultaneous requests for parts of the file and then write those parts out as they are received (meaning it may not write the data in order). This also explains why calling the function multiple times with the same File results in overwritten data (when the Downloader retrieves each chunk of the file, it writes it out at the appropriate position in the output file, overwriting any data already there).
The above quote from the documentation also points to a possible solution; use an aws.WriteAtBuffer and, once the download is finished, write the data to your file (which could then be opened with O_APPEND) - something like this:
buf := aws.NewWriteAtBuffer([]byte{})

numBytes, err := downloader.Download(buf,
    &s3.GetObjectInput{
        Bucket: aws.String(Bucket),
        Key:    aws.String(fileToDownload),
    })
if err != nil {
    panic(err)
}

_, err = file.Write(buf.Bytes())
if err != nil {
    panic(err)
}
An alternative would be to download into a temporary file and then append that to your output file (you may need to do this if the files are large).
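A minimal sketch of that temporary-file approach (my addition, not part of the original answer; it assumes Go 1.16+ for os.CreateTemp, reuses downloader, file, Bucket and fileToDownload from above, and needs "io" in the imports):
// Download into a temp file (an *os.File satisfies io.WriterAt),
// then append its contents to the O_APPEND output file.
tmp, err := os.CreateTemp("", "s3part-*.csv") // hypothetical name pattern
if err != nil {
    panic(err)
}
defer os.Remove(tmp.Name())
defer tmp.Close()

if _, err := downloader.Download(tmp, &s3.GetObjectInput{
    Bucket: aws.String(Bucket),
    Key:    aws.String(fileToDownload),
}); err != nil {
    panic(err)
}

// Rewind the temp file and copy its contents to the output file.
if _, err := tmp.Seek(0, io.SeekStart); err != nil {
    panic(err)
}
if _, err := io.Copy(file, tmp); err != nil {
    panic(err)
}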

Golang fix jpeg bad EOF formatting

For some JPEG images the EOI marker is not \xff\xd9; in my example I see \xff\x00 instead, so I am trying to fix this using Go.
f, _ := os.Open("bad.jpeg")
img, _, err := image.Decode(f)
if err != nil {
    fmt.Println(err)
}
fmt.Println("successfully decoded")

opt := jpeg.Options{
    Quality: 100,
}
f1, _ := os.Create("good.jpeg")
jpeg.Encode(f1, img, &opt)
However, image.Decode(f) fails with unexpected EOF. I would like to know how to fix the ending problem for a badly formatted JPEG file.
With Python, I can simply do the following; open and save automatically fix the EOI for me. Is there an equivalent way in Go?
from PIL import Image
im = Image.open("bad.jpeg")
im.save("good.jpeg", quality=100)
Here is the image I am testing.
Here is a fairly naive solution that only works for this very specific case:
read the file and try to decode it. If decoding fails, check the last two bytes and overwrite the last one if it matches the known bad pattern. Try to decode again; if that succeeds, write the fixed bytes to the new file.
package main

import (
    "bytes"
    "image"
    _ "image/jpeg" // register the JPEG decoder for image.Decode
    "io/ioutil"
)

func main() {
    contents, err := ioutil.ReadFile("bad.jpeg")
    if err != nil {
        panic(err)
    }

    buffer := bytes.NewBuffer(contents)
    _, _, err = image.Decode(buffer)
    if err == nil {
        return
    }
    if err.Error() != "unexpected EOF" {
        panic(err)
    }

    // Maybe wrong End-Of-Image marker.
    if contents[len(contents)-1] == '\x00' && contents[len(contents)-2] == '\xff' {
        contents[len(contents)-1] = '\xd9'
    } else {
        panic("don't know what to do")
    }

    // Reset the buffer and decode again.
    buffer = bytes.NewBuffer(contents)
    _, _, err = image.Decode(buffer)
    if err != nil {
        panic(err)
    }

    // Write the fixed bytes to the new file.
    err = ioutil.WriteFile("good.jpeg", contents, 0644)
    if err != nil {
        panic(err)
    }
}
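Note that, unlike the Python version, this approach never re-encodes the image: the second image.Decode call only verifies that the patched bytes now form a valid JPEG, and the original bytes (with just the last one fixed) are written out verbatim, so there is no recompression loss.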

Prevent ReadFile or ReadAll from reading EOF

I am starting to learn Go, and I am a bit puzzled by the fact that it includes the EOF when using the ioutil.ReadFile function. I want, for example, to read a file and parse all its lines on a field separator.
Sample input File:
CZG;KCZG;some text
EKY;KEKY;some text
A50;KA50;some text
UKY;UCFL;some text
MIC;KMIC;some text
K2M;K23M;some text
This is what I do to read and parse that file:
import (
    "fmt"
    "io/ioutil"
    "log"
    "strings"
)

func main() {
    /* Read file */
    airportsFile := "/path/to/file/ad_iata"
    content, err := ioutil.ReadFile(airportsFile)
    if err != nil {
        log.Fatal(err)
    }

    /* split content on EOL */
    lines := strings.Split(string(content), "\n")

    /* split line on field separator ; */
    for _, line := range lines {
        lineSplit := strings.Split(line, ";")
        fmt.Println(lineSplit)
    }
}
The strings.Split function adds an empty element at the end of the lines slice when it sees the EOF (nothing left to parse after the final newline). Therefore, if I want to access the second index of the per-line slice (lineSplit[1]), I run into panic: runtime error: index out of range. I have to restrict the range by doing this:
/* split line on field separator ; */
lenLines := len(lines) - 1
for _, line := range lines[:lenLines] {
    lineSplit := strings.Split(line, ";")
    fmt.Println(lineSplit[1])
}
Is there a better way if I want to keep using ReadFile for its terseness? The same problem occurs when using ioutil.ReadAll.
There is no such thing as an "EOF byte" or "EOF character". What you are seeing is probably caused by a line break character ('\n') at the very end of the file.
To read a file line by line, it's more idiomatic to use bufio.Scanner instead:
file, err := os.Open(airportsFile)
if err != nil {
    log.Fatal(err)
}
defer file.Close()

scanner := bufio.NewScanner(file)
for scanner.Scan() {
    line := scanner.Text()
    // ... use line as you please ...
}
if err := scanner.Err(); err != nil {
    log.Fatal(err)
}
And this actually addresses your problem, because Scanner will read the final newline without starting a new line, as evidenced by this playground example.
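Alternatively, if you want to keep ioutil.ReadFile for its terseness, a minimal sketch (my addition, assuming Unix "\n" line endings) is to trim the trailing newline before splitting:
content, err := ioutil.ReadFile(airportsFile)
if err != nil {
    log.Fatal(err)
}
// Dropping the final "\n" means strings.Split produces no empty last element.
for _, line := range strings.Split(strings.TrimSuffix(string(content), "\n"), "\n") {
    lineSplit := strings.Split(line, ";")
    fmt.Println(lineSplit[1])
}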
Your input file seems to be a CSV file (with ; as the separator), so you can use encoding/csv:
airportsFile := "/path/to/file/ad_iata"
content, err := os.Open(airportsFile)
if err != nil {
    log.Fatal(err)
}
defer content.Close()

r := csv.NewReader(content)
r.Comma = ';'
records, err := r.ReadAll() /* split lines on the field separator ; */
if err != nil {
    log.Fatal(err)
}
fmt.Println(records)
which looks terse enough to me and provides the correct output:
[[CZG KCZG some text] [EKY KEKY some text] [A50 KA50 some text] [UKY UCFL some text] [MIC KMIC some text] [K2M K23M some text]]
You may use scanner.Err() to check for errors encountered while reading the file.
// Err returns the first non-EOF error that was encountered by the Scanner.
func (s *Scanner) Err() error {
    if s.err == io.EOF {
        return nil
    }
    return s.err
}
In general, the idiomatic way to read and parse a file in Go is to use bufio.NewScanner, which accepts the file to read as an input parameter and returns a new Scanner.
Considering the above remarks, here is a way you can read and parse a file:
package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    input, err := os.Open("example.txt")
    if err != nil {
        panic("Error happened while opening the file. Please check if the file exists!")
    }
    defer input.Close()

    scanner := bufio.NewScanner(input)
    for scanner.Scan() {
        line := scanner.Text()
        fmt.Printf("%v\n", line)
    }
    if err := scanner.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "reading input:", err)
    }
}

How do I read in a large flat file

I have a flat file that has 339276 lines of text in it, for a size of 62.1 MB. I am attempting to read in all the lines, parse them based on some conditions I have, and then insert them into a database.
I originally attempted to use a bufio.Scan() loop and bufio.Text() to get each line, but I was running out of buffer space. I switched to using bufio.ReadLine/ReadString/ReadByte (I tried each) and had the same problem with each; I didn't have enough buffer space.
I tried using Read and setting the buffer size, but as the documentation says it is actually a const that can be made smaller but never bigger than 64*1024 bytes. I then tried to use File.ReadAt, where I set the starting position and moved it along as I brought in each section, to no avail. I have looked at the following examples and explanations (not an exhaustive list):
Read text file into string array (and write)
How to Read last lines from a big file with Go every 10 secs
reading file line by line in go
How do I read in an entire file (either line by line or the whole thing at once) into a slice so I can then go do things to the lines?
Here is some code that I have tried:
file, err := os.Open(feedFolder + value)
handleError(err)
defer file.Close()

// fileInfo, _ := file.Stat()
var linesInFile []string
r := bufio.NewReader(file)
for {
    path, err := r.ReadLine("\n") // 0x0A separator = newline
    linesInFile = append(linesInFile, path)
    if err == io.EOF {
        fmt.Printf("End Of File: %s", err)
        break
    } else if err != nil {
        handleError(err) // if you return error
    }
}
fmt.Println("Last Line: ", linesInFile[len(linesInFile)-1])
Here is something else I tried:
var fileSize int64 = fileInfo.Size()
fmt.Printf("File Size: %d\t", fileSize)
var bufferSize int64 = 1024 * 60
bytes := make([]byte, bufferSize)
var fullFile []byte
var start int64 = 0
var iterationCounter int64 = 1
var currentErr error = nil
for currentErr != io.EOF {
    _, currentErr = file.ReadAt(bytes, start)
    fullFile = append(fullFile, bytes...)
    start = (bufferSize * iterationCounter) + 1
    iterationCounter++
}
fmt.Printf("Err: %s\n", currentErr)
fmt.Printf("fullFile Size: %d\n", len(fullFile))
fmt.Printf("Start: %d", start)

var currentLine []string
for _, value := range fullFile {
    if string(value) != "\n" {
        currentLine = append(currentLine, string(value))
    } else {
        singleLine := strings.Join(currentLine, "")
        linesInFile = append(linesInFile, singleLine)
        currentLine = nil
    }
}
I am at a loss. Either I don't understand exactly how the buffer works or I don't understand something else. Thanks for reading.
bufio.Scan() and bufio.Text() in a loop work perfectly for me on files of much larger size, so I suspect you have lines exceeding the buffer capacity. If so:
check your line endings,
and check which Go version you use: your path, err := r.ReadLine("\n") // 0x0A separator = newline does not match the current signature. func (b *bufio.Reader) ReadLine() (line []byte, isPrefix bool, err error) takes no arguments and has the return value isPrefix specifically for your use case.
http://golang.org/pkg/bufio/#Reader.ReadLine
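A minimal sketch of reassembling long lines with isPrefix (my addition; it reuses r, linesInFile and handleError from the question):
var long []byte
for {
    chunk, isPrefix, err := r.ReadLine()
    if err == io.EOF {
        break
    }
    if err != nil {
        handleError(err)
        break
    }
    // Accumulate chunks until ReadLine reports the line is complete.
    long = append(long, chunk...)
    if !isPrefix {
        linesInFile = append(linesInFile, string(long))
        long = nil
    }
}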
It's not clear that it's necessary to read in all the lines before parsing them and inserting them into a database. Try to avoid that.
You have a small file: "a flat file that has 339276 lines of text in it for a size of 62.1 MB." For example,
package main

import (
    "bytes"
    "fmt"
    "io"
    "io/ioutil"
)

func readLines(filename string) ([]string, error) {
    var lines []string
    file, err := ioutil.ReadFile(filename)
    if err != nil {
        return lines, err
    }
    buf := bytes.NewBuffer(file)
    for {
        line, err := buf.ReadString('\n')
        if len(line) == 0 {
            if err != nil {
                if err == io.EOF {
                    break
                }
                return lines, err
            }
        }
        lines = append(lines, line)
        if err != nil && err != io.EOF {
            return lines, err
        }
    }
    return lines, nil
}

func main() {
    // a flat file that has 339276 lines of text in it for a size of 62.1 MB
    filename := "flat.file"
    lines, err := readLines(filename)
    fmt.Println(len(lines))
    if err != nil {
        fmt.Println(err)
        return
    }
}
It seems to me this variant of readLines is shorter and faster than the one suggested by peterSO:
func readLines(filename string) (map[int]string, error) {
    lines := make(map[int]string)
    data, err := ioutil.ReadFile(filename)
    if err != nil {
        return nil, err
    }
    for n, line := range strings.Split(string(data), "\n") {
        lines[n] = line
    }
    return lines, nil
}
package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
)

func main() {
    FileName := "assets/file.txt"
    file, err := os.Open(FileName)
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        fmt.Println(scanner.Text())
    }
}
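One addition worth making here (mine, not the original answer's), since the question mentions running out of buffer space: bufio.Scanner's default maximum token size is 64*1024 bytes, but since Go 1.6 you can raise it with the Buffer method before scanning:
scanner := bufio.NewScanner(file)
// Allow lines of up to 1 MiB instead of the 64 KiB default.
scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
for scanner.Scan() {
    fmt.Println(scanner.Text())
}
if err := scanner.Err(); err != nil {
    log.Fatal(err) // e.g. bufio.ErrTooLong if a line still exceeds the max
}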

Go: zlib uncompressing a slice of bytes

I am trying to parse a file that annoyingly consists of many separately zipped segments. I have parsed these segments one at a time into a slice of bytes, and I want to uncompress them as I go.
Here is my current code that does the decompressing, which doesn't work. from and to are just set at the top as an example; in reality they are set by the code. data is the byte slice containing the entire file. I don't want to seek the file while it's on disk because it's located on another server, so it's only realistic for me to load the entire file into a []byte first and then parse it.
from, to := 0, 1000
b := bytes.NewReader(data[from : from+to])
z, err := zlib.NewReader(b)
CheckErr(err)
defer z.Close()

p := make([]byte, 0, 1024)
z.Read(p)
fmt.Println(string(p))
So why is it so massively difficult just to unzip a slice of bytes? Anyway...
The problem appears to be with how I am reading it out. Where it says z.Read, that doesn't seem to do anything.
How can I read the entire thing in one go into a slice of bytes?
Here's an outline for you. Note: In Go, CHECK FOR ERRORS!
package main

import (
    "bytes"
    "compress/zlib"
    "fmt"
    "io/ioutil"
)

func readSegment(data []byte, from, to int) ([]byte, error) {
    b := bytes.NewReader(data[from : from+to])
    z, err := zlib.NewReader(b)
    if err != nil {
        return nil, err
    }
    defer z.Close()
    p, err := ioutil.ReadAll(z)
    if err != nil {
        return nil, err
    }
    return p, nil
}

func main() {
    from, to := 0, 1000
    data := make([]byte, from+to)
    // ** parse input segments into data **
    p, err := readSegment(data, from, to)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(string(p))
}
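A hypothetical usage sketch for several segments (the (from, to) pairs here are made up; in practice they come from parsing the container format, and readSegment treats to as a length, per data[from : from+to] above):
segments := [][2]int{{0, 1000}, {1000, 2000}} // hypothetical offsets/lengths
for _, seg := range segments {
    p, err := readSegment(data, seg[0], seg[1])
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(string(p))
}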
Use ReadAll(r io.Reader) ([]byte, error) from the io/ioutil package, applied to the zlib reader z rather than the raw bytes.Reader:
p, err := ioutil.ReadAll(z)
fmt.Println(string(p))
Read only reads up to the length of the given slice, and your p has length 0: make([]byte, 0, 1024) sets the capacity to 1024 but the length to 0, so z.Read(p) reads nothing at all.
To read in chunks of 1024 bytes:
p := make([]byte, 1024)
for {
    numBytes, err := z.Read(p)
    if err == io.EOF {
        // you are done; numBytes might be less than len(p)
        break
    }
    // do what you want with p[:numBytes]
}
If you are getting the data from a web server, you might even do:
import (
    "compress/zlib"
    "io/ioutil"
    "net/http"
)
...
resp, errGet := http.Get("http://example.com/somefile")
// do error handling
defer resp.Body.Close()
z, errZ := zlib.NewReader(resp.Body)
// do error handling
p, err := ioutil.ReadAll(z)
// do error handling
since resp.Body happens to be an io.Reader, like most io-related types.
