I have the following code, which is supposed to download a file by splitting it into multiple parts. Right now it only works on images; when I try downloading other files, like tar files, the output is an invalid file.
UPDATED:
Used os.WriteAt instead of os.Write and removed the os.O_APPEND file mode.
package main

import (
    "errors"
    "flag"
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
    "os"
    "strconv"
)

var file_url string
var workers int
var filename string

func init() {
    flag.StringVar(&file_url, "url", "", "URL of the file to download")
    flag.StringVar(&filename, "filename", "", "Name of downloaded file")
    flag.IntVar(&workers, "workers", 2, "Number of download workers")
}

func get_headers(url string) (map[string]string, error) {
    headers := make(map[string]string)
    resp, err := http.Head(url)
    if err != nil {
        return headers, err
    }
    if resp.StatusCode != 200 {
        return headers, errors.New(resp.Status)
    }
    for key, val := range resp.Header {
        headers[key] = val[0]
    }
    return headers, err
}

func download_chunk(url string, out string, start int, stop int) {
    client := new(http.Client)
    req, _ := http.NewRequest("GET", url, nil)
    req.Header.Add("Range", fmt.Sprintf("bytes=%d-%d", start, stop))
    resp, _ := client.Do(req)
    defer resp.Body.Close()
    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        log.Fatalln(err)
        return
    }
    file, err := os.OpenFile(out, os.O_WRONLY, 0600)
    if err != nil {
        if file, err = os.Create(out); err != nil {
            log.Fatalln(err)
            return
        }
    }
    defer file.Close()
    if _, err := file.WriteAt(body, int64(start)); err != nil {
        log.Fatalln(err)
        return
    }
    fmt.Println(fmt.Sprintf("Range %d-%d: %d", start, stop, resp.ContentLength))
}

func main() {
    flag.Parse()
    headers, err := get_headers(file_url)
    if err != nil {
        fmt.Println(err)
    } else {
        length, _ := strconv.Atoi(headers["Content-Length"])
        bytes_chunk := length / workers
        fmt.Println("file length: ", length)
        for i := 0; i < workers; i++ {
            start := i * bytes_chunk
            stop := start + (bytes_chunk - 1)
            go download_chunk(file_url, filename, start, stop)
        }
        var input string
        fmt.Scanln(&input)
    }
}
func main() {
flag.Parse()
headers, err := get_headers(file_url)
if err != nil {
fmt.Println(err)
} else {
length, _ := strconv.Atoi(headers["Content-Length"])
bytes_chunk := length / workers
fmt.Println("file length: ", length)
for i := 0; i < workers; i++ {
start := i * bytes_chunk
stop := start + (bytes_chunk - 1)
go download_chunk(file_url, filename, start, stop)
}
var input string
fmt.Scanln(&input)
}
}
Basically, it reads the length of the file, divides it by the number of workers, and each worker downloads its chunk using HTTP's Range header; after downloading, the chunk is written at its offset in the output file.
If you ignore as many errors as the code above does, the program cannot be expected to work reliably for any file type.
However, I can see one problem in your code. I think that mixing O_APPEND and seeking is probably a mistake (Seek should be ignored in this mode). I suggest using (*os.File).WriteAt instead.
IIRC, O_APPEND forces every write to happen at the [current] end of the file. However, your download_chunk function instances for file parts can execute in unpredictable order, thus "reordering" the file parts. The result is then a corrupted file.
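To make that concrete, here is a minimal sketch of the WriteAt approach (hypothetical helper names; it uses io.ReadAll, which is ioutil.ReadAll before Go 1.16, and needs io and sync added to the imports). The output file is created and sized once up front, each worker writes its chunk at its own offset, and a sync.WaitGroup replaces the fmt.Scanln wait:

func downloadChunk(url, out string, start, stop int64, wg *sync.WaitGroup) {
    defer wg.Done()
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        log.Println(err)
        return
    }
    req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, stop))
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Println(err)
        return
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Println(err)
        return
    }
    // O_WRONLY without O_APPEND: WriteAt places the chunk at its own offset,
    // regardless of the order in which the goroutines finish.
    f, err := os.OpenFile(out, os.O_WRONLY, 0600)
    if err != nil {
        log.Println(err)
        return
    }
    defer f.Close()
    if _, err := f.WriteAt(body, start); err != nil {
        log.Println(err)
    }
}

And in main, something like:

    // Create and pre-size the file once, before any worker starts.
    f, err := os.Create(filename)
    if err != nil {
        log.Fatalln(err)
    }
    if err := f.Truncate(int64(length)); err != nil {
        log.Fatalln(err)
    }
    f.Close()

    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        start := int64(i * bytes_chunk)
        stop := start + int64(bytes_chunk) - 1
        if i == workers-1 {
            stop = int64(length) - 1 // pick up the remainder when length%workers != 0
        }
        wg.Add(1)
        go downloadChunk(file_url, filename, start, stop, &wg)
    }
    wg.Wait()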
1. The order in which the goroutines execute is not guaranteed. For example, the output may look like this:
...
file length: 20902
Range 10451-20901: 10451
Range 0-10450: 10451
...
so the chunks can't simply be appended.
2. Writes of the chunk data to the shared file must be synchronized, e.g. with a sync.Mutex.
I have a tar file that contains multiple tar files inside it. I'm currently extracting these tars recursively with the tar Reader, walking over the entries manually. This process is very heavy and slow, especially when dealing with large tar files that contain thousands of files and directories.
I didn't find any good package that can do this recursive extraction fast, and I tried using the command tar -xf file.tar --same-owner for the inner tars, but ran into a permissions issue (which happens only on macOS).
My question is:
Is there a way to parallelize the manual extraction process so that the inner tars are extracted in parallel?
I have a method for the extraction task which I'm trying to make parallel:
var wg sync.WaitGroup
wg.Add(len(tarFiles))
for {
    header, err := tarBallReader.Next()
    if err != nil {
        break
    }
    go extractFileAsync(parentFolder, header, tarBallReader, depth, &wg)
}
wg.Wait()
After adding the goroutines, the files get corrupted and the process is stuck in an endless loop.
Example of the main tar's content:
1d2755f3375860aaaf2b5f0474692df2e0d4329569c1e8187595bf4b3bf3f3b9/
1d2755f3375860aaaf2b5f0474692df2e0d4329569c1e8187595bf4b3bf3f3b9/VERSION
1d2755f3375860aaaf2b5f0474692df2e0d4329569c1e8187595bf4b3bf3f3b9/json
1d2755f3375860aaaf2b5f0474692df2e0d4329569c1e8187595bf4b3bf3f3b9/layer.tar
348188998f2a69b4ac0ca96b42990292eef67c0abfa05412e2fb7857645f4280/
348188998f2a69b4ac0ca96b42990292eef67c0abfa05412e2fb7857645f4280/VERSION
348188998f2a69b4ac0ca96b42990292eef67c0abfa05412e2fb7857645f4280/json
348188998f2a69b4ac0ca96b42990292eef67c0abfa05412e2fb7857645f4280/layer.tar
54c027bf04447fdb035ddc13a6ae5493a3f997bdd3577607b0980954522efb9e.json
9dd3c29af50daaf86744a8ade86ecf12f6a5a6ffc27a5a7398628e4a21770ee3/
9dd3c29af50daaf86744a8ade86ecf12f6a5a6ffc27a5a7398628e4a21770ee3/VERSION
9dd3c29af50daaf86744a8ade86ecf12f6a5a6ffc27a5a7398628e4a21770ee3/json
9dd3c29af50daaf86744a8ade86ecf12f6a5a6ffc27a5a7398628e4a21770ee3/layer.tar
b6c49400b643245cdbe17b7a7eb14f0f7def5a93326b99560241715c1e95502e/
b6c49400b643245cdbe17b7a7eb14f0f7def5a93326b99560241715c1e95502e/VERSION
b6c49400b643245cdbe17b7a7eb14f0f7def5a93326b99560241715c1e95502e/json
b6c49400b643245cdbe17b7a7eb14f0f7def5a93326b99560241715c1e95502e/layer.tar
c662ec0dc487910e7b76b2a4d67ab1a9ca63ce1784f636c2637b41d6c7ac5a1e/
c662ec0dc487910e7b76b2a4d67ab1a9ca63ce1784f636c2637b41d6c7ac5a1e/VERSION
c662ec0dc487910e7b76b2a4d67ab1a9ca63ce1784f636c2637b41d6c7ac5a1e/json
c662ec0dc487910e7b76b2a4d67ab1a9ca63ce1784f636c2637b41d6c7ac5a1e/layer.tar
da87454b77f6ac7fab1f465c10a07a1eb4b46df8058d98892794618cac8eacdc/
da87454b77f6ac7fab1f465c10a07a1eb4b46df8058d98892794618cac8eacdc/VERSION
da87454b77f6ac7fab1f465c10a07a1eb4b46df8058d98892794618cac8eacdc/json
da87454b77f6ac7fab1f465c10a07a1eb4b46df8058d98892794618cac8eacdc/layer.tar
ea1c2adfdc777d8746e50ad3e679789893a991606739c9bc7e01f273fa0b6e12/
ea1c2adfdc777d8746e50ad3e679789893a991606739c9bc7e01f273fa0b6e12/VERSION
ea1c2adfdc777d8746e50ad3e679789893a991606739c9bc7e01f273fa0b6e12/json
ea1c2adfdc777d8746e50ad3e679789893a991606739c9bc7e01f273fa0b6e12/layer.tar
f3b6608e814053048d79e519be79f654a2e9364dfdc8fb87b71e2fc57bbff115/
f3b6608e814053048d79e519be79f654a2e9364dfdc8fb87b71e2fc57bbff115/VERSION
f3b6608e814053048d79e519be79f654a2e9364dfdc8fb87b71e2fc57bbff115/json
f3b6608e814053048d79e519be79f654a2e9364dfdc8fb87b71e2fc57bbff115/layer.tar
manifest.json
repositories
Or you can simply run docker save <image>:<tag> -o image.tar and check the content of the tar.
Your code probably hangs on wg.Wait() because the number of calls to wg.Done() during execution is not equal to len(tarFiles).
This should work:
var wg sync.WaitGroup
// wg.Add(len(tarFiles))
for {
    header, err := tarBallReader.Next()
    if err != nil {
        break
    }
    wg.Add(1)
    go extractFileAsync(parentFolder, header, tarBallReader, depth, &wg)
}
wg.Wait()

func extractFileAsync(...) {
    defer wg.Done()
    // some code
}
UPD: corrected a possible race condition. Thanks @craigb.
Here is my solution to a similar problem (simplified):
package main

import (
    "archive/tar"
    "fmt"
    "io"
    "os"
    "path/filepath"
    "strings"
    "sync"
)

type Semaphore struct {
    Wg sync.WaitGroup
    Ch chan int
}

// Limit on the number of simultaneously running goroutines.
// Depends on the number of processor cores, storage performance, amount of RAM, etc.
const grMax = 10

const tarFileName = "docker_image.tar"
const dstDir = "output/docker"

func extractTar(tarFileName string, dstDir string) error {
    f, err := os.Open(tarFileName)
    if err != nil {
        return err
    }
    defer f.Close()
    sem := Semaphore{}
    sem.Ch = make(chan int, grMax)
    if err := Untar(dstDir, f, &sem, true); err != nil {
        return err
    }
    fmt.Println("extractTar: wait for complete")
    sem.Wg.Wait()
    return nil
}

func Untar(dst string, r io.Reader, sem *Semaphore, godeep bool) error {
    tr := tar.NewReader(r)
    for {
        header, err := tr.Next()
        switch {
        case err == io.EOF:
            return nil
        case err != nil:
            return err
        }
        // the target location where the dir/file should be created
        target := filepath.Join(dst, header.Name)
        switch header.Typeflag {
        // if it's a dir and it doesn't exist, create it
        case tar.TypeDir:
            if _, err := os.Stat(target); err != nil {
                if err := os.MkdirAll(target, 0755); err != nil {
                    return err
                }
            }
        // if it's a file, create it
        case tar.TypeReg:
            if err := saveFile(tr, target, os.FileMode(header.Mode)); err != nil {
                return err
            }
            ext := filepath.Ext(target)
            // if it's a tar file and we are on the top level, extract it
            if ext == ".tar" && godeep {
                sem.Wg.Add(1)
                // A buffered channel is used to limit the number of simultaneously running goroutines.
                sem.Ch <- 1
                // the file is unpacked to a directory with the file's name (without the extension)
                newDir := filepath.Join(dst, strings.TrimSuffix(header.Name, ".tar"))
                if err := os.Mkdir(newDir, 0755); err != nil {
                    return err
                }
                go func(target string, newDir string, sem *Semaphore) {
                    fmt.Println("start goroutine, chan length:", len(sem.Ch))
                    fmt.Println("START:", target)
                    defer sem.Wg.Done()
                    defer func() { <-sem.Ch }()
                    // open the inner tar file
                    ft, err := os.Open(target)
                    if err != nil {
                        fmt.Println(err)
                        return
                    }
                    defer ft.Close()
                    // the godeep parameter is false here to avoid unpacking archives inside the current archive
                    if err := Untar(newDir, ft, sem, false); err != nil {
                        fmt.Println(err)
                        return
                    }
                    fmt.Println("DONE:", target)
                }(target, newDir, sem)
            }
        }
    }
}

func saveFile(r io.Reader, target string, mode os.FileMode) error {
    f, err := os.OpenFile(target, os.O_CREATE|os.O_RDWR, mode)
    if err != nil {
        return err
    }
    defer f.Close()
    if _, err := io.Copy(f, r); err != nil {
        return err
    }
    return nil
}

func main() {
    err := extractTar(tarFileName, dstDir)
    if err != nil {
        fmt.Println(err)
    }
}
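Two design points worth noting about the code above: the buffered channel works as a counting semaphore, so at most grMax inner archives are unpacked concurrently, while the WaitGroup lets extractTar wait for all of them to finish; and each goroutine opens its own *os.File for the already-extracted inner tar, so no tar.Reader is ever shared between goroutines, which is what corrupted the files in the question's version.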
I am trying to create a .tar.gz file from a folder that contains multiple files/folders. Once the .tar.gz file gets created, the files are not extracted properly. I suspect it is because of long names or paths exceeding some number of characters, because the same thing works when the filename/path is short. I referred to https://github.com/golang/go/issues/17630 and tried to add the code below, but it did not help.
header.Uid = 0
header.Gid = 0
I am using the simple code seen below to create the .tar.gz. The approach is: I create a temp folder, do some processing on the files, and from that temp path I create the .tar.gz file; hence in the path below I am using a pre-defined temp folder path.
package main

import (
    "archive/tar"
    "compress/gzip"
    "fmt"
    "io"
    "log"
    "os"
    fp "path/filepath"
)

func main() {
    // Create output file
    out, err := os.Create("output.tar.gz")
    if err != nil {
        log.Fatalln("Error writing archive:", err)
    }
    defer out.Close()
    // Create the archive and write the output to the "out" Writer
    tmpDir := "C:/Users/USERNAME~1/AppData/Local/Temp/temp-241232063"
    err = createArchive1(tmpDir, out)
    if err != nil {
        log.Fatalln("Error creating archive:", err)
    }
    fmt.Println("Archive created successfully")
}

func createArchive1(path string, targetFile *os.File) error {
    gw := gzip.NewWriter(targetFile)
    defer gw.Close()
    tw := tar.NewWriter(gw)
    defer tw.Close()
    // walk through every file in the folder
    err := fp.Walk(path, func(filePath string, info os.FileInfo, err error) error {
        // ensure the src actually exists before trying to tar it
        if _, err := os.Stat(filePath); err != nil {
            return err
        }
        if err != nil {
            return err
        }
        if info.IsDir() {
            return nil
        }
        file, err := os.Open(filePath)
        if err != nil {
            return err
        }
        defer file.Close()
        // generate tar header
        header, err := tar.FileInfoHeader(info, info.Name())
        header.Uid = 0
        header.Gid = 0
        if err != nil {
            return err
        }
        header.Name = filePath //strings.TrimPrefix(filePath, fmt.Sprintf("%s/", fp.Dir(path))) //info.Name()
        // write header
        if err := tw.WriteHeader(header); err != nil {
            return err
        }
        if _, err := io.Copy(tw, file); err != nil {
            return err
        }
        return nil
    })
    return err
}
Please let me know what I am doing wrong.
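As a hedged observation on the walk callback above: header and err from tar.FileInfoHeader are used before err is checked, and header.Name is set to the absolute filePath, so on Windows the archive entries carry a drive prefix and backslashes, while tar header names are expected to be relative and slash-separated; that is a common cause of archives that extract oddly. A minimal sketch of the header setup with both points addressed:

        // generate tar header (check the error before touching the header)
        header, err := tar.FileInfoHeader(info, info.Name())
        if err != nil {
            return err
        }
        header.Uid = 0
        header.Gid = 0
        // Store a relative, slash-separated name instead of the absolute OS path.
        rel, err := fp.Rel(path, filePath)
        if err != nil {
            return err
        }
        header.Name = fp.ToSlash(rel)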
The process cannot access the file ... because it is being used by another process
I can't remove the zip file with this code.
Is it possible to extract and delete the file in one program?
Code
package main

import (
    "archive/zip"
    "fmt"
    "io"
    "log"
    "net/http"
    "os"
    "path/filepath"
    "strings"
)

func main() {
    url := "https://230c07c8-77b2-4c0d-9b82-8c6501a5bc45.filesusr.com/archives/b7572a_9ec985e0031042ef912cb40cafbe6376.zip?dn=7.zip"
    out, _ := os.Create("E:\\experi\\1234567890.zip")
    defer out.Close()
    resp, _ := http.Get(url)
    defer resp.Body.Close()
    _, _ = io.Copy(out, resp.Body)
    files, err := Unzip("E:\\experi\\1234567890.zip", "E:\\experi\\1234567890")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Unzipped the following files:\n" + strings.Join(files, "\n"))
}

func Unzip(src string, destination string) ([]string, error) {
    var filenames []string
    r, err := zip.OpenReader(src)
    if err != nil {
        return filenames, err
    }
    defer r.Close()
    for _, f := range r.File {
        fpath := filepath.Join(destination, f.Name)
        if !strings.HasPrefix(fpath, filepath.Clean(destination)+string(os.PathSeparator)) {
            return filenames, fmt.Errorf("%s is an illegal filepath", fpath)
        }
        filenames = append(filenames, fpath)
        if f.FileInfo().IsDir() {
            os.MkdirAll(fpath, os.ModePerm)
            continue
        }
        if err = os.MkdirAll(filepath.Dir(fpath), os.ModePerm); err != nil {
            return filenames, err
        }
        outFile, err := os.OpenFile(fpath,
            os.O_WRONLY|os.O_CREATE|os.O_TRUNC|os.O_RDWR,
            f.Mode())
        if err != nil {
            return filenames, err
        }
        rc, err := f.Open()
        if err != nil {
            return filenames, err
        }
        _, err = io.Copy(outFile, rc)
        outFile.Close()
        rc.Close()
        if err != nil {
            return filenames, err
        }
    }
    removeFile()
    return filenames, nil
}

func removeFile() {
    error := os.Remove("E:\\experi\\1234567890.zip")
    if error != nil {
        log.Fatal(error)
    }
}
Output
2020/10/28 13:09:04 remove E:\experi\1234567890.zip: The process cannot access the file because it is being used by another process.
Process finished with exit code 1
Is there any other way to do this same thing?
Did I go wrong anywhere?
Help would be much appreciated. Thanks in advance. :)
out, _ := os.Create("E:\\experi\\1234567890.zip") creates or truncates the file and returns you a *File (so the file is open).
defer out.Close() closes the file "the moment the surrounding function returns" (spec).
So at the time you call Unzip you have the file open. To fix this, call out.Close() before the call to Unzip (and please don't assume that calls complete without error).
If you close using defer, the file is only closed after the surrounding function has run to its last line. You must explicitly close the file before removing it.
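A minimal sketch of the corrected main (same paths as above; the downloaded file is closed explicitly before Unzip runs, so Windows no longer considers the zip in use when os.Remove is called inside Unzip):

func main() {
    url := "https://..." // same URL as above
    out, err := os.Create("E:\\experi\\1234567890.zip")
    if err != nil {
        log.Fatal(err)
    }
    resp, err := http.Get(url)
    if err != nil {
        out.Close()
        log.Fatal(err)
    }
    _, err = io.Copy(out, resp.Body)
    resp.Body.Close()
    out.Close() // close before unzipping/removing, instead of deferring
    if err != nil {
        log.Fatal(err)
    }
    files, err := Unzip("E:\\experi\\1234567890.zip", "E:\\experi\\1234567890")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Unzipped the following files:\n" + strings.Join(files, "\n"))
}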
With the code below I can download a file from the internet while monitoring the download progress.
How can I upload a file to the internet in the same way, monitoring the upload progress? I want to upload executable files to GitHub release assets.
package main

import (
    "fmt"
    "io"
    "net/http"
    "os"
    "strings"

    "github.com/dustin/go-humanize"
)

// WriteCounter counts the number of bytes written to it. It implements the io.Writer interface
// and we can pass this into io.TeeReader() which will report progress on each write cycle.
type WriteCounter struct {
    Total uint64
}

func (wc *WriteCounter) Write(p []byte) (int, error) {
    n := len(p)
    wc.Total += uint64(n)
    wc.PrintProgress()
    return n, nil
}

func (wc WriteCounter) PrintProgress() {
    // Clear the line by using a carriage return to go back to the start and remove
    // the remaining characters by filling the line with spaces
    fmt.Printf("\r%s", strings.Repeat(" ", 35))
    // Return again and print the current status of the download
    // We use the humanize package to print the bytes in a meaningful way (e.g. 10 MB)
    fmt.Printf("\rDownloading... %s complete", humanize.Bytes(wc.Total))
}

func main() {
    fmt.Println("Download Started")
    fileUrl := "https://upload.wikimedia.org/wikipedia/commons/d/d6/Wp-w4-big.jpg"
    err := DownloadFile("avatar.jpg", fileUrl)
    if err != nil {
        panic(err)
    }
    fmt.Println("Download Finished")
}

// DownloadFile will download a url to a local file. It's efficient because it will
// write as it downloads and not load the whole file into memory. We pass an io.TeeReader
// into Copy() to report progress on the download.
func DownloadFile(filepath string, url string) error {
    // Create the file, but give it a tmp file extension. This means we won't overwrite a
    // file until it's downloaded, and we'll remove the tmp extension once downloaded.
    out, err := os.Create(filepath + ".tmp")
    if err != nil {
        return err
    }
    // Get the data
    resp, err := http.Get(url)
    if err != nil {
        out.Close()
        return err
    }
    defer resp.Body.Close()
    // Create our progress reporter and pass it to be used alongside our writer
    counter := &WriteCounter{}
    if _, err = io.Copy(out, io.TeeReader(resp.Body, counter)); err != nil {
        out.Close()
        return err
    }
    // The progress output uses the same line, so print a new line once the download is finished
    fmt.Print("\n")
    // Close the file without defer so it can happen before Rename()
    out.Close()
    if err = os.Rename(filepath+".tmp", filepath); err != nil {
        return err
    }
    return nil
}
I just modified your code. It works with my file server.
func UploadFile(filepath string, url string) error {
    // Open the file to be uploaded
    out, err := os.Open(filepath)
    if err != nil {
        return err
    }
    // Create our progress reporter and pass it to be used alongside our reader
    counter := &WriteCounter{}
    // Post the data; the TeeReader counts the bytes as they are read from the file
    resp, err := http.Post(url, "multipart/form-data", io.TeeReader(out, counter))
    if err != nil {
        out.Close()
        log.Println(err.Error())
        return err
    }
    defer resp.Body.Close()
    // The progress output uses the same line, so print a new line once the upload is finished
    fmt.Print("\n")
    out.Close()
    return nil
}
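Note that this posts the raw file bytes with a multipart/form-data content type but no actual multipart encoding, which a simple file server may accept. If the server expects a real multipart form, here is a hedged sketch using mime/multipart and io.Pipe that keeps the streaming progress (the form field name "file" is an assumption; adjust it for your endpoint, and add mime/multipart and path/filepath to the imports):

func UploadMultipart(path string, url string) error {
    f, err := os.Open(path)
    if err != nil {
        return err
    }
    defer f.Close()

    pr, pw := io.Pipe()
    mw := multipart.NewWriter(pw)
    counter := &WriteCounter{}

    // Encode the multipart body in a goroutine so the request can stream it.
    go func() {
        part, err := mw.CreateFormFile("file", filepath.Base(path))
        if err != nil {
            pw.CloseWithError(err)
            return
        }
        // TeeReader reports progress as the file body is copied into the form.
        if _, err := io.Copy(part, io.TeeReader(f, counter)); err != nil {
            pw.CloseWithError(err)
            return
        }
        pw.CloseWithError(mw.Close())
    }()

    resp, err := http.Post(url, mw.FormDataContentType(), pr)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    fmt.Print("\n")
    return nil
}

The WriteCounter above still prints "Downloading..."; you may want to change that label for uploads.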
The ability to read (and write) a text file into and out of a string array is, I believe, a fairly common requirement. It is also quite useful when starting with a language, removing the initial need to access a database. Does one exist in Golang?
e.g.
func ReadLines(sFileName string, iMinLines int) ([]string, bool) {
and
func WriteLines(saBuff []string, sFilename string) (bool) {
I would prefer to use an existing one rather than duplicate.
As of the Go 1.1 release, there is a bufio.Scanner API that can easily read lines from a file. Consider the following example from above, rewritten with Scanner:
package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
)

// readLines reads a whole file into memory
// and returns a slice of its lines.
func readLines(path string) ([]string, error) {
    file, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer file.Close()

    var lines []string
    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        lines = append(lines, scanner.Text())
    }
    return lines, scanner.Err()
}

// writeLines writes the lines to the given file.
func writeLines(lines []string, path string) error {
    file, err := os.Create(path)
    if err != nil {
        return err
    }
    defer file.Close()

    w := bufio.NewWriter(file)
    for _, line := range lines {
        fmt.Fprintln(w, line)
    }
    return w.Flush()
}

func main() {
    lines, err := readLines("foo.in.txt")
    if err != nil {
        log.Fatalf("readLines: %s", err)
    }
    for i, line := range lines {
        fmt.Println(i, line)
    }

    if err := writeLines(lines, "foo.out.txt"); err != nil {
        log.Fatalf("writeLines: %s", err)
    }
}
Note: ioutil is deprecated as of Go 1.16.
If the file isn't too large, this can be done with the ioutil.ReadFile and strings.Split functions like so:
content, err := ioutil.ReadFile(filename)
if err != nil {
    //Do something
}
lines := strings.Split(string(content), "\n")
You can read the documentation on the ioutil and strings packages.
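Given the deprecation note above, the same thing on Go 1.16+ would use os.ReadFile as a direct substitution; everything else stays the same:

content, err := os.ReadFile(filename)
if err != nil {
    // handle the error
}
lines := strings.Split(string(content), "\n")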
I cannot update my first answer.
Anyway, after the Go 1 release there were some breaking changes, so I updated the code as shown below:
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "io"
    "os"
    "strings"
)

// Read a whole file into memory and store it as an array of lines
func readLines(path string) (lines []string, err error) {
    var (
        file   *os.File
        part   []byte
        prefix bool
    )
    if file, err = os.Open(path); err != nil {
        return
    }
    defer file.Close()
    reader := bufio.NewReader(file)
    buffer := bytes.NewBuffer(make([]byte, 0))
    for {
        if part, prefix, err = reader.ReadLine(); err != nil {
            break
        }
        buffer.Write(part)
        if !prefix {
            lines = append(lines, buffer.String())
            buffer.Reset()
        }
    }
    if err == io.EOF {
        err = nil
    }
    return
}

func writeLines(lines []string, path string) (err error) {
    var (
        file *os.File
    )
    if file, err = os.Create(path); err != nil {
        return
    }
    defer file.Close()
    for _, item := range lines {
        // assign to the named return value so a write error is reported to the caller
        _, err = file.WriteString(strings.TrimSpace(item) + "\n")
        if err != nil {
            fmt.Println(err)
            break
        }
    }
    return
}

func main() {
    lines, err := readLines("foo.txt")
    if err != nil {
        fmt.Printf("Error: %s\n", err)
        return
    }
    for _, line := range lines {
        fmt.Println(line)
    }
    err = writeLines(lines, "foo2.txt")
    fmt.Println(err)
}
You can use os.File (which implements the io.Reader interface) with the bufio package for that. However, those packages are built with fixed memory usage in mind (no matter how large the file is) and are quite fast.
Unfortunately this makes reading the whole file into memory a bit more complicated. You can use a bytes.Buffer to join the parts of a line if they exceed the line limit. Anyway, I recommend you try to use the line reader directly in your project (especially if you do not know how large the text file is!). But if the file is small, the following example might be sufficient for you:
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "os"
)

// Read a whole file into memory and store it as an array of lines
// (pre-Go 1 code: os.Error and os.EOF were replaced by error and io.EOF in Go 1)
func readLines(path string) (lines []string, err os.Error) {
    var (
        file   *os.File
        part   []byte
        prefix bool
    )
    if file, err = os.Open(path); err != nil {
        return
    }
    defer file.Close()
    reader := bufio.NewReader(file)
    buffer := bytes.NewBuffer(make([]byte, 0))
    for {
        if part, prefix, err = reader.ReadLine(); err != nil {
            break
        }
        buffer.Write(part)
        if !prefix {
            lines = append(lines, buffer.String())
            buffer.Reset()
        }
    }
    if err == os.EOF {
        err = nil
    }
    return
}

func main() {
    lines, err := readLines("foo.txt")
    if err != nil {
        fmt.Printf("Error: %s\n", err)
        return
    }
    for _, line := range lines {
        fmt.Println(line)
    }
}
Another alternative might be to use ioutil.ReadAll to read in the complete file at once and do the slicing by line afterwards. I don't give you an explicit example of how to write the lines back to the file, but that's basically an os.Create() followed by a loop similar to the one in the example (see main()).
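As a minimal sketch of the write-back this answer alludes to, using the modern os.WriteFile (ioutil.WriteFile on older Go):

// writeLines joins the lines with newlines and writes the file in one call.
func writeLines(lines []string, path string) error {
    return os.WriteFile(path, []byte(strings.Join(lines, "\n")+"\n"), 0644)
}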
func readToDisplayUsingFile1(f *os.File) {
    defer f.Close()
    reader := bufio.NewReader(f)
    contents, _ := ioutil.ReadAll(reader)
    // strings.Split takes a string separator, not a rune
    lines := strings.Split(string(contents), "\n")
    fmt.Println(lines)
}
or
func readToDisplayUsingFile1(f *os.File) {
    defer f.Close()
    slice := make([]string, 0)
    reader := bufio.NewReader(f)
    for {
        str, err := reader.ReadString('\n')
        if err == io.EOF {
            // note: str may still hold a final line with no trailing newline
            break
        }
        slice = append(slice, str)
    }
    fmt.Println(slice)
}