I need my program to sit in the middle of a connection and relay data correctly in both directions. I wrote this code, but it does not work properly:
package main
import (
"fmt"
"net"
)
func main() {
listener, err := net.Listen("tcp", ":8120")
if err != nil {
fmt.Println(err)
return
}
defer listener.Close()
fmt.Println("Server is listening...")
for {
var conn1, conn2 net.Conn
var err error
conn1, err = listener.Accept()
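if err != nil {
fmt.Println(err)
continue // conn1 is nil when Accept fails, so there is nothing to close
}
conn2, err = net.Dial("tcp", "185.151.245.51:80")
if err != nil {
fmt.Println(err)
conn1.Close() // conn2 is nil when Dial fails; close the accepted connection instead
continue
}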
go handleConnection(conn1, conn2)
go handleConnection(conn2, conn1)
}
}
func handleConnection(conn1, conn2 net.Conn) {
defer conn1.Close()
for {
input := make([]byte, 1024)
n, err := conn1.Read(input)
if n == 0 || err != nil {
break
}
conn2.Write([]byte(input))
}
}
The problem is that the data gets corrupted. For example, when I compare the original file with the one I receive through my program, the beginning is fine, but the end of the received file is unreadable.
I tried changing the size of the input slice. If the size is between 1 and 7, everything is fine, but slow. If I make it very large, the corruption gets worse.
What am I doing wrong?
In handleConnection, you always write 1024 bytes, no matter what conn1.Read returns.
You want to write the data like this:
conn2.Write(input[:n])
You should also check your top-level for loop. Are you sure you're not accepting multiple connections and smushing them all together? I'd sprinkle in some log statements so you can see when connections are made and closed.
Another (probably inconsequential) mistake is that you treat n == 0 as a termination condition. The io.Reader documentation says callers should treat a return of 0 and nil as indicating that nothing happened (in particular, it does not indicate EOF), and that they should process any n > 0 bytes before considering the error. Without checking the code I can't be sure, but I expect that conn.Read never returns n == 0, err == nil, so it's unlikely that this is causing you trouble.
Although it doesn't affect correctness, you could also lift the definition of input out of the loop so that it's reused on each iteration; it's likely to reduce the amount of work the garbage collector has to do.
Go 1.12 on Linux 4.19.93 armv6l.
The hardware is a Raspberry Pi Zero W (BCM2835) running a Yocto Linux image.
I've got a GPIO-driven SRF04 proximity sensor handled by the srf04 Linux driver.
It works great over sysfs and the busybox shell.
# cat /sys/bus/iio/devices/iio:device0/in_distance_raw
1646
I've used Go before with IIO devices that support triggers and buffered output at high sample rates on this hardware platform. However for this application the srf04 driver doesn't implement those IIO features. Drat. I don't really feel like adding buffer / trigger support to the driver myself (at this time) since I do not have a need for a 'high' sample rate. A handful of pings per second should suffice for my purpose. I figure I'll calculate mean & std. dev. for a rolling window of data points and 'divine' the signal out of the noise.
So with that - I'd be perfectly happy to Read the bytes from the published sysfs file with Go.
Which brings me to the point of this post.
When I open the file for reading, and try to Read() any number of bytes, I always get a generic -EIO error.
func (s *Srf04) Read() (int, error) {
samp := make([]byte, 16)
f, err := os.OpenFile(s.readPath, os.O_RDONLY, os.ModeDevice)
if err != nil {
return 0, err
}
defer f.Close()
n, err := f.Read(samp)
if err != nil {
// This block is always executed.
// The error is never a timeout, and always 'input/output error' (-EIO aka -5)
log.Fatal(err)
}
...
}
This seems like strange behavior to me.
So I decided to mess with using io.ReadFull. This yielded unreliable results.
func (s *Srf04) Read() (int, error) {
samp := make([]byte, 16)
f, err := os.OpenFile(s.readPath, os.O_RDONLY, os.ModeDevice)
if err != nil {
return 0, err
}
defer f.Close()
for {
n, err := io.ReadFull(f, samp)
log.Println("ReadFull ", n, " bytes.")
if err == io.EOF {
break
}
if err != nil {
log.Println(err)
}
}
...
}
I ended up putting it in a loop, because I found that the behavior differs between one-off reads and several consecutive read calls. The loop exits on EOF and keeps trying to read otherwise.
The results are wildly unreliable and seemingly random. Sometimes I get the -5; other times I read between 2 and 5 bytes from the device. Sometimes I get bytes without an EOF. The bytes appear to be character data for numbers (each rune is in [0-9]), which is what I'd expect.
Aside: I expect this is related to file polling and the go blocking IO implementation, but I have no way to really tell.
As a temporary workaround, I tried using os/exec, and now I get the results I'd expect.
func (s *Srf04)Read() (int, error) {
out, err := exec.Command("cat", s.readPath).Output()
if err != nil {
return 0, err
}
return strconv.Atoi(strings.TrimSpace(string(out))) // strip cat's trailing newline before parsing
}
But yick. os/exec. Yuck.
I'd run that cat incantation under strace, look at which read(2) calls cat actually makes (including the number of bytes each one reads), and then try to re-create that behaviour in Go.
My own sheer guess at the problem's cause is that the driver (or the sysfs layer) is not too well prepared to deal with certain access patterns.
For a start, consider that GNU cat is not a simple-minded byte shoveler but is rather a reasonably tricky piece of software, which, among other things, considers optimal I/O block sizes for both input and output devices (if available), calls fadvise(2) etc. It's not that any of that gets actually used when you run it on your sysfs-exported file, but it may influence how the full stack (starting with the sysfs layer) performs in the case of using cat and with your code, respectively.
Hence my advice: start with strace-ing the cat and then try to re-create its usage pattern in your Go code; then try to come up with a minimal subset of that, which works; then profoundly comment your code ;-)
I'm sure I've been looking at this too long tonight, and this code is probably terrible. That said, here's the snippet of what I came up with that works just as reliably as the busybox cat, but in Go.
The Srf04 struct carries a few things, the important bits are included below:
type Srf04 struct {
readBuf []byte `json:"-"`
readFile *os.File `json:"-"`
samples *ring.Ring `json:"-"`
}
func (s *Srf04) Read() (int, error) {
/** Reliable, but really really slow.
out, err := exec.Command("cat", s.readPath).Output()
if err != nil {
log.Fatal(err)
}
val, err := strconv.Atoi(string(out[:len(out) - 2]))
if err == nil {
s.samples.Value = val
s.samples = s.samples.Next()
}
*/
// Seek should tell us the new offset (0) and no err.
bytesRead := 0
_, err := s.readFile.Seek(0, 0)
// Loop until N > 0 AND err != EOF && err != timeout.
if err == nil {
n := 0
for {
n, err = s.readFile.Read(s.readBuf[bytesRead:]) // append after the bytes already read
bytesRead += n
if os.IsTimeout(err) {
// bail out.
bytesRead = 0
break
}
if err == io.EOF {
// Success!
break
}
// Any other err means 'keep trying to read.'
}
}
if bytesRead > 0 {
val, err := strconv.Atoi(string(s.readBuf[:bytesRead-1]))
if err == nil {
fmt.Println(val)
s.samples.Value = val
s.samples = s.samples.Next()
}
return val, err
}
return 0, err
}
I am trying to write a program that finds duplicate files based on their MD5 checksums.
I'm not sure whether I'm missing something, but when this function reads the Xcode installer app (about 8 GB), it uses 16 GB of RAM:
func search() {
unique := make(map[string]string)
files, err := ioutil.ReadDir(".")
if err != nil {
log.Println(err)
}
for _, file := range files {
fileName := file.Name()
fmt.Println("CHECKING:", fileName)
fi, err := os.Stat(fileName)
if err != nil {
fmt.Println(err)
continue
}
if fi.Mode().IsRegular() {
data, err := ioutil.ReadFile(fileName)
if err != nil {
fmt.Println(err)
continue
}
sum := md5.Sum(data)
hexDigest := hex.EncodeToString(sum[:])
if _, ok := unique[hexDigest]; ok == false {
unique[hexDigest] = fileName
} else {
fmt.Println("DUPLICATE:", fileName)
}
}
}
}
From my debugging, the issue is the file reading.
Is there a better approach?
The crypto/md5 package documentation has an example that covers your case:
package main
import (
"crypto/md5"
"fmt"
"io"
"log"
"os"
)
func main() {
f, err := os.Open("file.txt")
if err != nil {
log.Fatal(err)
}
defer f.Close()
h := md5.New()
if _, err := io.Copy(h, f); err != nil {
log.Fatal(err)
}
fmt.Printf("%x", h.Sum(nil))
}
In your case, make sure to close each file inside the loop instead of deferring the Close, or move the per-file logic into its own function so that the deferred Close runs once per file.
Sounds like the 16GB RAM is your problem, not speed per se.
Don't read the entire file into a variable with ReadFile; io.Copy from the Reader that Open gives you to the Writer that hash/md5 provides (md5.New returns a hash.Hash, which embeds an io.Writer). That only copies a little bit at a time instead of pulling all of the file into RAM.
This is a trick useful in a lot of places in Go; packages like text/template, compress/gzip, net/http, etc. work in terms of Readers and Writers. With them, you don't usually need to create huge []bytes or strings; you can hook I/O interfaces up to each other and let them pass around pieces of content for you. In a garbage collected language, saving memory tends to save you CPU work as well.
When I try to copy from a Reader to a Writer manually, I notice that this works:
func fromAToB(a, b net.Conn) {
buf := make([]byte, 1024*32)
for {
n, err := a.Read(buf)
if n > 0 {
if err != nil {
log.Fatal(err)
}
b.Write(buf[0:n])
}
}
}
But this doesn't
func fromAToB(a, b net.Conn) {
buf := make([]byte, 1024*32)
for {
_, err := a.Read(buf)
if err != nil {
log.Fatal(err)
}
b.Write(buf)
}
}
So the questions are:
Why is the check if n>0 necessary?
Is this only necessary for net.Conn or for any type that implements the Reader and Writer interfaces?
EDIT: The second snippet runs without any runtime error, but its behavior is not correct. I want to know what effect the n > 0 check has and what happens under the hood when I remove it.
There's already a function io.Copy to do exactly this. You can see how it's implemented for a good example. It works with all io.Reader/io.Writer types.
I figured it out: without n, it will write the whole buffer (32*1024 bytes) to the Writer instead of just n bytes, and that's the source of the weird behavior.
I've modified the official documentation example for the zlib package to use an opened file rather than a set of hardcoded bytes (code below).
The code reads in the contents of a source text file and compresses it with the zlib package. I then try to read back the compressed file and print its decompressed contents into stdout.
The code doesn't error, but it also doesn't do what I expect, which is to print the decompressed file contents to stdout.
Also: is there another way of displaying this information, rather than using io.Copy?
package main
import (
"compress/zlib"
"io"
"log"
"os"
)
func main() {
var err error
// This defends against an error preventing `defer` from being called
// As log.Fatal otherwise calls `os.Exit`
defer func() {
if err != nil {
log.Fatalln("\nDeferred log: \n", err)
}
}()
src, err := os.Open("source.txt")
if err != nil {
return
}
defer src.Close()
dest, err := os.Create("new.txt")
if err != nil {
return
}
defer dest.Close()
zdest := zlib.NewWriter(dest)
defer zdest.Close()
if _, err := io.Copy(zdest, src); err != nil {
return
}
n, err := os.Open("new.txt")
if err != nil {
return
}
r, err := zlib.NewReader(n)
if err != nil {
return
}
defer r.Close()
io.Copy(os.Stdout, r)
err = os.Remove("new.txt")
if err != nil {
return
}
}
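Your deferred func can miss errors, because the statement if _, err := io.Copy(zdest, src); err != nil declares a new err that shadows the outer one, so the err your deferred func checks stays nil. If you want deferred calls to run before the program exits, move the work into a separate function that returns an error, and call log.Fatal in main after that function has returned.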
As for why you're not seeing any output, it's because you're deferring all the Close calls. The zlib.Writer doesn't write out its buffered data and the end of the compressed stream until Close is called, which with defer happens only after main returns. Call Close() explicitly where you need it.
zdest := zlib.NewWriter(dest)
if _, err := io.Copy(zdest, src); err != nil {
log.Fatal(err)
}
zdest.Close()
dest.Close()
I think you messed up the code logic with all this defer stuff and your "trick" err checking.
Buffered writers like the zlib.Writer only write everything out when they are flushed or closed. You copy into new.txt and open it again for reading without closing the writer first.
Deferring the closing of a file is neat inside a function that has multiple exits: it makes sure the file is closed once the function is left. But your main needs the writers for new.txt to be closed after the copy, before re-opening it. So don't defer the close here.
BTW: your defense against log.Fatal terminating the program without running your defers is, well, strange at best. The OS puts the files into a proper state anyway; there is absolutely no need to complicate things like this.
Check the error from the second Copy:
2015/12/22 19:00:33
Deferred log:
unexpected EOF
exit status 1
The thing is, you need to close zdest as soon as you've finished writing. Close it after the first Copy and it works.
I would suggest using io.MultiWriter.
That way you read src only once. There's not much gain for small files, but it is faster for bigger ones.
w := io.MultiWriter(zdest, os.Stdout)
I am trying to build a zip archive from a large number of small to medium-sized files. I want to be able to do this concurrently, since compression is CPU-intensive and I'm running on a multi-core server. I also don't want to hold the whole archive in memory, since it might turn out to be large.
My question is: do I have to compress every file myself and then manually combine everything with the zip headers, checksums, etc.?
Any help would be greatly appreciated.
I don't think you can combine the zip headers.
What you could do is run the zip.Writer sequentially in a separate goroutine, spawn a new goroutine for each file you want to read, and pipe those files to the goroutine that does the zipping.
This should reduce the IO overhead that you get by reading the files sequentially, although it probably won't leverage multiple cores for the archiving itself.
Here's a working example. Note that, to keep things simple,
it does not handle errors nicely, just panics if something goes wrong,
and it does not use the defer statement too much, to demonstrate the order in which things should happen.
Since defer is LIFO, it can sometimes be confusing when you stack a lot of them together.
package main
import (
"archive/zip"
"io"
"os"
"sync"
)
func ZipWriter(files chan *os.File) *sync.WaitGroup {
f, err := os.Create("out.zip")
if err != nil {
panic(err)
}
var wg sync.WaitGroup
wg.Add(1)
zw := zip.NewWriter(f)
go func() {
// Note the order (LIFO):
defer wg.Done() // 2. signal that we're done
defer f.Close() // 1. close the file
var err error
var fw io.Writer
for f := range files {
// Loop until channel is closed.
if fw, err = zw.Create(f.Name()); err != nil {
panic(err)
}
if _, err = io.Copy(fw, f); err != nil {
panic(err)
}
if err = f.Close(); err != nil {
panic(err)
}
}
// The zip writer must be closed *before* f.Close() is called!
if err = zw.Close(); err != nil {
panic(err)
}
}()
return &wg
}
func main() {
files := make(chan *os.File)
wait := ZipWriter(files)
// Send all files to the zip writer.
var wg sync.WaitGroup
wg.Add(len(os.Args)-1)
for i, name := range os.Args {
if i == 0 {
continue
}
// Read each file in parallel:
go func(name string) {
defer wg.Done()
f, err := os.Open(name)
if err != nil {
panic(err)
}
files <- f
}(name)
}
wg.Wait()
// Once we're done sending the files, we can close the channel.
close(files)
// This will cause ZipWriter to break out of the loop, close the file,
// and unblock wait.Wait() below:
wait.Wait()
}
Usage: go run example.go /path/to/*.log.
This is the order in which things should happen:
1. Open the output file for writing.
2. Create a zip.Writer with that file.
3. Kick off a goroutine listening for files on a channel.
4. Go through each file; this can be done in one goroutine per file.
5. Send each file to the goroutine created in step 3.
6. After processing each file in said goroutine, close the file to free up resources.
7. Once each file has been sent to said goroutine, close the channel.
8. Wait until the zipping has been done (which is done sequentially).
9. Once zipping is done (channel exhausted), the zip writer should be closed.
10. Only when the zip writer is closed should the output file be closed.
11. Finally, everything is closed, so call Done on the sync.WaitGroup to tell the calling function that we're good to go. (A channel could also be used here, but sync.WaitGroup seems more elegant.)
12. When you get the signal from the zip writer that everything is properly closed, you can exit from main and terminate nicely.
This might not answer your question, but some time ago I used similar code to generate zip archives on the fly for a web service. It performed quite well, even though the actual zipping was done in a single goroutine. Overcoming the I/O bottleneck can already be an improvement.
From the look of it, you won't be able to parallelise the compression using the standard library archive/zip package because:
Compression is performed by the io.Writer returned by zip.Writer.Create or CreateHeader.
Calling Create/CreateHeader implicitly closes the writer returned by the previous call.
So passing the writers returned by Create to multiple goroutines and writing to them in parallel will not work.
If you wanted to write your own parallel zip writer, you'd probably want to structure it something like this:
1. Have multiple goroutines compress files using the compress/flate package, and keep track of the CRC32 value and length of the uncompressed data. The output should be directed to temporary files. Note the compressed size of the data.
2. Once everything has been compressed, start writing the Zip file, starting with the header.
3. For each compressed file, write out the file header followed by the contents of the corresponding temporary file.
4. Write out the central directory record and end record at the end of the file. All the required information should be available at this point.
For added parallelism, step 1 could be performed in parallel with the remaining steps by using a channel to indicate when compression of each file completes.
Due to the file format, you won't be able to perform parallel compression without either storing compressed data in memory or in temporary files.
With Go 1.17, parallel compression and merging of zip files are possible using the archive/zip package.
An example is below. In it, several zip workers each write their own temporary zip file, and an entry-provider worker sends the entries to be added to the zip workers over a channel. Real files could be fed to the zip workers as well, but I skipped that part.
package main
import (
"archive/zip"
"context"
"fmt"
"io"
"log"
"os"
"strings"
"golang.org/x/sync/errgroup"
)
const numOfZipWorkers = 10
type entry struct {
name string
rc io.ReadCloser
}
func main() {
log.SetFlags(log.LstdFlags | log.Lshortfile)
entCh := make(chan entry, numOfZipWorkers)
zpathCh := make(chan string, numOfZipWorkers)
group, ctx := errgroup.WithContext(context.Background())
for i := 0; i < numOfZipWorkers; i++ {
group.Go(func() error {
return zipWorker(ctx, entCh, zpathCh)
})
}
group.Go(func() error {
defer close(entCh) // Signal workers to stop.
return entryProvider(ctx, entCh)
})
err := group.Wait()
if err != nil {
log.Fatal(err)
}
f, err := os.OpenFile("output.zip", os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0644)
if err != nil {
log.Fatal(err)
}
zw := zip.NewWriter(f)
close(zpathCh)
for path := range zpathCh {
zrd, err := zip.OpenReader(path)
if err != nil {
log.Fatal(err)
}
for _, zf := range zrd.File {
err := zw.Copy(zf)
if err != nil {
log.Fatal(err)
}
}
_ = zrd.Close()
_ = os.Remove(path)
}
err = zw.Close()
if err != nil {
log.Fatal(err)
}
err = f.Close()
if err != nil {
log.Fatal(err)
}
}
func entryProvider(ctx context.Context, entCh chan<- entry) error {
for i := 0; i < 2*numOfZipWorkers; i++ {
select {
case <-ctx.Done():
return ctx.Err()
case entCh <- entry{
name: fmt.Sprintf("file_%d", i+1),
rc: io.NopCloser(strings.NewReader(fmt.Sprintf("content %d", i+1))),
}:
}
}
return nil
}
func zipWorker(ctx context.Context, entCh <-chan entry, zpathch chan<- string) error {
f, err := os.CreateTemp(".", "tmp-part-*")
if err != nil {
return err
}
zw := zip.NewWriter(f)
Loop:
for {
var (
ent entry
ok bool
)
select {
case <-ctx.Done():
err = ctx.Err()
break Loop
case ent, ok = <-entCh:
if !ok {
break Loop
}
}
hdr := &zip.FileHeader{
Name: ent.name,
Method: zip.Deflate, // zip.Store can also be used.
}
hdr.SetMode(0644)
w, e := zw.CreateHeader(hdr)
if e != nil {
_ = ent.rc.Close()
err = e
break
}
_, e = io.Copy(w, ent.rc)
_ = ent.rc.Close()
if e != nil {
err = e
break
}
}
if e := zw.Close(); e != nil && err == nil {
err = e
}
if e := f.Close(); e != nil && err == nil {
err = e
}
if err == nil {
select {
case <-ctx.Done():
err = ctx.Err()
case zpathch <- f.Name():
}
}
return err
}