pkg/sftp Much Slower than Linux SCP, Why?

With standard scp (version: 1:7.6p1-4ubuntu0.3) on Ubuntu 18.04, transferring a 110GB file to another host takes roughly 6-8 minutes.
When using Go's pkg/sftp, the same transfer takes roughly twice as long:
Example:
package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"time"

	"github.com/melbahja/goph"
	"golang.org/x/crypto/ssh"
)

func main() {
	auth, err := goph.Key("your/key", "")
	if err != nil {
		log.Fatalln(err)
	}

	client, err := goph.NewConn(&goph.Config{
		User: "someUser",
		Addr: "someHostname",
		Port: 22,
		Auth: auth,
		//Timeout: time.Duration(timeout) * time.Second,
		Callback: ssh.InsecureIgnoreHostKey(),
	})
	if err != nil {
		log.Fatalln(err)
	}

	file := "/some/file"
	start := time.Now()

	local, err := os.Open(file)
	if err != nil {
		return
	}
	defer local.Close()

	ftp, err := client.NewSftp()
	// VARIATION 1 => ftp, err := client.NewSftp(sftp.MaxPacketUnchecked(1 << 16))
	if err != nil {
		return
	}
	defer ftp.Close()

	remote, err := ftp.Create(file)
	if err != nil {
		return
	}
	defer remote.Close()

	/*
		VARIATION 1 => buffer := make([]byte, 1 << 16)
		VARIATION 1 => _, err = io.CopyBuffer(remote, local, buffer)
	*/
	_, err = io.Copy(remote, local)
	if err != nil {
		log.Fatalln(err)
	}

	duration := time.Since(start)
	fmt.Println(duration)
}
Note: I even attempted to increase the size of the read buffer (and the TCP max packet size) with the commented-out lines (see VARIATION 1), and it made no difference.
Any ideas as to why, and how to speed up the Go equivalent?

I've found that pkg/sftp supports concurrent uploads, and the concurrent upload speed is quite similar to that of the Linux sftp command.
Here is the client initialization:
sftpConn, err := sftp.NewClient(sshConn,
	sftp.UseConcurrentReads(true),
	sftp.UseConcurrentWrites(true),
	sftp.MaxConcurrentRequestsPerFile(64),
	// Big max packet size can improve throughput.
	sftp.MaxPacketUnchecked(128*agentio.KiB),
	// Beware of customizing max packet size for download!
	// On download, big max packet size can cause "connection lost" error.
)
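(In the options above, agentio.KiB is a constant from the answer author's own helper package, presumably 1024.) The sshConn argument is an *ssh.Client from golang.org/x/crypto/ssh; a minimal sketch of obtaining one, where the host name, user, and key bytes are placeholders:
// keyBytes would be read from your private key file, e.g. with os.ReadFile.
signer, err := ssh.ParsePrivateKey(keyBytes)
if err != nil {
	log.Fatalln(err)
}

sshConn, err := ssh.Dial("tcp", "someHostname:22", &ssh.ClientConfig{
	User:            "someUser",
	Auth:            []ssh.AuthMethod{ssh.PublicKeys(signer)},
	HostKeyCallback: ssh.InsecureIgnoreHostKey(), // fine for testing; verify host keys in production
})
if err != nil {
	log.Fatalln(err)
}
defer sshConn.Close()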
Here is how you can use it; pay attention to ReadFrom:
func sftpUpload(c *sftp.Client, r io.Reader, path string) error {
	fp, err := c.Create(path)
	if err != nil {
		return fmt.Errorf("create destination file: %w", err)
	}
	defer fp.Close()

	_, err = fp.ReadFrom(r)
	return err
}
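For example, uploading a local file with this helper might look like the sketch below (uploadLocalFile is a made-up wrapper; an *os.File is a good fit here because, as explained next, ReadFrom can determine its size via Stat):
func uploadLocalFile(c *sftp.Client, localPath, remotePath string) error {
	f, err := os.Open(localPath)
	if err != nil {
		return err
	}
	defer f.Close()
	// *os.File has a Stat method, so ReadFrom can size the upload and use concurrent writes.
	return sftpUpload(c, f, remotePath)
}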
To use concurrent upload, the reader's underlying type has to be a *bytes.Reader, an *io.LimitedReader, or something satisfying one of the following interfaces:
Len() int
Size() int64
Stat() (os.FileInfo, error)
This is required to determine the number of bytes to upload.
If ReadFrom can't determine the reader's size, it falls back to a single-threaded upload (see the sketch below for one way to supply the size).
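For instance, inside a helper like sftpUpload above, if all you have is a plain io.Reader but you know the transfer size from elsewhere (say, a Content-Length header), wrapping it in an *io.LimitedReader is one way to let ReadFrom take the concurrent path; a minimal sketch, where size is an assumed, externally known value:
// size is the number of bytes to upload, known from elsewhere (assumption).
lr := &io.LimitedReader{R: r, N: size}
if _, err := fp.ReadFrom(lr); err != nil {
	return fmt.Errorf("upload: %w", err)
}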
Check the ReadFrom source code:
func (f *File) ReadFrom(r io.Reader) (int64, error) {
	f.mu.Lock()
	defer f.mu.Unlock()

	if f.c.useConcurrentWrites {
		var remain int64
		switch r := r.(type) {
		case interface{ Len() int }:
			remain = int64(r.Len())
		case interface{ Size() int64 }:
			remain = r.Size()
		case *io.LimitedReader:
			remain = r.N
		case interface{ Stat() (os.FileInfo, error) }:
			info, err := r.Stat()
			if err == nil {
				remain = info.Size()
			}
		}
		// ... (excerpt truncated)
Also, if the reader you pass to ReadFrom produces a lot of small pieces of data, throughput will be quite low even with concurrency.

Related

How to find server real IP

How do I find the public IP address of the machine or server my program is running on?
For example, when the program is executed it should detect the server's public IP and print something like: running at 123.45.67.89
The short answer is that there's no method guaranteed to return your "public" ip address.
The first question is, what is your public ip address? The address of your machine, as seen by the system to which you are connecting, may vary depending on how your local Internet service is configured and on the service to which you're connecting:
As I mentioned in a comment, in a typical home environment your machine doesn't have a public ip address. The public address is hosted by your router.
If you're accessing a service through a proxy or vpn, your machine's address may be entirely different from when you're directly connecting to a service.
On a system with multiple interfaces, the origin address selected may depend upon the address to which you are connecting: different addresses may have different routes.
You can try using a service like http://icanhazip.com/ to try to determine your public ip. This will be correct in many but not all situations.
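A minimal sketch of that approach (the service simply echoes back the address it saw your request come from):
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	resp, err := http.Get("https://icanhazip.com")
	if err != nil {
		log.Fatalln(err)
	}
	defer resp.Body.Close()

	// The response body is your apparent public address, followed by a newline.
	addr, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Printf("public address as seen by the service: %s", addr)
}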
A public IP address is a vague notion; in practice, it might or might not be a static address. It is just an endpoint that is valid for a certain amount of time, and it depends on many factors, such as which interface was used to issue the query.
We can use the mainline BitTorrent DHT to give us some indication.
The Go ecosystem provides the dht package written by anacrolix.
When querying nodes with a get_peers query, we receive a packet containing the remote IP address the peer has associated with our query. This is described in bep10.
If a UDP connection is not a good option, you might opt for a query to BitTorrent trackers as described in bep24.
Consider that peers might be malicious, so the more results, the better.
The program below outputs the list of external network addresses associated with the computer initiating the query, from the point of view of the cohort of nodes queried.
Addresses are scored by the number of responses.
See also https://www.bittorrent.org/beps/bep_0005.html
found 9 bootstrap peers
found 6 peers
4 [2001:861:51c5:xxx:40d1:8061:1fe0:xxx]:9090
2 81.96.42.191:9090
4 peers told us that we were using [2001:861:51c5:xxx:40d1:8061:1fe0:xxx]:9090; we can infer this is the IPv6 address.
2 of them told us we were using 81.96.42.191:9090, the IPv4 interface.
package main
import (
"encoding/json"
"errors"
"fmt"
"io/ioutil"
"log"
"net"
"os"
"sort"
"sync"
"time"
"github.com/anacrolix/dht"
"github.com/anacrolix/dht/krpc"
"github.com/anacrolix/torrent/bencode"
)
var maxTimeout = time.Second * 5
func main() {
b, _ := ioutil.ReadFile("db.json")
var rawAddrs []string
json.Unmarshal(b, &rawAddrs)
defer func() {
if len(rawAddrs) < 1 {
return
}
if len(rawAddrs) > 30 {
rawAddrs = rawAddrs[:30]
}
buf, err := json.Marshal(rawAddrs)
if err != nil {
panic(err)
}
err = ioutil.WriteFile("db.json", buf, os.ModePerm)
if err != nil {
panic(err)
}
fmt.Fprintf(os.Stderr, "%v peers recorded\n", len(rawAddrs))
}()
bootstrap, err := parseAddrs(rawAddrs)
if err != nil {
bootstrap, err = globalBootstrapAddrs()
if err != nil {
panic(err)
}
}
findPeers := []byte(`d1:ad2:id20:abcdefghij01234567899:info_hash20:mnopqrstuvwxyz123456e1:q9:get_peers1:t2:aa1:y1:qe`)
local, err := net.ResolveUDPAddr("udp", "0.0.0.0:9090")
if err != nil {
panic(err)
}
ln, err := net.ListenUDP("udp", local)
if err != nil {
panic(err)
}
addrscores := map[string]int{}
var drain drain
defer drain.Wait()
fmt.Fprintf(os.Stderr, "found %v bootstrap peers\n", len(bootstrap))
res, errs := readResponses(ln, len(bootstrap), sendQuery(ln, bootstrap, findPeers))
drain.Errors(errs)
peers := []net.Addr{}
for d := range res {
if isValidAddr(d.IP.UDP()) {
addrscores[d.IP.String()]++
d.R.ForAllNodes(func(arg1 krpc.NodeInfo) {
peers = append(peers, arg1.Addr.UDP())
})
}
}
if len(peers) > 0 {
fmt.Fprintf(os.Stderr, "found %v peers\n", len(peers))
res, errs = readResponses(ln, len(peers), sendQuery(ln, peers, findPeers))
drain.Errors(errs)
for d := range res {
if isValidAddr(d.IP.UDP()) {
addrscores[d.IP.String()]++
}
}
}
for _, peer := range peers {
if isValidAddr(peer) {
rawAddrs = append(rawAddrs, peer.String())
}
}
addrs := make([]string, 0, len(addrscores))
for addr := range addrscores {
addrs = append(addrs, addr)
}
sort.Slice(addrs, func(i int, j int) bool {
return addrscores[addrs[i]] > addrscores[addrs[j]]
})
for _, addr := range addrs {
fmt.Printf("%-4v %v\n", addrscores[addr], addr)
}
}
type drain struct{ sync.WaitGroup }
func (d *drain) Errors(errs <-chan error) {
d.Add(1)
go func() {
defer d.Done()
for err := range errs {
fmt.Fprintln(os.Stderr, err)
}
}()
}
func parseAddrs(rawAddrs []string) (addrs []net.Addr, err error) {
for _, s := range rawAddrs {
host, port, err := net.SplitHostPort(s)
if err != nil {
panic(err)
}
ua, err := net.ResolveUDPAddr("udp", net.JoinHostPort(host, port))
if err != nil {
log.Printf("error resolving %q: %v", host, err)
continue
}
addrs = append(addrs, ua)
}
if len(addrs) == 0 {
err = errors.New("nothing resolved")
}
return
}
func globalBootstrapAddrs() (addrs []net.Addr, err error) {
bootstrap, err := dht.GlobalBootstrapAddrs("udp")
if err != nil {
return nil, err
}
for _, b := range bootstrap {
addrs = append(addrs, b.Raw())
}
return
}
func isValidAddr(addr net.Addr) bool { // so weird guys.
return addr.String() != "<nil>" && addr.String() != ":0"
}
func sendQuery(ln *net.UDPConn, peers []net.Addr, query []byte) chan error {
errs := make(chan error)
for _, addr := range peers {
go func(addr net.Addr) {
_, err := ln.WriteTo(query, addr)
if err != nil {
errs <- addressedError{Op: "send", error: err, Addr: addr}
}
}(addr)
}
return errs
}
func readResponses(ln *net.UDPConn, count int, errs chan error) (<-chan krpc.Msg, <-chan error) {
data := make(chan krpc.Msg)
var wg sync.WaitGroup
for i := 0; i < count; i++ {
wg.Add(1)
go func() {
defer wg.Done()
buf := make([]byte, 1000)
ln.SetReadDeadline(time.Now().Add(maxTimeout))
n, remoteAddr, err := ln.ReadFromUDP(buf)
if err != nil {
errs <- addressedError{Op: "rcv", error: err, Addr: remoteAddr}
return
}
var m krpc.Msg
err = bencode.Unmarshal(buf[:n], &m)
if err != nil {
errs <- addressedError{Op: "rcv", error: err, Addr: remoteAddr}
return
}
data <- m
}()
}
go func() {
wg.Wait()
close(errs)
close(data)
}()
return data, errs
}
type addressedError struct {
error
Op string
Addr net.Addr
}
func (a addressedError) Error() string {
if !isValidAddr(a.Addr) {
return fmt.Sprintf("%-5v %v", a.Op, a.error.Error())
}
return fmt.Sprintf("%-5v %v: %v", a.Op, a.Addr.String(), a.error.Error())
}

Go GRPC Bidirectional Stream Performance

We are developing a high-frequency trading platform, and in one of our components we have implemented gRPC with Go. We need bidirectional streaming in one of our use cases, so we made a sample implementation as in the code below. However, when we tested the performance of the code by checking the difference between the timestamps of the logs
Recv Time %v Index: %v Num: %v
Send Time %v, Index: %v, Num: %v
we found that calling the stream's .Send method on the client side and receiving the same data by calling .Recv on the server side takes approximately 400-800 microseconds, which is too slow for us. We need at most 10-50 microseconds, and when we read the guidelines we saw that gRPC can reportedly go down to nanoseconds when both client and server are on the same machine (which is exactly our case).
So I think we are missing some options or performance tricks. Does anyone know what we can do to improve this?
Client Code:
package main
import (
"context"
"log"
"math/rand"
pb "github.com/pahanini/go-grpc-bidirectional-streaming-example/src/proto"
"time"
"google.golang.org/grpc"
)
func main() {
rand.Seed(time.Now().Unix())
// dial server
conn, err := grpc.Dial(":50005", grpc.WithInsecure())
if err != nil {
log.Fatalf("can not connect with server %v", err)
}
// create stream
client := pb.NewMathClient(conn)
stream, err := client.Max(context.Background())
if err != nil {
log.Fatalf("openn stream error %v", err)
}
var max int32
ctx := stream.Context()
done := make(chan bool)
msgCount := 100
fromMsg := 0
// first goroutine sends random increasing numbers to stream
// and closes it after msgCount iterations
go func() {
for i := 1; i <= msgCount; i++ {
// generate random number and send it to stream
rnd := int32(i)
req := pb.Request{Num: rnd}
if i-1 >= fromMsg {
sendTime := time.Now().UnixNano()
log.Printf("Send Time %v, Index: %v, Num: %v", sendTime,i-1,req.Num)
}
if err := stream.Send(&req); err != nil {
log.Fatalf("can not send %v", err)
}
//afterSendTime := time.Now().UnixNano()
//log.Printf("After Send Time %v", afterSendTime)
//log.Printf("---------------")
//log.Printf("%d sent", req.Num)
//time.Sleep(time.Millisecond * 200)
}
if err := stream.CloseSend(); err != nil {
log.Println(err)
}
}()
// third goroutine closes done channel
// if context is done
go func() {
<-ctx.Done()
if err := ctx.Err(); err != nil {
log.Println(err)
}
close(done)
}()
<-done
log.Printf("finished with max=%d", max)
}
Server Code:
package main
import (
"io"
"log"
"net"
"time"
pb "github.com/pahanini/go-grpc-bidirectional-streaming-example/src/proto"
"google.golang.org/grpc"
)
type server struct{}
func (s server) Max(srv pb.Math_MaxServer) error {
log.Println("start new server")
var max int32
ctx := srv.Context()
i := 0
fromMsg := 0
for {
// exit if context is done
// or continue
select {
case <-ctx.Done():
return ctx.Err()
default:
}
// receive data from stream
req, err := srv.Recv()
if err == io.EOF {
// return will close stream from server side
log.Println("exit")
return nil
}
if err != nil {
log.Printf("receive error %v", err)
continue
}
if i >= fromMsg {
recvTime := time.Now().UnixNano()
log.Printf("Recv Time %v Index: %v Num: %v", recvTime,i,req.Num)
}
i++
// continue if number received from stream
// less than max
if req.Num <= max {
continue
}
// update max and send it to stream
/*
max = req.Num
resp := pb.Response{Result: max}
if err := srv.Send(&resp); err != nil {
log.Printf("send error %v", err)
}
*/
//log.Printf("send new max=%d", max)
}
}
func main() {
// create listener
lis, err := net.Listen("tcp", ":50005")
if err != nil {
log.Fatalf("failed to listen: %v", err)
}
// create grpc server
s := grpc.NewServer()
pb.RegisterMathServer(s, server{})
// and start...
if err := s.Serve(lis); err != nil {
log.Fatalf("failed to serve: %v", err)
}
}

Read exactly n bytes unless EOF?

I'm using a function that returns an io.Reader to download a file from the Internet.
I want to process the file in chunks of exactly 2048 bytes until that's no longer possible because of EOF.
The io.ReadFull function is almost what I want:
buf := make([]byte, 2048)
for {
	if _, err := io.ReadFull(reader, buf); err == io.EOF {
		return io.ErrUnexpectedEOF
	} else if err != nil {
		return err
	}
	// Do processing on buf
}
The problem with this is that not all files are a multiple of 2048 bytes, so the last chunk may be only, say, 500 bytes; io.ReadFull will therefore return ErrUnexpectedEOF and the last chunk is discarded.
A function name to summarize what I want could be io.ReadFullUnlessLastChunk: ErrUnexpectedEOF is not returned if the reason buf cannot be filled with 2048 bytes is that the file hits EOF after, say, 500 bytes. In any other case, ErrUnexpectedEOF should be returned, as a problem has occurred.
What could I do to accomplish this?
Another problem is that reading only 2048 bytes at a time directly from the network carries a lot of overhead; if I could read 256 KB from the network into a buffer and then take the 2048 bytes I need from that buffer instead, that would be better.
For example,
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

func readChunks(r io.Reader) error {
	if _, ok := r.(*bufio.Reader); !ok {
		r = bufio.NewReader(r)
	}
	buf := make([]byte, 0, 2048)
	for {
		n, err := io.ReadFull(r, buf[:cap(buf)])
		buf = buf[:n]
		if err != nil {
			if err == io.EOF {
				break
			}
			if err != io.ErrUnexpectedEOF {
				return err
			}
		}
		// Process buf
		fmt.Println(len(buf))
	}
	return nil
}

func main() {
	fName := `test.file`
	f, err := os.Open(fName)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()

	err = readChunks(f)
	if err != nil {
		fmt.Println(err)
		return
	}
}
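Regarding the 256 KB buffering mentioned in the question: bufio.NewReader uses a 4096-byte buffer by default, so you could hand readChunks an explicitly sized reader instead. A small sketch, where body stands in for your download stream:
// Read from the network in ~256 KB pieces while still processing 2048-byte chunks.
br := bufio.NewReaderSize(body, 256*1024)
if err := readChunks(br); err != nil {
	fmt.Println(err)
}
Since readChunks only wraps readers that are not already a *bufio.Reader, the larger buffer is used as-is.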

Why does this Golang app use more memory the longer it runs?

I made this to monitor a few websites and notify me if one of them goes down. I'm testing it on just two URLs. When it starts, it uses about 5 MB of memory (I checked with systemctl status monitor). After 40 minutes, it's using 7.4 MB. After 8 hours, it uses over 50 MB of memory. Why is it doing this? Is this called a memory leak?
package main
import (
"fmt"
"io/ioutil"
"net/http"
"os"
"sync"
"time"
"monitor/utils/slack"
"gopkg.in/yaml.v2"
)
var config struct {
Frequency int
Urls []string
}
type statusType struct {
values map[string]int
mux sync.Mutex
}
var status = statusType{values: make(map[string]int)}
func (s *statusType) set(url string, value int) {
s.mux.Lock()
s.values[url] = value
s.mux.Unlock()
}
func init() {
data, err := ioutil.ReadFile("config.yaml")
if err != nil {
fmt.Printf("Invalid config: %s\n", err)
os.Exit(0)
}
err = yaml.Unmarshal(data, &config)
if err != nil {
fmt.Printf("Invalid config: %s\n", err)
os.Exit(0)
}
for _, url := range config.Urls {
status.set(url, 200)
}
}
func main() {
ticker := time.NewTicker(time.Duration(config.Frequency) * time.Second)
for _ = range ticker.C {
for _, url := range config.Urls {
go check(url)
}
}
}
func check(url string) {
res, err := http.Get(url)
if err != nil {
res = &http.Response{StatusCode: 500}
}
// the memory problem occurs when this condition is never satisfied, so I didn't post the slack package.
if res.StatusCode != status.values[url] {
status.set(url, res.StatusCode)
err := slack.Alert(url, res.StatusCode)
if err != nil {
fmt.Println(err)
}
}
}
If this belongs in Code Review then I will put it there.
Yes, this is a memory leak. One obvious source I can spot is that you're not closing the response bodies from your requests:
func check(url string) {
	res, err := http.Get(url)
	if err != nil {
		res = &http.Response{StatusCode: 500}
	} else {
		defer res.Body.Close() // You need to close the response body!
	}
	if res.StatusCode != status.values[url] {
		status.set(url, res.StatusCode)
		err := slack.Alert(url, res.StatusCode)
		if err != nil {
			fmt.Println(err)
		}
	}
}
Better still, so that Go can reuse the keep-alive connection, you want to read the full body before closing it:
defer func() {
	io.Copy(ioutil.Discard, res.Body)
	res.Body.Close()
}()
You can further analyse where memory usage is coming from by profiling your application with pprof. There's a good rundown on the Go blog and a web search will turn up many more articles on the topic.
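For instance, a minimal sketch of wiring pprof into a long-running program like this monitor (the localhost:6060 address is an arbitrary choice):
import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func init() {
	go func() {
		// Inspect live heap/goroutine profiles at http://localhost:6060/debug/pprof/
		// or run: go tool pprof http://localhost:6060/debug/pprof/heap
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
}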

Go file downloader

I have the following code, which is supposed to download a file by splitting it into multiple parts. Right now it only works on images; when I try downloading other files, like tar files, the output is an invalid file.
UPDATED:
Used os.WriteAt instead of os.Write and removed os.O_APPEND file mode.
package main
import (
"errors"
"flag"
"fmt"
"io/ioutil"
"log"
"net/http"
"os"
"strconv"
)
var file_url string
var workers int
var filename string
func init() {
flag.StringVar(&file_url, "url", "", "URL of the file to download")
flag.StringVar(&filename, "filename", "", "Name of downloaded file")
flag.IntVar(&workers, "workers", 2, "Number of download workers")
}
func get_headers(url string) (map[string]string, error) {
headers := make(map[string]string)
resp, err := http.Head(url)
if err != nil {
return headers, err
}
if resp.StatusCode != 200 {
return headers, errors.New(resp.Status)
}
for key, val := range resp.Header {
headers[key] = val[0]
}
return headers, err
}
func download_chunk(url string, out string, start int, stop int) {
client := new(http.Client)
req, _ := http.NewRequest("GET", url, nil)
req.Header.Add("Range", fmt.Sprintf("bytes=%d-%d", start, stop))
resp, _ := client.Do(req)
defer resp.Body.Close()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
log.Fatalln(err)
return
}
file, err := os.OpenFile(out, os.O_WRONLY, 0600)
if err != nil {
if file, err = os.Create(out); err != nil {
log.Fatalln(err)
return
}
}
defer file.Close()
if _, err := file.WriteAt(body, int64(start)); err != nil {
log.Fatalln(err)
return
}
fmt.Println(fmt.Sprintf("Range %d-%d: %d", start, stop, resp.ContentLength))
}
func main() {
flag.Parse()
headers, err := get_headers(file_url)
if err != nil {
fmt.Println(err)
} else {
length, _ := strconv.Atoi(headers["Content-Length"])
bytes_chunk := length / workers
fmt.Println("file length: ", length)
for i := 0; i < workers; i++ {
start := i * bytes_chunk
stop := start + (bytes_chunk - 1)
go download_chunk(file_url, filename, start, stop)
}
var input string
fmt.Scanln(&input)
}
}
Basically, it just reads the length of the file and divides it by the number of workers; each worker then downloads its part using HTTP's Range header and, after downloading, writes it at the position in the file where that chunk belongs.
If you really ignore as many errors as the code above does, then it cannot be expected to work reliably for any file type.
However, I can see one problem in your code: mixing O_APPEND and Seek is probably a mistake (Seek is ignored in that mode). I suggest using (*os.File).WriteAt instead.
IIRC, O_APPEND forces every write to happen at the [current] end of the file. However, your download_chunk function instances for the file parts can execute in unpredictable order, thus "reordering" the file parts. The result is then a corrupted file.
1. The order in which the goroutines run is not deterministic. For example, the output may look like this:
...
file length: 20902
Range 10451-20901: 10451
Range 0-10450: 10451
...
so the chunks can't simply be appended one after another.
2. Writes of the chunk data need to be synchronized (for example with a sync.Mutex).
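Putting those points together, one hedged sketch of a safer structure is to create the destination file once up front, let each worker write its own byte range, and wait for the workers with a sync.WaitGroup instead of fmt.Scanln. The downloadAll function below is a made-up wrapper around the question's download_chunk and assumes "sync" is added to the imports:
func downloadAll(url, out string, length, workers int) error {
	// Create (and truncate) the destination once, so workers never race to create it.
	f, err := os.Create(out)
	if err != nil {
		return err
	}
	f.Close()

	chunk := length / workers
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		start := i * chunk
		stop := start + chunk - 1
		if i == workers-1 {
			stop = length - 1 // the last worker also picks up any remainder
		}
		wg.Add(1)
		go func(start, stop int) {
			defer wg.Done()
			download_chunk(url, out, start, stop) // reuses download_chunk from the question
		}(start, stop)
	}
	wg.Wait()
	return nil
}
main would then call downloadAll(file_url, filename, length, workers) instead of looping over workers and waiting on fmt.Scanln.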
