I am connecting my Go program to Redis using the redigo library. When I run a single request I get correct results. But on load testing with the Apache Bench tool, it works when:
ab -n 1000 -k -c 10 -p post.txt -T application/x-www-form-urlencoded http://localhost:8084/abcd
However, when the request is:
ab -n 1000 -k -c 15 -p post.txt -T application/x-www-form-urlencoded http://localhost:8084/abcd
I am getting this error:
panic: dial tcp :6379: too many open files
This is my code:
func newPool() *redis.Pool {
    return &redis.Pool{
        MaxIdle: 80, // max number of connections
        Dial: func() (redis.Conn, error) {
            c, err := redis.Dial("tcp", ":6379")
            if err != nil {
                panic(err.Error())
            }
            return c, err
        },
        // If Wait is true and the pool is at the MaxActive limit, then Get() waits
        // for a connection to be returned to the pool before returning
        Wait: true,
    }
}

var pool = newPool()

func checkError(err error) {
    if err != nil {
        log.Fatal(err)
    }
}

func func1(pool *redis.Pool, id int) string {
    c := pool.Get()
    defer c.Close()
    m, err := redis.String(c.Do("HGET", "key", id))
    checkError(err)
    return m
}

func func2(pool *redis.Pool, string_ids []string) chan string {
    c := make(chan string)
    var ids []int
    var temp int
    for _, v := range string_ids {
        temp, _ = strconv.Atoi(v)
        ids = append(ids, temp)
    }
    go func() {
        var wg sync.WaitGroup
        wg.Add(len(ids))
        for _, v := range ids {
            go func(id int) {
                defer wg.Done()
                c <- func1(pool, id)
            }(v)
        }
        wg.Wait()
        close(c)
    }()
    return c
}

func getReq(w http.ResponseWriter, req *http.Request) {
    err := req.ParseForm()
    checkError(err)
    ids := req.Form["ids[]"]
    for v := range func2(pool, ids) {
        fmt.Println(v)
    }
}

func main() {
    http.HandleFunc("/abcd", getReq)
    log.Fatal(http.ListenAndServe(":8084", nil))
}
How can I handle at least 40 concurrent requests with the Apache Bench tool?
Note: I haven't changed anything in my redis conf file.
I am getting the following response when running Apache Bench; only 15 requests are completed:
$ ab -n 1000 -k -c 15 -p post.txt -T application/x-www-form-urlencoded http://localhost:8084/abcd
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
apr_socket_recv: Connection refused (111)
Total of 15 requests completed
To fix the issue immediately, set MaxActive on your redis.Pool, which you mention in the comments but never actually set.
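For illustration, a minimal sketch of the pool with the cap in place (the value 12000 is an arbitrary example, not a recommendation; keep it below your process's file-descriptor limit):

func newPool() *redis.Pool {
    return &redis.Pool{
        MaxIdle:   80,
        MaxActive: 12000, // cap on the total number of connections the pool opens
        Dial: func() (redis.Conn, error) {
            return redis.Dial("tcp", ":6379")
        },
        // With Wait true, Get() blocks at the MaxActive limit instead of
        // dialing yet another connection.
        Wait: true,
    }
}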
Fundamentally though, you should not be dispatching goroutines for every id lookup. Your maximum possible concurrency would then be (number of client connections) x (number of ids in the request), each of which can open a new redis connection. It would be far faster and more efficient to have a single redis connection read the ids serially. There's no need for any of the extra concurrency here: do it all serially from the handler, and don't convert the strings to ints, since redis only operates on string keys to begin with.
func getReq(w http.ResponseWriter, req *http.Request) {
    err := req.ParseForm()
    checkError(err)
    c := pool.Get()
    defer c.Close()
    for _, id := range req.Form["ids[]"] {
        m, err := redis.String(c.Do("HGET", "key", id))
        checkError(err)
        fmt.Println(m)
    }
}
If you want to optimize this further, you can use pipelines to reduce the round trips to the redis server.
ids := req.Form["ids[]"]
for _, id := range ids {
    c.Send("HGET", "key", id)
}
c.Flush()
for range ids {
    m, err := redis.String(c.Receive()) // one Receive per Send
    checkError(err)
    fmt.Println(m)
}
...
Problem
I have written a TCP echo server in Go, and I am trying to write/read as often as I can in 10 seconds to measure how much data gets transferred in that time. Weirdly, the value is way too high and does not depend on the length of the byte array I am transferring (but it should!). It is always around 600k iterations in those 10 seconds (the length of the result slice shows how many write/read round trips were made in the 10s). As soon as I add, say, a print statement to the server and the values get processed, I get more realistic values that depend on the length of the byte array.
Why doesn't the length of the byte array matter in the first case?
Code
Server
package main

import (
    "fmt"
    "log"
    "net"
)

func main() {
    tcpAddr, err := net.ResolveTCPAddr("tcp", fmt.Sprintf("127.0.0.1:8888"))
    checkError(err)
    ln, err := net.ListenTCP("tcp", tcpAddr)
    checkError(err)
    for {
        conn, err := ln.Accept()
        checkError(err)
        go handleConnection(conn)
    }
}

func checkError(err error) {
    if err != nil {
        log.Fatal(err)
    }
}

func handleConnection(conn net.Conn) {
    var input [1000000]byte
    for {
        n, err := conn.Read(input[0:])
        checkError(err)
        //fmt.Println(input[0:n])
        _, err = conn.Write(input[0:n])
        checkError(err)
    }
}
Client
package main

import (
    "fmt"
    "log"
    "net"
    "time"
)

var (
    result  []int
    elapsed time.Duration
)

func main() {
    input := make([]byte, 1000)
    tcpAddr, err := net.ResolveTCPAddr("tcp", "127.0.0.1:8888")
    checkError(err)
    conn, err := net.DialTCP("tcp", nil, tcpAddr)
    checkError(err)
    for start := time.Now(); time.Since(start) < time.Second*time.Duration(10); {
        startTimer := time.Now()
        _, err = conn.Write(input)
        checkError(err)
        _, err := conn.Read(input[0:])
        checkError(err)
        elapsed = time.Since(startTimer)
        result = append(result, int(elapsed))
    }
    fmt.Println(fmt.Sprintf("result: %v", len(result)))
}

func checkError(err error) {
    if err != nil {
        log.Fatal(err)
    }
}
Read in the client loop is not guaranteed to read all of the data sent in the previous call to Write.
When input is small enough to be transmitted in a single packet on the network, Read in the client returns all of the data in the previous call to Write in the client. In this mode, the application measures the time to execute request/response pairs.
For larger sizes of input, Read on the client can fall behind what the client is writing. When this happens, the calls to Read complete faster because they return data from an earlier call to Write. The application is pipelining in this mode. The throughput for pipelining is higher than the throughput for request/response pairs. The client will not read all of the data in this mode, but the timing impact of that is not significant.
Use the following code to time request/response pairs for arbitrary sizes of input.
for start := time.Now(); time.Since(start) < time.Second*time.Duration(10); {
    startTimer := time.Now()
    _, err = conn.Write(input)
    checkError(err)
    _, err := io.ReadFull(conn, input) // <-- read all of the data
    checkError(err)
    elapsed = time.Since(startTimer)
    result = append(result, int(elapsed))
}
To measure full-on pipelining, modify the client to read and write from different goroutines. An example follows.
go func() {
    for start := time.Now(); time.Since(start) < time.Second*time.Duration(10); {
        _, err = conn.Write(input)
        checkError(err)
    }
    conn.CloseWrite() // tell server that we are done sending data
}()

start := time.Now()
output := make([]byte, 4096)
for {
    _, err := conn.Read(output)
    if err != nil {
        if err == io.EOF {
            break
        }
        checkError(err)
    }
}
fmt.Println(time.Since(start))
I am making a golang git bruteforcer. It's acting a bit weird; I guess it's got something to do with concurrency.
Here's the code : https://dpaste.org/vO7y
package main

import ( <snipped for brevity> )

// ReadFile : Reads File and returns its contents
func ReadFile(fileName string) []string { <snipped for brevity> }

func joinString(strs ...string) string { <snipped for brevity> }

// MakeRequest : Makes requests concurrently
func MakeRequest(client *http.Client, url string, useragent string, ch chan<- string, wg *sync.WaitGroup) {
    defer wg.Done()
    // start := time.Now()
    request, err := http.NewRequest("GET", url, nil)
    if err != nil {
        fmt.Println(err)
        return
    }
    request.Header.Set("User-Agent", useragent)
    response, err := client.Do(request)
    if err != nil {
        return
    }
    // secs := time.Since(start).Seconds()
    if response.StatusCode < 400 {
        // fmt.Printf("Time elapsed %f", secs)
        bodyBytes, err := ioutil.ReadAll(response.Body)
        if err != nil {
            log.Fatal(err)
        }
        defer response.Body.Close()
        bodyString := string(bodyBytes)
        notGit, err := regexp.MatchString("<html>", strings.ToLower(bodyString))
        if !notGit && len(bodyString) > 0 { // empty pages and html pages shouldn't be included
            fmt.Println(bodyString)
            ch <- fmt.Sprintf(" %s ", Green(url))
        }
    }
}
func main() {
    start := time.Now()
    useragent := "Mozilla/10.0 (Windows NT 10.0) AppleWebKit/538.36 (KHTML, like Gecko) Chrome/69.420 Safari/537.36"
    gitEndpoint := []string{"/.git/", "/.git/HEAD", "/.gitignore", "/.git/description", "/.git/index"}
    timeout := 10 * time.Second
    var tr = &http.Transport{
        MaxIdleConns:      30,
        IdleConnTimeout:   time.Second,
        DisableKeepAlives: true,
        TLSClientConfig:   &tls.Config{InsecureSkipVerify: true},
        DialContext: (&net.Dialer{
            Timeout:   timeout,
            KeepAlive: time.Second,
        }).DialContext,
    }
    re := func(req *http.Request, via []*http.Request) error {
        return http.ErrUseLastResponse
    }
    client := &http.Client{
        Transport:     tr,
        CheckRedirect: re,
        Timeout:       timeout,
    }
    output := ReadFile(os.Args[1])
    // start := time.Now()
    ch := make(chan string)
    var wg sync.WaitGroup
    for _, url := range output {
        for _, endpoint := range gitEndpoint {
            wg.Add(1)
            go MakeRequest(client, "https://"+url+endpoint, useragent, ch, &wg)
        }
    }
    go func() {
        wg.Wait()
        close(ch)
    }()
    f, err := os.OpenFile("git_finder.txt", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
    for val := range ch {
        if err != nil {
            fmt.Println(Red(err))
        }
        _, err = fmt.Fprintln(f, val)
        fmt.Println(val)
    }
    f.Close()
    fmt.Printf("Total time taken %.2fs elapsed\n", time.Since(start).Seconds())
}
Working:
It reads URLs from a file and checks for /.git/, /.git/HEAD, /.gitignore, /.git/description and /.git/index on the webserver.
Problem:
If I change the http.Client timeout to 2 seconds, it will finish in 2 seconds; if it's 50 seconds, it will wait until 50 seconds. It doesn't matter whether the input file contains 10 URLs or 500 URLs.
My understanding is that with more URLs, it will wait until the timeout of the last URL that's passed to a goroutine.
Update 1:
As Adrian mentioned in the comments, it doesn't look like a concurrency problem. That's one of the main issues here: I can't put a finger on what the exact problem is.
In your code, you are reading URLs from a file, then firing requests in parallel to all those URLs, then waiting for all the parallel requests to finish.
So this actually makes sense and would not indicate an issue:
If I change the http.Client timeout to 2 seconds, it will finish in 2 seconds; if it's 50 seconds, it will wait until 50 seconds. It doesn't matter whether the input file contains 10 URLs or 500 URLs.
Let's say your file has 500 URLs.
You fire the 500 requests in parallel... then wait for all of them to finish (remember, they are all executing in parallel). How long would that take?
In the worst case (all of the requests time out at 50 seconds), it will take just 50 seconds in total (since they are all waiting out those 50 seconds in parallel).
In the best case (all requests go through successfully with no timeouts), it should take a few seconds.
In the average case, which you are probably seeing (a few requests time out at 50 seconds), it takes 50 seconds: you are waiting for those few requests to sit out their 50 seconds in parallel, as in the worst case.
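If you instead want the total runtime to grow with the number of URLs, bound the concurrency. Here is a minimal sketch against the question's code (it reuses the question's output, gitEndpoint, client, useragent, ch and wg; the limit of 20 in-flight requests is an arbitrary choice):

// Use a buffered channel as a semaphore so at most 20 requests run at once;
// total time then scales with the number of URLs rather than a single timeout.
sem := make(chan struct{}, 20)
for _, url := range output {
    for _, endpoint := range gitEndpoint {
        wg.Add(1)
        sem <- struct{}{} // acquire a slot; blocks while 20 requests are in flight
        go func(u string) {
            defer func() { <-sem }() // release the slot when this request finishes
            MakeRequest(client, u, useragent, ch, &wg) // MakeRequest calls wg.Done itself
        }("https://" + url + endpoint)
    }
}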
The aws s3 sync command in the CLI can download a large collection of files very quickly, and I cannot achieve the same performance with the AWS Go SDK. I have millions of files in the bucket, so this is critical to me. I also need to use the list-pages call so that I can add a prefix, which the sync CLI command does not support well.
I have tried using multiple goroutines (from 10 up to 1000) to make requests to the server, but the time is just so much slower compared to the CLI. It takes about 100 ms per file to run the Go GetObject function, which is unacceptable for the number of files that I have. I know that the AWS CLI uses the Python SDK in the backend, so how does it have so much better performance (I tried my script in boto as well as Go)?
I am using ListObjectsV2Pages and GetObject. My region is the same as the S3 server's.
logMtx := &sync.Mutex{}
logBuf := bytes.NewBuffer(make([]byte, 0, 100000000))

err = s3c.ListObjectsV2Pages(
    &s3.ListObjectsV2Input{
        Bucket:  bucket,
        Prefix:  aws.String("2019-07-21-01"),
        MaxKeys: aws.Int64(1000),
    },
    func(page *s3.ListObjectsV2Output, lastPage bool) bool {
        fmt.Println("Received", len(page.Contents), "objects in page")
        worker := make(chan bool, 10)
        for i := 0; i < cap(worker); i++ {
            worker <- true
        }
        wg := &sync.WaitGroup{}
        wg.Add(len(page.Contents))
        objIdx := 0
        objIdxMtx := sync.Mutex{}
        for {
            <-worker
            objIdxMtx.Lock()
            if objIdx == len(page.Contents) {
                break
            }
            go func(idx int, obj *s3.Object) {
                gs := time.Now()
                resp, err := s3c.GetObject(&s3.GetObjectInput{
                    Bucket: bucket,
                    Key:    obj.Key,
                })
                check(err)
                fmt.Println("Get: ", time.Since(gs))
                rs := time.Now()
                logMtx.Lock()
                _, err = logBuf.ReadFrom(resp.Body)
                check(err)
                logMtx.Unlock()
                fmt.Println("Read: ", time.Since(rs))
                err = resp.Body.Close()
                check(err)
                worker <- true
                wg.Done()
            }(objIdx, page.Contents[objIdx])
            objIdx += 1
            objIdxMtx.Unlock()
        }
        fmt.Println("ok")
        wg.Wait()
        return true
    },
)
check(err)
Many results look like:
Get: 153.380727ms
Read: 51.562µs
Have you tried using https://docs.aws.amazon.com/sdk-for-go/api/service/s3/s3manager/?
iter := new(s3manager.DownloadObjectsIterator)
var files []*os.File
defer func() {
    for _, f := range files {
        f.Close()
    }
}()

err := client.ListObjectsV2PagesWithContext(ctx, &s3.ListObjectsV2Input{
    Bucket: aws.String(bucket),
    Prefix: aws.String(prefix),
}, func(output *s3.ListObjectsV2Output, last bool) bool {
    for _, object := range output.Contents {
        nm := filepath.Join(dstdir, *object.Key)
        err := os.MkdirAll(filepath.Dir(nm), 0755)
        if err != nil {
            panic(err)
        }
        f, err := os.Create(nm)
        if err != nil {
            panic(err)
        }
        log.Println("downloading", *object.Key, "to", nm)
        iter.Objects = append(iter.Objects, s3manager.BatchDownloadObject{
            Object: &s3.GetObjectInput{
                Bucket: aws.String(bucket),
                Key:    object.Key,
            },
            Writer: f,
        })
        files = append(files, f)
    }
    return true
})
if err != nil {
    panic(err)
}

downloader := s3manager.NewDownloader(s)
err = downloader.DownloadWithIterator(ctx, iter)
if err != nil {
    panic(err)
}
I ended up settling for my script in the initial post. I tried 20 goroutines and that seemed to work pretty well. On my laptop (i7 with 8 threads, 16 GB RAM, NVMe), the initial script is definitely slower than the CLI. However, on an EC2 instance the difference was small enough that it was not worth my time to optimize it further. I used a c5.xlarge instance in the same region as the S3 server.
Apache Bench is a popular load-testing tool.
I wrote a tool with the same function in Go myself, and then ran some tests.
It seems to work well.
But by accident I found a difference between my tool and Apache Bench:
When checking the number of ESTABLISHED connections with the cmd netstat -na|grep ESTABLISHED|wc -l:
With the cmd ab -n 128000 -c 128 http://127.0.0.1:8000/, the cmd above returns a number near 128.
But with my own tool, the cmd returns a number near 256 (when concurrency is set to 128).
Why does my own tool open twice the expected number of connections?
My code:
func request(cli *http.Client, uri string) {
    resp, err := cli.Get(uri)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    if resp.StatusCode != 200 {
        panic("return" + strconv.Itoa(resp.StatusCode))
    }
}

func workRoutine(ch chan string, wg *sync.WaitGroup) {
    cli := &http.Client{}
    for uri := range ch {
        request(cli, uri)
    }
    wg.Done()
}

func main() {
    rnum := 128
    tnum := 128000
    url := "http://127.0.0.1:8000/"

    ch := make(chan string)
    var wg sync.WaitGroup

    go func() {
        for i := 0; i < tnum; i++ {
            ch <- url
        }
        close(ch)
    }()

    wg.Add(rnum)
    for i := 0; i < rnum; i++ {
        go workRoutine(ch, &wg)
    }

    stime := time.Now()
    wg.Wait()
    dtime := time.Now().Sub(stime)

    fmt.Println("Timecost", dtime)
    fmt.Println("Throughputs", float64(tnum)/dtime.Seconds())
}
I have a for loop that calls a function, runCommand(), which runs a remote command on a switch and prints the output. The function is called in a goroutine on each iteration, and I am using a sync.WaitGroup to synchronize the goroutines. Now I need a way to capture the output and any errors from my runCommand() function into a channel. I have read many articles and watched a lot of videos on using channels with goroutines, but this is the first time I have written a concurrent application, and I can't seem to wrap my head around the idea.
Basically, my program takes in a list of hostnames from the command line then asynchronously connects to each host, runs a configuration command on it, and prints the output. It is ok for my program to continue configuring the remaining hosts if one has an error.
How would I idiomatically send the output or error(s) of each call to runCommand() to a channel then receive the output or error(s) for printing?
Here is my code:
package main

import (
    "fmt"
    "os"
    "sync"
    "time"

    "golang.org/x/crypto/ssh"
)

func main() {
    hosts := os.Args[1:]
    clientConf := configureClient("user", "password")
    var wg sync.WaitGroup
    for _, host := range hosts {
        wg.Add(1)
        go runCommand(host, &clientConf, &wg)
    }
    wg.Wait()
    fmt.Println("Configuration complete!")
}

// Run a remote command
func runCommand(host string, config *ssh.ClientConfig, wg *sync.WaitGroup) {
    defer wg.Done()
    // Connect to the client
    client, err := ssh.Dial("tcp", host+":22", config)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer client.Close()
    // Create a session
    session, err := client.NewSession()
    if err != nil {
        fmt.Println(err)
        return
    }
    defer session.Close()
    // Get the session output
    output, err := session.Output("show lldp ne")
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Print(string(output))
    fmt.Printf("Connection to %s closed.\n", host)
}

// Set up client configuration
func configureClient(user, password string) ssh.ClientConfig {
    var sshConf ssh.Config
    sshConf.SetDefaults()
    // Append supported ciphers
    sshConf.Ciphers = append(sshConf.Ciphers, "aes128-cbc", "aes256-cbc", "3des-cbc", "des-cbc", "aes192-cbc")
    // Create client config
    clientConf := &ssh.ClientConfig{
        Config:          sshConf,
        User:            user,
        Auth:            []ssh.AuthMethod{ssh.Password(password)},
        HostKeyCallback: ssh.InsecureIgnoreHostKey(),
        Timeout:         time.Second * 5,
    }
    return *clientConf
}
EDIT: I got rid of the WaitGroup, as suggested, and now I need to keep track of which output belongs to which host by printing the hostname before its output, and printing a Connection to <host> closed. message when the goroutine completes. For example:
$ go run main.go host1[,host2[,...]]
Connecting to <host1>
[Output]
...
[Error]
Connection to <host1> closed.
Connecting to <host2>
...
Connection to <host2> closed.
Configuration complete!
I know the above won't necessarily process host1 and host2 in order, but I need to print the correct host value for the connecting and closing messages before and after the output/error(s), respectively. I tried deferring the closing message in the runCommand() function, but the message is printed before the output/error(s). Printing the closing message in the for loop after each goroutine call doesn't work as expected either.
Updated code:
package main

import (
    "fmt"
    "os"
    "time"

    "golang.org/x/crypto/ssh"
)

type CmdResult struct {
    Host   string
    Output string
    Err    error
}

func main() {
    start := time.Now()
    hosts := os.Args[1:]
    clientConf := configureClient("user", "password")
    results := make(chan CmdResult)
    for _, host := range hosts {
        go runCommand(host, &clientConf, results)
    }
    for i := 0; i < len(hosts); i++ {
        output := <-results
        fmt.Println(output.Host)
        if output.Output != "" {
            fmt.Printf("%s\n", output.Output)
        }
        if output.Err != nil {
            fmt.Printf("Error: %v\n", output.Err)
        }
    }
    fmt.Printf("Configuration complete! [%s]\n", time.Since(start).String())
}

// Run a remote command
func runCommand(host string, config *ssh.ClientConfig, ch chan CmdResult) {
    // This is printing before the output/error(s).
    // Does the same when moved to the bottom of this function.
    defer fmt.Printf("Connection to %s closed.\n", host)
    // Connect to the client
    client, err := ssh.Dial("tcp", host+":22", config)
    if err != nil {
        ch <- CmdResult{host, "", err}
        return
    }
    defer client.Close()
    // Create a session
    session, err := client.NewSession()
    if err != nil {
        ch <- CmdResult{host, "", err}
        return
    }
    defer session.Close()
    // Get the session output
    output, err := session.Output("show lldp ne")
    if err != nil {
        ch <- CmdResult{host, "", err}
        return
    }
    ch <- CmdResult{host, string(output), nil}
}

// Set up client configuration
func configureClient(user, password string) ssh.ClientConfig {
    var sshConf ssh.Config
    sshConf.SetDefaults()
    // Append supported ciphers
    sshConf.Ciphers = append(sshConf.Ciphers, "aes128-cbc", "aes256-cbc", "3des-cbc", "des-cbc", "aes192-cbc")
    // Create client config
    clientConf := &ssh.ClientConfig{
        Config:          sshConf,
        User:            user,
        Auth:            []ssh.AuthMethod{ssh.Password(password)},
        HostKeyCallback: ssh.InsecureIgnoreHostKey(),
        Timeout:         time.Second * 5,
    }
    return *clientConf
}
If you use an unbuffered channel, you actually don't need the sync.WaitGroup, because you can call the receive operator on the channel once for every goroutine that will send on the channel. Each receive operation will block until a send statement is ready, resulting in the same behavior as a WaitGroup.
To make this happen, change runCommand to execute a send statement exactly once before the function exits, under all conditions.
First, create a type to send over the channel:
type CommandResult struct {
    Output string
    Err    error
}
And edit your main() {...} to execute a receive operation on the channel the same number of times as the number of goroutines that will send to the channel:
func main() {
    ch := make(chan CommandResult) // initialize an unbuffered channel
    // rest of your setup
    for _, host := range hosts {
        go runCommand(host, &clientConf, ch) // pass in the channel
    }
    for x := 0; x < len(hosts); x++ {
        fmt.Println(<-ch) // this will block until one is ready to send
    }
}
And edit your runCommand function to accept the channel, remove references to WaitGroup, and execute the send exactly once under all conditions:
func runCommand(host string, config *ssh.ClientConfig, ch chan CommandResult) {
    // do stuff that generates output, err; then when ready to exit function:
    ch <- CommandResult{output, err}
}
EDIT: Question updated with stdout message order requirements
I'd like to get nicely formatted output that ignores the order of events
In this case, remove all print messages from runCommand, you're going to put all output into the element you're passing on the channel so it can be grouped together. Edit the CommandResult type to contain additional fields you want to organize, such as:
type CommandResult struct {
    Host   string
    Output string
    Err    error
}
If you don't need to sort your results, you can just move on to printing the data received, e.g.
for x := 0; x < len(hosts); x++ {
    r := <-ch
    fmt.Printf("Host: %s----\nOutput: %s\n", r.Host, r.Output)
    if r.Err != nil {
        fmt.Printf("Error: %s\n", r.Err)
    }
}
If you do need to sort your results, then in your main goroutine, add the elements received on the channel to a slice:
...
results := make([]CommandResult, 0, len(hosts))
for x := 0; x < len(hosts); x++ {
    results = append(results, <-ch) // this will block until one is ready to send
}
Then you can use the sort package in the Go standard library to sort your results for printing. For example, you could sort them alphabetically by host. Or you could put the results into a map with host string as the key instead of a slice to allow you to print in the order of the original host list.
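For example, a minimal sketch of the alphabetical variant using sort.Slice (it assumes the results slice collected above and that the sort package is imported):

// Sort the collected results by hostname, then print them in order.
sort.Slice(results, func(i, j int) bool {
    return results[i].Host < results[j].Host
})
for _, r := range results {
    fmt.Printf("Host: %s----\nOutput: %s\n", r.Host, r.Output)
    if r.Err != nil {
        fmt.Printf("Error: %s\n", r.Err)
    }
}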