Go networking best practices for UDP server

I'm writing a DNS server in Go to learn how DNS works and how to write a real, potentially useful program in Go.
One of the reasons I chose Go was for its goroutines instead of threads.
Currently, my DNS server doesn't do much: it sends the same response for every query it receives.
One thing that confuses me is that my DNS server, even with its goroutines, and even though it's small and doesn't do much, is 10x slower than BIND.
I ran a program called dnsblast to send lots of DNS queries at once and these are my results:
BIND: sending 10,000 queries = 39,000 pps
My server: sending 10,000 queries = 3,000 pps
Also, as I increase the number of packets I send per second, the server responds to fewer and fewer of the queries.
For example:
When sending 1,000 queries, the server responds to 100% of them, but when sending 10,000 queries it responds to just 66%.
Is there anything to do with networking in Go that could be limiting the performance of my DNS server? Are there settings in Go I can configure?
Currently, the main program looks like this:
package main

import (
	"bytes"
	"net"
)

func main() {
	serv, err := net.ListenPacket("udp", ":53")
	if err != nil {
		panic(err)
	}
	defer serv.Close()
	for {
		tmp := make([]byte, 512)
		numBytes, addr, err := serv.ReadFrom(tmp)
		if err != nil {
			continue // skip packets that fail to read
		}
		go handleQuery(serv, bytes.NewBuffer(tmp[:numBytes]), addr)
	}
}
This seems to be a pretty standard way of creating a server in Go from what I've read online.
Listen for packets
Save packet data in a buffer
Process each packet in a separate goroutine.
Are there any best practices to improve my server's throughput, or does the server look okay and it's just that my partial DNS implementation is slow?
Thanks!

Unfortunately, Go's UDP support is suboptimal: the implementation allocates memory on each read. What helps a bit is to run the read loop in parallel across several goroutines, and increasing the socket buffer sizes at the OS level limits packet loss.
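A minimal sketch of both suggestions, under stated assumptions (handleQuery is the handler from the question; the 4 MB buffer and the one-loop-per-CPU count are starting points to benchmark, not recommendations):

package main

import (
	"bytes"
	"net"
	"runtime"
)

func main() {
	serv, err := net.ListenPacket("udp", ":53")
	if err != nil {
		panic(err)
	}
	defer serv.Close()

	// Enlarge the kernel receive buffer to absorb bursts (capped by OS
	// limits such as net.core.rmem_max on Linux).
	if udpConn, ok := serv.(*net.UDPConn); ok {
		udpConn.SetReadBuffer(4 << 20)
	}

	// The connection is safe for concurrent use, so run one read loop
	// per CPU; each loop reuses its own buffer instead of allocating a
	// fresh one per packet.
	readLoop := func() {
		buf := make([]byte, 512)
		for {
			n, addr, err := serv.ReadFrom(buf)
			if err != nil {
				continue
			}
			// Copy the datagram out, since buf is reused on the next read.
			pkt := make([]byte, n)
			copy(pkt, buf[:n])
			go handleQuery(serv, bytes.NewBuffer(pkt), addr)
		}
	}
	for i := 1; i < runtime.NumCPU(); i++ {
		go readLoop()
	}
	readLoop() // run the last loop on the main goroutine
}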

Related

How to disable HTTP/2 in Golang's standard http.Client, or avoid tons of INTERNAL_ERRORs from Stream ID=N?

I want to send a fairly large number (several thousand) of HTTP requests as soon as possible, without putting too much load on the CDN (it has an https: URL, and ALPN selects HTTP/2 during the TLS phase). Staggering (i.e. time-shifting) the requests is an option, but I don't want to wait too long; I want to minimize both errors and total round-trip time, and I'm not being rate-limited by the server at the scale I'm operating yet.
The problem I'm seeing originates from h2_bundle.go and specifically in either writeFrame or onWriteTimeout when about 500-1k requests are in-flight, which manifests during io.Copy(fileWriter, response.Body) as:
http2ErrCodeInternal = "INTERNAL_ERROR" // also IDs a Stream number
// ^ then io.Copy observes the reader encountering "unexpected EOF"
I'm fine sticking with HTTP/1.x for now, but I would love an explanation of what's going on. Clearly, people DO use Go to make a lot of round-trips happen per unit time, but most advice I can find is from the perspective of the server, not the client. I've already tried specifying all the relevant timeouts I can find, and cranking up connection pool max sizes.
Here's my best guess at what's going on:
The rate of requests is overwhelming a queue of connections or some other resource in the HTTP/2 internals. Maybe this is fixable in general, or possible to fine-tune for my specific use case, but the fastest way to overcome this kind of problem is to rely on HTTP/1.1 entirely, as well as to implement limited retry and rate-limiting mechanisms.
Aside: I am now using a single retry and a rate.Limiter from https://pkg.go.dev/golang.org/x/time/rate#Limiter, in addition to the "ugly hack" of disabling HTTP/2, so that outbound requests send an initial "burst" of M requests and then "leak" more gradually at a given rate of N/sec. Ultimately, the errors from h2_bundle.go are just too ugly for end users to parse. An expected/unexpected EOF should result in the client "giving it another try" or two, which is more pragmatic anyway.
As per the docs, the easiest way to disable h2 in Go's http.Client at runtime is env GODEBUG=http2client=0 ... which I can also achieve in other ways as well. It is especially important to understand that the "next protocol" is pre-negotiated "early" during TLS, so Go's http.Transport must manage that configuration, along with a cache/memo, to provide its functionality in a performant way. Therefore, use your own httpClient to .Do(req) (and don't forget to give your Request a context.Context so that it's easy to cancel), using a custom http.RoundTripper for Transport. Here's some example code:
type forwardRoundTripper struct {
	rt http.RoundTripper
}

func (my *forwardRoundTripper) RoundTrip(r *http.Request) (*http.Response, error) {
	return my.rt.RoundTrip(r) // adjust URLs, or transport as necessary per-request
}

// httpTransport is the http.RoundTripper given to a Client as Transport
// (don't forget to set up a reasonable Timeout and other behavior as desired)
var httpTransport = &forwardRoundTripper{rt: http.DefaultTransport}

func h2Disabled(rt *http.Transport) *http.Transport {
	log.Println("--- only using HTTP/1.x ...")
	rt.ForceAttemptHTTP2 = false // not good enough on its own
	// at least one of the following is ALSO required:
	if rt.TLSClientConfig == nil {
		// a cloned DefaultTransport has a nil TLSClientConfig; replace it
		// (and if you do this replacement, don't forget the minimum TLS version)
		rt.TLSClientConfig = &tls.Config{MinVersion: tls.VersionTLS12}
	}
	rt.TLSClientConfig.NextProtos = []string{"http/1.1"}
	// need to Clone() or replace the TLSClientConfig if a request already occurred
	// - Why? Because the first time the transport is used, it caches certain structures.
	rt.TLSHandshakeTimeout = longTimeout // not related to h2, but necessary for stability
	rt.TLSNextProto = make(map[string]func(authority string, c *tls.Conn) http.RoundTripper)
	// ^ some sources seem to think this is necessary, but not in all cases
	// (it WILL be required if an "h2" key is already present in this map)
	return rt
}

func init() {
	h2ok := ...
	if t, ok := httpTransport.rt.(*http.Transport); ok && !h2ok {
		httpTransport.rt = h2Disabled(t.Clone())
	}
	// tweak rate limits here
}
This lets me make the volume of requests that I need, or at least get more reasonable errors in edge cases.
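As a usage sketch of the pieces above (hedged: the URL and timeout values here are placeholders, not from the answer), the client wraps httpTransport and every request carries a cancellable context, as recommended:

client := &http.Client{Transport: httpTransport, Timeout: 60 * time.Second}

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://cdn.example.com/asset", nil)
if err != nil {
	return err
}
resp, err := client.Do(req)
if err != nil {
	return err // a retry or rate.Limiter wait would hook in here
}
defer resp.Body.Close()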

HTTP request without last byte?

I'm looking to load test my app in Go. I haven't found this functionality in existing tools; I've tried all of them. Here is what I'm trying to do:
Create 100 identical HTTP requests (as goroutines)
From each goroutine, connect to the HTTP server and send the body of the request (which can be up to a few MB), except the last byte
Synchronize all goroutines - pretty much wait until all of them are at the point where there is only 1 byte left to send
Based on input from the terminal (for example, when I hit Enter), send the remaining byte, so I can test how the server handles this type of load - 100 large requests at the same time
I looked at the docs of the standard HTTP library, and I don't think it's possible with standard tools. I'm looking to rewrite some parts of the HTTP library to support this, or maybe even use plain old OS sockets, but that would require a lot of time just to implement.
I'm wondering if I'm missing something here - some kind of HTTP library feature that allows doing this easily? I'd appreciate any suggestion that might work without a full rewrite.
To my understanding there is no way to send part of an HTTP request and then the rest at the end, but I believe I can help with the concurrency part.
There are two variables here: threads (mind the Python terminology) = the number of simultaneous goroutines, and number = the number of times to run the request.
package main

import (
	"fmt"

	"github.com/remeh/sizedwaitgroup"
)

func main() {
	fmt.Println("Input # of times to run")
	var number int
	fmt.Scan(&number)
	fmt.Println("Input # of threads")
	var threads int
	fmt.Scan(&threads)
	swg := sizedwaitgroup.New(threads)
	for i := 0; i < number; i++ {
		swg.Add()
		go func(i int) {
			defer swg.Done() // ensure to put your request after this line
			// Do request
		}(i)
	}
	swg.Wait()
}
This code uses the github.com/remeh/sizedwaitgroup library
Bear in mind, if one of the first requests is completed, it will start another without waiting for others to finish.
Here it is in practice:
https://codeshare.io/3A3dj4
https://pastebin.com/DP1sn1m4
Edit:
If you go further and manage to send all but the last byte of the HTTP request, you'll want to use channels to communicate when to send the last byte. I'm not too good at them, but this guide is great:
https://go.dev/blog/pipelines
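To sketch that channel idea under stated assumptions (a hand-built request over a raw TCP connection; the host, path, and body are placeholders, and this is not a full load tester): every goroutine writes all but the final byte, then blocks on a shared channel that is closed when you hit Enter.

package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"sync"
)

func main() {
	body := []byte("example payload")
	release := make(chan struct{})
	var wg sync.WaitGroup

	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			conn, err := net.Dial("tcp", "localhost:8080")
			if err != nil {
				return
			}
			defer conn.Close()
			// Write the request line, headers, and all but the last body byte.
			fmt.Fprintf(conn, "POST /upload HTTP/1.1\r\nHost: localhost\r\nContent-Length: %d\r\n\r\n", len(body))
			conn.Write(body[:len(body)-1])
			<-release                      // block until Enter is pressed
			conn.Write(body[len(body)-1:]) // send the final byte
		}()
	}

	bufio.NewReader(os.Stdin).ReadString('\n') // wait for Enter
	close(release)                             // all goroutines send their last byte
	wg.Wait()
}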

Relay data between two different tcp clients in golang

I'm writing a TCP server which simultaneously accepts multiple connections from mobile devices and some WiFi devices (IoT). The connections need to be maintained once established, with a 30-second timeout if no heartbeat is received. So it is something like the following:
// clientsMap map[string]net.Conn
func someFunction() {
	conn, err := s.listener.Accept()
	// I store the conn in clientsMap
	// so I can access it; for brevity not
	// shown here, then:
	go serve(conn)
}

func serve(conn net.Conn) {
	timeoutDuration := 30 * time.Second
	conn.SetReadDeadline(time.Now().Add(timeoutDuration))
	for {
		msgBuffer := make([]byte, 2048)
		msgBufferLen, err := conn.Read(msgBuffer)
		// do something with the stuff
	}
}
So there is one goroutine for each client. And each client, once connected to the server, is pending on the read. The server then processes the stuff read.
The problem is that I sometimes need to read things off one client, and then pass data to another (Between a mobile device and a WiFi device). I have stored the connections in clientsMap. So I can always access that. But since each client is handled by one goroutine, shall I be passing the data from one client to another by using a channel? But if the goroutine is blocked waiting for a pending read, how do I make it also wait for data from a channel? Or shall I just obtain the connection for the other party from the clientsMap and write to it?
The documentation for net.Conn clearly states:
Multiple goroutines may invoke methods on a Conn simultaneously.
So yes, it is okay to simply Write to the connections. You should take care to issue a single Write call per message you want to send. If you call Write more than once you risk interleaving messages from different mobile devices. This implies calling Write directly and not via some other API (in other words don't wrap the connection). For instance, the following would not be safe:
json.NewEncoder(conn).Encode(myValue) // use json.Marshal(myValue) instead
io.Copy(conn, src) // use io.ReadAll(src) instead
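A small sketch of that single-Write pattern, continuing the answer's names (conn and myValue); the newline framing is an assumption, so use whatever framing your protocol actually defines:

payload, err := json.Marshal(myValue)
if err != nil {
	return err
}
payload = append(payload, '\n') // assumed framing: one JSON message per line
// One Write call per message: concurrent senders cannot interleave partial messages.
if _, err := conn.Write(payload); err != nil {
	return err
}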

Unbuffered bidirectional data streaming with gRPC: how to get the size of the client-side buffer?

I am streaming data from a server to a client and I would like the server not to read and send more data than the client's buffer size.
Given:
service StreamService {
  rpc Stream(stream Buffer) returns (stream Buffer);
}

message Buffer {
  bytes data = 1;
}
My client's program basically looks like:
func ReadFromServer(stream StreamService_StreamClient, buf []byte) (n int, err error) {
	// I actually don't need more than len(buf)...
	// How could I send len(buf) while stream is bidirectional...?
	buffer, err := stream.Recv()
	if err != nil {
		return 0, err
	}
	n = copy(buf, buffer.Data)
	// buf could also be smaller than buffer.Data...
	return n, nil
}
So how could I send len(buf) while the RPC's stream is bidirectional, i.e. the send direction is used by another, independent stream of data? Note that I don't use client- or server-side buffering, to avoid losing data when one of them is terminated (my data source is I/O).
gRPC provides no mechanism for this. It only provides push-back when a sender needs to slow down. But there will still be buffering happening internally and that is not exposed because gRPC is message-based, not byte-based.
There are really only two options in your case:
Server chunks responses arbitrarily. The client Recv()s when necessary and any extra is manually managed for later.
The client sends a request asking for a precise amount to be returned, and then waits for the response.
Note that I don't use client- or server-side buffering, to avoid losing data when one of them is terminated (my data source is I/O).
This isn't how it works. When you do a Send() there is no guarantee it is received when the call returns. When you do a Recv() there is no guarantee that the message was received after the recv call (it could have been received before the call). There is buffering going on, period.
I think there's no built-in solution for that. The use case looks a little bit weird: why must the server care about the client's state at all? If it really needs to, you should extend your bidirectional stream: the client must request byte slices of a particular size (according to its own buffer size and other factors).
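A hedged sketch of that request/response discipline - it assumes the Buffer message is extended with a size field (say, int32 want = 2;), which is NOT in the original service definition:

func ReadFromServer(stream StreamService_StreamClient, buf []byte) (int, error) {
	// Tell the server how many bytes we can accept right now.
	if err := stream.Send(&Buffer{Want: int32(len(buf))}); err != nil {
		return 0, err
	}
	// The server is expected to reply with at most len(buf) bytes.
	msg, err := stream.Recv()
	if err != nil {
		return 0, err
	}
	return copy(buf, msg.Data), nil
}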
By the way, you may find the message size limit settings for the gRPC client and server useful:
https://godoc.org/google.golang.org/grpc#MaxMsgSize https://godoc.org/google.golang.org/grpc#WithMaxMsgSize

Fastest way to send multiple HTTP requests

I have an array of about 2000 user objects (maps) and I need to call an API to get the user detail -> process the response -> update my local DB as soon as possible. I used Go's WaitGroup and goroutines to implement concurrent request sending, but calling 2000 requests takes about 24 seconds on my 2014 MacBook Pro. Is there any way to make it faster?
var wg sync.WaitGroup
json.Unmarshal(responseData, &users)
wg.Add(len(users))
for i := 0; i < len(users); i++ {
	go func(userid string) {
		defer wg.Done()
		url := "https://www.example.com/user_detail/" + userid
		response, err := http.Get(url)
		if err != nil {
			return // skip failed requests rather than panic on a nil response
		}
		defer response.Body.Close()
		data, _ := ioutil.ReadAll(response.Body)
		_ = data // process the response here
	}(users[i]["userid"])
}
wg.Wait()
This sort of situation is very difficult to address in general. Performance at this level depends very much on the specifics of your server, API, network, etc. But here are a few suggestions to get you going:
Try limiting the number of concurrent connections.
As mentioned by @JimB in the comments, trying to handle 2000 concurrent connections is likely inefficient for both the server and the client. Try limiting to 10, 20, 50, or 100 simultaneous connections; benchmark each value, and tweak accordingly until you get the best performance (see the worker-pool sketch after this list).
On the client side, this may allow re-using connections (thus reducing the average per-request overhead), which is currently impossible, since you're initiating all 2000 connections before any of them complete.
If the server supports HTTP/2, make sure you're using HTTP/2, which can be more efficient with multiple requests (so this really depends on #1 above, too). See the documentation about debugging HTTP/2.
If the API supports bulk requests, take advantage of this, and request multiple users in a single request.
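Here is a hedged sketch of the first suggestion, a fixed-size worker pool: a bounded set of workers pulls user IDs off a channel, so at most maxConns requests are in flight and connections can be reused. The pool size of 50 is a starting point to benchmark, not a recommendation; users and the URL pattern come from the question's code.

const maxConns = 50 // tune: try 10, 20, 50, 100 and benchmark

ids := make(chan string)
var wg sync.WaitGroup

for w := 0; w < maxConns; w++ {
	wg.Add(1)
	go func() {
		defer wg.Done()
		for userid := range ids {
			resp, err := http.Get("https://www.example.com/user_detail/" + userid)
			if err != nil {
				continue
			}
			// Drain and close the body so the underlying connection
			// goes back into the pool and can be reused.
			io.Copy(io.Discard, resp.Body)
			resp.Body.Close()
		}
	}()
}

for i := range users {
	ids <- users[i]["userid"]
}
close(ids)
wg.Wait()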
