Rate limit function 40/second with "golang.org/x/time/rate" - go

I'm trying to use "golang.org/x/time/rate" to build a function which blocks until a token is free. Is this the correct way to use the library to rate limit blocks of code to 40 requests per second, with a bucket size of 2?
type Client struct {
	limiter *rate.Limiter
	ctx     context.Context
}

func NewClient() *Client {
	c := Client{}
	c.limiter = rate.NewLimiter(40, 2)
	c.ctx = context.Background()
	return &c
}

func (client *Client) RateLimitFunc() {
	err := client.limiter.Wait(client.ctx)
	if err != nil {
		fmt.Printf("rate limit error: %v", err)
	}
}
To rate limit a block of code I call
RateLimitFunc()
I don't want to use a ticker as I want the rate limiter to take into account the length of time the calling code runs for.

Reading the documentation here: link
You can see that the first parameter to NewLimiter is of type rate.Limit.
If you want 40 requests / second then that translates into a rate of 1 request every 25 ms.
You can create that by doing:
limiter := rate.NewLimiter(rate.Every(25 * time.Millisecond), 2)
Side note:
In general, a context, ctx, should not be stored on a struct; it should be per request. It would appear that Client will be reused, so you could pass a context to RateLimitFunc() (or wherever appropriate) instead of storing a single context on the client struct.
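Putting both points together, a minimal sketch of the adjusted client might look like this (the per-call context is an assumption about how your callers are structured):

package main

import (
	"context"
	"time"

	"golang.org/x/time/rate"
)

type Client struct {
	limiter *rate.Limiter
}

func NewClient() *Client {
	// 1 token every 25ms == 40 tokens/second, with a burst of 2.
	return &Client{limiter: rate.NewLimiter(rate.Every(25*time.Millisecond), 2)}
}

// RateLimit blocks until a token is available or ctx is cancelled.
func (c *Client) RateLimit(ctx context.Context) error {
	return c.limiter.Wait(ctx)
}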

// The limiter must be created once and shared across calls; constructing
// a new limiter inside the function would never actually limit anything.
var limiter = rate.NewLimiter(40, 10)

func RateLimit(ctx context.Context) {
	err := limiter.Wait(ctx)
	if err != nil {
		// Log the error and return
	}
	// Do the actual work here
}
As Zak said, do not store a Context inside a struct type, per the Go documentation for the context package.

Related

Azure DevOps Rate Limit

The goal is to retrieve Azure DevOps users with their license and project entitlements in Go.
I'm using the Microsoft SDK.
Our Azure DevOps organization has more than 1500 users, so when I request each user's entitlements I get an error caused by the Azure DevOps rate limit => 443: read: connection reset by peer
However, limiting top to 100/200 does the job, of course.
For a real solution, I thought about no longer using the SDK and making direct REST API calls with a custom HTTP handler that supports rate limiting. Or maybe using heimdall.
What is your advice for a good design, guys?
Thanks.
Here is the code:
package main

import (
	"context"
	"fmt"
	"log"
	"runtime"
	"sync"
	"time"

	"github.com/microsoft/azure-devops-go-api/azuredevops"
	"github.com/microsoft/azure-devops-go-api/azuredevops/memberentitlementmanagement"
)

var organizationUrl = "https://dev.azure.com/xxx"
var personalAccessToken = "xxx"

type User struct {
	DisplayName         string
	MailAddress         string
	PrincipalName       string
	LicenseDisplayName  string
	Status              string
	GroupAssignments    string
	ProjectEntitlements []string
	LastAccessedDate    azuredevops.Time
	DateCreated         azuredevops.Time
}

func init() {
	runtime.GOMAXPROCS(runtime.NumCPU()) // Try to use all available CPUs.
}

func main() {
	// Time measure
	defer timeTrack(time.Now(), "Fetching Azure DevOps Users License and Projects")
	// Compute context
	fmt.Println("Version", runtime.Version())
	fmt.Println("NumCPU", runtime.NumCPU())
	fmt.Println("GOMAXPROCS", runtime.GOMAXPROCS(0))
	fmt.Println("Starting concurrent calls...")
	// Create a connection to your organization
	connection := azuredevops.NewPatConnection(organizationUrl, personalAccessToken)
	// New context
	ctx := context.Background()
	// Create a member client
	memberClient, err := memberentitlementmanagement.NewClient(ctx, connection)
	if err != nil {
		log.Fatal(err)
	}
	// Request all users
	top := 10000
	skip := 0
	filter := "Id"
	response, err := memberClient.GetUserEntitlements(ctx, memberentitlementmanagement.GetUserEntitlementsArgs{
		Top:        &top,
		Skip:       &skip,
		Filter:     &filter,
		SortOption: nil,
	})
	if err != nil {
		log.Fatal(err)
	}
	usersLen := len(*response.Members)
	allUsers := make(chan User, usersLen)
	var wg sync.WaitGroup
	wg.Add(usersLen)
	for _, user := range *response.Members {
		go func(user memberentitlementmanagement.UserEntitlement) {
			defer wg.Done()
			var userEntitlement = memberentitlementmanagement.GetUserEntitlementArgs{UserId: user.Id}
			account, err := memberClient.GetUserEntitlement(ctx, userEntitlement)
			if err != nil {
				log.Fatal(err)
			}
			var GroupAssignments string
			var ProjectEntitlements []string
			for _, assignment := range *account.GroupAssignments {
				GroupAssignments = *assignment.Group.DisplayName
			}
			for _, userProject := range *account.ProjectEntitlements {
				ProjectEntitlements = append(ProjectEntitlements, *userProject.ProjectRef.Name)
			}
			allUsers <- User{
				DisplayName:         *account.User.DisplayName,
				MailAddress:         *account.User.MailAddress,
				PrincipalName:       *account.User.PrincipalName,
				LicenseDisplayName:  *account.AccessLevel.LicenseDisplayName,
				DateCreated:         *account.DateCreated,
				LastAccessedDate:    *account.LastAccessedDate,
				GroupAssignments:    GroupAssignments,
				ProjectEntitlements: ProjectEntitlements,
			}
		}(user)
	}
	wg.Wait()
	close(allUsers)
	for eachUser := range allUsers {
		fmt.Println(eachUser)
	}
}

func timeTrack(start time.Time, name string) {
	elapsed := time.Since(start)
	log.Printf("%s took %s", name, elapsed)
}
func timeTrack(start time.Time, name string) {
elapsed := time.Since(start)
log.Printf("%s took %s", name, elapsed)
}
You can write a custom version of the GetUserEntitlement function.
https://github.com/microsoft/azure-devops-go-api/blob/dev/azuredevops/memberentitlementmanagement/client.go#L297-L314
It does not use any private members.
After getting the http.Response you can check the Retry-After header and delay the next loop iteration if it is present.
https://github.com/microsoft/azure-devops-go-api/blob/dev/azuredevops/memberentitlementmanagement/client.go#L306
P.S. Concurrency in your code is redundant and can be removed.
Update - explaining concurrency issue:
You cannot easily implement rate limiting in concurrent code. It will be much simpler if you execute all requests sequentially and check the Retry-After header in every response before moving on to the next one.
With parallel execution: 1) you cannot rely on the Retry-After header value, because another request executing at the same time may return a different value; 2) you cannot apply the delay to other requests, because some of them are already in progress.
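As a rough sketch of that sequential approach (fetchEntitlement and processUser are hypothetical helpers; fetchEntitlement would be built from the linked client code and would also return the raw *http.Response so the header is visible):

// Sketch only: strconv and time are assumed to be imported.
for _, member := range *response.Members {
	account, resp, err := fetchEntitlement(ctx, member.Id)
	if err != nil {
		log.Println(err)
		continue
	}
	processUser(account) // build your User value here

	// Honour the server's back-off hint before the next request.
	// (Retry-After can also be an HTTP date; handle that case if needed.)
	if ra := resp.Header.Get("Retry-After"); ra != "" {
		if secs, err := strconv.Atoi(ra); err == nil {
			time.Sleep(time.Duration(secs) * time.Second)
		}
	}
}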
For a real solution, I thought about no longer using the SDK and making direct REST API calls with a custom HTTP handler that supports rate limiting. Or maybe using heimdall.
Do you mean you want to avoid the rate limit by using the REST API directly?
If so, your idea will not work.
Most REST APIs are accessed through client libraries; the SDK is itself built on the REST API, so calling the REST API directly will of course hit the same rate limit.
Since the rate limit is applied per user, I suggest spreading your operations across multiple users (provided you don't send so many requests that the server blocks your IP).

Colly Max Depth and encoding/json - null

I have gone through the Go tour and I'm now going through some of the Colly tutorials. I understand the max depth and have been trying to implement it in a Go program like so:
package main

import (
	"encoding/json"
	"log"
	"net/http"

	"github.com/gocolly/colly"
)

func ping(w http.ResponseWriter, r *http.Request) {
	log.Println("Ping")
	w.Write([]byte("ping"))
}

func getData(w http.ResponseWriter, r *http.Request) {
	// Verify the param "URL" exists
	URL := r.URL.Query().Get("url")
	if URL == "" {
		log.Println("missing URL argument")
		return
	}
	log.Println("visiting", URL)
	// Create a new collector which will be in charge of collecting the data from HTML
	c := colly.NewCollector(
		// MaxDepth is 2, so only the links on the scraped page
		// and links on those pages are visited
		colly.MaxDepth(2),
		colly.Async(true),
	)
	// Limit the maximum parallelism to 2
	// This is necessary if the goroutines are dynamically
	// created to control the limit of simultaneous requests.
	//
	// Parallelism can also be controlled by spawning a fixed
	// number of goroutines.
	c.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 2})
	// Slice to store the data
	var response []string
	// The OnHTML function allows the collector to use a callback function when the specific HTML tag is reached
	// in this case whenever our collector finds an
	// anchor tag with href it will call the anonymous function
	// specified below which will get the info from the href and append it to our slice
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		link := e.Request.AbsoluteURL(e.Attr("href"))
		if link != "" {
			response = append(response, link)
		}
	})
	// Command to visit the website
	c.Visit(URL)
	// Parse our response slice into JSON format
	b, err := json.Marshal(response)
	if err != nil {
		log.Println("failed to serialize response:", err)
		return
	}
	// Add a header and write the body for our endpoint
	w.Header().Add("Content-Type", "application/json")
	w.Write(b)
}

func main() {
	addr := ":7171"
	http.HandleFunc("/links", getData)
	http.HandleFunc("/ping", ping)
	log.Println("listening on", addr)
	log.Fatal(http.ListenAndServe(addr, nil))
}
When doing so the response is null. Taking out the MaxDepth and Async lines results in the expected response (with only the top level links).
Any help is appreciated!
When running in Async mode c.Visit will return before the requests are actually made (see here); the correct process is demonstrated in the Parallel demo. In your case this means:
c.Visit(URL)
c.Wait()
Using async is not very useful when just making the one request. Check out the reddit example to see how this can be used to visit multiple URLs in one operation.
Note: You really should be checking the error values returned by these functions and adding an error handler is also good practice.
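For reference, a minimal corrected version of the collector part of getData might look like the sketch below. A mutex guards the shared slice, since OnHTML callbacks can run concurrently in async mode (sync must be imported):

c := colly.NewCollector(
	colly.MaxDepth(2),
	colly.Async(true),
)
if err := c.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 2}); err != nil {
	log.Println("limit rule:", err)
}

var (
	mu       sync.Mutex
	response []string
)
c.OnHTML("a[href]", func(e *colly.HTMLElement) {
	if link := e.Request.AbsoluteURL(e.Attr("href")); link != "" {
		mu.Lock()
		response = append(response, link)
		mu.Unlock()
	}
})

if err := c.Visit(URL); err != nil {
	log.Println("visit:", err)
}
c.Wait() // block until all queued async requests have completed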

Testing once condition in Golang

I've got a module that relies on populating a cache with a call to an external service like so:
func (provider *Cache) GetItem(productId string, skuId string, itemType string) (*Item, error) {
	// First, create the key we'll use to uniquely identify the item
	key := fmt.Sprintf("%s:%s", productId, skuId)
	// Now, attempt to get the concurrency control associated with the item key
	// If we couldn't find it then create one and add it to the map
	var once *sync.Once
	if entry, ok := provider.lockMap.Load(key); ok {
		once = entry.(*sync.Once)
	} else {
		once = &sync.Once{}
		provider.lockMap.Store(key, once)
	}
	// Now, use the concurrency control to attempt to request the item
	// but only once. Channel any errors that occur
	cErr := make(chan error, 1)
	once.Do(func() {
		// We didn't find the item in the cache so we'll have to get it from the partner-center
		item, err := provider.client.GetItem(productId, skuId)
		if err != nil {
			cErr <- err
			return
		}
		// Add the item to the cache
		provider.cache.Store(key, &item)
	})
	// Attempt to read an error from the channel; if we get one then return it
	// Otherwise, pull the item out of the cache. We have to use the select here because this is
	// the only way to attempt to read from a channel without it blocking
	var item interface{}
	select {
	case err, ok := <-cErr:
		if ok {
			return nil, err
		}
	default:
		item, _ = provider.cache.Load(key)
	}
	// Now, pull out a reference to the item and return it
	return item.(*Item), nil
}
This method works as I expect it to. My problem is testing; specifically testing to ensure that the GetItem method is called only once for a given value of key. My test code is below:
var _ = Describe("Item Tests", func() {
	It("GetItem - Not cached, two concurrent requests - Client called once", func() {
		// setup cache
		// Setup a wait group so we can ensure both processes finish
		var wg sync.WaitGroup
		wg.Add(2)
		// Fire off two concurrent requests for the same SKU
		go runRequest(&wg, cache)
		go runRequest(&wg, cache)
		wg.Wait()
		// Check the cache; it should have one value
		_, ok := cache.cache.Load("PID:SKUID")
		Expect(ok).Should(BeTrue())
		// The client should have only been requested once
		Expect(client.RequestCount).Should(Equal(1)) // FAILS HERE
	})
})

// Used for testing concurrency
func runRequest(wg *sync.WaitGroup, cache *SkuCache) {
	defer wg.Done()
	_, err := cache.GetItem("PID", "SKUID", "fakeitem")
	Expect(err).ShouldNot(HaveOccurred())
}
type mockClient struct {
	RequestFails bool
	RequestCount int
	lock         sync.Mutex
}

func NewMockClient(requestFails bool) *mockClient {
	return &mockClient{
		RequestFails: requestFails,
		RequestCount: 0,
		lock:         sync.Mutex{},
	}
}

func (client *mockClient) GetItem(productId string, skuId string) (item Item, err error) {
	defer GinkgoRecover()
	// If we want to simulate client failure then return an error here
	if client.RequestFails {
		err = fmt.Errorf("GetItem failed")
		return
	}
	// Sleep for 100ms so we can more accurately simulate the request latency
	time.Sleep(100 * time.Millisecond)
	// Update the request count
	client.lock.Lock()
	client.RequestCount++
	client.lock.Unlock()
	item = Item{
		Id:        skuId,
		ProductId: productId,
	}
	return
}
The problem I've been having is that occasionally this test fails because the request count is 2 when it was expected to be 1, at the commented line. The failure is not consistent and I'm not quite sure how to debug it. Any help would be greatly appreciated.
I think your tests fail sometimes because your cache fails to provide guarantee that it only fetches items once, and you're lucky the tests caught this.
If an item is not in the cache and 2 concurrent goroutines call Cache.GetItem() at the same time, it may happen that lockMap.Load() reports in both that the key is not in the map, so both goroutines create a sync.Once and both store their own instance in the map (only one, the latter, will remain in the map, but your cache does not check this).
Then both goroutines will call client.GetItem(), because 2 separate sync.Once values provide no synchronization. Only if the same sync.Once instance is used is there a guarantee that the function passed to Once.Do() is executed only once.
I think a sync.Mutex would be easier and more appropriate to avoid creating and using 2 sync.Once here.
Or since you're already using sync.Map, you may use the Map.LoadOrStore() method: create a sync.Once and pass it to Map.LoadOrStore(). If the key is already in the map, use the returned sync.Once. If the key is not in the map, your sync.Once will be stored in it, so you can use that. This ensures that concurrent goroutines cannot store multiple sync.Once instances for the same key.
Something like this:
once := &sync.Once{}
if entry, loaded := provider.lockMap.LoadOrStore(key, once); loaded {
	// Was already in the map, use the loaded Once
	once = entry.(*sync.Once)
}
This solution is still not perfect: if 2 goroutines call Cache.GetItem() at the same time, only one will attempt to fetch the item from the client, but if that fails, only this goroutine will report the error; the other goroutine will not try to fetch the item from the client but will load it from the map, and you don't check whether that load succeeds. You should, and if the item is not in the map, that means the other, concurrent attempt failed to get it, so you should report an error then (and clear the sync.Once).
As you can see, it's getting more complicated. I stand by my earlier advice: using a sync.Mutex would be easier here.
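For illustration, a minimal sketch of the mutex-based cache (the field names and the ItemClient interface are assumptions standing in for your actual types):

type ItemClient interface {
	GetItem(productId, skuId string) (Item, error)
}

type Cache struct {
	mu     sync.Mutex
	items  map[string]*Item
	client ItemClient
}

func NewCache(client ItemClient) *Cache {
	return &Cache{items: make(map[string]*Item), client: client}
}

func (c *Cache) GetItem(productId, skuId string) (*Item, error) {
	key := fmt.Sprintf("%s:%s", productId, skuId)

	c.mu.Lock()
	defer c.mu.Unlock()

	if item, ok := c.items[key]; ok {
		return item, nil
	}
	item, err := c.client.GetItem(productId, skuId)
	if err != nil {
		return nil, err // nothing is cached, so a later call will retry
	}
	c.items[key] = &item
	return &item, nil
}

Note that holding the lock across the client call serializes fetches for all keys; if that matters, a per-key lock (or the corrected sync.Once scheme above) avoids it.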

How to access a variable across all connected tcp clients in go?

I'm setting up a tcp server in a pet project I'm writing in go. I want to be able to maintain a slice of all connected clients, and then modify it whenever a new client connects or disconnects from my server.
My main mental obstacle right now is whether I should be declaring a package level slice, or just passing a slice into my handler.
My first thought was to declare my ClientList slice (I'm aware that a slice might not be my best option here, but I've decided to leave it as is for now) as a package level variable. While I think this would work, I've seen a number of posts discouraging the use of them.
My other thought was to declare ClientList as a slice in my main function, and then I pass ClientList to my HandleClient function, so whenever a client connects/disconnects I can call AddClient or RemoveClient and pass this slice in and add/remove the appropriate client.
This implementation is seen below. There are definitely other issues with the code, but I'm stuck trying to wrap my head around something that seems like it should be very simple.
type Client struct {
	Name string
	Conn net.Conn
}

type ClientList []*Client

// Identify is used to set the name of the client
func (cl *Client) Identify() error {
	// code here to set the client's name based on input from the client
}

// This is not a threadsafe way to do this - need to use mutex/channels
func (cList *ClientList) AddClient(cl *Client) {
	*cList = append(*cList, cl)
}

func (cl *Client) HandleClient(cList *ClientList) {
	defer cl.Conn.Close()
	cList.AddClient(cl)
	err := cl.Identify()
	if err != nil {
		log.Println(err)
		return
	}
	for {
		err := cl.Conn.SetDeadline(time.Now().Add(20 * time.Second))
		if err != nil {
			log.Println(err)
			return
		}
		cl.Conn.Write([]byte("What command would you like to perform?\n"))
		netData, err := bufio.NewReader(cl.Conn).ReadString('\n')
		if err != nil {
			log.Println(err)
			return
		}
		cmd := strings.TrimSpace(string(netData))
		if cmd == "Ping" {
			cl.Ping() // sends a pong msg back to client
		} else {
			cl.Conn.Write([]byte("Unsupported command at this time\n"))
		}
	}
}

func main() {
	arguments := os.Args
	PORT := ":" + arguments[1]
	l, err := net.Listen("tcp4", PORT)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer l.Close()
	fmt.Println("Listening...")
	// Create a new slice to store pointers to clients
	var cList ClientList
	for {
		c, err := l.Accept()
		if err != nil {
			log.Println(err)
			return
		}
		// Create client cl1
		cl1 := Client{Conn: c}
		// Go and handle the client
		go cl1.HandleClient(&cList)
	}
}
From my initial testing, this appears to work. I am able to print out my client list and I can see that new clients are being added, and their name is being added after Identify() is called as well.
When I run it with the -race flag, I do get data race warnings, so I know I will need a threadsafe way to handle adding clients. The same goes for removing clients when I add that in.
Are there any other issues I might be missing by passing my ClientList into HandleClient, or any benefits I would gain from declaring ClientList as a package level variable instead?
Several problems with this approach.
First, your code contains a data race: each TCP connection is served by a separate goroutine, and they all attempt to modify the slice concurrently.
You might try building your code with go build -race (or go install -race, whatever you're using) and watch the enabled runtime checks report the race.
This one is easy to fix. The most straightforward approach is to add a mutex variable into the ClientList type:
type ClientList struct {
	mu      sync.Mutex
	clients []*Client
}
…and make the type's methods hold the mutex while they're mutating the clients field, like this:
func (cList *ClientList) AddClient(cl *Client) {
	cList.mu.Lock()
	defer cList.mu.Unlock()
	cList.clients = append(cList.clients, cl)
}
(If the typical usage pattern of your ClientList type turns out to involve frequent calls to methods which only read the contained list, you may use the sync.RWMutex type instead, which allows multiple concurrent readers. See the sketch below.)
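For example, assuming mu is switched to sync.RWMutex, a read-only method (Broadcast here is purely illustrative) could take the read lock so concurrent readers don't block one another:

func (cList *ClientList) Broadcast(msg []byte) {
	cList.mu.RLock()
	defer cList.mu.RUnlock()
	for _, cl := range cList.clients {
		cl.Conn.Write(msg) // error handling omitted for brevity
	}
}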
Second, I'd split the part which "identifies" a client out of the handler function.
As it stands, if identification fails, the handler exits but the client is not delisted.
I'd say it would be better to identify the client up front and only run the handler once the client is believed to be okay.
It's also worth adding a deferred call to something like RemoveClient at the top of the handler's body so that the client is properly delisted when the handler is done with it.
IOW, I'd expect to see something like this:
func (cl *Client) HandleClient(cList *ClientList) {
	defer cl.Conn.Close()

	err := cl.Identify()
	if err != nil {
		log.Println(err)
		return
	}

	cList.AddClient(cl)
	defer cList.RemoveClient(cl)

	// ... the rest of the code
}
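RemoveClient isn't shown above; a minimal sketch, using the same mutex as AddClient, could be:

func (cList *ClientList) RemoveClient(cl *Client) {
	cList.mu.Lock()
	defer cList.mu.Unlock()
	for i, c := range cList.clients {
		if c == cl {
			cList.clients = append(cList.clients[:i], cList.clients[i+1:]...)
			return
		}
	}
}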

Gin: if `request body` is bound in middleware, c.Request.Body becomes empty

My API server has middleware which gets a token from the request header.
If the token is valid, the request goes on to the next handler.
But after the request passes through the middleware to the next handler, c.Request.Body has become empty.
middleware
func getUserIdFromBody(c *gin.Context) int {
	var jsonBody User
	length, _ := strconv.Atoi(c.Request.Header.Get("Content-Length"))
	body := make([]byte, length)
	length, _ = c.Request.Body.Read(body)
	json.Unmarshal(body[:length], &jsonBody)
	return jsonBody.Id
}
func CheckToken() gin.HandlerFunc {
	return func(c *gin.Context) {
		var userId int
		config := model.NewConfig()
		reqToken := c.Request.Header.Get("token")
		_, resBool := c.GetQuery("user_id")
		if resBool == false {
			userId = getUserIdFromBody(c)
		} else {
			userIdStr := c.Query("user_id")
			userId, _ = strconv.Atoi(userIdStr)
		}
		...
		if ok {
			c.Next()
			return
		}
	}
}
next func
func bindOneDay(c *gin.Context) (model.Oneday, error) {
	var oneday model.Oneday
	if err := c.BindJSON(&oneday); err != nil {
		return oneday, err
	}
	return oneday, nil
}
bindOneDay returns an EOF error, presumably because c.Request.Body is empty.
I want to get user_id from the request body in the middleware.
How can I do this without c.Request.Body ending up empty?
You can only read the Body from the client once. The data is streaming from the user, and they're not going to send it again. If you want to read it more than once, you're going to have to buffer the whole thing in memory, like so:
bodyCopy := new(bytes.Buffer)
// Read the whole body
_, err := io.Copy(bodyCopy, req.Body)
if err != nil {
	return err
}
bodyData := bodyCopy.Bytes()
// Replace the body with a reader that reads from the buffer
req.Body = ioutil.NopCloser(bytes.NewReader(bodyData))
// Now you can do something with the contents of bodyData,
// like passing it to json.Unmarshal
Note that buffering the entire request into memory means that a user can cause you to allocate unlimited memory -- you should probably either block this at a frontend proxy or use an io.LimitedReader to limit the amount of data you'll buffer.
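For instance, a minimal sketch of that size cap (the 1 MiB limit is an arbitrary assumption); http.MaxBytesReader is an alternative that makes oversized requests fail with an error instead of being silently truncated:

const maxBody = 1 << 20 // assumed limit: 1 MiB

// io.LimitReader silently drops anything beyond maxBody.
bodyData, err := ioutil.ReadAll(io.LimitReader(req.Body, maxBody))
if err != nil {
	return err
}
// Restore a readable body for downstream handlers.
req.Body = ioutil.NopCloser(bytes.NewReader(bodyData))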
You also have to read the entire body before Unmarshal can start its work -- this is probably no big deal, but you can do better using io.TeeReader and json.NewDecoder if you're so inclined.
Better, of course, would be to figure out a way to restructure your code so that buffering the body and decoding it twice aren't necessary.
Gin provides a native solution to allow you to get data multiple times from c.Request.Body. The solution is to use c.ShouldBindBodyWith. Per the gin documentation
ShouldBindBodyWith ... stores the request body into the context, and reuse when it is called again.
For your particular example, this would be implemented in your middleware like so,
func getUserIdFromBody(c *gin.Context) int {
	var jsonBody User
	if err := c.ShouldBindBodyWith(&jsonBody, binding.JSON); err != nil {
		// return error
	}
	return jsonBody.Id
}
After the middleware, if you want to bind to the body again, just use ctx.ShouldBindBodyWith again. For your particular example, this would be implemented like so
func bindOneDay(c *gin.Context) (model.Oneday, error) {
	var oneday model.Oneday
	if err := c.ShouldBindBodyWith(&oneday, binding.JSON); err != nil {
		return oneday, err
	}
	return oneday, nil
}
The issue we're fighting against is that gin has set up c.Request.Body as an io.ReadCloser object -- meaning it is intended to be read from only once. So, if you access c.Request.Body in your code at all, the bytes will be read (consumed) and c.Request.Body will be empty thereafter. By using ShouldBindBodyWith to access the bytes, gin saves the bytes into another storage mechanism within the context, so that they can be reused over and over again.
As a side note, if you've consumed the c.Request.Body and later want to access c.Request.Body, you can do so by tapping into gin's storage mechanism via ctx.Get(gin.BodyBytesKey). Here's an example of how you can obtain the gin-stored Request Body as []byte and then convert it to a string,
var body string
if cb, ok := ctx.Get(gin.BodyBytesKey); ok {
	if cbb, ok := cb.([]byte); ok {
		body = string(cbb)
	}
}
