I'm trying to understand Uber's open-source implementation of ratelimit, but I'm a little confused.
The full code is as follows:
// Take blocks to ensure that the time spent between multiple
// Take calls is on average time.Second/rate.
func (t *limiter) Take() time.Time {
    newState := state{}
    taken := false
    for !taken {
        now := t.clock.Now()
        previousStatePointer := atomic.LoadPointer(&t.state)
        oldState := (*state)(previousStatePointer)
        newState = state{}
        newState.last = now
        // If this is our first request, then we allow it.
        if oldState.last.IsZero() {
            taken = atomic.CompareAndSwapPointer(&t.state, previousStatePointer, unsafe.Pointer(&newState))
            continue
        }
        // sleepFor calculates how much time we should sleep based on
        // the perRequest budget and how long the last request took.
        // Since the request may take longer than the budget, this number
        // can get negative, and is summed across requests.
        newState.sleepFor += t.perRequest - now.Sub(oldState.last)
        // We shouldn't allow sleepFor to get too negative, since it would mean that
        // a service that slowed down a lot for a short period of time would get
        // a much higher RPS following that.
        if newState.sleepFor < t.maxSlack {
            newState.sleepFor = t.maxSlack
        }
        if newState.sleepFor > 0 {
            newState.last = newState.last.Add(newState.sleepFor)
        }
        taken = atomic.CompareAndSwapPointer(&t.state, previousStatePointer, unsafe.Pointer(&newState))
    }
    t.clock.Sleep(newState.sleepFor)
    return newState.last
}
I realize this is a lock-free algorithm, but I know little about the topic.
I'd appreciate it if anyone who happens to know this algorithm could offer some answers/docs/blogs.
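For what it's worth, the core pattern here is a compare-and-swap (CAS) retry loop: read the current state, compute a new state from it, and atomically install the new state only if nobody else changed it in the meantime; otherwise retry. A minimal sketch of the same pattern on a plain counter (not Uber's code, just an illustration):

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

// incr adds 1 to *n without a mutex: load the old value, compute the new one,
// and CompareAndSwap only succeeds if *n still holds the old value. If another
// goroutine won the race, retry with the fresh value.
func incr(n *int64) {
    for {
        old := atomic.LoadInt64(n)
        if atomic.CompareAndSwapInt64(n, old, old+1) {
            return
        }
    }
}

func main() {
    var n int64
    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            incr(&n)
        }()
    }
    wg.Wait()
    fmt.Println(n) // 100
}

The limiter's Take() is the same loop, except the "new value" is a whole state struct (last timestamp plus accumulated sleepFor) swapped in via an unsafe.Pointer.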
I am writing a Kubernetes operator and dealing with a peculiar situation of handling long-running tasks from the reconcile loop.
I have the following situation:
func (r *ProvMyAppReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
    // _ = context.Background()
    _ = r.Log.WithValues("Instance.Namespace", req.NamespacedName)
    // your logic here
    var i int32
    var j int32
    var yappMyAppSpex myAppingv1alpha1.MyAppSpec
    var result *ctrl.Result
    var msg string
    var requeueFlag bool
    runFlag := false

    // <<<<< Instance prov logic >>>>>

    // =============================== Check Deletion TimeStamp ========
    // Check if the MyApp instance is marked to be deleted, which is
    // indicated by the deletion timestamp being set.
    // <<<<<<< Deletion and finalizer logic is here >>>>>>>>

    // ================================ MyApp Setup ===================
    if len(instance.Spec.MyApp) > 0 {
        for i = 0; i < int32(len(instance.Spec.MyApp)); i++ {
            yappMyAppSpex = instance.Spec.MyApp[i]
            if !yappMyAppSpex.setRemove {
                result, err = r.provisionStatefulSet(instance, prepareStatefulSetForMyApp(instance, yappMyAppSpex), "TYPE=TELEAPP")
                if err != nil {
                    return *result, err
                } else {
                    // This takes a lot of time, and if I do any operation on a CR resource
                    // it is not captured by K8s because the operator is busy running this
                    // job in the foreground.
                    _, err = r.addMyApp(yappMyAppSpex.Name)
                    if err != nil {
                        requeueFlag = true
                    }
                }
            }
        }
    }
    if runFlag {
        return ctrl.Result{Requeue: true, RequeueAfter: 30 * time.Second}, nil
    }
    return ctrl.Result{}, nil
}
I am trying to understand the best way to handle the above situation. Do I need to use channels and run the work in the background? The main issue is that I have to run some configuration that takes a lot of time, and this causes the operator not to handle other updates made to the CR.
The first thing I would recommend is making good use of the resource status. It's important that consumers of the operator know that the operator has acknowledged the changes and is acting on them.
Then, I would recommend revisiting the API definitions - is this the right API to be using? i.e. can you split this up into more than one controller (or API)?
A simple example: in a restaurant it's not a good idea if the cook is also the waiter because you want customers to know they are being taken care of but cooking could take a long time. It's better for customers to have a waiter that takes the order, marks the order status accordingly, and then hands it to the cook(s) to execute on (this could be another API only used between cook and waiter).
In your example above you could add an API for a single app instance. The main controller would only be responsible for applying that API for each of the instances declared in the top level API (which doesn't take long). The worker controller would respond to the creation of that single-app-instance API and execute on it. This still would take time for the worker but visibility and UX would improve.
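A rough sketch of that split, assuming a current controller-runtime reconciler that embeds client.Client and has a Scheme; MyAppInstance, MyAppInstanceSpec, and Status.Phase are hypothetical names for the child API, so adapt them to your real types:

import (
    "context"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// The parent reconciler only creates child CRs and reports status, so it stays fast.
func (r *ProvMyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    var parent myAppingv1alpha1.MyApp
    if err := r.Get(ctx, req.NamespacedName, &parent); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    for _, spec := range parent.Spec.MyApp {
        child := &myAppingv1alpha1.MyAppInstance{ // hypothetical per-app child API
            ObjectMeta: metav1.ObjectMeta{
                Name:      parent.Name + "-" + spec.Name,
                Namespace: parent.Namespace,
            },
            Spec: myAppingv1alpha1.MyAppInstanceSpec{ /* copy the per-app fields here */ },
        }
        if err := controllerutil.SetControllerReference(&parent, child, r.Scheme); err != nil {
            return ctrl.Result{}, err
        }
        // Creating the child is quick; the slow provisioning happens in the
        // MyAppInstance controller, so this loop never blocks the parent.
        if err := r.Create(ctx, child); err != nil && !apierrors.IsAlreadyExists(err) {
            return ctrl.Result{}, err
        }
    }

    // Tell consumers of the CR that work is in progress instead of going silent.
    parent.Status.Phase = "Provisioning"
    if err := r.Status().Update(ctx, &parent); err != nil {
        return ctrl.Result{}, err
    }
    return ctrl.Result{}, nil
}

A second controller then watches MyAppInstance objects, runs the long addMyApp-style work there, and updates each child's status as it progresses, so the parent reconcile loop stays responsive to updates on the CR.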
Defining the problem:
We have IoT devices that each send us logs with car locations. We want to compute the distance each car travels online, so whenever a log arrives (after putting it in a queue, etc.) we do this:
type Delta struct {
    DeviceId string
    time     int64
    Distance float64
}

var LastLogs = make(map[string]FullLog)
var Distances = make(map[string]Delta)

func addLastLog(l FullLog) {
    LastLogs[l.DeviceID] = l
}

func AddToLogPerDay(l FullLog) {
    //mutex.Lock()
    if val, ok := LastLogs[l.DeviceID]; ok {
        if distance, exist := Distances[l.DeviceID]; exist {
            x := computingDistance(val, l)
            Distances[l.DeviceID] = Delta{
                DeviceId: l.DeviceID,
                time:     distance.time + 1,
                Distance: distance.Distance + x,
            }
        } else {
            Distances[l.DeviceID] = Delta{
                DeviceId: l.DeviceID,
                time:     1,
                Distance: 0,
            }
        }
    }
    addLastLog(l)
}
This basically calculates the distance using a utility function, so in Distances each device ID is mapped to the distance traveled so far. Now here is where the problem starts: while these distances are being added to the Distances map, I want a goroutine to put this data in the database. But since there are many devices and many logs, doing a query for every log is not a good idea, so I need to do this every 5 seconds, i.e. every 5 seconds flush all the distances that were added to the map. I wrote this function:
func UpdateLogPerDayTable() {
    for {
        for _, distance := range Distances {
            logs := model.HourPerDay{}
            result := services.CarDBProvider.DB.Table(model.HourPerDay{}.TableName()).
                Where("created_at >? AND device_id = ?", getCurrentData(), distance.DeviceId).
                Find(&logs)
            if result.Error != nil && !result.RecordNotFound() {
                log.Infof("Something went wrong while checking the log: %v", result.Error)
            } else {
                if !result.RecordNotFound() {
                    logs.CountDistance = distance.Distance
                    logs.CountSecond = distance.time
                    err := services.CarDBProvider.DB.Model(&logs).
                        Update(map[string]interface{}{
                            "count_second":   logs.CountSecond,
                            "count_distance": logs.CountDistance,
                        })
                    if err.Error != nil {
                        log.Infof("Something went wrong while updating the log: %v", err.Error)
                    }
                } else if result.RecordNotFound() {
                    dayLog := model.HourPerDay{
                        Model:         gorm.Model{},
                        DeviceId:      distance.DeviceId,
                        CountSecond:   int64(distance.time),
                        CountDistance: distance.Distance,
                    }
                    err := services.CarDBProvider.DB.Create(&dayLog)
                    if err.Error != nil {
                        log.Infof("Something went wrong while adding the log: %v", err.Error)
                    }
                }
            }
        }
        time.Sleep(time.Second * 5)
    }
}
It is called with go utlis.UpdateLogPerDayTable() in another goroutine. However, there are several problems here:
1. I don't know how to make Distances safe, so that when I add to it in one goroutine and read it somewhere else, everything is OK. (The problem is that I want to use Go channels and have no idea how to do it.)
2. How can I schedule tasks in Go for this problem?
3. I will probably add Redis to store all the devices that are online, so I can do the select query faster and just update the actual database, and also set an expiry time in Redis so that if a device doesn't send any data for a while it vanishes. Where should I put this code?
Sorry if my explanation wasn't enough, but I really need some help, specifically with the code implementation.
Go has a really cool pattern using for / select over multiple channels. This allows you to batch distance writes using both a timeout and a max record size. Using this pattern requires using channels.
First thing is to model your distances as a channel:
distances := make(chan Delta)
Then you can keep track of the current batch:
var deltas []Delta
Then combine them with a ticker in a for / select loop:
ticker := time.NewTicker(time.Second * 5)
var deltas []Delta
for {
    select {
    case <-ticker.C:
        // 5 seconds up, flush to db
        // reset deltas
    case d := <-distances:
        deltas = append(deltas, d)
        if len(deltas) >= maxDeltasPerFlush {
            // flush
            // reset deltas
        }
    }
}
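Filling in the placeholder comments, a minimal self-contained sketch of that worker could look like the following; flushToDB and maxDeltasPerFlush are stand-ins for your real GORM writes and batch size, and the Delta type is simplified from the question:

package main

import (
    "fmt"
    "time"
)

type Delta struct {
    DeviceId string
    Time     int64
    Distance float64
}

const maxDeltasPerFlush = 100

// flushToDB is a stand-in for the real database writes in UpdateLogPerDayTable.
func flushToDB(deltas []Delta) {
    fmt.Printf("flushing %d deltas\n", len(deltas))
}

// batchWorker owns the batch: no other goroutine touches deltas, so no mutex is needed.
func batchWorker(distances <-chan Delta) {
    ticker := time.NewTicker(5 * time.Second)
    defer ticker.Stop()

    var deltas []Delta
    flush := func() {
        if len(deltas) == 0 {
            return
        }
        flushToDB(deltas)
        deltas = nil // reset the batch
    }

    for {
        select {
        case <-ticker.C:
            flush() // 5 seconds are up: flush whatever we have
        case d, ok := <-distances:
            if !ok {
                flush() // producer closed the channel: flush the rest and stop
                return
            }
            deltas = append(deltas, d)
            if len(deltas) >= maxDeltasPerFlush {
                flush()
            }
        }
    }
}

func main() {
    distances := make(chan Delta)
    go batchWorker(distances)

    // The log-processing side just sends a copy of each Delta on the channel.
    distances <- Delta{DeviceId: "dev-1", Time: 1, Distance: 12.5}
    time.Sleep(6 * time.Second) // let the ticker fire once so the batch gets flushed
}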
I don't know how to make Distances safe, so that when I add to it in one goroutine and read it somewhere else, everything is OK. (The problem is that I want to use Go channels and have no idea how to do it.)
If you intend to keep a map and share memory, you need to protect it with a mutex to synchronize access between goroutines. Using a channel instead lets you send a copy of each Delta to the channel, removing the need to synchronize access to the Delta object. Depending on your architecture, you could also create a pipeline of goroutines connected by channels, so that only a single goroutine (a monitor goroutine) ever touches the Deltas, which also removes the need for synchronization.
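For completeness, the mutex route is a small wrapper like this sketch (imports omitted; the function names are illustrative):

var (
    distancesMu sync.Mutex
    Distances   = make(map[string]Delta)
)

// addDistance is the only way other goroutines should write to the map.
func addDistance(d Delta) {
    distancesMu.Lock()
    defer distancesMu.Unlock()
    Distances[d.DeviceId] = d
}

// snapshotAndReset hands the current batch to the flushing goroutine and starts
// a fresh map, so the flusher never iterates over the live map.
func snapshotAndReset() map[string]Delta {
    distancesMu.Lock()
    defer distancesMu.Unlock()
    out := Distances
    Distances = make(map[string]Delta)
    return out
}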
How can I schedule tasks in go for this problem?
Using a channel as the primitive for how you pass Deltas between goroutines :)
I will probably add Redis to store all the devices that are online, so I can do the select query faster and just update the actual database, and also set an expiry time in Redis so that if a device doesn't send any data for a while it vanishes. Where should I put this code?
This depends on your final architecture. You could write a decorator for the select operation which checks Redis first and then falls back to the DB; the caller of that function wouldn't have to know about it. Write operations can be done the same way: write to the persistent store, then write the cached value back to Redis with the expiration. With decorators, the client just performs reads and writes, and the cache logic lives inside the decorators. There are many ways to do this, and it largely depends on where your implementation settles.
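As a sketch of that decorator idea (the Store and Cache interfaces are made up; plug your GORM layer and Redis client in behind them, imports omitted):

// Store is whatever your DB layer already exposes.
type Store interface {
    GetDelta(deviceID string) (Delta, error)
    PutDelta(d Delta) error
}

// Cache is a tiny abstraction over Redis (or anything else) with TTL support.
type Cache interface {
    Get(key string) (Delta, bool)
    Set(key string, d Delta, ttl time.Duration)
}

// cachedStore decorates a Store: reads check the cache first, writes refresh it.
type cachedStore struct {
    db    Store
    cache Cache
    ttl   time.Duration
}

func (c *cachedStore) GetDelta(deviceID string) (Delta, error) {
    if d, ok := c.cache.Get(deviceID); ok {
        return d, nil
    }
    d, err := c.db.GetDelta(deviceID)
    if err == nil {
        c.cache.Set(deviceID, d, c.ttl)
    }
    return d, err
}

func (c *cachedStore) PutDelta(d Delta) error {
    if err := c.db.PutDelta(d); err != nil {
        return err
    }
    c.cache.Set(d.DeviceId, d, c.ttl)
    return nil
}

Callers only ever see the Store interface, so the Redis details (including the expiry) stay inside the decorator.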
So I'm building a small utility that listens on a socket and stores incoming messages as structs in a slice:
var points []Point

type Point struct {
    time time.Time
    x    float64
    y    float64
}

func main() {
    points = make([]Point, 0)
    l, err := net.Listen("tcp4", ":8900")
    // (...)
}

func processIncomingData(data string) {
    // Parse incoming data that comes as: xValue,yValue
    inData := strings.Split(data, ",")
    x, err := strconv.ParseFloat(inData[0], 64)
    if err != nil {
        fmt.Println(err)
    }
    y, err := strconv.ParseFloat(inData[1], 64)
    if err != nil {
        fmt.Println(err)
    }
    // Store the new Point
    points = append(points, Point{
        time: time.Now(),
        x:    x,
        y:    y,
    })
    // Remove points older than 1h ?
}
Now, as you might imagine, this will quickly fill my RAM. What's the best (fastest) way to remove points older than 1h after appending each new one? I'll be getting new points 10-15 times per second.
Thank you.
An approach I've used several times is to start a goroutine early in the project that looks something like this:
go cleanup()
...

func cleanup() {
    for {
        time.Sleep(...)
        // do cleanup
    }
}
Then what you could do is iterate over points using time.Since(point.time) to figure out how old each piece of data is. If it's too old, there's a slice trick to remove an item from a slice given its position:
points = append(points[:i], points[i+1:]...)
(where i is the index to remove)
Because the points are in the slice in order of the time they were added, you could speed things up by simply finding the first index that isn't an hour old and doing points = points[i:] to chop off the old points from the beginning of the slice.
You may run into problems if a request accesses the slice while you're cleaning it up. Adding a sync.Mutex can help with that: lock the mutex before the cleanup and also lock it anywhere else you write to the slice. This may be premature optimization, though; I'd experiment without the mutex before adding it, since it effectively makes interacting with points a serial operation and could slow the service down.
The time.Sleep(...) in the loop is to prevent from cleaning too often. You might be tempted to set it to an hour since you want to delete points older than that but you might end up with a situation where a point is added immediately after a cleanup. On the next cleanup it'll be 59 mins old and you don't delete it, on the NEXT cleanup it's nearly 2 hours old. My rule of thumb is that I attempt to clean up every 1/10 the amount of time I want to allow an object to stay in memory but that's rather arbitrary. This approach means an object could be at most 1h 5m 59s old when it's deleted.
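Putting those pieces together, a sketch of the cleanup goroutine (assuming the Point type from the question, a package-level mutex, and points kept in insertion order):

var (
    mu     sync.Mutex
    points []Point
)

func cleanup() {
    for {
        time.Sleep(6 * time.Minute) // roughly 1/10 of the 1h retention window

        cutoff := time.Now().Add(-1 * time.Hour)
        mu.Lock()
        // Points are appended in time order, so find the first one that is
        // still fresh and chop off everything before it.
        i := 0
        for i < len(points) && points[i].time.Before(cutoff) {
            i++
        }
        points = points[i:]
        mu.Unlock()
    }
}

The same mu.Lock()/mu.Unlock() pair would also go around the append in processIncomingData if you decide you need the mutex.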
I was playing with the following Go code, which calculates the population count using a lookup table:
package population

import (
    "fmt"
)

var pc [256]byte

func init() {
    for i := range pc {
        pc[i] = pc[i/2] + byte(i&1)
    }
}

func countPopulation() {
    var x uint64 = 65535
    populationCount := int(pc[byte(x>>(0*8))] +
        pc[byte(x>>(1*8))] +
        pc[byte(x>>(2*8))] +
        pc[byte(x>>(3*8))] +
        pc[byte(x>>(4*8))] +
        pc[byte(x>>(5*8))] +
        pc[byte(x>>(6*8))] +
        pc[byte(x>>(7*8))])
    fmt.Printf("Population count: %d\n", populationCount)
}
I have written the following benchmark code to check the performance of the above code block:
package population

import "testing"

func BenchmarkCountPopulation(b *testing.B) {
    for i := 0; i < b.N; i++ {
        countPopulation()
    }
}
This gave me the following result:
100000 18760 ns/op
PASS
ok gopl.io/ch2 2.055s
Then I moved the code from the init() function into the countPopulation() function, as below:
func countPopulation() {
    var pc [256]byte
    for i := range pc {
        pc[i] = pc[i/2] + byte(i&1)
    }
    var x uint64 = 65535
    populationCount := int(pc[byte(x>>(0*8))] +
        pc[byte(x>>(1*8))] +
        pc[byte(x>>(2*8))] +
        pc[byte(x>>(3*8))] +
        pc[byte(x>>(4*8))] +
        pc[byte(x>>(5*8))] +
        pc[byte(x>>(6*8))] +
        pc[byte(x>>(7*8))])
    fmt.Printf("Population count: %d\n", populationCount)
}
and once again ran the same benchmark code, which gave me the following result:
100000 20565 ns/op
PASS
ok gopl.io/ch2 2.303s
After observing both results, it is clear that the init() function is not in the scope of the benchmark function, which is why the first benchmark run took less time than the second.
Now I have another question I'm looking to get an answer for.
If I need to benchmark only the init() method, considering there can be multiple init() functions in a package, how is that done in Go?
Thanks in advance.
Yes, there can be multiple init() functions in a package; in fact, you can have multiple init() functions in a single file. More information about init can be found here. Remember that init() is automatically called once, before your program's main() even starts.
The benchmark framework runs your code multiple times (in your case 100000). This allows it to measure both very short and very long functions. It doesn't make sense for a benchmark to include the time spent in init(). The problem is a misunderstanding of what benchmarking is for: benchmarking lets you compare two or more implementations to determine which is faster (or compare the performance of the same function on different inputs). It does not tell you where your program spends its time.
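That said, if you really do want to time just the table construction, one option (a sketch, not the only way) is to pull the loop out of init() into a helper and benchmark the helper directly:

// buildTable does the same work as the init() above, but as an ordinary
// function so the testing framework can time it.
func buildTable() [256]byte {
    var pc [256]byte
    for i := range pc {
        pc[i] = pc[i/2] + byte(i&1)
    }
    return pc
}

func BenchmarkBuildTable(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _ = buildTable()
    }
}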
What you are basically doing is known as premature optimization: optimizing code to make it as fast as possible without knowing where your program actually spends most of its time. Profiling is the process of measuring how a program uses time and memory; in practice, it shows you where your program spends most of its time, and with that information you can write more efficient functions. More information about profiling in Go can be found in this blog post.
I need to build a data-structure like this:
map[string]SomeType
But it must store values for about 10 minutes and then clear them from memory.
The second condition is the number of records: it must be huge. This data structure must take at least 2-5K inserts per second.
So, what is the most correct way to do this in Go?
I'm thinking of making a goroutine with a timeout for each new element, and one (or more) garbage-collector goroutine with a channel to receive the timeouts and clear elements.
But I'm not sure that's the cleanest way. Is it OK to have millions of goroutines waiting on timeouts?
Thanks.
You will have to create a struct to hold your map and provide custom get/put/delete funcs to access it.
Note that 2-5k accesses per second is not really that much at all, so you don't have to worry about that.
Here's a simple implementation:
type item struct {
    value      string
    lastAccess int64
}

type TTLMap struct {
    m map[string]*item
    l sync.Mutex
}

func New(ln int, maxTTL int) (m *TTLMap) {
    m = &TTLMap{m: make(map[string]*item, ln)}
    go func() {
        for now := range time.Tick(time.Second) {
            m.l.Lock()
            for k, v := range m.m {
                if now.Unix()-v.lastAccess > int64(maxTTL) {
                    delete(m.m, k)
                }
            }
            m.l.Unlock()
        }
    }()
    return
}

func (m *TTLMap) Len() int {
    return len(m.m)
}

func (m *TTLMap) Put(k, v string) {
    m.l.Lock()
    it, ok := m.m[k]
    if !ok {
        it = &item{value: v}
        m.m[k] = it
    }
    it.lastAccess = time.Now().Unix()
    m.l.Unlock()
}

func (m *TTLMap) Get(k string) (v string) {
    m.l.Lock()
    if it, ok := m.m[k]; ok {
        v = it.value
        it.lastAccess = time.Now().Unix()
    }
    m.l.Unlock()
    return
}
playground
Note (2020-09-23): for some reason the time resolution on the current version of the playground is way off. This works fine; however, to try it on the playground you have to change the sleep to 3-5 seconds.
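For completeness, a small usage sketch of the TTLMap above (the capacity of 16 and the 3-second TTL are arbitrary):

func main() {
    m := New(16, 3) // initial capacity 16, entries expire ~3s after last access
    m.Put("k", "v")
    fmt.Println(m.Get("k")) // "v"
    time.Sleep(5 * time.Second)
    fmt.Println(m.Get("k")) // "" - the janitor goroutine has evicted it
}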
Take a look at buntdb.
tinykv is no longer being maintained.
Just for the record, I had the same problem and wrote the tinykv package, which uses a map internally.
It uses a heap of time.Time values for timeouts, so it does not range over the whole map.
A max interval can be set when creating an instance, but the actual interval for checking timeouts can be any time.Duration greater than zero and less than the max, based on the last item that timed out.
It provides CAS and Take functionality.
An optional callback can be set that reports which key and value timed out.
Timeouts can be explicit or sliding.
I suggest using the Map type from Go's built-in sync package; it's very easy to use and already handles concurrent access: https://golang.org/pkg/sync/#Map
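Note that sync.Map handles the concurrent access but not the expiry, so you would still need something like the following sketch on top of it (the entry type and the lazy eviction in get are illustrative, not part of sync.Map; imports omitted):

// entry pairs a value with its expiry time, since sync.Map has no TTL support.
type entry struct {
    value   string
    expires time.Time
}

var m sync.Map

func put(k, v string, ttl time.Duration) {
    m.Store(k, entry{value: v, expires: time.Now().Add(ttl)})
}

func get(k string) (string, bool) {
    e, ok := m.Load(k)
    if !ok {
        return "", false
    }
    ent := e.(entry)
    if time.Now().After(ent.expires) {
        m.Delete(k) // lazily evict expired entries on read
        return "", false
    }
    return ent.value, true
}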