Rate limiting algorithm for throttling requests

I need to design a rate limiter service for throttling requests.
For every incoming request, a method will check whether the requests per second have exceeded the limit or not. If the limit has been exceeded, it will return the amount of time the request needs to wait before it can be handled.
I'm looking for a simple solution that just uses the system tick count and rps (requests per second). It should not use a queue or complex rate limiting algorithms and data structures.
Edit: I will be implementing this in C++. Also, note that I don't want to use any data structures to store the requests currently being executed.
API would be like:
if (!RateLimiter.Limit())
{
    do work
    RateLimiter.Done();
}
else
    reject request

The most common algorithm used for this is the token bucket. There is no need to invent something new; just search for an existing implementation for your technology/language.
If your app is highly available / load balanced, you might want to keep the bucket information in some sort of persistent storage. Redis is a good candidate for this.
I wrote Limitd, which is a different approach: a daemon for limits. The application asks the daemon, through a limitd client, whether the traffic is conformant. The limit is configured on the limitd server, and the app is agnostic to the algorithm.
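Since the question mentions C++ and a system tick count, here is a minimal single-process token bucket sketch to make the above concrete (an illustration only, not Limitd's code or any specific library's API). It refills continuously from the elapsed time and needs no queue or per-request bookkeeping:
#include <algorithm>
#include <chrono>

class TokenBucket
{
    using Clock = std::chrono::steady_clock;

public:
    // rate  = tokens added per second (the allowed requests per second)
    // burst = bucket capacity (how many requests may arrive at once)
    TokenBucket(double rate, double burst)
        : rate_(rate), burst_(burst), tokens_(burst), last_(Clock::now()) {}

    // Returns true if the request may proceed, false if it should be
    // rejected (or retried after roughly 1/rate seconds).
    bool tryConsume()
    {
        const auto now = Clock::now();
        const double elapsed = std::chrono::duration<double>(now - last_).count();
        last_ = now;

        // Refill in proportion to the time that has passed, capped at burst.
        tokens_ = std::min(burst_, tokens_ + elapsed * rate_);

        if (tokens_ >= 1.0)
        {
            tokens_ -= 1.0;
            return true;
        }
        return false;
    }

private:
    double rate_;
    double burst_;
    double tokens_;
    Clock::time_point last_;
};
A call site would look like if (bucket.tryConsume()) { /* handle */ } else { /* reject or retry later */ }; in a multi-threaded server you would guard the state with a mutex.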

Since you give no hint of language or platform, I'll just give some pseudocode.
Things you are going to need:
a list of the currently executing requests
a way to get notified when a request finishes
And the code can be as simple as:
var ListOfCurrentRequests;  // a list of the start times of the current requests
var MaxAmountOfRequests;    // just a limit
var AverageExecutionTime;   // if the execution time is non-deterministic, the best we can do is keep an average

// for each request, either execute it or return the PROBABLE amount of time to wait
function OnNewRequest(Identifier)
{
    if (count(ListOfCurrentRequests) < MaxAmountOfRequests) // if we have room
    {
        Struct Tracker;
        Tracker.Request = Identifier;
        Tracker.StartTime = Now;  // save the start time
        AddToList(Tracker);       // add to list
        return 0;                 // ok to execute now
    }
    else
    {
        return CalculateWaitTime(); // return the PROBABLE time it will take for a 'slot' to free up
    }
}

// when a request has ended, release a 'slot' and update the average execution time
function OnRequestEnd(Identifier)
{
    Tracker = RemoveFromList(Identifier);
    UpdateAverageExecutionTime(Now - Tracker.StartTime);
}

function CalculateWaitTime()
{
    // the one that started first is PROBABLY the first to finish
    Tracker = GetTheOneThatHasBeenRunningTheLongest(ListOfCurrentRequests);
    // assume it will finish in the average time, minus how long it has already been running
    ProbableTimeToFinish = AverageExecutionTime - (Now - Tracker.StartTime);
    return ProbableTimeToFinish;
}
But keep in mind that there are several problems with this:
It assumes that, given the wait time, the client will issue a new request after that time has passed. Since the time is an estimate, you cannot use it to delay execution, or you can still overload the system.
Since you are not keeping a queue and delaying the requests, a client can end up waiting longer than it actually needs to.
And lastly, since you do not want to keep a queue to prioritize and delay the requests, you can get a livelock: you tell a client to come back later, but when it returns someone else has already taken its spot, and it has to come back yet again.
So the ideal solution would be an actual execution queue, but since you don't want one... I guess this is the next best thing.
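For completeness, here is a rough C++ sketch of the pseudocode above (a translation for illustration, not the original answerer's code). Note that it keeps a map of in-flight requests, which the question explicitly rules out, so treat it purely as an illustration of the idea:
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>

class SlotLimiter
{
    using Clock = std::chrono::steady_clock;

public:
    explicit SlotLimiter(std::size_t maxConcurrent) : maxConcurrent_(maxConcurrent) {}

    // Returns zero if the request may run now, otherwise the PROBABLE wait.
    std::chrono::milliseconds onNewRequest(std::uint64_t id)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (inFlight_.size() < maxConcurrent_)
        {
            inFlight_[id] = Clock::now();          // save the start time
            return std::chrono::milliseconds(0);   // ok to execute now
        }
        return calculateWaitTime();
    }

    // Release a 'slot' and update the average execution time.
    void onRequestEnd(std::uint64_t id)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        const auto it = inFlight_.find(id);
        if (it == inFlight_.end())
            return;
        const auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
            Clock::now() - it->second);
        inFlight_.erase(it);
        // Exponential moving average of the execution time.
        avgExecution_ = (avgExecution_ * 7 + elapsed) / 8;
    }

private:
    std::chrono::milliseconds calculateWaitTime() const
    {
        // The request that started first is PROBABLY the first to finish.
        auto oldest = Clock::now();
        for (const auto& entry : inFlight_)
            oldest = std::min(oldest, entry.second);
        const auto running = std::chrono::duration_cast<std::chrono::milliseconds>(
            Clock::now() - oldest);
        const auto remaining = avgExecution_ - running;
        return remaining.count() > 0 ? remaining : std::chrono::milliseconds(0);
    }

    std::mutex mutex_;
    std::size_t maxConcurrent_;
    std::map<std::uint64_t, Clock::time_point> inFlight_;
    std::chrono::milliseconds avgExecution_{0};
};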

According to your comments, you just want a simple (not very precise) requests-per-second flag. In that case the code can be something like this:
var CurrentRequestCount;
var MaxAmountOfRequests;
var CurrentTimestampWithPrecisionToSeconds;

function CanRun()
{
    if (Now.AsSeconds > CurrentTimestampWithPrecisionToSeconds) // a second has passed, reset the counter
    {
        CurrentRequestCount = 0;
        CurrentTimestampWithPrecisionToSeconds = Now.AsSeconds;
    }
    if (CurrentRequestCount >= MaxAmountOfRequests)
        return false;
    CurrentRequestCount++;
    return true;
}
It doesn't seem like a very reliable way to control anything, but I believe it's what you asked for. A C++ version along the same lines is sketched below.
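Since the question asks for C++, here is a sketch of that per-second counter shaped to the Limit()/Done() API from the question (only the class and method names come from the question; the rest is an illustration, not a production-grade limiter). Limit() returns zero when the call may proceed, and otherwise the time left until the one-second window rolls over:
#include <chrono>
#include <mutex>

class RateLimiter
{
    using Clock = std::chrono::steady_clock;

public:
    explicit RateLimiter(unsigned maxPerSecond) : maxPerSecond_(maxPerSecond) {}

    // Returns zero if the caller may proceed, otherwise the time left until
    // the current one-second window rolls over.
    std::chrono::milliseconds Limit()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        const auto now = Clock::now();

        if (now - windowStart_ >= std::chrono::seconds(1))
        {
            // A second has passed: start a new window and reset the counter.
            windowStart_ = now;
            count_ = 0;
        }

        if (count_ >= maxPerSecond_)
        {
            // Over the limit: report how long until the window resets.
            return std::chrono::duration_cast<std::chrono::milliseconds>(
                windowStart_ + std::chrono::seconds(1) - now);
        }

        ++count_;
        return std::chrono::milliseconds(0);
    }

    // Nothing to track in this fixed-window variant; kept so the call site
    // from the question compiles unchanged.
    void Done() {}

private:
    std::mutex mutex_;
    unsigned maxPerSecond_;
    unsigned count_ = 0;
    Clock::time_point windowStart_ = Clock::now();
};
A call site matching the question would then be something like: if (limiter.Limit().count() == 0) { /* do work */ limiter.Done(); } else { /* reject, or retry after the returned delay */ }. Like the pseudocode, this is a fixed-window counter, so it can let through short bursts of up to twice the limit across a window boundary; the token bucket from the earlier answer smooths that out.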

Related

Kotlin coroutines slow start

I've been attempting to do a bit of performance review on an app I have. It's a back-end Kotlin app that just pulls in some data, does a bit of data transformation and dumps it out, nothing too fancy. One thing that caught my eye was the final bit of execution, where we dump our final data onto a queue. When we start up the app, the final network call takes a very long time at first, sometimes over a second. Normally we run this network call in a coroutine to stop that last call blocking everything, but I started trying to time the coroutine and the network call separately and got some odd results: from what I can see, the coroutine can take far longer to launch/complete than the network call itself. It's entirely possible I'm not recording things correctly, but this is the general timing approach I have:
val coroutineTime = Instant.now().toEpochMilli()
GlobalScope.launch {
    executionTime = measureTimeMillis { /* do message sending */ }
    totalTime = Instant.now().toEpochMilli() - coroutineTime
    // log out executionTime and totalTime
}
Now here's what I'll see, something like:
- totalTime = ~800ms
- executionTime = ~150ms
These aren't one-offs either. I have multiple of these processes going on at once (up to 10 threads, I think), and the first total times always take significantly longer than the actual executionTime/network call. Eventually, after a few dozen messages, the overhead calms down and the two times become equivalent at about 15ms, but having nearly 700ms of overhead on coroutine start-up seems insane to me.
Is this normal/expected behavior? I've tested this in a separate app and see similar but less extreme results, where the first coroutine takes about 70ms to start. I'm struggling to find any other discussion of this outside of Kotlin being used in Android development.
As a first note, it's almost never a good idea to use the GlobalScope unless you really know what you're doing. This is why it was marked as delicate API. You should instead use a scope that is appropriately closed (following the lifecycle of whatever component launches this work).
Now, AFAIK, GlobalScope runs on the default dispatcher, so maybe this is due to a cold start of that default thread pool. Later on, it could also be a problem to use this dispatcher for network calls, depending on the number of concurrent coroutines you have. It would be more appropriate to use Dispatchers.IO instead for IO-bound work (or a custom thread pool).
It still doesn't explain the cold start, but I would first change that before investigating.
This is expected behavior if you use coroutines inappropriately ;-)
My guess is that your message sending is a blocking operation. By default, GlobalScope.launch() dispatches coroutines with Dispatchers.Default, which is designed for CPU-intensive operations; it has a limited number of threads, and you should never block when using it. If you do, you may run out of threads and coroutines will have to wait until some blocking operation finishes.
If you need to run blocking or IO code, you should use Dispatchers.IO instead:
GlobalScope.launch(Dispatchers.IO) { ... }
I was facing a similar issue. I have a function that loads some data from shared prefs, does some calculations on the data (all of this on Dispatchers.Default), and returns the result on Dispatchers.Main. I measured how long it took the coroutine to actually start executing the block dispatched to Dispatchers.Main after the calculations were done (the time from tag2 to tag3 below), and got about 950ms (!!). Here is the function:
fun someName() {
    CoroutineScope(Dispatchers.Default).launch {
        val time = System.currentTimeMillis()
        // load data and do calculations
        Log.d("tag2", "load and calculations took " + (System.currentTimeMillis() - time))
        CoroutineScope(Dispatchers.Main.immediate).launch {
            Log.d("tag3", "reached main thread code " + (System.currentTimeMillis() - time))
            // do something
            Log.d("tag4", "do something took " + (System.currentTimeMillis() - time))
        }
    }
}
But then I realized this happens during app launch, when the main thread is busy creating all the UI, so even with .immediate it takes time until the main thread gets around to executing the dispatched code... I then tried running this function after the app had already started and was idle, and found that tag2 to tag3 takes about 1ms (!!) (with .immediate). So it looks like when you dispatch something to a coroutine and the target thread isn't busy, it starts immediately.

HTTP Performance - Many small requests or one big one

Scenario:
In my site I display books.
The user can add every book to a "Read Later" list.
Behavior:
When the user enters the site, they are presented with a list of books.
Some of which are already in their "Read Later" list, some aren't.
The user has an indication next to each book telling them whether the book has been added to the list or not.
My issue
I am debating which option is ideal for my situation.
Option 1:
For every book, query the server whether it already exists in the user's list.
Update the indicator for each book.
Pro:
Very small request to the server, and very easy response (true or false).
Con: In a page with 30 books, I will send 30 separate HTTP requests, which can block sockets and is rather slow, considering that the browser and the server have to perform the entire handshake for each transaction.
Option 2:
I query the server once, and get a response with the full list of books in the "Read Later" list as an array.
In the browser, I go over the array, and update the indication for each book based on whether it exists in the array or not.
Pro: I only make one request, and update the indicator for all the books at once.
Con: The "Read Later" list might have hundreds of books, and passing a big array might prove slow and excessive. Especially in scenarios when not 30 books appear on the screen, but only 2-3. (That is, I want to check if a certain book is in the list, and for this I have the server send the client the entire list of books from the list).
So,
Which way would you go to maximize performance: 1 or 2?
Is there any alternative I am missing?
I think in 2017, and beyond, the solution is much less about overall performance and much more about user experience and user expectations.
Nowadays users do not tolerate delays. In that sense sophisticated user interfaces try to be responsive as quickly as possible. Thus: if you can use those small requests to enable the user to do something quickly (instead of waiting 2 seconds for that one big request to return) you should prefer that solution.
To my knowledge, there are many "high fidelity" sites out there where a single page might send 50, 100 requests. Therefore I consider that to be common practice!
And maybe it is helpful here: se-radio.net podcast episode 277 discusses this topic intensively, in the context of tail latency.
Option 1 sounds good but has a big problem in terms of scalability.
Option 2 mitigates this scalability problem and we can improve its design:
Client side, via JavaScript, collect only the displayed book ids and query once, via AJAX, for an array of read-later flags for just those 30 books. This way you still serve the page fast and request a small set of additional info, once, with a single HTTP request.
Server side, you can improve this further by caching an in-memory array of read-later ids for each user.
Live Testing, Solution & Real-World Data
This answer is written in JavaScript, and includes easy to understand code examples.
Introduction
The OP asked what the most efficient way is to make requests to a "Read Later" API, where each request has to wait some time while the backend saves the book.
For this answer, I have created a demo of a "Read Later" API endpoint; every request waits randomly between 70 and 130 milliseconds while saving each book.
In all scenarios I am testing 30 books every time.
Finally, we will see the best result for each method by measuring the real runtime of every action we take.
Synchronous Requests (OP's Option 1)
Here, we will run every call via JS, one after the other, each request waiting for the previous one to complete.
The code:
async function saveBooksSync() {
    console.time('save-books-sync');
    // creates 30 book IDs
    const booksIds = Array.from({length: 30}, (_, i) => i + 1);
    // creates 30 API links, one for each request
    const urls = booksIds.map(bookId => `http://localhost:7777/books/read-later?bookId=${bookId}`);
    for (let url of urls) {
        const response = await fetch(url);
        const json = await response.json();
        console.log(json);
    }
    console.timeEnd('save-books-sync');
}
Runtime: 3712.40087890625 ms
One Big Request
Although we will not be creating many request connections to the server, the runtime speaks for itself.
The code:
async function saveAllBooksAtOnce() {
    console.time('save-all-books');
    const booksIds = Array.from({length: 30}, (_, i) => i + 1);
    const url = `http://localhost:7777/books/read-later?all=1`;
    const response = await fetch(url);
    const json = await response.json();
    console.timeEnd('save-all-books');
}
Runtime: 3486.71484375 ms
Parallel Asynchronous Requests (solution)
Here the magic happens: this is the solution to the question of which request method is the most efficient.
Here we are making 30 small parallel requests, with impressive results.
The code:
async function saveBooksParallel() {
    console.time('save-books');
    const booksIds = Array.from({length: 30}, (_, i) => i + 1);
    const urls = booksIds.map(bookId => `http://localhost:7777/books/read-later?bookId=${bookId}`);
    const promises = urls.map((url) =>
        fetch(url).then((response) => response.json())
    );
    const data = await Promise.all(promises);
    console.log(data);
    console.timeEnd('save-books');
}
Here in this asynchronous parallel example, I used the Promise.all method.
The Promise.all() method takes an iterable of promises as an input,
and returns a single Promise that resolves to an array of the results
of the input promises
Runtime: 668.47705078125 ms
Conclusion
The results are clear: the most efficient way to make these multiple requests is to run them asynchronously, in parallel.
Update: I followed @Iglesias Leonardo's request to remove the console.log() of the data output because (presumably) it consumes a lot of resources.
These are the runtime results:
Synchronous Requests: 3371.695 ms
One Big Request: 3358.269 ms
Parallel Asynchronous Requests: 613.506 ms
Update Conclusion:
The runtimes stayed almost the same, which reflects the reality that parallel asynchronous requests are unmatched in speed.
In my view it depends on how the data is stored. If a relational database is being used you could easily get the boolean flag into the list of books by simply doing a join on the corresponding tables.
This will most likely give you the best results and you wouldn't have to write any algorithms in the front end.

Synchronous XMLHttpRequest deprecated

Today, I had to restart my browser due to some issue with an extension. What I found when I restarted it, was that my browser (Chromium) automatically updated to a new version that doesn't allow synchronous AJAX-requests anymore. Quote:
Synchronous XMLHttpRequest on the main thread is deprecated because of
its detrimental effects to the end user's experience. For more help,
check http://xhr.spec.whatwg.org/.
I need synchronous AJAX requests for my node.js applications to work, though, as they store and load data from disk through a server utilizing fopen. I found this to be a very simple and effective way of doing things, very handy for little hobby projects and editors... Is there a way to re-enable synchronous XMLHttpRequests in Chrome/Chromium?
Short answer:
They don't want sync on the main thread.
The solution is simple for new browsers that support threads/web workers:
var foo = new Worker("scriptWithSyncRequests.js")
Neither the DOM nor global variables are going to be visible within a worker, but encapsulating multiple synchronous requests there is really easy.
An alternative solution is to switch to async but use the browser's localStorage along with JSON.stringify as a medium. You might be able to mock localStorage if you are allowed to do some IO.
http://caniuse.com/#search=localstorage
Just for fun, there are alternative hacks if we want to restrict ourselves to using only sync:
It is tempting to use setTimeout because one might think it is a good way to group synchronous requests together. Sadly, there is a gotcha. Async in JavaScript doesn't mean the code gets to run in its own thread; async most likely just postpones the call, waiting for others to finish. Luckily for us there is light at the end of the tunnel, because you can likely use xhttp.timeout along with xhttp.ontimeout to recover. See Timeout XMLHttpRequest.
This means we can implement a tiny version of a scheduler that handles failed requests and allocates time to try again or report an error.
// The basic idea.
function runScheduler(s)
{
    setTimeout(function() {
        if (s.ptr < s.callQueue.length) {
            // Handles rescheduling if needed by pushing onto the queue.
            // Remember to set a time for xhttp.timeout.
            // Use xhttp.ontimeout to set a default return value on failure.
            // The pushed function might do something like this (in pseudo code):
            //   if !d1
            //       d1 = get(http...?query);
            //   if !d2
            //       d2 = get(http...?query);
            //   if (!d1) { pushQueue tryAgainLater }
            //   if (!d2) { pushQueue tryAgainLater }
            //   if (d1 && d2) { pushQueue handleData }
            s = s.callQueue[s.ptr++](s);
        } else {
            // Clear the queue when there is nothing more to do.
            s.ptr = 0;
            s.callQueue = [];
            // You could implement an idle counter and increase this value to
            // free CPU time.
            s.t = 200;
        }
        runScheduler(s);
    }, s.t);
}
Doesn't "deprecated" mean that it's available, but won't be forever. (I read elsewhere that it won't be going away for a number of years.) If so, and this is for hobby projects, then perhaps you could use async: false for now as a quick way to get the job done?

What approach should I take for creating a "lobby" in Node.js?

I have users connecting to a Node.js server, and when they join, I add them into a Lobby (essentially a queue). Any time there are 2 users in the lobby, I want them to pair off and be removed from the lobby. So essentially, it's just a simple queue.
I started off by trying to implement this with a Lobby.run method, which has an infinite loop (started within a process.nextTick call); any time there are more than two entries in the queue, I remove them from the queue. However, I found that this was eating all my memory, and infinite loops like this are generally ill-advised.
I'm now assuming that emitting events via EventEmitter is the way to go. However, my concern is with synchronization. Let's assume my Lobby is pretty simple:
Lobby = {
    users: [],
    join: function (user) {
        this.users.push(user);
        emitter.emit('lobby.join', user);
    },
    leave: function (user) {
        var index = this.users.indexOf(user);
        this.users.splice(index, 1);
        emitter.emit('lobby.leave', user);
    }
};
Now essentially I assume I want to watch for users joining the lobby and pair them up, maybe something like this:
Lobby = {
    ...
    run: function () {
        emitter.on('lobby.join', function (user) {
            // TODO: determine if this.users contains other users,
            // pair them off, and remove them from the array
        });
    }
}
As I mentioned, this does not account for synchronization. Multiple users can join the lobby at the same time, and so the event listener might pair up a single user with multiple other users instead of just one.
Can someone with more Node.js experience tell me if I am right to be concerned with this event-based approach? Any insight for improvement on this approach would be much appreciated.
You are wrong to be concerned with this, because Node.js is single-threaded; there is no concurrency at all! Whenever a block of code is running, no other code (including event handlers) can run until that block finishes what it does. In particular, if you define this empty loop in your app:
while(true) { }
then your server is stuck: no other code will ever fire and no other request will ever be handled. So be careful with blocks of code and make sure that each block eventually ends.
Back to the question... So in your case it is impossible for multiple users to be paired with the same user. And let me say one more time: this is simply because there is no concurrency in Node.JS!
On the other hand this only applies to one instance of Node.JS. If you want to scale it to many machines, then obviously you will have to implement some locking mechanism (which ensures that no other process can work with the data at the same time).

Measuring cross-process latency on Windows

I am building latency measurement into a communication middleware I am building. The way I have it working is that I periodically send a probe msg from my publishing apps. Subscribing apps receive this probe, cache it, and send an echo back at a time of their choosing, noting how much time the msg was kept “on hold”. The subscribing app receives these echos and calculates latency as (now() – time_sent – time_on_hold) / 2.
This kind of works, but the numbers are vastly different (3x) when "time on hold" is greater than 0. That is, if I echo the msg back immediately I get around 50us on my dev env, and if I wait and then send the msg back, the time jumps to 150us (even though I discount whatever time I was on hold). I use QueryPerformanceCounter for all measurements.
This is all inside a single Windows 7 box. What am I missing here?
TIA.
A bit more information. I am using the following to measure time:
static long long timeFreq;

static struct Init
{
    Init()
    {
        QueryPerformanceFrequency((LARGE_INTEGER*) &timeFreq);
    }
} init;

long long OS::now()
{
    long long result;
    QueryPerformanceCounter((LARGE_INTEGER*) &result);
    return result;
}

double OS::secondsDiff(long long ts1, long long ts2)
{
    return (double) (ts1 - ts2) / timeFreq;
}
On the publish side I do something like:
Probe p;
p.sentTimeStamp = OS::now();
send(p);
Response r = recv();
latency = (OS::secondsDiff(OS::now(), r.sentTimeStamp) - r.secondsOnHoldOnReceiver) / 2;
And on the receiver side:
Probe p = recv();
long long received = OS::now();
sleep();
Response r;
r.sentTimeStamp = p.sentTimeStamp;
r.secondsOnHoldOnReceiver = OS::secondsDiff(OS::now(), received);
send(r);
OK, I have edited my answer to reflect your update. Sorry for the delay, but I didn't notice that you had elaborated on the question by posting an answer.
It seems that, functionally, you are doing nothing wrong.
I think that when you distribute your application outside of localhost conditions, the additional 100us (if it is indeed roughly constant) will pale into insignificance compared to the average latency of a functioning network.
For the purposes of answering your question, I think there is a thread/interrupt scheduling issue on the server side that needs to be investigated, as you do not seem to be doing anything on the client that is not accounted for.
Try the following test scenario:
Send two probes, to clients A and B (all on localhost).
Send the probe to client B one second (or X/2 seconds) after you send the probe to client A.
Ensure that client A waits for two seconds (or X seconds) and client B waits for one second (or X/2 seconds).
The idea is that, hopefully, both clients will send back their probe answers at roughly the same time, and both after a sleep/wait (the action that exposes the problem). The objective is to have one client's response 'wake up' the publisher, so you can see whether the next client's answer is processed immediately.
If one of these returned probes does not show the anomaly (most likely the second response), it could point to the publisher thread waking from a sleep cycle (on receiving the first response) and then being immediately available to process the second response.
Again, if it turns out that the 100us delay is roughly constant, it will be within +-10% of 1ms, which is the timeframe appropriate for real-world network conditions.
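To put a number on the suspected scheduling effect, here is a small standalone C++ probe (my own sketch, not code from the question) that measures how long a thread blocked in a kernel wait takes to actually resume after it is signalled, using the same QueryPerformanceCounter machinery as above:
#include <windows.h>
#include <cstdio>
#include <thread>

// Measures the time between SetEvent() in one thread and the waiting
// thread actually resuming after WaitForSingleObject().
int main()
{
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);

    HANDLE ev = CreateEvent(nullptr, FALSE, FALSE, nullptr);

    for (int i = 0; i < 10; ++i)
    {
        LARGE_INTEGER signaled, woken;

        std::thread waiter([&]() {
            WaitForSingleObject(ev, INFINITE);
            QueryPerformanceCounter(&woken);
        });

        Sleep(100);   // make sure the waiter is parked in the kernel wait
        QueryPerformanceCounter(&signaled);
        SetEvent(ev);
        waiter.join();

        const double wakeUs =
            (woken.QuadPart - signaled.QuadPart) * 1e6 / freq.QuadPart;
        std::printf("wake-up latency: %.1f us\n", wakeUs);
    }

    CloseHandle(ev);
    return 0;
}
If this number is tens of microseconds on the dev box, thread wake-up alone could account for a good part of the difference: when the subscriber holds the probe, the publisher's receive thread has been descheduled and has to be woken again, whereas with an immediate echo it may barely have left the CPU.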
