I have to write a weighted load balancing algorithm and I'm looking for some references. Is there any book that you can suggest for understanding such algorithms?
Thanks!
A simple algorithm here isn't that complicated.
Let's say you have a list of servers with the following weights:
A 10
B 20
C 30
Where a higher weight means the server can handle more traffic.
Just divide the amount of traffic sent to each server by the weight and sort smallest to largest. The server that comes out on top gets the user.
For example, let's say each server starts with 10 users; then the order is going to be:
C - 10 / 30 = 0.33
B - 10 / 20 = 0.50
A - 10 / 10 = 1.00
Which means the next 5 requests will go to server C. The 6th request will go to either C or B. The 7th will go to whichever one didn't handle the 6th.
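Here is a minimal sketch of that selection rule in Python (the server names, weights, and starting counts are just the ones from the example above; a real balancer would plug in its own bookkeeping):

```python
# Pick the server with the lowest (traffic sent / weight) ratio.
servers = {"A": 10, "B": 20, "C": 30}    # name -> weight
assigned = {"A": 10, "B": 10, "C": 10}   # name -> requests sent so far

def pick_server():
    # Lowest load-to-weight ratio wins; ties are broken arbitrarily.
    return min(servers, key=lambda s: assigned[s] / servers[s])

for _ in range(7):
    chosen = pick_server()
    assigned[chosen] += 1
    print(chosen, {s: round(assigned[s] / servers[s], 2) for s in servers})
```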
To complicate things, you might want the balancer to be more intelligent. In which case it needs to keep track of how many requests are currently being serviced by each of the servers and decrement them when the request is completely fulfilled.
Further complications include adding stickiness to sessions. Which means the balancer has to inspect each request for the session id and keep track of where they went last time.
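A rough illustration of that lookup, building on the sketch above (the session-id extraction is hand-waved; real balancers read it from a cookie or URL parameter):

```python
# Hypothetical sticky-session layer on top of pick_server() above.
session_to_server = {}   # session id -> server that handled it last time

def route(session_id):
    server = session_to_server.get(session_id)
    if server is None:                    # first request for this session
        server = pick_server()
        session_to_server[session_id] = server
    assigned[server] += 1                 # still counts toward that server's load
    return server
```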
On the whole, if you can, just buy a product from a company that already does this.
Tomcat's balancer app and the tutorial here serve as good starting points.
I have a GET method in AWS Api Gateway. The cache is enabled for the stage, and works for most requests. However some requests seem to slip through to the backend no matter what I do. That is, some requests going through the API are not cached.
I have defined the parameters a, b & c to be cached, by checking their respective "caching" box under the "request" settings. There are also other parameters which are not cached.
The request can either have all three parameters or just one:
example.com/?a=foo&b=bar&c=baz&d=qux
example.com/?a=foo&d=qux
a, b & c can take on between 3 and 25 different values. But a can only have one value if b & c are present. Also b cannot be present without c and vice versa.
An example: say the cache's TTL is 60 s. I send this between time 0 and 10:
example.com/?a=foo&b=bar&c=baz&d=qux
example.com/?a=quux&d=qux
example.com/?a=foo&b=quux&c=baz&d=qux
example.com/?a=foo&b=corge&c=fred&d=qux
example.com/?a=baz&d=qux
And then between time 30 and 40 I send the same requests, and I might see the following in the backend log:
example.com/?a=foo&b=bar&c=baz&d=qux
example.com/?a=quux&d=qux
example.com/?a=baz&d=qux
So these requests were cached while the others weren't:
example.com/?a=foo&b=quux&c=baz&d=qux
example.com/?a=foo&b=corge&c=fred&d=qux
In the example above most requests were not cached, but that is not the case in reality: most queries are cached. In the real case there is a fairly big number of requests coming in on the second run, about 600/s; in the first run the request rate is about 1/s. The queries I see slipping through are among the first that the application would request.
It seems unlikely that AWS API Gateway wouldn't be able to handle similar query rates (throttling is enabled at 10,000 requests and 5,000 at burst), and yet it seems the first few queries the application sends slip through. Is this to be expected from API Gateway?
I was also thinking that there might be a cache size issue but increasing the cache does not seem to help.
So what reasons could there be for API Gateway to let seemingly cached requests slip through to the backend?
UPDATE: The nature of the application that creates the requests is that it starts a request chain. Meaning, there are about 500-600 applications which all start at the same time. When they start, they make a handful of requests asynchronously and then a chain of about 300-500 requests (synchronously).
With this in mind, the burst rate at 0 s is probably much higher. The ~600 requests/s stated above is the average of ~36,000 queries over 60 s. Most of the requests would be made at the beginning of those 60 s, but I don't have a number on the exact rate. An estimate might be about 1000-2000 requests/s for the first few seconds, and maybe even more (say 3000+) for the first second.
In short, I still don't know why this happens but I did manage to minimize the number of requests that slipped through.
I did this by having the requesting application delay the start (I explained the nature of the start sequence in the update to the question) by some random time. I let the application pick a random start time between 0 and 3 minutes to avoid spikes to API Gateway.
This didn't eliminate the phenomenon of requests slipping through, but it lowered the number from about 500-1500 over 60 s to between 0 and 10 over 3 minutes, which is something my backend could easily handle, compared to the 1000+ over 60 s, which was on the edge.
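For what it's worth, the delay itself is trivial; something like the following runs at the start of each client (the 0-180 s window is the one mentioned above, nothing API Gateway-specific):

```python
import random
import time

# Spread client start-up over a 0-180 s window to avoid hammering the
# API (and its cache) with a synchronized burst at time 0.
time.sleep(random.uniform(0, 180))
# ... then start the normal request chain ...
```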
It seems to me that when API Gateway is flooded with a large number of requests over a short time it will just pass these requests through. I am surprised (and a little skeptical) that these numbers would be so large as to cause problems for AWS but that is what I see.
Perhaps this can be solved by changing the throttling levels, but I found no difference when playing around with it (mind you, I'm no expert!).
My question is related to telecommunications, but it's still a pure programming challenge since I'm using a soft-switch.
Goal:
create an algorithm, used by the call routing engine, to fully saturate the available link capacity with traffic sold at the highest possible rate
Situation:
there is a communications link (E1/T1) with a fixed capacity of 30 voice channels (1 channel = one voice call between end users, so we can have at most 30 concurrent calls on each link)
the link has a fixed monthly running cost, so it's best when it's fully utilized all the time (the fixed cost divided by more minutes results in higher profit)
there are users "fighting" for link capacity by sending calls to the Call Routing Engine
each user can consume a random amount of link capacity at a given time; it's possible that one user takes the whole capacity at one time (i.e. peak hours) but consumes no capacity in off-peak hours
each user has a different call rate per minute
ideal situation: the link is fully utilized (24/7/365) with calls made by the users with the highest call rate per minute
Available control:
the call routing engine can accept a call and send it over this link, or reject the call
Available data:
current link usage
user rate per minute
recent calls per minute per user
user call history (access is costly, but possible)
Example:
user A has a rate of 1 cent per minute, B 0.8 cents, C 0.7 cents
it's best to accept user A's calls and reject the others if user A can fill the full link capacity
BUT user A usually can't fill the whole link capacity, and we need to accept calls from the others to fill the gap
we have no control over how many calls users will send at a given moment, so it's hard to plan which calls to accept and which to reject
Any ideas or suggested approach to this problem?
I suspect that the simplest algorithm you can come up with may be the best - for example, if you get a call from a type B or C user, simply check whether there are any calls from a type A user, and if not, accept the call.
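A literal sketch of that rule, with made-up names and the capacity and rates from the question (treat it as a starting point, not a finished policy):

```python
CAPACITY = 30                            # voice channels on the E1/T1 link
RATES = {"A": 1.0, "B": 0.8, "C": 0.7}   # cents/minute from the example; a smarter rule would use these
active = []                              # user type of each call currently up

def admit(user_type):
    """Accept type-A calls whenever a channel is free; accept B/C calls
    only while no type-A traffic is on the link."""
    if len(active) >= CAPACITY:
        return False
    if user_type != "A" and "A" in active:
        return False
    active.append(user_type)
    return True

def release(user_type):
    active.remove(user_type)             # call ended, channel freed
```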
The reasons why it may be best to go with the simplest approach:
It's easier!
Rejecting calls like this may not be allowed by the regulator, depending on the area.
If there really is a strong business opportunity here, then a VoIP solution is likely going to be easier, and if your client does not ask you to do this, someone else will likely do it anyway. VoIP as an alternative transport for high-cost TDM legs of calls is a very common approach.
If I do a benchmark and, for example, I find the following:
With 1 concurrent user, the API gives 150 req/s (9,000 req/minute).
With more than 300 concurrent users, the API starts throwing exceptions.
An app makes 1 request every 30 minutes.
Is it correct if I say:
the best case is that the API could handle 30 * 9,000 = 270,000 users. That is, within 30 minutes there would be 270,000 sequential requests, each coming from a different user
The worst case would be when 300 users post requests at the same time.
And if that's true, would there be any way to calculate the average case?
Is it the same as calculating the worst-case and average-case complexity of an algorithm?
One theoretical tool to answer these questions is http://en.wikipedia.org/wiki/Queueing_theory. It says that you are very unlikely to get the level of performance that you are assuming, because the load applied to the system fluctuates, so that there are busy periods and quiet periods. If the system has nothing to do in quiet periods it is forced into idleness that you haven't accounted for. In busy periods, on the other hand, it will typically build up long queues of pending work, until the queues get so long that customers walk away, or the queues become longer than the system can support and it collapses, or both.
Figure 1 on page 3 of http://pages.cs.wisc.edu/~dsmyers/cs547/lecture_12_mm1_queue.pdf shows response time vs. applied load for what is probably the most optimistic even vaguely realistic situation. You can see that response time gets very large as you approach maximum load.
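To put rough numbers on that curve, the mean response time of the M/M/1 model in those notes is W = 1/(μ − λ); plugging in the 150 req/s figure from the question (purely illustrative, your service rate will differ):

```python
# M/M/1 mean time in system: W = 1 / (mu - lam),
# where mu = service rate (req/s) and lam = arrival rate (req/s).
mu = 150.0                      # roughly the question's single-user throughput
for utilisation in (0.5, 0.8, 0.9, 0.95, 0.99):
    lam = utilisation * mu
    w_ms = 1000.0 / (mu - lam)
    print(f"load {utilisation:.0%}: mean response ~{w_ms:.1f} ms")
```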
By far the most sensible thing to do is to run tests which apply a realistic load to your application - this is important enough for people to build things like http://jmeter.apache.org/. If you want a rule of thumb I'd say don't plan to stress the system at more than 50% of theoretical capacity as you originally calculated.
We are currently developing a servlet that will stream large image files to a client. We are trying to determine how many JBoss nodes we would need in our cluster with an Apache mod_jk load balancer. I know that it takes roughly 5000 milliseconds to serve a single request. I am trying to use the formula here http://people.apache.org/~mturk/docs/article/ftwai.html to figure out how many connections are possible, but I am having an issue because they don't explain each of the numbers in the formula. Specifically, they say that you should limit each server to 200 requests per CPU, but I don't know if I should use that in the formula or not. Each server we are using will have 8 cores, so I think the formula should go either like this:
Concurrent Users = (500/5000) * 200 * 8 = 160 concurrent users
Or like this:
Concurrent Users = (500/5000) * (200 * 8) * 8 = ~1280 concurrent users
It makes a big difference which one they meant. Without an example in their documentation it is hard to tell. Could anyone clarify?
Thanks in advance.
I guess these images aren't static, or you'd have stopped at this line?
First thing to ease the load from the Tomcat is to use the Web server for serving static content like images, etc.
Even if not, you've got larger issues than a factor of 8: the purpose of his formula is to determine how many concurrent connections you can handle without the AART (average application response time) exceeding 0.5 seconds. Your application takes 5 seconds to serve a single request. The formula as you're applying it is telling you 9 women can produce a baby in one month.
If you agree that 0.5 seconds is the maximum acceptable AART, then you first have to be able to serve a single request in <=0.5 seconds.
Otherwise, you need to replace his value for maximum AART in ms (500) with yours (which must be greater than or equal to your actual AART).
Finally, as to the question of whether his CPU term should account for cores: it's going to vary depending on CPU & workload. If you're serving large images, you're probably IO-bound, not CPU-bound. You need to test.
Max out Tomcat's thread pools & add more load until you find the point where your AART degrades. That's your actual value for the second half of his equation. But at that point you can keep testing and see the actual value for "Concurrent Users" by determining when the AART exceeds your maximum.
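A rough outline of that test loop (the measure_aart_ms function is a stand-in for whatever your load generator, e.g. JMeter, reports; the step and ceiling values are arbitrary):

```python
MAX_AART_MS = 500            # maximum acceptable average application response time

def find_capacity(measure_aart_ms, step=10, ceiling=2000):
    """Ramp up concurrent users until AART exceeds the limit; return the
    last load level that still met it."""
    users = step
    while users <= ceiling:
        aart = measure_aart_ms(concurrent_users=users)   # run one load test
        if aart > MAX_AART_MS:
            return users - step
        users += step
    return ceiling
```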
Suppose you want to get from point A to point B. You use Google Transit directions, and it tells you:
Route 1:
1. Wait 5 minutes
2. Walk from point A to Bus stop 1 for 8 minutes
3. Take bus 69 till stop 2 (15 minutes)
4. Wait 2 minutes
5. Take bus 6969 till stop 3 (12 minutes)
6. Walk from stop 3 till point B (3 minutes).
Total time = 5 minutes wait + 40 minutes.
Route 2:
1. Wait 10 minutes
2. Walk from point A to Bus stop I for 13 minutes
3. Take bus 96 till stop II (10 minutes)
4. Wait 17 minutes
5. Take bus 9696 till stop 3 (12 minutes)
6. Walk from stop 3 till point B (8 minutes).
Total time = 10 minutes wait + 60 minutes.
All in all, Route 1 looks way better. However, what really happens in practice is that bus 69 is 3 minutes late due to traffic, and I end up missing bus 6969. The next bus 6969 comes at least 30 minutes later, which amounts to 5 minutes wait + 70 minutes (including a 30-minute wait in the cold or heat). Wouldn't it be nice if Google actually advertised this possibility? My question now is: what would be a better algorithm for displaying the top 3 routes, given uncertainty in the schedule?
Thanks!
How about adding weightings that express a level of uncertainty for different types of journey elements?
Bus services in Dublin City are notoriously untimely, so you could add a 40% margin of error to anything to do with the Dublin Bus schedule, giving a best and worst case scenario. You could also factor in the chronic traffic delays at rush hour. Then a user could see that they may have a 20% or 80% chance of actually making a connection.
You could sort "best" journeys by the "most probably correct" factor, and include this data in the results shown to the user.
My two cents :)
For the UK rail system, each interchange node has an associated 'minimum transfer time to allow'. The interface to the route planner here then has an Advanced option allowing the user to either accept the default, or add half hour increments.
In your example, setting a 'minimum transfer time to allow' of, say, 10 minutes at step 2 would prevent Route 1 as shown from being suggested. Of course, this means that the minimum possible journey time is increased, but that's the trade-off.
If you take uncertainty into account then there is no longer a "best route"; instead there can be a "best strategy" that minimizes the total time in transit. However, it can't be represented as a linear sequence of instructions but is more of the form of a general plan, i.e. "go to bus station X, wait until 10:00 for bus Y, if it does not arrive walk to station Z...". This would be notoriously difficult to present to the user (in addition to being computationally expensive to produce).
For a fixed sequence of instructions it is possible to calculate the probability that it actually works out; but what level of certainty would users want to accept? Would you be content with, say, an 80% success rate? When you then miss one of your connections, the house of cards falls down; in the worst case you miss a train that leaves every second hour.
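For the fixed-sequence case the arithmetic is simple: the probability that the whole plan works is the product of the per-connection probabilities (assuming independent delays, which is optimistic). A tiny illustration:

```python
# P(whole itinerary works) under independent connections.
def plan_success(p_connections):
    prob = 1.0
    for p in p_connections:
        prob *= p
    return prob

print(plan_success([0.9, 0.9, 0.9]))   # three 90% connections -> ~0.73
```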
Many years ago I wrote a similar program to calculate long-distance bus journeys in Finland, and I just reported the transfer times, assuming every bus was on schedule. Basically every plan with less than 15 minutes or so of transfer time was disregarded as too risky (there were sometimes only one or two long-distance buses per day on a given route).
Empirically. Record the actual arrival times vs scheduled arrival times, and compute the mean and standard deviation for each. When considering possible routes, calculate the probability that a given leg will arrive late enough to make you miss the next leg, and make the average wait time P(on time)*T(first bus) + (1-P(on time))*T(second bus). This gets more complicated if you have to consider multiple legs, each of which could be late independently, and multiple possible next legs you could miss, but the general principle holds.
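A sketch of that calculation for a single connection (Python; the normal model of lateness and all the numbers are assumptions standing in for your recorded data):

```python
from statistics import NormalDist

# Lateness of the incoming leg, fitted from recorded (actual - scheduled) times.
lateness = NormalDist(mu=3.0, sigma=4.0)   # minutes late on average, plus spread

slack = 5.0      # scheduled minutes between arrival and the next departure
headway = 30.0   # extra wait if you miss it and take the following service

p_on_time = lateness.cdf(slack)            # P(arrive in time to make the connection)
expected_wait = p_on_time * slack + (1 - p_on_time) * (slack + headway)
print(f"P(make it) = {p_on_time:.2f}, expected wait ~ {expected_wait:.1f} min")
```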
Catastrophic failure should be the first check.
This is especially important when you are trying to connect to that last bus of the day which is a critical part of the route. The rider needs to know that is what is happening so he doesn't get too distracted and knows the risk.
After that it could evaluate worst-case single misses.
And then, if you really wanna get fancy, take a look at the crime stats for the neighborhood or transit station where the waiting point is.