Convert Situation Into Linear Programming - algorithm

I want to find server-system reliability using multi-objective optimization, where the objectives are the budget and the duration of server usage.
Here is the situation:
Say a company wants a cloud storage system with a specific budget (B). Based on the budget and the price of each server, they determine how many servers can be purchased.
For example:
Budget: $100,000
Server Cost: $18,000
Total servers that can be purchased: 5
Based on that, the company wants to find the maximum server reliability over combinations of server count and duration. They set a specific target duration, for example 10 years, and a target reliability, for example 99.9%. By the reliability formula, the highest server count always gives the best reliability, so instead the minimum number of servers that achieves 99.9% reliability within 10 years is selected.
Here is the formula to find the reliability:
R = 1 - q^n
where:
R is the reliability
q is the probability that a single server fails during the period (derived from its failure rate)
n is the number of servers
Assume the failure rate is identical for every server.
For example, a server rated at 0.2%/1000 hours fails twice per million hours. Assume the server operates 24 hours a day, all year.
q = (0.2/100) * (1/1000) * 24 * 365
q = 0.01752
So the reliability of 1 server in one year is
R = 1 - 0.01752
R = 0.98248 which is 98.2%
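A quick sanity check of that arithmetic (a minimal sketch; the variable names are mine):

failures_per_million_hours = 2
q_per_hour = failures_per_million_hours * 1e-6  # failure probability per hour
q_year = q_per_hour * 24 * 365                  # operated 24 h/day for a year
print(round(q_year, 5))                         # 0.01752
print(round(1 - q_year, 5))                     # 0.98248, i.e. 98.2% for one server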
I have made a calculation, but it is not a linear program.
For example (I wanted to attach a screenshot, but I don't have enough reputation):
Budget (b) = 100,000
Server cost (Sc) = 18,000
Total servers (n) = 5
Failures per million hours (nF) = 2
Per-million-hours scale factor (mh) = 10^-6
Duration to check reliability (y) = 10 years
Target reliability = 99.9%
Calculation:
Find the total number of servers: n = floor(b / Sc) = 5
Find the reliability (24 hours/day, 365 days/year):
R = Rs = 1 - [(nF * mh) * (24 * 365 * y)]^n
The output from the calculation:
1 server: not reliable
2 servers: not reliable
3 servers: not reliable
4 servers: reliable
5 servers: reliable
I want to convert this into a linear program whose displayed output is the target answer, 4.
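This becomes an integer linear program once the reliability constraint is linearized with logarithms: 1 - q^n >= R_target is the same as n * ln(q) <= ln(1 - R_target), and because ln(q) is negative, that is n >= ln(1 - R_target) / ln(q). Minimizing n subject to that bound and the budget bound n <= B/Sc gives the answer directly. A minimal sketch (my own, not the original calculation):

import math

B, Sc = 100_000, 18_000         # budget and cost per server
nF, mh = 2, 1e-6                # failures per million hours, scale factor
y = 10                          # target duration in years
R_target = 0.999                # target reliability

q = (nF * mh) * (24 * 365 * y)  # probability that one server fails within y years
n_max = B // Sc                 # budget constraint: at most 5 servers

# Linearized reliability constraint: n * ln(q) <= ln(1 - R_target)
n_min = math.ceil(math.log(1 - R_target) / math.log(q))

if n_min <= n_max:
    print("minimum servers:", n_min)  # prints 4
else:
    print("target reliability is not reachable within the budget")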

Related

How can I estimate the maximum number of requests per second for JMeter with 8 users

The scenario is:
Total number of users: 50000
Ramp-up time = 2 minutes
Test duration = 5 minutes
I have login credentials for 8 users.
So please guide me: how can I send 50000 requests in 2 minutes with 8 users?
Requests per second (RPS) is a result of your load test execution. You cannot estimate it beforehand.
Typically, you have a number in mind, e.g. 15 rps, based on your application's history or research you might have done. While you run the load test, you assert whether actual rps >= expected rps. Accordingly, you can report your findings to the business team / development team.
There are various factors, like server configuration, network, and think time, which can affect your answer. With a low server config (1 vCPU and 1 GB RAM) you can expect a relatively low rps, and this number will improve as you increase server capacity.
Perhaps, follow this thread.
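That said, the scenario's numbers do imply a target rate you could aim at with a throughput timer, even though the achievable rate only comes out of the run itself. A rough back-of-the-envelope sketch (my own arithmetic, not from the answer):

total_requests = 50_000
ramp_up_seconds = 2 * 60
users = 8

target_rps = total_requests / ramp_up_seconds  # ~417 requests/second overall
per_user_rps = target_rps / users              # ~52 requests/second per user
print(round(target_rps), round(per_user_rps))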

I have two systems in parallel, each with 99.9% uptime throughout the year. What's the overall uptime?

If two systems in parallel have the same uptime of 99.9% over the year then how can I determine the uptime of the overall system?
0.999 * 0.999 = 0.998001 is the chance that both of them will be up at the same time (e.g. one is a user server and one is a product server).
The more machines you add to the system, the bigger the chance that at least one of them goes down. A similar formula is used in calculating RAID reliability.
EDIT
If you consider the system up while at least one node is up (e.g. 2 instances of the same service), then the chance is 1 - 0.001 * 0.001 = 0.999999, i.e. 99.9999%.
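A short sketch of both cases, assuming independent failures (my own illustration):

uptime = 0.999

# Series: the system needs BOTH nodes up (e.g. user server AND product server)
both_up = uptime * uptime        # 0.998001

# Parallel: the system is up while AT LEAST ONE node is up (redundancy)
any_up = 1 - (1 - uptime) ** 2   # 0.999999

print(round(both_up, 6), round(any_up, 6))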

Writing a weighted load balancing algorithm

I have to write a weighted load balancing algorithm and I'm looking for some references. Is there any book you can suggest for understanding such algorithms?
Thanks!
A simple algorithm here isn't that complicated.
Let's say you have a list of servers with the following weights:
A 10
B 20
C 30
where a higher weight means the server can handle more traffic.
Just divide the amount of traffic sent to each server by its weight and sort smallest to largest. The server that comes out on top gets the user.
For example, let's say each server starts at 10 users; then the order is going to be:
C - 10 / 30 = 0.33
B - 10 / 20 = 0.50
A - 10 / 10 = 1.00
Which means the next 5 requests will go to server C. The 6th request will go to either C or B. The 7th will go to whichever one didn't handle the 6th.
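A minimal sketch of that selection rule (my own illustration; server names and counts are made up):

weights = {"A": 10, "B": 20, "C": 30}  # higher weight = can handle more traffic
active = {"A": 10, "B": 10, "C": 10}   # users currently on each server

def pick_server():
    # lowest users-to-weight ratio gets the next user
    return min(active, key=lambda s: active[s] / weights[s])

for _ in range(7):
    chosen = pick_server()
    active[chosen] += 1
    print(chosen, end=" ")  # C C C C C, then B wins the 6th-request tie, then C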
To complicate things, you might want the balancer to be more intelligent. In which case it needs to keep track of how many requests are currently being serviced by each of the servers and decrement them when the request is completely fulfilled.
Further complications include adding stickiness to sessions. Which means the balancer has to inspect each request for the session id and keep track of where they went last time.
On the whole, though: if you can, just buy a product from a company that already does this.
Tomcat's balancer app and the tutorial here serve as good starting points.

Maximum number of concurrent connections jBoss

We are currently developing a servlet that will stream large image files to clients. We are trying to determine how many JBoss nodes we would need in our cluster behind an Apache mod_jk load balancer. I know that it takes roughly 5000 milliseconds to serve a single request. I am trying to use the formula here http://people.apache.org/~mturk/docs/article/ftwai.html to figure out how many connections are possible, but I am having an issue because they don't explain each of the numbers in the formula. Specifically, they say that you should limit each server to 200 requests per CPU, but I don't know whether that number already accounts for cores. Each server we are using will have 8 cores, so I think the formula should go either like this:
Concurrent Users = (500/5000) * 200 * 8 = 160 concurrent users
Or like this:
Concurrent Users = (500/5000) * (200 * 8) * 8 = ~1280 concurrent users
It makes a big difference which one they meant. Without an example in their documentation, it is hard to tell. Could anyone clarify?
Thanks in advance.
I guess these images aren't static, or you'd have stopped at this line?
First thing to ease the load from the Tomcat is to use the Web server for serving static content like images, etc.
Even if not, you've got larger issues than a factor of 8: the purpose of his formula is to determine how many concurrent connections you can handle without the AART (average application response time) exceeding 0.5 seconds. Your application takes 5 seconds to serve a single request. The formula as you're applying it is telling you 9 women can produce a baby in one month.
If you agree that 0.5 seconds is the maximum acceptable AART, then you first have to be able to serve a single request in <=0.5 seconds.
Otherwise, you need to replace his value for maximum AART in ms (500) with yours (which must be greater than or equal to your actual AART).
Finally, as to the question of whether his CPU term should account for cores: it's going to vary depending on CPU & workload. If you're serving large images, you're probably IO-bound, not CPU-bound. You need to test.
Max out Tomcat's thread pools & add more load until you find the point where your AART degrades. That's your actual value for the second half of his equation. But at that point you can keep testing and see the actual value for "Concurrent Users" by determining when the AART exceeds your maximum.
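For concreteness, here are the two readings of the formula side by side (a sketch of the question's arithmetic only; as the answer explains, neither number means much until the AART itself is fixed):

max_aart_ms = 500      # the article's maximum acceptable average response time
actual_aart_ms = 5000  # this servlet's measured time per request
ratio = max_aart_ms / actual_aart_ms

per_cpu_reading = ratio * 200 * 8         # 160 concurrent users
per_core_reading = ratio * (200 * 8) * 8  # 1280 concurrent users

print(per_cpu_reading, per_core_reading)  # the two readings differ by exactly 8x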

Creating a formula for calculating device "health" based on uptime/reboots

I have a few hundred network devices that check in to our server every 10 minutes. Each device has an embedded clock, counting the seconds and reporting elapsed seconds on every check in to the server.
So, sample data set looks like
CheckinTime Runtime
2010-01-01 02:15:00.000 101500
2010-01-01 02:25:00.000 102100
2010-01-01 02:35:00.000 102700
etc.
If the device reboots, when it checks back into the server, it reports a runtime of 0.
What I'm trying to determine is some sort of quantifiable metric for the device's "health".
If a device has rebooted a lot in the past but has not rebooted in the last xx days, then it is considered healthy, compared to a device that has a big uptime except for the last xx days where it has repeatedly rebooted.
Also, a device that has been up for 30 days and just rebooted, shouldn't be considered "distressed", compared to a device that has continually rebooted every 24 hrs or so for the last xx days.
I've tried multiple ways of calculating the health, using a variety of metrics:
1. average # of reboots
2. max(uptime)
3. avg(uptime)
4. # of reboots in last 24 hrs
5. # of reboots in last 3 days
6. # of reboots in last 7 days
7. # of reboots in last 30 days
Each individual metric only accounts for one aspect of the device health, but doesn't take into account the overall health compared to other devices or to its current state of health.
Any ideas would be GREATLY appreciated.
You could do something like Windows 7's reliability metric - start out at full health (say 10). Every hour / day / check-in cycle, increment the health by (10 - currentHealth) * incrementFactor. Every time the device goes down, subtract a certain percentage.
So, given a crashfactor of 20%/crash and an incrementfactor of 10%/day:
A device that rebooted a lot in the past but has not rebooted in the last 20 days will have a health of about 8.6.
A device with big uptime except for the last 2 days, during which it rebooted 5 times, will have a health of about 4.1.
A device that has been up for 30 days and just rebooted will have a health of 8.
A device that has rebooted every 24 hrs or so for the last 10 days will have a health of about 3.9.
To run through an example:
Starting at 10
Day 1: no crash, new health = CurrentHealth + (10 - CurrentHealth)*.1 = 10
Day 2: One crash, new health = currenthealth - currentHealth*.2 = 8
But still increment every day so new health = 8 + (10 - 8)*.1 = 8.2
Day 3: No crash, new health = 8.4
Day 4: Two crashes, new health = 5.8
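A minimal sketch of that update rule (my own rendering; I assume crashes are applied before the daily increment, which matches the Day 2 arithmetic above):

def next_health(health, crashes, crash_factor=0.2, increment_factor=0.1, cap=10):
    # each crash knocks a percentage off the current health...
    for _ in range(crashes):
        health -= health * crash_factor
    # ...then the daily increment pulls health back toward the cap
    return health + (cap - health) * increment_factor

health = 10.0
for crashes in [0, 1, 0, 2]:  # days 1-4 of the worked example
    health = next_health(health, crashes)
    print(round(health, 1))   # 10.0, 8.2, 8.4, 5.8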
You might take the reboot count per unit time for a particular machine and compare it to the mean and standard deviation of the entire population. Machines that fall, say, three standard deviations above the mean (rebooting more often) could be flagged.
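A sketch of that outlier check (my own, using Python's statistics module):

import statistics

rates = {f"dev{i}": 0.1 for i in range(1, 20)}  # reboots per day, per device
rates["dev20"] = 0.95                           # one clear outlier

mean = statistics.mean(rates.values())
stdev = statistics.stdev(rates.values())

flagged = [d for d, r in rates.items() if r > mean + 3 * stdev]
print(flagged)  # ['dev20']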
You could use a weighted average of the uptimes, including the current uptime only when it would make the average higher.
The weight would be how recent each uptime is, so the most recent uptimes carry the biggest weight.
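A sketch of one such recency weighting (my own; linear weights are just one choice):

uptimes = [3, 1, 2, 25, 30]  # uptimes between past reboots, oldest first (days)
current = 12                 # days since the last reboot (still running)

def weighted_avg(values):
    weights = range(1, len(values) + 1)  # newer uptimes get larger weights
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

score = weighted_avg(uptimes)
# include the ongoing uptime only when it would raise the average
if weighted_avg(uptimes + [current]) > score:
    score = weighted_avg(uptimes + [current])
print(round(score, 1))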
Are you able to break the devices out into groups of similar devices? Then you could compare an individual device to its peers.
Another suggestion is to look into various moving-average algorithms. These are supposed to smooth out time-series data as well as highlight trends.
Does it always report a runtime of 0 on reboot? Or something close to zero (less than the previous value, anyway)?
You could score this in two directions:
1. The lower the number, the fewer troubles it had.
2. The higher the number, the longer its uninterrupted periods.
You also need to account for the fact that health varies and can worsen over time, so the latest values should carry more weight than older ones; an exponentially growing weight is one option.
The more reboots a device had in the last period, the more broken the system may be, but the spacing of the reboots matters too: 5 reboots in one day versus 10 reboots in 2 weeks mean very different things. So time should be a metric in the formula, along with the number of reboots.
In short, you need the density of reboots over the recent period.
You can apply the weight through simple division: the larger the number you divide by, the smaller the result, and hence the lower the weight.
Pseudo code:
import time

def calc_health(reboot_times, threshold=800.0, now=None):
    # reboot_times: reboot timestamps in seconds since the epoch
    now = now if now is not None else time.time()
    value = 0.0
    for t in reboot_times:
        # floor at one day so a very recent reboot doesn't divide by ~0
        days_past = max((now - t) / 86400, 1.0)
        # the more days past, the lower the contribution, so the lower the weight
        value += 100 / days_past
    # no reboots at all: nothing to penalize, so maximal health
    return float("inf") if value == 0 else threshold / value
You could refine this function by, for example, filtering on a maxDaysPast cutoff and playing with the threshold and so on.
This formula is based on the plot f(x) = 100/x. As you can see, the value is higher at low x than at large x. That is how the formula weights days_past: lower days_past == lower x == higher weight.
The value += line counts the reboots, and the 100/x part weights each reboot by its age.
At the return, the threshold is divided by the accumulated value, because the higher the reboot score, the lower the result must be.
You can use a plotting program or calculator to see the curve of the plot, which is also the curve of the days_past weight.
