Azure web and worker role - 2x small instances or 1x medium? - performance

Which is better in terms of performance, 2 medium role instances or 4 small role instances?
What are the pro's and cons of each configuration?

The only real way to know if you gain advantage of using larger instances is trying and measuring, not asking here. This article has a table that says that a medium instance has everything twice as large as a small one. However in real life your mileage may vary and how this affects your application only you can measure.
Smaller roles have one important advantage - if instances fail separately you get smaller performance degradation. Supposing you know about "guaranteed uptime" requirement of having at least two instances, you have to choose between two medium and four small instances. If one small instance fails you lose 1/4 of your performance, but if one medium instance fails you lose half of performance.
Instances will fail if for example you have an unhandled exception inside Run() of your role entry point descendant and sometimes something just goes wrong big time and your code can't handle this and it'd better just restart. Not that you should deliberately target for such failures but you should expect them and take measures to minimize impact to your application.
So the bottom line is - it's impossible to say which gets better performance, but uptime implications are just as important and they are clearly in favor of smaller instances.

Good points by #sharptooth. One more thing to consider: When scaling in, the fewest number of instances is one, not zero. So, say you have a worker role that does some nightly task for an hour, and it requires either 2 Medium or 4 Small instances to get the job done in that timeframe. When the work is done, you may want to save costs by scaling to one instance and let it run as one instance for 23 hours until the next nightly job. With a single Small instance, you'll burn 23 core-hours, and with a single Medium instance, you'll burn 46 core-hours. This thinking also applies to your Web role, but probably more-so since you will probably have minimum two instances to make sure you have uptime SLA (it may not be as important for you to have SLA on your worker if, say, your end user never interacts with it and it's just for utility purposes).
My general rule of thumb when sizing: Pick the smallest VM size that can properly do the work, and then scale out/in as needed. Your choice will primarily be driven by CPU, RAM, and network bandwidth needs (and don't forget you need network when moving data between Compute and Storage).

For a start, you won't get the guaranteed uptime of 99% unless you have at least 2 roles role instances, this allows one to die and be restarted while the other one takes the burden. Otherwise, it is a case of how much you want to pay and what specs you get on each. It has not caused me any hassle having more than one role role instance, Azure hides the difficult stuff.

One other point maybe worth a mention if you use four small roles you would be able to run two in one datacenter and two in another datacenter and use traffic manager to route people at least which is closer. This might give you some performance gains.
Two mediums will give you more options to store stuff in cache at compute level and thus more in cache rather than coming off SQL Azure it is going to be faster.
Ideally you have to follow #sharptooth and measure and test. This is all very subjective and I second David also you want to start as small as possible and scale outwards. We run this way, you really want to think about designing your app around a more sharding aspect as this fits azure model better than working in traditional sense of just getting a bigger box to run everything on, at some point you run out into limits thinking in the bigger box process, ie.Like SQL Azure Connection limits.
Using technologies like Jmeter is your friend here and should give you some tools to test your app.
http://jmeter.apache.org/

Related

When is a new server instance better than just upgrading current one?

I'm just wondering, usually huge web apps have a few server instances. When is a new server instance better than just upgrading the current one's memory, cpu and etc.?
I understand location reasons, but wondering about any other reasons.
Here are a few reasons off the top of my head why scaling horizontally can be better than scaling vertically:
Fault tolerance. A server can do down (or be taken down by you for
maintenance/updates) and the site can stay online.
Red/black deployments. Let's say you have a new version of your software you want to roll out, but you're not 100% confident in how it works. If you have two or more servers, you can update one (and segment a % of your users to only use that server) and see how it goes. This way if there are egregious problems, you can roll back and only a few users will have seen problems.
You can treat your hardware as cattle and not pets. If you try to scale up one server to as large as it will go, you will need to be very careful with it. It will become a delicate flower that you don't allow anyone to touch. However if you have a bunch of cheaper, commoditized servers you can lose a few, update some, replace them at will. This will save you time and money in the long run.
Lastly, at some point you will not be able to scale up your one precious snowflake server. At that point what do you do? You now have to scale out while also dealing with a massive amount of traffic this one machine is handling. Better to do it earlier when things are simpler!

Prototyping for amazon Ec2

How do people (and start up companies) actually go about prototyping/deploying things on amazon and keep costs reasonable? Last month we were experimenting with some specific applications and running own hadoop cluster and managed to spend almost 1.5k just for tests ? Sure - they have micro instances, but what if you application is so intensive it actually requires a larger instance to even test? So I'd like some input as to how people go about doing this?
Several key issues:
Consider a local testbed for some purposes & consider if a given test really needs EC2. If it's really so hard to wrangle 2-4 machines to use as a testbed for Hadoop, there's a different problem. Get your head around whatever you're going to run, how Hadoop will play a role, and kick the tires on that. In time, you will also want to change your grid, upgrade software, tinker with other ideas, etc. When you go to EC2, you'll have smoothed some rough edges already.
Don't use a larger capacity machine than you need while getting the hang of things. If you're not pushing lots of data or compute cycles through at this stage, don't bother with cluster compute nodes, massive RAM instances, etc. Just focus on getting things set up correctly.
When you are ready to retarget to more powerful machines, try a few different machine setups. Maybe the cluster compute instances will pay off, maybe you don't need that kind of throughput: until you know your bottlenecks, don't overspend.
Be sure to use spot instances frequently during the testing phase. You will typically pay about 50% of the on-demand price.
If you get to a point where you want to pay for on-demand instances, have a separate instance start and stop Hadoop instances as needed - unless you need a big cluster all on cluster compute instances.
Prepare your AMIs to get launched as quickly as possible (under 1 minute) and never leave anything running overnight or over a weekend if it isn't necessary.
Until you get the system set up and running, you're basically paying tuition to learn how to get everything tailored to your needs. Just pay the "tuition" to learn each lesson (configurations, bottlenecks, scaling up, etc.), rather than try to take on everything at once. When you approach it as a series of lessons to be learned, it is less painful to spend the money, but as long as you know what you're about to test and learn, you will also spend money more judiciously.
Finally, compare the $1500 to the labor costs of this learning experience - it probably isn't a big deal in the long run. Once you know that something is going to be a reasonable block of computational effort, it's well engineered, and will finish quickly (albeit on many machines), it isn't so painful to spend money on it. Right now, it's hard to appreciate what you're learning because it doesn't yet benefit your org's goals.
To address cost issue while doing proof-of-concept of using Amazon Cloud.
I created a light-weight Java Application using Amazon AWS API, which creates the amazon cloud instances when I want to run a test on them. Once the test is finished or failed-to-start the application terminates the instances immediately by sending out diagnostic mail.
So, no amazon instance kept running or sitting ideal. Which can happen if you create/terminate manually or through a separate program.
Consider using spot instances. If you overbid, you can be almost sure it won't be terminated. In longer run they have price on a level of reserved instances, but you don't need to pay upfront. I believe you could also schedule the tests for non-peak hours, reaching even better prices, or switch to on-demand if spot instance price exceeds on-demand one - Hadoop should handle it nicely. Check this article about spot instances. It has also references to two other articles that analyze the potential of spot instances.

AWS Amazon EC2 Spot pricing

I'd like a non-amazon answer to this quandry...
It looks like, via spot instance pricing, you could run an instance for 22 or 23 cents an hour, for as many hours as you want, because the historical charts for hours/days/months show the spot price never goes over 21 (22?) cents per hour. That's like half of the non-reserved instance cost for the same sized instance and its even less than a reserved instance would ever work out to be per hour. With no commitment.
Am I missing something, do I have a complete and total misunderstanding of the spot/bid/ask instance mechanisim? Or is this a cheap way to get an 24/7 instance while Amazon has a bunch of extra capacity?
Jeremy
No, you are not missing anything. I asked the same question many times when I first looked at Spot, followed by "why doesn't everyone use this all the time?"
So what's the downside? Amazon reserves the right to terminate a Spot instance at any time for any reason. Now, a normal "on-demand" instance might die at any time too, but Amazon goes to great efforts to keep them online and to serve customers with warnings well in advance (days / weeks) if the host server needs to be powered down for maintenance. If you have a Spot instance running on a server they want to reboot ... they will just shut it off. In practice, both are pretty reliable (but NOT 100%!!), and many roles can run 24/7 on spot without issues. Just don't go whining to Amazon that your Spot instance got shut off and your entire database was stored on the ephemeral drive... of course if you do that on ANY instance, you are taking a HUGE (and very stupid) risk.
Some companies are saving tons of money with Spot. Here's a writeup on Vimeo saving 50%, and one on Pinterest saving 60%+ ($54/hr => $20/hr).
Why don't more companies use Spot for their instances? Many of the companies buying EC2 instance hours aren't very price sensitive and are very very risk-adverse, especially when it comes to outages and to operational events that sap engineering effort. They don't want to deal with the hassle to save a few bucks, especially if AWS fees aren't a significant cost-center versus personel. And for 24/7 instances, they already pay 1/2 price via "reserved instances", so the savings aren't as dramatic as they seem versus full-priced "on-demand" instances. Spot isn't fully relevant to large customers. You can be nearly certain that when a customer gets to be the size of a Netflix, they 1) need to coordinate with Amazon on capacity planning because you can't just spin up 1/2 a datacenter on a whim, and 2) getting significant volume discounts that bring their usage costs down into the Spot price range anyways. Plus, the first tier of cost cutting is to reclaim hardware that isn't really needed; at my last company, one guy found a bug where as we cycled through boxes we would "forget" about some of them and shutting that down saved $100+k / month (yikes). Once companies burn through that fat, they start looking at Spot.
There's a second, less discussed reason Spot doesn't get used... It's a different API. Think about how this interacts with "organizational inertia" .... Working at a company that continuously spends $XX / hr on EC2 (and coming from a company that spent $XXXX / hr), engineers start instances with the tools they are given. Our Chef deployment doesn't know how to talk to spot. Rightscale (prev place) defaulted to launching on-demand instances. With some quantity of work, I could probably figure out how to make a spot instance, but why bother if my priority is to get role XYZ up and running by tomorrow? I'm not about to engineer a spot-based solution just for my one role and then evangelize why that was a good idea; it's gotta be an org-wide decision. If you read the Pinterest case-study I linked above, you'll notice they talk about migrating their whole deployment over from $54/hr to $20/hr on spot. Reading between the lines, they didn't choose to launch Spot instances 1-by-1; one day, they woke up and made a company-wide decision to "solve the spot problem" and 'migrate' their deployment tools to using Spot by default (probably with support for a flag that keeps their DB instances off Spot). I can't imagine how much money Amazon has made by making Spot a different API instead of being a flag on the normal EC2 API; Hint: it's boatloads .. as in, you could buy a boat and then fill it with cash until it sinks.
So if you are willing to tolerate slightly higher risk and / or you are somewhat price-sensitive ... then, yes, you absolutely can save a crapton of money by running your service under Spot 24/7.
Just make sure you are double-prepared to unexpectedly lose your instance (ie, take backups) .... something you ALREADY need to be prepared for with an "on-demand" instance that doesn't have 100.0% uptime either.
Think of it this way:
Instead of getting something 99.9% reliable, you are getting something 99.5% reliable and paying half-price
(I made those numbers up to convey the idea, but they probably aren't too far off from the truth).
So long as your bid price is above the spot instance market price, you can continue to run whatever spot instances you want, and only pay the market price.
However, when the market price goes above your bid price, you lose your instances. Without any warning. They just terminate. While the spot price rarely spikes, and when it does it tends to come back down again quickly, for many applications the possibility of losing all your instances without warming is unacceptable. You can insulate yourself against that possibility by bidding higher, but then you risk having to pay that much.
TL;DR: If your application is tolerant to sudden termination, then spot instances are great. But there is a risk involved in using them.
I think these answers are slightly missing the point...
You need to select the most appropriate pricing for your workload and architect your solution with this in mind. AWS offers 3 pricing types:
Reserved Instances (low cost, high reliability, but pay up-front)
On-demand instances (highest cost, high reliability, but pay as you go)
Spot Instances (generally lowest cost, but can terminate unexpectedly)
Reserved Instances
- Use these for cost savings on long running / constant / predictable workloads.
On-demand Instances
- Use these for temporary workloads e.g. development / proof of concept / unpredictable workloads that can't be interrupted.
Spot Instances
- Use these for transitory workloads. Ensure applications are designed with this in mind (e.g. maintain state somewhere permanent and support the ability for new instances to resume where previous ones left off).
A useful design pattern can be to have a "pilot light" instance and use auto-scaling to bring spot instances on as required, and with a bit of cunning bring on-demand instances on if spot-instances fail to appear.
TL;DR: Spot Instances are suitable for workloads that can pause and resume, but are not mission critical. They can be subject to extraordinary peaks (e.g. N. California m2.2xlarge spot price is usually $0.11/hr but has sustained peaks of $10.00/hr!).
Or is this a cheap way to get an 24/7 instance while Amazon has a bunch of extra capacity?
Spot on, if your bid price always remains above the spot price.
I couldn't find any other explicit mention of when they will terminate your instance.
I would have assumed it would be when they would require that capacity for customers willing to pay full charges for the instance, but then again, the spot price could technically go above the on-demand price.

How to decide on what hardware to deploy web application

Suppose you have a web application, no specific stack (Java/.NET/LAMP/Django/Rails, all good).
How would you decide on which hardware to deploy it? What rules of thumb exist when determining how many machines you need?
How would you formulate parameters such as concurrent users, simultaneous connections, daily hits and DB read/write ratio to a decision on how much, and which, hardware you need?
Any resources on this issue would be very helpful...
Specifically - any hard numbers from real world experience and case studies would be great.
Capacity Planning is quite a detailed and extensive area. You'll need to accept an iterative model with a "Theoretical Baseline > Load Testing > Tuning & Optimizing" approach.
Theory
The first step is to decide on the Business requirements: how many users are expected for peak usage ? Remember - these numbers are usually inaccurate by some margin.
As an example, let's assume that all the peak traffic (at worst case) will be over 4 hours of the day. So if the website expects 100K hits per day, we dont divide that over 24 hours, but over 4 hours instead. So my site now needs to support a peak traffic of 25K hits per hour.
This breaks down to 417 hits per minute, or 7 hits per second. This is on the front end alone.
Add to this the number of internal transactions such as database operations, any file i/o per user, any batch jobs which might run within the system, reports etc.
Tally all these up to get the number of transactions per second, per minute etc that your system needs to support.
This gets further complicated when you have requirements such as "Avg response time must be 3 seconds etc" which means you have to figure in network latency / firewall / proxy etc
Finally - when it comes to choosing hardware, check out the published datasheets from each manufacturer such as Sun, HP, IBM, Windows etc. These detail the maximum transactions per second under test conditions. We usually accept 50% of those peaks under real conditions :)
But ultimately the choice of the hardware is usually a commercial decision.
Also you need to keep a minimum of 2 servers at each tier : web / app / even db for failover clustering.
Load testing
It's recommended to have a separate reference testing environment throughout the project lifecycle and post-launch so you can come back to run dedicated performance tests on the app. Scale this to be a smaller version of production, so if Prod has 4 servers and Ref has 1, then you test for 25% of the peak transactions etc.
Tuning & Optimizing
Too often, people throw some expensive hardware together and expect it all to work beautifully. You'll need to tune the hardware and OS for various parameters such as TCP timeouts etc - these are published by the software vendors, and these have to be done once the software are finalized. Set these tuning params on the Ref env, test and then decide which ones you need to carry over to Production.
Determine your expected load.
Setup a machine and run some tests against it with a Load testing tool.
How close are you if you only accomplished 10% of the peak load with some margin for error then you know you are going to need some load balancing. Design and implement a solution and test again. Make sure you solution is flexible enough to scale.
Trial and error is pretty much the way to go. It really depends on the individual app and usage patterns.
Test your app with a sample load and measure performance and load metrics. DB queries, disk hits, latency, whatever.
Then get an estimate of the expected load when deployed (go ask the domain expert) (you have to consider average load AND spikes).
Multiply the two and add some just to be sure. That's a really rough idea of what you need.
Then implement it, keeping in mind you usually won't scale linearly and you probably won't get the expected load ;)

How to create a system with 1500 servers that deliver results instantaneously?

I want to create a system that delivers user interface response within 100ms, but which requires minutes of computation. Fortunately, I can divide it up into very small pieces, so that I could distribute this to a lot of servers, let's say 1500 servers. The query would be delivered to one of them, which then redistributes to 10-100 other servers, which then redistribute etc., and after doing the math, results propagate back again and are returned by a single server. In other words, something similar to Google Search.
The problem is, what technology should I use? Cloud computing sounds obvious, but the 1500 servers need to be prepared for their task by having task-specific data available. Can this be done using any of the existing cloud computing platforms? Or should I create 1500 different cloud computing applications and upload them all?
Edit: Dedicated physical servers does not make sense, because the average load will be very, very small. Therefore, it also does not make sense, that we run the servers ourselves - it needs to be some kind of shared servers at an external provider.
Edit2: I basically want to buy 30 CPU minutes in total, and I'm willing to spend up to $3000 on it, equivalent to $144,000 per CPU-day. The only criteria is, that those 30 CPU minutes are spread across 1500 responsive servers.
Edit3: I expect the solution to be something like "Use Google Apps, create 1500 apps and deploy them" or "Contact XYZ and write an asp.net script which their service can deploy, and you pay them based on the amount of CPU time you use" or something like that.
Edit4: A low-end webservice provider, offering asp.net at $1/month would actually solve the problem (!) - I could create 1500 accounts, and the latency is ok (I checked), and everything would be ok - except that I need the 1500 accounts to be on different servers, and I don't know any provider that has enough servers that is able to distribute my accounts on different servers. I am fully aware that the latency will differ from server to server, and that some may be unreliable - but that can be solved in software by retrying on different servers.
Edit5: I just tried it and benchmarked a low-end webservice provider at $1/month. They can do the node calculations and deliver results to my laptop in 15ms, if preloaded. Preloading can be done by making a request shortly before the actual performance is needed. If a node does not respond within 15ms, that node's part of the task can be distributed to a number of other servers, of which one will most likely respond within 15ms. Unfortunately, they don't have 1500 servers, and that's why I'm asking here.
[in advance, apologies to the group for using part of the response space for meta-like matters]
From the OP, Lars D:
I do not consider [this] answer to be an answer to the question, because it does not bring me closer to a solution. I know what cloud computing is, and I know that the algorithm can be perfectly split into more than 300,000 servers if needed, although the extra costs wouldn't give much extra performance because of network latency.
Lars,
I sincerely apologize for reading and responding to your question at a naive and generic level. I hope you can see how both the lack of specifity in the question itself, particularly in its original form, and also the somewhat unusual nature of the problem (1) would prompt me respond to the question in like fashion. This, and the fact that such questions on SO typically emanate from hypotheticals by folks who have put but little thought and research into the process, are my excuses for believing that I, a non-practionner [of massively distributed systems], could help your quest. The many similar responses (some of which had the benefits of the extra insight you provided) and also the many remarks and additional questions addressed to you show that I was not alone with this mindset.
(1) Unsual problem: An [apparently] mostly computational process (no mention of distributed/replicated storage structures), very highly paralellizable (1,500 servers), into fifty-millisecondish-sized tasks which collectively provide a sub-second response (? for human consumption?). And yet, a process that would only be required a few times [daily..?].
Enough looking back!
In practical terms, you may consider some of the following to help improve this SO question (or move it to other/alternate questions), and hence foster the help from experts in the domain.
re-posting as a distinct (more specific) question. In fact, probably several questions: eg. on the [likely] poor latency and/or overhead of mapreduce processes, on the current prices (for specific TOS and volume details), on the rack-awareness of distributed processes at various vendors etc.
Change the title
Add details about the process you have at hand (see many questions in the notes of both the question and of many of the responses)
in some of the questions, add tags specific to a give vendor or technique (EC2, Azure...) as this my bring in the possibly not quite unbuyist but helpful all the same, commentary from agents at these companies
Show that you understand that your quest is somewhat of a tall order
Explicitly state that you wish responses from effective practionners of the underlying technologies (maybe also include folks that are "getting their feet wet" with these technologies as well, since with the exception of the physics/high-energy folks and such, who BTW traditionnaly worked with clusters rather than clouds, many of the technologies and practices are relatively new)
Also, I'll be pleased to take the hint from you (with the implicit non-veto from other folks on this page), to delete my response, if you find that doing so will help foster better responses.
-- original response--
Warning: Not all processes or mathematical calculations can readily be split in individual pieces that can then be run in parallel...
Maybe you can check Wikipedia's entry from Cloud Computing, understanding that cloud computing is however not the only architecture which allows parallel computing.
If your process/calculation can efficitively be chunked in parallelizable pieces, maybe you can look into Hadoop, or other implementations of MapReduce, for an general understanding about these parallel processes. Also, (and I believe utilizing the same or similar algorithms), there also exist commercially available frameworks such as EC2 from amazon.
Beware however that the above systems are not particularly well suited for very quick response time. They fare better with hour long (and then some) data/number crunching and similar jobs, rather than minute long calculations such as the one you wish to parallelize so it provides results in 1/10 second.
The above frameworks are generic, in a sense that they could run processes of most any nature (again, the ones that can at least in part be chunked), but there also exist various offerings for specific applications such as searching or DNA matching etc. The search applications in particular can have very short response times (cf Google for example) and BTW this is in part tied to fact that such jobs can very easily and quickly be chunked for parallel processing.
Sorry, but you are expecting too much.
The problem is that you are expecting to pay for processing power only. Yet your primary constraint is latency, and you expect that to come for free. That doesn't work out. You need to figure out what your latency budgets are.
The mere aggregating of data from multiple compute servers will take several milliseconds per level. There will be a gaussian distribution here, so with 1500 servers the slowest server will respond after 3σ. Since there's going to be a need for a hierarchy, the second level with 40 servers , where again you'll be waiting for the slowest server.
Internet roundtrips also add up quickly; that too should take 20 to 30 ms of your latency budget.
Another consideration is that these hypothethical servers will spend much of their time idle. That means they're powered on, drawing electricity yet not generating revenue. Any party with that many idle servers would turn them off, or at the very least in sleep mode just to conserve electricity.
MapReduce is not the solution! Map Reduce is used in Google, Yahoo and Microsoft for creating the indexes out of the huge data (the whole Web!) they have on their disk. This task is enormous and Map Reduce was built to make it happens in hours instead of years, but starting a Master controller of Map Reduce is already 2 seconds, so for your 100ms this is not an option.
Now, from Hadoop you may get advantages out of the distributed file system. It may allow you to distribute the tasks close to where the data is physically, but that's it. BTW: Setting up and managing an Hadoop Distributed File System means controlling your 1500 servers!
Frankly in your budget I don't see any "cloud" service that will allow you to rent 1500 servers. The only viable solution, is renting time on a Grid Computing solution like Sun and IBM are offering, but they want you to commit to hours of CPU from what I know.
BTW: On Amazon EC2 you have a new server up in a couple of minutes that you need to keep for an hour minimum!
Hope you'll find a solution!
I don't get why you would want to do that, only because "Our user interfaces generally aim to do all actions in less than 100ms, and that criteria should also apply to this".
First, 'aim to' != 'have to', its a guideline, why would u introduce these massive process just because of that. Consider 1500 ms x 100 = 150 secs = 2.5 mins. Reducing the 2.5 mins to a few seconds its a much more healthy goal. There is a place for 'we are processing your request' along with an animation.
So my answer to this is - post a modified version of the question with reasonable goals: a few secs, 30-50 servers. I don't have the answer for that one, but the question as posted here feels wrong. Could even be 6-8 multi-processor servers.
Google does it by having a gigantic farm of small Linux servers, networked together. They use a flavor of Linux that they have custom modified for their search algorithms. Costs are software development and cheap PC's.
It would seem that you are indeed expecting at least 1000-fold speedup from distributing your job to a number of computers. That may be ok. Your latency requirement seems tricky, though.
Have you considered the latencies inherent in distributing the job? Essentially the computers would have to be fairly close together in order to not run into speed of light issues. Also, the data center in which the machines would be would again have to be fairly close to your client so that you can get your request to them and back in less than 100 ms. On the same continent, at least.
Also note that any extra latency requires you to have many more nodes in the system. Losing 50% of available computing time to latency or anything else that doesn't parallelize requires you to double the computing capacity of the parallel portions just to keep up.
I doubt a cloud computing system would be the best fit for a problem like this. My impression at least is that the proponents of cloud computing would prefer to not even tell you where your machines are. Certainly I haven't seen any latency terms in the SLAs that are available.
You have conflicting requirements. You're requirement for 100ms latency is directly at odds with your desire to only run your program sporadically.
One of the characteristics of the Google-search type approach you mentioned in your question is that the latency of the cluster is dependent on the slowest node. So you could have 1499 machines respond in under 100ms, but if one machine took longer, say 1s - whether due to a retry, or because it needed to page you application in, or bad connectivity - your whole cluster would take 1s to produce an answer. It's inescapable with this approach.
The only way to achieve the kinds of latencies you're seeking would be to have all of the machines in your cluster keep your program loaded in RAM - along with all the data it needs - all of the time. Having to load your program from disk, or even having to page it in from disk, is going to take well over 100ms. As soon as one of your servers has to hit the disk, it is game over for your 100ms latency requirement.
In a shared server environment, which is what we're talking about here given your cost constraints, it is a near certainty that at least one of your 1500 servers is going to need to hit the disk in order to activate your app.
So you are either going to have to pay enough to convince someone to keep you program active and in memory at all times, or you're going to have to loosen your latency requirements.
Two trains of thought:
a) if those restraints are really, absolutely, truly founded in common sense, and doable in the way you propose in the nth edit, it seems the presupplied data is not huge. So how about trading storage for precomputation to time. How big would the table(s) be? Terabytes are cheap!
b) This sounds a lot like a employer / customer request that is not well founded in common sense. (from my experience)
Let´s assume the 15 minutes of computation time on one core. I guess thats what you say.
For a reasonable amount of money, you can buy a system with 16 proper, 32 hyperthreading cores and 48 GB RAM.
This should bring us in the 30 second range.
Add a dozen Terabytes of storage, and some precomputation.
Maybe a 10x increase is reachable there.
3 secs.
Are 3 secs too slow? If yes, why?
Sounds like you need to utilise an algorithm like MapReduce: Simplified Data Processing on Large Clusters
Wiki.
Check out Parallel computing and related articles in this WikiPedia-article - "Concurrent programming languages, libraries, APIs, and parallel programming models have been created for programming parallel computers." ... http://en.wikipedia.org/wiki/Parallel_computing
Although Cloud Computing is the cool new kid in town, your scenario sounds more like you need a cluster, i.e. how can I use parallelism to solve a problem in a shorter time.
My solution would be:
Understand that if you got a problem that can be solved in n time steps on one cpu, does not guarantee that it can be solved in n/m on m cpus. Actually n/m is the theoretical lower limit. Parallelism is usually forcing you to communicate more and therefore you'll hardly ever achieve this limit.
Parallelize your sequential algorithm, make sure it is still correct and you don't get any race conditions
Find a provider, see what he can offer you in terms of programming languages / APIs (no experience with that)
What you're asking for doesn't exist, for the simple reason that doing this would require having 1500 instances of your application (likely with substantial in-memory data) idle on 1500 machines - consuming resources on all of them. None of the existing cloud computing offerings bill on such a basis. Platforms like App Engine and Azure don't give you direct control over how your application is distributed, while platforms like Amazon's EC2 charge by the instance-hour, at a rate that would cost you over $2000 a day.

Resources