When is the 4-hour downtime for free Heroku Postgres databases?

The free tier for Heroku Postgres databases has "up to 4 hours of downtime per month". I think I might have run into that downtime while presenting an app, and I'm trying to avoid problems like that in the future. Is there any way to find out when those 4 hours of downtime will occur? (Or, even better, could I schedule them during the middle of the night?)

Is there any way to find out when those 4 hours of downtime will occur? (Or, even better, could I schedule them during the middle of the night?)
No. Heroku's documentation for the free tier says to expect:
Expected uptime of 99.5% each month.
Unannounced maintenance and database upgrades.
Since the maintenance windows are unannounced, you can neither predict nor schedule them.

Related

During migration of CouchDB from 1.6.1 to 2.3.1, the couchup utility takes a long time to rebuild views due to memory issues

During my migration of CouchDB from 1.6.1 to 2.3.1, the couchup utility is taking a very long time to rebuild views, and it also has memory issues. The databases are in the 500 GB range, and it is taking forever: it has been running for almost 5 to 6 days and is still not complete. Is there any way to speed it up?
When I try to replicate, couchdb dies from memory-leak issues after 2-3 minutes of couchup running, and then it starts again. Replication will take around 10 days. Replication showed a progress bar, but the view rebuild does not, so I don't know how much has been done.
CouchDB is installed on a RHEL Linux server.
Reducing backlog growth
As couchup encounters views that take longer than 5 seconds to rebuild, it carries on calling additional view URLs, triggering their rebuilds too. Once a number of long-running view rebuilds are in flight, even rebuilds that would otherwise have been short will take at least 5 seconds, leading to a large backlog. If individual databases are large or the map/reduce functions are very inefficient, it is probably best to set the timeout to something like 5 minutes. If you see more than a couple of
Timeout, view is processing. Moving on.
messages, it is probably time to kill couchup and double the timeout.
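What couchup essentially does is request each view once so CouchDB rebuilds its index. A minimal sketch of doing the same thing yourself, sequentially and with a long timeout so rebuilds don't pile up (the URL, credentials, and timeout below are assumptions to adapt):

import requests

COUCH = "http://admin:password@localhost:5984"  # assumption: local admin access
TIMEOUT = 300  # seconds; couchup's default of 5 is what creates the backlog

for db in requests.get(f"{COUCH}/_all_dbs").json():
    if db.startswith("_"):
        continue  # skip system databases
    rows = requests.get(
        f"{COUCH}/{db}/_all_docs",
        params={"startkey": '"_design/"', "endkey": '"_design0"', "include_docs": "true"},
    ).json()["rows"]
    for row in rows:
        ddoc = row["id"].split("/", 1)[1]
        for view in row["doc"].get("views", {}):
            print(f"rebuilding {db}/{ddoc}/{view} ...")
            try:
                # limit=1 is enough to trigger the index build and wait for it
                requests.get(f"{COUCH}/{db}/_design/{ddoc}/_view/{view}",
                             params={"limit": 1}, timeout=TIMEOUT)
            except requests.exceptions.Timeout:
                print(f"  still building after {TIMEOUT}s; moving on")

Because each view is queried one at a time, a slow rebuild doesn't trigger a stampede of others behind it.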
Observing Index growth
By default view_index_dir is the same as the database directory, so if data is in /var/lib/couchdb/shards then /var/lib/couchdb is the configured directory and indexes are stored in /var/lib/couchdb/.shards. You can observe which index shard files are being created and growing, or move view_index_dir somewhere separate for easier observation.
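If it helps, here is a small sketch that polls that directory and prints which shard index files are growing (the path is the default mentioned above; adjust it if you moved view_index_dir):

import os, time

INDEX_DIR = "/var/lib/couchdb/.shards"

def dir_sizes(root):
    sizes = {}
    for dirpath, _, files in os.walk(root):
        for f in files:
            path = os.path.join(dirpath, f)
            try:
                sizes[path] = os.path.getsize(path)
            except OSError:
                pass  # a file may vanish mid-walk during compaction
    return sizes

before = dir_sizes(INDEX_DIR)
while True:
    time.sleep(60)
    after = dir_sizes(INDEX_DIR)
    for path, size in after.items():
        delta = size - before.get(path, 0)
        if delta > 0:
            print(f"+{delta / 1e6:8.1f} MB  {path}")
    before = after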
What resources are running out?
You can tune CouchDB in general, but it is hard to say whether tuning is needed once the system is no longer rebuilding all indexes at once.
In particular, you would want to look for and disable any auto-compaction. Look at the files in /proc/[couchdb pid] to figure out the effective fd limits, how many files are open, and whether the crash happens around a specific number of open files. Due to sharding, the number of open files is usually a multiple of what it was in earlier versions.
Look at memory growth and figure out whether it is stabilizing enough that using swap could prevent the problem.
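As a starting point, a rough sketch of watching those /proc numbers for the CouchDB Erlang VM (the pgrep pattern and polling interval are assumptions; run it as a user allowed to read that process's /proc entries):

import os, subprocess, time

# assumption: a single beam.smp process; pgrep -o picks the oldest match
pid = subprocess.check_output(["pgrep", "-o", "beam.smp"]).decode().strip()

while True:
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        # format: "Max open files   <soft>   <hard>   files"
        soft_limit = next(l for l in f if l.startswith("Max open files")).split()[3]
    with open(f"/proc/{pid}/status") as f:
        rss = next(l for l in f if l.startswith("VmRSS")).strip()
    print(f"open fds: {open_fds:6d} / {soft_limit} soft limit   {rss}")
    time.sleep(30)

If the crashes cluster just under the soft fd limit, raise it; if VmRSS climbs without levelling off, the memory is the problem rather than the descriptors.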

Can an algorithm that reuses AWS instances still be useful after switching to per-minute billing?

We have an algorithm that reuses AWS EC2 instances for jobs. This was very useful when billing was based on usage time rounded up to whole hours. Now, due to a change in AWS policy, billing can be done by time expressed in minutes. At first glance there is no reason to keep the reuse algorithm, because a job lasts at least 10 minutes and up to many hours. Are there any reasons why this algorithm could still be useful?
Unless you're running an EC2 instance for less than a minute (which it sounds like you're not) and need per-second billing, or are using an EBS volume with it, I would say probably not. See more details here
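A quick back-of-the-envelope comparison makes the point; the hourly rate and job lengths below are made up, and billing is modelled per-minute to match the question:

# Why reuse mattered with hourly rounding but barely matters per-minute.
RATE_PER_HOUR = 0.10
jobs_minutes = [10, 25, 45]  # three jobs that could share one instance

# Hourly rounding: each fresh instance bills a full hour minimum.
# -(-m // 60) is ceiling division.
fresh_hourly = sum(-(-m // 60) * RATE_PER_HOUR for m in jobs_minutes)   # 3 billed hours
reused_hourly = -(-sum(jobs_minutes) // 60) * RATE_PER_HOUR             # 2 billed hours

# Per-minute billing: you pay for what you use either way.
fresh_minutes = sum(m / 60 * RATE_PER_HOUR for m in jobs_minutes)
reused_minutes = sum(jobs_minutes) / 60 * RATE_PER_HOUR

print(f"hourly rounding: fresh ${fresh_hourly:.2f} vs reused ${reused_hourly:.2f}")
print(f"per-minute:      fresh ${fresh_minutes:.2f} vs reused ${reused_minutes:.2f}")

Under hourly rounding the reuse saves a full billed hour; under per-minute billing the two strategies cost the same, so the only remaining savings are instance boot time and any per-attachment EBS costs.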

Amazon EC2: High resource usage although I have no running instances

At the end of last month, I started experimenting with Amazon EC2. I launched one or two t2.micro instances, to experiment for free under the free tier.
However, I quickly noticed on my billing dashboard that my service usage was increasing fast, and after a few days it was forecast to exceed the free tier limitations by the end of the month. Since I don't need to maintain a 24/7 online presence at the moment and mostly use my instances to experiment, I have terminated one and stopped the other. I have even detached the volume of the stopped instance.
All I have left are:
One stopped micro instance
One detached volume
Two snapshots of that volume
One AMI that I had used to move my instance from one region to another in the beginning
As far as I understand it, those are more or less "idle" resources, or at least not highly active ones. And yet, in my billing dashboard, in the EC2 - Linux row, the month-to-date usage keeps increasing, at a rate of more than one hour of usage per real-time hour.
Before I detached the volume, the usage would increase by 14 hours in only 3 hours. Now that I have detached it, it has slowed down a bit, but it still increased by 16 hours in the last 13 hours. All that, again, without actively running anything!
I am aware that the tally of usage isn't strictly limited to running instances, but covers associated resources as well. Still, it seems very high to me. I don't even dare to test my app anymore.
I would like to know if such an increase is normal, or if there may be something wrong with my account and how I configured my instances. If it is normal, any indications as to what actions I could take to reduce this increase would be very welcome!
Note that I contacted Amazon support several days ago and didn't get a single reply, which is why I'm turning here.
Thanks in advance!
Edit: Solved. I did have an instance running in another region, probably by mistake because I had never interacted directly with that region. See comments for a script to systematically check all regions and avoid this kind of stupid mistake.
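The comments aren't reproduced here, but a minimal sketch of that kind of region sweep with boto3 might look like this (assumes default credentials are already configured):

import boto3

# Use any one region just to enumerate all regions enabled for the account.
ec2 = boto3.client("ec2", region_name="us-east-1")
regions = [r["RegionName"] for r in ec2.describe_regions()["Regions"]]

for region in regions:
    client = boto3.client("ec2", region_name=region)
    running = []
    paginator = client.get_paginator("describe_instances")
    for page in paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    ):
        for reservation in page["Reservations"]:
            running.extend(i["InstanceId"] for i in reservation["Instances"])
    if running:
        print(f"{region}: {running}")

Running that periodically (or extending it to volumes and Elastic IPs) catches exactly the forgotten-region mistake described above.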

AWS Amazon EC2 Spot pricing

I'd like a non-Amazon answer to this quandary...
It looks like, via spot instance pricing, you could run an instance for 22 or 23 cents an hour, for as many hours as you want, because the historical charts for hours/days/months show the spot price never going over 21 (22?) cents per hour. That's about half of the non-reserved instance cost for the same-sized instance, and it's even less than a reserved instance would ever work out to be per hour. With no commitment.
Am I missing something? Do I have a complete and total misunderstanding of the spot/bid/ask instance mechanism? Or is this a cheap way to get a 24/7 instance while Amazon has a bunch of extra capacity?
Jeremy
No, you are not missing anything. I asked the same question many times when I first looked at Spot, followed by "why doesn't everyone use this all the time?"
So what's the downside? Amazon reserves the right to terminate a Spot instance at any time for any reason. Now, a normal "on-demand" instance might die at any time too, but Amazon goes to great efforts to keep them online and to give customers warnings well in advance (days / weeks) if the host server needs to be powered down for maintenance. If you have a Spot instance running on a server they want to reboot ... they will just shut it off. In practice, both are pretty reliable (but NOT 100%!!), and many roles can run 24/7 on Spot without issues. Just don't go whining to Amazon that your Spot instance got shut off and your entire database was stored on the ephemeral drive... of course, if you do that on ANY instance, you are taking a HUGE (and very stupid) risk.
Some companies are saving tons of money with Spot. Here's a writeup on Vimeo saving 50%, and one on Pinterest saving 60%+ ($54/hr => $20/hr).
Why don't more companies use Spot for their instances? Many of the companies buying EC2 instance hours aren't very price-sensitive and are very, very risk-averse, especially when it comes to outages and to operational events that sap engineering effort. They don't want to deal with the hassle to save a few bucks, especially if AWS fees aren't a significant cost-center versus personnel. And for 24/7 instances, they already pay 1/2 price via "reserved instances", so the savings aren't as dramatic as they seem versus full-priced "on-demand" instances. Spot also isn't fully relevant to large customers. You can be nearly certain that when a customer gets to be the size of a Netflix, they 1) need to coordinate with Amazon on capacity planning, because you can't just spin up 1/2 a datacenter on a whim, and 2) get significant volume discounts that bring their usage costs down into the Spot price range anyway. Plus, the first tier of cost cutting is to reclaim hardware that isn't really needed; at my last company, one guy found a bug where, as we cycled through boxes, we would "forget" about some of them, and shutting that down saved $100+k / month (yikes). Once companies burn through that fat, they start looking at Spot.
There's a second, less discussed reason Spot doesn't get used... It's a different API. Think about how this interacts with "organizational inertia" .... Working at a company that continuously spends $XX / hr on EC2 (and coming from a company that spent $XXXX / hr), engineers start instances with the tools they are given. Our Chef deployment doesn't know how to talk to spot. Rightscale (prev place) defaulted to launching on-demand instances. With some quantity of work, I could probably figure out how to make a spot instance, but why bother if my priority is to get role XYZ up and running by tomorrow? I'm not about to engineer a spot-based solution just for my one role and then evangelize why that was a good idea; it's gotta be an org-wide decision. If you read the Pinterest case-study I linked above, you'll notice they talk about migrating their whole deployment over from $54/hr to $20/hr on spot. Reading between the lines, they didn't choose to launch Spot instances 1-by-1; one day, they woke up and made a company-wide decision to "solve the spot problem" and 'migrate' their deployment tools to using Spot by default (probably with support for a flag that keeps their DB instances off Spot). I can't imagine how much money Amazon has made by making Spot a different API instead of being a flag on the normal EC2 API; Hint: it's boatloads .. as in, you could buy a boat and then fill it with cash until it sinks.
So if you are willing to tolerate slightly higher risk and / or you are somewhat price-sensitive ... then, yes, you absolutely can save a crapton of money by running your service under Spot 24/7.
Just make sure you are double-prepared to unexpectedly lose your instance (ie, take backups) .... something you ALREADY need to be prepared for with an "on-demand" instance that doesn't have 100.0% uptime either.
Think of it this way:
Instead of getting something 99.9% reliable, you are getting something 99.5% reliable and paying half-price
(I made those numbers up to convey the idea, but they probably aren't too far off from the truth).
So long as your bid price is above the spot instance market price, you can continue to run whatever spot instances you want, and only pay the market price.
However, when the market price goes above your bid price, you lose your instances. Without any warning. They just terminate. While the spot price rarely spikes, and when it does it tends to come back down again quickly, for many applications the possibility of losing all your instances without warning is unacceptable. You can insulate yourself against that possibility by bidding higher, but then you risk actually having to pay that much.
TL;DR: If your application is tolerant to sudden termination, then spot instances are great. But there is a risk involved in using them.
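If you want to check the price-history claim yourself rather than eyeball the console charts, here is a short boto3 sketch (the instance type and region are just examples; substitute your own):

import boto3
from datetime import datetime, timedelta, timezone

client = boto3.client("ec2", region_name="us-east-1")
resp = client.describe_spot_price_history(
    InstanceTypes=["m5.large"],            # example instance type
    ProductDescriptions=["Linux/UNIX"],
    StartTime=datetime.now(timezone.utc) - timedelta(days=7),
)
# Note: only the first page of results; paginate for longer windows.
prices = [float(p["SpotPrice"]) for p in resp["SpotPriceHistory"]]
print(f"{len(prices)} samples, min ${min(prices):.4f}/hr, max ${max(prices):.4f}/hr")

The max over a long enough window is what tells you whether "never goes over X cents" actually holds for your instance type and availability zone.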
I think these answers are slightly missing the point...
You need to select the most appropriate pricing for your workload and architect your solution with this in mind. AWS offers 3 pricing types:
Reserved Instances (low cost, high reliability, but pay up-front)
On-demand instances (highest cost, high reliability, but pay as you go)
Spot Instances (generally lowest cost, but can terminate unexpectedly)
Reserved Instances
- Use these for cost savings on long running / constant / predictable workloads.
On-demand Instances
- Use these for temporary workloads e.g. development / proof of concept / unpredictable workloads that can't be interrupted.
Spot Instances
- Use these for transitory workloads. Ensure applications are designed with this in mind (e.g. maintain state somewhere permanent and support the ability for new instances to resume where previous ones left off).
A useful design pattern can be to have a "pilot light" instance and use auto-scaling to bring spot instances on as required, and with a bit of cunning bring on-demand instances on if spot-instances fail to appear.
TL;DR: Spot Instances are suitable for workloads that can pause and resume, but are not mission critical. They can be subject to extraordinary peaks (e.g. N. California m2.2xlarge spot price is usually $0.11/hr but has sustained peaks of $10.00/hr!).
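For what it's worth, the "pilot light plus spot" pattern described above maps onto what EC2 Auto Scaling can now do natively with a mixed-instances policy. A rough boto3 sketch; the group name, launch template, subnets, and sizes are all placeholders:

import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="workers",                 # placeholder name
    MinSize=1, MaxSize=10, DesiredCapacity=4,
    VPCZoneIdentifier="subnet-aaaa,subnet-bbbb",    # placeholder subnets
    MixedInstancesPolicy={
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "worker-template",  # placeholder template
                "Version": "$Latest",
            },
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 1,                  # the "pilot light"
            "OnDemandPercentageAboveBaseCapacity": 0,   # everything else on spot
            "SpotAllocationStrategy": "capacity-optimized",
        },
    },
)

One guaranteed on-demand instance keeps the service alive, and the group fills the rest of the capacity with spot, falling back within the policy when spot capacity disappears.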
Or is this a cheap way to get a 24/7 instance while Amazon has a bunch of extra capacity?
Spot on, if your bid price always remains above the spot price.
I couldn't find any other explicit mention of when they will terminate your instance.
I would have assumed it would be when they would require that capacity for customers willing to pay full charges for the instance, but then again, the spot price could technically go above the on-demand price.

Average EC2 Uptime?

Curious as to what 99.95% uptime REALLY means; is it really going to go down 7 minutes a month? Please post your longest/average uptimes on EC2, thanks.
Usually uptime is calculated on a yearly basis. So if you have a Service Level Agreement for 99.95%, this means:
365 * 0.0005 = 0.1825 days, or 4.38 hours
If during a year of service there is an outage and your system is down for more than that, then you are eligible for compensation.
As for your question, I have had a server running non-stop in EC2 for about 3 months now. I would say that their uptime is good, but if you have a mission-critical application you definitely need to have a fail-over solution. A good uptime only means that they will be able to respond to an outage quickly. Even a 99.9999% uptime won't save you if you aren't prepared for an outage.
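To make the arithmetic above reusable, a tiny sketch that converts an SLA percentage into an allowed downtime budget (it also answers the "7 minutes a month" guess in the question):

def allowed_downtime_hours(sla_percent, period_days):
    # downtime budget = period length x (100% - SLA)
    return period_days * 24 * (100 - sla_percent) / 100

for sla in (99.95, 99.9, 99.5):
    print(f"{sla}%: {allowed_downtime_hours(sla, 365):.2f} h/year, "
          f"{allowed_downtime_hours(sla, 30) * 60:.1f} min per 30-day month")

For 99.95% that works out to 4.38 hours a year, or about 21.6 minutes per 30-day month, so the "7 minutes a month" figure is actually an underestimate of what the SLA allows.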
Read the SLA carefully (http://aws.amazon.com/ec2-sla/): they only count "Region Unavailable" as downtime, and what is more, they only count it as downtime if the region is down for 5 consecutive minutes.
"Annual Uptime Percentage" is calculated by subtracting from 100% the percentage of 5 minute periods during the Service Year in which Amazon EC2 was in the state of "Region Unavailable."
By my count this means any downtime of less than 5 minutes is not counted. Also, if they do break the SLA, they are only on the hook for a 10% credit on the bill for the month in which you had the largest downtime.
So if they were down for all of January and your bill was $100, they would apply a $10 credit to your account.
I would have a hard time convincing my boss that this is a serious product with an SLA like that.
SLAs are useless. They only measure how much risk the company is willing to take on and have no bearing on actual uptime. I've seen SLAs with heavy penalties offered when the company knew it could not meet them, just to land the sale.
I have one client with 400+ days of EC2 uptime and another with 300+ days, as measured by web pulse; this is by far the most reliable service I've worked with.
For my single instance running in the US-East availability zone, 9 months, 0 downtime.
Since Amazon switched to provide an SLA, I've never had an instance go down on me. When I've had instances go down in the past, Amazon has always sent a message informing me that the instance is degraded before it actually disappeared, so I've had time to start up a new instance.
The previous answer makes a good point, though; EC2's service model dictates that you write your apps to handle failover to a new server if you're not prepared for extended downtime.
conrad#papa ~ $ uptime
04:42:36 up 495 days, 8:51, 8 users, load average: 0.02, 0.02, 0.00
Checking out the AWS Service Health Dashboard will give you a good idea of any current or past issues. My experience is that AWS uptime is better than most "traditional" hosting options (even a full-blown redundant RackSpace setup...).
However, simply going with AWS for uptime is like getting a car for the keychain (ok, almost.. ;)). With an architecture utilizing AWS, the big benefit is scaling (without upfront costs).
SLA... Guaranteed uptime...
These are all very nice taglines. But when the servers aren't available for an hour (March 1, 2012, in the EU region) and the clients start calling, it won't help you that they still have 300 days of uptime.
And when lightning struck 1 of their 3 datacenters in the EU, we all found out that they have no off-site redundancy, and that having 3 datacenters doesn't mean a thing.
One must love the phrase "degraded performance", that actually means: "cross your fingers and pray that your data will still be available after the catastrophe passes".
I'm still looking for any official/unofficial statistics about the availability percentages of all of their datacenters.
No luck thus far...
I've never had downtime on EC2, however, I do keep local backups and make daily images of my machines and port them to another availability zone, just in case. I use twilio to alert me if a machine is unreachable with a phone call to all my devices. Then I can just log in to EC2 and fire up a machine in another availability zone; worst case I'll be down for a few minutes.
Which, in my case, is potentially pretty sucky, because my machines are doing 24/7 Forex trading.
My rule: know the potential cost of downtime, and be willing to invest that much in redundancy assuming it will happen - because it will.
That said, EC2 has never let me down. It probably helps that my servers are not in an area of the country where natural disasters are common. If you're in an earthquake zone, tornado alley, or a potential hurricane path, downtime truly is an inevitability.
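A bare-bones sketch of the phone-alert idea mentioned a couple of answers up, using the Twilio Python client (every credential, number, and URL below is a placeholder):

import time
import requests
from twilio.rest import Client

client = Client("ACXXXXXXXX", "auth_token")   # placeholder Twilio credentials
HEALTH_URL = "https://example.com/health"     # placeholder health-check endpoint

while True:
    try:
        requests.get(HEALTH_URL, timeout=10).raise_for_status()
    except requests.RequestException:
        # Machine unreachable: place a voice call; the TwiML at `url`
        # holds the spoken message (placeholder handler URL).
        client.calls.create(
            to="+15551234567",
            from_="+15557654321",
            url="https://handler.twilio.com/twiml/EHxxxx",
        )
        time.sleep(600)  # back off so one outage doesn't redial every minute
    time.sleep(60)

Run it from a box outside the availability zone being monitored, or the monitor goes down with the thing it's watching.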
