Two Questions Regarding AWS' RDS Multi AZ - amazon-ec2

I understand that when upgrading from a Single-AZ to a Multi-AZ RDS deployment, there is a "brief I/O freeze". What exactly does that mean?
When a Multi-AZ deployment is upgraded, say from small to large, will the production database be impacted at all? Will it be able to use the backup database and then fail over?

Answers to your questions:
When you move from Single-AZ to Multi-AZ, a brief I/O freeze occurs. This means that for some period the database won't be accessible: no read or write operations will be performed on it. Typically this lasts around 3-4 minutes.
Yes, the production database will be affected when you resize the compute (from small to large). The best time to perform a resize is during the scheduled maintenance window. If you select the Apply Immediately option, the database will be inaccessible for some time (the time it takes to switch control to the backup server).
Regards,
Sanket Dangi
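For reference, both changes discussed above (Single-AZ to Multi-AZ, and small to large) are requests to the RDS ModifyDBInstance API, where the ApplyImmediately flag decides whether, for most settings, the change starts now or waits for the next maintenance window. The sketch below uses boto3's modify_db_instance and assumes boto3 is installed and AWS credentials are configured; the instance identifier and instance class are hypothetical placeholders.

```python
"""Sketch: request a Multi-AZ conversion and/or resize of an RDS instance.

Assumes boto3 and AWS credentials; "my-db" and "db.m5.large" below are
hypothetical placeholders, not values from the original question.
"""


def build_modify_params(instance_id, multi_az=None, instance_class=None,
                        apply_immediately=False):
    """Build keyword arguments for boto3's RDS modify_db_instance call."""
    params = {
        "DBInstanceIdentifier": instance_id,
        # False: queue the change for the next maintenance window.
        # True: apply the change now instead of waiting.
        "ApplyImmediately": apply_immediately,
    }
    if multi_az is not None:
        params["MultiAZ"] = multi_az
    if instance_class is not None:
        params["DBInstanceClass"] = instance_class
    return params


def modify_instance(params, client=None):
    """Send the request; boto3 is imported lazily so that building the
    parameters does not require it."""
    if client is None:
        import boto3
        client = boto3.client("rds")
    return client.modify_db_instance(**params)


if __name__ == "__main__":
    # Convert to Multi-AZ and scale up, deferred to the maintenance window.
    print(build_modify_params("my-db", multi_az=True,
                              instance_class="db.m5.large"))
```

Leaving apply_immediately=False matches the advice above to do the resize during the maintenance window.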

The downtime when converting from Single-AZ to Multi-AZ is essentially the time it takes for a new instance to launch and become fully functional; as Sanket said, it may take a few minutes.
Scaling up a Multi-AZ deployment first scales up the slave (standby) instance, then performs a failover. The downtime is the time it takes to do the actual failover, usually closer to a minute.
Scaling out a Multi-AZ deployment is done by adding read replicas (sourced off of the standby), which incurs no interruption. Keep in mind that adding read replicas creates an eventually consistent system, which may or may not be desirable.
It's also worth noting that you should use the same instance types across all Multi-AZ instances; otherwise the imbalance may cause replica lag.
As you're probably realizing, it's best to start with a Multi-AZ configuration from the beginning. It makes scaling up and scaling out a lot easier, with less downtime.

Is question 1 still valid? According to AWS documentation (2022) there is no downtime, but there is a small decrease in performance.
Quoting AWS documentation:
When converting a DB instance from Single-AZ to Multi-AZ, Amazon RDS creates a snapshot of the database volumes and restores these to new volumes in a different Availability Zone. Although the newly restored volumes are available almost immediately, they don’t reach their specified performance until the underlying storage blocks are copied from the snapshot.
Therefore, during the conversion from Single-AZ to Multi-AZ, you can experience elevated latency and performance impacts. This impact is a function of volume type, your workload, instance, and volume size, and can be significant and may impact large write-intensive DB instances during peak hour of operations.

Related

Should the threshold of an "expensive" operation change with infrastructure?

I'm using Amazon ElastiCache for session storage and expensive-operation caching on a multi-node web application. One gotcha: I failed to take into account the network latency of the ElastiCache node(s) compared to a local Memcached server.
My benchmarks show 1-2ms response times for ElastiCache calls within an AWS VPC (as advertised). That's pretty good, but obviously dramatically slower than anything local; in terms of actual compute cycles, 1-2ms is a lifetime. This dramatically changes what I can consider an "expensive" operation worth caching.
It was my inexperience that led me down this path, but I imagine others run into similar issues when moving into the "cloud".
Question: Is it better to rethink (and rewrite) what qualifies as an "expensive" operation, or should the infrastructure do a better job of supporting the code (for example, I could use a local memcached server on each node and only pass cache misses through to the ElastiCache node)?
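One way to make "expensive" concrete is a break-even estimate. The sketch below assumes every lookup pays one cache GET round trip and every miss additionally pays the computation plus a SET round trip; serialization cost and eviction are ignored. The function names and example numbers are illustrative, not measurements.

```python
def expected_cost_ms(compute_ms, hit_rate, get_ms, set_ms):
    """Expected time per lookup when the result is cached.

    Every lookup pays one cache GET; a miss additionally recomputes the
    value and writes it back with a SET.
    """
    miss_rate = 1.0 - hit_rate
    return get_ms + miss_rate * (compute_ms + set_ms)


def break_even_compute_ms(hit_rate, get_ms, set_ms):
    """Smallest compute time for which caching beats recomputing.

    Solves compute = get + (1 - h) * (compute + set) for compute,
    i.e. compute = (get + (1 - h) * set) / h.
    """
    if hit_rate <= 0:
        return float("inf")  # nothing is ever served from the cache
    return (get_ms + (1.0 - hit_rate) * set_ms) / hit_rate


# Remote cache: 1.5 ms per round trip (inside the 1-2ms range measured above).
remote = break_even_compute_ms(0.9, 1.5, 1.5)
# Hypothetical per-node cache at 0.05 ms per round trip.
local = break_even_compute_ms(0.9, 0.05, 0.05)
```

With a 90% hit rate and 1.5 ms round trips, a computation has to take more than about 1.83 ms before caching it remotely pays off; at an assumed 0.05 ms per local round trip the threshold drops to about 0.06 ms. That is the arithmetic behind putting a per-node cache in front of ElastiCache.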
Some open questions:
How did you benchmark? Was the 1-2ms measured from connection open to close, or is it just the fetch time?
Why do you need 15-20 calls on average? Can they be reduced? If so, that might be better.
Is your Amazon ElastiCache cluster in the same region as your servers?
Now for solutions (I will update this based on your answers to the questions above):
Possible Solution
Split your data into values that you are fine displaying stale and values that must never be stale and must be consistent across all servers (for example, the money in a wallet). Then set up two-layer caching: one layer on each server, holding the values that can be stale, and a common Amazon ElastiCache layer for the values that must stay consistent. This can be one workable strategy.
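A minimal sketch of that two-layer idea, which is also the per-node memcached setup the question suggests. Assumptions: the shared store is any object with get(key) and set(key, value) (a real ElastiCache/memcached client would sit there; InMemoryStore is only a hypothetical stand-in for trying it out), and the caller says per lookup whether a stale value is acceptable. Local entries expire after a short TTL, which bounds how stale a server's copy can get.

```python
import time


class InMemoryStore:
    """Stand-in for the shared ElastiCache node (hypothetical; swap in a real client)."""

    def __init__(self):
        self.data = {}
        self.gets = 0  # counts "network" round trips in this stand-in

    def get(self, key):
        self.gets += 1
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


class TwoLayerCache:
    """Per-server cache for stale-tolerant keys in front of a shared cache."""

    def __init__(self, shared, local_ttl_s=5.0, clock=time.monotonic):
        self.shared = shared
        self.local_ttl_s = local_ttl_s
        self.clock = clock
        self._local = {}  # key -> (expires_at, value)

    def get(self, key, stale_ok):
        if stale_ok:
            entry = self._local.get(key)
            if entry is not None and entry[0] > self.clock():
                return entry[1]  # served from this server, no network hop
        # Consistent keys (and local misses) always go to the shared cache.
        value = self.shared.get(key)
        if stale_ok and value is not None:
            self._local[key] = (self.clock() + self.local_ttl_s, value)
        return value

    def set(self, key, value):
        self.shared.set(key, value)
        # Drop this server's copy; other servers keep theirs until the TTL ends.
        self._local.pop(key, None)
```

A wallet balance would be read with stale_ok=False, so it always hits the shared cache, while something like a rendered sidebar could use stale_ok=True.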

AWS RDS Provisioned IOPS really worth it?

As I understand it, RDS Provisioned IOPS is quite expensive compared to the standard I/O rate.
In the Tokyo region, the P-IOPS rate is $0.15/GB and $0.12/IOPS for a standard deployment (double the price for a Multi-AZ deployment).
For P-IOPS, the minimum required storage is 100 GB and the minimum IOPS is 1,000.
Therefore, the starting cost for P-IOPS is $135, excluding instance pricing.
In my case, using P-IOPS costs about 100X more than using the standard I/O rate.
This may be a very subjective question, but please give your opinion.
For a database fully optimized for RDS P-IOPS, would the performance be worth the price?
or
The AWS site gives some insight into how P-IOPS can benefit performance. Is there any actual benchmark?
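The starting-cost arithmetic above, written out. The rates and minimums are the Tokyo-region figures quoted in the question at the time of writing (current AWS pricing may differ), the Multi-AZ doubling follows the question's note, and Decimal is used so currency amounts avoid binary floating-point rounding.

```python
from decimal import Decimal

# Figures quoted in the question (Tokyo region, at the time); may be outdated.
GB_RATE = Decimal("0.15")    # $ per GB of P-IOPS storage
IOPS_RATE = Decimal("0.12")  # $ per provisioned IOPS
MIN_STORAGE_GB = 100
MIN_IOPS = 1000


def piops_cost(storage_gb, iops, multi_az=False):
    """Storage + provisioned-IOPS charge, excluding instance pricing."""
    if storage_gb < MIN_STORAGE_GB or iops < MIN_IOPS:
        raise ValueError("below the P-IOPS minimums quoted in the question")
    cost = storage_gb * GB_RATE + iops * IOPS_RATE
    # The question notes that Multi-AZ doubles the price.
    return cost * 2 if multi_az else cost


print(piops_cost(MIN_STORAGE_GB, MIN_IOPS))  # 135.00
```

Of the $135, $120 is the IOPS component, so it is the provisioned IOPS rather than the storage that drives the "100X" difference.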
SELF ANSWER
In addition to the answer zeroSkillz wrote, I did some more research. However, please note that I am not an expert at reading database benchmarks. Also, the benchmark and the answer were based on EBS.
According to an article written by "Rodrigo Campos", performance does actually improve significantly.
Going from 1000 IOPS to 2000 IOPS, read/write performance (including random read/write) doubles. From what zeroSkillz said, a standard EBS volume provides about 100 IOPS. Imagine the performance improvement when 100 IOPS goes up to 1000 IOPS (which is the minimum IOPS for a P-IOPS deployment).
Conclusion
According to the benchmark, the performance/price ratio seems reasonable. For performance-critical situations, I guess some people or companies should choose P-IOPS even though they are charged 100X more.
However, if I were a financial consultant for a small or medium business, I would just scale up (CPU, memory) my RDS instances gradually until the performance/price matches P-IOPS.
OK. This is a bad question because it doesn't mention the size of the allocated storage or any other details of the setup. We use RDS and it has its pluses and minuses. First, you can't use an ephemeral storage device with RDS. You can't even access the storage device directly when using the RDS service.
That being said, the storage medium for RDS is presumed to be based on a variant of Amazon EBS. Performance for standard IOPS depends on the size of the volume, and many sources state that above 100 GB of storage, RDS starts to "stripe" EBS volumes. This provides better average-case data access for both reads and writes.
We currently run about 300 GB of allocated storage and can get 2k write IOPS and 1k IOPS about 85% of the time over a several-hour period. We use Datadog to log this, so we can actually see it. We've seen bursts of up to 4k write IOPS, but nothing sustained like that.
The main symptom we see on the application side is lock contention when write IOPS are insufficient. The number and frequency of these in your application logs will tell you whether you are exhausting the IOPS of standard RDS. You can also use a service like Datadog to monitor IOPS.
The problem with provisioned IOPS is that they assume steady-state volumes of writes/reads in order to be cost-effective. This is almost never a realistic use case, and it is exactly the kind of problem Amazon started its cloud services to fix. The only assurance you get with P-IOPS is that a maximum throughput capability is reserved for you. If you don't use it, you still pay for it.
If you're OK with running replicas, we recommend running a read-only replica as a non-RDS instance on a regular EC2 instance. You can get better read IOPS at a much cheaper price by managing the replica yourself. We have even set up replicas outside AWS using stunnel, with SSD drives as the primary block device, and we get ridiculous read speeds for our reporting systems, literally 100 times faster than we get from RDS.
I hope this gives some real-world detail. In short, in my opinion, unless you must guarantee a certain level of throughput capability on a constant basis (or at any given point, or your application will fail), there are better alternatives to provisioned IOPS, including read/write splitting with read replicas, memcache, etc.
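Numbers like "2k write IOPS about 85% of the time" come from a series of monitoring samples. A small sketch of that calculation, assuming you have exported per-interval IOPS samples (for example from Datadog, or from the WriteIOPS metric RDS publishes to CloudWatch) into a plain list; the sample values are made up for illustration.

```python
import math


def fraction_at_or_above(samples, level):
    """Share of samples in which IOPS reached at least `level`."""
    if not samples:
        raise ValueError("no samples")
    return sum(1 for s in samples if s >= level) / len(samples)


def percentile(samples, p):
    """Nearest-rank percentile, p in (0, 100]."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up write-IOPS samples, one per monitoring interval.
write_iops = [2100, 1900, 2500, 2000, 4000, 1500, 2200, 2050, 2300, 2400]
print(fraction_at_or_above(write_iops, 2000))  # 0.8
print(percentile(write_iops, 20))              # 1900
```

Equivalently, if the 15th percentile of the samples is at or above some level, that level was reached in at least 85% of the intervals.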
So, I just got off a call with an Amazon systems engineer, and he had some interesting insights related to this question (i.e. this is second-hand knowledge).
Standard EBS volumes can handle bursty traffic well, but eventually throughput tapers off to about 100 IOPS. The engineer suggested several alternatives:
Some customers use multiple small EBS volumes and stripe them. This improves IOPS and is the most cost-effective option. You don't need to worry about mirroring because EBS is mirrored behind the scenes.
Some customers use the ephemeral storage on the EC2 instance (or RDS instance) and run multiple slaves to "ensure" durability. Ephemeral storage is local storage and much faster than EBS. You can even use EC2 instances with SSD storage.
Some customers configure the master to use provisioned IOPS or SSD ephemeral storage, then use standard EBS storage for the slave(s). Expected performance is good, but performance after a failover is degraded (though the database is still available).
Anyway, if you decide to use any of these strategies, I would recheck with Amazon to make sure I haven't forgotten any important steps. As I said before, this is second-hand knowledge.
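To see why striping several small volumes raises IOPS, here is how a RAID 0 array maps logical offsets onto its members: consecutive stripes rotate across volumes, so independent requests spread over all of them and their IOPS roughly add up. The 64 KiB stripe size and four volumes are assumptions for illustration; this is the textbook RAID 0 layout, not a description of how EBS or RDS stripe internally.

```python
def stripe_location(offset, stripe_size, n_volumes):
    """Map a byte offset in a RAID 0 array to (volume index, offset on that volume)."""
    stripe_index, within = divmod(offset, stripe_size)
    volume = stripe_index % n_volumes  # stripes rotate round-robin
    row = stripe_index // n_volumes    # full passes over all volumes so far
    return volume, row * stripe_size + within


def volumes_touched(offset, length, stripe_size, n_volumes):
    """Set of member volumes a request of `length` bytes at `offset` hits."""
    if length <= 0:
        return set()
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return {s % n_volumes for s in range(first, last + 1)}


KIB = 1024
print(stripe_location(300 * KIB, 64 * KIB, 4))     # stripe 4 -> volume 0
print(volumes_touched(0, 256 * KIB, 64 * KIB, 4))  # all four volumes
```

Small random requests each land on one volume, so four volumes can serve roughly four of them at once; a large sequential request is split across all members.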

Load balance/distribution for postgresql

I am coming here after spending considerable time trying to understand how to implement load balancing (distributing the database processing load) across PostgreSQL database servers.
I have a PostgreSQL system that handles hundreds of transactions per second, and this is likely to grow. Please note that my workload includes many updates, inserts, and selects, so any solution needs to cater to inserts/updates as well as reads.
I am planning to use PL/Proxy, as suggested in the Skype database tools presentation at http://www.slideshare.net/adorepump/database-tools-by-skype.
Now I am also hearing that "PostgreSQL streaming replication + hot standby" in PostgreSQL 9.0 could be considered.
Can someone suggest a simple (or complex) solution to implement for the above scenario?
If your database is smaller than 100 GB, you should first try to get the most you can out of one computer.
You'd need:
a good storage controller with large battery backed cache;
a bunch of fast disks in RAID10;
another bunch of disks in RAID10 for WAL;
more RAM than you have data;
as many fast processor cores as you can.
You'd be able to do several thousand tps with this one computer.
If that isn't enough, I'd try adding a second, hot standby server with streaming replication. You'd use it for long-running read-only report queries, backups, etc., so your master server doesn't have to do them.
Only if that proves insufficient should you try adding more streaming-replication hot standby servers to load-balance read-only queries. This will be complicated, though: because replication is asynchronous, there's a delay between the master confirming a change and the standby seeing it. You'd have to deal with that in your client application, and your setup will be a lot more complicated.
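One common way to handle that lag in the client application is read/write splitting with a read-your-writes window: writes go to the master, reads go round-robin to the standbys, but a session that has just written keeps reading from the master for a few seconds. This is a hedged sketch; the connection names and the 2-second window are hypothetical, and a fixed window only helps if replication lag stays below it (checking the standby's actual replay position would be more robust).

```python
import time


class ReplicaRouter:
    """Route writes to the master and reads to hot standbys, with a
    read-your-writes window per session (heuristic, not a guarantee)."""

    def __init__(self, primary, standbys, stickiness_s=2.0, clock=time.monotonic):
        self.primary = primary
        self.standbys = list(standbys)
        self.stickiness_s = stickiness_s
        self.clock = clock
        self._last_write = {}  # session id -> time of its last write
        self._next = 0         # round-robin position over the standbys

    def for_write(self, session):
        self._last_write[session] = self.clock()
        return self.primary

    def for_read(self, session):
        last = self._last_write.get(session)
        recently_wrote = last is not None and self.clock() - last < self.stickiness_s
        if recently_wrote or not self.standbys:
            # The standby may not have replayed this session's write yet.
            return self.primary
        target = self.standbys[self._next % len(self.standbys)]
        self._next += 1
        return target
```

In practice the returned values would be connection pools rather than names; the router only decides where each statement goes.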

Does sharding an EC2 EBS volume yield performance gains?

I'm considering using EBS for a very large collection of maildirs. Lots of little files spread over lots of directories. Would sharding my EBS storage into multiple smaller containers yield performance gains in reading/writing versus one large EBS volume?
Maybe you can explain what exactly you mean by sharding. Otherwise, as far as EBS performance is concerned, there are a few drawbacks:
It's network-bound (e.g. on a smaller instance, where more instances share one host, network performance is sub-stellar).
It's multi-tenant (again, multiple people on the host affect EBS)
Its performance varies (performance is never stable)
It's not SAN!
To mitigate some of these issues, a lot of people suggest creating a RAID from multiple EBS volumes. I suggest the following articles:
http://www.mysqlperformanceblog.com/2009/08/06/ec2ebs-single-and-raid-volumes-io-bencmark/
http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs/
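When building such a RAID from EBS volumes, the level matters for durability as well as speed, since individual EBS volumes do fail. The sketch compares which sets of failed members RAID 0 and RAID 10 survive; it assumes a RAID 10 layout in which members (0, 1), (2, 3), ... are mirror pairs, a simplification of how real RAID implementations may place copies.

```python
def raid0_survives(failed, n_volumes):
    """RAID 0 has no redundancy: losing any member loses the array."""
    return not any(0 <= v < n_volumes for v in failed)


def raid10_survives(failed, n_volumes):
    """RAID 10 survives as long as no mirror pair has lost both members."""
    if n_volumes % 2:
        raise ValueError("this sketch assumes an even number of volumes")
    failed = set(failed)
    return all(not ({2 * p, 2 * p + 1} <= failed) for p in range(n_volumes // 2))


def usable_fraction(level):
    """Share of raw capacity available for data."""
    return {"raid0": 1.0, "raid10": 0.5}[level]
```

With four volumes, RAID 10 halves usable capacity but survives losing volumes 0 and 2 (one per pair), and a slow member can be swapped out and rebuilt from its mirror; RAID 0 does not survive losing any single volume.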
Bottom line: for maildirs, I'd probably look into real hardware. It doesn't sound like you need to scale up/down from one minute to the next. I'd get a setup in place and, if necessary, use a cloud solution in addition to temporarily handle spikes (before you get more hardware in place).
Let me know if this helps!
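If "sharding" means spreading the maildirs themselves across several volumes (rather than striping blocks with RAID), one simple approach is to assign each mailbox to a volume by a stable hash of its name. The mount points below are hypothetical. Plain modulo hashing moves most mailboxes when the number of volumes changes; consistent hashing avoids that at the cost of more code.

```python
import hashlib
import os


def volume_for_mailbox(mailbox, mount_points):
    """Pick a mount point for a mailbox using a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized per
    process for strings and would move mailboxes between restarts.
    """
    digest = hashlib.sha1(mailbox.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(mount_points)
    return mount_points[index]


def maildir_path(mailbox, mount_points):
    """Full path of the mailbox's maildir on its assigned volume."""
    return os.path.join(volume_for_mailbox(mailbox, mount_points), mailbox)


MOUNTS = ["/mnt/mail0", "/mnt/mail1", "/mnt/mail2"]  # hypothetical mount points
print(maildir_path("alice@example.com", MOUNTS))
```

Whether this beats one large volume depends on how evenly the load spreads; each volume then handles only its share of the small-file I/O.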

PostgreSQL performance on EC2/EBS

What gives best performance for running PostgreSQL on EC2? EBS in RAID? PGData on /mnt?
Do you have any preferences or experiences? The main "plus" of running PostgreSQL on EBS is being able to switch from one instance to another. Could this be the reason it's slower than using the /mnt partition?
PS: I'm running PostgreSQL 8.4 with about 50 GB of data on an Amazon EC2 xlarge (64-bit) instance.
Here is some related info. The main takeaway is this post from Bryan Murphy:
Been running a very busy 170+ gb OLTP postgres database on Amazon for 1.5 years now. I can't say I'm "happy" but I've made it work and still prefer it to running downtown to a colo at 3am when something goes wrong.
There are two main things to be wary of:
1) Physical I/O is not very good, thus how that first system used a RAID0.
Let's be clear here, physical I/O is at times terrible. :)
If you have a larger database, the EBS volumes are going to become a real bottleneck. Our primary database needs 8 EBS volumes in a RAID drive and we use slony to offload requests to two slave machines and it still can't really keep up.
There's no way we could run this database on a single EBS volume.
I also recommend you use RAID10, not RAID0. EBS volumes fail. More frequently, single volumes will experience very long periods of poor performance. The more drives you have in your raid, the more you'll smooth things out. However, there have been occasions where we've had to swap out a poor performing volume for a new one and rebuild the RAID to get things back up to speed. You can't do that with a RAID0 array.
2) Reliability of EBS is terrible by database standards; I commented on this a bit already at http://archives.postgresql.org/pgsql-general/2009-06/msg00762.php The end result is that you must be careful about how you back your data up, with a continuous streaming backup via WAL shipping being the recommended approach. I wouldn't deploy into this environment in a situation where losing a minute or two of transactions in the case of a EC2/EBS failure would be unacceptable, because that's something that's a bit more likely to happen here than on most database hardware.
Agreed. We have three WAL-shipped spares. One streams our WAL files to a single EBS volume which we use for worst case scenario snapshot backups. The other two are exact replicas of our primary database (one in the west coast data center, and the other in an east coast data center) which we have for failover.
If we ever have to worst-case-scenario restore from one of our EBS snapshots, we're down for six hours because we'll have to stream the data from our EBS snapshot back over to an EBS raid array. 170gb at 20mb/sec (if you're lucky) takes a LONG time. It takes 30 to 60 minutes for one of those snapshots to become "usable" once we create a drive from it, and then we still have to bring up the database and wait an agonizingly long time for hot data to stream back into memory.
We had to fail over to one of our spares twice in the last 1.5 years. Not fun. Both times were due to instance failure.
It's possible to run a larger database on EC2, but it takes a lot of work, careful planning and a thick skin.
Bryan
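Bryan's restore estimate can be sanity-checked with simple arithmetic. The sketch assumes decimal units (1 GB = 1000 MB) and a constant 20 MB/s; it covers only the copy and the 30 to 60 minutes before a snapshot volume becomes usable, not database startup or cache warm-up, which he says also takes "an agonizingly long time".

```python
def transfer_hours(size_gb, rate_mb_s, mb_per_gb=1000):
    """Hours needed to copy size_gb at a steady rate_mb_s."""
    return size_gb * mb_per_gb / rate_mb_s / 3600


def restore_window_hours(size_gb, rate_mb_s, ready_hours=(0.5, 1.0)):
    """(best, worst) hours: snapshot-to-usable wait plus the copy itself."""
    copy = transfer_hours(size_gb, rate_mb_s)
    return ready_hours[0] + copy, ready_hours[1] + copy


print(round(transfer_hours(170, 20), 2))  # 2.36
print(tuple(round(h, 2) for h in restore_window_hours(170, 20)))
```

So the copy alone is roughly 2.4 hours, or about 2.9 to 3.4 hours with the snapshot wait; presumably the rest of his six-hour figure is bringing the database up and warming the cache.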

Resources