As I understand it, it's technically possible to run a Redis cluster with nodes spread across different regions in the Amazon cloud (EC2), so that I could read the same data on machines in each region.
But there are two questions I'm not sure about:
How would it impact Redis' speed? As others have measured (Speed from Different EC2 Regions), there is roughly a 4x difference between regions. What would that mean for Redis?
How much would it cost me? In other words, how much inter-node traffic does Redis generate when running as a cluster (for example, with one node in each of two regions)?
I have no practical experience with Redis, but it seems it could be very useful for my purposes.
Thanks a lot for any help.
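To make the first question concrete, this is the kind of probe I had in mind: a minimal sketch using redis-py, with placeholder hostnames to be replaced by real nodes in each region, that measures the round-trip latency each region would actually see.

    # Minimal latency probe with redis-py (pip install redis).
    # Hostnames are placeholders - point them at your own node in each region.
    import time
    import redis

    NODES = {
        "eu-west-1": "redis-eu.example.internal",   # hypothetical endpoint
        "us-east-1": "redis-us.example.internal",   # hypothetical endpoint
    }

    def measure(host, port=6379, samples=100):
        r = redis.Redis(host=host, port=port, socket_timeout=5)
        latencies = []
        for _ in range(samples):
            start = time.perf_counter()
            r.ping()                                # one network round trip
            latencies.append((time.perf_counter() - start) * 1000)
        latencies.sort()
        return latencies[samples // 2], latencies[-1]

    for region, host in NODES.items():
        median_ms, worst_ms = measure(host)
        print(f"{region}: median={median_ms:.2f} ms  worst={worst_ms:.2f} ms")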
Your cost will be the cost of bandwidth plus instance hours.
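The bandwidth part depends entirely on your write volume, since every write has to cross to the other region. A back-of-the-envelope sketch; the write rate, payload size, and transfer rate below are all assumptions to be replaced with your own numbers and the current AWS pricing:

    # Rough cross-region replication cost estimate. Every number here is an
    # assumption - substitute your own write volume and the current
    # inter-region data transfer rate from the AWS pricing page.
    WRITES_PER_SECOND = 500        # assumed average write rate
    AVG_WRITE_BYTES = 1024         # assumed key + value + protocol overhead
    TRANSFER_RATE_PER_GB = 0.02    # assumed $/GB for inter-region transfer

    seconds_per_month = 30 * 24 * 3600
    gb_per_month = WRITES_PER_SECOND * AVG_WRITE_BYTES * seconds_per_month / 1024**3
    monthly_cost = gb_per_month * TRANSFER_RATE_PER_GB
    print(f"Replicated traffic: {gb_per_month:,.0f} GB/month")
    print(f"Estimated transfer cost: ${monthly_cost:,.2f}/month per extra region")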
I have one ScyllaDB cluster with 9 nodes and RF=3, using AWS i3en.xlarge instances.
I'm curious whether 3 i3en.3xlarge nodes would be much better than 9 i3en.xlarge nodes.
Full disclosure - I work on the ScyllaDB project.
Theoretically, Scylla's shard-per-core architecture means that 16 4xlarges or 4 16xlarges should perform fundamentally the same. Each vCPU runs an independent, shared-nothing shard doing its own thing, so how those shards are packed into nodes is irrelevant.
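As a quick illustration with the instance sizes from the question (the vCPU counts are the published AWS specs for the i3en family), both cluster shapes give the same total shard count:

    # Total shard (vCPU) count for the two cluster shapes in the question.
    clusters = {
        "9 x i3en.xlarge":  {"nodes": 9, "vcpus_per_node": 4},
        "3 x i3en.3xlarge": {"nodes": 3, "vcpus_per_node": 12},
    }

    for name, shape in clusters.items():
        total_shards = shape["nodes"] * shape["vcpus_per_node"]
        print(f"{name}: {total_shards} shards in total")
    # Both come out to 36 shards, so the theoretical capacity is the same;
    # the differences listed below are operational.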
However, in the real world, there are good reasons for scaling up, rather than scaling out. For example:
Larger nodes have better network guarantees from AWS.
Larger nodes have fewer noisy neighbor problems.
Managing a few nodes is generally easier than managing many nodes.
Generally speaking, our users have had better experiences with larger nodes. But the choice is yours.
I'm currently getting started on building a graph database. For that I'm using Titan 1.0 with Cassandra 2.1.12 as the storage backend. For now I'll rely on Titan's internal indexing mechanisms and won't add an external indexing service like Elasticsearch.
For general context on how the graph will be used: for now it will mostly contain friendship and follower relations for my user base. Regarding read and write load, I expect some writes (e.g. when a user bulk-adds a lot of friends) and at the same time a lot of reads (e.g. a user requests the list of their friendships).
Today I ran some load tests and repeatedly saw spikes in the metrics that Titan outputs.
I was wondering what kind of EC2 instances are best for running Titan. Right now I'm using r3.large, but would more CPU-optimized instances work better? Are there any benchmarks for different instance types out there?
Since the answer to your question is a little subjective, I'm going to point you to a post on Performance Tuning Titan in AWS. The post's author compares the m4.large and the m4.2xlarge running a Titan stack.
As you can see, moving from a m4.large (2 vCPU, 8 GiB memory) instance to an m4.2xlarge (8 vCPU, 32 GiB) only gives a 9% gain in performance when running this particular query, which shows it isn't bound by memory or CPU.
He points out that running each service on its own instances allows for fine-grained tuning. This will help you once the architecture is in production, since the expected read/write percentages are unknown. I think splitting the services onto dedicated instances will give you more freedom to tune the stack than simply moving to a larger instance.
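If you do want numbers for your own workload, a crude timing harness against Gremlin Server (bundled with Titan 1.0) is enough to compare instance types. This is a sketch that assumes the server is configured with the HTTP channelizer; the endpoint and the traversal are placeholders for your own:

    # Crude per-query latency benchmark to repeat on each candidate instance type.
    import time
    import statistics
    import requests

    ENDPOINT = "http://titan-host:8182"                          # hypothetical host
    QUERY = "g.V().has('name', 'alice').out('friend').count()"   # example traversal

    def benchmark(samples=50):
        timings_ms = []
        for _ in range(samples):
            start = time.perf_counter()
            resp = requests.post(ENDPOINT, json={"gremlin": QUERY}, timeout=30)
            resp.raise_for_status()
            timings_ms.append((time.perf_counter() - start) * 1000)
        return statistics.median(timings_ms), max(timings_ms)

    median_ms, worst_ms = benchmark()
    print(f"median={median_ms:.1f} ms  worst={worst_ms:.1f} ms")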
As I understand it, RDS Provisioned IOPS is quite expensive compared to standard I/O rate.
In the Tokyo region, the P-IOPS rate is $0.15/GB plus $0.12 per provisioned IOPS for a standard deployment (double that for a Multi-AZ deployment...).
For P-IOPS, the minimum storage is 100 GB and the minimum provisioned IOPS is 1,000.
Therefore, the starting cost for P-IOPS is $135/month, excluding instance pricing.
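That figure is just the quoted rates applied to the minimums:

    # Minimum P-IOPS configuration in Tokyo at the rates quoted above.
    storage_gb = 100          # minimum storage for P-IOPS
    provisioned_iops = 1000   # minimum provisioned IOPS
    price_per_gb = 0.15       # $/GB-month, standard deployment
    price_per_iops = 0.12     # $/provisioned-IOPS-month

    monthly = storage_gb * price_per_gb + provisioned_iops * price_per_iops
    print(f"${monthly:.0f}/month, excluding instance hours")   # -> $135/month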
For my case, using P-IOPS costs about 100X more than using standard I/O rate.
This may be a very subjective question, but please give some opinion.
For a database fully optimized for RDS P-IOPS, would the performance be worth the price?
or
The AWS site gives some insight into how P-IOPS can benefit performance. Are there any actual benchmarks?
SELF ANSWER
In addition to the answer that zeroSkillz wrote, I did some more research. However, please note that I am not an expert at reading database benchmarks. Also note that both the benchmark and the answer are based on EBS.
According to an article written by "Rodrigo Campos", the performance does actually improve significantly.
From 1,000 IOPS to 2,000 IOPS, read/write performance (including random read/write) doubles. As zeroSkillz noted, a standard EBS volume provides about 100 IOPS. Imagine the improvement when 100 IOPS goes up to 1,000 IOPS (the minimum for a P-IOPS deployment).
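To put that jump in perspective, here is the purely illustrative arithmetic, assuming a 16 KB I/O size (a common database page size); the real I/O size of your workload will differ:

    # Rough random-I/O throughput at different IOPS levels, assuming 16 KB I/Os.
    IO_SIZE_KB = 16

    for iops in (100, 1000, 2000):
        throughput_mb_s = iops * IO_SIZE_KB / 1024
        print(f"{iops:>5} IOPS ~= {throughput_mb_s:5.1f} MB/s of random I/O")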
Conclusion
According to the benchmark, the performance/price seems reasonable. For performance-critical situations, I expect some people or companies would choose P-IOPS even when it costs 100x more.
However, if I were a financial consultant for a small or medium business, I would just scale up my RDS instances gradually (CPU, memory) until the performance/price matched P-IOPS.
OK, this is a bad question because it doesn't mention the size of the allocated storage or any other details of the setup. We use RDS and it has its pluses and minuses. First, you can't use an ephemeral storage device with RDS; you can't even access the storage device directly when using the RDS service.
That being said, the storage medium for RDS is presumed to be based on a variant of Amazon's EBS. Performance at standard IOPS depends on the size of the volume, and many sources state that above 100 GB of storage AWS starts to "stripe" EBS volumes, which gives better average-case data access on both reads and writes.
We currently run about 300 GB of allocated storage and get 2k write IOPS and 1k IOPS about 85% of the time over a several-hour period. We use Datadog to log this, so we can actually see it. We've seen bursts of up to 4k write IOPS, but nothing sustained at that level.
The main symptom we see on the application side is lock contention when write IOPS are insufficient. The number and frequency of those in your application logs will tell you when you are exhausting the IOPS of standard RDS. You can also use a service like Datadog to monitor the IOPS.
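If you don't have Datadog wired up yet, the same numbers are available directly from CloudWatch. A sketch with boto3 (the DB instance identifier is a placeholder):

    # Pull recent WriteIOPS for an RDS instance from CloudWatch.
    from datetime import datetime, timedelta
    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="WriteIOPS",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-db-instance"}],
        StartTime=datetime.utcnow() - timedelta(hours=3),
        EndTime=datetime.utcnow(),
        Period=300,                            # 5-minute buckets
        Statistics=["Average", "Maximum"],
    )

    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], f"avg={point['Average']:.0f}", f"max={point['Maximum']:.0f}")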
The problem with provisioned IOPS is that it assumes a steady-state volume of reads/writes in order to be cost-effective. This is almost never a realistic use case, and it's exactly the kind of problem Amazon started cloud services to fix. The only assurance you get with P-IOPS is that a maximum throughput capability is reserved for you. If you don't use it, you still pay for it.
If you're OK with running replicas, we recommend running a read-only replica as a non-RDS instance on a regular EC2 instance. You can get better read IOPS at a much cheaper price by managing the replica yourself. We even set up replicas outside AWS using stunnel, with SSD drives as the primary block device, and we get ridiculous read speeds for our reporting systems - literally 100 times faster than we get from RDS.
I hope this gives some real-world detail. In short, in my opinion, unless you must ensure a certain level of throughput capability on a constant basis (or at any given point) or your application will fail, there are better alternatives to provisioned IOPS, including read/write splitting with read replicas, memcache, etc.
So, I just got off a call with an Amazon systems engineer, and he had some interesting insights related to this question (i.e. this is second-hand knowledge).
Standard EBS volumes can handle bursty traffic well, but they eventually taper off to about 100 IOPS. The engineer suggested several alternatives:
Some customers use multiple small EBS volumes and stripe them. This improves IOPS and is the most cost-effective option. You don't need to worry about mirroring, because EBS is mirrored behind the scenes.
Some customers use the ephemeral storage on the EC2 (or RDS) instance and keep multiple slaves to "ensure" durability. Ephemeral storage is local and much faster than EBS. You can even use SSD-backed EC2 instances.
Some customers configure the master with provisioned IOPS or SSD ephemeral storage, then use standard EBS storage for the slave(s). Expected performance is good, but failover performance is degraded (though still available).
Anyway, if you decide to use any of these strategies, I would double-check with Amazon to make sure I haven't forgotten any important steps. As I said before, this is second-hand knowledge.
I'm in the planning stages of estimating server costs for my web application. How can I determine how many Amazon EC2 instances I will need to handle a database-backed web application with 1M active users? And how should I go about filling out the monthly calculator on Amazon's site?
http://calculator.s3.amazonaws.com/calc5.html
The web application will be somewhat akin to a social networking site. Individual transfers will most likely be small, but there will be anywhere from 100,000 to 500,000 data transfers from users to the servers on a daily basis.
To get an accurate estimate of your costs, you will have to understand the application architecture, the usage patterns, and how many servers (instances), how much storage, and how much data transfer you expect to use.
Take a look at this video: https://www.youtube.com/watch?v=PsEX3W6lHN4&list=PLhr1KZpdzukcAtqFF32cjGUNNT5GOzKQ8 It might help you understand how to use the calculator and fill in the different values.
Jin
Capacity planning is up to you; it's specific to your app, so nobody, Amazon included, can advise on it. Regarding cost estimation, yes, you can use the monthly calculator. The only thing I would suggest is that when you do your capacity planning, do your homework on which AWS services you are going to use. For each service, find out the unit of measure used for pricing. Once you know that, do your capacity planning so you can project how much of each service's unit of measure you will use per month. The one exception is reserved instances: since you pay an upfront fee for a 1- or 3-year term, you should spread that cost across the term, as in the sketch below.
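For example, here is a sketch of folding an upfront reserved-instance fee into an effective monthly figure; the prices are placeholders to be replaced with real numbers from the calculator:

    # Spread a reserved-instance upfront fee across its term to get an
    # effective monthly cost. Prices below are made-up placeholders.
    upfront_fee = 1200.00     # assumed all-upfront fee for the term
    hourly_rate = 0.00        # all-upfront, so no hourly charge
    term_years = 3            # reserved instances come in 1- or 3-year terms
    hours_per_month = 730

    effective_monthly = upfront_fee / (term_years * 12) + hourly_rate * hours_per_month
    print(f"Effective monthly cost: ${effective_monthly:.2f}")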
I'm based in the UK, as are all the users of my web app, and I currently host in the EU-West region. The US-East region is quite a bit cheaper, and I'm using a service from another company that locates its servers in US-East (meaning I'd have inter-region data transfer costs if I kept things in the EU). How much of a speed difference am I likely to see between the two?
(I could do a test myself but I'm hoping someone else has already done it :) )
I'd appreciate any insights anyone has. Thanks in advance.
I found a nice service that compares the speed and latency of different clouds and regions:
http://cloudharmony.com/speedtest
You can select just the Amazon services and test all regions.
Speed within availability zones is very impressive. Once you start going across continents, you will get noticeable lag and much less bandwidth. Whether that works in your case depends on what you're passing over the network. But to answer your question: yes, there is a very large difference in speed.
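If you want a rough number from your own servers rather than a third-party test, here is a sketch that times TCP handshakes against the regional EC2 API endpoints. It approximates network latency only, not your app's full response time, but it gives an order of magnitude:

    # Quick latency probe against regional AWS endpoints; run it from EU-West.
    import time
    import socket

    REGIONS = {
        "eu-west-1": "ec2.eu-west-1.amazonaws.com",
        "us-east-1": "ec2.us-east-1.amazonaws.com",
    }

    def tcp_connect_ms(host, port=443, samples=5):
        timings = []
        for _ in range(samples):
            start = time.perf_counter()
            with socket.create_connection((host, port), timeout=5):
                pass                            # time the TCP handshake only
            timings.append((time.perf_counter() - start) * 1000)
        return min(timings)

    for region, host in REGIONS.items():
        print(f"{region}: best of 5 connects = {tcp_connect_ms(host):.1f} ms")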