Are Amazon's micro instances (Linux, 64bit) good for MongoDB servers? - performance

Do you think using an EC2 instance (Micro, 64bit) would be good for MongoDB replica sets?
Seems like if that is all they did, and with 600+ megs of RAM, one could use them for a nice set.
Also, would they make good primary (write) servers too?
My database is only 1-2 gigs now but I see it growing to 20-40 gigs this year (hopefully).
Thanks

They COULD be good - depending on your data set, but likely they will not be very good.
For starters, you dont get much RAM with those instances. Consider that you will be running an entire operating system and all related services - 613mb of RAM could get filled up very quickly.
MongoDB tries to keep as much data in RAM as possible and that wont be possible if your data set is 1-2 gigs and becomes even more of a problem if your data set grows to 20-40 gigs.
Secondly they are labeled as "Low IO performance" so when your data swaps to disk (and it will based on the size of that data set), you are going to suffer from disk reads due to low IO throughput.

Be aware that micro instances are designed for spiky CPU usage, and you will be throttled to the "low background level" if you exceed the allotment.
The AWS Micro Documentation has good information of what they are intended for.
Between the CPU and not very good IO performance my experience with using micros for development/testing has not been very good. (larger instance types have been fine though), but a micro may work for your use case.
However, there are exceptions for a config or arbiter nodes, I believe a micro should be good enough for these types of machines.
There is also some mongodb documentation specific to EC2 which might help.

Related

Redis vs memcached vs Scylla Cache - Which one to choose?

I'm designing an application where I want to cache million data each around 10kb.. I did some analysis and on the fence between using Redis vs memcached vs Scylla as Cache.. Can some experts suggests which might best suits my needs?
Highly performant
High availability
High Throughput
Low pricing?
Full disclosure - I work on the Scylla project.
I think it is a question of latency and HA vs cost. As a RAM-based system, Redis will be the lowest latency. If you need < 1 millisecond response, then Redis or memcached are the choice.
Scylla is a disk-based system. Those values that are in Scylla's RAM will be low latency, but those that need to pull from disk will be slower. So your 99p latency is likely to be slower. How slow? Depends on your disk. NVME can be 99p 3-5 ms. SSD, maybe 5-10 ms. If this is an acceptable latency, then Scylla will be much less expensive, as even NVME is much cheaper than RAM.
As for HA - Redis and memcached are intended as a cache. While there are some features and frameworks that you can use to replicate data around, these are all bolt-ons and increase complexity. Scylla is a distributed system by design. So the replication to allow for multiple layers of HA is built-in (node, rack and DC-availability)
Redis (and to a lesser extend, memcached) are phenomenal caches. But, depending upon your use case, Scylla might be the right choice.
All three options you mentioned are open-source software, so the pricing is the same - zero :-) However, both Scylla and Redis are written and backed by companies (ScyllaDB and RedisLabs, respectively), so if your use case is mission-critical you may choose to pay these companies for enteprise-level support, you can inquire with these companies what are their prices.
The more interesting difference between the three is in the technology.
You described a use case where you have 10 GB of data in the cache. This amount can be easily held in memory, so a completely in-memory database like Memcached or Redis is a natural choice. However, there are still questions you need to ask yourself, which may lead you to a distributed database, such as Scylla depending on your answers:
Would you be using powerful many-core machines? If so, you should probably rule out Memcached - my experience (and others' - see
Can memcached make full use of multi-core?) suggests that it does not scale well with many cores. On an 8-core machine you will not get anywhere close to 8 times the performance of a one-core machine.
Redis is also not really meant for multi-core use - https://redis.io/topics/benchmarks says that Redis "is not designed to benefit from multiple CPU cores. People are supposed to launch several Redis instances to scale out on several cores if needed.". Scylla, on the other hand, thrives on multi-core machines. You should probably test the performance of all three products on your use case before making a decision.
How much of a disaster would be to suddenly lose the entire content of your cache? In some use cases, it just means you would need to query some slightly-slower backend server, so suddenly losing the cache on reboot is acceptable. In such cases, a memory-only cache like Memached or Redis is probably exactly what you need. However, in other cases, there may be a big penalty for starting from scratch with an empty cache - the backend server might be very slow, or maybe the original content is stored on a far-away server with a slow and expensive WAN. In such a case you would want a disk-backed cache, so if the memory cache is lost, you can still refresh it from disk and not from the backend server. Redis has a disk backing option, and in Scylla disk backing is the main way.
You mentioned a working set of 10 GB, which can easily fit memory of a single server. But is it possible this will grow and in a year you'll find yourself needing to cache 100 GB or 1 TB, which no longer fits the memory of a single server? In memcached you'll be out of luck. Redis used to have a "virtual memory" solution for this purpose, but it is deprecated and https://redis.io/topics/virtual-memory now states that Redis is "without considering at least for now the support for databases bigger than RAM". Scylla does handle this issue in two ways. First, your cache would be stored on disk which can be much larger than memory (and whatever amount of memory you have will be used to further speed up that cache, but it doesn't need to fit memory). Second, Scylla is a distributed server. It can distribute a 100 GB working set to 10 different nodes. Redis also has "replication", but it copies the entire data to all nodes - while Scylla can optionally store different subsets of the data on different nodes.
In-memory is actually a bad thing since RAM is expensive and not persistent.
So Scylla will be a better option for K/V or columnar workloads.
Scylla also has a limited Redis api with good results [1], using the CQL
api will result in better results.
[1] https://medium.com/#siddharthc/redis-on-nvme-with-scylladb-5e12afd38dbc

Kubernetes number of replicas vs performance

I have just gotten into Kubernetes and really liking its ability to orchestrate containers. I had the assumption that when the app starts to grow, I can simply increase the replicas to handle the demand. However, now that I have run some benchmarking, the results confuse me.
I am running Laravel 6.2 w/ Apache on GKE with a single g1-small machine as the node. I'm only using NodePort service to expose the app since LoadBalancer seems expensive.
The benchmarking tool used are wrk and ab. When the replicas is increased to 2, requests/s somehow drops. I would expect the requests/s to increase since there are 2 pods available to serve the request. Is there a bottleneck occurring somewhere or perhaps my understanding is flawed. Do hope someone can point out what I'm missing.
A g1-small instance is really tiny: you get 50% utilization of a single core and 1.7 GB of RAM. You don't describe what your application does or how you've profiled it, but if it's CPU-bound, then adding more replicas of the process won't help you at all; you're still limited by the amount of CPU that GCP gives you. If you're hitting the memory limit of the instance that will dramatically reduce your performance, whether you swap or one of the replicas gets OOM-killed.
The other thing that can affect this benchmark is that, sometimes, for a limited time, you can be allowed to burst up to 100% CPU utilization. So if you got an instance and ran the first benchmark, it might have used a burst period and seen higher performance, but then re-running the second benchmark on the same instance might not get to do that.
In short, you can't just crank up the replica count on a Deployment and expect better performance. You need to identify where in the system the actual bottleneck is. Monitoring tools like Prometheus that can report high-level statistics on per-pod CPU utilization can help. In a typical database-backed Web application the database itself is the bottleneck, and there's nothing you can do about that at the Kubernetes level.

EC2 instance types for running a TitanDB cluster

I'm currently getting started on building up a graph database. For that I'm using Titan 1.0 and Cassandra 2.1.12 as the storage backend. For now I'll rely on Titans internal mechanisms for indexing and won't add any external indexing service like elasticsearch.
For the general surrounding the graph will be used in: For now the graph should mostly contain friendship and follower relations of my user base. Regarding read and write load I suspect some write load (e.g. when the user bulk-adds a lot of friends) and at the same time a lot of reads (e.g. the user wants a list of his friendships).
Today I ran some load tests and saw multiple times a spike in the metrics that Titan outputs.
I was wondering what kind of EC2 instances are best for running Titan? Right now I'm using r3.large but was wondering if maybe a little more CPU optimized instances would work better? Are there any benchmarks for different instance types out there?
Since the answer to your question is a little subjective I am going to point you in the direction of a post on Performance Tuning Titan in AWS. The post's author provides a comparison between the m4.large and m4.2xlarge with a Titan stack.
As you can see, moving from a m4.large (2 vCPU, 8 GiB memory) instance
to an m4.2xlarge (8 vCPU, 32 GiB) only gives a 9% gain in performance
when running this particular query, which shows it isn’t bound by
memory or CPU.
He points out that having multiple instances running an individual service will allow for fine grained tuning. This will help you once the architecture is in production since the expected read/write percentages are unknown. I think splitting the services to specific instances is going to give you the freedom to tune the stack better than simply moving to a larger instance.

Best practices for use of Neo4j on Google Compute Engine / Amazon EC2 instances

There is a very nice guide on optimizing linux machine for Neo4j. But this guide assumes the typical characteristics of a physical hard drive. I am running my Neo4j instances on Google CE and Amazon EC2. I am unable to find any document detailing an optimal setup for these virtual machines. What resources do I need in terms of memory (for heap or extended use) and disk speed / IOPS to get an optimal performance? I currently have a couple of million nodes and about ten million relationships (2 GBs) and the data size is increasing with imports.
On EC2 I used to rely on SSD scratch disks and then make regular backups to permanent disks. There is no such thing available on Compute Engines, and the write speeds don't seem very high to me, at least at normal disk sizes (because speed changes with size). Is there any way to get a reasonable performance on my import/index operations? Or maybe these operations have more to do with memory and compute capacities?
Any additional reading is welcome...
Use local disks whenever possible, SSDs are better than other, try provisioned ops on AWS.
EBS is not a good fit, it is slow and jittery.
No idea for compute engine though, you might want to use more RAM and try to load larger parts of the graph into memory then.
Additional reading: http://structr.org/blog/neo4j-performance-on-ext4
You still should check the other things mentioned in that blog post. Like Linux scheduler, write barriers etc.
Better to set those memory mapping settings manually. And for the 2nd level caches probably check out the enterprise version with the hpc cache.
See also this webinar: https://vimeo.com/46049647 on hw-sizing

AWS RDS Provisioned IOPS really worth it?

As I understand it, RDS Provisioned IOPS is quite expensive compared to standard I/O rate.
In Tokyo region, P-IOPS rate is 0.15$/GB, 0.12$/IOP for standard deployment. (Double the price for Multi-AZ deployment...)
For P-IOPS, the minimum required storage is 100GB, IOP is 1000.
Therefore, starting cost for P-IOPS is 135$ excluding instance pricing.
For my case, using P-IOPS costs about 100X more than using standard I/O rate.
This may be a very subjective question, but please give some opinion.
In the most optimized database for RDS P-IOPS, would the performance be worth the price?
or
The AWS site gives some insights on how P-IOPS can benefit the performance. Is there any actual benchmark?
SELF ANSWER
In addition to the answer that zeroSkillz wrote, I did some more research. However, please note that I am not an expert on reading database benchmarks. Also, the benchmark and the answer was based on EBS.
According to an article written by "Rodrigo Campos", the performance does actually improve significantly.
From 1000 IOPS to 2000 IOPS, the read/write(including random read/write) performance doubles. From what zeroSkillz said, the standard EBS block provices about 100 IOPS. Imagine the improvement on performance when 100 IOPS goes up to 1000 IOPS(which is the minimum IOPS for P-IOPS deployment).
Conclusion
According to the benchmark, the performance/price seems reasonable. For performance critical situations, I guess some people or companies should choose P-IOPS even when they are charged 100X more.
However, if I were a financial consultant in a small or medium business, I would just scale-up(as in CPU, memory) on my RDS instances gradually until the performance/price matches P-IOPS.
Ok. This is a bad question because it doesn't mention the size of the allocated storage or any other details of the setup. We use RDS and it has its pluses and minuses. First- you can't use an ephemeral storage device with RDS. You cant even access the storage device directly when using the RDS service.
That being said - the storage medium for RDS is presumed to be based on a variant of EBS from amazon. Performance for standard IOPS depends on the size of the volume and there are many sources stating that above 100GB storage they start to "stripe" EBS volumes. This provides better average case data access both on read and write.
We run currently about 300GB of storage allocation and can get 2k write IOP and 1k IOP about 85% of the time over a several hour time period. We use datadog to log this so we can actually see. We've seen bursts of up to 4k write IOPs, but nothing sustained like that.
The main symptom we see from an application side is lock contention if the IOPS for writing is not enough. The number and frequency you get of these in your application logs will give you symptoms for exhausting the IOPS of standard RDS. You can also use a service like datadog to monitor the IOPS.
The problem with provisioned IOPS is they assume steady state volumes of writes / reads in order to be cost effective. This is almost never a realistic use case and is the reason Amazon started cloud services to fix. The only assurance you get with P-IOPS is that you'll get a max throughput capability reserved. If don't use it, you pay for it still.
If you're ok with running replicas, we recommend running a read-only replica as a NON-RDS instance, and putting it on a regular EC2 instance. You can get better read-IOPS at a much cheaper price by managing the replica yourself. We even setup replicas outside AWS using stunnel and put SSD drives as the primary block device and we get ridiculous read speeds for our reporting systems - literally 100 times faster than we get from RDS.
I hope this helps give some real world details. In short, in my opinion - unless you must ensure a certain level of throughput capability (or your application will fail) on a constant basis (or at any given point) there are better alternatives to provisioned-IOPS including read-write splitting with read-replicas memcache etc.
So, I just got off of a call with an Amazon System Engineer, and he had some interesting insights related to this question. (ie. this is 2nd hand knowledge.)
standard EBS blocks can handle bursty traffic well, but eventually it will taper off to about 100 iops. There were several alternatives that this engineer suggested.
some customers use multiple small EBS blocks and stripe them. This will improve IOPS, and be the most cost effective. You don't need to worry about mirroring because EBS is mirrored behind the scenes.
some customers use the ephemeral storage on the EC2 instance. (or RDS instance) and have multiple slaves to "ensure" durabilty. The ephemeral storage is local storage and much faster than EBS. You can even use SSD provisioned EC2 instances.
some customers will configure the master to use provisioned IOPS, or SSD ephemeral storage, then use standard EBS storage for the slave(s). Expected performance is good, but failover performance is degraded (but still available)
anyway, If you decide to use any of these strategies, I would recheck with amazon to make sure I haven't forgotten any important steps. As I said before, this is 2nd hand knowledge.

Resources