Redis backups affecting read performance

The read performance of our Redis cluster (version > 2.8.22, on AWS) has lately been affected by the regularly scheduled snapshots/backups. I see read latency increase (or timeouts) at the time the Redis backups are created.
As per the AWS docs, Redis backups on version > 2.8.22 fork a child process (on the replicas) to create the snapshot when enough memory is available. In other words, Redis doesn't fork the snapshot process when enough memory isn't available.
So, my question is: how much memory is "enough" for Redis to fork a child process to create backups?
Is there a way to know whether the replicas in my Redis cluster are forking a child process to create backups or not?
My Redis replicas have about 15-20% of memory available while the backups are being created. Is this enough to avoid affecting read performance?
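(For reference, this is the kind of check I have in mind: a minimal redis-py sketch that polls a replica endpoint during the backup window. The hostname is a placeholder, and I'm assuming a forked snapshot is visible through rdb_bgsave_in_progress and a refreshed latest_fork_usec in INFO.)

```python
# Sketch: watch a replica's INFO stats during the backup window.
# A forked (background) snapshot should show rdb_bgsave_in_progress = 1
# and bump latest_fork_usec; the hostname below is a placeholder.
import time
import redis

r = redis.Redis(host="my-replica-endpoint", port=6379)

for _ in range(60):  # poll for ~5 minutes
    persistence = r.info("persistence")
    memory = r.info("memory")
    print(
        "bgsave_in_progress:", persistence.get("rdb_bgsave_in_progress"),
        "latest_fork_usec:", persistence.get("latest_fork_usec"),
        "used_memory:", memory.get("used_memory_human"),
    )
    time.sleep(5)
```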
Some steps we took to mitigate the issue:
Increase the number of replicas.
Increase reserved-memory (to 10%).
Neither step mitigated the issue.
Does increasing reserved-memory help improve read performance? As per the AWS docs (https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/backups.html#backups-performance), reserved-memory helps avoid affecting write performance.
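(For context, a reserved-memory change of this kind can be applied roughly like this; a hedged boto3 sketch where the parameter group name is a placeholder and reserved-memory-percent must live in a non-default parameter group assigned to the cluster.)

```python
# Sketch: raise reserved-memory-percent on the cluster's parameter group.
# The parameter group name is a placeholder; this only works on a
# non-default parameter group assigned to the replication group.
import boto3

elasticache = boto3.client("elasticache")
elasticache.modify_cache_parameter_group(
    CacheParameterGroupName="my-redis-params",
    ParameterNameValues=[
        {"ParameterName": "reserved-memory-percent", "ParameterValue": "25"}
    ],
)
```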
Another workaround I'm considering is adding a new shard to the cluster. Adding a shard would increase the available/free memory on each replica and thus (theoretically) always guarantee that a child process is forked to create the snapshots.
But we also don't want too many shards in our cluster, as too many shards could reduce our current read performance.
So, are there any other steps we can take so that snapshot/backup creation does not affect read performance?

Related

Is Elasticsearch safe on a single node in a production environment with sensitive information?

Elasticsearch on a single node can be faster than a cluster, but what are the advantages and disadvantages of using it in a production environment with only one node?
My problem is that my DBA wants to use Elasticsearch on a single node, with speed as the justification, but he is not taking into account availability, redundancy, and resilience against disasters. The system is not slow, but all the documentation I have read about Elasticsearch says that in a production environment it needs to run as a cluster with multiple nodes/shards. We are an information bureau: we provide data for banks, credit analysis, and numerous large customer applications, so I am sure the information we are dealing with needs high availability and redundancy. The index size is about 2.2 TB. I want to run on a cluster because the information is very sensitive, but my DBA wants to run on a single node in production. Help me give him an answer: is he right, or am I wrong?
You should run a benchmark of the potential workloads if he needs proof.
It isn't really viable to run any real production load on a single system, because the shards are what really create the speed of Elasticsearch: queries run on many shards and partial results are returned to form the full result. If you use a single node to scan the entire dataset, it will take a very long time for that single node to process everything.
I guess the issue really comes down to how much your data is getting updated and how many queries are running in parallel.
Without any network traffic it could seem faster on a small workload, but if your search cluster needs to do continuous indexing and run parallel queries it will just get stuck and stop returning results.
Whether a single-node cluster will be faster or slower than a multi-node cluster depends on the use case and many other factors; the argument that a single node is faster is not valid without a comparative benchmark of the real use case.
For example, if your use case is more indexing-heavy than search-heavy, a single-node cluster can be faster; if it is more search-heavy than indexing-heavy, a single-node cluster can be slower.
But even if it is faster in a specific use case, running a single-node cluster in production is not recommended and it is very risky.
The main issue is that a single node has no resilience to failure: if your node is down, your entire cluster is down, and until the node is back up and running, the data in your Elasticsearch cluster is unavailable.
Depending on how you will ingest the data, this can also lead to data loss.
If for some reason the data on your node gets corrupted or lost, you will need to restore it from a previous snapshot. On a multi-node cluster, if a node is lost, you can spin up another one and the cluster will take care of replicating the data, assuming you use replicas for your index.
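To illustrate the replica point (a minimal sketch against the plain REST API using requests; host and index name are placeholders), a single replica per shard is what lets a multi-node cluster rebuild the data from a lost node:

```python
# Sketch: give an existing index one replica copy per primary shard, so a
# multi-node cluster can re-replicate data if a node is lost.
# Host and index name are placeholders.
import requests

resp = requests.put(
    "http://localhost:9200/my-index/_settings",
    json={"index": {"number_of_replicas": 1}},
)
print(resp.json())
```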
There are also some limitations on how many shards you can have on a node and how much memory you should use for the Java heap. Elastic's recommendation in those cases is to keep the number of shards per GB of heap memory below 20 and not to use more than 30 GB for the heap, so a single node gives you a maximum of about 600 shards.
Whether this is enough depends on the use case and your indexing/sharding strategy.
You should ask yourself whether you can afford downtime and data loss; if the answer to both is no, then you should not use a single-node cluster.

How to set the number of replicas properly when starting a TDengine cluster?

I was wondering how to set the replica parameter properly when starting a TDengine cluster so as to balance storage and high availability. According to the TDengine documentation, the default value of replica is 1, which means there are no extra copies of each vnode (the vGroup size is 1 as well), and replica can be changed dynamically to maintain high availability of the cluster. However, the extra vnode copies have to be generated physically when moving to multiple replicas. So the question is: how should a real company determine the value of replica to increase availability without taking up too much overhead (storage and performance) when using a TDengine cluster?
Replication means keeping a copy of the same data on multiple machines that are connected via a network. There are several reasons you might want to replicate data:
To keep data geographically close to your users (and thus reduce latency)
To allow the system to continue working even if some of its parts have failed (and thus increase availability)
To scale out the number of machines that can serve read queries (and thus increase read throughput)
(Referenced from DDIA, Designing Data-Intensive Applications.)
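Coming back to the TDengine question itself: the replica count is a per-database setting that can be chosen at creation time and raised later. A hedged sketch using the TDengine Python connector, with connection details and the database name as placeholders:

```python
# Sketch: set the per-database replica count in TDengine.
# Connection details and the database name are placeholders.
import taos

conn = taos.connect(host="localhost", user="root", password="taosdata")
cursor = conn.cursor()

# Start a POC with a single copy of each vnode ...
cursor.execute("CREATE DATABASE IF NOT EXISTS power REPLICA 1")

# ... and raise the replica count once availability matters
# (the extra vnode copies are generated physically at this point).
cursor.execute("ALTER DATABASE power REPLICA 3")

conn.close()
```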

Migrating very large Elasticsearch indices

We ran out of space due to a very large index (5 TB primary | 5 TB replica). The index has 5 shards (each shard is 1 TB). We are planning to migrate this index to a bigger AWS instance type. Please let me know which settings can be modified so the migration goes fast and smoothly.
Note: We are using default elasticsearch settings.
First of all, I'd like to point out that a 1 TB shard is way beyond the recommended ~30 GB shard size. I'd also assume that, because of this, your cluster probably isn't as optimised as expected, even in extreme scenarios.
Secondly, the recommended settings depend on the route you're taking to migrate this index.
Personally, I'd let snapshot/restore take care of the process, as it uses the least bandwidth and hence reduces the transfer time. Once done, since the snapshot is already in an AWS region, the restore will be faster.
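In outline, the snapshot/restore route looks something like this (a sketch against the standard _snapshot REST APIs via requests; hosts, repository, bucket, and index names are placeholders, and the S3 repository type needs the repository-s3 plugin available on both clusters):

```python
# Sketch: snapshot the index to S3 from the old cluster, then restore it
# on the new one. Hosts, repository, bucket, and index names are placeholders.
import requests

OLD = "http://old-cluster:9200"
NEW = "http://new-cluster:9200"

# 1. Register the same S3 snapshot repository on both clusters.
repo = {"type": "s3", "settings": {"bucket": "my-es-snapshots"}}
requests.put(f"{OLD}/_snapshot/migration_repo", json=repo)
requests.put(f"{NEW}/_snapshot/migration_repo", json=repo)

# 2. Snapshot only the big index on the old cluster.
requests.put(
    f"{OLD}/_snapshot/migration_repo/snap_1?wait_for_completion=false",
    json={"indices": "my-big-index"},
)

# 3. Once the snapshot completes, restore it on the new cluster.
requests.post(
    f"{NEW}/_snapshot/migration_repo/snap_1/_restore",
    json={"indices": "my-big-index"},
)
```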
Again, I'm assuming a lot here, so much depends on your constraints and preferred method.
All the best.

How to set up an ElasticSearch cluster with auto-scaling on Amazon EC2?

There is a great tutorial, "elasticsearch on ec2", about configuring ES on Amazon EC2. I studied it and applied all its recommendations.
Now I have an AMI and can run any number of nodes in the cluster from it. Auto-discovery is configured and the nodes join the cluster as they should.
The question is: how do I configure the cluster so that I can automatically launch/terminate nodes depending on cluster load?
For example, I want to have only 1 node running when there is no load and 12 nodes running at peak load. But wait: if I terminate 11 nodes in the cluster, what happens to the shards and replicas? How do I make sure I don't lose any data if I terminate 11 of the 12 nodes?
I thought about configuring the S3 gateway for this, but all gateways except the local one are deprecated.
There is an article in the manual about shard allocation. Maybe I'm missing something very basic, but I admit I failed to figure out whether it is possible to configure one node to always hold a copy of all the shards. My goal is to make sure that if this were the only node running in the cluster, we still wouldn't lose any data.
The only solution I can imagine now is to configure the index with 12 shards and 12 replicas. Then, with up to 12 nodes launched, every node would hold a copy of every shard. But I don't like this solution, because I would have to reconfigure the cluster if I ever wanted more than 12 nodes at peak load.
Auto scaling doesn't make a lot of sense with ElasticSearch.
Shard moving and re-allocation is not a light process, especially if you have a lot of data. It stresses IO and the network, and can badly degrade ElasticSearch's performance. (If you want to limit the effect, you should throttle cluster recovery using settings like cluster.routing.allocation.cluster_concurrent_rebalance, indices.recovery.concurrent_streams, and indices.recovery.max_size_per_sec. This will limit the impact but will also slow down re-balancing and recovery.)
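For example, those throttles can be applied as transient cluster settings (a sketch via the _cluster/settings API with requests; exact setting names vary between ElasticSearch versions, so treat these as illustrative):

```python
# Sketch: throttle rebalancing/recovery so shard moves hurt the cluster less.
# Setting names differ across ElasticSearch versions; adjust to yours.
import requests

requests.put(
    "http://localhost:9200/_cluster/settings",
    json={
        "transient": {
            "cluster.routing.allocation.cluster_concurrent_rebalance": 1,
            "indices.recovery.max_bytes_per_sec": "20mb",
        }
    },
)
```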
Also, if you care about your data you don't want to have only 1 node ever. You need your data to be replicated, so you will need at least 2 nodes (or more if you feel safer with a higher replication level).
Another thing to remember is that while you can change the number of replicas, you can't change the number of shards. This is configured when you create your index and cannot be changed (if you want more shards you need to create another index and reindex all your data). So your number of shards should take into account the data size and the cluster size, considering the largest number of nodes you want but also your minimal setup (can fewer nodes hold all the shards and serve the estimated traffic?).
So theoretically, if you want 2 nodes at quiet times and 12 nodes at peak, you can set your index to have 6 shards with 1 replica each. At quiet times you have 2 nodes holding 6 shard copies each, and at peak you have 12 nodes holding 1 shard copy each.
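That layout is fixed at index-creation time, along these lines (a minimal sketch; host and index name are placeholders):

```python
# Sketch: 6 primaries + 1 replica each = 12 shard copies, which spread
# evenly across anything from 2 to 12 data nodes.
import requests

requests.put(
    "http://localhost:9200/my-index",
    json={"settings": {"number_of_shards": 6, "number_of_replicas": 1}},
)
```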
But again, I strongly suggest rethinking this and testing the impact of shard moving on your cluster performance.
In cases where the elasticity of your application is driven by a variable query load, you could set up ES nodes configured not to store any data (node.data = false, http.enabled = true) and then use them for auto scaling. These nodes could offload all the HTTP and result-collation processing from your main data nodes (freeing them up for more indexing and searching).
Since these nodes wouldn't have shards allocated to them, bringing them up and down dynamically shouldn't be a problem, and auto-discovery should allow them to join the cluster.
I think this is a concern in general when it comes to employing auto-scalable architecture to meet temporary demand while data still needs to be preserved. I think there is a solution that leverages EBS:
Map shards to specific EBS volumes. Let's say we need 15 shards; we will need 15 EBS volumes.
Amazon allows you to mount multiple volumes, so we can start with a few instances that each have several volumes attached.
As load increases, we can spin up additional instances, up to 15.
The above solution is only advised if you know your max capacity requirements.
I can give you an alternative approach using the AWS Elasticsearch Service (it costs a little more than running Elasticsearch on plain EC2). Write a simple script which continuously monitors the load on the service (through the API/CLI) and, if the load goes beyond a threshold, programmatically increases the number of nodes in your AWS Elasticsearch Service cluster. The advantage here is that AWS takes care of the scaling (as per the documentation, they take a snapshot and launch a completely new cluster). This also works for scaling down.
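A bare-bones version of such a script could look like this (a hedged boto3 sketch; the domain name is a placeholder and the load check is a stub that would normally query CloudWatch):

```python
# Sketch: add nodes to an AWS Elasticsearch Service domain when load is high.
# Domain name and the load check below are placeholders.
import boto3

es = boto3.client("es")
DOMAIN = "my-search-domain"

def load_is_above_threshold():
    # Placeholder: in practice, query CloudWatch metrics for the domain
    # (e.g. CPUUtilization or JVMMemoryPressure) and compare to a threshold.
    return False

def scale_out(extra_nodes=2):
    status = es.describe_elasticsearch_domain(DomainName=DOMAIN)
    count = status["DomainStatus"]["ElasticsearchClusterConfig"]["InstanceCount"]
    es.update_elasticsearch_domain_config(
        DomainName=DOMAIN,
        ElasticsearchClusterConfig={"InstanceCount": count + extra_nodes},
    )

if load_is_above_threshold():
    scale_out()
```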
Regarding the auto-scaling approach, there are some challenges: shard movement has an impact on the existing cluster, and we need to be more vigilant while scaling down. You can find a good article on scaling down here, which I have tested. If you can do some kind of intelligent automation of the steps in that article through scripting (Python, shell) or through automation tools like Ansible, then scaling in/out is achievable. But again, you need to start scaling up well before you hit your normal limits, since the scale-up activities can themselves have an impact on the existing cluster.
Question: is it possible to configure one node to always hold a copy of all the shards?
Answer: Yes, it's possible with explicit shard allocation/routing. More details here.
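For illustration, index-level shard allocation filtering can pin an index's shards to a named node (a sketch with placeholder node and index names; note that a primary and its own replica can never be allocated to the same node, so this applies per shard copy):

```python
# Sketch: require an index's shards to be allocated on a specific node.
# Node and index names are placeholders.
import requests

requests.put(
    "http://localhost:9200/my-index/_settings",
    json={"index.routing.allocation.require._name": "always-on-node"},
)
```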
I would be tempted to suggest solving this a different way in AWS. I don't know what ES data this is or how it's updated, etc. Making a lot of assumptions, I would put the ES instance behind an ALB (application load balancer) and have a scheduled process that creates updated AMIs regularly (if you do it often, it will be quick). Then, based on the load on your single server, I would trigger more instances to be created from the latest AMI you have available, and add the new instances to the ALB to share some of the load. As things quiet down, I would trigger termination of the temporary instances. If you go this route, here are a couple more things to consider:
Use spot instances, since they are cheaper, if that fits your use case.
The "T" instance types don't fit well here since they need time to build up CPU credits.
Use Lambdas for the task of turning things on and off; if you want to be fancy, you can trigger them via a webhook through the AWS API Gateway.
Making more assumptions about your use case, consider putting a Varnish server in front of your ES machine so that you can scale more cheaply with a caching strategy (lots of assumptions here); based on the load, you can dial in the right TTL for cache eviction. Check out the soft-purge feature; for our ES workload we have gotten a lot of value out of it.
If you do any of what I suggest here, make sure your spawned ES instances report their logs back to a central, addressable place on the persistent ES machine so you don't lose logs when the machines die.

MongoDB capacity planning

I have an Oracle database with around 7 million records/day (~300 GB) and I want to switch to MongoDB.
To set up a POC, I'd like to know how many nodes I need. I think 2 shards, each backed by a 3-node replica set, will be enough, but I'd like to know what you think :)
I'd like to have an HA setup :)
Thanks in advance!
For MongoDB to work efficiently, you need to know your working set size. You need to know how much data 7 million records/day amounts to; this is the active data that will need to stay in RAM for high performance.
Also, be very sure WHY you are migrating to Mongo. I'm guessing that in your case it is scalability,
but know your data well before doing so.
For your POC, keeping two shards means roughly 150 GB on each. If you have that much disk available, no problem.
Give some consideration to your shard keys: which fields does it make sense to shard your data set on? This will affect the decision of how many shards to deploy versus the capacity of each shard. You might go with relatively few shards (maybe two or three big, deep shards) if your data can easily be segmented into halves or thirds, or with several lighter, thinner shards if you can shard on a more diverse key.
It is relatively straightforward to upgrade from a MongoDB replica set configuration to a sharded cluster (each shard is actually a replica set). Rather than predetermining that sharding is the right solution to start with, I would think about what your reasons for sharding are (eg. will your application requirements outgrow the resources of a single machine; how much of your data set will be active working set for queries, etc).
It would be worth starting with replica sets and benchmarking this as part of planning your architecture and POC.
Some notes to get you started:
MongoDB's journaling, which is enabled by default as of 1.9.2, provides crash recovery and durability in the storage engine.
Replica sets are the building block for high availability, automatic failover, and data redundancy. Each replica set needs a minimum of three nodes (for example, three data nodes or two data nodes and an arbiter) to enable failover to a new primary via an election.
Sharding is useful for horizontal scaling once your data or writes exceed the resources of a single server.
Other considerations include planning your documents based on your application usage: for example, if your documents will be updated frequently and grow in size over time, you may want to consider manual padding to prevent excessive document moves.
If this is your first MongoDB project you should definitely read the FAQs on Replica Sets and Sharding with MongoDB, as well as for Application Developers.
Note that choosing a good shard key for your use case is an important consideration. A poor choice of shard key can lead to "hot spots" for data writes, or unbalanced shards if you plan to delete large amounts of data.
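For instance, a hashed shard key spreads monotonically increasing values (timestamps, ObjectIds) across chunks instead of funnelling every insert to one shard. A hedged pymongo sketch, with database, collection, and field names as placeholders:

```python
# Sketch: shard a collection on a hashed key to avoid write hot spots.
# Database, collection, and field names are placeholders; run via mongos.
from pymongo import MongoClient

client = MongoClient("mongodb://my-mongos-host:27017")
client.admin.command("enableSharding", "poc_db")
client.admin.command(
    "shardCollection", "poc_db.events",
    key={"device_id": "hashed"},
)
```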

Resources