I want to create followers for my database. Now I wonder if there is a significant delay between the master and the follower state.
Can I write to a master and read immediately after from a follower instance?
There is a variable delay that depends on network latency between master and slave, the volume of data being replicated, and locks/transactions that may affect replicated data on the slave.
Because of all of these things, you should consider the slave to be a valid point-in-time snapshot of the database, but not a current one.
There is a synchronous replication mode available in Postgres, but not on Heroku Postgres. This synchronous mode waits for a write to be written to the slaves before acknowledging it as written on the master. It can be a dangerous feature, introducing high latency or bigger problems if the master and slave become partitioned. I don't recommend it.
If you need guaranteed reads of current data, you should be reading from the master.
Anecdotally, our slave is at most 100-200 commits behind the master when we run blocking reporting jobs on the slave.
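If you want to see how far behind the follower is at any given moment, you can ask Postgres on the follower itself. A minimal sketch using psycopg2 (the connection URL is hypothetical):

    # Rough sketch: measure how far behind a Postgres follower (standby) is.
    # Requires psycopg2; FOLLOWER_URL is a hypothetical connection string.
    import psycopg2

    FOLLOWER_URL = "postgres://user:pass@follower-host:5432/mydb"

    conn = psycopg2.connect(FOLLOWER_URL)
    cur = conn.cursor()

    # On a standby, pg_last_xact_replay_timestamp() is the commit time of the
    # last replayed transaction; the difference from now() approximates the lag.
    cur.execute("SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag")
    print(cur.fetchone()[0])

    cur.close()
    conn.close()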
The read performance of our Redis cluster (version > 2.8.22 in AWS) has lately been affected by regular scheduled snapshots/backups. I see read operations increase in latency (or time out) while the Redis backups are being created.
As per the AWS docs, Redis backups with version > 2.8.22 fork a child process (on the replicas) to create a snapshot when enough memory is available. So this means Redis doesn't fork the snapshot process when enough memory isn't available.
So, my question is: how much memory is "enough" for Redis to fork a child process to create backups?
Is there a way to know whether the replicas in my Redis cluster are forking a child process to create backups or not?
My Redis replicas have about 15-20% of memory available while creating the backups. Is this enough to avoid affecting read performance?
Some steps we took to mitigate the issue:
Increased the number of replicas
Increased reserved-memory (to 10%)
But neither step mitigated the issue.
Does increasing reserved-memory help improve read performance? As per the AWS docs (https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/backups.html#backups-performance), reserved-memory helps avoid affecting write performance.
Another workaround I'm considering is adding a new shard to the cluster. Adding a shard would increase the available/free memory of each replica and thus (theoretically) always guarantee that the child process for creating snapshots can be forked.
But we also don't want too many shards in our cluster, as too many shards could reduce our current read performance.
So, are there any other steps to keep snapshot/backup creation from affecting read performance?
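For reference, one way to watch for fork/snapshot activity on a replica is to poll the persistence and stats sections of INFO. A minimal sketch with redis-py (the endpoint name is hypothetical):

    # Sketch: check whether a Redis replica is currently writing an RDB snapshot.
    # Requires the redis-py package; REPLICA_HOST is a hypothetical endpoint.
    import redis

    REPLICA_HOST = "my-replica.example.cache.amazonaws.com"

    r = redis.Redis(host=REPLICA_HOST, port=6379)

    persistence = r.info("persistence")
    print("bgsave in progress:", persistence.get("rdb_bgsave_in_progress"))  # 1 while a snapshot child runs
    print("last bgsave status:", persistence.get("rdb_last_bgsave_status"))

    stats = r.info("stats")
    print("latest fork (usec):", stats.get("latest_fork_usec"))  # cost of the most recent fork

    memory = r.info("memory")
    print("used memory:", memory.get("used_memory_human"))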
Partition Tolerance - The system continues to operate as a whole even if individual servers fail or can't be reached.
A better definition, from this link:
Even if the connections between nodes are down, the other two promises (A & C) are kept.
Now consider that we have a master-slave model in both an RDBMS (Oracle) and MongoDB. I am not able to understand why an RDBMS is said to not be partition tolerant but Mongo is.
Consider that I have 1 master and 2 slaves. If the master goes down in Mongo, a re-election is held to select one of the slaves as master so that the system continues to operate.
Doesn't the same happen in an RDBMS system like Oracle/MySQL?
See this article about CAP theorem and MySQL.
Replication in MySQL Cluster is synchronous, meaning a transaction is not committed before replication happens. In this case your data should be consistent; however, the cluster may not be available for some clients in some cases after a partition occurs. It depends on the number of nodes and the arbitration process. So MySQL Cluster can be made partition tolerant.
Partition handling in one cluster:
If there are not enough live nodes to serve all of the data stored - shutdown
Serving a subset of user data (and risking data consistency) is not an option
If there are not enough failed or unreachable nodes to serve all of the data stored - continue and provide service
No other subset of nodes can be isolated from us and serving clients
If there are enough failed or unreachable nodes to serve all of the data stored - arbitrate.
There could be another subset of nodes regrouped into a viable cluster out there.
Replication between 2 clusters is asynchronous.
Edit: MySQL can also be configured as a cluster; in this case it is CP, otherwise it is CA, and partition tolerance can be broken by having 2 masters.
Tunable consistency means that when an update is received by a master node, the master need not wait for all the replica nodes to copy that update. How many replicas need to be in sync can be configured by the user.
If the above is true, then consider the following scenario:
A system has one master and 5 replicas and tunable consistency is set to 2.
An update arrives on the master and is written to the master and 1 replica.
The master is unable to write to any other replica, even though they are online.
Since tunable consistency is not met, the master fails the update.
What happens to the replica that already succeeded in copying the update?
Note that I have deliberately avoided specifying any particular distributed system because this is a common pattern across many systems like Elastic-Search, Solr, Couchbase and Cassandra. Do these systems have a rollback feature like a database?
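For concreteness, this is the kind of knob I mean, taking Cassandra's Python driver as one example (the contact point, keyspace and table are made up); the other systems expose similar settings under different names:

    # Sketch: per-request tunable consistency with the DataStax Cassandra driver.
    # Contact point, keyspace and table are hypothetical.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("my_keyspace")

    # Require acknowledgement from two replicas before the write is reported successful.
    stmt = SimpleStatement(
        "INSERT INTO users (id, name) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.TWO,
    )

    try:
        session.execute(stmt, (42, "alice"))
    except Exception as exc:
        # If fewer than two replicas acknowledge in time, the driver raises
        # (e.g. a WriteTimeout). In Cassandra's case the replicas that did
        # apply the write do not roll it back; the write may still surface later.
        print("write not acknowledged at the requested level:", exc)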
I am using client-side partitioning on a 4-node Redis setup. The writes and reads are distributed among the nodes. Redis is used as a persistence layer for volatile data as well as a cache by different parts of the application. We also have a Cassandra deployment for persisting non-volatile data.
On Redis we peak at nearly 1k ops/sec (instantaneous_ops_per_sec). The load is expected to increase with time. There are many operations where we query for a non-existent key to check whether data is present for that key.
I want to achieve the following things:
Writes should fail over to something when a Redis node goes down.
There should be a backup for reading the data lost when a Redis node goes down.
If we add more Redis nodes in the future (or a dead node comes back up), reads and writes should be re-distributed consistently.
I am trying to figure out a suitable design to handle the above scenario. I have thought of the following options:
Create hot slaves for the existing nodes and swap them as and when a master goes down. This will not address the third point.
Write an application layer to persist data in both Redis and Cassandra, allowing a lazy-load path for reads when a Redis node goes down. This approach will have the overhead of writing to two stores.
Which is a better approach? Is there a suitable alternative to the above approaches?
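For the third point, the kind of consistent re-distribution I have in mind is a consistent-hash ring, so that only a small fraction of keys move when a node is added or comes back. A rough, hypothetical sketch (illustrative only, not what we run):

    # Minimal consistent-hash ring sketch (illustrative, not production code).
    import bisect
    import hashlib

    class HashRing:
        def __init__(self, nodes, vnodes=100):
            # Each node is placed on the ring at many virtual points for smoother balance.
            self.ring = sorted(
                (self._hash(f"{node}#{i}"), node)
                for node in nodes
                for i in range(vnodes)
            )
            self._points = [h for h, _ in self.ring]

        @staticmethod
        def _hash(value):
            return int(hashlib.md5(value.encode()).hexdigest(), 16)

        def get_node(self, key):
            # Walk clockwise to the first ring point at or after hash(key).
            idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
            return self.ring[idx][1]

    # Hypothetical node names; adding or removing a node only remaps ~1/N of the keys.
    ring = HashRing(["redis-1", "redis-2", "redis-3", "redis-4"])
    print(ring.get_node("user:1234"))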
A load of 1k ops/s is far below the capabilities of Redis. You would need to increase it by two or more orders of magnitude before you come close to overloading it. If you aren't expecting to exceed 50-70,000 ops/second and are not exceeding your available single node's memory, I really wouldn't bother with sharding your data, as it is more effort than it is worth.
That said, I wouldn't do the sharding client-side. I'd look at something like Twemproxy/Nutcracker to do it for you. This provides a path to Redis Cluster as well as the ability to scale out connections, and provides transparent client-side support for failover scenarios.
To handle failover in the client you would want to set up two instances per slot (in your description, a write node), with one slaved to the other. Then you would run a Sentinel constellation to manage the failover.
Then you would need to have your client code connect to Sentinel to get the current master connectivity for each slot. This also means client code which can reconnect to the newly promoted master when a failover occurs. If you have load balancers available you can place your Redis nodes behind one or more (preferably two, with failover) and eliminate client reconnection requirements, but you would then need to implement a Sentinel script or monitor to update the load balancer configuration on failover.
For the Sentinel constellation a standard 3-node setup will work fine. If you do your load balancing with software on nodes you control, it would be best to have at least two Sentinel nodes on the load balancers to provide natural connectivity tests.
Given your description I would test out running a single master with multiple read slaves, and instead of hashing in client code, distribute reads to the slaves and writes to the master. This will provide a much simpler setup and likely less complex code on the client side. Scaling read slaves is easier and simpler, and as you describe it the vast majority of ops will be read requests, so it fits your described usage pattern precisely.
You would still need to use Sentinel to manage failover, but that complexity will still exist either way, resulting in a net decrease in code and code complexity. For a single master, Sentinel is almost trivial to set up; the caveats being code to either manage a load balancer or virtual IP, or to handle Sentinel discovery in the client code.
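As a concrete illustration of the Sentinel-aware client code described above, redis-py ships a Sentinel helper. A minimal sketch (the Sentinel hosts and the monitored master name "mymaster" are hypothetical):

    # Sketch: discover the current master/slave through Sentinel with redis-py.
    from redis.sentinel import Sentinel

    sentinel = Sentinel(
        [("sentinel-1", 26379), ("sentinel-2", 26379), ("sentinel-3", 26379)],
        socket_timeout=0.5,
    )

    # These connections follow a failover automatically: writes go to whichever
    # node Sentinel currently reports as master, reads to one of the slaves.
    master = sentinel.master_for("mymaster", socket_timeout=0.5)
    replica = sentinel.slave_for("mymaster", socket_timeout=0.5)

    master.set("greeting", "hello")
    print(replica.get("greeting"))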
You are opening the distributed database Pandora's box here.
My best suggestion is: don't do it; don't implement your own Redis cluster unless you can afford losing data and/or can take some downtime.
If you can afford running on not-yet-production-ready software, my suggestion is to have a look at the official Redis Cluster implementation; if your requirements are low enough for you to kick off your own cluster implementation, chances are that you can afford to use Redis Cluster directly, which has a community behind it.
Have you considered looking at different software than Redis? Cassandra, Riak, DynamoDB, and Hadoop are great examples of mature distributed databases that would do what you asked out of the box.
When we talk about NoSQL distributed database systems, we know that all of them fall under 2 out of the 3 properties of the CAP theorem. For a distributed cluster where network failures and node failures are inevitable, partition tolerance is a necessity, leaving us to choose between availability and consistency. So it's basically CP or AP.
My questions are:
Which category does Hadoop fall into?
Let's say I have a cluster with 6 nodes, A, B, C and D, E, F. During a network failure, let's say nodes A, B, C and nodes D, E, F are divided into two independent clusters.
Now in a consistent and partition-tolerant (CP) model, since an update on node A won't replicate to node D, the consistency of the system won't allow the user to update or read data till the network is up and running again, hence making the database unavailable.
Whereas an available and partition-tolerant (AP) system would allow the user of node D to see the old data when an update is made at node A, but doesn't guarantee the user of node D the latest data. After some time, when the network is up and running again, it replicates the latest data of node A into node D and hence allows the user of node D to view the latest data.
From the above two scenarios we can conclude that in an AP model there is no scope for the database going down, allowing the user to write and read even during a failure, and promising the user the latest data when the network is up again. So why do people go for the consistent and partition-tolerant (CP) model? From my perspective, during a network failure AP has an advantage over CP, allowing the user to read and write data while the database under CP is down.
Is there any system that can provide all of CAP together, excluding the concept of Cassandra's eventual consistency?
When does a user choose availability over consistency, and vice versa? Is there any database out there that allows the user to switch its choice between CP and AP accordingly?
Thanks in advance :)
HDFS has a unique central decision point, the namenode. As such it can only fall on the CP side, since taking down the namenode takes down the entire HDFS system (no Availability). Hadoop does not try to hide this:
The NameNode is a Single Point of Failure for the HDFS Cluster. HDFS is not currently a High Availability system. When the NameNode goes down, the file system goes offline. There is an optional SecondaryNameNode that can be hosted on a separate machine. It only creates checkpoints of the namespace by merging the edits file into the fsimage file and does not provide any real redundancy.
Since the decision of where to place data and where it can be read from is always handled by the namenode, which maintains a consistent view in memory, HDFS is always consistent (C). It is also partition tolerant in that it can handle losing data nodes, subject to the replication factor and data topology strategies.
Is there any system that can provide CAP together?
Yes, such systems are often mentioned in Marketing and other non-technical publications.
When does a user choose availability over consistency, and vice versa?
This is a business use case decision. When availability is more important, they choose AP. When consistency is more important, they choose CP. In general, when money changes hands, consistency takes precedence. Almost every other case favors availability.
Is there any database out there that allows the user to switch its choice between CP and AP accordingly?
Systems that allow you to modify both the write and the read quorums can be tuned to be either CP or AP, depending on the needs.
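As a rough rule of thumb for such Dynamo-style systems (N replicas, write quorum W, read quorum R, all assumed configurable): if R + W > N, every read quorum overlaps every write quorum, so reads see the latest acknowledged write (CP-leaning); if R + W <= N, you give up that overlap in exchange for availability and latency (AP-leaning). A tiny sketch of that arithmetic:

    # Sketch of the quorum-overlap rule of thumb for tunable-consistency systems.
    def leaning(n, w, r):
        """Return 'CP-leaning' if read and write quorums must overlap, else 'AP-leaning'."""
        return "CP-leaning" if r + w > n else "AP-leaning"

    print(leaning(n=3, w=2, r=2))  # quorum reads and writes -> CP-leaning
    print(leaning(n=3, w=1, r=1))  # fast, but reads may be stale -> AP-leaning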