RethinkDB local and cloud clusters connection - rethinkdb

While thinking about my app's architecture, I want to know whether it is possible to create a local cluster for specific tables and connect it with a cloud cluster.
An additional question: is it possible to choose where a shard is created (on which machine) for a particular table, i.e. to tell the cloud cluster that for this table I need the shards in the local cluster?
For example, I want the table db.localTable to be sharded in the local cluster to reduce latency and improve performance by running queries locally, while keeping the ability to run queries in the cloud cluster when the local cluster is not accessible. All data should stay consistent between the clusters.
Thanks in advance.

Actually, I've found the solution: to pin shards and replicas to specific servers, you should use server tags and apply the changes through ReQL table configuration. For details see RethinkDB - Scaling, sharding and replication and RethinkDB - Architecture FAQ.
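As a sketch of that reconfiguration, assuming the RethinkDB Python driver and hypothetical server tags `local` and `cloud` (assigned when each server is started, e.g. `rethinkdb --server-tag local`):

```python
def shard_config():
    # One shard; the primary plus one replica live on servers tagged
    # 'local', and one replica lives on servers tagged 'cloud'.
    # primary_replica_tag pins the primary to the local cluster so
    # queries served there avoid the round trip to the cloud.
    return {
        "shards": 1,
        "replicas": {"local": 2, "cloud": 1},
        "primary_replica_tag": "local",
    }


def reconfigure_local_table(host="localhost", port=28015):
    # pip install rethinkdb; import inside the function so the config
    # above can be inspected without the driver installed.
    from rethinkdb import r

    conn = r.connect(host, port)
    return r.db("db").table("localTable").reconfigure(**shard_config()).run(conn)
```

Note that if the local cluster goes down entirely, the table can still be read from the cloud replica, but writes need a reachable primary; failover behavior depends on your RethinkDB version's voting/majority rules.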

Related

change persistent disk type to ssd

I have an Elasticsearch deployment running via ECK on a GKE cluster for production purposes, and in order to increase its performance I'm thinking of changing the persistent disk type to SSD. I came across solutions that suggest creating a snapshot of the disk in GCE and then creating another SSD disk from the data stored in the snapshot. I'm still concerned whether this carries a risk of data loss, and if I create another disk, whether my Elasticsearch will be able to attach to it, since it is a StatefulSet.
Since this is a production deployment, I would advise proceeding as follows:
Create a volume snapshot (doc).
Set up a secondary cluster (doc).
Modify the deployment so that it uses an SSD (doc).
Deploy to the second cluster.
Once this new deployment has been fully tested you can switch over the traffic.
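The snapshot and SSD steps above can be sketched as Kubernetes manifests. The resource names here are hypothetical, and the PVC name would come from your own ECK cluster (ECK names data PVCs like `elasticsearch-data-<cluster>-es-<nodeset>-<ordinal>`):

```yaml
# Snapshot the existing data volume before touching anything.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: es-data-snapshot          # hypothetical name
spec:
  source:
    persistentVolumeClaimName: elasticsearch-data-quickstart-es-default-0
---
# StorageClass backed by SSD persistent disks for the new deployment.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer
```

The new Elasticsearch nodeSet would then reference `storageClassName: ssd` in its `volumeClaimTemplates`, restoring its data from the snapshot.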

Migrate library implementation for cassandra with multiple host ip's

I am trying to use the golang-migrate library for Cassandra migrations.
The docs say to use a Cassandra URL like this:
cassandra://host:port/keyspace?param1=value&param2=value2
We will have more than one Cassandra host.
Do I need to loop over each host and run the migrations separately, or is there another way?
The ALTER will be replicated across the cluster. migrate uses the highest consistency level (ALL; see https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshConsistency.html), so running against a single host is the best choice if your hosts are within a cluster.
If the hosts aren't in a cluster, then hand-rolling and applying the migrations to each host is the only option.
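To illustrate the single-host point, here is a rough Python equivalent (using the cassandra-driver package rather than golang-migrate itself; the keyspace and column names are hypothetical) of applying one migration at consistency ALL through a single contact point:

```python
def migration_cql():
    # The kind of DDL statement a migration file would contain.
    return "ALTER TABLE myks.users ADD email text"


def apply_migration(contact_points=("host1",)):
    # pip install cassandra-driver
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    # One contact point is enough: the driver discovers the remaining
    # nodes in the cluster, and the schema change propagates to all of
    # them, so there is no need to loop over hosts.
    cluster = Cluster(list(contact_points))
    session = cluster.connect()
    stmt = SimpleStatement(migration_cql(),
                           consistency_level=ConsistencyLevel.ALL)
    session.execute(stmt)
    cluster.shutdown()
```

With golang-migrate you would likewise pass just one host in the `cassandra://` URL (optionally listing the others as extra contact points if your version supports it).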

Dynamic Partitioning in a TayzGrid Cluster

When adding or removing a server, does TayzGrid (open source) partition data at runtime, or do we have to pre-plan and apply some pre-deployment strategy?
If some pre-deployment planning is required, can someone please link to any resources?
Yes, TayzGrid is completely dynamic and no pre-planning is required.
Simply add a node to the cluster or remove one, and data balancing will be done automatically.
Peer to Peer Dynamic Cluster
Runtime Discovery within Cluster
Runtime Discovery by Clients
Failover Support: Add/Remove Servers at Runtime
and more, from the TayzGrid website:
Data distribution map: The data distribution map is provided only if the caching topology is Partitioned Cache or Partition-Replica Cache. This information is useful for clients to determine the location of data in the cluster. The client can therefore directly access the data from the cache server. The data distribution map is provided in two cases:
(1) at the time of the client’s connection to the server and
(2) when any change occurs in the partitioning map because a server has been added to or removed from the cluster.

Amazon RDS Master-Slave Relationship between EC2 instances with load balancing activated

We're planning to move our Tomcat/MySQL app onto the Amazon cloud employing 2 EC2 instances (inst_1 & inst_2) running in different availability zones whereby inst_1 will contain the master RDS db and inst_2 the slave RDS db.
If we employ Elastic Load Balancing to balance traffic between the two instances, will traffic directed to inst_2 that includes insert/update/delete transactions first update the master RDS db in inst_1, followed by a synchronous update of the slave in inst_2, thereby ensuring that the two RDS instances are always synchronized?
Amazon's published info (whitepapers) suggests such, but doesn't explicitly state it. If not, how does one ensure that the two RDS instances remain synchronized?
Additional note: We're planning to employ Amazon's Elastic Beanstalk. Thanks!
You have to take a few things into consideration:
AWS RDS instances are simply managed EC2 instances that run a MySQL server.
If you add a slave (Amazon calls them read replicas), it is a read-only slave.
Amazon doesn't automatically distribute write queries to the master server.
Replication ensures that your read slave is always kept up to date automatically (with a minimal delay that increases with write load on the master).
This behavior is MySQL-specific.
This means you have to route data-modifying queries exclusively to the master.
This can be done either by your application or by a MySQL proxy running on an extra machine.
The proxy is then the only interface your application servers talk to. It can balance load between your RDS instances and direct any data-modifying query to the master instance.
When RDS is used in multi-AZ mode you have no access to the secondary instance. There is only ever one instance visible to you, so most of your question doesn't apply. In case of failover, the DNS address you are given will start resolving to a different IP. Amazon doesn't disclose how the two instances are kept in sync.
If instead of a multi-AZ instance you use a single-AZ instance plus a replica, then it is up to you to direct queries appropriately - any attempt to alter data on the replica will fail. Since this is just standard MySQL replication, the replica can lag behind the master (in particular, with current versions of MySQL the replica applies changes in only a single thread).
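The "route writes to the master" rule can be sketched as a small helper in application code; the endpoint hostnames here are hypothetical placeholders for your RDS master and read replica:

```python
# Hypothetical RDS endpoints - substitute your own instance addresses.
MASTER = "master.abc123.us-east-1.rds.amazonaws.com"
REPLICA = "replica.abc123.us-east-1.rds.amazonaws.com"

# Statement types that modify data and therefore must hit the master.
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE",
               "CREATE", "ALTER", "DROP", "TRUNCATE"}


def endpoint_for(sql: str) -> str:
    # Anything that modifies data goes to the master; the read-only
    # replica would reject it anyway. Plain reads can use the replica
    # (accepting that it may lag slightly behind the master).
    verb = sql.lstrip().split(None, 1)[0].upper()
    return MASTER if verb in WRITE_VERBS else REPLICA
```

A MySQL proxy performs essentially this classification for you, outside the application, and can additionally pool and balance the read connections.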

read data from amazon hbase

Can anyone tell me whether I can read data from Amazon HBase using org.apache.hadoop.conf.Configuration and org.apache.hadoop.hbase.client.HTablePool?
We are migrating to Amazon's EMR framework, with HBase running on top of it.
The present implementation is based on pure Apache Hadoop and HBase distributions. I'm trying to verify that no code changes are needed even if we migrate to Amazon's EMR.
Please share your thoughts.
While it should not require code changes, I would expect problems and adjustments related to the nature of EC2 and its networking.
HBase relies on region servers being able to renew their leases in a timely manner. If region servers are too busy - because of some massive operations running on them - they cannot do so and get kicked out of the cluster.
On Amazon, the performance of EC2 instances is much less predictable than in a dedicated cluster (unless you use cluster instances), so adjusting timeout parameters and/or the nature of your load might be needed to get the cluster working properly.
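As an example of such timeout tuning, an hbase-site.xml override might raise the ZooKeeper session timeout so a busy region server has longer to renew its lease before being evicted. These are real HBase properties, but the values are illustrative, not recommendations:

```xml
<!-- hbase-site.xml overrides; tune the values to your workload. -->
<property>
  <!-- How long a region server may go without heartbeating ZooKeeper
       before it is considered dead and its regions are reassigned. -->
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>
<property>
  <!-- Client-side RPC timeout, raised to tolerate slow responses
       during heavy load on less predictable EC2 instances. -->
  <name>hbase.rpc.timeout</name>
  <value>120000</value>
</property>
```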
