When adding a new server or removing one, does TayzGrid (Open Source) partition data at runtime, or do we have to pre-plan and apply some pre-deployment strategy?
If some pre-deployment planning is required, can someone please link to any resources?
Yes, TayzGrid is completely dynamic and no pre-planning is required.
Simply add a node to the cluster or remove it, and data balancing will be done automatically.
Peer to Peer Dynamic Cluster
Runtime Discovery within Cluster
Runtime Discovery by Clients
Failover Support: Add/Remove Servers at Runtime
and more, from the TayzGrid website
Data distribution map: The data distribution map is provided only if the caching topology is Partitioned Cache or Partition-Replica Cache. This information is useful for clients to determine the location of data in the cluster. The client can therefore directly access the data from the cache server. The data distribution map is provided in two cases:
(1) at the time of the client’s connection to the server and
(2) when any change occurs in the partitioning map because a server has been added to or removed from the cluster.
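For illustration, here is a minimal client sketch using the JCache (JSR-107) API that TayzGrid implements; the cache name demoCache and the sample key are assumptions, and the clustered cache is assumed to be configured already. The point is that this code does not change when servers are added or removed, because the client library receives the updated data distribution map automatically.

```java
import javax.cache.Cache;
import javax.cache.CacheManager;
import javax.cache.Caching;
import javax.cache.spi.CachingProvider;

public class TayzGridClientSketch {
    public static void main(String[] args) {
        // The TayzGrid JCache provider is assumed to be on the classpath and
        // the clustered cache "demoCache" already configured on the grid.
        CachingProvider provider = Caching.getCachingProvider();
        CacheManager manager = provider.getCacheManager();
        Cache<Object, Object> cache = manager.getCache("demoCache");

        // The client routes this put/get to the owning partition using the
        // data distribution map; adding or removing a server just refreshes
        // that map, no code change is needed here.
        cache.put("user:1", "Alice");
        System.out.println(cache.get("user:1"));
    }
}
```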
Just thinking about app architecture, and I want to know: is it possible at all to create a local cluster for specific tables and connect it with a cloud cluster?
And an additional question: is it possible to choose where to create a shard (on which machine) for a particular table (to tell the cloud cluster that for this table I need shards in the local cluster)?
As an example, I want the table db.localTable to be sharded in the local cluster to reduce latency and increase performance by running queries in the local cluster, and also have the ability to run queries in the cloud cluster when the local cluster is not accessible. All data between the clusters should stay consistent.
Thanks in advance.
Actually, I've found the solution: to pin replicas and shards to specific servers, you should use server tags and apply the changes using ReQL and the table settings. For details see RethinkDB - Scaling, sharding and replication and RethinkDB - Architecture FAQ.
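For reference, here is a rough sketch of that approach using the RethinkDB Java driver. The database and table names (db.localTable from the question), the server tags local and cloud, and the shard/replica counts are assumptions; the servers are assumed to have been started with matching --server-tag options.

```java
import com.rethinkdb.RethinkDB;
import com.rethinkdb.net.Connection;

public class ShardPlacementSketch {
    private static final RethinkDB r = RethinkDB.r;

    public static void main(String[] args) {
        Connection conn = r.connection().hostname("localhost").port(28015).connect();

        // Keep the primary replica of db.localTable on servers tagged "local"
        // (low-latency queries), with an extra replica on "cloud" servers so
        // the data is still available when the local cluster is unreachable.
        r.db("db").table("localTable")
                .reconfigure()
                .optArg("shards", 2)
                .optArg("replicas", r.hashMap("local", 1).with("cloud", 1))
                .optArg("primary_replica_tag", "local")
                .run(conn);

        conn.close();
    }
}
```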
We have to work with some global parameters. We want to calculate these parameters on one machine and distribute them to the Ignite cluster; can we do this?
According to the guidance on the official website, Ignite is a distributed cluster that has no master, slave, or standby nodes.
By the way, we only need to use Ignite lightly at first; we will use it as a Spring bean in our distributed system.
Yes, you can do that easily with a REPLICATED Ignite cache. Every value that you put into that cache will be copied to all other nodes in the cluster. Data won't be lost as long as at least one node is still running.
Here is some documentation on cache modes.
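As a minimal sketch (the cache name globalParams and the parameter key are placeholders), a REPLICATED cache for such global parameters could look like the following; in a Spring setup you would typically expose the Ignite instance as a bean (for example via IgniteSpringBean) instead of calling Ignition.start() directly.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class GlobalParamsSketch {
    public static void main(String[] args) {
        // Start (or join) an Ignite node with default configuration.
        Ignite ignite = Ignition.start();

        // REPLICATED: every node in the cluster keeps a full copy of the cache.
        CacheConfiguration<String, Double> cfg = new CacheConfiguration<>("globalParams");
        cfg.setCacheMode(CacheMode.REPLICATED);

        IgniteCache<String, Double> params = ignite.getOrCreateCache(cfg);

        // Calculated on one machine, readable on every node in the cluster.
        params.put("threshold", 0.75);
        System.out.println(params.get("threshold"));
    }
}
```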
SnappyData documentation and architecture diagrams seem to indicate that a JDBC thin client connection goes from a client to a Locator and then it is routed to a direct connection to a Server.
If this is true, then I can run JDBC queries without a Lead node, correct?
Yes, that is correct. The locator provides load and connectivity information back to the client, which is then able to connect to one or more servers, either for direct access to a bucket for low-latency queries or, more importantly, for HA: the client can fail over and fail back.
So, yes, your connected clients will continue to function even when the locator goes away. Note that the "lead" plays a different role than the locator. Its primary function is to host the Spark driver, orchestrate Spark jobs, and provide HA for Spark. Without a lead, you won't be able to run such jobs.
In addition to what #jagsr has mentioned, if you do not intend to run the lead nodes (and thus no Spark jobs or column store), then you can run the cluster as a pure row store using snappy-start-all.sh rowstore (see the rowstore docs).
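For illustration, a thin-client connection might look like the sketch below. The host name locator-host and the table app.orders are placeholders, 1527 is the default client port, and the SnappyData client JDBC driver jar is assumed to be on the classpath; no lead node is required for this path.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SnappyThinClientSketch {
    public static void main(String[] args) throws Exception {
        // The URL points at the locator; the driver learns the server list
        // from it and then talks to the data servers directly.
        String url = "jdbc:snappydata://locator-host:1527/";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM app.orders")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```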
I have been doing a lot of reading about HBase lately and I am a little confused about the roles of the HMaster and ZooKeeper in the architecture of HBase.
When a client requests data, who gets that request? Assume this is the first request. I understand subsequent requests can be made directly to region servers, but for that to happen the locations of the meta files need to be retrieved first, and then a get or scan needs to run on the specific meta table in the region server.
The reason I ask is that, if I am using Java, I would use the HConnectionManager class to create a connection. It looks like HConnectionManager already has a cache of region locations available. The cache gets built when some number of requests have been made earlier, but what if the cache isn't there and this is the first request?
Who takes the first HBase request? Will it be the ZooKeeper quorum? We are supplying the hbase-site.xml file to the HBaseConfiguration class.
Also, I am a little confused about how we define a "client".
The other thing I read was that the meta information gets cached on the "client". Is this true even in the case of HBase REST? Will the client here be the HMaster or the actual user who is making the REST call? If so, doesn't it pose a security threat if metadata is exposed to the client?
Clients connect to ZooKeeper to get the latest state (e.g. where hbase:meta is hosted). The HMaster's role is to make sure the region assignments are correct (i.e. it assigns regions to region servers on startup, on failures, etc.). Clients contact the HMaster only for admin purposes, e.g. creating a table or changing its structure (via the HBaseAdmin class). You can read more about it here.
In the case of HBase REST, the client sends a REST request to the REST server, which holds an HBase client internally.
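As a small illustration of the admin path (the table and column family names are placeholders), creating a table goes through the HMaster via the Admin handle obtained from the Connection, which is the 1.0+ counterpart of HBaseAdmin:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateTableSketch {
    public static void main(String[] args) throws Exception {
        // hbase-site.xml (including the ZooKeeper quorum) is read from the classpath.
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // Admin operations like this one are served by the HMaster.
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("users"));
            desc.addFamily(new HColumnDescriptor("info"));
            admin.createTable(desc);
        }
    }
}
```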
The following answer is based on HBase 1.0.1.1:
1. When a client requests data, who gets that request?
a) The client looks up the hbase:meta region location in ZooKeeper and caches that location for future use.
b) It scans hbase:meta on that region server to get the location of the region it needs, and also caches that region location.
c) It sends the request to that region server.
2. Who takes the first HBase request? Will it be the ZooKeeper quorum?
Yes, if everything is being looked up for the first time; otherwise the request may go directly to the hbase:meta region or the user table region, using the cached locations.
3. Security
You can use Kerberos; it is supported by HBase.
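To make the data path concrete, here is a hedged sketch of a first read using the 1.0+ client API, where ConnectionFactory replaces the older HConnectionManager; the table, row key, and column family names are placeholders. The ZooKeeper lookup, the hbase:meta scan, and the location caching from steps a)-c) all happen inside the client library:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class FirstGetSketch {
    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml (hbase.zookeeper.quorum etc.) from the classpath.
        Configuration conf = HBaseConfiguration.create();

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("users"))) {
            // On the first Get the client asks ZooKeeper where hbase:meta lives,
            // scans meta on that region server to find the region for this row,
            // caches the region location, and then sends the Get straight to
            // the owning region server.
            Get get = new Get(Bytes.toBytes("row-1"));
            get.addFamily(Bytes.toBytes("info"));
            Result result = table.get(get);
            System.out.println(result);
        }
    }
}
```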
Problem: I want to cache user information such that all my applications can read the data quickly, but I want only one specific application to be able to write to this cache.
I am on AWS, so one solution that occurred to me was a version of memcached with two ports: one port that accepts read commands only and one that accepts reads and writes. I could then use security groups to control access.
Since I'm on AWS, if there are solutions that use out-of-the-box memcached or Redis, that'd be great.
I suggest you use ElastiCache with one open port at 11211 (Memcached), then create an EC2 instance and set your security group so only this server can access your ElastiCache cluster. Use this server to filter your applications, so only one specific application can write to it. You control the access with security groups, scripts, or iptables. If you are not using a VPC, then you can use a cache security group.
I believe you can accomplish this using Redis (instead of Memcached), which is also available via ElastiCache. Once the instance has been created, you will want to create a replication group and associate it with the cache cluster you already launched.
You can then add instances to the replication group. Instances within the replication group are simply replicated from the Master Cache Cluster (single Redis instance) and so are (by default) read-only.
So, in this setup, you have a master node (single endpoint) that you can write to and as many read nodes (multiple endpoints) as you would like.
You can take security a step further and assign different routing rules to the replication group (via the VPC) so the applications reading data do not have access to the master node (the only one that can write data).
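As a rough sketch of how the applications would then split reads and writes (the endpoint host names and the key are placeholders), the single writer application connects to the primary endpoint while every other application uses a replica endpoint, for example with the Jedis client:

```java
import redis.clients.jedis.Jedis;

public class ElastiCacheRolesSketch {
    public static void main(String[] args) {
        // The one application allowed to write uses the primary endpoint.
        try (Jedis primary = new Jedis("my-cache-primary.example.cache.amazonaws.com", 6379)) {
            primary.set("user:42:name", "Alice");
        }

        // Read-only applications use a replica endpoint; reads work as usual,
        // but a write here would be rejected with a READONLY error because
        // replicas are read-only by default.
        try (Jedis replica = new Jedis("my-cache-replica.example.cache.amazonaws.com", 6379)) {
            System.out.println(replica.get("user:42:name"));
        }
    }
}
```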