Event-based communication between Hybris cluster nodes

In what use-case would there be a need to communicate between cluster nodes? The ClusterAwareEvent interface offers the possibility to specify a source node and a target node, but shouldn't cluster nodes be as independent of each other as possible?

Well, there are a few reasons why they would need to communicate, or rather why you would want them to communicate.
First, there is the Cache Invalidation Concept: each cluster member should hold only valid data in its cache, so nodes communicate with one another over TCP or UDP to mark cache entries as invalid, for example when a database item has been changed.
A basic overview of the invalidation process:
A product description is changed. Therefore, all cache entries referring to the product are invalid.
This modification to the description is done on one node, which now has to send a notification to all cluster nodes that the data is invalid.
Nodes that hold the product in their cache discard the cached data of the product and re-retrieve the product from the database the next time the product is used.
Other clustering features in Hybris that rely on communication between nodes include:
Load Balancing
Semi-Session Failover - this allows sessions (sticky sessions) to transfer to a different cluster node, which is useful if, say, a server is going down for maintenance or because of a hardware defect.
These are the main reasons, off the top of my head, why you would want cluster nodes to communicate.
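For reference, a cluster-aware event in hybris is simply an event class that also implements the ClusterAwareEvent interface mentioned in the question, so the platform broadcasts it to other nodes. The sketch below assumes the older publish(int sourceNodeId, int targetNodeId) contract (newer platform versions use canPublish(PublishEventContext) instead) and an invented event name:

```java
import de.hybris.platform.servicelayer.event.ClusterAwareEvent;
import de.hybris.platform.servicelayer.event.events.AbstractEvent;

/**
 * Hypothetical event delivered to every node in the cluster, e.g. to tell
 * each node to refresh some locally cached configuration value.
 */
public class ConfigurationChangedEvent extends AbstractEvent implements ClusterAwareEvent {

    private final String configKey;

    public ConfigurationChangedEvent(final String configKey) {
        this.configKey = configKey;
    }

    public String getConfigKey() {
        return configKey;
    }

    @Override
    public boolean publish(final int sourceNodeId, final int targetNodeId) {
        // Returning true for every source/target pair broadcasts the event
        // to all cluster nodes, including the node that raised it.
        return true;
    }
}
```

Publishing it through the platform's EventService (eventService.publishEvent(new ConfigurationChangedEvent("cache.ttl"))) then triggers the registered listeners on every node for which publish(...) returned true.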

Related

Multiple locations in Google Cloud SQL

I need to use SQL in multiple different locations. The best option would be to set up some databases (or even some records, like tagging in Mongo) in different locations. Is it possible to achieve this with Google Cloud SQL?
There may be two scenarios -
One single Cloud SQL instance in multiple locations
Different Cloud SQL instances in multiple locations
When you create a Cloud SQL instance, you choose a region where the instance and its data are stored. To reduce latency and increase availability, choose the same region for your data and your Compute Engine instances, standard environment apps, and other services.
Locations are mainly of two types: a regional location, i.e. a specific geographic place, and a multi-regional location, which contains at least two geographic places. Multi-regional locations are only used for backup operations in Cloud SQL.
You choose a location when you first create the instance. The location can't be changed after the instance is created.
A single region consists of many smaller data centers called zones. While creating a Cloud SQL instance you can specify whether the instance should be available in a single zone or in two different zones within the selected region. Placing the Cloud SQL instance in two different zones is called a High Availability (HA) configuration.
The purpose of an HA configuration is to reduce downtime when a zone or instance becomes unavailable, which might happen during a zonal outage or when an instance becomes corrupted. With HA, your data continues to be available to client applications.
The HA configuration provides data redundancy. A Cloud SQL instance configured for HA is also called a regional instance and is located in a primary and secondary zone within the configured region. Within a regional instance, the configuration is made up of a primary instance and a standby instance.
So, considering the first scenario: if you ask whether a single Cloud SQL instance can be located in multiple locations, the answer is yes if you consider different zones to be different locations (which is fair, as two zones are physically separated data centers within a single GCP region). But it can only be located in two zones, and for that you have to configure High Availability (HA) for the instance.
For the second scenario, you can always create different Cloud SQL instances in different regions.
You can go through the documentation on instance locations in Cloud SQL and the overview of the HA configuration to get a basic understanding of the above.
There is another option in Cloud SQL called read replicas.
You use a read replica to offload work from a Cloud SQL instance. The read replica is an exact copy of the primary instance. Data and other changes on the primary instance are updated in almost real time on the read replica.
Read replicas are read-only; you cannot write to them. The read replica processes queries, read requests, and analytics traffic, thus reducing the load on the primary instance.
If you want the data to be available in multiple locations you may consider using cross-region read replicas.
Cross-region replication lets you create a read replica in a different region from the primary instance.
Cross-region read replicas have many advantages -
Improve read performance by making replicas available closer to your application's region.
Provide additional disaster recovery capability to guard against a regional failure.
Let you migrate data from one region to another.
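As a rough illustration of how an application might split its traffic between the primary and a cross-region read replica, here is a minimal JDBC sketch. It assumes a MySQL-flavored Cloud SQL setup with the MySQL Connector/J driver on the classpath; the hostnames, database, table, and credentials are invented for the example:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ReplicaRoutingSketch {
    // Hypothetical endpoints: the primary lives in one region, the read
    // replica in another region closer to this application.
    private static final String PRIMARY_URL = "jdbc:mysql://10.0.0.5:3306/appdb";  // read-write
    private static final String REPLICA_URL = "jdbc:mysql://10.10.0.5:3306/appdb"; // read-only replica

    public static void main(String[] args) throws Exception {
        // Writes must always go to the primary instance.
        try (Connection primary = DriverManager.getConnection(PRIMARY_URL, "app", "secret");
             Statement stmt = primary.createStatement()) {
            stmt.executeUpdate("INSERT INTO orders(id, status) VALUES (42, 'NEW')");
        }

        // Latency-sensitive reads can be served by the nearby replica.
        try (Connection replica = DriverManager.getConnection(REPLICA_URL, "app", "secret");
             Statement stmt = replica.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT status FROM orders WHERE id = 42")) {
            while (rs.next()) {
                System.out.println(rs.getString("status"));
            }
        }
    }
}
```

Keep in mind that replication is close to real time but still asynchronous, so a read from the replica issued immediately after a write to the primary may briefly return stale data.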

Is there some way to make a distributed table still work for queries when one of the shard servers is down?

There is a common case where we update ClickHouse's config, which requires restarting ClickHouse to take effect. During the restart, query services that depend on ClickHouse's distributed table return exceptions because the connection to the restarting server is lost.
So, as the title says, what I want is a way to make the distributed table still work for queries when one of the shard servers is down. Thanks.
I see two ways:
Since this server failure is transient, you can refactor your server-side code by adding a retry policy to your requests (for C# I would recommend Polly); a minimal Java retry loop is sketched after this list.
Use a proxy (load balancer) in front of CH (for example chproxy).
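As a rough sketch of the first option, a plain retry loop around a query could look like the following. The JDBC URL, table name, and retry parameters are placeholders, and it assumes a ClickHouse JDBC driver is on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class RetryingClickHouseQuery {

    // Placeholder connection string for one of the cluster nodes.
    private static final String URL = "jdbc:clickhouse://ch-node-1:8123/default";

    static long countEvents() throws SQLException, InterruptedException {
        final int maxAttempts = 5;
        long backoffMillis = 500;

        for (int attempt = 1; ; attempt++) {
            try (Connection conn = DriverManager.getConnection(URL);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT count() FROM events_dist")) {
                rs.next();
                return rs.getLong(1);
            } catch (SQLException e) {
                // Give up once the retry budget is exhausted.
                if (attempt >= maxAttempts) {
                    throw e;
                }
                // Wait a bit before retrying; a restarting node is usually back within seconds.
                Thread.sleep(backoffMillis);
                backoffMillis *= 2;
            }
        }
    }
}
```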
UPDATE
When one node in a cluster is restarting, a distributed table created over replicated tables should remain accessible (of course, requests shouldn't be sent to the restarting node itself).
Availability of data is achieved by using replication; therefore, you need to create Replicated*-tables over the materialized view and then create Distributed-tables over the Replicated*-tables.
Please look at the articles CH Data Distribution and Distributed vs Shard vs Replicated, and, as a working example (it is not your case), at CH Circular cluster topology.
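To make the setup above a bit more concrete, here is a hedged sketch of the kind of DDL involved, issued through JDBC to stay consistent with the earlier snippet. The cluster name, ZooKeeper path, table names, and schema are invented for the example and must match your own remote_servers configuration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ReplicatedDistributedSetup {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint; 'my_cluster' must be defined in remote_servers.
        try (Connection conn = DriverManager.getConnection("jdbc:clickhouse://ch-node-1:8123/default");
             Statement stmt = conn.createStatement()) {

            // Local replicated table: one replica per node, coordinated via ZooKeeper/Keeper.
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS events_local ON CLUSTER my_cluster (" +
                "  event_date Date, event_id UInt64, payload String" +
                ") ENGINE = ReplicatedMergeTree(" +
                "  '/clickhouse/tables/{shard}/events_local', '{replica}'" +
                ") ORDER BY (event_date, event_id)");

            // Distributed table over the replicated tables; a query against it can be
            // answered by any healthy replica of each shard, so a single restarting
            // node does not take the table offline.
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS events_dist ON CLUSTER my_cluster " +
                "AS events_local " +
                "ENGINE = Distributed(my_cluster, default, events_local, rand())");
        }
    }
}
```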

Oracle RAC and SCAN Listeners

I have a 4-node RAC architecture, and SCAN listeners are running on 3 nodes, since Oracle recommends a minimum of 3 SCAN listeners in the blog below.
http://satya-racdba.blogspot.in/2012/09/scan-in-oracle-rac-11g-r2.html
But is it necessary to configure a SCAN listener on the 4th node as well?
Is the 4th Node being picked up by the SCAN listeners?
How do I test it?
Oracle does not recommend a minimum of three SCAN listeners. Oracle states:
SCAN is a domain name registered to at least one and up to three IP addresses, either in the domain name service (DNS) or the Grid Naming Service (GNS).
You can find this information in Metalink Note:
Grid Infrastructure Single Client Access Name (SCAN) Explained (Doc ID 887522.1)
As an example of the Oracle SCAN architecture:
A new set of cluster processes called SCAN listeners runs on three nodes in a cluster (or on all nodes if there are fewer than three). If you have more than three nodes, regardless of how many, there will be at most three SCAN listeners. The database registers with the SCAN listeners through the remote_listener parameter in the init.ora/spfile. If any of these clustered processes fail, they are automatically restarted on a new node.
If you have to do an installation, you'd better review the Oracle documentation:
D Oracle Grid Infrastructure for a Cluster Installation Concepts
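To see SCAN from the client side, applications connect to the single SCAN name rather than to an individual node; DNS resolves the name to up to three IPs and the SCAN listeners hand the connection off to a local listener on a node that offers the requested service. A hedged JDBC sketch (SCAN name, port, service name, and credentials are made up) is below; running it repeatedly is also a simple way to check whether connections are being spread across all four instances, which addresses the testing part of the question:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ScanConnectionSketch {
    public static void main(String[] args) throws Exception {
        // Clients only ever use the SCAN name, never the per-node VIPs directly.
        String url = "jdbc:oracle:thin:@//rac-scan.example.com:1521/ORCLSVC";

        try (Connection conn = DriverManager.getConnection(url, "scott", "tiger");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT instance_name, host_name FROM v$instance")) {
            // Shows which RAC instance (and therefore which node) actually
            // served this particular connection.
            while (rs.next()) {
                System.out.println(rs.getString(1) + " on " + rs.getString(2));
            }
        }
    }
}
```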

MongoDB: let different client reads from different replicas

I have a heavy and large Mongo table with a lot of reads. One of the read clients is an offline process that periodically scans the table aggressively, while the other clients read the same table as an online service. I'd like to separate them. What I'm thinking is to have a dedicated replica node for the offline client to read from, and to let the other clients read from the remaining replicas. How do I do that?
You should consider marking one of the nodes as a hidden member of the replica set. It will receive all the replicated writes from the primary but won't receive any read traffic from your online service (as long as that service uses a proper replica-set-enabled connection string). Then, from your offline client, you can use a connection string which targets the hidden member directly:
http://docs.mongodb.org/manual/core/replica-set-hidden-member/
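As a hedged sketch of what the two clients could look like with the MongoDB Java driver, the snippet below uses invented host names, replica set name, database, and collection; the directConnection option is available in recent driver versions, while older ones achieve the same by listing only the hidden member's host:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class SplitReadClients {
    public static void main(String[] args) {
        // Online service: replica-set-aware connection string. The hidden member
        // never receives this traffic because hidden members are invisible to
        // normal read preferences.
        MongoClient onlineClient = MongoClients.create(
                "mongodb://db1.example.com,db2.example.com,db3.example.com/"
                        + "?replicaSet=rs0&readPreference=secondaryPreferred");

        // Offline scanner: connect straight to the hidden member;
        // directConnection=true forces the driver to talk to this host only.
        MongoClient offlineClient = MongoClients.create(
                "mongodb://hidden1.example.com:27017/?directConnection=true");

        MongoCollection<Document> online =
                onlineClient.getDatabase("app").getCollection("bigCollection");
        MongoCollection<Document> offline =
                offlineClient.getDatabase("app").getCollection("bigCollection");

        System.out.println("online count:  " + online.countDocuments());
        System.out.println("offline count: " + offline.countDocuments());
    }
}
```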

HBase: how put/get knows which region server to write to?

In HBase, how do the put/get operations know which region server the row should be written to?
When multiple rows are to be read, how are multiple region servers contacted and the results retrieved?
I assume your question is simple curiosity, since this behavior is abstracted away from the user and you normally shouldn't need to care.
In HBase, how do the put/get operations know which region server the row should be written to?
From the HBase documentation book:
The HBase client HTable is responsible for finding RegionServers that are serving the particular row range of interest. It does this by querying the .META. and -ROOT- catalog tables (TODO: Explain). After locating the required region(s), the client directly contacts the RegionServer serving that region (i.e., it does not go through the master) and issues the read or write request. This information is cached in the client so that subsequent requests need not go through the lookup process. Should a region be reassigned either by the master load balancer or because a RegionServer has died, the client will requery the catalog tables to determine the new location of the user region.
So the first step is looking up the location in the -ROOT- and .META. catalog tables to determine where the region is; the client then contacts that region server directly to do the work.
When multiple rows are to be read, how are multiple region servers contacted and the results retrieved?
There are two ways to read from HBase in general: scanners and gets.
If you run multiple gets, each one fetches its record individually, and each one possibly goes to a different region server.
A scanner will simply look for the start of the range and then move forward from there. Sometimes it needs to move to a different region server when it reaches the end of a region, but the client handles that behind the scenes. If there is some way to design the table such that your multiple gets become a single scan rather than a series of gets, you should hypothetically get better performance.
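A hedged sketch with the HBase Java client API (2.x-style; the table name, column family, qualifier, and row keys are invented) shows the two access paths; the client library resolves the right region server for each Get and for each region boundary the Scan crosses:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class GetVsScanSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("my_table"))) {

            // Point lookup: the client locates the region holding "row-0042"
            // (via the catalog/meta table, cached afterwards) and talks to that
            // region server directly.
            Get get = new Get(Bytes.toBytes("row-0042"));
            Result single = table.get(get);
            System.out.println(Bytes.toString(
                    single.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qual"))));

            // Range read: the scanner starts in the region containing the start
            // row and transparently hops to the next region server whenever it
            // crosses a region boundary.
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes("row-0000"))
                    .withStopRow(Bytes.toBytes("row-0100"));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}
```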
The BigTable paper provides the same scenario and explanation: "The client library caches tablet locations. If the client does not know the location of a tablet, or if it discovers that cached location information is incorrect, then it recursively moves up the tablet location hierarchy. If the client's cache is empty, the location algorithm requires three network round-trips, including one read from Chubby. If the client's cache is stale, the location algorithm could take up to six round-trips, because stale cache entries are only discovered upon misses (assuming that METADATA tablets do not move very frequently). Although tablet locations are stored in memory, so no GFS accesses are required, we further reduce this cost in the common case by having the client library prefetch tablet locations: it reads the metadata for more than one tablet whenever it reads the METADATA table."
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf
