What are the guidelines for creating Oracle NoSQL Database Storage Nodes (SNs)? Can we create multiple Storage Nodes on the same machine? If so, what are the trade-offs? I looked at the product documentation, but it is not clear.
Digging deeper, here is what I found:
It is recommended that Storage Nodes (SNs) be allocated one per node in the cluster for availability and performance reasons. If you believe that a given node has the I/O and CPU resources to host multiple Replication Nodes, the Storage Node's capacity parameter can be set to a value greater than one, and the system will know that multiple RNs may be hosted at this SN. This way, the system can:
ensure that each Replication Node in a shard is hosted on a different Storage Node, reducing a shard's vulnerability to failure
dynamically divide up memory and other hardware resources among the Replication Nodes
ensure that the master Replication Nodes, which are the ones that service write operations in a store, are distributed evenly among the Storage Nodes, both at startup and after any failovers.
If more than one SN is hosted on the same node, multiple SNs are lost if that node fails, and data may become inaccessible.
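To make the first point concrete, here is a toy Python sketch of the placement constraint (my own illustration, not Oracle NoSQL's actual allocation code): each shard's Replication Nodes must land on distinct Storage Nodes, and no Storage Node may host more RNs than its capacity allows.

    # Toy sketch (not Oracle NoSQL's allocation logic): place each shard's
    # Replication Nodes so that no two RNs of the same shard share a Storage
    # Node, while respecting each SN's capacity (how many RNs it may host).

    def place_replication_nodes(shards, sn_capacity):
        """shards: {shard_id: replication_factor}; sn_capacity: {sn_id: capacity}."""
        remaining = dict(sn_capacity)               # free RN slots per SN
        placement = {}                              # shard_id -> list of SN ids

        for shard, rf in shards.items():
            # Choose the rf least-loaded SNs that still have a free slot,
            # taking at most one RN of this shard per SN.
            candidates = sorted((sn for sn, free in remaining.items() if free > 0),
                                key=lambda sn: sn_capacity[sn] - remaining[sn])
            if len(candidates) < rf:
                raise ValueError(f"not enough Storage Nodes for shard {shard}")
            placement[shard] = candidates[:rf]
            for sn in candidates[:rf]:
                remaining[sn] -= 1
        return placement

    # Three shards with replication factor 3 across three SNs of capacity 3:
    # every SN ends up hosting exactly one RN of every shard.
    print(place_replication_nodes({"rg1": 3, "rg2": 3, "rg3": 3},
                                  {"sn1": 3, "sn2": 3, "sn3": 3}))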
You can set the capacity parameter for a Storage Node in several ways:
When using the makebootconfig command
When using the change-policy command
When using the plan change-params command.
Also, in very limited situations, such as for early prototyping and experimentation, it can be useful to create multiple SNs on the same node.
On a single machine, a Storage Node is uniquely identified by its root directory (KVROOT) plus a configuration file name, which defaults to "config.xml". This means you can create multiple SNs by creating a unique KVROOT directory for each SN. Usually, these would be on different nodes, but it is also possible to have them on a single node.
Please help me solve the following problem.
The following entities are given:
Application. Applications reside on storage, and they generate traffic through a service node.
Service. The service is divided into several nodes. Each node has access to local and/or shared storage.
Storage. This is where applications reside. It can be local (connected to only one service node) or shared by several nodes.
Rules:
Each application is placed on some particular storage, and that storage cannot be changed.
The service node for an application can be changed to another one, as long as the new service node has access to the application's storage.
For example, if App resides on local storage of Node0, it can only be served by Node0. But if App resides on storage shared0, it can be served by Node0, Node1 or Node2.
The problem is to find an algorithm to rebalance applications between service nodes, given that all applications are already placed on their datastores, and to make this rebalancing as fair as possible.
If we take the shared2 storage as an example, the solution seems trivial: we take the app counts for Node3 and Node4 and divide all apps equally between them.
But when it comes to shared1 it becomes more complicated, since Node2 also has access to the shared0 storage. So when rebalancing apps within the group [Node2, Node5], we also have to take into account apps from the group [Node0, Node1, Node2]. The groups [Node2, Node5] and [Node0, Node1, Node2] intersect, so rebalancing should be performed for all groups at once.
I suspect there is a well-known algorithm for this problem, but I still cannot find it.
I think the Hungarian Matching algorithm would fit your needs. However, it might be a simple enough problem to try your own approach.
If you separate all the disconnected graphs, you'll have some set of Shared storage units per graph, each set being associated with a collection of Apps. If you spread each of those Apps evenly across each Storage's associated Nodes, some Nodes will end up with more Apps than others; those Nodes will be connected to multiple Shared storage units.
If all vacant Nodes are filled, there should always be a transitive relationship between any two Nodes within a connected graph such that an App count on one can be decreased and an App count on the other can be increased, even if some intermediate displacements are needed. So, if you iteratively move an App along the path from the heaviest Node to the lightest Node, shortcutting if you reach a vacant Node, and swapping Apps at any intermediate Node as needed to continue along that path through one or more Shared storage units, you should be balanced when the counts of the heaviest and lightest Nodes differ by no more than one.
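Here is a rough Python sketch of that iterative idea (the data layout and names such as rebalance, apps, access, and assign are my own, not from any library): repeatedly move one app from the most loaded node towards a lighter node along a chain of shared storages, shifting apps at intermediate nodes, until the loads differ by at most one or no move can improve the balance.

    from collections import deque

    # Rough sketch of the approach above. apps: app -> storage it lives on;
    # access: storage -> set of service nodes that can reach it;
    # assign: app -> node currently serving it.

    def rebalance(apps, access, assign):
        nodes = {n for ns in access.values() for n in ns}

        while True:
            loads = {n: 0 for n in nodes}
            for a, n in assign.items():
                loads[n] += 1

            # A fuller version would try every maximally loaded node as a source.
            heavy = max(loads, key=loads.get)
            light = min(loads, key=loads.get)
            if loads[heavy] - loads[light] <= 1:
                return assign                       # balanced to within one app

            # BFS over service nodes: an edge u -> v exists if some app currently
            # served by u sits on a storage that v can also access (it could move).
            parent, via_app = {heavy: None}, {}
            queue, target = deque([heavy]), None
            while queue and target is None:
                u = queue.popleft()
                for a, n in assign.items():
                    if n != u:
                        continue
                    for v in access[apps[a]]:
                        if v in parent:
                            continue
                        parent[v], via_app[v] = u, a
                        if loads[v] <= loads[heavy] - 2:
                            target = v              # moving here improves balance
                            break
                        queue.append(v)
                    if target is not None:
                        break
            if target is None:
                return assign                       # no move can improve balance

            # Shift one app along the path: each hop's app advances one node, so
            # the heavy end loses one app and the target gains one.
            v = target
            while parent[v] is not None:
                assign[via_app[v]] = v
                v = parent[v]

    # Example roughly matching the question's topology: six apps on shared0
    # (Node0-Node2), two apps on shared1 (Node2, Node5), none on shared2.
    access = {"shared0": {"Node0", "Node1", "Node2"},
              "shared1": {"Node2", "Node5"},
              "shared2": {"Node3", "Node4"}}
    apps = {"app0": "shared0", "app1": "shared0", "app2": "shared0",
            "app3": "shared0", "app4": "shared0", "app5": "shared0",
            "app6": "shared1", "app7": "shared1"}
    assign = {a: "Node0" if s == "shared0" else "Node2" for a, s in apps.items()}
    print(rebalance(apps, access, assign))

In this example the eight apps end up spread two per node across Node0, Node1, Node2, and Node5; Node3 and Node4 stay empty because no app can legally move to them.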
I was wondering how to set the replica parameter properly when starting a TDengine cluster to balance storage use and high availability. According to the TDengine documentation, the default value of replica is 1, which means no copies for each vnode (the vGroup size should be 1 as well), and replica can be changed dynamically to maintain high availability of the cluster. However, the extra vnode copies have to be physically created when starting up with multiple replicas. So the question arises: how should a real company determine the value of replica to increase availability without taking on too much overhead (storage and performance) when using a TDengine cluster?
Replication means keeping a copy of the same data on multiple machines that are connected via a network. There are several reasons you might want to replicate data:
To keep data geographically close to your users (and thus reduce latency)
To allow the system to continue working even if some of its parts have failed (and thus increase availability)
To scale out the number of machines that can serve read queries (and thus increase read throughput)
(Referenced from DDIA, Designing Data-Intensive Applications.)
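As a back-of-the-envelope illustration (my own, not an official TDengine sizing rule): raw storage grows linearly with replica, and under a majority/quorum-style failover rule a group of replica copies tolerates roughly (replica - 1) // 2 simultaneous failures, which is one reason odd values like 3 are commonly chosen.

    # Back-of-the-envelope sketch (my own illustration, not TDengine guidance).

    def replica_tradeoff(data_size_gb, replica):
        raw_storage_gb = data_size_gb * replica      # every copy stores the data in full
        tolerated_failures = (replica - 1) // 2      # assumes a majority/quorum rule
        return raw_storage_gb, tolerated_failures

    for r in (1, 2, 3):
        storage, failures = replica_tradeoff(500, r)
        print(f"replica={r}: ~{storage} GB raw for 500 GB of data, "
              f"tolerates {failures} node failure(s)")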
I have been trying to understand how replication is handled in Oracle Coherence distributed caching if a member node goes down.
Say my Coherence cluster has three nodes: A, B, and C. As per my understanding, each node has its own backup.
Is the backup data stored on disk? And if node C goes down, does the Coherence distributed caching algorithm retrieve the data from node C's backup and distribute it equally among the other two nodes?
Could someone please confirm my understanding?
Also, as per my understanding, each node handles only a piece of the data. Is it possible for a node to get a request for data that it does not handle? How is such a scenario handled in Oracle Coherence distributed caching?
The backup is simply stored on different node(s). With a backup count of one (the default), two nodes will hold the same piece of data: one of them acts as the primary node for the data, and the other as the backup node.
If a node fails, it becomes unreachable and the other nodes become aware of this. Once they are aware of it, each node that holds a "backup" piece of the failed node's data is promoted to primary for that piece of data, and each of these pieces gets a new backup on one of the surviving nodes. If the failed node was responsible for backup data, the primary node for that data simply elects a new node to hold the new backup.
Each node maintains a kind of index that lets it map any stored piece of data to the node responsible for it. It is quite possible for a node to get a request for data it is not responsible for. When that happens in a distributed cache, the node requests the piece of data from the responsible node and passes it back to the requester. There is at most one extra network hop.
To better understand how distributed caches work in Coherence, see: Introduction to Coherence Caches.
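To make the ownership map and the single extra hop concrete, here is a tiny conceptual sketch in plain Python (not the Coherence API; the partition count, hashing, and the Node class are arbitrary illustrations):

    import zlib

    # Conceptual sketch only: every node shares the partition -> owner mapping,
    # so a request that lands on a node that does not own the key costs at most
    # one extra hop to the owner.

    PARTITIONS = 257                      # arbitrary count, just for the sketch

    class Node:
        def __init__(self, name, ownership):
            self.name = name
            self.ownership = ownership    # shared map: partition id -> primary node
            self.storage = {}             # data this node owns as primary

        def get(self, key):
            partition = zlib.crc32(key.encode()) % PARTITIONS
            owner = self.ownership[partition]
            if owner is self:
                return self.storage.get(key)     # served locally
            return owner.storage.get(key)        # exactly one extra hop

    # Wire up a three-node "cluster" and spread partition ownership round-robin.
    ownership = {}
    nodes = [Node(name, ownership) for name in ("A", "B", "C")]
    for p in range(PARTITIONS):
        ownership[p] = nodes[p % len(nodes)]

    # Store a value on its owner, then read it through every node in the cluster.
    key, value = "order:42", {"total": 99.5}
    ownership[zlib.crc32(key.encode()) % PARTITIONS].storage[key] = value
    for node in nodes:
        print(f"read via {node.name}: {node.get(key)}")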
I was exploring the NiFi documentation. I must agree that it is one of the better-documented open-source projects out there.
My understanding is that the processor runs on all nodes of the cluster.
However, I was wondering how the content is distributed among cluster nodes when we use content-pulling processors like FetchS3Object, FetchHDFS, etc. With a processor like FetchHDFS or FetchSFTP, will all nodes make a connection to the source? Does NiFi split the content and fetch from multiple nodes, or does one node fetch the content and load-balance it in the downstream queues?
I think this document has an answer to your question:
https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html
For other file stores the idea is the same.
will all nodes make connection to the source?
Yes. If you did not limit your processor to run only on the primary node, it runs on all nodes.
The answer by #dagget has traditionally been the approach to handle this situation, often referred to as the "list + fetch" pattern. The List processor runs on the primary node only, the listings are sent to an RPG (Remote Process Group) to redistribute them across the cluster, and an input port receives the listings and connects to a Fetch processor that runs on all nodes, fetching in parallel.
In NiFi 1.8.0 there are now load-balanced connections, which remove the need for the RPG. You would still run the List processor on the primary node only, but then connect it directly to the Fetch processors and configure the queue in between to load balance.
I am new to Couchbase and looking into replication. One thing I am trying to figure out is how Couchbase handles replication conflicts between two caches. That is:
There are two Couchbase servers, S1 and S2, that are added/replicated together, and those servers are located in different geographical locations.
There are also two clients (C1 and C2). C1 writes to S1 and C2 writes to S2, using the same key but different objects (C1 caches an object called Obj1, C2 caches Obj2), at the same time.
My question is: what is the final value for that key in the cluster? (What do S1 and S2 hold for that key?)
Writes in Couchbase
Ignoring replication for a moment, here is how writes work in a single Couchbase cluster.
A key in Couchbase is hashed to a vbucket (shard). That vbucket only ever lives on one node in the cluster, so there is only ever one writable copy of the data. When two clients write to the same key, the client that wrote last "wins". The Couchbase SDKs expose a number of operations to help with this, such as add() and cas().
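As a conceptual sketch (plain Python, not the Couchbase SDK; the TinyStore class and vbucket_for function are my own names, and the CRC32-based mapping is simplified), this shows a key mapping to a vbucket and how a cas()-style write rejects a stale update after another client has written last:

    import zlib

    NUM_VBUCKETS = 1024

    def vbucket_for(key: str) -> int:
        return zlib.crc32(key.encode()) % NUM_VBUCKETS   # simplified mapping

    class TinyStore:
        def __init__(self):
            self.data = {}        # key -> (value, cas_token)

        def get(self, key):
            value, cas = self.data.get(key, (None, 0))
            return value, cas

        def set(self, key, value):
            """Unconditional write: last writer wins."""
            _, cas = self.data.get(key, (None, 0))
            self.data[key] = (value, cas + 1)

        def cas_set(self, key, value, cas):
            """Optimistic write: fails if someone else changed the key in between."""
            _, current = self.data.get(key, (None, 0))
            if cas != current:
                return False
            self.data[key] = (value, current + 1)
            return True

    store = TinyStore()
    print("key 'user:1' lives in vbucket", vbucket_for("user:1"))
    store.set("user:1", "Obj1")                 # client C1 writes
    _, token = store.get("user:1")
    store.set("user:1", "Obj2")                 # client C2 writes last and wins
    print(store.cas_set("user:1", "Obj1-retry", token))   # C1's CAS retry fails: False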
Internal replication
Couchbase does have replica copies of the data. These copies are not writable by the end user and only become active when a node fails. The replication used is a one-way sync from the active vbucket to the replica vbucket. This is done memory to memory and is extremely fast. As a result, for intra-cluster replication you do not have to worry about conflict resolution. Do understand that if there is a failover before data has been replicated, that data is lost; again, the SDKs expose a number of operations to ensure a write has been replicated to N nodes. See the observe commands.
External replication
External replication in Couchbase is called XDCR (Cross Datacenter Replication), where data is synced between two different clusters. It is best practice not to write the same key in both clusters at the same time; instead, have a key space per cluster and use XDCR for disaster recovery. The Couchbase manual explains the conflict resolution very well, but basically the key in the cluster that has been updated the most will win.
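A rough simplification of that "updated the most wins" rule (my own illustration; real XDCR compares revision metadata and applies further tie-breakers):

    # Each doc is (value, revision_count); the higher revision count wins.
    def resolve_conflict(local_doc, remote_doc):
        return local_doc if local_doc[1] >= remote_doc[1] else remote_doc

    # S1 updated the key 3 times, S2 updated it once: S1's version wins on both sides.
    s1_version = ("Obj1-v3", 3)
    s2_version = ("Obj2-v1", 1)
    print(resolve_conflict(s1_version, s2_version))   # ('Obj1-v3', 3)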
If you would like to read more about clustered systems, the CAP theorem would be the place to start. Couchbase is a CP system.