TCP/IP Connection between multiple nodes for a distributed framework - c++11

I am trying to achieve TCP/IP connections between multiple nodes. I understand how TCP connections work but I cannot have a server-client based approach here since every node must connect with every other node. How should I achieve this?
(I cannot use Nanomessage or ZeroMQ libraries)
Please help. Noob here.
Thanks!

Firstly, the requirement that every node must connect with every other node doesn't mean we can't use the client/server (C/S) model.
You can build a central controller as a server: every node connects to the server to get a table of all the other nodes, and then connects to them directly.
Actually, this is essentially the structure of ICE, which is used for NAT traversal. You may be able to use ICE directly rather than writing one yourself, because if your nodes are not all on the same LAN, you will also have to deal with NAT traversal.
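To make the mesh idea concrete, here is a minimal C++11 sketch assuming plain POSIX sockets: every node runs an accept loop (its "server" side) and also dials out to each peer it has learned about, e.g. from the central registry, so each node is simultaneously client and server. The addresses, ports, and message format are placeholders, and error handling is kept minimal.

    // Sketch only: each node listens for inbound peers and also dials outbound peers,
    // so every node acts as both server and client.
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <thread>
    #include <utility>
    #include <vector>

    // "Server" side: accept connections from peers that dial us.
    void accept_loop(uint16_t listen_port) {
        int srv = socket(AF_INET, SOCK_STREAM, 0);
        int yes = 1;
        setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = INADDR_ANY;
        addr.sin_port = htons(listen_port);
        bind(srv, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
        listen(srv, 16);
        for (;;) {
            int peer = accept(srv, nullptr, nullptr);
            if (peer < 0) break;
            std::thread([peer] {                         // one thread per inbound peer
                char buf[256];
                ssize_t n = read(peer, buf, sizeof(buf));
                if (n > 0) std::cout << "got: " << std::string(buf, n) << "\n";
                close(peer);
            }).detach();
        }
        close(srv);
    }

    // "Client" side: dial a peer learned from the registry's node table.
    int connect_to_peer(const std::string& host, uint16_t port) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        inet_pton(AF_INET, host.c_str(), &addr.sin_addr);
        if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }

    int main() {
        std::thread server(accept_loop, 9000);           // this node's listening side
        // Peer table; in practice fetched from the central registry node.
        std::vector<std::pair<std::string, uint16_t>> peers = {
            {"10.0.0.2", 9000}, {"10.0.0.3", 9000}};
        for (const auto& p : peers) {
            int fd = connect_to_peer(p.first, p.second);
            if (fd >= 0) {
                const char msg[] = "hello from node";
                write(fd, msg, sizeof(msg) - 1);
                close(fd);
            }
        }
        server.join();
    }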

Related

SWIM protocol: how does a new node get an address of an existing node in a cluster?

Background:
I've been looking into microservices, more specifically service discovery, and
one thing that's interested me is the SWIM protocol. But I'm a little confused when it comes to new nodes joining the network.
How does a new node joining the cluster get an address of one or more nodes of the existing cluster, without there being a single point of failure?
If you need any further information or have any questions just let me know.
Please check out scalecube, which implements microservices based on the SWIM protocol with a gossip protocol improvement:
https://github.com/scalecube/scalecube
You can find references here:
https://github.com/scalecube/scalecube/wiki/Distributed-Computing-Research
In general, when a new node joins the network, it joins via one of the already-running cluster nodes (seeds or members), and the cluster gossips about the new member; the gossip protocol "infects" the cluster with the membership information.
Usually there is a set of nodes that serve as the entry point to the cluster, called seeds. They can be well-known members or be discovered using different methods, such as a DNS name: when a new member joins the cluster, it can look up a host name such as "seed", which resolves to the current seed IP or one of the seed IPs.
In a microservices architecture, the seeds can also be the API gateways or specific nodes that act as seeds. It's usually best to choose as seeds the members that are least subject to changes and upgrades.
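As a rough sketch of that seed-lookup step (plain C++11 with POSIX getaddrinfo, not the scalecube API), a joining node can resolve a well-known seed host name and announce itself to whichever seed address DNS returns; the host name, port, and JOIN message format below are made up for illustration.

    // Sketch: resolve a well-known seed name and connect to announce ourselves.
    #include <netdb.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <iostream>
    #include <string>

    int main() {
        // Hypothetical seed host name; DNS may return any of the current seed IPs.
        const char* seed_host = "seed.cluster.local";
        const char* seed_port = "7777";

        addrinfo hints{};
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;
        addrinfo* res = nullptr;
        if (getaddrinfo(seed_host, seed_port, &hints, &res) != 0) return 1;

        int fd = -1;
        for (addrinfo* p = res; p != nullptr; p = p->ai_next) {   // try each resolved seed
            fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
            if (fd >= 0 && connect(fd, p->ai_addr, p->ai_addrlen) == 0) break;
            if (fd >= 0) { close(fd); fd = -1; }
        }
        freeaddrinfo(res);
        if (fd < 0) { std::cerr << "no seed reachable\n"; return 1; }

        // Made-up join message; the seed then gossips our membership to the cluster.
        const char join_msg[] = "JOIN 10.0.0.7:7777";
        write(fd, join_msg, sizeof(join_msg) - 1);
        close(fd);
    }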
I have written a post discussing the topic:
https://www.linkedin.com/pulse/swim-cluster-membership-protocol-ronen-nachmias/

Redis failover and Partitioning?

I am using client-side partitioning on a 4-node Redis setup. The writes and reads are distributed among the nodes. Redis is used as a persistence layer for volatile data as well as a cache by different parts of the application. We also have a Cassandra deployment for persisting non-volatile data.
On redis we peak at nearly 1k ops/sec (instantaneous_ops_per_sec). The load is expected to increase with time. There are many operations where we query for a non-existent key to check whether data is present for that key.
I want to achieve following things:
Writes should failover to something when a redis node goes down.
There should be a backup for reading the data lost when the redis node went down.
If we add more redis nodes in the future (or a dead node comes back up), reads and writes should be re-distributed consistently.
I am trying to figure out a suitable design to handle the above scenario. I have thought of the following options:
Create hot slaves for the existing nodes and swap them in as and when a master goes down. This does not address the third point.
Write an application layer to persist data in both Redis and Cassandra, allowing a lazy-load path for reads when a Redis node goes down. This approach has the overhead of writing to two stores.
Which is a better approach? Is there a suitable alternative to the above approaches?
A load of 1k ops/s is far below the capabilities of Redis. You would need to increase it by two or more orders of magnitude before you come close to overloading it. If you aren't expecting to exceed 50-70,000 ops/second and are not exceeding your available single-node memory, I really wouldn't bother with sharding your data, as it is more effort than it is worth.
That said, I wouldn't do this sharding client-side. I'd look at something like Twemproxy/Nutcracker to do it for you. This provides a path to Redis Cluster as well as the ability to scale out connections, and it provides transparent client-side support for failover scenarios.
To handle failover in the client you would want to set up two instances per slot (in your description, a write node), with one slaved to the other. Then you would run a Sentinel constellation to manage the failover.
Then you would need your client code to connect to Sentinel to get the current master connectivity for each slot. This also means client code which can reconnect to the newly promoted master when a failover occurs. If you have load balancers available you can place your Redis nodes behind one or more (preferably two, with failover) and eliminate the client reconnection requirement, but you would then need to implement a Sentinel script or monitor to update the load balancer configuration on failover.
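A minimal sketch of that Sentinel lookup, assuming the hiredis C client and a Sentinel listening on 127.0.0.1:26379 that monitors a master named "mymaster" (both placeholders): ask Sentinel for the current master address, connect to it, and repeat the lookup whenever the connection drops or a failover is detected.

    // Sketch: ask Sentinel which node is currently the master for a given slot.
    // Assumes the hiredis client library; sentinel address and master name are placeholders.
    #include <hiredis/hiredis.h>
    #include <iostream>
    #include <string>

    // Returns "host:port" of the current master, or "" on failure.
    std::string master_for(const char* sentinel_host, int sentinel_port,
                           const char* master_name) {
        redisContext* c = redisConnect(sentinel_host, sentinel_port);
        if (c == nullptr || c->err) { if (c) redisFree(c); return ""; }
        redisReply* reply = static_cast<redisReply*>(
            redisCommand(c, "SENTINEL get-master-addr-by-name %s", master_name));
        std::string result;
        if (reply && reply->type == REDIS_REPLY_ARRAY && reply->elements == 2) {
            result = std::string(reply->element[0]->str) + ":" + reply->element[1]->str;
        }
        if (reply) freeReplyObject(reply);
        redisFree(c);
        return result;
    }

    int main() {
        std::string addr = master_for("127.0.0.1", 26379, "mymaster");
        if (addr.empty()) { std::cerr << "sentinel lookup failed\n"; return 1; }
        std::cout << "current master: " << addr << "\n";
        // On a dropped connection or failover, repeat the lookup and reconnect.
    }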
For the Sentinel constellation, a standard 3-node setup will work fine. If you do your load balancing with software on nodes you control, it would be best to have at least two Sentinel nodes on the load balancers to provide natural connectivity tests.
Given your description, I would test running a single master with multiple read slaves, and instead of hashing in client code, distribute reads to the slaves and writes to the master. This provides a much simpler setup and likely less complex code on the client side. Scaling read slaves is easier and simpler, and as you describe it the vast majority of ops will be read requests, so it fits your described usage pattern precisely.
You would still need to use Sentinel to manage failover, but that complexity exists either way, so this is a net decrease in code and code complexity. For a single master, Sentinel is almost trivial to set up; the caveats are code to either manage a load balancer or virtual IP, or to handle Sentinel discovery in the client code.
You are opening the distributed database Pandora's box here.
My best suggestion is: don't do it. Don't implement your own Redis cluster unless you can afford losing data and/or some downtime.
If you can afford to run not-yet-production-ready software, my suggestion is to have a look at the official Redis Cluster implementation; if your requirements are loose enough for you to roll your own cluster implementation, chances are that you can afford to use Redis Cluster directly, which has a community behind it.
Have you considered looking at different software than Redis? Cassandra, Riak, DynamoDB, and Hadoop are great examples of mature distributed databases that would do what you asked for out of the box.

How is connection pooling/distribution done across a Vertica cluster?

How are connection pooling and distribution handled across a Vertica cluster?
I am trying to understand how connections are handled in Vertica, e.g. the way Oracle handles its connections through its listener, and how connections are balanced inside the cluster (for better distribution).
Vertica's process of handling a connection is basically as follows:
A node receives the connection, making it the Initiator Node.
The initiator node generates the query execution plan and distributes it to the other nodes.
The nodes fill in any node-specific details of the execution plan.
The nodes execute the query.
(ignoring some stuff here)*
The nodes send the result set back to the initiator node
The initiator node collects the data and does final aggregations
The initiator node sends the data back to the client.
The recommended way to connect through Vertica is through a load balancer so no single node becomes a failure point. Vertica itself does not distribute connections between nodes, it distributes the query to the other nodes.
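If a dedicated load balancer isn't available, a simple client-side fallback (a generic TCP sketch, not a feature of Vertica's own drivers) is to keep a list of node addresses and rotate which one you try first, so no single node becomes the fixed entry point. The node IPs below are placeholders; 5433 is Vertica's default client port.

    // Sketch: try cluster nodes starting from a rotating index so no single
    // node is the fixed entry point for new connections.
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdint>
    #include <ctime>
    #include <iostream>
    #include <string>
    #include <vector>

    int try_connect(const std::string& host, uint16_t port) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        inet_pton(AF_INET, host.c_str(), &addr.sin_addr);
        if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) return fd;
        close(fd);
        return -1;
    }

    int main() {
        // Placeholder node addresses.
        std::vector<std::string> nodes = {"10.0.0.11", "10.0.0.12", "10.0.0.13"};
        // Vary the starting node so connections spread across the cluster.
        size_t start = static_cast<size_t>(time(nullptr)) % nodes.size();
        for (size_t i = 0; i < nodes.size(); ++i) {
            const std::string& host = nodes[(start + i) % nodes.size()];
            int fd = try_connect(host, 5433);            // 5433: default Vertica client port
            if (fd >= 0) {
                std::cout << "connected to " << host << "\n";
                close(fd);
                return 0;
            }
        }
        std::cerr << "no node reachable\n";
        return 1;
    }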
I'm not well versed in Oracle or the details of how systems do their data connection process, so hopefully I'm not too far off the mark of what you're looking for.
From /my/ experience, each node can handle a limited number of connections. Once you try to open more than that against a node, it will reject the connection. I experienced that with a map-reduce job that connected in the map function.
*Depending on the query/data/partitioning, it may need to do some data transfer behind the scenes to complete the query for each node. This slows the query down when it happens.

Leader Election Algorithm

I am exploring various architectures in cluster computing. Some of the popular ones are:
Master-Slave.
RPC
...
In master-slave, the normal way is to set one machine as the master and a bunch of machines as slaves controlled by the master. One particular algorithm here got me interested: the leader election algorithm, which has a certain randomness in selecting which of the machines will become the master.
My question is - Why would anyone want to elect a master machine this way? What advantages does this approach have compared to manually selecting a machine as master?
There are some advantages to these algorithms:
The selection of the leader node is done dynamically, so you can, for example, select the node with the highest performance, and the arrival of new nodes may lead to a better choice.
Another benefit of dynamically selecting the leader is that if one of the nodes has a major fault (for example, the machine shuts down), you have other choices and there is no need to change the leader manually.
If you select a node manually, you have to manually configure all the other nodes to use it, and also set their time manually, and so on; these algorithms help you handle such timing issues.
For example (not very relevant): why is DHCP used in most cases? Because too many configs would otherwise have to be handled manually; such algorithms take care of them.
The main idea of using such algorithms is to get rid of additional configuration and to add some flexibility and stability to the whole system. But usually (in HPC/MPI applications) the master node is selected manually.
Suppose your master selection algorithm is quite simple - get the list of available systems and select the one with the highest IP address. In this case you can easily start a new process on any of your nodes and it will automatically find the master node.
One nice example of this idea is the WCCP protocol's "designated proxy" selection algorithm, where the number of proxies can vary and the master node is selected at runtime.
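As a toy illustration of that "highest IP address" rule (a sketch that assumes every node already has the same list of live peers), each node can compute the master locally and all nodes will agree without exchanging any extra messages:

    // Sketch: every node independently picks the peer with the highest IPv4 address
    // as master; with the same membership list, all nodes reach the same answer.
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <vector>

    uint32_t ip_key(const std::string& dotted) {
        in_addr a{};
        inet_pton(AF_INET, dotted.c_str(), &a);
        return ntohl(a.s_addr);              // host byte order, so comparison is numeric
    }

    std::string elect_master(const std::vector<std::string>& live_nodes) {
        return *std::max_element(live_nodes.begin(), live_nodes.end(),
                                 [](const std::string& x, const std::string& y) {
                                     return ip_key(x) < ip_key(y);
                                 });
    }

    int main() {
        // Placeholder membership list, e.g. gathered by each node at startup.
        std::vector<std::string> nodes = {"10.0.0.5", "10.0.0.17", "10.0.0.9"};
        std::cout << "master: " << elect_master(nodes) << "\n";   // prints 10.0.0.17
        // If the master is later detected as dead, remove it from the list and re-run.
    }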
Consider a network of nodes where it is vital to have one leader node at all times. If the current leader dies, the network somehow has to choose another leader. Given this scenario and requirement, there are two possible ways to do it.
The central-system approach, where there is a central node deciding who will be the leader. If the current leader dies, this central node decides who should take over the leader role. But this is a single point of failure: if the central node, which is responsible for deciding the leader, goes down, there is no one left to select a new leader when the current leader dies.
In the same scenario we can instead use distributed leader election, where all the nodes come to a consensus on who the leader should be. We do not need a central node that decides who the leader should be, which eliminates the single point of failure. When the leader node dies, there is a way to detect the node failure, and then every node starts a distributed leader election algorithm and they mutually come to a consensus on electing a leader.
So, in short, when you have a system with no central control, probably because the system is meant to be scalable without a single point of failure, leader election algorithms are used in those systems to choose a node.

What cluster node should be active?

There is a cluster and a unix network daemon. The daemon is started on each cluster node, but only one instance can be active.
When the active daemon breaks (whether the program crashes or the node fails), another node should become active.
I could think of a few possible algorithms, but I think there is already existing research on this and some ready-to-go algorithms? Am I right? Can you point me to the answer?
Thanks.
JGroups is a Java network stack which includes DistributedLockManager-style support and cluster voting capabilities. These allow any number of unix daemons to agree on who should be active. All of the nodes could be trying to obtain a lock (for example), and only one will succeed until that application or node fails.
JGroups also has the concept of a coordinator of a specific communication channel. Only one node can be the coordinator at a time, and when a node fails, another node becomes the coordinator. It is simple to test whether you are the coordinator, in which case you would be active.
See: http://www.jgroups.org/javadoc/org/jgroups/blocks/DistributedLockManager.html
If you are going to implement this yourself there is a bunch of stuff to keep in mind:
Each node needs to have a consistent view of the cluster.
All nodes will need to inform all of the rest of the nodes that they are online -- maybe with multicast (see the sketch after this list).
Nodes that go offline (because of app or node failure) will need to be removed from all other nodes' "view".
You can then have the node with the lowest IP (or something similar) be the active node.
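A minimal sketch of the multicast announcement idea above, assuming IPv4 multicast works on your network; the group address, port, and node ID are placeholders, and a real daemon would also expire nodes that stop announcing and then apply the lowest-IP rule to the surviving view.

    // Sketch: announce liveness over IPv4 multicast and listen for peers' announcements.
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <chrono>
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <thread>

    const char* kGroup = "239.0.0.42";   // placeholder multicast group
    const uint16_t kPort = 30000;        // placeholder port

    // Periodically tell the group that this node is online.
    void announce(const std::string& my_id) {
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        sockaddr_in grp{};
        grp.sin_family = AF_INET;
        grp.sin_port = htons(kPort);
        inet_pton(AF_INET, kGroup, &grp.sin_addr);
        for (;;) {
            sendto(s, my_id.c_str(), my_id.size(), 0,
                   reinterpret_cast<sockaddr*>(&grp), sizeof(grp));
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    }

    // Listen for announcements from all nodes; each message updates our "view".
    void listen_loop() {
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        int yes = 1;
        setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
        sockaddr_in local{};
        local.sin_family = AF_INET;
        local.sin_addr.s_addr = INADDR_ANY;
        local.sin_port = htons(kPort);
        bind(s, reinterpret_cast<sockaddr*>(&local), sizeof(local));
        ip_mreq mreq{};
        inet_pton(AF_INET, kGroup, &mreq.imr_multiaddr);
        mreq.imr_interface.s_addr = INADDR_ANY;
        setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
        char buf[64];
        for (;;) {
            ssize_t n = recvfrom(s, buf, sizeof(buf), 0, nullptr, nullptr);
            if (n > 0) std::cout << "alive: " << std::string(buf, n) << "\n";
            // A real daemon would timestamp entries, drop nodes that go silent,
            // and mark itself active if it holds the lowest IP in the current view.
        }
    }

    int main() {
        std::thread rx(listen_loop);
        announce("10.0.0.5");            // placeholder: this node's own address
        rx.join();
    }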
If this isn't appropriate, then you will need some sort of voting exchange so the nodes can agree on who is active. Something like: http://en.wikipedia.org/wiki/Two-phase_commit_protocol
