I'm reading around Redis at the moment and trying to get a good understanding of what a 'node' is in terms of how Redis works. Am I right to think of it in the same way as an endpoint?
In Redis' context, a node is a server running one or more redis-server processes.
Endpoint is a network address through which you can access one or more such processes, depending on how Redis is clustered.
When using the open source Redis cluster, an endpoint is any of the processes - meaning a node's address and the port that the process listens to. Redis client libraries use the protocol to interrogate the clustered redis-server process about other members of the cluster (again, processes listening on ports on nodes), so they can establish connections to other endpoints accordingly.
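To make that concrete, here is a minimal sketch of how a cluster-aware client uses endpoints, assuming the Jedis Java client (the question does not name a client, and the addresses and port below are placeholders): you seed it with one or two known endpoints, and it discovers the rest of the cluster's endpoints by interrogating those processes.

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;
import java.util.HashSet;
import java.util.Set;

public class ClusterEndpoints {
    public static void main(String[] args) {
        // Seed endpoints: a node's address plus the port its redis-server process listens on.
        Set<HostAndPort> seedEndpoints = new HashSet<>();
        seedEndpoints.add(new HostAndPort("10.0.0.1", 7000)); // process on node 1
        seedEndpoints.add(new HostAndPort("10.0.0.2", 7000)); // process on node 2

        // The client asks these processes for the rest of the cluster topology
        // and routes each command to the endpoint that owns the key's hash slot.
        try (JedisCluster cluster = new JedisCluster(seedEndpoints)) {
            cluster.set("greeting", "hello");
            System.out.println(cluster.get("greeting"));
        }
    }
}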
Disclaimer: it appears that you're asking about AWS ElastiCache, which may or may not be using the OSS implementation in whole or in part. I do not claim to have any knowledge on that subject.
It's a chunk of temporary memory (RAM) to which a network interface is attached. It's the smallest unit in which frequently accessed data is stored, following a lazy-loading or write-through strategy. A collection of such nodes, with a predefined Redis process running on each node, is called a cluster.
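As a rough illustration of those two strategies (a sketch only, assuming the Jedis Java client; loadFromDb and writeToDb are hypothetical database calls):

import redis.clients.jedis.Jedis;

public class CachingStrategies {

    // Lazy loading (cache-aside): read from the cache first, fall back to the database on a miss.
    static String readUser(Jedis cache, String id) {
        String value = cache.get("user:" + id);
        if (value == null) {
            value = loadFromDb(id);                 // hypothetical database call
            cache.setex("user:" + id, 3600, value); // keep it on the node for an hour
        }
        return value;
    }

    // Write-through: every write goes to the database and the cache together.
    static void writeUser(Jedis cache, String id, String value) {
        writeToDb(id, value);                       // hypothetical database call
        cache.setex("user:" + id, 3600, value);
    }

    static String loadFromDb(String id) { return "row-" + id; }
    static void writeToDb(String id, String value) { /* no-op for this sketch */ }
}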
More on nodes:
https://redis.io/commands/cluster-nodes/
Related
Recently I started learning Redis and have been able to do everything, from a learning perspective, on 32-bit Windows. I am a .NET developer and set up caching with Redis using the ServiceStack client in a Web API setup. I have been able to successfully run a Redis cluster of 4 masters and 4 slaves, and was wondering how I can make that work in conjunction with the ServiceStack client.
My main concern is: if the master that I connect my client to goes down, how can the client automatically connect to some other available slave that takes over, given that the port of that slave is going to be different? So failover is working at the Redis level, but how does the client handle it?
I recreated the mentioned scenario using the Redis command line interface, but when I took the master down, the interface just stopped responding, as if everything was going into a black hole. So, in my experience, the CLI does not automatically handle failover as a client.
I have started studying the StackExchange.Redis client as well, but still have the same question.
I am using the Redis distribution provided by Microsoft for learning purposes, available on GitHub (sorry, I cannot provide a link as I am new here and do not have sufficient reputation points).
Redis Sentinels are additional Redis processes which monitor the health of your Redis master and slaves and take care of performing automatic failover when they detect that your master instance is down. The Redis Config project provides a quick way to set up a popular Redis Sentinel configuration.
The ServiceStack.Redis client supports Redis Sentinel and implements the recommended client strategy, which is what enables it to automatically recover after a failover: it asks one of the Sentinels for the next available address to connect to and resumes operations with one of the available instances.
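The question is about the ServiceStack.Redis (C#) client, but the same Sentinel-aware pattern, sketched here in Java with the Jedis client purely for illustration (the Sentinel addresses and the "mymaster" name are placeholders), shows the idea: the application is configured with the Sentinel addresses rather than the master's, and asks the Sentinels which instance is currently the master.

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;
import java.util.HashSet;
import java.util.Set;

public class SentinelAwareClient {
    public static void main(String[] args) {
        Set<String> sentinels = new HashSet<>();
        sentinels.add("127.0.0.1:26379");
        sentinels.add("127.0.0.1:26380");
        sentinels.add("127.0.0.1:26381");

        // "mymaster" is the master name configured in sentinel.conf
        try (JedisSentinelPool pool = new JedisSentinelPool("mymaster", sentinels)) {
            try (Jedis redis = pool.getResource()) {
                redis.set("key", "value");
            }
            // After a failover the pool asks the Sentinels for the newly promoted
            // master, so subsequent getResource() calls keep working.
        }
    }
}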
You can learn more about Redis Sentinel in the official Documentation.
I am trying to debug an issue where cluster islands are getting formed.
I am checking if there is a network issue.
Which port do the cluster members use for gossip in an Akka actor cluster?
It depends on your configuration - it's the TCP port that you set Akka up to listen on. For more details refer to the Cluster configuration docs (see the port and seed-nodes values).
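As a sketch, the relevant settings look like this (Java, using Typesafe Config; the exact key names depend on your Akka version - classic Netty remoting is shown here, while Artery uses akka.remote.artery.canonical.* - and "MyCluster", host1 and port 2551 are placeholders):

import akka.actor.ActorSystem;
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class ClusterNode {
    public static void main(String[] args) {
        Config config = ConfigFactory.parseString(
            "akka.actor.provider = \"akka.cluster.ClusterActorRefProvider\"\n" +
            "akka.remote.netty.tcp.hostname = host1\n" +
            "akka.remote.netty.tcp.port = 2551\n" +   // the port cluster gossip travels over
            "akka.cluster.seed-nodes = [\"akka.tcp://MyCluster@host1:2551\"]\n")
            .withFallback(ConfigFactory.load());

        // Cluster membership gossip runs over this ActorSystem's remoting port.
        ActorSystem system = ActorSystem.create("MyCluster", config);
    }
}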
Broader hints on cluster partitions:
If you ended up with a cluster partition in an Akka cluster, it most likely means you're using the auto-downing feature. It's not recommended for production use, as it's rather flaky - it relies on a simple timeout-based mechanism. For more advanced downing mechanisms you can look into the Split Brain Resolver commercial tooling, or build a downing mechanism yourself that hooks into external monitoring infrastructure (we've seen a number of teams do this).
An interesting thought to keep in mind is that perhaps you do not need auto-downing at all, and when removing a node from the cluster you can do so cleanly by issuing Cluster.leave(address) in the code.
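A minimal sketch of such a clean departure, using the Java Cluster extension API (system here is your running ActorSystem):

import akka.actor.ActorSystem;
import akka.cluster.Cluster;

public class GracefulLeave {
    public static void leaveCluster(ActorSystem system) {
        Cluster cluster = Cluster.get(system);
        // Announce that this node is leaving; the cluster removes it cleanly,
        // so no downing mechanism has to guess whether it crashed.
        cluster.leave(cluster.selfAddress());
    }
}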
How do I use Consul to make sure only one service is performing a task?
I've followed the examples on http://www.consul.io/ but I am not 100% sure which way to go. Should I use KV? Should I use services? Or should I register a service as a health check and have it be callable by the cluster at a given interval?
For example, imagine there are several data centers. Within every data center there are many services running. Every one of these services can send emails. These services have to check if there are any emails to be sent, and if there are, send them. However, I don't want the same email to be sent more than once.
How would it make sure all emails are sent and none is sent more than once?
I could do this using other technologies, but I am trying to implement this using Consul.
This is exactly the use case for Consul Distributed Locks
For example, let's say you have three servers in different AWS availability zones for failover. Each one is launched with:
consul lock -verbose lock-name ./run_server.sh
The Consul agent will only run the ./run_server.sh command on whichever server acquires the lock first. If ./run_server.sh fails on the server holding the lock, the Consul agent will release the lock, and another node that then acquires it will execute ./run_server.sh. This way you get failover with only one server running at a time. If you registered your Consul health checks properly, you'll be able to see that the server on the first node failed; you can repair it, restart the consul lock ... command on that node, and it will block until it can acquire the lock.
Currently, distributed locking can only happen within a single Consul datacenter. But since it is up to you to decide which Consul servers make up a datacenter, you should be able to solve your issue. If you want locking across federated Consul datacenters you'll have to wait for it, since it's a roadmap item.
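If you'd rather drive the same lock primitive from your own service instead of wrapping it with consul lock, here's a rough sketch against Consul's HTTP API (session create + KV acquire). It assumes a local agent on localhost:8500; the key name, the TTL and the crude JSON handling are just for illustration - a real client would use a JSON library or a Consul client library.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EmailLock {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        // 1. Create a session; the lock is tied to this session's lifetime.
        HttpRequest createSession = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8500/v1/session/create"))
            .PUT(HttpRequest.BodyPublishers.ofString("{\"TTL\": \"15s\"}"))
            .build();
        String sessionJson = http.send(createSession, HttpResponse.BodyHandlers.ofString()).body();
        String sessionId = sessionJson.replaceAll(".*\"ID\"\\s*:\\s*\"([^\"]+)\".*", "$1");

        // 2. Try to acquire the lock key; Consul returns "true" for only one holder at a time.
        HttpRequest acquire = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8500/v1/kv/locks/email-sender?acquire=" + sessionId))
            .PUT(HttpRequest.BodyPublishers.ofString("node-1"))
            .build();
        boolean gotLock = Boolean.parseBoolean(
            http.send(acquire, HttpResponse.BodyHandlers.ofString()).body().trim());

        // 3. Only the instance holding the lock sends the pending emails.
        if (gotLock) {
            System.out.println("Lock acquired, sending the pending emails...");
            // ... send emails, keep renewing the session, then release with ?release=<sessionId>
        } else {
            System.out.println("Another instance holds the lock, skipping this round.");
        }
    }
}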
First Point:
The question is how to use Consul to solve a specific problem. However, Consul cannot solve that specific problem because of intrinsic limitations in the nature of a gossip protocol.
When one datacenter cannot talk to another you cannot safely determine if the problem is the network or the affected datacenter.
The usual solution is to define what happens when one DC cannot talk to another one. For example, if we have 3 datacenters (DC1, DC2, and DC3) we can determine that whenever one DC cannot talk to the other 2 DCs then it will stop updating the database.
If DC1 cannot talk to DC2 and DC3 then DC1 will stop updating the database, and the system will assume DC2 and DC3 are still online.
Let's imagine that DC2 and DC3 are still online and they can talk to each other, then we have quorum to continue running the system.
When DC1 comes online again it will play catch up with the database.
Where can Consul help here? It can communicate between DCs and check if they are online... but so can ICMP.
Take a look at the comments. Did this answer your question? Not really. But I don't think the question has an answer.
Second point: The question is "How to use Consul in leader election?" It would have been better to ask how Consul elects a new leader, or "Given the documentation on Consul.io, can you give me an example of how to determine the leader using Consul?"
If that is what you really want, then the question was already answered: How does a Consul agent know it is the leader of a cluster?
On the ActiveMQ MasterSlave page, they introduce a few ways to set that up using either a JDBC, shared-file, or LevelDB store.
However, on the Network of Brokers page, they talk about MasterSlave Discovery without the need to set up one of those shared configurations (JDBC, file, or LevelDB store).
<networkConnectors>
<networkConnector uri="masterslave:(tcp://host1:61616,tcp://host2:61616,tcp://..)"/>
</networkConnectors>
What are the differences between using MasterSlave Discovery and a shared configuration? When should I use one or the other?
JDBC, a shared file, or replicated LevelDB are all options for creating a highly available persistence store that can be accessed by a master and its slave(s). Note that the LevelDB store is not shared, but replicated.
If you want to connect a broker via a network connection (network of brokers) to another logical broker that consists of a master and a slave, the masterslave: URI prefix is a shorthand for the failover prefix with less typing.
So, MasterSlave Discovery and shared configuration are totally different things.
What you should compare instead is a shared persistence store (JDBC, shared file) vs. a replicated LevelDB store (shared nothing). The latter allows you to set up totally independent brokers that act as a failover cluster, without the need to share a disk or database.
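For completeness: the failover transport that masterslave: expands to is the same one a JMS client can point at the master/slave pair directly. A minimal sketch with the standard ActiveMQ Java client (the host names and port are placeholders):

import javax.jms.Connection;
import org.apache.activemq.ActiveMQConnectionFactory;

public class FailoverClient {
    public static void main(String[] args) throws Exception {
        // The client stays connected to whichever broker is currently master
        // and reconnects to the other one when a failover happens.
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
            "failover:(tcp://host1:61616,tcp://host2:61616)?randomize=false");
        Connection connection = factory.createConnection();
        connection.start();
    }
}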
One issue if you are using the masterslave: discovery URI: CPU usage can be very high (>90%). There is a workaround for it.
There is an interesting discussion going on in the ActiveMQ user forum about the same topic: http://activemq.2283324.n4.nabble.com/Avoiding-shared-state-between-master-and-slave-brokers-td4686401.html
I am also confused about this:
Is there any way to achieve a shared-nothing, fully replicated configuration in a network of brokers wherein there is only one master at a time and all clients are connected to this one instance (with support for re-election of a new master when the current master goes away)?
I've been doing some research for an enhancement of the in-house discovery service on my project. We have a number of nodes in a cluster accountable for the discovery service, highly available. In order to get access to some service, each client app sends a multicast message to all these nodes in the cluster. All nodes respond to the client, and the very first response determines the particular node for further work. This is an overhead, and I'm thinking of using some kind of leader election algorithm where only a single leader responds to clients. Is it reasonable to use such an algorithm for this task?
I think what you are trying to do is load balance across multiple machines, where any machine can handle the requests. Leader election etc. seems like overhead. A load balancer can probably solve the issue.