I am trying to set up a redis-sentinel configuration for failover support. Here is my configuration:
machine1: IP 10.0.0.1, Redis port 6379, redis-sentinel port 26379
machine2: IP 10.0.0.2, Redis port 6379, redis-sentinel port 26379
machine3: IP 10.0.0.3, Redis port 6379, redis-sentinel port 26379
Redis sentinel config
machine 1:
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1
machine 2:
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1
machine 3:
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1
I added machine 2 and machine 3 as slaves of machine 1, and replication is working fine. But when machine 1 goes down, the master switch does not happen; the other machines keep acting as slaves. Is there any configuration issue with my setup?
Some questions before I can give a better answer:
Is authentication enabled on the Redis instances?
Have the sentinels actually detected the pod's topology?
If the above sentinel configurations are complete, the sentinels have not actually attached to the master. Sentinel rewrites its configuration file to store the discovered topology, so what you initially configured it with would be accompanied by what it discovered; in particular, we would expect to see slave entries as well.
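For reference, a sentinel that has successfully attached and completed discovery will usually have appended lines roughly like the following to its own sentinel.conf (the IPs are the ones from your setup; the run IDs are placeholders):
sentinel known-slave mymaster 10.0.0.2 6379
sentinel known-slave mymaster 10.0.0.3 6379
sentinel known-sentinel mymaster 10.0.0.2 26379 <runid-of-that-sentinel>
sentinel known-sentinel mymaster 10.0.0.3 26379 <runid-of-that-sentinel>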
The other possibility is that not enough sentinels have successfully connected to the master to reach a quorum. If Redis is configured to require authentication, you need to tell the sentinels the authentication token as well, using the sentinel set command.
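Assuming the master does require a password (the password below is just a placeholder), you would either add this line to each sentinel's configuration file:
sentinel auth-pass mymaster <yourPassword>
or set it at runtime from redis-cli connected to each sentinel:
SENTINEL SET mymaster auth-pass <yourPassword>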
If you can post the complete configuration, as well as the logs of the sentinels when you take the master down, we can provide more specific advice.
On a related note, in production I would recommend against such a setup. With the one you have, you can wind up with what is known as split-brain. If the machine the master is on gets isolated from the others but is still running, the other two will elect a new master, at which point you will have two masters. If clients are still able to connect to the master, existing connections will stay on the original, but new ones using Sentinel to get the master will connect to the second master.
By running the sentinels on different machines you reduce this risk. If you have a limited number of client machines and can run sentinel there you can nearly or completely eliminate this possibility.
Consider a redis sentinel setup with 5 machines. Each machine has a sentinel process (s1, s2, s3, s4, s5) and a redis instance (r1, r2, r3, r4, r5) running. One is the master (r1) and the others are slaves (r2...r5). During failover of master r1, the redis configuration slaveof must be overridden with the new master r3.
Who will override the redis configuration of the slave redis instances (r2, r4, r5)? Will the sentinel elected to run the failover (assuming s2 is the elected sentinel) override the redis configuration at r2, r4, and r5, or will the sentinel running on each respective machine override its local redis configuration (sn overrides the configuration of rn)?
The elected Sentinel would update the configuration. This is the full list of Sentinel capabilities at a high level:
Monitoring: Sentinel constantly checks if your master and slave instances are working as expected.
Notification: Sentinel can notify the system administrator, or other computer programs, via an API, that something is wrong with one of the monitored Redis instances.
Automatic failover: If a master is not working as expected, Sentinel can start a failover process where a slave is promoted to master, the other additional slaves are reconfigured to use the new master, and the applications using the Redis server are informed about the new address to use when connecting.
Configuration provider: Sentinel acts as a source of authority for client service discovery: clients connect to Sentinels in order to ask for the address of the current Redis master responsible for a given service. If a failover occurs, Sentinels will report the new address.
For more details, refer to the docs.
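To make the failover and configuration-provider roles concrete, a rough sketch (the master name mymaster, the host r3-host, and the default sentinel port 26379 are assumed here, not taken from your setup): the elected sentinel reconfigures the surviving slaves itself by sending each of them a SLAVEOF command pointing at the promoted node, and any client can ask any sentinel for the current master address.
# what the leader sentinel effectively sends to r2, r4 and r5 once r3 is promoted
SLAVEOF r3-host 6379
# how a client discovers the current master through any sentinel
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster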
I have set up an Elasticsearch cluster with two data nodes, one master node, and one client node with Kibana.
I was running it with iptables disabled on each node (CC 6). Now I need to enable iptables, and I want to know which of the ports (9200, 9300) I need to open on each node and in which direction (incoming or outgoing). Discovery is using unicast.
I would also like to know on which node I should place authentication, i.e. just the client node?
Cluster: mycluster
data-node1
data-node2
master-node1
client-node1
Thanks.
Port 9200 is used for the HTTP API; port 9300 is used for communication between nodes in the cluster.
For the above configuration I would:
Bind port 9200 on all hosts to 127.0.0.1
Bind port 9300 on all hosts to the local lan, i.e. 192.168.x.x
Run nginx with basic authentication (htpasswd, for example) as a reverse proxy to 127.0.0.1:5601 (Kibana), assuming you're running your client node on the same machine as Kibana (a rough sketch follows below).
In your Kibana configuration, have it connect to localhost:9200 and bind the interface to 127.0.0.1
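A rough sketch of what that could look like (IPs, hostnames, and file paths are assumed, and the exact elasticsearch.yml keys vary a bit between versions). In elasticsearch.yml on each node:
network.bind_host: 192.168.1.10   # this node's LAN address, used for the 9300 transport
http.host: 127.0.0.1              # keep the 9200 HTTP API local only
And an nginx server block in front of Kibana with basic authentication:
server {
    listen 80;
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;
    location / {
        proxy_pass http://127.0.0.1:5601;
    }
}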
I am setting up my first Redis framework, and so far I have the following:
Server1:
- Redis master
- 3 Redis Sentinels (quorum set to 2)
Server2:
- Redis slave
- 3 Redis Sentinels (quorum set to 2)
The master and slave appear to be working properly and data is syncing from the master to the slave. When I install and start the sentinels, they too seem to run OK, in that if I connect to any of them and run sentinel masters, it shows that the sentinel is pointed at my Redis master and displays the various properties.
However, the actual failover doesn't seem to work. For example, if I connect to my Redis master and run debug segfault to get it to fail, the failover to the slave does not occur. None of the sentinels log anything, so it appears they are not actually connected. Here is the configuration for my sentinels:
port 26381
sentinel monitor redismaster ServerName 26380 2
sentinel down-after-milliseconds redismaster 10000
sentinel failover-timeout redismaster 180000
sentinel parallel-syncs redismaster 1
logfile "nodes/sentinel1/sentinel.log"
As you can see, this sentinel runs on 26381 (and subsequent sentinels run on 26382 and 26383). My Redis master runs on 26380. All of the ports are open, names/IPs resolve correctly, etc., so I don't think it is an infrastructure issue. In case it is useful, I am running Redis (2.8.17), which I downloaded from the MS Open Tech page.
Does anyone have any thoughts on what might be the problem, or suggestions on how to troubleshoot? I am having a hard time finding accurate documentation for setting up an H.A. instance of Redis on Windows, so any commands useful for troubleshooting these types of issues would be greatly appreciated.
I figured this out. One thing I neglected to mention in my question is that I have the masterauth configuration specified in my Redis master config file, so my clients have to provide a password to connect. I missed this in my sentinel configuration, and did not provide a password. The sentinel logging does not indicate this, so it was not obvious to me. Once I added this:
sentinel auth-pass redismaster <myPassword>
To my sentinel configuration file, everything started working as it should.
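For anyone else hitting this, a quick way to check that the sentinels have really attached once the password is in place (the port and master name are from my setup, adjust to yours):
redis-cli -p 26381 SENTINEL get-master-addr-by-name redismaster
redis-cli -p 26381 SENTINEL masters
In the output of SENTINEL masters, the num-slaves and num-other-sentinels fields should be non-zero once discovery is working.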
I have two Ubuntu instances in EC2 and I want to cluster them.
One IP will be referred to as X (the "net addr" IP that ifconfig displays) and its public IP will be referred to as PX.
The other IP is Y and its public IP is PY.
So now I did the following on both machines.
installed the latest rabbitmq.
installed the management plugin.
opened ports 5672 (rabbit) and 15672 (management plugin)
connected to rabbit with my test app.
connected to the ui.
So now for the cluster.
I did the following commands
on X
rabbitmqctl cluster_status
got the node name, which was 'rabbit@ip-X' (where X is the inner IP)
on Y
rabbitmqctl stop_app
rabbitmqctl join_cluster --ram rabbit@ip-X
I got
"The nodes provided are either offline or not running"
Obviously this is the private IP, so the other instance can't connect.
How do I tell the second instance where the first is located?
EDIT
The firewall is completely off; I have a telnet connection from one remote to the other
(to ports 5672 (rmq), 15672 (ui), and 4369 (cluster port)).
The cookie is the same on both servers (and the hash of the cookie in the logs is the same).
I recorded the TCP traffic while running the join_cluster command and watched it in Wireshark. I saw the following (no ACK):
http://i.imgur.com/PLezLvQ.png
So I disabled the firewall using
sudo ufw disable
(just for the tests) and re-ran
sudo rabbitmqctl join_cluster --ram rabbit@ip-XX
and the connection was created, but it was terminated by the remote rabbit,
here:
http://i.imgur.com/dxJLNfH.png
and the message is still
"The nodes provided are either offline or not running"
(the remote rabbit app is definitely running)
You need to make sure the nodes can access each other. RabbitMQ uses distributed Erlang primitives for communication across the nodes, so you also have to open up a few ports in the firewall. See:
http://learnyousomeerlang.com/distribunomicon#firewalls
for details.
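If you would rather not open the whole dynamic port range that distributed Erlang can use, you can pin the distribution port via the Erlang kernel settings, for example in /etc/rabbitmq/rabbitmq.config (a sketch; the port number is your choice and then needs to be open between the nodes, along with 4369 for epmd):
[
  {kernel, [
    {inet_dist_listen_min, 25672},
    {inet_dist_listen_max, 25672}
  ]}
].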
You should also use the same data center for your nodes in the cluster, since RabbitMQ can get really sad on network partitions. If your nodes are in different data centers, you should use the shovel or federation plugin instead of clustering for replication of data.
Edit: don't forget to use the same Erlang cookie on all nodes, see http://www.rabbitmq.com/clustering.html for details.
The issue is probably TCP ports that need to be opened.
You should do the following:
1) Create a Security Group for the Rabbit Servers (both will use it)
we will call it: rabbit-sg
2) In the Security Group, define the following rules (Type / Protocol / Port range / Source):
All TCP          TCP  0 - 65535  sg-xxxx (rabbit-sg)
SSH              TCP  22         0.0.0.0/0
Custom TCP Rule  TCP  4369       0.0.0.0/0
Custom TCP Rule  TCP  5672       0.0.0.0/0
Custom TCP Rule  TCP  15672      0.0.0.0/0
Custom TCP Rule  TCP  25672      0.0.0.0/0
Custom TCP Rule  TCP  35197      0.0.0.0/0
Custom TCP Rule  TCP  55672      0.0.0.0/0
3) Make sure both EC2 instances use this security group;
note that we opened all TCP traffic between the two instances.
4) Make sure the rabbit (Erlang) cookie is the same and that you reboot the slave EC2 instance
after changing the cookie there (a sketch of this step follows below).
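A sketch of that last step on the joining (slave) instance, assuming a standard Ubuntu package install where the cookie lives at /var/lib/rabbitmq/.erlang.cookie and the master node is rabbit@ip-X:
# after copying the cookie value from the master into /var/lib/rabbitmq/.erlang.cookie
sudo service rabbitmq-server restart
sudo rabbitmqctl stop_app
sudo rabbitmqctl join_cluster --ram rabbit@ip-X
sudo rabbitmqctl start_app
sudo rabbitmqctl cluster_status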
I have a couple of offline nodes in a cluster. I want their time to be synchronized, so I configured one of the nodes to be the NTP server.
This is the configuration file of my NTP server:
# node's ip:192.168.17.11
driftfile /etc/ntp.drift
server 192.168.17.11
fudge 192.168.17.11 stratum 1
restrict 192.168.17.0 mask 255.255.255.0 nomodify notrap
restrict -4 default kod notrap nomodify nopeer noquery
The problem is that the other machines cannot synchronize their time with this machine.
Is this configuration file correct for acting as NTP server?
Thank you,
Ali
Using 127.0.0.1 as the server IP, you are not allowing other computers on the LAN to communicate with the NTP server.
Try replacing 127.0.0.1 with the LAN IP of the server in the configuration file.
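For comparison, a commonly used ntp.conf for a server on an isolated LAN points ntpd at the local clock driver instead of a real network address (127.127.1.0 is ntpd's convention for the local clock) and keeps a LAN restrict line like the one in the question:
driftfile /etc/ntp.drift
# use the local clock as the reference, since the network has no upstream source
server 127.127.1.0
fudge 127.127.1.0 stratum 10
# allow clients on the LAN to query, but not modify, the server
restrict 192.168.17.0 mask 255.255.255.0 nomodify notrap
restrict -4 default kod notrap nomodify nopeer noquery
restrict 127.0.0.1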