Who rewrites the slaveof configuration of the slave redis instances? - redis-sentinel

Consider a redis sentinel setup with 5 machines. Each machine runs a sentinel process (s1,s2,s3,s4,s5) and a redis instance (r1,r2,r3,r4,r5). One is the master (r1) and the others are slaves (r2...r5). During failover of master r1, the slaveof setting in the redis configuration must be overridden to point to the new master r3.
Who overrides the redis configuration of the slave instances (r2,r4,r5)? Does the sentinel elected to perform the failover (say s2) override the redis configuration at r2,r4,r5, or does the sentinel running on each machine override its local redis configuration (sn overrides the configuration of rn)?

The elected Sentinel updates the configuration. This is the full list of Sentinel capabilities at a high level:
Monitoring: Sentinel constantly checks if your master and slave instances are working as expected.
Notification: Sentinel can notify the system administrator, or other computer programs, via an API, that something is wrong with one of the monitored Redis instances.
Automatic failover: If a master is not working as expected, Sentinel can start a failover process where a slave is promoted to master, the other additional slaves are reconfigured to use the new master, and the applications using the Redis server are informed about the new address to use when connecting.
Configuration provider: Sentinel acts as a source of authority for clients service discovery: clients connect to Sentinels in order to ask for the address of the current Redis master responsible for a given service. If a failover occurs, Sentinels will report the new address.
For more details, refer to the Redis Sentinel documentation.
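In concrete terms, during a failover the elected sentinel (s2 in your example) promotes the chosen replica and then reconfigures every other replica over its regular client connection; the sentinels co-located with r2, r4 and r5 do not reconfigure their local instances. A minimal sketch of the equivalent commands the elected sentinel issues, with the hostnames r2...r5 and port 6379 as placeholders:
# promote the chosen replica (r3) to master
redis-cli -h r3 -p 6379 SLAVEOF NO ONE
# repoint the remaining replicas at the new master
redis-cli -h r2 -p 6379 SLAVEOF r3 6379
redis-cli -h r4 -p 6379 SLAVEOF r3 6379
redis-cli -h r5 -p 6379 SLAVEOF r3 6379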

Related

Redis Cluster Create Replicas Bind Public IP

We have 6 redis servers running in ports (8001, 8002, 8003, 8004, 8005, 8006).
In the redis.conf of every Redis server we have tried binding the IP in different ways, like:
bind 0.0.0.0
bind PRIVATE PUBLIC
bind PUBLIC
If we access it like this, it works fine:
redis-cli -h PUBLIC_IP -p 8001
But when we want to create the cluster we run:
./src/redis-cli --cluster create PUBLIC_IP:8001 PUBLIC_IP:8002 PUBLIC_IP:8003 PUBLIC_IP:8004 PUBLIC_IP:8005 PUBLIC_IP:8006 --cluster-replicas 1
The console always shows the following and stays at "Waiting for the cluster to join" forever:
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica PUBLIC_IP:8005 to PUBLIC_IP:8001
Adding replica PUBLIC_IP:8006 to PUBLIC_IP:8002
Adding replica PUBLIC_IP:8004 to PUBLIC_IP:8003
>>> Trying to optimize slaves allocation for anti-affinity
[WARNING] Some slaves are in the same host as their master
M: 7ab009459f7f5cf6cef5f46b691748dc236e4c26 PUBLIC_IP:8001
slots:[0-5460] (5461 slots) master
M: 0048ca2cd65c1315b8f0a7c952b69bfb494d5ace PUBLIC_IP:8002
slots:[5461-10922] (5462 slots) master
M: c6ee023719f200b0d175f428fa15e5ab767d0e04 PUBLIC_IP:8003
slots:[10923-16383] (5461 slots) master
S: cf636a1a46b1e947daec3e797cac524c613f08ca PUBLIC_IP:8004
replicates 7ab009459f7f5cf6cef5f46b691748dc236e4c26
S: 5d4bd1041457114353b0b30dbefd86ab8e4ae020 PUBLIC_IP:8005
replicates 0048ca2cd65c1315b8f0a7c952b69bfb494d5ace
S: 62f01289dc3f72cac4a1745fc77b7bd91ec5d107 PUBLIC_IP:8006
replicates c6ee023719f200b0d175f428fa15e5ab767d0e04
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
A lot of people say that we need to bind the private IP, but we want to do it on the public one because when we connect from external machines the cluster redirects to the master that contains the key; if we bind the private IP the redirect will show "redirect to PRIVATE_IP" and that will not work as expected.
Are we missing something to let the cluster join by public IP?
From the Redis security guide:
Redis is designed to be accessed by trusted clients inside trusted environments.
See also: How to connect to redis from remote guide
When a server binds on its public IP, it can get requests from everyone, so unless you build some security around it anyone can access and manipulate your data.
In a redis cluster the rules are the same, and replicas which bind on public IPs are exposed.
The default use case for a redis cluster is that one machine (or multiple machines) accesses it from within its private network, and you shouldn't divert from that unless you know what you are doing security-wise.
If it makes sense for your use case, you should make the machine which accesses the redis cluster part of the cluster's private network.
What I would do in your place is:
Bind all the servers to the private IP and the loopback IP, i.e. bind {{ private_ip }} 127.0.0.1
Enable ufw (or any other firewall tool) on each server and run (for ufw) allow from {{ private_ip }} to any port {{ redis_port }} or similar.
Have your internal DNS carry an entry for every server with its respective private IP.
Voila! Create and access the redis cluster securely, without exposing it to the public network; a sketch of the first two steps follows.
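A minimal sketch of steps 1 and 2 on one node, where {{ private_ip }}, {{ peer_private_ip }} and port 8001 are placeholders for your own values:
# redis.conf: listen only on the private interface and loopback
bind {{ private_ip }} 127.0.0.1
# ufw: let the other cluster nodes' private IPs reach the data port
ufw allow from {{ peer_private_ip }} to any port 8001
# Redis Cluster nodes also talk to each other on a bus port (data port + 10000), so open that too
ufw allow from {{ peer_private_ip }} to any port 18001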
NOTE: if you still want to access them over the public network, you can work around it with SNAT.
WARNING: binding the redis server to 0.0.0.0 or a public IP can lead to serious vulnerabilities, for example:
https://www.exploit-db.com/exploits/47195
https://medium.com/@knownsec404team/rce-exploits-of-redis-based-on-master-slave-replication-ef7a664ce1d0
PS: You can also follow this medium tutorial.

Requiring public IP address for kafka running on EC2

We have Kafka and ZooKeeper installed on a single AWS EC2 instance. We have Kafka producers and consumers running on separate EC2 instances which are in the same VPC and have the same security group as the Kafka instance. In the producer and consumer configs we are using the internal IP address of the Kafka server to connect to it.
But we have noticed that we need to set the public IP address of the EC2 server as advertised.listeners to let the producers and consumers connect to the Kafka server:
advertised.listeners=PLAINTEXT://PUBLIC_IP:9092
We also have to whitelist the public IP addresses and open traffic on port 9092 for each of our EC2 servers running producers and consumers.
We want the traffic to flow over the internal IP addresses. Is there a way to avoid whitelisting the public IP addresses and opening port 9092 for each of our servers running a producer or consumer?
If you don't want to open access to all for any of your servers, I would recommend putting a proper high-performance web server like nginx or Apache HTTPD in front of your application servers, acting as a reverse proxy. This way you can also add SSL encryption, and your servers stay on a private network while only the web server is exposed. It's very easy to do and you can find many tutorials on how to set it up, like this one: http://webapp.org.ua/sysadmin/setting-up-nginx-ssl-reverse-proxy-for-tomcat/
Because of the variable nature of the environments Kafka may need to work in, it only makes sense that you are explicit in declaring the locations Kafka can use. The only way to guarantee that external parts of any system can reach it via an IP address is to ensure that you are using external IP addresses.
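If the goal is for VPC-internal traffic to stay on internal addresses while still being explicit, one common pattern (a sketch, assuming Kafka 0.10.2+ and listener names of your own choosing) is to declare two listeners in server.properties and advertise the private IP on the internal one:
# server.properties sketch: clients inside the VPC use 9092 and are given the private IP;
# anything external (if needed at all) uses 9093 and is given the public IP
listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9093
advertised.listeners=INTERNAL://PRIVATE_IP:9092,EXTERNAL://PUBLIC_IP:9093
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL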

Service discovery cache update in the case of node failure

I am trying to adopt a service discovery mechanism for my system. I have a bunch of nodes and they will communicate with each other via gRPC. Because in some frameworks like Mesos a node that is brought back up after a failure may have a different IP address and a different port, I am thinking of using service discovery so that each node can have a cluster config that is agnostic to node failure.
My current options are to use DNS or a strongly-consistent key-value store like etcd or ZooKeeper. My problem is to understand how the cache of name mappings in healthy nodes gets invalidated and updated when a node goes down and comes back up.
The possible ways I can think of are:
When healthy nodes detect a connection problem, they invalidate their cache entry immediately and keep polling the DNS registry until the node is reachable again.
When a node goes down and comes back up, the DNS registry broadcasts the events to all healthy nodes. This seems to require heartbeats from the DNS registry.
The cache in each node has a TTL field, and within a TTL interval each node has to live with the node failure until the cache entry expires and it pulls from the DNS registry again.
My question is: which option (you can name more) is used in practice, and why is it better than the alternatives?
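For illustration, here is a sketch of how options 2 and 3 map onto etcd v3 primitives (the /services/ prefix, the node name and the 10-second TTL are arbitrary placeholders): registering under a lease gives TTL-style expiry, and watching the prefix gives push-style invalidation instead of polling.
# the node registers itself under a 10s lease and must keep the lease alive
etcdctl lease grant 10
# -> lease 694d77aa129938a6 granted with TTL(10s)   (example lease id)
etcdctl put /services/node-42 10.0.1.7:50051 --lease=694d77aa129938a6
etcdctl lease keep-alive 694d77aa129938a6
# healthy nodes watch the prefix and are pushed PUT/DELETE events when entries change or expire
etcdctl watch --prefix /services/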

Replace ZooKeeper servers

I want to replace the current 3 ZooKeeper servers with 3 new ZooKeeper servers. I have:
added the new Zoo servers to Ambari,
added the new Zoo servers to the variables:
hbase.zookeeper.quorum
ha.zookeeper.quorum
zookeeper.connect
hadoop.registry.zk.quorum
yarn.resourcemanager.zk-address
I restarted the services and the ResourceManager, and still I can't connect to any new Zoo server when I turn off all the old Zoo servers.
zookeeper-client -server zoo-new1
I get the following error:
"Unable to read additional data from server sessionid 0x0, likely server has closed socket"
And on new Zoo server in logs (zookeeper.out):
"Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running"
When I run one of the old ZooKeepers, everything works, and I can also connect to the new ZooKeeper servers.
My best guess is that this has to do with one of the most important properties of ZooKeeper, namely leader election. If you start with a ZooKeeper quorum of 3 servers and add 3 more servers to it, you need at least 4 servers running for the quorum to be reachable. When a ZooKeeper node is unable to elect a leader, it looks as if it is down.
This is also the reason why your setup works when you start one of the old ZooKeepers: 4 of the 6 possible servers are then alive. If you want the new setup to work, you need to remove the old servers from the config so that the quorum only knows about the three new ones. Simply shutting a ZooKeeper server down does not remove it from the quorum.
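In practice that means the server list in zoo.cfg (and the quorum-related client settings listed above) should end up referencing only the three new nodes. A sketch, with zoo-new1..3 standing in for your hostnames and the conventional peer/election ports:
# zoo.cfg on each of the new ZooKeeper servers: only the new ensemble members
server.1=zoo-new1:2888:3888
server.2=zoo-new2:2888:3888
server.3=zoo-new3:2888:3888
# each server's myid file must contain the number from its own server.N line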

Redis on Windows - Sentinels not communicating

I am setting up my first Redis framework, and so far I have the following:
Server1:
- Redis master
- 3 Redis Sentinels (quorum set to 2)
Server2:
- Redis slave
- 3 Redis Sentinels (quorum set to 2)
The master and slave appear to be working properly and data is syncing from the master to the slave. When I install and start the sentinels, they too seem to run OK, in that if I connect to any of them and run sentinel masters, it shows that the sentinel is pointed at my Redis master and displays the various properties.
However, the actual failover doesn't seem to work. For example, if I connect to my Redis master and run debug segfault to get it to fail, the failover to the slave does not occur. None of the sentinels log anything so it appears they are not actually connected. Here is the configuration for my sentinels:
port 26381
sentinel monitor redismaster ServerName 26380 2
sentinel down-after-milliseconds redismaster 10000
sentinel failover-timeout redismaster 180000
sentinel parallel-syncs redismaster 1
logfile "nodes/sentinel1/sentinel.log"
As you can see, this sentinel runs on 26381 (and subsequent sentinels run on 26382 and 26383). My Redis master runs on 26380. All of the ports are open, names/IPs resolve correctly, etc., so I don't think it is an infrastructure issue. In case it is useful, I am running Redis (2.8.17) which I downloaded from the MS Open Tech page.
Does anyone have any thoughts on what might be the problem, or suggestions on how to troubleshoot? I am having a hard time finding accurate documentation for setting up an H.A. instance of Redis on Windows, so any commands useful for troubleshooting these types of issues would be greatly appreciated.
I figured this out. One thing I neglected to mention in my question is that I have the masterauth configuration specified in my Redis master config file, so my clients have to provide a password to connect. I missed this in my sentinel configuration, and did not provide a password. The sentinel logging does not indicate this, so it was not obvious to me. Once I added this:
sentinel auth-pass redismaster <myPassword>
To my sentinel configuration file, everything started working as it should.
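For reference, a sketch of the sentinel config from above with the auth-pass line in place (ServerName and the password placeholder are the values from your own setup):
port 26381
sentinel monitor redismaster ServerName 26380 2
# must name the same master ("redismaster") being monitored; gives the sentinel the master's password
sentinel auth-pass redismaster <myPassword>
sentinel down-after-milliseconds redismaster 10000
sentinel failover-timeout redismaster 180000
sentinel parallel-syncs redismaster 1
logfile "nodes/sentinel1/sentinel.log"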
