DC-DR Sync issue in Patroni PostgreSQL setup - high-availability

We have a 3-instance cluster in DC and a 3-instance cluster in DR running as a standby_cluster. For real-time DC-to-DR sync, we have added the DC cluster leader's IP as the standby_cluster host in the DR Patroni config. This works fine and we get real-time syncing.
But when the DC leader changes internally (a switchover or failover within DC), the DR leader keeps pointing at the previous DC instance, cannot make a writable connection to it, and gets the error below.
FATAL: could not connect to the primary server: could not make a writable connection to server "13.233.76.9:5432"
Can anyone please help us solve this issue? We have been stuck on it for months.
For your reference:
pg_hba config:
TYPE DATABASE USER ADDRESS METHOD
local all all trust
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
local replication all trust
host replication all 127.0.0.1/32 trust
host replication all ::1/128 trust
host replication replicator 127.0.0.1/32 md5
host replication replicator 172.31.4.196/32 scram-sha-256
host replication replicator 172.31.2.237/32 scram-sha-256
host replication replicator 172.31.2.83/32 scram-sha-256
host replication replicator 172.31.45.26/32 scram-sha-256
host replication replicator 172.31.43.207/32 scram-sha-256
host replication replicator 172.31.42.188/32 scram-sha-256
host replication replicator 13.230.225.219/32 trust
host replication replicator 13.200.182.158/32 trust
host replication replicator 13.112.25.208/32 trust
host replication replicator 0.0.0.0/32 trust
DC Current Leader: 13.233.76.9
DC New Leader: 52.67.253.203
DR Leader: 13.230.225.219

An issue was opened for this (REF: https://github.com/zalando/patroni/issues/2460). The response there: following the new leader of the source cluster is not done automatically.
You have to put something stable into standby_cluster.host, e.g. the hostname of a load balancer that always connects to the primary, a virtual IP, or maybe list all potential nodes of the source cluster there.
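For illustration, here is a minimal sketch of what that could look like in the DR cluster's dynamic configuration (applied with patronictl edit-config). The hostname dc-primary.example.internal is a hypothetical load-balancer/VIP name standing in for "something stable"; it is not part of the original setup:

standby_cluster:
  # stable name that always resolves to the current DC leader (placeholder name)
  host: dc-primary.example.internal
  port: 5432
  # the issue above also suggests listing all potential source nodes instead;
  # check your Patroni version's documentation for the exact syntax of that option
  create_replica_methods:
    - basebackup

With a stable endpoint in standby_cluster.host, a leader change inside DC only has to be reflected in the load balancer or VIP, and the DR leader can reconnect without any config change.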

Related

Can't connect to my Oracle Virtual Cloud Instance

The terminal just says:
ssh: connect to host xxx.xxx.xxx.xxx port 22: Connection timed out
Here are the firewall rules.
I can't SSH into the VM, so I can't change the firewall rules on the VM.
Please help.
First, double-check your IP address: it must match your Oracle Virtual Cloud Public IP Address, assuming it is a reserved one (meaning it is a fixed one)
Second, check your local firewall: you cannot change the remote ones, but the local rules might still block your SSH traffic.
As mentioned here:
Just opening the port through the firewall and security lists will not allow new incoming connections. For example, unless there is a service listening on port 443 (Tomcat, etc.), you will be unable to connect. The same goes for the SSH daemon on port 22.
So make sure the SSH daemon is up and running.
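If you can still get onto the instance some other way (for example via the console connection), a quick sanity check might look like this, assuming a systemd-based image such as Oracle Linux or CentOS:

sudo systemctl status sshd      # is the SSH daemon running?
sudo ss -tlnp | grep ':22'      # is anything actually listening on port 22?
sudo firewall-cmd --list-all    # local firewall rules that might still drop inbound SSH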
Check also the Default Security List:
Unlike other security lists, the default security list comes with an initial set of stateful rules, which should in most cases be changed to only allow inbound traffic from authorized subnets relevant to the region that homes that VCN or subnet.
A list of authorized subnet ranges relevant to each region can be found here.
In particular:
Stateful ingress: Allow TCP traffic on destination port 22 (SSH) from authorized source IP addresses and any source port.
This rule makes it easy for you to create a new cloud network and public subnet, launch a Linux instance, and then immediately use SSH to connect to that instance without needing to write any security list rules yourself.
You can mount your machine's drive on some other machine, edit the sshd config, and mount it back.
That helped me :D
See this ref: https://blogs.oracle.com/cloud-infrastructure/post/recovering-opc-user-ssh-key-on-oracle-cloud-infrastructure
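A rough sketch of that recovery approach, assuming you have detached the boot volume and attached it to a rescue instance (the device and partition /dev/sdb3 are only examples; yours will differ):

sudo mkdir -p /mnt/rescue
sudo mount /dev/sdb3 /mnt/rescue                     # root partition of the broken instance's boot volume
sudo vi /mnt/rescue/etc/ssh/sshd_config              # fix the SSH daemon configuration
sudo vi /mnt/rescue/home/opc/.ssh/authorized_keys    # or restore the opc user's public key
sudo umount /mnt/rescue

Afterwards, detach the volume from the rescue instance and reattach it to the original instance as its boot volume.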

Able to ping EC2 from on-premises through VPN. But, unable to ping DMS replication instance

I have set up a VPN and am able to ping the private IP of the EC2 instance from on-premises, and vice versa. However, I am unable to ping the private IP of the DMS replication instance.
I have created an endpoint pointing to the DB in EC2. The endpoint test connection succeeds. However, the endpoint test connection fails for the DB on-premises.
The EC2 instance and the DMS replication instance use the same subnet, security group, etc. The details are given in the image below.
May I know
1) why is the DMS instance not communicating with on-premises (and vice versa)?
2) why does EC2 work fine over the VPN but the DMS instance does not?
EDIT:
Details of Security Group associated with the DMS instance:
vpc - the same default vpc used by EC2
inbound rules - all traffic, all protocol, all port range, source = 192.168.0.0/24
outbound rules - all traffic, all protocol, all port range, source = 0.0.0.0/0
Route table:
destination - 10.0.0.0/16, target = local
destination - 0.0.0.0/0, target = internet gateway
destination - 192.168.0.0/24, target = virtual private gateway used in VPN
This is the error message I get when I try to test the DMS DB endpoint connection:
Test Endpoint failed: Application-Status: 1020912, Application-Message: Failed to connect Network error has occurred, Application-Detailed-Message: RetCode: SQL_ERROR SqlState: HYT00 NativeError: 0 Message: [unixODBC][Microsoft][ODBC Driver 13 for SQL Server]Login timeout expired ODBC general error.
You might need to describe/provide your full network topology for a more precise answer, but my best guess, based on AWS' documentation on "Network Security for AWS Database Migration Service", is that you're missing source and target database configuration:
Database endpoints must include network ACLs and security group rules that allow incoming access from the replication instance. You can achieve this using the replication instance's security group, the private IP address, the public IP address, or the NAT gateway’s public address, depending on your configuration.
Also, is this EC2 you mentioned a NAT instance? Just in case:
If your network uses a VPN tunnel, the Amazon EC2 instance acting as the NAT gateway must use a security group that has rules that allow the replication instance to send traffic through it.
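As an illustration only (the security group IDs and the SQL Server port below are placeholders, not taken from the question), allowing the replication instance's security group into the database endpoint's security group via the AWS CLI could look like this; the on-premises firewall would similarly need to allow the replication instance's private IP over the VPN on the database port:

# sg-0aaaaaaaaaaaaaaaa = security group of the database endpoint (placeholder)
# sg-0bbbbbbbbbbbbbbbb = security group of the DMS replication instance (placeholder)
aws ec2 authorize-security-group-ingress --group-id sg-0aaaaaaaaaaaaaaaa --protocol tcp --port 1433 --source-group sg-0bbbbbbbbbbbbbbbb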

replication server with same id in docker/openshift cluster

I'm having problems setting up replication in an OpenShift/Docker cluster.
In OpenShift, each OpenDJ server has two IPs: a service IP and a pod IP. So when I set up two OpenDJ services, there are two service IPs and two pod IPs.
I want to set up the replication using the service IPs, because a pod IP is not accessible from other pods, but apparently OpenDJ thinks there are four replication servers, with each pair of servers having the same ServerId.
Log snippet:
category=SYNC severity=ERROR msgID=org.opends.messages.replication.55 msg=In Replication server Replication Server 8989 31635: replication servers 172.30.244.127(service ip):8989 and 10.129.0.1:8989(pod ip) have the same ServerId : 11281
My question is: is it possible to build the replication server cluster by service IP only, not pod IP?
Thanks a lot.
PS: seems this issue is similar with this https://bugster.forgerock.org/jira/browse/OPENDJ-567
Wayne
For anyone having the same issue: please configure your OpenDJ service as a headless service; that will solve the problem.
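For reference, a headless service is simply a Service with clusterIP set to None; a minimal sketch (the name, selector, and port are illustrative, not taken from the question):

apiVersion: v1
kind: Service
metadata:
  name: opendj
spec:
  clusterIP: None        # headless: DNS resolves to the pod IPs directly, no extra service IP
  selector:
    app: opendj
  ports:
    - name: replication
      port: 8989

Because there is no separate service IP in front of the pods, each replication server is reachable under only one address, which avoids the duplicate-ServerId situation shown in the log above.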

MemSQL - unable to connect remotely to EC2 cluster using MySQL client

I have used http://cloud.memsql.com to successfully deploy a MemSQL cluster to EC2 as documented here: http://docs.memsql.com/4.0/setup/setup_cloud/.
I can SSH to the master aggregator, and successfully login to the MemSQL prompt locally. However, I cannot connect remotely using a MySQL client application.
I have double-checked port 3306 is open and just for testing have applied all privileges to root:
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;
The documentation states:
Once your cluster is up and running, connect to the master aggregator using any valid MySQL client driver.
Can anyone advise on a step I have missed?
--
UPDATE 1 - The source range is open for the MemSQL port:
3306 tcp 0.0.0.0/0
UPDATE 2 - ufw has been disabled for testing.
Currently, clusters spun up by cloud.memsql.com lock down their security group to the VPC for the MemSQL ports (like 3306). If you want to access the cluster from outside the VPC, you will need to add a new rule to the group. Something like this would open the group completely:
Add an Ingress rule for port 3306-3306 for CIDR: 0.0.0.0/0
Note that this opens the cluster to the world, and anyone will be able to connect. Instead of 0.0.0.0/0, I recommend using your public IP with a /32, like so: YOUR_IP/32
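For example, via the AWS CLI (the security group ID and IP address are placeholders):

# allow MySQL-protocol access to the master aggregator from a single public IP
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 3306 --cidr 203.0.113.10/32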
It turned out to be a DNS issue with the provider I was using. I tried connecting using a cell phone and had no issues.

Installing corosync and pacemaker on Amazon EC2 instances

I'm trying to set up an HA cluster of 2 Amazon instances. The OS of my instances is CentOS 7.
Hostnames:
master1.example.com
master2.example.com
IP internal:
10.0.0.x1
10.0.0.x2
IP public:
52.19.x.x
52.18.x.x
I'm following this tutorial:
http://jensd.be/156/linux/building-a-high-available-failover-cluster-with-pacemaker-corosync-pcs
[root@master1 centos]# pcs status nodes
Pacemaker Nodes:
Online: master1.example.com
Standby:
Offline: master2.example.com
while my master2 shows the following:
[root@master2 centos]# pcs status nodes
Pacemaker Nodes:
Online: master2.example.com
Standby:
Offline: master1.example.com
But they should both be online.
What am I doing wrong?
Also, which IP do I have to choose as the virtual IP? The IPs are not in the same subnet.
Change your security group rules to allow inbound and outbound TCP and HTTPS traffic between all cluster nodes. That should do it. (Pretty old question, but it was unanswered, so I thought someone might need this.)
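As a sketch (the security group ID is a placeholder, and the ports assume a default corosync/pcsd setup: corosync on UDP 5404-5405, the pcsd HTTPS interface on TCP 2224), a self-referencing rule set via the AWS CLI could look like:

# allow cluster traffic between all nodes that share this security group
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol udp --port 5404-5405 --source-group sg-0123456789abcdef0
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 2224 --source-group sg-0123456789abcdef0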
