Cross-region Oracle Exadata active-active cluster on OCI

Does anybody have experience setting up an Oracle Exadata active-active cluster across OCI regions? If so, could you share best practices and guiding principles?
The goal is to set up an active-active Oracle Exadata cluster across two OCI regions, so that customers can access both regions simultaneously and can readily use the other region if one region goes down. Failover has to happen automatically, without any downtime. The other site should not be a read-only passive site: if required, either site must be usable in read/write mode at any given point. The requirement is NOT to waste infrastructure as passive or standby; all infrastructure is expected to be active, serving customers.
It is well known that Data Guard and GoldenGate can be used, but I am looking for specific implementation best practices and architectural principles, given that the application middle tier must keep accessing the DB cluster automatically.

You mentioned that "The goal is to set up an Active-Active Oracle Exadata cluster across two OCI regions, so customer can readily access other region if one region goes down. It has to be spontaneous without any downtime."
The terms active-active and active-standby describe database semantics rather than an Exadata cluster (DB system/VM cluster). So I am going to read the question as follows: the goal is to design a DR solution for an Exadata database that has stringent RTO goals, and you want a solution that is automatic, without downtime.
Active Data Guard: OCI DBaaS allows customers to configure cross-region Data Guard. Data Guard (standby) databases are an exact block-for-block copy of the primary database. Data Guard can be configured in Active Data Guard mode, which means that the standby database is open read-only while redo is being applied. In practice, this means that queries (SELECTs) can be offloaded to the standby database.
a. With proper planning and execution, Oracle Data Guard and Active Data Guard role transitions can effectively minimize downtime and ensure that the database environment is restored with minimal impact on the business.
b. A failover is used when the primary database is deemed lost or unrecoverable, or when the expected time to repair exceeds the required recovery time objective (RTO). During a failover, the primary database is taken offline at one site and a standby database is brought online as the primary database. Failover can be completely automated using Data Guard Fast-Start Failover, or it can be a manual, administrator-driven process. Fast-Start Failover eliminates the uncertainty inherent in a process that requires manual intervention, assuming similar measures have been taken to automate the failover of the application tier to the new primary database. Fast-Start Failover automatically initiates a database failover within seconds of an outage being detected, and the failover itself can complete in seconds.
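The trigger condition for Fast-Start Failover can be sketched as a small decision function. This is a simplified model, not the Data Guard broker's implementation. It assumes that both the observer and the target standby must have lost contact with the primary for longer than a threshold (the broker property `FastStartFailoverThreshold`), and that the standby's redo lag must be within a limit (the broker property `FastStartFailoverLagLimit`, relevant in maximum performance mode). The `HealthSnapshot` type, the `should_fail_over` function and the 30-second defaults are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class HealthSnapshot:
    """Hypothetical view of a Data Guard configuration as seen by the observer."""
    observer_lost_primary_secs: float  # time since the observer last heard from the primary
    standby_lost_primary_secs: float   # time since the standby last received redo/heartbeat
    standby_apply_lag_secs: float      # how far the standby lags behind the primary
    observer_sees_standby: bool        # observer can still reach the target standby


def should_fail_over(h: HealthSnapshot, threshold_secs: float = 30.0,
                     lag_limit_secs: float = 30.0) -> bool:
    """Simplified fast-start failover decision (illustrative defaults).

    Fail over only if the observer and the standby BOTH agree the primary has
    been unreachable for longer than the threshold, and the standby is close
    enough to the primary that potential data loss stays within the lag limit.
    """
    if not h.observer_sees_standby:
        return False  # the observer cannot coordinate with the target standby
    primary_down = (h.observer_lost_primary_secs > threshold_secs
                    and h.standby_lost_primary_secs > threshold_secs)
    within_lag = h.standby_apply_lag_secs <= lag_limit_secs
    return primary_down and within_lag
```

Requiring agreement from both the observer and the standby is what prevents a network problem between the observer and the primary alone from triggering a failover.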
Please note that OCI DBaaS has not implemented Fast-Start Failover yet, which means that it cannot be configured via the console or the DBaaS APIs.
Please take a look at https://www.doag.org/formes/pubfiles/5256791/2013-DB-Larry_Carpenter-Session_Keynote__Best_Practices_for_Data_Availability_and_Disaster_Protection-Praesentation.pdf (page 38 has more details on Fast-Start Failover).
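On the application tier, a commonly recommended pattern is a single connect descriptor that lists the endpoints of both regions and connects to a role-based service, that is, a service that runs only on whichever database currently holds the PRIMARY role (for example, one created with `srvctl add service ... -role PRIMARY`). After a failover, the middle tier then reconnects to the new primary without any configuration change. The sketch below only builds such a descriptor string. The host names, the service name and the timeout/retry values are placeholders, so verify the parameters against the Oracle Net Services reference for your version.

```python
def build_connect_descriptor(primary_scan: str, standby_scan: str,
                             service_name: str, port: int = 1521) -> str:
    """Build an Oracle Net connect descriptor that spans two regions.

    Each region gets its own ADDRESS_LIST. With FAILOVER=on, the client tries
    the first list and falls through to the second if it cannot connect.
    Only the database currently in the PRIMARY role is assumed to offer
    `service_name`, so connections always land on the current primary.
    """
    def address_list(host: str) -> str:
        return f"(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port})))"

    return ("(DESCRIPTION="
            # Placeholder timeout/retry values; tune them for your RTO.
            "(CONNECT_TIMEOUT=90)(RETRY_COUNT=30)(RETRY_DELAY=3)"
            "(TRANSPORT_CONNECT_TIMEOUT=3)"
            "(FAILOVER=on)"
            + address_list(primary_scan)
            + address_list(standby_scan)
            + f"(CONNECT_DATA=(SERVICE_NAME={service_name})))")
```

For example, `build_connect_descriptor("exa-region1-scan", "exa-region2-scan", "app_rw")` yields one descriptor that the middle tier can use unchanged regardless of which region currently holds the primary role.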
GoldenGate
Using GoldenGate, customers can configure an active-active primary/standby pair in which both the primary and the standby are open in read/write mode. Please note that GoldenGate replication differs from Data Guard replication: a GoldenGate standby is not an exact block-for-block copy of the primary. There may also be restrictions on which object types and data types GoldenGate supports.
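Because both sites accept writes, an active-active GoldenGate setup needs a conflict detection and resolution policy for rows that are updated in both regions at nearly the same time. The sketch below models one common policy, latest timestamp wins with a deterministic tie-break on the site name, applied to a hypothetical in-memory row store. It illustrates the general idea only. It is not GoldenGate's configuration syntax or its built-in resolution routines, and `RowVersion`, `resolve` and `apply_replicated_change` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RowVersion:
    """One version of a row: its values plus when and where it was last changed."""
    values: tuple
    updated_at: float  # commit timestamp (assumes reasonably synchronized clocks)
    site: str          # region that made the change, used as a tie-breaker


def resolve(local: RowVersion, incoming: RowVersion) -> RowVersion:
    """Latest timestamp wins; on an exact tie, the lexically larger site name wins."""
    if (incoming.updated_at, incoming.site) > (local.updated_at, local.site):
        return incoming
    return local


def apply_replicated_change(table: Dict[int, RowVersion], key: int,
                            incoming: RowVersion) -> None:
    """Apply a change replicated from the other region, resolving any conflict."""
    current = table.get(key)
    table[key] = incoming if current is None else resolve(current, incoming)
```

Whatever policy you choose has to be applied identically in both regions so that they converge on the same row values, which is why clock synchronization and a deterministic tie-breaker matter.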
For more details on configuring GoldenGate to maintain a live standby database, and on failover best practices, please refer to:
https://docs.oracle.com/en/middleware/goldengate/core/19.1/admin/configuring-oracle-goldengate-maintain-live-standby-database.html#GUID-6CE0810E-A681-4CCA-9BC8-539E8A364FD3
https://www.oracle.com/technetwork/database/availability/8399-goldengate-dataguard-1888654.pdf
Please note that there is currently no GoldenGate offering in OCI DBaaS, which means there are no console or DBaaS APIs for configuring or setting up a GoldenGate standby.

Related

Oracle Standby Database for Running Queries

We have a physical standby database that is used only for running reports. It is never intended to be used as a primary.
However, a reporting database has very different requirements from an OLTP one.
I want to propose that we convert this standby database from PHYSICAL to LOGICAL and create additional database objects, especially INDEXES, to support reporting.
What are the pros and cons of such an approach?

Replicating ActiveMQ shared database

We have configured ActiveMQ to use JDBC Master Slave. Our data centers follow an active/passive model, so we are thinking of replicating the database used for Master-Slave from the active center to the passive center. We see three tables: activemq_msgs, activemq_lock and activemq_acks. We are not sure whether to replicate one of them or all of them, and, even if we replicate them, whether bringing up Master-Slave on the replicated database will work. This is the first time we are configuring this, and we haven't found much documentation online to get started. Please share your input.
If the "active" broker creates and uses those tables in the database then it stands to reason that the "passive" broker would too once it becomes active. In fact, it stands to reason that any table created by the "active" broker would be used by the "passive" broker once it becomes active. Therefore you should replicate all ActiveMQ related database tables.

how to use db2 read on standby feature

IBM DB2 has a read-on-standby feature for HADR databases. It allows clients to connect to the standby database for read-only queries (with certain restrictions on data types and isolation levels).
I am trying to configure this as a datasource in an application that runs on WebSphere Liberty profile.
Previously, this application used Automatic Client Reroute, which ensures that all connections are directed to the current primary.
However, I would like to configure it so that SELECTs/read-only flows run on the standby database and all other flows run on the primary. This should also keep working after a takeover has been performed on the database (that is, after the standby becomes the primary and vice versa). The purpose is to spread the connections across all available databases.
What is the correct way to do this?
Things I have attempted (assume my servers are dbserver1 and dbserver2):
Created two datasources, one with the DB URL of dbserver1 and the other with dbserver2.
This works until a takeover is performed and the roles of the servers are switched.
Created two datasources, one with the DB URL of dbserver1 (with the Automatic Client Reroute parameters) and the other with dbserver2 only.
With this configuration the application works fine, but if dbserver2 becomes the primary, all queries are executed on it.
Set up HAProxy to identify which server is the primary and which is the standby, and created two datasources pointing to HAProxy.
When a takeover is carried out on the database, connection exceptions start to occur (not just at the time of the takeover, but for some time afterwards).
The appropriate approach is described in the white paper "Enabling continuous access to read on standby databases using Virtual IP addresses", linked from the Db2 documentation for read on standby.
A virtual IP address is assigned to each role, primary and standby, and each one is cataloged as a database alias. WebSphere or other clients connect to either the primary or the standby datasource. When there is a takeover or failover, the virtual IP addresses are reassigned to the appropriate servers, so clients continue to be routed to the desired role, e.g. the standby.
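The key point is that the application binds to a role, never to a physical server. The sketch below models this with a hypothetical routing table that maps two role aliases (`PRIMARY_VIP` and `STANDBY_VIP`, placeholder names) to whichever server currently holds each VIP, plus a function that picks the alias for a read-only or read/write unit of work. In the real setup, cluster software moves the IP addresses. `takeover` here only simulates that reassignment.

```python
from typing import Dict

# Hypothetical role aliases, each cataloged against a virtual IP that follows the role.
PRIMARY_ALIAS = "PRIMARY_VIP"
STANDBY_ALIAS = "STANDBY_VIP"


def alias_for(read_only: bool) -> str:
    """Read-only work goes to the standby alias; everything else to the primary alias."""
    return STANDBY_ALIAS if read_only else PRIMARY_ALIAS


def takeover(vip_owner: Dict[str, str]) -> None:
    """Simulate a HADR takeover: the two VIPs swap servers, the aliases stay the same."""
    vip_owner[PRIMARY_ALIAS], vip_owner[STANDBY_ALIAS] = (
        vip_owner[STANDBY_ALIAS], vip_owner[PRIMARY_ALIAS])


def server_for(vip_owner: Dict[str, str], read_only: bool) -> str:
    """Resolve which physical server a unit of work actually reaches."""
    return vip_owner[alias_for(read_only)]
```

In WebSphere Liberty terms, this corresponds to two datasources, one per alias. The application picks a datasource based on whether the flow is read-only, and neither datasource definition needs to change after a takeover.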

Oracle RAC One Node

In the latest version of Oracle Database (11g Release 2), there's a new option called Real Application Clusters (RAC) One Node. What is "One Node", and how does it differ from regular RAC?
Mogens Norgaard used to joke about "single node RAC", but now it actually exists!
Basically, One Node runs a single instance of a RAC database on one server at a time, rather than an instance on every server in the cluster, and that instance can be relocated online to another node of the cluster. There is an Oracle white paper on the topic. The money quote is:
"Oracle RAC One Node enables:
• Better server consolidation
• Enhanced protection from failures
• Greater flexibility and workload management
• Better online maintenance
In addition it allows customers to virtualize database storage, standardize their database
environment, and, should the need arise, upgrade to a full multi-node Oracle RAC
database without downtime or disruption."

EC2 database server failover strategy

I am planning to deploy my web app to EC2. I have several web server instances, one primary database instance and one failover database instance. I need a strategy to redirect the web servers to the failover database instance's IP when the primary database instance fails.
I was hoping I could use an Elastic IP in my connection strings, but the web servers are not able to access/ping the Elastic IP. I have several brute-force ideas to solve the problem; however, I am trying to find the most elegant solution possible.
I am using .NET and SQL Server throughout. My connection strings are encrypted.
Does anybody have a strategy for failing over a database instance in EC2 using some form of automation or DNS configuration?
Please let me know.
http://alestic.com/2009/06/ec2-elastic-ip-internal explains how to use the Elastic IP's public DNS name, which resolves to the instance's internal IP address when looked up from inside EC2.
I haven't used EC2, but surely you need to either:
(a) put your front end into some custom maintenance mode, that you define, while you switch the IP over; and have the front end perform the required steps to manage potential data integrity and data loss issues related to the previous server going down and the new server coming up when it enters and leaves your custom maintenance mode
OR, for a zero-downtime system:
(b) design the system at the object/relational and transaction levels from the ground up to support zero-downtime failover. It's not something you can quickly bolt on to just any application.
(c) use some database support for automatic failover. I don't know whether SQL Server has failover support that exists for, or is appropriate to, your application. I suggest adding a "sql-server" tag to the question to reach the right audience.
If Elastic IPs don't work (which sounds odd, to say the least; shouldn't you talk to EC2 support about that?), you may have to be able to tell your front end which new database IP to use at the same time as telling it to go from maintenance mode back to normal mode.
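Option (a), combined with the last point, amounts to: enter maintenance mode, then switch the database address and leave maintenance mode in one step. Below is a minimal thread-safe sketch, assuming a hypothetical `DbEndpoint` holder that the web tier consults before opening connections. The class, its method names and the IP addresses are illustrative, not part of any EC2 or .NET API.

```python
import threading


class DbEndpoint:
    """Hypothetical holder for the database address the web tier should use."""

    def __init__(self, address: str) -> None:
        self._lock = threading.Lock()
        self._address = address
        self._maintenance = False

    def current(self) -> str:
        """Return the address to connect to; refuse while a switch is in progress."""
        with self._lock:
            if self._maintenance:
                raise RuntimeError("database failover in progress, try again shortly")
            return self._address

    def begin_maintenance(self) -> None:
        """Stop handing out the old address before the switch starts."""
        with self._lock:
            self._maintenance = True

    def end_maintenance(self, new_address: str) -> None:
        """Switch to the new address and leave maintenance mode in one atomic step."""
        with self._lock:
            self._address = new_address
            self._maintenance = False
```

The application-specific integrity steps from (a) would run between `begin_maintenance()` and `end_maintenance("10.0.1.20")`. Because the address change and the exit from maintenance happen under one lock, no request can see normal mode paired with the old address.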
If you're willing to spend a bit of extra money, take a look at RightScale's tools; they've built custom server images and supporting tools that handle database failover (among many other things). Their material on doing this with MySQL will hopefully show you some of the principles, even though it doesn't use SQL Server.
I always thought the connection string offered this possibility.
This is taken (but not yet tested) from "How to add Failover Partner to a connection string in VB.NET":
If you connect with ADO.NET or the SQL Native Client to a database that is being mirrored, your application can take advantage of the driver's ability to automatically redirect connections when a database mirroring failover occurs. You must specify the initial principal server and database in the connection string, as well as the failover partner server.
Data Source=myServerAddress;Failover Partner=myMirrorServerAddress;Initial Catalog=myDataBase;Integrated Security=True;
There are, of course, many other ways to write the connection string using database mirroring; this is just one example pointing out the failover functionality. You can combine it with the other connection string options available.
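Conceptually, the driver behavior described above works like this: try the principal named in `Data Source`, and if that connection fails, try the `Failover Partner`. The sketch below parses a connection string of the form shown and walks the candidate servers using an injected `connect` callable. It only illustrates the idea. It is not how SqlClient or the SQL Native Client implement it internally, since the real drivers add their own timeout and retry handling. The parser is deliberately naive: it has no quoting/escaping support, and keys are case-sensitive here.

```python
from typing import Callable, Dict, List, TypeVar

Conn = TypeVar("Conn")


def parse_connection_string(cs: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value;' pairs into a dict (no quoting/escaping support)."""
    pairs = (part.split("=", 1) for part in cs.split(";") if "=" in part)
    return {k.strip(): v.strip() for k, v in pairs}


def candidate_servers(cs: str) -> List[str]:
    """Principal first, then the failover partner if one is configured."""
    opts = parse_connection_string(cs)
    servers = [opts["Data Source"]]
    if "Failover Partner" in opts:
        servers.append(opts["Failover Partner"])
    return servers


def connect_with_failover(cs: str, connect: Callable[[str], Conn]) -> Conn:
    """Try each candidate server in order and return the first successful connection."""
    last_error: Exception = ConnectionError("no servers configured")
    for server in candidate_servers(cs):
        try:
            return connect(server)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next candidate
    raise last_error
```

Injecting `connect` keeps the sketch testable without a database: a stub that raises `ConnectionError` for the principal shows the fallback to the mirror.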
To broaden Gareth's answer: cloud management software usually solves this type of problem. RightScale is one option, but you can also try enStratus or Scalr (disclaimer: I work at Scalr). These tools provide failover solutions such as:
Backups: you can schedule automated snapshots of the EBS volume containing the data
Fault-tolerant database: in the event of a failure, a slave is promoted to master, and the mounted storage is switched over if the failed master and the new master are in the same AZ; otherwise, a snapshot of the volume is taken
If you want to build your own solution, you could replicate the process detailed below that we use at Scalr:
Is there a slave in the same AZ? If so, promote it, switch the EBS volumes (which are limited to a single AZ), switch any Elastic IP you might have, and reconfigure replication for the remaining slaves.
If not, is there a fully replicated slave in another AZ? If so, promote it, then do the above.
If there is no slave in the same AZ and no fully replicated slave in another AZ, create a snapshot of the master's volume and use it to create a new volume in an AZ where a slave is running. Then do the above.
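The three cases above can be written as a small planning function. This is a sketch under assumptions: the `Slave` record, its `fully_replicated` flag and the step strings are hypothetical, and the function only returns the ordered list of actions rather than calling any AWS API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slave:
    name: str
    az: str
    fully_replicated: bool


def plan_failover(master_az: str, slaves: List[Slave]) -> List[str]:
    """Return the ordered steps for promoting a new master, following the three cases."""
    # Shared tail of every case ("then do the above").
    common = ["switch EBS volumes", "switch Elastic IP",
              "reconfigure replication of remaining slaves"]

    # Case 1: a slave in the same AZ as the failed master.
    same_az = next((s for s in slaves if s.az == master_az), None)
    if same_az is not None:
        return [f"promote {same_az.name}"] + common

    # Case 2: a fully replicated slave in another AZ.
    replicated = next((s for s in slaves if s.fully_replicated), None)
    if replicated is not None:
        return [f"promote {replicated.name}"] + common

    # Case 3: rebuild from a snapshot of the master's volume next to an existing slave.
    if not slaves:
        raise ValueError("no slave available to promote")
    target = slaves[0]
    return ["snapshot master volume",
            f"create volume from snapshot in {target.az}",
            f"promote {target.name}"] + common
```

Returning a plan instead of executing it makes the decision logic easy to review and test separately from the cloud API calls that carry it out.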

Resources