Materialized view over multiple databases - Oracle

Set-up:
There is one TRANSPORT database and 4 PRODUNIT databases. All 5 DBs are Oracle databases on different machines.
Requirements:
1. A 'unified view' is required in the TRANSPORT DB which retrieves data from a table that is present in all 4 PRODUNIT databases. When there is a query on the TRANSPORT database (with a where clause), the data may be present in any one of the 4 PRODUNIT databases.
2. The query has to be more or less 'real time', i.e. as soon as data is inserted or updated in the table of any of the 4 PRODUNIT databases, it must be IMMEDIATELY available in the TRANSPORT DB.
I searched the net and ended up with materialized views. I have the concerns below before I proceed:
1. Will a 'fast refresh on commit' satisfy requirement 2?
2. The table in the individual PRODUNIT databases will experience frequent DML. I suspect a performance impact on the TRANSPORT DB - am I correct? If yes, how should I proceed?
3. I'm rather wondering whether there is an approach better than a materialized view!

A materialized view that refreshes on commit cannot refer to a remote object so it doesn't do you a lot of good. If you could do a refresh on commit, you could maintain the data in the transport database synchronously. But you can't.
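For illustration, an attempt like the following (table and link names are hypothetical) is rejected by Oracle:
-- Fails: an ON COMMIT materialized view cannot reference a remote table.
create materialized view shipment_mv
refresh fast on commit
as
select * from shipment@produnit1;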
I would seriously question the wisdom of wanting to do synchronous replication in this case. If you could, then the local databases would become unusable if the transport database was down or the network connection was unavailable. You'd incur the cost of a two-phase commit on every transaction. And it would be very easy for one of the produnit databases to block transactions happening on the other databases.
In virtually every instance I've ever come across, you'd be better served with asynchronous replication that keeps the transport database synchronized to within, say, a few seconds of the produnit database. You probably want to look into GoldenGate or Streams for asynchronous replication with relatively short delays.
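If those tools are not available, a rough materialized-view approximation of the same idea is a scheduled fast refresh on a short interval. A sketch, with hypothetical table and link names (a materialized view log is needed on each master table):
-- Refreshes incrementally every 30 seconds; requires a materialized view
-- log on shipment at the PRODUNIT site.
create materialized view shipment_mv
refresh fast
start with sysdate
next sysdate + interval '30' second
as
select * from shipment@produnit1;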

Whether or not you require an MV will depend on the performance between your DBs and the volume of data concerned.
I would start with a normal view, using DB links to select the data from the PRODUNIT databases, but you would need to test this to see what the performance is like. A sketch follows.
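Something along these lines, assuming database links named produnit1 through produnit4 and a table called shipment in each PRODUNIT database (all hypothetical names):
create or replace view unified_view as
select * from shipment@produnit1
union all
select * from shipment@produnit2
union all
select * from shipment@produnit3
union all
select * from shipment@produnit4;
Whether the where clause gets pushed to the remote sites efficiently is exactly the kind of thing the performance test should confirm.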
Given requirement 2, a refresh on commit would probably be the best approach if the performance of a normal view turned out to be poor (although, as noted in the previous answer, an on-commit refresh cannot reference remote tables).

Related

Two database connections vs replication on one database

I have a web service which connects to an Oracle database. On the database side I have two databases: from the first I need to select some information, and in the second I need to perform some operations.
My question is which is better: connect only to the second database and replicate the data from the first one via a DBMS scheduler job that runs x times a day to refresh the data, or create two data sources on the Java side and, after selecting some data from the first database, connect to the second one to perform the operations?
From my point of view, I'd access only the "second" database (in which you do those operations) and let it acquire data it needs from the "first" database via a database link.
That can be done directly, such as
select some_columns from db1_table@db_link where ...
or, if that turns out to be way too slow and difficult to tune, create a materialized view in the second database which would then be refreshed using one of the available options (a scheduled refresh might be one of the choices), as sketched below.
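A minimal sketch, reusing the hypothetical db1_table and db_link names from above, with an arbitrary hourly interval:
-- Complete refresh once an hour; a fast refresh would additionally need a
-- materialized view log on db1_table at the source site.
create materialized view db1_table_mv
refresh complete
start with sysdate
next sysdate + 1/24
as
select some_columns from db1_table@db_link;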
As this is a primarily opinion-based answer, I presume that you'll hear other options from someone else. Further discussion will raise the most appropriate solution to the top.

How much data/networking usage does an Oracle Client use while running queries across schemas?

When I run a query to copy data between schemas, is all the SQL performed on the server end, or is the data copied to the local application and then pushed back out to the DB?
The two tables sit in the same DB, but the DB is accessed through a VPN. Would it change if it were across databases?
For instance (running in Toad Data Point):
create table schema2.target_table as
select sum(row1), row2
from schema1.source_table
group by row2;
I ask because I'm getting quotes for a Virtual Machine in Azure Cloud and want to make sure that I'm not going to break the bank on data costs.
The processing of SQL statements on the same database usually takes place entirely on the server and generates little network traffic.
In Oracle, schemas are a logical object. There is no physical barrier between them. In a SQL query using two tables it makes no difference if those tables are in the same schema or in different schemas (other than privilege issues).
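For example (hypothetical names), after a grant the cross-schema query behaves exactly like a single-schema one; the work is done on the server and only the results travel to the client:
grant select on schema1.source_table to schema2;
-- Run as schema2; processed server-side like any single-schema query:
select count(*) from schema1.source_table;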
Some exceptions:
Real Application Clusters (RAC) - RAC may share a huge amount of data between the nodes. For example, if the table was cached on one node and the processing happened on another, it could send all the table data through the network. (I'm not sure how this works on the cloud though. Normally the inter-node traffic is done with a separate, dedicated network connection.)
Database links - It should be obvious if your application is using database links though.
Oracle Reports and Forms(?) - A few rare tools have client-side PL/SQL processing. Possibly those programs might send data to the client for processing. But I still doubt it would do something crazy like send an entire table to the client to be sorted, and then return the results to the server.
Backups/archive logs - I assume all the data will be backed up. I'm not sure how that's counted, but possibly that means all data written will also be counted as network traffic eventually.
The queries below are examples of different ways to check the network traffic being generated.
--SQL*Net bytes sent for a session.
select *
from gv$sesstat
join v$statname
on gv$sesstat.statistic# = v$statname.statistic#
--You probably also want to filter for a specific INST_ID and SID here.
where lower(display_name) like '%sql*net%';
--SQL*Net bytes sent for the entire system.
select *
from gv$sysstat
where lower(name) like '%sql*net%'
order by value desc;

ATTACH - Is there a price to pay?

When two databases are attached is there a hit in performance compared to having a separate connection to each? Also, if I was writing data to one of the attached databases would both databases be locked or just the one being written to?
The reason I ask is it just seems simpler to me to have one connection that I ATTACH / DETACH each database to / from as it becomes needed / redundant rather than opening and closing connections to each of them all the time. My app doesn't have any threads.
Transactions are atomic over all attached databases; this requires creating a separate master journal in addition to the normal rollback journals of the individual databases.
With attached databases, table names (and PRAGMA statements) might need to be qualified with the database name.
For these reasons, ATTACH is usually used only when you actually need to access multiple databases in the same query.
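A minimal sketch of that usage (file and table names are hypothetical):
attach database 'other.db' as other;
-- Table names from the attached database must be qualified:
select * from other.some_table;
detach database other;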

Data Replication in Oracle

I have a table on the master site which only allows insert operations. I want to replicate the rows which were inserted recently, so I need something which can track the last record replicated to the local site and carry on the replication from that point.
I have tried Oracle materialized views but am still confused about whether to use a fast refresh or a complete refresh. I need all the newly inserted rows replicated in one transaction.
Is there a better approach? Any help would be highly appreciated.
Thanks.
A fast refresh would copy incremental changes over the network but requires that a materialized view log be created on the master site on the source table. That adds some overhead to the inserts happening on the master table but would generally make the refresh more efficient.
A complete refresh would copy every row over the network every time the materialized view is refreshed. That is likely to be less efficient from a refresh perspective but there will be no overhead to inserts on the source table and the master site does not need to create a materialized view log.
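A sketch of the fast-refresh variant, assuming a master table named orders and a database link named master_link (both hypothetical names):
-- On the master site:
create materialized view log on orders;
-- On the local site:
create materialized view orders_mv
refresh fast on demand
as
select * from orders@master_link;
-- Pull all new rows in a single transaction (SQL*Plus shorthand;
-- atomic_refresh defaults to true):
exec dbms_mview.refresh('ORDERS_MV', method => 'F');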
Oracle provides a host of data replication technologies-- materialized views are the oldest and probably the least efficient but are relatively trivial to set up. Streams is a newer technology that has much lower overhead but is quite a bit more complex to set up. Golden Gate is the preferred replication technology today but that has extra licensing costs.

Transaction Performance Impact of Materialized View Logs

I have been researching using materialized views for data aggregation and reporting purposes for a company that is largely centered around transactions (using an Oracle db). The current reporting system is dependent upon a series of views that obscure a lot of the complex data logic of the application. These views place a heavy burden on the system when they are called.
We are interested in using a "fast refresh" for incremental updates to perform some of the complex query logic prior to its use in reporting; however, there is a concern within the organization that the materialized view logs (which are required for this fast refresh) will have an impact on our current transaction performance in the database. This performance is essential to our organization, so there is great fear of any change.
Here is an example of the type of materialized view log we would need to implement:
create materialized view log on transaction
with rowid, sequence (transaction_id, account_id, order_id, currency_id, price, transaction_date, payment_processor_id)
including new values;
We would not be using the "on commit" clause but rather the "on demand" clause when creating the view, as we understand that "on commit" would have a performance impact.
Will implementing this type of logging affect database transaction performance? I imagine that it must slightly affect performance as there is an additional write procedure (to the log) that is wrapped in the commit, but I cannot find any reference to this in the Oracle documentation. Any literature or advice on this subject would be greatly appreciated.
Thanks for your help!
Yes, there will be an impact. The materialized log needs to be maintained synchronously so the transactions will need to insert a new row into the materialized view log for every row that is modified in the base table. How great an impact depends heavily on the system. If your system is I/O bound and you've optimized it so that physically writing the changes to the base table is a significant fraction of the wait time, the impact will be much greater than if your system is CPU bound and most of your wait time is spent reading data or performing computations.
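That extra write is easy to observe: the log rows land in an MLOG$_ table created alongside the base table, within the same transaction as the DML. For the transaction table above:
-- Uncommitted DML on the base table already shows up here for the session:
select count(*) from mlog$_transaction;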
If you are really concerned about the performance of the OLTP system, it would make sense to offload reporting to a different database on a different server. You can replicate the data to the reporting server using Streams (or GoldenGate if you can afford the additional licensing) which will have less of an impact on the source than materialized views because the redo information can be read asynchronously (and can be read on the reporting server rather than putting that workload on the production server). You could then define materialized views on the reporting server where they won't have any impact on the OLTP server. Or you could create a logical standby database as your reporting server and create the materialized views there. Either way, moving the reporting workload off the production server and reading the redo data asynchronously will protect the performance of the production server.
