The best way to track data changes in Oracle

As the title says, what's the best way to track data changes in Oracle? I just want to know which rows are being updated, deleted, or inserted.
At first I thought about triggers, but I would need to write triggers on each table and record the affected ROWIDs in my change table, which is not good. Then I searched Google and learned about two new concepts: materialized view logs and Change Data Capture.
A materialized view log looks good to me: I can compare it with the original table to get the changed records, and even the changed fields. I think this is much the same as creating a copy of the original table (but I don't know what the difference is).
The Change Data Capture component is complicated for me :), so I don't want to spend time researching it.
Does anybody have experience with the best way to track data changes in Oracle?

You'll want to have a look at the AUDIT statement. It gathers all auditing records in the SYS.AUD$ table.
Example:
AUDIT insert, update, delete ON t BY ACCESS
Regards,
Rob.

You might want to take a look at GoldenGate. It makes capturing changes a snap, at a price, but with good performance and quick setup.
If performance is no issue, triggers and audit could be a valid solution.
If performance is an issue and GoldenGate is considered too expensive, you could also use LogMiner or Change Data Capture. Given this choice, my preference would go for CDC.
As you see, there are quite a few options, near real-time and offline.
Coding a solution by hand also has a price, so GoldenGate is worth investigating.

Oracle does this for you via redo logs; it depends on what you're trying to do with this info. I'm assuming your need is replication (track changes on a source instance and propagate them to one or more target instances).
If that's the case, you may consider Oracle Streams (there are other options such as Advanced Replication, but you'll need to consider your needs):
From Oracle:
When you use Streams, replication of a DML or DDL change typically includes three steps:
1. A capture process or an application creates one or more logical change records (LCRs) and enqueues them into a queue. An LCR is a message with a specific format that describes a database change. A capture process reformats changes captured from the redo log into LCRs, and applications can construct LCRs. If the change was a data manipulation language (DML) operation, then each LCR encapsulates a row change resulting from the DML operation to a shared table at the source database. If the change was a data definition language (DDL) operation, then an LCR encapsulates the DDL change that was made to a shared database object at a source database.
2. A propagation propagates the staged LCR to another queue, which usually resides in a database that is separate from the database where the LCR was captured. An LCR can be propagated to a number of queues before it arrives at a destination database.
3. At a destination database, an apply process consumes the change by applying the LCR to the shared database object. An apply process can dequeue the LCR and apply it directly, or an apply process can dequeue the LCR and send it to an apply handler. In a Streams replication environment, an apply handler performs customized processing of the LCR and then applies the LCR to the shared database object.
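To make the three steps in the quote concrete, here is a toy model in plain Python. It is only a conceptual sketch and does not use any Oracle API: the `LCR` class, the tuple format of the "redo" entries, and the function names `capture`, `propagate`, `apply_changes` and `replicate` are all made up for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class LCR:
    """Toy logical change record describing one row change (hypothetical shape)."""
    table: str
    op: str       # "INSERT", "UPDATE" or "DELETE"
    key: int
    values: dict


def capture(redo_entries, queue):
    """Step 1: reformat raw change entries into LCRs and enqueue them."""
    for table, op, key, values in redo_entries:
        queue.append(LCR(table, op, key, dict(values)))


def propagate(source_queue, dest_queue):
    """Step 2: move the staged LCRs to another queue, preserving order."""
    while source_queue:
        dest_queue.append(source_queue.popleft())


def apply_changes(dest_queue, tables):
    """Step 3: dequeue each LCR and apply it to the destination copy."""
    while dest_queue:
        lcr = dest_queue.popleft()
        rows = tables.setdefault(lcr.table, {})
        if lcr.op == "DELETE":
            rows.pop(lcr.key, None)
        elif lcr.op == "INSERT":
            rows[lcr.key] = dict(lcr.values)
        else:  # UPDATE: merge the changed columns into the existing row
            rows.setdefault(lcr.key, {}).update(lcr.values)


def replicate(redo_entries):
    """Run capture, propagate and apply once; return the destination tables."""
    source_q, dest_q, target = deque(), deque(), {}
    capture(redo_entries, source_q)
    propagate(source_q, dest_q)
    apply_changes(dest_q, target)
    return target
```

Calling `replicate` with a list of `(table, op, key, values)` tuples returns a dictionary that mirrors the source rows after all changes have been applied in order.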

Related

Dynamically List contents of a table in database that continously updates

It's kind of a real-world problem, and I believe a solution exists, but I couldn't find one.
We have a database called Transactions that contains tables such as Positions, Securities, Bogies, Accounts, Commodities and so on, which are updated continuously, every second, whenever a new transaction happens. For the time being, we have replicated the master database Transactions to a new database named TRN, on which we do all the querying and updating.
We want a sort of monitoring system (like the htop process viewer in Linux) for the database that dynamically lists updated rows in its tables at any time.
TL;DR: Is there any way to get a continuously updating list of rows in any table in the database?
Currently we are working with Sybase and Oracle on Linux (Ubuntu), but we would like generic answers that apply to most platforms and DBMSs (including MySQL), and any tools, utilities or scripts that can do this, so that we can easily migrate to other platforms or DBMSs in the future.
To list updated rows, you conceptually need either of two things:
1. The updating statement's effect on the table.
2. A previous version of the table to compare with.
How you get them and in what form is completely up to you.
The 1st option allows you to list updates with statement granularity, while the 2nd is more suitable for time-based granularity.
Some options off the top of my head:
- Write to a temporary table
- Add a field with a transaction id/timestamp
- Make clones of the table regularly
AFAICS, Oracle doesn't have built-in facilities to get the affected rows, only their count.
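The "previous version of the table to compare with" idea can be sketched independently of any DBMS. The snippet below assumes each snapshot has already been loaded into a dictionary keyed by primary key; the function name `diff_snapshots` and the dictionary layout are assumptions made for this example, not part of any database API.

```python
def diff_snapshots(old, new):
    """Compare two table snapshots given as {primary_key: row} dictionaries.

    Returns three dictionaries: inserted rows, updated rows as
    (old_row, new_row) pairs, and deleted rows.
    """
    inserted = {k: new[k] for k in new.keys() - old.keys()}
    deleted = {k: old[k] for k in old.keys() - new.keys()}
    updated = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return inserted, updated, deleted
```

Taking a snapshot on a timer and diffing it against the previous one gives the time-based granularity mentioned above. The cost is a full read of the table per interval, so this suits small or slowly changing tables better than busy ones.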
Not a lot of details in the question so not sure how much of this will be of use ...
'Sybase' is mentioned but nothing is said about which Sybase RDBMS product (ASE? SQLAnywhere? IQ? Advantage?)
by 'replicated master database transaction' I'm assuming this means the primary database is being replicated (as opposed to the database called 'master' in a Sybase ASE instance)
no mention is made of what products/tools are being used to 'replicate' the transactions to the 'new database' named 'TRN'
So, assuming part of your environment includes Sybase(SAP) ASE ...
MDA tables can be used to capture counters of DML operations (eg, insert/update/delete) over a given time period
MDA tables can capture some SQL text, though the volume/quality could be in doubt if a) MDA is not configured properly and/or b) the DML operations are wrapped up in prepared statements, stored procs and triggers
auditing could be enabled to capture some commands but again, volume/quality could be in doubt based on how the DML commands are executed
also keep in mind that there's a performance hit for using MDA tables and/or auditing, with the level of performance degradation based on individual config settings and the volume of DML activity
Assuming you're using the Sybase(SAP) Replication Server product, those replicated transactions sent through repserver likely have all the info you need to know which tables/rows are being affected; so you have a couple options:
route a copy of the transactions to another database where you can capture the transactions in whatever format you need [you'll need to design the database and/or any customized repserver function strings]
consider using the Sybase(SAP) Real Time Data Streaming product (yeah, additional li$ence is required) which is specifically designed for scenarios like yours, ie, pull transactions off the repserver queues and format for use in downstream systems (eg, tibco/mqs, custom apps)
I'm not aware of any 'generic' products that work, out of the box, as per your (limited) requirements. You're likely looking at some different solutions and/or customized code to cover your particular situation.

Addressing CRUD "tables" in event sourcing

I'm starting down an ES journey and want to know whether traditional support tables should be stored in the event log or handled differently. These tables would typically have a CRUD page. In other words, would it be common to have 2 approaches in the same application, one for support tables and one for transactional data?
A support table would be like "Account" in an accounting application or "Product Type" or the actual "Product" table in an ERP application (I'm not writing an ERP application - that's an example of the type of table I'm talking about).
If we store CRUD-type data in the event log, then we might have events:
ProductCreated
ProductUpdated
ProductDeleted (which would just mark it as deleted)
Then, do we attempt to find out what changed (in ProductUpdated event) and just store the change and replay to get the latest image of the Product?
Mostly, I'm after what approach to use for CRUD tables - traditional or store in the event log? Additional information would be great!
Suppose you start purely with an event log, including for events like ProductCreated, etc., and no other data store. What happens then is that every time your application starts up, it has to replay all the events in the log to build its current state.
Now, suppose you create a traditional SQL table to store the current state of your app (say a products table) and the ID of the last event that was processed to get to that state (say a last_event table). What happens then is every time your app starts up, it has to replay only the events with higher IDs than the stored ID and process those to build its new state.
On the flip side, your app now has to be careful to keep these two states synchronised. If you need concurrency, you'll need to be careful to do only atomic operations on your SQL tables, but that should be reasonably easy with transactions.
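A minimal sketch of that checkpointed replay, using SQLite from the Python standard library as a stand-in for whatever SQL database you use. The `products` and `last_event` tables follow the names suggested above; the `events` table layout, the `deleted` flag column, and the helper names `open_store` and `catch_up` are assumptions for illustration.

```python
import sqlite3


def open_store():
    """Create an in-memory database with an event log, a read table and a checkpoint."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE events (
            id INTEGER PRIMARY KEY, type TEXT, product_id INTEGER, name TEXT);
        CREATE TABLE products (
            id INTEGER PRIMARY KEY, name TEXT, deleted INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE last_event (id INTEGER NOT NULL);
        INSERT INTO last_event (id) VALUES (0);
    """)
    return conn


def catch_up(conn):
    """Replay only the events newer than the checkpoint; return how many were applied."""
    # "with conn" commits on success and rolls back on error, so the
    # products table and the checkpoint always move together.
    with conn:
        (last_id,) = conn.execute("SELECT id FROM last_event").fetchone()
        rows = conn.execute(
            "SELECT id, type, product_id, name FROM events "
            "WHERE id > ? ORDER BY id", (last_id,)).fetchall()
        for event_id, etype, product_id, name in rows:
            if etype == "ProductCreated":
                conn.execute("INSERT INTO products (id, name) VALUES (?, ?)",
                             (product_id, name))
            elif etype == "ProductUpdated":
                conn.execute("UPDATE products SET name = ? WHERE id = ?",
                             (name, product_id))
            elif etype == "ProductDeleted":
                # Mark as deleted rather than removing the row.
                conn.execute("UPDATE products SET deleted = 1 WHERE id = ?",
                             (product_id,))
            last_id = event_id
        conn.execute("UPDATE last_event SET id = ?", (last_id,))
    return len(rows)
```

Calling `catch_up` at startup applies only the backlog, and calling it again when nothing new has been logged applies zero events.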
Your support tables are just a read-model/projection of the event stream. In general you don't create those support models in case you need them. You create a read-model only if you use it somewhere in the UI.
Anyway, one important benefit behind Event sourcing is that you won't need to use join in your queries. That is, you create a table for each read-model that contains all the data it needs - full denormalisation. You keep that table super-optimised for the query.

How to implement an ETL Process

I would like to implement synchronization between a source SQL-based database and a target triplestore.
However, for the sake of simplicity, let's say there are simply 2 databases. I wonder what approaches to use to have every change in the source database replicated in the target database. More specifically, I would like each row change in the source database to be visible to a process that reads the changes and populates the target database accordingly, applying some transformation in the middle.
I have seen suggestions around the notification mechanisms that may be available in the database, or building tables such that changes can be tracked (meaning doing it manually) and having the process poll them at different intervals, or the use of logs (change data capture, etc.).
I'm seriously puzzled by all of this. I wonder if anyone could give some guidance and explanation about the different approaches with respect to my objective. Meaning: names of methods and where to look.
My organization mostly uses Postgres and Oracle databases.
I have to take relational data and transform it into RDF so as to store it in a triplestore, and keep that triplestore constantly synchronized with the data in the SQL store.
Please,
Many thanks
PS:
A clarification of the difference between ETL and replication techniques such as Change Data Capture, with respect to my overall objective, would be appreciated.
Again, I need to make sense of the subject and know what the methods are, so I can start digging further myself. So far I have understood that CDC is the new way to go.
Assuming you can't use replication and you need to use some kind of ETL process to actually extract, transform and load all changes to the destination database, you could use insert, update and delete triggers to fill a (manually created) audit table. Columns GeneratedId, TableName, RowId, Action (insert, update, delete) and a boolean value to determine if your ETL process has already processed this change. Use that table to get all the changed rows in your database and transport them to the destination database. Then delete the processed rows from the audit table so that it doesn't grow too big. How often you have to run the ETL process depends on the amount of changes occurring in the source database.
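A rough sketch of that trigger-plus-audit-table approach, again using SQLite from the Python standard library so it runs anywhere. Trigger syntax differs in Oracle and Postgres, so treat the DDL as illustrative only. The `customers` table, the trigger names and the `run_etl` helper are hypothetical; the audit columns follow the list above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE audit (
    generated_id INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name   TEXT NOT NULL,
    row_id       INTEGER NOT NULL,
    action       TEXT NOT NULL,              -- insert, update or delete
    processed    INTEGER NOT NULL DEFAULT 0  -- boolean flag for the ETL job
);
CREATE TRIGGER customers_ins AFTER INSERT ON customers BEGIN
    INSERT INTO audit (table_name, row_id, action)
    VALUES ('customers', NEW.id, 'insert');
END;
CREATE TRIGGER customers_upd AFTER UPDATE ON customers BEGIN
    INSERT INTO audit (table_name, row_id, action)
    VALUES ('customers', NEW.id, 'update');
END;
CREATE TRIGGER customers_del AFTER DELETE ON customers BEGIN
    INSERT INTO audit (table_name, row_id, action)
    VALUES ('customers', OLD.id, 'delete');
END;
"""


def run_etl(conn):
    """Return pending changes in order, then purge them from the audit table."""
    with conn:  # read and purge in a single transaction
        changes = conn.execute(
            "SELECT generated_id, table_name, row_id, action FROM audit "
            "WHERE processed = 0 ORDER BY generated_id").fetchall()
        if changes:
            # Delete only what was read, so rows added later are kept.
            conn.execute("DELETE FROM audit WHERE generated_id <= ?",
                         (changes[-1][0],))
    return [(table, row_id, action) for _, table, row_id, action in changes]
```

In a real job you would probably transform and load each change into the target before purging, and set `processed` first if the load can fail partway through; this sketch simply returns the changes to the caller.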

Use of Oracle Advanced Queuing to receive changes of database table rows

I am confused about Oracle Advanced Queuing. It looks like a way to asynchronously send database notifications to the application layer.
But looking at the details, there is a queue to be set up, alongside a table, and there are explicit calls to publish messages that will afterwards be pushed to the application layer.
Does this work automatically with table row modifications?
If a particular table changes (no matter who changed it or how), I want to receive a notification about it in the form of a binary object that represents the changed row.
(Note: I know about Oracle Continuous Query Notification, CQN, but I am not satisfied with its performance. My goal is to see whether Oracle Advanced Queuing can achieve something similar with better speed.)
Thanks in advance.

Compare and Contrast Change Data Capture and Database Change Notification

Oracle has two seemingly competing technologies. CDC and DCN.
What are the strengths of each?
When would you use one and not the other?
In general, you would use DCN to notify a client application that the client application needs to clear/ update the application's cache. You would use CDC for ETL processing.
DCN would generally be preferable when you have an OLTP application that needs to be notified immediately about data changes in the database. Since the goal here is to minimize the number of network round-trips and the number of database hits, you'd generally want the application to use DCN for queries whose results are mostly static. If a large fraction of the query results is changing regularly, you may be better off just refreshing the application's cache on a set frequency rather than running queries constantly to get the changed data (DCN does not contain the changed data, just the ROWID of the row(s) that changed). If the application goes down, I believe DCN allows changes to be lost.
CDC would generally be preferable when you have a DSS application that needs to periodically pull over all the data that changed in a number of tables. CDC can guarantee that the subscriber has received every change to the underlying table(s), which can be important if you are trying to replicate changes to a different database. CDC allows the subscriber to pull the changes at its convenience rather than trying to notify the subscriber that there are changes, so you'd definitely want CDC if you wanted the subscriber to process new data every hour or every day rather than in near real time. (note: DCN also has a guaranteed delivery mode, see comments below. --Mark Harrison)
CDC seems to be much more complex to set up than DCN.
I mean, to set up DCN I wrap a select in a start and end DCN block and then write a procedure to be called with a collection of changes. That's it.
CDC requires publishers and subscribers and, anyway, seems like more work.
