Look for best approach replicating oracle table in S3 bucket - oracle

My problem:
I need a data pipeline created from my organization’s Oracle DB (Oracle Cloud Infrastructure) to an AWS S3 bucket. Ideally, I would love for there to be some mechanism for oracle to push new data that has entered the database to be pushed to an S3 bucket as it is added (in whatever format).
Question:
Is this possible with Oracle native, specifically Oracle Cloud Infrastructure?
Or would is there a better solution you have seen?
Note:
I have seen AWS has the Data Sync product, this seems like it could facilitate with this problem, however I am not sure if it is suitable for this specific problem.

An S3 bucket is object storage; it can only hold complete files. You cannot open and update an existing file like you would in a normal file system, even just to add new rows. You will need to construct your whole file outside of Oracle and then push it to S3 with some other mechanism.
You may want to consider the following steps:
Export your data from Oracle Cloud into Oracle Object Storage (similar to S3) using the Oracle Cloud's integration with their object storage. (https://blogs.oracle.com/datawarehousing/the-simplest-guide-to-exporting-data-from-autonomous-database-directly-to-object-storage)
THEN:
Let the customer access the Oracle Object Store as they normally would access S3, using Oracle's Amazon S3 Compatibility API. (https://docs.oracle.com/en-us/iaas/Content/Object/Tasks/s3compatibleapi.htm)
OR:
Use an externally driven script to download the data - either from Oracle Object Store or directly from the database - to a server, then push the file up to Amazon S3. The server could be local, or hosted in either Oracle OCI or in AWS, as long as it has access to both object stores. (https://blogs.oracle.com/linux/using-rclone-to-copy-data-in-and-out-of-oracle-cloud-object-storage)
OR:
You may be able to use AWS Data Sync to move data directly from Oracle Object Storage to S3, depending on networking configuration requirements. (https://aws.amazon.com/blogs/aws/aws-datasync-adds-support-for-on-premises-object-storage/)

Related

Copying JSON data from CosmosDB to Snowflake through ADF

**Hello,
I am trying to copy data from Cosmos DB to Snowflake through Azure Data Factory. But I get the error- "Direct copying data to Snowflake is only supported when source dataset is DelimitedText, Parquet, JSON with Azure Blob Storage or Amazon S3 linked service, for other dataset or linked service, please enable staging". Would that imply that I need to create a linked service with blob storage? What URL and SAS token should I give? Do I need to move everything to Blob and then move forward with staging?
Any help is appreciated. Thank You very much.**
Try it with a data flow activity instead of a copy activity

Load .DMP oracle file from local machine to RDS Oracle

I am wandering around and didn't get an answer to this question.
is there any way to import Oracle .dmp file stored on a local machine to RDS Oracle?
If Yes how to do it?
Else why it's not possible to do so as other databases gave the flexibility to do these kinds of imports through more than one way.
You can't do that. When you import data with Oracle Data Pump, you must transfer the dump file that contains the data from the source database to the target database**. You can transfer the dump file using an Amazon S3 bucket or by using a database link between the two databases.
If your local machine contains a database and you have a network connection between your on-premises database and your Oracle RDS , then you can use NETWORK_LINK, although I don't recommend it. It is much better to tranfer the file using S3 bucket.
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Oracle.Procedural.Importing.html

Hybrid database synchronization without data sync agent

I'm working on a project to create hybrid SQL database per tenant model. While we were able to replicate the on-premise databases to databases in Azure. I'm not able to find a way to continuously sync both on-prem DB and cloud DB (We cannot use data sync agent or transaction replication). We are looking for any other alternatives that we can try to achieve our purpose.
Also, how does synchronization works when the internet is down and cannot sync with cloud?
Sorry for my ignorance since I'm new to this field.
Thanks.
why can't you use data sync agent? It can be installed in other machine that has access to the database and internet.

AWS DMS - incremental migration from Oracle to Redshift

I am new to AWS DMS service. Plans are to migrate on-prem Oracle to Redshift. Before going into production environment, currently trying out a test Oracle RDS in AWS which is a small subset of actual database as source. So far have been successful in the bulk load and incremental migration from RDS to Redshift.
When it comes to on-prem oracle , particularly for the incremental load
1) As per document : http://docs.aws.amazon.com/dms/latest/sbs/CHAP_On-PremOracle2Aurora.Steps.ConfigureOracle.html, the on-prem needs to be enabled with supplemental logging. Plans are to use the following two commands.
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
The production database has multiple logging locations. Is there any other log settings other than the above two that I should be looking into for DMS to pick up multiple log locations?
2) In the same link given, point 4 says 'Create or configure a database account to be used by AWS DMS.'
Where should I create this user? on-prem oracle or AWS?
How do I configure DMS to use this user?
You need to read this documentation:
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.Oracle.html
For your second question; You need to create a user in the Oracle source database, the section 'Working with a Self-Managed Oracle Database as a Source for AWS DMS' tells you all of the grants you need to give.
For your first question, if you look at the SQL Server documentation;
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.SQLServer.html
It specifies the limitation of; 'SQL Server backup to multiple disks isn't supported. If the backup is defined to write the database backup to multiple files over different disks, AWS DMS can't read the data and the AWS DMS task fails.'
I can't see a similar stipulation in the oracle documentation, first link, I would hazard a guess that DMS is able, in the case of oracle, to determine and cope with multiple logging locations from a configuration value inside the database.

Backup Azure blob storage in line with SQL Azure DB

It is recommended that we store document information in blob storage. In our case the blob storage is related to the SQL Azure data, is the facility available to back up the blob storage in sync with the SQL Azure data ? What I don't want to see is a point in time restore of the SQL Azure data only to find we don't have the same snapshot of the blob data at that time :(
Does anyone know what is available
Interesting issue you have to solve. But there is no automated way to keep in sync BLOB and Azure SQL Database Data. You have to do manage this yourself. And here is not just about blob snapshots. How about your updated DB record refers to a new blob. What happens with the old one ?! This is all business rules to apply at application level. And you have to question yourself to what degree you want that backup of Blobs.
Here is an interesting blog post on Azure SQL and Storage backup. But again - there is no service that will keep for you in sync data between SQL DB and Azure Storage.

Resources