I am working on a data migration plan to move an Oracle RAC database to Amazon Aurora on AWS.
The current in-house production database is a 10 TB, 8-node Oracle RAC cluster with a single-node standby in a DR site. The database has 2 main schemas, comprising 500 tables, 300 packages and triggers, 20 partitioned tables, and 5,000 concurrent sessions of which 100 are active at any given time, with an IOPS requirement of 50K read and 30K write. The development database is 1/10th of the production capacity.
I did some research and found that DMS (Database Migration Service) and SCT (Schema Conversion Tool) handle the migration process. So do we need to work on any of the individual specifications mentioned in the task, or will DMS and SCT take care of the whole migration?
The tools you mention (DMS and SCT) are powerful and useful, but will they take care of the whole migration process? Very unlikely unless you have a very simple data model.
There will likely be some objects and code that cannot be converted automatically and will need manual input/development from you. Migrating a database is usually not a simple thing and even with tools like SCT and DMS you need to be prepared to plan, review and test.
SCT can produce an assessment report for you. I would start here. Your question is next to impossible to answer on a forum like this without intricate knowledge of the system you are migrating.
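To make that concrete: even when DMS does the data movement, you still have to design the endpoints, migration type, and table-mapping rules yourself. Below is a minimal boto3 sketch of creating a replication task; the ARNs, identifiers, schema name, and selection rule are hypothetical placeholders, and a real task would also need LOB settings, task settings, and validation tuned for a database of this size.

```python
import json
import boto3

# Hedged sketch only: DMS still needs you to supply endpoints, a migration type,
# and table-mapping/transformation rules for the schemas you want to move.
dms = boto3.client("dms", region_name="us-east-1")

table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-app-schema",
            "object-locator": {"schema-name": "APP_SCHEMA", "table-name": "%"},  # hypothetical schema
            "rule-action": "include",
        }
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="oracle-rac-to-aurora-task",
    SourceEndpointArn="arn:aws:dms:...:endpoint:ORACLE_SOURCE",   # placeholder ARN
    TargetEndpointArn="arn:aws:dms:...:endpoint:AURORA_TARGET",   # placeholder ARN
    ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE",        # placeholder ARN
    MigrationType="full-load-and-cdc",  # initial load plus ongoing change capture
    TableMappings=json.dumps(table_mappings),
)
```

None of this converts your 300 packages and triggers; that is where SCT's assessment report and manual PL/SQL rework come in.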
(To be clear, I do not write code myself.)
I am looking for a solution that would allow replicating data between a master Oracle 11g DB and a new PostgreSQL DB. These are two different applications, but they need to exchange data in real time. There are some trigger-based approaches, but there is a big concern that they could affect the performance of the master DB, which we cannot allow.
I have also come across some log-based solutions, like HVR, but the cost is far too high for the 500 MB of data to be replicated.
Has anyone had a similar issue and found a way to deal with it?
Any tips and help will be really appreciated, as I am quite short on time.
Oracle archive logs have a different format than Postgres write-ahead logs. Despite the general conceptual similarity of Oracle Streams, SQL Server log shipping, Postgres streaming replication, etc., transaction logs <> redo logs <> WAL, and you cannot take one vendor's logs and roll them forward on another vendor's engine.
Moreover, you cannot roll logs across different versions of the same DB vendor because of differences in binary format.
Something akin to logical replication you can get with Postgres logical decoding, Oracle GoldenGate, heterogeneous database replication, or AWS DMS. But none of the above gives you "log-based replication" between different DB vendors.
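To illustrate the Postgres side of that, here is a minimal logical decoding sketch via psycopg2, using the built-in test_decoding output plugin; the connection string and slot name are assumptions, the server must have wal_level=logical, and this only shows how changes are read out of Postgres, not how they would be captured from Oracle.

```python
import psycopg2

# Minimal sketch, assuming a Postgres instance configured with wal_level=logical.
conn = psycopg2.connect("dbname=appdb user=repl_user host=localhost")  # hypothetical connection
conn.autocommit = True
cur = conn.cursor()

# Create a logical replication slot using the built-in test_decoding plugin.
cur.execute("SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding');")

# ... run some INSERT/UPDATE/DELETE statements against appdb ...

# Read the decoded changes from the slot as human-readable text.
cur.execute("SELECT lsn, xid, data FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);")
for lsn, xid, data in cur.fetchall():
    print(lsn, xid, data)

# Drop the slot so it does not force the server to retain WAL indefinitely.
cur.execute("SELECT pg_drop_replication_slot('demo_slot');")
```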
You can use a product that specializes in change data capture based data integration. Striim, GoldenGate, Attunity allow you to do CDC from Oracle. Striim also allows you to do CDC from PostgreSQL and write to Oracle as well.
https://striim.com
https://attunity.com
I am new to ETL migration. I have worked with Talend, but have not yet faced the task of migrating a large ETL project from one tool to another (IBM Data Manager to Informatica PowerCenter or Informatica Developer).
I am looking for general guidelines on migrating jobs from one tool to another, and of course for my specific case.
To be more clear:
The database sources and targets will stay the same; what I have to migrate is the ETL part itself.
The approach will be a parallel run, as suggested in this blog post:
Parallel Run
In my case I do not have to migrate the whole DWH, only the ETL, as the old software will become legacy and the new one is from another vendor (luckily both of them can export XML).
I am looking for a practical approach to the parallel run. I have been advised to copy the source and target tables into the original database schema, but that does not look like the best way to go to me (it is not even practical when a schema has many tables).
The DWH I am working on has several DB instances in Oracle and some in SQL Server, a test server and a production one, and for each of those a staging, storage, and data mart area.
Based on this related question and its answer, I am thinking of copying each schema on the fly for each project.
Staging in ETL: Best Practices
I am looking for guideline references, but my specific case is the migration from IBM Data Manager to Informatica PowerCenter.
The approach depends on various criteria and personal preferences. Either way, you will need to duplicate part or all of the source and destination systems. At one extreme you can use two instances of the entire system. If you have complex upstream processes that are part of the test, or you have massive numbers of tables and processes, and you have the bandwidth and resources to duplicate your system, then this approach may be optimal.
At the other extreme, if the complex processing happens within the ETL tool itself, or you are simply loading tables and need to check that they are loaded correctly, then making copies of the tables and pointing your new or old tool at the table copies may be the way to go. This method is very simple and easy to set up.
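As a concrete illustration of the table-copy variant, here is a hedged sketch that compares a target table loaded by the legacy tool with a copy loaded by the new tool during the parallel run; the connection details and table names (DIM_CUSTOMER vs. DIM_CUSTOMER_PC) are hypothetical.

```python
import cx_Oracle  # the newer python-oracledb driver would work the same way

# Hypothetical sketch: during the parallel run, the legacy ETL loads DIM_CUSTOMER
# and the new tool loads DIM_CUSTOMER_PC; compare the two loads in both directions.
conn = cx_Oracle.connect(user="etl_test", password="secret", dsn="dwh-test:1521/DWH")
cur = conn.cursor()

for old_table, new_table in [("DIM_CUSTOMER", "DIM_CUSTOMER_PC")]:
    cur.execute(f"SELECT COUNT(*) FROM (SELECT * FROM {old_table} MINUS SELECT * FROM {new_table})")
    only_old = cur.fetchone()[0]
    cur.execute(f"SELECT COUNT(*) FROM (SELECT * FROM {new_table} MINUS SELECT * FROM {old_table})")
    only_new = cur.fetchone()[0]
    # Both counts being zero means the two loads produced identical rows.
    print(f"{old_table} vs {new_table}: {only_old} rows only in legacy load, {only_new} only in new load")
```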
Keep in mind this forum is not meant to replace blogs and in-depth tech articles on those techniques.
We are developing a large data migration from Oracle DB (12c) to another system with SSIS. The developers are using a copy of the production database, but due to the complexity of the data transformation we have to do things in stages, preprocessing data into intermediate helper tables which are then used further downstream. The problem is that all developers are using the same database and screw each other up by running things simultaneously. Does Oracle DB offer anything in terms of developer sandboxing? We could build a mechanism to handle this (e.g. have a dev ID in the helper tables, then query views that map to the dev), but I'd much rather use built-in functionality. Could I use Oracle Multitenant for this?
We ended up producing a master subset database of selected schemas/tables through some fairly elaborate PL/SQL, then made several copies of this master schema so each dev has his/her own sandbox (as Alex suggested). We could have used Oracle Data Masking and Subsetting, but it is too expensive. Another option for creating the subset database would have been to use Jailer. I should note that we did not have a need to mask any sensitive data.
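For anyone wanting the gist of the per-developer copies, here is a hedged sketch of cloning the subset master schema into sandbox schemas; the schema names, password handling, and grants are hypothetical, and a real version would also need to copy indexes, sequences, constraints, and PL/SQL objects.

```python
import cx_Oracle

# Hypothetical sketch: clone the subset master schema into per-developer sandboxes.
MASTER_SCHEMA = "SUBSET_MASTER"
DEV_SCHEMAS = ["DEV_ALICE", "DEV_BOB"]

conn = cx_Oracle.connect(user="system", password="secret", dsn="devdb:1521/DEV")
cur = conn.cursor()

# List the tables in the master subset schema.
cur.execute("SELECT table_name FROM all_tables WHERE owner = :owner", owner=MASTER_SCHEMA)
tables = [row[0] for row in cur.fetchall()]

for dev in DEV_SCHEMAS:
    # Create the sandbox user/schema (password handling simplified for the sketch).
    cur.execute(f'CREATE USER {dev} IDENTIFIED BY "changeme" QUOTA UNLIMITED ON USERS')
    cur.execute(f"GRANT CREATE SESSION, CREATE TABLE TO {dev}")
    # Copy each subset table, data included, into the developer's schema.
    for table in tables:
        cur.execute(f"CREATE TABLE {dev}.{table} AS SELECT * FROM {MASTER_SCHEMA}.{table}")
```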
Note: I would think this is a fairly common problem, so if new tools or solutions arise, please post them here as answers.
Current Setup:
SQL Server OLTP database
AWS Redshift OLAP database, updated from the OLTP database via SSIS every 20 minutes
Our customers only have access to the OLAP Db
Requirement:
One customer requires some additional tables to be created and populated on a schedule, which can be done by aggregating data already in AWS Redshift.
Challenge:
This is only for one customer, so I cannot leverage the core process for populating AWS; the process must be independent and is to be handed over to the customer, who does not use SSIS and does not wish to start. I was considering using Data Pipeline, but it is not yet available in the AWS region in which the customer resides.
Question:
What is my alternative? I am aware of numerous partners who offer ETL-like solutions, but this seems over the top; ultimately all I want to do is execute a series of SQL statements on a schedule, with some form of error handling/alerting. The preference of both the customer and management is not to use a bespoke app for this, hence the intended use of Data Pipeline.
For exporting data from AWS Redshift to another data source using Data Pipeline, you can follow a template similar to https://github.com/awslabs/data-pipeline-samples/tree/master/samples/RedshiftToRDS, which transfers data from Redshift to RDS. But instead of using RDSDatabase as the sink, you could add a JdbcDatabase (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-jdbcdatabase.html). The template https://github.com/awslabs/data-pipeline-samples/blob/master/samples/oracle-backup/definition.json provides more details on how to use the JdbcDatabase.
There are many such templates available in https://github.com/awslabs/data-pipeline-samples/tree/master/samples to use as a reference.
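If it helps, here is a hedged boto3 sketch of registering such a definition with a JdbcDatabase sink; the object and field names follow the JdbcDatabase documentation linked above but should be verified against the template, the connection string and credentials are placeholders, and the source node, activity, schedule, and resource objects are omitted.

```python
import boto3

# Hypothetical sketch: register a pipeline whose sink is a generic JdbcDatabase
# instead of RDSDatabase. Verify field names against the JdbcDatabase docs.
dp = boto3.client("datapipeline", region_name="us-east-1")

pipeline_id = dp.create_pipeline(name="redshift-to-jdbc", uniqueId="redshift-to-jdbc-1")["pipelineId"]

objects = [
    {
        "id": "JdbcSink",
        "name": "JdbcSink",
        "fields": [
            {"key": "type", "stringValue": "JdbcDatabase"},
            {"key": "connectionString", "stringValue": "jdbc:postgresql://host:5432/target"},  # placeholder
            {"key": "jdbcDriverClass", "stringValue": "org.postgresql.Driver"},
            {"key": "username", "stringValue": "etl_user"},
            {"key": "*password", "stringValue": "secret"},
        ],
    },
    # ... RedshiftDataNode source, CopyActivity, Schedule, and Ec2Resource objects go here ...
]

dp.put_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=objects)
dp.activate_pipeline(pipelineId=pipeline_id)
```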
I do exactly the same thing as you, but I use the Lambda service to perform my ETL. One drawback of Lambda is that it can only run for a maximum of 5 minutes (initially it was 1 minute).
So for ETL jobs longer than 5 minutes, I am planning to set up a PHP server in AWS from which I can submit my SQL queries, scheduled at any time with the help of cron.
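For the simple case of a few scheduled SQL statements, a Lambda along these lines is enough; this is a minimal sketch where the environment variables, schema, and aggregation SQL are assumptions, the function would be triggered by a scheduled CloudWatch Events rule, and a raised exception surfaces in CloudWatch for alerting.

```python
import os
import psycopg2

# Hypothetical aggregation statements for the customer-specific tables.
STATEMENTS = [
    "TRUNCATE customer_x.daily_sales_summary;",
    """
    INSERT INTO customer_x.daily_sales_summary (sale_date, total_amount)
    SELECT sale_date, SUM(amount)
    FROM public.sales
    GROUP BY sale_date;
    """,
]

def handler(event, context):
    """Run the scheduled aggregation SQL against Redshift."""
    conn = psycopg2.connect(
        host=os.environ["REDSHIFT_HOST"],                     # assumed environment variables
        port=int(os.environ.get("REDSHIFT_PORT", "5439")),
        dbname=os.environ["REDSHIFT_DB"],
        user=os.environ["REDSHIFT_USER"],
        password=os.environ["REDSHIFT_PASSWORD"],
    )
    try:
        # The with-block commits on success and rolls back on error; an unhandled
        # exception marks the invocation as failed, which CloudWatch alarms can catch.
        with conn, conn.cursor() as cur:
            for sql in STATEMENTS:
                cur.execute(sql)
    finally:
        conn.close()
    return {"status": "ok"}
```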
My delayed job involves exporting a slightly edited version of most of the tables in the app's database, and while doing so it is critical that none of the current data is being edited.
Is it possible to lock the entire database while running this delayed job?
More Information:
The database to be exported is PostgreSQL, Heroku Postgres to be more specific.
The flow is something like (all below should be done automatically by the code):
the site would be put into maintenance mode,
the database would be frozen and then exported, and
when the export is complete, the site would be re-activated.
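(For reference, a minimal psycopg2 sketch of one way the freeze-then-export step could work, assuming a consistent snapshot is acceptable instead of locking the whole database; the table list and the DATABASE_URL handling are assumptions.)

```python
import os
import psycopg2

# Hypothetical table list to export; DATABASE_URL is the Heroku Postgres connection URL.
TABLES = ["users", "orders", "invoices"]

conn = psycopg2.connect(os.environ["DATABASE_URL"])
conn.set_session(isolation_level="REPEATABLE READ", readonly=True)

with conn, conn.cursor() as cur:
    # Every COPY below runs inside one transaction, so they all see the same
    # snapshot of the data even if other sessions keep writing.
    for table in TABLES:
        with open(f"{table}.csv", "w") as f:
            cur.copy_expert(f"COPY {table} TO STDOUT WITH CSV HEADER", f)

conn.close()
```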
Given there is not a lot of information with your question, I am going to answer you as best I can.
1) What is the database type and model? Is it a standalone DB like MS Access or Informix SE?
2) If it is not a standalone engine, does this database support replication? I used to work a lot with MS SQL Server, and replication had implications while the database was live and being edited; that is, the implication was whether edited data was replicated. In this case, consult the docs. Is it an option to use replication to preserve the current database?
3) What kind of task is this? It sounds like maintenance. Our Informix SE databases lock when being imported or exported. On the production server, it is my job to make sure no local server applications are trying to access the locked DB, and that our external payments web site cannot interfere while the db is locked.
4) If this is a production site that is not in maintenance mode, then I suggest you probably do not want to lock an entire database.
I am sorry for not answering your question directly, but more information is needed, such as whether you are asking if this can be done from the Ruby DB interface on some particular database.