ADF Copy Data remove index in oracle Sink - oracle

I am trying to insert data from a SQL table to an Oracle table using activity Copy Data in Data Factory, on the first try it runs fine but on the second try it throws an error that an index on the target table (Oracle) has been corrupted.
Searching in different forums I found that apparently the Copy Data activity sends the insert statement in the following way: INSERT /*+ SYS_DL_CURSOR */ INTO
any idea how to fix this???
Thank you very much for the help

As per the error index is not corrupted. It was used twice. May be the operation was not planned according to the schedule and worked parallelly.
The Copy activity is executed on an integration runtime. You can use different types of integration runtimes for different data copy scenarios:
When you're copying data between two data stores that are publicly accessible through the internet from any IP, you can use the Azure integration runtime for the copy activity. This integration runtime is secure, reliable, scalable, and globally available.
When you're copying data to and from data stores that are located on-premises or in a network with access control (for example, an Azure virtual network), you need to set up a self-hosted integration runtime.
Use either of the two operations mentioned above, the error will be resolved.
Check link for support document: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview

Related

Manipulating Data Within AWS Redshift to a Schedule

Current Setup:
SQL Server OLTP database
AWS Redshift OLAP database updated from OLTP
via SSIS every 20 minutes
Our customers only have access to the OLAP Db
Requirement:
One customer requires some additional tables to be created and populated to a schedule which can be done by aggregating the data already in AWS Redshift.
Challenge:
This is only for one customer so I cannot leverage the core process for populating AWS; the process must be independent and is to be handed over to the customer who do not use SSIS and don't wish to start. I was considering using Data Pipeline but this is not yet available in the market in which the customer resides.
Question:
What is my alternative? I am aware of numerous partners who offer ETL like solutions but this seems over the top, ultimately all I want to do is execute a series of SQL statements on a schedule with some form of error handling/ alert. Preference of both customer and management is to not use a bespoke app to do this, hence the intended use of Data Pipeline.
For exporting data from AWS Redshift to another data source using datapipeline you can follow a template similar to https://github.com/awslabs/data-pipeline-samples/tree/master/samples/RedshiftToRDS using which data can be transferred from Redshift to RDS. But instead of using RDSDatabase as the sink you could add a JdbcDatabase (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-jdbcdatabase.html). The template https://github.com/awslabs/data-pipeline-samples/blob/master/samples/oracle-backup/definition.json provides more details on how to use the JdbcDatabase.
There are many such templates available in https://github.com/awslabs/data-pipeline-samples/tree/master/samples to use as a reference.
I do exactly the same thing as you, but I use lambda service to perform my ETL. One drawback of lambda service is, it can run max of 5 mins (Initially 1 min) only.
So for ETL's greater than 5 minutes, I am planning to set up PHP server in AWS and with SQL injection I can run my queries, scheduled at any time with help of cron function.

Why so slow returning data from Oracle external tables?

We are an ETL shop and make heavy use of external tables. Typically these tables are queried to populated staging tables. I am surprised at the time it takes to for queries to return data from the external tables.
Typically there is around a 15 second delay before any result is returned. This is true even in the cases when the data file contains no data and when the data file does not exist. The delay doesn't seem related to the number of rows in the file.
I am logging into the database server itself, on which the external table data files are located.
Is this expected behaviour?
File system operations (ls, vim) at least on smaller files happen with no delay.
All files on local disk.
Oracle 12.1.
Oracle Linux Server release 6.6
I would recommend reviewing or looking into release Oracle 12.2 notes. There was a Patch for both the Big Data Appliance Firmware (22911748) for Exadata and a fix made in 12.2.
It addresses a view that is specific to the access to external tables. It's possible that you are impacted by this view. The view name is LOADER_DIR_OBJS, which is used to query the directory that external tables point to.
Our customers are running into very similar issues, and Oracle recommended installing the 12.2 release which contains the patch.
So, we are currently testing the 12.2 release. Anytime an external table is read, it has to have access to the LOADER_DIR_OBJS system view. Typically, the poor performance comes from this view, which accesses the SYS.OBJ$ and SYS.X$DIR system object because query plan is not optimal. Some people have found work arounds. (See Oracle Workaround Document ID 2034938.1 to see if it applies to you).

H2 Database multiple connections

I have the following issue:
Two instances of an application on two different systems should share a small database.
The main problem is that both systems can only exchange data through a network-folder.
I don't have the possibilty to setup a database-server somewhere.
Is it possible to place a H2 database on the network-folder and let both instances connect to the database (also concurrently)?
I could connect with both instances to the db using the embedded mode if I disable the file-locking, right?
The instances can perfom either READ or INSERT operations on the db. Do I risk data corruptions using multiple concurrent embedded connections?
As the documentation says; ( http://h2database.com/html/features.html#auto_mixed_mode
)
Multiple processes can access the same database without having to start the server manually. To do that, append ;AUTO_SERVER=TRUE to the database URL. You can use the same database URL independent of whether the database is already open or not. This feature doesn't work with in-memory databases.
// Application 1:
DriverManager.getConnection("jdbc:h2:/data/test;AUTO_SERVER=TRUE");
// Application 2:
DriverManager.getConnection("jdbc:h2:/data/test;AUTO_SERVER=TRUE");
From H2 documentation:
It is also possible to open the database without file locking; in this
case it is up to the application to protect the database files.
Failing to do so will result in a corrupted database.
I think that if your application use always the same configuration (shared file database on network folder), you need to create an application layer that manages concurrency

oracle database sandbox

Is it possible to create database server "sandbox"?So there is a master server that contains real data and a sandbox server that should dispatch read request to the master server in case the sandbox does not have cached data.In the case of a write request it should create a local copy of the data and apply changes to that copy without any impact to the master server.
You could build such a thing.
Create a local Oracle database with a database link that points back to the master database.
Copy the DDL for every object you're interested in from the master database to your local database renaming each table (i.e. EMP becomes EMP_LOC).
Create a view in the local database for each table that does a UNION ALL between the remote and local copies of the table.
Create an INSTEAD OF trigger on the local view that writes any changes only to the local table.
While you could do such a thing, however, it's not obvious why you'd want to. It would be a fair amount of work to set up and maintain and performance could easily get dodgy rather easily. And it's not obvious what problem this approach solves-- it wouldn't replace the need to have isolated development, test, and staging environments. And I'm hard-pressed to come up with a lot of use cases where this sort of "sandbox" would be preferable to one of those environments.
#Justin Cave give a good approach.. however maybe you should consider creating a Virtual Machine and take a snapshot of your PROD instance whenever you want to work on something new with the latest data.

Oracle11g Database Synchornization

I have a WPF application with back-end as Oracle11gR2. We need to enable our application to work in both online and offline(disconnected) mode. We are using Oracle standard edition(with single instance) as client database. I am using Sequnece Numbers for Primary Key Columns. Is there anyway to sync my client and server database without any issues in Sequence number columns. Please note that we will restrict creation of basic(master) data to be created only in server.
There are a couple of approaches to take here.
1- Write the sync process to rebuild the server tables (on the client) each time with a SELECT INTO. Once complete, RENAME the current table to a "temp" table, and RENAME the newly created table with the proper name. The sync process should DROP the temp table as one of its first steps. Finally, recreate the indexes and you should be good-to-go.
2- Create a backup of the server-side database, write a shell script to copy it down and restore it on the client.
Each of these options will preserve your sequence numbers. Which one you choose really depends on your skills. If you're more of a developer, you can make #1 work. If you've got some Oracle DBA skills you should be able to make #2 work.
Since you're on 11g, there might be a cleaner way to do this using Data Pump.

Resources