Copy data from a table with a LONG RAW column from one database to another database - Oracle

I need to create a job in Pentaho Kettle to automate copying data from one database to another. I am running into a problem while copying data from a table that contains a LONG RAW column.
I have tried the following:
I used the Copy Table wizard, but I get the error "ORA-01461: can bind a LONG value only for insert into a LONG column" when copying the table containing the LONG RAW column. The tables in both databases are exactly the same.
I tried creating an Oracle function that uses PL/SQL to insert the LONG RAW data by binding the LONG RAW column.
I call the Oracle function from an "Execute SQL script" step in Pentaho:
select function_name(parameter1, parameter2, long_raw_column, ...) from dual.
But I get the error "string literal too long".
Any suggestions on how to copy LONG RAW data of around 89330 bytes from one table to another?

Tom Kyte writes:
August 26, 2008 - 7pm UTC:
long raw, not going to happen over a database link
(https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1101164500346436784)
You can try creating a "temporary" staging table on the source DB in which you convert the LONG RAW column to a LOB using the TO_LOB function (TO_LOB turns LONG into CLOB and LONG RAW into BLOB).
Then you can transfer the data to the destination DB.
Then, if you absolutely need the LONG RAW data type, you can convert back using the method described here - Copying data from LOB Column to Long Raw Column
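A minimal sketch of that staging approach, assuming hypothetical names (a source table src_tab with key id and LONG RAW column payload, and a database link source_link defined on the destination pointing back at the source):

-- On the SOURCE database: stage the LONG RAW column as a BLOB.
-- TO_LOB may only appear in the select list of CREATE TABLE AS SELECT
-- or INSERT ... SELECT.
CREATE TABLE src_tab_stage AS
  SELECT id, TO_LOB(payload) AS payload_blob
  FROM   src_tab;

-- On the DESTINATION database: pull the staged BLOBs across the database link.
-- Pulling LOB columns with INSERT ... SELECT over a link is supported,
-- unlike LONG RAW.
INSERT INTO dest_tab_stage (id, payload_blob)
  SELECT id, payload_blob
  FROM   src_tab_stage@source_link;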

Related

Oracle CLOB data type to Redshift data type

We are in the process of migrating Oracle tables to Redshift tables. We found that a few tables have columns of the CLOB data type. In Redshift we converted CLOB to VARCHAR(65535). While running the COPY command, we get:
The length of the data column investigation_process is longer than the length defined in the table. Table: 65000, Data: 90123.
Which data type do we need to use? Please share your suggestions.
Redshift isn't designed to store CLOB (or BLOB) data. Most databases that do support CLOBs store them separately from the table contents so that every query isn't burdened with the excess data: a reference to the CLOB is stored in the table contents, and the reference is swapped for the CLOB value when results are generated.
CLOBs should be stored in S3, with a reference to the appropriate CLOB (its S3 key) stored in the Redshift table. The issue is that, AFAIK, there isn't a prepackaged tool for doing this CLOB-for-reference replacement with Redshift. Your solution will need some retooling to perform this replacement for all data users. It's doable; it's just going to take a data layer that performs the needed replacement.
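As a rough illustration of that reference pattern (all names here are hypothetical), the Redshift table holds only the S3 key, and the data layer fetches the object from S3 whenever the full text is needed:

-- Redshift table: the CLOB payload itself lives in S3, not in the table.
CREATE TABLE investigation (
    investigation_id            BIGINT,
    investigation_process_s3key VARCHAR(1024)  -- e.g. 'clobs/12345.txt' in a known bucket
    -- ...other data columns...
);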

SSIS Incremental Load Performance

I have a table with ~800k records and ~100 fields.
The table has an ID field which is a unique NVARCHAR(18) type.
The table also has a field called LastModifiedDate which records when the latest change was made.
I’m trying to perform an incremental load based on the following:
Initial load of all data (happens once)
Loading, based on LastModifiedDate, only recently changed/added records (~30k)
Based on the key field (ID), performing INSERT/UPDATE on recent data to the existing data
(*) assuming records are not deleted
I’m trying to achieve this by doing the following steps:
Truncate the temp table (which holds the recent data)
Extracting the recent data and storing it in the temp table
Extracting the data from the temp table
Using Lookup with the following definitions:
a. Cache mode = Full Cache
b. Connection Type = OLE DB connection manager
c. No matching entries = Ignore failure
Selecting ID from the final table, linking it to the ID field from the temp table, and giving the new field the output alias LKP_ID
Using Conditional Split and checking ISNULL(LKP_ID): true means INSERT and false means UPDATE
INSERT means that the data from the temp table will be inserted into the final table, and UPDATE means that an SQL UPDATE statement will be executed based on the temp table data
The final result is correct, BUT the run time is terrible: it takes ~30 minutes or so to complete
The way I would handle this is to use the LastModifiedDate in your source query to get the records from the source table that have changed since the last import.
Then I would import all of those records into an empty staging table on the destination database server.
Then I would execute a stored procedure to do the INSERT/UPDATE of the final destination table from the data in the staging table. A stored procedure on the destination server will perform MUCH faster than using Lookups and Conditional Splits in SSIS.
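A minimal sketch of such a stored procedure, assuming hypothetical names (dbo.StagingTable, dbo.FinalTable, and a couple of placeholder data columns):

-- Upsert the staged rows into the final table on the destination server.
CREATE PROCEDURE dbo.usp_MergeFromStaging
AS
BEGIN
    SET NOCOUNT ON;

    -- Update rows that already exist in the final table.
    UPDATE f
    SET    f.Col1 = s.Col1,
           f.Col2 = s.Col2,
           f.LastModifiedDate = s.LastModifiedDate
    FROM   dbo.FinalTable   AS f
    JOIN   dbo.StagingTable AS s ON s.ID = f.ID;

    -- Insert rows that are new.
    INSERT INTO dbo.FinalTable (ID, Col1, Col2, LastModifiedDate)
    SELECT s.ID, s.Col1, s.Col2, s.LastModifiedDate
    FROM   dbo.StagingTable AS s
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.FinalTable AS f WHERE f.ID = s.ID);
END;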

Edit RAW column in Oracle SQL Developer

I am using Oracle SQL Developer 18.3, but when I want to edit (or insert) a column with the RAW data type, it shows the field as read-only and does not allow editing.
As you may know, Oracle SQL Developer displays a RAW value as a hex string, whereas for a BLOB it does not show the value but lets you download and upload the BLOB data.
I know that I can update (or insert) the RAW data as a hex string like this:
CREATE TABLE t1(the_id NUMBER PRIMARY KEY, raw_col RAW(2000));
INSERT INTO t1(the_id, raw_col) VALUES(1, '1a234c');
But I want to do it through the Oracle SQL Developer GUI.
Sorry, we do not have a 'raw' editor like we have for BLOBs, so it comes down to using SQL.
If you want a reason for that omission, it's partly due to the fact that RAW is not a commonly used data type in Oracle Database.
Related: if you're talking about LONG RAW
We (Oracle) recommend you stop using it, and instead convert them to BLOBs.
The LONG RAW datatype is provided for backward compatibility with
existing applications. For new applications, use the BLOB and BFILE
datatypes for large amounts of binary data. Oracle also recommends
that you convert existing LONG RAW columns to LOB columns. LOB columns
are subject to far fewer restrictions than LONG columns. Further, LOB
functionality is enhanced in every release, whereas LONG RAW
functionality has been static for several releases.
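For reference, the conversion the documentation recommends can be done in place; a minimal sketch, assuming a hypothetical table legacy_docs with a LONG RAW column doc_body:

-- Convert the LONG RAW column to a BLOB in place.
ALTER TABLE legacy_docs MODIFY (doc_body BLOB);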

SSIS, what is causing the slow performance?

For Source: OLE DB Source - Sql Command
SELECT -- The destination table Id has IDENTITY(1,1) so I didn't take it here
[GsmUserId]
,[GsmOperatorId]
,[SenderHeader]
,[SenderNo]
,[SendDate]
,[ErrorCodeId]
,[OriginalMessageId]
,[OutgoingSmsId]
,24 AS [MigrateTypeId] --This is a static value
FROM [MyDb].[migrate].[MySource] WITH (NOLOCK)
To Destination: OLE DB Destination
It takes 5 or more minutes to insert 1.000.000 rows. I even unchecked Check Constraints.
Then, with the same SSIS configuration, I wanted to test it with another table exactly the same as the destination table. So I re-created the destination table (with the same constraints, but without the existing data) and named it dbo.MyDestination.
But it takes about 30 seconds or less to load the SAME amount of data.
Why is it significantly faster with the test table than with the original table? Is it because the original table already has 107.000.000 rows?
Check for indexes/triggers/constraints etc. on your destination table. These may slow things down considerably.
Check the OLE DB connection manager's Packet Size and set it appropriately; you can follow this article to set it to the right value.
If you are familiar with SQL Server Profiler, use it to get more insight, especially into what happens when you insert data into the re-created table versus the original table.
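One quick way to compare the two tables is to list the indexes and triggers on each (shown here for dbo.MyDestination from the question; run the same against the original table):

-- Indexes on the destination table (nonclustered indexes are maintained row by
-- row during the insert and can dominate the load time).
SELECT name, type_desc
FROM   sys.indexes
WHERE  object_id = OBJECT_ID('dbo.MyDestination');

-- Triggers on the destination table.
SELECT name
FROM   sys.triggers
WHERE  parent_id = OBJECT_ID('dbo.MyDestination');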

Import most recent data from CSV to SQL Server with SSIS

Here's the deal; the issue isn't with getting the CSV into SQL Server, it's getting it to work how I want it... which I guess is always the issue :)
I have a CSV file with columns like DATE, TIME, BARCODE, etc. I use a Derived Column transformation to concatenate the DATE and TIME into a DATETIME for my import into SQL Server, and I import all data into the database. The issue is that we only get a new .CSV file every 12 hours, and for example's sake we will say the .CSV is updated four times in a minute.
With the logic that we will run the job every 15 minutes, we will get a ton of overlapping data. I imagine I will use a variable, say LastCollectedTime, which can be pulled from my SQL database using MAX(READTIME). My problem is that I only want to collect rows with a ReadTime more recent than that variable.
Destination table structure:
ID, ReadTime, SubID, ...datacolumns..., LastModifiedTime where LastModifiedTime has a default value of GETDATE() on the last insert.
Any ideas? Remember, our readtime is a Derived Column, not sure if it matters or not.
Here is one approach that you can make use of:
Let's assume that your destination table in SQL Server is named BarcodeData.
Create a staging table (say BarcodeStaging) in your database that has the same column structure as your destination table BarcodeData, into which the CSV data will be imported.
In the SSIS package, add an Execute SQL Task before the Data Flow Task to truncate the staging table BarcodeStaging.
Import the CSV data into the staging table BarcodeStaging and not into the actual destination table.
Use the MERGE statement (I assume that you are using SQL Server 2008 or a higher version) to compare the staging table BarcodeStaging and the actual destination table BarcodeData, using the DateTime column as the join key. If there are unmatched rows, copy them from the staging table and insert them into the destination table.
Technet link to MERGE statement: http://technet.microsoft.com/en-us/library/bb510625.aspx
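A minimal sketch of that MERGE (the ReadTime and SubID columns come from the question; the join key and remaining columns are placeholders to adjust):

-- Insert only the staged rows that are not already in the destination,
-- matching on ReadTime (plus SubID, if that is needed to make rows unique).
MERGE dbo.BarcodeData AS target
USING dbo.BarcodeStaging AS source
   ON target.ReadTime = source.ReadTime
  AND target.SubID    = source.SubID
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ReadTime, SubID /* , ...data columns... */)
    VALUES (source.ReadTime, source.SubID /* , ... */);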
Hope that helps.
