Import massive table from Oracle to PostgreSQL with oracle-fdw returns ORA-01406

I am working on a project that transfers data from an Oracle database to a PostgreSQL database to build a data warehouse with bash & SQL scripts. To access the Oracle database, I use the PostgreSQL extension oracle-fdw.
One of my scripts imports data from a massive table (~100,000,000 new rows/day). This table is partitioned and each partition contains one day of data. The query I use to import the data looks like this:
INSERT INTO postgre_target_table (some_fields)
SELECT some_aggregated_fields -- (~150 fields)
FROM oracle_source_table
WHERE partition_id = :v_partition_id AND some_others_filters
GROUP BY primary_key;
On the DEV server the query works fine (there is much less data on that server), but on PREPROD it returns the error ORA-01406: fetched column value was truncated.
In some posts, people say that the output fields may be too small, but if I run a simple SELECT query without the INSERT or the GROUP BY, I get the same error.
Another idea I found in another post is to create a view on the Oracle side, but my query uses several parameters that I cannot use in a view.
The last idea I found is to create an Oracle stored procedure that fills a table with the aggregated data and then import from that table, but the Oracle database is critical and my customer prefers to avoid adding more data to it.
Now, I'm starting to think there's no solution and it's not good...
PostgreSQL version : 12.4 / Oracle version : 11.2
UPDATE
It seems my problem is more complicated than I thought.
After applying the modification suggested by Laurenz Albe, the query runs correctly in pgAdmin, but the problem still appears when I use the psql command.
Moreover, another query seems to have the same problem. This other query does not use the same source table as the first one; it uses 4 joined tables without any partition. The common point between these queries is their structure.
The detail I omitted in the original post is that the purpose of both queries is to pivot a table. They look like this:
SELECT osr.id,
       MIN(CASE osr.category WHEN 123 THEN 1 END) AS field1,
       MIN(CASE osr.category WHEN 264 THEN 1 END) AS field2,
       MIN(CASE osr.category WHEN 975 THEN 1 END) AS field3,
       ...
FROM oracle_source_table osr
WHERE osr.category IN (123, 264, 975, ...)
GROUP BY osr.id;
Now that I have detailed what the queries look like, here are some results I got with the second one without changing the value of max_long (this query is lighter than the first one):
Sometimes it works (~10%) and sometimes it fails (~90%) in pgAdmin, but it never works with the psql command.
If I delete the WHERE clause, it always works.
I don't understand why deleting the WHERE clause changes anything: the field used in this clause is a NUMBER(6, 0) between 0 and 2500 and it is still used in the SELECT clause... Oh, and in the 4 Oracle tables used by this query there is no LONG datatype, only NUMBER.
Among 20 queries I have, only these two have a problem, their structure is similar and I don't believe in coincidences.

Don't despair!
Set the max_long option on the foreign table big enough that all your oversized data fit.
The documentation has the details:
max_long (optional, defaults to "32767")
The maximal length of any LONG, LONG RAW and XMLTYPE columns in the Oracle table. Possible values are integers between 1 and 1073741823 (the maximal size of a bytea in PostgreSQL). This amount of memory will be allocated at least twice, so large values will consume a lot of memory.
If max_long is less than the length of the longest value retrieved, you will receive the error message
ORA-01406: fetched column value was truncated
Example:
ALTER FOREIGN TABLE my_tab OPTIONS (ADD max_long '1000000');
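If the foreign table is still being defined, the option can also be set at creation time. A minimal sketch, assuming a foreign server named oracle_server and placeholder schema and column names that are not from the original post:
-- Hypothetical definition; only the max_long option is the point here.
CREATE FOREIGN TABLE oracle_source_table (
    id           numeric,
    category     numeric,
    partition_id numeric
) SERVER oracle_server
  OPTIONS (schema 'SOURCE_SCHEMA', table 'SOURCE_TABLE', max_long '1000000');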

Related

Import blob through SAS from ORACLE DB

Good day to everyone.
I faced a huge problem during my work last week.
Here is the deal:
I need to download an Excel file (stored as a BLOB) from an Oracle database through SAS.
I am using:
As a first step I need to get the data from Oracle. I used this construction (the BLOB file is nearly 100 KB):
proc sql;
connect to oracle;
create table SASTBL as
select * from connection to oracle (
    select dbms_lob.substr(myblobfield,1,32767)     as blob_1,
           dbms_lob.substr(myblobfield,32768,32767) as blob_2,
           dbms_lob.substr(myblobfield,65535,32767) as blob_3,
           dbms_lob.substr(myblobfield,97302,32767) as blob_4
    from my_tbl
);
quit;
And the result is:
blob_1 = 70020202020202...02
blob_2 = 02020202020...02
blob_3 = 02020202...02
I do not understand why the field consists of "02" (the whole file).
Also, the length of any variable in SAS is 1024 (instead of 32767), with a $HEX2024. format.
If I take:
dbms_lob.substr(my_blob_field,2000,900) from the same object, the result will be much more similar to the truth:
blob = "A234ABC4536AE7...."
The question is: how can I get binary data from a BLOB field correctly through SAS? What is my mistake?
Thank you.
EDIT 1:
I got the information, but the maximum string length is 2000.
Use the DBMAX_TEXT option on the CONNECT statement (or a LIBNAME statement) to get up to 32,767 characters. The default is probably 1024.
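For illustration, a minimal sketch of where the option goes (the connection details are placeholders, not taken from the original post):
/* On a LIBNAME statement... */
libname oralib oracle user=XXX password=YYY path=ZZZ dbmax_text=32767;

/* ...or on a pass-through CONNECT statement. */
proc sql;
connect to oracle (user=XXX password=YYY path=ZZZ dbmax_text=32767);
disconnect from oracle;
quit;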
PROC SQL uses SQL to interact with SAS datasets (create tables, query tables, aggregate data, connect externally, etc.). The procedure mostly follows the ANSI standard with a few SAS-specific extensions. Each RDBMS extends ANSI, including Oracle with its XML handling, such as saving content in a BLOB column. Possibly, SAS cannot properly read the Oracle-specific (non-ANSI) binary large object type. Typically SAS processes string, numeric, datetime, and a few other types.
As an alternative, consider saving XML content from Oracle externally as an .xml file and use SAS's XML engine to read content into SAS dataset:
** STORING XML CONTENT;
libname tempdata xml 'C:\Path\To\XML\File.xml';
** APPEND CONTENT TO SAS DATASET;
data Work.XMLData;
set tempdata.NodeName; /* CHANGE TO REPEAT PARENT NODE OF XML. */
run;
Adding as another answer as I can't comment yet... the issue you experienced is that the return of dbms_lob.substr is actually a varchar so SAS limits it to 2,000. To avoid this, you could wrap it in to_clob( ... ) AND set the DBMAX_TEXT option as previously answered.
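A hedged sketch of that suggestion for a CLOB column (myclobfield and the connection details are hypothetical placeholders; chunk sizes are kept modest here, and the dbms_lob.substr arguments are lob, amount, offset):
proc sql;
connect to oracle (user=XXX password=YYY path=ZZZ dbmax_text=32767);
create table SASTBL as
select * from connection to oracle (
    select to_clob(dbms_lob.substr(myclobfield, 4000, 1))    as chunk_1,
           to_clob(dbms_lob.substr(myclobfield, 4000, 4001)) as chunk_2
    from my_tbl
);
disconnect from oracle;
quit;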
Another alternative is below...
The code below is an effective method for retrieving a single record with a large CLOB. Instead of calculating how many fields to split the clob into resulting in a very wide record, it instead splits it into multiple rows. See expected output at bottom.
Disclaimer: although effective, it may not be efficient, i.e. it may not scale well to many rows; the generally accepted approach in that case is a pipelined PL/SQL function. That being said, the code below got me out of a pinch when a procedure wasn't an option...
PROC SQL;
connect to oracle (authdomain=YOUR_Auth path=devdb DBMAX_TEXT=32767 );
create table clob_chunks (compress=yes) as
select *
from connection to Oracle (
SELECT id
, key
, level clob_order
, regexp_substr(clob_value, '.{1,32767}', 1, level, 'n') clob_chunk
FROM (
SELECT id, key, clob_value
FROM schema.table
WHERE id = 123
)
CONNECT BY LEVEL <= regexp_count(clob_value, '.{1,32767}',1,'n')
)
order by id, key, clob_order;
disconnect from oracle;
QUIT;
Expected output:
ID  KEY  CLOB_ORDER  CLOB_CHUNK
1   1    1           short_clob
2   2    1           long clob chunk1of3
2   2    2           long clob chunk2of3
2   2    3           long clob chunk3of3
3   3    1           another_short_one
Explanation:
DBMAX_TEXT tells SAS to adjust the default of 1024 for a clob field.
The regex .{1,32767} tells Oracle to match any character at least once but no more than 32767 times, which splits the input into chunks and also captures the last chunk, which is likely to be shorter than 32767 characters.
The regexp_substr call pulls a chunk from the clob (param 1), starting from the start of the clob (param 2), skipping to the level'th occurrence (param 3) and treating the clob as one large string (param 4, 'n').
The CONNECT BY clause re-runs the regex to count the chunks, so that LEVEL stops incrementing beyond the end of the clob.
References:
SAS KB article for DBMAX_TEXT
Oracle docs for REGEXP_COUNT
Oracle docs for REGEXP_SUBSTR
Oracle regex syntax
Stackoverflow example of regex splitting

How to update an Oracle Table from SAS efficiently?

The problem I am trying to solve:
I have a SAS dataset work.testData (in the work library) that contains 8 columns and around 1 million rows. All columns are text (i.e. no numeric data). This SAS dataset is around 100 MB in file size. My objective is to have a step that pushes this entire SAS dataset into Oracle, i.e. sort of like a "copy and paste" of the SAS dataset from the SAS platform to the Oracle platform. The rationale behind this is that, on a daily basis, this table in Oracle gets "replaced" by the one in SAS, which enables downstream Oracle processes.
My approach to solve the problem:
One-off initial setup in Oracle:
In Oracle, I created a table called testData with a table structure pretty much identical to the SAS dataset testData. (i.e. Same table name, same number of columns, same column names, etc.).
On-going repeating process:
In SAS, do a SQL pass-through to truncate ora.testData (i.e. remove all rows whilst keeping the table structure). This ensures that ora.testData is empty before inserting from SAS.
In SAS, a LIBNAME statement to assign the Oracle database as a SAS library (called ora). So I can "see" what's in Oracle and perform read/update from SAS.
In SAS, a PROC SQL procedure to "insert" data from the SAS dataset work.testData into the Oracle table ora.testData.
Sample codes
One-off initial setup in Oracle:
Step 1: Run this Oracle SQL Script in Oracle SQL Developer (to create table structure for table testData. 0 rows of data to begin with.)
DROP TABLE testData;
CREATE TABLE testData
(
NODENAME VARCHAR2(64) NOT NULL,
STORAGE_NAME VARCHAR2(100) NOT NULL,
TS VARCHAR2(10) NOT NULL,
STORAGE_TYPE VARCHAR2(12) NOT NULL,
CAPACITY_MB VARCHAR2(11) NOT NULL,
MAX_UTIL_PCT VARCHAR2(12) NOT NULL,
AVG_UTIL_PCT VARCHAR2(12) NOT NULL,
JOBRUN_START_TIME VARCHAR2(19) NOT NULL
)
;
COMMIT;
On-going repeating process:
Step 2, 3 and 4: Run this SAS code in SAS
******************************************************;
******* On-going repeatable process starts here ******;
******************************************************;
*** Step 2: Truncate the temporary Oracle transaction dataset;
proc sql;
connect to oracle (user=XXX password=YYY path=ZZZ);
execute (
truncate table testData
) by oracle;
execute (
commit
) by oracle;
disconnect from oracle;
quit;
*** Step 3: Assign Oracle DB as a libname;
LIBNAME ora Oracle user=XXX password=YYY path=ZZZ dbcommit=100000;
*** Step 4: Insert data from SAS to Oracle;
PROC SQL;
insert into ora.testData
select NODENAME length=64,
STORAGE_NAME length=100,
TS length=10,
STORAGE_TYPE length=12,
CAPACITY_MB length=11,
MAX_UTIL_PCT length=12,
AVG_UTIL_PCT length=12,
JOBRUN_START_TIME length=19
from work.testData;
QUIT;
******************************************************;
**** On-going repeatable process ends here *****;
******************************************************;
The limitation / problem to my approach:
The PROC SQL step (which transfers 100 MB of data from SAS to Oracle) takes around 5 hours to run - the job takes far too long!
The Question:
Is there a more sensible way to perform data transfer from SAS to Oracle? (i.e. updating an Oracle table from SAS).
First off, you can do the drop/recreate from SAS if that's a necessity. I wouldn't drop and recreate each time - a truncate seems an easier way to get the same result - but if you have other reasons then that's fine; either way, you can use execute (truncate table xyz) by oracle or a similar statement to drop, using a pass-through connection.
Second, assuming there are no constraints or indexes on the table - which seems likely given you are dropping and recreating it - you may not be able to improve this much, because the bottleneck may be network latency. However, there is one area you should look at in the connection settings (which you don't provide): how often SAS commits the data.
There are two ways to control this: the DBCOMMIT setting and the BULKLOAD setting. The former controls how frequently commits are executed (so if DBCOMMIT=100 then a commit is executed every 100 rows). More frequent commits mean less data is lost if a random failure occurs, but much slower execution. DBCOMMIT defaults to 0 for PROC SQL INSERT, which means just one commit at the end (the fastest option assuming no errors), so this is unlikely to help unless you're overriding it.
BULKLOAD is probably my recommendation; it uses SQL*Loader (SQLLDR) to load your data, i.e. it batches the whole lot over to Oracle and then says 'Load this please, thanks.' It only works with certain settings and certain kinds of queries, but it ought to work here (subject to other conditions - read the documentation).
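For illustration, a minimal sketch of enabling bulk loading on the LIBNAME connection used earlier (the credentials are the same placeholders as above; BULKLOAD=YES hands the insert over to SQL*Loader):
*** Step 3 (alternative): assign the Oracle libname with bulk loading enabled;
LIBNAME ora Oracle user=XXX password=YYY path=ZZZ bulkload=yes;

*** Step 4: insert as before; the load now goes through SQL*Loader;
PROC SQL;
insert into ora.testData
select * from work.testData;
QUIT;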
If you're using BULKLOAD, then you may be up against network latency. 5 hours for 100 MB seems slow, but I've seen all sorts of things in my (relatively short) day. If BULKLOAD didn't work I would probably bring in the Oracle DBAs and have them troubleshoot this, starting from a .csv file and a SQL*LDR command file (which should be basically identical to what SAS is doing with BULKLOAD); they should know how to troubleshoot that and at least be able to monitor performance of the database itself. If there are constraints on other tables that are problematic here (ie, other tables that too-frequently recalculate themselves based on your inserts or whatever), they should be able to find out and recommend solutions.
You could look into PROC DBLOAD, which sometimes is faster than inserts in SQL (though all in all shouldn't really be, and is an 'older' procedure not used too much anymore). You could also look into whether you can avoid doing a complete flush and fill (ie, if there's a way to transfer less data across the network), or even simply shrinking the column sizes.

Oracle accessing multiple databases

I'm using Oracle SQL Developer version 4.02.15.21.
I need to write a query that accesses multiple databases. All I'm trying to do is get a list of all the IDs present in "TableX" in each database (there is an instance of TableX in each of these databases, but with different values) and union all of the results together into one big list.
My problem comes with accessing more than 4 databases -- I get this error: ORA-02020: too many database links in use. I cannot change the open_links maximum limit in the INIT.ORA file.
So I've tried dynamically opening/closing these links:
SELECT Local.PUID FROM TableX Local
UNION ALL
----
SELECT Xdb1.PUID FROM TableX@db1 Xdb1;
ALTER SESSION CLOSE DATABASE LINK db1
UNION ALL
----
SELECT Xdb2.PUID FROM TableX@db2 Xdb2;
ALTER SESSION CLOSE DATABASE LINK db2
UNION ALL
----
SELECT Xdb3.PUID FROM TableX@db3 Xdb3;
ALTER SESSION CLOSE DATABASE LINK db3
UNION ALL
----
SELECT Xdb4.PUID FROM TableX@db4 Xdb4;
ALTER SESSION CLOSE DATABASE LINK db4
UNION ALL
----
SELECT Xdb5.PUID FROM TableX@db5 Xdb5;
ALTER SESSION CLOSE DATABASE LINK db5
However, this produces 'ORA-02081: database link is not open' for whichever database link is being closed last.
Can someone please suggest an alternative or adjustment to the above?
Please provide a small sample of your suggestion with syntactically correct SQL if possible.
If you can't change the open_links setting, you cannot have a single query that selects from all the databases you want to query.
If your requirement is to query a large number of databases via database links, it seems highly reasonable to change the open_links setting. If you have one set of people telling you that you need to do X (query data from a large number of tables) and another set of people telling you that you cannot do X, it almost always makes sense to have those two sets of people talk and figure out which imperative wins.
If we can solve the problem without writing a single query, then you have options. You can write a bit of PL/SQL, for example, that selects the data from each table in turn and does something with it. Depending on the number of database links involved, it may make sense to write a loop that generates a dynamic SQL statement for each database link, executes the SQL, and then closes the database link.
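As a hedged sketch of that loop (the link names and the local staging table puid_staging are illustrative assumptions, not objects from the question):
DECLARE
  TYPE t_links IS TABLE OF VARCHAR2(30);
  v_links t_links := t_links('db1', 'db2', 'db3', 'db4', 'db5');
BEGIN
  FOR i IN 1 .. v_links.COUNT LOOP
    -- Pull this database's rows into a local staging table (hypothetical).
    EXECUTE IMMEDIATE
      'INSERT INTO puid_staging (puid) SELECT PUID FROM TableX@' || v_links(i);
    COMMIT;  -- end the distributed transaction so the link can be closed
    EXECUTE IMMEDIATE 'ALTER SESSION CLOSE DATABASE LINK ' || v_links(i);
  END LOOP;
END;
/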
If you need to provide a user with the ability to run a single query that returns all the data, you can write a pipelined table function that implements this sort of loop with dynamic SQL and then let the user query the pipelined table function. This isn't really a single query that fetches the data from all the tables, but it is as close as you're likely to get without modifying the open_links limit.

Oracle: Coercing VARCHAR2 and CLOB to the same type without truncation

In an app that supports MS SQL Server, MySQL, and Oracle, there's a table with the following relevant columns (types shown here are for Oracle):
ShortText VARCHAR2(1700) indexed
LongText CLOB
The app stores values 850 characters or less in ShortText, and longer ones in LongText. I need to create a view that returns that data, whichever column it's in. This works for SQL Server and MySQL:
SELECT
CASE
WHEN ShortText IS NOT NULL THEN ShortText
ELSE LongText
END AS TheValue
FROM MyTable
However, on Oracle, it generates this error:
ORA-00932: inconsistent datatypes: expected CHAR got CLOB
...meaning that Oracle won't implicitly convert the two columns to the same type, so the query has to do it explicitly. I don't want the data to get truncated, so the type used has to be able to hold as much data as a CLOB, which as I understand it (I'm not an Oracle expert) means CLOB and nothing else; no other choices are available.
This works on Oracle:
SELECT
CASE
WHEN ShortText IS NOT NULL THEN TO_CLOB(ShortText)
ELSE LongText
END AS TheValue
FROM MyTable
However, performance is amazingly awful. A query that returns LongText directly took 70-80 ms for about 9k rows, but the construct above took between 30 and 60 seconds, which is unacceptable.
So:
Are there any other Oracle types I could coerce both columns to that can hold as much data as a CLOB? Ideally something more text-oriented, like MySQL's LONGTEXT or SQL Server's NTEXT (or even better, NVARCHAR(MAX))?
Are there any other approaches I should be looking at?
Some specifics, in particular ones requested by @Guido Leenders:
Oracle version: Oracle Database 11g 11.2.0.1.0 64bit Production
Not certain if I was the only user, but the relative times are still striking.
Stats for the small table where I saw the performance I posted earlier:
rowcount: 9,237
varchar column total length: 148,516
clob column total length: 227,020
The to_clob is pretty expensive, so try to avoid it. But I think it should perform reasonably well for 9K rows. The following test case is based upon one of the applications we develop, which has similar datamodel behaviour:
create table bubs_projecten_sample
( id            number
, toelichting   varchar2(1700)
, toelichting_l clob
);

begin
  for i in 1..10000
  loop
    insert into bubs_projecten_sample
    ( id
    , toelichting
    , toelichting_l
    )
    values
    ( i
    , case when mod(i, 2) = 0 then 'short' else null end
    , case when mod(i, 2) = 0 then rpad('long', i, '*') else null end
    );
  end loop;
  commit;
end;
/
Now make sure everything is in the cache and dirty blocks are written out:
select *
from bubs_projecten_sample;
Test performance:
create table bubs_projecten_flat
as
select id
, to_clob(toelichting) toelichting_any
from bubs_projecten_sample
where toelichting is not null
union all
select id
, toelichting_l
from bubs_projecten_sample
where toelichting_l is not null;
The create table takes less than 1 second on a normal entry-level server, including writing out the data: 17K consistent gets, 4K physical reads. Stored on disk (note the rpad) is 25K for toelichting and 16 MB for toelichting_l.
Can you further elaborate on the problem?
Please check that large CLOBs are not stored inline. Normally large CLOBs are stored in a separate system-maintained table. Storing large CLOBs inside a table can make going through the table with a Full Table Scan expensive.
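If the CLOBs do turn out to be stored in-row, a hedged sketch of moving them out of the row (the table and column names come from the question, the index name is hypothetical; check the exact storage clause against your Oracle version):
ALTER TABLE MyTable MOVE LOB (LongText) STORE AS (DISABLE STORAGE IN ROW);
-- A table MOVE marks existing indexes UNUSABLE, so rebuild them afterwards:
ALTER INDEX shorttext_ix REBUILD;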
Also, I can imagine populating both columns always. You still get the benefit of indexing on the first so many characters. You just need to record in the table, with an indicator column, whether the CLOB or the ShortText column is leading.
As a side note: I see a difference between 850 and 1700. I would recommend making them equal, but remember to check that you are creating the table using character semantics. That can be done at the statement level by using "varchar2(850 char)". Please note that Oracle will actually create a column that can hold 850 * 4 bytes (in AL32UTF8 at least, where the "32" stands for "at most 4 bytes per character"). Good luck!

different columns on select query results different costs

There is an index on table invt_item_d on the (item_id, branch_id, co_id) columns.
The plan for the first query is TABLE ACCESS FULL with a cost of 528;
the plan for the second query is INDEX FAST FULL SCAN (my index) with a cost of 27.
The only difference, as you can see, is that the column selected in the second query is part of the index.
Is there something wrong with this? And can you please tell me what I should do to fix this at the DB administration level?
select d.qty
from invt_item_d d
where d.item_id = 999
  and d.branch_id = 888
  and d.co_id = 777;

select d.item_id
from invt_item_d d
where d.item_id = 999
  and d.branch_id = 888
  and d.co_id = 777;
EDIT:
I made a new query and this query's cost is 529, with TABLE ACCESS FULL:
select qty from invt_item_d;
So it doesn't seem to matter whether I use an indexed column or not. Some say this is normal; is this really normal behaviour?
In the first case, the table must be accessed, since the "qty" column is only stored in the table.
In the second case, all the columns used in the query can be read from the index, skipping the table read altogether.
You can add another index on columns (item_id, branch_id, co_id, qty) and it will most probably be used in the first query.
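For illustration, a hedged sketch of such a covering index (the index name is an assumption):
CREATE INDEX invt_item_d_cov_ix
    ON invt_item_d (item_id, branch_id, co_id, qty);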
From the Oracle documentation: http://docs.oracle.com/cd/E11882_01/server.112/e25789/indexiot.htm
A fast full index scan is a full index scan in which the database accesses the data in the index itself without accessing the table, and the database reads the index blocks in no particular order.
Fast full index scans are an alternative to a full table scan when both of the following conditions are met:
The index must contain all columns needed for the query.
A row containing all nulls must not appear in the query result set. For this result to be guaranteed, at least one column in the index must have either:
A NOT NULL constraint
A predicate applied to it that prevents nulls from being considered in the query result set
This is exactly the main purpose of using an index: making searches faster.
Queries on columns covered by an index are faster than queries on columns without one.
It's basic Oracle knowledge.
I am adding another answer because it seems more convenient.
First:
" i doesn't hit the index because there are 34000 rows, not millions". This is COMPLETELY WRONG and a dangerous understanding.
What I meant was, if there are a few thousand rows, and the index is not hit(oracle engine does a full table scan(TABLE ACCESS FULL) then), its not a big deal. Oracle is fast enough to read few thousand rows in a matter of a second(even without indexes) , and hence you wont feel the difference.The query is still slower(than the occasion when there is an index) , but its is so minimally slower that you wont feel the difference.
But, if there are millions of rows, the execution of the query will be much, much slower without index ( as this time it will scan millions of rows in a full table scan)and your performance will be hit.
Second: why on earth do you have to loop over a table with 34000 rows, and 4000 times at that?
That's a terrible approach. Avoid loops as much as possible. There has to be a better approach!
Third:
You can force the Oracle optimiser to use the index with an index hint. You will need to know the name of the index for that. Note that, because the table has an alias, the hint must reference the alias:
select /*+ index(d <index_name>) */
d.qty
from invt_item_d d
where d.item_id = 999
and d.branch_id = 888
and d.co_id = 777
Here is a link to a Stack Overflow question on index hints
