Oracle: Coercing VARCHAR2 and CLOB to the same type without truncation - performance

In an app that supports MS SQL Server, MySQL, and Oracle, there's a table with the following relevant columns (types shown here are for Oracle):
ShortText VARCHAR2(1700) indexed
LongText CLOB
The app stores values 850 characters or less in ShortText, and longer ones in LongText. I need to create a view that returns that data, whichever column it's in. This works for SQL Server and MySQL:
SELECT
CASE
WHEN ShortText IS NOT NULL THEN ShortText
ELSE LongText
END AS TheValue
FROM MyTable
However, on Oracle, it generates this error:
ORA-00932: inconsistent datatypes: expected CHAR got CLOB
...meaning that Oracle won't implicitly convert the two columns to the same type, so the query has to do it explicitly. Don't want data to get truncated, so the type used has to be able to hold as much data as a CLOB, which as I understand it (not an Oracle expert) means CLOB, only, no other choices are available.
This works on Oracle:
SELECT
CASE
WHEN ShortText IS NOT NULL THEN TO_CLOB(ShortText)
ELSE LongText
END AS TheValue
FROM MyTable
However, performance is amazingly awful. A query that returns LongText directly took 70-80 ms for about 9k rows, but the above construct took between 30 and 60 seconds, unacceptable.
So:
Are there any other Oracle types I could coerce both columns to
that can hold as much data as a CLOB? Ideally something more
text-oriented, like MySQL's LONGTEXT, or SQL Server's NTEXT (or even
better, NVARCHAR(MAX))?
Any other approaches I should be looking at?
Some specifics, in particular ones requested by #Guido Leenders:
Oracle version: Oracle Database 11g 11.2.0.1.0 64bit Production
Not certain if I was the only user, but the relative times are still striking.
Stats for the small table where I saw the performance I posted earlier:
rowcount: 9,237
varchar column total length: 148,516
clob column total length: 227,020

The to_clob is pretty expensive, so try to avoid it. But I think it should perform reasonable well for 9K rows. Following test case based upon one of the applications we develop which has the similar datamodel behaviour:
create table bubs_projecten_sample
( id number
, toelichting varchar2(1700)
, toelichting_l clob
)
begin
for i in 1..10000
loop
insert into bubs_projecten_sample
( id
, toelichting
, toelichting_l
)
values
( i
, case when mod(i, 2) = 0 then 'short' else null end
, case when mod(i, 2) = 0 then rpad('long', i, '*') else null end
)
;
end loop;
commit;
end;
Now make sure everything in cache and dirty blocks written out:
select *
from bubs_projecten_sample
Test performance:
create table bubs_projecten_flat
as
select id
, to_clob(toelichting) toelichting_any
from bubs_projecten_sample
where toelichting is not null
union all
select id
, toelichting_l
from bubs_projecten_sample
where toelichting_l is not null
The create table take less than 1 second on a normal entry level server, including writing out the data, 17K consistent gets, 4K physical reads. Stored on disk (note the rpad) is 25K for toelichting and 16M for toelichting_l.
Can you further elaborate on the problem?
Please check that large CLOBs are not stored inline. Normally large CLOBs are stored in a separate system-maintained table. Storing large CLOBs inside a table can make going through the table with a Full Table Scan expensive.
Also, I can imagine populating both columns always. You still have the benefits of indexing working for the first so many characters. You just need to memorize in the table using an indicator whether the CLOB or the shortText column is leading.
As a side note; I see a difference between 850 and 1700. I would recommend making them equal, but remember to check that you are creating the table using character semantics. That can be done on statement level by using: "varchar2(850 char)". Please note that Oracle will actually create a column that fits 850 * 4 bytes (in AL32UTF8 at least, there the "32" stands for "4 bytes at most per character"). Good luck!

Related

Insertion of characters into number column

I have a table with several number columns that are inserted through a Asp.Net application using bind variables.
Due to upgrade of Oracle client to 19c and server change, the code instead of giving an error on insert of invalid data, inserts trash and the application crashes aftewards.
Any help is appreciated in finding the root cause.
SELECT trial1,
DUMP (trial1, 17),
DUMP (trial1, 1016),
trial3,
DUMP (trial3,17),
DUMP (trial3, 1016)
Result in SQL Navigator
results of query
Oracle 12c
Oracle client 19
My DBA found this on Oracle Support and that lead to us find the error in the application side:
NaN is a specific IEEE754 value. However Oracle NUMBER is not IEEE754
compliant. Therefore if you force the data representing NaN into a
NUMBER column results are unpredicatable. SOLUTION If you can put a
value in a C float, double, int etc you can load this into the
database as no checks are undertaken - just as with the Oracle NUMBER
datatype it's up to the application to ensure the data is valid. If
you use the proper IEEE754 compliant type, eg BINARY_FLOAT, then NaN
is recognised and handled correctly.
You have bad data as you have tried to store an double precision NAN value in a NUMBER column rather than a BINARY_DOUBLE column.
We can duplicate the bad data with the function (never use this in a production environment):
CREATE FUNCTION createNumber(
hex VARCHAR2
) RETURN NUMBER DETERMINISTIC
IS
n NUMBER;
BEGIN
DBMS_STATS.CONVERT_RAW_VALUE( HEXTORAW( hex ), n );
RETURN n;
END;
/
Then, we can duplicate your bad values using the hexadecimal values from your DUMP output:
CREATE TABLE table_name (trial1 NUMBER, trial3 NUMBER);
INSERT INTO table_name (trial1, trial3) VALUES (
createNumber('FF65'),
createNumber('FFF8000000000000')
);
Then:
SELECT trial1,
DUMP(trial1, 16) AS t1_hexdump,
trial3,
DUMP(trial3, 16) AS t3_hexdump
FROM table_name;
Replicates your output:
TRIAL1
T1_HEXDUMP
TRIAL3
T3_HEXDUMP
~
Typ=2 Len=2: ff,65
null
Typ=2 Len=8: ff,f8,0,0,0,0,0,0
Any help is appreciated in finding the root cause.
You need to go back through your application and work out where the bad data came from and see if you can determine what the original data was and debug the steps it went through in the application to work out if it was:
Always bad data, and then you need to put in some validation into your application to make sure the bad data does not get propagated; or
Was good data but there is a bug in your code that changed it and then you need to fix the bug.
As for the existing bad data, you either need to correct it (if you know what it should be) or delete it.
We cannot help with any of that as we do not have visibility of your application nor do we know what the correct data should have been.
If you want to store that data as a floating point then you need to change from using a NUMBER to using a BINARY_DOUBLE data type:
CREATE TABLE table_name (value BINARY_DOUBLE);
INSERT INTO table_name(value) VALUES (BINARY_DOUBLE_INFINITY);
INSERT INTO table_name(value) VALUES (BINARY_DOUBLE_NAN);
Then:
SELECT value,
DUMP(value, 16)
FROM table_name;
Outputs:
VALUE
DUMP(VALUE,16)
Inf
Typ=101 Len=8: ff,f0,0,0,0,0,0,0
Nan
Typ=101 Len=8: ff,f8,0,0,0,0,0,0
Then BINARY_DOUBLE_NAN exactly matches the binary value in your column and you have tried to insert a Not-A-Number value into a NUMBER column (that does not support it) in the format expected for a BINARY_DOUBLE column (that would support it).
The issue was a division by zero on the application side that was inserted as infinity into the database, but Oracle has an unpredictable behavior with this values.
Please see original post above for all the details.

Import massive table from Oracle to PostgreSQL with oracle-fdw return ORA-01406

I work on a project to transfer data from an Oracle database to a PostgreSQL database to create a datawarehouse with bash & SQL scripts. To access to the Oracle database, I use the PostgreSQL extension oracle-fdw.
One of my scripts import data from a massive table (~ 100 000 000 new rows/day). This table is partitioned and each partition contains 1 day of data. The query I use to import data looks like that :
INSERT INTO postgre_target_table (some_fields)
SELECT some_aggregated_fields -- (~150 fields)
FROM oracle_source_table
WHERE partition_id = :v_partition_id AND some_others_filters
GROUP BY primary_key;
On DEV server, the query works fine (there is much less data on this server) but in PREPROD, it returns the error ORA-01406: fetched column value was truncated.
In some posts, people say that the output fields may be too small but if I try to send a simple SELECT query without INSERT or GROUP BY I have the same error.
Another idea I found in another post is to create an Oracle side view but in my query I use multiple parameters that I cannot use in a view.
The last idea I found is to create an Oracle stored procedure that fills a table with aggregated data and then import data from this table but the Oracle database is critical and my customer prefers to avoid adding more data on it.
Now, I'm starting to think there's no solution and it's not good...
PostgreSQL version : 12.4 / Oracle version : 11.2
UPDATE
It seems my problem is more complecated than I thought.
After applying the modification given by Laurenz Albe, the query runs correctly on PGAdmin but the problem still appears when I use psql command.
Moreover, another query seems to have the same problem. This other query does not use the same source table as the first query, it uses 4 joined tables without any partition. The common point between these queries is the structure.
The detail I omit to specify in the original post is that the purpose of both queries is to pivot a table. They look like that :
SELECT osr.id,
MIN(CASE osr.category
WHEN 123 THEN
1
END) AS field1,
MIN(CASE osr.category
WHEN 264 THEN
1
END) AS field2,
MIN(CASE osr.category
WHEN 975 THEN
1
END) AS field3,
...
FROM oracle_source_table osr
WHERE osr.category IN (123, 264, 975, ...)
GROUP BY osr.id;
Now that I have detailed what the queries look like, I can give you some results I had with the second one without changing the value of max_long (this query is lighter than the first one) :
Sometimes it works (~10%), sometimes it failed (~90%) on PGadmin but it never works with psql command
If I delete the WHERE, it always works
I don't understand why deleting the WHERE change something, the field used in this clause is a NUMBER(6, 0) between 0 and 2500 and it is still used in the SELECT clause... Oh and in the 4 Oracle tables used by this query, there is no LONG datatype, only NUMBER datatype is used.
Among 20 queries I have, only these two have a problem, their structure is similar and I don't believe in coincidences.
Don't despair!
Set the max_long option on the foreign table big enough that all your oversized data fit.
The documentation has the details:
max_long (optional, defaults to "32767")
The maximal length of any LONG, LONG RAW and XMLTYPE columns in the Oracle table. Possible values are integers between 1 and 1073741823 (the maximal size of a bytea in PostgreSQL). This amount of memory will be allocated at least twice, so large values will consume a lot of memory.
If max_long is less than the length of the longest value retrieved, you will receive the error message
ORA-01406: fetched column value was truncated
Example:
ALTER FOREIGN TABLE my_tab OPTIONS (ADD max_long '1000000');

Can I use an Oracle trigger to force a varchar column to fit column width?

I have an application which is occasionally being passed data which causes it to fail with ORA-12899 error trying to insert the record into a varchar(2000) column on an Oracle 11g database. )In some cases it is trying to insert a field of more than 4000 characters.)
In the longer term, I will request a change to the application to prevent this happening, but meanwhile I need a short-term fix to allow the record to be saved, truncating the data to 2000 characters.
I've experimented with a "before insert" trigger (which works fine when truncating it from say 1500 to 1000 characters), but what I've found is that where the data is more than 2000 characters (i.e. more than the column length) the insert still fails with ORA-12899.
Is there any way round this, other than changing the database column to a CLOB?
Include the case statement similar to the following code in your insert command
insert into test_table
(column1)
values
(
(case when length('ablekkkkkkkkkkkh') > 2000 then substr('ablekkkkkkkkh',1,2000)
else 'ablekkkkkkkkkkkh' end )
);

Import blob through SAS from ORACLE DB

Good time of a day to everyone.
I face with a huge problem during my work on previous week.
Here ia the deal:
I need to download exel file (blob) from ORACLE database through SAS.
I am using:
First step i need to get data from oracle. I used the construction (blob file is nearly 100kb):
proc sql;
connect to oracle;
create table SASTBL as
select * from connection to oracle (
select dbms_lob.substr(myblobfield,1,32767) as blob_1,
dbms_lob.substr(myblobfield,32768,32767) as blob_2,
dbms_lob.substr(myblobfield,65535,32767) as blob_3,
dbms_lob.substr(myblobfield,97302,32767) as blob_4
from my_tbl;
);
quit;
And the result is:
blob_1 = 70020202020202...02
blob_2 = 02020202020...02
blob_3 = 02020202...02
I do not understand why the field consists from "02"(the whole file)
And the length of any variable in sas is 1024 (instead of 37767) $HEX2024 format.
If I ll take:
dbms_lob.substr(my_blob_field,2000,900) from the same object the result will mush more similar to the truth:
blob = "A234ABC4536AE7...."
The question is: 1. how can i get binary data from blob field correctly trough SAS? What is my mistake?
Thank you.
EDIT 1:
I get the information but max string is 2000 kb.
Use the DBMAX_TEXT option on the CONNECT statement (or a LIBNAME statement) to get up to 32,767 characters. The default is probably 1024.
PROC SQL uses SQL to interact with SAS datasets (create tables, query tables, aggregate data, connect externally, etc.). The procedure mostly follows the ANSI standard with a few SAS specific extensions. Each RDMS extends ANSI including Oracle with its XML handling such as saving content in a blob column. Possibly, SAS cannot properly read the Oracle-specific (non-ANSI) binary large object type. Typically SAS processes string, numeric, datetime, and few other types.
As an alternative, consider saving XML content from Oracle externally as an .xml file and use SAS's XML engine to read content into SAS dataset:
** STORING XML CONTENT;
libname tempdata xml 'C:\Path\To\XML\File.xml';
** APPEND CONTENT TO SAS DATASET;
data Work.XMLData;
set tempdata.NodeName; /* CHANGE TO REPEAT PARENT NODE OF XML. */
run;
Adding as another answer as I can't comment yet... the issue you experienced is that the return of dbms_lob.substr is actually a varchar so SAS limits it to 2,000. To avoid this, you could wrap it in to_clob( ... ) AND set the DBMAX_TEXT option as previously answered.
Another alternative is below...
The code below is an effective method for retrieving a single record with a large CLOB. Instead of calculating how many fields to split the clob into resulting in a very wide record, it instead splits it into multiple rows. See expected output at bottom.
Disclaimer: Although effective it may not be efficient ie may not scale well to multiple rows, the generally accepted approach then is row pipelining PLSQL. That being said, the below got me out of a pinch if you can't make a procedure...
PROC SQL;
connect to oracle (authdomain=YOUR_Auth path=devdb DBMAX_TEXT=32767 );
create table clob_chunks (compress=yes) as
select *
from connection to Oracle (
SELECT id
, key
, level clob_order
, regexp_substr(clob_value, '.{1,32767}', 1, level, 'n') clob_chunk
FROM (
SELECT id, key, clob_value
FROM schema.table
WHERE id = 123
)
CONNECT BY LEVEL <= regexp_count(clob_value, '.{1,32767}',1,'n')
)
order by id, key, clob_order;
disconnect from oracle;
QUIT;
Expected output:
ID KEY CHUNK CLOB
1 1 1 short_clob
2 2 1 long clob chunk1of3
2 2 2 long clob chunk2of3
2 2 3 long clob chunk3of3
3 3 1 another_short_one
Explanation:
DBMAX_TEXT tells SAS to adjust the default of 1024 for a clob field.
The regex .{1,32767} tells Oracle to match at least once but no more than 32767 times. This splits the input and captures the last chunk which is likely to be under 32767 in length.
The regexp_substr is pulling a chunk from the clob (param1) starting from the start of the clob (param2), skipping to the 'level'th occurance (param3) and treating the clob as one large string (param4 'n').
The connect by re-runs the regex to count the chunks to stop the level incrementing beyond end of the clob.
References:
SAS KB article for DBMAX_TEXT
Oracle docs for REGEXP_COUNT
Oracle docs for REGEXP_SUBSTR
Oracle regex syntax
Stackoverflow example of regex splitting

SQLPLUS displaying of tables with CLOB types

I have a table which has 3 columns. I have a NUMBER column, CLOB column, and BLOB column. how can i use some sort of SELECT * statement in order to display what I have entered into this table, not just a partial piece of the long character strings i have in there. The only way I know of displaying a long string form a CLOB would be using the DBMS_LOB.substr technique. My BLOB column is currently all NULL so not too worried about displaying that section, Just the number column with its associated CLOB. Thanks!
See here How to query a CLOB column in Oracle
When getting the substring of a CLOB column and using a query tool that has size/buffer restrictions
sometimes you would need to set the BUFFER to a larger size.
For example while using SQL Plus use the SET BUFFER 10000 to set it to 10000 as the default is 4000.
Running the DMBS_LOB.substr command you can also specify the amount of characters you want to return and the offset from which.
So using DMBS_LOB.substr(column, 3000) might restrict it to a small enough amount for the buffer.
See oracle documentation for more info on the substr command
DBMS_LOB.SUBSTR (
lob_loc IN CLOB CHARACTER SET ANY_CS,
amount IN INTEGER := 32767,
offset IN INTEGER := 1)
RETURN VARCHAR2 CHARACTER SET lob_loc%CHARSET;

Resources