Bad performance while writing to Oracle from Kafka using Kafka Connect - oracle

We are using Kafka Connect to write to Oracle and are having performance issues. Based on our understanding so far, the performance actually depends on the data and on the data types we have defined in Kafka Connect and Oracle.
Test 1:
We got good write performance with a sample data set (Dataset 1).
Test 2:
With another sample data set (Dataset 2), write performance was almost half of what it was in the first test.
On digging deeper, we found that the difference between the two data sets was in one field (STUDENT_ID): in Dataset 1 it was populated with non-null values throughout, whereas in Dataset 2 it was a mix of NULL and NOT NULL values.
We also found that the STUDENT_ID field had a data type mismatch between Kafka Connect and Oracle: Kafka Connect had INT64 whereas Oracle had VARCHAR(50). We changed the Kafka Connect data type to STRING, and with this change we got good write performance even with Dataset 2.
Test 3:
During the production run, we again ended up with poor write performance, and this time we have no clue which field (and which values) is causing it.
Below is the data type mapping between Kafka Connect and Oracle.
Oracle | Kafka Connect
NUMBER(19,0) | int64
NUMBER | bytes (org.apache.kafka.connect.data.Decimal)
VARCHAR2(10) | string
VARCHAR2(1) | string
CHAR(4) | string
VARCHAR2(10) | string
TIMESTAMP | int64 (org.apache.kafka.connect.data.Timestamp)
CLOB | string
Please let me know if you have any suggestions on how to find the root cause of the slow performance in Test 3.
Note: While comparing Test 1 and Test 2, we suspected that the CLOB field might be the cause of the poor write performance, but that wasn't the case. Even after removing the CLOB field we still got bad performance; it was only once we corrected the data type of the STUDENT_ID field on the Kafka side that we got good performance.
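One way to start narrowing this down on the Oracle side is to compare the per-execution statistics of the connector's INSERT statement between a good run and a bad run. The query below is only a rough sketch: it assumes SELECT access to V$SQL, and STUDENTS is a placeholder for the actual target table name.
-- Per-execution statistics for the sink's INSERT statement; run this
-- during both a good and a bad load and compare elapsed_per_exec_us.
SELECT sql_id,
       executions,
       rows_processed,
       ROUND(elapsed_time / NULLIF(executions, 0)) AS elapsed_per_exec_us,
       sql_text
  FROM v$sql
 WHERE UPPER(sql_text) LIKE 'INSERT INTO%STUDENTS%'  -- placeholder table name
 ORDER BY elapsed_time DESC;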

Related

How to filter on RAW data type oracle

We are using Oracle for one of our client databases, and I am not very well versed in it. There is a column on which I need to filter records. The column was printing System.Byte before, and when I converted it to VARCHAR(50) it printed as 000B000000000000000000000000000A.
I need to know how to filter the records with this value in the mentioned column.
If the idea of the column is to represent a hex string:
SELECT UTL_I18N.RAW_TO_CHAR ('000B000000000000000000000000000A', 'AL32UTF8')
FROM DUAL;
could work for you; however, more information about the application and the expected results would be needed for a more fitting solution.
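If the column is actually a RAW column and you only need to match that exact byte value, you can filter by converting the hex literal to RAW. A minimal sketch (the table and column names are placeholders):
-- HEXTORAW turns the hex string into a RAW value that can be compared
-- directly against the RAW column.
SELECT *
  FROM my_table
 WHERE raw_col = HEXTORAW('000B000000000000000000000000000A');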

Import massive table from Oracle to PostgreSQL with oracle-fdw return ORA-01406

I am working on a project to transfer data from an Oracle database to a PostgreSQL database to build a data warehouse with bash & SQL scripts. To access the Oracle database, I use the PostgreSQL extension oracle-fdw.
One of my scripts imports data from a massive table (~100 000 000 new rows/day). This table is partitioned and each partition contains 1 day of data. The query I use to import the data looks like this:
INSERT INTO postgre_target_table (some_fields)
SELECT some_aggregated_fields -- (~150 fields)
FROM oracle_source_table
WHERE partition_id = :v_partition_id AND some_others_filters
GROUP BY primary_key;
On the DEV server, the query works fine (there is much less data on this server), but on PREPROD it returns the error ORA-01406: fetched column value was truncated.
In some posts, people say that the output fields may be too small, but if I run a simple SELECT query without the INSERT or GROUP BY, I get the same error.
Another idea I found in another post is to create a view on the Oracle side, but my query uses multiple parameters that I cannot use in a view.
The last idea I found is to create an Oracle stored procedure that fills a table with the aggregated data and then import the data from this table, but the Oracle database is critical and my customer prefers to avoid adding more data to it.
Now I'm starting to think there's no solution, which is not good...
PostgreSQL version: 12.4 / Oracle version: 11.2
UPDATE
It seems my problem is more complicated than I thought.
After applying the modification suggested by Laurenz Albe, the query runs correctly in pgAdmin, but the problem still appears when I use the psql command.
Moreover, another query seems to have the same problem. This other query does not use the same source table as the first one; it uses 4 joined tables without any partitioning. The common point between these queries is their structure.
The detail I omitted to mention in the original post is that the purpose of both queries is to pivot a table. They look like this:
SELECT osr.id,
MIN(CASE osr.category
WHEN 123 THEN
1
END) AS field1,
MIN(CASE osr.category
WHEN 264 THEN
1
END) AS field2,
MIN(CASE osr.category
WHEN 975 THEN
1
END) AS field3,
...
FROM oracle_source_table osr
WHERE osr.category IN (123, 264, 975, ...)
GROUP BY osr.id;
Now that I have detailed what the queries look like, here are some results I got with the second one without changing the value of max_long (this query is lighter than the first one):
Sometimes it works (~10%) and sometimes it fails (~90%) in pgAdmin, but it never works with the psql command.
If I delete the WHERE clause, it always works.
I don't understand why deleting the WHERE clause changes anything: the field used in this clause is a NUMBER(6,0) between 0 and 2500, and it is still used in the SELECT clause... Oh, and in the 4 Oracle tables used by this query there is no LONG datatype; only the NUMBER datatype is used.
Among the 20 queries I have, only these two have a problem; their structure is similar and I don't believe in coincidences.
Don't despair!
Set the max_long option on the foreign table to a value big enough that all your oversized data fit.
The documentation has the details:
max_long (optional, defaults to "32767")
The maximal length of any LONG, LONG RAW and XMLTYPE columns in the Oracle table. Possible values are integers between 1 and 1073741823 (the maximal size of a bytea in PostgreSQL). This amount of memory will be allocated at least twice, so large values will consume a lot of memory.
If max_long is less than the length of the longest value retrieved, you will receive the error message
ORA-01406: fetched column value was truncated
Example:
ALTER FOREIGN TABLE my_tab OPTIONS (ADD max_long '1000000');
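If you are not sure what the option is currently set to, the per-table options can be read back from the PostgreSQL catalog. A small sketch, reusing the my_tab name from the example above:
-- Lists all options stored for the foreign table, including max_long
SELECT c.relname, ft.ftoptions
  FROM pg_foreign_table ft
  JOIN pg_class c ON c.oid = ft.ftrelid
 WHERE c.relname = 'my_tab';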

How to update an Oracle Table from SAS efficiently?

The problem I am trying to solve:
I have a SAS dataset work.testData (in the work library) that contains 8 columns and around 1 million rows. All columns are text (i.e. no numeric data). This SAS dataset is around 100 MB in file size. My objective is to have a step that loads this entire SAS dataset into Oracle, i.e. sort of like a "copy and paste" of the SAS dataset from the SAS platform to the Oracle platform. The rationale behind this is that, on a daily basis, this table in Oracle gets "replaced" by the one in SAS, which enables downstream Oracle processes.
My approach to solve the problem:
One-off initial setup in Oracle:
In Oracle, I created a table called testData with a table structure pretty much identical to the SAS dataset testData. (i.e. Same table name, same number of columns, same column names, etc.).
On-going repeating process:
In SAS, do a SQL pass-through to truncate ora.testData (i.e. remove all rows whilst keeping the table structure). This ensures ora.testData is empty before inserting from SAS.
In SAS, a LIBNAME statement to assign the Oracle database as a SAS library (called ora). So I can "see" what's in Oracle and perform read/update from SAS.
In SAS, a PROC SQL procedure to "insert" data from the SAS dataset work.testData into the Oracle table ora.testData.
Sample codes
One-off initial setup in Oracle:
Step 1: Run this Oracle SQL script in Oracle SQL Developer (to create the table structure for table testData, with 0 rows of data to begin with).
DROP TABLE testData;
CREATE TABLE testData
(
NODENAME VARCHAR2(64) NOT NULL,
STORAGE_NAME VARCHAR2(100) NOT NULL,
TS VARCHAR2(10) NOT NULL,
STORAGE_TYPE VARCHAR2(12) NOT NULL,
CAPACITY_MB VARCHAR2(11) NOT NULL,
MAX_UTIL_PCT VARCHAR2(12) NOT NULL,
AVG_UTIL_PCT VARCHAR2(12) NOT NULL,
JOBRUN_START_TIME VARCHAR2(19) NOT NULL
)
;
COMMIT;
On-going repeating process:
Step 2, 3 and 4: Run this SAS code in SAS
******************************************************;
******* On-going repeatable process starts here ******;
******************************************************;
*** Step 2: Truncate the temporary Oracle transaction dataset;
proc sql;
connect to oracle (user=XXX password=YYY path=ZZZ);
execute (
truncate table testData
) by oracle;
execute (
commit
) by oracle;
disconnect from oracle;
quit;
*** Step 3: Assign Oracle DB as a libname;
LIBNAME ora Oracle user=XXX password=YYY path=ZZZ dbcommit=100000;
*** Step 4: Insert data from SAS to Oracle;
PROC SQL;
insert into ora.testData
select NODENAME length=64,
STORAGE_NAME length=100,
TS length=10,
STORAGE_TYPE length=12,
CAPACITY_MB length=11,
MAX_UTIL_PCT length=12,
AVG_UTIL_PCT length=12,
JOBRUN_START_TIME length=19
from work.testData;
QUIT;
******************************************************;
**** On-going repeatable process ends here *****;
******************************************************;
The limitation / problem to my approach:
The PROC SQL step (which transfers 100 MB of data from SAS to Oracle) takes around 5 hours to run - the job takes too long!
The Question:
Is there a more sensible way to perform data transfer from SAS to Oracle? (i.e. updating an Oracle table from SAS).
First off, you can do the drop/recreate from SAS if that's a necessity. I wouldn't drop and recreate each time - a truncate seems an easier way to get the same result - but if you have other reasons then that's fine; either way you can use execute (truncate table xyz) by oracle, or a similar pass-through statement, to do it.
Second, assuming there are no constraints or indexes on the table - which seems likely given you are dropping and recreating it - you may not be able to improve this much, because it may come down to network latency. However, there is one area you should look at in the connection settings (which you don't provide): how often SAS commits the data.
There are two ways to control this: the DBCOMMIT setting and the BULKLOAD setting. The former controls how frequently commits are executed (so if DBCOMMIT=100, a commit is executed every 100 rows). More frequent commits mean less data is lost if a random failure occurs, but much slower execution. DBCOMMIT defaults to 0 for PROC SQL INSERT, which means just one commit at the end (the fastest option assuming no errors), so this is less likely to help unless you're overriding it.
BULKLOAD is probably my recommendation; it uses SQL*Loader (SQLLDR) to load your data, i.e. it batches the whole lot over to Oracle and then says 'load this please, thanks.' It only works with certain settings and certain kinds of queries, but it ought to work here (subject to other conditions - read the documentation page).
If you're already using BULKLOAD, then you may simply be up against network latency. Five hours for 100 MB seems slow, but I've seen all sorts of things in my (relatively short) time. If BULKLOAD didn't work, I would bring in the Oracle DBAs and have them troubleshoot this, starting from a .csv file and a SQL*Loader control file (which should be basically identical to what SAS is doing with BULKLOAD); they should know how to troubleshoot that, and at least be able to monitor the performance of the database itself. If there are constraints on other tables that are problematic here (i.e. other tables that recalculate themselves too frequently based on your inserts, or whatever), they should be able to find that out and recommend solutions.
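If the DBAs do get involved (or if you have enough privileges yourself), a quick way to see whether the time goes into network round-trips, commits, or something else is to watch the session's wait events while the insert runs. A rough sketch only, assuming the Oracle account is the XXX user from the LIBNAME statement:
-- Run repeatedly while the SAS insert is executing; the EVENT column
-- shows what the session is currently waiting on.
SELECT sid, status, event, wait_class, seconds_in_wait
  FROM v$session
 WHERE username = 'XXX';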
You could also look into PROC DBLOAD, which is sometimes faster than SQL inserts (though all in all it shouldn't really be, and it is an 'older' procedure not used much anymore). You could also look into whether you can avoid doing a complete flush-and-fill (i.e. whether there's a way to transfer less data across the network), or even simply shrink the column sizes.

Getting "ORA-01482: unsupported character set" when using WHERE IN pattern

I have a table with the following structure in Oracle database:
CREATE TABLE PASSENGERS
(ID VARCHAR2(6),
PASSPORTNO VARCHAR2(14));
I want to get the IDs of the passengers who have been registered more than once. For that I run the following query.
SELECT ID FROM PASSENGERS WHERE PASSPORTNO IN
(SELECT PASSPORTNO FROM PASSENGERS
GROUP BY PASSPORTNO
HAVING COUNT(*)>1);
But I get "unsuported character set" error. What's the point I'm missing?
Since all queries related to PASSPORTNO run fine, you have at least two more things to try:
Run SELECT ID FROM PASSENGERS and check for errors; if the error comes up, it may be related to the content stored in your table.
Try another SQL tool to execute your queries; your client OS may be using a system encoding which the database can't handle, either when processing your query or when displaying the returned rows.
Since both ID and PASSPORTNO are VARCHAR2 fields, there's a big chance that one of them has data in an encoding which Oracle can't decode properly.
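As a quick check on the character set side (a minimal sketch; it only needs read access to the NLS views), you can look at what the database itself uses and compare it with the NLS_LANG setting of the client tool that raises the error:
-- Character sets used by the database; the client character set comes
-- from the NLS_LANG environment setting of your SQL tool.
SELECT parameter, value
  FROM nls_database_parameters
 WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');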
This mostly seems like a data issue. Try to identify the exact data row that is causing the problem.
Use DML error logging: http://www.oracle-base.com/articles/10g/dml-error-logging-10gr2.php
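A rough sketch of what that suggestion could look like (assuming you can create a disposable copy of the table; all names here are illustrative):
-- One-off: an empty scratch copy of the table plus its error-logging table
CREATE TABLE passengers_check AS SELECT * FROM passengers WHERE 1 = 0;
BEGIN
  DBMS_ERRLOG.CREATE_ERROR_LOG('PASSENGERS_CHECK');
END;
/
-- Copy the data; rows that raise an error go to ERR$_PASSENGERS_CHECK
-- instead of aborting the whole statement.
INSERT INTO passengers_check
SELECT id, passportno FROM passengers
LOG ERRORS INTO err$_passengers_check ('charset check') REJECT LIMIT UNLIMITED;
SELECT * FROM err$_passengers_check;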
By the way, you are doing GROUP BY passportno. Is that correct? (It implies that multiple passengers can have the same passport number.) I guess it should be GROUP BY id.

Oracle single-table constant merge with CLOB using JDBC

As a follow-up to this question, I need help with the following scenario:
In Oracle, given a simple data table:
create table data (
id VARCHAR2(255),
key VARCHAR2(255),
value CLOB);
I am using the following merge command:
merge into data
using (
select
? id,
? key,
? value
from
dual
) val on (
data.id=val.id
and data.key=val.key
)
when matched then
update set data.value = val.value
when not matched then
insert (id, key, value) values (val.id, val.key, val.value);
I am invoking the query via JDBC from a Java application.
When the "value" string is large, the above query results in the following Oracle error:
ORA-01461: cannot bind a LONG value for insert into a long column
I even set the "SetBigStringTryClob" property as documented here with the same result.
Is it possible to achieve the behavior I want given that "value" is a CLOB?
EDIT: Client environment is Java
You haven't mentioned it specifically in your post, but judging by the tags on the question, I'm assuming you're doing this from Java.
I've had success with code like this in a project I just finished. That application used Unicode, so there may be simpler solutions if your problem domain is limited to a standard ASCII character set.
Are you currently using the OraclePreparedStatement.setCLOB() method? It's a terribly awkward thing to have to do, but we couldn't get around it any other way. You have to actually create a temporary CLOB and then use that temporary CLOB in the setCLOB() call.
Now, I've ripped this from a working system and had to make a few ad-hoc adjustments, so if this doesn't appear to work in your situation, let me know and I'll go back and see if I can produce a smaller working example.
This of course assumes you're using the Oracle JDBC drivers (ojdbc14.jar or ojdbc5.jar), which are found in $ORACLE_HOME/jdbc/lib.
import java.io.Writer;
import oracle.sql.CLOB;

// conn is an open java.sql.Connection from the Oracle JDBC driver;
// stringData holds the large value you want to bind.
CLOB tempClob = CLOB.createTemporary(conn, true, CLOB.DURATION_SESSION);
// Open the temporary CLOB in readwrite mode to enable writing
tempClob.open(CLOB.MODE_READWRITE);
// Get the output stream to write
Writer tempClobWriter = tempClob.getCharacterOutputStream();
// Write the data into the temporary CLOB
tempClobWriter.write(stringData);
// Flush and close the stream
tempClobWriter.flush();
tempClobWriter.close();
// Close the temporary CLOB
tempClob.close();
// myStatement is the OraclePreparedStatement for the MERGE;
// column.order is the 1-based index of the CLOB bind parameter.
myStatement.setCLOB(column.order, tempClob);
Regards,
Dwayne King
