I am seeing poor performance in Oracle (11g) when trying to copy CLOBs from one database to another. I have tried several things, but haven't been able to improve this.
The CLOBs are used for gathering report data. This can be quite large on a record by record basis. I am calling a procedure on the remote databases (across a WAN) to build the data, then copying the results back to the database at the corporate headquarters for comparison. The general format is:
CREATE TABLE my_report(the_db VARCHAR2(30), object_id VARCHAR2(30),
final_value CLOB, CONSTRAINT my_report_pk PRIMARY KEY (the_db, object_id));
To gain performance, I accumulate the results for remote sites into remote copies of the table. At the end of the procedure run, I try to copy the data back. This query is very simple:
INSERT INTO my_report SELECT * FROM my_report#europe;
The performance that I am seeing is around 9 rows per second, with an average CLOB size of 3500 bytes. (I am using CLOBs as this size often goes above 4k, the VARCHAR2 limit.) For 70,000 records (not uncommon) this takes around 2 hours to transfer. I have tried using the create table as select method, but this gets the same performance. I also spent more than a few hours tuning SQL*NET, but see no improvement from this. Changing the Arraysize does not improve the performance (though it can reduce it if the value is reduced.
I am able to get a copy over using the old exp/imp methods (export the table from remote, import it back in), which runs much faster, but this is fairly manual for my automated report. I have considered trying to write a pipelined function to select this data from, using it to split the CLOBS into BYTE/VARCHAR2 chunks (with an additional chunk number column), but didn't want to do this if someone had tried it and found a problem.
Thanks for your help.
I was able to get better performance when increasing arraysize to 1500 or higher. See also attached document: http://www.fors.com/velpuri2/PERFORMANCE/SQLNET.pdf
Related
I've seen lots of posts regarding the use of cursors in PL/SQL to return data to a calling application, but none of them touch on the issue I believe I'm having with this technique. I am fairly new to Oracle, but have extensive experience with MSSQL Server. In SQL Server, when building queries to be called by an application for returning data, I usually put the SELECT statement inside a stored proc with/without parameters, and let the stored proc execute the statement(s) and return the data automatically. I've learned that with PL/SQL, you must store the resulting dataset in a cursor and then consume the cursor.
We have a query that doesn't necessarily return huge amounts of rows (~5K - 10K rows), however the dataset is very wide as it's composed of 1400+ columns. Running the SQL query itself in SQL Developer returns results instantaneously. However, calling a procedure that opens a cursor for the same query takes 5+ minutes to finish.
CREATE OR REPLACE PROCEDURE PROCNAME(RESULTS OUT SYS_REFCURSOR)
AS
BEGIN
OPEN RESULTS FOR
<SELECT_query_with_1400+_columns>
...
END;
After doing some debugging to try to get to the root cause of the slowness, I'm leaning towards the cursor returning one row at a time very slowly. I can actually see this real-time by converting the proc code into a PL/SQL block and using DBMS_SQL.return_result(RESULTS) after the SELECT query. When running this, I can see each row show up in the Script output window in SQL Developer one at a time. If this is exactly how the cursor returns the data to the calling application, then I can definitely see how this is the bottleneck as it could take 5-10 minutes to finish returning all 5K-10K rows. If I remove columns from the SELECT query, the cursor displays all the rows much faster, so it does seem like the large amount of columns is an issue using a cursor.
Knowing that running the SQL query by itself returns instant results, how could I get this same performance out of a cursor? It doesn't seem like it's possible. Is the answer putting the embedded SQL in the application code and not using a procedure/cursor to return data in this scenario? We are using Oracle 12c in our environment.
Edit: Just want to address how I am testing performance using the regular SELECT query vs the PL/SQL block with cursor method:
SELECT (takes ~27 seconds to return ~6K rows):
SELECT <1400+_columns>
FROM <table_name>;
PL/SQL with cursor (takes ~5-10 minutes to return ~6K rows):
DECLARE RESULTS SYS_REFCURSOR;
BEGIN
OPEN RESULTS FOR
SELECT <1400+_columns>
FROM <table_name>;
DBMS_SQL.return_result(RESULTS);
END;
Some of the comments are referencing what happens in the console application once all the data is returned, but I am only speaking regarding the performance of the two methods described above within Oracle\SQL Developer. Hope this helps clarify the point I'm trying to convey.
You can run a SQL Monitor report for the two executions of the SQL; that will show you exactly where the time is being spent. I would also consider running the two approaches in separate snapshot intervals and checking into the output from an AWR Differences report and ADDM Compare Report; you'd probably be surprised at the amazing detail these comparison reports provide.
Also, even though > 255 columns in a table is a "no-no" according to Oracle as it will fragment your record across > 1 database blocks, thus increasing the IO time needed to retrieve the results, I suspect the differences in the two approaches that you are seeing is not an IO problem since in straight SQL you report fast result fetching all. Therefore, I suspect more of a memory problem. As you probably know, PL/SQL code will use the Program Global Area (PGA), so I would check the parameter pga_aggregate_target and bump it up to say 5 GB (just guessing). An ADDM report run for the interval when the code ran will tell you if the advisor recommends a change to that parameter.
Questions have been asked in the past that seems to handle pieces of my full question, but I'm not finding a totally good answer.
Here is the situation:
I'm importing data from an old, but operational and production, Oracle server.
One of the columns is created as LONG RAW.
I will not be able to convert the table to a BLOB.
I would like to use a global temporary table to pull out data each time I call to the server.
This feels like a good answer, from here: How to create a temporary table in Oracle
CREATE GLOBAL TEMPORARY TABLE newtable
ON COMMIT PRESERVE ROWS
AS SELECT
MYID, TO_LOB("LONGRAWDATA") "BLOBDATA"
FROM oldtable WHERE .....;
I do not want the table hanging around, and I'd only do a chunk of rows at a time, to pull out the old table in pieces, each time killing the table. Is it acceptable behavior to do the CREATE, then do the SELECT, then DROP?
Thanks..
--- EDIT ---
Just as a follow up, I decided to take an even different approach to this.
Branching the strong-oracle package, I was able to do what I originally hoped to do, which was to pull the data directly from the table without doing a conversion.
Here is the issue I've posted. If I am allowed to publish my code to a branch, I'll post a follow up here for completeness.
Oracle ODBC Driver Release 11.2.0.1.0 says that Prefetch for LONG RAW data types is supported, which is true.
One caveat is that LONG RAW can technically be up to 2GB in size. I had to set a hard max size of 10MB in the code, which is adequate for my use, so far at least. This probably could be a variable sent in to the connection.
This fix is a bit off original topic now however, but it might be useful to someone else.
With Oracle GTTs, it is not be necessary to drop and create each time, and you don't need to worry about data "hanging around." In fact, it's inadvisable to drop and re-create. The structure itself persists, but the data in it does not. The data only persists within each session. You can test this by opening up two separate clients, loading data with one, and you will notice it's not there in the second client.
In effect, each time you open a session, it's like you are reading a completely different table, which was just truncated.
If you want to empty the table within your stored procedure, you can always truncate it. Within a stored proc, you will need to execute immediate if you do this.
This is really handy, but it also can make debugging a bear if you are implementing GTTs through code.
Out of curiosity, why a chunk at a time and not the entire dataset? What kind of data volumes are you talking about?
-- EDIT --
Per our comments conversation, this is very raw and untested, but I hope it will give you an idea what I mean:
CREATE OR REPLACE PROCEDURE LOAD_DATA()
AS
TOTAL_ROWS number;
TABLE_ROWS number := 1;
ROWS_AT_A_TIME number := 100;
BEGIN
select count (*)
into TOTAL_ROWS
from oldtable;
WHILE TABLE_ROWS <= TOTAL_ROWS
LOOP
execute immediate 'truncate table MY_TEMP_TABLE';
insert into MY_TEMP_TABLE
SELECT
MYID, TO_LOB(LONGRAWDATA) as BLOBDATA
FROM oldtable
WHERE ROWNUM between TABLE_ROWS and TABLE_ROWS + ROWS_AT_A_TIME;
insert into MY_REMOTE_TABLE#MY_REMOTE_SERVER
select * from MY_TEMP_TABLE;
commit;
TABLE_ROWS := TABLE_ROWS + ROWS_AT_A_TIME;
END LOOP;
commit;
end LOAD_DATA;
The problem I am trying to solve:
I have a SAS dataset work.testData (in the work library) that contains 8 columns and around 1 million rows. All columns are in text (i.e. no numeric data). This SAS dataset is around 100 MB in file size. My objective is to have a step to parse this entire SAS dataset into Oracle. i.e. sort of like a "copy and paste" of the SAS dataset from the SAS platform to the Oracle platform. The rationale behind this is that on a daily basis, this table in Oracle gets "replaced" by the one in SAS which will enable downstream Oracle processes.
My approach to solve the problem:
One-off initial setup in Oracle:
In Oracle, I created a table called testData with a table structure pretty much identical to the SAS dataset testData. (i.e. Same table name, same number of columns, same column names, etc.).
On-going repeating process:
In SAS, do a SQL-pass through to truncate ora.testData (i.e. remove all rows whilst keeping the table structure). This ensure the ora.testData is empty before inserting from SAS.
In SAS, a LIBNAME statement to assign the Oracle database as a SAS library (called ora). So I can "see" what's in Oracle and perform read/update from SAS.
In SAS, a PROC SQL procedure to "insert" data from the SAS dataset work.testData into the Oracle table ora.testData.
Sample codes
One-off initial setup in Oracle:
Step 1: Run this Oracle SQL Script in Oracle SQL Developer (to create table structure for table testData. 0 rows of data to begin with.)
DROP TABLE testData;
CREATE TABLE testData
(
NODENAME VARCHAR2(64) NOT NULL,
STORAGE_NAME VARCHAR2(100) NOT NULL,
TS VARCHAR2(10) NOT NULL,
STORAGE_TYPE VARCHAR2(12) NOT NULL,
CAPACITY_MB VARCHAR2(11) NOT NULL,
MAX_UTIL_PCT VARCHAR2(12) NOT NULL,
AVG_UTIL_PCT VARCHAR2(12) NOT NULL,
JOBRUN_START_TIME VARCHAR2(19) NOT NULL
)
;
COMMIT;
On-going repeating process:
Step 2, 3 and 4: Run this SAS code in SAS
******************************************************;
******* On-going repeatable process starts here ******;
******************************************************;
*** Step 2: Trancate the temporary Oracle transaction dataset;
proc sql;
connect to oracle (user=XXX password=YYY path=ZZZ);
execute (
truncate table testData
) by oracle;
execute (
commit
) by oracle;
disconnect from oracle;
quit;
*** Step 3: Assign Oracle DB as a libname;
LIBNAME ora Oracle user=XXX password=YYY path=ZZZ dbcommit=100000;
*** Step 4: Insert data from SAS to Oracle;
PROC SQL;
insert into ora.testData
select NODENAME length=64,
STORAGE_NAME length=100,
TS length=10,
STORAGE_TYPE length=12,
CAPACITY_MB length=11,
MAX_UTIL_PCT length=12,
AVG_UTIL_PCT length=12,
JOBRUN_START_TIME length=19
from work.testData;
QUIT;
******************************************************;
**** On-going repeatable process ends here *****;
******************************************************;
The limitation / problem to my approach:
The Proc SQL step (that transfer 100 MB of data from SAS to Oracle) takes around 5 hours to perform - the job takes too long to run!
The Question:
Is there a more sensible way to perform data transfer from SAS to Oracle? (i.e. updating an Oracle table from SAS).
First off, you can do the drop/recreate from SAS if that's a necessity. I wouldn't drop and recreate each time - a truncate seems easier to get the same results - but if you have other reasons then that's fine; but either way you can use execute (truncate table xyz) from oracle or similar to drop, using a pass-through connection.
Second, assuming there are no constraints or indexes on the table - which seems likely given you are dropping and recreating it - you may not be able to improve this, because it may be based on network latency. However, there is one area you should look in the connection settings (which you don't provide): how often SAS commits the data.
There are two ways to control this, the DBCOMMMIT setting and the BULKLOAD setting. The former controls how frequently commits are executed (so if DBCOMMIT=100 then a commit is executed every 100 rows). More frequent commits = less data is lost if a random failure occurs, but much slower execution. DBCOMMIT defaults to 0 for PROC SQL INSERT, which means just make one commit (fastest option assuming no errors), so this is less likely to be helpful unless you're overriding this.
Bulkload is probably my recommendation; that uses SQLLDR to load your data, ie, it batches the whole bit over to Oracle and then says 'Load this please, thanks.' It only works with certain settings and certain kinds of queries, but it ought to work here (subject to other conditions - read the documentation page above).
If you're using BULKLOAD, then you may be up against network latency. 5 hours for 100 MB seems slow, but I've seen all sorts of things in my (relatively short) day. If BULKLOAD didn't work I would probably bring in the Oracle DBAs and have them troubleshoot this, starting from a .csv file and a SQL*LDR command file (which should be basically identical to what SAS is doing with BULKLOAD); they should know how to troubleshoot that and at least be able to monitor performance of the database itself. If there are constraints on other tables that are problematic here (ie, other tables that too-frequently recalculate themselves based on your inserts or whatever), they should be able to find out and recommend solutions.
You could look into PROC DBLOAD, which sometimes is faster than inserts in SQL (though all in all shouldn't really be, and is an 'older' procedure not used too much anymore). You could also look into whether you can avoid doing a complete flush and fill (ie, if there's a way to transfer less data across the network), or even simply shrinking the column sizes.
1
Hello community,
we are currently evaluating possibilities to store generic data structures. We found that at least from a functional point of view Oracle XMLType is a good alternative to the good old BLOB. Because you can query and update single fields from the xml and also create indexes on XPath expressions.
We are a bit worried about the performance of XMLType. Especially the select performance in interesting. We have queries that select multiple data structures at once. These need to be fast.
Such a query looks something like this
SELECT DOC_VALUE.getClobval() AS XML_VALUE FROM XML_TABLE WHERE d.ID = IN ('1','2',...);
Our XML documents are 7 to 8 KB in size. We are on Oracle 11g and create the XML column with type 'XMLTYPE'
Do you have experience about the performance of selects on xml type columns. What overall experiences do you have with XMLTYPE. Is this a robust and fast Oracle feature? Or is it rather something immature and experimental.
Regards, Mathias
XMLDB is a strong and reliable feature we have since 9i.
Flexible-schema XMLType columns are implemented over hidden CLOB columns, while fixed-schema XMLType are decomposed into hidden tables and views; after all XMLType is as reliable as standard objects.
Performance would vary on your use, but just reading the whole XML on an XMLType is fast as reading the same on a classical CLOB, because actually it really just read and give you the XML code stored on CLOB as-is.
You can assign an XDB.XMLIndex to an XMLType, and performance will be good after that. We had a table with only 13,000 records, each containing an XML Clob that wasn't too big, maybe 40 pages each if it was printed on text, and running simple queries without an XMLIndex often took over 10 minutes, and running more complex queries often took 2x or 3x longer!
With an XMLIndex, the performance was on par with a typical table (On the order of milliseconds for full table scans). The XMLIndex can be as complex as you need it to be, I tried this naive one and it worked fine: (I say naive because I do not fully understand the inner workings of this index type!)
CREATE INDEX myschema.my_idx_name ON myschema.mytable
(SYS_MAKEXML(0,"SYS_SOMETHING$"))
INDEXTYPE IS XDB.XMLINDEX
NOPARALLEL;
You should understand the various XMLIndex options available to you, and choose according to your needs. Documentation:
https://docs.oracle.com/cd/B28359_01/appdev.111/b28369/xdb_indexing.htm#CHDJECDA
Is there any subprogram similar to DBMS_METADATA.GET_DDL that can actually export the table data as INSERT statements?
For example, using DBMS_METADATA.GET_DDL('TABLE', 'MYTABLE', 'MYOWNER') will export the CREATE TABLE script for MYOWNER.MYTABLE. Any such things to generate all data from MYOWNER.MYTABLE as INSERT statements?
I know that for instance TOAD Oracle or SQL Developer can export as INSERT statements pretty fast but I need a more programmatically way for doing it. Also I cannot create any procedures or functions in the database I'm working.
Thanks.
As far as I know, there is no Oracle supplied package to do this. And I would be skeptical of any 3rd party tool that claims to accomplish this goal, because it's basically impossible.
I once wrote a package like this, and quickly regretted it. It's easy to get something that works 99% of the time, but that last 1% will kill you.
If you really need something like this, and need it to be very accurate, you must tightly control what data is allowed and what tools can be used to run the script. Below is a small fraction of the issues you will face:
Escaping
Single inserts are very slow (especially if it goes over a network)
Combining inserts is faster, but can run into some nasty parsing bugs when you start inserting hundreds of rows
There are many potential data types, including custom ones. You may only have NUMBER, VARCHAR2, and DATE now, but what happens if someone adds RAW, BLOB, BFILE, nested tables, etc.?
Storing LOBs requires breaking the data into chunks because of VARCHAR2 size limitations (4000 or 32767, depending on how you do it).
Character set issues - This will drive you ¿¿¿¿¿¿¿ insane.
Enviroment limitations - For example, SQL*Plus does not allow more than 2500 characters per line, and will drop whitespace at the end of your line.
Referential Integrity - You'll need to disable these constraints or insert data in the right order.
"Fake" columns - virtual columns, XML lobs, etc. - don't import these.
Missing partitions - If you're not using INTERVAL partitioning you may need to manually create them.
Novlidated data - Just about any constraint can be violated, so you may need to disable everything.
If you want your data to be accurate you just have to use the Oracle utilities, like data pump and export.
Why don't you use regular export ?
If you must you can generate the export script:
Let's assume a Table myTable(Name VARCHAR(30), AGE Number, Address VARCHAR(60)).
select 'INSERT INTO myTable values(''' || Name || ','|| AGE ||',''' || Address ||''');' from myTable
Oracle SQL Developer does that with it's Export feature. DDL as well as data itself.
Can be a bit unconvenient for huge tables and likely to cause issues with cases mentioned above, but works well 99% of the time.