I have a Spoon (Pentaho Data Integration) ETL transformation that reads a table from Postgres and writes it into Oracle.
No transformation steps, no sort: just SELECT col1, col2, ... col33 from table.
The input is 350 000 rows, and the throughput is only 40-50 rows/sec.
If I read/write the same table from Postgres to Postgres with ALL columns (col1...col100), I get 4-5 000 rows/sec.
Same if I read/write from Oracle to Oracle: 4-5 000 rows/sec.
So, to me, it is not a network problem.
If I try with another Postgres table that has only 7 columns, the performance is good.
Thanks for the help.
The same thing happened in my case: while loading data from Oracle with the job running on my local machine (Windows), the processing rate was 40 rows/sec, but it was 3000 rows/sec for a Vertica database.
I couldn't figure out the exact cause, but I found a way to increase the throughput. It worked for me, and you can do the same.
Right-click on the Table Input step and you will see "Change Number Of Copies to Start".
Then include the condition below in the WHERE clause to avoid duplicates: when you choose "Change Number Of Copies to Start", the query is triggered N times and would return duplicate rows, but adding this condition makes each copy return a distinct set of records (see the sketch at the end of this answer).
where ora_hash(v_account_number,10)=${internal.step.copynr}
v_account_number is the primary key in my case.
The 10 matches the number of copies: for example, if you have chosen 11 copies to start, use 11 - 1 = 10; set it according to your own choice.
Please note this works, but I suggest using it on a local machine for testing purposes; on the server you should not face this issue, so comment the line out when deploying to servers.
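To make this concrete, here is a sketch of what the Table Input query could look like with this approach; the table and column names are placeholders, and ora_hash is an Oracle function, so this applies when the Table Input reads from Oracle (as it did in my case):
-- Hypothetical Table Input query with "Change Number Of Copies to Start" = 11:
-- ora_hash(pk, 10) assigns each row a bucket 0..10, and ${internal.step.copynr}
-- is 0..10 across the 11 step copies, so each copy reads a disjoint slice.
SELECT col1, col2, col3
FROM my_source_table
WHERE ora_hash(v_account_number, 10) = ${internal.step.copynr}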
I work on a project that transfers data from an Oracle database to a PostgreSQL database to build a data warehouse with bash & SQL scripts. To access the Oracle database, I use the PostgreSQL extension oracle_fdw.
One of my scripts imports data from a massive table (~100 000 000 new rows/day). This table is partitioned and each partition contains one day of data. The query I use to import the data looks like this:
INSERT INTO postgre_target_table (some_fields)
SELECT some_aggregated_fields -- (~150 fields)
FROM oracle_source_table
WHERE partition_id = :v_partition_id AND some_others_filters
GROUP BY primary_key;
On the DEV server, the query works fine (there is much less data on that server), but on PREPROD it returns the error ORA-01406: fetched column value was truncated.
In some posts, people say the output fields may be too small, but if I send a simple SELECT query without INSERT or GROUP BY, I get the same error.
Another idea I found in another post is to create an Oracle-side view, but my query uses multiple parameters that I cannot use in a view.
The last idea I found is to create an Oracle stored procedure that fills a table with the aggregated data and then import from that table, but the Oracle database is critical and my customer prefers to avoid adding more data to it.
Now I'm starting to think there is no solution, which is not good...
PostgreSQL version: 12.4 / Oracle version: 11.2
UPDATE
It seems my problem is more complicated than I thought.
After applying the modification given by Laurenz Albe, the query runs correctly in pgAdmin, but the problem still appears when I use the psql command.
Moreover, another query seems to have the same problem. This other query does not use the same source table as the first one; it uses 4 joined tables without any partitioning. The common point between these queries is their structure.
The detail I omitted in the original post is that the purpose of both queries is to pivot a table. They look like this:
SELECT osr.id,
       MIN(CASE osr.category WHEN 123 THEN 1 END) AS field1,
       MIN(CASE osr.category WHEN 264 THEN 1 END) AS field2,
       MIN(CASE osr.category WHEN 975 THEN 1 END) AS field3,
       ...
FROM oracle_source_table osr
WHERE osr.category IN (123, 264, 975, ...)
GROUP BY osr.id;
Now that I have detailed what the queries look like, here are some results I got with the second one without changing the value of max_long (this query is lighter than the first one):
Sometimes it works (~10%) and sometimes it fails (~90%) in pgAdmin, but it never works with the psql command.
If I delete the WHERE clause, it always works.
I don't understand why removing the WHERE clause changes anything: the field used in this clause is a NUMBER(6, 0) between 0 and 2500, and it is still used in the SELECT clause... Also, the 4 Oracle tables used by this query contain no LONG columns; only the NUMBER datatype is used.
Among the 20 queries I have, only these two have a problem; their structure is similar and I don't believe in coincidences.
Don't despair!
Set the max_long option on the foreign table big enough that all your oversized data fit.
The documentation has the details:
max_long (optional, defaults to "32767")
The maximal length of any LONG, LONG RAW and XMLTYPE columns in the Oracle table. Possible values are integers between 1 and 1073741823 (the maximal size of a bytea in PostgreSQL). This amount of memory will be allocated at least twice, so large values will consume a lot of memory.
If max_long is less than the length of the longest value retrieved, you will receive the error message
ORA-01406: fetched column value was truncated
Example:
ALTER FOREIGN TABLE my_tab OPTIONS (ADD max_long '1000000');
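If the option is already present on the foreign table, use SET instead of ADD. A hedged sketch using the foreign table name from the question, plus one way to check which options are currently defined on it:
-- Raise an already-configured max_long (use ADD the first time, SET afterwards)
ALTER FOREIGN TABLE oracle_source_table OPTIONS (SET max_long '1000000');
-- Inspect the options currently set on the foreign table
SELECT ftoptions FROM pg_foreign_table WHERE ftrelid = 'oracle_source_table'::regclass;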
My Oracle 11.2.0.3 full database Data Pump export is very slow. When I query V$SESSION_LONGOPS with
SELECT USERNAME,OPNAME,TARGET_DESC,SOFAR,TOTALWORK,MESSAGE,SYSDATE,ROUND(100*SOFAR/TOTALWORK,2)||'%' COMPLETED FROM V$SESSION_LONGOPS
where SOFAR/TOTALWORK!=1
it shows me 2 records: in OPNAME, one contains SYS_EXPORT_FULL_XX and the other "Rowid Range Scan", and the message for the latter is
Rowid Range Scan: MY_SCHEMA.BIG_TABLE: 28118329 out of 30250532 Blocks done, and it takes hours and hours.
For reference: MY_SCHEMA.BIG_TABLE is a 220 GB table with 2 CLOB columns.
If you have CLOBs in the table it will take a long time to export, because they won't parallelize. Exactly what phase are you stuck in? Could you paste the last lines from the log file or get a status from Data Pump?
There are some best practices that you could try out:
SecureFile LOBs can be faster than BasicFile LOBs. That is yet another reason for going to SecureFile LOBs.
You could try increasing STREAMS_POOL_SIZE to at least 256 MB (a sketch follows this list), although I don't think that is the cause here.
Use the PARALLEL option and set it to 2x the number of CPU cores. Never export statistics; it is better to either export them using DBMS_STATS or regather them on the target database.
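A hedged sketch of the STREAMS_POOL_SIZE change (the value and scope are illustrative; PARALLEL and EXCLUDE=STATISTICS are expdp command-line parameters rather than SQL):
-- Illustrative only: give Data Pump's queueing machinery at least 256 MB of streams pool.
-- Requires ALTER SYSTEM privilege; check the impact on your overall SGA sizing first.
ALTER SYSTEM SET streams_pool_size = 256M SCOPE = BOTH;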
Regards,
Daniel
For 11g and 12cR1, the Streams AQ Enqueue wait is a common culprit for this as well. If that is your issue, running ALTER SYSTEM SET EVENTS 'IMMEDIATE TRACE NAME MMAN_CREATE_DEF_REQUEST LEVEL 6' will help.
I'm running queries against a Vertica table with close to 500 columns and only 100 000 rows.
A simple query (like select avg(col1) from mytable) takes 10 seconds, as reported by the Vertica vsql client with the \timing command.
But when I check the query_requests.request_duration_ms column for this query, there is no trace of the 10 seconds; it reports less than 100 milliseconds.
The query_requests.start_timestamp column indicates that processing started 10 seconds after I actually executed the command.
The resource_acquisitions table shows no delay in resource acquisition, but its queue_entry_timestamp column also shows that the queue entry occurred 10 seconds after I executed the command.
The same query run on the same data but against a table with only one column returns immediately. And since I'm running the queries directly on a Vertica node, I'm excluding any network latency issue.
It feels like Vertica is doing something before executing the query, that this is taking most of the time, and that it is related to the number of columns of the table. Any idea what it could be, and what I could try to fix it?
I'm using Vertica 8, in a test environment with no load.
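For reference, a sketch of the kind of check described above against the query_requests system table (the two columns are the ones mentioned above; the filter on the query text is illustrative):
-- Compare Vertica's own timing of the statement with the wall-clock time seen in vsql
SELECT start_timestamp, request_duration_ms
FROM query_requests
WHERE request ILIKE '%avg(col1)%'
ORDER BY start_timestamp DESC
LIMIT 5;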
I was running Vertica 8.1.0-1; it seems the issue was caused by a Vertica bug in the query planning phase that degraded performance. It was solved in versions >= 8.1.1:
https://my.vertica.com/docs/ReleaseNotes/8.1./Vertica_8.1.x_Release_Notes.htm
VER-53602 - Optimizer - This fix improves complex query performance during the query planning phase.
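If you see the same symptom, it may be worth confirming which Vertica build you are running before assuming this bug applies; a quick check:
SELECT version();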
We are trying to pull data from an Oracle database but seem to be getting very poor performance.
We have a table of around 10M rows and an index via which we are pulling around 1.3k rows: select * from tab where indexed_field = 'value' (in simplified form).
SQuirreL reports the query taking "execution: 0.182s, building output: 28.921s". The returned data occupies something like 340kB (eg, when copied/pasted into a text file).
Sometimes the building output phase takes much longer (>5 minutes), particularly the first time a query is run. Repeating it seems to run much faster - eg the 29s value above. Is this likely to just be the result of a transient overload on the database, or might it be due to buffering of the repeated data?
Is a second per 50 rows (13kB) a reasonable figure or is this unexpectedly large? (This is unlikely to be a network issue.)
Is it possible that the DBMS is failing to leverage the fact that the data could be grouped physically (by having the physical order the same as the index order) and is doing a separate disk read per row, and if so, how can it be persuaded to be more efficient?
There isn't much odd about the data - 22 columns per row, mostly defined as varchar2(250) though usually containing a few tens of chars. I'm not sure how big the ironware running Oracle is, but it lives in a datacentre so probably not too puny.
Any thoughts gratefully received.
kfinity> Have you tried setting your fetch size larger, like 500 or so?
That's the one! Speeds it up by an order of magnitude. 1.3k rows in 2.5s, 9.5k rows in 19s. Thanks for that suggestion.
BTW, doing select 1 only provides a speedup of about 10%, which I guess suggests that disk access wasn't the bottleneck.
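For what it's worth, if the same test is reproduced from SQL*Plus rather than SQuirreL, the analogous knob there is ARRAYSIZE (SQuirreL itself exposes the JDBC fetch size as a session property); a sketch:
sql> set arraysize 500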
others>
The fetch plan is:
Id | Operation | Options | Object | Mode | Cost | Bytes | Cardinality
0 | SELECT STATEMENT | | | ALL_ROWS | 6 | 17544 | 86
1 | TABLE ACCESS | BY INDEX ROWID BATCHED | TAB | ANALYZED | 6 | 17544 | 86
2 | INDEX | RANGE SCAN | TAB_IDX | ANALYZED | 3 | | 86
which, with my limited understanding, looks OK.
The "sho parameter" things didn't work (SQL errors), apart from the select which gave:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
CORE 12.1.0.2.0 Production
TNS for Linux: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production
I guess the only outstanding question is "what's the downside of setting the fetch size to a large value?". Given that we will always end up reading the entire result set (unless there is an exception) my guess would be "not much". Is that right?
Anyway, many thanks to those who responded and a big thanks for the solution.
1.3k rows out of a table of 10M rows is not too big for Oracle.
The reason the second run is faster than the first is that Oracle loads the data into RAM (the buffer cache) on the first query and just reads it from RAM on the second.
Are you sure the index is actually being used? Maybe you can do an explain plan and show us the result?
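For example, a sketch of how to get the plan (using the simplified table and predicate from the question):
EXPLAIN PLAN FOR
  SELECT * FROM tab WHERE indexed_field = 'value';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);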
A few immediate actions to take are:
Rebuild the index on the table (see the sketch at the end of this answer).
Gather the stats on the table.
Execute the following before rerunning the query, to extract the execution plan:
sql> set autotrace traceonly ;
turn this off by:
sql> set autotrace off ;
Also, provide the result of the following:
sql> sho parameter SGA
sql> sho parameter cursor
sql> select banner from v$version;
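A sketch of the first two items above (object names are placeholders; adjust to your schema):
sql> alter index tab_idx rebuild ;
sql> exec dbms_stats.gather_table_stats(ownname => 'MY_SCHEMA', tabname => 'TAB', cascade => TRUE) ;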
Abhi
The problem I am trying to solve:
I have a SAS dataset work.testData (in the work library) that contains 8 columns and around 1 million rows. All columns are text (i.e. no numeric data). This SAS dataset is around 100 MB in file size. My objective is to have a step that pushes this entire SAS dataset into Oracle, i.e. sort of like a "copy and paste" of the SAS dataset from the SAS platform to the Oracle platform. The rationale behind this is that, on a daily basis, this table in Oracle gets "replaced" by the one in SAS, which enables downstream Oracle processes.
My approach to solve the problem:
One-off initial setup in Oracle:
In Oracle, I created a table called testData with a table structure pretty much identical to the SAS dataset testData. (i.e. Same table name, same number of columns, same column names, etc.).
On-going repeating process:
In SAS, do a SQL pass-through to truncate ora.testData (i.e. remove all rows whilst keeping the table structure). This ensures ora.testData is empty before inserting from SAS.
In SAS, a LIBNAME statement to assign the Oracle database as a SAS library (called ora). So I can "see" what's in Oracle and perform read/update from SAS.
In SAS, a PROC SQL procedure to "insert" data from the SAS dataset work.testData into the Oracle table ora.testData.
Sample codes
One-off initial setup in Oracle:
Step 1: Run this Oracle SQL Script in Oracle SQL Developer (to create table structure for table testData. 0 rows of data to begin with.)
DROP TABLE testData;
CREATE TABLE testData
(
NODENAME VARCHAR2(64) NOT NULL,
STORAGE_NAME VARCHAR2(100) NOT NULL,
TS VARCHAR2(10) NOT NULL,
STORAGE_TYPE VARCHAR2(12) NOT NULL,
CAPACITY_MB VARCHAR2(11) NOT NULL,
MAX_UTIL_PCT VARCHAR2(12) NOT NULL,
AVG_UTIL_PCT VARCHAR2(12) NOT NULL,
JOBRUN_START_TIME VARCHAR2(19) NOT NULL
)
;
COMMIT;
On-going repeating process:
Steps 2, 3 and 4: Run this SAS code in SAS
******************************************************;
******* On-going repeatable process starts here ******;
******************************************************;
*** Step 2: Truncate the temporary Oracle transaction dataset;
proc sql;
connect to oracle (user=XXX password=YYY path=ZZZ);
execute (
truncate table testData
) by oracle;
execute (
commit
) by oracle;
disconnect from oracle;
quit;
*** Step 3: Assign Oracle DB as a libname;
LIBNAME ora Oracle user=XXX password=YYY path=ZZZ dbcommit=100000;
*** Step 4: Insert data from SAS to Oracle;
PROC SQL;
insert into ora.testData
select NODENAME length=64,
STORAGE_NAME length=100,
TS length=10,
STORAGE_TYPE length=12,
CAPACITY_MB length=11,
MAX_UTIL_PCT length=12,
AVG_UTIL_PCT length=12,
JOBRUN_START_TIME length=19
from work.testData;
QUIT;
******************************************************;
**** On-going repeatable process ends here *****;
******************************************************;
The limitation / problem with my approach:
The PROC SQL step (which transfers 100 MB of data from SAS to Oracle) takes around 5 hours to run - the job takes far too long!
The Question:
Is there a more sensible way to perform data transfer from SAS to Oracle? (i.e. updating an Oracle table from SAS).
First off, you can do the drop/recreate from SAS if that's a necessity. I wouldn't drop and recreate each time - a truncate seems an easier way to get the same result - but if you have other reasons then that's fine; either way you can use execute (truncate table xyz) by oracle or similar, using a pass-through connection.
Second, assuming there are no constraints or indexes on the table - which seems likely given you are dropping and recreating it - you may not be able to improve this much, because it may come down to network latency. However, there is one area you should look at in the connection settings (which you don't provide): how often SAS commits the data.
There are two ways to control this, the DBCOMMIT setting and the BULKLOAD setting. The former controls how frequently commits are executed (so if DBCOMMIT=100 then a commit is executed every 100 rows). More frequent commits = less data lost if a random failure occurs, but much slower execution. DBCOMMIT defaults to 0 for PROC SQL INSERT, which means just make one commit (the fastest option assuming no errors), so this is less likely to help unless you're overriding it.
BULKLOAD is probably my recommendation; it uses SQL*Loader (SQLLDR) to load your data, ie, it ships the whole batch over to Oracle and then says 'load this please, thanks.' It only works with certain settings and certain kinds of queries, but it ought to work here (subject to other conditions - read the documentation).
Even with BULKLOAD, you may be up against network latency. 5 hours for 100 MB seems slow, but I've seen all sorts of things in my (relatively short) day. If BULKLOAD didn't help, I would probably bring in the Oracle DBAs and have them troubleshoot this, starting from a .csv file and a SQL*Loader control file (which should be basically identical to what SAS is doing with BULKLOAD); they should know how to troubleshoot that and at least be able to monitor the performance of the database itself. If there are constraints on other tables that are problematic here (ie, other tables that too frequently recalculate themselves based on your inserts, or whatever), they should be able to find out and recommend solutions.
You could look into PROC DBLOAD, which sometimes is faster than inserts in SQL (though all in all shouldn't really be, and is an 'older' procedure not used too much anymore). You could also look into whether you can avoid doing a complete flush and fill (ie, if there's a way to transfer less data across the network), or even simply shrinking the column sizes.