Slow cross-loading from Oracle (oracle_fdw) into PostgreSQL

I have created several posts in the forum about this performance problem, but now that I have run some tests and gathered all the relevant information, I'm putting it together in this post.
I have performance issues with two big tables. Those tables are located on a remote Oracle database. I'm running the query:
insert into local_postgresql_table select * from oracle_remote_table.
The first table has 45M records and its size is 23 GB. Importing the data from the remote Oracle database takes 1 hour and 38 minutes. After that I create 13 regular indexes on the table, which takes about 10 minutes per index -> 2 hours and 10 minutes in total.
The second table has 29M records and its size is 26 GB. Importing the data from the remote Oracle database takes 2 hours and 30 minutes. Creating the indexes takes 1 hour and 30 minutes (some indexes are on a single column and take about 5 minutes each, and some are on multiple columns and take about 11 minutes each).
These operations are very problematic for me, and I'm looking for a way to improve their performance. The parameters I have assigned:
min_parallel_relation_size = 200MB
max_parallel_workers_per_gather = 5
max_worker_processes = 8
effective_cache_size = 2500MB
work_mem = 16MB
maintenance_work_mem = 1500MB
shared_buffers = 2000MB
RAM : 5G
CPU CORES : 8
- I tried running select count(*) on the table in both Oracle and PostgreSQL; the running times are almost equal.
- Before importing the data I drop the indexes and the constraints.
- I tried copying a 23 GB file from the Oracle server to the PostgreSQL server, and it took 12 minutes.
Please advise how I can proceed. How can I improve this operation?
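For reference, the workflow described above (drop indexes and constraints, bulk insert through the foreign table, then rebuild the indexes) looks roughly like the sketch below; the index and column names are placeholders, and the session-level maintenance_work_mem bump is just the usual knob for speeding up index builds:

-- before the load: drop indexes and constraints, as described above
DROP INDEX IF EXISTS local_postgresql_table_col1_idx;

-- bulk load through the oracle_fdw foreign table
INSERT INTO local_postgresql_table SELECT * FROM oracle_remote_table;

-- rebuild the indexes, giving the builds more memory for this session only
SET maintenance_work_mem = '1500MB';
CREATE INDEX local_postgresql_table_col1_idx ON local_postgresql_table (col1);
-- ...repeat for the remaining 12 indexes...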

Related

EXPDP Running too slow

We have a database of about 16 GB. We run a daily backup using EXPDP, but for a few days now the EXPDP job has been taking too long to complete (more than 6 hours).
My questions are:
1) Do table locks affect EXPDP performance? I checked for table locking and found that a number of tables were locked (we update tables using procedures that are scheduled to run several times a day).
2) Will a hard-disk-related issue slow down EXPDP performance?
As per your suggestion, I have included the query I run while the expdp is in progress:
select elapsed_time/1000000 seconds, sql_text, sharable_mem, persistent_mem, runtime_mem,
       users_executing, disk_reads, buffer_gets, user_io_wait_time
from gv$sql
where users_executing > 0
order by elapsed_time desc;
This query returns more than 20 records; here are some of them:
[screenshot of the query output]

Why does Vertica query_requests table report that a query took a few milliseconds, while it actually took 10 seconds?

I'm running queries against a Vertica table with close to 500 columns and only 100 000 rows.
A simple query (like select avg(col1) from mytable) takes 10 seconds, as reported by the Vertica vsql client with the \timing command.
But when checking the query_requests.request_duration_ms column for this query, there's no trace of the 10 seconds; it reports less than 100 milliseconds.
The query_requests.start_timestamp column indicates that processing started 10 seconds after I actually executed the command.
The resource_acquisitions table shows no delay in resource acquisition, but its queue_entry_timestamp column also shows that the queue entry occurred 10 seconds after I actually executed the command.
The same query run on the same data but on a table with only one column returns immediately. And since I'm running the queries directly on a Vertica node, I'm excluding any network latency issue.
It feels like Vertica is doing something before executing the query; this is taking most of the time and seems related to the number of columns in the table. Any idea what it could be, and what I could try to fix it?
I'm using Vertica 8, in a test environment with no load.
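For reference, the checks described above can be reproduced with queries along these lines (the v_monitor schema prefix, the text filter, and the acquisition_timestamp column are assumptions; request_duration_ms, start_timestamp and queue_entry_timestamp are the columns mentioned above):

-- duration and start time reported by Vertica for recent requests
SELECT start_timestamp, request_duration_ms, request
FROM v_monitor.query_requests
WHERE request ILIKE 'select avg(col1)%'
ORDER BY start_timestamp DESC
LIMIT 10;

-- when the query entered the resource queue and when resources were granted
SELECT pool_name, queue_entry_timestamp, acquisition_timestamp
FROM v_monitor.resource_acquisitions
ORDER BY queue_entry_timestamp DESC
LIMIT 10;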
I was running Vertica 8.1.0-1; it turns out the issue was caused by a Vertica bug in the query planning phase that degraded performance. It was fixed in versions >= 8.1.1:
https://my.vertica.com/docs/ReleaseNotes/8.1./Vertica_8.1.x_Release_Notes.htm
VER-53602 - Optimizer - This fix improves complex query performance during the query planning phase.

PostgreSQL 9.6: writing data from a remote Oracle DB is slow

I'm using the oracle_fdw extension in my PostgreSQL database. I'm trying to copy the data of many tables from the Oracle database into my PostgreSQL tables. I do this by running insert into local_postgresql_temp select * from remote_oracle_table. This operation is very slow, so I tried to find the reason and maybe choose a different approach.
1) First method - insert into local_postgresql_table select * from remote_oracle_table generates a total disk write of 7 MB/s and an actual disk write of 4 MB/s (iotop). For a 32 GB table it took 2 hours and 30 minutes.
2) Second method - copy (select * from oracle_remote_table) to '/tmp/dump' generates a total disk write of 4 MB/s and an actual disk write of 100 KB/s. The COPY utility is supposed to be very fast, but here it seems very slow.
- When I run COPY from the local dump, reading is very fast: 300 MB/s.
- I created a 32 GB file on the Oracle server and used scp to copy it, and it took only a few minutes.
- The WAL directory is located on a different file system.
The parameters I have assigned:
min_parallel_relation_size = 200MB
max_parallel_workers_per_gather = 5
max_worker_processes = 8
effective_cache_size = 12GB
work_mem = 128MB
maintenance_work_mem = 4GB
shared_buffers = 2000MB
RAM : 16G
CPU CORES : 8
How can I increase the write throughput? How can I get the data from the Oracle database into my PostgreSQL database faster?
I also ran perf on this entire process; the results:
[perf output]
I'd concentrate on the bits you said were fast.
- I created a 32 GB file on the Oracle server and used scp to copy it, and it took only a few minutes.
- When I run COPY from the local dump, reading is very fast: 300 MB/s.
I'd suggest you combine these two. Use a dump tool (or SQL*Plus) to export the data from Oracle into a file that PostgreSQL's COPY command can read. You could generate COPY's binary format directly, but that's a bit tricky; generating a CSV version shouldn't be too hard. An example of that is at "How do I spool to a CSV formatted file using SQLPLUS?", and a rough sketch is shown below.
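A rough sketch of that approach, assuming placeholder table and column names (the naive string concatenation below ignores quoting and escaping of the data, so it would need adjusting for the real table):

-- On the Oracle side, in SQL*Plus: spool the table out as CSV
SET PAGESIZE 0
SET FEEDBACK OFF
SET HEADING OFF
SET TRIMSPOOL ON
SET LINESIZE 32767
SPOOL /tmp/remote_oracle_table.csv
SELECT col1 || ',' || col2 || ',' || col3 FROM remote_oracle_table;
SPOOL OFF

-- Transfer the file to the PostgreSQL server (scp was already shown to be fast),
-- then load it with COPY:
COPY local_postgresql_table FROM '/tmp/remote_oracle_table.csv' WITH (FORMAT csv);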

Cassandra timing out when queried for a key that has over 10,000 rows, even after setting the timeout to 10 sec

I'm using DataStax Community v2.1.2-1 (AMI v2.5) with the preinstalled default settings.
I have this table:
CREATE TABLE notificationstore.note (
    user_id text,
    real_time timestamp,
    insert_time timeuuid,
    read boolean,
    PRIMARY KEY (user_id, real_time, insert_time)
) WITH CLUSTERING ORDER BY (real_time DESC, insert_time ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND default_time_to_live = 20160;
The other configurations are:
I have 2 nodes on m3.large instances, each with 1 x 32 GB SSD.
I'm facing timeouts on this particular table even with consistency set to ONE.
I increased the heap space to 3 GB (the machine has 8 GB of RAM).
I increased the read timeout to 10 seconds.
select count(*) from note where user_id = 'xxx' limit 2; fails with errors={}, last_host=127.0.0.1.
I am wondering whether the problem could be the time to live, or whether there is any other configuration or tuning that matters here.
The data in the database is pretty small.
Also, this problem does not occur immediately after inserting; it happens only after some time (more than 6 hours).
Thanks.
[Copying my answer from here because it's the same environment/problem: amazon ec2 - Cassandra Timing out because of TTL expiration.]
You're running into a problem where the number of tombstones (deleted values) passes a threshold, and the query then times out.
You can see this if you turn on tracing and then try your select statement, for example:
cqlsh> tracing on;
cqlsh> select count(*) from test.simple;
activity | timestamp | source | source_elapsed
---------------------------------------------------------------------------------+--------------+--------------+----------------
...snip...
Scanned over 100000 tombstones; query aborted (see tombstone_failure_threshold) | 23:36:59,324 | 172.31.0.85 | 123932
Scanned 1 rows and matched 1 | 23:36:59,325 | 172.31.0.85 | 124575
Timed out; received 0 of 1 responses for range 2 of 4 | 23:37:09,200 | 172.31.13.33 | 10002216
You're kind of running into an anti-pattern for Cassandra where data is stored for just a short time before being deleted. There are a few options for handling this better, including revisiting your data model if needed. Here are some resources:
The cassandra.yaml configuration file - See section on tombstone settings
Cassandra anti-patterns: Queues and queue-like datasets
About deletes
For your sample problem, I tried lowering the gc_grace_seconds setting to 300 (5 minutes). That causes tombstones to be cleaned up more frequently than the default 10 days, but that may or may not be appropriate for your application. Read up on the implications of deletes and adjust as needed for your application.
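If you want to try the same thing on the table above, it is a one-line change (300 is just the experimental value mentioned here; pick whatever suits your application):

ALTER TABLE notificationstore.note WITH gc_grace_seconds = 300;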

Informix query slow

IDS 9.04 on Unix.
I have a table with 200,000+ rows; each row has 200+ columns.
When I execute a query on this table (supposed to return 470+ rows with 50 columns), it takes 100+ seconds to return, and DbVisualizer reports:
execution time: 4.87 secs
fetch time: 97.56 secs
If I export all the 470+ rows to a file, the file size is less than 800 KB.
UPDATE STATISTICS has been run, only 50 columns are selected, and no BLOBs are involved. If I select only the first 100 rows, it takes just 5 seconds to return.
Please help!
If SELECT FIRST 100 only takes a few seconds, it suggests that the query plan for FIRST_ROWS is dramatically different from that for ALL_ROWS.
Try running the query with SET EXPLAIN ON, both with and without the FIRST n; it might give you a clue about what's going on.
Use:
set explain on avoid_execute;
YOUR_QUERY
set explain off;
And review the sqexplain.out file in your folder.
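For example, to compare the two plans (the table, column, and filter names here are placeholders for your real query):

set explain on avoid_execute;
select first 100 col1, col2 from big_table where some_col = 'value';
select col1, col2 from big_table where some_col = 'value';
set explain off;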
