Select takes a long time in a 160GB database - performance

I have a 160GB SQLite database. There is a composite index on the id and stage columns of table hwmp:
CREATE INDEX idx_hwmp_id ON hwmp (id, stage);
When I do a count of rows the query returns in 0.09 seconds.
sqlite> select count (*) from hwmp where id = 2000 and stage = 4;
59397
Run Time: real 0.091 user 0.000074 sys 0.080494
However, for a select all the real time is 85 seconds. The user and system time combined is only 2.5 seconds. Why would the real time be so high?
select * from hwmp where id = 2000 and stage = 4;
Run Time: real 85.420 user 0.801639 sys 1.754250
How to fix it? Another query on a sqlite3 database (300MB) used to return in 20ms. Today, it was taking 652ms.
Run Time: real 0.652 user 0.018766 sys 0.010595
There is something wrong with the Linux environment today. I copied the same SQLite database to my Mac and it ran quickly:
Run Time: real 0.028 user 0.005990 sys 0.010420
It is using the index:
sqlite> explain query plan select * from hwmp where id = 78 and stage = 4;
QUERY PLAN
`--SEARCH hwmp USING INDEX idx_hwmp_id (id=? AND stage=?)
Run Time: real 0.005 user 0.000857 sys 0.000451

The relevant setting is pragma cache_size = 200000; that is, 200,000 pages of 4,096 bytes, roughly 800MB of page cache. After setting it, the query takes approximately 3s on the first run and 0.28s on the second. Phew. This also explains the original numbers: count(*) is answered entirely from the index (id and stage are both index columns), while select * has to fetch each of the 59,397 matching rows from the table itself, causing scattered reads across a 160GB file; the large gap between real time and user+sys time is time spent waiting on I/O.
The cache setting improved performance for a while, but there also seems to be a problem in the environment: we are working off an AWS Linux VM with an EBS SSD attached, and query times on my Mac are 6.3 times faster than in the AWS Linux / EBS environment.
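The count is fast because both predicate columns live in the index (a covering query), while SELECT * must also visit the table for every matching row. A minimal, self-contained sketch using Python's built-in sqlite3, with a small toy table standing in for the 160GB one, makes the two access paths visible:

```python
import sqlite3

# Toy stand-in for the 160GB database: same schema shape, same index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hwmp (id INTEGER, stage INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_hwmp_id ON hwmp (id, stage)")
conn.executemany("INSERT INTO hwmp VALUES (?, ?, ?)",
                 [(2000, 4, "x" * 100) for _ in range(1000)])

# count(*) needs no columns beyond the index: SQLite reports a COVERING INDEX,
# meaning the table itself is never touched.
plan_count = conn.execute(
    "EXPLAIN QUERY PLAN SELECT count(*) FROM hwmp WHERE id = 2000 AND stage = 4"
).fetchall()

# SELECT * finds rowids in the index but must then read every row from the
# table -- on a huge file that is a scattered read per matching row.
plan_all = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM hwmp WHERE id = 2000 AND stage = 4"
).fetchall()

# Enlarging the page cache, as in the question, keeps hot pages in memory:
conn.execute("PRAGMA cache_size = 200000")  # 200,000 pages
```

Printing the two plans shows `USING COVERING INDEX` for the count and plain `USING INDEX` for the star-select, which matches the 0.09s vs 85s behaviour above.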

Related

Slow cross-loading from oracle (oracle-fdw) into PostgreSQL

I have created several forum posts about this performance problem, but now, after running some tests and gathering all the necessary information, I am consolidating everything in this post.
I have performance issues with two big tables located on a remote Oracle database. I'm running the query:
insert into local_postgresql_table select * from oracle_remote_table;
The first table has 45M records and its size is 23G. Importing the data from the remote Oracle database takes 1 hour and 38 minutes. After that I create 13 regular indexes on the table, at about 10 minutes per index, i.e. 2 hours and 10 minutes in total.
The second table has 29M records and its size is 26G. Importing the data from the remote Oracle database takes 2 hours and 30 minutes, and creating the indexes takes 1 hour and 30 minutes (single-column indexes take about 5 minutes each, multi-column indexes about 11 minutes each).
These operations are very problematic for me and I'm searching for a way to improve the performance. The parameters I have assigned:
min_parallel_relation_size = 200MB
max_parallel_workers_per_gather = 5
max_worker_processes = 8
effective_cache_size = 2500MB
work_mem = 16MB
maintenance_work_mem = 1500MB
shared_buffers = 2000MB
RAM : 5G
CPU CORES : 8
- I tried running select count(*) from the table in both Oracle and PostgreSQL; the running times are almost equal.
- Before importing the data I drop the indexes and the constraints.
- I tried copying a 23G file from the Oracle server to the PostgreSQL server; it took 12 minutes.
Please advise: how can I proceed, and what can I improve in this operation?
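Dropping the indexes before the load, as described above, is the right instinct: building each index once over the full data (a single sort) is normally much cheaper than maintaining all indexes row by row during the insert. A small engine-agnostic sketch of that pattern, using Python's built-in sqlite3 as a stand-in for PostgreSQL (table and index names are made up for illustration):

```python
import sqlite3
import time

def load(index_before_load: bool):
    """Load 100k rows either with indexes pre-created (maintained per row)
    or with indexes built afterwards in one pass each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
    if index_before_load:
        for col in ("a", "b", "c"):
            conn.execute(f"CREATE INDEX ix_pre_{col} ON t ({col})")
    start = time.perf_counter()
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        ((i, (i * 7) % 1000, (i * 13) % 1000) for i in range(100_000)),
    )
    if not index_before_load:
        # One sorted bulk build per index, after all data is in place.
        for col in ("a", "b", "c"):
            conn.execute(f"CREATE INDEX ix_post_{col} ON t ({col})")
    conn.commit()
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
    return elapsed, count

t_maintained, n1 = load(index_before_load=True)   # per-row index maintenance
t_bulk, n2 = load(index_before_load=False)        # load first, index later
```

Both routes end with identical data and indexes; the load-then-index route is usually the faster one, which is why the remaining tuning levers for the question above are mainly maintenance_work_mem (already generous at 1500MB) and the network transfer itself.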

Oracle SQL query improves performance on second and third execution

We are analyzing SQL statements on an Oracle 12c database. We noticed that the following statement got faster when run several times. How can it be explained that it improves on the second and third execution?
SELECT COUNT (*)
FROM asset
WHERE ( ( (status NOT IN ( 'x1', 'x2', 'x3'))
AND ( (siteid = 'xxx')))
AND (EXISTS
(SELECT siteid
FROM siteauth a, groupuser b
WHERE a.groupname = b.groupname
AND b.userid = 'xxx'
AND a.siteid = asset.siteid)))
AND ( (assetnum LIKE '5%'));
First run: 24 sec.
Second run: 17 sec.
Third run: 7 sec.
Fourth run: 7 sec.
Tuned by using the result cache: 0.003 sec.
Oracle does not cache query results by default, but it does cache the data blocks used by the query. 12c also has features such as adaptive execution plans and cardinality feedback, which may change the execution plan between runs even if table statistics were not recalculated.
Oracle fetches data from disk into memory. The second time you run the query the data is found in memory, so no disk reads are necessary, resulting in faster execution.
The database is "warmed up".
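The buffer cache explains the 24s-to-7s improvement (blocks stay in memory), while the 0.003s result-cache figure is something stronger: the server memoizes the entire query result, keyed on the statement and its bind values, and replays it without re-executing the query. A loose application-level analogy (not Oracle internals) is ordinary memoization:

```python
from functools import lru_cache

# Hypothetical toy data standing in for the asset/siteauth/groupuser join.
DATA = [("xxx", "5001"), ("xxx", "5002"), ("yyy", "5003")]
executions = {"n": 0}  # counts how often the "expensive" work actually runs

@lru_cache(maxsize=None)
def count_assets(siteid: str, prefix: str) -> int:
    """Stand-in for the expensive COUNT(*) query; cached per bind values."""
    executions["n"] += 1
    return sum(1 for s, a in DATA if s == siteid and a.startswith(prefix))

first = count_assets("xxx", "5")   # does the work
second = count_assets("xxx", "5")  # identical binds: served from the cache
third = count_assets("yyy", "5")   # different binds: executes again
```

As with Oracle's result cache, the speedup only applies while the underlying data is unchanged; any modification to the referenced tables invalidates the cached result.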

Performance of SASHELP views versus SQL dictionary tables

Why does it take longer for SAS to create a dataset from a data step view using, for example, sashelp.vcolumn versus the equivalent SQL table dictionary.columns?
I did a test using fullstimer and it seems to confirm my suspicion of performance differences.
option fullstimer;
data test1;
set sashelp.vcolumn;
where libname = 'SASHELP' and
memname = 'CLASS' and
memtype = 'DATA';
run;
proc sql;
create table test2 as
select *
from dictionary.columns
where libname = 'SASHELP' and
memname = 'CLASS' and
memtype = 'DATA';
quit;
An excerpt from the log:
NOTE: There were 5 observations read from the data set SASHELP.VCOLUMN.
WHERE (libname='SASHELP') and (memname='CLASS') and (memtype='DATA');
NOTE: The data set WORK.TEST1 has 5 observations and 18 variables.
NOTE: DATA statement used (Total process time):
real time 0.67 seconds
user cpu time 0.23 seconds
system cpu time 0.23 seconds
memory 3820.75k
OS Memory 24300.00k
Timestamp 04/13/2015 09:42:21 AM
Step Count 5 Switch Count 0
NOTE: Table WORK.TEST2 created, with 5 rows and 18 columns.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.03 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 3267.46k
OS Memory 24300.00k
Timestamp 04/13/2015 09:42:21 AM
Step Count 6 Switch Count 0
The memory used is a little higher for SASHELP but the difference isn't huge. Note the time--it's 22 times longer using SASHELP than with the SQL dictionary. Surely it can't just be due to the relatively small difference in memory usage.
At @Salva's suggestion, I resubmitted the code in a new SAS session, this time running the SQL step before the data step. The memory and time differences are even more pronounced:
| sql | sashelp
----------------+-----------+-----------
real time | 0.28 sec | 1.84 sec
user cpu time | 0.00 sec | 0.25 sec
system cpu time | 0.00 sec | 0.24 sec
memory | 3164.78k | 4139.53k
OS Memory | 10456.00k | 13292.00k
Step Count | 1 | 2
Switch Count | 0 | 0
Some (if not all) of this is the difference in overhead between SQL and Data Step. For example:
proc sql;
create table test2 as
select *
from sashelp.vcolumn
where libname = 'SASHELP' and
memname = 'CLASS' and
memtype = 'DATA';
quit;
Also very fast.
The SAS page about Dictionary Tables gives some information that is likely the main explanation.
When querying a DICTIONARY table, SAS launches a discovery process
that gathers information that is pertinent to that table. Depending on
the DICTIONARY table that is being queried, this discovery process can
search libraries, open tables, and execute views. Unlike other SAS
procedures and the DATA step, PROC SQL can mitigate this process by
optimizing the query before the discovery process is launched.
Therefore, although it is possible to access DICTIONARY table
information with SAS procedures or the DATA step by using the SASHELP
views, it is often more efficient to use PROC SQL instead.
In my experience, using the sashelp views is slower than using proc datasets. This is more so if you have a lot of libraries assigned, especially external ones:
10 proc datasets lib=sashelp noprint;
11 contents data=class out=work.test2;
12 quit;
NOTE: The data set WORK.TEST2 has 5 observations and 40 variables.
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.01 seconds
user cpu time 0.00 seconds
system cpu time 0.01 seconds
memory 635.12k
OS Memory 9404.00k
Timestamp 14.04.2015 kl 10.22

Informix query slow

IDS 9.04 on unix.
I have a table with 200,000+ rows; each row has 200+ columns.
When I execute a query (expected to return 470+ rows with 50 columns) on this table, it takes 100+ secs to return, and DbVisualizer told me:
execution time: 4.87 secs
fetch time: 97.56 secs
If I export all the 470+ rows into a file, the file size is less than 800K.
UPDATE STATISTICS has been run, only 50 columns are selected, and no blobs are involved. If I select just the first 100 rows, it takes only 5 secs to return.
Please help!
If SELECT FIRST 100 only takes a few seconds, it suggests that the query plan for FIRST_ROWS is dramatically different from that for ALL_ROWS.
Try running the query with SET EXPLAIN ON; both with and without the FIRST n. It might give you a clue what's going on.
Use:
set explain on avoid_execute;
YOUR_QUERY
set explain off;
And review the sqexplain.out file in your folder.

Why is there such a big difference in the execution time of a query run by ADF and in SQL Developer

I have a strange issue with a query running in my JDeveloper ADF web application. It is a simple search form issuing a select statement to Oracle 10g database. When the search is submitted, ADF framework is (first) running the query, and (second) running the same query wrapped within "select count(1) from (...query...)" - the goal here is to obtain the total number of rows, and to display the "Next 10 results" navigation controls.
So far, so good. Trouble comes from the outrageously poor performance I am getting from the second query (the one with "count(1)" in it). To investigate, I copied/pasted/ran the query in SQL Developer and was surprised to see a much better response.
When comparing the query execution in ADF and SQL Developer, I took all measures to ensure representative environment for both executions:
- freshly restarted database
- same for the OC4J
This way I can be sure that the difference is not related to caching and/or buffering, in both cases the db and the application server were freshly (re)started.
The traces I took for both sessions illustrate the situation:
Query ran in ADF:
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.97 0.97 0 0 0 0
Fetch 1 59.42 152.80 35129 1404149 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 60.39 153.77 35129 1404149 0 1
Same query in SQL Developer:
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 1.02 1.16 0 0 0 0
Fetch 1 1.04 3.28 4638 4567 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 2.07 4.45 4638 4567 0 1
Thanks in advance for any comments or suggestions!
Ok, I finally found the explanation of this ghastly behaviour. To make the long story short, the answer is in the definition (Tuning parameters) of my ViewObject in JDeveloper. What I was missing were these two important parameters:
FetchMode="FETCH_AS_NEEDED"
FetchSize="10"
Without them, the following happens - ADF runs the main query, binds the variables and fetches the results. Then, in an attempt to make an estimate of the rowcount, it launches the same query enclosed in "select count(1) from (my_query)", but ...(drum roll)... WITHOUT BINDING THE VARIABLES!!! It really beats me what is the use of estimating the rowcount without taking into account the actual values of the bind variables!
Anyway, it's all in the definition of the ViewObject: the following settings needed to be set, in order to get the expected behaviour:
All Rows in Batches of: 10
(checked) As Needed
(unchecked) Fill Last Page of Rows when Paging through Rowset
The execution plan could not help me (it was identical for both ADF and SQL Developer), the difference was only visible in a trace file taken with binds.
So, now my problem is solved - thanks to all for the tips that finally led me to the resolution!
The query with count is slower because it has to read all the data (to count it).
When you run the other query, you are only fetching a first page of data, so the execution (reading from the cursor) can stop after you have your first ten results.
Try loading the 100th page with your first query; it will likely be much slower than the first page.
If selecting a count online is too expensive, a common trick is to select one more item than you need (11 in your case) to determine whether there is more data. You cannot show a page count, but you can at least show a "next page" button.
Update: Are you saying the count query is only slow when run through ADF, but fast through SQL Developer?
If it is the same query, I can think of:
- Different settings in ADF vs SQL Developer (have you tried with SQL*Plus?)
- Binding variables of the incorrect type in the slow case
But without the execution plans or the SQL, it is hard to say.
Over the years I've found that "SELECT COUNT..." is often a source of unexpected slowdowns.
If I understand the results posted above, the query takes 153 seconds from JDeveloper, but only about 4.5 seconds from SQL Developer, and you're going to use this query to determine if the "Next 10 Results" control should be displayed.
I don't know that it matters if the runtime is 4.5 seconds or 153 seconds - even the best case seems rather slow for initializing a page. Assume for a moment that you can get the query to respond in 4.5 seconds when submitted from the page - that's still a long time to make a user sit and wait when they're only a mouse-click away from going off to do Something Else. In that same 4.5 seconds the app might be able to fetch enough data to load the page a few times.
I think @Thilo's idea of fetching one more record than is needed to fill the page, to determine whether more data is available, is a good one. Perhaps it could be adapted to your situation?
Share and enjoy.
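The fetch-one-extra-row trick is easy to sketch: request page_size + 1 rows, show page_size of them, and use the existence of the extra row to decide whether a "next page" control is needed, avoiding the costly SELECT COUNT entirely. A minimal illustration with Python's built-in sqlite3 (table and column names are made up):

```python
import sqlite3

def fetch_page(conn, page: int, page_size: int = 10):
    """Return (rows, has_next) by asking for one row more than the page size."""
    offset = page * page_size
    rows = conn.execute(
        "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?",
        (page_size + 1, offset),
    ).fetchall()
    has_next = len(rows) > page_size  # extra row present => another page exists
    return rows[:page_size], has_next

# Toy table with 25 rows, so page 0 is full and page 2 is the last, partial page.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(25)])

page0, more0 = fetch_page(conn, 0)  # 10 rows, more pages available
page2, more2 = fetch_page(conn, 2)  # final 5 rows, no next page
```

The database only ever reads page_size + 1 rows per request, so the cost stays constant regardless of how large the full result set is.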
