I have a heavy query that spools data into a CSV file which is sent to users. I have manually created parallel sessions and am executing the query with a filter condition in each, so that I can join all the spooled files into one single file at the end, thus reducing the time to generate the data (it usually takes about 10 hours; with parallel sessions it takes 2.5-3 hours).
My question is: how can I automate this, so that the script finds max(agreementid) and then distributes the work into X spool calls to generate X files, where each file has at most 100,000 records, say?
Additional explanation: I guess my question was not very clear. I will try to explain again.
I have a table/view with a large amount of data.
I need to spool this data into a CSV file.
It takes a humongous amount of time to spool the CSV file.
I run parallel spools by doing the following:
a) Select .... from ... where agreementid between 1 and 1000000;
b) Select .... from ... where agreementid between 1000001 and 2000000;
and so on and then spooling them individually in multiple sessions.
This helps me generate multiple files which I can then stitch together and share with users.
I need a script (DOS- or AIX-based, I guess) which will find the min and max agreementid in my table, create the spooling scripts automatically, and execute them in separate SQL sessions so that the files are generated automatically.
Not sure whether I have made myself clear enough.
Thanks, guys, for replying to my earlier query, but that was not what I was looking for.
It is a bit unclear what you want, but I think you want a query that finds the low/high range of agreement_ids for X groups of ids (buckets). If so, try something like this (using 4 buckets in this example):
select bucket, min(agreement_id), max(agreement_id), count(1)
from (
select agreement_id, ntile(4) over (order by agreement_id) bucket
from my_table
)
group by bucket;
Edit: If your problem is the hassle of spooling multiple queries and combining the output, I would rather opt for creating a single materialized view (using PARALLEL in the underlying query on the driving table) and refreshing it (complete, atomic_refresh => false) when needed. Once it is refreshed, simply extract from the snapshot table to CSV or whatever format you want.
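A minimal sketch of that approach, assuming the driving table is AGREEMENTS and a parallel degree of 8 (both names and degree are illustrative, not from the question):
create materialized view mv_agreements_extract
  build immediate
  refresh complete on demand
as
select /*+ parallel(a, 8) */ *
from agreements a;

begin
  -- complete refresh; atomic_refresh => false lets Oracle truncate and
  -- reload via direct path instead of delete + insert, which is faster
  dbms_mview.refresh('MV_AGREEMENTS_EXTRACT',
                     method => 'C',
                     atomic_refresh => false);
end;
/
Once the refresh completes, a single spool of mv_agreements_extract gives you the whole CSV.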
There might be a simpler way, but this generates four 'buckets' of IDs, and you could plug the min and max values into your parametrized filter condition:
select bucket, min(agreementid) as min_id, max(agreementid) as max_id
from (
  select agreementid,
         case when rn between 1 and cn / 4 then 1
              when rn between (cn / 4) - 1 and 2 * (cn / 4) then 2
              when rn between (2 * cn / 4) - 1 and 3 * (cn / 4) then 3
              when rn between (3 * cn / 4) - 1 and cn then 4
         end as bucket
  from (
    select agreementid, rank() over (order by agreementid) as rn,
           count(*) over () as cn
    from agreements
  )
)
group by bucket;
If you wanted an upper limit for each bucket rather than a fixed number of buckets then you could do:
select floor(rn / 100000) as bucket, min(agreementid) as min_id, max(agreementid) as max_id
from (
  select agreementid, rank() over (order by agreementid) as rn
  from agreements
)
group by floor(rn / 100000);
And then pass each min/max pair to a SQL script, e.g. from a shell script calling SQL*Plus. The bucket number could be passed as well, via a positional parameter, and used as part of the spool file name.
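For example, the per-bucket script could be a sketch like the one below (spool_bucket.sql is a made-up name, and &1, &2, &3 are the positional parameters: bucket number, min id, max id); each parallel session would then run something like sqlplus -s user/pwd @spool_bucket.sql 3 2000001 3000000:
-- spool_bucket.sql: &1 = bucket number, &2 = min id, &3 = max id
set pagesize 0 linesize 32767 trimspool on feedback off heading off
spool extract_&1..csv
select agreementid || ',' || col1 || ',' || col2  -- replace with your real column list
from agreements
where agreementid between &2 and &3;
spool off
exit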
I'm curious about what you've identified as the bottleneck though; have you tried running it as a parallel query inside the database, with a /*+ PARALLEL */ hint?
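For reference, the hinted version would look something like this (the table alias and degree of 8 are only placeholders):
select /*+ parallel(t, 8) */ *
from agreements t;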
All the Oracles out here,
I have an Oracle PL/SQL procedure, but so far only very small data for the query to run on. I suspect that when the data gets large, the query might start performing badly. Are there ways I can check the performance and take corrective measures even before the data builds up? If I wait for the data build-up, it might be too late.
Do you have any general and practical suggestions for me? Searching the internet did not turn up anything convincing.
Better to build yourself some test data to get an idea of how things will perform. It's easy to get started, e.g.
create table MY_TEST as select * from all_objects;
gives you approximately 50,000 rows, typically. You can scale that easily with
create table MY_TEST as select a.* from all_objects a ,
( select 1 from dual connect by level <= 10);
Now you have 500,000 rows
create table MY_TEST as select a.* from all_objects a ,
( select 1 from dual connect by level <= 10000);
Now you have 500,000,000 rows!
If you want unique values per row, then add rownum, e.g.
create table MY_TEST as select rownum r, a.* from all_objects a ,
( select 1 from dual connect by level <= 10000);
If you want (say) 100,000 distinct values in a column, then use TRUNC or MOD. You can also use DBMS_RANDOM to generate random numbers, strings etc.
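For example, a sketch combining both (the column names are made up):
create table MY_TEST as
select rownum r,
       mod(rownum, 100000) group_col,             -- 100,000 distinct values
       trunc(dbms_random.value(1, 11)) rand_num,  -- random integer from 1 to 10
       dbms_random.string('U', 8) rand_str        -- 8 random uppercase characters
from all_objects a,
     ( select 1 from dual connect by level <= 10);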
Also check out Morten's test data generator
https://github.com/morten-egan/testdata_ninja
for some domain specific data, and also the Oracle sample schemas on github which can also be scaled using techniques above.
https://github.com/oracle/db-sample-schemas
Some tables' data in my system need to be updated (or deleted, or inserted).
But I want to know which rows were updated, deleted and inserted.
So before the data are changed, I will back up the table in a different schema,
just like this:
create table backup_table as select * from schema1.testtable
After the data are changed, I want to find the difference between backup_table
and testtable, and I want to save that difference into a table in the backup schema.
The SQL I will run is like this:
CREATE TABLE TEST_COMPARE_RESULT
AS
SELECT 'BEFORE' AS STATUS, T1.*
FROM (
SELECT * FROM backup_table
MINUS
SELECT * FROM schema1.testtable
) T1
UNION ALL
SELECT 'AFTER' AS STATUS, T2.*
FROM (
SELECT * FROM schema1.testtable
MINUS
SELECT * FROM backup_table
) T2
What I am worried about is that I have heard the MINUS operation uses
a lot of system resources. In my system, some tables will be over 700 MB in size, so I want to
know how Oracle will read the 700 MB of data: in memory (PGA?) or in the temporary tablespace?
And how should I make sure that the resources are enough to do the compare operation?
MINUS is indeed a resource-intensive operation. It needs to read both tables and do sorts to compare them. However, Oracle has advanced techniques for doing this: it won't try to load both tables into memory (SGA) if they don't fit, and it will, yes, use temporary space for the sorts. But I would recommend you have a try: just run the query and see what happens. The database won't suffer, and you can always stop the execution of the statement.
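If you want to see how much temporary space the MINUS actually consumes while it runs, you can watch V$TEMPSEG_USAGE from another session; a rough sketch, assuming an 8K block size:
select username, segtype, round(blocks * 8192 / 1024 / 1024) mb_used
from v$tempseg_usage;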
What you can do to improve the performance of the query:
First, if there are columns that you are sure won't change, don't include them.
So it is better to write:
select a, b from t1
minus
select a, b from t2
than using a select * from t (if there are more columns than these two), because there is less work to do.
Second, if the amount of data to compare is really big for your system (too small a temp space), you could try to compare them in chunks:
select a, b from t1 where col between val1 and val2
minus
select a, b from t2 where col between val1 and val2
Of course, another possibility besides MINUS is to have some log columns, let's say UPDATED_DATE: selecting with where updated_date greater than the start of the process will show you the updated records. But this depends on whether you can alter the database model and the ETL code.
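With such a column, the "after" extraction becomes a simple range filter instead of a full MINUS; a minimal sketch, assuming an UPDATED_DATE column exists and :process_start is bound to the time the change process began:
select *
from schema1.testtable
where updated_date >= :process_start;
Note that this catches inserts and updates but not deletes; for deletes you would still need the backup-table compare or some audit mechanism.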
I am facing a weird problem where the same query is returning different results.
My query is:
SELECT * FROM TX_HISTORY WHERE acct = 7 AND ROWNUM <= 100
What is happening is that I know there are more than 100 records for this account in tx_history. I want to get the first 100 records based on the processing date.
My data for this account spans records from 2004 till 2011.
The problem is that sometimes it correctly shows the 100 records starting in 2004, but sometimes it shows me 100 records starting in 2005.
I read that this can be solved by:
SELECT * FROM (select * from TX_HISTORY WHERE acct = 7 ORDER BY acct, processing_date)
where rownum <= 100
So, about my earlier query:
1) My understanding is that the ORDER BY is being applied after the ROWNUM <= 100 filter, and the results returned by Oracle are in a random order on which ROWNUM is filtering.
2) What is still not understood, though, is why the results would vary.
Thanks,
~akila
If you do not specify any ordering (and in this case, as you already found out, you do not order the data being retrieved; you only sort afterwards), it is up to the database to return the rows in any order it sees fit.
It could, for example, just start reading the rows in the order they are stored, which changes as the data gets updated. It also does not have to start from the top of the table; it could start with the blocks already in the buffer cache.
Since you did not specify the order, the DB will choose (what it thinks to be) the least expensive way available to it at this particular moment.
Try this:
select top(100) * from ...........
would give the top 100 rows on SQL Server, but note that TOP is SQL Server syntax and is not valid in Oracle. In Oracle, use the ordered subquery with ROWNUM (as in the other answers), or FETCH FIRST 100 ROWS ONLY on 12c and later.
If you include AND ROWNUM <= 100, Oracle will pull 100 records at will. If you put it in
SELECT *
FROM TX_HISTORY
WHERE acct = 7
AND ROWNUM <= 100
ORDER BY acct,processing_date
the ROWNUM filter is applied to the records as Oracle finds them, before any ordering takes place.
However, if you have
SELECT *
FROM (select *
from TX_HISTORY
WHERE acct = 7
ORDER BY acct,processing_date)
where rownum <= 100
the ROWNUM filter is applied to the records returned by the sub-select (the SELECT within the parentheses), which are already sorted. In other words, Oracle uses a different set of records to apply ROWNUM <= 100 to.
In the first form, the ordering is performed on the records returned by the query, i.e. after the ROWNUM filter in the WHERE clause, so there you will still get varying results.
I hope I could make it clearer.
I am trying to execute a query like
select * from tableName where rownum = 1
This query is basically to fetch the column names of the table. There are more than a million records in the table. When I put the above condition, it takes a long time to fetch the first row. Is there any alternative way to get the first row?
This question has already been answered; I will just provide an explanation of why a ROWNUM = 1 or ROWNUM <= 1 filter can sometimes result in a long response time.
When encountering a ROWNUM filter (on a single table), the optimizer will produce a FULL SCAN with COUNT STOPKEY. This means that Oracle will start to read rows until it encounters the first N rows (here N=1). A full scan reads blocks from the first extent to the high water mark. Oracle has no way to determine which blocks contain rows and which don't beforehand, all blocks will therefore be read until N rows are found. If the first blocks are empty, it could result in many reads.
Consider the following:
SQL> /* rows will take a lot of space because of the CHAR column */
SQL> create table example (id number, fill char(2000));
Table created
SQL> insert into example
2 select rownum, 'x' from all_objects where rownum <= 100000;
100000 rows inserted
SQL> commit;
Commit complete
SQL> delete from example where id <= 99000;
99000 rows deleted
SQL> set timing on
SQL> set autotrace traceonly
SQL> select * from example where rownum = 1;
Elapsed: 00:00:05.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=7 Card=1 Bytes=2015)
1 0 COUNT (STOPKEY)
2 1 TABLE ACCESS (FULL) OF 'EXAMPLE' (TABLE) (Cost=7 Card=1588 [..])
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
33211 consistent gets
25901 physical reads
0 redo size
2237 bytes sent via SQL*Net to client
278 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
As you can see, the number of consistent gets is extremely high (for a single row). This situation can be encountered when, for example, you insert rows with the /*+ APPEND */ hint (thus above the high water mark) and also delete the oldest rows periodically, resulting in a lot of empty space at the beginning of the segment.
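One possible remedy in that situation, assuming the table lives in an ASSM tablespace, is to shrink the segment so the empty blocks below the high water mark are released and the scan stops early again (using the EXAMPLE table from above):
SQL> alter table example enable row movement;
SQL> alter table example shrink space;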
Try this:
select * from tableName where rownum<=1
There are some weird ROWNUM bugs; sometimes changing the query very slightly will fix it. I've seen this happen before, but I can't reproduce it.
Here are some discussions of similar issues: http://jonathanlewis.wordpress.com/2008/03/09/cursor_sharing/ and http://forums.oracle.com/forums/thread.jspa?threadID=946740&tstart=1
Surely Oracle has meta-data tables that you can use to get column names, like the sysibm.syscolumns table in DB2?
And, after a quick web search, that appears to be the case: see ALL_TAB_COLUMNS.
I'd use those rather than go to the actual table, something like (untested):
SELECT COLUMN_NAME
FROM ALL_TAB_COLUMNS
WHERE TABLE_NAME = 'MYTABLE'
ORDER BY COLUMN_NAME;
If you are hell-bent on finding out why your query is slow, you should revert to the standard method: asking your DBMS to explain the execution plan of the query for you. For Oracle, see section 9 of this document.
There's a conversation over at Ask Tom - Oracle that seems to suggest the row numbers are created after the select phase, which may mean the query is retrieving all rows anyway. The explain will probably help establish that. If it contains FULL without COUNT STOPKEY, then that may explain the performance.
Beyond that, my knowledge of Oracle specifics diminishes and you will have to analyse the explain further.
Your query is doing a full table scan and then returning the first row.
Try
SELECT * FROM table WHERE primary_key = primary_key_value;
The first row, particularly as it pertains to ROWNUM, is arbitrarily decided by Oracle. It may not be the same from query to query, unless you provide an ORDER BY clause.
So, picking a primary key value to filter by is as good a method as any to get a single row.
I think you're slightly missing the concept of ROWNUM - according to Oracle docs: "ROWNUM is a pseudo-column that returns a row's position in a result set. ROWNUM is evaluated AFTER records are selected from the database and BEFORE the execution of ORDER BY clause."
So it returns ANY row that it considers #1 in the result set, which in your case will contain 1M rows.
You may want to check out a ROWID pseudo-column: http://psoug.org/reference/pseudocols.html
I've recently had the same problem you're describing: I wanted one row from a very large table as a quick, dirty, simple introspection, and "where rownum = 1" alone behaved very poorly. Below is a remedy which worked for me.
Select the max() of the leading column of some index, and then use it together with "rownum = 1" to restrict the scan to a small fraction of all rows. Suppose my table has an index on a numerical "group id" column; compare this:
select * from my_table where rownum = 1;
-- Elapsed: 00:00:23.69
with this:
select * from my_table where rownum = 1
and group_id = (select max(group_id) from my_table);
-- Elapsed: 00:00:00.01
I have to use Hibernate to retrieve data from Oracle, but the problem is that the number of parameters passed to the query is not always the same.
For the sake of simplicity let's consider the following query:
select COL_1, COL_2, ..., COL_N from TAB_1 where COL_1 in (?, ?, ... ?)
The number of parameters passed to the IN clause is between 1 and 500. If the number is around 1-50 it works quite fast, but for 200 it takes a few seconds to execute the query (parsing, creating the explain plan, executing the query). Indexes are created and used; that was checked.
The query is created dynamically, so I use the Hibernate Criteria API. The first query (with > 100 parameters) takes 3-5 seconds, but the next ones run faster (even if the number of parameters varies). I would like to improve the response time of that first query. What can I do, assuming that Hibernate is a must?
I thought about removing this dynamic query and creating a few static queries as named queries in the XML file (in that case those queries would be precompiled at the beginning). For example:
1) one query if the number of parameters is less than 50.
In this case, if we have 30 parameters, the query would look like:
select COL_1, COL_2, ..., COL_N from TAB_1 where COL_1 in (PAR_1, PAR_2, ..., PAR_30, -1, -1, ..., -1)
2) a second one if the number is between 50 and 100, etc.
The problem is that it's not so simple using named queries and HQL (in JDBC it would be straightforward). In HQL we pass only a list; we don't specify the number of parameters in that list. In fact there is only one query:
'from Person where id in (:person_list)'
myQuery.setParameterList("person_list", myList)
Is there any option to solve that?
By the way, I thought that an execution plan is created for each new query, so for example:
(a) select COL_1, COL_2, ..., COL_N from TAB_1 where COL_1 in (?, ?, ..., ?) <100 parameters> - a plan must be created
(b) select COL_1, COL_2, ..., COL_N from TAB_1 where COL_1 in (?, ?, ..., ?) <100 parameters> - a plan won't be created, because it already exists in the cache
(c) select COL_1, COL_2, ..., COL_N from TAB_1 where COL_1 in (?, ?, ..., ?) <120 parameters> - a plan should be created (there is no plan for a query with 120 parameters yet), but it takes less time than (a), almost the same as (b), so presumably Oracle can create the plan faster if a similar query was executed before.
What is the reason for that?
There are a couple of things here. First of all, you cannot bind an IN list, at least I am pretty sure you cannot. I suspect Hibernate is using some sort of trick to put your array contents into a static in-list that Oracle can use.
Secondly, if this query is executed with lots of different parameters, you must use bind variables or the performance of the entire database will suffer.
That said, there is a way to bind an IN list using a 'trick' which Tom Kyte describes on his blog -
http://tkyte.blogspot.com/2006/01/how-can-i.html
The code in there looks like:
with bound_inlist
as
(
  select substr(txt,
                instr(txt, ',', 1, level) + 1,
                instr(txt, ',', 1, level + 1) - instr(txt, ',', 1, level) - 1) as token
  from (select ',' || :txt || ',' txt from dual)
  connect by level <= length(:txt) - length(replace(:txt, ',', '')) + 1
)
select *
from all_users
where user_id in (select * from bound_inlist);

USERNAME                          USER_ID CREATED
------------------------------ ---------- ---------
SYSTEM                                  5 30-JUN-05
OPS$TKYTE                             104 20-JAN-06
The part:
select *
from all_users
where user_id in (select * from bound_inlist);
is basically where your query goes. The bit above it is the trick, which splits the comma-separated string into a list of values. Instead of binding a list into the :txt placeholder, you convert the list to a string and just bind that.
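In SQL*Plus, using it looks like this (the id values are arbitrary examples):
variable txt varchar2(4000)
exec :txt := '5,104'
-- now run the WITH query above unchanged; bound_inlist will yield 5 and 104
From Hibernate you would likewise join your ids into one comma-separated string and bind it as a single ordinary string parameter.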
Are you sure the difference in query times isn't due to caching or load variations on the machine? Parsing the query will take a little time, but several seconds is a long time.
I've worked with IN (...) queries that had up to a thousand ids in that list; I can guarantee you that it does not take several seconds to parse / prepare / cache a statement.
Hibernate does indeed auto-expand the parameter list you specify, using the actual number of elements in the list you pass, so if you really wanted to keep it "fixed" at a certain size, all you would need to do is append enough -1s to the end. However, this is most certainly not the problem, especially since we're talking about speeding up the first query run; no statements have been prepared or cached yet at that point anyway.
Did you look at the execution plans for your queries, both via explain plan and with autotrace enabled? Do they differ when you have 30 elements and 120 elements in your list? Does your actual query really look like the "select ... from table where id in (...)" you've posted, or is it more complex? I'm willing to bet that somewhere between 30 and 120 elements Oracle decides (perhaps mistakenly) that it'll be faster not to use an index, which is why you're seeing the time increase.