I am trying to execute a query like
select * from tableName where rownum=1
This query is basically to fetch the column names of the table. There are more than a million records in the table. When I add the above condition it takes a very long time to fetch the first row. Is there an alternative way to get the first row?
This question has already been answered, I will just provide an explanation as to why sometimes a filter ROWNUM=1 or ROWNUM <= 1 may result in a long response time.
When encountering a ROWNUM filter (on a single table), the optimizer will produce a FULL SCAN with COUNT STOPKEY. This means that Oracle will start to read rows until it encounters the first N rows (here N=1). A full scan reads blocks from the first extent to the high water mark. Oracle has no way to determine which blocks contain rows and which don't beforehand, all blocks will therefore be read until N rows are found. If the first blocks are empty, it could result in many reads.
Consider the following:
SQL> /* rows will take a lot of space because of the CHAR column */
SQL> create table example (id number, fill char(2000));
Table created
SQL> insert into example
2 select rownum, 'x' from all_objects where rownum <= 100000;
100000 rows inserted
SQL> commit;
Commit complete
SQL> delete from example where id <= 99000;
99000 rows deleted
SQL> set timing on
SQL> set autotrace traceonly
SQL> select * from example where rownum = 1;
Elapsed: 00:00:05.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=7 Card=1 Bytes=2015)
1 0 COUNT (STOPKEY)
2 1 TABLE ACCESS (FULL) OF 'EXAMPLE' (TABLE) (Cost=7 Card=1588 [..])
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
33211 consistent gets
25901 physical reads
0 redo size
2237 bytes sent via SQL*Net to client
278 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
As you can see the number of consistent gets is extremely high (for a single row). This situation could be encountered in cases where, for example, you insert rows with the /*+ APPEND */ hint (thus above the high water mark) and you also delete the oldest rows periodically, resulting in a lot of empty space at the beginning of the segment.
Try this:
select * from tableName where rownum<=1
There are some weird ROWNUM bugs; sometimes changing the query very slightly will fix it. I've seen this happen before, but I can't reproduce it.
Here are some discussions of similar issues: http://jonathanlewis.wordpress.com/2008/03/09/cursor_sharing/ and http://forums.oracle.com/forums/thread.jspa?threadID=946740&tstart=1
Surely Oracle has meta-data tables that you can use to get column names, like the sysibm.syscolumns table in DB2?
And, after a quick web search, that appears to be the case: see ALL_TAB_COLUMNS.
I'd use those rather than go to the actual table, something like (untested):
SELECT COLUMN_NAME
FROM ALL_TAB_COLUMNS
WHERE TABLE_NAME = 'MYTABLE'
ORDER BY COLUMN_NAME;
If you are hell-bent on finding out why your query is slow, you should revert to the standard method: asking your DBMS to explain the execution plan of the query for you. For Oracle, see section 9 of this document.
There's a conversation over at Ask Tom - Oracle that seems to suggest the row numbers are created after the select phase, which may mean the query is retrieving all rows anyway. The explain will probably help establish that. If it contains FULL without COUNT STOPKEY, then that may explain the performance.
Beyond that, my knowledge of Oracle specifics diminishes and you will have to analyse the explain further.
Your query is doing a full table scan and then returning the first row.
Try
SELECT * FROM table WHERE primary_key = primary_key_value;
The first row, particularly as it pertains to ROWNUM, is arbitrarily decided by Oracle. It may not be the same from query to query, unless you provide an ORDER BY clause.
So, picking a primary key value to filter by is as good a method as any to get a single row.
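For example, a minimal sketch of fetching such a value cheaply, assuming the table has an indexed numeric primary key named id (my assumption, not something stated in the question):

select *
  from tableName t
 where t.id = (select min(id) from tableName);

The inner min(id) can be satisfied by an INDEX FULL SCAN (MIN/MAX), and the outer lookup is an index access, so Oracle never has to crawl through empty blocks the way the ROWNUM full scan does.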
I think you're slightly missing the concept of ROWNUM - according to Oracle docs: "ROWNUM is a pseudo-column that returns a row's position in a result set. ROWNUM is evaluated AFTER records are selected from the database and BEFORE the execution of ORDER BY clause."
So it returns ANY row that it considers #1 in the result set, which in your case will contain 1M rows.
You may want to check out a ROWID pseudo-column: http://psoug.org/reference/pseudocols.html
I've recently had the same problem you're describing: I want one row from the very large table as a quick, dirty, simple introspection, and "where rownum=1" alone behaves very poorly. Below is a remedy which worked for me.
Select the max() of the leading column of some index, and then use it to restrict the query to a small fraction of all rows along with "rownum=1". Suppose my table has an index on a numeric "group_id" column; compare this:
select * from my_table where rownum = 1;
-- Elapsed: 00:00:23.69
with this:
select * from my_table where rownum = 1
and group_id = (select max(group_id) from my_table);
-- Elapsed: 00:00:00.01
Related
We have a database with a vast number of tables and columns that was set up by a 3rd party.
Many of these columns are entirely unused. I am trying to create a query that returns a list of all the columns that are actually used (contain > 0 values).
My current attempt -
SELECT table_name, column_name
FROM ALL_TAB_COLUMNS
WHERE OWNER = 'XUSER'
AND num_nulls < 1
;
Using num_nulls < 1 dramatically reduces the number of returned values, as expected.
However, on inspection of some of the tables, there are columns missing from the results of the query that appear to have values in them.
Could anybody explain why this might be the case?
First of all, statistics are not always 100% accurate. They can be gathered on a subset of the table rows, since they are, after all, statistics. Just like pollsters do not have to ask every American how they feel about a given politician, Oracle can get an accurate-enough sense of the data in a table by reading only a portion of it.
Even if the statistics were gathered on 100% of the rows in a table (and they can be gathered that way, if you want), the statistics will become outdated as soon as there are any inserts, updates, or deletes on the table.
Second of all, num_nulls < 1 doesn't find every column that contains data; it only finds the columns with no nulls at all. Imagine a table with 100 rows and "column X" having num_nulls equal to 80. That column has 20 non-null values, but it would NOT pass your filter. A better approach (if you trust that your statistics are not stale and were based on a 100% sample of the rows) might be to compare DBA_TAB_COLUMNS.NUM_NULLS < DBA_TABLES.NUM_ROWS. For example, a column that has 99 nulls in a 100-row table still has data in 1 row.
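A hedged sketch of that comparison, reusing the XUSER owner from the question and assuming the statistics are reasonably fresh:

select c.table_name, c.column_name
  from dba_tab_columns c
  join dba_tables t
    on t.owner = c.owner
   and t.table_name = c.table_name
 where c.owner = 'XUSER'
   and c.num_nulls < t.num_rows;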
"there are columns missing from the results of the query that appear to have values in them."
Potentially every non-mandatory column could appear in this set, because it is likely that some rows will have values but not all rows. "Some rows" being greater than zero means such columns won't pass your test for num_nulls < 1.
So maybe you should search for columns which aren't in use. This query will find columns where every row is null:
select t.table_name
, tc.column_name
from user_tables t
join user_tab_cols tc on t.table_name = tc.table_name
where t.num_rows > 0
and t.num_rows = tc.num_nulls;
Note that if you are using Partitioning you will need to scan user_tab_partitions.num_rows and user_part_col_statistics.num_nulls.
Also, I second the advice others have given regarding statistics. The above query may produce some false positives. I would treat its results as a list of candidates to be investigated further; for instance, you could generate queries which count the actual number of nulls for each column.
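A minimal sketch of that verification step (assuming you run it as the owning user with serveroutput on; the dynamic count confirms or rejects each candidate):

declare
  l_count number;
begin
  for r in (select t.table_name, tc.column_name
              from user_tables t
              join user_tab_cols tc on t.table_name = tc.table_name
             where t.num_rows > 0
               and t.num_rows = tc.num_nulls)
  loop
    execute immediate 'select count(*) from "' || r.table_name ||
                      '" where "' || r.column_name || '" is not null'
      into l_count;
    if l_count > 0 then
      dbms_output.put_line(r.table_name || '.' || r.column_name || ' actually contains data');
    end if;
  end loop;
end;
/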
I want a query that selects the number of rows in each table
but the statistics are NOT kept up to date, so a query like this will not be accurate:
select table_name, num_rows from user_tables
I want to check several schemas, and each schema has at least 500 tables, some of which contain a lot of columns. It would take me days to update the statistics for all of them.
On the Ask Tom site he suggests a function that includes this query:
execute immediate 'select count(*) from ' || p_tname into l_columnValue;
Such a query with count(*) is really slow and will not give me fast results.
Is there a query that can give me how many rows are in table in a fast way ?
You said in a comment that you want to delete (drop?) empty tables. If you don't want an exact count but only want to know if a table is empty you can do a shortcut count:
select count(*) from table_name where rownum < 2;
The optimiser will stop when it reaches the first row - the execution plan shows a 'count stopkey' operation - so it will be fast. It will return zero for an empty table, and one for a table with any data - you have no idea how much data, but you don't seem to care.
You still have a slight race condition between the count and the drop, of course.
This seems like a very odd thing to want to do: either your application uses the table, in which case dropping it will break something even if it's empty; or it doesn't, in which case it shouldn't matter whether it has any (presumably redundant) data and it can be dropped regardless. If you think there might be confusion, that sounds like your source (including DDL) control needs some work, maybe?
To check if either of the tables in the two schemas has a row, just count from both of them, either with a union:
select max(c) from (
select count(*) as c from schema1.table_name where rownum < 2
union all
select count(*) as c from schema2.table_name where rownum < 2
);
... or with greatest and two sub-selects, e.g.:
select greatest(
(select count(*) from schema1.table_name where rownum < 2),
(select count(*) from schema2.table_name where rownum < 2)
) from dual;
Either would return one if either table has any rows, and would only return zero if they were both empty.
Full Disclosure: I had originally suggested a query that specifically counts a column that's (a) indexed and (b) not null. @AlexPoole and @JustinCave pointed out (please see their comments below) that Oracle will optimize a COUNT(*) to do this anyway. As such, this answer has been altered significantly.
There's a good explanation here for why User_Tables shouldn't be used for accurate row counts, even when statistics are up to date.
If your tables have indexes which can be used to speed up the count by doing an index scan rather than a table scan, Oracle will use them. This will make the counts faster, though not by any means instantaneous. That said, this is the only way I know to get an accurate count.
To check for empty (zero row) tables, please use the answer posted by Alex Poole.
You could make a table to hold the counts of each table. Then, set a trigger to run on INSERT for each of the tables you're counting that updates the main table.
You'd also need to include a trigger for DELETE.
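A minimal sketch of that idea, assuming a counts table and a tracked table called MY_TABLE (the names are illustrative, and a production version would need to consider concurrent sessions serialising on the counter row):

create table table_row_counts (
  table_name varchar2(30) primary key,
  row_count  number not null
);

-- seed the counter once
insert into table_row_counts
select 'MY_TABLE', count(*) from my_table;

create or replace trigger my_table_count_trg
after insert or delete on my_table
for each row
begin
  if inserting then
    update table_row_counts set row_count = row_count + 1 where table_name = 'MY_TABLE';
  else
    update table_row_counts set row_count = row_count - 1 where table_name = 'MY_TABLE';
  end if;
end;
/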
We are running into performance issue where I need some suggestions ( we are on Oracle 10g R2)
The situation is something like this:
1) It is a legacy system.
2) Some of the tables hold data for the last 10 years (meaning data was never deleted since the first version was rolled out). Most of the OLTP tables now have around 30,000,000 - 40,000,000 rows.
3) Search operations on these tables take a flat 5-6 minutes (a simple query like select count(0) from xxxxx where isActive='Y' takes around 6 minutes). When we looked at the explain plan we found that an index scan is happening on the isActive column.
4) We have suggested archiving and purging the old data which is not needed, and the team is working towards it. Even if we delete 5 years of data we are left with around 15,000,000 - 20,000,000 rows per table, which is itself very large, so we thought of partitioning these tables. But we found that the user can search on most of the columns of these tables from the UI, which would defeat the very purpose of partitioning.
So what steps need to be taken to improve this situation?
First of all: question why you are issuing the query select count(0) from xxxxx where isactive = 'Y' in the first place. Nine out of ten times it is a lazy way to check for the existence of a record. If that's the case with you, just replace it with a query that selects one row (rownum = 1 and a first_rows hint).
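A hedged sketch of such an existence check, reusing the table and column names from the question:

select /*+ first_rows(1) */ 1
  from xxxxx
 where isactive = 'Y'
   and rownum = 1;

It returns one row if any active record exists and no rows otherwise, and the stopkey lets Oracle quit at the first match instead of counting all of them.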
The numbers of rows you mention are nothing to be worried about. If your application doesn't perform well as the number of rows grows, then your system is not designed to scale. I'd investigate all queries that take too long using SQL Trace or ASH and fix them.
By the way: nothing you mentioned justifies the term legacy, IMHO.
Regards,
Rob.
Just a few observations:
I'm guessing that the "isActive" column can have two values - 'Y' and 'N' (or perhaps 'Y', 'N', and NULL - although why in the name of Fred there wouldn't be a NOT NULL constraint on such a column escapes me). If this is the case an index on this column would have very poor selectivity and you might be better off without it. Try dropping the index and re-running your query.
@RobVanWijk's comment about the use of SELECT COUNT(*) is excellent. ONLY ask for a row count if you really need to have the count; if you don't need the count, I've found it's faster to do a direct probe (SELECT whatever FROM wherever WHERE somefield = somevalue) with an appropriate exception handler than it is to do a SELECT COUNT(*). In the case you cited, I think it would be better to do something like
DECLARE
  strIsActive           MY_TABLE.IS_ACTIVE%TYPE;
  bActive_records_found BOOLEAN;
BEGIN
  SELECT IS_ACTIVE
    INTO strIsActive
    FROM MY_TABLE
   WHERE IS_ACTIVE = 'Y';
  bActive_records_found := TRUE;
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    bActive_records_found := FALSE;
  WHEN TOO_MANY_ROWS THEN
    bActive_records_found := TRUE;
END;
As to partitioning - partitioning can be effective at reducing query times IF the field on which the table is partitioned is used in all queries. For example, if a table is partitioned on the TRANSACTION_DATE variable, then for the partitioning to make a difference all queries against this table would have to have a TRANSACTION_DATE test in the WHERE clause. Otherwise the database will have to search each partition to satisfy the query, so I doubt any improvements would be noted.
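To make that concrete, here is a minimal sketch (table and partition names are illustrative, not from the question) of a range-partitioned table and a query that can benefit from partition pruning:

create table transactions (
  txn_id           number  not null,
  transaction_date date    not null,
  is_active        char(1) not null
)
partition by range (transaction_date) (
  partition p2008 values less than (date '2009-01-01'),
  partition p2009 values less than (date '2010-01-01'),
  partition pmax  values less than (maxvalue)
);

-- pruning only happens because TRANSACTION_DATE, the partition key, is in the WHERE clause
select count(*)
  from transactions
 where transaction_date >= date '2009-01-01'
   and transaction_date <  date '2010-01-01'
   and is_active = 'Y';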
Share and enjoy.
I have a table with a varchar2 column as its primary key.
It has about 1,000,000 transactions per day.
My app wakes up every 5 minutes to generate a text file by querying only the new records.
It will remember the last point it reached and process only new records.
Do you have any idea how to query this with good performance?
I am able to add a new column if necessary.
What do you think this process should be written in?
plsql?
java?
Everyone here is really really close. However:
Scott Bailey's wrong about using a bitmap index if the table's under any sort of continuous DML load. That's exactly the wrong time to use a bitmap index.
Everyone else's answer about the PROCESSED CHAR(1) CHECK IN ('Y','N') column is right, but they're missing how to index it; you should use a function-based index like this:
CREATE INDEX MY_UNPROCESSED_ROWS_IDX ON MY_TABLE
(CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END);
You'd then query it using the same expression:
SELECT * FROM MY_TABLE
WHERE (CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END) = 'N';
The reason to use the function-based index is that Oracle doesn't write index entries for entirely NULL values being indexed, so the function-based index above will only contain the rows with PROCESSED_FLAG = 'N'. As you update your rows to PROCESSED_FLAG = 'Y', they'll "fall out" of the index.
Well, if you can add a new column, you could create a Processed column, which will indicate processed records, and create an index on this column for performance.
Then the query should only be for those rows that have been newly added, and not processed.
This should be easily done using sql queries.
Ah, I really hate to add another answer when the others have come so close to nailing it. But
As Ponies points out, Oracle does have a hidden column (ORA_ROWSCN - System Change Number) that can pinpoint when each row was modified. Unfortunately, the default is that it gets the information from the block instead of storing it with each row, and changing that behavior will require you to rebuild a really large table. So while this answer is good for quieting the SQL Server fella, I'd not recommend it.
Astander is right there, but needs a few caveats. Add a new column needs_processed CHAR(1) DEFAULT 'Y' and add a BITMAP index. For low-cardinality columns ('Y'/'N') the bitmap index will be faster. Once you have that, the rest is pretty easy. But you've got to be careful not to select the new rows, process them, and mark them as processed in one step. Otherwise, rows could be inserted while you are processing that will get marked processed even though they have not been.
The easiest way would be to use pl/sql to open a cursor that selects unprocessed rows, processes them and then updates the row as processed. If you have an aversion to walking cursors, you could collect the pk's or rowids into a nested table, process them and then update using the nested table.
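A minimal sketch of the rowid/nested-table variant (assuming the needs_processed column suggested above and an illustrative MY_TABLE name):

declare
  type t_rowid_tab is table of rowid;
  l_rowids t_rowid_tab;
begin
  select rowid
    bulk collect into l_rowids
    from my_table
   where needs_processed = 'Y';

  -- ... process the collected rows here ...

  forall i in 1 .. l_rowids.count
    update my_table
       set needs_processed = 'N'
     where rowid = l_rowids(i);

  commit;
end;
/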
In MS SQL Server world where I work, we have a 'version' column of type 'timestamp' on our tables.
So, to answer #1, I would add a new column.
To answer #2, I would do it in plsql for performance.
Mark
"astander" pretty much did the work for you. You need to ALTER your table to add one more column (lets say PROCESSED)..
You can also consider creating an INDEX on the PROCESSED column (a bitmap index may be of some advantage, as the only possible values are 'Y' and 'N', but test it out) so that your query will use the INDEX.
Also, if you are sure you only query every 5 minutes, check whether you can add another column of TIMESTAMP type and partition the table by it (not sure, check it out).
I would also think about writing a job or something that writes the data out using UTL_FILE and shows it on the front end, if that is an option.
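For the UTL_FILE part, a minimal sketch (assuming a directory object called EXPORT_DIR already exists, which is an illustrative name, and the PROCESSED column suggested above):

declare
  l_file utl_file.file_type;
begin
  l_file := utl_file.fopen('EXPORT_DIR', 'new_rows.txt', 'w');
  for r in (select id from my_table where processed = 'N') loop
    utl_file.put_line(l_file, r.id);
  end loop;
  utl_file.fclose(l_file);
end;
/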
If performance is really a problem and you want to create your file asynchronously, you might want to use Oracle Streams, which will actually get modification data from your redo log withou affecting performance of the main database. You may not even need a separate job, as you can configure Oracle Streams to do Asynchronous replication of the changes, through which you can trigger the file creation.
Why not create an extra table that holds two columns: the ID column and a processed-flag column. Have an insert trigger on the original table place its ID in this new table. Your logging process can then select records from this new table and mark them as processed. Finally, delete the processed records from this table.
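A minimal sketch of that queue-table approach (names are illustrative; the original table is assumed to have an ID primary key column):

create table new_rows_queue (
  id        varchar2(100) not null,
  processed char(1) default 'N' not null
);

create or replace trigger my_table_queue_trg
after insert on my_table
for each row
begin
  insert into new_rows_queue (id) values (:new.id);
end;
/

The logging process then selects from new_rows_queue where processed = 'N', marks those rows 'Y' once the file is written, and periodically deletes the processed rows.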
I'm pretty much in agreement with Adam's answer. But I'd want to do some serious testing compared to an alternative.
The issue I see is that you need to not only select the rows, but also do an update of those rows. While that should be pretty fast, I'd like to avoid the update. And avoid having any large transactions hanging around (see below).
The alternative would be to add CREATE_DATE date default sysdate. Index that. And then select records where create_date >= (start date/time of your previous select).
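In sketch form (column and index names are my own, and the NOT NULL default assumes the existing rows can be backfilled):

alter table my_table add (create_date date default sysdate not null);
create index my_table_create_date_idx on my_table (create_date);

-- each run then only reads rows added since the previous run
select * from my_table where create_date >= :last_run_time;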
But I don't have enough data on the relative costs of setting sysdate as a default vs. setting a value of 'Y', of maintaining the function-based index vs. the date index, and of doing a range select on the date vs. an equality select on the single value 'Y'. You'll probably want to preserve stats or hint the query to use the index on the Y/N column, and definitely want to use a hint on the date column -- the stats on the date column will almost certainly be old.
If data are also being added to the table continuously, including during the period when your query is running, you need to watch out for transaction control. After all, you don't want to read 100,000 records that have the flag = Y, then do your update on 120,000, including the 20,000 that arrived while your query was running.
In the flag case, there are two easy ways: SET TRANSACTION before your select and commit after your update, or start by doing an update from Y to Q, then do your select for those that are Q, and then update to N. Oracle's read consistency is wonderful but needs to be handled with care.
For the date column version, if you don't mind a risk of processing a few rows more than once, just update your table that has the last processed date/time immediately before you do your select.
If there's not much information in the table, consider making it Index Organized.
What about using Materialized view logs? You have a lot of options to play with:
SQL> create table test (id_test number primary key, dummy varchar2(1000));
Table created
SQL> create materialized view log on test;
Materialized view log created
SQL> insert into test values (1, 'hello');
1 row inserted
SQL> insert into test values (2, 'bye');
1 row inserted
SQL> select * from mlog$_test;
ID_TEST SNAPTIME$$ DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------------
1 01/01/4000 I N FE
2 01/01/4000 I N FE
SQL> delete from mlog$_test where id_test in (1,2);
2 rows deleted
SQL> insert into test values (3, 'hello');
1 row inserted
SQL> insert into test values (4, 'bye');
1 row inserted
SQL> select * from mlog$_test;
ID_TEST SNAPTIME$$ DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------
3 01/01/4000 I N FE
4 01/01/4000 I N FE
I think this solution should work.
You need to do the following steps.
For the first run, you will have to copy all records. In the first run you need to execute the following query:
insert into new_table (max_rowid) select max(rowid) from yourtable;
The next time you want to get only the newly inserted records, you can do it by executing the following command:
Select * from yourtable where rowid > (select max_rowid from new_table);
Once you are done processing the rows from the above query, simply truncate new_table and insert max(rowid) from yourtable again.
I think this should work and would be the fastest solution.
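For clarity, the reset step at the end of each run would look something like this (using the new_table and yourtable names from above):

truncate table new_table;
insert into new_table (max_rowid) select max(rowid) from yourtable;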
I have a partitioned table like so:
create table demo (
ID NUMBER(22) not null,
TS TIMESTAMP not null,
KEY VARCHAR2(5) not null,
...lots more columns...
)
The partition is on the TS column (one partition per year).
Since I search a lot via the timestamp, I created a combined index:
create index demo.x1 on demo (ts, key);
The query looks like this:
select *
from demo t
where t.TS = to_timestamp('2009-06-30 07:47:57', 'YYYY-MM-DD HH24:MI:SS')
I also tried to add and t.KEY = '00101' but that doesn't help.
But EXPLAIN PLAN shows a TABLE ACCESS FULL:
# Operation Options Object Mode Cost Bytes Cardinality
0 SELECT STATEMENT ALL_ROWS 583804 287145 2127
1 PARTITION RANGE ALL 583804 287145 2127
2 TABLE ACCESS FULL HEADER ANALYZED 583804 287145 2127
No mention of the index. What could be wrong?
[EDIT] For some reason, Oracle completely miscalculated the cost for the operation. I have 112 million rows in that table. The cost for a full scan of a single partition should be 20 million, not 600'000. That's why it even ignores optimizer hints.
[EDIT2] During my tests, I ran over this puzzling result. When I run this select:
select tx_ts
from kt.header
where tx_ts = to_timestamp('2009-06-30 07:47:57', 'YYYY-MM-DD HH24:MI:SS')
I get this EXPLAIN PLAN:
0 SELECT STATEMENT ALL_ROWS 152 15616 1952
1 PARTITION RANGE ALL 152 15616 1952
2 INDEX FAST FULL SCAN HEADERX2 ANALYZED 152 15616 1952
So when I restrict myself to the indexed column as the result of the select, Oracle decides to use the index. When I want to get all columns, I have to wait for a full table scan. What's going on here?
[EDIT3] Found it; see my answer below.
Okay, it was a mistake on my part: The column had the type DATE, not TIMESTAMP. Since I used to_timestamp(), Oracle saw no way to use the index.
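For the record, a sketch of the corrected query (comparing the DATE column against a DATE, so no implicit conversion blocks the index):

select *
  from demo t
 where t.ts = to_date('2009-06-30 07:47:57', 'YYYY-MM-DD HH24:MI:SS');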
I'm not really an expert on partitioning, but I think what has happened here is that you have created a global index -- a single index that covers rows in all of the partitions. Therefore, the optimizer has to choose between two mutually exclusive access paths: (A) an index range scan, or (B) partition pruning. I believe the PARTITION RANGE operation indicates that it has chosen B.
Updating statistics, as others have suggested, may change the behavior. When you drop and recreate the index, you discard any statistics that existed for the index.
Creating the index as UNIQUE, if the timestamp and key uniquely identify a row, would be a good idea and might change the behavior as well.
However, I think the real "fix" is that you should instead create local indexes -- a separate index on each partition. This should enable the optimizer to do partition pruning followed by an index lookup. Honestly, I'm not sure what the exact syntax is to do this. Maybe you just create the index on each partition explicitly using the individual partition names.
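If local indexes are indeed the answer, the DDL would look something like this sketch (syntax from memory, so double-check it against the docs):

drop index demo.x1;
create index demo.x1 on demo (ts, key) local;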
If everything else fails, you might try an optimizer hint:
select /*+ index(t x1) */ *
from demo t
where t.TS = to_timestamp('2009-06-30 07:47:57', 'YYYY-MM-DD HH24:MI:SS')
Are your stats up to date? Invalid stats may mean that oracle believes a full table scan is faster than using the index. Are you using any hints in your query that might be telling oracle to do a full scan?
Can you supply us with the full query and explain plan results?
Edit: Aaron, you can update the stats using "dbms_stats.gather_schema_stats" or "dbms_stats.gather_table_stats" commands. See here for more information on the commands. This will update all the relevant stats for the schema or table specified. Oracle's Cost Based Optimiser will use the statistics to determine which execution plan to choose. It never uses the actual table sizes. You'll need to re-update your stats when the size of your table changes significantly ( +/- 10% or so)
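For example (a minimal sketch, assuming the DEMO table from the question and the current schema):

begin
  dbms_stats.gather_table_stats(ownname => user, tabname => 'DEMO');
  -- or, for every table in the schema:
  -- dbms_stats.gather_schema_stats(ownname => user);
end;
/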
Another thing. When you use a compound index, you need to specify all the columns used in the index in your query for the optimizer to consider the index (and I think you need to specify them in the same order as well, though I could be wrong about that, it's been a while since I looked at this stuff)
There may just be a typo in your transcription of the "CREATE INDEX..." statement that you posted, but are you sure you actually have created the index?
To give us some first-pass idea of the statistics, use these queries:
select table_name, num_rows
from user_tables
where table_name = 'DEMO';
select table_name, num_rows
from user_tab_partitions
where table_name = 'DEMO';
select index_name, num_rows from user_indexes
where table_name in
(select table_name
from user_tables where table_name = 'DEMO');
Also, exactly how are you generating the EXPLAIN PLAN? Do you have access to the database host to retrieve a trace file if you enable tracing?
[edit]
As I commented, it would be good to see the trace of an actual execution of the query. Since you've indicated you have access to the db host filesystems, run a SQL script that (in the same session) issues the following:
alter session set sql_trace=true;
select /* THIS IS THE TRACE */
*
from demo t
where t.TS = to_timestamp('2009-06-30 07:47:57', 'YYYY-MM-DD HH24:MI:SS');
exit
After the script exits, find out where the trace file directory is with this query:
select value from v$parameter where name = 'user_dump_dest';
Use whatever searching tool is available to you to find the file that includes the string "THIS IS THE TRACE".
Process/profile the trace file by issuing the OS command tkprof traceFileName.trc tkprof.out.
Examine this file - you'll see some overhead information, but there will be a section that details the actual execution plan and statistics for the query. If you see the same results in this information then the next step is to add another statement (after the first "alter session") that will dump information on why the Oracle CBO is ignoring the index:
ALTER SESSION SET EVENTS='10053 trace name context forever, level 1';