Does Oracle hard parse whenever the text changes?

My understanding of hard parsing in Oracle is that when a SQL statement is processed and no matching text is found, it is hard parsed. I am working on an assignment where
a service fires multiple SQL statements like the ones below (around 50 variations, each executed multiple times):
select * from emp where emp_no in (:V1,:V2);
select * from emp where emp_no in (:V1,:V2,:V3);
select * from emp where emp_no in (:V1,:V2,:V3,:V4);
This in turn generates multiple SQL_IDs with the same plan hash value (PHV). My question is: does each of these statements get hard parsed? I ask because:
The AWR report shows that only 0.10 seconds of the total SQL elapsed time is hard parse time.
V$SQL/V$SQLSTATS shows that only a fraction of the elapsed time is hard parse.

Yes.
When you submit a statement to the database, it hashes the text to produce a SQL_ID. Each distinct text therefore gets its own cursor, so each variation must have been hard parsed at least once.
You can verify this by querying v$sql. This will have an entry for each version. You can also see the number of hard parses by checking the parse count (hard) stat:
var v1 number;
var v2 number;
var v3 number;
alter system flush shared_pool;
select sql_id, executions
from v$sql
where sql_text like 'select * from hr.employees%';
no rows selected
select n.display_name, s.value
from v$mystat s
join v$statname n
on s.statistic# = n.statistic#
where n.name like 'parse count%';
DISPLAY_NAME VALUE
parse count (total) 1682
parse count (hard) 449
parse count (failures) 4
parse count (describe) 0
select * from hr.employees
where employee_id in ( :v1 );
select * from hr.employees
where employee_id in ( :v1, :v2 );
select * from hr.employees
where employee_id in ( :v1, :v2, :v3 );
select sql_id, executions
from v$sql
where sql_text like 'select * from hr.employees%';
SQL_ID EXECUTIONS
63dqkasu1w4du 1
4kr9jqam2p6dw 1
8gbwr8cry9d84 1
select n.display_name, s.value
from v$mystat s
join v$statname n
on s.statistic# = n.statistic#
where n.name like 'parse count%';
DISPLAY_NAME VALUE
parse count (total) 1697
parse count (hard) 457
parse count (failures) 4
parse count (describe) 0
So what happens if we run the statements again?
Let's see:
select * from hr.employees
where employee_id in ( :v1 );
select * from hr.employees
where employee_id in ( :v1, :v2 );
select * from hr.employees
where employee_id in ( :v1, :v2, :v3 );
select sql_id, executions
from v$sql
where sql_text like 'select * from hr.employees%';
SQL_ID EXECUTIONS
63dqkasu1w4du 2
4kr9jqam2p6dw 2
8gbwr8cry9d84 2
select n.display_name, s.value
from v$mystat s
join v$statname n
on s.statistic# = n.statistic#
where n.name like 'parse count%';
DISPLAY_NAME VALUE
parse count (total) 1707
parse count (hard) 457
parse count (failures) 4
parse count (describe) 0
So:
The execution count for each SQL_id increased by 1
The count of hard parses stayed the same (457)
=> no new hard parses!
This goes part of the way to explaining why you are seeing such small values for hard parsing in AWR etc. Hopefully you're parsing each variation once, then executing it many, many times.
Also, while relatively expensive and something to avoid, hard parsing is still fast in absolute terms, particularly for simple statements such as the above.
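To check this for the statements in the question, one option is to look at the per-cursor counters in v$sql. This is only a sketch: it assumes the statement text starts exactly as written in the question, so adjust the LIKE pattern to whatever the service really sends.

```sql
-- One row per child cursor; each distinct SQL text gets its own SQL_ID.
select sql_id,
       plan_hash_value,
       loads,        -- times the cursor was loaded or reloaded into the shared pool
       parse_calls,  -- all parse calls against this cursor
       executions
from   v$sql
where  sql_text like 'select * from emp where emp_no in (%'
order  by sql_id;
```

If LOADS stays low while EXECUTIONS keeps climbing, each variation is being parsed once and then reused, which would match the small hard parse time reported by AWR.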

Related

Oracle counting rows in a PARTITION

I have the following setup, which includes partitions. Is there a query I can use that will provide a row count for each partition in the table?
I would prefer an actual count(*) over a possible estimate obtained by gathering statistics. Note that the partitions can be renamed!
Below is my test case. Thanks to all who answer.
ALTER SESSION SET NLS_DATE_FORMAT = 'MMDDYYYY HH24:MI:SS';
CREATE TABLE dts (
dt DATE
)
PARTITION BY RANGE (dt)
INTERVAL (NUMTODSINTERVAL(7,'DAY'))
(
PARTITION OLD_DATA values LESS THAN (TO_DATE('2022-01-01','YYYY-MM-DD'))
);
INSERT into dts(dt)
select to_date (
'01-08-2022','mm-dd-yyyy' ) +
( level / 24 ) dt
from dual
connect by level <= ( 24 + ( 24 *
(to_date('01-15-2022' ,'mm-dd-yyyy') - to_date('01-08-2022','mm-dd-yyyy') )
)
) ;
SELECT table_name,
partition_name,
num_rows
FROM user_tab_partitions
WHERE table_name not like 'BIN$%'
ORDER BY table_name, partition_name;
TABLE_NAME PARTITION_NAME NUM_ROWS
DTS OLD_DATA -
DTS SYS_P415755 -
DTS SYS_P415756 -
Try this one:
declare
c integer;
begin
for aPart in (select partition_name FROM user_tab_partitions where table_name = 'DTS') loop
execute immediate 'select count(*) from DTS PARTITION ('||aPart.partition_name||')' INTO c;
DBMS_OUTPUT.PUT_LINE(aPart.partition_name || ' ' || c || ' rows');
end loop;
end;
select table_name ,Partition_name, to_number(extractvalue(xmltype(dbms_xmlgen.getxml('select /*+ parallel(a,8) */
count(*) c from '||table_name||' partition ('||partition_name||') a ')),'/ROWSET/ROW/C')) as count
from user_tab_partitions
TABLE_NAME PARTITION_NAME COUNT
DTS OLD_DATA 0
DTS SYS_P415799 167
DTS SYS_P415800 25
Oracle provides a handy function, DBMS_MVIEW.PMARKER, exactly for this purpose:
SELECT DBMS_MVIEW.PMARKER(p.rowid) PMARKER, count(*) cnt, min(dt), max(dt)
from dts p
group by DBMS_MVIEW.PMARKER(p.rowid)
order by 1;
PMARKER CNT MIN(DT) MAX(DT)
---------- ---------- ------------------- -------------------
74312 167 08.01.2022 01:00:00 14.01.2022 23:00:00
74313 25 15.01.2022 00:00:00 16.01.2022 00:00:00
Note that you need not know the partition name: a partition key value lets you address the partition using the partition-extended name syntax.
Example for the first partition:
select count(*) from dts partition for (DATE'2022-01-08');
COUNT(*)
----------
167
You can rely on optimizer statistics for a perfect count, as long as you're using the default sample size and algorithm.
begin
dbms_stats.gather_table_stats
(
ownname => user,
tabname => 'DTS',
estimate_percent => dbms_stats.auto_sample_size
);
end;
/
If you run the above PL/SQL block, your original query against USER_TAB_PARTITIONS will return the correct NUM_ROWS. Since version 11g, Oracle scans the entire table to calculate statistics. While it uses an approximation for things like the number of distinct values and histograms, it's trivial for the algorithm to get a completely accurate row count.
The manual is not super clear about this behavior, but you can put it together from the manual and other articles that discuss how the new algorithm works. From the "Gathering Optimizer Statistics" chapter of the "SQL Tuning Guide":
To maximize performance gains while achieving necessary statistical
accuracy, Oracle recommends that the ESTIMATE_PERCENT parameter use
the default setting of DBMS_STATS.AUTO_SAMPLE_SIZE. In this case,
Oracle Database chooses the sample size automatically. This setting
enables the use of the following:
A hash-based algorithm that is much faster than sampling
This algorithm reads all rows and produces statistics that are nearly
as accurate as statistics from a 100% sample. The statistics computed
using this technique are deterministic.
Most likely, you don't even need to specify the ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE argument. It is extremely unlikely for someone to set that preference for a table or the system. You can use the below query to see how your statistics are typically gathered. Most likely the query will return "DBMS_STATS.AUTO_SAMPLE_SIZE":
select dbms_stats.get_prefs(pname => 'ESTIMATE_PERCENT', ownname => user, tabname => 'DTS')
from dual;

Query to count distinct values in Oracle db CLOB column

I would like to query an Oracle DB table for the number of rows containing each distinct value in a CLOB column.
This returns all rows containing a value:
select * from mytable where dbms_lob.instr(mycol,'value') > 0;
Using DBMS_LOB, this returns the number of rows containing that value:
select count(*) from mytable where dbms_lob.instr(mycol,'value') > 0;
But is it possible to query for the number of times (rows in which) each distinct value appears?
Depending on what that column really contains, see whether TO_CHAR helps.
SQL> create table mytable (mycol clob);
Table created.
SQL> insert into mytable
2 select 'Query to count distinct values' from dual union all
3 select 'I have no idea which values are popular' from dual;
2 rows created.
SQL> select count(*), to_char(mycol) toc
2 from mytable
3 where dbms_lob.instr(mycol,'value') > 0
4 group by to_char(mycol);
COUNT(*) TOC
---------- ----------------------------------------
1 Query to count distinct values
1 I have no idea which values are popular
SQL>
If your CLOB values are more than 4000 bytes (and if not, why are they CLOBs?) then it's not perfect - collisions are possible, if unlikely - but you could hash the CLOB values.
If you want to count the number of distinct values:
select count(distinct dbms_crypto.hash(src=>mycol, typ=>2))
from mytable
where dbms_lob.instr(mycol,'value') > 0;
If you want to count how many times each distinct value appears:
select mycol, cnt
from (
select mycol,
count(*) over (partition by dbms_crypto.hash(src=>mycol, typ=>2)) as cnt,
row_number() over (partition by dbms_crypto.hash(src=>mycol, typ=>2) order by null) as rn
from mytable
where dbms_lob.instr(mycol,'value') > 0
)
where rn = 1;
Both are likely to be fairly expensive and slow with a lot of data.
(typ=>2 gives the numeric value for dbms_crypto.hash_md5, as you can't refer to the package constant in a SQL call, at least up to 12cR1...)
Rather more crudely, but possibly significantly quicker, you could base the count on just the first 4000 characters, which may or may not be plausible for your actual data:
select count(distinct dbms_lob.substr(mycol, 4000, 1))
from mytable
where dbms_lob.instr(mycol,'value') > 0;
select dbms_lob.substr(mycol, 4000, 1), count(*)
from mytable
where dbms_lob.instr(mycol,'value') > 0
group by dbms_lob.substr(mycol, 4000, 1);
Standard Oracle SQL does not support DISTINCT or GROUP BY on CLOB values. But if you have access to the DBMS_CRYPTO.HASH function, you can compare CLOB hashes instead and thus get the desired output:
select myCol, h.num from
myTable t join
(select min(rowid) rid, count(rowid) num
from myTable
where dbms_lob.instr(mycol,'value') > 0
group by DBMS_CRYPTO.HASH(myCol, 3)) h
on t.rowid = h.rid;
Also note that there is a very small possibility of a hash collision. If that's OK with you, you can use this approach.

Oracle query to obtain batches of rows

So here is my problem: I need to get batches of rows (select statements) for a migration to another database (other than Oracle).
Suggested solution: I take batches of rows (using rowid maybe?), for example:
batch1: 0-10000,
batch2: 10000 - 20000,
batchn: 10000(n) - 10000(n+1)
So what should my query be?
batch1: select * from table_name where rownum >= 0 and rownum < 10000,
batch2: select * from table_name where rownum >= 10000 and rownum < 20000,
batch n: select * from table_name where rownum >= 10000*n and rownum < 10000*(n+1)
This does not work (only the first select works).
PS: I am pulling this data from a Node.js app, so I am sending these batch queries in a for loop.
To illustrate my comment:
-- Between rows --
SELECT * FROM
( SELECT deptno, ename, sal, ROW_NUMBER() OVER (ORDER BY ename) Row_Num
FROM scott.emp
)
WHERE Row_Num BETWEEN 5 and 10
/
You may replace between operator with <= and >= if necessary.
Here's what I see in output:
DEPTNO ENAME SAL ROW_NUM
20 FORD 3000 5
30 JAMES 950 6
20 JONES 2975 7
10 KING 5000 8
30 MARTIN 1250 9
10 MILLER 1300 10
Using rownum is not a great idea, because there's no guarantee that the same rows will be assigned the same rownum values in different queries.
If the table has any combination of columns that uniquely identify a row, it is better to generate a ranking based on that and use that ranking to identify batches of rows. For example:
SELECT * FROM (
SELECT table.*, RANK() OVER (ORDER BY column1, column2) as my_rank
FROM table
)
WHERE my_rank >= 10000 AND my_rank < 20000
This will work with any range, and will be reproducible as long as the values in the columns used do not change and uniquely identify a row. (Actually, I think this would be usable even if they do not uniquely identify a row, as long as they work to break the rows into small enough batches.)
The downside is that MY_RANK will be included in the output. You can avoid that by explicitly listing the columns you do want to select; or it may be easier to filter it out when you are loading the data into the other database.
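As a sketch of the same idea with the ranking column left out of the output, the query below lists the wanted columns explicitly and takes the batch number as a bind variable. The names table_name, column1, column2, :batch_size and :batch_no are placeholders, not taken from the question. It uses ROW_NUMBER rather than RANK so that every row gets a distinct number even when the ordering columns contain ties, which keeps each batch at exactly :batch_size rows (except the last).

```sql
-- Batch :batch_no (0-based) of :batch_size rows, in a reproducible order.
select column1, column2
from (
  select t.column1,
         t.column2,
         row_number() over (order by t.column1, t.column2) as rn
  from   table_name t
)
where  rn >  :batch_size * :batch_no
and    rn <= :batch_size * (:batch_no + 1);
```

The batches are only stable across calls if the ordering columns uniquely identify a row and the table does not change while the migration runs.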
If you want to preserve the rowids, use the following SQL. This SQL took 4 minutes, 20 seconds to run against a 218 million row table on a 2 CPU server with 18 GB devoted to the DB.
CREATE TABLE rowids
AS
WITH
aset
AS
(SELECT ROWID AS row_id, row_number () OVER (ORDER BY ROWID) r
FROM amiadm.big_table)
SELECT *
FROM aset
WHERE MOD (r, 10000) = 0;
After creating this table, loop through it with the following:
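BEGIN
FOR recs
IN ( SELECT row_id
, LAG (row_id) OVER (ORDER BY row_id) prev_row_id
, LEAD (row_id) OVER (ORDER BY row_id) next_row_id
FROM rowids
ORDER BY row_id)
LOOP
-- Batch ending at this boundary: (prev_row_id, row_id].
-- The first batch has no lower bound.
FOR r
IN (SELECT *
FROM big_table
WHERE (recs.prev_row_id IS NULL OR ROWID > recs.prev_row_id)
AND ROWID <= recs.row_id)
LOOP
NULL; -- process or export the row here
END LOOP;
-- After the last boundary, pick up the remaining rows.
IF recs.next_row_id IS NULL
THEN
FOR r
IN (SELECT *
FROM big_table
WHERE ROWID > recs.row_id)
LOOP
NULL; -- process or export the row here
END LOOP;
END IF;
END LOOP;
END;
/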

Need to write a procedure to fetch given rownums

I need to write a procedure to pick the records for a given range of rows,
for example:
create or replace procedure test1
(
start_ind number,
end_ind number,
p_out out sys_refcursor
)
as
begin
open p_out for
select * from test where rownum between start_ind and end_ind;
end;
When we pass start_ind 1 and end_ind 10 it works. But when we change start_ind to 5,
the query looks like
select * from test where rownum between 5 and 10;
and it fails and shows no output.
Please advise how to fix this issue. Thanks!
The rownum is assigned and then the where condition evaluated. Since you'll never have a rownum 1-4 in your result set, you never get to rownum 5. You need something like this:
SELECT * FROM (
SELECT rownum AS rn, t.*
FROM (
SELECT t.*
FROM test t
ORDER BY t.whatever
)
WHERE ROWNUM <= 10
)
WHERE rn >= 5
You'll also want an order by clause in the inner select, or which rows you get will be undefined.
This article by Tom Kyte pretty much tells you everything you need to know: http://www.oracle.com/technetwork/issue-archive/2006/06-sep/o56asktom-086197.html
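On Oracle 12c and later, the row-limiting clause is an alternative to the nested ROWNUM pattern. A minimal sketch, assuming the table has a unique column to order by (called id here, which is not in the question) and that :start_ind and :end_ind are 1-based and inclusive:

```sql
-- Rows :start_ind through :end_ind in id order (Oracle 12c or later).
select *
from   test
order  by id
offset (:start_ind - 1) rows
fetch  next (:end_ind - :start_ind + 1) rows only;
```

With start_ind 5 and end_ind 10 this skips 4 rows and returns the next 6. The ORDER BY is still required for the result to be well defined.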
SELECT *
FROM (SELECT rownum AS rn, t.*
FROM (SELECT *
FROM MyTable
ORDER BY pk_column
-- (it is highly important to order by the primary or a unique key of MyTable)
) t
WHERE ROWNUM <= 10)
WHERE rn > 5
As a hint:
Typically we use stored procedures for data validation, access control, and extensive or complex processing that requires execution of several SQL statements. Stored procedures may return result sets, i.e. the results of a SELECT statement. Such result sets can be processed using cursors, by other stored procedures, by associating a result set locator, or by applications.
I think you are going to use the row number to fetch paged queries.
Try to create a generic select query based on the idea mentioned above.
Two possibilities:
1) Your table is an index-organized table. So its data is sorted. You would select those first rows you want to avoid and based on that get the next rows you are looking for:
create or replace procedure get_records
(
vi_start_ind integer,
vi_end_ind integer,
vo_cursor out sys_refcursor
) as
begin
open vo_cursor for
select *
from test
where rownum <= vi_end_ind - vi_start_ind + 1
and rowid not in
(
select rowid
from test
where rownum < vi_start_ind
)
;
end;
2) Your table is not index-organized, which is normally the case. Then its records are not sorted. To get records m to n, you would have to tell the system what order you have in mind:
create or replace procedure get_records
(
vi_start_ind number,
vi_end_ind number,
vo_cursor out sys_refcursor
) as
begin
open vo_cursor for
select *
from
(
select *
from test
where rowid not in
(
select rid from
(
select rowid as rid
from test
order by something
)
where rownum < vi_start_ind
)
order by something
)
-- apply the row limit only after the rows have been ordered
where rownum <= vi_end_ind - vi_start_ind + 1
;
end;
All this said, think over what you want to achieve. If you want to use this procedure to read your table block by block, keep in mind that it will read the same data again and again. To know what rows 1,000,001 to 1,000,100 are, the DBMS must read through one million rows first.

how to make selecting random rows in oracle faster with table with millions of rows

Is there a way to make selecting random rows faster in Oracle with a table that has millions of rows? I tried to use sample(x) and dbms_random.value, and it's taking a long time to run.
Thanks!
Using an appropriate value of sample(x) is the fastest way available. It's block-random and row-random within blocks, so if you only want one random row:
select dbms_rowid.rowid_relative_fno(rowid) as fileno,
dbms_rowid.rowid_block_number(rowid) as blockno,
dbms_rowid.rowid_row_number(rowid) as offset
from (select rowid from [my_big_table] sample (.01))
where rownum = 1
I'm using a subpartitioned table, and I'm getting pretty good randomness even grabbing multiple rows:
select dbms_rowid.rowid_relative_fno(rowid) as fileno,
dbms_rowid.rowid_block_number(rowid) as blockno,
dbms_rowid.rowid_row_number(rowid) as offset
from (select rowid from [my_big_table] sample (.01))
where rownum <= 5
FILENO BLOCKNO OFFSET
---------- ---------- ----------
152 2454936 11
152 2463140 32
152 2335208 2
152 2429207 23
152 2746125 28
I suspect you should probably tune your SAMPLE clause to use an appropriate sample size for what you're fetching.
Start with Adam's answer first, but if SAMPLE just isn't fast enough, even with the ROWNUM optimization, you can use block samples:
....FROM [table] SAMPLE BLOCK (0.01)
This applies the sampling at the block level instead of for each row. This does mean that it can skip large swathes of data from the table so the sample percent will be very rough. It's not unusual for a SAMPLE BLOCK with a low percentage to return zero rows.
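One way to combine the two ideas is to take a small block sample first and then shuffle only the sampled rows. This is a sketch, not a tuned solution: my_big_table stands in for your table, and the 0.01 percent figure is an assumption that needs adjusting to the table's size.

```sql
-- Pick one random row out of a small block sample.
select *
from (
  select *
  from   my_big_table sample block (0.01)
  order  by dbms_random.value  -- sorts only the sampled rows, not the whole table
)
where rownum = 1;
```

Because a low-percentage block sample can come back empty, the caller should be prepared to retry, possibly with a larger percentage.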
Here's the same question on AskTom:
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522
If you know how big your table is, use sample block as described above. If you don't, you can modify the routine below to get however many rows you want.
Copied from: http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522#56174726207861
create or replace function get_random_rowid
( table_name varchar2
) return urowid
as
sql_v varchar2(100);
urowid_t dbms_sql.urowid_table;
cursor_v integer;
status_v integer;
rows_v integer;
begin
for exp_v in -6..2 loop
exit when (urowid_t.count > 0);
if (exp_v < 2) then
sql_v := 'select rowid from ' || table_name
|| ' sample block (' || power(10, exp_v) || ')';
else
sql_v := 'select rowid from ' || table_name;
end if;
cursor_v := dbms_sql.open_cursor;
dbms_sql.parse(cursor_v, sql_v, dbms_sql.native);
dbms_sql.define_array(cursor_v, 1, urowid_t, 100, 0);
status_v := dbms_sql.execute(cursor_v);
loop
rows_v := dbms_sql.fetch_rows(cursor_v);
dbms_sql.column_value(cursor_v, 1, urowid_t);
exit when rows_v != 100;
end loop;
dbms_sql.close_cursor(cursor_v);
end loop;
if (urowid_t.count > 0) then
return urowid_t(trunc(dbms_random.value(0, urowid_t.count)));
end if;
return null;
exception when others then
if (dbms_sql.is_open(cursor_v)) then
dbms_sql.close_cursor(cursor_v);
end if;
raise;
end;
/
show errors
The solution below is not an exact answer to this question, but in many scenarios you select a row, use it for some purpose, and then update its status to "used" or "done" so that you do not select it again.
Solution:
The query below works, but if your table is large it performs poorly; I tried it and definitely ran into performance problems.
SELECT * FROM
( SELECT * FROM table
ORDER BY dbms_random.value )
WHERE rownum = 1
If you restrict the candidates with rownum as below, you can work around the performance problem. Increasing the rownum limit reduces the chance of two sessions picking the same row. Note that in this case you will always pick from the same 1000 rows. If you take a row from those 1000 and update its status to "USED", you will almost always get a different row each time you query for "ACTIVE" rows.
SELECT * FROM
( SELECT * FROM table
where rownum < 1000
and status = 'ACTIVE'
ORDER BY dbms_random.value )
WHERE rownum = 1
Update the row's status after selecting it. If you cannot update it, another transaction has already used it, so you should fetch a new row and update its status instead. By the way, the probability of two different transactions getting the same row is about 0.001, since the rownum limit is 1000.
Someone said sample(x) is the fastest way available.
But for me the method below works slightly faster than the sample(x) method.
It should take a fraction of a second (0.2 seconds in my case) no matter how large the table is. If it takes longer, hints such as --+ leading(e) use_nl(e t) rowid(t) can help.
SELECT *
FROM My_User.My_Table
WHERE ROWID = (SELECT MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value)
FROM (SELECT o.Data_Object_Id,
e.Relative_Fno,
e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id
FROM Dba_Extents e
JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
WHERE e.Segment_Name = 'MY_TABLE'
AND(e.Segment_Type, e.Owner, e.Extent_Id) =
(SELECT MAX(e.Segment_Type) AS Segment_Type,
MAX(e.Owner) AS Owner,
MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
FROM Dba_Extents e
WHERE e.Segment_Name = 'MY_TABLE'
AND e.Owner = 'MY_USER'
AND e.Segment_Type = 'TABLE')) e
JOIN My_User.My_Table t
ON t.Rowid BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))
Version with retries when no rows returned:
WITH gen AS ((SELECT --+ inline leading(e) use_nl(e t) rowid(t)
MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value) Row_Id
FROM (SELECT o.Data_Object_Id,
e.Relative_Fno,
e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id
FROM Dba_Extents e
JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
WHERE e.Segment_Name = 'MY_TABLE'
AND(e.Segment_Type, e.Owner, e.Extent_Id) =
(SELECT MAX(e.Segment_Type) AS Segment_Type,
MAX(e.Owner) AS Owner,
MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
FROM Dba_Extents e
WHERE e.Segment_Name = 'MY_TABLE'
AND e.Owner = 'MY_USER'
AND e.Segment_Type = 'TABLE')) e
JOIN MY_USER.MY_TABLE t ON t.ROWID BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))),
Retries(Cnt, Row_Id) AS (SELECT 1, gen.Row_Id
FROM Dual
LEFT JOIN gen ON 1=1
UNION ALL
SELECT Cnt + 1, gen.Row_Id
FROM Retries
LEFT JOIN gen ON 1=1
WHERE Retries.Row_Id IS NULL AND Retries.Cnt < 10)
SELECT *
FROM MY_USER.MY_TABLE
WHERE ROWID = (SELECT Row_Id
FROM Retries
WHERE Row_Id IS NOT NULL)
Can you use pseudorandom rows?
select * from (
select * from ... where... order by ora_hash(rowid)
) where rownum<100
