I have a table in an Oracle db that gets a couple of million new rows every month. Each row has a column which states the date when it was created.
I'd like to run a query that gets the disk space growth over the last 6 months. In other words, the result would be a table with two columns where each row would have the month's name and disk space used during that month.
Thanks,
This article reports a method of getting the table growth: http://www.dba-oracle.com/t_table_growth_reports.htm
column "Percent of Total Disk Usage" justify right format 999.99
column "Space Used (MB)" justify right format 9,999,999.99
column "Total Object Size (MB)" justify right format 9,999,999.99
set linesize 150
set pages 80
set feedback off
select * from (select to_char(end_interval_time, 'MM/YY') mydate, sum(space_used_delta) / 1024 / 1024 "Space Used (MB)", avg(c.bytes) / 1024 / 1024 "Total Object Size (MB)",
round(sum(space_used_delta) / sum(c.bytes) * 100, 2) "Percent of Total Disk Usage"
from
dba_hist_snapshot sn,
dba_hist_seg_stat a,
dba_objects b,
dba_segments c
where begin_interval_time > trunc(sysdate) - &days_back
and sn.snap_id = a.snap_id
and b.object_id = a.obj#
and b.owner = c.owner
and b.object_name = c.segment_name
and c.segment_name = '&segment_name'
group by to_char(end_interval_time, 'MM/YY'))
order by to_date(mydate, 'MM/YY');
DBA_TABLES (or the equivalent) gives an AVG_ROW_LEN, so you could simply multiply that by the number of rows created per month.
The caveat is that this assumes the row length of new rows is similar to that of existing rows. If you've got a bunch of historical data that was 'small' (e.g. 50 bytes) but new rows are larger (150 bytes), then the estimates will be too low.
Also, how do updates figure into things? If a row starts at 50 bytes and grows to 150 two months later, how do you account for those 100 bytes?
Finally, tables don't grow for each row insert. Every so often the allocated space will fill up and it will go and allocate another chunk. Depending on the table settings, that next chunk may be, for example, 50% of the existing table size. So you might not physically grow for three months and then have a massive jump, then not grow for another six months.
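The AVG_ROW_LEN approach can be sketched outside the database as follows. This is only a rough illustration in Python, not part of the original answer: the function name and the sample dates are made up, and avg_row_len is assumed to be the value read from DBA_TABLES. It inherits every caveat listed above (changing row lengths, updates, chunked extent allocation).

```python
from collections import Counter
from datetime import date


def estimate_monthly_growth(created_dates, avg_row_len):
    """Estimate bytes added per month as (rows created that month) * avg_row_len.

    created_dates: iterable of datetime.date values (the row creation column).
    avg_row_len:   average row length in bytes (assumed to come from DBA_TABLES).
    """
    counts = Counter((d.year, d.month) for d in created_dates)
    return {
        f"{year:04d}-{month:02d}": rows * avg_row_len
        for (year, month), rows in sorted(counts.items())
    }


# Hypothetical sample data: two rows created in January, one in February.
sample_dates = [date(2024, 1, 5), date(2024, 1, 20), date(2024, 2, 1)]
growth = estimate_monthly_growth(sample_dates, avg_row_len=150)
```

The result is an estimate of data volume, not of allocated disk space, since segments grow in extents rather than row by row.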
I am facing a spool space issue with one of my queries. Below is the query:
SEL * from (
SEL A.ICCUSNO,
B.ACACCNO,
D.PARTY_ID,
E.PARTY_IDENTIFICATION_NUM,
E.PARTY_IDENTIFICATION_TYPE_CD,
A.ICIDTY,
A.ICIDNO AS ICIDNO,
A.ICEXPD,
ROW_NUMBER() OVER(PARTITION BY ICCUSNO ORDER BY ICEXPD DESC ) AS ICEFFD2
FROM GE_SANDBX.GE_CMCUST A
INNER JOIN GE_SANDBX.GE_CMACCT B
ON A.ICCUSNO=B.ACCUSNO
INNER JOIN GE_VEW.ACCT C
ON B.ACACCNO=C.ACCT_NUM
AND C.DATA_SOURCE_TYPE_CD='ILL'
INNER JOIN GE_VEW.PARTY_ACCT_HIST D
ON C.ACCT_ID=D.ACCT_ID
LEFT OUTER JOIN GE_VEW.GE_PI E
ON D.PARTY_ID=E.PARTY_ID
AND A.ICIDTY=E.PARTY_IDENTIFICATION_TYPE_CD
AND E.DSTC NOT IN( 'SCRM', 'BCRM')
--WHERE B.ACACCNO='0657007129'
--WHERE A.ICIDNO<>E.PARTY_IDENTIFICATION_NUM
QUALIFY ICEFFD2=1) T
where t.PARTY_IDENTIFICATION_NUM<>t.ICIDNO;
I am trying to pick one record based on the expiry date ICEXPD. My inner query gives me one record per customer number (ICCUSNO), as below:
ICCUSNO    ACACCNO    PARTY_ID   Party_Identification_Num  Party_Identification_Type_Cd  ICIDNO         ICEXPD     ICEFFD2
100000013  500010207  5,862,640  1-0121-2073-7             S                             1-0212-2073-4  9/20/2007  1
But I have to update the table only when PARTY_IDENTIFICATION_NUM doesn't match ICIDNO.
Below is the explain plan:
1) First, we lock GE_SANDBX.A for access, we lock
GE_SANDBX.B for access, we lock DP_TAB.PARTY_ACCT_HIST for
access, we lock DP_TAB.GE_PI for access, and we
lock DP_TAB.ACCT for access.
2) Next, we do an all-AMPs RETRIEVE step from DP_TAB.ACCT by way of
an all-rows scan with a condition of (
"DP_TAB.ACCT.DATA_SOURCE_TYPE_CD = 'ILL '") into Spool 3
(all_amps), which is built locally on the AMPs. The size of Spool
3 is estimated with no confidence to be 9,834,342 rows (
344,201,970 bytes). The estimated time for this step is 2.18
seconds.
3) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of a
RowHash match scan, which is joined to DP_TAB.PARTY_ACCT_HIST by
way of a RowHash match scan with no residual conditions. Spool 3
and DP_TAB.PARTY_ACCT_HIST are joined using a merge join, with a
join condition of ("Acct_Id = DP_TAB.PARTY_ACCT_HIST.ACCT_ID").
The result goes into Spool 4 (all_amps), which is redistributed by
the hash code of (DP_TAB.ACCT.Acct_Num) to all AMPs. Then we do a
SORT to order Spool 4 by row hash. The size of Spool 4 is
estimated with no confidence to be 13,915,265 rows (487,034,275
bytes). The estimated time for this step is 0.98 seconds.
4) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from GE_SANDBX.B by way of a
RowHash match scan with no residual conditions, which is
joined to Spool 4 (Last Use) by way of a RowHash match scan.
GE_SANDBX.B and Spool 4 are joined using a merge join,
with a join condition of ("GE_SANDBX.B.ACACCNO =
Acct_Num"). The result goes into Spool 5 (all_amps) fanned
out into 18 hash join partitions, which is redistributed by
the hash code of (GE_SANDBX.B.ACCUSNO) to all AMPs. The
size of Spool 5 is estimated with no confidence to be
13,915,265 rows (2,657,815,615 bytes). The estimated time
for this step is 1.33 seconds.
2) We do an all-AMPs RETRIEVE step from GE_SANDBX.A by way
of an all-rows scan with no residual conditions into Spool 6
(all_amps) fanned out into 18 hash join partitions, which is
redistributed by the hash code of (GE_SANDBX.A.ICCUSNO)
to all AMPs. The size of Spool 6 is estimated with high
confidence to be 12,169,929 rows (5,427,788,334 bytes). The
estimated time for this step is 52.24 seconds.
3) We do an all-AMPs RETRIEVE step from
DP_TAB.GE_PI by way of an all-rows scan with a
condition of (
"(DP_TAB.GE_PI.DSTC <> 'BSCRM')
AND (DP_TAB.GE_PI.DATA_SOURCE_TYPE_CD <>
'SCRM')") into Spool 7 (all_amps), which is built locally on
the AMPs. The size of Spool 7 is estimated with low
confidence to be 161,829 rows (19,419,480 bytes). The
estimated time for this step is 1.97 seconds.
5) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of an
all-rows scan, which is joined to Spool 6 (Last Use) by way of an
all-rows scan. Spool 5 and Spool 6 are joined using a hash join
of 18 partitions, with a join condition of ("ICCUSNO = ACCUSNO").
The result goes into Spool 8 (all_amps), which is redistributed by
the hash code of (DP_TAB.PARTY_ACCT_HIST.PARTY_ID,
TRANSLATE((GE_SANDBX.A.ICIDTY )USING
LATIN_TO_UNICODE)(VARCHAR(255), CHARACTER SET UNICODE, NOT
CASESPECIFIC)) to all AMPs. The size of Spool 8 is estimated with
no confidence to be 15,972,616 rows (8,593,267,408 bytes). The
estimated time for this step is 4.37 seconds.
6) We do an all-AMPs JOIN step from Spool 7 (Last Use) by way of an
all-rows scan, which is joined to Spool 8 (Last Use) by way of an
all-rows scan. Spool 7 and Spool 8 are right outer joined using a
single partition hash join, with condition(s) used for
non-matching on right table ("NOT (ICIDTY IS NULL)"), with a join
condition of ("(PARTY_ID = Party_Id) AND ((TRANSLATE((ICIDTY
)USING LATIN_TO_UNICODE))= Party_Identification_Type_Cd)"). The
result goes into Spool 2 (all_amps), which is built locally on the
AMPs. The size of Spool 2 is estimated with no confidence to be
16,053,773 rows (10,306,522,266 bytes). The estimated time for
this step is 2.11 seconds.
7) We do an all-AMPs STAT FUNCTION step from Spool 2 (Last Use) by
way of an all-rows scan into Spool 13 (Last Use), which is
redistributed by hash code to all AMPs. The result rows are put
into Spool 11 (all_amps), which is built locally on the AMPs. The
size is estimated with no confidence to be 16,053,773 rows (
18,558,161,588 bytes).
8) We do an all-AMPs RETRIEVE step from Spool 11 (Last Use) by way of
an all-rows scan with a condition of ("(Party_Identification_Num
<> ICIDNO) AND (Field_10 = 1)") into Spool 16 (group_amps), which
is built locally on the AMPs. The size of Spool 16 is estimated
with no confidence to be 10,488,598 rows (10,404,689,216 bytes).
The estimated time for this step is 2.20 seconds.
9) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 16 are sent back to the user as the result
of statement 1.
All the required stats are collected.
Thanks for your help.
I am working on moving 70M rows from a source table to a target table, and doing a complete dump and restoring it on the other end is not an option. I have decided to create a small SQL file that selects 1M rows at a time and inserts them into the new table (after some cleanup). The problem is that I need to iterate over the 70M rows in 1M chunks, and I just realised that every iteration is getting slower and slower.
Is there a way to create a partial index to speed up queries having OFFSET 0 LIMIT 1000000, OFFSET 1000000 LIMIT 1000000 etc?
Example:
Fast:
SELECT id FROM huge_table ORDER BY id OFFSET 0 LIMIT 1000000
Slower:
SELECT id FROM huge_table ORDER BY id OFFSET 1000000 LIMIT 1000000
Very slow:
SELECT id FROM huge_table ORDER BY id OFFSET 5000000 LIMIT 1000000
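One commonly suggested alternative to growing OFFSET values, which is not something the post itself proposes, is to remember the last id of each chunk and continue with WHERE id > last_id, so each chunk starts from an index seek instead of skipping rows. The sketch below is only an illustration under assumptions: it uses Python with an in-memory SQLite table as a stand-in for the real database, it assumes id is unique and indexed, and the helper name iter_chunks is hypothetical.

```python
import sqlite3


def iter_chunks(conn, chunk_size):
    """Yield lists of ids in ascending order, chunk_size at a time.

    Each query resumes after the last id seen rather than using OFFSET.
    """
    last_id = None
    while True:
        if last_id is None:
            rows = conn.execute(
                "SELECT id FROM huge_table ORDER BY id LIMIT ?",
                (chunk_size,),
            ).fetchall()
        else:
            rows = conn.execute(
                "SELECT id FROM huge_table WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, chunk_size),
            ).fetchall()
        if not rows:
            return
        ids = [row[0] for row in rows]
        yield ids
        last_id = ids[-1]


# Tiny stand-in table: ten rows instead of 70M, chunks of 4 instead of 1M.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE huge_table (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO huge_table (id) VALUES (?)", [(i,) for i in range(1, 11)])
chunks = list(iter_chunks(conn, 4))
```

Whether this is faster on the real system depends on the database and on id being indexed; that would need to be measured there.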
I have an orders table. The table belongs to a multi-tenant application, so there are orders from several merchants in the same table. The table stores hundreds of millions of records. There are two relevant columns for this question:
MerchantID, an integer storing the merchant's unique ID
TransactionID, a string identifying the transaction
I want to know whether there is an efficient index to do the following:
Enforce a unique constraint on Transaction ID for each Merchant ID. The constraint should be enforced in constant time.
Do constant time queries involving exact matches on both columns (for instance, SELECT * FROM <table> WHERE TransactionID = 'ff089f89feaac87b98a' AND MerchantID = 24)
Further info:
I am using Oracle 11g. Maybe this Oracle article is relevant to my question?
I cannot change the column's data type.
Constant time means an index performing in O(1) time complexity, like a hashmap.
Hash clusters can provide O(1) access time, but not O(1) constraint enforcement time. However, in practice the constant access time of a hash cluster is worse than the O(log N) access time of a regular b-tree index. Also, clusters are more difficult to configure and do not scale well for some operations.
Create Hash Cluster
drop table orders_cluster;
drop cluster cluster1;
create cluster cluster1
(
MerchantID number,
TransactionID varchar2(20)
)
single table hashkeys 10000; --This number is important, choose wisely!
create table orders_cluster
(
id number,
MerchantID number,
TransactionID varchar2(20)
) cluster cluster1(merchantid, transactionid);
--Add 1 million rows. 20 seconds.
begin
for i in 1 .. 10 loop
insert into orders_cluster
select rownum + i * 100000, mod(level, 100)+ i * 100000, level
from dual connect by level <= 100000;
commit;
end loop;
end;
/
create unique index orders_cluster_idx on orders_cluster(merchantid, transactionid);
begin
dbms_stats.gather_table_stats(user, 'ORDERS_CLUSTER');
end;
/
Create Regular Table (For Comparison)
drop table orders_table;
create table orders_table
(
id number,
MerchantID number,
TransactionID varchar2(20)
) nologging;
--Add 1 million rows. 2 seconds.
begin
for i in 1 .. 10 loop
insert into orders_table
select rownum + i * 100000, mod(level, 100)+ i * 100000, level
from dual connect by level <= 100000;
commit;
end loop;
end;
/
create unique index orders_table_idx on orders_table(merchantid, transactionid);
begin
dbms_stats.gather_table_stats(user, 'ORDERS_TABLE');
end;
/
Trace Example
SQL*Plus Autotrace is a quick way to find the explain plan and track I/O activity per statement. The number of I/O requests is labeled as "consistent gets" and is a decent way of measuring the amount of work done. This code demonstrates how the numbers were generated for other sections. The queries often need to be run more than once to warm things up.
SQL> set autotrace on;
SQL> select * from orders_cluster where merchantid = 100001 and transactionid = '2';
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 621801084
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 16 | 1 (0)| 00:00:01 |
|* 1 | TABLE ACCESS HASH| ORDERS_CLUSTER | 1 | 16 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("MERCHANTID"=100001 AND "TRANSACTIONID"='2')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
31 consistent gets
0 physical reads
0 redo size
485 bytes sent via SQL*Net to client
540 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
SQL>
Find Optimal Hashkeys, Trade-Offs
For optimal read performance all the hash collisions should fit in one block (all Oracle I/O is done per block, usually 8K). Getting the ideal storage right is tricky and requires knowing the hash algorithm, storage size (not the same as the block size), and number of hash keys (the buckets). Oracle has a default algorithm and size so it is possible to focus on only one attribute, the number of hash keys.
More hash keys lead to fewer collisions. This is good for TABLE ACCESS HASH performance as there is only one block to read. Below is the number of consistent gets for different hashkey sizes. For comparison an index access is also included. With enough hashkeys the number of blocks decreases to the optimal number, 1.
Method           Consistent gets (for transactionid = 1, 20, 300, 4000, and 50000)
Index            4, 3, 3, 3, 3
Hashkeys 100     1, 31, 31, 31, 31
Hashkeys 1000    1, 3, 4, 4, 4
Hashkeys 10000   1, 1, 1, 1, 1
More hash keys also lead to more buckets, more wasted space, and a slower TABLE ACCESS FULL operation.
Table type       Space (MB)
Heap table       24
Hashkeys 100     26
Hashkeys 1000    30
Hashkeys 10000   81
To reproduce my results, use a sample query like select * from orders_cluster where merchantid = 100001 and transactionid = '1'; and change the last value to 1, 20, 300, 4000, and 50000.
Performance Comparison
Consistent gets are predictable and easy to measure, but at the end of the day only the wall clock time matters. Surprisingly, the index access with 4 times more consistent gets is still faster than the optimal hash cluster scenario.
--3.5 seconds for b-tree access.
declare
v_count number;
begin
for i in 1 .. 100000 loop
select count(*)
into v_count
from orders_table
where merchantid = 100000 and transactionid = '1';
end loop;
end;
/
--3.8 seconds for hash cluster access.
declare
v_count number;
begin
for i in 1 .. 100000 loop
select count(*)
into v_count
from orders_cluster
where merchantid = 100000 and transactionid = '1';
end loop;
end;
/
I also tried the test with variable predicates but the results were similar.
Does it Scale?
No, hash clusters do not scale. Despite the O(1) time complexity of TABLE ACCESS HASH, and the O(log n) time complexity of INDEX UNIQUE SCAN, hash clusters never seem to outperform b-tree indexes.
I tried the above sample code with 10 million rows. The hash cluster was painfully slow to load, and still under-performed the index on SELECT performance. I tried to scale it up to 100 million rows but the insert was going to take 11 days.
The good news is that b-trees scale well. Adding 100 million rows to the above example only requires 3 levels in the index. I looked at all DBA_INDEXES for a large database environment (hundreds of databases and a petabyte of data) and the worst index had only 7 levels. And that was a pathological index on VARCHAR2(4000) columns. In most cases your b-tree indexes will stay shallow regardless of the table size.
In this case, O(log n) beats O(1).
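To see why the logarithm stays so small, here is a back-of-the-envelope sketch in Python. It models an idealized tree in which every block holds the same number of keys; the fanout of 500 is a hypothetical figure chosen for illustration, not a measured Oracle value, and real fanout depends on key size and block size.

```python
def btree_levels(num_keys, fanout):
    """Smallest number of levels such that fanout ** levels >= num_keys.

    Idealized model: every block holds exactly `fanout` entries.
    """
    levels = 1
    capacity = fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


# Hypothetical fanout of 500 entries per block.
levels_100m = btree_levels(100_000_000, 500)
levels_1t = btree_levels(10**12, 500)
```

Under this assumption, growing the table by a factor of 10,000 adds only two levels, which is why the per-lookup cost of the index barely changes with table size.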
But WHY?
Poor hash cluster performance is perhaps a victim of Oracle's attempt to simplify things and hide the kind of details necessary to make a hash cluster work well. Clusters are difficult to set up and use properly and would rarely provide a significant benefit anyway. Oracle has not put a lot of effort into them in the past few decades.
The commenters are correct that a simple b-tree index is best. But it's not obvious why that should be true and it's good to think about the algorithms used in the database.
I have a table in an Oracle database which may contain amounts >=$10M or <=$-10B.
If the value is greater than or equal to $10M, I need to break it into one or more 99999999.99 chunks and also include the remainder.
If the value is less than or equal to $-10B, I need to break it into one or more 999999999.99 chunks and also include the remainder.
Your question is somewhat hard to read, but until you provide examples, here is something to start with, which may help you or someone with a similar problem.
Let's say you have this data and you want to divide amounts into chunks not greater than 999:
id amount
-- ------
1 1500
2 800
3 2500
This query:
select id, amount,
case when level=floor(amount/999)+1 then mod(amount, 999) else 999 end chunk
from data
connect by level<=floor(amount/999)+1
and prior id = id and prior dbms_random.value is not null
...divides amounts, last row contains remainder. Output is:
ID AMOUNT CHUNK
------ ---------- ----------
1 1500 999
1 1500 501
2 800 800
3 2500 999
3 2500 999
3 2500 502
SQLFiddle demo
Edit: full query according to additional explanations:
select id, amount,
case
when amount>=0 and level=floor(amount/9999999.99)+1 then mod(amount, 9999999.99)
when amount>=0 then 9999999.99
when level=floor(-amount/999999999.99)+1 then -mod(-amount, 999999999.99)
else -999999999.99
end chunk
from data
connect by ((amount>=0 and level<=floor(amount/9999999.99)+1)
or (amount<0 and level<=floor(-amount/999999999.99)+1))
and prior id = id and prior dbms_random.value is not null
SQLFiddle
Please adjust numbers for positive and negative borders (9999999.99 and 999999999.99) according to your needs.
There are other possible solutions (a recursive CTE query, a PL/SQL procedure, maybe others); this hierarchical query is one of them.
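For checking the expected output outside the database, the chunking logic of the hierarchical query can be mirrored in a few lines of Python. This is a sketch only: the function name split_amount is hypothetical, and Decimal is used so that limits like 9999999.99 stay exact. Like the query, it always emits a final remainder row, so an amount that is an exact multiple of the limit ends with a chunk of 0.

```python
from decimal import Decimal


def split_amount(amount, pos_limit, neg_limit):
    """Split amount into full chunks of the applicable limit plus a remainder.

    Non-negative amounts use pos_limit, negative amounts use neg_limit,
    and every chunk carries the sign of the original amount.
    """
    limit = pos_limit if amount >= 0 else neg_limit
    sign = 1 if amount >= 0 else -1
    magnitude = abs(amount)
    full_chunks = int(magnitude // limit)
    remainder = magnitude % limit
    return [sign * limit] * full_chunks + [sign * remainder]


limit_999 = Decimal("999")
chunks_1500 = split_amount(Decimal("1500"), limit_999, limit_999)
chunks_800 = split_amount(Decimal("800"), limit_999, limit_999)
chunks_2500 = split_amount(Decimal("2500"), limit_999, limit_999)
chunks_negative = split_amount(Decimal("-2500"), limit_999, limit_999)
```

The first three results match the CHUNK column of the sample output above, and the chunks of any amount sum back to that amount.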
I have a Table with 5000 rows. Is there any way to find out how much space is used by the first 100 rows?
EDIT I found this script on the internet:
WITH table_size AS
(SELECT owner, segment_name, SUM (BYTES) total_size
FROM dba_extents
WHERE segment_type = 'TABLE'
GROUP BY owner, segment_name)
SELECT table_name, avg_row_len, num_rows * avg_row_len actual_size_of_data,
b.total_size
FROM dba_tables a, table_size b
WHERE a.owner = UPPER ('&&ENTER_OWNER_NAME')
AND a.table_name = UPPER ('&&ENTER_TABLE_NAME')
AND a.owner = b.owner
AND a.table_name = b.segment_name;
I don't know if it gives the desired result. It calculates the Average Row Length and I multiply that with 100.
Oracle separates the logical storage from the physical by packing rows into blocks. So, while it's not straightforward and it is also not exact, it is possible.
You have to determine the sum of the bytes used by a row (this will be an educated guess if you have one or more varchar2 columns) and then, based on the block size, determine how many rows will fit in a block. Oracle always allocates a full block even if it only has to store one byte in it, so the total storage will be a multiple of the block size.
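The arithmetic described above might look like this in Python. It is a rough sketch under stated assumptions, not an exact model of Oracle's block format: the 8192-byte block size and PCTFREE of 10 are common defaults, while the 100 bytes of per-block overhead is a made-up placeholder and the function name is hypothetical.

```python
def estimate_space_bytes(num_rows, avg_row_len, block_size=8192,
                         pct_free=10, header_bytes=100):
    """Estimate the bytes occupied by num_rows, rounded up to whole blocks.

    header_bytes is an assumed placeholder for per-block overhead.
    """
    usable = block_size * (100 - pct_free) // 100 - header_bytes
    rows_per_block = usable // avg_row_len
    if rows_per_block < 1:
        raise ValueError("row does not fit in one block under these assumptions")
    blocks = -(-num_rows // rows_per_block)  # ceiling division
    return blocks * block_size


# 100 rows of about 100 bytes each, using the assumed defaults.
space_100_rows = estimate_space_bytes(100, 100)
# Even a single tiny row occupies a full block.
space_one_row = estimate_space_bytes(1, 1)
```

With these assumptions, 72 rows fit in a block, so 100 rows need two blocks.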