Oracle - how to defragment a tablespace

I'm trying to defragment a tablespace in order to shrink it.
I need a SQL query to find the last extent in the tablespace (the one nearest the end of the datafile).
I'm using the following query, but I'm not sure it is correct:
select * from (
select * from dba_extents where tablespace_name = 'TS_DATA' order by BLOCK_ID desc)
where rownum<22;
I then want to defragment it with commands like alter index XXX_PK rebuild; and alter table XXX move;
My question is: do I have to sort by BLOCK_ID or by EXTENT_ID, or both, or something else?
I think I must be doing something wrong, since the shrink command ALTER DATABASE DATAFILE 'C:\ORACLEXE\APP\ORACLE\ORADATA\XE\TS_DATA.DBF' RESIZE 8777M; fails with ORA-03297: file contains used data beyond requested RESIZE value.
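A sketch of such a query, for illustration (not part of the original post): ordering by FILE_ID and BLOCK_ID reflects the physical position of each extent within its datafile, which is what matters when shrinking a file.
select owner, segment_name, segment_type, file_id, block_id, blocks
from dba_extents
where tablespace_name = 'TS_DATA'
order by file_id, block_id desc;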

Related

Oracle - how to see how many blocks have been used in a table

I have very limited experience with Oracle and am after a rather simple query, I imagine. I have a table containing 1 million rows, and I'm trying to prove that compressing the data uses less space, but I'm not sure how to do this. Based on the table creation below, could someone please show me what I need to write to see the blocks used before/after?
CREATE TABLE OrderTableCompressed(OrderID, StaffID, CustomerID, TotalOrderValue)
as (select level, ceil(dbms_random.value(0, 1000)),
ceil(dbms_random.value(0,10000)),
round(dbms_random.value(0,10000),2)
from dual
connect by level <= 1000000);
ALTER TABLE OrderTableCompressed ADD CONSTRAINT OrderID_PKC PRIMARY KEY (OrderID);
--QUERY HERE THAT SHOWS BLOCKS USED/TIME TAKEN
SELECT COUNT(ORDERID) FROM OrderTableCompressed;
ALTER TABLE OrderTableCompressed COMPRESS;
--QUERY HERE THAT SHOWS BLOCKS USED/TIME TAKEN WHEN COMPRESSED
SELECT COUNT(ORDERID) FROM OrderTableCompressed;
I know how the compression works etc... it's just applying the code to prove my theory. Thanks for any help.
--QUERY HERE THAT SHOWS BLOCKS USED
SELECT blocks, bytes/1024/1024 as MB
FROM user_segments
where segment_name = 'ORDERTABLECOMPRESSED';
Now compress the table. (Note the MOVE. Without it you only change the table's attribute; the existing data stays uncompressed and only subsequent direct-path inserts create compressed blocks.)
ALTER TABLE OrderTableCompressed MOVE COMPRESS;
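A side note, not part of the original answer: a MOVE relocates the segment, so indexes on the table become UNUSABLE and typically need a rebuild afterwards (assuming the PK index took the constraint's name):
ALTER INDEX OrderID_PKC REBUILD;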
Verify blocks:
--QUERY HERE THAT SHOWS BLOCKS USED WHEN COMPRESSED
SELECT blocks, bytes/1024/1024 as MB
FROM user_segments
where segment_name = 'ORDERTABLECOMPRESSED';
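As a small addition (not part of the original answer), the table-level compression attribute can also be confirmed from USER_TABLES:
SELECT table_name, compression, compress_for
FROM user_tables
WHERE table_name = 'ORDERTABLECOMPRESSED';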

Oracle 11G - Performance effect of indexing at insert

Objective
Verify whether it is true that inserting records without a PK/index and creating them later is faster than inserting with the PK/index in place.
Note
The point here is not that indexing takes more time (that is obvious), but that the total cost of (insert without index + create index) turns out to be higher than (insert with index). I was taught to insert without an index and create the index later because it should be faster.
Environment
Windows 7 64 bit on DELL Latitude core i7 2.8GHz 8G memory & SSD HDD
Oracle 11G R2 64 bit
Background
I was taught that inserting records without a PK/index and creating them after the insert would be faster than inserting with the PK/index in place.
However, inserting 1 million records with the PK/index was actually faster than creating the PK/index afterwards: approximately 4.5 seconds vs 6 seconds in the experiments below. Increasing the number of records to 3 million (999000 -> 2999000) gave the same result.
Conditions
The table DDL is below. One bigfile tablespace is used for both data and index.
(I also tested a separate index tablespace, with the same relative result and inferior overall performance.)
Flush the buffer cache/shared pool before each run.
Ran each experiment 3 times and made sure the results were similar.
SQL to flush:
ALTER SYSTEM CHECKPOINT;
ALTER SYSTEM FLUSH SHARED_POOL;
ALTER SYSTEM FLUSH BUFFER_CACHE;
Question
Is it actually true that "insert without PK/Index + PK/Index creation later" is faster than "insert with PK/Index"?
Did I make mistakes or missed some conditions in the experiment?
Insert records with PK/Index
TRUNCATE TABLE TBL2;
ALTER TABLE TBL2 DROP CONSTRAINT PK_TBL2_COL1 CASCADE;
ALTER TABLE TBL2 ADD CONSTRAINT PK_TBL2_COL1 PRIMARY KEY(COL1) ;
SET timing ON
INSERT INTO TBL2
SELECT i+j, rpad(TO_CHAR(i+j),100,'A')
FROM (
WITH DATA2(j) AS (
SELECT 0 j FROM DUAL
UNION ALL
SELECT j+1000 FROM DATA2 WHERE j < 999000
)
SELECT j FROM DATA2
),
(
WITH DATA1(i) AS (
SELECT 1 i FROM DUAL
UNION ALL
SELECT i+1 FROM DATA1 WHERE i < 1000
)
SELECT i FROM DATA1
);
commit;
1,000,000 rows inserted.
Elapsed: 00:00:04.328 <----- Insert records with PK/Index
Insert records without PK/Index and create them after
TRUNCATE TABLE TBL2;
ALTER TABLE TBL2 DROP CONSTRAINT PK_TBL2_COL1 CASCADE;
SET TIMING ON
INSERT INTO TBL2
SELECT i+j, rpad(TO_CHAR(i+j),100,'A')
FROM (
WITH DATA2(j) AS (
SELECT 0 j FROM DUAL
UNION ALL
SELECT j+1000 FROM DATA2 WHERE j < 999000
)
SELECT j FROM DATA2
),
(
WITH DATA1(i) AS (
SELECT 1 i FROM DUAL
UNION ALL
SELECT i+1 FROM DATA1 WHERE i < 1000
)
SELECT i FROM DATA1
);
commit;
ALTER TABLE TBL2 ADD CONSTRAINT PK_TBL2_COL1 PRIMARY KEY(COL1) ;
1,000,000 rows inserted.
Elapsed: 00:00:03.454 <---- Insert without PK/Index
table TBL2 altered.
Elapsed: 00:00:02.544 <---- Create PK/Index
Table DDL
CREATE TABLE TBL2 (
"COL1" NUMBER,
"COL2" VARCHAR2(100 BYTE),
CONSTRAINT "PK_TBL2_COL1" PRIMARY KEY ("COL1")
) TABLESPACE "TBS_BIG" ;
The current test case is probably good enough for you to overrule the "best practices". There are too many variables involved to make a blanket statement that "it's always best to leave the indexes enabled". But you're probably close enough to say it's true for your environment.
Below are some considerations for the test case. I've made this a community wiki in the hopes that others will add to the list.
Direct-path inserts. Direct-path writes use different mechanisms and may work completely differently. Direct-path inserts can often be significantly faster than regular inserts, although they have some complicated restrictions (for example, triggers must be disabled) and disadvantages (the data is not immediately backed-up). One particular way it affects this scenario is that NOLOGGING for indexes only applies during index creation. So even if a direct-path insert is used, an enabled index will always generate REDO and UNDO.
Parallelism. Large insert statements often benefit from parallel DML. Usually it's not worth worrying about the performance of a bulk load until it takes more than several seconds, which is when parallelism starts to be useful. (A small direct-path/parallel sketch follows this list.)
Bitmap indexes are not meant for large DML. Inserts or updates to a table with a bitmap index can lock the whole table and lead to disastrous performance. It might be helpful to limit the test case to b-tree indexes.
Add alter system switch logfile;? Log file switches can sometimes cause performance issues. The tests would be somewhat more consistent if they all started with empty logfiles.
Move data generation logic into a separate step. Hierarchical queries are useful for generating data but they can have their own performance issues. It might be better to create an intermediate table to hold the results, and then only test inserting the intermediate table into the final table.
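To illustrate the first two points above (direct-path writes and parallel DML), here is a hedged sketch; the staging table name is hypothetical and the degree of parallelism is arbitrary:
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND PARALLEL(TBL2, 4) */ INTO TBL2
SELECT * FROM staging_table;
-- direct-path loaded rows are not visible to this session until commit
COMMIT;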
It's true that it is faster to modify a table if you do not also have to modify one or more indexes and possibly perform constraint checking as well, but it is also largely irrelevant if you then have to add those indexes. You have to consider the complete change to the system that you wish to effect, not just a single part of it.
Obviously if you are adding a single row into a table that already contains millions of rows then it would be foolish to drop and rebuild indexes.
However, even if you have a completely empty table into which you are going to add several million rows it can still be slower to defer the indexing until afterwards.
The reason for this is that such an insert is best performed with the direct path mechanism, and when you use direct path inserts into a table with indexes on it, temporary segments are built that contain the data required to build the indexes (data plus rowids). If those temporary segments are much smaller than the table you have just loaded then they will also be faster to scan and to build the indexes from.
The alternative, if you have five indexes on the table, is to incur five full table scans after you have loaded it in order to build the indexes.
Obviously there are huge grey areas involved here, but well done for:
Questioning authority and general rules of thumb, and
Running actual tests to determine the facts in your own case.
Edit:
Further considerations: suppose you run a backup while the indexes are dropped. Then, following an emergency restore, you have to have a script that verifies all indexes are in place, while the business is breathing down your neck to get the system back up.
Also, if you are absolutely determined not to maintain indexes during a bulk load, do not drop the indexes -- disable them instead. This preserves the metadata for the indexes' existence and definition, and allows a simpler rebuild process. Just be careful that you do not accidentally re-enable the indexes by truncating the table, as this will render disabled indexes enabled again.
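A hedged sketch of that approach using an unusable index (the index name is hypothetical; note that a unique index backing an enabled primary key cannot be skipped this way, so for a PK you would disable the constraint instead):
ALTER INDEX big_table_ix UNUSABLE;
ALTER SESSION SET skip_unusable_indexes = TRUE; -- DML ignores the unusable index
-- ... bulk load here ...
ALTER INDEX big_table_ix REBUILD;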
Oracle has to do more work while inserting data into a table that has an index. In general, inserting without an index is faster than inserting with one.
Think of it this way:
Inserting rows into a regular heap-organized table with no particular row order is simple: find a table block with enough free space and put the rows there.
But when there are indexes on the table, there is much more work to do. Adding a new entry to the index is not that simple: the new entry cannot go into just any block, so Oracle has to traverse the index blocks to find the specific leaf node. Once the correct leaf node is found, it checks for enough free space and then makes the new entry. If there is not enough space, it has to split the node and distribute the entries between the old and new nodes. All of this work is overhead and consumes more time overall.
Let's look at a small example.
Database version :
SQL> SELECT banner FROM v$version where ROWNUM =1;
BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
OS : Windows 7, 8GB RAM
With Index
SQL> CREATE TABLE t(A NUMBER, CONSTRAINT PK_a PRIMARY KEY (A));
Table created.
SQL> SET timing ON
SQL> INSERT INTO t SELECT LEVEL FROM dual CONNECT BY LEVEL <=1000000;
1000000 rows created.
Elapsed: 00:00:02.26
So, it took 00:00:02.26. Index details:
SQL> column index_name format a10
SQL> column table_name format a10
SQL> column uniqueness format a10
SQL> SELECT index_name, table_name, uniqueness FROM user_indexes WHERE table_name = 'T';
INDEX_NAME TABLE_NAME UNIQUENESS
---------- ---------- ----------
PK_A T UNIQUE
Without Index
SQL> DROP TABLE t PURGE;
Table dropped.
SQL> CREATE TABLE t(A NUMBER);
Table created.
SQL> SET timing ON
SQL> INSERT INTO t SELECT LEVEL FROM dual CONNECT BY LEVEL <=1000000;
1000000 rows created.
Elapsed: 00:00:00.60
So it took only 00:00:00.60, which is faster than the 00:00:02.26 with the index.

Table not created in specific tablespace - Oracle

I am a moderately experienced Oracle user and had to create some tables in a specified tablespace, as shown below.
create table t_abc tablespace abc_tbspc as select * from abc;
create table t_xyz tablespace abc_tbspc as select * from xyz;
After running these through jobs (a file containing around 5 tables to be created in a single tablespace), I can see that the table t_abc was created in abc_tbspc, but TABLESPACE_NAME is null for t_xyz when I query ALL_TABLES. I'm not sure why the second table is not created in the specified tablespace even though there is abundant space in it.
TABLESPACE_NAME will be null for one of these reasons:
Temporary: temporary tables use a temporary tablespace.
Index-organized: index-organized tables store data in an index, not in a heap.
Partitioned: each partition can have a different tablespace; there is not necessarily one tablespace for the whole table.
External: external tables do not store their data in a tablespace.
Your code should not meet any of the conditions above; did you leave out some details? I ran the query below to look for other cases where TABLESPACE_NAME is null but could not find any.
select *
from dba_tables
where tablespace_name is null
and (temporary is null or temporary <> 'Y') -- #1
and (iot_type is null or iot_type <> 'IOT') -- #2
and (partitioned is null or partitioned <> 'YES') -- #3
and (owner, table_name) not in -- #4
(select owner, table_name from dba_external_tables)
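If the table does turn out to be partitioned (reason #3), the per-partition tablespaces can be seen with a query along these lines (a sketch; the owner is hypothetical):
select partition_name, tablespace_name
from dba_tab_partitions
where table_owner = 'SCOTT' and table_name = 'T_XYZ';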

Oracle Table Size without Data

Is it possible to have a table that reports a size but does not have any rows in it? When I run the following query one of the tables reports a size but does not contain any rows. How is this possible?
select table_name,
b.tablespace_name,
sum( bytes)/1024/1024 "SIZE IN MB"
from USER_segments a,
user_tables b
where table_name=segment_name
group by segment_name,
b.tablespace_name,
table_name;
Table segments grow when data is inserted into them. Since 11g, a table can be created without a segment (deferred segment creation); the segment is created when data is first inserted into the table.
The space occupied by a segment is not automatically returned to the free space in the datafile when rows are deleted. So either your table was created empty with immediate segment creation, or it once had rows that have since been deleted.
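A small sketch of deferred vs. immediate segment creation (the table names are made up for illustration):
CREATE TABLE t_deferred (x NUMBER) SEGMENT CREATION DEFERRED;
CREATE TABLE t_immediate (x NUMBER) SEGMENT CREATION IMMEDIATE;
-- Only T_IMMEDIATE appears in USER_SEGMENTS until T_DEFERRED receives its first insert.
SELECT segment_name, blocks
FROM user_segments
WHERE segment_name IN ('T_DEFERRED', 'T_IMMEDIATE');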

Hive: Creating smaller table from big table

I currently have a Hive table that has 1.5 billion rows. I would like to create a smaller table (using the same table schema) with about 1 million rows from the original table. Ideally, the new rows would be randomly sampled from the original table, but getting the top 1M or bottom 1M of the original table would be ok, too. How would I do this?
As climbage suggested earlier, you could probably best use Hive's built-in sampling methods.
INSERT OVERWRITE TABLE my_table_sample
SELECT * FROM my_table
TABLESAMPLE (1000000 ROWS) t;
This syntax was introduced in Hive 0.11. If you are running an older version of Hive, you'll be confined to using the PERCENT syntax like so.
INSERT OVERWRITE TABLE my_table_sample
SELECT * FROM my_table
TABLESAMPLE (1 PERCENT) t;
You can change the percentage to match your specific sample size requirements.
You can define a new table with the same schema as your original table.
Then use INSERT OVERWRITE TABLE <tablename> <select statement>.
The SELECT statement will need to query your original table, using LIMIT to get only 1M rows.
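A minimal sketch of that approach (the target table name my_table_small is an assumption; it must already exist with the same schema):
INSERT OVERWRITE TABLE my_table_small
SELECT * FROM my_table
LIMIT 1000000;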
Alternatively, this query will pull out the top 1M rows and write them into a new table:
CREATE TABLE new_table_name AS
SELECT col1, col2, col3, ...
FROM original_table
-- WHERE <optional condition>
LIMIT 1000000;
