Gather Stats Required After Truncate and Insert? - oracle

I needed to truncate and reload a table.
I learned that a truncate should be followed by gathering stats on the table so that the database has accurate statistics, because the previous stats are not cleared by the truncate statement itself.
After doing these two operations (truncate and stats gathering on the empty table), I ran the insert... but I don't see new statistics in ALL_TAB_STATISTICS for my table. SAMPLE_SIZE is still 0.
Why is that? Shouldn't Oracle have done the automatic stats gathering after the insert?
Do I need to rerun the stats gathering, or is it fine as it is for the performance of queries against this table (please note it's going to be truncated and reloaded each time)?

Consider the following approach. It has the advantage of the table always being present.
Create an empty new table like the old one.
Load the data into the new table. This is the slowest step.
Do whatever cleanup you might need, such as refreshing the statistics.
RENAME the tables to swap the new table into place. This step is fast enough that you won't notice it; a sketch of the whole sequence follows below.
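A minimal sketch of that swap, assuming a target table mytable and a hypothetical source query; indexes, grants, and constraints on the new table would need to be recreated separately:
CREATE TABLE mytable_new AS SELECT * FROM mytable WHERE 1 = 0;    -- empty copy with the same columns
INSERT /*+ APPEND */ INTO mytable_new SELECT * FROM source_table; -- load the fresh data (slowest step)
COMMIT;
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'MYTABLE_NEW', cascade => TRUE);
END;
/
ALTER TABLE mytable RENAME TO mytable_old;                        -- fast, metadata-only swap
ALTER TABLE mytable_new RENAME TO mytable;
DROP TABLE mytable_old;                                           -- or keep it as a fallback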

I know it's been a long time since I posted my question above, but recently we faced a similar situation again, and this time the steps below gave much better performance on a table with 800 million rows.
Take a backup of the original table.
Truncate the original table.
Gather stats on the truncated table so that the statistics show 0 in the DB. Use CASCADE=>TRUE in the command to also include the indexes.
Drop the indexes on the truncated table and insert the required data from the backup table.
Recreate the indexes and gather stats again (of course with CASCADE=>TRUE, although recreating the indexes should already have computed appropriate index stats).
Drop the backup table if not needed.
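A sketch of those steps in SQL, with hypothetical object names (big_table, its index big_table_ix1, and a backup copy):
CREATE TABLE big_table_bkp AS SELECT * FROM big_table;            -- 1. backup
TRUNCATE TABLE big_table;                                         -- 2. truncate
BEGIN                                                             -- 3. stats now show 0 rows
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'BIG_TABLE', cascade => TRUE);
END;
/
DROP INDEX big_table_ix1;                                         -- 4. drop indexes, then reload
INSERT /*+ APPEND */ INTO big_table SELECT * FROM big_table_bkp;
COMMIT;
CREATE INDEX big_table_ix1 ON big_table (some_column);            -- 5. recreate indexes and regather stats
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'BIG_TABLE', cascade => TRUE);
END;
/
DROP TABLE big_table_bkp;                                         -- 6. drop the backup if not needed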

Related

Alter table on cluster

I have to add the ability to store Nullable data in some columns of existing tables, but these tables use the ReplicatedMergeTree engine. So my question is: does a command like alter table on cluster modify column... exist?
And how much time will it cost?
Thanks!
This is the syntax of the modify query on cluster; it will also update the replicas (but it can take a while).
ALTER TABLE example_table ON CLUSTER example_cluster MODIFY COLUMN IF EXISTS `example_column` Nullable(UInt8)
As for the time required for the operation, it depends on a lot of parameters (the amount of data, the hardware, ...), so we can't estimate it without more information.
Usually when we have such modification and we want to ensure that it won't take too much time and that it is working, we make some benchmarks to estimate the time it would take.

Removing bloat from greenplum table

I have created some tables in Greenplum and I am performing insert, update, and delete operations on them. I am also running vacuum regularly, yet I still found bloat. I found a solution to remove the bloat: https://discuss.pivotal.io/hc/en-us/articles/206578327-What-are-the-different-option-to-remove-bloat-from-a-table
However, if I truncate the table and reinsert the data, the bloat is removed. Is it good practice to truncate and reload the data in the table?
If you are performing UPDATE and DELETE statements on a heap table (default storage) and running VACUUM regularly, you will get some bloat by design. Heap storage, which is similar to the default PostgreSQL storage mechanism, provides read consistency using Multi-Version Concurrency Control (MVCC).
When you UPDATE or DELETE a record, the old value is still in the table and is able to be read by transactions that are still inflight and started before you issued the UPDATE or DELETE command. This provides the read consistency to the table.
When you execute a VACUUM statement, the database will mark the stale rows as available to be overwritten. It doesn't shrink the files. It just marks rows so they can be overwritten. The next time you execute an INSERT or UPDATE, the stale rows are now able to be used for the new data.
So if you UPDATE or DELETE 10% of a table between running VACUUM, you will probably have about 10% bloat.
Greenplum also has Append-Optimized (AO) storage which doesn't use MVCC and uses a visibility map instead. The files are a bit smaller too, so you should get better performance. The stale rows are hidden with the visibility map and VACUUM won't do anything until you hit the gp_appendonly_compaction_threshold percentage. The default is 10%. When you have 10% bloat in an AO table and execute VACUUM, the table will automatically get rebuilt for you.
Append-Optimized is called "appendonly" for backwards compatibility reasons but it does allow UPDATE and DELETE. Here is an example of an AO table:
CREATE TABLE sales
(txn_id int, qty int, date date)
WITH (appendonly=true)
DISTRIBUTED BY (txn_id);
Instead of truncating, it is better to drop the table, recreate it, and then insert the data, as sketched below.
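A minimal sketch of that pattern, reusing the sales table from the example above and a hypothetical staging source:
DROP TABLE IF EXISTS sales;
CREATE TABLE sales
(txn_id int, qty int, date date)
WITH (appendonly=true)
DISTRIBUTED BY (txn_id);
INSERT INTO sales SELECT txn_id, qty, date FROM sales_staging;   -- sales_staging is a hypothetical source table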

Not able to vacuum table in greenplum

I have created a table in Greenplum and I am performing insert, update, and delete operations on it. I have run the vacuum command on the table, and it ran successfully.
However, when I run the command select * from gp_toolkit.gp_bloat_diag;, it still displays the same table name.
Even after running vacuum repeatedly, the table name still shows up in the output of select * from gp_toolkit.gp_bloat_diag;.
How can I make sure the table does not have any bloat and has been vacuumed properly?
For clarification:
VACUUM does remove bloat (the dead tuples in a table), and allows that space to be re-used by new tuples.
The difference between VACUUM and VACUUM FULL is that FULL rewrites the relfiles (the table storage) and reclaims the space for the OS.
gp_toolkit.gp_bloat_diag doesn't update immediately; it updates shortly after an ANALYZE, when the stats for the table have been refreshed.
I would only recommend that you run VACUUM FULL if the table is very small or if a system catalog table has grown out of proportion, where you don't have a lot of options.
VACUUM FULL is a very expensive operation.
On a very large table it can lead to unexpectedly long run times, and during the run the table is held under an exclusive lock the whole time.
In general, frequent VACUUM will save your tables from growing unnecessarily large. The dead tuples will be removed and the space will be reused.
If you have a large table with significant bloat and a lot of dead space, you will likely want to reorganize -- which is a less expensive way to reclaim space.
alter table <table_name> set with (reorganize=true) distributed randomly;
-- or: alter table <table_name> set with (reorganize=true) distributed by (<column_name1>, <column_name2>, ...);
Please refer to this to learn about the different options to remove bloat from a table.
VACUUM will not remove the bloat, but VACUUM FULL will. Check the example below.
Table Creation:
DROP TABLE IF EXISTS testbloat;
CREATE TABLE testbloat
(
id BIGSERIAL NOT NULL
, dat_year INTEGER
)
WITH (OIDS = FALSE)
DISTRIBUTED BY (id);
Inserting 1M records into the table:
INSERT INTO testbloat (dat_year) VALUES(generate_series(1,1000000));
Checking the size of the table. Size is 43MB
SELECT 'After Inserting data',pg_size_pretty(pg_relation_size('testbloat'));
Updating all the records in the table
UPDATE testbloat
SET dat_year = dat_year+1;
Checking the size of the table after updating. The size is 85MB; it increased because of the bloat caused by the update operation.
SELECT 'After updating data',pg_size_pretty(pg_relation_size('testbloat'));
Applying VACUUM on the table
Vacuum testbloat;
Checking the size of the table after VACUUM. Size is still 85MB.
SELECT 'After Vacuum', pg_size_pretty(pg_relation_size('testbloat'));
Applying VACUUM FULL on the table
Vacuum FULL testbloat;
Checking the size of the table after VACUUM FULL. The size is back to 43MB; it was reduced because VACUUM FULL rewrote the table and removed the bloat.
SELECT 'After Vacuum FULL ', pg_size_pretty(pg_relation_size('testbloat'));
VACUUM never releases the space occupied by expired rows; rather, it marks that space to be reused by later inserts of new rows into the same table. Hence, even after you run VACUUM, the size of the table won't come down.
Instead of using VACUUM FULL, use CTAS; it is faster than VACUUM FULL and, unlike VACUUM FULL, it does not hold locks on the pg_class table.
After the CTAS operation, rename the new table to the old table name.
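A sketch of the CTAS approach, with hypothetical names (mytable, distributed by id, stored append-optimized); adjust the storage options and distribution key to match the original table:
CREATE TABLE mytable_new
WITH (appendonly=true)
AS SELECT * FROM mytable
DISTRIBUTED BY (id);
DROP TABLE mytable;
ALTER TABLE mytable_new RENAME TO mytable;
ANALYZE mytable;                             -- refresh stats so gp_bloat_diag gets updated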

Slow query execution on an empty table (after deleting a large number of rows)

I have a table in an oracle database with 15 fields.
This table had 3,500,000 rows in it. I deleted them all:
delete
from table
After that, whenever I execute a select statement I get a very slow response (7 seconds), even though the table is empty. I get a normal response only when I filter on an indexed field.
Why?
As Gritem says, you need to understand high water marks, etc.
If you do not want to truncate the table now (because fresh data has been inserted), use alter table xyz shrink space, documented here for 10g; a sketch is shown below.
Tom Kyte has a good explanation of this issue:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:492636200346818072
It should help you understand deletes, truncates, and high watermarks etc.
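A minimal sketch of that shrink approach (the table name is hypothetical; SHRINK SPACE requires row movement to be enabled and the segment to live in an ASSM tablespace):
ALTER TABLE mytable ENABLE ROW MOVEMENT;
ALTER TABLE mytable SHRINK SPACE;            -- compacts the segment and lowers the high water mark
ALTER TABLE mytable DISABLE ROW MOVEMENT;    -- optional: turn row movement back off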
In SQL, when you want to completely clear out a table, you should use TRUNCATE instead of DELETE. Let's say your table has 3.5 million rows in it and there is an auto-incrementing identity column of type bigint that increments for each row. Truncating the table will completely clear out the table and reset that counter to 0. DELETE will not reset the counter, and it will continue at 3,500,001 when the next record is inserted. TRUNCATE is also much faster than DELETE. Read the articles below to understand the differences.
Read this article that explains the difference between TRUNCATE and DELETE; there are times to use each one. Here is another article from an Oracle point of view.

What is the fastest way to insert data into an Oracle table?

I am writing a data conversion in PL/SQL that processes data and loads it into a table. According to the PL/SQL Profiler, one of the slowest parts of the conversion is the actual insert into the target table. The table has a single index.
To prepare the data for load, I populate a variable using the rowtype of the table, then insert it into the table like this:
insert into mytable values r_myRow;
It seems that I could gain performance by doing the following:
Turn logging off during the insert
Insert multiple records at once
Are these methods advisable? If so, what is the syntax?
It's much better to insert a few hundred rows at a time, using PL/SQL tables and FORALL to bind into the insert statement. For details on this see here.
Also be careful with how you construct the PL/SQL tables. If at all possible, prefer to instead do all your transforms directly in SQL using "INSERT INTO t1 SELECT ..." as doing row-by-row operations in PL/SQL will still be slower than SQL.
In either case, you can also use direct-path inserts by using INSERT /*+APPEND*/, which basically bypasses the DB cache and directly allocates and writes new blocks to data files. This can also reduce the amount of logging, depending on how you use it. This also has some implications, so please read the fine manual first.
Finally, if you are truncating and rebuilding the table it may be worthwhile to first drop (or mark unusable) and later rebuild indexes.
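A minimal PL/SQL sketch of the FORALL approach (staging_table and the batch size of 500 are hypothetical):
DECLARE
  TYPE t_rows IS TABLE OF mytable%ROWTYPE;
  l_rows t_rows;
  CURSOR c_src IS SELECT * FROM staging_table;
BEGIN
  OPEN c_src;
  LOOP
    FETCH c_src BULK COLLECT INTO l_rows LIMIT 500;   -- a few hundred rows per batch
    EXIT WHEN l_rows.COUNT = 0;
    FORALL i IN 1 .. l_rows.COUNT
      INSERT INTO mytable VALUES l_rows(i);
  END LOOP;
  CLOSE c_src;
  COMMIT;
END;
/
Note that the /*+ APPEND */ hint applies to the INSERT ... SELECT form; for FORALL value inserts, newer Oracle releases offer an APPEND_VALUES hint instead, so check the documentation for your version.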
Regular insert statements are the slowest way to get data in a table and not meant for bulk inserts. The following article references a lot of different techniques for improving performance: http://www.dba-oracle.com/oracle_tips_data_load.htm
Drop the index, then insert the rows, then re-create the index.
If dropping the index doesn't speed things up enough, you need the Oracle SQL*Loader:
http://www.oracle.com/technology/products/database/utilities/htdocs/sql_loader_overview.html
Suppose you have the columns eid, ename, sal, and job. First create the table:
SQL>create table tablename(eid number, ename varchar2(20),sal number,job char(10));
Now insert the data:
SQL>insert into tablename values(&eid,'&ename',&sal,'&job');
Check this link
http://www.dba-oracle.com/t_optimize_insert_sql_performance.htm
The main points to consider for your case:
Use the APPEND hint, as this will append directly into the table instead of going through the freelists. If you can afford to turn off logging, use the APPEND hint together with NOLOGGING.
Use a bulk insert instead of iterating in PL/SQL.
Use SQL*Loader to load the data directly into the table if you are getting the data from a file feed.
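A sketch of the APPEND/NOLOGGING combination mentioned above (table names are hypothetical; with NOLOGGING the load cannot be recovered from the redo logs, so take a backup afterwards):
ALTER TABLE target_table NOLOGGING;
INSERT /*+ APPEND */ INTO target_table
SELECT * FROM source_table;                  -- direct-path load above the high water mark
COMMIT;                                      -- a direct-path insert must be committed before the table can be queried in the same session
ALTER TABLE target_table LOGGING;            -- restore normal logging afterwards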
Here are my recommendations on fast insert.
Trigger - Disable any triggers associated with a table. Enable after Inserts are complete.
Index - Drop Index and re-create it after your Inserts are complete.
Stale stats - Re-analyze table and index stats.
Index de-fragmentation - Rebuild Index if needed
Use No Logging - Insert using INSERT /*+ APPEND */ (Oracle only). This is a risky approach: no redo is generated for the load, so it cannot be recovered from the redo logs - make a backup of the table before you start and don't try it on live tables. Check whether your DB has a similar option.
Parallel Insert: Running the insert in parallel will get the job done faster.
Use Bulk Insert
Constraints - Not much overhead during inserts, but still a good idea to check if it is still slow even after step 1.
You can learn more on http://www.dbarepublic.com/2014/04/slow-insert.html
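A sketch of the trigger, index, parallel-insert, and stats recommendations above (all object names are hypothetical):
ALTER TRIGGER my_table_trg DISABLE;                      -- disable triggers during the load
ALTER INDEX my_table_ix UNUSABLE;                        -- or drop the index entirely
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND PARALLEL(t, 4) */ INTO my_table t      -- parallel direct-path insert
SELECT * FROM my_source;
COMMIT;
ALTER INDEX my_table_ix REBUILD;                         -- rebuild the index afterwards
ALTER TRIGGER my_table_trg ENABLE;
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'MY_TABLE', cascade => TRUE);  -- refresh stale stats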
Maybe one of your best options is actually to avoid the Oracle tooling as much as possible.
I've been baffled by this myself, but very often a Java process can outperform many of Oracle's utilities, which either use OCI (read: SQL*Plus) or take up so much of your time to get right (read: SQL*Loader).
This doesn't prevent you from using specific hints either (like /*+ APPEND */).
I've been pleasantly surprised each time I've turned to that kind of solution.
Cheers,
Rollo
