How to safely TRUNCATE and re-populate a table in PostgreSQL?

How to safely TRUNCATE and re-populate a table in PostgreSQL? - spring

One of the tables in my db has to be updated daily. A web server actively queries this table every 5 seconds. Now I can't simply UPDATE the table rows because of some constraints, so I need to clear all the rows and repopulate the table. How can I safely do this without affecting the functioning of web server?
The repopulation is done by an other web service isolated from the web server. I am using Spring framework.
The table has approx. 170k rows and 5 columns.

Truncate and re-populate the table in a single transaction. The truncate isn't visible to concurrent readers, who continue to see the old data. Correction per #AlexanderEmelianov and the docs:
TRUNCATE is not MVCC-safe. After truncation, the table will appear empty to concurrent transactions, if they are using a snapshot taken before the truncation occurred. See Section 13.5 for more details.
so after the TRUNCATEing txn commits , concurrent txns started before the TRUNCATE will see the table as empty.
Any transaction attempting to write to the table after it's truncated will wait until the transaction doing the truncate either commits or rolls back.
BEGIN;
TRUNCATE TABLE my_table;
INSERT INTO my_table (blah) VALUES (blah), (blah), (blah);
COMMIT;
You can COPY instead of INSERT too. Anything in a normal transaction.
Even better, there's an optimisation in PostgreSQL that makes populating a table after a truncate faster than if you do it otherwise.

Related

Sessions are locking each other while direct insert into subpartition with name of subpartition specified

We have a one large table that we need to insert data of it into another table.
Target table is partitioned by range (by day) and subpartitioned by departments.
For loading table data, we have used dbms_parallelel_execute and created a task using sql that gets list of departments, level is 20, that is at one time only 20 tasks corresponding to 20 departments will run. Those task will select the department's data from source table and inserts into target table.
Before doing insert, we first get subpartition name and generate the following insert:
INSERT /*+ NO_GATHER_OPTIMIZER_STATISTICS ENABLE_PARALLEL_DML APPEND_VALUES */ into Target_Table subpartition (subpartition_name) values (:B1, :B2, :B3, ....) ;
We read on oracle documentation that specifiying subpartition during insert will lock only that subpartition and append will work . The goal of doing this was to create n jobs that will independently insert into their own give subpartitions. Append itself is working, but when we monitor v$session while loading table data, we see that
BLOCKING_SESSION_STATUS is VALID;
FINAL_BLOCKING_SESSION_STATUS is VALID;
EVENT# is library cache lock
STATE is WAITING,
WAIT_CLASS is Concurrency
From this, we are concluding that one append_values is still locking other sessions to insert to another subpartition, is there something we missed? We have enabled parallel dml, disabled target table's indexes, set skip_unusable_indexes to true, no referential constraints are present in target table, table, partitions and subpartitions are set to nologging.
EDIT: Tested the same thing with another table that is also partitioned, but it doesn't have subpartitions, it is only list partitioned. So instead of subpartition (subpartition_name) inside insert statement there was partition(partition_name) . However, in this case , insert run without sessions waiting for others and no locks were held. I am assuming with subpartitioned interval tables the above won't work.
EDIT2 I have created the same scenario in another database which is also Oracle 19c. Created a table with partitions and subpartitions, set the interval, disabled indexes, set nologging and run the job that inserts into subpartitions. Surprisingly, the insert went without errors and no sessions locking each other. Now I am thinking maybe its some database parameter that should be turned on or changed. Because database versions, table structures, jobs, inserts are the same, but in one it is locking each other, in another it is not.
UPDATE Adding the insert part of the code :
if c_tab_cursor %isopen then
close c_tab_cursor;
end if;
open c_tab_cursor;
loop
fetch c_tab_cursor bulk collect
into v_row limit 100000;
exit when(v_row.count = 0);
forall i in v_row.First .. v_row.Last
insert /*+ NO_GATHER_OPTIMIZER_STATISTICS APPEND_VALUES */ into
Target_Table subpartition(SYS_P68457)
values v_row
(i);
commit;
end loop;
close c_tab_cursor;
Edit3 Adding table info, table is daily partitioned, and each partition has around 150 subpartitions. At the time of writing this, table had total 177845 subpartitions. My other guess is oracle is spending many time to find the right subpartition, which is also arguable because subpartition name is provided during insert.

I'd say it is expected "feature" - when you insert into the same segment. Direct path insert writes data beyond HWM(high water mark) rather than using segment's free space map.
When you commit direct path insert HWM advances, when you rollback HWM stays and data is discarded.
Check Oracle segment parameter "FREELIST", but I'm afraid even this parameter wont help you.
When your inserts touch different subpartitions this should not be happening.
There can be various objects held by library cache lock (maybe due to bug).
IMHO only way how to investigate this would be either to use hanganalyze to check which function in oracle is being blocked or to query P1,P2,P3 parameters of library cache lock and identify which object is blocking parallel run.
PS: I saw bugs like: Only one session could run Java stored procedure at the time because Oracle unnecessarily wanted to hold exclusive lock on some library case object.

v$session reports the wait state at that precise instant that you query it. It's meaningless unless you keep requerying and keep seeing the same thing. Better yet, use v$active_session_history to see Oracle's own 1-second sampling of the wait state. If you see lots of rows with that wait, then it's meaningful.
Assuming that this is meaningful, I would point out that you are using a single row VALUES list and yet are asking for parallel dml. Parallel dml is for multiple row operations, not single row operations. You can use it for an insert-select, for example, but not an insert-values.
If your application is necessarily single-row driven, remove ENABLE_PARALLEL_DML APPEND_VALUES hints. If you are binding arrays to these variables, you can leave the APPEND_VALUES but remove the ENABLE_PARALLEL_DML. For inserts, parallel DML only works with insert-select.
As you clearly intend to have multiple sessions, each loading a separate subpartitions, that's your parallelism right there - you don't need nor want to add another layer of parallelism with PDML.

how works DBA_TAB_MODIFICATIONS

I wonder how works the DBA_TAB_MODIFICATIONS.
How long is kept the data for a table? I have tables with timestamp
on Dec/2019
What does it mean when a table is not in
DBA_TAB_MODIFICATIONS? do it mean the table hasn't had (delete,
insert, update) in a period? if yes, for how long?
I have a Schema with around 1000 tables, only around 300 appear in
DBA_TAB_MODIFICATIONS

DBA_TAB_MODIFICATIONS is used by Oracle internally to track how many inserts, updates and deletes have been done to a table or table partition since the stats had been gathered on it with dbms_stats.
What version of Oracle are you using because after Oracle 9 it is automatically inserted
into the DBA_TAB_MODIFICATIONS table, before oracle 9 you have to register a table as MONITORED.

Removing bloat from greenplum table

I have created some tables in Greenplum, performing insert update and delete operation. Regularly I am also performing vacuum operation. I Found bloat in it. Found solution to remove bloat https://discuss.pivotal.io/hc/en-us/articles/206578327-What-are-the-different-option-to-remove-bloat-from-a-table
However, if I truncate the table and reinsert the data, it removes bloat. Is it good practice to truncate the data from the table?

If you are performing UPDATE and DELETE statements on a heap table (default storage) and running VACUUM regularly, you will get some bloat by design. Heap storage, which is similar to the default PostgreSQL storage mechanism, provides read consistency using Multi-Version Concurrency Control (MVCC).
When you UPDATE or DELETE a record, the old value is still in the table and is able to be read by transactions that are still inflight and started before you issued the UPDATE or DELETE command. This provides the read consistency to the table.
When you execute a VACUUM statement, the database will mark the stale rows as available to be overwritten. It doesn't shrink the files. It just marks rows so they can be overwritten. The next time you execute an INSERT or UPDATE, the stale rows are now able to be used for the new data.
So if you UPDATE or DELETE 10% of a table between running VACUUM, you will probably have about 10% bloat.
Greenplum also has Append-Optimized (AO) storage which doesn't use MVCC and uses a visibility map instead. The files are bit smaller too so you should get better performance. The stale rows are hidden with the visibility map and VACUUM won't do anything until you hit the gp_appendonly_compaction_threshold percentage. The default is 10%. When you have 10% bloat in an AO table and execute VACUUM, the table will automatically get rebuilt for you.
Append-Optimized is called "appendonly" for backwards compatibility reasons but it does allow UPDATE and DELETE. Here is an example of an AO table:
CREATE TABLE sales
(txn_id int, qty int, date date)
WITH (appendonly=true)
DISTRIBUTED BY (txn_id);

Instead of truncate it is better to use drop the table, create the table and then insert the data.

How to purge an Advanced Queue in Oracle

The documentation is clear about how to purge an Oracle AQ:
dbms_aqadm.purge_queue_table()
However, what happens to the storage, especially the high water marks of the queue table, the indexes and of the LOB segments? Is it necessary to shrink the table, too?
In production, the queues are nearly always empty (as they should), but in our test system, they fill up to millions of rows for various reasons, so they need to be emptied sometimes.
Is it neccessary to look at the underlying tables and indexes or is this taken care of automatically?
Many thanks!

DBMS_AQADM.PURGE_QUEUE_TABLE it is equivalent for truncate table. Also look at this error message when you try truncate queue table
ORA-24005: Inappropriate utilities used to perform DDL on AQ table %s.%s
*Cause: An attempt was made to use the SQL command DROP TABLE or TRUNCATE
TABLE or ALTER TABLE on queue metadata or tables.
*Action: Use the DBMS_AQADM.DROP_QUEUE_TABLE to DROP TABLE,
DBMS_AQADM.PURGE_QUEUE_TABLE to TRUNCATE TABLE.
ALTER TABLE redefinition based on only ALTER_TABLE_PROPERTIES and
ALTER_TABLE_PARTITIONING clauses are allowed.
Tom Kyte has already written info about often truncating table https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:47911859692542

Oracle 11g Deleting large amount of data without generating archive logs

I need to delete a large amount of data from my database on a regular basis. The process generates huge volume of archive logs. We had a database crash at one point because there was no storage space available on archive destination. How can I avoid generation of logs while I delete data?
The data to be deleted is already marked as inactive in the database. Application code ignores inactive data. I do not need the ability to rollback the operation.
I cannot partition the data in such a way that inactive data falls in one partition that can be dropped. I have to delete the data with delete statements.
I can ask DBAs to set certain configuration at table level/schema level/tablespace level/server level if needed.
I am using Oracle 11g.

What proportion of the data on the table would be deleted, what volume? Are there any referential integrity constraints to manage or is this table childless?
Depending on the answers , you might consider:
"CREATE TABLE keep_data UNRECOVERABLE AS SELECT * FROM ... WHERE
[keep condition]"
Then drop the original table
Then rename keep_table to original table
Rebuild the indexes (again with unrecoverable to prevent redo),constraints etc.
The problem with this approach is it's a multi-step DDL, process, which you will have a job to make fault tolerant and reversible.
A safer option might be to use data-pump to:
Data-pump expdp to extract the "Keep" data
TRUNCATE the table
Data-pump impdp import of data from step 1, with direct-path
At this point I suggest you read the Oracle manual on Data Pump, particularly the section on Direct Path Loads to be sure this will work for you.
MY preferred option would be partitioning.

Of course, the best way would be TenG solution (CTAS, drop and rename table) but it seems it's impossible for you.
Your only problem is the amount of archive logs and database crash problem. In this case, maybe you could partition your delete statement (for example per 10.000 rows).
Something like:
declare
e number;
i number
begin
select count(*) from myTable where [delete condition];
f :=trunc(e/10000)+1;
for i in 1.. f
loop
delete from myTable where [delete condition] and rownum<=10000;
commit;
dbms_lock.sleep(600); -- purge old archive if it's possible
end loop;
end;
After this operation, you should reorganize your table which is surely fragmented.

Alter the table to set NOLOGGING, delete the rows, then turn logging back on.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio