Hive truncate table takes too much time - hadoop

My Hive query TRUNCATE TABLE tablename is taking too much time. The table definition has these properties:
CLUSTERED BY(field1) INTO 2 BUCKETS
STORED AS ORC TBLPROPERTIES('transactional'='true');
The table holds only about 20-30k rows.
ACID transactions are enabled with these settings:
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nostrict;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;
After waiting for a long time, it throws the error below:
FAILED: Error in acquiring locks: Lock acquisition for
LockRequest(component:[LockComponent(type:EXCLUSIVE, level:TABLE, dbname:db1,
tablename:tbl1, operationType:NO_TXN, isAcid:true)], txnid:0, user:xyz,
hostname:host123, agentInfo:xyz_20190310220349_62d794b8-3166-4049-b9f9-646e40f1d344) timed out after 5503335ms. LockResponse(lockid:5563,
state:WAITING)
But no other user or job is using this table, so there should be nothing to wait on. What else could be the reason for the wait?
Also, an insert query (for a particular condition) was executed right before the truncate.

As there was no other answer, I would like to mention that in my case DELETE FROM the table completed in the usual time (it took 2 minutes and, more importantly, raised no lock error), unlike TRUNCATE.
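For reference, a minimal sketch of the comparison (db1.tbl1 is the table name taken from the lock error above; the compaction step is an optional extra, not part of the original workaround):
-- TRUNCATE needs an exclusive table lock and timed out in this case:
-- TRUNCATE TABLE db1.tbl1;
-- DELETE on the ACID table completed in about 2 minutes with no lock error:
DELETE FROM db1.tbl1;
-- Optional: trigger a major compaction afterwards to clean up the delete deltas.
ALTER TABLE db1.tbl1 COMPACT 'major';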

Hive's concurrency support for ACID tables is not quite right.
As per https://community.hortonworks.com/content/supportkb/150639/hive-queries-randomly-fail-due-to-error-in-acquiri.html
Disable concurrency support with
(set hive.support.concurrency=false)
and restart the affected components.
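A minimal sketch of that workaround in a single Hive session, assuming (as in the question) that no other user or job touches the table; db1.tbl1 is the name taken from the lock error:
-- Per the linked article: disable lock acquisition for this session, then truncate.
set hive.support.concurrency=false;
TRUNCATE TABLE db1.tbl1;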

Related

Sessions are locking each other while direct insert into subpartition with name of subpartition specified

We have one large table whose data we need to insert into another table.
Target table is partitioned by range (by day) and subpartitioned by departments.
For loading the table data, we have used dbms_parallel_execute and created a task using SQL that gets the list of departments, with a parallel level of 20, i.e. at any one time only 20 tasks corresponding to 20 departments will run. Each task selects the department's data from the source table and inserts it into the target table.
Before doing the insert, we first get the subpartition name and generate the following insert:
INSERT /*+ NO_GATHER_OPTIMIZER_STATISTICS ENABLE_PARALLEL_DML APPEND_VALUES */ into Target_Table subpartition (subpartition_name) values (:B1, :B2, :B3, ....) ;
We read in the Oracle documentation that specifying the subpartition during insert will lock only that subpartition, and append will work. The goal was to create n jobs that independently insert into their own given subpartitions. Append itself is working, but when we monitor v$session while loading table data, we see that
BLOCKING_SESSION_STATUS is VALID;
FINAL_BLOCKING_SESSION_STATUS is VALID;
EVENT# is library cache lock
STATE is WAITING,
WAIT_CLASS is Concurrency
From this, we conclude that one APPEND_VALUES insert is still blocking other sessions from inserting into another subpartition. Is there something we missed? We have enabled parallel DML, disabled the target table's indexes, set skip_unusable_indexes to true, there are no referential constraints on the target table, and the table, partitions and subpartitions are set to NOLOGGING.
EDIT: I tested the same thing with another table that is also partitioned, but it doesn't have subpartitions; it is only list partitioned. So instead of subpartition (subpartition_name) inside the insert statement there was partition (partition_name). However, in this case the inserts ran without sessions waiting for each other and no locks were held. I am assuming the above won't work with subpartitioned interval tables.
EDIT2: I have created the same scenario in another database, which is also Oracle 19c: created a table with partitions and subpartitions, set the interval, disabled the indexes, set NOLOGGING and ran the job that inserts into the subpartitions. Surprisingly, the insert ran without errors and no sessions locked each other. Now I am thinking maybe it's some database parameter that should be turned on or changed, because the database versions, table structures, jobs and inserts are the same, yet in one database the sessions lock each other and in the other they do not.
UPDATE: Adding the insert part of the code:
if c_tab_cursor%isopen then
  close c_tab_cursor;
end if;
open c_tab_cursor;
loop
  -- Fetch the source rows in batches of 100,000.
  fetch c_tab_cursor bulk collect into v_row limit 100000;
  exit when v_row.count = 0;
  -- Direct-path insert of the batch into the named subpartition.
  forall i in v_row.first .. v_row.last
    insert /*+ NO_GATHER_OPTIMIZER_STATISTICS APPEND_VALUES */ into
      Target_Table subpartition (SYS_P68457)
    values v_row(i);
  -- APPEND_VALUES requires a commit before the segment can be queried again.
  commit;
end loop;
close c_tab_cursor;
EDIT3: Adding table info. The table is daily partitioned, and each partition has around 150 subpartitions; at the time of writing, the table had 177845 subpartitions in total. My other guess is that Oracle is spending a lot of time finding the right subpartition, which is also arguable because the subpartition name is provided during the insert.
I'd say it is an expected "feature" when you insert into the same segment. A direct-path insert writes data beyond the HWM (high water mark) rather than using the segment's free space map.
When you commit a direct-path insert the HWM advances; when you roll back, the HWM stays and the data is discarded.
Check the Oracle segment parameter FREELISTS, but I'm afraid even this parameter won't help you.
When your inserts touch different subpartitions, this should not be happening.
There can be various objects held by a library cache lock (maybe due to a bug).
IMHO the only way to investigate this would be either to use hanganalyze to check which function in Oracle is being blocked, or to query the P1, P2, P3 parameters of the library cache lock and identify which object is blocking the parallel run.
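A sketch of the second approach, assuming you can query V$SESSION and (for the handle lookup) the SYS-owned X$KGLOB view; the exact meaning of P1/P2/P3 for this event can vary by version:
-- Sketch: sessions currently waiting on "library cache lock".
-- For this event, P1 is normally the library cache object handle address.
select sid, event, p1raw, p2raw, p3
from v$session
where event = 'library cache lock';
-- Map the handle address to the object being locked (requires SYS / X$ access).
select kglnaown as owner, kglnaobj as object_name
from x$kglob
where kglhdadr = hextoraw('&handle_address');  -- paste P1RAW from the query above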
PS: I have seen bugs like this: only one session could run a Java stored procedure at a time because Oracle unnecessarily wanted to hold an exclusive lock on some library cache object.
v$session reports the wait state at that precise instant that you query it. It's meaningless unless you keep requerying and keep seeing the same thing. Better yet, use v$active_session_history to see Oracle's own 1-second sampling of the wait state. If you see lots of rows with that wait, then it's meaningful.
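For example, a sketch of such a check (querying ASH requires the Diagnostics Pack license; the one-hour window is an arbitrary assumption):
-- Sketch: how often sessions were sampled waiting on "library cache lock"
-- during the last hour of the load.
select event, count(*) as sampled_seconds
from v$active_session_history
where sample_time > systimestamp - interval '1' hour
  and session_state = 'WAITING'
group by event
order by sampled_seconds desc;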
Assuming that this is meaningful, I would point out that you are using a single-row VALUES list and yet are asking for parallel DML. Parallel DML is for multiple-row operations, not single-row operations. You can use it for an insert-select, for example, but not an insert-values.
If your application is necessarily single-row driven, remove the ENABLE_PARALLEL_DML and APPEND_VALUES hints. If you are binding arrays to these variables, you can keep APPEND_VALUES but remove ENABLE_PARALLEL_DML. For inserts, parallel DML only works with insert-select.
As you clearly intend to have multiple sessions, each loading a separate subpartition, that's your parallelism right there - you neither need nor want to add another layer of parallelism with PDML.
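If the load could be rewritten as an insert-select, a minimal sketch of a parallel DML version might look like this (Source_Table, department_id and the degree of 8 are assumptions, not part of the question):
-- Sketch only: Oracle uses parallel DML for set-based inserts like this,
-- not for single-row VALUES inserts.
alter session enable parallel dml;

insert /*+ APPEND PARALLEL(t, 8) */ into Target_Table t
select /*+ PARALLEL(s, 8) */ *
from Source_Table s
where s.department_id = :dept_id;  -- one department per task, as in the question

commit;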

Sqoop: sqoop export DB2 lock

Does sqoop export issue any locks while exporting data from Hive to DB2?
What type of lock does it issue? If there is a lock, how are these locks released?
I get a validation error because there are parallel sqoop export processes running on the same DB2 table, hence I'm wondering whether any locks are issued and of what type.
Yes Aavik, DB2 supports locks. There are three main types of lock:
1. S lock (share)
2. U lock (update)
3. X lock (exclusive)
When you scan the table to read data, e.g. when you do SELECT * FROM table WHERE <condition>, DB2 performs a read operation and applies an S lock to the table, meaning other requests can read the data but cannot update or write it.
When you run update transactions on the table, it applies a U lock.
When you insert new data, it acquires an X lock, meaning it doesn't allow any read or update operations.
So, when you do a sqoop export from Hive to DB2, it acquires an X lock on the table as it is inserting new records.
When you do a sqoop import, it acquires an S lock on the table.
This is a very common problem; you have a few options to overcome it:
1. Maintain separate views/tables for regular transactions.
2. Increase the maximum number of retries, or write a script that checks whether DB2 is free of locks before starting the export (see the sketch below); basically you have to create a dependency. I know this becomes a bit complicated, and there may be better ways to do it.
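A rough sketch of such a check, assuming DB2 LUW and the SYSIBMADM.SNAPLOCK administrative view; the view and its columns vary by DB2 version (and are deprecated in newer releases), and the schema and table names are placeholders:
-- Sketch: count locks currently held on the export target table.
-- Launch the sqoop export only when this returns 0.
SELECT COUNT(*) AS lock_count
FROM SYSIBMADM.SNAPLOCK
WHERE TABSCHEMA = 'MYSCHEMA'   -- placeholder: schema of the target table
  AND TABNAME = 'MYTABLE';     -- placeholder: table being exported to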
Hope this gives you a better understanding.

How to purge an Advanced Queue in Oracle

The documentation is clear about how to purge an Oracle AQ:
dbms_aqadm.purge_queue_table()
However, what happens to the storage, especially the high water marks of the queue table, the indexes and the LOB segments? Is it necessary to shrink the table, too?
In production, the queues are nearly always empty (as they should), but in our test system, they fill up to millions of rows for various reasons, so they need to be emptied sometimes.
Is it necessary to look at the underlying tables and indexes, or is this taken care of automatically?
Many thanks!
DBMS_AQADM.PURGE_QUEUE_TABLE is the equivalent of TRUNCATE TABLE. Also look at the error message you get when you try to truncate a queue table:
ORA-24005: Inappropriate utilities used to perform DDL on AQ table %s.%s
*Cause: An attempt was made to use the SQL command DROP TABLE or TRUNCATE TABLE or ALTER TABLE on queue metadata or tables.
*Action: Use DBMS_AQADM.DROP_QUEUE_TABLE instead of DROP TABLE, and DBMS_AQADM.PURGE_QUEUE_TABLE instead of TRUNCATE TABLE. ALTER TABLE redefinition based only on the ALTER_TABLE_PROPERTIES and ALTER_TABLE_PARTITIONING clauses is allowed.
Tom Kyte has already written about frequently truncating tables: https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:47911859692542
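For reference, a minimal sketch of the purge call (the queue table name is a placeholder; purge_condition can be used to purge only a subset of messages):
-- Sketch: purge all messages from a queue table.
DECLARE
  po dbms_aqadm.aq$_purge_options_t;
BEGIN
  po.block := FALSE;  -- do not wait for an exclusive lock on the queue table
  dbms_aqadm.purge_queue_table(
    queue_table     => 'MY_QUEUE_TABLE',  -- placeholder name
    purge_condition => NULL,              -- NULL purges everything
    purge_options   => po);
END;
/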

Not able to apply dynamic partitioning for a huge data set in Hive

I have a table test_details with some 4 million records. Using the data in this table, I have to create a new partitioned table test_details_par with records partitioned on visit_date. Creating the table is not a challenge, but when I come to the part where I have to INSERT the data using dynamic partitions, Hive gives up when I try to insert data for a larger number of days. If I do it for 2 or 3 days the MapReduce job runs successfully, but for more days it fails with a Java heap space error or GC error.
A Simplified Snapshot of my DDLs is as follows:
CREATE TABLE test_details_par (visit_id INT, store_id SMALLINT) PARTITIONED BY (visit_date DATE);
INSERT INTO TABLE test_details_par PARTITION(visit_date) SELECT visit_id, store_id, visit_date FROM test_details DISTRIBUTE BY visit_date;
I have tried setting these parameters, so that Hive executes my job in a better way:
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions.pernode=10000;
Is there anything that I am missing to run the INSERT for a complete batch without specifying the dates specifically?
Neels,
Hive 12 and below have well-known scalability issues with dynamic partitioning that will be addressed with Hive 13. The problem is that Hive attempts to hold a file handle open for each and every partition it writes out, which causes out of memory and crashes. Hive 13 will sort by partition key so that it only needs to hold one file open at a time.
You have 3 options as I see it:
Change your job to insert only a few partitions at a time (see the sketch after this list).
Wait for Hive 13 to be released and try that (2-3 months to wait).
If you know how, build Hive from trunk and use it to complete your data load.
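A rough sketch of option 1, assuming the tables from the question and batching by month (the specific date range is an arbitrary assumption):
-- Sketch: load one month of partitions per run instead of the whole table at once.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

INSERT INTO TABLE test_details_par PARTITION (visit_date)
SELECT visit_id, store_id, visit_date
FROM test_details
WHERE visit_date >= '2014-01-01' AND visit_date < '2014-02-01'
DISTRIBUTE BY visit_date;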

ORA-00054 while loading large data file

I get ORA-00054 while loading large data files (~10 GB).
The error occurs when a new file is loaded after a previous file.
Any ideas how I can solve this?
One possible scenario: is this a direct-path load? If so, please check the v$locked_object view and see whether the table is being locked by someone during your load.
select dbao.object_name
from v$locked_object vlo,
     dba_objects dbao
where vlo.object_id = dbao.object_id
  and dbao.object_name = 'Table that you are trying to load...'
From the Oracle Documentation at http://download.oracle.com/docs/cd/B10500_01/server.920/a96524/c21dlins.htm
Locking Considerations with Direct-Path INSERT
During direct-path INSERT, Oracle obtains exclusive locks on the table (or on all partitions of a partitioned table). As a result, users cannot perform any concurrent insert, update, or delete operations on the table, and concurrent index creation and build operations are not permitted. Concurrent queries, however, are supported, but the query will return only the information before the insert operation.
Maybe this is linked to tablespace datafile sizes or table size, because ORA-00054 usually appears when an ALTER statement is run.
I do not pretend to be right here.
Check these views:
DBA_BLOCKERS - Shows non-waiting sessions holding locks being waited-on
DBA_DDL_LOCKS - Shows all DDL locks held or being requested
DBA_DML_LOCKS - Shows all DML locks held or being requested
DBA_LOCK_INTERNAL - Displays 1 row for every lock or latch held or being requested with the username of who is holding the lock
DBA_LOCKS - Shows all locks or latches held or being requested
DBA_WAITERS - Shows all sessions waiting on, but not holding, waited-for locks
http://www.dba-oracle.com/t_ora_00054_locks.htm
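For example, a quick sketch using two of those views (the column names below are the documented ones; verify them against your database version):
-- Sketch: who is waiting on whom while the load is running.
select waiting_session, holding_session, lock_type, mode_held, mode_requested
from dba_waiters;
-- Sessions holding locks that other sessions are waiting for.
select holding_session
from dba_blockers;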
Your table seems to be locked: ORA-00054
It can be because of the way the Oracle driver handles BLOB types (the driver locks the record, opens a stream to write the binary data, and needs "some help" to release the record).
I would try the following sequence:
Load the first file
COMMIT;
Load the second file
