Hive table locks - hadoop

I have Hive tables that are queried by statements stored in a file.
I started an Oozie workflow that ran a Hive action on that file.
The job did not succeed, so I killed the workflow.
However, the tables still show as locked in the Hive CLI. I am looking for a command or process that will release the locks on the Hive tables.

The following statements release the lock:
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager;
unlock table tablename;

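If you use MySQL as the metastore database, Hive stores table lock information in the HIVE_LOCKS table. Truncate it (this clears every lock, including any held by running queries); afterwards the table is empty: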
mysql> select * from HIVE_LOCKS;
Empty set (0.00 sec)
mysql>

To check the locks on a table (run in Hive):
show locks tablename extended;
To find the application ID of the long-running query, run the following outside Hive. Pass the user reported by the query above; you can verify the agent info from that query against the application name in the output.
yarn application -list | grep User
To kill the application:
yarn application -kill activityid
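The lookup step above can also be scripted. The following is a minimal sketch that filters the text printed by yarn application -list on the User column instead of using grep. It assumes the listing is tab-separated with the application ID in the first column and the user in the fourth; the function name and the sample listing are hypothetical, so check the layout against your own cluster's output.

```python
from typing import List


def find_application_ids(listing: str, user: str) -> List[str]:
    """Return the IDs of applications in `listing` whose User column equals `user`."""
    ids = []
    for line in listing.splitlines():
        fields = [field.strip() for field in line.split("\t")]
        # Data rows start with an ID of the form application_<timestamp>_<sequence>;
        # this check skips the summary line and the column header.
        if len(fields) < 4 or not fields[0].startswith("application_"):
            continue
        # Assumption: the fourth column holds the submitting user.
        if fields[3] == user:
            ids.append(fields[0])
    return ids


# Hypothetical sample in the assumed layout; not captured from a real cluster.
SAMPLE = (
    "Total number of applications (application-types: [] and states: [RUNNING]):2\n"
    "Application-Id\tApplication-Name\tApplication-Type\tUser\tQueue\tState\n"
    "application_1500000000000_0001\tHIVE-query-a\tMAPREDUCE\talice\tdefault\tRUNNING\n"
    "application_1500000000000_0002\tHIVE-query-b\tMAPREDUCE\tbob\tdefault\tRUNNING\n"
)

if __name__ == "__main__":
    for app_id in find_application_ids(SAMPLE, "alice"):
        # Print the kill command rather than running it, so it can be reviewed first.
        print("yarn application -kill " + app_id)
```

Matching on the user column exactly avoids killing an application whose name merely contains the user's name, which a plain grep would also match.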

I ran into a similar problem in Hive 3. Reading the source code of org.apache.hadoop.hive.metastore.txn.TxnHandler, I found a function called performTimeOuts(), which is scheduled periodically by a daemon thread called org.apache.hadoop.hive.metastore.txn.AcidHouseKeeperService.
That daemon thread automatically cleans outdated lock information from the MySQL table hive.hive_locks, but it is not enabled by default, so we need to enable it in hive-site.xml, like this:
<property>
<name>metastore.task.threads.always</name>
<value>org.apache.hadoop.hive.metastore.events.EventCleanerTask,org.apache.hadoop.hive.metastore.RuntimeStatsCleanerTask,org.apache.hadoop.hive.metastore.repl.DumpDirCleanerTask,org.apache.hadoop.hive.metastore.txn.AcidHouseKeeperService</value>
</property>
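To illustrate what "cleaning outdated lock information" means, here is a simplified model of a timeout sweep. It is not the code in TxnHandler; it assumes each lock records the time of its holder's last heartbeat and that a lock is outdated once that heartbeat is older than a configured timeout. The Lock class, the function name, and the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Lock:
    table: str
    last_heartbeat: float  # seconds since the epoch (assumed bookkeeping field)


def drop_timed_out(locks: List[Lock], now: float, timeout_seconds: float) -> List[Lock]:
    """Keep only the locks whose holder sent a heartbeat within the timeout."""
    return [lock for lock in locks if now - lock.last_heartbeat <= timeout_seconds]


if __name__ == "__main__":
    locks = [
        Lock("orders", last_heartbeat=1000.0),     # holder was killed, no more heartbeats
        Lock("customers", last_heartbeat=1290.0),  # holder is still alive
    ]
    remaining = drop_timed_out(locks, now=1400.0, timeout_seconds=300.0)
    print([lock.table for lock in remaining])
```

In this model the lock left behind by the killed workflow disappears on the next sweep, while the lock of a live query survives because its heartbeat is recent.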

Related

Hive metastore update

I have a script that creates Hive tables, loads data into them, and at the end deletes those tables for a given date.
When I run the same script again, I get a "Table already exists" error; the Hive metadata seems to be updated late.
Can someone advise?
The root cause may be that other Hive jobs are using the same table names.
You can try the following. Add IF NOT EXISTS to the CREATE TABLE statement (followed by your column definitions):
CREATE TABLE IF NOT EXISTS xyz
Or drop any leftover table before creating it:
DROP TABLE IF EXISTS xyz;
To check the metastore status:
service hive-metastore status
If it is not started or is dead, run:
service hive-metastore start
For more information on external tables, see "Hive: Create Table and Partition By".

Why is mapreduce not Executed for Hive queries?

I have a Hive table, and when I run select * from table where <condition>; it returns results immediately without invoking an MR job. When I create a duplicate of the table and run the same query, MR is invoked. What could be the reason for this?
I found the answer. The reason was that the Hive ANALYZE command had been issued on the table. Once you run ANALYZE, Hive stores the number of rows and the file size in the Hive metastore. So when you run select count(*) from table, Hive fetches the result directly from the metastore instead of invoking a MapReduce job.
You can also issue an ANALYZE command on columns.
ANALYZE TABLE [db_name.]tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] -- (Note: Fully support qualified table name since Hive 1.2.0, see HIVE-10007.)
COMPUTE STATISTICS
[FOR COLUMNS] -- (Note: Hive 0.10.0 and later.)
[CACHE METADATA] -- (Note: Hive 2.1.0 and later.)
[NOSCAN];
Documentation link :
https://cwiki.apache.org/confluence/display/Hive/StatsDev
Local mode (hive not invoking MR) depends on several conditions (see HIVE-1408):
hive.exec.mode.local.auto=true/false - Lets Hive determine whether to run in local mode automatically.
hive.exec.mode.local.auto.input.size.max=1G - When hive.exec.mode.local.auto is true, input bytes should be less than this for local mode.
hive.exec.mode.local.auto.input.files.max=4 - When hive.exec.mode.local.auto is true, the number of tasks should be less than this for local mode.
If the tables have the same data, my guess is that the number of tasks spawned differs between the two tables, causing one query to run in local mode and the other to spawn an MR job.
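The conditions listed above can be written as a small decision function. This is a simplified model of the three settings as described in this answer, not Hive's actual implementation; the function name is hypothetical, and the default limits simply reuse the values shown in the list (1G and 4).

```python
def runs_in_local_mode(auto_enabled: bool,
                       input_bytes: int,
                       task_count: int,
                       max_input_bytes: int = 1 << 30,
                       max_tasks: int = 4) -> bool:
    """Model of the local-mode check described above.

    auto_enabled    -> hive.exec.mode.local.auto
    max_input_bytes -> hive.exec.mode.local.auto.input.size.max
    max_tasks       -> hive.exec.mode.local.auto.input.files.max
    """
    if not auto_enabled:
        return False
    # Both limits are strict: the values must be less than the thresholds.
    return input_bytes < max_input_bytes and task_count < max_tasks


if __name__ == "__main__":
    same_data = 200 * 1024 * 1024  # both tables hold the same 200 MB of data
    # Same data, different task counts: only the first query qualifies.
    print(runs_in_local_mode(True, same_data, task_count=3))
    print(runs_in_local_mode(True, same_data, task_count=6))
```

Under this model, two tables with identical data can behave differently purely because one is split into more tasks than the threshold allows.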

How to reset mload after job?

I am running an mload script on a table that is cleared at the beginning of a job. When the script fails, the error and log tables prevent the job from running a second time. How does one reset mload cleanly after a failure?
You need to drop the work, error, and log tables:
DROP TABLE UV_mytable;
DROP TABLE ET_mytable;
DROP TABLE WT_mytable;
DROP TABLE LT_mytable;
And then release the load lock:
RELEASE MLOAD mytable;
If this fails:
RELEASE MLOAD mytable IN APPLY;
But why does the job fail at all?
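If the reset has to be repeated for several tables, the statements above can be generated rather than typed. A minimal sketch, assuming the work, error, and log tables follow the UV_, ET_, WT_, and LT_ naming used in this answer (the names in your mload script may differ); the function name is hypothetical.

```python
from typing import List


def mload_reset_statements(table: str) -> List[str]:
    """Build the cleanup statements for one mload target table."""
    # Prefixes as used in the answer above; adjust them to match your script.
    prefixes = ("UV_", "ET_", "WT_", "LT_")
    statements = ["DROP TABLE {}{};".format(prefix, table) for prefix in prefixes]
    # Release the load lock last. If this statement fails, the answer above
    # suggests retrying with: RELEASE MLOAD <table> IN APPLY;
    statements.append("RELEASE MLOAD {};".format(table))
    return statements


if __name__ == "__main__":
    print("\n".join(mload_reset_statements("mytable")))
```

The output is plain SQL text that can be reviewed before it is submitted to the database.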

Hive Locks entire database when running select on one table

Hive 0.13 takes a SHARED lock on the entire database (I see a node like LOCK-0000000000 as a child of the database node in ZooKeeper) when running a select statement on any table in the database. Because the shared lock covers the entire schema, CREATE/DELETE statements on other tables in the database freeze until the original query finishes and the lock is released.
Does anybody know a way around this? The following link suggests turning concurrency off, but we can't do that: we replace the entire table, and we have to make sure that no select statement is accessing it before we replace its contents.
http://mail-archives.apache.org/mod_mbox/hive-user/201408.mbox/%3C0eba01cfc035$3501e4f0$9f05aed0$#com%3E
use mydatabase;
select count(*) from large_table limit 1; -- this table is very large and hive.support.concurrency=true
In another Hive shell, while the first query is executing:
use mydatabase;
create table sometable (id string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE ;
The problem is that the “create table” does not execute until the first query (select) has finished.
Update:
We are using Cloudera's distribution of Hive CDH-5.2.1-1 and we are seeing this issue.
I don't think Hive 0.13 was designed to behave that way. Please check your ResourceManager and verify that you have enough memory when executing multiple Hive queries.
A Hive query typically triggers a MapReduce job, and if YARN doesn't have enough resources, the job waits until the previously running job completes. Please approach the issue from a memory point of view.
All the best !!

Hive is not showing tables

I am new to the Hadoop and Hive world.
I have a strange problem. While working at the Hive prompt, I created a few tables, and Hive showed them.
After exiting the Hive session and starting the Hive terminal again, "show tables;" does not show any tables! I can see the tables under '/user/hive/warehouse' in HDFS.
What am I doing wrong? Can you please help me with this?
BalduZ is right. Set this in $HIVE_HOME/conf/hive-site.xml:
property name = javax.jdo.option.ConnectionURL
property value = jdbc:derby:;databaseName=/home/youruser/hive_metadata/metastore_db;create=true
From then on you can run hive from any directory. This will solve your problem.
I assume you are using the default configuration, so the problem is where you call hive to start working, since you need to call it from the same directory in order to see the tables you created in the previous hive session.
For example, if you call hive when you are in ~/test/hive and create some tables, and the next time you use hive you start it from ~/test you will not see the tables you created earlier. The easiest solution is to always start hive from the same directory.
However, a better solution would be to configure hive so that it uses a database like MySQL as a metastore. You can find how to do this here.
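The directory dependence described above comes from how a relative database name is resolved. The sketch below models that resolution with plain path arithmetic; it assumes the default configuration uses the relative name metastore_db and that Derby resolves it against the directory from which hive is started. The function name and the example directories are hypothetical.

```python
import posixpath

DEFAULT_DB_NAME = "metastore_db"  # assumed relative name in the default configuration


def metastore_location(database_name: str, working_dir: str) -> str:
    """Return the path at which the embedded metastore would be opened or created."""
    # posixpath.join keeps an absolute database_name unchanged and resolves
    # a relative one against working_dir.
    return posixpath.normpath(posixpath.join(working_dir, database_name))


if __name__ == "__main__":
    # Relative name: each start directory gets its own, initially empty, metastore.
    print(metastore_location(DEFAULT_DB_NAME, "/home/youruser/test/hive"))
    print(metastore_location(DEFAULT_DB_NAME, "/home/youruser/test"))
    # Absolute name, as in the hive-site.xml setting above: same metastore everywhere.
    absolute = "/home/youruser/hive_metadata/metastore_db"
    print(metastore_location(absolute, "/home/youruser/test/hive"))
    print(metastore_location(absolute, "/home/youruser/test"))
```

This is why the tables remain visible in HDFS but vanish from "show tables;": the data files are still in the warehouse, while the metadata lives in a different metastore_db directory.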
