Hive SELECT COUNT(*) FileNotFoundException for job.splitmetainfo - hadoop

I have HiveServer2 running and wrote a Java program to query Hive.
I tried this query:
SELECT * FROM table1
where 'table1' is the table name in Hive, and it works fine and gives me the results.
But when I tried to run
SELECT COUNT(*) FROM table1
it threw an exception:
Exception in thread "main" java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
I checked the logs and this was recorded:
Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://vseccoetv04:9000/tmp/hadoop-yarn/staging/anonymous/.staging/job_1453359797695_0017/job.splitmetainfo
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1568)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1432)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1390)
....
I checked in a number of places; other people have also hit 'FileNotFoundException', but not for this reason.
Is there any way to solve this problem?

Okay,
I figured out the problem myself :)
I had added some properties to the hive-site.xml file earlier to check support for transactions. I think I might have added some wrong values there. I have now removed the properties I added and restarted the Hive service. Everything works fine :D
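For reference, the transaction-related settings usually involved look something like the following. The property names come from the standard Hive documentation; the exact properties and values the poster had used are not shown in the question, so this is only an illustrative sketch:

-- Illustrative only: typical Hive transaction (ACID) settings. Misconfiguring
-- these, or enabling them on a cluster that is not set up for ACID tables,
-- can make otherwise simple queries fail at execution time.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;

The same properties can be placed in hive-site.xml instead of being set per session.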

Related

How to catch DB Error in Sub Job in Talend

I have the design below in Talend. I am catching the error when a component fails, but if there is an error from the DB, such as 'Cannot insert as parent key not found' or 'Cannot insert into column col2, expected 15 but actual 16', it does not show any error when the insert job is run as a subjob.
If I run the job FACTDIM_COMBINE directly I can see the error, but when it is run as a subjob I am not able to see it.
Please help me get the DB error when it is run as a subjob as well.
Please use the tLogCatcher component in your job. It will log all the errors, even in the subjobs. Also enable the 'Die on error' option in all the components wherever necessary.

How to make GROUP BY work in a Hive query (error:26:25)

I have a really simple Hive table and I'm trying to query it with a GROUP BY clause. When I run the query I get this error:
org.apache.hive.service.cli.HiveSQLException:Expected states: [FINISHED], but found ERROR:26:25
Any help is appreciated.
A simple SELECT query works fine, but when I add the GROUP BY clause it starts failing.
This works:
SELECT city,
count(*)
FROM cust_sales;
This fails:
SELECT city,
count(*)
FROM cust_sales
GROUP BY city;
cust_sales has only 2 columns: city (varchar) and amount (int).
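For context, here is a sketch of the table as described; the column names and types come from the description above, but the exact DDL (including the varchar length) is an assumption:

-- Hypothetical DDL matching the description of cust_sales
CREATE TABLE cust_sales (
  city   VARCHAR(100),
  amount INT
);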
The full error I'm getting:
TFetchResultsResp(results=None, status=TStatus(infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Expected states: [FINISHED], but found ERROR:26:25',
'org.apache.hive.service.cli.operation.Operation:assertState:Operation.java:197',
'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:441',
'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:328',
'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:910', 'sun.reflect.GeneratedMethodAccessor149:invoke::-1',
'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43',
'java.lang.reflect.Method:invoke:Method.java:498',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78',
'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36',
'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63',
'java.security.AccessController:doPrivileged:AccessController.java:-2',
'javax.security.auth.Subject:doAs:Subject.java:422',
'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1730',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59',
'com.sun.proxy.$Proxy71:fetchResults::-1',
'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:564',
'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:786',
'org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1837',
'org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1822',
'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
'org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor:process:HadoopThriftAuthBridge.java:647',
'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286',
'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149',
'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624',
'java.lang.Thread:run:Thread.java:748'],
sqlState=None, statusCode=3,
errorCode=0, errorMessage='Expected states: [FINISHED], but found ERROR'), hasMoreRows=None)
I had the same error. It was resolved when I restarted my Tez service. The GROUP BY query still timed out through Superset, but I was able to SSH into my Vagrant VM and then complete the query via Hive. Hope that helps.
It was a problem with permissions. I had to fix the YARN permissions (it was using Ranger's ACLs and its own, instead of just Ranger's) by setting ranger.add-yarn-authorization to false.
Now everything works fine.

Hive/Hadoop intermittent failure: Unable to move source to destination

There have been some SO articles about the Hive/Hadoop 'Unable to move source' error. Many of them point to a permission problem.
However, at my site I saw the same error and I am quite sure it is not related to permissions, because the problem is intermittent -- it worked one day but failed on another day.
I therefore looked more deeply into the error message. It was complaining about failing to move from a
.../.hive-staging_hive.../-ext-10000/part-00000-${long-hash}
source path to a destination path of
.../part-00000-${long-hash}
Does this observation ring a bell with anyone?
This error was triggered by a super simple test query: just inserting a row into a test table (see below).
Error message:
org.apache.hadoop.hive.ql.metadata.HiveException:
Unable to move source
hdfs://namenodeHA/apps/hive/warehouse/some_db.db/testTable1/.hive-staging_hive_2018-02-02_23-02-13_065_2316479064583526151-5/-ext-10000/part-00000-832944cf-7db4-403b-b02e-55b6e61b1af1-c000
to destination
hdfs://namenodeHA/apps/hive/warehouse/some_db.db/testTable1/part-00000-832944cf-7db4-403b-b02e-55b6e61b1af1-c000;
Query that triggered this error (but only intermittently):
insert into testTable1
values (2);
Thanks for all the help. I have found a solution, so I am providing my own answer here.
The problem was with a CTAS (create table as ...) operation that preceded the failing insert command and closed the file system inappropriately. The telltale sign was an 'IOException: Filesystem closed' message shown together with the failing 'HiveException: Unable to move source ... to destination' operation. (I found the log message in my Spark Thrift Server log, not my application log.)
Caused by: java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
at org.apache.hadoop.hdfs.DFSClient.getEZForPath(DFSClient.java:3288)
at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:2093)
at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:289)
at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1221)
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2607)
The solution was actually from another SO article: https://stackoverflow.com/a/47067350/1168041
But here I provide an excerpt in case that article is gone:
Add this property to hdfs-site.xml:
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
Reason: Spark and HDFS use the same API (at the bottom they use the same instance).
When Beeline closes a filesystem instance, it closes the Thrift Server's filesystem instance too. The next time Beeline tries to get an instance, it will always report "Caused by: java.io.IOException: Filesystem closed".
Please check this issue here:
https://issues.apache.org/jira/browse/SPARK-21725
I was not using Beeline, but the problem with CTAS was the same.
My test sequence:
insert into testTable1 values (11);
create table anotherTable as select 1;
insert into testTable1 values (12);
Before the fix, any insert would fail after the create table as ...
After the fix, this problem was gone.

Can we insert into external table

I am debugging Big Data code in my company's production environment. Hive returns the following error:
Exception: org.apache.hadoop.hive.ql.lockmgr.LockException: No record of lock could be found, may have timed out
Killing DAG...
Execution has failed.
Exception in thread "main" java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:282)
at org.apache.hive.jdbc.HiveStatement.executeUpdate(HiveStatement.java:392)
at HiveExec.main(HiveExec.java:159)
After investigation, I found that this error could be caused by BoneCP in the connectionPoolingType property, but the cluster support team told me they fixed this bug by upgrading BoneCP.
My question is: can we INSERT INTO an external table in Hive? I have doubts about the insertion script.
Yes, you can insert into an external table.
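A minimal sketch for illustration; the table name, schema, and HDFS location are made up, and INSERT ... VALUES needs Hive 0.14 or later:

-- Create an external table backed by an HDFS directory, then insert into it.
CREATE EXTERNAL TABLE ext_demo (id INT)
LOCATION '/tmp/ext_demo';

INSERT INTO TABLE ext_demo VALUES (1);

SELECT * FROM ext_demo;

The insert writes new files under the table's LOCATION; dropping the table later removes only the metadata and leaves those files in place, which is the defining behaviour of an external table.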

Hive - error while using dynamic partition query

I am trying to execute the query below:
INSERT OVERWRITE TABLE nasdaq_daily
PARTITION(stock_char_group)
select exchage, stock_symbol, date, stock_price_open,
stock_price_high, stock_price_low, stock_price_close,
stock_volue, stock_price_adj_close,
SUBSTRING(stock_symbol,1,1) as stock_char_group
FROM nasdaq_daily_stg;
I have already set hive.exec.dynamic.partition=true and hive.exec.dynamic.partition.mode=nonstrict.
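For reference, these two settings written out as Hive commands (they can also go in hive-site.xml); note the exact property names:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;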
The nasdaq_daily_stg table contains proper information in the form of a number of CSV files. When I execute this query, I get this error message:
Caused by: java.lang.SecurityException: sealing violation: package org.apache.derby.impl.jdbc.authentication is sealed.
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.MapRedTask
The MapReduce job didn't start at all, so there are no logs for this error in the JobTracker web UI. I am using Derby to store metastore information.
Can someone help me fix this?
Please try this; it may be the issue. You may have the Derby classes twice on your classpath:
"SecurityException: sealing violation" when starting Derby connection
