Hive/Hadoop intermittent failure: Unable to move source to destination

There are several SO posts about the Hive/Hadoop "Unable to move source" error. Many of them point to a permission problem.
However, at my site I saw the same error, and I am quite sure it is not a permission problem, because it is intermittent: it worked one day but failed on another.
I then looked more closely at the error message. It complained about failing to move a file from the source path
.../.hive-staging_hive.../-ext-10000/part-00000-${long-hash}
to the destination path
.../part-00000-${long-hash}
Does this observation ring a bell with anyone?
The error was triggered by a very simple test query that just inserts one row into a test table (see below).
Error message
org.apache.hadoop.hive.ql.metadata.HiveException:
Unable to move source
hdfs://namenodeHA/apps/hive/warehouse/some_db.db/testTable1/.hive-staging_hive_2018-02-02_23-02-13_065_2316479064583526151-5/-ext-10000/part-00000-832944cf-7db4-403b-b02e-55b6e61b1af1-c000
to destination
hdfs://namenodeHA/apps/hive/warehouse/some_db.db/testTable1/part-00000-832944cf-7db4-403b-b02e-55b6e61b1af1-c000;
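The two paths in the message differ only by the staging components. As a minimal sketch (the helper name final_destination is hypothetical, and it assumes the staging layout is always a `.hive-staging_*` directory followed by an `-ext-*` directory, as in the message above), the destination can be derived from the source like this:

```python
def final_destination(staging_file: str) -> str:
    """Drop the `.hive-staging_*` and `-ext-*` components from a staged file path."""
    parts = staging_file.split("/")
    kept = [
        p for p in parts
        if not p.startswith(".hive-staging_") and not p.startswith("-ext-")
    ]
    return "/".join(kept)


# Source path copied from the error message above.
source = (
    "hdfs://namenodeHA/apps/hive/warehouse/some_db.db/testTable1/"
    ".hive-staging_hive_2018-02-02_23-02-13_065_2316479064583526151-5/"
    "-ext-10000/part-00000-832944cf-7db4-403b-b02e-55b6e61b1af1-c000"
)
destination = final_destination(source)
print(destination)
```

So the failing step is the final move of the written file out of the staging directory into the table directory, not the write itself.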
Query that triggered this error (but only intermittently)
insert into testTable1
values (2);

Thanks for all the help. I have found a solution. I am providing my own answer here.
The problem was caused by a "CTAS" (create table as ...) operation that preceded the failing insert command: it closed the file system inappropriately. The telltale sign was an IOException: Filesystem closed message logged together with the failing HiveException: Unable to move source ... to destination. (I found this message in my Spark Thrift Server log, not in my application log.)
Caused by: java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
at org.apache.hadoop.hdfs.DFSClient.getEZForPath(DFSClient.java:3288)
at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:2093)
at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:289)
at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1221)
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2607)
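To tell this case apart from a permission problem, it can help to check a log for both messages at once. The following is only a sketch: the function name and the sample log text are made up for illustration, and the check is a plain substring match.

```python
def has_closed_filesystem_signature(log_text: str) -> bool:
    """Return True when both telltale messages appear in the same log text."""
    return (
        "Unable to move source" in log_text
        and "java.io.IOException: Filesystem closed" in log_text
    )


# Shortened sample built from the messages quoted above.
sample_log = """\
org.apache.hadoop.hive.ql.metadata.HiveException:
Unable to move source ... to destination ...
Caused by: java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
"""

print(has_closed_filesystem_signature(sample_log))
```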
The solution came from another SO answer: https://stackoverflow.com/a/47067350/1168041
Here is an excerpt in case that answer disappears:
Add this property to hdfs-site.xml:
<property>
<name>fs.hdfs.impl.disable.cache</name>
<value>true</value>
</property>
Reason: Spark and HDFS use the same API (underneath, they share the same instance).
When one beeline session closes a filesystem instance, it closes the Thrift server's
filesystem instance too. When a second beeline session then tries to get an instance, it
always reports "Caused by: java.io.IOException: Filesystem closed".
Please check this issue here:
https://issues.apache.org/jira/browse/SPARK-21725
I was not using beeline but the problem with CTAS was the same.
My test sequence:
insert into testTable1
values (11);
create table anotherTable as select 1;
insert into testTable1
values (12);
Before the fix, any insert would fail after the create table as …
After the fix, this problem was gone.

Related

Can we insert into external table

I am debugging Big Data code in my company's production environment. Hive returns the following error:
Exception: org.apache.hadoop.hive.ql.lockmgr.LockException: No record of lock could be found, may have timed out
Killing DAG...
Execution has failed.
Exception in thread "main" java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:282)
at org.apache.hive.jdbc.HiveStatement.executeUpdate(HiveStatement.java:392)
at HiveExec.main(HiveExec.java:159)
After investigating, I found that this error could be caused by BoneCP being set in the connectionPoolingType property, but the cluster support team told me they had fixed this bug by upgrading BoneCP.
My question is: can we INSERT INTO an external table in Hive? I have doubts about the insertion script.
Yes, you can insert into an external table.

Oracle destination in SSIS data flow is failing with Error- ORA-01405: fetched column value is NULL

I have an SSIS package containing one DFT. In the DFT, I have one Oracle source and one Oracle destination.
In the Oracle destination I am using the Data Access Mode 'Table Name - Fast Load (Using Direct Path)'.
There is one strange issue with it: it fails with the following error
[Dest 1 [251]] Error: Fast Load error encountered during
PreLoad or Setup phase. Class: OCI_ERROR Status: -1 Code: 0 Note:
At: ORAOPRdrpthEngine.c:735 Text: ORA-00604: error occurred at
recursive SQL level 1 ORA-01405: fetched column value is NULL
I thought it was due to NULL values in the source, but there is no NOT NULL constraint on the destination table, so that should not be an issue. In addition, the package works fine with 'Normal Load' but not with 'Fast Load'.
I have tried using NVL for NULL values from the source, but still no luck.
I have also recreated the DFT with these connections, but that did not help either.
Can someone please help me with this?
It worked fine after I recreated the Oracle table with the same script.

Hive Select Count(*) filenotfound exception for job.splitmetainfo

I have a HiveServer2 instance running and wrote a Java program to query Hive.
I tried this query
SELECT * FROM table1
where 'table1' is the table name in Hive. It works fine and gave me the results.
But when I tried to run
SELECT COUNT(*) FROM table1
it threw an exception
Exception in thread "main" java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
I checked the logs and found this:
Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://vseccoetv04:9000/tmp/hadoop-yarn/staging/anonymous/.staging/job_1453359797695_0017/job.splitmetainfo
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1568)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1432)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1390)
....
I searched in a number of places; other people also got a FileNotFoundException, but not for this reason.
Is there any way to solve this problem?
Okay, I figured out the problem myself :)
I had added some properties to the hive-site.xml file earlier to check support for transactions. I think I might have added some wrong values there. I have now removed the properties I added and restarted the Hive service. Everything works fine :D

Hive - error while using dynamic partition query

I am trying to execute the query below:
INSERT OVERWRITE TABLE nasdaq_daily
PARTITION(stock_char_group)
select exchage, stock_symbol, date, stock_price_open,
stock_price_high, stock_price_low, stock_price_close,
stock_volue, stock_price_adj_close,
SUBSTRING(stock_symbol,1,1) as stock_char_group
FROM nasdaq_daily_stg;
I have already set hive.exec.dynamic.partition=true and hive.exec.dynamic.partition.mode=nonstrict.
The nasdaq_daily_stg table contains proper data in the form of a number of CSV files. When I execute this query, I get this error message:
Caused by: java.lang.SecurityException: sealing violation: package org.apache.derby.impl.jdbc.authentication is sealed.
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.MapRedTask
The MapReduce job didn't start at all, so there are no logs for this error in the JobTracker web UI. I am using Derby to store the metastore information.
Can someone help me fix this?
This may be the issue: you may have the Derby classes on your classpath twice. See the question
"SecurityException: sealing violation" when starting Derby connection
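One quick way to test this hypothesis is to list every classpath entry that looks like a Derby jar. The sketch below assumes a colon-separated classpath and jar file names that start with "derby"; the function name and the paths in the example are made up. It only produces candidates to inspect, since distinct Derby jars (for example client and tools jars) also match the pattern.

```python
import posixpath


def derby_jars(classpath, sep=":"):
    """Return the classpath entries whose file name looks like a Derby jar."""
    entries = [e for e in classpath.split(sep) if e]
    return [
        e for e in entries
        if posixpath.basename(e).lower().startswith("derby")
        and e.lower().endswith(".jar")
    ]


# Hypothetical classpath for illustration only.
classpath = (
    "/opt/hive/lib/derby-10.4.2.0.jar:"
    "/opt/hive/lib/hive-exec.jar:"
    "/opt/hadoop/lib/derby.jar"
)

found = derby_jars(classpath)
if len(found) > 1:
    print("More than one Derby jar on the classpath:", found)
```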

Creating index in hive 0.9

I am trying to create indexes on tables in Hive 0.9. One table has 1 billion rows, another has 30 million rows. The commands I used (apart from creating the tables and so on) are:
CREATE INDEX DEAL_IDX_1 ON TABLE DEAL (ID) AS
'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
alter index DEAL_IDX_1 ON DEAL rebuild;
set hive.optimize.autoindex=true;
set hive.optimize.index.filter=true;
For the 30-million-row table, the rebuild looks all right (mapper and reducer both finish) until, at the end, it prints
Invalid alter operation: Unable to alter index.
FAILED: Execution Error, return code 1
from org.apache.hadoop.hive.ql.exec.DDLTask
I checked the log and found this error:
java.lang.ClassNotFoundException: org.apache.derby.jdbc.EmbeddedDriver
I am not sure why this error occurred, but I added the derby-version.jar anyway:
add jar /path/derby-version.jar
That resolved the reported error, but I then got another one:
org.apache.hadoop.hive.ql.exec.FileSinkOperator:
StatsPublishing error: cannot connect to database
I am not sure how to solve this problem. I do see the created index table under hive/warehouse, though.
The 1-billion-row table is another story. The mapper just got stuck at 2% or so, and this error showed up:
FATAL org.apache.hadoop.mapred.Child: Error running child :
java.lang.OutOfMemoryError: Java heap space
I tried to raise the max heap size as well as the max MapReduce task memory (I saw these settings mentioned somewhere, but not in Hive's configuration settings):
set mapred.child.java.opts=-Xmx6024m;
set mapred.job.map.memory.mb=6000;
set mapred.job.reduce.memory.mb=4000;
However, this did not help. The mapper still got stuck at 2% with the same error.
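One thing worth noting about the three settings above: the requested heap (6024 MB) is larger than both task memory values (6000 MB and 4000 MB). The sketch below checks for that kind of mismatch. It assumes, as a common rule of thumb rather than anything stated in this thread, that a task's memory limit should be at least as large as its JVM heap; the helper names are hypothetical. This mismatch would not by itself explain a Java heap space error, so treat it as a consistency check only.

```python
import re


def xmx_megabytes(java_opts):
    """Extract the -Xmx value from a JVM options string, in megabytes."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if match is None:
        return None
    value = int(match.group(1))
    unit = match.group(2).lower()
    if unit == "g":
        return value * 1024
    if unit == "m":
        return value
    if unit == "k":
        return value // 1024
    return value // (1024 * 1024)  # no suffix means bytes


def limits_below_heap(settings):
    """Return the names of task memory limits that are smaller than the heap."""
    heap = xmx_megabytes(settings["mapred.child.java.opts"])
    names = ["mapred.job.map.memory.mb", "mapred.job.reduce.memory.mb"]
    return [n for n in names if heap is not None and settings[n] < heap]


# Values copied from the settings above.
settings = {
    "mapred.child.java.opts": "-Xmx6024m",
    "mapred.job.map.memory.mb": 6000,
    "mapred.job.reduce.memory.mb": 4000,
}

print(limits_below_heap(settings))
```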
I had a similar problem: the index was created and appeared in hive/warehouse, but the process as a whole failed. My index_name was TypeTarget (yours is DEAL_IDX_1), and after many days of trying different approaches, making the index_name all lowercase (typetarget) fixed the issue. My problem was in Hive 0.10.0.
Also, the ClassNotFoundException and StatsPublishing errors occur because hive.stats.autogather is turned on by default. Turning it off (false) in hive-site.xml should get rid of those issues.
Hopefully this helps anyone looking for a quick fix.
