FileNotFoundException for hive/lib/hive-builtins-0.9.0.jar in Hive - hadoop

I've configured Hadoop 0.23 on my local box and got it working with a simple MapReduce wordcount program. I have also configured Hive to work with it. All the DDL queries work fine, but when I fire queries that have aggregates (which trigger MapReduce jobs), I get:
java.io.FileNotFoundException: File does not exist: /Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:738)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:252)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:290)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:361)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:609)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:604)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:604)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:693)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar)'

You should create the same file /Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar in your Hadoop file system; then it should work. The job submitter resolves that jar path against the default file system (HDFS here), not your local disk, which is why it reports the file as missing even though it exists locally.

Make sure you have the jar in HDFS.
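A minimal sketch of that copy using the Hadoop FileSystem API (the class name CopyHiveBuiltins is hypothetical; the paths are taken from the stack trace). Running hadoop fs -mkdir -p and hadoop fs -put with the same paths achieves the same thing from the shell.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyHiveBuiltins {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml, so "fs" talks to HDFS.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path jar = new Path("/Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar");
        fs.mkdirs(jar.getParent());     // create /Users/.../hive/lib in HDFS
        fs.copyFromLocalFile(jar, jar); // local source, HDFS destination, same absolute path
        System.out.println("Copied " + jar + " to " + fs.getUri());
    }
}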

Related

MapReduce, FileNotFoundException

Hadoop 2.9.1, standalone installation.
The HDFS directory is organized by time (yyyyMMdd/HH/mm), like hdfs://server1:9000/foo/20190410/10/00, and there are several files in each minute directory.
What I need to do is process the data for each hour, for example, process all data under hdfs://server1:9000/foo/20190410/10. So the MapReduce input setting is something like:
job.setInputFormatClass(org.apache.hadoop.mapreduce.lib.input.SequenceFileAsBinaryInputFormat.class);
Path inputPath = new Path("hdfs://server1:9000/foo/20190410/10");
org.apache.hadoop.mapreduce.lib.input.SequenceFileAsBinaryInputFormat.addInputPath(job, inputPath);
But I keep getting this:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://server01:9000/foo/20190410/10/00/data
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1533)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1526)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1526)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:67)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:393)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:314)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:331)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:202)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1889)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
at com.misc.mr.TestJob.main(TestJob.java:54)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
I have no idea why it tries to access the path hdfs://server01:9000/foo/20190410/10/00/data.
If the input is a file instead of a folder (for example hdfs://server1:9000/foo/20190410/10/00/part1), it works fine.
Can anyone please give some explanation? Many thanks.
Figured it out.
Set mapreduce.input.fileinputformat.input.dir.recursive to true, or, in code, call FileInputFormat.setInputDirRecursive(job, true).
This works because, by default, FileInputFormat only lists the immediate children of each input path, so the per-minute directories (00, 01, ...) are themselves returned as inputs; SequenceFileInputFormat then assumes any directory input is a MapFile and substitutes its data file, which is exactly where the .../10/00/data path in the exception comes from.
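A minimal sketch of the fixed driver setup, assuming the paths from the question (the class and job names are hypothetical, and the mapper/reducer/output settings are omitted as in the original snippet):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileAsBinaryInputFormat;

public class HourlyJobDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "hourly-processing");
        job.setInputFormatClass(SequenceFileAsBinaryInputFormat.class);

        // Recurse into the per-minute subdirectories (20190410/10/00, 10/01, ...)
        // instead of handing them to SequenceFileInputFormat as MapFile inputs.
        FileInputFormat.setInputDirRecursive(job, true);
        FileInputFormat.addInputPath(job, new Path("hdfs://server1:9000/foo/20190410/10"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}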

Gobblin Kafka to HDFS pull job error

I'm trying to pull data from Kafka to HDFS using Gobblin.
Gobblin version (compiled from the GitHub source with the command sudo ./gradlew clean build -PuseHadoop2 -PhadoopVersion=2.7.1 -x test):
0.6.2-546-g431188b
Hadoop version:
Hadoop 2.7.1.2.4.2.0-258
Subversion git@github.com:hortonworks/hadoop.git -r 13debf893a605e8a88df18a7d8d214f571e05289
Compiled by jenkins on 2016-04-24T16:02Z
Compiled with protoc 2.5.0
From source with checksum 2a2d95f05ec6c3ac547ed58cab713ac
This command was run using /usr/hdp/2.4.2.0-258/hadoop/hadoop-common-2.7.1.2.4.2.0-258.jar
Gobblin job:
job.name=GobblinKafkaQuickStart
job.group=GobblinKafka
job.description=Gobblin quick start job for Kafka
job.lock.enabled=false
job.schedule=0 0/2 * * * ?
kafka.brokers=hd-mgt03:6667,hd-mgt02:6667,hd-mgt04:6667
source.class=gobblin.source.extractor.extract.kafka.KafkaSimpleSource
extract.namespace=gobblin.extract.kafka
writer.builder.class=gobblin.writer.AvroHdfsDataWriter
writer.file.path.type=tablename
writer.destination.type=HDFS
writer.output.format=AVRO
data.publisher.type=gobblin.publisher.BaseDataPublisher
mr.job.max.mappers=1
metrics.reporting.file.enabled=true
metrics.log.dir=/gobblin-kafka/metrics
metrics.reporting.file.suffix=txt
bootstrap.with.offset=earliest
fs.uri=hdfs://hdfs:8020
writer.fs.uri=hdfs://hdfs:8020
state.store.fs.uri=hdfs://hdfs:8020
mr.job.root.dir=/kafka/working
state.store.dir=/kafka/state-store
task.data.root.dir=/kafka/task-data
data.publisher.final.dir=/kafka/job-output
I'm trying to run gobblin-mapreduce.sh from the gobblin-dist/bin folder, but I get this error:
Exception in thread "main" gobblin.runtime.JobException: Job job_GobblinKafkaQuickStart_1464962113982 failed
at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:363)
at gobblin.runtime.mapreduce.CliMRJobLauncher.launchJob(CliMRJobLauncher.java:84)
at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:61)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:106)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Log file contains error:
2016-06-03 16:55:17 MSK ERROR [main] gobblin.runtime.AbstractJobLauncher 321 - Failed to launch and run job job_GobblinKafkaQuickStart_1464962113982: java.lang.NoSuchFieldError: DEFAULT_MR_AM_ADMIN_USER_ENV
java.lang.NoSuchFieldError: DEFAULT_MR_AM_ADMIN_USER_ENV
at org.apache.hadoop.mapred.YARNRunner.createApplicationSubmissionContext(YARNRunner.java:470)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:285)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at gobblin.runtime.mapreduce.MRJobLauncher.runWorkUnits(MRJobLauncher.java:198)
at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:296)
at gobblin.runtime.mapreduce.CliMRJobLauncher.launchJob(CliMRJobLauncher.java:84)
at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:61)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:106)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
What could be the reason for this error? How can I fix it?
From your error I can tell it is most likely a jar problem. Usually this error (java.lang.NoSuchFieldError: DEFAULT_MR_AM_ADMIN_USER_ENV) is caused by jar conflicts: the field lives in org.apache.hadoop.mapreduce.MRJobConfig, and a NoSuchFieldError means YARNRunner was compiled against a different Hadoop version than the MRJobConfig class it found at runtime. Check your classpath for version conflicts.
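A quick diagnostic sketch (the class name WhichJar is hypothetical): print which jar actually provides MRJobConfig at runtime and compare it against the Hadoop 2.7.1 jars Gobblin was built with.

public class WhichJar {
    public static void main(String[] args) throws Exception {
        // Locate the jar that provides MRJobConfig; a path outside the
        // expected Hadoop 2.7.1 distribution points at the conflicting jar.
        Class<?> c = Class.forName("org.apache.hadoop.mapreduce.MRJobConfig");
        java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
        System.out.println(src != null ? src.getLocation() : "bootstrap classpath");
    }
}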

Hive MapReduce Job Submission fails "Target is a directory"

I've been playing around with Hadoop and its sister projects, and I've had a few problems along the way, but I've finally hit one that I cannot find an answer to:
I have a Hive table stored on HDFS as a tab-delimited text file, and I can do a basic select on the table. But as soon as I make the query a little more complicated, Hive turns it into a MapReduce job, which fails with the following stack trace:
13/11/29 08:31:00 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser (auth:SIMPLE) cause:java.io.IOException: Target /tmp/hadoop-yarn/staging/hduser/.staging/job_1385633903169_0013/libjars/lib/lib is a directory
java.io.IOException: Target /tmp/hadoop-yarn/staging/hduser/.staging/job_1385633903169_0013/libjars/lib/lib is a directory
at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:500)
at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:502)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:348)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
at org.apache.hadoop.mapreduce.JobSubmitter.copyRemoteFiles(JobSubmitter.java:139)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:212)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:300)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:425)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:144)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Job Submission failed with exception 'java.io.IOException(Target /tmp/hadoop-yarn/staging/hduser/.staging/job_1385633903169_0013/libjars/lib/lib is a directory)'
13/11/29 08:31:00 ERROR exec.Task: Job Submission failed with exception 'java.io.IOException(Target /tmp/hadoop-yarn/staging/hduser/.staging/job_1385633903169_0013/libjars/lib/lib is a directory)'
java.io.IOException: Target /tmp/hadoop-yarn/staging/hduser/.staging/job_1385633903169_0013/libjars/lib/lib is a directory
at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:500)
at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:502)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:348)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
at org.apache.hadoop.mapreduce.JobSubmitter.copyRemoteFiles(JobSubmitter.java:139)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:212)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:300)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:425)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:144)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
The folder in question does exist on the DFS, at least the /tmp/hadoop-yarn/staging part, and no matter what I set its permissions to, Hive or Hadoop resets them on job submission. The really concerning part is that the full path appears to be a generated folder name, so why does the software have a problem with something it generated on its own? Why is it a problem that the path is a directory, and what should it be instead?
Edit:
Here is the table I'm working with and the query I'm trying to run:
Query:
select * from hive_flow_details where node_id = 100 limit 10;
Table:
col_name          data_type  comment
id                bigint     None
flow_versions_id  int        None
node_id           int        None
node_name         string     None
Bear in mind, this happens with any query I attempt that has any kind of WHERE clause, as Hive translates that into an MR job.
I eventually came right with this issue: I found conflicting jars in my classpath, cleaned them out, and since then I have had no problems.
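If you hit the same symptom, one thing worth checking (an assumption suggested by the libjars/lib/lib staging path in the trace, not something the answer above confirmed) is whether a directory rather than a jar file has ended up on Hive's auxiliary-jars path, since entries there are copied into the job's libjars staging area:

import java.io.File;

public class CheckAuxJars {
    public static void main(String[] args) {
        // Hypothetical check: flag any HIVE_AUX_JARS_PATH entry that is a
        // directory, since copying a directory into libjars would fail with
        // exactly a "Target ... is a directory" IOException.
        String aux = System.getenv("HIVE_AUX_JARS_PATH");
        if (aux == null) {
            System.out.println("HIVE_AUX_JARS_PATH is not set");
            return;
        }
        for (String entry : aux.split("[,:]")) { // separator assumed comma or colon
            File f = new File(entry);
            System.out.println(entry + " -> " + (f.isDirectory() ? "DIRECTORY (suspect)" : "file"));
        }
    }
}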

LeaseExpiredException in Hive

Hi all. I ran a Hive query that gets to 97% and then fails with org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on some file.
Can anyone kindly explain why this error occurs?
This is a single-user Hive cluster environment.
Thank you in advance.
2013-01-02 22:16:17,833 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /tmp/hive-hadoop/hive_2013-01-01_21-21-32_067_6367259756570557828/_task_tmp.-ext-10002/_tmp.000004_1 : org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /tmp/hive-hadoop/hive_2013-01-01_21-21-32_067_6367259756570557828/_task_tmp.-ext-10002/_tmp.000004_1 File does not exist. Holder DFSClient_attempt_201301012114_0002_m_000004_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1631)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1622)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1677)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1665)
at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /tmp/hive-hadoop/hive_2013-01-01_21-21-32_067_6367259756570557828/_task_tmp.-ext-10002/_tmp.000004_1 File does not exist. Holder DFSClient_attempt_201301012114_0002_m_000004_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1631)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1622)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1677)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1665)
at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy2.complete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy2.complete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3897)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3812)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:1345)
at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:275)
at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:328)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1446)
at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:277)
at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:260)
Does your Hive query create parallel MR jobs?
I had the same problem, and found the cause of the LeaseExpiredException: No lease error on HDFS: when a job ends, it deletes its /data/work/ folder. If several jobs are running in parallel, that deletion also removes the files of another job (and in my case the job genuinely needs to delete /data/work/).
In other words, this exception is thrown when a job tries to access files that no longer exist. If the failure shows up during dynamic-partition inserts, raising these limits may also help:
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;
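For context: hive.exec.max.dynamic.partitions caps the total number of dynamic partitions a single statement may create (the default is 1000), and hive.exec.max.dynamic.partitions.pernode caps how many a single mapper or reducer may create (the default is 100).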

HBase completebulkload returns exception

I am trying to bulk-populate an HBase table quickly from a text file (several GB) by using the bulk loading method described in the Hadoop docs.
I have created an HFile which I now want to push to my HBase table.
When I use this command:
hadoop jar /home/hxcaine/hadoop/lib/hbase.jar completebulkload /user/hxcaine/dbpopulate/output/cf1 my_hbase_table
The job starts and then I get this exception:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/util/concurrent/ThreadFactoryBuilder
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:195)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:696)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:701)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: java.lang.ClassNotFoundException: com.google.common.util.concurrent.ThreadFactoryBuilder
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 17 more
However, I can see that the Guava jar is in my classpath and when I check inside the jar I can see ThreadFactoryBuilder.class.
I am using these versions (and stuck with them):
Hadoop 0.20.2-cdh3u3
HBase 0.90.4-cdh3u3
Guava jar: /usr/lib/hadoop-0.20/lib/guava-r09-jarjar.jar
I do have an older Guava jar in my classpath, but I don't know where it came from; I wouldn't expect it to have an effect.
Any ideas?
What happens if you run:
export HADOOP_CLASSPATH=`hbase classpath`
before running the load? The hbase classpath command prints HBase's full runtime classpath, including its Guava dependency, and exporting it as HADOOP_CLASSPATH puts those jars on the classpath of the hadoop launcher. From the stack trace, it looks like the jar is needed by one of the actual tasks, though I am surprised to see that this actually kicks off an M/R job.
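For example, with the same paths as in the question, the full sequence would be:

export HADOOP_CLASSPATH=`hbase classpath`
hadoop jar /home/hxcaine/hadoop/lib/hbase.jar completebulkload /user/hxcaine/dbpopulate/output/cf1 my_hbase_table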
