Location of a Hadoop job input file - Spring

I'm trying to generate a sequence file in my Spring Batch job and pass it to a Hadoop map/reduce job. I managed to get the job to work once by manually copying the file onto HDFS. It also runs fine in my local system test, because the local filesystem finds the file. But when I attempt to deploy it to a remote Hadoop instance, I get the following exception.
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ngram-test:9000/user/hduser/DocumentsPTOgrants2007_2011.seq
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
at com.atsid.hadoop.jobs.AbstractJobRunner.executeJob(AbstractJobRunner.java:70)
at com.atsid.hadoop.jobs.AbstractJobRunner.run(AbstractJobRunner.java:36)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at com.atsid.cloudbase.ngram.ingest.mapreduce.NGramIngestJobRunner.main(NGramIngestJobRunner.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:191)
at org.springframework.data.hadoop.mapreduce.JarExecutor.invokeTargetObject(JarExecutor.java:71)
at org.springframework.data.hadoop.mapreduce.HadoopCodeExecutor.invokeTarget(HadoopCodeExecutor.java:185)
at org.springframework.data.hadoop.mapreduce.HadoopCodeExecutor.runCode(HadoopCodeExecutor.java:102)
at org.springframework.data.hadoop.mapreduce.JarTasklet.execute(JarTasklet.java:32)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:132)
at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:120)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at com.sun.proxy.$Proxy43.execute(Unknown Source)
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:386)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:131)
at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:264)
at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76)
at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:367)
at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:214)
at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143)
at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:250)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:135)
at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:61)
at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:60)
at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:144)
at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:124)
at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:135)
at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:293)
at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:120)
at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:49)
at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:114)
at org.springframework.batch.core.launch.support.CommandLineJobRunner.start(CommandLineJobRunner.java:349)
at org.springframework.batch.core.launch.support.CommandLineJobRunner.main(CommandLineJobRunner.java:574)
Here's the tasklet configuration used by the step. I'm attempting to use the files attribute to pass the input file to HDFS. The file does appear in the Hadoop logs.
<hdp:jar-tasklet id="ingestJarTasklet" scope="step"
                 jar="file:${ingest.job.jar.path}"
                 main-class="com.atsid.cloudbase.ngram.ingest.mapreduce.NGramIngestJobRunner"
                 libs="${ingest.job.libs}"
                 files="#{seqFileLocation.URI.toString()}"
                 configuration-ref="hadoopConfiguration">
    ngram.jobrunner.input.document.sequence.file=${job.file.location}#{seqFileLocation.filename}
</hdp:jar-tasklet>
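
For reference, the property set in the tasklet body is what the job runner is expected to read back from the Configuration and wire up as the job's input path. The sketch below only illustrates that wiring; apart from the property key, the class name, job name, and omitted mapper/reducer setup are assumptions, not the actual NGramIngestJobRunner code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.util.Tool;

// Illustrative sketch only: read the property passed by the tasklet and use it
// as the SequenceFileInputFormat input path.
public class IngestJobRunnerSketch extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        String input = conf.get("ngram.jobrunner.input.document.sequence.file");
        Job job = new Job(conf, "ngram-ingest");              // mapper/reducer/output setup omitted
        job.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(input));   // this path must exist on HDFS at submit time
        return job.waitForCompletion(true) ? 0 : 1;
    }
}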

Is your input path /user/hduser/DocumentsPTOgrants2007_2011.seq, or are you passing hdfs://ngram-test:9000/user/hduser/DocumentsPTOgrants2007_2011.seq?
If you are using the second form, try the first, and make sure that DocumentsPTOgrants2007_2011.seq is actually there.
You can check whether it is there with this command: hadoop dfs -ls ./
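If you prefer to verify (or fix) this from code rather than the shell, here is a minimal sketch; the HDFS URI and target path are taken from the exception above, while the local source path is only a placeholder for wherever the batch job wrote the file.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: check whether the sequence file is really where the job will
// look for it, and copy it up from the local filesystem if it is missing.
public class CheckSeqFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://ngram-test:9000"), conf);
        Path target = new Path("/user/hduser/DocumentsPTOgrants2007_2011.seq");
        if (!fs.exists(target)) {
            // the local path below is a placeholder, not taken from the question
            fs.copyFromLocalFile(new Path("file:///tmp/DocumentsPTOgrants2007_2011.seq"), target);
        }
        System.out.println(target + " exists: " + fs.exists(target));
    }
}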

Related

Hadoop on windows: Not a valid DFS filename

I have configured Hadoop 2.7.2 on Windows. I can see that the name node, data node, resource manager and node manager are running properly; the problem occurs when I try to run one of the example map/reduce programs.
This is the command I am running: c:\hdp\bin\yarn jar c:\hdp\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.7.2.jar wordcount c:\hdp\LICENSE.txt /out
I can see that all the files are present at the desired locations.
Here is the stack trace:
C:\WINDOWS\system32>c:\hdp\bin\yarn jar c:\hdp\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.7.2.jar wordcount c:\hdp\LICENSE.txt /out
16/02/03 16:50:55 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/02/03 16:50:56 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/tbhakre/.staging/job_1454492091405_0006
java.lang.IllegalArgumentException: Pathname /c:/hdp/LICENSE.txt from hdfs://0.0.0.0:19000/c:/hdp/LICENSE.txt is not a valid DFS filename.
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:197)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1674)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:294)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
If you are running everything on the same machine, you can just refer to it with a file: URL, like file://c:/hdp/LICENSE.txt [maybe it's file:/c:/hdp/LICENSE.txt; I forget the naming logic].
LICENSE.txt (the first argument to wordcount) should be located in HDFS, not on your local machine.
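For context, the reason the error message shows hdfs://0.0.0.0:19000/c:/hdp/LICENSE.txt is that a path given without a scheme is resolved against fs.defaultFS. A small illustrative sketch (not taken from the question's code; the defaultFS value mirrors the error message):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Illustration only: a scheme-less path is resolved against fs.defaultFS, which
// is how c:\hdp\LICENSE.txt became hdfs://0.0.0.0:19000/c:/hdp/LICENSE.txt and
// was then rejected as an invalid DFS filename.
public class PathResolution {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://0.0.0.0:19000");            // mirrors the question's setup
        Path noScheme = new Path("/c:/hdp/LICENSE.txt");
        Path withScheme = new Path("file:///c:/hdp/LICENSE.txt");
        System.out.println(noScheme.getFileSystem(conf).getUri());   // hdfs://0.0.0.0:19000 -> DistributedFileSystem
        System.out.println(withScheme.getFileSystem(conf).getUri()); // file:/// -> LocalFileSystem
    }
}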

No valid local directories in property: mapred.local.dir

I am running the VM in pseudo-distributed mode.
Due to some resource-related issues (NameNode in safe mode, not able to leave it) I had to format and restart the namenode of my Cloudera 4.x installation. I didn't have any other choice.
I used the steps provided here:
Writing to HDFS could only be replicated to 0 nodes instead of minReplication (=1)
After that I am able to use the get/put commands in HDFS properly, which means I have read/write permissions.
Now, when I try to submit the job, I am getting the following exception.
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): java.io.IOException: No valid local directories in property: mapred.local.dir
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3491)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3459)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:474)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
Caused by: java.io.IOException: No valid local directories in property: mapred.local.dir
at org.apache.hadoop.conf.Configuration.getLocalPath(Configuration.java:1678)
at org.apache.hadoop.mapred.JobConf.getLocalPath(JobConf.java:500)
at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:409)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3489)
... 13 more
at org.apache.hadoop.ipc.Client.call(Client.java:1160)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:225)
at org.apache.hadoop.mapred.$Proxy10.submitJob(Unknown Source)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:973)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:561)
at clustering.mapreduce.KMeansClusteringJob.main(KMeansClusteringJob.java:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
When I searched for the above exception I found multiple links stating that mapred.local.dir should be properly defined, and that if it is not set then hadoop.tmp.dir is used.
I explicitly set mapred.local.dir in mapred-site.xml and gave full permissions to the default folder (/var/lib/hadoop-hdfs/cache).
The problem still persists.
Can someone please help me solve this issue?
Regards
The local directory did not have the proper permissions. (Marking as community wiki, since the answer was provided in the comments.)
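For reference, the failing check boils down to whether at least one entry of mapred.local.dir exists or can be created by the MapReduce daemon user. A hedged sketch that mirrors that check; run it as the daemon user, and note that loading mapred-site.xml by resource name assumes it is on the classpath.

import java.io.File;
import org.apache.hadoop.conf.Configuration;

// Hedged sketch: list each mapred.local.dir entry and report whether it exists
// and is writable by the current user. The permissions that matter are those of
// the MapReduce daemon user, so run this as that user.
public class LocalDirCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.addResource("mapred-site.xml");   // assumes the file is on the classpath
        String dirs = conf.get("mapred.local.dir", conf.get("hadoop.tmp.dir", "/tmp/hadoop"));
        for (String d : dirs.split(",")) {
            File f = new File(d.trim());
            System.out.printf("%s exists=%b writable=%b%n", d.trim(), f.isDirectory(), f.canWrite());
        }
    }
}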

Hadoop distcp exception

We are using distcp to copy data from CDH4 to CDH5. When we run the command on the CDH5 destination namenode, we get the following exception. Please let me know if you have already encountered this problem and know the solution. Thanks.
5/01/05 18:15:47 ERROR tools.DistCp: Exception encountered
org.apache.hadoop.ipc.RemoteException(java.lang.NoSuchMethodError): org.apache.hadoop.net.NetworkTopology.pseudoSortByDistance(Lorg/apache/hadoop/net/Node;[Lorg/apache/hadoop/net/Node;)V
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:354)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1618)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:482)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at sun.proxy.$Proxy14.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:246)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1179)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1169)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1159)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:270)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:237)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:230)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1457)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:301)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:297)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:297)
at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1832)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1752)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
at org.apache.hadoop.io.SequenceFile$Sorter$SortPass.run(SequenceFile.java:2825)
at org.apache.hadoop.io.SequenceFile$Sorter.sortPass(SequenceFile.java:2785)
at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:2733)
at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:2774)
at org.apache.hadoop.tools.util.DistCpUtils.sortListing(DistCpUtils.java:356)
at org.apache.hadoop.tools.CopyListing.validateFinalListing(CopyListing.java:145)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:91)
at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
This seems to be caused by a protocol mismatch between the two clusters.
Because the Hadoop versions used in the source and destination are different, you cannot run distcp with hdfs://Cluster2-Namenode1:Port/ as the source or destination; you have to use webhdfs:// instead of plain hdfs, as follows:
hadoop distcp /source-directory webhdfs://Cluster2-Namenode1:50070/dir/
Note that 50070 is the default namenode web UI port. If you have configured a different port for the HDFS namenode web UI, change 50070 to that port.

FileNotFoundException for hive/lib/hive-builtins-0.9.0.jar in Hive

I've configured Hadoop 0.23 on my local box and got it to work with a simple map/reduce wordcount program. I have configured Hive to work with it. All the DDL queries work fine, but when I fire queries that have aggregates (which trigger map/reduce jobs), I get the following exception:
java.io.FileNotFoundException: File does not exist: /Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:738)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:252)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:290)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:361)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:609)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:604)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:604)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:693)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar)'
You should create the same file, /Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar, in your Hadoop file system (HDFS). Then it should work.
Make sure you have the jar in HDFS.
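If you would rather do that from Java than the command line, here is a minimal sketch; it assumes the jar exists at the same path on your local filesystem, with the HDFS target path taken from the error message above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: place the Hive builtins jar at the exact HDFS path the job
// submission is looking for.
public class PutHiveBuiltinsJar {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path target = new Path("/Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar");
        fs.mkdirs(target.getParent());
        fs.copyFromLocalFile(new Path("file:///Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar"), target);
    }
}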

HBase completebulkload returns exception

I am trying to bulk-populate an HBase table quickly from a text file (several GB) by using the bulk loading method described in the Hadoop docs.
I have created an HFile which I now want to push to my HBase table.
When I use this command:
hadoop jar /home/hxcaine/hadoop/lib/hbase.jar completebulkload /user/hxcaine/dbpopulate/output/cf1 my_hbase_table
The job starts and then I get this exception:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/util/concurrent/ThreadFactoryBuilder
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:195)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:696)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:701)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: java.lang.ClassNotFoundException: com.google.common.util.concurrent.ThreadFactoryBuilder
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 17 more
However, I can see that the Guava jar is in my classpath and when I check inside the jar I can see ThreadFactoryBuilder.class.
I am using these versions (and stuck with them):
Hadoop 0.20.2-cdh3u3
HBase 0.90.4-cdh3u3
Guava jar: /usr/lib/hadoop-0.20/lib/guava-r09-jarjar.jar
I do have an older Guava jar on my classpath, but I don't know where it came from; I don't suppose it should have an effect.
Any ideas?
What happens if you run:
export HADOOP_CLASSPATH=`hbase classpath`
before running the load? From the stack trace, it looks like the jar is needed by one of the actual tasks, though I am surprised to see that this actually kicks off an M/R job.
