Getting Snapshot export failed - hadoop

I am getting the below error while exporting a snapshot in hbase. I have searched the jar in that location. It was there. Also I tried to modify the permissions of the jar after this but it didn't work too.
Snapshot export failed!
java.io.FileNotFoundException: File does not exist: hdfs://master1c.abc.def.com:8020/usr/lib/hbase/lib/hbase-protocol-0.98.0.2.1.7.0-784-hadoop2.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.runCopyJob(ExportSnapshot.java:543)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:646)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:697)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:701)

i faced the similar problem, what you can try doing is if you have bundled your core-site.xml in your code remove it. Bundle only hdfs-site.xml and hbase-site.xml.
the probable cause it FS.defaultfs is set to "hdfs:// .." so it tries to work in that directory.

Related

Pig Error trying Edureka's Tutorial

I've been trying to run hadoop and other components like PIG.
I'm trying this tutorial:
https://www.edureka.co/blog/pig-programming-create-your-first-apache-pig-script/
Everything is right, but when I run the script at step 2 it throws this error:
2018-01-09 13:47:20,682 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:output.pig got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://localhost:9000/carlos/information.txt
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
at java.lang.Thread.run(Thread.java:748)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
Prior to step 1, you needed this command
hadoop fs –copyFromLocal /home/carlos/information.txt /carlos
However, this will copy your file to a file named /carlos on HDFS, if that directory doesn't already exist.
If you want /carlos to be a directory, you need to delete the file and make it
hadoop fs -rm /carlos
hadoop fs -mkdir /carlos
Also, trailing slashes should generally be used when copying files into a directory, like so
hadoop fs –copyFromLocal /home/carlos/information.txt /carlos/
You could also just make your Pig code load /carlos as the file. Even if it was a directory, this would still work to read all files in there

hbase mapreduce file not found exception

I have installed hadoop 2.4.1 and hbase 0.98.8 in 2 machines. When I run an hbase mapreduce job I get the below error:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://pc1/opt/hbase-0.98.8-hadoop2/lib/hbase-server-0.98.8-hadoop2.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at thesis.test2.run(test2.java:93)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at thesis.test2.main(test2.java:107)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
I can run hadoop mapreduce jobs and simple hbase jobs without any problems. The code I m trying to run is an example that is supposed to run.
Please provide "jps" output.
Because it seems like your hbase is not working , hopefully the problem will be with zookeeper
I faced the exact problem. You have to add the hbase library path to the .bashrc file. Add the lib folder in hbase to the CLASSPATH.
Also, add the classpath of hbase to HADOOP_CLASSPATH.
Your .bashrc file should contain the following:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:`${HBASE_HOME}/bin/hbase classpath`
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:`${HBASE_HOME}/bin/hbase mapredcp`
export CLASSPATH=${HBASE_HOME}/lib/*
Note: The CLASSPATH should point to the lib folder of your hbase installation folder. Use the following to compile and run your java code.
javac Example.java
java -classpath $CLASSPATH:. Example

Hadoop distcp exception

We are using dictcp to copy data from CDH4 to CDH5. When we run the command on CDH5 destination namenode, we get the following exception. Please let me know if you have already encountered the problem and know the solution. Thanks.
5/01/05 18:15:47 ERROR tools.DistCp: Exception encountered
org.apache.hadoop.ipc.RemoteException(java.lang.NoSuchMethodError): org.apache.hadoop.net.NetworkTopology.pseudoSortByDistance(Lorg/apache/hadoop/net/Node;[Lorg/apache/hadoop/net/Node;)V
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:354)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1618)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:482)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at sun.proxy.$Proxy14.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:246)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1179)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1169)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1159)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:270)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:237)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:230)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1457)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:301)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:297)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:297)
at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1832)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1752)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
at org.apache.hadoop.io.SequenceFile$Sorter$SortPass.run(SequenceFile.java:2825)
at org.apache.hadoop.io.SequenceFile$Sorter.sortPass(SequenceFile.java:2785)
at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:2733)
at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:2774)
at org.apache.hadoop.tools.util.DistCpUtils.sortListing(DistCpUtils.java:356)
at org.apache.hadoop.tools.CopyListing.validateFinalListing(CopyListing.java:145)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:91)
at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
Seems like this is because of protocol mismatch between two clusters.
As Hadoop version used in the source and destinations are different, you cannot use distcp by using hdfs://Cluster2-Namenode1:Port/ in source or destination, you have to use webhdfs:// instead of simple hdfs as follows.
hadoop distcp /source-directory webhdfs://Cluster2-Namenode1:50070/dir/
Please note 50070 is the default namenode Web UI, If you have configured different port for HDFS namenode WebUI, modify 50070 to your modified port.

hadoop mapreduce job fails as it checking for jars in the HDFS (only in hbase jobs)

Hadoop mapreduce job fails with the below exception if the job has hbase handling as in the below log, how ever I added the wanted jars to hadoop_classPath.
I can overcome this issue by adding the wanted jars on the wanted path on HDFS but I think that isn't the correct handling for the issue.
I want to know if there is missed thing that I should do to handle this issue.
BTW this issue is the same as in:
issue1
issue2
issue3
Here is the error:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost/dedge1/hadoop/hbase-0.96.1.1-hadoop2/lib/netty-3.6.6.Final.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:300)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)

Error while copying from S3 to HDFS

I am trying to copy some files from S3 bucket to HDFS of my EMR cluster. But I am getting the following error:
Exception in thread "main" java.lang.RuntimeException: Error running job
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:771)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:580)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://10.87.26.26:9000/tmp/33e4f3b9-d29a-49e8-9706-ea70e07e3ff2/files
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:285)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:491)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:508)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:751)
... 9 more
The command I am using is :
./elastic-mapreduce --jobflow j-12345678 --jar /home/hadoop/lib/emr-s3distcp-1.0.jar --args '--src,s3n://my-bucket/data/,--dest,hdfs:///data/in,--srcPattern,xyz01-1-1*ped*' --step-name "Copy input files to HDFS" --wait-for-steps
I tried to run the sample word-count job, to check if there is any issue with HDFS, but it ran fine.
Can anyone please help me with this? If any more info is needed, please let me know and I will update the description.
Usually its the --srcPattern '<regex>' argument. You can also use hadoop fs -cp s3://src/file1.something /my/output/path/ to test for 1 file and modify your regex. Also starting with .* any char-0 or more times, should relax the matching.
It would be great to know if regex non-matches get logged and where.

Resources