When I tried to execute the following command to copy data from hbase to another cluster in a hbase client environment. The command I ran is:
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=[destination zk]:/hbase [source table name]
I got this error:
Exception in thread "main" java.io.FileNotFoundException: File does
not exist:
hdfs://servername:8020/opt/hbase-1.2.10/lib/metrics-core-2.2.0.jar at
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
at
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
The /opt/hbase-1.2.10/lib/metrics-core-2.2.0.jar is on my local path but it does not exist in the hdfs. It seems the CopyTable util is submitting a mapreduce job without the dependency jars. I read a few articles and it seems the only solution is to upload the jar lib to hdfs with the same path. This is really an ugly solution.
Please kindly advise. Thanks!
I was running Spark job which make use of other files that is passed in through --archives flag of spark
spark-submit .... --archives hdfs:///user/{USER}/{some_folder}.zip .... {file_to_run}.py
Spark is currently running on YARN and when I tried it with spark version 1.5.1 it was fine.
However, when I ran the same commands with spark 2.0.1, I got
ERROR yarn.ApplicationMaster: User class threw exception: java.io.IOException: Cannot run program "/home/{USER}/{some_folder}/.....": error=2, No such file or directory
Since the resource is managed by YARN, it is challenging to manually check if the file gets successfully decompressed and exist when the job runs.
I wonder if anyone has experienced similar issue.
I am new to hadoop.I have installed hadoop using cygwin in windows 7. Now I am trying to run Wordcount example. when using the command:-
bin/hadoop jar hadoop-examples-1.2.1.jar WordCount /user/Taniya/input /user/Taniya/output
An error is shown:-
Mkdirs failed to create >\cygwin64\home\Taniya\tmp\hadoop-cyg_server
I have looked through the web and the solutions provided are not clear to me. Looking for a solution that is clearly stated.
I'm working on a hadoop cluster with CDH4.2.0 installed and ran into this error. It's been fixed in later versions of hadoop but I don't have access to update the cluster. Is there a way to tell hadoop to use this jar when running my job through the command line arguments like
hadoop jar MyJob.jar -D hadoop.mapreduce.client=hadoop-mapreduce-client-core-2.0.0-cdh4.2.0.jar
where the new mapreduce-client-core.jar file is the patched jar from the ticket. Or must hadoop be completely recompiled with this new jar? I'm new to hadoop so I don't know all the command line options that are possible.
I'm not sure how that would work as when you're executing the hadoop command you're actually executing code in the client jar.
Can you not use MR1? The issue says this issue only occurs when you're using MR2, so unless you really need Yarn you're probably better using the MR1 library to run your map/reduce.
Suppose I write a java program and i want to run it in Hadoop, then
where should the file be saved?
how to access it from hadoop?
should i be calling it by the following command? hadoop classname
what is the command in hadoop to execute the java file?
The simplest answers I can think of to your questions are:
1) Anywhere
2,3,4)$HADOOP_HOME/bin/hadoop jar [path_to_your_jar_file]
A similar question was asked here Executing helloworld.java in apache hadoop
It may seem complicated, but it's simpler than you might think!
Compile your map/reduce classes, and your main class into a jar. Let's call this jar myjob.jar.
This jar does not need to include the Hadoop libraries, but it should include any other dependencies you have.
Your main method should set up and run your map/reduce job, here is an example.
Put this jar on any machine with the hadoop command line utility installed.
Run your main method using the hadoop command line utility:
hadoop jar myjob.jar
Hope that helps.
where should the file be saved?
The data should be saved in "hdfs". You will want to probably load it into the cluster from your data source using something like Apache Flume. The file can be placed anywhere but most home is /user/hadoop/
how to access it from hadoop?
SSH into the hadoop cluster headnode like a standard linux server.
To list your hadoop root hdfs
hadoop fs -ls /
should i be calling it by the following command? hadoop classname
You should be using the hadoop command to access your data and run your programs, try hadoop help
what is the command in hadoop to execute the java file?
hadoop -jar MyJar.jar com.mycompany.MainDriver arg[0] arg[1] ...