Executing Mahout against a Hadoop cluster

I have a jar file that contains the Mahout jars as well as other code I wrote.
It works fine on my local machine.
I would like to run it on a cluster that already has Hadoop installed.
When I run
$HADOOP_HOME/bin/hadoop jar myjar.jar args
I get this error:
Exception in thread "main" java.io.IOException: Mkdirs failed to create /some/hdfs/path (exists=false, cwd=file:local/folder/where/myjar/is)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
...
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
I have checked that I can access and create the directory in HDFS.
I have also run Hadoop code (without Mahout) without a problem.
I am running this on a Linux machine.

Check that the Mahout user and the Hadoop user are the same, and also check that your Mahout and Hadoop versions are compatible.
Regards
Jyoti ranjan panda

Related

Jar file not found exception when running a MapReduce job to copy data from HBase

I tried to copy data from HBase to another cluster by running the following command from an HBase client environment:
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=[destination zk]:/hbase [source table name]
I got this error:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://servername:8020/opt/hbase-1.2.10/lib/metrics-core-2.2.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
The file /opt/hbase-1.2.10/lib/metrics-core-2.2.0.jar exists on my local file system, but not in HDFS. It seems the CopyTable utility submits a MapReduce job without its dependency jars. From a few articles I read, the only solution seems to be uploading the jar libraries to HDFS under the same path, which is a really ugly workaround.
Please advise. Thanks!
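The hdfs://servername:8020 prefix in the trace suggests that the local jar path was qualified against the client's default file system (fs.defaultFS) instead of being treated as a local file. That reading is an assumption, not something the thread confirms. The sketch below illustrates the idea with plain java.net.URI rather than Hadoop's own Path class; QualifyPathSketch and qualify are hypothetical names. A scheme-less absolute path picks up the scheme and authority of whatever default it is resolved against, while an explicit file: URI is left alone.

```java
import java.net.URI;

public class QualifyPathSketch {
    // Hypothetical helper: qualify a path against a default file system URI,
    // similar in spirit to how a scheme-less path ends up on fs.defaultFS.
    // A path that already carries a scheme (e.g. file:///...) is returned unchanged.
    static URI qualify(String defaultFs, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p;
        }
        return URI.create(defaultFs).resolve(p);
    }

    public static void main(String[] args) {
        String jar = "/opt/hbase-1.2.10/lib/metrics-core-2.2.0.jar";

        // With an HDFS default, the local jar path becomes an HDFS path.
        System.out.println(qualify("hdfs://servername:8020/", jar));
        // prints hdfs://servername:8020/opt/hbase-1.2.10/lib/metrics-core-2.2.0.jar

        // With the local default, the same string stays on the local file system.
        System.out.println(qualify("file:///", jar).getScheme()); // file

        // An explicit file: URI keeps pointing at the local disk even with an HDFS default.
        System.out.println(qualify("hdfs://servername:8020/", "file://" + jar));
        // prints file:///opt/hbase-1.2.10/lib/metrics-core-2.2.0.jar
    }
}
```

If this reading is right, the client-side configuration decides which file system a bare path lands on. The Mahout trace earlier (cwd=file:..., ChecksumFileSystem, which is the base class of Hadoop's local file system) may be the mirror case, where a path meant for HDFS was resolved locally.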

Spark 2.0.1 not finding files passed in through the --archives flag

I was running a Spark job that uses other files passed in through Spark's --archives flag:
spark-submit .... --archives hdfs:///user/{USER}/{some_folder}.zip .... {file_to_run}.py
Spark runs on YARN, and when I tried this with Spark 1.5.1 it worked fine.
However, when I ran the same command with Spark 2.0.1, I got:
ERROR yarn.ApplicationMaster: User class threw exception: java.io.IOException: Cannot run program "/home/{USER}/{some_folder}/.....": error=2, No such file or directory
Since the resources are managed by YARN, it is hard to check manually whether the archive is successfully decompressed and the files exist when the job runs.
I wonder if anyone has experienced a similar issue.
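For reference on how --archives behaves on YARN: the Spark on YARN documentation describes archives as being extracted into the working directory of each executor, and, as in Hadoop, a name can be appended after '#' to choose the link name. The plain-Java sketch below uses a hypothetical helper, linkName (not a Spark API), to compute the directory name one would expect under that assumption: the '#' alias if present, otherwise the archive's file name. The URIs use placeholder names (alice, deps) because the braces in {USER} and {some_folder} are not valid URI characters. Whether the job in the question builds its path from /home/{USER} instead of the working directory is a guess; the question does not show that code.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ArchiveLinkSketch {
    // Hypothetical helper: the directory name under which an --archives entry
    // is expected to appear in the container's working directory.
    // An "#alias" fragment takes precedence; otherwise the archive's file name is used.
    static String linkName(String archiveUri) {
        URI uri = URI.create(archiveUri);
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Placeholder URIs: "alice" and "deps" stand in for {USER} and {some_folder}.
        String plain = "hdfs:///user/alice/deps.zip";
        String aliased = "hdfs:///user/alice/deps.zip#deps";
        System.out.println(linkName(plain));   // deps.zip
        System.out.println(linkName(aliased)); // deps

        // The extracted folder is looked up relative to the current working
        // directory of the process, not relative to a home directory.
        Path extracted = Paths.get(linkName(aliased)).toAbsolutePath();
        System.out.println(extracted);
    }
}
```

Under these assumptions, a job would refer to the extracted files through a relative path such as deps/... rather than an absolute path under /home.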

Mkdirs failed to create hadoop.tmp.dir when running the WordCount example for Hadoop 1.2.1 on Windows 7 using Cygwin

I am new to Hadoop. I have installed Hadoop using Cygwin on Windows 7, and now I am trying to run the WordCount example. When I use the command:
bin/hadoop jar hadoop-examples-1.2.1.jar WordCount /user/Taniya/input /user/Taniya/output
this error is shown:
Mkdirs failed to create >\cygwin64\home\Taniya\tmp\hadoop-cyg_server
I have searched the web, but the solutions I found are not clear to me. I am looking for a clearly stated solution.

Using a different hadoop-mapreduce-client-core.jar to run jobs on a Hadoop cluster

I'm working on a Hadoop cluster with CDH4.2.0 installed and ran into this error. It has been fixed in later versions of Hadoop, but I don't have permission to update the cluster. Is there a way to tell Hadoop to use a patched jar when running my job, through command-line arguments such as
hadoop jar MyJob.jar -D hadoop.mapreduce.client=hadoop-mapreduce-client-core-2.0.0-cdh4.2.0.jar
where the new mapreduce-client-core.jar file is the patched jar from the ticket? Or must Hadoop be completely recompiled with this new jar? I'm new to Hadoop, so I don't know all the possible command-line options.
I'm not sure how that would work, because when you execute the hadoop command you are already executing code in the client jar.
Can you use MR1 instead? The ticket says this issue only occurs when you're using MR2, so unless you really need YARN, you're probably better off using the MR1 library to run your map/reduce jobs.
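On the -D syntax in the question: when a driver is run through Hadoop's ToolRunner, generic options such as -D key=value are parsed by GenericOptionsParser and stored as configuration properties. A property only changes behavior if some code reads it, which fits the answer above that a flag cannot replace the client jar that is already running. The simplified sketch below (hypothetical class GenericOptionsSketch; it is not Hadoop's parser) shows the idea with a plain map: leading -D options become key/value entries, and the remaining arguments are left for the job. mapreduce.job.reduces is a real MapReduce property used only as sample input; my.example.flag is made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GenericOptionsSketch {
    // Simplified, hypothetical parser: consumes leading "-D key=value"
    // (or "-Dkey=value") options into a configuration map and returns
    // the arguments that remain for the application itself.
    static List<String> parse(String[] args, Map<String, String> conf) {
        int i = 0;
        while (i < args.length) {
            String pair;
            if (args[i].equals("-D") && i + 1 < args.length) {
                pair = args[i + 1]; // "-D key=value" form
                i += 2;
            } else if (args[i].startsWith("-D") && args[i].length() > 2) {
                pair = args[i].substring(2); // "-Dkey=value" form
                i += 1;
            } else {
                break; // this sketch stops at the first non-option argument
            }
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value but got: " + pair);
            }
            // The value is only stored; nothing happens unless some code later reads this key.
            conf.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return Arrays.asList(args).subList(i, args.length);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        List<String> rest = parse(new String[] {
                "-D", "mapreduce.job.reduces=2", "-Dmy.example.flag=true", "input", "output"}, conf);
        System.out.println(conf); // {mapreduce.job.reduces=2, my.example.flag=true}
        System.out.println(rest); // [input, output]
    }
}
```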

Where are input/output files stored in Hadoop, and how do I execute a Java file in Hadoop?

Suppose I write a Java program and I want to run it in Hadoop. Then:
Where should the file be saved?
How do I access it from Hadoop?
Should I call it with the following command? hadoop classname
What is the command in Hadoop to execute the Java file?
The simplest answers I can think of to your questions are:
1) Anywhere
2, 3, 4) $HADOOP_HOME/bin/hadoop jar [path_to_your_jar_file]
A similar question was asked here: Executing helloworld.java in apache hadoop
It may seem complicated, but it's simpler than you might think!
Compile your map/reduce classes and your main class into a jar. Let's call this jar myjob.jar.
This jar does not need to include the Hadoop libraries, but it should include any other dependencies you have.
Your main method should set up and run your map/reduce job; here is an example.
Put this jar on any machine with the hadoop command line utility installed.
Run your main method using the hadoop command line utility:
hadoop jar myjob.jar
Hope that helps.
Where should the file be saved?
The data should be saved in HDFS. You will probably want to load it into the cluster from your data source using something like Apache Flume. The file can be placed anywhere, but a common location is your HDFS home directory, such as /user/hadoop/.
How do I access it from Hadoop?
SSH into the Hadoop cluster's head node as you would any standard Linux server.
To list the root of your HDFS:
hadoop fs -ls /
Should I call it with the following command? hadoop classname
You should use the hadoop command to access your data and run your programs; try hadoop help.
What is the command in Hadoop to execute the Java file?
hadoop jar MyJar.jar com.mycompany.MainDriver arg[0] arg[1] ...
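The answers use two forms of the command: hadoop jar myjob.jar and hadoop jar MyJar.jar com.mycompany.MainDriver arg[0] arg[1] .... hadoop jar hands off to Hadoop's RunJar (visible as RunJar.main in the Mahout trace at the top), which uses the Main-Class entry of the jar's manifest when there is one, and only otherwise treats the first argument after the jar as the class name. The plain-Java sketch below mimics that rule; RunJarDispatchSketch and dispatch are hypothetical names, not Hadoop's source. It shows why, when the manifest already names a main class, a class name typed on the command line reaches main() as args[0].

```java
import java.util.Arrays;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class RunJarDispatchSketch {
    // Which class would be launched, and with which arguments.
    static final class Launch {
        final String mainClass;
        final String[] mainArgs;

        Launch(String mainClass, String[] mainArgs) {
            this.mainClass = mainClass;
            this.mainArgs = mainArgs;
        }
    }

    // Sketch of the rule for "hadoop jar <jar> [mainClass] args...":
    // a Main-Class entry in the manifest wins; only if it is missing is the
    // first argument after the jar taken as the class name.
    static Launch dispatch(Manifest manifest, String[] argsAfterJar) {
        String mainClass = null;
        if (manifest != null) {
            mainClass = manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
        int first = 0;
        if (mainClass == null) {
            if (argsAfterJar.length < 1) {
                throw new IllegalArgumentException("no Main-Class in manifest and no class name given");
            }
            mainClass = argsAfterJar[first++];
        }
        // Accept slash-separated names such as com/mycompany/MainDriver as well.
        return new Launch(mainClass.replace('/', '.'),
                Arrays.copyOfRange(argsAfterJar, first, argsAfterJar.length));
    }

    public static void main(String[] args) {
        String[] cli = {"com.mycompany.MainDriver", "in", "out"};

        // Jar built without a Main-Class entry: the class name comes from the command line.
        Launch plain = dispatch(new Manifest(), cli);
        System.out.println(plain.mainClass + " " + Arrays.toString(plain.mainArgs));
        // prints com.mycompany.MainDriver [in, out]

        // Jar whose manifest already names the driver: the class name on the
        // command line is passed through to main() as args[0].
        Manifest withMain = new Manifest();
        withMain.getMainAttributes().put(Attributes.Name.MAIN_CLASS, "com.mycompany.MainDriver");
        Launch fromManifest = dispatch(withMain, cli);
        System.out.println(fromManifest.mainClass + " " + Arrays.toString(fromManifest.mainArgs));
        // prints com.mycompany.MainDriver [com.mycompany.MainDriver, in, out]
    }
}
```

In practice, this means either leaving Main-Class out of the manifest and naming the class on the command line, or setting Main-Class and omitting the class name.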
