What's wrong with my Hadoop configuration?

I've done all the setup that Hadoop requires, but something still seems wrong. For example:
I have a class Hello.class. When I use the command "java Hello" it works correctly, but when I try "hadoop Hello" it reports that it "cannot load or find the main class". However, when I use the "jar" command to package Hello.class into Hello.jar and then run "hadoop jar Hello.jar Hello", it works correctly, just as "java Hello" did.
What is wrong with my configuration?
In file etc/profile the following has been added:
export JAVA_HOME=/usr/jdk1.7.0_04
export HADOOP_INSTALL=/usr/hadoop-1.0.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_INSTALL/bin
export CLASS_PATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
I've added "export JAVA_HOME=/usr/jdk1.7.0_04" to the file "hadoop-env.sh".
I've changed core-site.xml, hdfs-site.xml, and mapred-site.xml accordingly.
Has anyone had the same problem?

The hadoop Hello command runs Hadoop and looks for a class named Hello on the current classpath, which doesn't contain your class.
Bundling your class into a jar and running hadoop jar myjar.jar Hello tells Hadoop to add the jar file myjar.jar to the classpath and then run the class named Hello (which is now on the classpath).
If you want to add a class to the classpath, configure the HADOOP_CLASSPATH environment variable.
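For example, a minimal sketch (the directory is illustrative; point HADOOP_CLASSPATH at wherever Hello.class actually lives):
export HADOOP_CLASSPATH=/path/to/classes
hadoop Hello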

Related

Change tmp directory while running yarn jar command

I am running an MR job using the yarn jar command, and it creates a temporary jar in the /tmp folder which fills up the entire disk. I want to redirect this jar to some other folder where I have more disk space. From this link, I came to know that we can change the path by setting the property mapred.local.dir for Hadoop version 1.x. I am using the following command to run the jar:
yarn jar myjar.jar MyClass myyml.yml arg1 -D mapred.local.dir="/grid/1/uie/facts"
The mapred.local.dir argument above doesn't change the path; the jar is still created in the /tmp folder.
Found a workaround to stop the unjarred files being written to the /tmp folder. Apparently this is not a configurable behaviour, so we can avoid 'hadoop jar' or 'yarn jar' (the RunJar utility) altogether by invoking the main class directly with the generated classpath:
java -cp $(hadoop classpath):my-fat-jar-with-all-dependencies.jar your.app.mainClass
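Depending on the Hadoop version, RunJar takes its unpack directory from the JVM property java.io.tmpdir, so as an alternative (an assumption worth testing rather than a guarantee) you can point the client JVM at the bigger disk before launching:
export HADOOP_CLIENT_OPTS="-Djava.io.tmpdir=/grid/1/uie/facts"
hadoop jar myjar.jar MyClass myyml.yml arg1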

Is maven JAR runnable on hadoop?

To produce a jar from a Hadoop MapReduce program (the MapReduce wordcount example), I used Maven.
I successfully ran 'clean' and 'install'.
It also builds successfully when run as a Java application with arguments (input and output),
and it produced the expected result.
The problem is that it does not run on Hadoop.
It gives the following error:
Exception in thread "main" java.lang.ClassNotFoundException: WordCount
Is maven JAR runnable on hadoop?
Maven is a build tool which creates a Java artifact. Any JAR that contains the Hadoop dependencies and names the class with the main() method in its manifest file should work with Hadoop.
Try running your JAR using the command below:
hadoop jar your-jar.jar wordcount input output
where "wordcount" is the name of the class with the main method, and "input" and "output" are the arguments.
Two things
1) I think you are missing the package details before the classname. Copy your package name and put it before the classname and it should work.
hadoop jar /home/user/examples.jar com.test.examples.WordCount /home/user/inputfolder /home/user/outputfolder
PS: If you are using a jar for which you don't have the source code, you can run
jar -tvf /home/user/examples.jar
and it will print all the classes with their folder names. Replace the "/" with "." (dot) and you get the package-qualified class name. But this needs the JDK (not the JRE) in the PATH.
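For example, a (hypothetical) listing entry like
com/test/examples/WordCount.class
corresponds to the fully qualified class name com.test.examples.WordCount.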
2) You are trying to run a MapReduce program from a Windows prompt. Are you sure you have Hadoop installed on Windows?

Can't run java program using classpath variable from terminal

I'm on macOS Sierra, FYI.
So I'm trying to set a classpath variable, which I've done using:
nano .bash_profile
and I'm able to set the variable as follows:
# setting CLASSPATH for Hello World as a test
CLASSPATH="/Users/jonvanderlaan/NetBeansProjects/HelloWorld/src/HelloWorld"
export CLASSPATH
# setting PATH for Java 1.8.0
PATH="/Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/bin/java:$
export PATH
When I echo the classpath, it shows me the correct classpath. However, when I try to compile and run a .java file in that folder, it doesn't work.
javac HelloWorld.java
javac: file not found: HelloWorld.java
Usage: javac <options> <source files>
use -help for a list of possible options
I am attaching a picture of what my directories look like.
Can someone tell me what my problem is here?
(also I'm fairly new to programming and VERY new to the command line. First time posting here. Please be nice!)
(edited to fix problems...still not working)
javac is the Java compiler. Try java -jar HelloWorld.jar to run your program.
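A minimal sketch of the usual fix, assuming HelloWorld.java really is in that NetBeans source folder (javac looks for source files relative to the current directory, not on the CLASSPATH):
cd /Users/jonvanderlaan/NetBeansProjects/HelloWorld/src/HelloWorld
javac HelloWorld.java
java HelloWorld
If the class declares a package, run it with its fully qualified name from the source root instead (e.g. java helloworld.HelloWorld).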

Running a hadoop job

It is the first time I'm running a job on Hadoop, and I started from the WordCount example. To run my job, I'm using this command:
hduser#ubuntu:/usr/local/hadoop$ bin/hadoop jar hadoop*examples*.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
and I think we should copy the jar file into /usr/local/hadoop. My first question is: what is the meaning of hadoop*examples*? And if we want to put our jar file in another location, for example /home/user/WordCountJar, what should I do? Thanks for your help in advance.
I think we should copy the jar file in /usr/local/hadoop
It is not mandatory. But if you have your jar at some other location, you need to specify the complete path while running your job.
My first question is that what is the meaning of hadoop*examples*?
hadoop*examples* is the name of your jar package that contains your MR job along with other dependencies. Here, * signifies that it can be any version, not specifically 0.19.2 or anything else. But I feel it should be hadoop-examples-*.jar and not hadoop*examples*.jar.
and if we want to locate our jar file in another location for example
/home/user/WordCountJar, what I should do?
If your jar is present in a directory other than the directory from where you are executing the command, you need to specify the complete path to your jar. Say,
bin/hadoop jar /home/user/WordCountJar/hadoop-*-examples.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
The asterisks around examples are just wildcard expansion, to account for different version numbers in the file name. For example: hadoop-0.19.2-examples.jar
You can use the full path to your jar like so:
bin/hadoop jar /home/user/hadoop-0.19.2-examples.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
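To see what the glob actually expands to on a given install (the path is illustrative), you can run:
ls /usr/local/hadoop/hadoop-*examples*.jar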

Send executable jar to hadoop cluster and run as "hadoop jar"

I commonly build an executable jar package with a main method and run it with the command line "hadoop jar Some.jar ClassWithMain input output".
In this main method, the Job and Configuration may be configured, and the job configuration has setters to specify the mapper or reducer class, like conf.setMapperClass(Mapper.class).
However, in the case of submitting a job remotely, I have to set the jar and the mapper (and other) classes using the Hadoop client API:
job.setJarByClass(HasMainMethod.class);
job.setMapperClass(Mapper_Class.class);
job.setReducerClass(Reducer_Class.class);
I want to programmatically transfer the jar from the client to the remote Hadoop cluster and execute it like the "hadoop jar" command does, so that the main method can specify the mapper and reducer.
So how can I deal with this problem?
hadoop is only a shell script. Eventually, hadoop jar invokes org.apache.hadoop.util.RunJar. What hadoop jar does is help you set up the CLASSPATH, so you can use RunJar directly.
For example,
String input = "...";
String output = "...";
org.apache.hadoop.util.RunJar.main(
new String[]{"Some.jar", "ClassWithMain", input, output});
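A self-contained sketch of the same idea (SubmitJar is a made-up name, and Some.jar/ClassWithMain are the placeholders from the question; note that RunJar.main declares throws Throwable, so the caller has to propagate it):
import org.apache.hadoop.util.RunJar;

public class SubmitJar {
    public static void main(String[] args) throws Throwable {
        // Runs Some.jar exactly as "hadoop jar Some.jar ClassWithMain input output" would:
        // RunJar unpacks the jar, adds it to the classpath, and invokes ClassWithMain's main().
        String input = "/user/me/input";    // illustrative HDFS paths
        String output = "/user/me/output";
        RunJar.main(new String[]{"Some.jar", "ClassWithMain", input, output});
    }
}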
However, you need to set the CLASSPATH correctly before you use it. A convenient way to get the correct CLASSPATH is the hadoop classpath command: type it and you will get the full CLASSPATH.
Then set up the CLASSPATH before you run your Java application. Note that java -jar ignores both -cp and the CLASSPATH environment variable, so put your own jar on the classpath as well and name the main class explicitly. For example,
export CLASSPATH=$(hadoop classpath):YourJar.jar:$CLASSPATH
java your.main.Class
