How to run the hadoop simple program through command line - hadoop

I'm new to the hadoop technologies .How to run the simple program through command line.I'm using windows environment.I install the Cygwin.Can you help me ...

Try the below URLs.
http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html
http://hayesdavis.net/2008/06/14/running-hadoop-on-windows/
If you are new to Hadoop, try using one of the IDE plugins. This will help you get started quickly.
http://karmasphere.com/Studio-Eclipse/quick-click-guide.html
http://wiki.apache.org/hadoop/EclipsePlugIn
FYI ..... Hadoop on Windows is not recommended for Production.

Are your program written in Java? If so, you need to compile your program and pack the compiled files into a Jar file. And then run the program with hadoop command:
${hadoop_home}/bin/hadoop jar ${your_program_jar_file} ${main_class_of_jar}

You can run the Hadoop commands from anywhere in the terminal/command line, but only if the $path variable is set properly.
The syntax would be like this:
hadoop fs -<command> or hdfs fs -<command>
You review the docs for more information.

Related

Not a valid jar when I was running an example of Hadoop

I am learning Hadoop recently. I am using sandbox on virtualbox. I downloaded a python script with mrjob frame and run the following command,
python RatingsBreakdown.py -r hadoop --hadoop-streaming-jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming-jar u.data
and then got this,
Running step 1 of 1...
Not a valid JAR: /usr/hdp/2.6.3.0-235/hadoop-mapreduce/hadoop-streaming-jar
lib/hadoop-mapreduce/hadoop-streaming.jar
This is the jar in my computer ,
a valid jar is end with .jar your command is has some mistakes .
you can open the folder to observe (cd foldername) the filename or try to use tab to completion your file name .In that way to reduce mistakes.

How can I fix ClassNotFounException when executing HBase java application from command line?

I don't know anything about bash, but i put together a script to help me run my Hbase java application:
#!/bin/bash
HADOOP_CLASSPATH="$(hbase classpath)"
hadoop jar my.jar my_pkg.my_class
When I run it I get a:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy
When I echo out the HADOOP_CLASSPATH I see that hbase-server-1.2.0-cdh5.8.0.jar is there...
Is the hadoop jar command ignoring the HADOOP_CLASSPATH?
Also I have tried to run the commands from the command-line instead of using my script. I get the same error.
The approach was inspired by this cloduera-question
The solution was to include the Hadoop class path on the same line. I am not certain what the difference is, but this works:
HADOOP_CLASSPATH="$(hbase classpath)" hadoop jar my.jar my_pkg.my_class

hbase installation on single node

i have installed hadoop single node on ubuntu 12.04. Now I am trying to install hbase over it (version 0.94.18). But i get the following errors(even though i have extracted it in the /usr/local/hbase):
Error: Could not find or load main class org.apache.hadoop.hbase.util.HBaseConfTool
Error: Could not find or load main class org.apache.hadoop.hbase.zookeeper.ZKServerTool
starting master, logging to /usr/lib/hbase/hbase-0.94.8/logs/hbase-hduser-master-ubuntu.out
nice: /usr/lib/hbase/hbase-0.94.8/bin/hbase: No such file or directory
cat: /usr/lib/hbase/hbase-0.94.8/conf/regionservers: No such file or directory
To resolve This Error
Download binary version of hbase
Edit conf file hbase-env.sh and hbase-site.xml
Set Up Hbase Home Directory
Start hbase By - Start-hbase.sh
Explanation To above Error:
Could not find or load main class your downloaded version does not have required jar
Hi can you tell when it is coming this error.
I think you gave environment set wrong
You should enter bellow command:
export HBASE_HOME="/usr/lib/hbase/hbase-0.94.18"
Then try hbase it will work.
If you want shell script you can download this lik :: https://github.com/tonyreddy/Apache-Hadoop1.2.1-SingleNode-installation-shellscript
It have hadoop, hive, hbase, pig.
Thank
Tony.
It is not recommended to run hbase from the source distribution directly instead you have to download the binary distribution as they have mentioned in their official site, follow the same instructions and you will get it up.
You could try installing the version 0.94.27
Download it from : h-base 0.94.27 dowload
This one worked for me.
Follow the instruction specified in :
Hbase installation guide
sed "s/<\/configuration>/<property>\n<name>hbase.rootdir<\/name>\n<value>hdfs:\/\/'$c':54310\/hbase<\/value>\n<\/property>\n<property>\n<name>hbase.cluster.distributed<\/name>\n<value>true<\/value>\n<\/property>\n<property>\n<name>hbase.zookeeper.property.clientPort<\/name>\n<value>2181<\/value>\n<\/property>\n<property>\n<name>hbase.zookeeper.quorum<\/name>\n<value>'$c'<\/value>\n<\/property>\n<\/configuration>/g" -i.bak hbase/conf/hbase-site.xml
sed 's/localhost/'$c'/g' hbase/conf/regionservers -i
sed 's/#\ export\ HBASE_MANAGES_ZK=true/export\ HBASE_MANAGES_ZK=true/g' hbase/conf/hbase-env.sh -i
Yes just type this tree commands and you need change replace $c to your hostname.
Then try it will work.

Using different hadoop-mapreduce-client-core.jar to run hadoop cluster

I'm working on a hadoop cluster with CDH4.2.0 installed and ran into this error. It's been fixed in later versions of hadoop but I don't have access to update the cluster. Is there a way to tell hadoop to use this jar when running my job through the command line arguments like
hadoop jar MyJob.jar -D hadoop.mapreduce.client=hadoop-mapreduce-client-core-2.0.0-cdh4.2.0.jar
where the new mapreduce-client-core.jar file is the patched jar from the ticket. Or must hadoop be completely recompiled with this new jar? I'm new to hadoop so I don't know all the command line options that are possible.
I'm not sure how that would work as when you're executing the hadoop command you're actually executing code in the client jar.
Can you not use MR1? The issue says this issue only occurs when you're using MR2, so unless you really need Yarn you're probably better using the MR1 library to run your map/reduce.

hadoop - Where are input/output files stored in hadoop and how to execute java file in hadoop?

Suppose I write a java program and i want to run it in Hadoop, then
where should the file be saved?
how to access it from hadoop?
should i be calling it by the following command? hadoop classname
what is the command in hadoop to execute the java file?
The simplest answers I can think of to your questions are:
1) Anywhere
2,3,4)$HADOOP_HOME/bin/hadoop jar [path_to_your_jar_file]
A similar question was asked here Executing helloworld.java in apache hadoop
It may seem complicated, but it's simpler than you might think!
Compile your map/reduce classes, and your main class into a jar. Let's call this jar myjob.jar.
This jar does not need to include the Hadoop libraries, but it should include any other dependencies you have.
Your main method should set up and run your map/reduce job, here is an example.
Put this jar on any machine with the hadoop command line utility installed.
Run your main method using the hadoop command line utility:
hadoop jar myjob.jar
Hope that helps.
where should the file be saved?
The data should be saved in "hdfs". You will want to probably load it into the cluster from your data source using something like Apache Flume. The file can be placed anywhere but most home is /user/hadoop/
how to access it from hadoop?
SSH into the hadoop cluster headnode like a standard linux server.
To list your hadoop root hdfs
hadoop fs -ls /
should i be calling it by the following command? hadoop classname
You should be using the hadoop command to access your data and run your programs, try hadoop help
what is the command in hadoop to execute the java file?
hadoop -jar MyJar.jar com.mycompany.MainDriver arg[0] arg[1] ...

Resources