Run hadoop locally - hadoop

I have installed Hadoop on my computer and I am learning how to use it from cmd. However, it doesn't seem to recognize my commands. When I type the start-all.cmd command, it opens YARN and DFS while printing:
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
(After this, it claims there is no main class.)
Not to mention that it doesn't recognize the hadoop version command:
C:\hadoop\sbin>hadoop version
Error: main class not found
Also, I am unable to connect to the localhost server in the browser, even though I configured it as I was told.
As you may be able to tell, this is my first time using Hadoop. Is there a book or a web page where I can learn to use it locally?

Related

Is there a way to load the install-interpreter.sh file in EMR in order to load 3rd party interpreters?

I have an Apache Zeppelin notebook running and I'm trying to load the jdbc and/or postgres interpreter to my notebook in order to write to a postgres DB from Zeppelin.
The main resource to load new interpreters here tells me to run the code below to get other interpreters:
./bin/install-interpreter.sh --all
However, when I run this command in the EMR terminal, I find that the EMR cluster does not come with an install-interpreter.sh executable file.
What is the recommended path?
1. Should I find the install-interpreter.sh file and load that to the EMR cluster under ./bin/?
2. Is there an EMR configuration on start time that would enable the install-interpreter.sh file?
Currently, all tutorials and documentation assume that you can run the install-interpreter.sh file.
The solution is not to run the command below from the root directory (that is, where ./ is /):
./bin/install-interpreter.sh --all
Instead, on EMR, run the command above from the Zeppelin directory, which on the EMR cluster is /usr/lib/zeppelin
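As a minimal sketch of the fix above, the command can be wrapped so that it always runs from the Zeppelin directory. The function name install_zeppelin_interpreters is made up for illustration, and whether your user needs elevated permissions on the cluster is an open question that this sketch does not address.

```shell
# Hypothetical helper: run install-interpreter.sh from the Zeppelin home
# directory instead of the current directory.
# $1 is the Zeppelin home; it defaults to /usr/lib/zeppelin, the EMR
# location given in the answer above.
install_zeppelin_interpreters() {
    zeppelin_home="${1:-/usr/lib/zeppelin}"
    if [ ! -x "$zeppelin_home/bin/install-interpreter.sh" ]; then
        echo "install-interpreter.sh not found under $zeppelin_home" >&2
        return 1
    fi
    # Subshell so the caller's working directory is left unchanged.
    ( cd "$zeppelin_home" && ./bin/install-interpreter.sh --all )
}

# Usage on the EMR master node:
#   install_zeppelin_interpreters
```

The early check gives a clearer message than the shell's own "No such file or directory" when the script is missing from the expected location.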

start-all.sh command not found

I have just installed the Cloudera VM setup for Hadoop. But when I open the command prompt and try to start all the Hadoop daemons with the command 'start-all.sh', I get an error stating "bash: start-all.sh: command not found".
I have tried 'start-dfs.sh' too, but it gives the same error. When I use the 'jps' command, I can see that none of the daemons have been started.
You can find the start-all.sh and start-dfs.sh scripts in the bin or sbin folders. To locate them, go to the Hadoop installation folder and run this command:
find . -name 'start-all.sh' # Finds files named start-all.sh
Then you can start all the daemons by giving the full path: bash /path/to/start-all.sh
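The two steps above can be combined into a small sketch. find_hadoop_script is a hypothetical helper name, and the installation path in the usage comment is only an example, so adjust it to wherever Hadoop lives on your machine.

```shell
# Hypothetical helper: print the path of the first file named $2 found
# anywhere under directory $1 (prints nothing if there is no match).
find_hadoop_script() {
    find "$1" -type f -name "$2" 2>/dev/null | head -n 1
}

# Usage (the installation path is an example, not a known default):
#   script=$(find_hadoop_script /usr/local/hadoop start-all.sh)
#   [ -n "$script" ] && bash "$script"
```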
If you're using the QuickStart VM, then the right way to start the cluster (as @cricket_007 hinted) is to restart it in the Cloudera Manager UI. The start-all.sh scripts will not work, since they only apply to the Hadoop servers (NameNode, DataNode, ResourceManager, NodeManager ...) and not to all the services in the ecosystem (like Hive, Impala, Spark, Oozie, Hue ...).
You can refer to the YouTube video and the official documentation Starting, Stopping, Refreshing, and Restarting a Cluster

"No such file or directory" in hadoop while executing WordCount program using jar command

I am new to Hadoop and am trying to run the WordCount program.
Things I did so far -
Set up a Hadoop single-node cluster by following the link below.
http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php
Wrote the word count program by following the link below.
https://kishorer.in/2014/10/22/running-a-wordcount-mapreduce-example-in-hadoop-2-4-1-single-node-cluster-in-ubuntu-14-04-64-bit/
The problem appears when I execute the last line to run the program -
hadoop jar wordcount.jar /usr/local/hadoop/input /usr/local/hadoop/output
Following is the error I get -
The directory seems to be present
The file is also present in the directory with contents
Finally, on a side note, I also tried the following directory structure in the jar command.
No avail! :/
I would really appreciate it if someone could guide me here!
Regards,
Paul Alwin
Your first image is using input from the local Hadoop installation directory, /usr
If you want to use that data on your local filesystem, you can specify file:///usr/...
Otherwise, if you're running in pseudo-distributed mode, HDFS has been set up, and /usr does not exist in HDFS unless you explicitly created it there.
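To make the two options concrete, here is a hedged sketch. stage_input is a hypothetical helper built on the standard hdfs dfs -mkdir -p and hdfs dfs -put subcommands. It assumes HDFS is running, that the local input directory contains at least one file, and that your user may write to the target HDFS directory.

```shell
# Hypothetical helper: copy the files in local directory $1 into HDFS
# directory $2, creating $2 (and any missing parents) first.
stage_input() {
    hdfs dfs -mkdir -p "$2" && hdfs dfs -put "$1"/* "$2"
}

# Option 1: read the input straight from the local filesystem.
#   hadoop jar wordcount.jar file:///usr/local/hadoop/input /usr/local/hadoop/output
# Option 2: stage the input in HDFS first, then use plain HDFS paths.
#   stage_input /usr/local/hadoop/input /usr/local/hadoop/input
#   hadoop jar wordcount.jar /usr/local/hadoop/input /usr/local/hadoop/output
```

In option 2 the first argument is a path on the local disk and the second is a path inside HDFS; they only look alike because the question reuses the same directory name.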
Based on the stack trace, I believe the error comes from the /app/hadoop/ staging directory path not existing, or from its permissions not allowing your current user to run commands against that path.
Suggestion: Hortonworks and Cloudera offer pre-built VirtualBox images and lots of tutorial resources. Most companies will have Hadoop from one of those vendors, so it's better to get familiar with that rather than mess around with having to install Hadoop yourself from scratch, in my opinion

Unable to set up pseudo-distributed Hadoop cluster

I am using CentOS 7. I downloaded and untarred Hadoop 2.4.0 and followed the instructions in the link Hadoop 2.4.0 setup
Ran the following command.
./hdfs namenode -format
Got this error :
Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode
I see a number of posts with the same error with no accepted answers and I have tried them all without any luck.
This error can occur if the necessary jarfiles are not readable by the user running the "./hdfs" command or are misplaced so that they can't be found by hadoop/libexec/hadoop-config.sh.
Check the permissions on the jarfiles under hadoop-install/share/hadoop/*:
ls -l share/hadoop/*/*.jar
and if necessary, chmod them as the owner of the respective files to ensure they're readable. Something like chmod 644 should be sufficient to at least check if that fixes the initial problem. For the more permanent fix, you'll likely want to run the hadoop commands as the same user that owns all the files.
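As a rough sketch of that check, the function below prints only the jars the current user cannot read, which is quicker to scan than the full ls -l listing. list_unreadable_jars is a hypothetical name, and it deliberately uses the same one-level glob as the ls command above, so jars in deeper subdirectories are not covered.

```shell
# Hypothetical helper: print every jar matching $1/share/hadoop/*/*.jar
# that the current user cannot read. $1 is the Hadoop install directory.
list_unreadable_jars() {
    for jar in "$1"/share/hadoop/*/*.jar; do
        [ -e "$jar" ] || continue              # glob matched nothing
        [ -r "$jar" ] || printf '%s\n' "$jar"  # report unreadable jars
    done
}

# Usage, run from the Hadoop installation directory:
#   list_unreadable_jars .
```

Empty output means every matched jar is readable, so the cause of the error is more likely a misplaced jar than a permissions problem.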
I followed the link Setup hadoop 2.4.0
and I was able to get over the error message.
It seems the documentation on the Hadoop site is not complete.

How to run a simple Hadoop program from the command line

I'm new to Hadoop. How do I run a simple program from the command line? I'm using a Windows environment and I have installed Cygwin. Can you help me?
Try the below URLs.
http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html
http://hayesdavis.net/2008/06/14/running-hadoop-on-windows/
If you are new to Hadoop, try using one of the IDE plugins. This will help you get started quickly.
http://karmasphere.com/Studio-Eclipse/quick-click-guide.html
http://wiki.apache.org/hadoop/EclipsePlugIn
FYI: Hadoop on Windows is not recommended for production.
Is your program written in Java? If so, you need to compile your program and pack the compiled files into a jar file. Then run the program with the hadoop command:
${hadoop_home}/bin/hadoop jar ${your_program_jar_file} ${main_class_of_jar}
You can run the Hadoop commands from anywhere in the terminal/command line, but only if the PATH variable is set properly.
The syntax would be like this:
hadoop fs -<command> or hdfs dfs -<command>
You can review the docs for more information.
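The PATH requirement can be checked before calling Hadoop at all. A minimal sketch follows; require_on_path is a hypothetical helper name, and it relies only on the shell built-in command -v.

```shell
# Hypothetical helper: succeed only if command $1 can be found on PATH.
require_on_path() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Usage:
#   require_on_path hadoop && hadoop fs -ls /
```

If the check fails, either add the Hadoop bin directory to PATH or call the command by its full path, as in the ${hadoop_home}/bin/hadoop example above.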

Resources