Hadoop WordCount program gets stuck and doesn't finish running - hadoop

I have configured Hadoop on Windows 10. When I try to run the WordCount program, it doesn't finish the job.
What can I do to fix it?
Note: I have made a small change to the original program; the input and output paths are hard-coded and compiled with the program, so they don't need to be provided as arguments.

Related

See print in Python script running on Spark with spark-submit

I have to test some code using Spark and I'm pretty new to it.
The code I have runs an ETL script on a cluster. The ETL script is written in Python and has several prints in it, but I'm unable to see those prints. The Python script is added to spark-submit with the --py-files flag. I don't know whether those prints are unreachable because they happen in the YARN executors, and whether I should change them to logs and use log4j, or add them to an accumulator reachable by the driver.
Any suggestions would help.
The final goal is to see how the execution of the code is going. I don't know if simple prints are the best solution, but that is what was already in the code I was given to test.
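A minimal sketch of where that output usually ends up, assuming the job runs on YARN with log aggregation enabled (the application ID below is a placeholder): print output from code running in executors lands in the executors' stdout logs, which can be pulled after the job with
yarn logs -applicationId application_1234567890123_0001
The application ID is printed by spark-submit when the job starts, and the same logs are also reachable per executor through the YARN web UI.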

Can't run a MapReduce job on Hadoop

I'm trying to run a MapReduce job taken from the internet. The job takes a 'points.dat' file as input and performs k-means clustering on it. It should produce a 'centroids.dat' file and a file with the points matched to their own centroid. A couple of months ago this was working, but now I'm trying to re-execute it on a new installation.
I ran
bin/hdfs dfs -copyFromLocal ..//..//../home/<myusername>/Downloads/points.dat
Everything is fine and the file appears in the web UI under the /user// path on HDFS. jps is OK.
The jar expects these args:
<input> <output> <n clusters>
so I ran
bin/hadoop jar ../../../home/<myusername>/Downloads/kmeans.jar /user/<myusername>/ /out 3
It creates a "centroids.dat" file in /user/ and an out/ directory. As far as I can understand, it tries to re-read "centroids.dat" during execution. So it ends with failures like
"failed creating symlink /tmp/hadoop-<myusername>/mapred/local/1466809349241/centroids.dat <- /usr/local/hadoop/centroids.dat
so Java raises a FileNotFoundException.
I tried to shorten the question as much as possible. If more info is needed, no problem for me.
I think you are missing the main class in your command:
bin/hadoop jar kmeans.jar MainClass input output
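For example, assuming the driver class inside the jar is named KMeans (a guess; jar tf kmeans.jar will list the real class and package), the call would look like
bin/hadoop jar ../../../home/<myusername>/Downloads/kmeans.jar KMeans /user/<myusername>/points.dat /out 3
If the jar's manifest already declares a Main-Class, the class name can be omitted, which would explain why the shorter command used to work on the old installation.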

Running MapReduce code that uses ZooKeeper

I want to ask how to execute MapReduce Java code that uses ZooKeeper.
My first code just creates a znode and has each mapper modify it.
So I modified the WordCount code just to test ZooKeeper for the first time.
When I run it from the Eclipse console, everything goes well and I can see the changes to the value of the znode, etc.
However, when I try to execute it from the Linux command line:
bin/hadoop jar ./myjar.jar algo.WordCount /input.txt /out
I get the following error:
Error: java.lang.ClassNotFoundException: org.apache.zookeeper.Watcher
Although I added the path of the jar file using conf.set("mapred.jar", "...."); in the MapReduce code, I don't know why it does not recognize the ZooKeeper classes.
Any idea?
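A minimal sketch of how the ZooKeeper jar is usually shipped along with a job, assuming it sits at /path/to/zookeeper.jar (a placeholder path) and the driver goes through ToolRunner so that generic options are parsed:
export HADOOP_CLASSPATH=/path/to/zookeeper.jar    # make the class visible to the client JVM
bin/hadoop jar ./myjar.jar algo.WordCount -libjars /path/to/zookeeper.jar /input.txt /out
-libjars copies the jar into the distributed cache so the mapper and reducer JVMs can load org.apache.zookeeper.Watcher; setting mapred.jar only points at your own jar, not at its dependencies. Building a fat jar that bundles ZooKeeper would work as well.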

How to run a simple Hadoop program through the command line

I'm new to the Hadoop technologies. How do I run a simple program through the command line? I'm using a Windows environment and I installed Cygwin. Can you help me ...
Try the below URLs.
http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html
http://hayesdavis.net/2008/06/14/running-hadoop-on-windows/
If you are new to Hadoop, try using one of the IDE plugins. This will help you get started quickly.
http://karmasphere.com/Studio-Eclipse/quick-click-guide.html
http://wiki.apache.org/hadoop/EclipsePlugIn
FYI: Hadoop on Windows is not recommended for production.
Is your program written in Java? If so, you need to compile your program and pack the compiled files into a jar file, and then run the program with the hadoop command:
${hadoop_home}/bin/hadoop jar ${your_program_jar_file} ${main_class_of_jar}
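For example, a minimal sketch assuming a single source file WordCount.java whose main class is WordCount (both names are placeholders for your own program):
mkdir -p classes
javac -classpath "$(${hadoop_home}/bin/hadoop classpath)" -d classes WordCount.java
jar cf wordcount.jar -C classes .
${hadoop_home}/bin/hadoop jar wordcount.jar WordCount /input /output
Here hadoop classpath prints the jars needed at compile time, and the last two arguments are the HDFS input and output paths the job will use.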
You can run the Hadoop commands from anywhere in the terminal/command line, but only if the $PATH variable is set properly.
The syntax would be like this:
hadoop fs -<command> or hdfs dfs -<command>
You can review the docs for more information.
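For instance, assuming Hadoop is installed under /usr/local/hadoop (adjust to your own install path):
export PATH=$PATH:/usr/local/hadoop/bin
hadoop fs -ls /
After that, the same hadoop and hdfs commands work from any directory.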

Running Hadoop examples halts in pseudo-distributed mode

Everything runs well in standalone mode, and when going to pseudo-distributed mode, HDFS works well: I can put files into HDFS and browse them, and I have also checked that there is one DataNode in the live nodes list.
However, when I run bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+', the program just halts there without producing any error. And from http://ereg.adobe.com:50070/dfsnodelist.jsp?whatNodes=LIVE I can see that nothing has ever been run on that DataNode.
I followed the configuration in the tutorial for those XML conf files. Does anyone have any idea about what other mistakes I might have made? BTW, I'm running this on Mac OS X.
By halt, do you mean it hangs, or that it just silently returns? For MapReduce issues, you should check the JobTracker's webpage (at port 50030) to see the status of the submitted job.
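If the web UI is not reachable, the submitted jobs can also be inspected from the command line; this assumes an older, JobTracker-based Hadoop install, which matches the port 50030 mentioned above, and the job ID below is a placeholder:
bin/hadoop job -list
bin/hadoop job -status job_201601010000_0001
A job that stays at 0% map progress there would suggest the cluster is not handing out tasks rather than a problem with the example itself.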
