No output files from mahout - hadoop

I am running a Mahout RecommenderJob on Hadoop in Syncfusion. I get the following, but no output... it seems to run indefinitely.
Does anyone have an idea why I am not getting an output.txt from this? Why does it seem to run indefinitely?

I suspect this could be due to insufficient disk space on your machine; in that case, I'd suggest cleaning up your disk space and trying again from your end.
Alternatively, I'd also suggest using the Syncfusion Cluster Manager, with which you can form a cluster of multiple nodes/machines so that there is sufficient memory available to execute your job.
-Ramkumar

I've tested the same MapReduce job that you're trying to execute using Syncfusion BigData Studio, and it worked for me.
Please find the input details I've used below.
Command:
hadoop jar E:\mahout-examples-0.12.2-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_COOCCURRENCE --input=/Input.txt --output=output
Sample input (Input.txt):
For input data, I've used the data available on the Apache Mahout site (see the link below) and saved it in a text file.
http://mahout.apache.org/users/recommender/userbased-5-minutes.html
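For reference, the RecommenderJob expects its input as comma-separated userID,itemID,preference triples, one per line. The values below are only illustrative and are not the exact data from the linked tutorial:
1,101,5.0
1,102,3.0
2,101,2.0
2,103,4.0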
I also noticed that the word "COOCCURRENCE" is misspelled in your command. Please correct it, or you could face a "Class Not Found Exception".
Output:
Please find the generated output below.
-Ramkumar :)

Related

"No such file or directory" in hadoop while executing WordCount program using jar command

I am new to Hadoop and am trying to execute the WordCount problem.
Things I have done so far:
Set up the Hadoop single-node cluster following the link below.
http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php
Wrote the word count program following the link below.
https://kishorer.in/2014/10/22/running-a-wordcount-mapreduce-example-in-hadoop-2-4-1-single-node-cluster-in-ubuntu-14-04-64-bit/
The problem is that when I execute the last command to run the program -
hadoop jar wordcount.jar /usr/local/hadoop/input /usr/local/hadoop/output
Following is the error I get -
The directory seems to be present.
The file is also present in the directory, with contents.
Finally, on a side note, I also tried the following directory structure in the jar command.
No avail! :/
I would really appreciate it if someone could guide me here!
Regards,
Paul Alwin
Your first image is using input from the local Hadoop installation directory, /usr
If you want to use that data on your local filesystem, you can specify file:///usr/...
Otherwise, if you're running in pseudo-distributed mode, HDFS has been set up, and /usr does not exist in HDFS unless you explicitly created it there.
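As a rough sketch of both options (the jar name and input path are taken from your command; the /user/hduser directory is only an example, so substitute your own HDFS user directory):
hadoop jar wordcount.jar file:///usr/local/hadoop/input file:///usr/local/hadoop/output
or, to run against HDFS instead:
hdfs dfs -mkdir -p /user/hduser/input
hdfs dfs -put /usr/local/hadoop/input/* /user/hduser/input
hadoop jar wordcount.jar /user/hduser/input /user/hduser/output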
Based on the stacktrace, I believe the error comes from the /app/hadoop/ staging directory path not existing, or from its permissions not allowing your current user to run commands against that path.
Suggestion: Hortonworks and Cloudera offer pre-built VirtualBox images and lots of tutorial resources. Most companies will have Hadoop from one of those vendors, so it's better to get familiar with that than to mess around with installing Hadoop yourself from scratch, in my opinion.

How do you transfer files onto the Hadoop FS (HDFS) on the Windows cmdline without Cygwin?

I have zero experience with Hadoop, but suddenly have to use it at work with Spark on Windows. My question has been asked a few times here, but I could never quite get the syntax for what I need. I'm trying to transfer a simple file called:
gensortText.txt which let's say is at c:\gensortText.txt
I know you can use hadoop fs -copyFromLocal. I've tried these things:
hadoop fs -copyFromLocal C:\gensortText.txt hdfs://0.0.0.0:19000
ERROR: Relative path in absolute URI.
hadoop fs -copyFromLocal C:\gensortOutText.txt \tmp\hadoop-Administrator\dfs
ERROR: copyFromLocal: `tmphadoop-Administratordfs': No such file or directory
and a number of other variations with hdfs: and using the tmp directory which all returned similar errors.
I have Hadoop in c:\deploy as suggested in the Hadoop2Windows guide (which works and allowed me to run Hadoop; I can access the Web GUI and all that). Hadoop has created my new HDFS at c:\temp. Please, someone help me figure out how to transfer files into the system. It could even be done manually if that's possible, but that doesn't seem to work, as nothing shows up in the Web GUI when I go to "Utilities->Browse the Filesystem". Nothing shows up there at all.
Can someone please help? I can provide any relevant information, but I'm so new to this that I don't really know what would be helpful. I think it's just my syntax for the command-line tool. Can someone give me a concrete example of how to use hadoop fs -copyFromLocal or another simple way to do this? Sorry for my ignorance on the subject, and thanks for any help.
To be able to run Hadoop commands on Windows, you need to have winutils installed and visible to the Hadoop process.
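As a rough sketch once winutils is in place: giving -copyFromLocal an explicit destination path (rather than a bare hdfs:// URI or a Windows-style backslash path) usually avoids the errors you saw; the /tmp directory here is only an example:
hadoop fs -mkdir -p /tmp
hadoop fs -copyFromLocal c:\gensortText.txt /tmp/gensortText.txt
hadoop fs -ls /tmp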

Unable to set up a pseudo-distributed Hadoop cluster

I am using CentOS 7. I downloaded and untarred Hadoop 2.4.0 and followed the instructions in the link Hadoop 2.4.0 setup.
I ran the following command:
./hdfs namenode -format
Got this error:
Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode
I see a number of posts with the same error and no accepted answers, and I have tried them all without any luck.
This error can occur if the necessary jarfiles are not readable by the user running the "./hdfs" command or are misplaced so that they can't be found by hadoop/libexec/hadoop-config.sh.
Check the permissions on the jarfiles under: hadoop-install/share/hadoop/*:
ls -l share/hadoop/*/*.jar
and if necessary, chmod them as the owner of the respective files to ensure they're readable. Something like chmod 644 should be sufficient to at least check if that fixes the initial problem. For the more permanent fix, you'll likely want to run the hadoop commands as the same user that owns all the files.
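As a minimal sketch of the check and fix, run from the top of the untarred Hadoop 2.4.0 directory (the paths assume the standard tarball layout):
ls -l share/hadoop/hdfs/*.jar
chmod 644 share/hadoop/*/*.jar
bin/hdfs namenode -format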
I followed the link Setup hadoop 2.4.0 and was able to get past the error message.
It seems the documentation on the Hadoop site is not complete.

Hadoop streaming with python on Windows

I'm using Hortonworks HDP for Windows and have it successfully configured with a master and 2 slaves.
I'm using the following command:
bin\hadoop jar contrib\streaming\hadoop-streaming-1.1.0-SNAPSHOT.jar -files file:///d:/dev/python/mapper.py,file:///d:/dev/python/reducer.py -mapper "python mapper.py" -reducer "python reduce.py" -input /flume/0424/userlog.MDAC-HD1.MDAC.local..20130424.1366789040945 -output /flume/o%1 -cmdenv PYTHONPATH=c:\python27
The mapper runs through fine, but the log reports that the reduce.py file wasn't found. In the exception, it looks like the Hadoop task runner is creating the symlink for the reducer to the mapper.py file.
When I checked the job configuration file, I noticed that mapred.cache.files is set to:
hdfs://MDAC-HD1:8020/mapred/staging/administrator/.staging/job_201304251054_0021/files/mapper.py#mapper.py
It looks like, although the reduce.py file is being added to the jar file, it's not being included in the configuration correctly and can't be found when the reducer tries to run.
I think my command is correct; I've tried using -file parameters instead, but then neither file is found.
Can anyone see or know of an obvious reason?
Please note, this is on Windows.
EDIT: I've just run it locally and it worked; it looks like my problem may be with the copying of the files around the cluster.
Still welcome input!
Well, that's embarrassing... my first question and I answer it myself.
I found the problem by renaming the Hadoop conf file to force default settings, which meant the local job tracker.
The job ran properly and gave me the room to work out what the problem is; it looks like communication around the cluster isn't as complete as it needs to be.
Looking at your command, it shows "file:///d:/dev/python/reducer.py" for the -files option, but you specify reduce.py for -reducer. Could this cause the problem? Sorry, I am not sure.
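For reference, here is your command with the -reducer argument matched to the reducer.py file shipped via -files (assuming the script on disk is actually named reducer.py):
bin\hadoop jar contrib\streaming\hadoop-streaming-1.1.0-SNAPSHOT.jar -files file:///d:/dev/python/mapper.py,file:///d:/dev/python/reducer.py -mapper "python mapper.py" -reducer "python reducer.py" -input /flume/0424/userlog.MDAC-HD1.MDAC.local..20130424.1366789040945 -output /flume/o%1 -cmdenv PYTHONPATH=c:\python27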

How to run a simple Hadoop program through the command line

I'm new to Hadoop technologies. How do I run a simple program through the command line? I'm using a Windows environment and have installed Cygwin. Can you help me?
Try the below URLs.
http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html
http://hayesdavis.net/2008/06/14/running-hadoop-on-windows/
If you are new to Hadoop, try using one of the IDE plugins. This will help you get started quickly.
http://karmasphere.com/Studio-Eclipse/quick-click-guide.html
http://wiki.apache.org/hadoop/EclipsePlugIn
FYI: Hadoop on Windows is not recommended for production.
Is your program written in Java? If so, you need to compile it and pack the compiled files into a jar file, and then run the program with the hadoop command:
${hadoop_home}/bin/hadoop jar ${your_program_jar_file} ${main_class_of_jar}
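As a minimal sketch of that flow for a single-class program, assuming a Unix-style shell (e.g. Cygwin) where the hadoop classpath command can supply the compile-time classpath; WordCount is only a placeholder class name:
javac -classpath "$(hadoop classpath)" -d classes WordCount.java
jar cf wordcount.jar -C classes .
hadoop jar wordcount.jar WordCount /input /output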
You can run the Hadoop commands from anywhere in the terminal/command line, but only if the $PATH variable is set properly.
The syntax would be like this:
hadoop fs -<command> or hdfs dfs -<command>
You can review the docs for more information.
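A few common forms of those filesystem commands, with /user/hadoop used only as an example path:
hadoop fs -ls /
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put localfile.txt /user/hadoop/input
hdfs dfs -cat /user/hadoop/input/localfile.txt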
