I'm trying to make an item based recommendation but i got this error:
Lore:~ lorenzoferri$ mahout parallelALS --input prova1.csv --output output.csv --lambda 0.1 --implicitFeedback true --alpha 0.8 --numFeatures 2 --numIterations 5 --numThreadsPerSolver 1 --tempDir tmp
No MAHOUT_CONF_DIR found
Running on hadoop, using /usr/local/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /usr/local/Cellar/mahout/0.9/libexec/mahout-examples-0.9-job.jar
15/02/09 17:32:25 WARN driver.MahoutDriver: No parallelALS.props found on classpath, will use command-line arguments only
15/02/09 17:32:26 INFO common.AbstractJob: Command line arguments: {--alpha=[0.8], --endPhase=[2147483647], --implicitFeedback=[true], --input=[prova1.csv], --lambda=[0.1], --numFeatures=[2], --numIterations=[5], --numThreadsPerSolver=[1], --output=[output.csv], --startPhase=[0], --tempDir=[tmp]}
15/02/09 17:32:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/09 17:32:26 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
15/02/09 17:32:26 INFO Configuration.deprecation: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
15/02/09 17:32:26 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.mahout.common.HadoopUtil.getCustomJobName(HadoopUtil.java:174)
at org.apache.mahout.common.AbstractJob.prepareJob(AbstractJob.java:614)
at org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob.run(ParallelALSFactorizationJob.java:166)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob.main(ParallelALSFactorizationJob.java:113)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Lore:~ lorenzoferri$
I also tried:
Lore:~ lorenzoferri$ mahout recommenditembased -i '/Users/lorenzoferri/Desktop/prova1.csv' -o '/Users/lorenzoferri/Desktop/output.csv' -n 100 -s SIMILARITY_COOCCURRENCE
No MAHOUT_CONF_DIR found
Running on hadoop, using /usr/local/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /usr/local/Cellar/mahout/0.9/libexec/mahout-examples-0.9-job.jar
15/02/09 17:37:02 WARN driver.MahoutDriver: No recommenditembased.props found on classpath, will use command-line arguments only
15/02/09 17:37:02 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[/Users/lorenzoferri/Desktop/prova1.csv], --maxPrefsInItemSimilarity=[500], --maxPrefsPerUser=[10], --maxSimilaritiesPerItem=[100], --minPrefsPerUser=[1], --numRecommendations=[100], --output=[/Users/lorenzoferri/Desktop/output.csv], --similarityClassname=[SIMILARITY_COOCCURRENCE], --startPhase=[0], --tempDir=[temp]}
15/02/09 17:37:02 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[/Users/lorenzoferri/Desktop/prova1.csv], --minPrefsPerUser=[1], --output=[temp/preparePreferenceMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]}
15/02/09 17:37:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/09 17:37:03 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
15/02/09 17:37:03 INFO Configuration.deprecation: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
15/02/09 17:37:03 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.mahout.common.HadoopUtil.getCustomJobName(HadoopUtil.java:174)
at org.apache.mahout.common.AbstractJob.prepareJob(AbstractJob.java:614)
at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:73)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:164)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:322)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I installed mahout via brew, on mac os x 10.10, but i got the same problem on ubuntu. On the web i can only find tutorial on eclipse, but i would like to run it through the terminal.
No MAHOUT_CONF_DIR found
add to your .bash_profile or .zshrc:
export MAHOUT=/path/to/mahout
export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
In order to run Mahout locally
add to your .bash_profile or .zshrc:
export MAHOUT_LOCAL="not-an-empty-string"
set to anything other than an empty string to force mahout to run locally
even if HADOOP_CONF_DIRand HADOOP_HOME are set.
In order to get up and running quickly I will recommend for you to:
Download mahout as a tar.gz: http://apache.mivzakim.net/mahout/0.9/mahout-distribution-0.9.tar.gz
Follow steps 1 and 2 as mentioned, the /path/to/mahout is where you extracted mahout-distribution-0.9
Run the mahout command from mahout-distribution-0.9/bin/mahout
i.e: ./bin/mahout recommenditembased -i '/Users/lorenzoferri/Desktop/prova1.csv' -o '/Users/lorenzoferri/Desktop/output.csv' -n 100 -s SIMILARITY_COOCCURRENCE
In mahout-distribution-0.9/examples you can run and explore some shell scripts
https://svn.apache.org/repos/asf/mahout/trunk/bin/mahout
Related
When I execute a sqoop job it throws a FileNotFoundException error as below
18/05/29 06:18:59 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hduser/compile/0ce66d1f09ce960a71c165855afbe42c/QueryResult.jar
18/05/29 06:18:59 INFO mapreduce.ImportJobBase: Beginning query import.
18/05/29 06:18:59 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
18/05/29 06:18:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/05/29 06:18:59 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/05/29 06:19:01 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
18/05/29 06:19:01 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/05/29 06:19:01 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/05/29 06:19:01 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/05/29 06:19:01 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/app/hadoop/tmp/mapred/staging/hduser1354549662/.staging/job_local1354549662_0001
18/05/29 06:19:01 ERROR tool.ImportTool: Encountered IOException running import job: java.io.FileNotFoundException: File does not exist: hdfs://svn-server:54310/home/hduser/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib/postgresql-9.2-1002-jdbc4.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:269)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:196)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:169)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:266)
at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:729)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:499)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:228)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
`
It should look for jars and other dependencies in local sqoop/lib directory but it looks it in HDFS with the same file path as my local sqoop lib path. As per project requirement, I need sqoop to look into my local library. How can I achieve this? Thanks.
Have you setup the SQOOP_HOME and PATH environment variables for the sqoop in .bashrc file
if not Please add
vi ~/.bashrc
include the below lines
export SQOOP_HOME=/your/path/to/the/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
save the file and execute the following command
source ~/.bashrc
Hope this is helpfull to you!!!
I am executing a command to run mahout jar for input file to generate an output file. But I am facing several errors. I have placed the input file in hdfs. The command is:
mahout recommenditembased -s SIMILARITY_COOCCURRENCE -i /input.txt -o /output --booleanData true
I am facing error:
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.12.0-job.jar
16/07/05 00:23:47 WARN driver.MahoutDriver: No recommenditembased.props found on classpath, will use command-line arguments only
16/07/05 00:23:48 INFO common.AbstractJob: Command line arguments: {--booleanData=[true], --endPhase=[2147483647], --input=[/input.txt], --maxPrefsInItemSimilarity=[500], --maxPrefsPerUser=[10], --maxSimilaritiesPerItem=[100], --minPrefsPerUser=[1], --numRecommendations=[10], --output=[/output], --similarityClassname=[SIMILARITY_COOCCURRENCE], --startPhase=[0], --tempDir=[temp]}
16/07/05 00:23:48 INFO common.AbstractJob: Command line arguments: {--booleanData=[true], --endPhase=[2147483647], --input=[/input.txt], --minPrefsPerUser=[1], --output=[temp/preparePreferenceMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]}
16/07/05 00:23:48 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
16/07/05 00:23:48 INFO Configuration.deprecation: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
16/07/05 00:23:48 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
16/07/05 00:23:49 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-15-19.us-west-2.compute.internal/172.31.15.19:8032
16/07/05 00:23:51 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1467669538614_0015
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /input.txt
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:317)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:352)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:77)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:168)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Thank you in advance.
From error trace it is clear that your mahout job is not able to find input file location (/input.txt).
Check your mahout configuration and hadoop config
If you are running it Locally you can try running with
file:///input.txt ( file protocol is for local file system) or if you are running on hdfs you can use hdfs://input.txt ( hdfs is for hdfs file system)
Trying to run Wordcount program in hadoop in eclipse (windows 7). and passing these argument in eclipse only
E:\hadoop\eclipse-hadoop-pro\workspace-hadoop\WordCountPro\input\word.txt
E:\hadoop\eclipse-hadoop-pro\workspace-hadoop\WordCountPro\output
I have created input file in project only like input folder and inside it word.txt file
But it is throughing below excption
2015-04-08 15:30:09,947 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-04-08 15:30:10,238 ERROR [main] util.Shell (Shell.java:getWinUtilsPath(373)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable E:\hadoop\hadoop-HADOOP_HOME\hadoop-2.6.0\bin\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:363)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
at org.apache.hadoop.security.Groups.<init>(Groups.java:86)
at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
at org.apache.hadoop.mapreduce.task.JobContextImpl.<init>(JobContextImpl.java:72)
at org.apache.hadoop.mapreduce.Job.<init>(Job.java:144)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:187)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:206)
at com.WordCount.main(WordCount.java:52)
2015-04-08 15:30:11,039 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1049)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2015-04-08 15:30:11,041 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/E:/hadoop/eclipse-hadoop-pro/workspace-hadoop/WordCountPro/output already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:562)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at com.WordCount.main(WordCount.java:61)
I doubt if Hadoop is installed correctly. Check in your machine if all the daemons are running or not.If not, then consider re-checking or re-installing what you are missing.
ERROR [main] util.Shell (Shell.java:getWinUtilsPath(373)) - Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable
I configured pig as directed in the documentation.
Enviornment: Windows 7, Hadoop-0.20.2, pig 0.13.0, Cygwin
But when i type pig (mapreduce) on the command prompt it just displays below thing. I am not sure whether pig is started or not. I don't see GRUNT shell to execute script.
Btw, Hadoop is running on the same node.
Can someone please help?
$ pig
Find hadoop at /hadoop-0.20.2/bin/hadoop
dry run:
HADOOP_CLASSPATH: C:\cygwin64\pig-0.13.0\conf;C;C:\Program Files\Java\jdk1.6.0_25\lib\tools.jar;C;C:\cygwin64\hadoop-0.20.2\conf;C:\cygwin64\pig-0.13.0\lib\accumulo-core-1.5.0.jar;C:\cygwin64\pig-0.13.0\lib\accumulo-fate-1.5.0.jar;C:\cygwin64\pig-0.13.0\lib\accumulo-server-1.5.0.jar;C:\cygwin64\pig-0.13.0\lib\accumulo-start-1.5.0.jar;C:\cygwin64\pig-0.13.0\lib\accumulo-trace-1.5.0.jar;C:\cygwin64\pig-0.13.0\lib\avro-1.7.5.jar;C:\cygwin64\pig-0.13.0\lib\avro-mapred-1.7.5.jar;C:\cygwin64\pig-0.13.0\lib\avro-tools-1.7.5-nodeps.jar;C:\cygwin64\pig-0.13.0\lib\groovy-all-1.8.6.jar;C:\cygwin64\pig-0.13.0\lib\hbase-0.94.1.jar;C:\cygwin64\pig-0.13.0\lib\jruby-complete-1.6.7.jar;C:\cygwin64\pig-0.13.0\lib\js-1.7R2.jar;C:\cygwin64\pig-0.13.0\lib\json-simple-1.1.jar;C:\cygwin64\pig-0.13.0\lib\jython-standalone-2.5.3.jar;C:\cygwin64\pig-0.13.0\lib\piggybank.jar;C:\cygwin64\pig-0.13.0\lib\protobuf-java-2.4.0a.jar;C:\cygwin64\pig-0.13.0\lib\zookeeper-3.4.5.jar:C:\cygwin64\PIG-01~1.0/pig-withouthadoop-h2.jar:
HADOOP_OPTS: -Xmx1000m -Dpig.log.dir=C:\cygwin64\PIG-01~1.0\logs -Dpig.log.file=pig.log -Dpig.home.dir=C:\cygwin64\PIG-01~1.0\
HADOOP_CLIENT_OPTS: -Xmx1000m -Dpig.log.dir=C:\cygwin64\PIG-01~1.0\logs -Dpig.log.file=pig.log -Dpig.home.dir=C:\cygwin64\PIG-01~1.0\
/hadoop-0.20.2/bin/hadoop jar C:\cygwin64\PIG-01~1.0/pig-withouthadoop-h2.jar
when i run in debug mode, i see this exception. This is because Hadoop Jar is not set.
localhsot#mymachine~
$ echo $PIG_INSTALL
C:\cygwin64\pig-0.13.0
localhsot#mymachine~
$ export PIG_INSTALL=/cygdrive/c/cygwin64/pig-0.13.0
localhsot#mymachine~
$ export HADOOP_INSTALL=/cygdrive/c/cygwin64/hadoop-0.20.2/
localhsot#mymachine~
$ export PATH=$PATH:$PIG_INSTALL/bin:$HADOOP_INSTALL/bin
$ pig
14/08/26 14:05:12 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/08/26 14:05:12 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/08/26 14:05:12 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-08-26 14:05:12,998 [main] INFO org.apache.pig.Main - Apache Pig version 0. 13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-08-26 14:05:12,998 [main] INFO org.apache.pig.Main - Logging error message s to: C:\cygwin64\home\chparekh\pig_1409076312996.log
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/map reduce/task/JobContextImpl
at org.apache.pig.tools.pigstats.PigStatsUtil.<clinit>(PigStatsUtil.java :68)
at org.apache.pig.Main.run(Main.java:643)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl. java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces sorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.task.Jo bContextImpl
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 8 more
you can refer below link for the same, i hope this will help you.
http://abhijitsureshshingate.wordpress.com/2013/07/08/code-debug-test-apache-pig-scripts-using-eclipse-on-windows/
I'm following the mahout cookbook example. In one of them, I'm getting an exception:
mahout seqdirectory -i /home/hduser/cook/lastfm/current -o /home/hduser/cook/lastfm sequencefiles/
And then, I'm getting the following exception:
14/04/29 15:45:38 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/home/hduser/cook/lastfm/current], --keyPrefix=[], --method=[mapreduce], --output=[/home/hduser/cook/lastfm/sequencefiles/], --startPhase=[0], --tempDir=[temp]}
14/04/29 15:45:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/04/29 15:45:38 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/04/29 15:45:38 INFO Configuration.deprecation: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
14/04/29 15:45:38 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
Exception in thread "main" java.io.FileNotFoundException: File does not exist: /home/hduser/cook/lastfm/current
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.mahout.text.SequenceFilesFromDirectory.runMapReduce(SequenceFilesFromDirectory.java:162)
at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:91)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:65)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
The problem is that the folder exists, as shown here:
hduser#ACER-V3-571G:~/cook/lastfm/current$ ls
artists.txt ArtistTags.dat README.txt tags.txt
Mahout 0.9 and Hadoop 2.2.0
jps shows me:
5435 ResourceManager
7257 Jps
5531 NodeManager
5104 DataNode
5262 SecondaryNameNode
5008 NameNode
I'm sorry, but i found the solution by myself.
Since the file isn't in the Hadoop Filesystem, I should run mahout as local.
export MAHOUT_LOCAL=TRUE
solves the problem.
Since an "export" configures a system environment variable in unix, an similar command should do it in Windows. Try:
SET MAHOUT_LOCAL=TRUE