Tez - DAGAppMaster - java.lang.IllegalArgumentException: Invalid ContainerId - hadoop

I am trying to launch a MapReduce job, but I get an error when executing jobs from the shell or from Hive:
hive> select count(*) from employee ;
Query ID = mapr_20171107135114_a574713d-7d69-45e1-aa73-d4de07a3059b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapreduce.job.reduces=
Starting Job = job_1510052734193_0005, Tracking URL = http://hdpsrvpre2.intranet.darty.fr:8088/proxy/application_1510052734193_0005/
Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job -kill job_1510052734193_0005
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2017-11-07 13:51:25,951 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1510052734193_0005 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: MAPRFS Read: 0 MAPRFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
In the ResourceManager logs, this is what I find:
> 2017-11-07 13:51:25,269 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1510052734193_0005_000002 State change from LAUNCHED to FINAL_SAVING
> 2017-11-07 13:51:25,269 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1510052734193_0005_000002 at: /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMAppRoot/application_1510052734193_0005/appattempt_1510052734193_0005_000002
> 2017-11-07 13:51:25,283 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1510052734193_0005_000002
> 2017-11-07 13:51:25,283 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Application finished, removing password for appattempt_1510052734193_0005_000002
> 2017-11-07 13:51:25,283 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1510052734193_0005_000002 State change from FINAL_SAVING to FAILED
> 2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The number of failed attempts is 2. The max attempts is 2
> 2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1510052734193_0005 with final state: FAILED
> 2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1510052734193_0005 State change from ACCEPTED to FINAL_SAVING
> 2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1510052734193_0005
> 2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application appattempt_1510052734193_0005_000002 is done. finalState=FAILED
> 2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for app: application_1510052734193_0005 at: /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMAppRoot/application_1510052734193_0005/application_1510052734193_0005
> 2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1510052734193_0005 requests cleared
> 2017-11-07 13:51:25,296 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1510052734193_0005 failed 2 times due to AM Container for appattempt_1510052734193_0005_000002 exited with exitCode: 1
> For more detailed output, check application tracking page:http://hdpsrvpre2.intranet.darty.fr:8088/cluster/app/application_1510052734193_0005Then, click on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_e10_1510052734193_0005_02_000001
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
> at org.apache.hadoop.util.Shell.run(Shell.java:456)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:304)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:354)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:87)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:748)
> Shell output: main : command provided 1
> main : user is mapr
> main : requested yarn user is mapr
>
> Container exited with a non-zero exit code 1
> Failing this attempt. Failing the application.
Also, in the syslog of the job I find:
2017-11-07 12:09:46,419 FATAL [main] app.DAGAppMaster: Error starting DAGAppMaster
java.lang.IllegalArgumentException: Invalid ContainerId: container_e10_1510052734193_0001_01_000001
at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:1794)
Caused by: java.lang.NumberFormatException: For input string: "e10"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
... 1 more
It seems that Tez is causing the issue. Is there a way to solve it?
Thank you !

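I think the execution environment contains different versions of Hadoop and their respective JAR files. The stack trace points the same way: ConverterUtils.toContainerId fails with a NumberFormatException on "e10", the epoch token in container_e10_..., which suggests that the YARN classes on the Tez classpath are older than the ResourceManager that generated the ID.
Please verify the environment, make sure only the required version is used, and remove references to other versions from your environment variables.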
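To make the failure mode concrete, here is a minimal Java sketch of why a parser that predates the epoch token trips over container_e10_.... It is not the Hadoop source: the class and both method names are hypothetical, and the parsing is reduced to the cluster-timestamp field only. It assumes the legacy layout container_<clusterTimestamp>_<appId>_<attemptId>_<containerId> and the newer layout with an optional e<epoch> token after "container".

```java
public class ContainerIdParseSketch {

    // Hypothetical stand-in for a pre-epoch parser: it assumes the token
    // right after "container" is the numeric cluster timestamp.
    static long legacyClusterTimestamp(String containerId) {
        String[] parts = containerId.split("_");
        // Throws NumberFormatException when parts[1] is an epoch token such as "e10".
        return Long.parseLong(parts[1]);
    }

    // Hypothetical epoch-aware variant: skips an optional e<epoch> token.
    static long epochAwareClusterTimestamp(String containerId) {
        String[] parts = containerId.split("_");
        int timestampIndex = parts[1].startsWith("e") ? 2 : 1;
        return Long.parseLong(parts[timestampIndex]);
    }

    public static void main(String[] args) {
        String id = "container_e10_1510052734193_0001_01_000001";
        System.out.println("epoch-aware: " + epochAwareClusterTimestamp(id));
        try {
            legacyClusterTimestamp(id);
        } catch (NumberFormatException e) {
            System.out.println("legacy parser failed: " + e.getMessage());
        }
    }
}
```

If the sketch matches what is happening on the cluster, the remedy is the one given above: make sure the Hadoop/YARN JARs visible to Tez are the same version as the cluster's.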

Spark on Yarn job failed with ExitCode:1 and stderr says "Can't find main class"

We tried to submit a simple SparkPi example to Spark on YARN. The batch file is written as below:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 1g --executor-cores 1 .\examples\target\spark-examples_2.10-1.4.0.jar 10
pause
Our HDFS and YARN work well. We are using Hadoop 2.7.0 and Spark 1.4.1. We have only 1 node, which acts as both NameNode and DataNode.
When we execute it, it fails, and the log says the following:
2015-08-21 11:07:22,044 DEBUG [main] | ===============================================================================
2015-08-21 11:07:22,044 DEBUG [main] | Yarn AM launch context:
2015-08-21 11:07:22,044 DEBUG [main] | user class: org.apache.spark.examples.SparkPi
2015-08-21 11:07:22,044 DEBUG [main] | env:
2015-08-21 11:07:22,044 DEBUG [main] | CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__hadoop_conf__<CPS>{{PWD}}/__spark__.jar<CPS>%HADOOP_HOME%\etc\hadoop<CPS>%HADOOP_HOME%\share\hadoop\common\*<CPS>%HADOOP_HOME%\share\hadoop\common\lib\*<CPS>%HADOOP_HOME%\share\hadoop\mapreduce\*<CPS>%HADOOP_HOME%\share\hadoop\mapreduce\lib\*<CPS>%HADOOP_HOME%\share\hadoop\hdfs\*<CPS>%HADOOP_HOME%\share\hadoop\hdfs\lib\*<CPS>%HADOOP_HOME%\share\hadoop\yarn\*<CPS>%HADOOP_HOME%\share\hadoop\yarn\lib\*<CPS>%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\*<CPS>%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\lib\*
2015-08-21 11:07:22,060 DEBUG [main] | SPARK_YARN_CACHE_FILES_FILE_SIZES -> 165181064,1420218
2015-08-21 11:07:22,060 DEBUG [main] | SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1440062075415_0026
2015-08-21 11:07:22,060 DEBUG [main] | SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE,PRIVATE
2015-08-21 11:07:22,060 DEBUG [main] | SPARK_USER -> msrabi
2015-08-21 11:07:22,060 DEBUG [main] | SPARK_YARN_MODE -> true
2015-08-21 11:07:22,060 DEBUG [main] | SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1440126441200,1440126441575
2015-08-21 11:07:22,060 DEBUG [main] | SPARK_YARN_CACHE_FILES -> hdfs://msra-sa-44:9000/user/msrabi/.sparkStaging/application_1440062075415_0026/spark-assembly-1.4.0-hadoop2.7.0.jar#__spark__.jar,hdfs://msra-sa-44:9000/user/msrabi/.sparkStaging/application_1440062075415_0026/spark-examples_2.10-1.4.0.jar#__app__.jar
2015-08-21 11:07:22,060 DEBUG [main] | resources:
2015-08-21 11:07:22,060 DEBUG [main] | __app__.jar -> resource { scheme: "hdfs" host: "msra-sa-44" port: 9000 file: "/user/msrabi/.sparkStaging/application_1440062075415_0026/spark-examples_2.10-1.4.0.jar" } size: 1420218 timestamp: 1440126441575 type: FILE visibility: PRIVATE
2015-08-21 11:07:22,060 DEBUG [main] | __spark__.jar -> resource { scheme: "hdfs" host: "msra-sa-44" port: 9000 file: "/user/msrabi/.sparkStaging/application_1440062075415_0026/spark-assembly-1.4.0-hadoop2.7.0.jar" } size: 165181064 timestamp: 1440126441200 type: FILE visibility: PRIVATE
2015-08-21 11:07:22,060 DEBUG [main] | __hadoop_conf__ -> resource { scheme: "hdfs" host: "msra-sa-44" port: 9000 file: "/user/msrabi/.sparkStaging/application_1440062075415_0026/__hadoop_conf__7908628615251032149.zip" } size: 82888 timestamp: 1440126441794 type: ARCHIVE visibility: PRIVATE
2015-08-21 11:07:22,060 DEBUG [main] | command:
2015-08-21 11:07:22,075 DEBUG [main] | {{JAVA_HOME}}/bin/java -server -Xmx4096m -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.app.name=org.apache.spark.examples.SparkPi' '-Dspark.executor.memory=1g' '-Dspark.driver.memory=4g' '-Dspark.master=yarn-cluster' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.deploy.yarn.ApplicationMaster --class 'org.apache.spark.examples.SparkPi' --jar file:/D:/sp/./examples/target/spark-examples_2.10-1.4.0.jar --arg '10' --executor-memory 1024m --executor-cores 1 --num-executors 3 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
2015-08-21 11:07:22,075 DEBUG [main] | ===============================================================================
...........(omitting some lines)......
2015-08-21 11:07:23,231 INFO [main] | Application report for application_1440062075415_0026 (state: ACCEPTED)
2015-08-21 11:07:23,247 DEBUG [main] |
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1440126442169
final status: UNDEFINED
tracking URL: http://msra-sa-44:8088/proxy/application_1440062075415_0026/
user: msrabi
2015-08-21 11:07:24,263 TRACE [main] | 1: Call -> MSRA-SA-44/10.190.173.181:8032: getApplicationReport {application_id { id: 26 cluster_timestamp: 1440062075415 }}
2015-08-21 11:07:24,263 DEBUG [IPC Parameter Sending Thread #0] | IPC Client (443384617) connection to MSRA-SA-44/10.190.173.181:8032 from msrabi sending #37
2015-08-21 11:07:24,263 DEBUG [IPC Client (443384617) connection to MSRA-SA-44/10.190.173.181:8032 from msrabi] | IPC Client (443384617) connection to MSRA-SA-44/10.190.173.181:8032 from msrabi got value #37
2015-08-21 11:07:24,263 DEBUG [main] | Call: getApplicationReport took 0ms
2015-08-21 11:07:24,263 TRACE [main] | 1: Response <- MSRA-SA-44/10.190.173.181:8032: getApplicationReport {application_report { applicationId { id: 26 cluster_timestamp: 1440062075415 } user: "msrabi" queue: "default" name: "org.apache.spark.examples.SparkPi" host: "N/A" rpc_port: -1 yarn_application_state: ACCEPTED trackingUrl: "http://msra-sa-44:8088/proxy/application_1440062075415_0026/" diagnostics: "" startTime: 1440126442169 finishTime: 0 final_application_status: APP_UNDEFINED app_resource_Usage { num_used_containers: 1 num_reserved_containers: 0 used_resources { memory: 4608 virtual_cores: 1 } reserved_resources { memory: 0 virtual_cores: 0 } needed_resources { memory: 4608 virtual_cores: 1 } memory_seconds: 0 vcore_seconds: 0 } originalTrackingUrl: "N/A" currentApplicationAttemptId { application_id { id: 26 cluster_timestamp: 1440062075415 } attemptId: 1 } progress: 0.0 applicationType: "SPARK" }}
2015-08-21 11:07:24,263 INFO [main] | Application report for application_1440062075415_0026 (state: ACCEPTED)
.......(omitting some lines where the state are all ACCEPTED and final status are all UNDEFINED).....
2015-08-21 11:07:30,359 INFO [main] | Application report for application_1440062075415_0026 (state: FAILED)
2015-08-21 11:07:30,359 DEBUG [main] |
client token: N/A
diagnostics: Application application_1440062075415_0026 failed 2 times due to AM Container for appattempt_1440062075415_0026_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://msra-sa-44:8088/cluster/app/application_1440062075415_0026Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1440062075415_0026_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Shell output: 1 file(s) moved.
When we then opened stderr, it said:
Error: Could not find or load main class 'Dspark.app.name=org.apache.spark.examples.SparkPi'
This is strange: it should be a parameter passed to java, yet java appears to treat it as the main class. There should be a main class parameter in the command section of the log, but there is not.
How can that happen? How can we find out what is wrong?
Thank you!
We solved this problem.
The root cause is that, when generating the java command line, our Spark wraps the parameters in single quotes ('-Dxxxx'). Single quotes work only on Linux. On Windows, parameters are either left unwrapped or wrapped in double quotes ("-Dxxxx"). The only way we found to solve this was to edit the Spark source code and recompile it.
This appears to be a known Spark issue (https://issues.apache.org/jira/browse/SPARK-5754).
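The quoting difference described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Spark's code or the patch from the JIRA issue: the class and the quote helper are hypothetical, and the escaping rules are deliberately minimal (single quotes for POSIX shells, double quotes for Windows).

```java
public class ArgQuotingSketch {

    // Hypothetical helper: wraps one command-line argument for the target platform.
    static String quote(String arg, boolean windows) {
        if (windows) {
            // Windows does not treat single quotes as quoting characters,
            // so wrap in double quotes and backslash-escape embedded ones.
            return "\"" + arg.replace("\"", "\\\"") + "\"";
        }
        // POSIX shells: wrap in single quotes; an embedded single quote is
        // written as '\'' (close quote, escaped quote, reopen quote).
        return "'" + arg.replace("'", "'\\''") + "'";
    }

    public static void main(String[] args) {
        String arg = "-Dspark.app.name=org.apache.spark.examples.SparkPi";
        System.out.println("POSIX:   " + quote(arg, false));
        System.out.println("Windows: " + quote(arg, true));
    }
}
```

With single quotes on Windows, the quote character stays part of the token, which is consistent with java reporting 'Dspark.app.name=...' as the main class in the stderr shown above.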

Hadoop Cluster Deployment Using Pivotal

I am trying to deploy a Hadoop cluster using the Pivotal distribution.
To do so, I am following the link below:
http://pivotalhd.docs.pivotal.io/doc/2100/webhelp/topics/ManuallyInstallingandUsingPivotalHD21Stack.html
Deployment Configuration:
1) phd1.xyz.com - NameNode, ResourceManager
2) phd2.xyz.com - DataNode, NodeManager
The services mentioned above are up and running, and I am able to access the HDFS file system, but I am not able to execute jobs on the cluster.
The link above doesn't mention whether the job has to be executed as the root or the hdfs user, so I tried both.
Error when job is executed via root user
hadoop jar /usr/lib/gphd/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0-gphd-3.1.0.0.jar pi 2 200
The following error occurs:
> Number of Maps = 2
> Samples per Map = 200
> org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:214)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:158)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5389)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5371)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5345)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3583)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3553)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3525)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:745)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:550)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:63031)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at
Error when job is executed via hdfs user
sudo -u hdfs hadoop jar /usr/lib/gphd/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0-gphd-3.1.0.0.jar pi 2 200
The following error occurs:
> Number of Maps = 2
> Samples per Map = 200
> Wrote input for Map #0
> Wrote input for Map #1
> Starting Job
> 15/01/01 20:48:20 INFO client.RMProxy: Connecting to ResourceManager at phd1.xyz.com/10.44.189.6:8050
> 15/01/01 20:48:21 INFO input.FileInputFormat: Total input paths to process : 2
> 15/01/01 20:48:21 INFO mapreduce.JobSubmitter: number of splits:2
> 15/01/01 20:48:21 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
> 15/01/01 20:48:21 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
> 15/01/01 20:48:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1420122968684_0002
> 15/01/01 20:48:22 INFO impl.YarnClientImpl: Submitted application application_1420122968684_0002 to ResourceManager at phd1.xyz.com/10.44.189.6:8050
> 15/01/01 20:48:22 INFO mapreduce.Job: The url to track the job: http://phd1.persistent.co.in:8088/proxy/application_1420122968684_0002/
> 15/01/01 20:48:22 INFO mapreduce.Job: Running job: job_1420122968684_0002
> 15/01/01 20:48:26 INFO mapreduce.Job: Job job_1420122968684_0002 running in uber mode : false
> 15/01/01 20:48:26 INFO mapreduce.Job: map 0% reduce 0%
> 15/01/01 20:48:26 INFO mapreduce.Job: Job job_1420122968684_0002 failed with state FAILED due to: Application application_1420122968684_0002 failed 2 times due to AM Container for appattempt_1420122968684_0002_000002 exited with exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
>
> .Failing this attempt.. Failing the application.
> 15/01/01 20:48:26 INFO mapreduce.Job: Counters: 0
> Job Finished in 5.973 seconds
> java.io.FileNotFoundException: File does not exist: hdfs://phd1.xyz.com:8020/user/hdfs/QuasiMonteCarlo_1420125497811_11863122/out/reduce-out
> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1112)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1112)
> at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1749)
> at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
> at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
> at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
> at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Please let me know how I can resolve this error.
Thanks

Spring Integration - outbound transfer renaming issues

I am using the sample program here to build my code. Everything works fine with the local SFTP test server. When I tested today against my client's SFTP server, it gave me the exception below.
When I debugged, I saw the file being written with a '.writing' extension on the client's SFTP server. The contents are fine and I don't see any issues with the transferred file, but the file name is now the problem. After reading through the Spring docs, I see that this is the temporary file extension and that the program tries to rename the file back to its original name; since the client's SFTP server does not allow this, it throws an exception.
I tried changing temporary-file-suffix=".writing" to temporary-file-suffix="", thinking that would turn the option off, but alas, the same issue occurs. Is there a workaround?
One of the posts mentions this, but either no solution was given or the issue vanished for the user.
> 2014-07-28 20:11:21,564 [main] INFO com.jcraft.jsch - SSH_MSG_NEWKEYS sent
> 2014-07-28 20:11:21,608 [main] INFO com.jcraft.jsch - SSH_MSG_NEWKEYS received
> 2014-07-28 20:11:21,678 [main] INFO com.jcraft.jsch - SSH_MSG_SERVICE_REQUEST sent
> 2014-07-28 20:11:21,770 [main] INFO com.jcraft.jsch - SSH_MSG_SERVICE_ACCEPT received
> 2014-07-28 20:11:21,818 [main] INFO com.jcraft.jsch - Authentications that can continue: publickey,keyboard-interactive,password
> 2014-07-28 20:11:21,818 [main] INFO com.jcraft.jsch - Next authentication method: publickey
> 2014-07-28 20:11:21,819 [main] INFO com.jcraft.jsch - Authentications that can continue: keyboard-interactive,password
> 2014-07-28 20:11:21,819 [main] INFO com.jcraft.jsch - Next authentication method: keyboard-interactive
> 2014-07-28 20:11:21,978 [main] INFO com.jcraft.jsch - Authentication succeeded (keyboard-interactive).
> 2014-07-28 20:11:22,199 [main] DEBUG org.springframework.beans.factory.support.DefaultListableBeanFactory - Returning cached instance of singleton bean 'integrationEvaluationContext'
> 2014-07-28 20:11:22,878 [main] DEBUG org.springframework.integration.sftp.session.SftpSession - Initial File rename failed, possibly because file already exists. Will attempt to delete file: remote/TESTFILE.ABC and execute rename again.
> 2014-07-28 20:11:22,958 [main] INFO com.jcraft.jsch - Disconnecting from fsgatewaytest.aexp.com port 22
> 2014-07-28 20:11:22,961 [Connect thread fsgatewaytest.aexp.com session] INFO com.jcraft.jsch - Caught an exception, leaving main loop due to Socket closed
> Exception in thread "main" org.springframework.messaging.MessageDeliveryException: Error handling message for file [data/TESTFILE.ABC -> TESTFILE.ABC]
> at org.springframework.integration.file.remote.RemoteFileTemplate$1.doInSession(RemoteFileTemplate.java:227)
> at org.springframework.integration.file.remote.RemoteFileTemplate$1.doInSession(RemoteFileTemplate.java:190)
> at org.springframework.integration.file.remote.RemoteFileTemplate.execute(RemoteFileTemplate.java:302)
> at org.springframework.integration.file.remote.RemoteFileTemplate.send(RemoteFileTemplate.java:190)
> at org.springframework.integration.file.remote.RemoteFileTemplate.send(RemoteFileTemplate.java:182)
> at org.springframework.integration.file.remote.handler.FileTransferringMessageHandler.handleMessageInternal(FileTransferringMessageHandler.java:101)
> at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:78)
> at org.springframework.integration.dispatcher.AbstractDispatcher.tryOptimizedDispatch(AbstractDispatcher.java:116)
> at org.springframework.integration.dispatcher.UnicastingDispatcher.doDispatch(UnicastingDispatcher.java:101)
> at org.springframework.integration.dispatcher.UnicastingDispatcher.dispatch(UnicastingDispatcher.java:97)
> at org.springframework.integration.channel.AbstractSubscribableChannel.doSend(AbstractSubscribableChannel.java:77)
> at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:255)
> at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:223)
> at com.reachlocal.payment.integration.SftpOutboundTransferMain.main(SftpOutboundTransferMain.java:43)
> Caused by: org.springframework.messaging.MessagingException: Failed to write to 'inbox/REACHLOCALTST.CUFI.writing' while uploading the file
> at org.springframework.integration.file.remote.RemoteFileTemplate.sendFileToRemoteDirectory(RemoteFileTemplate.java:397)
> at org.springframework.integration.file.remote.RemoteFileTemplate.access$500(RemoteFileTemplate.java:56)
> at org.springframework.integration.file.remote.RemoteFileTemplate$1.doInSession(RemoteFileTemplate.java:213)
> ... 13 more
> Caused by: org.springframework.core.NestedIOException: Failed to delete file inbox/TESTFILE.ABC; nested exception is org.springframework.core.NestedIOException: Failed to remove file: 2: Specified file path is invalid.
> at org.springframework.integration.sftp.session.SftpSession.rename(SftpSession.java:200)
> at org.springframework.integration.file.remote.RemoteFileTemplate.sendFileToRemoteDirectory(RemoteFileTemplate.java:393)
> ... 15 more
> Caused by: org.springframework.core.NestedIOException: Failed to remove file: 2: Specified file path is invalid.
> at org.springframework.integration.sftp.session.SftpSession.remove(SftpSession.java:83)
> at org.springframework.integration.sftp.session.SftpSession.rename(SftpSession.java:194)
> ... 16 more
Updated config: Using use-temporary-file-name="false" solved this issue. Thanks a ton.
<int:channel id="inputChannel"/>
<int-sftp:outbound-channel-adapter id="sftpOutboundAdapter"
session-factory="sftpSessionFactory"
channel="inputChannel"
remote-filename-generator-expression="payload.getName()"
remote-directory="inbox"
use-temporary-file-name="false"/>
How about the use-temporary-file-name attribute?
With it, you end up with this:
try {
    session.write(inputStream, tempFilePath);
    // then rename it to its final name if necessary
    if (useTemporaryFileName) {
        session.rename(tempFilePath, remoteFilePath);
    }
}
// exception handling omitted in this excerpt
From the mentioned documentation:
However, there may be situations where you don't want to use this technique (for example, if the server does not permit renaming files). For situations like this, you can disable this feature by setting use-temporary-file-name to false (default is true). When this attribute is false, the file is written with its final name and the consuming application will need some other mechanism to detect that the file is completely uploaded before accessing it.
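The behaviour described in the documentation can be tried out with a small self-contained Java sketch. Everything in it is an assumption made for illustration: FakeSession is a hypothetical in-memory stand-in, not the Spring Integration Session API, and the upload method only mirrors the write-then-rename flow from the excerpt above, using the ".writing" suffix from the question.

```java
import java.util.HashMap;
import java.util.Map;

public class TempNameUploadSketch {

    // Hypothetical in-memory "server"; rename can be forbidden to mimic
    // an SFTP server that does not permit renaming files.
    static class FakeSession {
        final Map<String, String> files = new HashMap<>();
        final boolean renameAllowed;

        FakeSession(boolean renameAllowed) {
            this.renameAllowed = renameAllowed;
        }

        void write(String content, String path) {
            files.put(path, content);
        }

        void rename(String from, String to) {
            if (!renameAllowed) {
                throw new IllegalStateException("rename not permitted");
            }
            files.put(to, files.remove(from));
        }
    }

    // Writes to a temporary name and renames afterwards, or writes the
    // final name directly when useTemporaryFileName is false.
    static void upload(FakeSession session, String content, String remotePath,
                       boolean useTemporaryFileName) {
        String target = useTemporaryFileName ? remotePath + ".writing" : remotePath;
        session.write(content, target);
        if (useTemporaryFileName) {
            session.rename(target, remotePath);
        }
    }

    public static void main(String[] args) {
        FakeSession noRename = new FakeSession(false);
        upload(noRename, "payload", "inbox/TESTFILE.ABC", false);
        System.out.println("files after direct write: " + noRename.files.keySet());
        try {
            upload(noRename, "payload", "inbox/OTHER.ABC", true);
        } catch (IllegalStateException e) {
            System.out.println("temporary-name upload failed: " + e.getMessage());
        }
    }
}
```

As the quoted documentation notes, writing the final name directly means the consumer needs some other way to tell that the upload has finished.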

Pipe Broken exception every time when I run Mahout samples at EC2 server

I've installed Mahout on the Bitnami AMI ami-02fb006b (as well as on several other AMIs, otherwise I wouldn't be asking the question)
according to the instructions provided here
and here:
I always get stuck when trying to run ./examples/bin/build-reuters.sh
Here's the output of the command:
> Please select a number to choose the corresponding clustering algorithm
> 1. kmeans clustering
> 2. lda clustering
> Enter your choice : 1
> ok. You chose 1 and we'll use kmeans Clustering
> Downloading Reuters-21578
> % Total % Received % Xferd Average Speed Time Time Time Current
> Dload Upload Total Spent Left Speed
> 100 7959k 100 7959k 0 0 294k 0 0:00:26 0:00:26 --:--:-- 305k
> Extracting...
> Running on hadoop, using HADOOP_HOME=/usr/local/hadoop-0.20.2
> HADOOP_CONF_DIR=/usr/local/hadoop-0.20.2/conf
> MAHOUT-JOB: /usr/local/mahout-0.4/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar
> 11/08/16 20:10:25 WARN driver.MahoutDriver: No org.apache.lucene.benchmark.utils.ExtractReuters.props found on classpath, will use command-line arguments only
> Deleting all files in mahout-work/reuters-out-tmp
> 11/08/16 20:10:30 INFO driver.MahoutDriver: Program took 4906 ms
> MAHOUT_LOCAL is set, running locally
> CLASSPATH:
> :/usr/local/mahout-0.4/src/conf:/usr/local/hadoop-0.20.2/conf:/usr/lib/jvm/java-6-openjdk//lib/tools.jar:/usr/local/mahout-0.4/mahout-*.jar:/usr/local/mahout-0.4/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar:/usr/local/mahout-0.4/mahout-examples-*-job.jar:/usr/local/mahout-0.4/lib/*.jar:/usr/local/mahout-0.4/examples/target/dependency/antlr-2.7.7.jar:/usr/local/mahout-0.4/examples/target/dependency/antlr-3.2.jar:/usr/local/mahout-0.4/examples/target/dependency/antlr-runtime-3.2.jar:/usr/local/mahout-0.4/examples/target/dependency/avro-1.4.0-cassandra-1.jar:/usr/local/mahout-0.4/examples/target/dependency/bson-2.5.jar:/usr/local/mahout-0.4/examples/target/dependency/cassandra-all-0.8.1.jar:/usr/local/mahout-0.4/examples/target/dependency/cassandra-thrift-0.8.1.jar:/usr/local/mahout-0.4/examples/target/dependency/cglib-nodep-2.2.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-beanutils-1.7.0.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-beanutils-core-1.8.0.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-cli-1.2.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-cli-2.0-mahout.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-codec-1.4.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-collections-3.2.1.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-compress-1.1.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-configuration-1.6.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-dbcp-1.4.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-digester-1.7.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-httpclient-3.0.1.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-lang-2.6.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-logging-1.1.1.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-math-2.2.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-pool-1.5.6.jar:/usr/local/mahout-0.4/examples/t
arget/dependency/concurrentlinkedhashmap-lru-1.1.jar:/usr/local/mahout-0.4/examples/target/dependency/easymock-3.0.jar:/usr/local/mahout-0.4/examples/target/dependency/google-collections-1.0-rc2.jar:/usr/local/mahout-0.4/examples/target/dependency/guava-r09.jar:/usr/local/mahout-0.4/examples/target/dependency/hadoop-core-0.20.203.0.jar:/usr/local/mahout-0.4/examples/target/dependency/hector-core-0.8.0-2.jar:/usr/local/mahout-0.4/examples/target/dependency/high-scale-lib-1.1.2.jar:/usr/local/mahout-0.4/examples/target/dependency/httpclient-4.0.1.jar:/usr/local/mahout-0.4/examples/target/dependency/httpcore-4.0.1.jar:/usr/local/mahout-0.4/examples/target/dependency/jackson-core-asl-1.8.2.jar:/usr/local/mahout-0.4/examples/target/dependency/jackson-mapper-asl-1.8.2.jar:/usr/local/mahout-0.4/examples/target/dependency/jakarta-regexp-1.4.jar:/usr/local/mahout-0.4/examples/target/dependency/jamm-0.2.2.jar:/usr/local/mahout-0.4/examples/target/dependency/jcommon-1.0.12.jar:/usr/local/mahout-0.4/examples/target/dependency/jetty-6.1.22.jar:/usr/local/mahout-0.4/examples/target/dependency/jetty-util-6.1.22.jar:/usr/local/mahout-0.4/examples/target/dependency/jfreechart-1.0.13.jar:/usr/local/mahout-0.4/examples/target/dependency/jline-0.9.94.jar:/usr/local/mahout-0.4/examples/target/dependency/json-simple-1.1.jar:/usr/local/mahout-0.4/examples/target/dependency/jul-to-slf4j-1.6.1.jar:/usr/local/mahout-0.4/examples/target/dependency/junit-4.8.2.jar:/usr/local/mahout-0.4/examples/target/dependency/libthrift-0.6.1.jar:/usr/local/mahout-0.4/examples/target/dependency/log4j-1.2.16.jar:/usr/local/mahout-0.4/examples/target/dependency/lucene-analyzers-3.1.0.jar:/usr/local/mahout-0.4/examples/target/dependency/lucene-benchmark-3.1.0.jar:/usr/local/mahout-0.4/examples/target/dependency/lucene-core-3.1.0.jar:/usr/local/mahout-0.4/examples/target/dependency/lucene-highlighter-3.1.0.jar:/usr/local/mahout-0.4/examples/target/dependency/lucene-memory-3.1.0.jar:/usr/local/mahout-0.4/examples
/target/dependency/lucene-queries-3.1.0.jar:/usr/local/mahout-0.4/examples/target/dependency/lucene-xercesImpl-3.1.0.jar:/usr/local/mahout-0.4/examples/target/dependency/mahout-collections-1.0.jar:/usr/local/mahout-0.4/examples/target/dependency/mahout-core-0.6-SNAPSHOT.jar:/usr/local/mahout-0.4/examples/target/dependency/mahout-core-0.6-SNAPSHOT-tests.jar:/usr/local/mahout-0.4/examples/target/dependency/mahout-integration-0.6-SNAPSHOT.jar:/usr/local/mahout-0.4/examples/target/dependency/mahout-math-0.6-SNAPSHOT.jar:/usr/local/mahout-0.4/examples/target/dependency/mahout-math-0.6-SNAPSHOT-tests.jar:/usr/local/mahout-0.4/examples/target/dependency/mongo-java-driver-2.5.jar:/usr/local/mahout-0.4/examples/target/dependency/objenesis-1.2.jar:/usr/local/mahout-0.4/examples/target/dependency/servlet-api-2.5-20081211.jar:/usr/local/mahout-0.4/examples/target/dependency/servlet-api-2.5.jar:/usr/local/mahout-0.4/examples/target/dependency/slf4j-api-1.6.1.jar:/usr/local/mahout-0.4/examples/target/dependency/slf4j-jcl-1.6.1.jar:/usr/local/mahout-0.4/examples/target/dependency/slf4j-log4j12-1.6.1.jar:/usr/local/mahout-0.4/examples/target/dependency/snakeyaml-1.6.jar:/usr/local/mahout-0.4/examples/target/dependency/solr-commons-csv-3.1.0.jar:/usr/local/mahout-0.4/examples/target/dependency/speed4j-0.9.jar:/usr/local/mahout-0.4/examples/target/dependency/stringtemplate-3.2.jar:/usr/local/mahout-0.4/examples/target/dependency/uncommons-maths-1.2.2.jar:/usr/local/mahout-0.4/examples/target/dependency/uuid-3.2.0.jar:/usr/local/mahout-0.4/examples/target/dependency/watchmaker-framework-0.6.2.jar:/usr/local/mahout-0.4/examples/target/dependency/watchmaker-swing-0.6.2.jar:/usr/local/mahout-0.4/examples/target/dependency/xml-apis-1.0.b2.jar:/usr/local/mahout-0.4/examples/target/dependency/xpp3_min-1.1.4c.jar:/usr/local/mahout-0.4/examples/target/dependency/xstream-1.3.1.jar
> SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found
> binding in
> [jar:file:/usr/local/mahout-0.4/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/usr/local/mahout-0.4/examples/target/dependency/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/usr/local/mahout-0.4/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation. WARNING: org.apache.hadoop.metrics.jvm.EventCounter is
> deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in
> all the log4j.properties files. 11/08/16 20:10:32 INFO
> common.AbstractJob: Command line arguments: {--charset=UTF-8,
> --chunkSize=5, --endPhase=2147483647,
> --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter,
> --input=mahout-work/reuters-out, --keyPrefix=,
> --output=mahout-work/reuters-out-seqdir, --startPhase=0,
> --tempDir=temp} Exception in thread "main" java.io.IOException: Call
> to localhost/127.0.0.1:9000 failed on local exception:
> java.io.IOException: Broken pipe
> at
> org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
> at org.apache.hadoop.ipc.Client.call(Client.java:1033)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
> at $Proxy1.getProtocolVersion(Unknown Source)
> at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:364)
> at
> org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:208)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:175)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
> at
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1310)
> at
> org.apache.hadoop.fs.FileSystem.access$100(FileSystem.java:65)
> at
> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1328)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:226)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:109)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:210)
> at
> org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:59)
> at
> org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:110)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at
> org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:85)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at
> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> at
> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> Caused by: java.io.IOException: Broken pipe
> at sun.nio.ch.FileDispatcher.write0(Native Method)
> at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:122)
> at sun.nio.ch.IOUtil.write(IOUtil.java:93)
> at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:352)
> at
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
> at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
> at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
> at
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at
> java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> at java.io.DataOutputStream.flush(DataOutputStream.java:123)
> at
> org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:746)
> at org.apache.hadoop.ipc.Client.call(Client.java:1011)
> ... 25 more rmr: cannot remove mahout-work/reuters-out-seqdir:
> No such file or directory. put: File mahout-work/reuters-out-seqdir
> does not exist.
This error is consistent: I get it in every single installation I attempt.
What can I do to fix this?
This looks like an error from Hadoop and/or EC2. According to the stack trace, the client's RPC call to the NameNode at localhost/127.0.0.1:9000 failed with a broken pipe. Why, I don't know, but I would guess that the ports aren't open, even locally.
I have always used Amazon EMR directly instead.
You could debug this by running another M/R job as a test. As far as I can tell, the problem is not directly related to Mahout.
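One quick way to test the "ports aren't open" guess is to try a plain TCP connection to the address from the stack trace. The sketch below is an assumption-laden illustration: NameNodePortProbe and canConnect are hypothetical names, and the host and port are simply the ones printed in the exception.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper; not part of Hadoop or Mahout.
class NameNodePortProbe {

    // Returns true if a TCP connection can be opened within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, unknown host, and similar failures.
            return false;
        }
    }

    public static void main(String[] args) {
        // Address taken from the stack trace in the question.
        System.out.println("localhost:9000 reachable: "
                + canConnect("localhost", 9000, 2000));
    }
}
```

A successful connection only shows that something is listening on the port; the RPC call itself can still fail for other reasons. For instance, the classpath in the question lists hadoop-core-0.20.203.0.jar while HADOOP_HOME points at hadoop-0.20.2, which may be worth checking as well.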

Hadoop on Windows: FileNotFoundException

I'm using Hadoop on Windows and I've configured everything correctly (installed Cygwin, set up passwordless SSH, etc.).
I've compiled the WordCount program into WC.jar and tried to run it. It runs perfectly in standalone mode, but in fully distributed mode it throws a FileNotFoundException.
Please look at the logs and tell me what is wrong.
I've started DFS and MapReduce on MACH1 (that's my master).
$ bin/hadoop jar WC.jar WordCount words result
10/07/24 16:57:38 INFO input.FileInputFormat: Total input paths to process : 2
10/07/24 16:57:39 INFO mapred.JobClient: Running job: job_201007241657_0001
10/07/24 16:57:40 INFO mapred.JobClient: map 0% reduce 0%
10/07/24 16:57:50 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_000003_0, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-328510/mapred/local/taskTracker/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000003_0/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:57:55 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_r_000002_0, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-328510/mapred/local/taskTracker/jobcache/job_201007241657_0001/attempt_201007241657_0001_r_000002_0/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:07 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_000003_1, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000003_1/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:14 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_000003_2, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000003_2/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:26 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_000002_0, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000002_0/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:34 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_r_000001_0, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201007241657_0001/attempt_201007241657_0001_r_000001_0/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:41 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_000002_1, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-328510/mapred/local/taskTracker/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000002_1/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:47 INFO mapred.JobClient: Task Id : attempt_201007241657_0001_m_000002_2, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-328510/mapred/local/taskTracker/jobcache/job_201007241657_0001/attempt_201007241657_0001_m_000002_2/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
10/07/24 16:58:53 INFO mapred.JobClient: Job complete: job_201007241657_0001
10/07/24 16:58:53 INFO mapred.JobClient: Counters: 0
328510#01HW179531 /usr/local/hadoop-0.20.2
$
Thanks.
I think I might have seen this exception before but I don't have access to my old logs to confirm it. I solved my FileNotFoundException by reformatting the namenode. You might want to check the namenode logs for "inconsistent state" to confirm the cause before reformatting.
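To check the logs for that phrase before reformatting, something like the following could be used. It is a minimal sketch: NameNodeLogScan and linesContaining are hypothetical names, and the log file path is passed in by the caller because its location depends on the installation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for scanning a log file; not part of Hadoop.
class NameNodeLogScan {

    // Returns every line that contains the phrase, ignoring case.
    static List<String> linesContaining(Path logFile, String phrase) throws IOException {
        String needle = phrase.toLowerCase();
        // ISO_8859_1 decodes any byte sequence, so odd bytes in the log do not abort the scan.
        try (Stream<String> lines = Files.lines(logFile, StandardCharsets.ISO_8859_1)) {
            return lines.filter(line -> line.toLowerCase().contains(needle))
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java NameNodeLogScan <path-to-namenode-log>");
            return;
        }
        linesContaining(Paths.get(args[0]), "inconsistent state")
                .forEach(System.out::println);
    }
}
```

If the scan prints nothing, the FileNotFoundException probably has a different cause than the one described above.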

Resources