In a small Hadoop cluster set up on a number of developer workstations (i.e., they have different local configurations), I have one TaskTracker out of six that is being problematic. Whenever it receives a task, that task immediately fails with a Child Error:
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:242)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:229)
When I look at the stdout and stderr logs for the task, the stdout log is empty, and the stderr log only has:
execvp: Permission denied
My jobs still complete, because the TaskTracker eventually gets blacklisted and the tasks run on the other nodes, which have no problem running them. I am not able to get any tasks running on this one node, from any number of jobs, so this is a universal problem.
I have a DataNode running on this node with no issues.
I imagine there might be some sort of Java issue here, where it is having a hard time spawning a JVM or something...
We had the same problem. We fixed it by adding the 'execute' permission to the file below:
$JAVA_HOME/jre/bin/java
Hadoop uses $JAVA_HOME/jre/bin/java to spawn the task process instead of $JAVA_HOME/bin/java.
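A quick way to check and apply this on the affected node (a minimal sketch; it assumes $JAVA_HOME is set to the same JDK the TaskTracker uses, and a tarball-style install where you run the daemon scripts from the Hadoop home directory):
ls -l "$JAVA_HOME/jre/bin/java"        # check whether the JRE launcher has the 'x' bits
chmod a+x "$JAVA_HOME/jre/bin/java"    # add them if missing (may need sudo, depending on who owns the JDK)
bin/hadoop-daemon.sh stop tasktracker  # restart the TaskTracker on this node afterwards
bin/hadoop-daemon.sh start tasktracker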
If you still have this issue after changing the file mode, I suggest using remote debugging to find the shell command that spawns the task; see debugging hadoop task.
Whatever it is trying to execvp does not have the executable bit set on it. You can set the executable bit using chmod from the command line.
I have encountered the same problem.
You can try switching the JDK from 32-bit to 64-bit, or from 64-bit to 32-bit.
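A rough way to check which kind of JVM the TaskTracker actually launches (the exact -version output wording differs between vendors and releases):
"$JAVA_HOME/jre/bin/java" -version   # a 64-bit HotSpot JVM usually reports "64-Bit Server VM"
file "$JAVA_HOME/jre/bin/java"       # shows whether the launcher binary itself is 32-bit or 64-bit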
Related
I have been running some benchmarks, and I am new to Hadoop and HDFS. I got the setup running and things were working fine. But now I am faced with this issue: jps on the master shows
1. secondary name node
2. job tracker
but not the name node and task tracker.
Similarly, jps on the slave nodes shows only the name node, but the task tracker is not running.
I usually run the job as the user and not as root, but I mistakenly ran it as root, and then when I exited and ran the job as the user, I found the job doesn't start. Then, with jps, I found the task tracker is not running.
I am new to HDFS and not sure how to debug and solve this. It would be great if you could give some pointers/help on this one; I did try Google and couldn't find relevant answers.
Edit: I tried clearing tmp files, killing obsolete Java processes, and restarting. I still get the same issue.
Thanks.
After stopping the cluster, kill all remaining Java processes.
Remove the Hadoop PID files under /tmp.
Check for file permission errors by looking at the hadoop/logs/*.log files on the name node and data node; this gave me useful info when debugging the issue.
This link was helpful:
http://felixtechnique.blogspot.com/2010/09/no-namenode-to-stop-no-tasktracker-to.html
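A rough sketch of those steps on a 0.20-style install (script names and the PID location can differ in your setup; by default the PID files land in /tmp as hadoop-<user>-<daemon>.pid):
bin/stop-all.sh                 # stop whatever daemons still respond
jps                             # list any leftover Hadoop Java processes
kill -9 <pid>                   # kill each leftover process that did not stop cleanly
rm /tmp/hadoop-*.pid            # remove stale PID files so the start scripts don't get confused
bin/start-all.sh                # restart the cluster
tail -n 100 logs/hadoop-*-namenode-*.log logs/hadoop-*-datanode-*.log   # then check the daemon logs for permission errors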
I'm stuck with this problem. I'm using Hadoop (CDHu3). I have tried every possible solution I found by Googling.
This is the issue:
When I ran the Hadoop example "wordcount", the TaskTracker's log on one slave node gave the following errors:
1.WARN org.apache.hadoop.mapred.DefaultTaskController: Task wrapper stderr: bash:
/var/tmp/mapred/local/ttprivate/taskTracker/hdfs/jobcache/job_201203131751_0003/attempt_201203131751_0003_m_000006_0/taskjvm.sh:
Permission denied
2.WARN org.apache.hadoop.mapred.TaskRunner: attempt_201203131751_0003_m_000006_0 : Child Error
java.io.IOException: Task process exit with nonzero status of 126.
3.WARN org.apache.hadoop.mapred.TaskLog: Failed to retrieve stdout log for task: attempt_201203131751_0003_m_000003_0
java.io.FileNotFoundException:
/usr/lib/hadoop-0.20/logs/userlogs/job_201203131751_0003/attempt_201203131751_0003_m_000003_0/log.index
(No such file or directory)
I could not find similar issues on Google. I found some posts that seem somewhat relevant, and they suggest checking the following (a quick shell re-check is sketched after the list):
The ulimit of the Hadoop user: my ulimit is set large enough for this bundled example.
The memory used by the JVM: my JVM uses only -Xmx200m, too small to exceed the limit of my machine.
The permissions of mapred.local.dir and the logs dir: I set them with "chmod 777".
Whether the disk is full: there is enough space for Hadoop in my log directory and in mapred.local.dir.
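For reference, here is roughly how I re-verified these from the shell (a sketch; the directories are the ones from my error messages above, and the conf path depends on your install, e.g. /etc/hadoop/conf for a CDH package):
ulimit -n                                  # open-file limit for the user running the TaskTracker
ulimit -u                                  # max user processes, which caps how many child JVMs can be spawned
grep -r "child.java.opts" /etc/hadoop/conf # confirms the -Xmx200m setting actually in effect
ls -ld /var/tmp/mapred/local/ttprivate     # permissions on the private task dir from error 1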
How can I solve this problem?
For me this happened because Hadoop wasn't able to create the MapReduce job logs under hadoop/logs/userlogs/JobID/attemptID.
ulimit is of course one of the most likely causes,
but for me it was because the disk we were using had filled up, so creating the log files failed.
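A quick way to confirm this on the affected node (a sketch; adjust the log path to wherever your install writes its logs, and "mapred" is only the usual CDH TaskTracker user, yours may differ):
df -h                                      # look for a filesystem at or near 100% that holds the Hadoop logs
ls -ld /usr/lib/hadoop-0.20/logs/userlogs  # the TaskTracker user must be able to create job dirs here
sudo -u mapred touch /usr/lib/hadoop-0.20/logs/userlogs/write-test && echo writable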
I am getting this error with a mostly out-of-the-box configuration of version 0.20.203.0.
Where should I look for a potential issue? Most of the configuration is out of the box, and I was able to visit the local web pages for HDFS and the task manager.
I am guessing the error is related to a permissions issue on Cygwin and Windows. Also, from Googling the problem, some say there might be some kind of out-of-memory issue. It is such a simple example that I don't see how that could be.
When I try to run the wordcount example:
$ hadoop jar hadoop-examples-0.20.203.0.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output6
I get this error:
2011-08-12 15:45:38,299 WARN org.apache.hadoop.mapred.TaskRunner: attempt_201108121544_0001_m_000008_2 : Child Error
java.io.IOException: Task process exit with nonzero status of 127.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
2011-08-12 15:45:38,878 WARN org.apache.hadoop.mapred.TaskLog: Failed to retrieve stdout log for task: attempt_201108121544_0001_m_000008_1
java.io.FileNotFoundException: E:\projects\workspace_mar11\ParseLogCriticalErrors\lib\h\logs\userlogs\job_201108121544_0001\attempt_201108121544_0001_m_000008_1\log.index (The system cannot find the file specified)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:106)
at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:102)
at org.apache.hadoop.mapred.TaskLog.getAllLogsFileDetails(TaskLog.java:112)
...
The userlogs/job* directory is empty. Maybe there is some permission issue with those directories. I am running on Windows with Cygwin, so I don't really know which permissions to set.
I couldn't figure out this problem with the current version of Hadoop. I reverted to a previous release, hadoop-0.20.2. I had to play around with the core-site.xml configuration file and the temp directories, but I eventually got HDFS and MapReduce to work properly.
The issue seems to be Cygwin, Windows, and the drive setup that I was using. Hadoop launches a new JVM process when it tries to invoke a 'child' map/reduce task; the actual JVM execute statement is in a shell script.
In my case, Hadoop couldn't find the path to that shell script. I am assuming the status code 127 error was the result of the Java Runtime exec not finding the shell script.
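For what it's worth, exit status 127 is the standard shell convention for "command not found", which fits the missing-script theory. A minimal illustration (missing-script.sh is a made-up name, and the exact message varies by shell):
$ /bin/sh -c missing-script.sh
/bin/sh: missing-script.sh: command not found
$ echo $?
127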
I am working on an 8-node Hadoop cluster, and I am trying to execute a simple streaming job with the specified configuration.
hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
  -D mapred.map.max.tracker.failures=10 \
  -D mapred.map.max.attempts=8 \
  -D mapred.skip.attempts.to.start.skipping=8 \
  -D mapred.skip.map.max.skip.records=8 \
  -D mapred.skip.mode.enabled=true \
  -D mapred.max.map.failures.percent=5 \
  -input /user/hdfs/ABC/ \
  -output "/user/hdfs/output1/" \
  -mapper "perl -e 'while (<>) { chomp; print; }; exit;'" \
  -reducer "perl -e 'while (<>) { ~s/LR\>/LR\>\n/g; print; }; exit;'"
I am using Cloudera's distribution for Hadoop, CDH3u0, with Hadoop 0.20.2. The problem is that the job fails every time, giving this error:
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:242)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:229)
-------
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:242)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:229)
STDERR on the datanodes:
Exception in thread "main" java.io.IOException: Exception reading file:/mnt/hdfs/06/local/taskTracker/hdfs/jobcache/job_201107141446_0001/jobToken
at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:146)
at org.apache.hadoop.mapreduce.security.TokenCache.loadTokens(TokenCache.java:159)
at org.apache.hadoop.mapred.Child.main(Child.java:107)
Caused by: java.io.FileNotFoundException: File file:/mnt/hdfs/06/local/taskTracker/hdfs/jobcache/job_201107141446_0001/jobToken does not exist.
To find the cause of the error I have checked the following things, and it is still crashing, for reasons I am unable to understand:
1. All the temp directories are in place.
2. Memory is way more than the job might require (it is a small job).
3. Permissions have been verified.
4. Nothing fancy was done in the configuration, just the usual stuff.
The weirdest thing is that the job sometimes runs successfully but fails most of the time. Any guidance/help regarding this issue would be really helpful. I have been working on this error for the last 4 days and I am not able to figure out anything. Please help!
Thanks & Regards,
Atul
I have faced the same problem; it happens if the TaskTracker is not able to allocate the specified memory to the child JVM for the task.
Try executing the same job again when the cluster is not busy running many other jobs, and it should go through. Or set speculative execution to true; in that case Hadoop will execute the same task on another TaskTracker.
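As a sketch of the speculative-execution variant of the streaming job above (the property names are the stock 0.20 ones; the output path is just an example, since the output directory must not already exist):
hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
  -D mapred.map.tasks.speculative.execution=true \
  -D mapred.reduce.tasks.speculative.execution=true \
  -input /user/hdfs/ABC/ \
  -output /user/hdfs/output_spec/ \
  -mapper "perl -e 'while (<>) { chomp; print; }; exit;'" \
  -reducer "perl -e 'while (<>) { ~s/LR\>/LR\>\n/g; print; }; exit;'"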
Everything runs well in standalone mode. When going to pseudo-distributed mode, HDFS works well: I can put files onto HDFS and browse it, and I also checked that there is one DataNode in the live nodes list.
However, when I run bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+', the program just halts there without producing any error. And from http://ereg.adobe.com:50070/dfsnodelist.jsp?whatNodes=LIVE I can see that nothing has ever been run on that DataNode.
I followed the configuration in the tutorial for those XML conf files. Does anyone have any idea what other mistakes I might have made? By the way, I'm running this on Mac OS X.
By halt, do you mean it hangs, or that it just silently returns? For MapReduce issues, you should check the JobTracker's web page (at port 50030) to see the status of the submitted job.
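You can also check from the command line with the 0.20-era job client (a sketch; substitute the job ID that was printed when you submitted the job):
hadoop job -list              # jobs the JobTracker currently knows about, with their state
hadoop job -status <job_id>   # map and reduce completion percentages for one job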