Job Token file not found when running Hadoop wordcount example

I just installed Hadoop successfully on a small cluster. Now I'm trying to run the wordcount example but I'm getting this error:
hdfs://localhost:54310/user/myname/test11
12/04/24 13:26:45 INFO input.FileInputFormat: Total input paths to process : 1
12/04/24 13:26:45 INFO mapred.JobClient: Running job: job_201204241257_0003
12/04/24 13:26:46 INFO mapred.JobClient: map 0% reduce 0%
12/04/24 13:26:50 INFO mapred.JobClient: Task Id : attempt_201204241257_0003_m_000002_0, Status : FAILED
Error initializing attempt_201204241257_0003_m_000002_0:
java.io.IOException: Exception reading file:/tmp/mapred/local/ttprivate/taskTracker/myname/jobcache/job_201204241257_0003/jobToken
at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:135)
at org.apache.hadoop.mapreduce.security.TokenCache.loadTokens(TokenCache.java:165)
at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1179)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1116)
at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2404)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.FileNotFoundException: File file:/tmp/mapred/local/ttprivate/taskTracker/myname/jobcache/job_201204241257_0003/jobToken does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:129)
... 5 more
Any help?

I just worked through this same error. Setting the permissions recursively on my Hadoop directory didn't help. Following Mohyt's recommendation, I modified core-site.xml (in the hadoop/conf/ directory) to remove the entry where I specified the temp directory (hadoop.tmp.dir in the XML). After allowing Hadoop to create its own temp directory, I'm running error-free.

It is better to create your own temp directory:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/unmesha/mytmpfolder/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  ...
</configuration>
And give it the right permissions:
unmesha@unmesha-virtual-machine:~$ chmod 750 /home/unmesha/mytmpfolder/tmp
See the Hadoop documentation for further core-site.xml configuration options.
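For completeness, a minimal sketch of the shell steps, assuming the same path as in the XML above and that the Hadoop daemons run as the unmesha user; adjust the path and user to your setup:
# create the directory referenced by hadoop.tmp.dir before starting the daemons
mkdir -p /home/unmesha/mytmpfolder/tmp
# make sure the user running the daemons owns it and can write to it
chown -R unmesha:unmesha /home/unmesha/mytmpfolder/tmp
chmod -R 750 /home/unmesha/mytmpfolder/tmp
# restart Hadoop so the new hadoop.tmp.dir takes effect
bin/stop-all.sh && bin/start-all.sh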

Related

S3distcp on local hadoop cluster not working

I am trying to run s3distcp from my local Hadoop pseudo-distributed cluster. Executing s3distcp.jar produced the stack trace below. It seems that the reducer task is failing, but I am not able to pinpoint what is causing it to fail:
18/02/21 12:14:01 WARN mapred.LocalJobRunner: job_local639263089_0001
java.lang.Exception: java.lang.RuntimeException: Reducer task failed to copy 1 files: file:/home/chirag/workspaces/lzo/data-1518765365022.lzo etc
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:556)
Caused by: java.lang.RuntimeException: Reducer task failed to copy 1 files: file:/home/chirag/workspaces/lzo/data-1518765365022.lzo etc
at com.amazon.external.elasticmapreduce.s3distcp.CopyFilesReducer.close(CopyFilesReducer.java:70)
at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:250)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:346)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/02/21 12:14:02 INFO mapreduce.Job: Job job_local639263089_0001 running in uber mode : false
18/02/21 12:14:02 INFO mapreduce.Job: map 100% reduce 0%
18/02/21 12:14:02 INFO mapreduce.Job: Job job_local639263089_0001 failed with state FAILED due to: NA
18/02/21 12:14:02 INFO mapreduce.Job: Counters: 35
I'm getting the same error. In my case, I found logs in HDFS /var/log/hadoop-yarn/apps/hadoop/logs related to the MR job that s3-dist-cp kicks off.
hadoop fs -ls /var/log/hadoop-yarn/apps/hadoop/logs
I copied them out to local:
hadoop fs -get /var/log/hadoop-yarn/apps/hadoop/logs/application_nnnnnnnnnnnnn_nnnn/ip-nnn-nn-nn-nnn.ec2.internal_nnnn
I then examined them in a text editor to find more diagnostic information about what happened in the reducer phase. In my case I was getting an error back from the S3 service; you might find a different error.
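If log aggregation is enabled, the yarn CLI is another way to pull the same aggregated logs; a minimal sketch, with the application ID placeholder taken from the path above:
# find the application ID of the failed s3-dist-cp job
yarn application -list -appStates ALL
# dump its aggregated container logs to a local file for inspection
yarn logs -applicationId application_nnnnnnnnnnnnn_nnnn > s3distcp-app.log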

class not found exception in mapreduce

When I ran the command below
[cloudera@localhost ~]$ hadoop jar /home/cloudera/workspace/para.jar word.Paras examples/wordcount examples/output3;
15/11/05 10:13:04 INFO mapred.JobClient: map 0% reduce 0%
15/11/05 10:13:18 INFO mapred.JobClient: Task Id : attempt_201511050944_0005_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class word.Paras$Map not found.
Here, word is the package name and Paras is the class name. I have checked that the output file is created, as shown in the logs.
How can I fix this issue?
Thanks,
Anbu k

not able to prevent local job runner from running

I am trying to populate an HBase table from a Java program using HTable and LoadIncrementalHFiles on Hadoop 1.
I have a fully distributed 3-node cluster with 1 master and 2 slaves.
The namenode and jobtracker run on the master, with 3 datanodes and 3 tasktrackers across all 3 nodes.
There are 3 zookeepers on the 3 nodes.
The HMaster runs on the master node, and 3 regionservers run across all 3 nodes.
My core-site.xml contains:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop/TMPDIR/</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310/</value>
</property>
mapred-site.xml contains:
<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
</property>
But when I run the program, it gives me the error below:
15/08/06 00:11:14 INFO mapred.TaskRunner: Creating symlink: /usr/local/hadoop/TMPDIR/mapred/local/archive/328189779182527451_-1963144838_2133510842/192.168.72.1/user/hduser/partitions_736cc0de-3c15-4a3d-8ae3-e4d239d73f93 <- /usr/local/hadoop/TMPDIR/mapred/local/localRunner/_partition.lst
15/08/06 00:11:14 WARN fs.FileUtil: Command 'ln -s /usr/local/hadoop/TMPDIR/mapred/local/archive/328189779182527451_-1963144838_2133510842/192.168.72.1/user/hduser/partitions_736cc0de-3c15-4a3d-8ae3-e4d239d73f93 /usr/local/hadoop/TMPDIR/mapred/local/localRunner/_partition.lst' failed 1 with: ln: failed to create symbolic link `/usr/local/hadoop/TMPDIR/mapred/local/localRunner/_partition.lst': No such file or directory
15/08/06 00:11:14 WARN mapred.TaskRunner: Failed to create symlink: /usr/local/hadoop/TMPDIR/mapred/local/archive/328189779182527451_-1963144838_2133510842/192.168.72.1/user/hduser/partitions_736cc0de-3c15-4a3d-8ae3-e4d239d73f93 <- /usr/local/hadoop/TMPDIR/mapred/local/localRunner/_partition.lst
15/08/06 00:11:14 INFO mapred.JobClient: Running job: job_local_0001
15/08/06 00:11:15 INFO util.ProcessTree: setsid exited with exit code 0
15/08/06 00:11:15 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@35506f5f
15/08/06 00:11:15 INFO mapred.MapTask: io.sort.mb = 100
15/08/06 00:11:15 INFO mapred.JobClient: map 0% reduce 0%
15/08/06 00:11:17 INFO mapred.MapTask: data buffer = 79691776/99614720
15/08/06 00:11:17 INFO mapred.MapTask: record buffer = 262144/327680
15/08/06 00:11:17 WARN mapred.LocalJobRunner: job_local_0001
java.lang.IllegalArgumentException: Can't read partitions file
at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
Caused by: java.io.FileNotFoundException: File _partition.lst does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:796)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1479)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1474)
at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:301)
at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:88)
... 6 more
A few lines from my code:
Path input = new Path(args[0]);
input = input.makeQualified(input.getFileSystem(conf));
Path partitionFile = new Path(input, "_partitions.lst");
TotalOrderPartitioner.setPartitionFile(conf, partitionFile);
InputSampler.Sampler<IntWritable, Text> sampler = new InputSampler.RandomSampler<IntWritable, Text>(0.1, 100);
InputSampler.writePartitionFile(job, sampler);
job.setNumReduceTasks(2);
job.setPartitionerClass(TotalOrderPartitioner.class);
job.setJarByClass(TextToHBaseTransfer.class);
Why is it still running the local job runner and giving me "Can't read partitions file"?
What am I missing in the cluster configuration?

ERROR security.UserGroupInformation: PriviledgedActionException in Hadoop 2.2

I am using Hadoop 2.2.0. The hadoop-mapreduce-examples-2.2.0.jar examples run fine on HDFS.
I have written a wordcount program in Eclipse, added the jars using Maven, and run this jar:
ubuntu#ubuntu-linux:~$ yarn jar Sample-0.0.1-SNAPSHOT.jar com.vij.Sample.WordCount /user/ubuntu/wordcount/input/vij.txt user/ubuntu/wordcount/output
It gives the following error:
15/02/17 13:09:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/17 13:09:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/02/17 13:09:11 ERROR security.UserGroupInformation: PriviledgedActionException as:ubuntu (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:54310/user/ubuntu/wordcount/input/vij.txt already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:54310/user/ubuntu/wordcount/input/vij.txt already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:456)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at com.vij.Sample.WordCount.main(WordCount.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
The jar is on my local system; both the input and output paths are on HDFS, and no output directory exists at the output path on HDFS.
Please advise.
Thanks.
Actually, the error is:
ERROR security.UserGroupInformation: PriviledgedActionException as:ubuntu (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:54310/user/ubuntu/wordcount/input/vij.txt already exists
Delete the output path that already exists ("vij.txt" here), or point the job at a different output path.
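For the second option, a minimal sketch, assuming the driver takes the input path first and the output path second (as in the original command). Note the leading slash on the output path, and first remove any stale output directory left over from an earlier run:
# remove a leftover output directory from a previous run, if one exists (adjust the path to your setup)
hadoop fs -rm -r /user/ubuntu/wordcount/output
# re-run with an absolute output path that does not exist yet
yarn jar Sample-0.0.1-SNAPSHOT.jar com.vij.Sample.WordCount /user/ubuntu/wordcount/input/vij.txt /user/ubuntu/wordcount/output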
Or try the following steps:
Download and unzip the WordCount source code under $HADOOP_HOME:
$ cd $HADOOP_HOME
$ wget http://salsahpc.indiana.edu/tutorial/source_code/Hadoop-WordCount.zip
$ unzip Hadoop-WordCount.zip
Then upload the input files (any text-format file) into the Hadoop Distributed File System (HDFS):
$bin/hadoop fs -put $HADOOP_HOME/Hadoop-WordCount/input/ input
$bin/hadoop fs -ls input
Here, $HADOOP_HOME/Hadoop-WordCount/input/ is the local directory where the program inputs are stored. The second "input" is the remote destination directory on HDFS.
After uploading the inputs into HDFS, run the WordCount program with the following commands. We assume you have already compiled the word count program.
$ bin/hadoop jar $HADOOP_HOME/Hadoop-WordCount/wordcount.jar WordCount input output
If Hadoop is running correctly, it will print messages similar to the following:
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
11/11/02 18:34:46 INFO input.FileInputFormat: Total input paths to process : 1
11/11/02 18:34:46 INFO mapred.JobClient: Running job: job_201111021738_0001
11/11/02 18:34:47 INFO mapred.JobClient: map 0% reduce 0%
11/11/02 18:35:01 INFO mapred.JobClient: map 100% reduce 0%
11/11/02 18:35:13 INFO mapred.JobClient: map 100% reduce 100%
11/11/02 18:35:18 INFO mapred.JobClient: Job complete: job_201111021738_0001
11/11/02 18:35:18 INFO mapred.JobClient: Counters: 25
...

Hadoop - Mapreduce job fails to run in Windows (Cygwin)

I have installed Cygwin on Windows and configured the Hadoop 0.20.0 setup on it. I was able to run the word count project in Eclipse successfully, but when I run wordcount from the hadoop-..*-example.jar, it throws the following error:
13/06/28 07:32:51 INFO input.FileInputFormat: Total input paths to process : 1
13/06/28 07:32:52 INFO mapred.JobClient: Running job: job_201306280622_0002
13/06/28 07:32:53 INFO mapred.JobClient: map 0% reduce 0%
13/06/28 07:32:57 INFO mapred.JobClient: Task Id : attempt_201306280622_0002_m_000002_0, Status : FAILED
Error initializing attempt_201306280622_0002_m_000002_0:
org.apache.hadoop.util.Shell$ExitCodeException: //job.jar: invalid mode: `jar'
Try `//job.jar --help' for more information.
at org.apache.hadoop.util.Shell.runCommand(Shell.java:195)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java: 286)
What could the problem be? Please assist.
Your command looks fine to me. Try giving the input path with the file name; I hope that will solve your problem.
bin/hadoop jar hadoop-0.20.0-examples.jar wordcount /user/input/input_file.txt /user/output
