Hadoop pipes problem

Hadoop pipes problem - hadoop

I have configured hadoop in pseudo-distributed mode (single -node cluster) on my ubuntu 10.04.
I have a problem in running hadoop pipes code
my code is following:
#include "/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/include/hadoop/Pipes.hh"
#include "/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/include/hadoop/TemplateFactory.hh"
#include "/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/include/hadoop/StringUtils.hh"
#include "/home/hadoop/project/hadoop-0.20.2/src/c++/libhdfs/hdfs.h"
const std::string WORDCOUNT = "WORDCOUNT";
const std::string INPUT_WORDS = "INPUT_WORDS";
const std::string OUTPUT_WORDS = "OUTPUT_WORDS";
//hdfs fs;
//hdfs writefile;
hdfsFS fs;
hdfsFile writefile;
const char* writepath="/temp/mest";
class WordCountMap: public HadoopPipes::Mapper {
public:
HadoopPipes::TaskContext::Counter* inputWords;
WordCountMap(HadoopPipes::TaskContext& context) {
fs = hdfsConnect("192.168.0.133", 54310);
inputWords = context.getCounter(WORDCOUNT, INPUT_WORDS);
}
~WordCountMap()
{
hdfsCloseFile(fs, writefile);
}
void map(HadoopPipes::MapContext& context)
{
hdfsFile writefile = hdfsOpenFile(fs, writepath, O_WRONLY|O_CREAT, 0, 0, 0);
std::vector<std::string> words =
HadoopUtils::splitString(context.getInputValue(), " ");
for(unsigned int i=0; i < words.size(); ++i) {
context.emit(words[i], "1");
}
context.incrementCounter(inputWords, words.size());
}
};
class WordCountReduce: public HadoopPipes::Reducer {
public:
HadoopPipes::TaskContext::Counter* outputWords;
WordCountReduce(HadoopPipes::TaskContext& context) {
outputWords = context.getCounter(WORDCOUNT, OUTPUT_WORDS);
}
void reduce(HadoopPipes::ReduceContext& context) {
int sum = 0;
while (context.nextValue()) {
sum += HadoopUtils::toInt(context.getInputValue());
}
context.emit(context.getInputKey(), HadoopUtils::toString(sum));
context.incrementCounter(outputWords, 1);
}
};
int main(int argc, char *argv[]) {
return HadoopPipes::runTask(HadoopPipes::TemplateFactory<WordCountMap,
WordCountReduce>());
}
I compiled it it compiled successfully.
I run it with following command:
bin/hadoop pipes -D java.pipes.recordreader=true -D java.pipes.recordwriter=true -input gutenberg -output manish_gut2 -program bin/cat
but when i run it it shows following problems:
11/05/04 16:13:12 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
11/05/04 16:13:12 INFO mapred.FileInputFormat: Total input paths to process : 3
11/05/04 16:13:13 INFO mapred.JobClient: Running job: job_201105041611_0001
11/05/04 16:13:14 INFO mapred.JobClient: map 0% reduce 0%
11/05/04 16:13:24 INFO mapred.JobClient: Task Id : attempt_201105041611_0001_m_000000_0, Status : FAILED
java.io.IOException: pipe child exception
at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:250)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114)
attempt_201105041611_0001_m_000000_0: Hadoop Pipes Exception: RecordReader not defined at /export/crawlspace/chris/work/branch-0.20/src/c++/pipes/impl/HadoopPipes.cc:692 in virtual void HadoopPipes::TaskContextImpl::runMap(std::string, int, bool)
11/05/04 16:13:24 INFO mapred.JobClient: Task Id : attempt_201105041611_0001_m_000001_0, Status : FAILED
java.io.IOException: pipe child exception
at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:250)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114)
attempt_201105041611_0001_m_000001_0: Hadoop Pipes Exception: RecordReader not defined at /export/crawlspace/chris/work/branch-0.20/src/c++/pipes/impl/HadoopPipes.cc:692 in virtual void HadoopPipes::TaskContextImpl::runMap(std::string, int, bool)
11/05/04 16:13:29 INFO mapred.JobClient: Task Id : attempt_201105041611_0001_m_000001_1, Status : FAILED
java.io.IOException: pipe child exception
at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:250)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114)
attempt_201105041611_0001_m_000001_1: Hadoop Pipes Exception: RecordReader not defined at /export/crawlspace/chris/work/branch-0.20/src/c++/pipes/impl/HadoopPipes.cc:692 in virtual void HadoopPipes::TaskContextImpl::runMap(std::string, int, bool)
11/05/04 16:13:29 INFO mapred.JobClient: Task Id : attempt_201105041611_0001_m_000000_1, Status : FAILED
java.io.IOException: pipe child exception
at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:250)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114)
attempt_201105041611_0001_m_000000_1: Hadoop Pipes Exception: RecordReader not defined at /export/crawlspace/chris/work/branch-0.20/src/c++/pipes/impl/HadoopPipes.cc:692 in virtual void HadoopPipes::TaskContextImpl::runMap(std::string, int, bool)
11/05/04 16:13:35 INFO mapred.JobClient: Task Id : attempt_201105041611_0001_m_000000_2, Status : FAILED
java.io.IOException: pipe child exception
at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:250)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114)
attempt_201105041611_0001_m_000000_2: Hadoop Pipes Exception: RecordReader not defined at /export/crawlspace/chris/work/branch-0.20/src/c++/pipes/impl/HadoopPipes.cc:692 in virtual void HadoopPipes::TaskContextImpl::runMap(std::string, int, bool)
11/05/04 16:13:35 INFO mapred.JobClient: Task Id : attempt_201105041611_0001_m_000001_2, Status : FAILED
java.io.IOException: pipe child exception
at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:151)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:250)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:114)
attempt_201105041611_0001_m_000001_2: Hadoop Pipes Exception: RecordReader not defined at /export/crawlspace/chris/work/branch-0.20/src/c++/pipes/impl/HadoopPipes.cc:692 in virtual void HadoopPipes::TaskContextImpl::runMap(std::string, int, bool)
11/05/04 16:13:44 INFO mapred.JobClient: Job complete: job_201105041611_0001
11/05/04 16:13:44 INFO mapred.JobClient: Counters: 3
11/05/04 16:13:44 INFO mapred.JobClient: Job Counters
11/05/04 16:13:44 INFO mapred.JobClient: Launched map tasks=8
11/05/04 16:13:44 INFO mapred.JobClient: Data-local map tasks=8
11/05/04 16:13:44 INFO mapred.JobClient: Failed map tasks=1
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
at org.apache.hadoop.mapred.pipes.Submitter.runJob(Submitter.java:248)
at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:479)
at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:494)
I dont know what i'm doing wrong ?
how can i run this program ?
how to resolve these error?

I would start with what they do here http://wiki.apache.org/hadoop/C%2B%2BWordCount, get it functional and then expand it to your implementation.
You can also use that page to see differences in your implementation and theirs and try to solve the problem that way. One difference I notice is your recordreader and recordwriter classes. You have java.pipes.recordreader and java.pipes.recordwriter while the example at the link uses hadoop.pipes.java.recordreader and hadoop.pipes.java.recordwriter.
I haven't used hadoop in this method before, so this is just me finding a similar thing and noticing differences. :)
Hopefully this will point you in the correct direction

The problem here is as #Nija describes.
hadoop.pipes.java.recordreader is not being specified, and defaults to false. That means it's expecting you to have a RecordReader in your C++ code. And you don't have one, and thus it can't be found.

Related

How to debug where a map-reduce fails?

I compile the program it runs successfully but mapper reducer fails; I need help in code to run the mapper and reducer successfully.
I guess there are some parsing issues in the code. Attached are the errors and below that is the code. How can I find the issues to solve this problem?
01/23 03:27:33 INFO mapreduce.Job: map 0% reduce 0%
19/01/23 03:27:50 INFO mapreduce.Job: map 100% reduce 0%
19/01/23 03:27:51 INFO mapreduce.Job: map 0% reduce 0%
19/01/23 03:27:52 INFO mapreduce.Job: Task Id : attempt_1548178978946_0002_m_000000_0, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.lang.Double.<init>(Double.java:608)
at com.hadoop.imcdp.MA$Map.partitionData(MA.java:69)
at com.hadoop.imcdp.MA$Map.map(MA.java:58)
at com.hadoop.imcdp.MA$Map.map(MA.java:49)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
19/01/23 03:28:21 INFO mapreduce.Job: Task Id : attempt_1548178978946_0002_m_000000_1, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.lang.Double.<init>(Double.java:608)
at com.hadoop.imcdp.MA$Map.partitionData(MA.java:69)
at com.hadoop.imcdp.MA$Map.map(MA.java:58)
at com.hadoop.imcdp.MA$Map.map(MA.java:49)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
19/01/23 03:28:32 INFO mapreduce.Job: Task Id : attempt_1548178978946_0002_m_000000_2, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.lang.Double.<init>(Double.java:608)
at com.hadoop.imcdp.MA$Map.partitionData(MA.java:69)
at com.hadoop.imcdp.MA$Map.map(MA.java:58)
at com.hadoop.imcdp.MA$Map.map(MA.java:49)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
19/01/23 03:28:51 INFO mapreduce.Job: map 100% reduce 100%
19/01/23 03:28:53 INFO mapreduce.Job: Job job_1548178978946_0002 failed with state FAILED due to: Task failed task_1548178978946_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
19/01/23 03:28:53 INFO mapreduce.Job: Counters: 13
Job Counters
Failed map tasks=4
Killed reduce tasks=1
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=67635
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=67635
Total time spent by all reduce tasks (ms)=0
Total vcore-milliseconds taken by all map tasks=67635
Total vcore-milliseconds taken by all reduce tasks=0
Total megabyte-milliseconds taken by all map tasks=69258240
Total megabyte-milliseconds taken by all reduce tasks=0
19/01/23 03:28:53 INFO mapreduce.Job: Running job: job_1548178978946_0002
19/01/23 03:28:53 INFO mapreduce.Job: Job job_1548178978946_0002 running in uber mode : false
19/01/23 03:28:53 INFO mapreduce.Job: map 100% reduce 100%
19/01/23 03:28:53 INFO mapreduce.Job: Task Id : attempt_1548178978946_0002_m_000000_0, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.lang.Double.<init>(Double.java:608)
at com.hadoop.imcdp.MA$Map.partitionData(MA.java:69)
at com.hadoop.imcdp.MA$Map.map(MA.java:58)
at com.hadoop.imcdp.MA$Map.map(MA.java:49)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
19/01/23 03:28:54 INFO mapreduce.Job: Task Id : attempt_1548178978946_0002_m_000000_1, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.lang.Double.<init>(Double.java:608)
at com.hadoop.imcdp.MA$Map.partitionData(MA.java:69)
at com.hadoop.imcdp.MA$Map.map(MA.java:58)
at com.hadoop.imcdp.MA$Map.map(MA.java:49)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
19/01/23 03:28:54 INFO mapreduce.Job: Task Id : attempt_1548178978946_0002_m_000000_2, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.lang.Double.<init>(Double.java:608)
at com.hadoop.imcdp.MA$Map.partitionData(MA.java:69)
at com.hadoop.imcdp.MA$Map.map(MA.java:58)
at com.hadoop.imcdp.MA$Map.map(MA.java:49)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
19/01/23 03:28:54 INFO mapreduce.Job: Job job_1548178978946_0002 failed with state FAILED due to: Task failed task_1548178978946_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
19/01/23 03:28:54 INFO mapreduce.Job: Counters: 13
Job Counters
Failed map tasks=4
Killed reduce tasks=1
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=67635
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=67635
Total time spent by all reduce tasks (ms)=0
Total vcore-milliseconds taken by all map tasks=67635
Total vcore-milliseconds taken by all reduce tasks=0
Total megabyte-milliseconds taken by all map tasks=69258240
Total megabyte-milliseconds taken by all reduce tasks=0
code of moving average example
public class MA extends Configured implements Tool{
// For production the windowlength would be a commandline or other argument
static double windowlength = 3.0;
static int thekey = (int)windowlength/2;
// used for handling the circular list.
static boolean initialised=false;
// Sample window
static ArrayList<Double> window = new ArrayList<Double>() ;
// The Map method processes the data one point at a time and passes the circular list to the
// reducer.
public static class Map extends Mapper<LongWritable, Text, Text, Text> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context
) throws IOException , InterruptedException
{
double wlen = windowlength;
// creates windows of samples and sends them to the Reducer
partitionData(value, context, wlen);
}
// Create sample windows starting at each sata point and sends them to the reducer
private void partitionData(Text value,
Context context, double wlen)
throws IOException , InterruptedException {
String line = value.toString();
// the division must be done this way in the mapper.
Double ival = new Double(line)/wlen;
// Build initial sample window
if(window.size() < windowlength)
{
window.add(ival);
}
// emit first window
if(!initialised && window.size() == windowlength)
{
initialised = true;
emit(thekey, window, context);
thekey++;
return;
}
// Update and emit subsequent windows
if(initialised)
{
// remove oldest datum
window.remove(0);
// add new datum
window.add(ival);
emit(thekey, window, context);
thekey++;
}
}
}
// Transform list to a string and send to reducer. Text to be replaced by ObjectWritable
// Problem: Hadoop apparently requires all output formats to be the same so
// cannot make this output collector differ from the one the reducer uses.
public static void emit(int key, ArrayList<Double> value, Context context) throws IOException , InterruptedException
{
Text tx = new Text();
tx.set(new Integer(key).toString());
String outstring = value.toString();
// remove the square brackets Java puts in
String tidied = outstring.substring(1,outstring.length()-1).trim();
Text out = new Text();
out.set(tidied);
context.write(tx,out);
}
public static class Reduce extends Reducer<Text, Text, Text, Text>
{
public void reduce(Text key,
Iterator<Text> values,
Context context
) throws IOException , InterruptedException
{
while (values.hasNext())
{
computeAverage(key, values, context);
}
}
// computes the average of each window and sends to ouptut collector.
private void computeAverage(Text key, Iterator<Text> values,
Context context)
throws IOException , InterruptedException {
double sum = 0;
String thevalue = values.next().toString();
String[] thenumbers = thevalue.split(",");
for( String temp: thenumbers)
{
// need to trim the string because the constructor does not trim.
Double ds = new Double(temp.trim());
sum += ds;
}
Text out = new Text();
String outstring = Double.toString(sum);
out.set(outstring);
context.write (key, out);
}
}
#Override
public int run(String[] args) throws Exception {
Configuration conf = new Configuration();
args = new GenericOptionsParser(conf, args).getRemainingArgs();
String input = args[0];
String output = args[1];
Job job = new Job(conf, "MA");
job.setJarByClass(MA.class);
job.setInputFormatClass(TextInputFormat.class);
job.setMapperClass(Map.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setCombinerClass(Reduce.class);
job.setReducerClass(Reduce.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.setInputPaths(job, new Path(input));
Path outPath = new Path(output);
FileOutputFormat.setOutputPath(job, outPath);
outPath.getFileSystem(conf).delete(outPath, true);
job.waitForCompletion(true);
return (job.waitForCompletion(true) ? 0 : 1);
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new MA(), args);
System.exit(exitCode);
}
// public static void main(String[] args) throws Exception {
// JobConf conf = new JobConf(MA.class);
// conf.setJobName("MA");
// conf.setOutputKeyClass(Text.class);
// conf.setOutputValueClass(Text.class);
// conf.setMapperClass(Map.class);
// conf.setCombinerClass(Reduce.class);
// conf.setReducerClass(Reduce.class);
// conf.setInputFormat(TextInputFormat.class);
// conf.setOutputFormat(TextOutputFormat.class);
// FileInputFormat.setInputPaths(conf, new Path(args[0]));
// FileOutputFormat.setOutputPath(conf, new Path(args[1]));
// FileInputFormat.setInputPaths(conf, new Path("input/movingaverage.txt"));
// FileOutputFormat.setOutputPath(conf, new Path("output/smoothed"));
// JobClient.runJob(conf);
// }
}

Double ival = new Double(line)/wlen;
is throwing the error, as line is 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20, which doesn't parse into a Double. Assuming you want each of these numbers to be a double, you need to do:
List<Double> ivals = new ArrayList<>();
String[] numbers = line.split(",");
for (int i = 0; i < numbers.length(); i++) {
ivals.add(new Double(numbers[i])/wlen);
}

getting Error while importing data from mongodb to hdfs

I am getting errors while importing data from mongodb to hdfs.
I as using:
Ambari Sandbox [Hortonworks] Hadoop 2.7
MongoDB version 3.0
These are the jar files I am including:
mongo-java-driver-2.11.4.jar
mongo-hadoop-core-1.3.0.jar
Here is the code I am using:
package com.mongo.test;
import java.io.*;
import org.apache.commons.logging.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.mapreduce.*;
import org.bson.*;
import com.mongodb.MongoClient;
import com.mongodb.hadoop.*;
import com.mongodb.hadoop.util.*;
public class ImportFromMongoToHdfs {
private static final Log log =
LogFactory.getLog(ImportFromMongoToHdfs.class);
public static class ReadEmpDataFromMongo extends Mapper<Object,
BSONObject, Text, Text>{
public void map(Object key, BSONObject value, Context context) throws
IOException, InterruptedException{
System.out.println("Key: " + key);
System.out.println("Value: " + value);
String md5 = value.get("md5").toString();
String name = value.get("name").toString();
String dev = value.get("dev").toString();
String salary = value.get("salary").toString();
String location = value.get("location").toString();
String output = "\t" + name + "\t" + dev + "\t" + salary + "\t" +
location;
context.write( new Text(md5), new Text(output));
}
}
public static void main(String[] args)throws Exception {
final Configuration conf = new Configuration();
MongoConfigUtil.setInputURI(conf,"mongodb://10.25.3.196:27017/admin.emp")
;
MongoConfigUtil.setCreateInputSplits(conf, false);
System.out.println("Configuration: " + conf);
final Job job = new Job(conf, "ReadWeblogsFromMongo");
Path out = new Path("/mongodb3");
FileOutputFormat.setOutputPath(job, out);
job.setJarByClass(ImportFromMongoToHdfs.class);
job.setMapperClass(ReadEmpDataFromMongo.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(com.mongodb.hadoop.MongoInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setNumReduceTasks(0);
System.exit(job.waitForCompletion(true) ? 0 : 1 );
}
}
This is the error I am getting back:
[root#sandbox ~]# hadoop jar /mongoinput/mongdbconnect.jar com.mongo.test.ImportFromMongoToHdfs
WARNING: Use "yarn jar" to launch YARN applications.
Configuration: Configuration: core-default.xml, core-site.xml
15/09/09 09:22:51 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
15/09/09 09:22:53 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.25.3.209:8050
15/09/09 09:22:53 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/09/09 09:22:54 INFO splitter.SingleMongoSplitter: SingleMongoSplitter calculating splits for mongodb://10.25.3.196:27017/admin.emp
15/09/09 09:22:54 INFO mapreduce.JobSubmitter: number of splits:1
15/09/09 09:22:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1441784509780_0003
15/09/09 09:22:55 INFO impl.YarnClientImpl: Submitted application application_1441784509780_0003
15/09/09 09:22:55 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1441784509780_0003/
15/09/09 09:22:55 INFO mapreduce.Job: Running job: job_1441784509780_0003
15/09/09 09:23:05 INFO mapreduce.Job: Job job_1441784509780_0003 running in uber mode : false
15/09/09 09:23:05 INFO mapreduce.Job: map 0% reduce 0%
15/09/09 09:23:12 INFO mapreduce.Job: Task Id : attempt_1441784509780_0003_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.mongodb.hadoop.MongoInputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:174)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.mongodb.hadoop.MongoInputFormat not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
... 8 more
15/09/09 09:23:18 INFO mapreduce.Job: Task Id : attempt_1441784509780_0003_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.mongodb.hadoop.MongoInputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:174)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.mongodb.hadoop.MongoInputFormat not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
... 8 more
15/09/09 09:23:24 INFO mapreduce.Job: Task Id : attempt_1441784509780_0003_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.mongodb.hadoop.MongoInputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:174)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.mongodb.hadoop.MongoInputFormat not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
... 8 more
15/09/09 09:23:32 INFO mapreduce.Job: map 100% reduce 0%
15/09/09 09:23:32 INFO mapreduce.Job: Job job_1441784509780_0003 failed with state FAILED due to: Task failed task_1441784509780_0003_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
15/09/09 09:23:32 INFO mapreduce.Job: Counters: 9
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=3
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=16996
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=16996
Total vcore-seconds taken by all map tasks=16996
Total megabyte-seconds taken by all map tasks=4249000
[root#sandbox ~]#
Does anyone know what is wrong?

make sure you keep mongo-hadoop jar in Hadoop class path and restart the Hadoop.
The error java.lang.ClassNotFoundException: Class com.mongodb.hadoop.MongoInputFormat should be resolved.

You are getting ClassNotFoundException becuase you is unable to reach to jar "mongo-hadoop-core*.jar". You have to make "mongo-hadoop-core*.jar" available to your code
There are many ways you resolve this error -
Create Fat Jar for your program. Fat jar will contain all necessary dependent jars. You can easily create fat jar if you are using any IDE.
use "-libjars" argument while submitting your yarn job
Copy mongo jars to Hadoop_Classpath location

I have just resolved a problem like this. In fact, this is an error at run time. If we set Hadoop_ClassPath pointing to the external necessary jar files, this was not enough yet. Because, I think at run time, Hadoop will look for jar files in the folder in which Hadoop is installed. I realize that we need to copy all necessary external jar files in the folder installed Hadoop. So :
First, you need to check HADOOP_CLASSPATH by typing :
- hadoop classpath
Then copy the necessary external jar file in one the HADOOP_CLASSPATH. For exemple, I will copy mongo-hadoop-1.5.1.jar and some others jar files to folder /usr/local/hadoop/share/hadoop/mapreduce.
Then it works for me!

Custom Partitioning gives ArrayIndexOuntOfBounds Error

When I run my code, I get the following exception:
hadoop#hadoop:~/testPrograms$ hadoop jar cp.jar CustomPartition /test/test.txt /test/output33
15/03/03 16:33:33 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/03/03 16:33:33 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/03/03 16:33:33 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/03/03 16:33:33 INFO input.FileInputFormat: Total input paths to process : 1
15/03/03 16:33:34 INFO mapreduce.JobSubmitter: number of splits:1
15/03/03 16:33:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1055584612_0001
15/03/03 16:33:35 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/03/03 16:33:35 INFO mapreduce.Job: Running job: job_local1055584612_0001
15/03/03 16:33:35 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/03/03 16:33:35 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
15/03/03 16:33:35 INFO mapred.LocalJobRunner: Waiting for map tasks
15/03/03 16:33:35 INFO mapred.LocalJobRunner: Starting task: attempt_local1055584612_0001_m_000000_0
15/03/03 16:33:35 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
15/03/03 16:33:35 INFO mapred.MapTask: Processing split: hdfs://node1/test/test.txt:0+107
15/03/03 16:33:35 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/03/03 16:33:35 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/03/03 16:33:35 INFO mapred.MapTask: soft limit at 83886080
15/03/03 16:33:35 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/03/03 16:33:35 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/03/03 16:33:35 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/03/03 16:33:35 INFO mapred.MapTask: Starting flush of map output
15/03/03 16:33:35 INFO mapred.LocalJobRunner: map task executor complete.
15/03/03 16:33:35 WARN mapred.LocalJobRunner: job_local1055584612_0001
java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 2
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
at CustomPartition$MapperClass.map(CustomPartition.java:27)
at CustomPartition$MapperClass.map(CustomPartition.java:17)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/03/03 16:33:36 INFO mapreduce.Job: Job job_local1055584612_0001 running in uber mode : false
15/03/03 16:33:36 INFO mapreduce.Job: map 0% reduce 0%
15/03/03 16:33:36 INFO mapreduce.Job: Job job_local1055584612_0001 failed with state FAILED due to: NA
15/03/03 16:33:36 INFO mapreduce.Job: Counters: 0
I am trying to partition based on the game the persons play. Each word is separated by a tab. And after the three fields, I got the next line by pressing the return key.
My code:
public class CustomPartition {
public static class MapperClass extends Mapper<Object, Text, Text, Text>{
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
String itr[] = value.toString().split("\t");
String game=itr[2].toString();
String nameGoals=itr[0]+"\t"+itr[1];
context.write(new Text(game), new Text(nameGoals));
}
}
public static class GoalPartition extends Partitioner<Text, Text> {
#Override
public int getPartition(Text key,Text value, int numReduceTasks){
if(key.toString()=="football")
{return 0;}
else if(key.toString()=="basketball")
{return 1;}
else// (key.toString()=="icehockey")
{return 2;}
}
}
public static class ReducerClass extends Reducer<Text,Text,Text,Text> {
#Override
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
String name="";
String game="";
int maxGoals=0;
for (Text val : values)
{
String valTokens[]= val.toString().split("\t");
int goals = Integer.parseInt(valTokens[1]);
if(goals > maxGoals)
{
name = valTokens[0];
game = key.toString();
maxGoals = goals;
context.write(new Text(name), new Text ("game"+game+"score"+maxGoals));
}
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "custom partition");
job.setJarByClass(CustomPartition.class);
job.setMapperClass(MapperClass.class);
job.setCombinerClass(ReducerClass.class);
job.setPartitionerClass(GoalPartition.class);
job.setReducerClass(ReducerClass.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

Avro Map Reduce - AvroInputFormat not found error

This is what i have understood so far reading from varied sources on the internet.
Avro mapred and Avro are not part of CDH4 (Cloudera Distribution) and i have to set it up manually using HADOOP_CLASSPATH=avro.jar:avro-mapred.jar
I have done that and when i run my job on my pseudo cluster it throws the following exception:
13/12/27 00:47:40 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/12/27 00:47:40 INFO mapred.FileInputFormat: Total input paths to process : 1
13/12/27 00:47:41 INFO mapred.JobClient: Running job: job_201312221245_0017
13/12/27 00:47:42 INFO mapred.JobClient: map 0% reduce 0%
13/12/27 00:47:57 INFO mapred.JobClient: Task Id : attempt_201312221245_0017_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.avro.mapred.AvroInputFormat not found
I'm running the job as follows:
hadoop jar build/libs/hadoop-boilerplate-1.0.jar CustomerMapReduce transactions/input transactions/output1 -libjars /path/to/libs/avro-1.7.4.jar,/path/to/libs/avro-mapred-1.7.4.jar

You should implement Tool and use getConf() for job configuration.
public class SomeClass extends Configured implements Tool {
#Override
public int run(String[] args) throws Exception {
Configuration conf = getConf();
...
}
}

hadoop not running in the multinode cluster

I have a jar file "Tsp.jar" that I made myself. This same jar files executes well in single node cluster setup of hadoop. However when I run it on a cluster comprising 2 machines, a laptop and desktop it gives me an exception when the map function reach 50%. Here is the output
`hadoop#psycho-O:/usr/local/hadoop$ bin/hadoop jar Tsp.jar clust-Tsp_ip1 clust_Tsp_op4
11/04/27 16:13:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/04/27 16:13:06 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
11/04/27 16:13:06 INFO mapred.FileInputFormat: Total input paths to process : 1
11/04/27 16:13:06 INFO mapred.JobClient: Running job: job_201104271608_0001
11/04/27 16:13:07 INFO mapred.JobClient: map 0% reduce 0%
11/04/27 16:13:17 INFO mapred.JobClient: map 50% reduce 0%
11/04/27 16:13:20 INFO mapred.JobClient: Task Id : attempt_201104271608_0001_m_000001_0, Status : FAILED
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Tsp$TspReducer
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:841)
at org.apache.hadoop.mapred.JobConf.getCombinerClass(JobConf.java:853)
at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1100)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:812)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Tsp$TspReducer
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:833)
... 6 more
Caused by: java.lang.ClassNotFoundException: Tsp$TspReducer
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
... 7 more
11/04/27 16:13:20 WARN mapred.JobClient: Error reading task outputemil-desktop
11/04/27 16:13:20 WARN mapred.JobClient: Error reading task outputemil-desktop
^Z
[1]+ Stopped bin/hadoop jar Tsp.jar clust-Tsp_ip1 clust_Tsp_op4
hadoop#psycho-O:~$ jps
4937 Jps
3976 RunJar
`
Alse the cluster worked fine executing the wordcount example. So I guess its the problem with the Tsp.jar file.
1) Is it necessary to have a jar file to run on a cluster?
2) Here I tried to run a jar file in the cluster which I made. But is still gives a warning that jar file is not found. Why is that?
3) What all should be taken care of when running a jar file? Like what all it must contain other than the program which I wrote? My jar file contains a a Tsp.class, Tsp$TspReducer.class and a Tsp$TspMapper.class. The terminal says it cant find Tsp$TspReducer when it is already there in the jar file.
Thankyou
EDIT
public class Tsp {
public static void main(String[] args) throws IOException {
JobConf conf = new JobConf(Tsp.class);
conf.setJobName("Tsp");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setMapperClass(TspMapper.class);
conf.setCombinerClass(TspReducer.class);
conf.setReducerClass(TspReducer.class);
FileInputFormat.addInputPath(conf,new Path(args[0]));
FileOutputFormat.setOutputPath(conf,new Path(args[1]));
JobClient.runJob(conf);
}
public static class TspMapper extends MapReduceBase
implements Mapper<LongWritable, Text, Text, Text> {
function findCost() {
}
public void map(LongWritable key,Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
find adjacency matrix from the input;
for(int i = 0; ...) {
.....
output.collect(new Text(string1), new Text(string2));
}
}
}
public static class TspReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
Text t1 = new Text();
public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
String a;
a = values.next().toString();
output.collect(key,new Text(a));
}
}
}

You currently have
conf.setJobName("Tsp");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setMapperClass(TspMapper.class);
conf.setCombinerClass(TspReducer.class);
conf.setReducerClass(TspReducer.class);
and as the error is stating No job jar file set you are not setting a jar.
You will need to something similar to
conf.setJarByClass(Tsp.class);
From what I'm seeing, that should resolve the error seen here.

11/04/27 16:13:06 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
Do what they say, when setting up your job, set the jar where the class is contained. Hadoop copies the jar into the DistributedCache (a filesystem on every node) and uses the classes out of it.

I had the exact same issue. Here is how I solved the problem(imagine your map reduce class is called A). After creating the job call:
job.setJarByClass(A.class);

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Hadoop pipes problem - hadoop

The problem here is as #Nija describes. hadoop.pipes.java.recordreader is not being specified, and defaults to false. That means it's expecting you to have a RecordReader in your C++ code. And you don't have one, and thus it can't be found.

Related

How to debug where a map-reduce fails?

getting Error while importing data from mongodb to hdfs

Custom Partitioning gives ArrayIndexOuntOfBounds Error

Avro Map Reduce - AvroInputFormat not found error

hadoop not running in the multinode cluster

Categories

Resources