NullPointerException when writing mapper output - hadoop

I am using Amazon EMR. I have a map / reduce job configured like this:
private static final String TEMP_PATH_PREFIX = System.getProperty("java.io.tmpdir") + "/dmp_processor_tmp";
...
private Job setupProcessorJobS3() throws IOException, DataGrinderException {
String inputFiles = System.getProperty(DGConfig.INPUT_FILES);
Job processorJob = new Job(getConf(), PROCESSOR_JOBNAME);
processorJob.setJarByClass(DgRunner.class);
processorJob.setMapperClass(EntityMapperS3.class);
processorJob.setReducerClass(SelectorReducer.class);
processorJob.setOutputKeyClass(Text.class);
processorJob.setOutputValueClass(Text.class);
FileOutputFormat.setOutputPath(processorJob, new Path(TEMP_PATH_PREFIX));
processorJob.setOutputFormatClass(TextOutputFormat.class);
processorJob.setInputFormatClass(NLineInputFormat.class);
FileInputFormat.setInputPaths(processorJob, inputFiles);
NLineInputFormat.setNumLinesPerSplit(processorJob, 10000);
return processorJob;
}
In my mapper class, I have:
private Text outkey = new Text();
private Text outvalue = new Text();
...
outkey.set(entity.getEntityId().toString());
outvalue.set(input.getId().toString());
printLog("context write");
context.write(outkey, outvalue);
This last line (context.write(outkey, outvalue);), causes this exception. Of course both outkey and outvalue are not null.
2013-10-24 05:48:48,422 INFO com.s1mbi0se.grinder.core.mapred.EntityMapperCassandra (main): Current Thread: Thread[main,5,main]Current timestamp: 1382593728422 context write
2013-10-24 05:48:48,422 ERROR com.s1mbi0se.grinder.core.mapred.EntityMapperCassandra (main): Error on entitymapper for input: 03a07858-4196-46dd-8a2c-23dd824d6e6e
java.lang.NullPointerException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1293)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1210)
at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:264)
at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:244)
at org.apache.hadoop.io.Text.write(Text.java:281)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1077)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:698)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at com.s1mbi0se.grinder.core.mapred.EntityMapper.map(EntityMapper.java:78)
at com.s1mbi0se.grinder.core.mapred.EntityMapperS3.map(EntityMapperS3.java:34)
at com.s1mbi0se.grinder.core.mapred.EntityMapperS3.map(EntityMapperS3.java:14)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2013-10-24 05:48:48,422 INFO com.s1mbi0se.grinder.core.mapred.EntityMapperS3 (main): Current Thread: Thread[main,5,main]Current timestamp: 1382593728422 Entity Mapper end
The first records on each task are just processed ok. In some point of the task processing, I start to take this exception over and over, and then it doesn't process a single record anymore for that task.
I tried to set TEMP_PATH_PREFIX to "s3://mybucket/dmp_processor_tmp", but same thing happened.
Any idea why is this happening? What could be making hadoop not being able to write on it's output?

Related

How to debug where a map-reduce fails?

I compile the program it runs successfully but mapper reducer fails; I need help in code to run the mapper and reducer successfully.
I guess there are some parsing issues in the code. Attached are the errors and below that is the code. How can I find the issues to solve this problem?
01/23 03:27:33 INFO mapreduce.Job: map 0% reduce 0%
19/01/23 03:27:50 INFO mapreduce.Job: map 100% reduce 0%
19/01/23 03:27:51 INFO mapreduce.Job: map 0% reduce 0%
19/01/23 03:27:52 INFO mapreduce.Job: Task Id : attempt_1548178978946_0002_m_000000_0, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.lang.Double.<init>(Double.java:608)
at com.hadoop.imcdp.MA$Map.partitionData(MA.java:69)
at com.hadoop.imcdp.MA$Map.map(MA.java:58)
at com.hadoop.imcdp.MA$Map.map(MA.java:49)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
19/01/23 03:28:21 INFO mapreduce.Job: Task Id : attempt_1548178978946_0002_m_000000_1, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.lang.Double.<init>(Double.java:608)
at com.hadoop.imcdp.MA$Map.partitionData(MA.java:69)
at com.hadoop.imcdp.MA$Map.map(MA.java:58)
at com.hadoop.imcdp.MA$Map.map(MA.java:49)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
19/01/23 03:28:32 INFO mapreduce.Job: Task Id : attempt_1548178978946_0002_m_000000_2, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.lang.Double.<init>(Double.java:608)
at com.hadoop.imcdp.MA$Map.partitionData(MA.java:69)
at com.hadoop.imcdp.MA$Map.map(MA.java:58)
at com.hadoop.imcdp.MA$Map.map(MA.java:49)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
19/01/23 03:28:51 INFO mapreduce.Job: map 100% reduce 100%
19/01/23 03:28:53 INFO mapreduce.Job: Job job_1548178978946_0002 failed with state FAILED due to: Task failed task_1548178978946_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
19/01/23 03:28:53 INFO mapreduce.Job: Counters: 13
Job Counters
Failed map tasks=4
Killed reduce tasks=1
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=67635
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=67635
Total time spent by all reduce tasks (ms)=0
Total vcore-milliseconds taken by all map tasks=67635
Total vcore-milliseconds taken by all reduce tasks=0
Total megabyte-milliseconds taken by all map tasks=69258240
Total megabyte-milliseconds taken by all reduce tasks=0
19/01/23 03:28:53 INFO mapreduce.Job: Running job: job_1548178978946_0002
19/01/23 03:28:53 INFO mapreduce.Job: Job job_1548178978946_0002 running in uber mode : false
19/01/23 03:28:53 INFO mapreduce.Job: map 100% reduce 100%
19/01/23 03:28:53 INFO mapreduce.Job: Task Id : attempt_1548178978946_0002_m_000000_0, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.lang.Double.<init>(Double.java:608)
at com.hadoop.imcdp.MA$Map.partitionData(MA.java:69)
at com.hadoop.imcdp.MA$Map.map(MA.java:58)
at com.hadoop.imcdp.MA$Map.map(MA.java:49)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
19/01/23 03:28:54 INFO mapreduce.Job: Task Id : attempt_1548178978946_0002_m_000000_1, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.lang.Double.<init>(Double.java:608)
at com.hadoop.imcdp.MA$Map.partitionData(MA.java:69)
at com.hadoop.imcdp.MA$Map.map(MA.java:58)
at com.hadoop.imcdp.MA$Map.map(MA.java:49)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
19/01/23 03:28:54 INFO mapreduce.Job: Task Id : attempt_1548178978946_0002_m_000000_2, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.lang.Double.<init>(Double.java:608)
at com.hadoop.imcdp.MA$Map.partitionData(MA.java:69)
at com.hadoop.imcdp.MA$Map.map(MA.java:58)
at com.hadoop.imcdp.MA$Map.map(MA.java:49)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
19/01/23 03:28:54 INFO mapreduce.Job: Job job_1548178978946_0002 failed with state FAILED due to: Task failed task_1548178978946_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
19/01/23 03:28:54 INFO mapreduce.Job: Counters: 13
Job Counters
Failed map tasks=4
Killed reduce tasks=1
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=67635
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=67635
Total time spent by all reduce tasks (ms)=0
Total vcore-milliseconds taken by all map tasks=67635
Total vcore-milliseconds taken by all reduce tasks=0
Total megabyte-milliseconds taken by all map tasks=69258240
Total megabyte-milliseconds taken by all reduce tasks=0
code of moving average example
public class MA extends Configured implements Tool{
// For production the windowlength would be a commandline or other argument
static double windowlength = 3.0;
static int thekey = (int)windowlength/2;
// used for handling the circular list.
static boolean initialised=false;
// Sample window
static ArrayList<Double> window = new ArrayList<Double>() ;
// The Map method processes the data one point at a time and passes the circular list to the
// reducer.
public static class Map extends Mapper<LongWritable, Text, Text, Text> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context
) throws IOException , InterruptedException
{
double wlen = windowlength;
// creates windows of samples and sends them to the Reducer
partitionData(value, context, wlen);
}
// Create sample windows starting at each sata point and sends them to the reducer
private void partitionData(Text value,
Context context, double wlen)
throws IOException , InterruptedException {
String line = value.toString();
// the division must be done this way in the mapper.
Double ival = new Double(line)/wlen;
// Build initial sample window
if(window.size() < windowlength)
{
window.add(ival);
}
// emit first window
if(!initialised && window.size() == windowlength)
{
initialised = true;
emit(thekey, window, context);
thekey++;
return;
}
// Update and emit subsequent windows
if(initialised)
{
// remove oldest datum
window.remove(0);
// add new datum
window.add(ival);
emit(thekey, window, context);
thekey++;
}
}
}
// Transform list to a string and send to reducer. Text to be replaced by ObjectWritable
// Problem: Hadoop apparently requires all output formats to be the same so
// cannot make this output collector differ from the one the reducer uses.
public static void emit(int key, ArrayList<Double> value, Context context) throws IOException , InterruptedException
{
Text tx = new Text();
tx.set(new Integer(key).toString());
String outstring = value.toString();
// remove the square brackets Java puts in
String tidied = outstring.substring(1,outstring.length()-1).trim();
Text out = new Text();
out.set(tidied);
context.write(tx,out);
}
public static class Reduce extends Reducer<Text, Text, Text, Text>
{
public void reduce(Text key,
Iterator<Text> values,
Context context
) throws IOException , InterruptedException
{
while (values.hasNext())
{
computeAverage(key, values, context);
}
}
// computes the average of each window and sends to ouptut collector.
private void computeAverage(Text key, Iterator<Text> values,
Context context)
throws IOException , InterruptedException {
double sum = 0;
String thevalue = values.next().toString();
String[] thenumbers = thevalue.split(",");
for( String temp: thenumbers)
{
// need to trim the string because the constructor does not trim.
Double ds = new Double(temp.trim());
sum += ds;
}
Text out = new Text();
String outstring = Double.toString(sum);
out.set(outstring);
context.write (key, out);
}
}
#Override
public int run(String[] args) throws Exception {
Configuration conf = new Configuration();
args = new GenericOptionsParser(conf, args).getRemainingArgs();
String input = args[0];
String output = args[1];
Job job = new Job(conf, "MA");
job.setJarByClass(MA.class);
job.setInputFormatClass(TextInputFormat.class);
job.setMapperClass(Map.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setCombinerClass(Reduce.class);
job.setReducerClass(Reduce.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.setInputPaths(job, new Path(input));
Path outPath = new Path(output);
FileOutputFormat.setOutputPath(job, outPath);
outPath.getFileSystem(conf).delete(outPath, true);
job.waitForCompletion(true);
return (job.waitForCompletion(true) ? 0 : 1);
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new MA(), args);
System.exit(exitCode);
}
// public static void main(String[] args) throws Exception {
// JobConf conf = new JobConf(MA.class);
// conf.setJobName("MA");
// conf.setOutputKeyClass(Text.class);
// conf.setOutputValueClass(Text.class);
// conf.setMapperClass(Map.class);
// conf.setCombinerClass(Reduce.class);
// conf.setReducerClass(Reduce.class);
// conf.setInputFormat(TextInputFormat.class);
// conf.setOutputFormat(TextOutputFormat.class);
// FileInputFormat.setInputPaths(conf, new Path(args[0]));
// FileOutputFormat.setOutputPath(conf, new Path(args[1]));
// FileInputFormat.setInputPaths(conf, new Path("input/movingaverage.txt"));
// FileOutputFormat.setOutputPath(conf, new Path("output/smoothed"));
// JobClient.runJob(conf);
// }
}
Double ival = new Double(line)/wlen;
is throwing the error, as line is 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20, which doesn't parse into a Double. Assuming you want each of these numbers to be a double, you need to do:
List<Double> ivals = new ArrayList<>();
String[] numbers = line.split(",");
for (int i = 0; i < numbers.length(); i++) {
ivals.add(new Double(numbers[i])/wlen);
}

InvocationTargetException in Yarn task with Hadoop

While running Kafka -> Apache Apex ->Hbase, it is saying following exception in Yarn tasks:
com.datatorrent.stram.StreamingAppMasterService: Application master, appId=4, clustertimestamp=1479188884109, attemptId=2
2016-11-15 11:59:51,068 INFO org.apache.hadoop.service.AbstractService: Service com.datatorrent.stram.StreamingAppMasterService failed in state INITED; cause: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:130)
at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:156)
at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:241)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:333)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:330)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:330)
at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:444)
And my DataTorrent log showing the following exception. I am running the app which communicates Kafka -> Apex -> Hbase streaming application.
Connecting to ResourceManager at hduser1/127.0.0.1:8032
16/11/15 17:47:38 WARN client.EventsAgent: Cannot read events for application_1479208737206_0008: java.io.FileNotFoundException: File does not exist: /user/hduser1/datatorrent/apps/application_1479208737206_0008/events/index.txt
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1893)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1834)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1814)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1786)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:552)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)
Adding the code :
public void populateDAG(DAG dag, Configuration conf){
KafkaSinglePortInputOperator in
= dag.addOperator("kafkaIn", new KafkaSinglePortInputOperator());
in.setInitialOffset(AbstractKafkaInputOperator.InitialOffset.EARLIEST.name());
LineOutputOperator out = dag.addOperator("fileOut", new LineOutputOperator());
dag.addStream("data", in.outputPort, out.input);}
LineOutputOperator extends AbstractFileOutputOperator
private static final String NL = System.lineSeparator();
private static final Charset CS = StandardCharsets.UTF_8;
#NotNull
private String baseName;
#Override
public byte[] getBytesForTuple(byte[] t) {
String result = new String(t, CS) + NL;
return result.getBytes(CS);
}
#Override
protected String getFileName(byte[] tuple) {
return baseName;
}
public String getBaseName() { return baseName; }
public void setBaseName(String v) { baseName = v; }
How to resolve this problem?
Thanks.
Can you share some details about your environment like what version of hadoop and apex ? Also, which log does this exception appear in ?
Just as a simple sanity check, can you run the simple maven archetype generated application described at: http://docs.datatorrent.com/beginner/
If that works, try running the fileIO and kafka applications at:
https://github.com/DataTorrent/examples/tree/master/tutorials
If those work ok we can look at the details of your code.
I got the solution,
The problem related to expiry of my license, So reinstalled new one and works fine for actual code.

Kite Dataset map-reduce

I am trying to do map-reduce with kite-dataset api.
I have followed below urls to refer.
https://community.cloudera.com/t5/Kite-SDK-includes-Morphlines/Map-Reduce-with-Kite/td-p/22165
https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-mapreduce/src/test/java/org/kitesdk/data/mapreduce/TestMapReduce.java
My code snippet as below
public class MapReduce {
private static final String sourceDatasetURI = "dataset:hive:test_avro";
private static final String destinationDatasetURI = "dataset:hive:test_parquet";
private static class LineCountMapper
extends Mapper<GenericData.Record, Void, Text, IntWritable> {
#Override
protected void map(GenericData.Record record, Void value,
Context context)
throws IOException, InterruptedException {
System.out.println("Record is "+record);
context.write(new Text(record.get("index").toString()), new IntWritable(1));
}
}
private Job createJob() throws Exception {
System.out.println("Inside Create Job");
Job job = new Job();
job.setJarByClass(getClass());
Dataset<GenericData.Record> inputDataset = Datasets.load(sourceDatasetURI, GenericData.Record.class);
Dataset<GenericData.Record> outputDataset = Datasets.load(destinationDatasetURI, GenericData.Record.class);
DatasetKeyInputFormat.configure(job).readFrom(inputDataset).withType(GenericData.Record.class);
job.setMapperClass(LineCountMapper.class);
DatasetKeyOutputFormat.configure(job).writeTo(outputDataset).withType(GenericData.Record.class);
job.waitForCompletion(true);
return job;
}
public static void main(String[] args) throws Exception {
MapReduce httAvroToParquet = new MapReduce();
httAvroToParquet.createJob();
}
}
I am using HDP 2.3.2 box ,creating assembly jar and submitting my application through spark-submit.
I am getting below error when I submit my application.
2015-12-18 04:09:07,156 WARN [main] org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2015-12-18 04:09:07,282 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
2015-12-18 04:09:07,333 WARN [main] org.kitesdk.data.spi.Registration: Not loading URI patterns in org.kitesdk.data.spi.hive.Loader
2015-12-18 04:09:07,334 INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive://{}:9083/default/test_parquet. Check that JARs for hive datasets are on the classpath.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive://{}:9083/default/test_parquet. Check that JARs for hive datasets are on the classpath.
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:478)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:458)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1560)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:458)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:377)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1518)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)
Caused by: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive://{}:9083/default/test_parquet. Check that JARs for hive datasets are on the classpath.
at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:109)
at org.kitesdk.data.Datasets.load(Datasets.java:103)
at org.kitesdk.data.Datasets.load(Datasets.java:165)
at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.load(DatasetKeyOutputFormat.java:510)
at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.getOutputCommitter(DatasetKeyOutputFormat.java:473)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:476)
... 11 more
I am not getting what's wrong ? Is there any class-path problem ? If yes then where do I set it ?
You effectively have a classpath problem
Your project is missing org.kitesdk:kite-data-hive
You can
Add this jar to your fat jar before submitting it to Spark
Tells Spark to add it to your classpath when you submit

DistributedCache - third party jar not found

I'm trying to get a hold of DistributedCache. I'm using Apache Hadoop 1.2.1 on two nodes.
I referred to the Cloudera post which is simply extended in the other posts that explain how to use third-party jars using -libjars
Note:
In my jar, I haven't included any jar libs. - neither Hadoop core nor commons lang.
The code :
public class WordCounter extends Configured implements Tool {
#Override
public int run(String[] args) throws Exception {
// TODO Auto-generated method stub
// Job job = new Job(getConf(), args[0]);
Job job = new Job(super.getConf(), args[0]);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setJarByClass(WordCounter.class);
FileInputFormat.setInputPaths(job, new Path(args[1]));
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.setMapperClass(WCMapper.class);
job.setReducerClass(WCReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
int jobState = job.waitForCompletion(true) ? 0 : 1;
return jobState;
}
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
if (args == null || args.length < 3) {
System.out.println("The below three arguments are expected");
System.out
.println("<job name> <hdfs path of the input file> <hdfs path of the output file>");
return;
}
WordCounter wordCounter = new WordCounter();
// System.exit(ToolRunner.run(wordCounter, args));
System.exit(ToolRunner.run(new Configuration(), wordCounter, args));
}
}
The Mapper class is naive, its only attempting to use the StringUtils from Apache Commons(and NOT hadoop)
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
/**
* #author 298790
*
*/
public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
private static IntWritable one = new IntWritable(1);
#Override
protected void map(
LongWritable key,
Text value,
org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, IntWritable>.Context context)
throws IOException, InterruptedException {
// TODO Auto-generated method stub
StringTokenizer strTokenizer = new StringTokenizer(value.toString());
Text token = new Text();
while (strTokenizer.hasMoreTokens()) {
token.set(strTokenizer.nextToken());
context.write(token, one);
}
System.out.println("Converting " + value + " to upper case "
+ StringUtils.upperCase(value.toString()));
}
}
The commands that I use :
bigdata#slave3:~$ export HADOOP_CLASSPATH=dumphere/lib/commons-lang3-3.1.jar
bigdata#slave3:~$
bigdata#slave3:~$ echo $HADOOP_CLASSPATH
dumphere/lib/commons-lang3-3.1.jar
bigdata#slave3:~$
bigdata#slave3:~$ echo $LIBJARS
dumphere/lib/commons-lang3-3.1.jar
bigdata#slave3:~$ hadoop jar dumphere/code/jars/hdp_3rdparty.jar com.hadoop.basics.WordCounter "WordCount" "/input/dumphere/Childhood_days.txt" "/output/dumphere/wc" -libjars ${LIBJARS}
The exception I get :
Warning: $HADOOP_HOME is deprecated.
14/08/13 21:56:05 INFO input.FileInputFormat: Total input paths to process : 1
14/08/13 21:56:05 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/08/13 21:56:05 WARN snappy.LoadSnappy: Snappy native library not loaded
14/08/13 21:56:05 INFO mapred.JobClient: Running job: job_201408111719_0190
14/08/13 21:56:06 INFO mapred.JobClient: map 0% reduce 0%
14/08/13 21:56:37 INFO mapred.JobClient: Task Id : attempt_201408111719_0190_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.commons.lang3.StringUtils
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at com.hadoop.basics.WCMapper.map(WCMapper.java:40)
at com.hadoop.basics.WCMapper.map(WCMapper.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
14/08/13 21:56:42 INFO mapred.JobClient: Task Id : attempt_201408111719_0190_m_000000_1, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.commons.lang3.StringUtils
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at com.hadoop.basics.WCMapper.map(WCMapper.java:40)
at com.hadoop.basics.WCMapper.map(WCMapper.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
The Cloudera post mentions :
The jar will be placed in distributed cache and will be made available to all of the job’s task attempts. More specifically, you will find the JAR in one of the ${mapred.local.dir}/taskTracker/archive/${user.name}/distcache/… subdirectories on local nodes.
But on that path, I'm not able to find the commons-lang3-3.1.jar
What am I missing?

hadoop not running in the multinode cluster

I have a jar file "Tsp.jar" that I made myself. This same jar files executes well in single node cluster setup of hadoop. However when I run it on a cluster comprising 2 machines, a laptop and desktop it gives me an exception when the map function reach 50%. Here is the output
`hadoop#psycho-O:/usr/local/hadoop$ bin/hadoop jar Tsp.jar clust-Tsp_ip1 clust_Tsp_op4
11/04/27 16:13:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/04/27 16:13:06 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
11/04/27 16:13:06 INFO mapred.FileInputFormat: Total input paths to process : 1
11/04/27 16:13:06 INFO mapred.JobClient: Running job: job_201104271608_0001
11/04/27 16:13:07 INFO mapred.JobClient: map 0% reduce 0%
11/04/27 16:13:17 INFO mapred.JobClient: map 50% reduce 0%
11/04/27 16:13:20 INFO mapred.JobClient: Task Id : attempt_201104271608_0001_m_000001_0, Status : FAILED
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Tsp$TspReducer
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:841)
at org.apache.hadoop.mapred.JobConf.getCombinerClass(JobConf.java:853)
at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1100)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:812)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Tsp$TspReducer
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:833)
... 6 more
Caused by: java.lang.ClassNotFoundException: Tsp$TspReducer
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
... 7 more
11/04/27 16:13:20 WARN mapred.JobClient: Error reading task outputemil-desktop
11/04/27 16:13:20 WARN mapred.JobClient: Error reading task outputemil-desktop
^Z
[1]+ Stopped bin/hadoop jar Tsp.jar clust-Tsp_ip1 clust_Tsp_op4
hadoop#psycho-O:~$ jps
4937 Jps
3976 RunJar
`
Alse the cluster worked fine executing the wordcount example. So I guess its the problem with the Tsp.jar file.
1) Is it necessary to have a jar file to run on a cluster?
2) Here I tried to run a jar file in the cluster which I made. But is still gives a warning that jar file is not found. Why is that?
3) What all should be taken care of when running a jar file? Like what all it must contain other than the program which I wrote? My jar file contains a a Tsp.class, Tsp$TspReducer.class and a Tsp$TspMapper.class. The terminal says it cant find Tsp$TspReducer when it is already there in the jar file.
Thankyou
EDIT
public class Tsp {
public static void main(String[] args) throws IOException {
JobConf conf = new JobConf(Tsp.class);
conf.setJobName("Tsp");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setMapperClass(TspMapper.class);
conf.setCombinerClass(TspReducer.class);
conf.setReducerClass(TspReducer.class);
FileInputFormat.addInputPath(conf,new Path(args[0]));
FileOutputFormat.setOutputPath(conf,new Path(args[1]));
JobClient.runJob(conf);
}
public static class TspMapper extends MapReduceBase
implements Mapper<LongWritable, Text, Text, Text> {
function findCost() {
}
public void map(LongWritable key,Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
find adjacency matrix from the input;
for(int i = 0; ...) {
.....
output.collect(new Text(string1), new Text(string2));
}
}
}
public static class TspReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
Text t1 = new Text();
public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
String a;
a = values.next().toString();
output.collect(key,new Text(a));
}
}
}
You currently have
conf.setJobName("Tsp");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setMapperClass(TspMapper.class);
conf.setCombinerClass(TspReducer.class);
conf.setReducerClass(TspReducer.class);
and as the error is stating No job jar file set you are not setting a jar.
You will need to something similar to
conf.setJarByClass(Tsp.class);
From what I'm seeing, that should resolve the error seen here.
11/04/27 16:13:06 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
Do what they say, when setting up your job, set the jar where the class is contained. Hadoop copies the jar into the DistributedCache (a filesystem on every node) and uses the classes out of it.
I had the exact same issue. Here is how I solved the problem(imagine your map reduce class is called A). After creating the job call:
job.setJarByClass(A.class);

Resources