hbase co-processor failing to load from shell - hadoop

I am trying to add a coprocessor to an HBase table, and it fails with this error:
2016-03-15 14:40:14,130 INFO org.apache.hadoop.hbase.regionserver.RSRpcServices: Open PRODUCT_DETAILS,,1457953190424.f687dd250bfd1f18ffbb8075fd625145.
2016-03-15 14:40:14,173 ERROR org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Failed to load coprocessor com.optymyze.coprocessors.ProductObserver
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.; Host Details : local host is: "mylocalhost/mylocalhostip"; destination host is: "mydestinationhost":9000;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
To add the coprocessor I did the following:
hbase> disable 'PRODUCT_DETAILS'
hbase> alter 'PRODUCT_DETAILS', METHOD => 'table_att', 'coprocessor'=>'hdfs://mydestinationhost:9000/hbase-coprocessors-0.0.3-SNAPSHOT.jar|com.optymyze.coprocessors.ProductObserver|1001|arg1=1,arg2=2'
Now enable 'PRODUCT_DETAILS' does not work.
The coprocessor code is as follows:
package com.optymyze.coprocessors;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;
import org.slf4j.Logger;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import static org.slf4j.LoggerFactory.*;
/**
*
* Created by adnan on 14-03-2016.
*/
public class ProductObserver extends BaseRegionObserver {
private static final Logger LOGGER = getLogger(ProductObserver.class);
private static final String PRODUCT_DETAILS_TABLE = "PRODUCT_DETAILS";
public static final String COLUMN_FAMILY = "CF";
@Override
public void postPut(ObserverContext<RegionCoprocessorEnvironment> e, Put put, WALEdit edit, boolean writeToWAL) throws IOException {
List<KeyValue> kvs = put.getFamilyMap().get(Bytes.toBytes(COLUMN_FAMILY));
LOGGER.info("key values {}", kvs);
Map<String, Integer> qualifierVsValue = getMapForQualifierVsValuesForRequiredOnes(kvs);
LOGGER.info("qualifier values {}", qualifierVsValue);
List<Put> puts = createPuts(kvs, qualifierVsValue);
LOGGER.info("puts values {}", puts);
updateProductTable(e, puts);
LOGGER.info("puts done");
}
private void updateProductTable(ObserverContext<RegionCoprocessorEnvironment> e, List<Put> puts) throws IOException {
HTableInterface productTable = e.getEnvironment().getTable(Bytes.toBytes(PRODUCT_DETAILS_TABLE));
try {
productTable.put(puts);
}finally {
productTable.close();
}
}
private List<Put> createPuts(List<KeyValue> kvs, Map<String, Integer> qualifierVsValue) {
int salePrice, baseline = 0, finalPrice = 0;
List<Put> puts = new ArrayList<Put>(kvs.size());
for (KeyValue kv : kvs) {
if (kv.matchingQualifier(Bytes.toBytes("BASELINE"))) {
baseline = convertToZeroIfNull(qualifierVsValue, "PRICE")
- convertToZeroIfNull(qualifierVsValue, "PRICE")
* convertToZeroIfNull(qualifierVsValue, "DISCOUNT") / 100;
puts.add(newPut(kv, baseline));
}
if (kv.matchingQualifier(Bytes.toBytes("FINALPRICE"))) {
finalPrice = baseline + baseline * convertToZeroIfNull(qualifierVsValue, "UPLIFT") / 100;
puts.add(newPut(kv, finalPrice));
}
if (kv.matchingQualifier(Bytes.toBytes("SALEPRICE"))) {
salePrice = finalPrice * convertToZeroIfNull(qualifierVsValue, "VOLUME");
puts.add(newPut(kv, salePrice));
}
}
return puts;
}
private Map<String, Integer> getMapForQualifierVsValuesForRequiredOnes(List<KeyValue> kvs) {
Map<String, Integer> qualifierVsValue = new HashMap<String, Integer>();
for (KeyValue kv : kvs) {
getValueFromQualifier(kv, "PRICE", qualifierVsValue);
getValueFromQualifier(kv, "DISCOUNT", qualifierVsValue);
getValueFromQualifier(kv, "UPLIFT", qualifierVsValue);
getValueFromQualifier(kv, "VOLUME", qualifierVsValue);
}
return qualifierVsValue;
}
private Integer convertToZeroIfNull(Map<String, Integer> qualifierVsValue, String qualifier) {
Integer v = qualifierVsValue.get(qualifier);
return v == null ? 0 : v;
}
private void getValueFromQualifier(KeyValue kv, String qualifier, Map<String, Integer> qualifierVsValue) {
if (kv.matchingQualifier(Bytes.toBytes(qualifier))) {
qualifierVsValue.put(qualifier, Bytes.toInt(convertToByteZeroIfNull(kv)));
}
}
private Put newPut(KeyValue kv, int newVal) {
Put put = new Put(kv.getValue(), kv.getTimestamp());
put.add(kv.getFamily(), kv.getQualifier(), Bytes.toBytes(newVal));
return put;
}
private byte[] convertToByteZeroIfNull(KeyValue kv) {
return kv.getValue() == null ? Bytes.toBytes(0) : kv.getValue();
}
}


NiFI "unable to find flowfile content"

I am using NiFi 1.6 and get the following errors when trying to modify a clone of an incoming flowFile:
[1]"unable to find content for FlowFile: ... MissingFlowFileException
...
Caused by ContentNotFoundException: Could not find content for StandardClaim
...
Caused by java.io.EOFException: null"
[2]"FlowFileHandlingException: StandardFlowFileRecord... is not known in this session"
The first error occurs when trying to access the contents of the flow file, the second when removing the flow file from the session (within a catch of the first). This process is known to have worked under NiFi 0.7.
The basic process is:
1. Clone the incoming flow file
2. Write to the clone
3. Write to the clone again (some additional formatting)
4. Repeat 1-3
The error occurs at step 3 of the second iteration.
An interesting point is that if a session.read of the clone is done immediately after the clone is performed, everything works fine. The read seems to reset some pointer.
I have created unit tests for this processor, but they do not fail in either case.
Below is code simplified from the actual version in use that demonstrates the issue. (The development system is not connected so I had to copy the code. Please forgive any typos - it should be close. This is also why a full stack trace is not provided.) The processor doing the work has a property to determine if an immediate read should be done, or not. So both scenarios can be performed easily. To set it up, all that is needed is a GetFile processor to supply the input and terminators for the output from the SampleCloningProcessor. A sample input file is included as well. The meat of the code is in the onTrigger and manipulate methods. The manipulation in this simplified version doesn't really do anything but copy the input to the output.
Any insights into why this is happening and suggestions for corrections will be appreciated - thanks.
SampleCloningProcessor.java
package sample.processors.cloning;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.Reader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;
import org.apache.commons.compress.utils.IOUtils;
import org.apache.nifi.annotation.documentation.CapabilityDescription;
import org.apache.nifi.annotation.documentation.Tags;
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.ProcessorInitializationContext;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;
import org.apache.nifi.processor.io.InputStreamCallback;
import org.apache.nifi.processor.io.OutputStreamCallback;
import org.apache.nifi.processor.io.StreamCallback;
import org.apache.nifi.processor.util.StandardValidators;
import com.google.gson.Gson;
@Tags({"example", "clone"})
@CapabilityDescription("Demonstrates cloning of flowfile failure.")
public class SampleCloningProcessor extends AbstractProcessor {
/* Determines if an immediate read is performed after cloning of incoming flowfile. */
public static final PropertyDescriptor IMMEDIATE_READ = new PropertyDescriptor.Builder()
.name("immediateRead")
.description("Determines if processor runs successfully. If a read is done immediately "
+ "after the clone of the incoming flowFile, then the processor should run successfully.")
.required(true)
.allowableValues("true", "false")
.defaultValue("true")
.addValidator(StandardValidators.BOOLEAN_VALIDATOR)
.build();
public static final Relationship SUCCESS = new Relationship.Builder().name("success").
description("No unexpected errors.").build();
public static final Relationship FAILURE = new Relationship.Builder().name("failure").
description("Errors were thrown.").build();
private Set<Relationship> relationships;
private List<PropertyDescriptor> properties;
@Override
public void init(final ProcessorInitializationContext context) {
relationships = new HashSet<>(Arrays.asList(SUCCESS, FAILURE));
properties = Arrays.asList(IMMEDIATE_READ);
}
@Override
public Set<Relationship> getRelationships() {
return this.relationships;
}
@Override
public List<PropertyDescriptor> getSupportedPropertyDescriptors() {
return this.properties;
}
@Override
public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
FlowFile incomingFlowFile = session.get();
if (incomingFlowFile == null) {
return;
}
try {
final InfileReader inFileReader = new InfileReader();
session.read(incomingFlowFile, inFileReader);
Product product = inFileReader.getProduct();
boolean transfer = false;
getLogger().info("\tSession :\n" + session);
getLogger().info("\toriginal :\n" + incomingFlowFile);
for(int i = 0; i < 2; i++) {
transfer = manipulate(context, session, incomingFlowFile, product);
}
} catch (Exception e) {
getLogger().error(e.getMessage(), e);
session.rollback(true);
}
}
private boolean manipulate(final ProcessContext context, final ProcessSession session,
final FlowFile incomingFlowFile, final Product product) {
boolean transfer = false;
FlowFile outgoingFlowFile = null;
boolean immediateRead = context.getProperty(IMMEDIATE_READ).asBoolean();
try {
//Clone incoming flowFile
outgoingFlowFile = session.clone(incomingFlowFile);
getLogger().info("\tclone outgoing :\n" + outgoingFlowFile);
if(immediateRead) {
readFlowFile(session, outgoingFlowFile);
}
//First write into clone
StageOneWriter stage1Write = new StageOneWriter(product);
outgoingFlowFile = session.write(outgoingFlowFile, stage1Write);
getLogger().info("\twrite outgoing :\n" + outgoingFlowFile);
// Format the cloned file with another write
outgoingFlowFile = formatFlowFile(session, outgoingFlowFile);
getLogger().info("\tformat outgoing :\n" + outgoingFlowFile);
session.transfer(outgoingFlowFile, SUCCESS);
transfer = true;
} catch(Exception e) {
getLogger().error(e.getMessage(), e);
if(outgoingFlowFile != null) {
session.remove(outgoingFlowFile);
}
}
return transfer;
}
private void readFlowFile(final ProcessSession session, final FlowFile flowFile) {
session.read(flowFile, new InputStreamCallback() {
@Override
public void process(final InputStream in) throws IOException {
try (Scanner scanner = new Scanner(in)) {
scanner.useDelimiter("\\A").next();
}
}
});
}
private FlowFile formatFlowFile(final ProcessSession session, FlowFile flowFile) {
OutputFormatWriter formatWriter = new OutputFormatWriter();
flowFile = session.write(flowFile, formatWriter);
return flowFile;
}
private static class OutputFormatWriter implements StreamCallback {
@Override
public void process(final InputStream in, final OutputStream out) throws IOException {
try {
IOUtils.copy(in, out);
out.flush();
} finally {
IOUtils.closeQuietly(in);
IOUtils.closeQuietly(out);
}
}
}
private static class StageOneWriter implements OutputStreamCallback {
private Product product = null;
public StageOneWriter(Product product) {
this.product = product;
}
@Override
public void process(final OutputStream out) throws IOException {
final Gson gson = new Gson();
final String json = gson.toJson(product);
out.write(json.getBytes());
}
}
private static class InfileReader implements InputStreamCallback {
private Product product = null;
@Override
public void process(final InputStream in) throws IOException {
product = null;
final Gson gson = new Gson();
Reader inReader = new InputStreamReader(in, "UTF-8");
product = gson.fromJson(inReader, Product.class);
}
public Product getProduct() {
return product;
}
}
}
SampleCloningProcessorTest.java
package sample.processors.cloning;
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;
import org.junit.Before;
import org.junit.Test;
public class SampleCloningProcessorTest {
static final String flowFileContent = "{"
+ "\"cost\": \"cost 1\","
+ "\"description\": \"description\","
+ "\"markup\": 1.2,"
+ "\"name\":\"name 1\","
+ "\"supplier\":\"supplier 1\""
+ "}";
private TestRunner testRunner;
@Before
public void init() {
testRunner = TestRunners.newTestRunner(SampleCloningProcessor.class);
testRunner.enqueue(flowFileContent);
}
@Test
public void testProcessorImmediateRead() {
testRunner.setProperty(SampleCloningProcessor.IMMEDIATE_READ, "true");
testRunner.run();
testRunner.assertTransferCount("success", 2);
}
@Test
public void testProcessorImmediateRead_false() {
testRunner.setProperty(SampleCloningProcessor.IMMEDIATE_READ, "false");
testRunner.run();
testRunner.assertTransferCount("success", 2);
}
}
Product.java
package sample.processors.cloning;
public class Product {
private String name;
private String description;
private String supplier;
private String cost;
private float markup;
public String getName() {
return name;
}
public void setName(final String name) {
this.name = name;
}
public String getDescription() {
return description;
}
public void setDescription(final String description) {
this.description = description;
}
public String getSupplier() {
return supplier;
}
public void setSupplier(final String supplier) {
this.supplier = supplier;
}
public String getCost() {
return cost;
}
public void setCost(final String cost) {
this.cost = cost;
}
public float getMarkup() {
return markup;
}
public void setMarkup(final float markup) {
this.markup = markup;
}
}
product.json A sample input file.
{
"cost" : "cost 1",
"description" : "description 1",
"markup" : 1.2,
"name" : "name 1",
"supplier" : "supplier 1"
}
Reported as a bug in Nifi. Being addressed by https://issues.apache.org/jira/browse/NIFI-5879

How to convert .yml to .properties with a gradle task?

The only supported i18n formats for Spring are .properties and .xml, but neither is really optimal.
What I'd like is to have complex YAML files (messages.yml and messages_xx.yml) that get converted to .properties in a Gradle task, so I can run it before the build task.
For example, a messages.yml would look like:
group1:
  group2:
    group3:
      message1: hello
      message2: how are you?
    group4:
      message3: good
  group5:
    group6:
      message4: let's party
And the output .properties would be:
group1.group2.group3.message1: hello
group1.group2.group3.message2: how are you?
group1.group2.group4.message3: good
group1.group5.group6.message4: let's party
Is there a way to achieve this?
I didn't find existing converters.
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import org.yaml.snakeyaml.Yaml;
public class YamlBackToProperties {
public static void main(String[] args) throws IOException {
Yaml yaml = new Yaml();
try (InputStream in = Files.newInputStream(Paths.get("test.yml"))) {
TreeMap<String, Map<String, Object>> config = yaml.loadAs(in, TreeMap.class);
System.out.println(String.format("%s%n\nConverts to Properties:%n%n%s", config.toString(), toProperties(config)));
}
}
private static String toProperties(TreeMap<String, Map<String, Object>> config) {
StringBuilder sb = new StringBuilder();
for (String key : config.keySet()) {
sb.append(toString(key, config.get(key)));
}
return sb.toString();
}
private static String toString(String key, Map<String, Object> map) {
StringBuilder sb = new StringBuilder();
for (String mapKey : map.keySet()) {
if (map.get(mapKey) instanceof Map) {
sb.append(toString(String.format("%s.%s", key, mapKey), (Map<String, Object>) map.get(mapKey)));
} else {
sb.append(String.format("%s.%s=%s%n", key, mapKey, map.get(mapKey).toString()));
}
}
return sb.toString();
}
}
Made some changes to the first answer. Now works for me for all cases:
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import org.yaml.snakeyaml.Yaml;
public class YamlConverter {
public static void main(String[] args) throws IOException {
Yaml yaml = new Yaml();
try (InputStream in = Files.newInputStream(Paths.get("yourpath/application.yml"))) {
TreeMap<String, Map<String, Object>> config = yaml.loadAs(in, TreeMap.class);
System.out.println(String.format("%s%n\nConverts to Properties:%n%n%s", config.toString(), toProperties(config)));
}
}
private static String toProperties(TreeMap<String, Map<String, Object>> config) {
StringBuilder sb = new StringBuilder();
for (String key : config.keySet()) {
sb.append(toString(key, config.get(key)));
}
return sb.toString();
}
private static String toString(String key, Object mapr) {
StringBuilder sb = new StringBuilder();
if(!(mapr instanceof Map)) {
sb.append(key+"="+mapr+"\n");
return sb.toString();
}
Map<String, Object> map = (Map<String, Object>)mapr;
for (String mapKey : map.keySet()) {
if (map.get(mapKey) instanceof Map) {
sb.append(toString(key+"."+mapKey, map.get(mapKey)));
} else {
sb.append(String.format("%s.%s=%s%n", key, mapKey, map.get(mapKey).toString()));
}
}
return sb.toString();
}
}
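The converters above print the flattened keys to standard output. The following is a minimal sketch of the remaining step, collecting the keys into a java.util.Properties object that can be stored as a .properties file. It uses only the JDK, so the nested map is built by hand as a stand-in for what a YAML parser would return; the class name NestedMapToProperties and the helper sampleMessages are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

final class NestedMapToProperties {

    // Recursively walks a nested map and stores each leaf under its dotted path.
    static void flatten(String prefix, Map<?, ?> node, Properties out) {
        for (Map.Entry<?, ?> entry : node.entrySet()) {
            String key = prefix.isEmpty()
                    ? String.valueOf(entry.getKey())
                    : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                flatten(key, (Map<?, ?>) value, out);
            } else {
                // Properties rejects null values, so a null leaf becomes an empty string.
                out.setProperty(key, value == null ? "" : value.toString());
            }
        }
    }

    static Properties toProperties(Map<?, ?> root) {
        Properties props = new Properties();
        flatten("", root, props);
        return props;
    }

    // Hand-built stand-in for the parsed messages.yml from the question (an assumption).
    static Map<String, Object> sampleMessages() {
        Map<String, Object> group3 = new LinkedHashMap<>();
        group3.put("message1", "hello");
        group3.put("message2", "how are you?");
        Map<String, Object> group4 = new LinkedHashMap<>();
        group4.put("message3", "good");
        Map<String, Object> group2 = new LinkedHashMap<>();
        group2.put("group3", group3);
        group2.put("group4", group4);
        Map<String, Object> group6 = new LinkedHashMap<>();
        group6.put("message4", "let's party");
        Map<String, Object> group5 = new LinkedHashMap<>();
        group5.put("group6", group6);
        Map<String, Object> group1 = new LinkedHashMap<>();
        group1.put("group2", group2);
        group1.put("group5", group5);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("group1", group1);
        return root;
    }

    public static void main(String[] args) throws IOException {
        Properties props = toProperties(sampleMessages());
        StringWriter writer = new StringWriter();
        // store() escapes special characters; the key order in the output is not guaranteed.
        props.store(writer, null);
        System.out.print(writer);
    }
}
```

In a real build the StringWriter would be replaced by a file writer, and the class could be invoked from a Gradle JavaExec task. YAML lists are not expanded in this sketch; a list leaf is written with its toString() form.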
Here's a straightforward implementation in Kotlin.
Once you have a Map with the parsed YAML, just call flatten():
fun flatten(map: Map<String, *>): MutableMap<String, Any> {
val processed = mutableMapOf<String, Any>()
map.forEach { key, value ->
doFlatten(key, value as Any, processed)
}
return processed
}
fun doFlatten(parentKey: String, value: Any, processed: MutableMap<String, Any>) {
if (value is Map<*, *>) {
value.forEach {
doFlatten("$parentKey.${it.key}", it.value as Any, processed)
}
} else {
processed[parentKey] = value
}
}
You can try it like this:
package com.example.yaml;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.yaml.snakeyaml.Yaml;
public class YamlConfigRunner {
public static void main(String[] args) throws IOException {
if( args.length != 1 ) {
System.out.println( "Usage: <file.yml>" );
return;
}
Yaml yaml = new Yaml();
try( InputStream in = Files.newInputStream( Paths.get( args[ 0 ] ) ) ) {
Configuration config = yaml.loadAs( in, Configuration.class );
System.out.println( config.toString() );
}
}
}
reference: https://dzone.com/articles/using-yaml-java-application

Confluent Kafka Avro deserializer for spring boot kafka listener

Has anybody implemented a Confluent Kafka message deserializer for consuming Kafka messages with Spring @KafkaListener methods?
Here is my answer, which I've implemented based on: "io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer"
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Map;
import javax.xml.bind.DatatypeConverter;
import org.apache.avro.Schema;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificRecordBase;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Deserializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class AvroConfluentDeserializer<T extends SpecificRecordBase> implements Deserializer<T> {
private static final Logger LOG = LoggerFactory.getLogger(AvroConfluentDeserializer.class);
protected static final byte MAGIC_BYTE = 0x0;
protected static final int idSize = 4;
private final DecoderFactory decoderFactory = DecoderFactory.get();
protected final Class<T> targetType;
public AvroConfluentDeserializer(Class<T> targetType) {
this.targetType = targetType;
}
@Override
public void close() {
// No-op
}
@Override
public void configure(Map<String, ?> arg0, boolean arg1) {
// No-op
}
@Override
public T deserialize(String topic, byte[] data) {
try {
T result = null;
if (data != null) {
LOG.info("data='{}'", DatatypeConverter.printHexBinary(data));
result = (T) deserializePayload(data, targetType.newInstance().getSchema());
LOG.info("deserialized data='{}'", result);
}
return result;
} catch (Exception ex) {
throw new SerializationException(
"Can't deserialize data '" + Arrays.toString(data) + "' from topic '" + topic + "'", ex);
}
}
protected T deserializePayload(byte[] payload, Schema schema) throws SerializationException {
int id = -1;
try {
ByteBuffer buffer = getByteBuffer(payload);
id = buffer.getInt();
int length = buffer.limit() - 1 - idSize;
int start = buffer.position() + buffer.arrayOffset();
DatumReader<T> reader = new SpecificDatumReader<T>(schema);
return reader.read(null, decoderFactory.binaryDecoder(buffer.array(), start, length, null));
} catch (IOException | RuntimeException e) {
throw new SerializationException("Error deserializing Avro message for id " + id, e);
}
}
private ByteBuffer getByteBuffer(byte[] payload) {
ByteBuffer buffer = ByteBuffer.wrap(payload);
if (buffer.get() != MAGIC_BYTE) {
throw new SerializationException("Unknown magic byte!");
}
return buffer;
}
}
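The deserializer above depends on the Confluent wire format: one magic byte (0), a 4-byte big-endian schema id, then the Avro binary payload. As a rough sketch of just that framing, the JDK-only class below splits a message into schema id and payload without touching Avro. The class name ConfluentWireFormat and its method names are hypothetical and not part of any Confluent library.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ConfluentWireFormat {
    static final byte MAGIC_BYTE = 0x0;
    static final int ID_SIZE = 4;

    // Reads the schema id that follows the magic byte.
    static int schemaId(byte[] message) {
        if (message.length < 1 + ID_SIZE) {
            throw new IllegalArgumentException("Message is shorter than the 5-byte header");
        }
        ByteBuffer buffer = ByteBuffer.wrap(message); // ByteBuffer is big-endian by default
        if (buffer.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Unknown magic byte");
        }
        return buffer.getInt();
    }

    // Returns a copy of the bytes after the header, i.e. the Avro-encoded record.
    static byte[] avroPayload(byte[] message) {
        schemaId(message); // validates the header first
        return Arrays.copyOfRange(message, 1 + ID_SIZE, message.length);
    }

    public static void main(String[] args) {
        byte[] message = {0, 0, 0, 0, 42, 1, 2, 3};
        System.out.println("schema id = " + schemaId(message));
        System.out.println("payload   = " + Arrays.toString(avroPayload(message)));
    }
}
```

The answer's deserializer ignores the schema id and decodes with the schema of the target class, which works as long as the writer and reader schemas are the same.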

I want to show max,min and avg temperature using hadoop

My project shows the max, min and avg temperature. That part is done, but I now have to compute these functions grouped by a key. My application has 4 radio buttons: year, month, date and city. When I select one, it asks me for the aggregate function (max, min, avg). For this I need to change my CompositeGroupKey class, but I have no idea how. Please help and suggest what changes the code needs.
The driver :
import org.apache.hadoop.io.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class MaxTemperature
{
public static void main (String[] args) throws Exception
{
if (args.length != 2)
{
System.err.println("Please Enter the input and output parameters");
System.exit(-1);
}
Job job = new Job();
job.setJarByClass(MaxTemperature.class);
job.setJobName("Max temperature");
FileInputFormat.addInputPath(job,new Path(args[0]));
FileOutputFormat.setOutputPath(job,new Path (args[1]));
job.setMapperClass(MaxTemperatureMapper.class);
job.setReducerClass(MaxTemperatureReducer.class);
job.setMapOutputKeyClass(CompositeGroupKey.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(CompositeGroupKey.class);
job.setOutputValueClass(CompositeGroupkeyall.class);
System.exit(job.waitForCompletion(true)?0:1);
}
}
The mapper :
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import java.io.IOException;
public class MaxTemperatureMapper extends Mapper <LongWritable, Text, CompositeGroupKey, IntWritable>
{
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
String line = value.toString();
int year = Integer.parseInt(line.substring(0,4));
String mnth = line.substring(7,10);
int date = Integer.parseInt(line.substring(10,12));
int temp= Integer.parseInt(line.substring(12,14));
CompositeGroupKey cntry = new CompositeGroupKey(year,mnth, date);
context.write(cntry, new IntWritable(temp));
}
}
The reducer :
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.*;
import java.io.IOException;
public class MaxTemperatureReducer extends Reducer <CompositeGroupKey, IntWritable, CompositeGroupKey, CompositeGroupkeyall >{
public void reduce(CompositeGroupKey key, Iterable<IntWritable> values , Context context) throws IOException,InterruptedException
{
// Double.MIN_VALUE is the smallest positive double, so start from the lowest value instead
Double max = -Double.MAX_VALUE;
Double min =Double.MAX_VALUE;
for (IntWritable value : values )
{
min = Math.min(min, value.get());
max = Math.max(max, value.get());
}
CompositeGroupkeyall val =new CompositeGroupkeyall(max,min);
context.write(key, val);
}
}
And the composite key :
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableUtils;
class CompositeGroupKey implements WritableComparable<CompositeGroupKey> {
int year;
String mnth;
int date;
CompositeGroupKey(int y, String c, int d){
year = y;
mnth = c;
date = d;
}
CompositeGroupKey(){}
public void write(DataOutput out) throws IOException {
out.writeInt(year);
WritableUtils.writeString(out, mnth);
out.writeInt(date);
}
public void readFields(DataInput in) throws IOException {
this.year = in.readInt();
this.mnth = WritableUtils.readString(in);
this.date = in.readInt();
}
public int compareTo(CompositeGroupKey pop) {
if (pop == null)
return 0;
int intcnt;
intcnt = Integer.valueOf(year).toString().compareTo(Integer.valueOf(pop.year).toString());
if(intcnt != 0){
return intcnt;
}else if(mnth.compareTo(pop.mnth) != 0){
return mnth.compareTo(pop.mnth);
}else{
return Integer.valueOf(date).toString().compareTo(Integer.valueOf(pop.date).toString());
}
}
public String toString() {
return year + " :" + mnth.toString() + " :" + date;
}
}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;
class CompositeGroupkeyall implements WritableComparable<CompositeGroupkeyall> {
Double max;
Double min;
CompositeGroupkeyall(double x, double y){
max = x ;
min = y ;
}
CompositeGroupkeyall(){}
public void readFields(DataInput in) throws IOException {
this.max = in.readDouble();
this.min = in.readDouble();
}
public void write(DataOutput out) throws IOException {
out.writeDouble(max);
out.writeDouble(min);
}
public int compareTo(CompositeGroupkeyall arg0) {
return -1;
}
public String toString() {
return max + " " + min +" " ;
}
}
You can emit additional key-value pairs from the mapper, as below, and let the same reducer process them; the per-date, per-month and per-year groupings are then all handled by the same reducer code:
CompositeGroupKey cntry = new CompositeGroupKey(year, mnth, date);
CompositeGroupKey cntry_date = new CompositeGroupKey((int)0, "ALL", date);
CompositeGroupKey cntry_mnth = new CompositeGroupKey((int)0, mnth, (int) 1);
CompositeGroupKey cntry_year = new CompositeGroupKey(year, "ALL", (int) 1);
context.write(cntry, new IntWritable(temp));
context.write(cntry_date, new IntWritable(temp));
context.write(cntry_mnth, new IntWritable(temp));
context.write(cntry_year, new IntWritable(temp));
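As an alternative to emitting every grouping at once, the mapper could build the key from only the field selected in the UI, and the reducer could compute all three aggregates in one pass. The sketch below shows that logic in plain Java without the Hadoop classes. The enum GroupBy, the class TemperatureGrouping and the key text format are assumptions made for illustration, and city is left out because the mapper in the question does not parse it.

```java
import java.util.Arrays;
import java.util.List;

final class TemperatureGrouping {

    // The grouping choices offered by the radio buttons (city omitted, see above).
    enum GroupBy { YEAR, MONTH, DATE }

    // Builds a grouping key that contains only the selected field.
    static String groupKey(GroupBy groupBy, int year, String month, int date) {
        switch (groupBy) {
            case YEAR:
                return "year:" + year;
            case MONTH:
                return "month:" + month;
            case DATE:
                return "date:" + date;
            default:
                throw new IllegalArgumentException("Unsupported grouping: " + groupBy);
        }
    }

    // Returns {max, min, avg} for the temperatures that share one key.
    static double[] aggregate(Iterable<Integer> temperatures) {
        double max = Double.NEGATIVE_INFINITY;
        double min = Double.POSITIVE_INFINITY;
        long sum = 0;
        int count = 0;
        for (int temperature : temperatures) {
            max = Math.max(max, temperature);
            min = Math.min(min, temperature);
            sum += temperature;
            count++;
        }
        double avg = count == 0 ? Double.NaN : (double) sum / count;
        return new double[] {max, min, avg};
    }

    public static void main(String[] args) {
        List<Integer> temperatures = Arrays.asList(10, -5, 25);
        System.out.println(groupKey(GroupBy.YEAR, 2014, "JAN", 5));
        System.out.println(Arrays.toString(aggregate(temperatures)));
    }
}
```

In the actual job the selected grouping could be passed to the mapper through the job Configuration, and the value class would need a third field for the average.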

Using Multiple Mappers for multiple output directories in Hadoop MapReduce

I want to run two mappers that write two different outputs to different directories. The output of the first mapper (passed as an argument) should be the input of the second mapper. I have this code in the driver class:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class Export_Column_Mapping
{
private static String[] Detail_output_column_array = new String[27];
private static String[] Shop_output_column_array = new String[8];
private static String details_output = null ;
private static String Shop_output = null;
public static void main(String[] args) throws Exception
{
String Output_filetype = args[3];
String Input_column_number = args[4];
String Output_column_number = args[5];
Configuration Detailsconf = new Configuration(false);
Detailsconf.setStrings("output_filetype",Output_filetype);
Detailsconf.setStrings("Input_column_number",Input_column_number);
Detailsconf.setStrings("Output_column_number",Output_column_number);
Job Details = new Job(Detailsconf," Export_Column_Mapping");
Details.setJarByClass(Export_Column_Mapping.class);
Details.setJobName("DetailsFile_Job");
Details.setMapperClass(DetailFile_Mapper.class);
Details.setNumReduceTasks(0);
Details.setInputFormatClass(TextInputFormat.class);
Details.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(Details, new Path(args[0]));
FileOutputFormat.setOutputPath(Details, new Path(args[1]));
if(Details.waitForCompletion(true))
{
Configuration Shopconf = new Configuration();
Job Shop = new Job(Shopconf,"Export_Column_Mapping");
Shop.setJarByClass(Export_Column_Mapping.class);
Shop.setJobName("ShopFile_Job");
Shop.setMapperClass(ShopFile_Mapper.class);
Shop.setNumReduceTasks(0);
Shop.setInputFormatClass(TextInputFormat.class);
Shop.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(Shop, new Path(args[1]));
FileOutputFormat.setOutputPath(Shop, new Path(args[2]));
MultipleOutputs.addNamedOutput(Shop, "text", TextOutputFormat.class,LongWritable.class, Text.class);
System.exit(Shop.waitForCompletion(true) ? 0 : 1);
}
}
public static class DetailFile_Mapper extends Mapper<LongWritable,Text,Text,Text>
{
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
String str_Output_filetype = context.getConfiguration().get("output_filetype");
String str_Input_column_number = context.getConfiguration().get("Input_column_number");
String[] input_columns_number = str_Input_column_number.split(",");
String str_Output_column_number= context.getConfiguration().get("Output_column_number");
String[] output_columns_number = str_Output_column_number.split(",");
String str_line = value.toString();
String[] input_column_array = str_line.split(",");
try
{
for(int i = 0;i<=input_column_array.length+1; i++)
{
int int_outputcolumn = Integer.parseInt(output_columns_number[i]);
int int_inputcolumn = Integer.parseInt(input_columns_number[i]);
if((int_inputcolumn != 0) && (int_outputcolumn != 0) && output_columns_number.length == input_columns_number.length)
{
Detail_output_column_array[int_outputcolumn-1] = input_column_array[int_inputcolumn-1];
if(details_output != null)
{
details_output = details_output+" "+ Detail_output_column_array[int_outputcolumn-1];
Shop_output = Shop_output+" "+ Shop_output_column_array[int_outputcolumn-1];
}else
{
details_output = Detail_output_column_array[int_outputcolumn-1];
Shop_output = Shop_output_column_array[int_outputcolumn-1];
}
}
}
}catch (Exception e)
{
}
context.write(null,new Text(details_output));
}
}
public static class ShopFile_Mapper extends Mapper<LongWritable,Text,Text,Text>
{
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
try
{
for(int i = 0;i<=Shop_output_column_array.length; i++)
{
Shop_output_column_array[0] = Detail_output_column_array[0];
Shop_output_column_array[1] = Detail_output_column_array[1];
Shop_output_column_array[2] = Detail_output_column_array[2];
Shop_output_column_array[3] = Detail_output_column_array[3];
Shop_output_column_array[4] = Detail_output_column_array[14];
if(details_output != null)
{
Shop_output = Shop_output+" "+ Shop_output_column_array[i];
}else
{
Shop_output = Shop_output_column_array[i-1];
}
}
}catch (Exception e){
}
context.write(null,new Text(Shop_output));
}
}
}
I get this error:
Error:org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
Input path does not exist:
file:/home/Barath.B.Natarajan.ap/rules/text.txt
I want to run the jobs one after the other. Can anyone help me with this?
You can achieve this with JobControl.
Suppose there are two jobs, A and B:
ControlledJob A= new ControlledJob(JobConf for A);
ControlledJob B= new ControlledJob(JobConf for B);
B.addDependingJob(A);
JobControl jControl = new JobControl("Name");
jControl.addJob(A);
jControl.addJob(B);
Thread runJControl = new Thread(jControl);
runJControl.start();
while (!jControl.allFinished()) {
code = jControl.getFailedJobList().size() == 0 ? 0 : 1;
Thread.sleep(1000);
}
System.exit(code);
Initialize code at the beginning like this:
int code =1;
Let the first job in your case be the first mapper with zero reducers, and the second job be the second mapper with zero reducers. Configure them so that the input path of B is the same as the output path of A.
