Apache Beam - Unable to infer a Coder on a DoFn with multiple output tags - java-8

I am trying to execute a pipeline using Apache Beam, but I get an error when trying to use output tags:
import com.google.cloud.Tuple;
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;
import org.joda.time.Duration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.lang.reflect.Type;
import java.util.Map;
import java.util.stream.Collectors;
/**
* The Transformer.
*/
class Transformer {
final static TupleTag<Map<String, String>> successfulTransformation = new TupleTag<>();
final static TupleTag<Tuple<String, String>> failedTransformation = new TupleTag<>();
/**
* The entry point of the application.
*
* @param args the input arguments
*/
public static void main(String... args) {
TransformerOptions options = PipelineOptionsFactory.fromArgs(args)
.withValidation()
.as(TransformerOptions.class);
Pipeline p = Pipeline.create(options);
p.apply("Input", PubsubIO
.readMessagesWithAttributes()
.withIdAttribute("id")
.fromTopic(options.getTopicName()))
.apply(Window.<PubsubMessage>into(FixedWindows
.of(Duration.standardSeconds(60))))
.apply("Transform",
ParDo.of(new JsonTransformer())
.withOutputTags(successfulTransformation,
TupleTagList.of(failedTransformation)));
p.run().waitUntilFinish();
}
/**
* Deserialize the input and convert it to a key-value pairs map.
*/
static class JsonTransformer extends DoFn<PubsubMessage, Map<String, String>> {
private static final Logger LOG = LoggerFactory.getLogger(JsonTransformer.class);
/**
* Process each element.
*
* @param c the processing context
*/
@ProcessElement
public void processElement(ProcessContext c) {
String messagePayload = new String(c.element().getPayload());
try {
Type type = new TypeToken<Map<String, String>>() {
}.getType();
Gson gson = new Gson();
Map<String, String> map = gson.fromJson(messagePayload, type);
c.output(map);
} catch (Exception e) {
LOG.error("Failed to process input {} -- adding to dead letter file", c.element(), e);
String attributes = c.element()
.getAttributeMap()
.entrySet().stream().map((entry) ->
String.format("%s -> %s\n", entry.getKey(), entry.getValue()))
.collect(Collectors.joining());
c.output(failedTransformation, Tuple.of(attributes, messagePayload));
}
}
}
}
The error shown is:
Exception in thread "main" java.lang.IllegalStateException: Unable to
return a default Coder for Transform.out1 [PCollection]. Correct one
of the following root causes: No Coder has been manually specified;
you may do so using .setCoder(). Inferring a Coder from the
CoderRegistry failed: Unable to provide a Coder for V. Building a
Coder using a registered CoderProvider failed. See suppressed
exceptions for detailed failures. Using the default output Coder from
the producing PTransform failed: Unable to provide a Coder for V.
Building a Coder using a registered CoderProvider failed.
I tried different ways to fix the issue, but I think I just do not understand what the problem is. I know that this line causes the error:
.withOutputTags(successfulTransformation,TupleTagList.of(failedTransformation))
but I do not understand which part of it needs a specific Coder, or what "V" refers to in the error (from "Unable to provide a Coder for V").
Why is the error happening? I also tried to look at Apache Beam's docs, but they do not seem to explain this usage, and I did not understand much from the section discussing coders.
Thanks

First, I would suggest the following -- change:
final static TupleTag<Map<String, String>> successfulTransformation =
new TupleTag<>();
final static TupleTag<Tuple<String, String>> failedTransformation =
new TupleTag<>();
into this:
final static TupleTag<Map<String, String>> successfulTransformation =
new TupleTag<Map<String, String>>() {};
final static TupleTag<Tuple<String, String>> failedTransformation =
new TupleTag<Tuple<String, String>>() {};
That should help the coder inference determine the type of the side output. Also, have you properly registered a CoderProvider for Tuple?
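For illustration, a coder can also be set explicitly on the tagged outputs. Below is a minimal sketch, assuming windowedMessages is the windowed PCollection<PubsubMessage> from the question; the coder choices are suggestions, not part of the original answer:
import org.apache.beam.sdk.coders.MapCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.values.PCollectionTuple;

// A ParDo with multiple output tags yields a PCollectionTuple; coders can be set per tag.
PCollectionTuple results = windowedMessages.apply("Transform",
        ParDo.of(new JsonTransformer())
             .withOutputTags(successfulTransformation, TupleTagList.of(failedTransformation)));

// Main output: compose a coder from Beam's built-in coders.
results.get(successfulTransformation)
       .setCoder(MapCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()));

// Failure output: Beam has no built-in coder for com.google.cloud.Tuple, so either register
// a CoderProvider for it, or declare the tag as TupleTag<KV<String, String>> so that
// KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()) can be set the same way.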

Thanks to @Ben Chambers' answer, the Kotlin equivalent is:
val successTag = object : TupleTag<MyObj>() {}
val deadLetterTag = object : TupleTag<String>() {}

Related

How to Understand if a Batch ended in a Batch To Record Adapter

I am developing a Spring Boot application that reads messages from a topic. Messages are managed in a transaction, read as strings in batch mode, and then deserialized to objects. This deserialization may fail, but I don't want to discard the whole batch; rather, I would like to move the failed messages to a DLQ.
As I am using spring-kafka 2.6.5, I found out that I can use BatchToRecordAdapter to achieve this. However, I did not find out how to know when I am reading the last message of a batch.
I would like to read one message at a time, deserialize it, and store it in an ArrayList; when the listener reads the last message of the batch, I want to do some processing and finally commit the transaction.
Thanks,
Giuseppe.
UPDATE
In order to achieve this, I overrode BatchToRecordAdapter and added headers that let me know the position of every element within its batch.
package com.doxee.commons.lifecycle.kafka;
import java.util.List;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.listener.ConsumerRecordRecoverer;
import org.springframework.kafka.listener.adapter.BatchToRecordAdapter;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.messaging.Message;
import org.springframework.messaging.support.MessageBuilder;
import org.springframework.util.Assert;
/*
* Insert a description here.
*
* Bugs: none known
*
* @author gmiano gmiano@doxee.com
* @createDate 25/01/21
*
* Copyright (C) 2021 Doxee S.p.A. C.F. - P.IVA: IT02714390362. All Rights Reserved
*/
@Slf4j
public class BatchToEnrichedRecordAdapter<K, V> implements BatchToRecordAdapter<K, V> {
private final ConsumerRecordRecoverer recoverer;
public BatchToEnrichedRecordAdapter(ConsumerRecordRecoverer recoverer) {
Assert.notNull(recoverer, "'recoverer' cannot be null");
this.recoverer = recoverer;
}
@Override
public void adapt(List<Message<?>> messages, List<ConsumerRecord<K, V>> records,
Acknowledgment ack, Consumer<?, ?> consumer, Callback<K, V> callback) {
for (int i = 0; i < messages.size(); ++i) {
Message<?> enrichedMessage = MessageBuilder.fromMessage(messages.get(i))
.setHeader(MyHeaders.BATCH_SIZE, messages.size())
.setHeader(MyHeaders.MESSAGE_BATCH_POSITION, i + 1)
.build();
try {
callback.invoke(records.get(i), ack, consumer, enrichedMessage);
} catch (Exception ex) {
this.recoverer.accept(records.get(i), ex);
}
}
}
}
with this bean as recoverer
@Bean
ConsumerRecordRecoverer recoverer(KafkaOperations<?, ?> template) {
return new DeadLetterPublishingRecoverer(template, (record, ex) -> {
String srcTopic = record.topic();
String srcKey = record.key().toString();
log.error("Failed consume of message {} from topic {}", srcKey, srcTopic, ex);
String dstTopic;
if (ex.getCause() instanceof ClientResumableException) {
dstTopic = srcTopic.concat(".RECOVERABLE");
} else {
dstTopic = srcTopic.concat(".DLT");
}
log.error("Cannot retry. Try to write message to topic: {}", dstTopic);
return new TopicPartition(dstTopic, 0);
});
}
Is this the proper solution?
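As a side note, a listener could then use the headers added by the adapter above to detect the last record of each batch. A minimal sketch follows; the topic name, the String payload type, and the two helper methods are placeholders, while MyHeaders refers to the constants used in the adapter:
import java.util.ArrayList;
import java.util.List;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.messaging.handler.annotation.Header;

public class EnrichedRecordListener {

    private final List<Object> buffer = new ArrayList<>();

    @KafkaListener(topics = "my-topic")
    public void listen(String payload,
                       @Header(MyHeaders.BATCH_SIZE) int batchSize,
                       @Header(MyHeaders.MESSAGE_BATCH_POSITION) int position) {
        buffer.add(deserialize(payload));
        if (position == batchSize) {
            // Last record of the original batch: run the batch-level processing;
            // the transaction commits when the listener returns normally.
            process(buffer);
            buffer.clear();
        }
    }

    private Object deserialize(String payload) { return payload; } // placeholder
    private void process(List<Object> items) { }                   // placeholder
}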

What's the best practice for Kafka Streaming as ETL replacement?

I am new to Kafka and currently looking at Kafka Streams, especially joining two streams.
The samples I browsed worked with rather simple messages / text messages.
So I constructed another simple sample that is closer to traditional ETL.
Let's say we have two "datasets": Contract (= Vertrag) and Cashflow, with a cardinality of 1 to n.
In my sample I created a topic for each and sent objects (Vertrag, Cashflow) to them.
And I managed a first join of them:
KStream<String, String> joined = srcVertrag.leftJoin(srcCashflow,
(leftValue, rightValue) -> "left=" + leftValue + ", right=" + rightValue, /* ValueJoiner */
JoinWindows.of(5000),
Joined.with(
Serdes.String(), /* key */
Serdes.String(), /* left value */
Serdes.String()) /* right value */
);
The result looks like this:
left={"name":"Vertrag123","vertragId":"123"}, right={"buchungstag":1560715764709,"betrag":12.0,"vertragId":"123"}
Now my questions:
is this the right way to do this?
should I create Objects at all or rather process just Strings?
After your hints and further research, I came up with the following test.
- I created Pojos for "Vertrag" and "Cashflow"
- I created Serdes for each
- I stream them as objects
- Finally I try to join them into a wrapper class (and this is where I am stuck).
I can't find samples that do something like this. Is this so exotic?
package tki.bigdata.kafkaetl;
import java.time.Duration;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.Serializer;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.Joined;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Printed;
import org.apache.kafka.streams.kstream.ValueJoiner;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.EnableScheduling;
import tki.bigdata.domain.Cashflow;
import tki.bigdata.domain.Vertrag;
import tki.bigdata.serde.JsonPOJODeserializer;
import tki.bigdata.serde.JsonPOJOSerializer;
@ComponentScan(basePackages = { "tki.bigdata.domain", "tki.bigdata.config", "tki.bigdata.app" }, basePackageClasses = App.class)
@SpringBootApplication
@EnableScheduling
public class App implements CommandLineRunner {
private static String bootstrapServers = "tobi0179.westeurope.cloudapp.azure.com:9092";
@Autowired
private KafkaTemplate<String, Object> template;
// @Autowired
// ExcelReader excelReader;
public static void main(String[] args) {
SpringApplication.run(App.class, args).close();
}
private void populateSampleData() {
Vertrag v = new Vertrag();
v.setVertragId("123");
v.setName("Vertrag123");
template.send("Vertrag", "123", v);
//template.send("Vertrag", "124", "124;Vertrag12");
Cashflow c = new Cashflow();
c.setVertragId("123");
c.setBetrag(12);
c.setBuchungstag(new Date());
template.send("Cashflow", "123", c);
}
//@Override
public void run(String... args) throws Exception {
// Topics mit Demodata befüllen
populateSampleData();
Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-pipe");
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
// TODO: the following can be removed with a serialization factory
Map<String, Object> serdeProps = new HashMap<>();
// prepare Serde for Vertrag
final Serializer<Vertrag> vertragSerializer = new JsonPOJOSerializer<Vertrag>();
serdeProps.put("JsonPOJOClass", Vertrag.class);
vertragSerializer.configure(serdeProps, false);
final Deserializer<Vertrag> vertragDeserializer = new JsonPOJODeserializer<Vertrag>();
serdeProps.put("JsonPOJOClass", Vertrag.class);
vertragDeserializer.configure(serdeProps, false);
final Serde<Vertrag> vertragSerde = Serdes.serdeFrom(vertragSerializer, vertragDeserializer);
// prepare Serde for Cashflow
final Serializer<Cashflow> cashflowSerializer = new JsonPOJOSerializer<Cashflow>();
serdeProps.put("JsonPOJOClass", Cashflow.class);
cashflowSerializer.configure(serdeProps, false);
final Deserializer<Cashflow> cashflowDeserializer = new JsonPOJODeserializer<Cashflow>();
serdeProps.put("JsonPOJOClass", Cashflow.class);
cashflowDeserializer.configure(serdeProps, false);
final Serde<Cashflow> cashflowSerde = Serdes.serdeFrom(cashflowSerializer, cashflowDeserializer);
// streamsConfiguration.put(StreamsConfig.STATE_DIR_CONFIG,
// TestUtils.tempDir().getAbsolutePath());
StreamsBuilder builder = new StreamsBuilder();
KStream<String, Vertrag> srcVertrag = builder.stream("Vertrag");
KStream<String, Cashflow> srcCashflow = builder.stream("Cashflow");
// print to sysout
//srcVertrag.print(Printed.toSysOut());
KStream<String, MyValueContainer> joined = srcVertrag.leftJoin(srcCashflow,
(leftValue, rightValue) -> new MyValueContainer(leftValue , rightValue), /* ValueJoiner */
JoinWindows.of(600),
Joined.with(
Serdes.String(), /* key */
vertragSerde, /* left value */
cashflowSerde) /* right value */
);
joined.to("Output");
final Topology topology = builder.build();
System.out.println(topology.describe());
final KafkaStreams streams = new KafkaStreams(topology, streamsConfiguration);
final CountDownLatch latch = new CountDownLatch(1);
// attach shutdown handler to catch control-c
Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
@Override
public void run() {
streams.close();
latch.countDown();
}
});
try {
streams.start();
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);
}
}
When executed, it produces the error:
2019-06-17 22:18:31.892 ERROR 1599 --- [-StreamThread-1] o.a.k.s.p.i.AssignedStreamsTasks : stream-thread [streams-pipe-0638d359-94df-43bd-9ef7-eb6769ed8a1c-StreamThread-1] Failed to process stream task 0_0 due to the following error:
java.lang.ClassCastException: java.lang.String cannot be cast to tki.bigdata.domain.Vertrag
at org.apache.kafka.streams.kstream.internals.KStreamKStreamJoin$KStreamKStreamJoinProcessor.process(KStreamKStreamJoin.java:98) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:50) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorNode.runAndMeasureLatency(ProcessorNode.java:244) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:133) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.kstream.internals.KStreamJoinWindow$KStreamJoinWindowProcessor.process(KStreamJoinWindow.java:63) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:50) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorNode.runAndMeasureLatency(ProcessorNode.java:244) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:133) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:129) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:87) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:302) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:94) ~[kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:409) [kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.processAndMaybeCommit(StreamThread.java:964) [kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:832) [kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767) [kafka-streams-2.0.1.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736) [kafka-streams-2.0.1.jar:na]
is this the right way to do this?
Yes.
should I create Objects at all or rather process just Strings?
Yes. Look at Avro as a good example of a data format for serializing/deserializing your POJOs. Here, you are looking for an Avro "serde" (serializer/deserializer). Confluent provides such an Avro serde for Kafka Streams, for instance (this serde requires the use of Confluent Schema Registry).
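For illustration, configuring such an Avro serde could look roughly like this; it is a sketch assuming Confluent's kafka-streams-avro-serde artifact, a Schema Registry at the given URL, and an Avro-generated Vertrag class:
import java.util.Collections;
import java.util.Map;
import io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde;

// Point the serde at the Schema Registry; isKey = false because it serializes record values.
Map<String, String> serdeConfig =
        Collections.singletonMap("schema.registry.url", "http://localhost:8081");
SpecificAvroSerde<Vertrag> vertragAvroSerde = new SpecificAvroSerde<>();
vertragAvroSerde.configure(serdeConfig, false);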
what should I do with the above result?
It's unclear to me what your question is.
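As a side note (not part of the original answer), the ClassCastException in the stack trace above usually means the topics are still being read with the default String serde configured in the properties. A minimal sketch of wiring the custom serdes into the sources, assuming the vertragSerde and cashflowSerde built in the question:
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;

// Read the topics with explicit serdes so the join receives typed objects
// instead of the Strings produced by the default value serde.
KStream<String, Vertrag> srcVertrag =
        builder.stream("Vertrag", Consumed.with(Serdes.String(), vertragSerde));
KStream<String, Cashflow> srcCashflow =
        builder.stream("Cashflow", Consumed.with(Serdes.String(), cashflowSerde));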

How to register my custom MessageBodyReader in my CLIENT?

Maybe somebody can help me find out how to solve this.
I am using jersey-apache-client 1.17
I tried to use the Jersey client to build a standalone application (no Servlet container or anything, just the Java classes) which communicates with a RESTful API, and everything worked fine until I tried to handle the media type "text/csv; charset=utf-8", which is a CSV stream sent by the server.
The thing is that I can read this stream with the following code:
InputStreamReader reader = new InputStreamReader(itemExportBuilder
.get(ClientResponse.class).getEntityInputStream());
Csv csv = new Csv();
Input input = csv.createInput(reader);
try {
String[] readLine;
while ((readLine = input.readLine()) != null) {
LOG.debug("Reading CSV: {}", readLine);
}
} catch (IOException e) {
e.printStackTrace();
}
try {
input.close();
} catch (IOException e) {
e.printStackTrace();
}
But I'd like to encapsulate this and put it into a MessageBodyReader. After writing this code, though, I just can't make the client use the following class:
package client.response;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.lang.annotation.Annotation;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;
import javax.ws.rs.WebApplicationException;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.MultivaluedMap;
import javax.ws.rs.ext.MessageBodyReader;
import javax.ws.rs.ext.Provider;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@Provider
public class ItemExportMessageBodyReader implements MessageBodyReader<ItemExportResponse> {
private static final Logger LOG = LoggerFactory.getLogger(ItemExportMessageBodyReader.class);
private static final Integer SKU = 0;
private static final Integer BASE_SKU = 1;
public boolean isReadable(Class<?> paramClass, Type type, Annotation[] annotations,
MediaType mediaType) {
LOG.info("Cheking if content is readable or not");
return paramClass == ItemExportResponse.class && !mediaType.isWildcardType()
&& !mediaType.isWildcardSubtype()
&& mediaType.isCompatible(MediaType.valueOf("text/csv; charset=utf-8"));
}
public ItemExportResponse readFrom(Class<ItemExportResponse> paramClass, Type paramType,
Annotation[] paramArrayOfAnnotation, MediaType paramMediaType,
MultivaluedMap<String, String> paramMultivaluedMap, InputStream entityStream)
throws IOException, WebApplicationException {
InputStreamReader reader = new InputStreamReader(entityStream);
Csv csv = new Csv();
Input input = csv.createInput(reader);
List<Item> items = new ArrayList<Item>();
try {
String[] readLine;
while ((readLine = input.readLine()) != null) {
LOG.trace("Reading CSV: {}", readLine);
Item item = new Item();
item.setBaseSku(readLine[BASE_SKU]);
items.add(item);
}
} catch (IOException e) {
LOG.warn("Item export HTTP response handling failed", e);
} finally {
try {
input.close();
} catch (IOException e) {
LOG.warn("Could not close the HTTP response stream", e);
}
}
ItemExportResponse response = new ItemExportResponse();
response.setItems(items);
return response;
}
}
The following documentation says that the preferred way of making this work in a JAX-RS client is to register the message body reader with the code below:
Using Entity Providers with JAX-RS Client API
Client client = ClientBuilder.newBuilder().register(MyBeanMessageBodyReader.class).build();
Response response = client.target("http://example/comm/resource").request(MediaType.APPLICATION_XML).get();
System.out.println(response.getStatus());
MyBean myBean = response.readEntity(MyBean.class);
System.out.println(myBean);
Now the thing is that I can't use the ClientBuilder. I have to extend from a specific class which constructs the client another way, and I have no access to change the construction.
So when I receive the response from the server, the client fails with the following Exception:
com.sun.jersey.api.client.ClientHandlerException: A message body reader for Java class client.response.ItemExportResponse, and Java type class client.response.ItemExportResponse, and MIME media type text/csv; charset=utf-8 was not found
Any other way to register my MessageBodyReader?
OK. In case anybody bumps into my question: I solved this mystery by upgrading from Jersey 1.17 to version 2.9. The documentation I linked above covers this version, not the old one, which is where the confusion stems from.
Jersey introduced backward INCOMPATIBLE changes starting from version 2, so I have no clue how to configure it in version 1.17.
In version 2 the proposed solution worked fine.
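For reference, a minimal JAX-RS 2.x sketch of that registration; the target URL is a placeholder and ItemExportResponse/ItemExportMessageBodyReader are the classes from the question:
import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;

// Register the custom reader on the client, then let get() use it for text/csv responses.
Client client = ClientBuilder.newClient().register(ItemExportMessageBodyReader.class);
ItemExportResponse export = client.target("http://example.com/api/items/export")
        .request("text/csv")
        .get(ItemExportResponse.class);
client.close();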

Hadoop Not Finding Map Class

I am using hadoop-1.2.1 and trying to run a simple RowCount HBase job using ToolRunner. However, no matter what I try, Hadoop cannot find the map class. The jar file is being copied correctly into HDFS, but I can't seem to figure out where it is going wrong. Please help!
Here is the code:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class HBaseRowCountToolRunnerTest extends Configured implements Tool
{
// What to copy.
public static final String JAR_NAME = "myJar.jar";
public static final String LOCAL_JAR = <path_to_jar> + JAR_NAME;
public static final String REMOTE_JAR = "/tmp/"+JAR_NAME;
public static void main(String[] args) throws Exception
{
Configuration config = HBaseConfiguration.create();
//All connection configs set here -- omitted to post the code
config.set("tmpjars", REMOTE_JAR);
FileSystem dfs = FileSystem.get(config);
System.out.println("pathString = " + (new Path(LOCAL_JAR)).toString() + " \n");
// Copy jar file to remote.
dfs.copyFromLocalFile(new Path(LOCAL_JAR), new Path(REMOTE_JAR));
// Get rid of jar file when we're done.
dfs.deleteOnExit(new Path(REMOTE_JAR));
// Run the job.
System.exit(ToolRunner.run(config, new HBaseRowCountToolRunnerTest(), args));
}
@Override
public int run(String[] args) throws Exception
{
Job job = new RowCountJob(getConf(), "testJob", "myLittleHBaseTable");
return job.waitForCompletion(true) ? 0 : 1;
}
public static class RowCountJob extends Job
{
RowCountJob(Configuration conf, String jobName, String tableName) throws IOException
{
super(conf, RowCountJob.class.getCanonicalName() + "_" + jobName);
setJarByClass(getClass());
Scan scan = new Scan();
scan.setCacheBlocks(false);
scan.setFilter(new FirstKeyOnlyFilter());
setOutputFormatClass(NullOutputFormat.class);
TableMapReduceUtil.initTableMapperJob(tableName, scan,
RowCounterMapper.class, ImmutableBytesWritable.class, Result.class, this);
setNumReduceTasks(0);
}
}//end public static class RowCountJob extends Job
//Mapper that runs the count
//TableMapper -- TableMapper<KEYOUT, VALUEOUT> (*OUT by type)
public static class RowCounterMapper extends TableMapper<ImmutableBytesWritable, Result>
{
//Counter enumeration to count the actual rows
public static enum Counters {ROWS}
/**
* Maps the data.
*
* @param row The current table row key.
* @param values The columns.
* @param context The current context.
* @throws IOException When something is broken with the data.
* @see org.apache.hadoop.mapreduce.Mapper#map(KEYIN, VALUEIN,
* org.apache.hadoop.mapreduce.Mapper.Context)
*/
@Override
public void map(ImmutableBytesWritable row, Result values, Context context) throws IOException
{
// Count every row containing data times 2, whether it's in qualifiers or values
context.getCounter(Counters.ROWS).increment(2);
}
}//end public static class RowCounterMapper extends TableMapper<ImmutableBytesWritable, Result>
}//end public class HBaseRowCountToolRunnerTest
OK - I found a workaround to the problem and thought that I would share it for all others having similar issues...
As it turns out, I abandoned the tmpjars configuration option and just copied the jar file directly into the DistributedCache from the code itself. Here is what it looks like:
// Copy jar file to remote.
FileSystem dfs = FileSystem.get(conf);
dfs.copyFromLocalFile(new Path(LOCAL_JAR), new Path(REMOTE_JAR));
// Get rid of jar file when we're done.
dfs.deleteOnExit(new Path(REMOTE_JAR));
//Place it in the distributed cache
DistributedCache.addFileToClassPath(new Path(REMOTE_JAR), conf, dfs);
Perhaps it doesn't solve what is going on with tmpjars, but it does work.
I got the same problem today. Finally, I found it was because I forgot to insert the following statement in the driver class:
job.setJarByClass(HBaseTestDriver.class);
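In the code from the question, that call would go in the driver's run() method, roughly like this (a sketch adapted to the class names above, not a verified fix for the tmpjars issue):
@Override
public int run(String[] args) throws Exception {
    Job job = new RowCountJob(getConf(), "testJob", "myLittleHBaseTable");
    // Ship the jar that contains the mapper class along with the job submission.
    job.setJarByClass(HBaseRowCountToolRunnerTest.class);
    return job.waitForCompletion(true) ? 0 : 1;
}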

Calling through Spring a procedure with collection of object (oracle ARRAY STRUCT)

I'm trying to execute a procedure which has, among other parameters, a parameter that is a collection of Oracle objects. I have handled these lots of times without Spring, but I'm a bit lost trying to do it with Spring; although there is some information on the internet, I can't find a full example to compare my code against, and the Spring docs have just fragments. My code is probably wrong, but I don't know why; could you help me? I'm running simpler procedures without problems. My DAO looks like this:
//[EDITED]
private SimpleJdbcCall pActualizaDia;
....
@Autowired
public void setDataSource(DataSource dataSource) {
pActualizaDia = new SimpleJdbcCall(dataSource).withCatalogName("PTR_GRUPOS_TRABAJO").withProcedureName("UPDATE_DIA");
pActualizaDia.getJdbcTemplate().setNativeJdbcExtractor(new OracleJdbc4NativeJdbcExtractor());
}
...
public Calendario updateSingle(final Calendario calendario) {
SqlTypeValue cambiosEmpresa = new AbstractSqlTypeValue() {
protected Object createTypeValue(Connection conn, int sqlType, String typeName) throws SQLException {
ArrayDescriptor arrayDescriptor = new ArrayDescriptor("TTPTR_CAMBIO_EMPRESA", conn);
Object[] collection = new Object[calendario.getCambiosEmpresa().size()];
int i = 0;
for (CeAnoEmp ce : calendario.getCambiosEmpresa()) {
collection[i++] = new STRUCT(new StructDescriptor("TPTR_CAMBIO_EMPRESA", conn), conn, new Object[] {
ce.getSQLParam1(),
//...more parameters here in order to fit your type.
ce.getSQLparamn() });
}
ARRAY idArray = new ARRAY(arrayDescriptor, conn, collection);
return idArray;
}
};
MapSqlParameterSource mapIn = new MapSqlParameterSource();
mapIn.addValue("P_ID_ESCALA", calendario.getEscala().getIdEscala());
//more simple params here
//Here it is the Oracle ARRAY working properly
pActualizaDia.declareParameters(new SqlParameter("P_CAMBIOS_EMPRESA",
OracleTypes.STRUCT, "TTPR_CAMBIO_EMPRESA"));
mapIn.addValue("P_CAMBIOS_EMPRESA",cambiosEmpresa);
//When executing the procedure it just works :)
pActualizaDia.execute(mapIn);
return null;
}
The exception I get says:
java.lang.ClassCastException: $Proxy91 cannot be cast to oracle.jdbc.OracleConnection
I've been reading more about this topic, and it seems that when using Oracle ARRAYs you also have to cast the connection to an Oracle connection.
However, most Spring JDBC framework classes like SimpleJdbcTemplate and StoredProcedure hide the connection access from you. Do I need to subclass one of those and override a method somewhere to get the DBCP connection and then cast it to an Oracle connection?
Thank you very much.
I've finally solved it, and I've edited the post so that there is a full example for anyone looking for a piece of code to solve this issue.
There are two important things to keep in mind:
1) It's mandatory to set the Oracle extractor on the JdbcTemplate so that the connection can be cast properly to get the Oracle functionality.
2) When using this extractor, the ojdbc and JRE versions must match; in any other case you'll get an abstract method invocation exception.
Thanks to anyone who tried to solve it, and I hope this helps.
You can use Spring to call a procedure with an array (collection) of Oracle STRUCTs; below is a simple example:
import java.sql.Types;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.sql.DataSource;
import oracle.jdbc.driver.OracleConnection;
import oracle.jdbc.driver.OracleTypes;
import oracle.sql.ARRAY;
import oracle.sql.ArrayDescriptor;
import oracle.sql.STRUCT;
import oracle.sql.StructDescriptor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.jdbc.core.SqlOutParameter;
import org.springframework.jdbc.core.SqlParameter;
import org.springframework.jdbc.object.StoredProcedure;
public class SpringObjectMapper {
public static class SaveObjectFunction extends StoredProcedure {
final static Logger logger = LoggerFactory.getLogger(SaveObjectFunction.class);
private static final String PROC_NAME = "schema.proc_name";
private final static String ARRAY_OF_VALUE_PARAM_NAME = "ARRAY_OF_VALUE";
private final static String OUT_PARAM_NAME = "out";
public SaveObjectFunction(DataSource dataSource) {
super(dataSource, PROC_NAME);
declareParameter(new SqlParameter(ARRAY_OF_VALUE_PARAM_NAME, OracleTypes.ARRAY, "schema.array_object_type"));
// declare the OUT parameter read below; without this, out.get(OUT_PARAM_NAME) returns null
declareParameter(new SqlOutParameter(OUT_PARAM_NAME, Types.VARCHAR));
compile();
}
public String execute(Collection<Model> values) {
logger.info("------------------------EnregInterlocuteurPrcedure::execute : begin----------------------------");
String message = null;
try {
OracleConnection connection = getJdbcTemplate().getDataSource().getConnection().unwrap(OracleConnection.class);
ArrayDescriptor arrayValueDescriptor = new ArrayDescriptor("schema.array_object_type", connection);
StructDescriptor typeObjeDescriptor = new StructDescriptor("schema.object_type", connection);
Object[] valueStructArray = new Object[values.size()];
int i = 0;
for (Iterator<Model> iterator = values.iterator(); iterator.hasNext();) {
Model model = (Model) iterator.next();
STRUCT s = new STRUCT(typeObjeDescriptor, connection, new Object[] {model.getAttribute1(), model.getAttribute2(), model.getAttribute3(),
model.getAttribute4(), model.getAttribute5(), model.getAttribute6(), model.getAttribute7()});
valueStructArray[i++] = s;
}
ARRAY inZoneStructArray = new ARRAY(arrayValueDescriptor, connection, valueStructArray);
Map<String, Object> inputs = new HashMap<String, Object>();
inputs.put(ARRAY_OF_VALUE_PARAM_NAME, inZoneStructArray);
Map<String, Object> out = super.execute(inputs);
message = (String) out.get(OUT_PARAM_NAME);
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return message;
}
}
}
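For completeness, a hypothetical usage of the StoredProcedure subclass above (the DataSource wiring and the Model collection are assumptions):
// Build once with a configured DataSource, then call it with the collection of models.
SaveObjectFunction saveProc = new SaveObjectFunction(dataSource);
String outMessage = saveProc.execute(models); // models is a Collection<Model>, as in execute()
System.out.println(outMessage);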
