NullPointerException in aggregate operation in Kafka Streams - apache-kafka-streams

Below is the code snippet:
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-live-test");
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "brokerIP:port");
streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
streamsConfiguration.put(StreamsConfig.TOPOLOGY_OPTIMIZATION, StreamsConfig.OPTIMIZE);
StreamsBuilder builder = new StreamsBuilder();
KStream streamData = builder.stream(inputTopicName);
streamData.groupByKey(Grouped.with(jsonSerde, jsonSerde))
    .aggregate( /* some transformation */ );
KafkaStreams kafkaStreams = new KafkaStreams(
    builder.build(streamsConfiguration),
    streamsConfiguration
);
Here we are not using a session window, and this snippet gives me the correct result. But when I introduce a session window into this stream, the aggregate function throws a NullPointerException.
Can anybody help here?
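For reference, a minimal sketch of how the session-windowed variant is typically written; this uses hypothetical String keys/values and a placeholder aggregation rather than the actual transformation. Note that the session-windowed aggregate() takes an extra session Merger argument and usually needs explicit serdes via Materialized; a null merger or missing serdes is a common source of a NullPointerException here.
// Sketch only: assumes String keys/values and a trivial concatenation aggregate
streamData
    .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
    .windowedBy(SessionWindows.with(Duration.ofMinutes(5)))
    .aggregate(
        () -> "",                                        // initializer
        (key, value, agg) -> agg + value,                // aggregator
        (key, leftAgg, rightAgg) -> leftAgg + rightAgg,  // session merger (required)
        Materialized.with(Serdes.String(), Serdes.String())
    );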

Related

java.lang.IllegalStateException: block()/blockFirst()/blockLast() are blocking, which is not supported in thread reactor-http-kqueue-4

I'm getting an error from a blocking operation in Spring WebFlux. I retrieve a Mono of a list of Address documents, and I'm using this Mono to build the street address (withStreet) as shown below:
Mono<List<Address>> docs = getAddress(id, name);
AddressResponse addrResponse = new AddressResponse.Builder()
    .withStreet(docs.map(doc -> doc.stream()
        .map(StreetAddress::map)
        .collect(Collectors.toList())).block())
    .build();
The map method:
public static StreetAddress map(Address addr) {
    return new Builder()
        .withId(addr.getId())
        .withStreet(addr.getStreetAddress())
        .build();
}
When I execute the above code, it throws "block()/blockFirst()/blockLast() are blocking, which is not supported in thread reactor-http-nio-2". Could you please suggest how to fix this? I want to retrieve the AddressResponse without blocking. The response will be used further in the code in a ResponseEntity, as shown below:
return Mono.just(ResponseEntity
    .status(HttpStatus.OK)
    .body(addrResponse));
The problem is that you are trying to mix reactive and imperative code.
Instead, just map it in the reactive pipeline:
Mono<AddressResponse> response = docs.map(addresses -> {
    return new AddressResponse.Builder()
        .withStreet(addresses.stream()
            .map(StreetAddress::map)
            .collect(Collectors.toList()))
        .build();
});
Then you can return it as is, or map it into a Mono<ResponseEntity<AddressResponse>> using the same approach as above.
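For example, a minimal sketch of wrapping the result in a ResponseEntity while staying reactive (just one more map on the same pipeline):
// Map the reactive result into a ResponseEntity without calling block()
Mono<ResponseEntity<AddressResponse>> result = response
    .map(addrResponse -> ResponseEntity
        .status(HttpStatus.OK)
        .body(addrResponse));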

Does the stream need to be closed in this use case?

I am reading a CSV file uploaded from the client browser, and the code is below:
@RequestMapping(value = "/file-upload", method = {RequestMethod.PUT, RequestMethod.POST}, consumes =
        MediaType.APPLICATION_JSON_UTF8_VALUE, produces = MediaType.APPLICATION_JSON_UTF8_VALUE)
@ResponseStatus(HttpStatus.OK)
public Set<SessionBulkUploadLine> fileUpload(@RequestBody final FileUploadInfo fileUploadInfo) {
    final CsvFileParser parser = new CsvFileParser();
    try (final ByteArrayInputStream stream = new ByteArrayInputStream(settings.getBytes())) {
        final Spreadsheet sheet = parser.parse(stream, true);
    }
Do I need to close the stream in the above?
Please advise.
Thanks
1) No, you don't need to close it explicitly by calling .close(), because it will be closed automatically when the try block finishes executing (you're using try-with-resources).
2) If you're not using try-with-resources, then any stream should be closed once you're done with it so that the underlying resources are released (a stream often holds on to a lot of resources).
ByteArrayInputStream stream = null;
try {
    stream = new ByteArrayInputStream(settings.getBytes());
    // logic here
} catch (Exception e) {
    // print error
} finally {
    if (stream != null) {
        try {
            stream.close();
        } catch (IOException e) {
            // ignore or log the failure to close
        }
    }
}
Please note that the stream is declared before the try block but initialized INSIDE it, so it is still in scope when it is closed in the finally block.
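For completeness, the same logic written with try-with-resources (point 1) removes the need for the explicit finally block; a minimal sketch:
// The stream is closed automatically when the try block exits, even on an exception
try (ByteArrayInputStream stream = new ByteArrayInputStream(settings.getBytes())) {
    // logic here
} catch (Exception e) {
    // print error
}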

Can Kafka Streams consume messages in one format and produce them in another, such as Avro?

I am using Kafka Streams to consume JSON strings from one topic, process them, and generate a response to be stored in another topic. However, the message produced to the response topic needs to be in Avro format.
I have tried using a String serde for the key and SpecificAvroSerde for the value.
Following is my code to create the topology:
StreamsBuilder builder = new StreamsBuilder();
KStream<Object, Object> consumerStream = builder.stream(kafkaConfiguration.getConsumerTopic());
consumerStream = consumerStream.map(getKeyValueMapper(keyValueMapperClassName));
consumerStream.to(kafkaConfiguration.getProducerTopic());
Following is my config:
if (schemaRegistry != null && schemaRegistry.length > 0) {
streamsConfig.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, String.join(",", schemaRegistry));
}
streamsConfig.put(this.keySerializerKeyName, StringSerde.class);
streamsConfig.put(this.valueSerialzerKeyName, SpecificAvroSerde.class);
streamsConfig.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId);
streamsConfig.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, autoOffsetReset);
streamsConfig.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, batchSize);
streamsConfig.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, FailOnInvalidTimestamp.class);
streamsConfig.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, processingGuarantee);
streamsConfig.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, Integer.parseInt(commitIntervalMs));
streamsConfig.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, numberOfThreads);
streamsConfig.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, replicationFactor);
streamsConfig.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, DeserializationExceptionHandler.class);
streamsConfig.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG, ProductionExceptionHandler.class);
streamsConfig.put(StreamsConfig.TOPOLOGY_OPTIMIZATION,StreamsConfig.OPTIMIZE);
streamsConfig.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, compressionMode);
streamsConfig.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, maxPollRecords);
I am seeing the following error when I try with the example:
org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
The problem is with the key/value serdes. You should use the correct serdes while consuming the stream, and likewise while publishing it.
If your input is JSON and you want to publish it as Avro, you can do it as follows:
Properties streamsConfig= new Properties();
streamsConfig.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
streamsConfig.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, SpecificAvroSerde.class);
StreamsBuilder builder = new StreamsBuilder();
KStream<Object, Object> consumerStream = builder.stream(kafkaConfiguration.getConsumerTopic(), Consumed.with(Serdes.String(), Serdes.String()));
// Replace AvroObjectClass with your avro object type
KStream<String,AvroObjectClass> consumerAvroStream = consumerStream.map(getKeyValueMapper(keyValueMapperClassName));
consumerAvroStream.to(kafkaConfiguration.getProducerTopic());
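For illustration, a hypothetical sketch of what the mapper returned by getKeyValueMapper(...) could look like, assuming Jackson for the JSON parsing and a generated Avro class named AvroObjectClass with an id field (both names are placeholders, not from the question):
// Hypothetical mapper: parses the incoming JSON string and builds an Avro record.
// AvroObjectClass and its id field are placeholders for your own generated Avro class.
KeyValueMapper<Object, Object, KeyValue<String, AvroObjectClass>> jsonToAvroMapper =
    (key, value) -> {
        try {
            JsonNode node = new ObjectMapper().readTree((String) value);
            AvroObjectClass avro = AvroObjectClass.newBuilder()
                .setId(node.get("id").asText())
                .build();
            return KeyValue.pair((String) key, avro);
        } catch (IOException e) {
            throw new RuntimeException("Unable to parse JSON record", e);
        }
    };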

CRFClassifier: loading model from a stream gives exception "invalid stream header: 1F8B0800"

I am trying to load a CRFClassifier model from a file. This way works:
// this works
classifier = CRFClassifier.getClassifier("../res/stanford-ner-2018-02-27/classifiers/english.all.3class.distsim.crf.ser.gz");
When I try to use a stream, however, I get an invalid stream header: 1F8B0800 exception:
// this throws an exception
String modelResourcePath = "../res/stanford-ner-2018-02-27/classifiers/english.all.3class.distsim.crf.ser.gz";
BufferedInputStream stream = new BufferedInputStream(new FileInputStream(modelResourcePath));
classifier = CRFClassifier.getClassifier(stream);
Exception:
Exception in thread "main" java.io.StreamCorruptedException: invalid stream header: 1F8B0800
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:866)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1473)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1456)
at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:2890)
at com.sv.research.ner.stanford.StanfordEntityExtractor.<init>(StanfordEntityExtractor.java:34)
at com.sv.research.ner.stanford.StanfordEntityExtractor.main(StanfordEntityExtractor.java:59)
I would expect both ways to be equivalent. My reason for loading through a stream is that ultimately I want to load the model from JAR resources using:
stream = ClassLoader.getSystemClassLoader().getResourceAsStream(modelResourcePath);
The classifier you are trying to use was serialized through a GZIPOutputStream, as far as I can see from their sources (the 1F8B08 in the error is the gzip magic header).
So try deserializing it the same way they serialize it, like this:
BufferedInputStream stream = new BufferedInputStream(new GZIPInputStream(new FileInputStream(modelResourcePath)));
Cheers
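Applying the same idea to the classpath case from the question, a minimal sketch (same modelResourcePath, loaded via the system class loader) might look like this:
// Sketch of loading the same model from the classpath (e.g. packaged inside the JAR),
// still wrapping it in a GZIPInputStream because the .ser.gz file is gzip-compressed.
// Note that getResourceAsStream() returns null if the resource is not on the classpath.
try (InputStream raw = ClassLoader.getSystemClassLoader().getResourceAsStream(modelResourcePath);
     BufferedInputStream stream = new BufferedInputStream(new GZIPInputStream(raw))) {
    classifier = CRFClassifier.getClassifier(stream);
}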

Programmatically bridge a QueueChannel to a MessageChannel in Spring

I'm attempting to wire a queue to the front of a MessageChannel, and I need to do so programmatically so that it can be done at runtime in response to an osgi:listener being triggered. So far I've got:
public void addService(MessageChannel mc, Map<String,Object> properties)
{
    // Create the queue and the QueueChannel
    BlockingQueue<Message<?>> q = new LinkedBlockingQueue<Message<?>>();
    QueueChannel qc = new QueueChannel(q);
    // Create the Bridge and set the output to the input parameter channel
    BridgeHandler b = new BridgeHandler();
    b.setOutputChannel(mc);
    // Presumably, I need something here to poll the QueueChannel
    // and drop it onto the bridge. This is where I get lost
}
Looking through the various relevant classes, I came up with:
PollerMetadata pm = new PollerMetadata();
pm.setTrigger(new IntervalTrigger(10));
PollingConsumer pc = new PollingConsumer(qc, b);
but I'm not able to put it all together. What am I missing?
So, the solution that ended up working for me was:
public void addEngineService(MessageChannel mc, Map<String,Object> properties)
{
    // Create the queue and the QueueChannel
    BlockingQueue<Message<?>> q = new LinkedBlockingQueue<Message<?>>();
    QueueChannel qc = new QueueChannel(q);
    // Create the Bridge and set the output to the input parameter channel
    BridgeHandler b = new BridgeHandler();
    b.setOutputChannel(mc);
    // Set up a PollingConsumer to poll the queue channel and
    // retrieve one message at a time
    PollingConsumer pc = new PollingConsumer(qc, b);
    pc.setMaxMessagesPerPoll(1);
    // Now use an interval trigger to poll every 10 ms and attach it
    IntervalTrigger trig = new IntervalTrigger(10, TimeUnit.MILLISECONDS);
    trig.setInitialDelay(0);
    trig.setFixedRate(true);
    pc.setTrigger(trig);
    // Now set a task scheduler and start it
    pc.setTaskScheduler(taskSched);
    pc.setAutoStartup(true);
    pc.start();
}
I'm not entirely clear whether all of the above is strictly needed, but neither the trigger nor the task scheduler alone worked; I did appear to need both. I should also note that the taskSched used was the default taskScheduler bean, dependency-injected from Spring via:
<property name="taskSched" ref="taskScheduler"/>
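If injecting the default bean is not an option, a minimal sketch of creating a scheduler programmatically instead (assuming Spring's ThreadPoolTaskScheduler) would be:
// Create and start a scheduler by hand instead of injecting the default taskScheduler bean
ThreadPoolTaskScheduler taskSched = new ThreadPoolTaskScheduler();
taskSched.setPoolSize(1);
taskSched.initialize();
pc.setTaskScheduler(taskSched);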
