KTable & LogAndContinueExceptionHandler - apache-kafka-streams

I have a very simple consumer from which I create a materialized view. I have enabled validation on my value object (throwing ConstraintViolationException for invalid JSON data). When I receive a value that fails validation, I expect the value to be logged and the consumer to read the next offset, since I have LogAndContinueExceptionHandler enabled.
However, LogAndContinueExceptionHandler is never invoked and consumePojo transitions from state PENDING_ERROR to ERROR.
Code
@Bean
public Consumer<KTable<String, Pojo>> consume() {
    return values ->
        values
            .filter((key, value) -> Objects.nonNull(key))
            .mapValues(value -> value, Materialized.<String, Pojo>as(Stores.inMemoryKeyValueStore("POJO_STORE_NAME"))
                .withKeySerde(Serdes.String())
                .withValueSerde(SerdeUtil.pojoSerde())
                .withLoggingDisabled())
            .toStream()
            .peek((key, value) -> log.debug("Receiving Pojo from topic with key: {}, and UUID: {}", key, value == null ? 0 : value.getUuid()));
}
Why is it that LogAndContinueExceptionHandler is not invoked in case of KTable?
Note: if the code is changed to use KStream instead, I do see the logging and the records being skipped, but with KTable I do not.

To handle exceptions not handled by Kafka Streams, use the KafkaStreams.setUncaughtExceptionHandler method with a StreamsUncaughtExceptionHandler implementation; it needs to return one of three available enum values:
REPLACE_THREAD
SHUTDOWN_CLIENT
SHUTDOWN_APPLICATION
and in your case REPLACE_THREAD is the best option, as you can see in KIP-671:
REPLACE_THREAD:
The current thread is shutdown and transits to state DEAD.
A new thread is started if the Kafka Streams client is in state RUNNING or REBALANCING.
For the Global thread this option will log an error and revert to shutting down the client until the option had been added
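Outside Spring, with plain Kafka Streams (2.8+), a minimal sketch of registering the handler could look like this; topology and props are assumed to exist elsewhere:
// Sketch: register the uncaught exception handler on a plain KafkaStreams instance.
KafkaStreams streams = new KafkaStreams(topology, props);
streams.setUncaughtExceptionHandler(exception ->
        StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse.REPLACE_THREAD);
streams.start();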
In Spring Kafka you can replace the default StreamsUncaughtExceptionHandler through the StreamsBuilderFactoryBean:
@Autowired
void setMyStreamsUncaughtExceptionHandler(StreamsBuilderFactoryBean streamsBuilderFactoryBean) {
    streamsBuilderFactoryBean.setStreamsUncaughtExceptionHandler(exception -> StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse.REPLACE_THREAD);
}

I was able to solve the problem after looking at the logs carefully: the valueSerde for the Pojo was showing useNativeDecoding (the default being JsonSerde), and because of this the DeserializationExceptionHandler wasn't invoked and the thread terminated.
The problem went away when I fixed the valueSerde in application.properties.
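The actual fix was in application.properties; purely for illustration, roughly the same two settings expressed as plain Kafka Streams properties (outside Spring) might look like this — SerdeUtil.pojoSerde() is the application's own serde and is assumed to be default-constructible:
// Sketch: the deserialization handler only fires when the Streams serde itself fails
// to deserialize, so the value serde must be the real JSON serde, not native decoding.
Properties props = new Properties();
props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
        LogAndContinueExceptionHandler.class);
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, SerdeUtil.pojoSerde().getClass());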

Related

Flowfiles stuck in queue (Apache NiFi)

I have the following flow:
ListFTP -> RouteOnAttribute -> FetchFTP -> UnpackContent -> ExecuteScript.
Some of the files are stuck in the queue between UnpackContent and ExecuteScript.
ExecuteScript consumed some flowfiles and they just disappeared: the failure and success relationships are empty, and it only showed some activity in the Tasks/Time field. The rest are stuck in the queue before ExecuteScript. I tried to empty the queue, but not all of the flowfiles were deleted; about 1/3 of them are still stuck. I tried disabling all processors and emptying the queue again, but it returns: 0 FlowFiles (0 bytes) were removed from the queue.
When I try to change the Connection destination it returns:
Cannot change destination of Connection because FlowFiles from this Connection are currently held by ExecuteScript[id=d33c9b73-0177-1000-5151-83b7b938de39]
The ExecuteScript script is from this answer (it uses Python).
So I can't empty the queue because it always returns a message that there are no flowfiles, and I can't remove the connection. This has been going on for several hours.
Connection configuration:
Scheduling is set to 0 sec, no penalties for flowfiles, etc.
Is it a script problem?
UPDATE
I changed the script to:
flowFile = session.get()
if (flowFile != None):
    # All processing code starts at this indent
    if errorOccurred:
        session.transfer(flowFile, REL_FAILURE)
    else:
        session.transfer(flowFile, REL_SUCCESS)
# implicit return at the end
Same result.
UPDATE v2
I set concurrent tasks to 50, then ran ExecuteScript again and terminated it, which produced an error.
UPDATE v3
I created an additional ExecuteScript processor with the same script and it worked fine. But after I stopped this new processor and created new flowfiles, this processor now has the same problem: it's just stuck.
Hilarious. Is ExecuteScript single-use?
You need to modify your code in NiFi 1.13.2 because NIFI-8080 introduced these bugs, or just use NiFi 1.12.1.
JythonScriptEngineConfigurator:
@Override
public Object init(ScriptEngine engine, String scriptBody, String[] modulePaths) throws ScriptException {
    // Always compile when first run
    if (engine != null) {
        // Add prefix for import sys and all jython modules
        prefix = "import sys\n"
                + Arrays.stream(modulePaths).map((modulePath) -> "sys.path.append(" + PyString.encode_UnicodeEscape(modulePath, true) + ")")
                        .collect(Collectors.joining("\n"));
    }
    return null;
}

@Override
public Object eval(ScriptEngine engine, String scriptBody, String[] modulePaths) throws ScriptException {
    Object returnValue = null;
    if (engine != null) {
        returnValue = ((Compilable) engine).compile(prefix + scriptBody).eval();
    }
    return returnValue;
}

Why would Kafka Stream app crash with suppress() enabled?

I'm writing a Kafka Streams 2.3.0 application to count the number of events in a session window, and I hope to print out only the final record when a session times out.
Serde<String> stringSerde = Serdes.serdeFrom(new StringSerializer(), new StringDeserializer());
Serde<MuseObject> museObjectSerde = Serdes.serdeFrom(new MuseObjectSerializer(), new MuseObjectDeserializer());

StreamsBuilder builder = new StreamsBuilder();
builder
    .stream(INPUT_TOPIC, Consumed.with(stringSerde, museObjectSerde))
    .map((key, value) -> {
        return KeyValue.pair(value.getSourceValue("vid"), value.toString());
    })
    .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
    .windowedBy(SessionWindows.with(Duration.ofSeconds(INACTIVITY_GAP)).grace(Duration.ZERO))
    .count(Materialized.with(Serdes.String(), Serdes.Long()))
    .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
    .toStream()
    .print(Printed.toSysOut());
However the application crashes when a session times out:
12:35:03.859 [kafka-producer-network-thread | kafka-streams-test-kgu-4c3f2398-8f67-429d-82ce-6062c86af466-StreamThread-1-producer] ERROR o.a.k.s.p.i.RecordCollectorImpl - task [1_0] Error sending record to topic kafka-streams-test-kgu-KTABLE-SUPPRESS-STATE-STORE-0000000008-changelog due to The server experienced an unexpected error when processing the request.; No more records will be sent and no more offsets will be recorded for this task. Enable TRACE logging to view failed record key and value.
org.apache.kafka.common.errors.UnknownServerException: The server experienced an unexpected error when processing the request.
12:35:03.862 [kafka-streams-test-kgu-4c3f2398-8f67-429d-82ce-6062c86af466-StreamThread-1] ERROR o.a.k.s.p.i.AssignedStreamsTasks - stream-thread [kafka-streams-test-kgu-4c3f2398-8f67-429d-82ce-6062c86af466-StreamThread-1] Failed to commit stream task 1_0 due to the following error:
org.apache.kafka.streams.errors.StreamsException: task [1_0] Abort sending since an error caught with a previous record (key user01\x00\x00\x01m!\xCE\x99u\x00\x00\x01m!\xCE\x80\xD1 value null timestamp null) to topic kafka-streams-test-kgu-KTABLE-SUPPRESS-STATE-STORE-0000000008-changelog due to org.apache.kafka.common.errors.UnknownServerException: The server experienced an unexpected error when processing the request.
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:138)
I've tried commenting out the .suppress(...) line. It works fine without suppress() and prints out something like this:
[KSTREAM-FILTER-0000000011]: [user01#1568230244561/1568230250869], MuseSession{vid='user01', es='txnSuccess', count=6, start=2019-06-26 17:11:02.937, end=2019-06-26 18:07:10.685, sessionType='open'}".
What did I miss in using suppress()? Is there another way to filter out only the session records that have timed out?
Any help is appreciated. Thanks in advance.
suppress() requires at least broker version 0.11.0 and message format 0.11.
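If it helps to verify what the cluster is running with, one option is to read the broker's config over the admin API; a rough sketch, where the bootstrap address and broker id "0" are assumptions (call it from a method that declares throws Exception):
// Sketch: read log.message.format.version for one broker via AdminClient.
Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
try (AdminClient admin = AdminClient.create(props)) {
    ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0"); // assumed broker id
    Config config = admin.describeConfigs(Collections.singleton(broker)).all().get().get(broker);
    System.out.println(config.get("log.message.format.version").value());
}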

duplicate key given in txn request while trying to remove all keys by prefix and put them again

I'm trying to use coreos/jetcd to update HAProxy settings in etcd from Java code.
What I want to achieve is:
remove all endpoints for a single host
add updated data for the given host
I want to remove all keys by prefix and put the current data into etcd as an atomic operation.
That's why I tried to use etcd transactions. My code is:
Op.DeleteOp deleteOp = Op.delete(
        fromString(prefix),
        DeleteOption.newBuilder().withPrefix(fromString(prefix)).build()
);

Txn tx = kvClient.txn().Else(deleteOp);
newKvs.forEach((k, v) -> {
    tx.Else(Op.put(fromString(k), fromString(v), DEFAULT));
});

try {
    tx.commit().get();
} catch (InterruptedException | ExecutionException e) {
    log.error("ETCD transaction failed", e);
    throw new RuntimeException(e);
}
The etcd v3 API is used (etcd v3.2.9). The KV store is initially empty and I want to add 3 records.
The prefix value is:
/proxy-service/hosts/example.com
and newKvs is a Map:
"/proxy-service/hosts/example.com/FTP/0" -> "localhost:10021"
"/proxy-service/hosts/example.com/HTTPS/0" -> "localhost:10443"
"/proxy-service/hosts/example.com/HTTP/0" -> "localhost:10080"
An exception happens on the commit().get() line with the following root cause:
Caused by: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: etcdserver: duplicate key given in txn request
at io.grpc.Status.asRuntimeException(Status.java:526)
at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:427)
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:41)
at com.coreos.jetcd.internal.impl.ClientConnectionManager$AuthTokenInterceptor$1$1.onClose(ClientConnectionManager.java:267)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:419)
at io.grpc.internal.ClientCallImpl.access$100(ClientCallImpl.java:60)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:493)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$500(ClientCallImpl.java:422)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:525)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:102)
... 3 more
What am I doing wrong, and how else can I apply several etcd changes as an atomic operation?
It looks like the operations of removing a key and then adding a new value for the same key cannot be in the same txn. According to the etcd API v3 documentation:
Txn processes multiple requests in a single transaction. A txn request increments the revision of the key-value store and generates events with the same revision for every completed request. It is not allowed to modify the same key several times within one txn.
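One way around it, sketched below against the same jetcd API the question uses: since a put already overwrites an existing key, only the keys under the prefix that are not being re-put need a delete, so no key is touched twice within the txn. fromString, kvClient, and newKvs are the question's own names; the option constants may differ slightly by jetcd version.
// Sketch: delete only stale keys and put the rest, so every key appears at most once in the txn.
GetResponse existing = kvClient.get(
        fromString(prefix),
        GetOption.newBuilder().withPrefix(fromString(prefix)).build()
).get();

List<Op> ops = new ArrayList<>();
for (KeyValue kv : existing.getKvs()) {
    String key = kv.getKey().toStringUtf8();
    if (!newKvs.containsKey(key)) {
        ops.add(Op.delete(fromString(key), DeleteOption.DEFAULT)); // stale key, not re-put below
    }
}
newKvs.forEach((k, v) -> ops.add(Op.put(fromString(k), fromString(v), PutOption.DEFAULT)));

kvClient.txn().Then(ops.toArray(new Op[0])).commit().get();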

How to continuously read JMS messages in one thread and acknowledge them based on their JMSMessageID in another thread?

I've written a continuous JMS message receiver:
Here, I'm using CLIENT_ACKNOWLEDGE because I don't want this thread to acknowledge the messages.
(...)
connection.start();
session = connection.createQueueSession(true, Session.CLIENT_ACKNOWLEDGE);
queue = session.createQueue(QueueId);
receiver = session.createReceiver(queue);

while (true) {
    message = receiver.receive(1000);
    if (message != null) {
        // NB: I can only pass Strings to the other thread
        sendMessageToOtherThread(message.getText(), message.getJMSMessageID());
    }
    // TODO Implement criteria to exit the loop here
}
In another thread, I'll do something like the following (after successful processing):
This runs on a distinct JMS connection, executed simultaneously.
public void AcknowledgeMessage(String messageId) {
    if (this.first) {
        this.connection.start();
        this.session = this.connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
        this.queue = this.session.createQueue(this.QueueId);
    }
    QueueReceiver receiver = this.session.createReceiver(this.queue, "JMSMessageID='" + messageId + "'");
    Message AckMessage = receiver.receive(2000);
    receiver.close();
}
It appears that the message is not found (AckMessage is null after the timeout) even though it does exist in the queue.
I suspect the message is being held by the continuous input thread; indeed, when firing AcknowledgeMessage() alone, it works fine.
Is there a cleaner way to retrieve a single message based on its QueueId and messageId?
Also, I feel there could be a risk of a memory leak in the continuous reader if it has to hold on to the messages or IDs for a long time. Is that justified?
If I use a QueueBrowser to avoid impacting the acknowledge thread, it looks like I cannot have this continuous input feed, right?
More context: I'm using ActiveMQ and the two threads are two custom "Steps" of a Pentaho Kettle transformation.
NB: Code samples are simplified to focus on the issue.
Well, you can't read that message twice, since you have already read it in the first thread.
ActiveMQ will not delete the message since you have not acknowledged it, but it won't be visible again until you drop the JMS connection (I'm not sure whether ActiveMQ also has a long timeout here).
So you will have to use the original Message object and call message.acknowledge();
Note, however, that sessions are not thread safe, so be careful if you do this in two different threads.
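Since the session isn't thread safe, one pattern (a sketch; pendingById and ackIds are assumed names, and receiver/sendMessageToOtherThread are from the question) is to keep the received Message objects in the reader thread and have the worker hand finished JMSMessageIDs back to that same thread, which then calls acknowledge():
// Sketch: only the reader thread touches the JMS session; the worker thread
// reports completed message IDs through a thread-safe queue.
Map<String, Message> pendingById = new HashMap<>();          // owned by the reader thread
BlockingQueue<String> ackIds = new LinkedBlockingQueue<>();  // filled by the worker thread

while (true) {
    Message message = receiver.receive(1000);
    if (message != null) {
        pendingById.put(message.getJMSMessageID(), message);
        sendMessageToOtherThread(((TextMessage) message).getText(), message.getJMSMessageID());
    }
    // Acknowledge whatever the worker has finished, still on the reader thread.
    String doneId;
    while ((doneId = ackIds.poll()) != null) {
        Message done = pendingById.remove(doneId);
        if (done != null) {
            done.acknowledge(); // with CLIENT_ACKNOWLEDGE this acknowledges all messages consumed so far on the session
        }
    }
}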

Issue or confusion with JMS/spring/AMQ not processing messages asynchronously

We have a situation where we set up a component to run batch jobs using Spring Batch remotely. We send a JMS message with the job XML path, name, parameters, etc., and we wait on the calling batch client for a response from the server.
The server reads the queue and calls the appropriate method to run the job and return the result, which our messaging framework does by:
this.jmsTemplate.send(queueName, messageCreator);
this.LOGGER.debug("Message sent to '" + queueName + "'");

try {
    final Destination replyTo = messageCreator.getReplyTo();
    final String correlationId = messageCreator.getMessageId();
    this.LOGGER.debug("Waiting for the response '" + correlationId + "' back on '" + replyTo + "' ...");
    final BytesMessage message = (BytesMessage) this.jmsTemplate.receiveSelected(replyTo, "JMSCorrelationID='"
            + correlationId + "'");
    this.LOGGER.debug("Response received");
Ideally, we want to be able to call our runJobSync method twice and have two jobs operate simultaneously. We have a unit test that does something similar, without jobs. I realize this code isn't great, but here it is:
final List result = Collections.synchronizedList(new ArrayList());

Thread thread1 = new Thread(new Runnable() {
    @Override
    public void run() {
        client.pingWithDelaySync(1000);
        result.add(Thread.currentThread().getName());
    }
}, "thread1");

Thread thread2 = new Thread(new Runnable() {
    @Override
    public void run() {
        client.pingWithDelaySync(500);
        result.add(Thread.currentThread().getName());
    }
}, "thread2");

thread1.start();
Thread.sleep(250);
thread2.start();
thread1.join();
thread2.join();

Assert.assertEquals("both thread finished", 2, result.size());
Assert.assertEquals("thread2 finished first", "thread2", result.get(0));
Assert.assertEquals("thread1 finished second", "thread1", result.get(1));
When we run that test, thread 2 completes first since it has just a 500 millisecond wait, while thread 1 does a 1 second wait:
Thread.sleep(delayInMs);
return result;
That works great.
When we run two remote jobs in the wild, one which takes about 50 seconds to complete and one which is designed to fail immediately and return, this does not happen.
We start the 50-second job, then immediately start the instant-fail job. The client prints that we sent a message requesting that the job run, and the server prints that it received the 50-second request, but it waits until that 50-second job is completed before handling the second message at all, even though we use a ThreadPoolExecutor.
We are running transactional sessions with auto acknowledge.
Doing some remote debugging, the Consumer from AbstractPollingMessageListenerContainer shows no unhandled messages (so consumer.receive() obviously just returns null over and over). The web GUI for the AMQ broker shows 2 enqueues, 1 dequeue, 1 dispatched, and 1 in the dispatched queue. This suggests to me that something is preventing AMQ from letting the consumer "have" the second message. (The prefetch is 1000, by the way.)
This shows as the only consumer for the particular queue.
A few other developers and I have poked around for the last few days and are pretty much getting nowhere. Any suggestions on either what we have misconfigured (if this is expected behavior) or what might be broken here?
Does the method that is being remotely called matter at all? Currently the job handler method uses an executor to run the job in a different thread and does a future.get() (the extra thread is for reasons related to logging).
Any help is greatly appreciated.
Not sure I follow completely, but off the top of my head, you should try the following (a configuration sketch follows the list):
set the concurrentConsumers/maxConcurrentConsumers greater than the default (1) on the MessageListenerContainer
set the prefetch to 0 to better promote balancing messages between consumers, etc.
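A minimal Spring sketch of those two suggestions; the bean names, queue name, broker URL, and consumer counts are assumptions, not the poster's actual configuration:
// Sketch: more concurrent consumers on the listener container, prefetch 0 on the ActiveMQ factory.
@Bean
public ActiveMQConnectionFactory connectionFactory() {
    // Prefetch 0 can also be set on the broker URL: tcp://localhost:61616?jms.prefetchPolicy.queuePrefetch=0
    ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
    factory.getPrefetchPolicy().setQueuePrefetch(0);
    return factory;
}

@Bean
public DefaultMessageListenerContainer jobListenerContainer(ActiveMQConnectionFactory connectionFactory,
                                                            MessageListener jobRequestListener) {
    DefaultMessageListenerContainer container = new DefaultMessageListenerContainer();
    container.setConnectionFactory(connectionFactory);
    container.setDestinationName("job.request.queue"); // assumed queue name
    container.setMessageListener(jobRequestListener);
    container.setConcurrentConsumers(5);       // default is 1
    container.setMaxConcurrentConsumers(10);
    return container;
}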
