KafkaStreams: Discard message during a window - apache-kafka-streams

I need to discard duplicate messages within a time window. Messages are coming in continuously. Below is the relevant part of the code.
kStream.groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
.windowedBy(TimeWindows.of(Duration.ofSeconds(15)))
.reduce((k,m) -> m)
.suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
.toStream()
.foreach((k, v) -> doSomeProcess(k,v));
What am I doing wrong here? I am not seeing any calls to doSomeProcess, even though messages are coming in.

It turned out that "this feature requires adding a 'grace period' parameter for windows", from https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables
....
.windowedBy(TimeWindows.of(Duration.ofSeconds(15)).grace(Duration.ofSeconds(5)))
....
This fixed the issue.
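For reference, here is a minimal end-to-end sketch of the fixed topology, assuming string keys and values; the topic name "input-topic" and the println stand-in for doSomeProcess are placeholders, not part of the original post. Without an explicit grace period, 2.x windows keep accepting late records for roughly 24 hours by default, so untilWindowCloses holds results back far longer than expected.
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.TimeWindows;

public class DedupTopology {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
            .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
            // close the window 5s after its end instead of the ~24h default
            .windowedBy(TimeWindows.of(Duration.ofSeconds(15)).grace(Duration.ofSeconds(5)))
            .reduce((v1, v2) -> v2) // keep only the latest duplicate per key per window
            .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
            .toStream()
            // stand-in for doSomeProcess(k, v)
            .foreach((windowedKey, value) -> System.out.println(windowedKey.key() + " -> " + value));
    }
}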

Related

Scatter Gather with parallel flow (Timeout in aggregator)

I've been trying to add a timeout in the gatherer so it doesn't wait for every flow to finish, but the timeout doesn't work, because the aggregator waits for each flow to complete.
@Bean
public IntegrationFlow queueFlow(LogicService service) {
    return f -> f.scatterGather(scatterer -> scatterer
            .applySequence(true)
            .recipientFlow(aFlow(service))
            .recipientFlow(bFlow(service)),
        aggregatorSpec -> aggregatorSpec.groupTimeout(2000L));
}
For example, of my two flows, one has a 2-second delay and the other a 4-second delay:
public IntegrationFlow bFlow(LogicService service) {
    return IntegrationFlows.from(MessageChannels.executor(Executors.newCachedThreadPool()))
        .handle(service::callFakeServiceTimeout2)
        .transform(MessageDomain.class, message -> {
            message.setMessage(message.getMessage().toUpperCase());
            return message;
        }).get();
}
I use Executors.newCachedThreadPool() to run the flows in parallel.
I'd like to release whatever messages have arrived once the timeout is reached.
Another approach I've been testing was to use a default gatherer and set gatherTimeout on the scatterGather, but I don't know if I'm missing something.
UPDATE
All the approaches given in the comments were tested and work normally. The only problem is that each action is evaluated against the message group's creation, and the message group is only created when the first reply arrives. The ideal approach would be an option that takes effect at the moment the scatterer distributes the request message.
My temporary solution was to use an ad hoc release strategy via a GroupConditionProvider, which reads a custom header that I set when sending the message through the gateway. My only concern is that the release strategy is only evaluated when a new message arrives or when a group timeout fires.
The groupTimeout on the aggregator is not enough to release the group. If you don't get the whole group within that timeout, it is going to be discarded. See the sendPartialResultOnExpiry option: https://docs.spring.io/spring-integration/reference/html/message-routing.html#agg-and-group-to
If send-partial-result-on-expiry is true, existing messages in the (partial) MessageGroup are released as a normal aggregator reply message to the output-channel. Otherwise, it is discarded.
The gatherTimeout is good to have if you expect no replies from the gatherer at all. So, this way you won't block the scatter-gather thread forever: https://docs.spring.io/spring-integration/reference/html/message-routing.html#scatter-gather-error-handling
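Putting the two together, a minimal sketch of the flow with both settings, reusing the aFlow/bFlow beans from the question (I'm assuming sendPartialResultOnExpiry is exposed on the gatherer spec, as in recent Spring Integration versions):
import org.springframework.context.annotation.Bean;
import org.springframework.integration.dsl.IntegrationFlow;

@Bean
public IntegrationFlow queueFlow(LogicService service) {
    return f -> f.scatterGather(scatterer -> scatterer
            .applySequence(true)
            .recipientFlow(aFlow(service))
            .recipientFlow(bFlow(service)),
        gatherer -> gatherer
            .groupTimeout(2000L)               // stop waiting after 2 seconds
            .sendPartialResultOnExpiry(true)); // release the partial group instead of discarding it
}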

How do I debug a Mono that never completes

I have a Spring Boot application which contains a complex reactive flow (it involves MongoDB and RabbitMQ operations). Most of the time it works, but...
Some of the methods return a Mono<Void>. This is a typical pattern, in multiple layers:
fun workflowStep(things: List<Thing>): Mono<Void> =
Flux.fromIterable(things).flatMap { thing -> doSomethingTo(thing) }.collectList().then()
Let's say doSomethingTo() returns a Mono<Void> (it writes something to the database, sends a message, etc.). If I just replace it with Mono.empty(), then everything works as expected, but otherwise it doesn't. More specifically, the Mono never completes: it runs through all the processing but misses the termination signal at the end. So the things are actually written to the database, messages are actually sent, etc.
To prove that the lack of termination is the problem, here is a hack that works:
val hackedDelayedMono = Mono.empty<Void>().delayElement(Duration.ofSeconds(1))
return Mono.first(
workflowStep(things),
hackedDelayedMono
)
The question is: what can I do with a Mono that never completes, to figure out what's going on? There is nowhere I could put a logging statement or a breakpoint, because:
there are no errors
there are no signals emitted
How could I check what the Mono is waiting for to be completed?
P.S. I could not reproduce this behaviour outside the application with simple Mono workflows.
You can trace and log events in your stream by using the log() operator in your reactive stream. This is useful for gaining a better understanding of what events are occurring within your app.
Flux.fromIterable(things)
.flatMap(thing -> doSomethingTo(thing))
.log()
.collectList()
.then()
Chained inside a sequence, it peeks at every event of the Flux or Mono
upstream of it (including onNext, onError, and onComplete as well as
subscriptions, cancellations, and requests).
Reactor Reference Documentation - Logging a Sequence
The Reactor reference documentation also contains other helpful advice for debugging a reactive stream and can be found here: Debugging Reactor
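For example (my sketch, not from the original answer), the checkpoint() operator and the global assembly-tracking hook described in that documentation can narrow down where a sequence stalls or fails:
import reactor.core.publisher.Flux;
import reactor.core.publisher.Hooks;

public class ReactorDebugSketch {
    public static void main(String[] args) {
        // Capture assembly stack traces globally (costly; development only).
        Hooks.onOperatorDebug();

        Flux.range(1, 3)
            .map(i -> i * 2)
            // Names this point in the chain; an error crossing it carries
            // the description in its assembly trace.
            .checkpoint("after-map")
            .subscribe(System.out::println);
    }
}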
(We managed to fix the problem - it was not directly in the code I was working on, but for some reason my changes triggered it. I still don't understand the root cause, but higher up the chain we found a Mono.zip() zipping a Mono<Void>. Although this used to work before, it stopped working at some point. Why is a Mono<Void> even zippable, why don't we get a compiler error, and even worse, why does it work sometimes?)
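For what it's worth, the zip behaviour itself is documented: Mono.zip completes empty as soon as any source completes without emitting a value, and a Mono<Void> by definition never emits one. It is zippable because Void is an ordinary type parameter, so there is no compiler error; the result is simply a zip whose combinator never runs. A small illustration (my example, not from the thread):
import reactor.core.publisher.Mono;

public class ZipVoidSketch {
    public static void main(String[] args) {
        Mono<Void> noValue = Mono.empty(); // a Mono<Void> can never emit a value
        Mono.zip(noValue, Mono.just("data"))
            .map(tuple -> tuple.getT2()) // never called: zip completes empty
            .subscribe(
                v -> System.out.println("value: " + v),
                e -> System.err.println("error: " + e),
                () -> System.out.println("completed empty"));
        // Prints "completed empty" -- the combinator and any downstream
        // side effects are silently skipped.
    }
}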
To answer my own question here, the tool used for debugging was adding the following to all Monos in the chain, until it didn't produce any output:
mono.doOnEach { x ->
logger.info("signal: ${x}")
}
.then(Mono.defer {
logger.info("then()")
Mono.empty<Void>()
})
I also experimented with .log(), which is also a fine tool but perhaps too detailed; it is not easy to tell which Mono produces which log messages, since they are logged according to dynamic scope rather than the lexical scope that the method above gives you unambiguously.

MacOS not responding to MPRemoteCommandCenter commands in the background

I am writing an application for my own purposes that aims to receive play/pause events no matter what is going on in the system. I have gotten this much working:
let commandCenter = MPRemoteCommandCenter.shared()
commandCenter.togglePlayPauseCommand.isEnabled = true
commandCenter.togglePlayPauseCommand.addTarget { (MPRemoteCommandEvent) -> MPRemoteCommandHandlerStatus in
print("Play Pause Command")
return .success
}
commandCenter.nextTrackCommand.isEnabled = true
commandCenter.nextTrackCommand.addTarget { (MPRemoteCommandEvent) -> MPRemoteCommandHandlerStatus in
print("NextTrackCommand")
return .success
}
commandCenter.previousTrackCommand.isEnabled = true
commandCenter.previousTrackCommand.addTarget { (MPRemoteCommandEvent) -> MPRemoteCommandHandlerStatus in
print("previousTrackCommand")
return .success
}
commandCenter.playCommand.isEnabled = true
commandCenter.playCommand.addTarget { (MPRemoteCommandEvent) -> MPRemoteCommandHandlerStatus in
print("playCommand")
return .success
}
MPNowPlayingInfoCenter.default().playbackState = .playing
Most of those methods are there because apparently you will not get any notifications without nextTrackCommand, previousTrackCommand, or playCommand implemented.
Anyway, my one issue is that as soon as you open another application that uses audio, these event handlers stop getting called, and I can't find a way to detect and fix this.
I would normally use AVAudioSession to declare this as a background application, but that does not seem to work. Any ideas on how I can get play/pause events no matter what state the system is in?
I would like to be able to always listen for these events OR get an indication of when someone else has taken control of the audio? Perhaps even be able to re-subscribe to these play pause events.
There's an internal queue in the system which contains all the audio event subscribers. Other applications get on top of it when you start using them.
I would like to be able to always listen for these events
There's no API for that but there's a dirty workaround. If I understand your issue correctly, this snippet:
MPNowPlayingInfoCenter.default().playbackState = .paused
MPNowPlayingInfoCenter.default().playbackState = .playing
must do the trick for you if you run it in a loop somewhere in your application.
Note that this is not 100% reliable because:
If an event is generated before the two subsequent playbackState changes run, right after you've switched to a different application, it will still be caught by the application in the active window;
If another application is doing the same thing, there would be a constant race condition in the queue, with unpredictable outcome.
References:
Documentation for playbackState is here;
See also a similar question;
See also a bug report for mpv with a similar issue (a pre-MPRemoteCommandCenter one, but still very valuable).
OR get an indication of when someone else has taken control of the audio
As far as I know there's no public API for this in macOS.

Suppress triggers events only when new events are received on the stream

I am using Kafka streams 2.2.1.
I am using suppress to hold back events until a window closes. I am using event time semantics.
However, suppressed results are only emitted once a new message becomes available on the stream.
The following code is extracted to illustrate the problem:
KStream<UUID, String>[] branches = is
.branch((key, msg) -> "a".equalsIgnoreCase(msg.split(",")[1]),
(key, msg) -> "b".equalsIgnoreCase(msg.split(",")[1]),
(key, value) -> true);
KStream<UUID, String> sideA = branches[0];
KStream<UUID, String> sideB = branches[1];
KStream<Windowed<UUID>, String> sideASuppressed =
sideA.groupByKey(
Grouped.with(new MyUUIDSerde(),
Serdes.String()))
.windowedBy(TimeWindows.of(Duration.ofMinutes(31)).grace(Duration.ofMinutes(32)))
.reduce((v1, v2) -> v1)
.suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
.toStream();
Messages are only streamed from 'sideASuppressed' when a new message arrives on the 'sideA' stream (messages arriving on 'sideB' will not cause the suppression to emit anything, even if the window closed long ago).
Although in production the problem is unlikely to occur much due to high volume, there are enough cases where it is essential not to wait for a new message to arrive on the 'sideA' stream.
Thanks in advance.
According to Kafka streams documentation:
Stream-time is only advanced if all input partitions over all input topics have new data (with newer timestamps) available. If at least one partition does not have any new data available, stream-time will not be advanced and thus punctuate() will not be triggered if PunctuationType.STREAM_TIME was specified. This behavior is independent of the configured timestamp extractor, i.e., using WallclockTimestampExtractor does not enable wall-clock triggering of punctuate().
I am not sure why this is the case, but it explains why suppressed messages are only emitted when new messages are available on the input.
If anyone has an answer regarding why the implementation works this way, I will be happy to learn. This behavior forces my implementation to emit extra messages just to get the suppressed messages out in time, and it makes the code much less readable.
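One workaround I've seen for this class of problem, sketched below under my own assumptions rather than taken from this thread, is to skip suppress() and flush buffered records with a wall-clock punctuator, which fires regardless of whether new input arrives. The in-memory map is for illustration only; a production version should buffer in a state store so records survive restarts.
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;

public class WallClockFlusher implements Transformer<UUID, String, KeyValue<UUID, String>> {
    private final Map<UUID, String> buffer = new HashMap<>();

    @Override
    public void init(ProcessorContext context) {
        // Fires on wall-clock time, even when no new records arrive.
        context.schedule(Duration.ofMinutes(1), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
            buffer.forEach((key, value) -> context.forward(key, value));
            buffer.clear();
        });
    }

    @Override
    public KeyValue<UUID, String> transform(UUID key, String value) {
        buffer.put(key, value); // hold the latest value per key until the punctuator fires
        return null;
    }

    @Override
    public void close() { }
}
This would be wired in with something like sideA.transform(WallClockFlusher::new) in place of the windowedBy/suppress chain.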

How exactly does GetKeyState work?

I have been struggling to understand how GetKeyState operates. I have done endless Google searching and haven't yet managed to understand exactly how it works.
According to MSDN:
The key status returned from this function changes as a thread reads key messages from its message queue.
Take a look at the following code. I didn't create a message processing loop. 65 represents the virtual key of the character 'A'.
while (true) {
    printf("the character %c, the vkey_state is %x\n",
        MapVirtualKey(65, MAPVK_VK_TO_CHAR), GetKeyState(65) & 0x8000);
    Sleep(150);
}
I pressed 'A' on the keyboard while focused on my program's console window. Sometimes the vkey_state value is 0x8000 as expected, sometimes not.
What exactly is happening under the hood? I didn't write any message-processing code, so I assume it is created automatically. When I press 'A', a WM_KEYDOWN is sent to my thread's message queue. When I release 'A', a WM_KEYUP is sent to my thread's message queue. Other key-related messages might be sent in between. What happens when I call GetKeyState? When exactly will it set the MSB of its return value to 1, and when will it change back to 0? Is it related to calls to GetMessage?
In addition, what confused me the most is that when I switched to another program (cmd.exe) and typed 'A', my program was able to monitor it while in the background. But cmd.exe's thread has another message queue, so why does this work? However, it did not work if I started cmd.exe in elevated mode (high integrity).
This contradicts the information I found here:
If the user has switched to another program, then the GetKeyState function will not see the input that the user typed into that other program, since that input was not sent to your input queue.
