KStreams: implementing session windows with the Processor API

I need to implement logic similar to session windows using the Processor API in order to have full control over the state store. Since the Processor API doesn't provide a windowing abstraction, this needs to be done manually. However, I can't find the source code of the KStreams session window logic to get some initial ideas (specifically regarding session timeouts).
I was expecting to use the punctuate method, but it's a per-processor timer rather than a per-key timer. Additionally, SessionStore<K, AGG> doesn't provide an API to traverse the database for all keys.
[UPDATE]
As an example, assume a processor instance is processing K1 and stream time is incremented, which causes the session for K2 to time out. K2 may or may not exist at all. How do you know that a specific key (like K2) exists when stream time is incremented while processing a different key? In other words, when stream time is incremented, how do you figure out which windows have expired (given that you don't know which keys exist)?

This is the DSL code: https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KStreamSessionWindowAggregate.java -- hope it helps.
It's unclear what your question is, though; it's mostly statements. So let me try to give a general answer.
In the DSL, sessions are closed based on "stream time" progress. Relying only on the input data makes the operation deterministic, whereas using wall-clock time would introduce non-determinism. Hence, using a Punctuation is not necessary in the DSL implementation.
Additionally SessionStore<K, AGG> doesn't provide an API to traverse the database for all keys.
Sessions in the DSL are based on keys and thus it's sufficient to scan the store on a per-key basis over a time range (as done via findSessions(...)).
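A minimal in-memory sketch of this per-key approach, assuming a `TreeMap` per key stands in for a `SessionStore` and a running `long` sum stands in for the aggregate; `SessionSketch`, `Session`, `findSessions` and `process` here are hypothetical names, not Kafka's classes. For each record it only looks at sessions of that record's own key that overlap `[ts - gap, ts + gap]` (the same range idea as `findSessions(...)`), merges them with the new record, and returns the updated session:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of per-key session merging; not the Kafka Streams implementation.
class SessionSketch {
    // A session covers [start, end] and carries a running sum as its aggregate.
    record Session(long start, long end, long sum) {}

    private final long gapMs;
    // Per key: sessions indexed by start time (an in-memory stand-in for a SessionStore).
    private final Map<String, TreeMap<Long, Session>> store = new HashMap<>();

    SessionSketch(long gapMs) {
        this.gapMs = gapMs;
    }

    // Sessions of one key that end at or after earliestEnd and start at or before latestStart.
    List<Session> findSessions(String key, long earliestEnd, long latestStart) {
        List<Session> result = new ArrayList<>();
        TreeMap<Long, Session> sessions = store.getOrDefault(key, new TreeMap<>());
        for (Session s : sessions.headMap(latestStart, true).values()) {
            if (s.end() >= earliestEnd) {
                result.add(s);
            }
        }
        return result;
    }

    // Adds one record: merges every session of this key within the gap and stores the result.
    Session process(String key, long ts, long value) {
        TreeMap<Long, Session> sessions = store.computeIfAbsent(key, k -> new TreeMap<>());
        long start = ts;
        long end = ts;
        long sum = value;
        for (Session s : findSessions(key, ts - gapMs, ts + gapMs)) {
            start = Math.min(start, s.start());
            end = Math.max(end, s.end());
            sum += s.sum();
            sessions.remove(s.start()); // the merged session replaces the old ones
        }
        Session merged = new Session(start, end, sum);
        sessions.put(start, merged);
        return merged; // the DSL would forward this update downstream right away
    }

    int sessionCount(String key) {
        return store.getOrDefault(key, new TreeMap<>()).size();
    }

    public static void main(String[] args) {
        SessionSketch sketch = new SessionSketch(10);
        sketch.process("K1", 0, 1);
        sketch.process("K1", 20, 1); // more than the gap away: a second session
        System.out.println(sketch.process("K1", 10, 1)); // bridges both into one session
        System.out.println(sketch.sessionCount("K1"));
    }
}
```

Only the incoming record's key is ever looked up, which is why no scan over all keys is needed.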
Update:
In the DSL, each time a session window is updated, a corresponding update event is sent downstream immediately. Hence, the DSL implementation does not wait for "stream time" to advance any further but publishes the current (potentially intermediate) result right away.
To obey the grace period, the record timestamp is compared to "stream time", and if the corresponding session window is already closed, the record is skipped (cf. https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KStreamSessionWindowAggregate.java#L146). I.e., closing a window is just a logical step (not an actual operation); the session is still stored, and once a window is closed no additional event needs to be sent downstream because the final result was already sent downstream with the last update to the window.
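The grace-period check can be sketched on its own, assuming stream time is simply the largest record timestamp this processor has seen and that a session counts as closed once stream time has moved past the session's end plus the grace period (the exact close condition in `KStreamSessionWindowAggregate` may differ between versions). `StreamTimeTracker` and `acceptRecord` are hypothetical names:

```java
// Hypothetical sketch of a stream-time based "is this session closed?" check.
class StreamTimeTracker {
    private final long graceMs;
    // Stream time: the largest record timestamp seen so far; it never moves backwards.
    private long streamTime = Long.MIN_VALUE;

    StreamTimeTracker(long graceMs) {
        this.graceMs = graceMs;
    }

    // recordTs: timestamp of the incoming record; sessionEnd: end of the session it would update.
    // Returns false if that session is already closed, in which case the record is skipped.
    boolean acceptRecord(long recordTs, long sessionEnd) {
        streamTime = Math.max(streamTime, recordTs); // advanced by input data only: deterministic
        boolean closed = sessionEnd + graceMs < streamTime;
        return !closed; // "closing" is only this comparison; nothing is deleted or emitted here
    }

    long streamTime() {
        return streamTime;
    }

    public static void main(String[] args) {
        StreamTimeTracker tracker = new StreamTimeTracker(5);
        System.out.println(tracker.acceptRecord(100, 100)); // advances stream time to 100
        System.out.println(tracker.acceptRecord(95, 96));   // 96 + 5 is not below 100: still open
        System.out.println(tracker.acceptRecord(90, 90));   // 90 + 5 is below 100: closed, skipped
    }
}
```

Because the check only runs when a record for a given key arrives, there is no need to know in advance which other keys (like K2) exist.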
Retention time itself does not need to be handled by the Processor implementation because it's a built-in feature of the SessionStore: internally, the session store maintains so-called "segments" that store sessions for a certain time period. Each time a put() is done, the store checks if old segments can be dropped (based on the timestamp provided by put()). I.e., old sessions are deleted lazily and as bulk deletes (all sessions of a whole segment are deleted at once), as this is more efficient than individual deletes.
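A rough sketch of segment-based retention, assuming a fixed segment interval, entries placed into segments by their session end timestamp, and a `TreeMap` of segments keyed by segment id. `SegmentedStore` and its methods are hypothetical and only mimic the idea described above, not the RocksDB-backed store. Each `put()` advances the store's observed time and drops every segment that lies entirely outside the retention period in one step:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of lazy, segment-wise retention; not Kafka's session store.
class SegmentedStore {
    private final long retentionMs;
    private final long segmentIntervalMs;
    // segment id -> entries stored in that segment (here just "key@endTime" strings)
    private final TreeMap<Long, List<String>> segments = new TreeMap<>();
    private long observedTime = Long.MIN_VALUE;

    SegmentedStore(long retentionMs, long segmentIntervalMs) {
        this.retentionMs = retentionMs;
        this.segmentIntervalMs = segmentIntervalMs;
    }

    private long segmentId(long ts) {
        return Math.floorDiv(ts, segmentIntervalMs);
    }

    void put(String key, long sessionEnd) {
        observedTime = Math.max(observedTime, sessionEnd);
        long minLiveSegment = segmentId(observedTime - retentionMs);
        // Lazy bulk delete: every segment that lies entirely outside retention is dropped at once.
        segments.headMap(minLiveSegment).clear();
        long id = segmentId(sessionEnd);
        if (id < minLiveSegment) {
            return; // the target segment is already expired, so nothing is stored
        }
        segments.computeIfAbsent(id, k -> new ArrayList<>()).add(key + "@" + sessionEnd);
    }

    int segmentCount() {
        return segments.size();
    }

    int entryCount() {
        return segments.values().stream().mapToInt(List::size).sum();
    }

    public static void main(String[] args) {
        SegmentedStore store = new SegmentedStore(100, 50);
        store.put("a", 0);
        store.put("b", 60);
        store.put("c", 260); // drops the segments holding "a" and "b" in one step
        System.out.println(store.segmentCount() + " segment(s), " + store.entryCount() + " entry");
    }
}
```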

Related

Redefine database "transactional" boundary on a spring batch job

Is there a way to redefine the database "transactional" boundary on a spring batch job?
Context:
We have a simple payment processing job that reads x payment records, processes them, and marks the records in the database as processed. Currently, the writer makes a REST API call (to the payment gateway), processes the API response, and marks the records as processed. We're using a chunk-oriented approach, so the updates aren't flushed to the database until the whole chunk has completed. Since basically the whole read/write is within a transaction, we are starting to see excessive database locks and contention. For example, if the API takes a long time to respond (say 30 seconds), the whole application starts to suffer.
We can obviously reduce the timeout for the API call to a smaller value, but that still doesn't solve the issue of the tables potentially being locked for longer than desirable. Ideally, we want to keep the database transaction as short-lived as possible. Our thought is that if the "meat" of what the job does can be done outside of the database transaction, we could get around this issue. So, if the API call happens outside of a database transaction, we can afford for it to take a few more seconds to respond without causing or adding to long lock durations.
Is this the right approach? If not, what would be the recommended way to approach this "simple" job in spring-batch fashion? Are there other batch tools better suited for the task? (if spring-batch is not the right choice).
Open to providing more context if needed.
I don't have a precise answer to all your questions but I will try to give some guidelines.
Since, basically the whole read/write is within a transaction, we are starting to see excessive database locks and contentions. For example, if the API takes a long time to respond (say 30 seconds), the whole application starts to suffer.
Since its inception, the term batch processing or processing data in "batches" is based on the idea that a batch of records is treated as a unit: either all records are processed (whatever the term "process" means) or none of the records is processed. This "all or nothing" semantic is exactly what Spring Batch implements in its chunk-oriented processing model. Achieving such a (powerful) property comes with trade-offs. In your case, you need to make a trade-off between consistency and responsiveness.
We can obviously reduce the timeout for the API call to be a smaller value.. but that still doesn't solve the issue of the tables potentially getting locked for longer than desirable duration.
The chunk size is the most impactful parameter on the transaction behaviour. What you can do is try to reduce the number of records processed within a single transaction and see the result. There is no universally best value; finding it is an empirical process. It will also depend on the responsiveness of the API you are calling during the processing of a chunk.
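The all-or-nothing chunk semantics and the effect of the chunk size can be illustrated without Spring Batch, assuming an in-memory set as the "processed" flag, a `Predicate` as the simulated gateway call, and no skip/retry policy. `ChunkSketch`, `runJob` and `worstCaseTxMillis` are hypothetical names. A chunk's writes are applied only if every call in it succeeds, and when the call is made inside the transaction (as in the question) the time a chunk keeps its transaction open grows with chunk size times call latency:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration of chunk-oriented processing; not the Spring Batch API.
class ChunkSketch {
    // Processes ids chunk by chunk; a chunk's writes are applied only if every call in it succeeds.
    static Set<Integer> runJob(List<Integer> ids, int chunkSize, Predicate<Integer> callGateway) {
        Set<Integer> processed = new LinkedHashSet<>(); // stand-in for the "processed" flag in the table
        for (int from = 0; from < ids.size(); from += chunkSize) {
            List<Integer> chunk = ids.subList(from, Math.min(from + chunkSize, ids.size()));
            List<Integer> pending = new ArrayList<>(); // writes buffered until the chunk "commits"
            boolean ok = true;
            for (Integer id : chunk) {
                if (!callGateway.test(id)) {
                    ok = false; // in a real job the chunk's transaction would be rolled back here
                    break;
                }
                pending.add(id);
            }
            if (ok) {
                processed.addAll(pending); // commit: all records of the chunk at once
            }
        }
        return processed;
    }

    // Rough upper bound on how long one chunk keeps its transaction (and its locks) open
    // when the API call is made inside the transaction.
    static long worstCaseTxMillis(int chunkSize, long apiLatencyMillis) {
        return chunkSize * apiLatencyMillis;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5, 6);
        // Record 5 fails: with a chunk size of 2 only the chunk [5, 6] is not marked as processed.
        System.out.println(runJob(ids, 2, id -> id != 5));
        // With a 30 s gateway call, a chunk of 10 could keep its transaction open for about 300 s.
        System.out.println(worstCaseTxMillis(10, 30_000));
    }
}
```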
Our thought is that if the "meat" of what the job does can be done outside of the database transaction, we could get around this issue. So, if the API call happens outside of a database transaction.. we can afford it to take a few more seconds to accept the response and not cause/add to the long lock duration.
A common technique to avoid doing such updates on a live system is to offload the processing to another datastore and then replicate the updates in a single transaction. The idea is to mark records with a given batch id and copy those records to a different datastore (or even a temporary table within the same datastore) that the batch process can use without impacting the live datastore. Once the processing is done (which could be done in parallel to improve performance), records can be marked as processed in the live system within a single transaction (this is usually very fast and could use the batch id to identify which records to update).
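The offloading technique can be sketched with in-memory structures, assuming a map as the live table, a list of ids as the staging copy, and `synchronized` methods standing in for the two short transactions. `OffloadSketch`, `claim`, `processOutsideTx` and `finishBatch` are hypothetical names, and releasing failed records for a later run is an added assumption. The slow gateway calls run against the staging copy with no lock held:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of "claim by batch id, process outside the transaction, mark in one step".
class OffloadSketch {
    static final class Payment {
        final int id;
        String batchId;    // set when a run claims the record
        boolean processed;

        Payment(int id) {
            this.id = id;
        }
    }

    // Stand-in for the live table; synchronized methods stand in for short transactions.
    private final Map<Integer, Payment> live = new LinkedHashMap<>();

    OffloadSketch(int... ids) {
        for (int id : ids) {
            live.put(id, new Payment(id));
        }
    }

    // Short transaction 1: tag unclaimed, unprocessed records with the batch id and copy their ids.
    synchronized List<Integer> claim(String batchId) {
        List<Integer> staging = new ArrayList<>();
        for (Payment p : live.values()) {
            if (!p.processed && p.batchId == null) {
                p.batchId = batchId;
                staging.add(p.id);
            }
        }
        return staging;
    }

    // No lock held: the slow gateway calls work on the staging copy only.
    static List<Integer> processOutsideTx(List<Integer> staging, Predicate<Integer> callGateway) {
        List<Integer> succeeded = new ArrayList<>();
        for (Integer id : staging) {
            if (callGateway.test(id)) {
                succeeded.add(id);
            }
        }
        return succeeded;
    }

    // Short transaction 2: find this batch's records by batch id and mark the successful ones.
    synchronized void finishBatch(String batchId, List<Integer> succeeded) {
        for (Payment p : live.values()) {
            if (batchId.equals(p.batchId)) {
                if (succeeded.contains(p.id)) {
                    p.processed = true;
                } else {
                    p.batchId = null; // assumption: failed records are released for a later run
                }
            }
        }
    }

    synchronized boolean isProcessed(int id) {
        return live.get(id).processed;
    }

    public static void main(String[] args) {
        OffloadSketch table = new OffloadSketch(1, 2, 3);
        List<Integer> staged = table.claim("batch-1");
        List<Integer> ok = processOutsideTx(staged, id -> id != 2); // pretend record 2 fails
        table.finishBatch("batch-1", ok);
        System.out.println(table.isProcessed(1) + " " + table.isProcessed(2));
    }
}
```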

Intentionally drop state when using suppress for rate limiting updates to KTable

I am using Kafka Streams 2.3.1 suppress() operator to limit the number of updates being sent to the underlying KTable.
The use case here is that in my processing logic I want to make an HTTP call; however, to limit the number of calls, I am windowing the stream and aggregating source topic messages that fall into the same time window to make a single API call.
Code looks roughly as follows:
KTable<Windowed<String>, List<Event>> windowedEventKTable = inputKStream
.groupByKey()
.windowedBy(TimeWindows.of(Duration.ofSeconds(30)).grace(Duration.ofSeconds(5)))
.aggregate(Aggregator::new, ((key, value, aggregate) -> aggregate.aggregate(value)), stateStore)
.suppress(Suppressed.untilTimeLimit(Duration.ofSeconds(5), maxRecords(500).emitEarlyWhenFull()))
.mapValues((windowedKey, groupedTriggerAggregator) -> {
    // code here returning a list
})
.toStream((k,v) -> k.key())
.flatMapValues((readOnlyKey, value) -> value);
The problem I am running into is that while the windows exceeding the record limit are emitted, the state is preserved. At some point the state for a single time window grows to multiple MB, causing the suppress store's changelog message to exceed the topic's max.message.bytes limit. For our use case, as soon as a window is emitted we actually don't care about the leftover state, and it would be safe to drop it.
As we share the Kafka cluster between multiple teams, the team running the cluster is hesitant to increase the cluster-level max.message.bytes property beyond the 10 MB that we require.
Do I have any options other than implementing my logic using transformValues? If not, are there any future Kafka Streams enhancements that would be able to handle this more out of the box?
For our use case, as soon as window is emitted we actually don't care about leftover state and it would be safe to drop it.
For this case, you can set the store retention time (default is 1 day) to the smallest value Kafka Streams accepts for a windowed aggregation, which is the window size plus the grace period, via the aggregate() parameter Materialized.withRetention(...).
The problem I am running into is that while the windows exceeding the record limit are emitted, the state is preserved. At some point the state for a single time window grows into multiple MB's, causing the supress store changelog message to exceed the topic's max.message.bytes limit.
This is actually an interesting statement, and looking at your code, I want to clarify something: as you limit by time and allow emitting early based on buffer size, it seems that you have a lot of out-of-order records that update the state further even after an intermediate result was emitted. If you purge the state via retention time as described above, you need to consider the following:
Purging state won't affect any emits that are triggered based on buffer size, because the state will only be purged after the retention time has passed.
Furthermore, purging state implies that all out-of-order records that appear after purging would not be processed at all but would be dropped (because the retention time implicitly marks input records with smaller timestamps as "late").
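To make these two points concrete, here is a small simulation with tumbling windows, assuming stream time is the maximum timestamp seen, a window's state is purged once its end falls at or before stream time minus retention, and a record whose window is already past that cutoff is dropped. `RetentionSketch` and its methods are hypothetical, and the exact Kafka Streams conditions may differ:

```java
import java.util.TreeMap;

// Hypothetical simulation of "purged state implies dropped late records"; not Kafka's store.
class RetentionSketch {
    private final long windowSizeMs;
    private final long retentionMs;
    // window start -> number of records aggregated into that window
    private final TreeMap<Long, Integer> windows = new TreeMap<>();
    private long streamTime = Long.MIN_VALUE;

    RetentionSketch(long windowSizeMs, long retentionMs) {
        this.windowSizeMs = windowSizeMs;
        this.retentionMs = retentionMs;
    }

    // Returns false if the record is dropped because its window is already outside retention.
    boolean add(long ts) {
        streamTime = Math.max(streamTime, ts);
        long cutoff = streamTime - retentionMs;
        // Purge: windows that ended at or before the cutoff lose their state.
        windows.headMap(cutoff - windowSizeMs, true).clear();
        long windowStart = Math.floorDiv(ts, windowSizeMs) * windowSizeMs;
        if (windowStart + windowSizeMs <= cutoff) {
            return false; // treated as "late": the window's state is gone, so the record is ignored
        }
        windows.merge(windowStart, 1, Integer::sum);
        return true;
    }

    int windowCount() {
        return windows.size();
    }

    public static void main(String[] args) {
        RetentionSketch sketch = new RetentionSketch(10, 35); // 10 ms windows, 35 ms retention
        sketch.add(0);
        sketch.add(100); // purges the window starting at 0
        System.out.println(sketch.add(50)); // window [50, 60) is past retention: dropped
    }
}
```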
However, overall it seems that you don't really care about out-of-order data and event-time windows, as it's ok for you to "arbitrarily" put records into a window; the only goal is to reduce the number of external API calls. Hence, it seems appropriate to switch to processing-time semantics by using the WallclockTimestampExtractor (instead of the default extractor). To ensure that each record is only emitted once, you should change the suppress() configuration to only emit "final" results (Suppressed.untilWindowCloses(...)).
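The underlying goal (one API call per key per time slice, then forget the state) can also be sketched outside Kafka Streams with a wall-clock tumbling buffer, assuming an injectable millisecond clock and a caller that polls `flushClosed()` periodically, for example from a punctuation. `WallClockBatcher` and its methods are hypothetical. Each window is emitted once when it closes and its state is discarded immediately, which is similar in spirit to processing-time windows combined with final-result suppression:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical processing-time batcher; not a Kafka Streams API.
class WallClockBatcher<E> {
    private final long windowMs;
    private final LongSupplier clock; // current wall-clock time in ms (injectable for testing)
    // window start -> key -> events buffered for that window
    private final Map<Long, Map<String, List<E>>> open = new HashMap<>();

    WallClockBatcher(long windowMs, LongSupplier clock) {
        this.windowMs = windowMs;
        this.clock = clock;
    }

    // Processing-time semantics: the window is chosen by "now", not by the event's own timestamp.
    void add(String key, E event) {
        long windowStart = Math.floorDiv(clock.getAsLong(), windowMs) * windowMs;
        open.computeIfAbsent(windowStart, w -> new HashMap<>())
            .computeIfAbsent(key, k -> new ArrayList<>())
            .add(event);
    }

    // Emits every closed window once (events of the same key are combined into one batch)
    // and discards the window's state right afterwards.
    Map<String, List<E>> flushClosed() {
        long now = clock.getAsLong();
        Map<String, List<E>> batches = new HashMap<>();
        Iterator<Map.Entry<Long, Map<String, List<E>>>> it = open.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Map<String, List<E>>> window = it.next();
            if (window.getKey() + windowMs <= now) {
                window.getValue().forEach((key, events) ->
                    batches.computeIfAbsent(key, k -> new ArrayList<>()).addAll(events));
                it.remove();
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        long[] now = {0};
        WallClockBatcher<String> batcher = new WallClockBatcher<>(30_000, () -> now[0]);
        batcher.add("k", "e1");
        batcher.add("k", "e2");
        now[0] = 30_000; // the first 30 s window has closed
        System.out.println(batcher.flushClosed()); // one batch for "k": one API call instead of two
    }
}
```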

Which guarantees does Kafka Stream provide when using a RocksDb state store with changelog?

I'm building a Kafka Streams application that generates change events by comparing every new calculated object with the last known object.
So for every message on the input topic, I update an object in a state store and every once in a while (using punctuate), I apply a calculation on this object and compare the result with the previous calculation result (coming from another state store).
To make sure this operation is consistent, I do the following after the punctuate triggers:
write a tuple to the state store
compare the two values, create change events and context.forward them, so the events go to the results topic.
replace the tuple with the new_value and write it to the state store
I use this tuple for scenarios where the application crashes or rebalances, so I can always send out the correct set of events before continuing.
Now, I noticed the resulting events are not always consistent, especially if the application frequently rebalances. It looks like in rare cases the Kafka Streams application emits events to the results topic, but the changelog topic is not up to date yet. In other words, I produced something to the results topic, but my changelog topic is not at the same state yet.
So, when I do a stateStore.put() and the method call returns successfully, are there any guarantees about when it will be on the changelog topic?
Can I enforce a changelog flush? When I do context.commit(), when will that flush+commit happen?
To get complete consistency, you will need to enable processing.guarantee="exactly_once"; otherwise, in case of an error, you might get inconsistent results.
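For reference, a minimal configuration sketch using plain `java.util.Properties` with the string keys (the key is the same as `StreamsConfig.PROCESSING_GUARANTEE_CONFIG`). The application id and broker address are hypothetical placeholders. `exactly_once_v2` is the value for Kafka Streams 3.0 and later (it requires brokers on 2.5 or newer); on older versions the value is `exactly_once`:

```java
import java.util.Properties;

// Hypothetical configuration sketch; pass these properties to the KafkaStreams constructor.
class EosConfigSketch {
    static Properties streamsProps() {
        Properties props = new Properties();
        props.put("application.id", "change-events-app"); // hypothetical application id
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        // Default is "at_least_once"; use "exactly_once" instead on Kafka Streams versions before 3.0.
        props.put("processing.guarantee", "exactly_once_v2");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsProps().getProperty("processing.guarantee"));
    }
}
```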
If you want to stay with "at_least_once", you might want to use a single store and update the store after processing is done (i.e., after calling forward()). This minimizes the time window in which inconsistencies can occur.
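A minimal sketch of this single-store ordering, assuming the store is a plain `Map` keyed by object id, a `Consumer` stands in for `context.forward(...)`, and the new calculation result is passed in by the caller. `ChangeEmitter` and `onPunctuate` are hypothetical names. Because the store is only updated after forwarding, a failure in between leaves the old value in place, so the next punctuation recomputes and re-emits the same change (a duplicate under at-least-once) instead of silently losing it:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical sketch of "forward first, then update the single store".
class ChangeEmitter {
    // Single store: object id -> last result that was emitted downstream.
    final Map<String, String> lastEmitted = new HashMap<>();

    void onPunctuate(String id, String newResult, Consumer<String> forward) {
        String previous = lastEmitted.get(id);
        if (Objects.equals(previous, newResult)) {
            return; // nothing changed, nothing to emit
        }
        forward.accept(id + ": " + previous + " -> " + newResult); // 1) emit the change event
        lastEmitted.put(id, newResult);                             // 2) only then update the store
    }

    public static void main(String[] args) {
        ChangeEmitter emitter = new ChangeEmitter();
        emitter.onPunctuate("obj-1", "v1", System.out::println);
        emitter.onPunctuate("obj-1", "v1", System.out::println); // unchanged: nothing emitted
        emitter.onPunctuate("obj-1", "v2", System.out::println);
    }
}
```

In the test below, a crash is simulated by an exception thrown from the forward step; in a real application the store content would instead be restored from the changelog.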
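And yes, if you call context.commit(), all stores will be flushed to disk and all pending producer writes will also be flushed before the input topic offsets are committed. Note that context.commit() only requests a commit: when the call returns, nothing has been committed yet, but Kafka Streams will commit as soon as possible instead of waiting for commit.interval.ms to pass.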

Consistent N1QL Query Couchbase GOCB sdk

I'm currently implementing EventSourcing for my Go Actor lib.
The problem I have right now is that when an actor restarts and needs to replay all its state from the event journal, the query might return inconsistent data.
I know that I can solve this using MutationToken.
But if I do that, I would be forced to write all events in sequential order, that is, write the last event last.
That way the mutation token for the last event would be enough to get all the data consistently for the specific actor.
This is, however, very slow: writing about 10 000 events in order takes about 5 sec on my setup.
If I instead write those 10 000 asynchronously, using goroutines, I can write all of the data in less than one second.
But then the writes happen in nondeterministic order, and I can't know which mutation token I can trust.
E.g., event 999 might be written before event 843 due to goroutine scheduling, AFAIK.
What are my options here?
Technically speaking, MutationToken and asynchronous operations are not mutually exclusive. It may be possible to do this without a change to the client (I'm not sure), but the key here is to collect all MutationToken responses and then issue the query with all of them, using the highest sequence number per vbucket.
The key here is that, given a single MutationToken, you can add the others to it. I don't directly see a way to do this, but since internally it's just a map, it should be relatively straightforward, and I'm sure we (Couchbase) would accept a contribution that does this. At the lowest level, it's just a map of vbucket sequence numbers that is provided to the query service at the time the query is issued.
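The merge can be sketched independently of any SDK (in Java rather than Go), assuming a token is just a vbucket id, a vbucket UUID, and a sequence number, and that no failover changed a vbucket's UUID while the writes were in flight. `TokenMerge`, `Token` and `mergeTokens` are hypothetical and not the gocb API. For each vbucket it keeps the token with the highest sequence number, so the writes can complete in any order and the combined state still covers all of them:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of combining mutation tokens per vbucket; not an SDK API.
class TokenMerge {
    record Token(int vbucketId, long vbucketUuid, long seqno) {}

    // vbucket id -> highest token seen for it (assumes one UUID per vbucket, i.e. no failover).
    static Map<Integer, Token> mergeTokens(List<Token> tokens) {
        Map<Integer, Token> state = new TreeMap<>();
        for (Token t : tokens) {
            state.merge(t.vbucketId(), t, (a, b) -> a.seqno() >= b.seqno() ? a : b);
        }
        return state;
    }

    public static void main(String[] args) {
        // Async writes finished in arbitrary order: the later seqno on vbucket 7 arrived first.
        List<Token> tokens = List.of(new Token(7, 1L, 999), new Token(7, 1L, 843), new Token(3, 2L, 10));
        System.out.println(mergeTokens(tokens));
    }
}
```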

How do I set up a lock that will automatically time out if it does not get a keep-alive signal?

I have a certain resource I want to limit access to. Basically, I am using a session-level lock. However, it is getting to be a pain writing JavaScript that covers every possible way a window can close.
Once the user leaves that page, I would like to unlock the resource.
My basic idea is to use some sort of server-side timeout to unlock the resource. Basically, if I fail to unlock the resource, I want a timer to kick in and unlock it.
For example, after 30 seconds with no update from the client side, unlock the resource.
My basic question is: what sort of server-side trick can I use to do this? It is my understanding that I can't just create a thread in JSF, because it would be unmanaged.
I am sure other people do this kind of thing, what is the correct thing to use?
Thanks,
Grae
As BalusC rightfully asked, the big question is at what level of granularity you would like to do this locking. Per logged-in user, for all users, or could you perhaps get away with locking per request?
Or, and this will be a tougher one, is the idea that a single page request grabs the lock and then that specific page is intended to keep the lock between requests? E.g. as a kind of reservation. I'm browsing a hotel page, and when I merely look at a room I have made an implicit reservation in the system for that room so it can't happen that somebody else reserves the room for real while I'm looking at it?
In the latter case, maybe the following scheme would work:
In application scope, define a global concurrent map.
Keys of the map represent the resources you want to protect.
Values of the map are a custom structure which holds a read-write lock (e.g. ReentrantReadWriteLock), a token and a timestamp.
In application scope, there also is a single global lock (e.g. ReentrantLock)
Code in a request first grabs the global lock, and quickly checks if the entry in the map is there.
If the entry is there it is taken, otherwise it's created. Creation time should be very short. The global lock is quickly released.
If the entry was new, it's locked via its write lock and a new token and timestamp are created.
If the entry was not new, it's locked via its read lock.
If the code has the same token, it can go ahead and access the protected resource; otherwise, it checks the timestamp.
If the timestamp has expired, it tries to grab the write lock.
The write lock has a time-out. When the time-out occurs, give up and communicate something to the client. Otherwise, a new token and timestamp are created.
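The scheme above can be sketched in plain Java SE, assuming the 30-second lease from the question, an injectable millisecond clock, a hypothetical 500 ms write-lock timeout, and that the client's keep-alive is simply another `acquire` call carrying its token. `LeaseRegistry`, `acquire` and `release` are hypothetical names, and new and expired entries share one write-lock path. A `ReentrantReadWriteLock` read lock cannot be upgraded to the write lock, so the read lock is released before `tryLock` is attempted and the state is re-checked afterwards:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.LongSupplier;

// Hypothetical sketch of token-based leases with a timestamp; not a container-provided facility.
class LeaseRegistry {
    static final class Entry {
        final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        String token;   // token of the current holder, null if free
        long timestamp; // time of the holder's last access or keep-alive (ms)
    }

    private static final long WRITE_LOCK_TIMEOUT_MS = 500; // hypothetical value
    private final Map<String, Entry> entries = new ConcurrentHashMap<>(); // application scope
    private final ReentrantLock global = new ReentrantLock();
    private final long leaseMs;
    private final LongSupplier clock;

    LeaseRegistry(long leaseMs, LongSupplier clock) {
        this.leaseMs = leaseMs;
        this.clock = clock;
    }

    // Returns the caller's token if it may use the resource (a repeated call acts as keep-alive),
    // or null if another holder's lease is still valid or the write lock timed out.
    String acquire(String resource, String callerToken) {
        Entry entry;
        global.lock(); // held only for the quick lookup or creation
        try {
            entry = entries.get(resource);
            if (entry == null) {
                entry = new Entry();
                entries.put(resource, entry);
            }
        } finally {
            global.unlock();
        }
        entry.rw.readLock().lock();
        try {
            if (!isMine(entry, callerToken) && !isExpired(entry)) {
                return null; // someone else holds a live lease
            }
        } finally {
            entry.rw.readLock().unlock(); // a read lock cannot be upgraded, so release it first
        }
        try {
            if (!entry.rw.writeLock().tryLock(WRITE_LOCK_TIMEOUT_MS, TimeUnit.MILLISECONDS)) {
                return null; // give up and tell the client the resource is busy
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        try {
            // Re-check: another request may have taken over while no lock was held.
            if (!isMine(entry, callerToken)) {
                if (!isExpired(entry)) {
                    return null;
                }
                entry.token = UUID.randomUUID().toString(); // new or expired entry: new lease
            }
            entry.timestamp = clock.getAsLong();
            return entry.token;
        } finally {
            entry.rw.writeLock().unlock();
        }
    }

    void release(String resource, String token) {
        Entry entry = entries.get(resource);
        if (entry == null) {
            return;
        }
        entry.rw.writeLock().lock();
        try {
            if (isMine(entry, token)) {
                entry.token = null;
            }
        } finally {
            entry.rw.writeLock().unlock();
        }
    }

    private static boolean isMine(Entry entry, String token) {
        return token != null && token.equals(entry.token);
    }

    private boolean isExpired(Entry entry) {
        return entry.token == null || clock.getAsLong() - entry.timestamp > leaseMs;
    }
}
```

A page would call `acquire` on load and on every keep-alive request, and `release` when it is left cleanly; if the release never arrives, the lease simply expires.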
This is just the general idea. In a Java EE application that I have built, I have used something similar (though not exactly the same), and it worked quite well.
Alternatively, you could use a Quartz job anyway that periodically removes the stale entries. Yet another alternative is replacing the global concurrent map with, e.g., a JBoss Cache or Infinispan instance. These allow you to define an eviction policy for their entries, which saves you from having to code this yourself. If you have never used those caches, though, learning how to set them up and configure them correctly can be more trouble than just building a simple Quartz job yourself.
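The periodic clean-up can be sketched with the JDK's `ScheduledExecutorService` in place of Quartz (a swapped-in technique; inside a Java EE container a managed scheduler or an EJB `@Schedule` timer would be the usual choice instead of an unmanaged executor). `StaleEntrySweeper`, `keepAlive` and `sweep` are hypothetical names, and the map simply records each resource's last keep-alive time:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical periodic sweeper for stale lock entries; Quartz or a container timer could replace it.
class StaleEntrySweeper {
    // resource -> time of its last keep-alive in ms (application scope)
    final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final long leaseMs;
    private final LongSupplier clock;

    StaleEntrySweeper(long leaseMs, LongSupplier clock) {
        this.leaseMs = leaseMs;
        this.clock = clock;
    }

    void keepAlive(String resource) {
        lastSeen.put(resource, clock.getAsLong());
    }

    // Removes entries whose last keep-alive is older than the lease; returns how many were removed.
    int sweep() {
        long now = clock.getAsLong();
        int removed = 0;
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            // Conditional remove: an entry renewed after it was read is left alone.
            if (now - e.getValue() > leaseMs && lastSeen.remove(e.getKey(), e.getValue())) {
                removed++;
            }
        }
        return removed;
    }

    // Runs sweep() every periodMs on one background thread; the caller must shut it down.
    ScheduledExecutorService start(long periodMs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::sweep, periodMs, periodMs, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        StaleEntrySweeper sweeper = new StaleEntrySweeper(30_000, System::currentTimeMillis);
        sweeper.keepAlive("room-42");
        ScheduledExecutorService scheduler = sweeper.start(5_000);
        scheduler.shutdown(); // in an application this would run until shutdown
        System.out.println(sweeper.lastSeen.keySet());
    }
}
```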
