How to cache a reactive publisher like Observable? - caching

I'm trying to understand what happens if I cache the result of a method returning a cold Observable? The flow has not been materialized yet, so what does the cache actually contain? I tried to find out using Hazelcast and Spring Boot but couldn't get the cache working.
Edit:
When I say cache not working, I am speaking based on what I see from Hazelcast Management Center. Depending on the cache config (I tried many things), either the cache shows up but no entries, or the cache doesn't show up at all.
Example:
#javax.cache.annotation.CacheResult
Observable<Integer> random() {
// Do I get a new number every time?
return Observable.just(new Random().nextInt());
}

From rx-java wiki (source here):
A cold Observable emits a particular sequence of items, but can begin
emitting this sequence when its Observer finds it to be convenient,
and at whatever rate the Observer desires, without disrupting the
integrity of the sequence. For example if you convert a static
Iterable into an Observable, that Observable will emit the same
sequence of items no matter when it is later subscribed to or how
frequently those items are observed. Examples of items emitted by a
cold Observable might include the results of a database query, file
retrieval, or web request.
With cold Observable, like your example, the request is done at subscribe time, for each subscriber. Even without cache, if you subscribe twice to the same Observable, the request will occur twice. The Observable is not bound to a specific stream. Observable is just a contract describing how to access data.
Caching the result of a method returning an Observable is I think somewhat similar to storing the result to a local property; you just avoid to recreate the Observable object again later. But just the 'getter', not the data.
rx-java give some tools to achieve caching by it's own. You could have a look to Subject or ConnectableObservable.

Related

How to set up a pinia store to wait for the state

I have a (composition) store that is used in s couple of modules, they are unaware of each other and can be initialized in the same moment as well as at different times.
The store retrieves data from an API and saves it in the state. All of the modules await the execute store's fetchData method, all of them await a promise it returns.
The problem is that when the modules are initialized at the same moment, both perform fetchData resulting in two requests fired. The ideal situation would be to
Allow only one request being fired (that's pretty easy to do, just save the request state in a ref, f.e. "pending")
Export the state as a promise, so that the module awaits the state to be loaded from an API, receiving it's data only thereafter.
Those with a requirement that the modules are unaware of the store implementation - the less code on their side the better.
How do you handle such cases?
Is a there anything wrong with this approach? If not - how to implement it?

API waiting for a specific record on DynamoDb without pooling

I am inheriting a workflow that has a reasonable amount of data stored in DynamoDb. The data is periodically refreshed by Lambdas calling third parties when needed. The lambdas are triggered by both SQS and DynamoDB streams and go through four or five steps before the data is updated.
I'm given the task to write an API that can forcibly update N items and return their status. The obvious way to do this without reinventing the wheel and honoring DRY is to trigger an event that spawns off a refresh for each item so that the lambdas can do their thing.
The trouble is that I'm not sure the best pub/sub approach to handle being notified that end state of each workflow is met. Do I read from an update/insert stream of dynamodb to see if the records are updated? Do I create some sort of pub/sub model like Reddis or SNS to listen for the end state of each lambda being triggered?
Since I'm writing a REST API, timeouts, if there are failures along the line, arefine. But at the same time I want to make sure I can handle the following.
Be guaranteed that I can be notified that an update occurred for my targets after my call (in the case of multiple forced updates being called at once I only care about the first one to arrive).
Not be bogged down by listening for updates for record updates that are not contextually relevant to the API call in question.
Have an amortized time complexity of 1
In other words, in terms of cap theory i care about C & A but not P (because a 502 isn't that big a deal). But getting the timing wrong or missing a subscription is a problem.
I know I can just listen to a dynamodb event stream but I'm concerned that when things get noisy there will be more irrelevant stuff slowing me down. And I'm not sure if having every single record getting it's own topic is scalable (or how messy that would be).
You can use DynamoDB streams in combination with Lambda Event Filtering so the Lambda function only executes for the relevant change you are interested in. More information is available here:
https://aws.amazon.com/about-aws/whats-new/2021/11/aws-lambda-event-filtering-amazon-sqs-dynamodb-kinesis-sources/

How many "temperatures" are there for a Rx Observable?

All over the Rx.Net literature there are references to what is commonly know as the temperature of an observable.
There are cold observables (like the ones created by Observable.Interval() and similar factory methods), which will create side effects every time that a new Subscription is created.
On the other side of the spectrum there are hot observables (like Subject<T>) which will onboard new subscriptions as they come.
There are also warm observables, like the ones returned by RefCount() which will execute the initialisation every time one subscription is created, but only if there was no other active subscription. The behaviour of these warm observables is explained here by Dave Sexton:
Alternatively, you can call Publish then RefCount to get an IObservable that is shared among multiple consecutive observers. Note that this isn't truly a hot observable - it's more like a warm observable. RefCount makes a single subscription to the underlying observable while there's at least one observer of your query. When your query has no more observers, changing the reference count to 0, the underlying subscription is disposed. If another observer subscribes to your query later, moving the reference count from 0 to 1 again, then RefCount makes a new subscription to the underlying observable, causing subscription side-effects to occur again.
Are there any other temperatures that one should be aware of? Is it possible to obtain programmatically the temperature of an Observable?
Easy question first:
Is it possible to obtain programmatically the temperature of an Observable?
No. Best you can do is subscribe and see what happens.
The observable 'contract' specifies that when you subscribe to an observable you get zero or more OnNext messages, optionally followed by either one OnCompleted or one OnError message. The contract doesn't specify anything about how multiple or earlier/later subscribers are treated, which is what observable 'temperature' is mostly concerned with.
Are there any other temperatures that one should be aware of?
I wouldn't even think of it in such concrete or discrete terms as you have specified.
I think of it in terms of on-subscribe effects: The coldest of observables have all their effects happen on subscribe (like Observable.Return(42)). The hottest of observables have no effects happening on subscribe (new Subject<int>()). In between those two poles is a continuum.
Observable.Interval(TimeSpan.FromMilliseconds(100)) for example will emit a new number every 100 milliseconds. That example, unlike Observable.Return(42), could be mostly 'warmed-over' via .Publish().RefCount(): The first subscriber starts the numbers, but the second subscriber will see the only the latest numbers, not starting from 0. However, if instead of .Publish() you did .Replay(2).RefCount(), then you have some on-subscribe effects going on. Do the Publish and Replay observables have the same 'temperature'?
TL;DR: Don't focus on the classifications that much. Understand the difference between the two and know that some observables have colder properties and some have warmer ones.

Which guarantees does Kafka Stream provide when using a RocksDb state store with changelog?

I'm building a Kafka Streams application that generates change events by comparing every new calculated object with the last known object.
So for every message on the input topic, I update an object in a state store and every once in a while (using punctuate), I apply a calculation on this object and compare the result with the previous calculation result (coming from another state store).
To make sure this operation is consistent, I do the following after the punctuate triggers:
write a tuple to the state store
compare the two values, create change events and context.forward them. So the events go to the results topic.
swap the tuple by the new_value and write it to the state store
I use this tuple for scenario's where the application crashes or rebalances, so I can always send out the correct set of events before continuing.
Now, I noticed the resulting events are not always consistent, especially if the application frequently rebalances. It looks like in rare cases the Kafka Streams application emits events to the results topic, but the changelog topic is not up to date yet. In other words, I produced something to the results topic, but my changelog topic is not at the same state yet.
So, when I do a stateStore.put() and the method call returns successfully, are there any guarantees when it will be on the changelog topic?
Can I enforce a changelog flush? When I do context.commit(), when will that flush+commit happen?
To get complete consistency, you will need to enable processing.guarantee="exaclty_once" -- otherwise, with a potential error, you might get inconsistent results.
If you want to stay with "at_least_once", you might want to use a single store, and update the store after processing is done (ie, after calling forward()). This minimized the time window to get inconsistencies.
And yes, if you call context.commit(), before input topic offsets are committed, all stores will be flushed to disk, and all pending producer writes will also be flushed.

Asynchronous data loading in flux stores

Say I have a TodoStore. The TodoStore is responsible for keeping my TODO items. Todo items are stored in a database.
I want to know what is the recommended way for loading all todo items into the store and how the views should interact with the store to load the TODO items on startup.
The first alternative is to create a loadTodos action that will retrieve the Todos from the database and emit a TODOS_LOADED event. Views will then call the loadTodos action and then listen to the TODOS_LOADED event and then update themselves by calling TodoStore.getTodos().
Another alternative is to not have a loadTodos action, and have a TodoStore.getTodos() that will return a promise with the existing TODO items. If the TodoStore has already loaded the TODO items, it just returns them; if not, then it will query from the database and return the retrieved items. In this case, even though the store now has loaded the TODO items, it will not emit a TODOS_LOADED event, since getTodos isn't an action.
function getTodos() {
if (loaded)
return Promise.resolve($todoItems);
else
return fetchTodoItemsFromDatabase().then(todoItems) {
loaded = true;
$todoItems = todoItems;
return $todoItems;
});
}
I'm sure many will say that that breaks the Flux architecture because the getTodos function is changing the store state, and store state should only be changed though actions sent in from the dispatcher.
However, if you consider that state for the TodoStore is the existing TODO items in the database, then getTodos isn't really changing any state. The TODO items are exactly the same, hence no view need to be updated or notified. The only thing is that now the store has already retrieved the data, so it is now cached in the store. From the View's perspective, it shouldn't really care about how the Store is implemented. It shouldn't really care if the store still needs to retrieve data from the database or not. All views care about is that they can use the Store to get the TODO items and that the Store will notify them when new TODO items are created, deleted, or changed.
Hence, in this scenario, views should just call TodoStore.getTodos() to render themselves on load, and register an event handler on TODO_CHANGE to be notified when they need to update themselves due to a addition, deletion, or change.
What do you think about these two solutions. Are they any other solutions?
The views do not have to be the entities that call loadTodos(). This can happen in a bootstrap file.
You're correct that you should try your best to restrict the data flow to actions inside the dispatch payload. Sometimes you need to derive data based on the state of other stores, and this is what Dispatcher.waitFor() is for.
What is Flux-like about your fetchTodoItemsFromDatabase() solution is that no other entity is setting data on the store. The store is updating itself. This is good.
My only serious criticism of this solution is that it could result in a delay in rendering if you are actually getting the initial data from the server. Ideally, you would send down some data with the HTML. You would also want to make sure to call for the stores' data within your controller-views' getInitialState() method.
Here is my opinion about that, very close to yours.
I maintain the state of my application in Store via Immutable.Record and Immutable.OrderedMap from Immutable.js
I have a top controller-view component that get its state from the Store.
Something such as the following :
function getInitialState() {
return {
todos: TodoStore.getAll()
}
}
TodoStore.getAll methods will retrieve the data from the server via a APIUtils.getTodos() request if it's internal _todos map is empty. I advocate for read data triggered in Store and write data triggered in ActionCreators.
By the time the request is processing, my component will render a simple loading spinner or something like that
When the request resolves, APIUtils trigger an action such as TODO_LIST_RECEIVE_SUCCESS or TODO_LIVE_RECEIVE_FAIL depending on the status of the response
My TodoStore will responds to these action by updating its internal state (populating it's internal Immutable.OrderedMap with Immutable.Record created from action payloads.
If you want to see an example through a basic implementation, take a look to this answer about React/Flux and xhr/routing/caching .
I know it's been a couple of years since this was asked, but it perfectly summed up the questions I am struggling with this week. So to help any others that may come across this question, I found this blog post that really helped me out by Nick Klepinger: "ngrx and Tour of Heroes".
It is specifically using Angular 2 and #ngrx/store, but answers your question very well.

Resources