Strategies for handling and invalidating cached data with subscriptions in a moderately complex use case - caching

Let's take a chat application for example.
A user has access to multiple chat threads, each with multiple messages.
The user interface consists of a list of threads on the left side (listThreads), showing the other party's name, the last message, the unread message count, and the date and time of the last message, plus the actual messages (viewThread) and a reply box on the right-hand side (think Facebook Messenger).
When the user selects a message thread, the viewThread component subscribes to a query along the lines of:
query thread {
  threads(id: 'xxxx') {
    id
    other_party { id name }
    unread_messages
    messages {
      sent_by { id }
      timestamp
      text
    }
  }
}
To make updates live, it calls q.subscribeToMore with a subscription along the lines of:
subscription newMessages {
  newMessage(thread_id: 'xxx') {
    sent_by { id }
    timestamp
    text
  }
}
This works perfectly; the new messages show up as they should.
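The merge step that subscribeToMore's updateQuery callback performs can be sketched as a pure function (types are illustrative, mirroring the query shape above; this is not apollo-angular's exact callback signature):

```typescript
// Types are illustrative, mirroring the thread query shape above.
interface Message {
  sent_by: { id: string };
  timestamp: number;
  text: string;
}
interface ThreadData {
  threads: { id: string; unread_messages: number; messages: Message[] };
}

// The reducer a subscribeToMore({ document, updateQuery }) call would use:
// append the pushed message to the cached thread without mutating the
// previous result.
function mergeNewMessage(prev: ThreadData, newMessage: Message): ThreadData {
  return {
    threads: {
      ...prev.threads,
      messages: [...prev.threads.messages, newMessage],
    },
  };
}
```

Returning a fresh object (rather than mutating `prev`) is what lets Apollo detect the change and re-emit on the observable.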
To list the available message threads, a less detailed view of all threads is queried:
query listThreads {
  threads {
    id
    other_party { id name }
    unread_messages
    last_updated_at
  }
}
To keep the list in sync, the same subscription is used without filtering on the thread_id, and the thread list data is updated manually.
This also works fine.
However, once thread A is selected, the messages of thread A are cached.
If thread B is selected afterwards, the subscription attached to the query fetching thread A's details is destroyed, since the observable is torn down when the router exchanges the viewThread component.
If a message then arrives in thread A while the user is viewing thread B, the thread list is updated (since that subscription is live). But if the user switches back to thread A, its messages are loaded from the cache and are now outdated, since there was no subscription for that particular thread that would have updated or invalidated the cache.
The problem is even more obvious when the user navigates to an entirely different page where the thread list is not in view: nothing related to the chat messages is actively subscribed, so nothing invalidates the cached data when a new message arrives, even though the server theoretically provides a way to do that by offering new-message subscription events.
My questions are the following:
What are the best practices for keeping in sync, or invalidating, data cached by Apollo that is not actively "in use"?
What are the best practices for keeping nested data in sync (messages of threads of an event [see below])? I don't feel that having to implement the logic for subscribing to and updating message data inside the event query is a clean solution.
Using .subscribeToMore works for keeping actively used data in sync, but once that query is no longer in use the data remains in the cache, where it may or may not get outdated over time. Is there a way to remove cached data when an observable goes out of scope? That is, keep the data cached as long as at least one query is using it, because I trust that the query also implements logic that will keep it in sync based on server push events.
Should a service be used that subscribes (through the whole lifecycle of the SPA) to all subscription events and knows how to update each type of cached data, if present in the cache? (This service could be notified about which data needs to be kept in sync, to avoid using more resources than necessary; think of a service that subscribes to all newMessage events and pokes the cache accordingly.) Would that automatically emit new values for queries that returned objects referencing such data? (Would updating message:1 make a thread query that returned the same message:1 in its messages field emit a new value automatically, or do those queries also have to be updated manually?)
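Such an app-lifetime service might look like the following sketch, where CacheLike is a hypothetical stand-in for the real cache read/write API (readFragment/writeFragment in Apollo's case); all names are illustrative:

```typescript
// Hypothetical shapes for the pushed event and the cached thread summary.
interface NewMessageEvent { thread_id: string; text: string; timestamp: number; }
interface CachedThread { id: string; unread_messages: number; last_updated_at: number; }

// Stand-in for the cache API the service would actually talk to.
interface CacheLike {
  readThread(id: string): CachedThread | null;
  writeThread(thread: CachedThread): void;
}

// Lives for the whole SPA session; holds the knowledge of how each event
// type maps onto cached data, so individual components don't have to.
class ChatCacheKeeper {
  constructor(private cache: CacheLike) {}

  // Called for every pushed newMessage event; only touches data that is
  // already cached, and leaves everything else alone.
  onNewMessage(ev: NewMessageEvent): void {
    const thread = this.cache.readThread(ev.thread_id);
    if (!thread) return; // nothing cached for this thread; nothing to update
    this.cache.writeThread({
      ...thread,
      unread_messages: thread.unread_messages + 1,
      last_updated_at: ev.timestamp,
    });
  }
}
```

With Apollo's normalized cache, writing a fragment for an object by id does re-emit new values for active queries that reference that object, which is what makes this pattern workable.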
This starts to become very cumbersome when the model is extended with, say, Events that also have their own chat thread: querying event { thread { messages { ... } } } now needs to subscribe to the newMessage subscription, which breaks encapsulation and the single-responsibility principle.
It is also problematic that subscribing to newMessage data requires the id of the message thread associated with the event, which is not known before the query returns. Because of this, .subscribeToMore cannot be used: at that point the thread_id is not yet available.

If the intended behavior is "every time I open a thread, show the latest messages and not just what's cached", then you just need to set the fetchPolicy for your thread query to network-only, which will ensure that the request is always sent to the server rather than being fulfilled from the cache. The docs for apollo-angular are missing information about this option, but here's the description from the React docs:
Valid fetchPolicy values are:
cache-first: This is the default value where we always try reading data from your cache first. If all the data needed to fulfill your query is in the cache then that data will be returned. Apollo will only fetch from the network if a cached result is not available. This fetch policy aims to minimize the number of network requests sent when rendering your component.
cache-and-network: This fetch policy will have Apollo first try to read data from your cache. If all the data needed to fulfill your query is in the cache then that data will be returned. However, regardless of whether or not the full data is in your cache, this fetchPolicy will always execute your query against the network interface, unlike cache-first, which will only execute your query if the query data is not in your cache. This fetch policy optimizes for users getting a quick response while also trying to keep cached data consistent with your server data, at the cost of extra network requests.
network-only: This fetch policy will never return you initial data from the cache. Instead it will always make a request using your network interface to the server. This fetch policy optimizes for data consistency with the server, but at the cost of an instant response to the user when one is available.
cache-only: This fetch policy will never execute a query using your network interface. Instead it will always try reading from the cache. If the data for your query does not exist in the cache then an error will be thrown. This fetch policy allows you to only interact with data in your local client cache without making any network requests which keeps your component fast, but means your local data might not be consistent with what is on the server. If you are interested in only interacting with data in your Apollo Client cache also be sure to look at the readQuery() and readFragment() methods available to you on your ApolloClient instance.
no-cache: This fetch policy will never return your initial data from the cache. Instead it will always make a request using your network interface to the server. Unlike the network-only policy, it also will not write any data to the cache after the query completes.
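The policy semantics listed above can be modeled as a small decision table. This is a toy sketch of the described behavior, not Apollo's actual implementation:

```typescript
type FetchPolicy =
  | 'cache-first'
  | 'cache-and-network'
  | 'network-only'
  | 'cache-only'
  | 'no-cache';

// For a given policy and cache state, decide whether to serve from the
// cache, hit the network, and write the network result back to the cache.
function plan(
  policy: FetchPolicy,
  inCache: boolean,
): { readCache: boolean; hitNetwork: boolean; writeCache: boolean } {
  switch (policy) {
    case 'cache-first':
      return { readCache: inCache, hitNetwork: !inCache, writeCache: !inCache };
    case 'cache-and-network':
      return { readCache: inCache, hitNetwork: true, writeCache: true };
    case 'network-only':
      return { readCache: false, hitNetwork: true, writeCache: true };
    case 'cache-only':
      return { readCache: true, hitNetwork: false, writeCache: false };
    case 'no-cache':
      return { readCache: false, hitNetwork: true, writeCache: false };
  }
}
```

For the thread-switching problem above, network-only (or cache-and-network, if a fast-then-fresh display is acceptable) is the relevant row: the server is always consulted, so stale cached messages are never the final result.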

Related

Cache invalidation; refetch practices when nothing says the data has changed, but it might have?

While learning about GraphQL and Apollo, I went through this tutorial series.
It shows how to create an application that has:
A channel list view (/)
Shows all channels
Allows to open up channel detail view
Allows to create new channel
A channel detail view (ie. /soccer)
Shows messages added to channel
Allows user to add a new message
Apollo by default caches queries and that presents an issue:
open channel (/soccer), for the first time, data is not present in cache, query is executed and result stored in cache
go back to channel list view
open a different channel (/baseball)
other visitor adds a message for /soccer
go back to channel list view
open channel (/soccer), due to data being present in store - stale data gets loaded, because nothing says the data is stale and should be refetched
I cannot seem to find a reasonable way to tackle it. Not looking for code, just some good practices on how to handle it with GraphQL.
I tried changing fetchPolicy to cache-and-network, but it doesn't ask for more data - the same applies, nothing says the data is stale. network-only works, but that goes around the cache - a viable solution when it fits the requirements, but I actually want some caching.
Possible options I have thought of:
Separate queries: one for the main channel details, one for messages. Set the messages query's fetchPolicy to network-only. A viable option, but it sends already-available data around.
Separate queries. Messages are being paginated. Upon loading determine if first load or not and fetchMore.
Utilize a GraphQL subscription that notifies about server side events, use that to determine if should refetch. Introduces loads of other nuances, for instance, how long should I listen to events for channel X if I have already left it.
I know it depends on the project, but what other options are out there, which are favored, why?

ES+CQRS messaging flow

I was trying to understand ES+CQRS and what tech stack can be used.
As per my understanding, the flow should be as below.
UI sends a request to Controller(HTTP Adapter)
Controller calls application service by passing Request Object as parameter.
Application Service creates Command from Request Object passed from controller.
Application Service pass this Command to Message Consumer.
Message Consumer publish Command to message broker(RabbitMQ)
Two Subscriber will be listening for above command
a. One subscriber will load the Aggregate from the event store, apply the command, and store the generated event in the event store.
b. Another subscriber will be at the VIEW end; it will populate data in the view database/cache.
Kindly suggest whether my understanding is correct.
Kindly suggest my understanding is correct
I think you've gotten a bit tangled in your middleware.
As a rule, CQRS means that the writes happen to one data model, and reads in another. So the views aren't watching commands, they are watching the book of record.
So in the subscriber that actually processes the command, the command handler will load the current state from the book of record into memory, update the copy in memory according to the domain model, and then replace the state in the book of record with the updated version.
Having updated the book of record, we can now trigger a refresh of the data model that backs the view; no business logic is run here, this is purely a transform of the data from the model we use for writes to the model we use for reads.
When we add event sourcing, this pattern is the same -- the distinction is that the data model we use for writes is a history of events.
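The load → apply → append cycle described above can be sketched with a toy in-memory event store; all names here are illustrative, not any specific framework's API:

```typescript
// A stored domain event; the "book of record" is a history of these.
interface DomainEvent { type: string; amount: number; }

// Toy in-memory event store, keyed by aggregate id.
class InMemoryEventStore {
  private streams = new Map<string, DomainEvent[]>();
  load(id: string): DomainEvent[] { return this.streams.get(id) ?? []; }
  append(id: string, events: DomainEvent[]): void {
    this.streams.set(id, [...this.load(id), ...events]);
  }
}

// Command handler: rebuild the current state from the event history,
// enforce the domain rule, then append the new event to the book of record.
function handleWithdraw(store: InMemoryEventStore, accountId: string, amount: number): void {
  const history = store.load(accountId);
  const balance = history.reduce((b, e) => b + e.amount, 0);
  if (balance < amount) throw new Error('insufficient funds'); // domain rule
  store.append(accountId, [{ type: 'Withdrew', amount: -amount }]);
}
```

Note that the view side never appears here: it reads the same event history separately and transforms it, with no business logic of its own.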
How is atomicity achieved between writing data to the event store and writing data to the VIEW model?
It's not -- we don't try to make those two actions atomic.
How do we handle it if an event is stored in the EventStore but the system crashed before we sent the event to the Message Queue?
The key idea is to realize that we typically build new views by reading events out of the event store; not by reading the events out of the message queue. The events in the queue just tell us that an update is available. In the absence of events appearing in the message queue, we can still poll the event store watching for updates.
Therefore, if the event store is unreachable, you just leave the stale copy of the view in place, and wait for the system to recover.
If the event store is reachable, but the message queue isn't, then you update the view (if necessary) on some predetermined schedule.
This is where the eventual consistency part comes in. Given a successful write into the event store, we are promising that the effects of that write will be visible in a finite amount of time.
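The idea that views are rebuilt from the event store, with the queue acting only as a hint that new events exist, can be sketched as a toy projector (names illustrative):

```typescript
// A persisted event with a monotonically increasing sequence number.
interface StoredEvent { seq: number; type: string; payload: unknown; }

// Read model projector: remembers how far it has read, so a lost queue
// notification only delays an update, it never loses one.
class ViewProjector {
  private lastSeen = 0;
  public state: Record<string, number> = {};

  // Can be triggered by a queue notification *or* by a periodic poll of
  // the event store; both paths converge on the same read.
  refresh(eventStore: StoredEvent[]): void {
    for (const ev of eventStore.filter(e => e.seq > this.lastSeen)) {
      // Purely a transform of events into view data; no business logic.
      this.state[ev.type] = (this.state[ev.type] ?? 0) + 1;
      this.lastSeen = ev.seq;
    }
  }
}
```

Because refresh is idempotent over already-seen events, polling on a schedule and reacting to queue messages can safely coexist.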

CQRS+ES: Client log as event

I'm developing a small CQRS+ES framework and developing applications with it. In my system, I need to log some client actions and use them for analytics and statistics, and maybe in the future do something in the domain with them. For example, a client (on the web) downloads some resource(s), and I need to save the date, time, type (download, partial, ...), region or country (maybe from the IP), etc. After that, in some view, the client can see the download count or some complex report. I'm not sure how to implement this feature.
The first solution creates an analytics context and some aggregate; on each client action I send a command like IncreaseDownloadCounter(resource), handle the command, raise domain events, and update the view. But in this scenario the download has already occurred before I send the command, so it is not really a command, and on the other side version conflicts increase.
The second solution is raising the event from the client side and updating the view model based on it. But with this type of handling my event is not stored in the event store, because it is not raised by a command and never changes any domain context; and if it is stored in the event store, there is no aggregate to handle it when it is fetched for some other use.
The third solution is raising the event from the client side and storing it in another database, maybe with a special table for each type of event. But in this manner I have multiple event stores with different schemas, which makes it difficult to recreate view models and trace events for rebuilding context state, so if in the future I add some domain that uses this type of event, it will be difficult to use them.
What is the best approach and solution for this scenario?
First solution creates analytic context and some aggregate
Unquestionably the wrong answer; the event has already happened, so it is too late for the domain model to complain.
What you have is a stream of events. Putting them in the same event store that you use for your aggregate event streams is fine. Putting them in a separate store is also fine. So you are going to need some other constraint to make a good choice.
Typically, reads vastly outnumber writes, so one concern might be that these events are going to saturate the domain store. That might push you towards storing these events separately from your data model (prior art: we typically keep the business data in our persistent book of record, but the sequence of http requests received by the server is typically written instead to a log...)
If you are supporting an operational view, push on the requirement that the state be recovered after a restart. You might be able to get by with building your view off of an in memory model of the event counts, and use something more practical for the representations of the events.
Thanks for your complete answer. So I should create something like the ES schema without some fields (aggregate name or type, version, etc.) and collect client events in that repository, with some offline process reading it and updating the read model or creating commands to do something in the domain space.
Something like that, yes. If the view for the client doesn't actually require any validation by your model at all, then building the read model from the externally provided events is fine.
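A minimal sketch of such a client-event record and a read model built by folding over it; the field names are assumptions for illustration, not a prescribed schema:

```typescript
// Client-originated event: like an event-store record, but without the
// aggregate name/type/version fields, since no aggregate validated it.
interface ClientEvent {
  eventId: string;
  occurredAt: string; // ISO timestamp
  eventType: 'download' | 'partial-download';
  resourceId: string;
  region?: string; // derived from IP, optional
}

// Read model built directly from the externally provided events:
// count completed downloads for one resource.
function downloadCount(events: ClientEvent[], resourceId: string): number {
  return events.filter(
    e => e.eventType === 'download' && e.resourceId === resourceId,
  ).length;
}
```

Since the view requires no validation by the domain model, folding over the raw client events like this is exactly the "build the read model from the externally provided events" option described above.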
Are you recommending save some claim or authorization token of the user and sender app for validation in another process?
Maybe, maybe not. The token describes the authority of the event; our own event handler is the authority for the command(s) that is/are derived from the events. It's an interesting question that probably requires more context -- I'd suggest you open a new question on that point.

Tracking ajax request status in a Flux application

We're refactoring a large Backbone application to use Flux to help solve some tight coupling and event / data flow issues. However, we haven't yet figured out how to handle cases where we need to know the status of a specific ajax request.
When a controller component requests some data from a flux store, and that data has not yet been loaded, we trigger an ajax request to fetch the data. We dispatch one action when the request is initiated, and another on success or failure.
This is sufficient to load the correct data, and update the stores once the data has been loaded. But, we have some cases where we need to know whether a certain ajax request is pending or completed - sometimes just to display a spinner in one or more views, or sometimes to block other actions until the data is loaded.
Are there any patterns that people are using for this sort of behavior in flux/react apps? here are a few approaches I've considered:
Have a 'request status' store that knows whether there is a pending, completed, or failed request of any type. This works well for simple cases like 'is there a pending request for workout data', but becomes complicated if we want to get more granular: 'is there a pending request for workout id 123?'
Have all of the stores track whether the relevant data requests are pending or not, and return that status as part of the store API - i.e. WorkoutStore.getWorkout would return something like { status: 'pending', data: {} }. The problem with this approach is that this sort of state shouldn't be mixed in with the domain data, as it's really a separate concern. Also, every consumer of the workout store API now needs to handle this 'response with status' instead of just the relevant domain data.
Ignore request status - either the data is there and the controller/view act on it, or the data isn't there and the controller/view don't act on it. Simpler, but probably not sufficient for our purposes
The solutions to this problem vary quite a bit based on the needs of the application, and I can't say that I know of a one-size-fits-all solution.
Often, #3 is fine, and your React components simply decide whether to show a spinner based on whether a prop is null.
When you need better tracking of requests, you may need this tracking at the level of the request itself, or you might instead need this at the level of the data that is being updated. These are two different needs that require similar, but slightly different approaches. Both solutions use a client-side id to track the request, like you have described in #1.
If the component that calls the action creator needs to know the state of the request, you create a requestID and hang on to that in this.state. Later, the component will examine a collection of requests passed down through props to see if the requestID is present as a key. If so, it can read the request status there, and clear the state. A RequestStore sounds like a fine place to store and manage that state.
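That requestID pattern might be sketched as follows; this is a toy RequestStore with illustrative names, not Facebook's actual implementation:

```typescript
type RequestStatus = 'pending' | 'completed' | 'failed';

// Toy store tracking per-request state, keyed by a client-generated id.
// In a real Flux app, begin/finish would be driven by dispatched actions
// rather than called directly.
class RequestStore {
  private statuses = new Map<string, RequestStatus>();
  private nextId = 0;

  // Called when the action creator fires the request; the component keeps
  // the returned id in its state and looks it up later via props.
  begin(): string {
    const id = `req-${++this.nextId}`;
    this.statuses.set(id, 'pending');
    return id;
  }

  // Called from the success/failure action handlers.
  finish(id: string, ok: boolean): void {
    this.statuses.set(id, ok ? 'completed' : 'failed');
  }

  statusOf(id: string): RequestStatus | undefined {
    return this.statuses.get(id);
  }
}
```

A component would render a spinner while `statusOf(myRequestId)` is 'pending' and clear its stored id once the status resolves.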
However, if you need to know the status of the request at the level of a particular record, one way to manage this is to have your records in the store hold on to both a clientID and a more canonical (server-side) id. This way you can create the clientID as part of an optimistic update, and when the response comes back from the server, you can clear the clientID.
Another solution that we've been using on a few projects at Facebook is to create an action queue as an adjunct to the store. The action queue is a second storage area. All of your getters draw from both the store itself and the data in the action queue. So your optimistic updates don't actually update the store until the response comes back from the server.

Mule - Returning data from multiple flows as soon as it's ready

Hello there Stack Overflow.
My scenario is that I have a web page where a user can enter data (search terms, such as the name of a product on sale, a category, etc). On submission, this data is sent to the Mule ESB which then uses it to query two (or more) databases. One of these databases is rather quick and returns data fast, but the other is slow and can take a minute or longer to come back with information (if it doesn't timeout).
Currently, Mule is waiting to collect results from all flows before sending any information back to the web browser which made the query.
My problem is that this creates a very bad experience for the user - especially if the product that they're looking for is not in a database. They could be waiting quite a while before receiving anything back.
My current flow is here: http://i.stack.imgur.com/fyyI0.png
I have attempted to experiment with asynchronous flows but have never got them to send back data as and when it's ready.
Is there any way in Mule to return results from multiple flows as soon as the result is available? I would like to display the results for each query/flow as and when they come in, rather than waiting for all flows to terminate before sending data back to the user's browser.
I think the best option for your use case, if I understood it correctly, would be to use asynchronous processing and return the results through the Ajax transport: http://www.mulesoft.org/documentation/display/current/AJAX+Transport+Reference
This way you can return immediately to the client and publish results when you get them in the Ajax channel.
