How to organize async response in RxJS (or FRP more generically) - rxjs

Let's say we have a source$ observable, which typically represents user interactions. We want to perform an async operation on each emission of source$ and "map" the async result onto an output observable, say result$.
mergeMap
The most naive implementation is:
result$ = source$.pipe(
mergeMap((s) => someAsyncOperation(s))
)
However, a response to an earlier request can override a more recent one, because someAsyncOperation may take a different amount of time on each round.
source: -----1-----2------->
result1: -----------|
result2: --|
result: -------------2--1-->
The last value on the result$ observable is 1, which is incorrect: we have already triggered the operation for 2, and response-2 has already arrived.
switchMap
We can replace mergeMap with switchMap and the graph would be:
source: -----1-----2------->
result1: -----------|
result2: --|
result: -------------2----->
For typical use cases like search suggestions, switchMap is desirable, since response-1 is most likely worthless once action-2 has fired.
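The only change is swapping the operator in the snippet above:
result$ = source$.pipe(
  // switchMap unsubscribes from the in-flight inner observable
  // whenever a new value arrives on source$
  switchMap((s) => someAsyncOperation(s))
)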
Problem
But in some other cases, responses to previous actions may still be valid. For example, in a periodic polling scenario, responses are valuable as long as their chronological order is preserved.
source: -----1-----2----3------->
result1: --------|
result2: -----------|
result3: ----|
mergeMap: -----------1------3-2->
switchMap:------------------3--->
expected: -----------1------3--->
It's obvious that response-1 and response-3 are both desirable, as they arrive in chronological order (while response-2 is invalid because it arrives after response-3).
The problem with mergeMap is that it cannot drop the invalid response-2.
switchMap is also suboptimal because it drops the desirable response-1, since the second inner observable has already started by the time response-1 arrives. The problem with switchMap worsens when the average RTT is larger than the polling interval.
source: -----1----2----3----4----5----->
result1: --------|
result2: --------|
result3: --|
result4: -------|
result5: ----|
switchMap:---------------3------------->
expected: -----------1---3---------4-5->
You can see that switchMap produces far fewer outputs than the ideal one.
Question
How should I get the expected output observable in this case?

You can attach an "emission index" to each response and use that to filter out older emissions.
import { mergeMap, map, scan, filter } from "rxjs/operators";

const INITIAL = { response: undefined, index: -1, maxIndex: -1 };

const result$ = source$.pipe(
  // tag each response with the index of the request that produced it
  mergeMap((s, index) => someAsyncOperation(s).pipe(
    map(response => ({ response, index }))
  )),
  // remember the highest index seen so far
  scan((prev, cur) => ({ ...cur, maxIndex: Math.max(prev.maxIndex, cur.index) }), INITIAL),
  // drop responses that belong to an older request than one already emitted
  filter(({ index, maxIndex }) => index === maxIndex),
  map(({ response }) => response),
);
Here we can use scan to keep track of the highest emitted index thus far, then use filter to prevent emissions from older requests.
Here's a working StackBlitz demo.
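A rough self-contained sketch of the same idea (someFakeOperation and its hard-coded latencies are made up purely to simulate out-of-order responses):
import { Subject, of } from "rxjs";
import { delay, mergeMap, map, scan, filter } from "rxjs/operators";

// stand-in for someAsyncOperation: response-2 is deliberately the slowest,
// so it arrives after response-3
const latencies = { 1: 300, 2: 600, 3: 100 };
const someFakeOperation = (s) => of(`response-${s}`).pipe(delay(latencies[s]));

const source$ = new Subject();
const result$ = source$.pipe(
  mergeMap((s, index) => someFakeOperation(s).pipe(map(response => ({ response, index })))),
  scan((prev, cur) => ({ ...cur, maxIndex: Math.max(prev.maxIndex, cur.index) }), { maxIndex: -1 }),
  filter(({ index, maxIndex }) => index === maxIndex),
  map(({ response }) => response),
);

result$.subscribe(console.log); // logs "response-1", then "response-3"; response-2 is dropped

source$.next(1);
setTimeout(() => source$.next(2), 200);
setTimeout(() => source$.next(3), 400);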

Related

dynamoDB / aws-amplify - Query with 'limit' + 'filter' - returns half empty or completely empty pages and not full with items with 'limit' amount

Code example:
import { API, graphqlOperation } from "aws-amplify"

const fetchInitialAds = () => {
  const { filterAttributes } = useContext(FilterContext)
  const {
    query: { slug },
  } = useRouter()

  const fetchAds = async ({ pageParam = null }) => {
    const { deal_type, priceFrom, priceTo } = filterAttributes
    const currTimeInSeconds = Math.ceil(Date.now() / 1000)
    const options = {
      subcategoryID: `SUBCATEGORY#${slug}`,
      sortDirection: "DESC",
      limit: 24,
      nextToken: pageParam,
      filter: {
        and: [
          {
            or: [
              {
                list_price: {
                  between: [Number(priceFrom) || 0, Number(priceTo) || 100000000],
                },
              },
              {
                new_price: {
                  between: [Number(priceFrom) || 0, Number(priceTo) || 100000000],
                },
              },
            ],
          },
        ],
      },
    }
    const adsResult = await API.graphql(
      graphqlOperation(adsBySubcategoryByExpdate, options)
    )
    const data = await adsResult.data.adsBySubcategoryByExpdate
    return data
  }

  return useInfiniteQuery(["ads", slug], fetchAds, {
    getNextPageParam: (currentPage, allPages) => currentPage.nextToken,
  })
}
The problem with Query using 'limit' + 'filter' shows up on a big table, for example one with 5000 items. If you set the limit to 24 items at a time (for pagination), the filter expression is applied only to the first 24 items read from your table, and only the matches among them are returned. If only 5 of those first 24 items match the filter, you get just those 5 items back <-- this means the first page you display to the client is not a full page of 24 items but only 5. Then you have to filter the next 24 items with 'nextToken'; maybe this time you get 10 matches, so now you are displaying only 15 items to the client (still not a full page of 24). And let's say the third run over the next 24 items yields 0 matches <-- you get no new items to display at all. It feels weird: you click the button to fetch the next 24 items, get 0 back, and have to keep clicking until the query happens to hit more matching items or you have gone through the whole table.
So my question is: is there a way to make DynamoDB keep collecting matched items until it has a full page of 24, and only then return it? I don't want to get half-full pages and have to spam my 'Fetch more' button just to check whether I've missed any other filter-matched items.
The fact that a Query operation first reads a page of data (1MB of data, or Limit rows if you specified that parameter), and only then does filtering on it, is deliberate and documented:
A single Query operation will read up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data and then apply any filtering to the results using FilterExpression. ... A Query operation can return an empty result set and a LastEvaluatedKey if all the items read for the page of results are filtered out.
The reason for this design is that it limits the latency and the cost of each request: imagine that in a 1 TB database you run a query with a filter that only matches 5 items. If the query were to continue until it could return 5 items, it would potentially read 1 TB of data before returning anything - causing huge latency (the client would most likely assume the connection broke and disconnect before getting a response...), as well as huge cost. By returning a page (possibly empty) each time the database has read 1 MB of data, the client remains aware that the query is progressing and doesn't time out - and also remains aware of the cost of this query and is given an opportunity to stop it.
One could imagine a better API, which includes both a limit on the number of items to read, and a limit on the items to return, with a page being ended as soon as either one of these limits is reached. Unfortunately, DynamoDB doesn't have such an API. As you noticed you are forced to use a Limit and retry if not enough results were returned.
If you believe that the matching results are uniformly distributed across items, you can attempt to "guess" how big Limit should be to produce roughly the right number of results - and you can improve this guess as you go along. Guessing a too-low Limit just means you'll need to issue a second query, and guessing a too-high Limit just means you will have read a bit too much (and paid a bit too much), but in either case it's (usually) not a disaster. In any case you don't need the user to click to get more results - you can do it internally in code.
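For example, a rough sketch of that internal loop, reusing the question's query and options shape (fetchFullPage and maxRequests are made-up names, not part of any AWS API):
// keep querying with nextToken until we have at least a full page of matches
// (or the index is exhausted); maxRequests caps the latency/cost of one call,
// as discussed above
const fetchFullPage = async (options, pageSize = 24, maxRequests = 10) => {
  const items = []
  let nextToken = options.nextToken || null
  for (let i = 0; i < maxRequests && items.length < pageSize; i++) {
    const result = await API.graphql(
      graphqlOperation(adsBySubcategoryByExpdate, { ...options, nextToken })
    )
    const page = result.data.adsBySubcategoryByExpdate
    items.push(...page.items)
    nextToken = page.nextToken
    if (!nextToken) break // reached the end of the index
  }
  // may return slightly more than pageSize items; the caller can carry the
  // surplus over to the next page
  return { items, nextToken }
}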

Replay Subject from Observable (emit all previous events)

I have an observable and I'm trying to make a replay subject from it. It should emit the current event and all previous events that the observable emitted.
Here is what I thought would work, as per this answer:
import { from, ReplaySubject } from "rxjs"

// create an observable from a dummy array and a blank replay subject
const observable$ = from([1, 2, 3])
const replay$ = new ReplaySubject()
// create a replay subject from the observable (not working as expected)
observable$.subscribe(replay$)
// log out the events
observable$.subscribe(e => console.log('observable', e))
replay$.subscribe(e => console.log('replay', e))
Logs
observable 1
observable 2
observable 3
replay 1
replay 2
replay 3
The behavior I'm looking for is such that the replay subject emits the previous events as well, like so:
replay [1]
replay [1, 2]
replay [1, 2, 3]
How can I achieve this?
ReplaySubject replays the whole sequence only on subscription, i.e. only when you call replay$.subscribe(); after that it simply passes new emissions through.
From the output you want, it looks like you should chain it with scan(), because ReplaySubject emits items one by one, not as the accumulated arrays you expect.
observable$.pipe(
scan((acc, val) => [...acc, val], []),
).subscribe(replay$);
Live demo: https://stackblitz.com/edit/rxjs-yfgkvf?file=index.ts
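Putting the question's setup and the scan() step together, a minimal runnable sketch:
import { from, ReplaySubject } from "rxjs";
import { scan } from "rxjs/operators";

const observable$ = from([1, 2, 3]);
const replay$ = new ReplaySubject();

// accumulate every emission into a growing array before feeding the subject
observable$.pipe(
  scan((acc, val) => [...acc, val], []),
).subscribe(replay$);

// even a late subscriber gets all accumulated arrays replayed:
// replay [ 1 ]
// replay [ 1, 2 ]
// replay [ 1, 2, 3 ]
replay$.subscribe(e => console.log('replay', e));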

Replay cached items in reverse order on subscribe

I have a ConnectableObservable which, upon subscribe, will replay the last x items in their original order (oldest to newest) and any subsequent events after that.
I am using this Observable as the backing store for an event blotter, however upon subscribe I would actually like the replayed items to be pushed/onNext'ed in the reverse order (newest to oldest) so I can display the most relevant items first.
Is this possible with standard RX operators or will I have to create a custom one?
You can't do it with replay() as you'd need to get only the cached items on a non-terminated source. However, ReplaySubject lets you peek into it and get an array of items that you can reverse, then concatenate with the rest from the same subject but skipping the snapshot items just retrieved:
// feed the source into a ReplaySubject that caches every item
ReplaySubject<ItemType> subject = ReplaySubject.create();
source.subscribe(subject);

Observable<ItemType> result = Observable.defer(() -> {
    // take a snapshot of everything cached so far
    ItemType[] current = subject.getValues(new ItemType[0]);
    // emit the snapshot in reverse order (newest to oldest), then continue
    // with the live subject, skipping the snapshot items already emitted
    return Observable.range(0, current.length)
            .map(index -> current[current.length - 1 - index])
            .concatWith(subject.skip(current.length));
});

How to extend the duration of an rx Observable.timer?

Given an Observable.timer(10000): say that I'd like to continuously update the timer and not allow it to emit yet - is that possible?
For example, at t = 2000 I want to increase the timeout by 2000. Given this dynamic change, the timer would now emit at t = 12000 rather than the original t = 10000.
Try the code below:
Rx.Observable.fromEvent(document, "click")
  // count the clicks: 1, 2, 3, ...
  .scan(count => count + 1, 0)
  .flatMap(count => {
    console.log(count)
    // each click starts a new timer of count seconds
    return Rx.Observable.timer(count * 1000)
  })
  .subscribe(console.log)
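Note that with flatMap every previous timer keeps running, so the original emission is never actually cancelled. If the goal is a single pending emission whose delay grows on each event, one option is switchMap, which cancels the previous timer each time. A sketch in pipe-based RxJS syntax (the 2000 ms increment and the 10000 ms seed mirror the question's numbers; note the new delay is measured from the latest click, not from t = 0):
import { fromEvent, timer } from "rxjs";
import { scan, startWith, switchMap } from "rxjs/operators";

const extendedTimer$ = fromEvent(document, "click").pipe(
  // start from the original 10 s delay and add 2 s on every click
  scan(delayMs => delayMs + 2000, 10000),
  startWith(10000),
  // cancel the pending timer and restart it with the longer delay
  switchMap(delayMs => timer(delayMs))
);

extendedTimer$.subscribe(() => console.log("timer fired"));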

How can I create a histogram of time stamp deltas?

We are storing small documents in ES that represent a sequence of events for an object. Each event has a date/time stamp. We need to analyze the time between events for all objects over a period of time.
For example, imagine these event JSON documents:
{ "object":"one", "event":"start", "datetime":"2016-02-09 11:23:01" }
{ "object":"one", "event":"stop", "datetime":"2016-02-09 11:25:01" }
{ "object":"two", "event":"start", "datetime":"2016-01-02 11:23:01" }
{ "object":"two", "event":"stop", "datetime":"2016-01-02 11:24:01" }
What we would want to get out of this is a histogram plotting the two resulting time stamp deltas (from start to stop): 2 minutes / 120 seconds for object one and 1 minute / 60 seconds for object two.
Ultimately we want to monitor the time between start and stop events but it requires that we calculate the time between those events then aggregate them or provide them to the Kibana UI to be aggregated / plotted. Ideally we would like to feed the results directly to Kibana so we can avoid creating any custom UI.
Thanks in advance for any ideas or suggestions.
Since you're open to using Logstash, there's a way to do it using the aggregate filter.
Note that this is a community plugin that needs to be installed first. (i.e. it doesn't ship with Logstash by default)
The main idea of the aggregate filter is to merge two "related" log lines. You can configure the plugin so it knows what "related" means. In your case, "related" means that both events must share the same object name (i.e. one or two) and then that the first event has its event field with the start value and the second event has its event field with the stop value.
When the filter encounters the start event, it stores the datetime field of that event in an internal map. When it encounters the stop event, it computes the time difference between the two datetimes and stores the duration in seconds in the new duration field.
input {
  ...
}
filter {
  ...other filters

  if [event] == "start" {
    aggregate {
      task_id => "%{object}"
      code => "map['start'] = event['datetime']"
      map_action => "create"
    }
  } else if [event] == "stop" {
    aggregate {
      task_id => "%{object}"
      code => "map['duration'] = event['datetime'] - map['start']"
      end_of_task => true
      timeout => 120
    }
  }
}
output {
  elasticsearch {
    ...
  }
}
Note that you can adjust the timeout value (here 120 seconds) to better suit your needs. When the timeout has elapsed and no stop event has happened yet, the existing start event will be ditched.
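Assuming the filter works as described, each stop event gets enriched with the computed delta before being indexed, roughly like this (the numbers are simply the deltas from the question's example):
{ "object":"one", "event":"stop", "datetime":"2016-02-09 11:25:01", "duration": 120 }
{ "object":"two", "event":"stop", "datetime":"2016-01-02 11:24:01", "duration": 60 }
A histogram aggregation on the numeric duration field in Kibana then gives the distribution of time deltas without any custom UI.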
