Observable defaulting when not empty - rxjs

I have some code that reads rows from a database, performs some logic on each row, and creates an observable that writes the result back to the database. Each observable is pushed into an array, so that when the array is subscribed to via forkJoin all the necessary data is written to the database.
This seems to work perfectly fine until the number of observables in the array gets quite large. The number of rows can be anywhere from 0 to 6000, so the array can grow to that size. When it does, the observables no longer write to the database but instead return the default value from defaultIfEmpty. I'm stumped as to why it works normally with smaller numbers of observables but suddenly becomes empty with larger ones...
It might be a little clearer with a code example:
function writeToDB() {
  // rows taken from the database, n = 0..6000
  const data = [];
  // array of observables
  const observables = [];
  for (const row of data) {
    if (row.age > 20) {
      // websocket between service and database, returns an observable
      const observable = websocket.put(row).pipe(
        o$.catchError((err) => {
          return r$.of(err);
        }),
        o$.defaultIfEmpty({
          success: true,
          status: 200
        })
      );
      observables.push(observable);
    }
  }
  return forkJoin([...observables]);
}
Using this example works perfectly fine when subscribed to, except with a large data set where the observables array is about 5000 in length. At that point it starts to return the defaultIfEmpty value { success: true, status: 200 } and I cannot work out why... Any help or advice would be greatly appreciated.

It's not clear from what you've shown here. Still, if this works with a smaller number of calls, then there's a good chance that websocket exhibits some strange behavior at those numbers.
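One thing worth keeping in mind: defaultIfEmpty only emits its default when the source completes without emitting anything, so under load your websocket observables are most likely completing without ever producing a response. A minimal illustration:

import { EMPTY, of } from 'rxjs';
import { defaultIfEmpty } from 'rxjs/operators';

// Fires only when the source completes empty:
EMPTY.pipe(defaultIfEmpty('fallback')).subscribe(console.log); // 'fallback'
of(1).pipe(defaultIfEmpty('fallback')).subscribe(console.log); // 1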
Something worth trying might be to limit the concurrency of your websocket calls.
function writeToDB(data) {
  // data contains rows taken from the database, n = 0..6000
  return from(data).pipe(
    filter(row => row.age > 20),
    map(row => websocket.put(row).pipe(
      catchError(err => of(err)),
      // last makes sure that mergeAll behaves like forkJoin
      last(undefined, {
        success: true,
        status: 200
      })
    )),
    // mergeAll lets you choose how many can run concurrently;
    // for example, at most 50 websocket calls are made at
    // once here
    mergeAll(50),
    toArray()
  );
}
I prefer map + mergeAll over mergeMap in this case (as I think you're less likely to miss the concurrency aspect this way), but you can use either.
function writeToDB(data) {
  // data contains rows taken from the database, n = 0..6000
  return from(data).pipe(
    filter(row => row.age > 20),
    mergeMap(row => websocket.put(row).pipe(
      catchError(err => of(err)),
      // last makes sure that mergeMap behaves like forkJoin
      last(undefined, {
        success: true,
        status: 200
      })
    ), 50), // <- sneaky! ;)
    toArray()
  );
}

Related

RxJS: execute concatMap in parallel

Is it possible to execute a higher-order observable in parallel, but still preserve the order when merging the results?
I have something looking like this:
invoker$: Observable<void>;
fetch: (index: number) => Observable<T[]>;

invoker$
  .pipe(
    concatMap((_, index) => fetch(index)),
    scan((acc: T[], values) => [...acc, ...values], [])
  )
  .subscribe(/* Do something with the array */);
The idea is to have an observable that invokes a callback (e.g. a backend call that takes a considerable amount of time), generating a new observable that emits a single value (an array of some generic type). The returned values should be concatenated into another array while preserving their original fetch order.
I would, however, like the requests to be fired in parallel. So if invoker$ emits rapidly, the requests are made in parallel and the results are merged as they complete.
My understanding is that concatMap will wait for one observable to complete before starting the next one, while mergeMap will run them in parallel but won't do anything to preserve the order.
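To illustrate what I mean, here is a minimal sketch; the fetch here is a made-up stand-in whose later calls resolve sooner:

import { interval, of } from 'rxjs';
import { concatMap, delay, mergeMap, take } from 'rxjs/operators';

// Hypothetical fetch: higher indexes resolve faster.
const fetch = (index) => of([index]).pipe(delay(300 - index * 100));

// concatMap: one at a time, source order preserved: [0], [1], [2]
interval(10).pipe(take(3), concatMap((_, i) => fetch(i)))
  .subscribe(v => console.log('concatMap', v));

// mergeMap: parallel, results arrive in completion order: [2], [1], [0]
interval(10).pipe(take(3), mergeMap((_, i) => fetch(i)))
  .subscribe(v => console.log('mergeMap', v));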
You can do it using mergeMap.
First, you need to pass the index together with the async response down the stream.
Then you can sort based on the index from the previous step.
Then you have two choices:
1. if the stream needs to end once all the requests are made and you handle all the responses at once, you can use reduce: https://rxmarbles.com/#reduce
2. if the stream needs to continue for another batch of requests, you need to use scan and later filter until you reach the needed event count: https://rxmarbles.com/#scan and https://rxmarbles.com/#filter
Here is some pseudo-code for both examples.
In the reduce case, the stream ends once all the requests have completed:

invoker$
  .pipe(
    mergeMap((_, index) => fetch(index).pipe(
      map(values => ({values, index}))
    )),
    reduce((acc, valueWithIndex) => [...acc, valueWithIndex], []),
    map(array => array
      .sort((a, b) => a.index - b.index)
      .flatMap(valueWithIndex => valueWithIndex.values))
  )
  .subscribe(/* Do something with the array */);
In the multiple-use case, I am assuming the size of the batch to be constant:

invoker$
  .pipe(
    mergeMap((_, index) => fetch(index).pipe(
      map(values => ({values, index}))
    )),
    scan((acc, valueWithIndex) => {
      let resp = [...acc, valueWithIndex];
      // The scan can accumulate more than the batch size,
      // so we need to limit it and restart for the new batch
      if (resp.length > BATCH_SIZE) {
        resp = [valueWithIndex];
      }
      return resp;
    }, []),
    filter(array => array.length === BATCH_SIZE),
    map(array =>
      array
        .sort((a, b) => a.index - b.index)
        .flatMap(valueWithIndex => valueWithIndex.values))
  )
  .subscribe(/* Do something with the array */);
2.1. In case the batch size is dynamic:

invoker$
  .pipe(
    mergeMap((_, index) => fetch(index).pipe(
      map(values => ({values, index}))
    )),
    withLatestFrom(batchSizeStream),
    scan((acc, [valueWithIndex, batchSize]) => {
      let resp = [[...acc[0], valueWithIndex], batchSize];
      // The scan can accumulate more than the batch size,
      // so we need to limit it and restart for the new batch.
      // NOTE: the batch size is dynamic and we do not want to drop data
      // when the buffer size changes, so we drop the buffer
      // only if the batch size did not change
      if (resp[0].length > batchSize && acc[1] === batchSize) {
        resp = [[valueWithIndex], batchSize];
      }
      return resp;
    }, [[], 0]),
    filter(arrayWithBatchSize =>
      arrayWithBatchSize[0].length >= arrayWithBatchSize[1]),
    map(arrayWithBatchSize =>
      arrayWithBatchSize[0]
        .sort((a, b) => a.index - b.index)
        .flatMap(valueWithIndex => valueWithIndex.values))
  )
  .subscribe(/* Do something with the array */);
EDIT: optimized sorting, added dynamic batch size case
I believe that the operator you are looking for is forkJoin.
This operator takes a list of observables as input, fires them in parallel, and returns a list of the last emitted value of each observable once they all complete.
forkJoin({
  invoker: invoker$,
  fetch: fetch$,
}).subscribe(({invoker, fetch}) => {
  console.log(invoker, fetch);
});
It seems this behavior is provided by the concatMapEager operator from the cartant/rxjs-etc library, written by Nicholas Jamieson (cartant), a developer on the core RxJS team.
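The real implementation lives in that library, but as a rough, illustrative sketch of the eager-concat idea (all names below are made up): run the inners in parallel, buffer each one until it completes, and release the buffers in source order.

import { from } from 'rxjs';
import { map, mergeMap, scan, toArray } from 'rxjs/operators';

// Illustrative sketch only, not the rxjs-etc implementation.
// Inners run in parallel (up to `concurrent`), but their buffered
// results are emitted in the original source order.
const concatMapEagerSketch = (project, concurrent = Infinity) => source =>
  source.pipe(
    mergeMap((value, index) => project(value, index).pipe(
      toArray(),
      map(buffered => ({index, buffered}))
    ), concurrent),
    scan((state, {index, buffered}) => {
      // stash the completed buffer, then flush every consecutive
      // index starting from the next expected one
      const pending = {...state.pending, [index]: buffered};
      const release = [];
      let next = state.next;
      while (pending[next]) {
        release.push(...pending[next]);
        delete pending[next];
        next += 1;
      }
      return {pending, next, release};
    }, {pending: {}, next: 0, release: []}),
    mergeMap(state => from(state.release))
  );

One caveat: this sketch holds each inner's output until that inner completes, whereas (as I understand it) the real operator can stream the earliest live inner as it emits; the sketch is only meant to show why order can be preserved despite parallel execution.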

Observable: make a request multiple times and collect the responses together

In the database I have 19 users.
From my API, I can get only 5 results in one call.
If I want to get them all, I need to make the request 4 times, each time getting 5 users. With the start query parameter I change the position from which new users are fetched.
I'm trying to do it in RxJS together with redux-observable.
I have an idea, but maybe my approach is imperative, which is the opposite of the RxJS ideology.
// get users from the API; the pipe helps me see the actual data and count the array length
function getUsers(position = 0) {
  return ajax.getJSON(`${API}/users?_start=${position}&_limit=5`)
    .pipe(map(({data}) => ({responseLength: data.length, data})));
}
// here, when I get the response, if array.length is equal to 5 I know that I need to fetch data again.
// The problem is here: if I recurse, I will get only the last result, not both of them;
// if I push my previous result into an array and then push the recursion result into it again,
// it becomes too complicated to manipulate this data later in userFetchEpic
function count(data) {
  return data.pipe(
    map(item => {
      if (item.responseLength === 5) {
        count(getUsers(5));
      }
      return {type: "TEST", item};
    })
  );
}
function userFetchEpic(action$) {
  return action$
    .pipe(
      ofType(USER_FETCH),
      mergeMap(() => {
        return count(getUsers()).pipe(
          map(i => i)
        );
      })
    );
}
My code is here just to show my way of thinking.
The main problem is the recursion and how to save all the values together. If I save the values in an array, I then need to loop through an array of observables, and that sounds complicated in my head. :)
This problem probably has a much easier and better solution.
19 Users with 4 concurrent calls
I've re-arranged your get-users function to generate 4 Ajax calls, run them all concurrently, then flatten the result into one array. This should get you all 19 users in a single array.
function getUsers() {
  return forkJoin(
    [0, 5, 10, 15].map(position =>
      ajax.getJSON(`${API}/users?_start=${position}&_limit=5`)
    )
  ).pipe(
    map(resArray => resArray.flatMap(res => res.data))
  );
}

function userFetchEpic(action$) {
  return action$.pipe(
    ofType(USER_FETCH),
    mergeMap(_ => getUsers())
  );
}
Generalize: Recursively get users
This, again, will return all 19 users, but this time you don't need to know that you have 19 users ahead of time. On the other hand, this makes all its calls in sequence, so I would expect it to be slower.
You'll notice this one is done recursively. You are creating a call stack this way, but so long as you don't have millions of users, it shouldn't be a problem.
function getUsers(position = 0) {
  return ajax.getJSON(`${API}/users?_start=${position}&_limit=5`).pipe(
    switchMap(({data}) =>
      data.length < 5 ?
        of(data) :
        getUsers(position + 5).pipe(
          map(recursiveData => data.concat(recursiveData))
        )
    )
  );
}

Resolve array of observables and append to a final array

I have an endpoint URL like http://site/api/myquery?start=&limit= which returns an array of strings.
If I call this endpoint as-is, the server hangs, since the full array of strings is huge.
I need to generate an array of observables with incremental start and limit parameters, resolve all of them either in sequence or in parallel, and then get a final observable which yields the true array of strings, obtained by merging all the sub-arrays returned by the inner observables.
How should I do that?
i.e. the array of observables would be something like
[
  httpClient.get('http://site/api/myquery?start=0&limit=1000'),
  httpClient.get('http://site/api/myquery?start=1000&limit=1000'),
  httpClient.get('http://site/api/myquery?start=2000&limit=1000'),
  ....
]
If you know the length before making all these queries, then you can create as many http-get Observables as you need and forkJoin them using a projection fn.
forkJoin will let you make parallel queries and then merge the results of those queries. Here's an example:
import { forkJoin } from 'rxjs';

// given we know the length:
const LENGTH = 500;
// we can pick an arbitrary page size
const PAGE_SIZE = 50;
// calculate the number of requests
const requestsCount = Math.ceil(LENGTH / PAGE_SIZE);
// generate the calculated number of requests
const requests = (new Array(requestsCount))
  .fill(void 0)
  .map((_, i) => {
    const start = i * PAGE_SIZE;
    return http.get(`http://site/api/myquery?start=${start}&limit=${PAGE_SIZE}`);
  });

forkJoin(
  requests,
  // projection fn:
  // merge all arrays into one
  // (suboptimal merging, just for example)
  (...results) => results.reduce((acc, curr) => [...acc, ...curr], [])
).subscribe(array => {
  console.log(array);
});
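Note: in recent RxJS versions the projection-function argument of forkJoin is deprecated; the same merging can be done with map inside a pipe, reusing the requests array from above:

import { forkJoin } from 'rxjs';
import { map } from 'rxjs/operators';

forkJoin(requests).pipe(
  // merge all result arrays into one
  map(results => results.reduce((acc, curr) => [...acc, ...curr], []))
).subscribe(array => {
  console.log(array);
});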
Check this forkJoin example for reference.
Hope this helps
In the case that you do not know the total number of items, you can do this using expand.
The following article gives a good introduction to expand and an explanation of how to use it for pagination.
https://ncjamieson.com/understanding-expand/
Something along the lines of the code below would work in your case, making the requests for each page in series.
const limit = 1000;
let currentStart = 0;

const getUrl = (start, limit) => `http://site/api/myquery?start=${start}&limit=${limit}`;

httpClient.get(getUrl(currentStart, limit)).pipe(
  expand(itemsArray => {
    if (itemsArray.length) {
      currentStart += limit;
      return httpClient.get(getUrl(currentStart, limit));
    }
    return empty();
  }),
  reduce((acc, value) => [...acc, ...value]),
).subscribe(itemsArray => {
  console.log(itemsArray);
});
This will log out the final array of items once the entire series of requests has been resolved.

How to combine a parent and a dependent child observable

There is a continuous stream of event objects which doesn't complete. Each event has bands. By subscribing to events you get an event with several properties, among them a property "bands" which stores an array of bandIds. With these ids you can get each band. (The stream of bands is continuous as well.)
Problem: In the end you'd like to have not just the bands, but a complete event object with the bandIds and the complete band objects.
// This is what I could come up with myself, but it seems pretty ugly.
getEvents().pipe(
  switchMap(event => {
    const band$Array = event.bands.map(bandId => getBand(bandId));
    return combineLatest(of(event), ...band$Array);
  }),
  map(combined => {
    const newEvent = combined[0];
    combined.forEach((x, i) => {
      if (i === 0) return;
      newEvent.bands = {...newEvent.bands, ...x};
    });
    return newEvent;
  })
)
Question: Please help me find a cleaner way to do this (and I'm not even sure if my attempt produces the intended result).
ACCEPTED ANSWER
getEvents().pipe(
  switchMap(event => {
    const band$Array = event.bands.map(bandId => getBand(bandId));
    return combineLatest(band$Array).pipe(
      map(bandArray => ({bandArray, event}))
    );
  })
)
ORIGINAL ANSWER
You may want to try something along these lines
getEvents().pipe(
  switchMap(event => {
    const band$Array = event.bands.map(bandId => getBand(bandId));
    return forkJoin(band$Array).pipe(
      map(bandArray => ({bandArray, event}))
    );
  })
)
The Observable returned by this transformation emits an object with 2 properties: bandArray, holding the array of bands retrieved with the getBand service, and event, which is the object emitted by the Observable returned by getEvents.
Consider also that you are using switchMap, which means that as soon as the Observable returned by getEvents emits, you switch to the latest emission and unsubscribe from anything that may be in flight at the moment. In other words, you can lose some events if the time required to execute the forkJoin is longer than the time between one emission of getEvents and the next.
If you do not want to lose anything, then you'd better use mergeMap rather than switchMap.
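For example, the same transformation with mergeMap, so that no in-flight forkJoin gets cancelled:

getEvents().pipe(
  mergeMap(event => {
    const band$Array = event.bands.map(bandId => getBand(bandId));
    return forkJoin(band$Array).pipe(
      map(bandArray => ({bandArray, event}))
    );
  })
)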
UPDATED ANSWER - The Band Observable does not complete
In this case I understand that getBand(bandId) returns an Observable which emits first when the back end is queried the first time and then when the band data in the back end changes.
If this is true, then you can consider something like this
getEvents().pipe(
  switchMap(event =>
    from(event.bands).pipe(
      // mergeMap (not switchMap) so that every band's stream stays subscribed
      mergeMap(bandId => getBand(bandId).pipe(
        map(bandData => ({event, bandData}))
      ))
    )
  )
)
This transformation produces an Observable which emits either any time a new event occurs or any time the data of a band changes.

Keep delaying the HTTP request while new params are arriving

Suppose we have a function getIds() which takes an array of some ids, like this:
getIds([4, 1, 32]);
This function delays the HTTP call for 100ms. But if, during those 100ms, the same function is called again:
getIds([1, 8, 5]);
it resets the 100ms timer and keeps merging the passed ids. It sends the HTTP request only if it hasn't been called by anyone for more than 100ms.
I am new to RxJS, and here's my attempt to solve this problem, but I have a feeling that there could be a better solution:
https://jsfiddle.net/iFadey/v3v3L0yd/2/
function getIds(ids) {
  let observable = getIds._observable,
      subject = getIds._subject;

  if (!observable) {
    subject = getIds._subject = new Rx.ReplaySubject();
    observable = getIds._observable = subject
      .distinct()
      .reduce((arr, id) => {
        arr.push(id);
        return arr;
      }, [])
      // Some HTTP GET request will go here
      // whose results may get flatMapped here
      .publish()
      .refCount();
  }

  ids.forEach((id) => {
    console.log(id);
    subject.next(id);
  });

  clearTimeout(getIds._timer);
  getIds._timer = setTimeout(() => {
    getIds._observable = null;
    getIds._subject = null;
    subject.complete();
  }, 100);

  return observable;
}

getIds([1, 2, 3])
  .subscribe((ids) => {
    console.log(ids);
  });

getIds([3, 4, 5])
  .subscribe((ids) => {
    console.log(ids);
  });
edit:
I am looking for an operator which behaves like debounce but without dropping previous values. Instead it must queue them.
I am not certain I've captured exactly which of the following you are looking for, so I will simply describe both. There are two "time-based patterns" that are most often suited to this type of problem in my experience:
debounce
rxmarbles url: http://rxmarbles.com/#debounce ; github doc
As it says in its documentation, it
Emits an item from the source Observable after a particular timespan
has passed without the Observable emitting any other items.
throttle
rxmarbles url: none yet ; github doc
Returns an Observable that emits only the first item emitted by the
source Observable during sequential time windows of a specified
duration.
Basically, if you would like to wait until the inputs have quieted for a certain period of time before taking action, you want to debounce. If you do not want to wait at all, but do not wish to make more than 1 query within a specific amount of time, you will want to throttle.
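For the queue-instead-of-drop behavior described in your edit, one common pattern is to buffer the source with a debounced copy of itself. A minimal sketch, where fetchIds is a hypothetical stand-in for the actual HTTP call:

import { Subject } from 'rxjs';
import { buffer, debounceTime, map, mergeMap, share } from 'rxjs/operators';

const ids$ = new Subject();

const results$ = ids$.pipe(
  // collect everything emitted since the last 100ms quiet period
  buffer(ids$.pipe(debounceTime(100))),
  // merge and dedupe the queued ids
  map(ids => [...new Set(ids)]),
  // fetchIds is hypothetical; substitute the real HTTP request
  mergeMap(ids => fetchIds(ids)),
  // share one pipeline across all callers
  share()
);

function getIds(ids) {
  ids.forEach(id => ids$.next(id));
  return results$;
}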
Hope it makes sense.
