RxJS parallel queue with concurrent workers? - parallel-processing

Let's say I want to download 10,000 files. I can easily build a queue of those 10,000 files (happy to take advice if any of this can be done better):
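import request from 'request-promise-native';
import { from, Observable } from 'rxjs';
const reqs: Observable<any>[] = [];
for (let i = 0; i < 10000; i++) {
  // request() starts the HTTP request and returns a promise, so each request
  // is already under way when it is pushed here, before anything subscribes.
  reqs.push(from(request(`http://bleh.com/${i}`)));
}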
Now I have an array of RxJS observables, created from promises, that represents my queue. The behavior I want is:
Issue three concurrent requests to the server.
When a request completes, fire a new one.
I could build a solution to this myself, but given things like the RxJS queue, which I've never used, I'm wondering what the most idiomatic RxJS way to do this is.

It sounds like you want an equivalent of forkJoin that supports a caller-specified maximum number of concurrent subscriptions.
It's possible to re-implement forkJoin using mergeMap and to expose the concurrent parameter, like this:
import { from, Observable } from "rxjs";
import { last, map, mergeMap, toArray } from "rxjs/operators";
export function forkJoinConcurrent<T>(
observables: Observable<T>[],
concurrent: number
): Observable<T[]> {
// Convert the array of observables to a higher-order observable:
return from(observables).pipe(
// Merge each of the observables in the higher-order observable
// into a single stream:
mergeMap((observable, observableIndex) => observable.pipe(
// Like forkJoin, we're interested only in the last value:
last(),
// Combine the value with the index so that the stream of merged
// values - which could be in any order - can be sorted to match
// the order of the source observables:
map(value => ({ index: observableIndex, value }))
), concurrent),
// Convert the stream of last values to an array:
toArray(),
// Sort the array of value/index pairs by index - so the value
// indices correspond to the source observable indices and then
// map the pair to the value:
map(pairs => pairs.sort((l, r) => l.index - r.index).map(pair => pair.value))
);
}
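The queue behavior comes from mergeMap's concurrent argument: mergeMap keeps at most that many inner observables subscribed and subscribes to the next buffered one only when an active one completes. The sketch below checks this deterministically under stated assumptions: five hand-completed Subjects stand in for real requests, and the active/maxActive counters are illustrative only, not part of forkJoinConcurrent.

```typescript
import { Subject, defer, from } from "rxjs";
import { mergeMap, tap, toArray } from "rxjs/operators";

// Hypothetical stand-ins for five requests: each is a Subject that we
// complete by hand, so the timing is fully deterministic.
const requests = Array.from({ length: 5 }, () => new Subject<number>());

let active = 0;    // inner observables currently subscribed
let maxActive = 0; // highest value `active` ever reached

const request$s = requests.map(subject =>
  // defer runs its factory only when mergeMap subscribes, i.e. when the
  // "request" would actually start.
  defer(() => {
    active++;
    maxActive = Math.max(maxActive, active);
    return subject;
  }).pipe(
    // tap's complete callback runs before mergeMap sees the completion.
    tap({ complete: () => active-- })
  )
);

let results: number[] = [];
from(request$s)
  .pipe(
    mergeMap(request$ => request$, 3), // at most 3 inner subscriptions at once
    toArray()
  )
  .subscribe(values => (results = values));

// Only the first three "requests" have started so far.
const activeBeforeAnyCompletes = active;

// Finish the requests in order; each completion frees a slot for the next one.
requests.forEach((subject, i) => {
  subject.next(i);
  subject.complete();
});
```

Replacing the Subjects with deferred HTTP calls gives the "three requests in flight, start a new one when one finishes" behavior the question asks for.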

I came across this same problem in 2021 and was able to build on @cartant's answer, so I thought I'd share:
index.ts
import request from 'request-promise-native';
import { from, defer } from "rxjs";
import { forkJoinConcurrent } from './forkJoinConcurrent';
const handleRequest = async (id: string) => await request.get(`http://bleh.com/${id}`, { json: true });
const ids: string[] = [...Array(10000).keys()].map((k: number) => k.toString());
const concurrent: number = 3;
/* use `defer` instead of `from` to generate the Observables.
`defer` uses a factory to generate the promise and it will execute
the factory only when it is subscribed to */
const observables = ids.map((id: string) => defer(() => from(handleRequest(id))))
forkJoinConcurrent<any>(observables, concurrent).subscribe(value => console.log(value));
forkJoinConcurrent.ts
import { from, Observable } from "rxjs";
import { last, map, mergeMap, toArray } from "rxjs/operators";
export function forkJoinConcurrent<T>(
observables: Observable<T>[],
concurrent: number
): Observable<T[]> {
// Convert the array of observables to a higher-order observable:
return from(observables).pipe(
// Merge each of the observables in the higher-order observable
// into a single stream:
mergeMap((observable, observableIndex) => observable.pipe(
// Like forkJoin, we're interested only in the last value:
last(),
// Combine the value with the index so that the stream of merged
// values - which could be in any order - can be sorted to match
// the order of the source observables:
map(value => ({ index: observableIndex, value }))
), concurrent),
// Convert the stream of last values to an array:
toArray(),
// Sort the array of value/index pairs by index - so the value
// indices correspond to the source observable indices and then
// map the pair to the value:
map(pairs => pairs.sort((l, r) => l.index - r.index).map(pair => pair.value))
);
}
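The defer comment in index.ts is the key difference from the question's code. Calling a promise-returning function such as request() starts the work right away, so wrapping the promise with from() cannot postpone it, and the concurrent limit would only throttle subscriptions to requests that are already running. defer postpones the call until subscription. A minimal sketch, with a hypothetical fakeRequest that only counts how many "requests" have been started:

```typescript
import { defer, from } from "rxjs";

let started = 0;

// Hypothetical stand-in for a promise-based request: calling it "starts"
// the request, which we record by incrementing a counter.
function fakeRequest(id: number): Promise<number> {
  started++;
  return Promise.resolve(id);
}

// from(promise): fakeRequest is called right here, before anyone subscribes.
const eager = [0, 1, 2].map(id => from(fakeRequest(id)));
const startedAfterEager = started;

started = 0;
// defer(factory): the factory, and therefore fakeRequest, runs only on subscribe.
const lazy = [0, 1, 2].map(id => defer(() => fakeRequest(id)));
const startedAfterLazy = started;

lazy[0].subscribe();
const startedAfterOneSubscribe = started;
```

Building the array with defer means forkJoinConcurrent's mergeMap decides when each request starts, not the loop that creates them.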

Related

How do I iterate over functions that return rxjs observables

I want to iterate over a series of asynchronous functions and stop iterating as soon as one of them returns false.
I'm new to rxjs and can't get the use-case below to work. I feel like I'm not understanding something fundamental. Can someone please point it out to me?
import { from, Observable, of } from "rxjs";
import { map, takeWhile } from "rxjs/operators";
function validateA(): Observable<any> {
// do stuff.
return of({ id: "A", result: true }); // hardcoding result for now
}
function validateB(): Observable<any> {
// do stuff
return of({ id: "B", result: true }); // hardcoding result for now
}
function validateC(): Observable<any> {
// do stuff
return of({ id: "C", result: false });// hardcoding result for now
}
from([validateA, validateB, validateC])
.pipe(
map(data => data()),
takeWhile(data => !!data.result)
)
.subscribe(data => console.log(`${data.id} passed!`));
https://stackblitz.com/edit/typescript-ub9c5r?file=index.ts&devtoolsheight=100
I would say that the core of your logic is right. What is missing is an RxJS peculiarity.
The solution could look something like this. The nuances are explained in the comments.
import { from } from "rxjs";
import { concatMap, takeWhile } from "rxjs/operators";
// Start from an array of functions and turn it into a stream with RxJS's from().
from([validateA, validateB, validateC])
  .pipe(
    // concatMap calls each function in turn. Each function returns an Observable,
    // and concatMap "flattens" it into the result stream: it subscribes to that
    // Observable, forwards the values it emits, and calls the next function only
    // after it completes. Because each function builds its Observable with of(),
    // that Observable emits exactly one value and then completes.
    concatMap(func => func()),
    // The stream now carries the values emitted by those Observables, and
    // takeWhile terminates it as soon as a result of false arrives.
    takeWhile(data => !!data.result)
  )
  .subscribe(data => console.log(`${data.id} passed!`));
The following seems to do the trick and calls functions lazily:
https://stackblitz.com/edit/typescript-9ystxv?file=index.ts
import { from, Observable, of } from "rxjs";
import { concatAll, find, map } from "rxjs/operators";
function validateA() {
console.log('validateA');
return of({ id: "A", result: false });
}
function validateB() {
console.log('validateB');
return of({ id: "B", result: true });
}
function validateC() {
console.log('validateC');
return of({ id: "C", result: false });
}
from([validateA, validateB, validateC])
.pipe(
map(validate => validate()),
concatAll(),
find(data => data.result)
)
.subscribe(data => console.log(`${data.id} passed!`));
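Both answers fix the original bug by flattening the returned Observable, but they differ in when the functions run. concatMap calls each function only after the previous inner Observable completes, whereas map + concatAll calls every function as soon as from() emits it and only delays the subscriptions. With of(), which emits synchronously, the two behave alike, but the difference shows up with asynchronous Observables. A sketch, assuming hypothetical validators that return hand-controlled Subjects in place of real async checks:

```typescript
import { Subject, from } from "rxjs";
import { concatAll, concatMap, map } from "rxjs/operators";

// Hypothetical validators: each records that it was called and returns a
// Subject, so its result arrives only when we push it by hand.
const calls: string[] = [];
const pending = [new Subject<boolean>(), new Subject<boolean>(), new Subject<boolean>()];
const validators = pending.map((subject, i) => () => {
  calls.push(`validate${i}`);
  return subject;
});

// map + concatAll: map calls every validator immediately;
// concatAll only delays *subscribing* to the returned Observables.
from(validators).pipe(map(validate => validate()), concatAll()).subscribe();
const callsWithMap = calls.length;

calls.length = 0;
// concatMap: a validator is called only after the previous inner Observable completes.
from(validators).pipe(concatMap(validate => validate())).subscribe();
const callsWithConcatMap = calls.length;

// Completing the first result lets concatMap call the next validator.
pending[0].complete();
const callsAfterFirstCompletes = calls.length;
```

If the validators have side effects ("do stuff"), concatMap is the variant that keeps them strictly one at a time.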

Wait for Observable to Complete Inside of Array.Map

I have an array of objects in which one of the keys includes a customer id.
const customerArray = [{ customerId: 123, ...}, { customerId: 456, ...}];
I want to iterate through this array and make an api call to get further details about this customer from a separate endpoint.
const mapped = customerArray
.map(customer => ({
customerId: customer.customerId,
rating: this.productService(customer.customerId)
.pipe(map(rating => rating))}));
My expectation is that I would then have an array that includes an object with the following shape:
{
customerId: number,
rating: number
}
Instead, I end up with:
{
customerId: number,
rating: Observable
}
My productService call returns an observable and is used elsewhere in the app successfully. I need my map to wait for the call for the rating key to complete before mapping the next item in the array.
If I understand it right, you have to iterate through an array, make an http request to an endpoint for each element of the array, and fill each element of the array with the data returned by the endpoint.
If this is the case, you can try mergeMap like this:
const myObs = from(customerArray).pipe(
mergeMap(customer => {
return this.productService(customer.customerId).pipe(
map(rating => ({customerId: customer.customerId, rating}))
)
})
)
If you subscribe to myObs you should get a stream of objects in the shape you are looking for, i.e.
{
customerId: number,
rating: number
}
mergeMap (also known as flatMap) flattens a stream of Observables. In other words, if iterating through an array produces one Observable per element, which is your case, mergeMap lets you extract the values emitted by those Observables.
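To end up with a single array rather than a stream of objects, one option is to follow mergeMap with toArray(), which emits once every inner Observable has completed. A sketch, assuming a hypothetical getRating in place of this.productService. Note that mergeMap emits in completion order, so with real HTTP calls the array order can differ from customerArray; concatMap or forkJoin preserve the input order.

```typescript
import { Observable, from, of } from "rxjs";
import { map, mergeMap, toArray } from "rxjs/operators";

interface CustomerRating {
  customerId: number;
  rating: number;
}

// Hypothetical stand-in for this.productService(customerId). A real HTTP call
// would be asynchronous; this one emits a made-up rating synchronously.
function getRating(customerId: number): Observable<number> {
  return of(customerId % 5);
}

const customerArray = [{ customerId: 123 }, { customerId: 456 }];

const ratings$: Observable<CustomerRating[]> = from(customerArray).pipe(
  // Subscribe to each customer's rating Observable and pair the result with the id.
  mergeMap(customer =>
    getRating(customer.customerId).pipe(
      map(rating => ({ customerId: customer.customerId, rating }))
    )
  ),
  // Collect every pair into one array once all inner Observables have completed.
  toArray()
);

let collected: CustomerRating[] = [];
ratings$.subscribe(result => (collected = result));
```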
I set up a StackBlitz that shows one way to manage it; here is the code so that it stays available:
import { Observable, of, from } from 'rxjs';
import { map, combineAll } from 'rxjs/operators';
const custArray = [{customerId: 1}, {customerId: 2}, {customerId: 3}];
// Stand-in for the real service call: emits the square of the id.
function mapSomeStuff(id: number): Observable<number> {
  return of(id * id);
}
function doProductStuff(custArr: Array<{customerId: number}>): Observable<Array<{ customerId: number, rating: number}>> {
  return from(custArr)
    .pipe(
      // For each customer, await the service's value; this emits one Promise per customer.
      map(async (cust) => ({
        customerId: cust.customerId,
        rating: await mapSomeStuff(cust.customerId).toPromise()
      })),
      // Combine all of the Promises into a single array once each has resolved.
      combineAll()
    );
}
doProductStuff(custArray).subscribe(x => console.log(x));
This turns the array into a stream and, for each customer, runs the service, converts the resulting observable to a promise to get its final value, and then combines all of those promises into a single observable that emits an array of the results. In the console output you should see [Object, Object, Object], and each Object has customerId and rating.
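In RxJS 7, combineAll is deprecated in favour of combineLatestAll, and toPromise in favour of firstValueFrom/lastValueFrom. A sketch of the same idea without the promise round-trip, reusing the answer's mapSomeStuff stub and swapping in forkJoin, which waits for every inner Observable to complete and emits their last values as one array in input order:

```typescript
import { Observable, forkJoin, of } from "rxjs";
import { map } from "rxjs/operators";

interface CustomerRating {
  customerId: number;
  rating: number;
}

const custArray = [{ customerId: 1 }, { customerId: 2 }, { customerId: 3 }];

// Same stub as in the answer: a "service" that emits the square of the id.
function mapSomeStuff(id: number): Observable<number> {
  return of(id * id);
}

function doProductStuff(custArr: Array<{ customerId: number }>): Observable<CustomerRating[]> {
  // One Observable per customer, each mapped to the { customerId, rating } shape.
  const perCustomer = custArr.map(cust =>
    mapSomeStuff(cust.customerId).pipe(
      map(rating => ({ customerId: cust.customerId, rating }))
    )
  );
  // forkJoin emits one array, ordered like custArr, after all of them complete.
  // (An empty custArr completes without emitting anything.)
  return forkJoin(perCustomer);
}

let output: CustomerRating[] = [];
doProductStuff(custArray).subscribe(x => (output = x));
```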

In RxJS, map not executing inner mapped observable

I'm new to RxJS. I'm trying to map a single-element stream to another stream that produces an array once all of the inner streams have finished loading. However, my inner observables don't seem to execute; they are just returned cold.
At a high level, I need to execute HTTP POSTs to upload a list of files (two arrays going to two different endpoints). Since the files are large, I emulate each upload with a 5-second delay. The requests need to run in parallel, but with at most X executing concurrently (here 2). This all needs to happen inside a pipe, and the pipe should let the stream continue only after all posts are complete.
https://stackblitz.com/edit/rxjs-pnwa1b
import { map, mapTo, mergeMap, mergeAll, delay, tap, catchError, toArray } from 'rxjs/operators';
import { interval, merge, forkJoin, of, from, range, Observable } from 'rxjs';
const single = "name";
const first = ["abc", "def"];
const second = of("ghi", "jkl", "mno");
of(single)
.pipe(tap(val => console.log(`emit:${val}`)))
.pipe(
map(claim =>
merge(
from(first).pipe(map(photo => of(photo).pipe(delay(5000)))),
from(second).pipe(map(video => of(video).pipe(delay(5000))))
)
.pipe(
mergeAll(2)
)
.pipe(tap(val => console.log(`emit:${val}`)))
.pipe(toArray())
.pipe(tap(val => console.log(`emit:${val}`)))
)
)
.pipe(
catchError(error => {
console.log("error");
return Observable.throw(error);
})
)
.subscribe(val => console.log(`final:${val}`));
An inner subscribe would not wait until they are complete. Using forkJoin would not allow me to limit the concurrent uploads. How can I accomplish this?
Update:
The answer by @dmcgrandle was very helpful and led me to make the changes below, which seem to be working:
import { map, mapTo, mergeMap, mergeAll, delay, tap, catchError, toArray } from 'rxjs/operators';
import { interval, merge, forkJoin, of, from, range, Observable, throwError } from 'rxjs';
const single = "name";
const first = ["abc", "def"];
const second = of("ghi", "jkl", "mno");
of(single)
.pipe(tap(val => console.log(`emit:${val}`)))
.pipe(
mergeMap(claim =>
merge(
from(first).pipe(map(photo => of(photo).pipe(delay(5000)).pipe(tap(val => console.log(`emit:${val}`))))),
from(second).pipe(map(video => of(video).pipe(delay(5000)).pipe(tap(val => console.log(`emit:${val}`)))))
)
),
mergeAll(2),
toArray()
)
.pipe(
catchError(error => {
console.log("error");
return throwError(error);
})
)
.subscribe(val => console.log(`final:${val}`));
If I'm understanding you correctly, then I think this is a solution. Your issue was with the first map, which doesn't perform an inner subscribe; it just transforms the stream into an Observable of Observables, which didn't seem to be what you wanted. I used mergeMap there instead.
Inside the from() calls I used concatMap, so that the posts for first and for second each run in order, with each one waiting for the previous one to complete before starting. Because the two streams are merged, at most two posts run at a time here, one per endpoint. I also set up postToEndpoint functions that return Observables, to be closer to what your actual code will probably look like.
StackBlitz Demo
code:
import { mergeMap, concatMap, delay, tap, catchError, toArray } from 'rxjs/operators';
import { merge, of, from, concat, throwError } from 'rxjs';
const single = "name";
const first = ["abc", "def"];
const second = of("ghi", "jkl", "mno");
const postToEndpoint1$ = photo => of(photo).pipe(
tap(data => console.log('start of postTo1 for photo:', photo)),
delay(5000),
tap(data => console.log('end of postTo1 for photo:', photo))
);
const postToEndpoint2$ = video => of(video).pipe(
tap(data => console.log('start of postTo2 for video:', video)),
delay(5000),
tap(data => console.log('end of postTo2 for video:', video))
);
of(single).pipe(
tap(val => console.log(`initial emit:${val}`)),
mergeMap(claim =>
merge(
from(first).pipe(concatMap(postToEndpoint1$)),
from(second).pipe(concatMap(postToEndpoint2$))
)
),
toArray(),
catchError(error => {
console.log("error");
return throwError(error);
})
).subscribe(val => console.log(`final:`, val));
I hope this helps.
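The difference described at the start of this answer (map only wraps the inner Observable, while mergeMap subscribes to it) can be checked in isolation. A minimal sketch, using a hypothetical upload$ that counts how many times it actually starts:

```typescript
import { Observable, defer, of } from "rxjs";
import { map, mergeMap } from "rxjs/operators";

let uploadsStarted = 0;

// Hypothetical upload: defer makes it cold, so it only "starts" on subscribe.
const upload$ = (name: string): Observable<string> =>
  defer(() => {
    uploadsStarted++;
    return of(`${name} uploaded`);
  });

// map: emits the inner Observable itself; nothing subscribes to it, so it never runs.
const viaMap: unknown[] = [];
of("name").pipe(map(upload$)).subscribe(value => viaMap.push(value));
const startedViaMap = uploadsStarted;

// mergeMap: subscribes to the inner Observable and emits its values instead.
const viaMergeMap: string[] = [];
of("name").pipe(mergeMap(upload$)).subscribe(value => viaMergeMap.push(value));
const startedViaMergeMap = uploadsStarted;
```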

RxJS: forkJoin mergeMap

I'm trying to make multiple HTTP requests and get the returned data in one object.
const pagesToFetch = [2,3]
const request$ = forkJoin(
from(pagesToFetch)
.pipe(
mergeMap(page => this.mockRemoteData(page)),
)
)
mockRemoteData() returns a simple Promise.
After the first Observable emits (the one created from the first entry of pagesToFetch), request$ completes and the second value is not included. How can I fix this?
You can turn each value in pagesToFetch into an Observable and then wait until all of them complete:
const observables = pagesToFetch.map(page => this.mockRemoteData(page));
forkJoin(observables)
.subscribe(...);
Or, if it's not that simple and pagesToFetch needs to be an Observable (for example, because you collect the URLs first), you could use something like this:
from(pagesToFetch)
.pipe(
toArray(),
mergeMap(pages => {
const observables = pages.map(page => this.mockRemoteData(page));
return forkJoin(observables);
}),
)
.subscribe(...);
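Why the original version loses values: forkJoin was given one Observable (the merged stream), not a list of them, so it waits for that single stream to complete and emits a one-element array holding only its last value. A sketch, assuming a hypothetical synchronous mockRemoteData that returns of(...) instead of a Promise:

```typescript
import { Observable, forkJoin, from, of } from "rxjs";
import { mergeMap } from "rxjs/operators";

// Hypothetical synchronous stand-in for mockRemoteData(page).
const mockRemoteData = (page: number): Observable<string> => of(`page ${page}`);
const pagesToFetch = [2, 3];

// Original shape: forkJoin receives ONE Observable (the merged stream), so it
// emits a single-element array holding that stream's last value.
let merged: string[] = [];
forkJoin(from(pagesToFetch).pipe(mergeMap(page => mockRemoteData(page))))
  .subscribe(values => (merged = values));

// Fixed shape: forkJoin receives an array with one Observable per page.
let perPage: string[] = [];
forkJoin(pagesToFetch.map(page => mockRemoteData(page)))
  .subscribe(values => (perPage = values));
```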
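Try something along these lines:
import { forkJoin } from 'rxjs';
// request1$ and request2$ are placeholders for your own HTTP request Observables (one per URL).
forkJoin([request1$, request2$]).subscribe({
  next: responses => {
    console.log(responses[0]);
    console.log(responses[1]);
  },
  error: error => console.log(error)
});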

rxjs - combining inner observables after filtering

I call a backend that responds with:
[
"https://some-url.com/someData1.json",
"https://some-url.com/someData2.json"
]
Each JSON file has the following schema:
{
"isValid": boolean,
"data": string
}
I want to get an array with the data of every response whose isValid is set to true.
backend.get(url)
.pipe(
mergeMap((urls: []) =>
urls.map((url: string) =>
backend.get(url)
.pipe(
filter(response => response.isValid),
map(response => response.data)
)
)
),
combineAll()
)
When both .json files have "isValid" set to true, I get an array with both data values.
But when one of them has "isValid" set to false, the observable never completes.
I could use mergeAll instead of combineAll, but then I receive a stream of single data values, not a collection of all of them.
Is there a better way to filter out an observable?
As you said, the inner observable never emits, because filter does not forward the only value that the backend.get observable ever emits. In that case, the operator subscribing to that observable (combineAll, in your case) never receives a value from it either, so it can never emit itself.
What I would do is move the filtering and mapping into combineAll by providing a project function, like this:
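backend.get(url)
  .pipe(
    // Emit one inner Observable per JSON URL; nothing is filtered yet,
    // so every inner Observable emits its response.
    mergeMap((urls: string[]) =>
      urls.map((url: string) => backend.get(url))
    ),
    // combineAll passes the latest value of each inner Observable to the project
    // function as separate arguments, so gather them with a rest parameter.
    combineAll((...responses: any[]) =>
      responses
        .filter(response => response.isValid)
        .map(response => response.data)
    )
  )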
See if that works for you ;)
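A self-contained sketch of the same idea (combine first, filter afterwards), assuming hypothetical getUrls/getJson stubs in place of backend.get, and using forkJoin, as the next answer does, rather than combineAll. Because nothing is filtered inside the inner Observables, each one emits exactly once, so forkJoin can emit even when some responses are invalid:

```typescript
import { Observable, forkJoin, of } from "rxjs";
import { map, mergeMap } from "rxjs/operators";

interface Res {
  isValid: boolean;
  data: string;
}

// Hypothetical in-memory stand-in for the backend; each call emits once and completes.
const responses: Record<string, Res> = {
  "https://some-url.com/someData1.json": { isValid: true, data: "first" },
  "https://some-url.com/someData2.json": { isValid: false, data: "second" },
};
const getJson = (url: string): Observable<Res> => of(responses[url]);
const getUrls = (): Observable<string[]> => of(Object.keys(responses));

let validData: string[] = [];
getUrls()
  .pipe(
    // Wait for every JSON file and emit all responses as one array.
    mergeMap(urls => forkJoin(urls.map(url => getJson(url)))),
    // Filter and project the combined array, not each inner stream.
    map(all => all.filter(res => res.isValid).map(res => res.data))
  )
  .subscribe(result => (validData = result));
```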
import { forkJoin, Observable } from 'rxjs';
import { map } from 'rxjs/operators';
interface IRes {
isValid: boolean;
data: string;
}
interface IResValid {
isValid: true;
data: string;
}
function isValid(data: IRes): data is IResValid {
return data.isValid;
}
const res1$: Observable<IRes> = backend.get(url1);
const res2$: Observable<IRes> = backend.get(url2);
// When all observables complete, emit the last emitted value from each.
forkJoin([res1$, res2$])
.pipe(map((results: IRes[]) => results.filter(isValid)))
.subscribe((results: IResValid[]) => console.log(results));

Resources