I'm trying to achieve something very similar to a buffer count. As values come through the pipe, bufferCount of course buffers them and sends them down in batches. I'd like something similar to this that will emit all remaining items if there are currently fewer than the buffer size in the stream.
It's a little confusing to word, so I'll provide an example with what I'm trying to achieve.
I have something adding items individually to a subject. Sometimes it'll add 1 item a minute, sometimes it'll add 1000 items in 1 second. I want to run a long-running process (~2 seconds) on batches of these items so as not to overload the server.
So for example, consider the timeline where P is processing
---A-----------B----------C---D--EFGHI------------------
|_( P(A) ) |_(P(B)) |_( P(C) ) |_(P([D, E, F, G, H, I]))
This way I can process the events in small or large batches depending on how many events are coming through, while ensuring the batches remain smaller than X.
I basically need to map all the individual emits into emits that contain chunks of 5 or fewer. As I pipe the events into a concatMap, events will start to stack up. I want to pick these stacked up events off in batches. How can I achieve this?
Here's a stackblitz with what I've got so far: https://stackblitz.com/edit/rxjs-iqwcbh?file=index.ts
Note how items 4 and 5 don't process until more come in and fill the buffer. Ideally, after 1,2,3 are processed, it'll pick 4,5 off the queue. Then when 6,7,8 come in, it'll process those.
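To pin down the behavior being asked for, here is a small library-free sketch that simulates it. It is only a model of the requirement, not an RxJS solution: the names simulateBatches and Arrival, and the idea of a single worker that takes up to maxBatch queued items each time it becomes free, are assumptions made for illustration.

```typescript
// Hypothetical model: each item has an arrival time in milliseconds.
interface Arrival<T> {
  time: number;
  value: T;
}

// Simulates one worker that, whenever it is free, takes up to `maxBatch`
// already-arrived items and spends `processMs` on them.
function simulateBatches<T>(arrivals: Arrival<T>[], maxBatch: number, processMs: number): T[][] {
  if (maxBatch < 1) throw new Error("maxBatch must be at least 1");
  let pending = [...arrivals];
  const batches: T[][] = [];
  let clock = 0;
  while (pending.length > 0) {
    // If nothing is queued yet, the idle worker waits for the next arrival.
    const earliest = pending.reduce((min, a) => Math.min(min, a.time), Infinity);
    clock = Math.max(clock, earliest);
    // Take whatever has arrived so far, capped at maxBatch.
    const batch = pending.filter(a => a.time <= clock).slice(0, maxBatch);
    pending = pending.filter(a => batch.indexOf(a) === -1);
    batches.push(batch.map(a => a.value));
    clock += processMs;
  }
  return batches;
}

// Items 1..5 arrive together, 6..8 arrive ten seconds later.
const early = [1, 2, 3, 4, 5].map(value => ({ time: 0, value }));
const late = [6, 7, 8].map(value => ({ time: 10000, value }));
const batches = simulateBatches([...early, ...late], 3, 2000);
// batches is [[1, 2, 3], [4, 5], [6, 7, 8]]
```

The middle batch is the interesting one: 4 and 5 are picked up as soon as the worker is free, without waiting for a third item.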
EDIT: today I learned that bufferTime has a maxBufferSize parameter, that will emit when the buffer reaches that size. Therefore, the original answer below isn't necessary, we can simply do this:
const stream$ = subject$.pipe(
  bufferTime(2000, null, 3), // <-- buffer emits every 2000ms OR when 3 items are collected
  filter(arr => !!arr.length) // don't emit empty arrays
);
StackBlitz
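As a rough illustration of the "time window OR max size" rule, the following library-free sketch replays timestamped arrivals through that rule. It is a simplified model rather than RxJS's implementation: bufferByTimeOrSize and Timed are hypothetical names, arrivals are assumed to be sorted by time, and the window is assumed to restart whenever a buffer is emitted.

```typescript
// Hypothetical model: each item has an arrival time in milliseconds.
interface Timed<T> {
  time: number;
  value: T;
}

function bufferByTimeOrSize<T>(arrivals: Timed<T>[], spanMs: number, maxSize: number): T[][] {
  if (spanMs <= 0 || maxSize < 1) throw new Error("spanMs must be > 0 and maxSize >= 1");
  const emitted: T[][] = [];
  let buffer: T[] = [];
  let windowStart = 0; // the model treats time 0 as the moment of subscription
  const flush = (at: number): void => {
    if (buffer.length > 0) emitted.push(buffer); // mirrors filter(arr => !!arr.length)
    buffer = [];
    windowStart = at;
  };
  for (const arrival of arrivals) {
    // Close every window that elapsed before this item arrived.
    while (arrival.time >= windowStart + spanMs) flush(windowStart + spanMs);
    buffer.push(arrival.value);
    // Size limit reached: emit immediately and start a fresh window.
    if (buffer.length >= maxSize) flush(arrival.time);
  }
  flush(windowStart); // emit whatever is left when the input ends
  return emitted;
}

const sample: Timed<number>[] = [
  ...[1, 2, 3, 4, 5].map(value => ({ time: 0, value })),
  ...[6, 7, 8].map(value => ({ time: 10000, value })),
];
const buffers = bufferByTimeOrSize(sample, 2000, 3);
// buffers is [[1, 2, 3], [4, 5], [6, 7, 8]]
```

In this model [1, 2, 3] is released by the size limit, [4, 5] by the 2000ms window closing, and [6, 7, 8] by the size limit again.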
ORIGINAL:
It sounds like you want a combination of bufferCount and bufferTime. In other words: "release the buffer when it reaches size X or after Y time has passed".
We can use the race operator, along with those other two to create an observable that emits when the buffer reaches the desired size OR after the duration has passed. We'll also need a little help from take and repeat:
const chunk$ = subject$.pipe(bufferCount(3));
const partial$ = subject$.pipe(
bufferTime(2000),
filter(arr => !!arr.length) // don't emit empty array
);
const stream$ = race([chunk$, partial$]).pipe(
take(1),
repeat()
);
Here we define stream$ to be the first to emit between chunk$ and partial$. However, race will only use the first source that emits, so we use take(1) and repeat to sort of "reset the race".
Then you can do your work with concatMap like this:
stream$.pipe(
concatMap(chunk => this.doWorkWithChunk(chunk))
);
Here's a working StackBlitz demo.
You may want to roll it into a custom operator, so you can simply do something like this:
const stream$ = subject$.pipe(
bufferCountTime(5, 2000)
);
The definition of bufferCountTime() could look like this:
function bufferCountTime<T>(count: number, time: number) {
  return (source$: Observable<T>) => {
    const chunk$ = source$.pipe(bufferCount(count));
    const partial$ = source$.pipe(
      bufferTime(time),
      filter((arr: T[]) => !!arr.length)
    );
    return race([chunk$, partial$]).pipe(
      take(1),
      repeat()
    );
  };
}
Another StackBlitz sample.
Since I noticed the use of forkJoin in your sample code, I can see you are sending a request to the server for each emission (I was originally under the impression that you were making only 1 call per batch with combined data).
In the case of sending one request per item the solution is much simpler!
There is no need to batch the emissions, you can simply use mergeMap and specify its concurrency parameter. This will limit the number of currently executing requests:
const stream$ = subject$.pipe(
mergeMap(val => doWork(val), 3), // 3 max concurrent requests
);
Here is a visual of what the output would look like when the subject rapidly emits:
Notice the work only starts for the first 3 items initially. Emissions after that are queued up and processed as the prior in flight items complete.
Here's a StackBlitz example of this behavior.
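To make the queueing behavior of a concurrency limit concrete, here is a small library-free sketch that computes when each task would start. It is a model only, not how mergeMap is implemented: scheduleWithLimit and Slot are hypothetical names, and all tasks are assumed to be requested at time 0 with known durations.

```typescript
// Hypothetical result shape: when a task starts and ends, in milliseconds.
interface Slot {
  start: number;
  end: number;
}

// Tasks start in order; at most `limit` of them may be in flight at once.
function scheduleWithLimit(durations: number[], limit: number): Slot[] {
  if (limit < 1) throw new Error("limit must be at least 1");
  const running: number[] = []; // end times of the tasks currently in flight
  const slots: Slot[] = [];
  for (const duration of durations) {
    let start = 0;
    if (running.length >= limit) {
      // All slots are busy: wait for the in-flight task that finishes first.
      const earliestEnd = Math.min(...running);
      running.splice(running.indexOf(earliestEnd), 1);
      start = earliestEnd;
    }
    const end = start + duration;
    running.push(end);
    slots.push({ start, end });
  }
  return slots;
}

const slots = scheduleWithLimit([2000, 2000, 2000, 2000, 2000], 3);
const starts = slots.map(slot => slot.start);
// starts is [0, 0, 0, 2000, 2000]
```

The first three tasks start immediately; the fourth and fifth start only once earlier ones have finished, which matches the queueing described above.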
TLDR;
A StackBlitz app with the solution can be found here.
Explanation
Here would be an approach:
const bufferLen = 3;
const count$ = subject.pipe(filter((_, idx) => (idx + 1) % bufferLen === 0));
const timeout$ = subject.pipe(
filter((_, idx) => idx === 0),
switchMapTo(timer(0))
);
subject
.pipe(
buffer(
merge(count$, timeout$).pipe(
take(1),
repeat()
)
),
concatMap(buffer => forkJoin(buffer.map(doWork)))
)
.subscribe(/* console.warn */);
/* Output:
Processing 1
Processing 2
Processing 3
Processed 1
Processed 2
Processed 3
Processing 4
Processing 5
Processed 4
Processed 5
Processing 6 <- after the `setTimeout`'s timer expires
Processing 7
Processing 8
Processed 6
Processed 7
Processed 8
*/
The idea was to still use the bufferCount's behavior when items come in synchronously, but, at the same time, detect when fewer items than the chosen bufferLen are in the buffer. I thought that this detection could be done using a timer(0), because it internally schedules a macrotask, so it is ensured that items emitted synchronously will be considered first.
However, there is no operator that exactly combines the logic delineated above. But it's important to keep in mind that we certainly want a behavior similar to the one the buffer operator provides. As in, we will for sure have something like subject.pipe(buffer(...)).
Let's see how we can achieve something similar to what bufferCount does, but without using bufferCount:
const bufferLen = 3;
const count$ = subject.pipe(filter((_, idx) => (idx + 1) % bufferLen === 0));
Given the above snippet, buffer(count$) and bufferCount(3) should give the same behavior.
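A library-free sketch can show why the index-based predicate closes a buffer on every third item. The helper below is hypothetical (bufferEveryNth is not an RxJS function); it applies the same (idx + 1) % bufferLen === 0 test to a plain array and reports what is still waiting in the buffer at the end.

```typescript
// Hypothetical helper: closes a buffer whenever the count$ predicate would fire.
function bufferEveryNth<T>(items: T[], bufferLen: number): { buffers: T[][]; pending: T[] } {
  const buffers: T[][] = [];
  let pending: T[] = [];
  items.forEach((item, idx) => {
    pending.push(item);
    // Same predicate as count$: true on the 3rd, 6th, 9th... item.
    if ((idx + 1) % bufferLen === 0) {
      buffers.push(pending);
      pending = [];
    }
  });
  return { buffers, pending };
}

const counted = bufferEveryNth([1, 2, 3, 4, 5, 6, 7, 8], 3);
// counted.buffers is [[1, 2, 3], [4, 5, 6]]
// counted.pending is [7, 8]
```

The leftover [7, 8] is exactly the case the count-based notifier cannot handle on its own, which is what the timer is for.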
Let's move now onto the detection part:
const timeout$ = subject.pipe(
filter((_, idx) => idx === 0),
switchMapTo(timer(0))
);
What it essentially does is to start a timer after the subject has emitted its first item. This will make more sense when we have more context:
subject
.pipe(
buffer(
merge(count$, timeout$).pipe(
take(1),
repeat()
)
),
concatMap(buffer => forkJoin(buffer.map(doWork)))
)
.subscribe(/* console.warn */);
By using merge(count$, timeout$), this is what we'd be saying: when the subject emits, start adding items to the buffer and, at the same time, start the timer. The timer is started too because it is used to determine if fewer items will be in the buffer.
Let's walk through the example provided in the StackBlitz app:
from([1, 2, 3, 4, 5])
.pipe(tap(i => subject.next(i)))
.subscribe();
// Then mimic some more items coming through a while later
setTimeout(() => {
subject.next(6);
subject.next(7);
subject.next(8);
}, 10000);
When 1 is emitted, it will be added to the buffer and the timer will start. Then 2 and 3 arrive immediately, so the accumulated values will be emitted.
Because we're also using take(1) and repeat(), the process will restart. Now, when 4 is emitted, it will be added to the buffer and the timer will start again. 5 arrives immediately, but only two items have been collected so far, which is fewer than the given buffer length. Since no third value arrives synchronously, the timer expires first, and the [4, 5] chunk is emitted. What happens with [6, 7, 8] is the same as what happened with [1, 2, 3].
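The walkthrough can be summarized with a simplified, library-free model: treat each group of synchronously emitted items as a burst, cut the burst into chunks of bufferLen, and let the remainder be flushed when the burst ends (standing in for the timer(0) firing). The function chunkBursts is a hypothetical name, and the model ignores real scheduling details.

```typescript
// Hypothetical model: each inner array is one burst of synchronously emitted items.
function chunkBursts<T>(bursts: T[][], bufferLen: number): T[][] {
  if (bufferLen < 1) throw new Error("bufferLen must be at least 1");
  const chunks: T[][] = [];
  for (const burst of bursts) {
    // Full chunks are released by the count; the final short chunk (if any)
    // is released when the burst ends, which models the timer expiring.
    for (let i = 0; i < burst.length; i += bufferLen) {
      chunks.push(burst.slice(i, i + bufferLen));
    }
  }
  return chunks;
}

const chunks = chunkBursts([[1, 2, 3, 4, 5], [6, 7, 8]], 3);
// chunks is [[1, 2, 3], [4, 5], [6, 7, 8]]
```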
Suppose I have some Observable which may have some arbitrarily long sequence of events at the time I subscribe to it but which may also continue to emit events after I subscribe.
I am interested only in those events from the time at which I subscribe and later. How do I just get the latest events?
In this example I use a ReplaySubject as an artificial source to illustrate the question. In practice this would be some arbitrary Observable.
let observable = ReplaySubject<Int>.createUnbounded()
observable.onNext(1)
observable.onNext(2)
observable.onNext(3)
observable.onNext(4)
_ = observable.subscribe(onNext: {
print($0)
})
observable.onNext(5)
observable.onNext(6)
observable.onNext(7)
Produces the output:
1
2
3
4
5
6
7
What I really want is only events from the time of subscription onwards. i.e. 4 5 6 7
I can use combineLatest with some other dummy Observable:
let observable = ReplaySubject<Int>.createUnbounded()
observable.onNext(1)
observable.onNext(2)
observable.onNext(3)
observable.onNext(4)
_ = Observable.combineLatest(observable, Observable<Int>.just(42)) { value, _ in value }
.subscribe(onNext: {
print($0)
})
observable.onNext(5)
observable.onNext(6)
observable.onNext(7)
which produces the desired output 4 5 6 7
How can I produce a similar result without artificially introducing another arbitrary Observable?
I have tried a number of things including combineLatest with an array consisting of just one observable, but that emits the complete sequence, not just the latest. I know I could use PublishSubject but I am just using ReplaySubject here as an illustration.
By default, an observable will call its generator for every subscriber and emit all of the values produced by that generator. So for example:
let obs = Observable<Int>.create { observer in
    for each in [1, 2, 3, 5, 7, 11] {
        observer.onNext(each)
    }
    observer.onCompleted()
    return Disposables.create() // the create closure must return a Disposable
}
(Note that the above is the implementation of Observable.from(_:))
Every time something subscribes to obs the closure is called and all 6 next events will be received. This is what's known as a "cold" observable, and again it's the default behavior. Assume an Observable is cold unless you know otherwise.
There is also the concept of a "hot" observable. A hot observable doesn't call its generator function when something subscribes to it.
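The difference can be sketched in a few lines without any Rx library. This is a TypeScript model rather than RxSwift code, and the names cold, Hot and Observer are hypothetical: the cold source re-runs its generator for each subscriber, while the hot source only forwards values to whoever is subscribed at the moment a value is emitted.

```typescript
type Observer<T> = (value: T) => void;

// Cold: the "generator" (here, iterating the array) runs again for every subscriber.
function cold<T>(values: T[]) {
  return {
    subscribe: (observer: Observer<T>): void => {
      values.forEach(value => observer(value));
    },
  };
}

// Hot: one shared source; subscribing does not trigger any emission.
class Hot<T> {
  private observers: Observer<T>[] = [];
  subscribe(observer: Observer<T>): void {
    this.observers.push(observer);
  }
  next(value: T): void {
    this.observers.forEach(observer => observer(value));
  }
}

const coldLog: string[] = [];
const coldSource = cold([1, 2, 3]);
coldSource.subscribe(v => coldLog.push(`first:${v}`));
coldSource.subscribe(v => coldLog.push(`second:${v}`));
// coldLog is ["first:1", "first:2", "first:3", "second:1", "second:2", "second:3"]

const hotLog: string[] = [];
const hotSource = new Hot<number>();
hotSource.next(1); // nobody is subscribed yet, so this value is lost
hotSource.subscribe(v => hotLog.push(`first:${v}`));
hotSource.next(2);
hotSource.subscribe(v => hotLog.push(`second:${v}`));
hotSource.next(3);
// hotLog is ["first:2", "first:3", "second:3"]
```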
Based on your question, and your subsequent comment, it sounds like you want to know how to make a cold observable hot... The fundamental way is by calling .multicast on it (or one of the operators that use its implementation like publish(), replay(_:) or replayAll().) There is also a special purpose operator called .share() that will "heat up" an observable and keep it hot until all subscribers unsubscribe from it (then it will be cold again.) And of course, Subjects are considered hot because they don't have a generator function to call.
Note, however, that many observables have synchronous behavior. This means that they will emit all their values as soon as something subscribes and thus will have already completed before any other observer (on that thread) has a chance to subscribe.
Some more examples... .interval(_:scheduler:) is a cold observable with async behavior. Let's say you have the following:
let i = Observable<Int>.interval(.seconds(3), scheduler: MainScheduler.instance)
i.subscribe(onNext: { print($0, "from first") })
DispatchQueue.main.asyncAfter(deadline: .now() + 5) {
i.subscribe(onNext: { print($0, "from second") })
}
What you will find is that each observer will get its own independent stream of values (both will start with 0) because the generator inside interval is called for both observers. So you will see output like:
0 from first
1 from first
0 from second
2 from first
1 from second
3 from first
2 from second
If you multicast the interval you will see different behavior:
let i = Observable<Int>.interval(.seconds(3), scheduler: MainScheduler.instance)
.publish()
i.subscribe(onNext: { print($0, "from first") })
i.connect()
DispatchQueue.main.asyncAfter(deadline: .now() + 5) {
i.subscribe(onNext: { print($0, "from second") })
}
The above will produce:
0 from first
1 from first
1 from second
2 from first
2 from second
3 from first
3 from second
(Note that "second" started with 1 instead of 0.) The share operator will work the same way in this case except you don't have to call connect() because it does it automatically.
Lastly, watch out. If you publish a synchronous observable, you might not get what you expect:
let i = Observable.from([1, 2, 3, 5])
.publish()
i.subscribe(onNext: { print($0, "from first") })
i.connect()
i.subscribe(onNext: { print($0, "from second") })
produces:
1 from first
2 from first
3 from first
5 from first
Because all 5 events (the four next events and the completed event) emit as soon as connect() is called before the second observer gets a chance to subscribe.
An article that might help you is Hot and Cold Observables but it's pretty advanced...
Why not simply use a PublishSubject like this? Isn't this the desired output? A PublishSubject only emits elements to an observer after that observer has subscribed. That's the whole purpose of it.
let observable = PublishSubject<Int>()
observable.onNext(1)
observable.onNext(2)
observable.onNext(3)
observable.onNext(4)
_ = observable.subscribe(onNext: {
print($0)
})
observable.onNext(5)
observable.onNext(6)
observable.onNext(7)
If you don't want to use a subject, you can share the observable and add a second subscriber like this:
let observable = ReplaySubject<Int>.createUnbounded()
observable.onNext(1)
observable.onNext(2)
observable.onNext(3)
observable.onNext(4)
let shared = observable.share()
// this will print full sequence
shared.subscribe(onNext: {
print("full sequence: \($0)")
}).disposed(by: disposeBag)
// this will only print new events
shared.subscribe(onNext: {
print("shared sequence: \($0)")
}).disposed(by: disposeBag)
// new events
observable.onNext(5)
observable.onNext(6)
observable.onNext(7)
Observables are lazy, push-driven sequences. Without your first subscription, the stream won't even start. Once it has started and is shared, later subscribers receive only the new events.
I have a list of observables I'd like to fire 5 at a time. I've tried using a mergeMap, but clearly I'm using it wrong:
// obsArray defined above... is an Array of Observables... about 30 of them
of(obsArray).pipe(
mergeMap(x => x, 5)
).subscribe();
The issue is that x in the mergeMap is the entire observable list. How do I send 5 at a time to be fired (they are http calls)?
Use from to emit single items from an array. You can also replace mergeMap(x => x) with mergeAll.
from(obsArray).pipe(mergeAll(5))
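The difference between the two creation functions can be illustrated with a library-free sketch. The helpers emissionsOfModel and emissionsFromModel are hypothetical stand-ins, not RxJS APIs: they simply return the list of values that would be emitted, so the "whole array as one value" problem becomes visible.

```typescript
// Model of `of`: every argument becomes one emission, even if it is an array.
function emissionsOfModel<T>(...args: T[]): T[] {
  return args;
}

// Model of `from` for an array input: every element becomes one emission.
function emissionsFromModel<T>(input: T[]): T[] {
  return input.slice();
}

const requests = ["req1", "req2", "req3"]; // placeholder names for the HTTP observables
const viaOf = emissionsOfModel(requests);
const viaFrom = emissionsFromModel(requests);
// viaOf is [["req1", "req2", "req3"]]  (one emission: the entire list)
// viaFrom is ["req1", "req2", "req3"]  (three emissions: one per item)
```

Only the second form gives the concurrency limit individual items to count.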
Given such a convenient method as .startWith, it would make sense to have its opposite, .endWith, which makes the observable yield a value when it completes.
I have come up with this solution, but is there anything better? This thing gets a bit hard to read for what it is.
source.concat(Rx.Observable.just(lastValue))
There is in RxJS6 (no clue when it was added to be honest)
Documentation:
https://rxjs-dev.firebaseapp.com/api/operators/endWith
Source: https://github.com/ReactiveX/rxjs/blob/df0ea7c78767c07a6ed839608af5a7bb4cefbde5/src/internal/operators/endWith.ts
Also, defaultIfEmpty() only emits a value if the observable CLOSES without emitting a value. It's a subtle, yet not so subtle distinction. It may have the same effect as endWith() in limited situations.
Example of endWith():
const source = of(1, 2, 3, 4, 5);
const example = source.pipe(
takeWhile(val => val != 4),
endWith(4));
Emits:
[1, 2, 3, 4]
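The distinction between the two operators can be modeled on plain arrays, where each array stands for the values a source emitted before completing. The helpers endWithModel and defaultIfEmptyModel are hypothetical names for this sketch, not the RxJS operators themselves.

```typescript
// Always appends the final value once the source has completed.
function endWithModel<T>(emitted: T[], last: T): T[] {
  return [...emitted, last];
}

// Supplies the fallback only if the source completed without emitting anything.
function defaultIfEmptyModel<T>(emitted: T[], fallback: T): T[] {
  return emitted.length === 0 ? [fallback] : emitted;
}

const nonEmptyEnd = endWithModel([1, 2, 3], 4);            // [1, 2, 3, 4]
const nonEmptyDefault = defaultIfEmptyModel([1, 2, 3], 4); // [1, 2, 3]
const emptyEnd = endWithModel<number>([], 4);              // [4]
const emptyDefault = defaultIfEmptyModel<number>([], 4);   // [4]
```

The two only agree when the source completes without emitting, which is the limited situation mentioned above.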
Also I'm noticing that the https://learnrxjs.io website is increasingly out of date, and currently doesn't show this operator.
Why did I need it?
I was looking for the ability to emit false until a condition became true, but never go back to false. So slightly similar to debouncing, but not quite.
Given a stream that pushes items into Observable A.
How to notify subscribers when there are exactly 3 items pushed in A?
To visualize it:
--o--i--o------i---------o----i------|---> A
I need to wait for all i to arrive even though there are other items present such as o.
And if exactly 3 i's are not pushed within a timeframe, fire an error so the procedure can be redone.
Thanks.
I'd use filter() to filter only i characters, count() to count all items emitted until the source completes and concatMap() to decide whether the number of items is correct and eventually throw an error.
import {Observable} from 'rxjs';

Observable.from(['o', 'i', 'o', 'i', 'o', 'i'/*, 'i'*/])
  .filter(val => val == 'i')
  .count()
  .concatMap(count => count == 3 ? Observable.of(count) : Observable.throw('Expected 3 values'))
  .subscribe(
    count => console.log(count),
    err => console.log(err)
  );
Which prints to console:
3
And if you add one more i to the source Observable the subscriber receives an error:
Expected 3 values
See live demo: http://plnkr.co/edit/VkQeGpY2gDj3gFhXD2uR?p=preview
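The filter, count and check steps can also be sketched on a plain array, which may help when reasoning about the logic separately from the stream. The function expectExactly is a hypothetical helper, and the array stands for everything the source emitted before completing.

```typescript
// Counts occurrences of `wanted` and fails unless there are exactly `expected`.
function expectExactly<T>(items: T[], wanted: T, expected: number): number {
  const count = items.filter(item => item === wanted).length; // filter() + count()
  if (count !== expected) {
    throw new Error(`Expected ${expected} values`); // the error branch of concatMap()
  }
  return count;
}

const okCount = expectExactly(["o", "i", "o", "i", "o", "i"], "i", 3);
// okCount is 3

let failure = "";
try {
  expectExactly(["o", "i", "o", "i", "o", "i", "i"], "i", 3);
} catch (err) {
  failure = (err as Error).message;
}
// failure is "Expected 3 values"
```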
If you want to resubscribe when an error happens, use retry() or retryWhen(). This other demo fails first because it contains only 2 i's, but it resubscribes with retry() and then succeeds.
Here's a more complicated demo: http://plnkr.co/edit/ID5WTVYGYZuhEJ7fXn1F?p=preview
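The retry idea can be modeled without a stream library as "run the attempt again if it throws". In this sketch withRetry, countOrThrow and the attempts data are all hypothetical; each inner array stands for what the source emits on one subscription.

```typescript
// Counts occurrences of `wanted` and throws unless there are exactly `expected`.
function countOrThrow(items: string[], wanted: string, expected: number): number {
  const count = items.filter(item => item === wanted).length;
  if (count !== expected) throw new Error(`Expected ${expected} values`);
  return count;
}

// Re-runs `attempt` after a failure, up to `retries` extra times.
function withRetry<R>(attempt: (attemptNo: number) => R, retries: number): R {
  let lastError: unknown;
  for (let attemptNo = 0; attemptNo <= retries; attemptNo++) {
    try {
      return attempt(attemptNo);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}

// The first "subscription" sees only two i's, the second sees three.
const attempts: string[][] = [
  ["o", "i", "i"],
  ["i", "o", "i", "i"],
];
const retried = withRetry(n => countOrThrow(attempts[n] ?? [], "i", 3), 1);
// retried is 3
```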