I'm trying to group the values from an observable into arrays of size n, so I can batch-send them to a service and improve overall performance.
The thing is that I want to make sure that when fewer than n items are left, they are still passed down the chain after a certain timeout.
I'm trying to rewrite the C# solution from
https://stackoverflow.com/a/22873833/2157455
in Javascript.
The main problem is that in RxJS lots of methods have been deprecated and it's hard to find the new functions.
var people = new List<(string name, int age)>
{
    ("Sue", 25),
    ("Joe", 30),
    ("Frank", 25),
    ("Sarah", 35),
    ("John", 37)
}.ToObservable();

var buffers = people
    .GroupByUntil(
        // yes. yes. all items belong to the same group.
        x => true,
        g => Observable.Amb(
            // close the group after 5 seconds of inactivity
            g.Throttle(TimeSpan.FromSeconds(5)),
            // close the group after 10 items
            g.Skip(1)
        ))
    // Turn those groups into buffers
    .SelectMany(x => x.ToArray());
I could get this far, but I can't find the replacement for groupByUntil, and I'm not sure what the selectMany equivalent is in RxJS (probably toArray()?).
Most examples I find use deprecated or non-existent functions.
I'm using RxJS 7.8.0.
The syntax doesn't help either; using pipe all the time makes the code difficult to read, in my opinion.
const people = [
  { name: 'Sue', age: 25 },
  { name: 'Joe', age: 30 },
  { name: 'Frank', age: 25 },
  { name: 'Sarah', age: 35 },
  { name: 'John', age: 37 }
];

const source = from(people);

const example = source.pipe(
  groupBy(person => true),
  mergeMap(group => group.pipe(
    raceWith(
      group.pipe(throttle(() => interval(1000))),
      group.pipe(skip(2))
    ),
    toArray()
  ))
);

example.forEach(x => console.log(x.length));
I'm getting all 5, instead of two arrays, one with 3 and the other with 2.
Perhaps there is a better way to write it in JS, but I can't see the replacement for groupByUntil.
Thanks.
bufferTime is probably what you are looking for.
One of its signatures is:
bufferTime(bufferTimeSpan: number, bufferCreationInterval: number, maxBufferSize: number, scheduler?: SchedulerLike): OperatorFunction<T, T[]>
So with bufferTime(1000, null, 2) you get a buffer of length 2, or one every 1 s, whichever comes first.
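For illustration, here is a minimal sketch (not from the original answer) of how that could be applied to the people array from the question; the 300 ms pacing via concatMap and delay is only an assumption to make the batching visible:

import { from, of, concatMap, delay, bufferTime } from 'rxjs';

const people = [
  { name: 'Sue', age: 25 },
  { name: 'Joe', age: 30 },
  { name: 'Frank', age: 25 },
  { name: 'Sarah', age: 35 },
  { name: 'John', age: 37 }
];

// Space the emissions out so the buffering is observable.
const source = from(people).pipe(
  concatMap(p => of(p).pipe(delay(300)))
);

// Emit a buffer whenever 2 items have been collected or 1 second has
// elapsed, whichever happens first.
source.pipe(bufferTime(1000, null, 2))
  .subscribe(batch => console.log(batch.map(p => p.name)));

Whichever limit is hit first (two items or the 1 s window) flushes the current batch downstream as an array.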
I have the following, and it does work: it keeps increasing the delay and eventually times out, which is what I wanted.
But because I am using concatMap, I lose the original value from the interval.
let x = 1

let source2$ = interval(500)
  .pipe(
    concatMap(() => {
      x++
      let newtime = x * 500
      console.log("newtime ", newtime)
      return of(5).pipe(delay(newtime))
    }),
    timeout(3000),
    map((data) => {
      return 'Source 2: ' + data
    })
  )
So it prints "Source 2: 5", whereas I want it to print the value of the interval.
I got what I wanted working using concatMap, but I think it's the wrong operator, as I lose the original value.
Can somebody help?
More info
To summarize, all I would like to do is emit values using the interval and, after each emission, increase the delay time - eventually it hits the timeout of 3000 ms and errors out.
I've mentioned in the comments that you can use concatMap for this, which receives the ever-increasing index from interval:
concatMap(index => {
  let newtime = index * 500
  console.log("newtime ", newtime)
  return of(index).pipe(delay(newtime))
}),
Notice that I'm returning the value back to the stream with of(index).
I think I understand your concern about returning another Observable. Since you want to emit items in sequence (emit one only after the previous one completes), you have to use concatMap with another inner Observable. There isn't a special operator just for this functionality, because this is "composable behavior": you can achieve it by combining existing operators.
const source2$ = interval(500)
  .pipe(
    map(x => x * 500),
    switchMap(x => timer(x)),
    timeout(3000),
    map(data => 'Source 2: ' + data)
  )
UPDATE:
https://stackblitz.com/edit/rxjs-iywcm6?devtoolsheight=60
const source2$ = interval(500)
  .pipe(
    tap(x => console.log('Tick before delay', x)),
    concatMap(x => timer((x + 1) * 500).pipe(mapTo(x))),
    tap(x => console.log('Tick after delay', x)),
    map(data => 'Source 2: ' + data),
    timeout(3000)
  )
  .subscribe(
    data => console.log(data),
    e => console.error('Timeout', e)
  )
I'm trying to fit a model using fitDataset(). I can train using the "normal" approach, with a for loop and random batches of data (20000 data points).
I'd like to use fitDataset() and be able to use the entire dataset rather than rely on the "randomness" of my getBatch function.
I'm getting closer, using the API docs and the example on tfjs-data, but I'm stuck on what is probably a dumb data-manipulation issue...
So here's how I'm doing it:
const [trainX, trainY] = await bigData
const model = await cnnLSTM // gru performing well
const BATCH_SIZE = 32
const dataSet = flattenDataset(trainX.slice(200), trainY.slice(200))

model.compile({
  loss: 'categoricalCrossentropy',
  optimizer: tf.train.adam(0.001),
  metrics: ['accuracy']
})

await model.fitDataset(dataSet.train.batch(32), {
  epochs: C.trainSteps,
  validationData: dataSet.validation,
  callbacks: {
    onBatchEnd: async (batch, logs) => (await tf.nextFrame()),
    onEpochEnd: (epoch, logs) => {
      let i = epoch + 1
      lossValues.push({'epoch': i, 'loss': logs.loss, 'val_loss': logs.val_loss, 'set': 'train'})
      accuracyValues.push({'epoch': i, 'accuracy': logs.acc, 'val_accuracy': logs.val_acc, 'set': 'train'})
      // await md `${await plotLosses(train.lossValues)} ${await plotAccuracy(train.accuracyValues)}`
    }
  }
})
Here's my interpretation of the dataset creation:
flattenDataset = (features, labels, split = 0.35) => {
  return tf.tidy(() => {
    let slice = features.length - Math.floor(features.length * split)
    const featuresTrain = features.slice(0, slice)
    const featuresVal = features.slice(slice)
    const labelsTrain = labels.slice(0, slice)
    const labelsVal = labels.slice(slice)
    const data = {
      train: tf.data.array(featuresTrain, labelsTrain),
      validation: tf.data.array(featuresVal, labelsVal)
    }
    return data
  })
}
I'm getting an error:
Error: Dataset iterator for fitDataset() is expected to generate an Array of length 2: `[xs, ys]`, but instead generates Tensor
[[0.4106583, 0.5408, 0.4885066, 0.9021732, 0.1278526],
[0.3711334, 0.5141, 0.4848816, 0.9021571, 0.2688071],
[0.4336613, 0.5747, 0.4822159, 0.9021728, 0.3694479],
...,
[0.4123166, 0.4553, 0.478438 , 0.9020132, 0.8797594],
[0.3963479, 0.3714, 0.4871198, 0.901996 , 0.7170534],
[0.4832076, 0.3557, 0.4892016, 0.9019232, 0.9999322]],Tensor
[[0.3711334, 0.5141, 0.4848816, 0.9021571, 0.2688071],
[0.4336613, 0.5747, 0.4822159, 0.9021728, 0.3694479],
[0.4140858, 0.5985, 0.4789927, 0.9022084, 0.1912155],
...,
The input data is 6 timesteps with 5 dimensions, and the labels are just one-hot encoded classes: [0,0,1], [0,1,0] and [1,0,0]. I guess flattenDataset() is not sending the data in the correct way.
Does data.train need to output [6 timesteps with 5 dims, label] for each data point? I got this error when I tried that:
Error: The feature data generated by the dataset lacks the required input key 'conv1d_Conv1D5_input'.
Could really use some pro insight...
--------------------
Edit #1:
I feel I'm close to an answer.
const X = tf.data.array(trainX.slice(0, 100)) //.map(x => x)
const Y = tf.data.array(trainY.slice(0, 100)) //.map(x => x)
const zip = tf.data.zip([X, Y])
const dataSet = {
  train: zip
}
dataSet.train.forEach(x => console.log(x))
With this I get on the console:
[Array(6), Array(3)]
[Array(6), Array(3)]
[Array(6), Array(3)]
...
[Array(6), Array(3)]
[Array(6), Array(3)]
but the fitDataset is giving me: Error: The feature data generated by the dataset lacks the required input key 'conv1d_Conv1D5_input'.
My model looks like this:
const model = tf.sequential()

model.add(tf.layers.conv1d({
  inputShape: [6, 5],
  kernelSize: 3,
  filters: 64,
  strides: 1,
  padding: 'same',
  activation: 'elu',
  kernelInitializer: 'varianceScaling',
}))
model.add(tf.layers.maxPooling1d({poolSize: 2}))
model.add(tf.layers.conv1d({
  kernelSize: 1,
  filters: 64,
  strides: 1,
  padding: 'same',
  activation: 'elu'
}))
model.add(tf.layers.maxPooling1d({poolSize: 2}))
model.add(tf.layers.lstm({
  units: 18,
  activation: 'elu'
}))
model.add(tf.layers.dense({units: 3, activation: 'softmax'}))

model.compile({
  loss: 'categoricalCrossentropy',
  optimizer: tf.train.adam(0.001),
  metrics: ['accuracy']
})

return model
What is wrong here?
What model.fitDataset expects is a Dataset where each element is a tuple of two items: [feature, label].
So in your case, you need to create a featureDataset and a labelDataset, then merge them with tf.data.zip to create the trainDataset. Do the same for the validation dataset.
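As a rough sketch of that idea (this is not code from the original answer; featuresTrain, labelsTrain, featuresVal and labelsVal are the variable names from the question's flattenDataset, reused here as assumptions):

// Build one dataset per component, then zip them into [feature, label] pairs.
const featureDataset = tf.data.array(featuresTrain)  // each element: a 6 x 5 array
const labelDataset = tf.data.array(labelsTrain)      // each element: a one-hot [3] array
const trainDataset = tf.data.zip([featureDataset, labelDataset]).batch(32)

// Same pattern for the validation split.
const validationDataset = tf.data
  .zip([tf.data.array(featuresVal), tf.data.array(labelsVal)])
  .batch(32)

await model.fitDataset(trainDataset, {
  epochs: 10,
  validationData: validationDataset
})

Zipping keeps each feature paired with its label, and batch() is then applied to the zipped pairs.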
Solved it
So after a lot of trial and error I found a way to make it work.
I had an input shape of [6, 5], meaning an array with 6 arrays of 5 floats each.
[[[0.3467378, 0.3737, 0.4781905, 0.90665, 0.68142351],
[0.44003019602788285, 0.3106, 0.4864576, 0.90193448, 0.5841830879700972],
[0.30672944860847245, 0.3404, 0.490295674, 0.90720676, 0.8331748581920732],
[0.37475716007758336, 0.265, 0.4847249, 0.902056932, 0.6611207914113887],
[0.5639427928616854, 0.2423002, 0.483168235, 0.9020202294447865, 0.82823],
[0.41581425627336555, 0.4086, 0.4721923, 0.902094287, 0.914699]], ... 20k more]
What I did was flatten the array so it became an array of 5-dimension arrays, then apply .batch(6) to it.
const BATCH_SIZE = 20 // batch size fed to the NN

const X = tf.data.array([].concat(...trainX)).batch(6).batch(BATCH_SIZE)
const Y = tf.data.array(trainY).batch(BATCH_SIZE)
const zip = tf.data.zip([X, Y])

const dataSet = {
  train: zip
}
Hope it can help others on complex data!!
Silly question that has me stumped. I want to give a different debounceTime based on data in the stream. I have:
const fakeData = [{number: 1}, {number: 2}, {number: 3}];
const stream$ = Rx.Observable.from(fakeData);
const delayedStream$ = stream$.concatMap(x => Rx.Observable.of(x).delay(300));

delayedStream$
  .concatMap(x => x.number >= 2
    ? Rx.Observable.of(x).debounceTime(500)
    : Rx.Observable.of(x).debounceTime(1000)
  )
  .subscribe(x => console.log(x));

// expected output: 3
// actual output: 1 ... 2 ... 3 |
http://jsbin.com/dafaxoraca/edit?js,console
The above code simply returns x without a debounce. But if I replace debounceTime with delay, the delay works as expected. I'm obviously missing something fundamental between the two operators. I've gone through the docs and am not getting it.
Thanks for your help!
I can't test this with your actual use case, but you're not using debounceTime correctly.
The debounceTime operator applies debouncing only within the Observable it is chained on and to that Observable's values. Since you're chaining concatMap with an inner Observable that ends in debounceTime, concatMap always waits until that inner Observable completes, so all three values come through.
You can use debounce() instead, which expects a function returning an Observable; that lets you set the delay per value by emitting, instead of using a hardcoded time.
const fakeData = [{number: 1}, {number: 2}, {number: 3}];
const stream$ = Rx.Observable.from(fakeData);
const delayedStream$ = stream$.concatMap(x => Rx.Observable.of(x).delay(300));

delayedStream$
  .debounce(val => Rx.Observable.of(true).delay(val.number >= 2 ? 500 : 1000))
  .subscribe(x => console.log(x.number));
See live demo: http://jsbin.com/tifajodogi/1/edit?js,console
This emits just: 3
Update: Since RxJS 5.5+ the same technique can be restructured like so:
import { from, of } from 'rxjs';
import { concatMap, delay, debounce } from 'rxjs/operators';

const fakeData = [{number: 1}, {number: 2}, {number: 3}];
const stream$ = from(fakeData);
const delayedStream$ = stream$.pipe(concatMap(x => of(x).pipe(delay(300))));

delayedStream$.pipe(
  debounce(val => of(true).pipe(delay(val.number >= 2 ? 500 : 1000)))
).subscribe(x => console.log(x.number));
I have some pre-defined events set to occur at specific times.
And I have a timer, like this:
const timer = Rx.Observable.interval(100).timeInterval()
  .map(x => x.interval)
  .scan((ms, total) => total + ms, 0)
The timer emits something close to 100,200,300,400,500 (although in reality it's more like 101,200,302,401,500...which is totally fine)
I also have some stuff I want to do at certain times. For example, let's say I want to do stuff at the following times:
const stuff = Rx.Observable.from([1000, 2000, 2250, 3000, 5000]);
What I'd like is to combine "stuff" and "timer" in such a way that the resulting stream emits a value once per time defined in "stuff", at that time (or ever so slightly later). In this case, that would be t = 1000 ms, 2000 ms, 2250 ms, 3000 ms and 5000 ms. Note: the 2250 one should emit around time 2300 because of the interval size; that's fine. They just can't come early or more than once.
I have one solution, but it's not very good: it restarts "stuff" every single step (every 100 ms in this case), filters it and takes 1. I would prefer that, once an event is emitted from "stuff", it be gone, so subsequent filters don't see those values.
In the real application, there will be stuff and stuff2 and maybe stuff3...(but I will call them something else!)
Thanks in advance! I hope that was clear.
If I've understood what you're after correctly, this should be achievable with a simple projection:
const times$ = stuff.flatMap(x => Rx.Observable.timer(x));
Here's a working sample: https://jsbin.com/negiyizibu/edit?html,js,console,output
Edit
For the second requirement, try something like this:
const times$ = Rx.Observable
  .from([{"val": "jeff", "t": 1000}, {"val": "fred", "t": 2500}])
  .flatMap(x => Rx.Observable.timer(x.t).map(y => x.val));
https://jsbin.com/cegijudoci/edit?js,console,output
Here's a typescript function I wrote based on Matt's solution.
import {from, timer} from 'rxjs';
import {flatMap, map} from 'rxjs/operators';

export interface ActionQueueEntry {
  action: string;
  payload?: any;
  delay: number;
}

export function actionQueue(entries: ActionQueueEntry[]) {
  return from(entries).pipe(flatMap((x: any) => {
    return timer(x.delay).pipe(map(y => x));
  }));
}

const q = actionQueue([
  {action: 'say: hi', delay: 500},
  {action: 'ask: how you are', delay: 2500},
  {action: 'say: im fine', delay: 5000},
]);

q.subscribe(console.log);