I have a set of emitted values (stock market transactions), each with a time, price, and volume, like this...
TIME PRICE VOL
13:45:01 12 1
13:45:01 12 1
13:45:01 12 10
13:45:01 13 1
13:45:01 13 3
13:45:02 13 1
I just want to merge values that fall within the same second and have the same price, accumulating VOL, so the resulting observable emits the following values for the source above:
TIME PRICE ACCUMULATED VOL
13:45:01 12 12
13:45:01 13 4
13:45:02 13 1
This is some kind of grouping followed by reducing. I'm reading the documentation but can't figure out which operators to use...
Can somebody help?
Assuming stockSource is the realtime stream that pushes buy/sell items, you can use scan to accumulate your data and compute the accumulated volume there.
import { scan } from 'rxjs/operators';

stockSource.pipe(
  scan((acc, curr) => {
    // match on the same second AND the same price
    const found = acc.find(obj => obj.TIME === curr.TIME && obj.PRICE === curr.PRICE);
    // first trade for this (time, price) pair: seed the accumulated volume
    if (!found) { return [...acc, { ...curr, ACCUMULATED: curr.VOL }]; }
    found.ACCUMULATED += curr.VOL;
    return acc;
  }, [])
).subscribe();
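Note that scan emits the whole accumulator array on every source value, so subscribers always see the up-to-date aggregates. A quick way to try it out, using a finite of(...) as a stand-in for the realtime source (the stand-in source and the trailing last() are only for this demo):

import { of } from 'rxjs';
import { scan, last } from 'rxjs/operators';

const stockSource = of(
  { TIME: '13:45:01', PRICE: 12, VOL: 1 },
  { TIME: '13:45:01', PRICE: 12, VOL: 1 },
  { TIME: '13:45:01', PRICE: 12, VOL: 10 },
  { TIME: '13:45:01', PRICE: 13, VOL: 1 },
  { TIME: '13:45:01', PRICE: 13, VOL: 3 },
  { TIME: '13:45:02', PRICE: 13, VOL: 1 }
);

stockSource.pipe(
  scan((acc, curr) => {
    const found = acc.find(obj => obj.TIME === curr.TIME && obj.PRICE === curr.PRICE);
    if (!found) { return [...acc, { ...curr, ACCUMULATED: curr.VOL }]; }
    found.ACCUMULATED += curr.VOL;
    return acc;
  }, []),
  last() // finite demo source: only look at the final accumulator
).subscribe(console.log);
// [ { TIME: '13:45:01', PRICE: 12, VOL: 1, ACCUMULATED: 12 },
//   { TIME: '13:45:01', PRICE: 13, VOL: 1, ACCUMULATED: 4 },
//   { TIME: '13:45:02', PRICE: 13, VOL: 1, ACCUMULATED: 1 } ]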
A Custom Operator
I think there must be a better way to do this, but it didn't come to me, so I made a custom operator.
This might be overkill? No idea. You can try it out and test it yourself if you're so inclined.
Note that for many operators, returning from(thing) is the same as returning thing. So when I return an array from concatMap, that array is turned into a stream for me.
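For example, both of these emit 1, 2, 3 (a tiny standalone illustration, separate from the operator below):

import { of, from } from 'rxjs';
import { concatMap } from 'rxjs/operators';

of('x').pipe(concatMap(() => [1, 2, 3])).subscribe(console.log);       // plain array
of('x').pipe(concatMap(() => from([1, 2, 3]))).subscribe(console.log); // equivalent with from()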
import { concat, defer, of } from 'rxjs';
import { concatMap } from 'rxjs/operators';

function customOperator() {
  return s => defer(() => {
    // Per-subscription state: the second currently being aggregated
    // and the per-price accumulated volumes for that second.
    let currentTime = "";
    let buffer = {};
    return s.pipe(
      concatMap(({time, price, vol}) => {
        let ret = [];
        if (currentTime !== time) {
          // The second has rolled over: flush everything buffered so far.
          ret = Object.values(buffer);
          buffer = {};
        }
        const accumVol = buffer[price]?.accumulatedVol ?? 0;
        buffer[price] = {
          time,
          price,
          accumulatedVol: accumVol + vol
        };
        currentTime = time;
        return ret;
      }),
      // Once the source completes, emit whatever is still in the buffer.
      src => concat(src, defer(() => Object.values(buffer)))
    );
  });
}
function makeTransaction(time, price, vol) {
  return { time, price, vol };
}

of(
  makeTransaction("13:45:01", 12, 1),
  makeTransaction("13:45:01", 12, 1),
  makeTransaction("13:45:01", 12, 10),
  makeTransaction("13:45:01", 13, 1),
  makeTransaction("13:45:01", 13, 3),
  makeTransaction("13:45:02", 13, 1)
).pipe(
  customOperator()
).subscribe(console.log);
// { time: '13:45:01', price: 12, accumulatedVol: 12 }
// { time: '13:45:01', price: 13, accumulatedVol: 4 }
// { time: '13:45:02', price: 13, accumulatedVol: 1 }
Unless I misunderstood your requirements, this looks like a job for groupBy:
Generate a key to group items by time and price
There will be one stream per group
Reduce each stream into a single emission that accumulates all the volumes
Merge each reduced stream back into the output
const transactions$ =
of( {time: '13:45:01', price: 12, volume: 1}
, {time: '13:45:01', price: 12, volume: 1}
, {time: '13:45:01', price: 12, volume: 10}
, {time: '13:45:01', price: 13, volume: 1}
, {time: '13:45:01', price: 13, volume: 3}
, {time: '13:45:02', price: 13, volume: 1});
const groups$ =
transactions$.pipe( groupBy(t => `${t.price}#${t.time}`)
, mergeMap(group$ =>
group$.pipe(reduce((tt, {time, price, volume}) =>
({time, price, volume: volume + tt.volume}),
{volume: 0}))));
groups$.subscribe(t => console.log(JSON.stringify(t)))
<script src="https://cdnjs.cloudflare.com/ajax/libs/rxjs/7.3.0/rxjs.umd.min.js" integrity="sha512-y3JTS47nnpKORJX8Jn1Rlm+QgRIIZHtu3hWxal0e81avPrqUH48yk+aCi+gprT0RMAcpYa0WCkapxe+bpBHD6g==" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
<script>
const {of} = rxjs;
const {groupBy, mergeMap, reduce} = rxjs.operators;
</script>
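One caveat: with the finite of(...) source above, every group completes when the source does, which is what lets each reduce emit. On a live, never-ending feed the groups would never complete, so nothing would come out. A sketch of one way to handle that with groupBy's duration option (RxJS 7 signature); closing each group one second after its last trade via debounceTime is an assumption for illustration:

const {debounceTime} = rxjs.operators;
const live$ =
  transactions$.pipe( groupBy(t => `${t.price}#${t.time}`,
                        {duration: group$ => group$.pipe(debounceTime(1000))})
                    , mergeMap(group$ =>
                        group$.pipe(reduce((tt, {time, price, volume}) =>
                          ({time, price, volume: volume + tt.volume}),
                          {volume: 0}))));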
How do I filter a publisher for the elements having the highest value without knowing the highest value beforehand?
Here is a little test to illustrate what I'm trying to achieve:
@Test
fun filterForHighestValuesTest() {
val numbers = Flux.just(1, 5, 7, 2, 8, 3, 8, 4, 3)
// what operators to apply to numbers to make the test pass?
StepVerifier.create(numbers)
.expectNext(8)
.expectNext(8)
.verifyComplete()
}
I've started with the reduce operator:
@Test
fun filterForHighestValuesTestWithReduce() {
val numbers = Flux.just(1, 5, 7, 2, 8, 3, 8, 4, 3)
.reduce { a: Int, b: Int -> if( a > b) a else b }
StepVerifier.create(numbers)
.expectNext(8)
.verifyComplete()
}
and of course that test passes, but reduce emits a single Mono, whereas I would like to obtain a Flux containing all the elements having the highest value, e.g. 8 and 8 in this simple example.
First of all, you'll need state for this, so you need to be careful to keep that state per-Subscription. One way of ensuring that while combining operators is to use compose.
Proposed solution
Flux<Integer> allMatchingHighest = numbers.compose(f -> {
AtomicInteger highestSoFarState = new AtomicInteger(Integer.MIN_VALUE);
AtomicInteger windowState = new AtomicInteger(Integer.MIN_VALUE);
return f.filter(v -> {
int highestSoFar = highestSoFarState.get();
if (v > highestSoFar) {
highestSoFarState.set(v);
return true;
}
if (v == highestSoFar) {
return true;
}
return false;
})
.bufferUntil(i -> i != windowState.getAndSet(i), true)
.log()
.takeLast(1)
.flatMapIterable(Function.identity());
});
Note the whole compose lambda can be extracted into a method, making the code use a method reference and read better.
Explanation
The solution is done in 4 steps, with the two first each having their own AtomicInteger state:
Incrementally find the new "highest" element (so far) and filter out elements that are smaller. This results in a Flux<Integer> of (monotonically) increasing numbers, like 1 5 7 8 8.
Buffer by chunks of equal numbers. We use bufferUntil instead of window* or groupBy because the most degenerate case, where the numbers are all different and already sorted, would fail with those
skip all buffers but one (takeLast(1))
"replay" that last buffer, which represents the number of occurrences of our highest value (flatMapIterable)
This correctly passes your StepVerifier test by emitting 8 8. Note the intermediate buffers emitted are:
onNext([1])
onNext([5])
onNext([7])
onNext([8, 8])
More advanced testing, justifying bufferUntil
A far more complex source that would fail with groupBy but not this solution:
Random rng = new Random();
//generate 258 numbers, each randomly repeated 1 to 10 times
//also, shuffle the whole thing
Flux<Integer> numbers = Flux
.range(1, 258)
.flatMap(i -> Mono.just(i).repeat(rng.nextInt(10)))
.collectList()
.map(l -> {
Collections.shuffle(l);
System.out.println(l);
return l;
})
.flatMapIterable(Function.identity())
.hide();
This is one example of what sequence of buffers it could filter into (keep in mind only the last one gets replayed):
onNext([192])
onNext([245])
onNext([250])
onNext([256, 256])
onNext([257])
onNext([258, 258, 258, 258, 258, 258, 258, 258, 258])
onComplete()
Note: If you remove the map that shuffles, then you obtain the "degenerative case" where even windowUntil wouldn't work (the takeLast would result in too many open yet unconsumed windows).
This was a fun one to come up with!
One way to do it is to map the flux of ints to a flux of lists with one int each, reduce the result, and end with flatMapMany:
final Flux<Integer> numbers = Flux.just(1, 5, 7, 2, 8, 3, 8, 4, 3);
final Flux<Integer> maxValues =
numbers
.map(
n -> {
List<Integer> list = new ArrayList<>();
list.add(n);
return list;
})
.reduce(
(l1, l2) -> {
if (l1.get(0).compareTo(l2.get(0)) > 0) {
return l1;
} else if (l1.get(0).equals(l2.get(0))) {
l1.addAll(l2);
return l1;
} else {
return l2;
}
})
.flatMapMany(Flux::fromIterable);
One simple solution that worked for me:
Flux<Integer> flux = Flux.just(1, 5, 7, 2, 8, 3, 8, 4, 3)
    .collectSortedList(Comparator.reverseOrder())
    .flatMapMany(Flux::fromIterable);
StepVerifier.create(flux)
    .expectNext(8).expectNext(8).expectNext(7).expectNext(5)
    .thenCancel()
    .verify(); // cancel here: the sorted flux keeps emitting (4, 3, 3, 2, 1)
Note that this emits every element in descending order, not just the ones with the highest value.
One possible solution is to group the Flux prior to the reduction and flatmap the GroupedFlux afterwards like this:
@Test
fun filterForHighestValuesTest() {
val numbers = Flux.just(1, 5, 7, 2, 8, 3, 8, 4, 3)
.groupBy { it }
.reduce { t: GroupedFlux<Int, Int>, u: GroupedFlux<Int, Int> ->
if (t.key()!! > u.key()!!) t else u
}
.flatMapMany {
it
}
StepVerifier.create(numbers)
.expectNext(8)
.expectNext(8)
.verifyComplete()
}
Problem statement: I have an array of N sorted integers and a threshold value K. I would like to group them in such a way that for each element, the difference between the group mean and the element is <= K. What is the best algorithm to use?
I've looked into Jenks' natural breaks and k-means clustering, but both of those seem better suited to a situation where you have a desired number of clusters, whereas I have a desired maximum variance per-cluster.
// example
const distances = [5, 8, 8, 9, 16, 20, 29, 42, 56, 57, 57, 58, 103, 104, 150, 167]
const threshold = 10
// desired output:
// cluster(distances) =>
// [
// [8, 8, 9, 5, 16, 20]
// [29, 42]
// [56, 57, 57, 58]
// [103, 104]
// [150, 167]
// ]
Here's my progress so far: https://gist.github.com/qrohlf/785c667735171b7353702cc74c10857d
I'm probably going to try some kind of divide-and-conquer approach for correcting the 'ballpark' results I get from the implementation I currently have, but I don't really see a great, clean way to do this right now.
I searched and I found this: Unweighted Pair Group Method with Arithmetic Mean.
Here is an article with an example: link. I think it will help you; it looks like a good match for your purpose.
The UPGMA algorithm produces rooted dendrograms and requires a constant-rate assumption - that is, it assumes an ultrametric tree in which the distances from the root to every branch tip are equal.
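For reference, the core of UPGMA is that when two clusters merge, the distance from the merged cluster to any other cluster is the size-weighted average of the old distances. A minimal standalone sketch (not tied to the implementation below):

// UPGMA update: d(A∪B, C) = (|A|·d(A,C) + |B|·d(B,C)) / (|A| + |B|)
const upgmaDistance = (dAC, dBC, sizeA, sizeB) =>
  (sizeA * dAC + sizeB * dBC) / (sizeA + sizeB);

// e.g. merging A (3 points, mean distance 4 to C) with B (1 point, distance 8 to C):
console.log(upgmaDistance(4, 8, 3, 1)); // 5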
For anyone else bumping into this, here's my (unoptimized) implementation of the UPGMA algorithm described above:
const head = array => array[0]
const tail = array => array.slice(1)
const last = array => array[array.length - 1]
const sum = array => array.reduce((a, b) => a + b)
const avg = array => sum(array) / array.length
const minIndex = array => array.reduce((iMin, x, i) => x < array[iMin] ? i : iMin, 0)
const range = length => Array.apply(null, Array(length)).map((_, i) => i)
const isArray = Array.isArray
const distances = [5, 8, 8, 9, 16, 20, 29, 42, 56, 57, 57, 58, 103, 104, 150, 167, 800]
// cluster an array of numeric values such that the mean difference of each
// point within each cluster is within a threshold value
const cluster = (points, threshold = 10) => {
return _cluster(points, range(points.length).map(i => [i]), threshold).map(c =>
isArray(c) ? c.map(i => points[i]) : [points[c]])
}
// recursive call
const _cluster = (points, clusters, threshold) => {
  // base case: nothing left to merge once a single cluster remains
  if (clusters.length < 2) {
    return clusters
  }
  const matrix = getDistanceMatrix(points, clusters)
  // get the minimum col index for each row in the matrix
  const rowMinimums = matrix.map(minIndex)
  // get the index for the column containing the smallest distance
  const bestRow = minIndex(rowMinimums.map((col, row) => matrix[row][col]))
  const bestCol = rowMinimums[bestRow]
  const isValid = isValidCluster(points, mergeClusters(clusters[bestRow], clusters[bestCol]), threshold)
  if (!isValid) {
    return clusters
  }
  return _cluster(points, merge(clusters, bestRow, bestCol), threshold)
}
const isValidCluster = (points, cluster, threshold) => {
// at this point, cluster is guaranteed to be an array, not a single point
const distances = cluster.map(i => points[i])
const mean = avg(distances)
return distances.every(d => Math.abs(mean - d) <= threshold)
}
// immutable merge of indices a and b in clusters
const merge = (clusters, a, b) => {
// merge two clusters by index
const clusterA = clusters[a]
const clusterB = clusters[b]
// optimization opportunity: this filter is causing *another* iteration
// of clusters.
const withoutPoints = clusters.filter(c => c !== clusterA && c !== clusterB)
return [mergeClusters(clusterA, clusterB)].concat(withoutPoints)
}
const mergeClusters = (clusterA, clusterB) => clusterA.concat(clusterB)
// optimization opportunity: this currently does 2x the work needed, since the
// distance from a->b is the same as the distance from b->a
const getDistanceMatrix = (points, clusters) => {
// reduce clusters to distance/average distance
const reduced = clusters.map(c => Array.isArray(c) ? avg(c.map(i => points[i])) : points[c])
return reduced.map((i, row) => reduced.map((j, col) => (row === col) ? Infinity : Math.abs(j - i)))
}
const log2DArray = rows => console.log('[\n' + rows.map(row => ' [' + row.join(', ') + ']').join('\n') + '\n]')
console.log('clustered points:')
log2DArray(cluster(distances))
I'm new to RxJS. I know I could just .filter and .map an observable to get the change I'm looking for. But, is there any method which combines the two into one function?
Yes, there is: flatMap.
Suppose you have an Observable of numbers (1, 2, 3, 4, 5, ...) and you want to filter for even numbers and map them to x*10.
var tenTimesEvenNumbers = numbers.flatMap(function (x) {
if (x % 2 === 0) {
return Rx.Observable.just(x * 10);
} else {
return Rx.Observable.empty();
}
});
As of RxJS v6.6.7, the solution becomes the following:
import { of, EMPTY } from 'rxjs';
import { mergeMap } from 'rxjs/operators';

// Initialise an observable with some numbers
const numbers = of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
// Pipe the observable using mergeMap
const tenTimesEvenNumbers = numbers.pipe(
mergeMap((x: number) => {
// If the number is even, return an observable containing the number multiplied by ten
// Otherwise return an empty observable
return x % 2 === 0 ? of(x * 10) : EMPTY;
})
);
// Subscribe to the observable and print the values
tenTimesEvenNumbers.subscribe((value: number) =>
console.log('Value:', value)
);
The above will print:
Value: 20
Value: 40
Value: 60
Value: 80
Value: 100
Here is a working stackblitz as well.
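For comparison, the two-operator form the question mentions (separate filter and map in one pipe) produces the same output:

import { of } from 'rxjs';
import { filter, map } from 'rxjs/operators';

of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).pipe(
  filter(x => x % 2 === 0), // keep the even numbers
  map(x => x * 10)          // then scale them
).subscribe(value => console.log('Value:', value)); // Value: 20 ... Value: 100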
I'd like to alternately combine elements of multiple streams:
var print = console.log.bind(console);
var s1 = Rx.Observable.fromArray([1, 1, 5]);
var s2 = Rx.Observable.fromArray([2, 9]);
var s3 = Rx.Observable.fromArray([3, 4, 6, 7, 8]);
alternate(s1, s2, s3).subscribe(print); // 1, 2, 3, 1, 9, 4, 5, 6, 7, 8
What would the function definition of alternate look like?
Use zip and concatMap when working on observables that were created from arrays (as in your example), or zip and flatMap when working on observables that are inherently asynchronous.
Rx.Observable
.zip(s1, s2, s3, function(x,y,z) { return [x,y,z]; })
.concatMap(function (list) { return Rx.Observable.from(list); })
.subscribe(print); // 1, 2, 3, 1, 9, 4
Notice that this stops as soon as one of the source observables completes. That's because zip is strictly "balanced": it waits until all the sources have a matching event. What you want is a somewhat looser version of zip that keeps going as sources complete.
If there is a value (for example undefined) that is not emitted by the source observables, this solution works:
var concat = Rx.Observable.concat;
var repeat = Rx.Observable.repeat;
var zipArray = Rx.Observable.zipArray;
var fromArray = Rx.Observable.fromArray;
var print = console.log.bind(console);
var s1 = fromArray([1, 1, 5]);
var s2 = fromArray([2, 9]);
var s3 = fromArray([3, 4, 6, 7, 8]);
alternate(s1, s2, s3).subscribe(print);
function alternate() {
  var sources = Array.prototype.slice.call(arguments).map(function (s) {
    return concat(s, repeat(undefined));
  });
return zipArray(sources)
.map(function(values) {
return values.filter(function(x) {
return x !== undefined;
});
}).takeWhile(function(values) {
return values.length > 0;
}).concatMap(function (list) { return fromArray(list); })
}
Same example in ES6:
const {concat, repeat, zipArray, fromArray} = Rx.Observable;
var print = console.log.bind(console);
var s1 = fromArray([1, 1, 5]);
var s2 = fromArray([2, 9]);
var s3 = fromArray([3, 4, 6, 7, 8]);
alternate(s1, s2, s3).subscribe(print);
function alternate(...sources) {
return zipArray(sources.map( (s) => concat(s, repeat(undefined)) ))
.map((values) => values.filter( (x) => x !== undefined ))
.takeWhile( (values) => values.length > 0)
.concatMap( (list) => fromArray(list) )
}