Performance comparison between RxJava 2, Java 8 Streams, and plain old iteration

I have become a big fan of functional programming in Java 8 and also of RxJava, but a colleague recently pointed out that there is a performance hit to using them, so I decided to run JMH benchmarks. It seems he was right: no matter what I do, I can't get the streams version to perform better. Below is my code:
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(StreamVsVanilla.N)
public class StreamVsVanilla {
    public static final int N = 10000;

    static List<Integer> sourceList = new ArrayList<>(N);
    static {
        for (int i = 0; i < N; i++) {
            sourceList.add(i);
        }
    }

    @Benchmark
    public List<Double> vanilla() {
        List<Double> result = new ArrayList<Double>(sourceList.size() / 2 + 1);
        for (Integer i : sourceList) {
            if (i % 2 == 0) {
                result.add(Math.sqrt(i));
            }
        }
        return result;
    }

    @Benchmark
    public List<Double> stream() {
        return sourceList.stream().parallel()
                .mapToInt(Integer::intValue)
                .filter(i -> i % 2 == 0)
                .mapToDouble(i -> (double) i)
                .map(Math::sqrt)
                .boxed()
                .collect(Collectors.toList());
    }

    @Benchmark
    public List<Double> rxjava2() {
        return Flowable.fromIterable(sourceList)
                .parallel()
                .runOn(Schedulers.computation())
                .filter(i -> i % 2 == 0)
                .map(Math::sqrt)
                .collect(() -> new ArrayList<Double>(sourceList.size() / 2 + 1), ArrayList::add)
                .sequential()
                .blockingFirst();
    }

    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(StreamVsVanilla.class.getSimpleName()).threads(1)
                .forks(1).shouldFailOnError(true).shouldDoGC(true)
                .jvmArgs("-server").build();
        new Runner(options).run();
    }
}
Results for the above code:

# Run complete. Total time: 00:03:16

Benchmark                Mode  Cnt     Score     Error  Units
StreamVsVanilla.rxjava2  avgt   20  1179.733 ± 322.421  ns/op
StreamVsVanilla.stream   avgt   20    10.556 ±   1.195  ns/op
StreamVsVanilla.vanilla  avgt   20     8.220 ±   0.705  ns/op
Even if I remove the parallel operators and use the sequential versions as below:
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(StreamVsVanilla.N)
public class StreamVsVanilla {
    public static final int N = 10000;

    static List<Integer> sourceList = new ArrayList<>(N);
    static {
        for (int i = 0; i < N; i++) {
            sourceList.add(i);
        }
    }

    @Benchmark
    public List<Double> vanilla() {
        List<Double> result = new ArrayList<Double>(sourceList.size() / 2 + 1);
        for (Integer i : sourceList) {
            if (i % 2 == 0) {
                result.add(Math.sqrt(i));
            }
        }
        return result;
    }

    @Benchmark
    public List<Double> stream() {
        return sourceList.stream()
                .mapToInt(Integer::intValue)
                .filter(i -> i % 2 == 0)
                .mapToDouble(i -> (double) i)
                .map(Math::sqrt)
                .boxed()
                .collect(Collectors.toList());
    }

    @Benchmark
    public List<Double> rxjava2() {
        return Observable.fromIterable(sourceList)
                .filter(i -> i % 2 == 0)
                .map(Math::sqrt)
                .collect(() -> new ArrayList<Double>(sourceList.size() / 2 + 1), ArrayList::add)
                .blockingGet();
    }

    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(StreamVsVanilla.class.getSimpleName()).threads(1)
                .forks(1).shouldFailOnError(true).shouldDoGC(true)
                .jvmArgs("-server").build();
        new Runner(options).run();
    }
}
The results are not very favourable:

# Run complete. Total time: 00:03:16

Benchmark                Mode  Cnt   Score  Error  Units
StreamVsVanilla.rxjava2  avgt   20  12.226 ± 0.603  ns/op
StreamVsVanilla.stream   avgt   20  13.432 ± 0.858  ns/op
StreamVsVanilla.vanilla  avgt   20   7.678 ± 0.350  ns/op
Can somebody help me figure out what I am doing wrong?
Edit:
akarnokd pointed out that I am using an extra stage to unbox and box values in the sequential version of my stream benchmark (I had added it to avoid implicit boxing/unboxing in the filter and map methods), but it turned out slower, so I tried without those stages, using the code below:
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(StreamVsVanilla.N)
public class StreamVsVanilla {
    public static final int N = 10000;

    static List<Integer> sourceList = new ArrayList<>(N);
    static {
        for (int i = 0; i < N; i++) {
            sourceList.add(i);
        }
    }

    @Benchmark
    public List<Double> vanilla() {
        List<Double> result = new ArrayList<Double>(sourceList.size() / 2 + 1);
        for (Integer i : sourceList) {
            if (i % 2 == 0) {
                result.add(Math.sqrt(i));
            }
        }
        return result;
    }

    @Benchmark
    public List<Double> stream() {
        return sourceList.stream()
                .filter(i -> i % 2 == 0)
                .map(Math::sqrt)
                .collect(Collectors.toList());
    }

    @Benchmark
    public List<Double> rxjava2() {
        return Observable.fromIterable(sourceList)
                .filter(i -> i % 2 == 0)
                .map(Math::sqrt)
                .collect(() -> new ArrayList<Double>(sourceList.size() / 2 + 1), ArrayList::add)
                .blockingGet();
    }

    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(StreamVsVanilla.class.getSimpleName()).threads(1)
                .forks(1).shouldFailOnError(true).shouldDoGC(true)
                .jvmArgs("-server").build();
        new Runner(options).run();
    }
}
The results are still more or less the same:

# Run complete. Total time: 00:03:16

Benchmark                Mode  Cnt   Score  Error  Units
StreamVsVanilla.rxjava2  avgt   20  10.864 ± 0.555  ns/op
StreamVsVanilla.stream   avgt   20  10.466 ± 0.050  ns/op
StreamVsVanilla.vanilla  avgt   20   7.513 ± 0.136  ns/op

For the parallel version
It is relatively expensive to fire up and dispatch values to multiple threads. To offset this, the per-item computation usually has to be several times more costly than the infrastructure overhead. In your case, however, Math::sqrt is so trivial that the parallel overhead dominates the runtime.
Then why is Stream two orders of magnitude faster? I can only assume that work stealing comes into play: the benchmark thread does most of the actual work, and maybe one background thread does a small amount of the rest, because by the time a background thread spins up, the main thread has stolen most of the tasks back. Therefore you don't get strictly parallel execution as with RxJava's parallel(), where the operator dispatches work in a round-robin fashion so that all parallel rails become roughly equally busy.
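To see the parallel RxJava pipeline actually pay off, the per-item work has to dwarf the dispatch overhead. A minimal sketch, where busyWork is a made-up stand-in for a genuinely expensive computation (not part of the original benchmark):

    // Artificially heavy per-item work, for illustration only.
    static double busyWork(int v) {
        double acc = v;
        for (int i = 0; i < 10_000; i++) {
            acc = Math.sqrt(acc + i);
        }
        return acc;
    }

    List<Double> heavy = Flowable.fromIterable(sourceList)
            .parallel()
            .runOn(Schedulers.computation())
            .map(StreamVsVanilla::busyWork) // expensive enough to amortize dispatch cost
            .sequential()
            .toList()
            .blockingGet();

With work this heavy, the parallel rails should beat both the sequential Observable and the plain loop; with Math::sqrt alone they cannot.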
For the sequential version
I think the fact that you have extra unboxing and boxing stages in your Stream version adds a little bit of overhead. Try without them:
return sourceList.stream()
        .filter(i -> i % 2 == 0)
        .map(Math::sqrt)
        .collect(Collectors.toList());

Related

Spring Batch AbstractPagingItemReader stops when result size is less than ITEMS_BY_PAGE

I've added an AbstractPagingItemReader to read from an API in pages of 100 items.
The execution is OK until the number of returned items is less than 100.
Here is my code:
public static final int ITEMS_BY_PAGE = 10;

@Override
protected void doReadPage() {
    setPageSize(ITEMS_BY_PAGE);
    results = this.documentService.searchRecordsWithAwaitingDocument(getPage(), ITEMS_BY_PAGE);
}

@Override
protected void doJumpToPage(int itemIndex) {
}
Is there a way to not stop the execution when the number of returned items is less than ITEMS_BY_PAGE? I want to stop the execution only when there are no results at all (size 0).

How would I split the contents of an array with a whitespace?

I have an assignment which asks for everything I have in the code below. That all works fine - I just need to calculate any monthly hours over 160 to be paid at 1.5 times the normal hourly rate. My math seems sound and calculates fine:
((hours - 160) * overtime) + (160 * hourlyRate)
But I don't know if I'm putting this if statement in the right method, or if it even should be an if statement. My increasePay/decreasePay methods were working prior to this and they need to stay. I removed some things so it's easier to read.
HourlyWorker Class:
public class HourlyWorker extends Employee
{
    private int hours;
    private double hourlyRate;
    private double monthlyPay;
    // note: this initializer runs at construction time,
    // before hourlyRate has been assigned in the constructor
    private double overtime = (1.5 * hourlyRate);

    public HourlyWorker(String last, String first, String ID, double rate)
    {
        super(last, first, ID);
        hourlyRate = rate;
    }

    public void setHours(int hours)
    {
        this.hours = hours;
    }

    public int getHours()
    {
        return hours;
    }

    public void setHourlyRate(double rate)
    {
        this.hourlyRate = rate;
    }

    public double getHourlyRate()
    {
        return hourlyRate;
    }

    public double getMonthlyPay()
    {
        if (hours > 160)
        {
            monthlyPay = ((hours - 160) * overtime) + (160 * hourlyRate);
        }
        else
        {
            monthlyPay = hourlyRate * hours;
        }
        return monthlyPay;
    }

    public void increasePay(double percentage)
    {
        hourlyRate *= 1 + percentage / 100;
    }

    public void decreasePay(double percentage)
    {
        hourlyRate *= 1 - percentage / 100;
    }
}
What I'm testing with:
public class TestEmployee2
{
    public static void main(String[] args)
    {
        Employee[] staff = new Employee[3];
        HourlyWorker hw1 = new HourlyWorker("Bee", "Busy", "BB1265", 10);
        hw1.setHours(200);
        staff[0] = hw1;
        System.out.println(staff[0].getMonthlyPay());
        staff[0].increasePay(10);
        System.out.println(staff[0].getMonthlyPay());
    }
}
Output is:
1600 (initial monthly rate, with 40 overtime hours and 160 regular hours)
1760 (10% increase to the monthlyPay)
Should be:
2006
2206.6
String.split() will do the trick.
Go over the list of artists you have and split each row into artist/genre:
for (String artist : artists) {
    String[] split = artist.split(" ");
    // add some data validation to avoid ArrayIndexOutOfBounds
    String name = split[0];
    String genre = split[1];
}
You can use Files.readAllLines(myPath) to read from a file.
If you are familiar with Streams from Java 8, you can use a stream over the lines read from the file, using .stream() and collecting them in whatever format you want, either as a list or joined into a single String.
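For instance, a minimal sketch combining the two ideas; the file name artists.txt and the one-token-per-field line format are assumptions for illustration:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ArtistFileReader {
    public static void main(String[] args) throws IOException {
        // Assumed line format: "<artist> <genre>", e.g. "Metallica metal"
        try (Stream<String> lines = Files.lines(Paths.get("artists.txt"))) {
            Map<String, String> genreByArtist = lines
                    .map(line -> line.split(" "))
                    .filter(parts -> parts.length >= 2) // skip malformed rows
                    // toMap assumes artist names are unique in the file
                    .collect(Collectors.toMap(parts -> parts[0], parts -> parts[1]));
            System.out.println(genreByArtist);
        }
    }
}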

I have spent a long time trying to improve some RxJava code that averages data

I have a collection of data like the dummy below:
class Place {
    userId,
    price
}
That is, a collection of places.
Use-case:
There is a logged-in user with a userId.
How do I calculate the average price of the places whose userId equals that user's userId?
RxJava is nice, and I have tried filter and toList, but the result is not so nice performance-wise:
Observable.fromIterable(places)
    .subscribeOn(Schedulers.newThread())
    .filter(new Predicate<Place>() {
        @Override
        public boolean test(Place place) throws Exception {
            return place.userId == global.login.userId;
        }
    })
    .toList()
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(new Consumer<List<Place>>() {
        @Override
        public void accept(List<Place> filteredPlace) throws Exception {
            // Here I have to use a loop to compute the average; it is not nice.
        }
    });
If places is a collection that is already available in memory, you can rearrange the evaluation like this:
Observable.just(places)
    .subscribeOn(Schedulers.computation())
    .map((Iterable<Place> list) -> {
        double sum = 0.0;
        int count = 0;
        for (Place p : list) {
            if (p.userId == global.login.userId) {
                sum += p.price;
                count++;
            }
        }
        return sum / count;
    })
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(average -> { /* display average */ });
If the sequence of places becomes available over time (through an Observable):
Observable<Place> places = ...

places
    .observeOn(Schedulers.computation())
    .filter((Place p) -> p.userId == global.login.userId)
    .compose(o -> MathObservable.averageDouble(o.map(p -> p.price)))
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(average -> { /* display average */ });
MathObservable is part of the RxJava 2 Extensions library.
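For completeness, a small self-contained sketch of MathObservable on its own. This assumes the RxJava 2 Extensions artifact, where MathObservable lives in the hu.akarnokd.rxjava2.math package; verify the package against the version you use:

import hu.akarnokd.rxjava2.math.MathObservable;
import io.reactivex.Observable;

public class AverageDemo {
    public static void main(String[] args) {
        Observable<Double> prices = Observable.just(10.0, 20.0, 30.0);

        // averageDouble emits a single item: the average of the source values
        MathObservable.averageDouble(prices)
                .subscribe(avg -> System.out.println("average = " + avg)); // 20.0
    }
}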

Throughput measure

I have to implement a rate-limiting algorithm in order to avoid reaching a throughput limit imposed by the service I'm interacting with.
The limit is specified as «N requests over 1 day», where N is of the order of magnitude of 10^6.
I have a distributed system of clients interacting with the service, so they should share the measure.
An exact solution would involve recording all the events and then computing the limit «when» the call to the service occurs: of course this approach is too expensive, so I'm looking for an approximate solution.
The first one I devised involves discretizing the detection of the events: for example, maintaining at most 24 counters and recording the number of requests that occurred within each hour.
Acceptable.
But I feel that a more elegant solution, even if guided by different «forces», would be to take the approach to the continuum.
Say I record the last N events: then I could easily infer the «current» throughput. Of course this algorithm suffers from ignoring the events that occurred in the hours before. I could improve it with an aging algorithm, but… and here follows my question:
Q: «Is there an elegant approximate solution to the problem of estimating the throughput of a service over a long period and with a high rate of events?»
As per my comments, you should use a monitor and have it sample the values every 15 minutes or so to get a reasonable estimate of the number of requests.
I mocked something up here but haven't tested it; it should give you a starting point.
import java.util.LinkedList;
import java.util.Queue;
import java.util.Timer;
import java.util.TimerTask;

public class TestCounter {
    private final Monitor monitor;

    private TestCounter() {
        monitor = new Monitor();
    }

    /** The thing you are limiting */
    public void myService() {
        if (monitor.isThresholdExceeded()) {
            // Return error
        } else {
            monitor.incrementCounter();
            // do stuff
        }
    }

    public static void main(String[] args) {
        TestCounter t = new TestCounter();
        for (int i = 0; i < 100000; i++) {
            t.myService();
        }
        for (int i = 0; i < 100000; i++) {
            t.myService();
        }
    }

    private class Monitor {
        private final Queue<Integer> queue = new LinkedList<Integer>();
        private int counter = 0;
        /** Number of 15-minute periods in a day. */
        private final int numberOfSamples = 96;
        private final int threshold = 1000000;
        private boolean thresholdExceeded;

        public Monitor() {
            // Schedule a sample every 15 minutes.
            Timer t = new Timer();
            t.scheduleAtFixedRate(new TimerTask() {
                @Override
                public void run() {
                    sampleCounter();
                }
            }, 0L, 900000 /* ms in 15 minutes */);
        }

        /** Could synchronise */
        void incrementCounter() {
            counter++;
        }

        /** Could synchronise */
        void sampleCounter() {
            // Read and reset the current period's count.
            int tempCount = counter;
            counter = 0;
            queue.add(tempCount);
            // Keep only the last day's worth of samples.
            if (queue.size() > numberOfSamples) {
                queue.poll();
            }
            // Sum the sliding window and compare against the daily limit.
            int totalCount = 0;
            for (Integer value : queue) {
                totalCount += value;
            }
            thresholdExceeded = totalCount > threshold;
        }

        public boolean isThresholdExceeded() {
            return thresholdExceeded;
        }
    }
}
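On the «could synchronise» comments: one lock-free option is to back the counter with an AtomicInteger, so the service threads and the sampling timer never race on it. A minimal sketch of that variant (class and method names are mine):

import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a thread-safe drop-in for Monitor's counter field.
// incrementCounter() can be called from many service threads while
// the timer thread atomically reads-and-resets the count each sample.
class AtomicCounter {
    private final AtomicInteger counter = new AtomicInteger();

    void incrementCounter() {
        counter.incrementAndGet();
    }

    /** Called by the sampling task: returns the period's count and resets it to 0. */
    int drainCounter() {
        return counter.getAndSet(0);
    }
}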

Bloom filter design

I wanted to know where I can find an implementation of the Bloom filter, with some explanation about the choice of the hash functions.
Additionally I have the following questions:
1) The Bloom filter is known to have false positives. Is it possible to reduce them by using two filters, one for used elements and one for non-used elements (assuming the set is finite and known a priori), and comparing the two?
2) Are there other similar algorithms in the CS literature?
My intuition is that you'll get a better reduction in false positives by using the additional space that the anti-filter would have occupied to just expand the positive filter.
As for resources, the papers referenced for March 8 from my course syllabus would be useful.
A Java implementation of the Bloom filter can be found here. In case you cannot view it, I will paste the code below (comments translated from the original Chinese).
import java.util.BitSet;

public class BloomFilter
{
    /* The BitSet initially allocates 2^25 bits */
    private static final int DEFAULT_SIZE = 1 << 25;
    /* Seeds for the different hash functions; primes are preferred */
    private static final int[] seeds = new int[] { 5, 7, 11, 13, 31, 37, 61 };
    private BitSet bits = new BitSet(DEFAULT_SIZE);
    /* The hash function objects */
    private SimpleHash[] func = new SimpleHash[seeds.length];

    public BloomFilter()
    {
        for (int i = 0; i < seeds.length; i++)
        {
            func[i] = new SimpleHash(DEFAULT_SIZE, seeds[i]);
        }
    }

    // Mark a string in the bit set
    public void add(String value)
    {
        for (SimpleHash f : func)
        {
            bits.set(f.hash(value), true);
        }
    }

    // Check whether a string has already been marked in the bit set
    public boolean contains(String value)
    {
        if (value == null)
        {
            return false;
        }
        boolean ret = true;
        for (SimpleHash f : func)
        {
            ret = ret && bits.get(f.hash(value));
        }
        return ret;
    }

    /* The hash function class */
    public static class SimpleHash
    {
        private int cap;
        private int seed;

        public SimpleHash(int cap, int seed)
        {
            this.cap = cap;
            this.seed = seed;
        }

        // Hash function: a simple seeded weighted sum over the characters
        public int hash(String value)
        {
            int result = 0;
            int len = value.length();
            for (int i = 0; i < len; i++)
            {
                result = seed * result + value.charAt(i);
            }
            return (cap - 1) & result;
        }
    }
}
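A quick usage sketch of the class above (the sample strings are placeholders):

BloomFilter bf = new BloomFilter();
bf.add("http://example.com/a");
System.out.println(bf.contains("http://example.com/a")); // true
System.out.println(bf.contains("http://example.com/b")); // false, with high probability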
In terms of designing a Bloom filter, the number of hash functions your Bloom filter needs can be determined as described here; see also the Wikipedia article about Bloom filters, where you will find the section Probability of false positives. That section explains how the number of hash functions influences the probability of false positives and gives you the formula to determine k from the desired expected probability of false positives.
Quote from the Wikipedia article:
Obviously, the probability of false positives decreases as m (the number of bits in the array) increases, and increases as n (the number of inserted elements) increases. For a given m and n, the value of k (the number of hash functions) that minimizes the probability is

k = (m / n) ln 2
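As a worked example of that formula: with m/n = 10 bits per element, the optimal k is about 7 and the false-positive probability comes out near 0.8%. A small sketch of the arithmetic (class and method names are mine):

// Sketch of the standard Bloom filter formulas:
//   optimal k = (m / n) * ln 2
//   false-positive rate ≈ (1 - e^(-k * n / m))^k
public class BloomMath {
    static double optimalK(double m, double n) {
        return (m / n) * Math.log(2);
    }

    static double falsePositiveRate(double m, double n, int k) {
        return Math.pow(1 - Math.exp(-k * n / m), k);
    }

    public static void main(String[] args) {
        double m = 10_000_000, n = 1_000_000;     // 10 bits per element
        int k = (int) Math.round(optimalK(m, n)); // ≈ 7
        System.out.printf("k = %d, p ≈ %.4f%n", k, falsePositiveRate(m, n, k)); // ≈ 0.0082
    }
}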
It's very easy to implement a Bloom filter using Java 8 features. You just need a long[] to store the bits, and a few hash functions, which you can represent with ToIntFunction<T>. I made a brief write up on doing this from scratch.
The part to be careful about is selecting the right bit from the array.
public class BloomFilter<T> {
    private final long[] array;
    private final int size;
    private final List<ToIntFunction<T>> hashFunctions;

    public BloomFilter(long[] array, int logicalSize, List<ToIntFunction<T>> hashFunctions) {
        this.array = array;
        this.size = logicalSize;
        this.hashFunctions = hashFunctions;
    }

    public void add(T value) {
        for (ToIntFunction<T> function : hashFunctions) {
            int hash = mapHash(function.applyAsInt(value));
            array[hash >>> 6] |= 1L << hash;
        }
    }

    public boolean mightContain(T value) {
        for (ToIntFunction<T> function : hashFunctions) {
            int hash = mapHash(function.applyAsInt(value));
            if ((array[hash >>> 6] & (1L << hash)) == 0) {
                return false;
            }
        }
        return true;
    }

    private int mapHash(int hash) {
        return hash & (size - 1);
    }

    public static <T> Builder<T> builder() {
        return new Builder<>();
    }

    public static class Builder<T> {
        private int size;
        private List<ToIntFunction<T>> hashFunctions;

        public Builder<T> withSize(int size) {
            this.size = size;
            return this;
        }

        public Builder<T> withHashFunctions(List<ToIntFunction<T>> hashFunctions) {
            this.hashFunctions = hashFunctions;
            return this;
        }

        public BloomFilter<T> build() {
            return new BloomFilter<>(new long[size >>> 6], size, hashFunctions);
        }
    }
}
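A possible usage sketch for the class above; the size and the second hash function are arbitrary choices for illustration, and size should be a power of two and a multiple of 64 so that mapHash and the long[] sizing line up:

import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

public class BloomFilterDemo {
    public static void main(String[] args) {
        List<ToIntFunction<String>> hashes = Arrays.asList(
                String::hashCode,              // built-in hash
                s -> s.hashCode() * 0x9E3779B9 // cheap second hash, illustrative only
        );
        BloomFilter<String> filter = BloomFilter.<String>builder()
                .withSize(1 << 20) // 2^20 bits -> long[16384]
                .withHashFunctions(hashes)
                .build();

        filter.add("hello");
        System.out.println(filter.mightContain("hello")); // true
        System.out.println(filter.mightContain("world")); // false, with high probability
    }
}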
I think we should look at the application of Bloom filters, and the secret is in the name: it is a filter, not a data structure. It is mostly used for saving resources by checking whether items are not part of a set. If you want to reduce false positives to 0, you would have to insert every item that is not part of the set; since a well-designed Bloom filter has no false negatives, that would work, but such a filter would be gigantic and impractical, and you might as well just store the items in a skip list :) I wrote a simple tutorial on Bloom filters if anyone is interested.
http://techeffigy.wordpress.com/2014/06/05/bloom-filter-tutorial/
