Responsive asynchronous search-as-you-type in Java 8

I'm trying to implement a "search as you type" pattern in Java.
The goal of the design is that no change gets lost, but at the same time the (time-consuming) search operation should be able to abort early and retry with the updated pattern.
Here is what I've come up with so far (Java 8 pseudocode):
final AtomicReference<String> patternRef = new AtomicReference<>();
final AtomicLong modificationCount = new AtomicLong();
final ReentrantLock busy = new ReentrantLock();
Consumer<List<ResultType>> resultConsumer;

// This is called in a background thread every time the user presses a key
void search(String pattern) {
    // Update the pattern
    synchronized (this) {
        patternRef.set(pattern);
        modificationCount.incrementAndGet();
    }
    if (!busy.tryLock()) {
        // Another search is already running, let it handle the change
        return;
    }
    try {
        // Get a local copy of the pattern and modCount
        String patternCopy;
        long modCount;
        synchronized (this) {
            patternCopy = patternRef.get();
            modCount = modificationCount.get();
        }
        while (true) {
            // Try the search. It will return false when modificationCount
            // changes before the search is finished.
            boolean success = doSearch(patternCopy, modCount);
            if (success) {
                // Search completed before modCount was changed again
                break;
            }
            // Try again with the new pattern + modCount
            synchronized (this) {
                patternCopy = patternRef.get();
                modCount = modificationCount.get();
            }
        }
    } finally {
        busy.unlock();
    }
}

boolean doSearch(String pattern, long modCount) {
    // ... search database ...
    if (modCount != modificationCount.get()) {
        return false;
    }
    // ... prepare results ...
    if (modCount != modificationCount.get()) {
        return false;
    }
    resultConsumer.accept(result); // Consumer for the UI code to do something
    return modCount == modificationCount.get();
}
Did I miss some important point? A race condition or something similar?
Is there something in Java 8 that would make the code above simpler?

The fundamental problem of this code can be summarized as “trying to achieve atomicity by multiple distinct atomic constructs”. The combination of multiple atomic constructs is not atomic and trying to reestablish atomicity leads to very complicated, usually broken, and inefficient code.
In your case, doSearch’s last check, modCount == modificationCount.get(), happens while still holding the lock. After that, another thread (or several other threads) could update the search string and mod count, then find the lock occupied and conclude that another search is running and will take care of the change.
But the searching thread doesn’t look again after that last modCount == modificationCount.get() check. The caller just does if (success) { break; }, followed by the finally { busy.unlock(); }, and returns.
So the answer is, yes, you have potential race conditions.
So, instead of settling on two atomic variables, synchronized blocks, and a ReentrantLock, you should use one atomic construct, e.g. a single atomic variable:
final AtomicReference<String> patternRef = new AtomicReference<>();
Consumer<List<ResultType>> resultConsumer;

// This is called in a background thread every time the user presses a key
void search(String pattern) {
    if (patternRef.getAndSet(pattern) != null) return;
    // Try the search. doSearch will return false when not completed.
    while (!doSearch(pattern) || !patternRef.compareAndSet(pattern, null))
        pattern = patternRef.get();
}

boolean doSearch(String pattern) {
    // ... search database ...
    if (pattern != (Object) patternRef.get()) {
        return false;
    }
    // ... prepare results ...
    if (pattern != (Object) patternRef.get()) {
        return false;
    }
    resultConsumer.accept(result); // Consumer for the UI code to do something
    return true;
}
Here, a value of null indicates that no search is running, so if a background thread sets this to a non-null value and finds the old value to be null (in an atomic operation), it knows it has to perform the actual search. After the search, it tries to set the reference to null again, using compareAndSet with the pattern used for the search. Thus, it can only succeed if it has not changed again. Otherwise, it will fetch the new value and repeat.
These two atomic updates are already sufficient to ensure that there is only a single search operation at a time while not missing an updated search pattern. The ability of doSearch to return early when it detects a change is just a nice-to-have and is not required by the caller’s loop.
Note that in this example, the check within doSearch has been reduced to a reference comparison (using a cast to Object to prevent compiler warnings) to demonstrate that it can be as cheap as the modCount comparison of your original approach. As long as no new string has been set, the reference will be the same.
But, in fact, you could also use a string comparison, i.e. if(!pattern.equals(patternRef.get())) { return false; }, without significant performance degradation. String comparison is not (necessarily) expensive in Java. The first thing String’s equals implementation does is a reference comparison, so if the string has not changed, it will return true immediately. Otherwise, it checks the lengths next (unlike C strings, the length is known beforehand) and returns false immediately on a mismatch. So in the typical scenario of the user typing another character or pressing backspace, the lengths will differ and the comparison bails out immediately.
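For illustration, here is a minimal, self-contained harness of my own (not part of the original question or answer; ResultType is replaced by String, and the database query and keystrokes are simulated) that drives the AtomicReference-based version from a background executor:

import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class SearchDemo {
    final AtomicReference<String> patternRef = new AtomicReference<>();
    final Consumer<List<String>> resultConsumer =
        results -> System.out.println("results: " + results);

    // Called in a background thread on every keystroke
    void search(String pattern) {
        if (patternRef.getAndSet(pattern) != null) return;
        while (!doSearch(pattern) || !patternRef.compareAndSet(pattern, null))
            pattern = patternRef.get();
    }

    boolean doSearch(String pattern) {
        try {
            TimeUnit.MILLISECONDS.sleep(50); // simulate a slow database query
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (pattern != (Object) patternRef.get()) return false; // pattern changed, abort
        resultConsumer.accept(Collections.singletonList(pattern + "-result"));
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        SearchDemo demo = new SearchDemo();
        ExecutorService pool = Executors.newCachedThreadPool();
        for (String typed : new String[] {"j", "ja", "jav", "java"}) {
            pool.execute(() -> demo.search(typed));
            TimeUnit.MILLISECONDS.sleep(10); // simulate typing speed
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}

With the 10 ms typing delay and the 50 ms search, intermediate patterns are typically skipped, but the last pattern typed is always searched.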


Why filter with side effects performs better than a Spliterator based implementation?

Regarding the question How to skip even lines of a Stream obtained from the Files.lines, I followed the accepted answer's approach, implementing my own filterEven() method based on the Spliterator<T> interface, e.g.:
public static <T> Stream<T> filterEven(Stream<T> src) {
    Spliterator<T> iter = src.spliterator();
    AbstractSpliterator<T> res =
            new AbstractSpliterator<T>(Long.MAX_VALUE, Spliterator.ORDERED) {
        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            iter.tryAdvance(item -> {}); // discard
            return iter.tryAdvance(action); // use
        }
    };
    return StreamSupport.stream(res, false);
}
which I can use in the following way:
Stream<String> lines = Files.lines(src);
Stream<DomainObject> res = filterEven(lines)
    .map(line -> toDomainObject(line));
However, measuring the performance of this approach against the next one, which uses a filter() with side effects, I noticed that the latter performs better:
final int[] counter = {0};
final Predicate<String> isEvenLine = item -> ++counter[0] % 2 == 0;
Stream<DomainObject> res = Files.lines(src)
    .filter(isEvenLine)
    .map(line -> toDomainObject(line));
I tested the performance with JMH, and I am not including the file load in the benchmark; I load the file into an array beforehand. Each benchmark then starts by creating a Stream<String> from that array, filtering even lines, applying mapToInt() to extract the value of an int field, and finishing with a max() operation. Here is one of the benchmarks (you can check the whole program here, and here you have the data file with about 186 lines):
@Benchmark
public int maxTempFilterEven(DataSource src) {
    Stream<String> content = Arrays.stream(src.data)
        .filter(s -> s.charAt(0) != '#') // Filter comments
        .skip(1); // Skip line: Not available
    return filterEven(content) // Filter daily info and skip hourly
        .mapToInt(line -> parseInt(line.substring(14, 16)))
        .max()
        .getAsInt();
}
Why does the filter() approach perform better (~80 ops/ms) than the filterEven() one (~50 ops/ms)?
Intro
I think I know the reason, but unfortunately I have no idea how to improve the performance of the Spliterator-based solution (at least not without rewriting the whole Stream API feature).
Sidenote 1: performance was not the most important design goal when the Stream API was designed. If performance is critical, rewriting the code without the Stream API will most probably make it faster. (For example, the Stream API unavoidably increases memory allocation and thus GC pressure.) On the other hand, in most scenarios the Stream API provides a nicer higher-level API at the cost of a relatively small performance degradation.
Part 1 or Short theoretical answer
Stream is designed with a kind of internal iteration as the main means of consumption; external iteration (i.e. Spliterator-based) is an additional means that is kind of "emulated". Thus external iteration involves some overhead. Laziness adds some limits to the efficiency of external iteration, and the need to support flatMap makes it necessary to use some kind of dynamic buffer in this process.
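To make the distinction concrete, here is a small illustration of my own (not from the original answer) showing the two consumption styles:

import java.util.Spliterator;
import java.util.stream.Stream;

public class IterationStyles {
    public static void main(String[] args) {
        // Internal iteration: the stream drives the loop and pushes each
        // element into the callback; this is the "native" mode for streams.
        Stream.of("a", "b", "c").forEach(System.out::println);

        // External iteration: the caller drives the loop, pulling elements
        // one at a time. Because this stream has an intermediate operation
        // (map), the Spliterator must pull elements through the pipeline
        // rather than walk the source directly.
        Spliterator<String> sp =
                Stream.of("a", "b", "c").map(String::toUpperCase).spliterator();
        while (sp.tryAdvance(System.out::println)) {
            // each tryAdvance call pulls at most one element
        }
    }
}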
Sidenote 2: In some cases Spliterator-based iteration might be as fast as internal iteration (i.e. filter in this case). In particular, this is so when you create the Spliterator directly from the data-containing Stream. To see it, you can modify your tests to materialize your first filter into a String[] array:
String[] filteredData = Arrays.stream(src.data)
    .filter(s -> s.charAt(0) != '#') // Filter comments
    .skip(1)
    .toArray(String[]::new);
and then compare the performance of maxTempFilter and maxTempFilterEven modified to accept that pre-filtered String[] filteredData. If you want to know why this is so, you should probably read the rest of this long answer, or at least Part 2.
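As a sketch of that modified benchmark (the filteredData field on DataSource is my assumption for illustration; imports and static imports as in the original benchmark class):

@Benchmark
public int maxTempFilterEvenPrefiltered(DataSource src) {
    // filterEven's Spliterator now wraps an array-backed stream with no
    // intermediate operations, so no buffering pipeline wrapper is involved.
    return filterEven(Arrays.stream(src.filteredData))
        .mapToInt(line -> parseInt(line.substring(14, 16)))
        .max()
        .getAsInt();
}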
Part 2 or Longer theoretical answer:
Streams were designed to be consumed mainly as a whole by some terminal operation. Iterating over elements one by one, although supported, is not designed as the main way to consume streams.
Note that using the "functional" Stream API, such as map, flatMap, filter, reduce, and collect, you can't say at some step "I have had enough data, stop iterating over the source and pushing values". You can discard some incoming data (as filter does), but you can't stop the iteration. (The take and skip transformations are actually implemented using a Spliterator inside; and anyMatch, allMatch, noneMatch, findFirst, findAny, etc. use the non-public API j.u.s.Sink.cancellationRequested; they are also easier because there can't be several terminal operations.) If all transformations in the pipeline are synchronous, you can combine them into a single aggregated function (Consumer) and call it in a simple loop (optionally splitting the loop execution over several threads). This is what my simplified version of the state-based filter represents (see the code in the Show me some code section). It gets a bit more complicated if there is a flatMap in the pipeline, but the idea is still the same.
A Spliterator-based transformation is fundamentally different because it adds an asynchronous, consumer-driven step to the pipeline. Now the Spliterator rather than the source Stream drives the iteration process. If you ask for a Spliterator directly on the source Stream, it might be able to return some implementation that just iterates over its internal data structure, and this is why materializing pre-filtered data should remove the performance difference. However, if you create a Spliterator for some non-empty pipeline, there is no (simple) choice other than asking the source to push elements one by one through the pipeline until some element passes all the filters (see also the second example in the Show me some code section). The fact that source elements are pushed one by one rather than in batches is a consequence of the fundamental decision to make Streams lazy. The need for a buffer instead of just one element is a consequence of the support for flatMap: pushing one element from the source can produce many elements for the Spliterator.
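To see the difference directly, here is a small probe of my own (not from the original answer; the exact class names depend on the JDK version):

import java.util.Arrays;
import java.util.Spliterator;

public class SpliteratorProbe {
    public static void main(String[] args) {
        String[] data = {"a", "b", "c"};

        // Spliterator taken directly from an array-backed stream with no
        // intermediate operations: it can walk the array itself.
        Spliterator<String> direct = Arrays.stream(data).spliterator();

        // Spliterator taken from a non-empty pipeline: the stream wraps it
        // in a buffering adapter that pushes elements one by one.
        Spliterator<String> wrapped =
                Arrays.stream(data).filter(s -> !s.isEmpty()).spliterator();

        System.out.println(direct.getClass().getName());
        // typically java.util.Spliterators$ArraySpliterator
        System.out.println(wrapped.getClass().getName());
        // typically java.util.stream.StreamSpliterators$WrappingSpliterator
    }
}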
Part 3 or Show me some code
This part tries to provide some backing with the code (both links to the real code and simulated code) of what was described in the "theoretical" parts.
First of all, you should know that the current Stream API implementation accumulates non-terminal (intermediate) operations into a single lazy pipeline (see j.u.s.AbstractPipeline and its children, such as j.u.s.ReferencePipeline). Then, when the terminal operation is applied, all the elements from the original Stream are "pushed" through the pipeline.
What you see is the result of two things:
1. the fact that stream pipelines are different for the cases when you have a Spliterator-based step inside;
2. the fact that your OddLines is not the first step in the pipeline.
The code with a stateful filter is more or less similar to the following straightforward code:
static int similarToFilter(String[] data)
{
final int[] counter = {0};
final Predicate<String> isEvenLine = item -> ++counter[0] % 2 == 0;
int skip = 1;
boolean reduceEmpty = true;
int reduceState = 0;
for (String outerEl : data)
{
if (outerEl.charAt(0) != '#')
{
if (skip > 0)
skip--;
else
{
if (isEvenLine.test(outerEl))
{
int intEl = parseInt(outerEl.substring(14, 16));
if (reduceEmpty)
{
reduceState = intEl;
reduceEmpty = false;
}
else
{
reduceState = Math.max(reduceState, intEl);
}
}
}
}
}
return reduceState;
}
Note that this is effectively a single loop with some calculations (filtering/transformations) inside.
When you add a Spliterator into the pipeline, on the other hand, things change significantly: even with simplifications, code that is reasonably similar to what actually happens becomes much larger, such as:
interface Sp<T>
{
public boolean tryAdvance(Consumer<? super T> action);
}
static class ArraySp<T> implements Sp<T>
{
private final T[] array;
private int pos;
public ArraySp(T[] array)
{
this.array = array;
}
@Override
public boolean tryAdvance(Consumer<? super T> action)
{
if (pos < array.length)
{
action.accept(array[pos]);
pos++;
return true;
}
else
{
return false;
}
}
}
static class WrappingSp<T> implements Sp<T>, Consumer<T>
{
private final Sp<T> sourceSp;
private final Predicate<T> filter;
private final ArrayList<T> buffer = new ArrayList<T>();
private int pos;
public WrappingSp(Sp<T> sourceSp, Predicate<T> filter)
{
this.sourceSp = sourceSp;
this.filter = filter;
}
@Override
public void accept(T t)
{
buffer.add(t);
}
@Override
public boolean tryAdvance(Consumer<? super T> action)
{
while (true)
{
if (pos >= buffer.size())
{
pos = 0;
buffer.clear();
sourceSp.tryAdvance(this);
}
// failed to fill buffer
if (buffer.size() == 0)
return false;
T nextElem = buffer.get(pos);
pos++;
if (filter.test(nextElem))
{
action.accept(nextElem);
return true;
}
}
}
}
static class OddLineSp<T> implements Sp<T>, Consumer<T>
{
private Sp<T> sourceSp;
public OddLineSp(Sp<T> sourceSp)
{
this.sourceSp = sourceSp;
}
@Override
public boolean tryAdvance(Consumer<? super T> action)
{
if (sourceSp == null)
return false;
sourceSp.tryAdvance(this);
if (!sourceSp.tryAdvance(action))
{
sourceSp = null;
}
return true;
}
@Override
public void accept(T t)
{
}
}
static class ReduceIntMax
{
boolean reduceEmpty = true;
int reduceState = 0;
public int getReduceState()
{
return reduceState;
}
public void accept(int t)
{
if (reduceEmpty)
{
reduceEmpty = false;
reduceState = t;
}
else
{
reduceState = Math.max(reduceState, t);
}
}
}
static int similarToSpliterator(String[] data)
{
ArraySp<String> src = new ArraySp<>(data);
int[] skip = new int[1];
skip[0] = 1;
WrappingSp<String> firstFilter = new WrappingSp<String>(src, (s) ->
{
if (s.charAt(0) == '#')
return false;
if (skip[0] != 0)
{
skip[0]--;
return false;
}
return true;
});
OddLineSp<String> oddLines = new OddLineSp<>(firstFilter);
final ReduceIntMax reduceIntMax = new ReduceIntMax();
while (oddLines.tryAdvance(s ->
{
int intValue = parseInt(s.substring(14, 16));
reduceIntMax.accept(intValue);
})) ; // do nothing in the loop body
return reduceIntMax.getReduceState();
}
This code is larger because the logic is impossible (or at least very hard) to represent without some non-trivial stateful callbacks inside the loop. Here the interface Sp is a mix of the j.u.s.Stream and j.u.Spliterator interfaces.
Class ArraySp represents the result of Arrays.stream.
Class WrappingSp is similar to j.u.s.StreamSpliterators.WrappingSpliterator, which in the real code represents an implementation of the Spliterator interface for any non-empty pipeline, i.e. a Stream with at least one intermediate operation applied to it (see the j.u.s.AbstractPipeline.spliterator method). In my code I merged it with a StatelessOp subclass and put there the logic responsible for the filter method implementation. Also, for simplicity, I implemented skip using filter.
OddLineSp corresponds to your OddLines and its resulting Stream.
ReduceIntMax represents the ReduceOps terminal operation for Math.max over ints.
So what's important in this example? The important thing is that since you first filter your original stream, your OddLineSp is created from a non-empty pipeline, i.e. from a WrappingSp. And if you take a closer look at WrappingSp, you'll notice that every time tryAdvance is called, it delegates the call to sourceSp and accumulates the result(s) into a buffer. Moreover, since you have no flatMap in the pipeline, elements are copied into the buffer one by one. I.e. every time WrappingSp.tryAdvance is called, it will call ArraySp.tryAdvance, get back exactly one element (via the callback), and pass it further to the consumer provided by the caller (if the element doesn't match the filter, ArraySp.tryAdvance will be called again and again, but the buffer is still never filled with more than one element at a time).
Sidenote 3: If you want to look at the real code, the most interesting place is j.u.s.StreamSpliterators.WrappingSpliterator.tryAdvance, which calls j.u.s.StreamSpliterators.AbstractWrappingSpliterator.doAdvance, which in turn calls j.u.s.StreamSpliterators.AbstractWrappingSpliterator.fillBuffer, which in turn calls the pusher that is initialized in j.u.s.StreamSpliterators.WrappingSpliterator.initPartialTraversalState.
So the main thing that's hurting performance is this copying into the buffer.
Unfortunately for us ordinary Java developers, the current implementation of the Stream API is pretty much closed, and you can't modify only some aspects of its internal behavior using inheritance or composition.
You might use some reflection-based hacking to make the copying-to-buffer more efficient for your specific case and gain some performance (sacrificing the laziness of the Stream), but you can't avoid this copying altogether, so Spliterator-based code will be slower anyway.
Going back to the example from Sidenote 2: the Spliterator-based test with materialized filteredData works faster because there is no WrappingSp in the pipeline before OddLineSp, and thus there is no copying into an intermediate buffer.

SonarQube - Nested If Depth

I'm getting this violation on SonarQube: Nested If Depth.
if (some condition){
some code;
if (some condition) {
some code;
}
}
and also here:
for (some condition) {
if (some condition) {
some code;
}
}
How can I reduce the depth?
An answer has already been accepted, but it doesn't answer the actual question.
So for completeness' sake I want to add my 2 cents.
For reference, see this question here.
The example you list looks like it is dealing with guard conditions. Things like "only run this method if ...." or "only perform this loop iteration if ...."
In these cases, if you have 3 or 4 groups of guards you might end up indenting very deeply, making the code harder to read.
Anyway, the way to make this code more readable is to return early.
Instead of
if (some condition) {
// some code
if (some other condition) {
// some more code
}
}
You can write
if (!some condition) {
return;
}
// some code
if (!some other condition) {
return;
}
// some more code
You now only ever have one level of nesting, and it is clear that you do not run this method unless 'some condition' has been met.
The same goes for the loop, using continue:
for (some condition) {
if (some other condition) {
// some code;
}
}
becomes
for (some condition) {
if (!some other condition) {
continue;
}
// some code
}
What you state here is that unless 'some other condition' is met, you skip this loop iteration.
The real question is: why is the max depth set to 1? That's overkill.
This kind of rule is meant to keep your code readable. More than 2 nested blocks can make code unreadable, but 1-2 levels will always be readable.
If you decide to keep the max depth set to 1, you need to refactor your code and put every second condition check inside a separate method. No offense, but unless you have a very specific and good reason to do so, that looks a bit stupid.
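For what it's worth, the refactoring that a max depth of 1 forces looks roughly like this sketch (the Order type and method names are invented for illustration):

// Hypothetical example: the second condition check moves into its own
// method, so neither method nests more than one level deep.
void process(Order order) {
    if (order == null) {
        return; // guard clause: nothing to do
    }
    applyDiscountIfEligible(order);
}

private void applyDiscountIfEligible(Order order) {
    if (order.getTotal() > 100) {
        order.applyDiscount();
    }
}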

Understanding pre/post in pseudo-code notation

I am following these examples of C# code, but I am a little confused by the pseudo-code comments all over the place.
For example:
public void addToHead(Object value)
// pre: value non-null
// post: adds element to head of list
{
    SinglyLinkedListElement temp =
        new SinglyLinkedListElement(value);
    if (tail == null) {
        tail = temp;
        tail.setNext(tail);
    } else {
        temp.setNext(tail.next());
        tail.setNext(temp);
    }
    count++;
}
What does Pre and Post mean here?
I've never seen Post used like this before. I know what POST means in the context of the Web and HTML etc., but not in pure code.
"Pre" indicates an assumption made at the beginning of execution, i.e. a condition the caller must ensure. In this case, it indicates that the value passed in is assumed to be non-null.
"Post" indicates what is guaranteed to hold at the end of execution, i.e. what the routine actually does. In this case, when the routine finishes, a new element will have been added to the head of the list. If the routine modifies its parameters or has any other side effects, those modifications should be listed in the "Post" as well.

How to avoid Linq chaining to return null?

I have a problem with Code Contracts and LINQ. I managed to narrow the issue down to the following code sample, and now I am stuck.
public void SomeMethod()
{
    var list = new List<Question>();

    if (list.Take(5) == null) { }
    // resharper hints that condition can never be true

    if (list.ForPerson(12) == null) { }
    // resharper does not hint that condition can never be true
}

public static IQueryable<Question> ForPerson(this IQueryable<Question> source, int personId)
{
    if (source == null) throw new ArgumentNullException();

    return from q in source
           where q.PersonId == personId
           select q;
}
What is wrong with my LINQ chain? Why doesn't Resharper 'complain' when analyzing the ForPerson call?
EDIT: the return type of the ForPerson method has been changed from string to IQueryable, which is what I meant. (My bad.)
Resharper is correct that the result of Take or Skip is never null; if there are no items, the result is an IEnumerable<Question> that simply has no elements. I think to do what you want you should check Any:
var query = list.Take(5);
if (!query.Any())
{
    // Code here executes only if there were no items in the list.
}
But how does this warning work? Resharper cannot know that the method never returns null just from looking at the method definition, and I assume it does not reverse-engineer the method body to determine that it never returns null. I assume, therefore, that it has been specially hard-coded with a rule specifying that the .NET methods Skip and Take do not return null.
When you write your own custom methods, Resharper can make assumptions about your method's behaviour from its interface, but your interface allows it to return null; therefore it issues no warning. If it analyzed the method body, it could see that null is impossible and would be able to issue a warning. But analyzing code to determine its possible behaviour is an incredibly difficult task, and I doubt that JetBrains are willing to spend the money on solving this problem when they could add more useful features elsewhere at a much lower development cost.
Determining whether a boolean expression can ever be true is called the Boolean satisfiability problem, and it is NP-hard.
You want Resharper to determine whether a general method body can ever return null, which is a generalization of that NP-hard problem. It's unlikely any tool will ever be able to do this correctly in 100% of cases.
if (source == null) throw new ArgumentNullException();
That's not the Code Contracts way; did you instead mean:
Contract.Requires(source != null);

How does LINQ implement the SingleOrDefault() method?

How is the SingleOrDefault() method evaluated in LINQ? Does it use a binary search behind the scenes?
Rather than attempting to explain it in words, I thought I'd just post the exact implementation code from the .NET Framework, retrieved using the Reflector program (and reformatted ever so slightly).
public static TSource SingleOrDefault<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
        throw Error.ArgumentNull("source");
    IList<TSource> list = source as IList<TSource>;
    if (list != null)
    {
        switch (list.Count)
        {
            case 0:
                return default(TSource);
            case 1:
                return list[0];
        }
    }
    else
    {
        using (IEnumerator<TSource> enumerator = source.GetEnumerator())
        {
            if (!enumerator.MoveNext())
                return default(TSource);
            TSource current = enumerator.Current;
            if (!enumerator.MoveNext())
                return current;
        }
    }
    throw Error.MoreThanOneElement();
}
It's quite interesting to observe that an optimisation is made if the object is of type IList<T>, which seems quite sensible. Otherwise, if the object implements nothing more specific than IEnumerable<T>, it simply falls back to enumerating over it, and does so just as you'd expect.
Note that it can't use a binary search because the object doesn't necessarily represent a sorted collection. (In fact, in almost all usage cases, it won't.)
I would assume that it simply performs the query: if the result count is zero, it returns the default instance of the class; if the result count is one, it returns that instance; and if the result count is greater than one, it throws an exception.
I don't think it does any searching; it's all about getting the first element of the source [list, result set, etc.].
My best guess is that it just pulls the first element. If there is no first element, it returns the default (null, 0, false, etc.). If there is one, it attempts to pull a second result; if there is a second result, it throws an exception, otherwise it returns the first.
