How do I register a custom converter so that @GeoShapeField picks it up? - elasticsearch

I am using spring-data-elasticsearch (5) to automagically write third-party data into an ES (8) index. The data contains geodata in GML format, which is parsed into a nested Map<String, Object>.
In my POJO I have a field
@GeoShapeField
private Map<String, Object> geometry;
This is written perfectly fine in many cases; however, the data I get can also contain e.g. Envelope, which is not supported by GeoJson but could be imported without problems into ES.
I can write simple custom ReadingConverter/WritingConverters - but how can I register them in a way that @GeoShapeField automatically chooses them when appropriate?
I see that org.springframework.data.elasticsearch.core.convert.GeoConverters is responsible for choosing the correct converter, esp. .GeoJsonToMapConverter and .MapToGeoJsonConverter. How would I correctly extend or replace that class so that @GeoShapeField handles one or more additional types?

As pointed out by P.J.Meisch in the comments, the question was based on several misunderstandings on my part.
The answer to my actual question is straightforward: for Envelope, Elasticsearch expects
"myField": {
"type" : "envelope",
"coordinates" : [ [100.0, 1.0], [101.0, 0.0] ]
}
To achieve this using spring-data-elasticsearch, it is enough to provide a simple translation into a Map<String, Object>:
Map<String, Object> myField = new HashMap<>();
myField.put("type", "envelope");
myField.put("coordinates", Arrays.asList(Arrays.asList(100.0, 1.0), Arrays.asList(101.0, 0.0)));
What tripped me up was the data itself: I receive bounding boxes specified by their lower-left and upper-right corners, whereas ES expects the upper-left and lower-right corners of an envelope. After swapping the respective coordinates, everything works.
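For reference, a minimal sketch of that translation as a helper method (the method name toEsEnvelope and the lower-left/upper-right input order are my assumptions, matching the data described above):

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EnvelopeMapper {

    // Hypothetical helper: takes a bounding box given as lower-left (minLon, minLat)
    // and upper-right (maxLon, maxLat) corners and returns the upper-left / lower-right
    // order that Elasticsearch expects for an "envelope" shape.
    public static Map<String, Object> toEsEnvelope(double minLon, double minLat,
                                                   double maxLon, double maxLat) {
        List<Double> upperLeft = Arrays.asList(minLon, maxLat);
        List<Double> lowerRight = Arrays.asList(maxLon, minLat);

        Map<String, Object> envelope = new HashMap<>();
        envelope.put("type", "envelope");
        envelope.put("coordinates", Arrays.asList(upperLeft, lowerRight));
        return envelope;
    }
}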

Related

How to get the output of the last but one layer of the Vision transformer using the hugging face implementation?

I am trying to use the huggingface implementation of the vision transformer to get the feature vector of the last but one dense layer.
In order to get information from the second-to-last layer, you need to set output_hidden_states=True. Here is an example from my own context (using BERT):
configBert = BertConfig.from_pretrained('bert-base-uncased', output_hidden_states=True, num_labels=NUM_LABELS)
modelBert = TFBertModel.from_pretrained('bert-base-uncased', config=configBert)

What is the purpose of RocksDBStore with Serdes.Bytes() and Serdes.ByteArray()?

RocksDBStore<K,V> stores keys and values as byte[] on disk. It converts to/from K- and V-typed objects using the Serdes provided when the RocksDBStore<K,V> object is constructed.
Given this, please help me understand the purpose of the following code in RocksDbKeyValueBytesStoreSupplier:
return new RocksDBStore<>(name,
                          Serdes.Bytes(),
                          Serdes.ByteArray());
Providing Serdes.Bytes() and Serdes.ByteArray() looks redundant.
RocksDbKeyValueBytesStoreSupplier was introduced in KAFKA-5650 (Kafka Streams 1.0.0) as part of KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines.
In KIP-182, there is the following sentence:
The new Interface BytesStoreSupplier supersedes the existing StateStoreSupplier (which will remain untouched). This so we can provide a convenient way for users creating custom state stores to wrap them with caching/logging etc if they chose. In order to do this we need to force the inner most store, i.e, the custom store, to be a store of type <Bytes, byte[]>.
Please help me understand why we need to force custom stores to be of type <Bytes, byte[]>?
Another place (KAFKA-5749) where I found a similar sentence:
In order to support bytes store we need to create a MeteredSessionStore and ChangeloggingSessionStore. We then need to refactor the current SessionStore implementations to use this. All inner stores should by of type < Bytes, byte[] >
Why?
Your observation is correct -- the PR implementing KIP-182 missed removing the Serdes from RocksDBStore, which are no longer required. This was already fixed in the 1.1 release.
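To illustrate the layering that KIP-182 describes, here is a minimal sketch using the public Stores API (the store name, key/value types and config values are my own choice; assuming the Kafka Streams 1.0+ API): the supplier only ever produces a <Bytes, byte[]> store, while the typed Serdes live in the wrapping layers (metered/caching/logging) that Streams puts around it.

import java.util.Collections;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.state.KeyValueBytesStoreSupplier;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class CustomStoreSketch {
    public static void main(String[] args) {
        // The supplier creates the innermost store, which is always <Bytes, byte[]>.
        KeyValueBytesStoreSupplier supplier = Stores.persistentKeyValueStore("my-store");

        // The typed Serdes are applied by the outer layers that Streams wraps around
        // the bytes store; the inner RocksDB store never sees typed objects.
        StoreBuilder<KeyValueStore<String, Long>> builder =
                Stores.keyValueStoreBuilder(supplier, Serdes.String(), Serdes.Long())
                      .withCachingEnabled()
                      .withLoggingEnabled(Collections.emptyMap());

        // build() returns the fully wrapped store: metered -> caching -> logging -> RocksDB.
        KeyValueStore<String, Long> store = builder.build();
        System.out.println(store.name());
    }
}

Because caching and change-logging operate on raw bytes, they work for any custom store as long as its innermost layer is <Bytes, byte[]>, which is what the KIP-182 quote above refers to.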

How to realize the lucene suggester with support contexts

Currently only AnalyzingInfixSuggester supports contexts. All other implementations, such as AnalyzingSuggester, FreeTextSuggester and FuzzySuggester, do not support contexts.
In my task I need to implement a suggester that searches only over terms that exist in documents having a specific field with a specific value.
For example, only terms of the field description from documents in which the field type has the value TYPE_A.
At the moment I solve this by creating a different iterator for each type, like this:
Map<String, List<String>> mapOfTerms = new HashMap<>();
int maxDoc = indexReader.maxDoc();
for (int i = 0; i < maxDoc; i++) {
    Document doc = indexReader.document(i);
    String type = doc.get("type");
    List<String> list = mapOfTerms.computeIfAbsent(type, t -> new ArrayList<>());
    // ... add terms from doc to list
}
// create a custom InputIterator for each per-type list
// create an AnalyzingSuggester, AnalyzingInfixSuggester, FreeTextSuggester and FuzzySuggester for each InputIterator
For example, for the three types "TYPE_A", "TYPE_B", "TYPE_C" I end up with 12 suggesters.
How to solve this problem better?
The question is about contexts for suggesters. My answer is about filtering the results as if the suggester had been built only for a subset of the documents.
The Lucene folks call this filter a context: https://issues.apache.org/jira/browse/LUCENE-6464
Version 5.4 is the first Solr version that supports a filter on suggesters, but only for BlendedInfixSuggester and AnalyzingInfixSuggester:
Make Lucene's AnalyzingInfixSuggester.lookup() method that takes a BooleanQuery filter parameter available in Solr
So for now, for all other suggesters, you have to build one suggester for each possible filter, e.g. by creating an extra field per filter ("field_filtername").
Without extra information I cannot see how to solve this problem better.
Possibly your context could be used as a route for shard splitting (so that the separation of suggesters per context is already done by SolrCloud).
Possibly you don't need a suggester at all, and highlighting or facets could solve your original problem.
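For the AnalyzingInfixSuggester mentioned in the question, a context-filtered lookup could look roughly like the sketch below. This is only an illustration: the index paths, the field names "description", "weight" and "type", and the DocumentDictionary-based build are my assumptions, and the exact method signatures vary between Lucene versions.

import java.nio.file.Paths;
import java.util.Collections;
import java.util.Set;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.suggest.DocumentDictionary;
import org.apache.lucene.search.suggest.Lookup;
import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public class ContextSuggestSketch {
    public static void main(String[] args) throws Exception {
        DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("/tmp/index")));

        // One suggester over all documents; the "type" field is used as the context.
        AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(
                FSDirectory.open(Paths.get("/tmp/suggest")), new StandardAnalyzer());
        suggester.build(new DocumentDictionary(reader, "description", "weight", null, "type"));

        // At lookup time, restrict the suggestions to documents whose "type" is TYPE_A.
        Set<BytesRef> contexts = Collections.singleton(new BytesRef("TYPE_A"));
        for (Lookup.LookupResult result : suggester.lookup("flo", contexts, 10, true, false)) {
            System.out.println(result.key);
        }

        suggester.close();
        reader.close();
    }
}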

AngularDart custom filter call() method required to be idempotent?

The main running example of the Angular Dart tutorial is a Recipe Book app. The exercise at the end of the Chapter 5 on filters and services suggests trying to "create a [custom] filter that will multiply all the amounts [of each ingredient listed] in the recipes" thus allowing a "user to double, triple, or quadruple the recipe." E.g. an ingredient of "1/2 cup of flour" would become "1 cup of flour" when doubled.
I have written such a custom filter: it takes a list of Ingredients (consisting of a quantity and a description) and returns a new list of new Ingredients (with increased quantities), but I am getting the following error:
5 $digest() iterations reached. Aborting!
My question is: what is the required and/or permitted behavior of an AngularDart custom filter call() method? E.g., clearly it is permitted to remove (i.e. filter) elements from its input list, but can it also add new or replace elements? The Dart angular.core NgFilter documentation simply says that a "filter is a class with a call method". I have not found more details.
Extrapolating from the answer to this AngularJS post, it would seem that repeated invocations of call() should (eventually?) yield "the same result". If so, this would be a reasonable constraint.
Yielding "the same result" could mean that call() needs to be idempotent, but in the case of Dart such idempotence should be relative to == (object equivalence) not identical() (object identity), IMHO. I ran a few tests using the following small example to illustrate the issues:
main.dart
import 'package:angular/angular.dart';
class A { }
@NgFilter(name: 'myFilter')
class MutatingCustomFilter {
  final A _a = new A();
  call(List list) => new List.from(list)..add(_a); // runs ok.
  // call(List list) => new List.from(list)..add(new A()); // gives error
}
class MyAppModule extends Module {
  MyAppModule() { type(MutatingCustomFilter); }
}
main() => ngBootstrap(module: new MyAppModule());
index.html excerpt
<ul>
<li ng-repeat="x in [1,2,3] | myFilter">{{x}}</li>
</ul>
If I change the body of class A to be
@override bool operator==(other) => true;
@override int get hashCode => 1;
which makes all instances of A considered ==, then the second implementation of call() in main.dart (the one with add(new A())) still gives an error (though a different one).
I can see how to solve the tutorial exercise without use of a custom filter, but I am trying to not give up on the challenge of finding a filter that will work as requested. I am new to Angular and decided to jump in with AngularDart, so any help in explaining the effects of the various flavors of call(), or in finding documentation for the expected behavior of call(), (or letting me know if you think such a custom filter simply cannot be written!) would be appreciated.
Too many iterations
When Angular detects a change in the model, it executes a reaction function. The reaction function can further change the model, which would leave the model in an inconsistent state. For this reason we re-run the change detection, which can in turn create more changes, and we keep re-running it until the model stabilizes. But how many times should we re-run the change detection before giving up? By default it is 5 times. If the model does not stabilize after 5 iterations, we give up. This is what is going on in your case.
Change Detection
When has an object changed? One can use identical or == (equals). Good arguments can be made for each, but we have chosen to use identical because it is fast and consistent. Using == (equals) is tricky and it would negatively impact the change detection algorithm.
Filters and arrays
When a filter that operates on an array executes, it has no choice but to create a new instance of the array. This breaks identical, but luckily it is fed into ng-repeat, which uses its own algorithm for detecting changes in the array contents rather than in the array instance. While the array does not have to be identical between runs, its contents must be. Otherwise ng-repeat cannot tell the difference between insertions and changes, which it needs in order to do proper animations.
Your code
The issue with your filter is that it creates new instances on each iteration of the digest loop. These new instances prevent the model from stabilizing, hence the error. (There are plans to solve this issue, but it will be a few weeks before we get there.)
Solution
Your solution attempts to create a filter that consumes the whole array and then creates a new array for the ng-repeat. A different (preferred) solution would be to leave the ng-repeat iteration as is, and instead place the filter on the binding that creates the qty and apply it there.
<span>{{recipe.qty | myFilter:multiply}}</span>

How to filter out "Hits" result returned by indexsearcher.search() function?

How can I reduce the size of a "Hits" object that is returned by indexsearcher.search() function?
Currently I do something like:
Hits hits = indexSearch.search(query, filter, ...);
Iterator hitsIt = hits.iterator();
int newSize = 0;
while (hitsIt.hasNext()) {
    Hit currHit = (Hit) hitsIt.next();
    if (hasPermission(currHit)) {
        newSize++;
    }
}
However, this is creating a huge performance problem when the number of hits is large (like 500 or more).
I have heard of something called "HitsCollector" or maybe "Collector" which is supposed to help improve performance, but I don't have any idea how to use it.
Would appreciate it if someone could point me in the right direction.
We are using Apache Lucene for indexing within the Atlassian Confluence web application.
A Collector is just a simple callback mechanism that gets invoked for each document hit; you'd use a collector like this:
public class MyCollector extends HitCollector {
    // this is called back for every document that
    // matches, with the docid and the score
    public void collect(int doc, float score) {
        // do whatever you have to in here, e.g. check permissions and count the hit
    }
}

// ...
HitCollector collector = new MyCollector();
indexSearch.search(query, filter, collector);
For good performance you will have to index your security information along with each document. This of course depends on your security model. For example, if you can assign each document to security roles that have permission for it, then use that. Also
check out this question. Yours is pretty much a duplicate of that.
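As a sketch of the "index your security information" idea (the field name "role", the example roles and the method shape are my assumptions, and it follows the same old Filter/HitCollector-era API as the code above): index one or more "role" values per document and turn the user's roles into a filter, so that only permitted documents ever reach the collector.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TermQuery;

public class PermissionSearch {
    public static void searchWithPermissions(IndexSearcher indexSearch, Query query,
                                             String[] userRoles) throws Exception {
        // Build a filter that matches documents carrying at least one of the user's roles.
        BooleanQuery roleQuery = new BooleanQuery();
        for (String role : userRoles) {
            roleQuery.add(new TermQuery(new Term("role", role)), BooleanClause.Occur.SHOULD);
        }
        Filter permissionFilter = new QueryWrapperFilter(roleQuery);

        // Only documents the user is allowed to see are passed to the collector.
        indexSearch.search(query, permissionFilter, new MyCollector());
    }
}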
