Filtering with DoubleMetaphoneFilter in Lucene

I want to use DoubleMetaphone in Lucene programmatically.
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-analyzers-phonetic</artifactId>
<version>4.4.0</version>
</dependency>
The above package contains the appropriate classes.
This filter can be used in Solr by configuring it in XML,
but I want to use it programmatically in Java:
analyzer = new StandardAnalyzer(Version.LUCENE_44);
String field = "title";
Query q = new QueryParser(Version.LUCENE_44, field, analyzer).parse(querystr);
int hitsPerPage = 100;
IndexReader reader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
searcher.search(q, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
But I don't know how to use the filter.

To use this filter, you'll need to create your own custom Analyzer, similar to the example in the Analyzer documentation. For example, to add a metaphone filter to a StandardAnalyzer-style chain, you could write something like:
Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // Tokenize first, then chain the filters; each filter wraps the previous stream.
        final StandardTokenizer source = new StandardTokenizer(Version.LUCENE_44, reader);
        source.setMaxTokenLength(StandardAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
        TokenStream filter = new StandardFilter(Version.LUCENE_44, source);
        filter = new LowerCaseFilter(Version.LUCENE_44, filter);
        filter = new StopFilter(Version.LUCENE_44, filter, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        // maxCodeLength = 4; inject = true keeps the original token alongside its metaphone codes.
        filter = new DoubleMetaphoneFilter(filter, 4, true);
        return new TokenStreamComponents(source, filter);
    }
};
That is just an example, of course. Set up your Analyzer however makes sense for the data you want to index.
Also, keep in mind that this filter needs to be applied at index time as well as at query time, so you will need to reindex your data with this filter applied in order to index the metaphone codes.

Related

Spring Data elastic search with out entity fields

I'm using Spring Data Elasticsearch. My documents have no fixed set of fields; the data accumulates per quarter, at roughly 6 GB per quarter (we call these versions). Say we get 5 GB of data in Jan 2021 with 140 columns; the next version may have 130 or 120 columns, and we don't know in advance. The end-user requirement is to fetch the information from the database, show it in tabular form, and let the user filter it. In MongoDB we have BasicDBObject; is there anything similar in Spring Boot Elasticsearch?
I can provide, say, 4-5 columns that are common to every version's records. Apart from those, I need to retrieve the data without naming the columns in the POJO, and I need to filter on them, just as I can in MongoDB:
List<BaseClass> getMultiSearch(@RequestBody Map<String, Object>[] attributes) {
Query orQuery = new Query();
Criteria orCriteria = new Criteria();
List<Criteria> orExpression = new ArrayList<>();
for (Map<String, Object> accounts : attributes) {
Criteria expression = new Criteria();
accounts.forEach((key, value) -> expression.and(key).is(value));
orExpression.add(expression);
}
orQuery.addCriteria(orCriteria.orOperator(orExpression.toArray(new Criteria[orExpression.size()])));
return mongoOperations.find(orQuery, BaseClass.class);
}
You can define an entity class for example like this:
public class GenericEntity extends LinkedHashMap<String, Object> {
}
And to have that returned at your call site:
public SearchHits<GenericEntity> allGeneric() {
var criteria = Criteria.where("fieldname").is("value");
Query query = new CriteriaQuery(criteria);
return operations.search(query, GenericEntity.class, IndexCoordinates.of("indexname"));
}
But note: when writing data into Elasticsearch, the mapping for new fields/properties in that index will be dynamically updated, and there is a limit on how many entries a mapping can have (https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-settings-limit.html). So take care not to run into that limit.
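One way to keep an eye on that limit before indexing is to count the distinct field paths in a batch of map-based records. The sketch below is plain Java with no Elasticsearch dependency. The class name FieldCountCheck, its helper methods, and the limit constant are hypothetical names for illustration, and the count only approximates what Elasticsearch tracks (it ignores multi-fields and aliases, for instance). The value 1000 is the documented default of index.mapping.total_fields.limit; your index may be configured differently.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class FieldCountCheck {

    // Assumed default of index.mapping.total_fields.limit; check your own index settings.
    static final int DEFAULT_TOTAL_FIELDS_LIMIT = 1000;

    /** Collects every distinct field path (object fields included) seen in the records. */
    static Set<String> distinctFieldPaths(List<Map<String, Object>> records) {
        Set<String> paths = new TreeSet<>();
        for (Map<String, Object> record : records) {
            collect("", record, paths);
        }
        return paths;
    }

    private static void collect(String prefix, Map<?, ?> node, Set<String> paths) {
        for (Map.Entry<?, ?> entry : node.entrySet()) {
            String key = String.valueOf(entry.getKey());
            String path = prefix.isEmpty() ? key : prefix + "." + key;
            paths.add(path); // the field itself, whether it is a leaf or an object
            if (entry.getValue() instanceof Map) {
                // Descend into nested objects so their sub-fields are counted too.
                collect(path, (Map<?, ?>) entry.getValue(), paths);
            }
        }
    }

    /** True when the batch alone would already need more field paths than the given limit. */
    static boolean exceedsLimit(List<Map<String, Object>> records, int limit) {
        return distinctFieldPaths(records).size() > limit;
    }
}
```

Since GenericEntity extends LinkedHashMap<String, Object>, a list of such entities could be passed to a check like this before a bulk write.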

ES Match query analogue in Lucene

I use queries like this one to run in ES:
boolQuery.must(QueryBuilders.matchQuery("field", value).minimumShouldMatch("50%"))
What's the direct analogue of this query in Lucene?
Match Query, as I understand it, basically analyzes the query, and creates a BooleanQuery out of all the terms the analyzer finds. You could get sorta close by just passing the text through QueryParser.
But you could replicate it something like this:
public static Query makeMatchQuery (String fieldname, String value) throws IOException {
//get a builder to start adding clauses to.
BooleanQuery.Builder qbuilder = new BooleanQuery.Builder();
//We need to analyze that value, and get a tokenstream to read terms from
Analyzer analyzer = new StandardAnalyzer();
TokenStream stream = analyzer.tokenStream(fieldname, new StringReader(value));
stream.reset();
//Iterate the token stream, and add them all to our query
int countTerms = 0;
while(stream.incrementToken()) {
countTerms++;
Query termQuery = new TermQuery(new Term(
fieldname,
stream.getAttribute(CharTermAttribute.class).toString()));
qbuilder.add(termQuery, BooleanClause.Occur.SHOULD);
}
//Signal end-of-stream before closing, as the TokenStream workflow expects
stream.end();
stream.close();
analyzer.close();
//The min should match is a count of clauses, not a percentage. So for 50%, count/2
qbuilder.setMinimumNumberShouldMatch(countTerms / 2);
Query finalQuery = qbuilder.build();
return finalQuery;
}
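The countTerms / 2 in the code above hard-codes the 50% case. Here is a minimal sketch of a more general conversion, assuming only non-negative percentages and assuming the result should round down (as the integer division above does). MinShouldMatch and fromPercent are hypothetical names, not Lucene or Elasticsearch APIs.

```java
final class MinShouldMatch {

    /**
     * Converts a percentage of optional clauses into a clause count, rounding down.
     * Only percentages from 0 to 100 are handled in this sketch.
     */
    static int fromPercent(int clauseCount, int percent) {
        if (clauseCount < 0) {
            throw new IllegalArgumentException("clauseCount must not be negative");
        }
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        // Integer arithmetic truncates, which gives the round-down behavior.
        return (int) ((long) clauseCount * percent / 100);
    }
}
```

The result could then be passed to qbuilder.setMinimumNumberShouldMatch(...) in place of countTerms / 2.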

Spring MongoDB query with or operator and text search

How can I build this MongoDB query with Spring Criteria?
{
$or: [
{ "$text" : { "$search" : "570-11024" } },
{"productDetails.code": "572-R110"}
]
}
It combines a fulltext index search with normal Where criteria with an orOperator.
Query's orOperator(Criteria... criteria) method takes only Criteria and no TextCriteria and also no CriteriaDefinition interface.
Yes, you are right. In Spring Data MongoDB you could do this:
final TextCriteria textCriteria = TextCriteria.forDefaultLanguage().matchingAny("570-11024");
final DBObject tc = textCriteria.getCriteriaObject();
final Criteria criteria = Criteria.where("productDetails.code").is("572-R110");
final DBObject co = criteria.getCriteriaObject();
BasicDBList or = new BasicDBList();
or.add(tc);
or.add(co);
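DBObject qq = new BasicDBObject("$or", or);
// Run it as a query: executeCommand is meant for database commands, not find filters.
// YourEntity is a placeholder for your mapped class.
mongoTemplate.find(new BasicQuery(qq), YourEntity.class);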
Yes, you currently cannot use the Query's orOperator method to combine Criteria and TextCriteria. A workaround is to convert both the Criteria and the TextCriteria objects to their Document representations, add them to a BasicDBList, and wrap that list in a "$or" Criteria object:
TextCriteria textCriteria = TextCriteria.forDefaultLanguage().matchingAny("570-11024");
Criteria criteria = Criteria.where("productDetails.code").is("572-R110");
BasicDBList bsonList = new BasicDBList();
bsonList.add(criteria.getCriteriaObject());
bsonList.add(textCriteria.getCriteriaObject());
Query query = new Query();
query.addCriteria(new Criteria("$or").is(bsonList));
mongoTemplate.find(query, YourEntity.class);
PS: Someone has raised this issue in the spring-data-mongodb repo with a proposed fix by
changing the parameter types of orOperator from Criteria to CriteriaDefinition.
https://github.com/spring-projects/spring-data-mongodb/issues/3895.

ES Java dynamically add keyed filters to AggregationBuilder

I want a method that loops through an ArrayList and, based on its contents, dynamically generates any number of keyed filters:
List<KeyedFilter> filters = new ArrayList<KeyedFilter>();
for (String a: b) {
filters.add(generateKeyedFilterFromList(a.key, a.value, a.buckets));
}
private KeyedFilter generateKeyedFilterFromList(String key, String value, String[] buckets) {
return new KeyedFilter(key,
QueryBuilder(value, buckets));
}
This returns the list of KeyedFilters just fine, but I have yet to find a way to pass it to AggregationBuilder (String, KeyedFilter...):
AggregationBuilder agg = AggregationBuilders.filter("filterName", filters);
I also tried an array:
KeyedFilter[] filterArray = new KeyedFilter[filters.size()];
filterArray = filters.toArray(filterArray);
AggregationBuilder agg = AggregationBuilders.filter("filterName", filterArray); // honestly I am not completely sure why this doesn't work here
Apologies for answering my own question, but in case anyone else is interested. The above actually does work, it was another error that was causing it to fail.
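The array version works because of general Java behavior: a varargs parameter such as KeyedFilter... is an array parameter, so a List has to be converted with toArray before it can be passed. Below is a self-contained sketch of that mechanism. It uses a hypothetical describe(String, String...) method as a stand-in for the aggregation builder call; VarargsDemo and its methods are illustration names only.

```java
import java.util.List;

final class VarargsDemo {

    /** Stand-in for any method with a (String, T...) signature. */
    static String describe(String name, String... keys) {
        return name + ":" + String.join(",", keys);
    }

    static String describeAll(String name, List<String> keys) {
        // A List is not accepted where varargs are expected; convert it to an array first.
        return describe(name, keys.toArray(new String[0]));
    }
}
```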

Setting Fuzziness to Auto for MatchQuery

I'm using the fuzziness option for my MatchQuery, but I want to set the Fuzziness value to auto. Is there any way to do this?
Also, the completion suggester can be set to be Unicode-aware; is there any way to do this for my MatchQuery?
This is how I create the request:
var request = new SearchRequest<object>
{
Types = types,
Size = 5,
Query = new QueryContainer(new MatchQuery
{
Field = new PropertyPathMarker { Name = "ProductName.autocomplete" },
Query = q,
Fuzziness = 2.0
}),
Fields = new[]
{
new PropertyPathMarker{Name = "ProductName"}
}
};
return _client.Search<object>(request);
Sadly, at the moment you can't do this everywhere. We have a specialised interface that can represent all fuzziness states, but not every place that takes a fuzziness parameter uses it.
We received a pull request for this, which we merged into our 2.0 branch since it's a breaking change:
https://github.com/elasticsearch/elasticsearch-net/pull/941
We have no ETA on a 2.0 release as of yet though.
