ES Match query analogue in Lucene - elasticsearch

I run queries like this one in ES:
boolQuery.must(QueryBuilders.matchQuery("field", value).minimumShouldMatch("50%"))
What's the direct equivalent of this query in Lucene?

A match query, as I understand it, analyzes the query text and builds a BooleanQuery from all the terms the analyzer produces. You could get fairly close by just passing the text through QueryParser.
But you could replicate it with something like this:
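import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public static Query makeMatchQuery(String fieldname, String value) throws IOException {
    // Get a builder to start adding clauses to.
    BooleanQuery.Builder qbuilder = new BooleanQuery.Builder();
    // We need to analyze the value and get a token stream to read terms from.
    Analyzer analyzer = new StandardAnalyzer();
    TokenStream stream = analyzer.tokenStream(fieldname, new StringReader(value));
    stream.reset();
    // Iterate the token stream and add each term to the query as a SHOULD clause.
    int countTerms = 0;
    while (stream.incrementToken()) {
        countTerms++;
        Query termQuery = new TermQuery(new Term(
            fieldname,
            stream.getAttribute(CharTermAttribute.class).toString()));
        qbuilder.add(termQuery, BooleanClause.Occur.SHOULD);
    }
    // The TokenStream contract expects end() after the last token and before close().
    stream.end();
    stream.close();
    analyzer.close();
    // The min should match is a count of clauses, not a percentage. So for 50%, count / 2.
    qbuilder.setMinimumNumberShouldMatch(countTerms / 2);
    Query finalQuery = qbuilder.build();
    return finalQuery;
}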
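The comment in the code above notes that Lucene's minimum-should-match setting is a clause count rather than a percentage. As a rough sketch of how other percentages could be converted, here is a small helper. The class and method names are hypothetical, and the rounding rule (truncate the computed value, and treat a negative percentage as the share of clauses that may be missing) is my reading of the Elasticsearch minimum_should_match documentation, not something taken from the answer itself.

```java
class MinShouldMatch {
    // Hypothetical helper: converts a percentage such as 50 (for "50%")
    // into a clause count for setMinimumNumberShouldMatch.
    static int toClauseCount(int optionalClauses, int percent) {
        // Integer division truncates toward zero, i.e. the magnitude is rounded down.
        int calc = optionalClauses * percent / 100;
        // Assumption: a negative percentage means "this share may be missing".
        int result = percent < 0 ? optionalClauses + calc : calc;
        // Clamp to the valid range [0, optionalClauses].
        return Math.max(0, Math.min(result, optionalClauses));
    }

    public static void main(String[] args) {
        System.out.println(toClauseCount(5, 50)); // prints 2
    }
}
```

For 50% this gives the same value as countTerms / 2 in the method above.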

Related

Initialize an Elasticsearch SearchResponse object before running a query

I am running various Elasticsearch queries from my Java code. To consolidate my code, I would like to initialize a SearchResponse object before my conditional blocks, each of which runs an Elasticsearch query with different settings. This way, I can execute a single line of code once to get the total hits from the query. The code below shows what I mean:
@GET
@Path("/search")
public SearchResultsAndFacets search() {
SearchResultsAndFacets srf = new SearchResultsAndFacets();
RestHighLevelClient client = createHighLevelRestClient();
// Build the base query that applies to all searches
SearchSourceBuilder querySourceBuilder = buildQueryWrapper(colNames, sro.q, sro.f,
facetsToUpdate, sro.u, sro.lc);
SearchResponse searchresponse; // This line does not work. How can I initialize this object here (outside of the following conditional loops)?
// Searches executed from the table view to populate a table of documents
if (searchType.equals("table")) {
List<SortParameters> sortParametersList = sortAdapter(sro.s);
searchResponse = runTableQuery(client, querySourceBuilder, sortParameters, offset, limit);
}
// Searches involving geo_point data to populate a leaflet map
if (searchType.equals("contacts")) {
RestHighLevelClient client = createHighLevelRestClient();
ElasticSearchMapService esms = new ElasticSearchMapService();
searchResponse = esms.runContactsMapQuery(querySourceBuilder, client, <some geographic coordinate parameters necessary for this search>);
MapSearchResponse mapSearchResponse = esms.getLocationsFromSearchResponse(searchResponse);
srf.mapSearchResponse = mapSearchResponse;
}
// I would like to include these next few lines here at the end of the conditional loops.
// Currently they must be inside each if clause.
srf.totalHits = searchResponse.getHits().getTotalHits().value;
srf.elapsed = searchResponse.getTook().getMillis();
srf.facetsData = getUpdatedFacetData(facetsToUpdate,
searchResponse, sro.f);
return srf;
}
Elastic's high-level REST client for Java does not allow initializing a SearchResponse object like this. It is also not possible to do so with
SearchResponse searchResponse = new SearchResponse();
And there is a null pointer error if we do...
SearchResponse searchResponse = new SearchResponse(null);
How can I rewrite this code so that I can fetch totalHits, elapsed and facetsData outside of the conditional blocks?
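One possible direction, offered only as a sketch: in plain Java a local variable can be declared once, set to null, and assigned inside each branch, as long as it is checked before use. The example below uses a stand-in class (FakeResponse) and stand-in query methods instead of the real SearchResponse and client calls, so every name in it is hypothetical.

```java
class ResponseHolderSketch {
    // Stand-in for SearchResponse; only carries a hit count.
    static final class FakeResponse {
        final long totalHits;
        FakeResponse(long totalHits) { this.totalHits = totalHits; }
    }

    // Stand-ins for the table and contacts queries in the question.
    static FakeResponse runTableQuery() { return new FakeResponse(10); }
    static FakeResponse runContactsQuery() { return new FakeResponse(3); }

    static long totalHitsFor(String searchType) {
        // Declare once before the branches; null until a branch assigns it.
        FakeResponse response = null;
        if (searchType.equals("table")) {
            response = runTableQuery();
        } else if (searchType.equals("contacts")) {
            response = runContactsQuery();
        }
        // Guard against the case where no branch ran.
        if (response == null) {
            throw new IllegalArgumentException("Unknown search type: " + searchType);
        }
        // Shared post-processing now lives outside the branches.
        return response.totalHits;
    }
}
```

The null check matters because nothing guarantees that one of the branches runs; without it, the shared lines could fail with a NullPointerException.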

Spring MongoDB query with or operator and text search

How can I build this MongoDB query with Spring Criteria?
{
$or: [
{ "$text" : { "$search" : "570-11024" } },
{"productDetails.code": "572-R110"}
]
}
It combines a full-text index search with a normal where criterion using an or operator.
Query's orOperator(Criteria... criteria) method accepts only Criteria, not TextCriteria or the CriteriaDefinition interface.
Yes, you are right. In Spring Data MongoDB you could do this:
final TextCriteria textCriteria = TextCriteria.forDefaultLanguage().matchingAny("570-11024");
final DBObject tc = textCriteria.getCriteriaObject();
final Criteria criteria = Criteria.where("productDetails.code").is("572-R110");
final DBObject co = criteria.getCriteriaObject();
BasicDBList or = new BasicDBList();
or.add(tc);
or.add(co);
DBObject qq = new BasicDBObject("$or", or);
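// Wrap the raw object in a BasicQuery and run it as a find; executeCommand is meant for database commands, not queries.
// YourEntity is a placeholder for your mapped class.
mongoTemplate.find(new BasicQuery(qq), YourEntity.class);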
Yes, you currently cannot use Query's orOperator method to combine Criteria and TextCriteria. A workaround is to convert both the Criteria and the TextCriteria objects to their Document representations, add them to a BasicDBList, and then wrap that list in a "$or" Criteria object.
TextCriteria textCriteria = TextCriteria.forDefaultLanguage().matchingAny("570-11024");
Criteria criteria = Criteria.where("productDetails.code").is("572-R110");
BasicDBList bsonList = new BasicDBList();
bsonList.add(criteria.getCriteriaObject());
bsonList.add(textCriteria.getCriteriaObject());
Query query = new Query();
query.addCriteria(new Criteria("$or").is(bsonList));
mongoTemplate.find(query, YourEntity.class);
PS: Someone has raised this issue in the spring-data-mongodb repo with a proposed fix by
changing the parameter types of orOperator from Criteria to CriteriaDefinition.
https://github.com/spring-projects/spring-data-mongodb/issues/3895.

ES Java dynamically add keyed filters to AggregationBuilder

I want to have a method that loops through an ArrayList and, based on its content, dynamically generates any number of keyed filters:
List<KeyedFilter> filters = new ArrayList<KeyedFilter>();
for (String a: b) {
    filters.add(generateKeyedFilterFromList(a.key, a.value, a.buckets));
}
private KeyedFilter generateKeyedFilterFromList(String key, String value, String[] buckets) {
    KeyedFilter filter = new KeyedFilter(key,
        QueryBuilder(value, buckets));
    return filter;
}
This returns the list of KeyedFilters just fine; however, I have yet to find a way to apply it to AggregationBuilder (String, KeyedFilter...)
AggregationBuilder agg = AggregationBuilders.filter("filterName", filters);
I also tried an array:
KeyedFilter[] filterArray = new KeyedFilter[filters.size()];
filterArray = filters.toArray(filterArray);
AggregationBuilder agg = AggregationBuilders.filter("filterName", filterArray); // honestly I am not completely sure why this doesn't work here
Apologies for answering my own question, but in case anyone else is interested: the above actually does work. Another error was causing it to fail.
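The array conversion works because a Java varargs parameter accepts an array of the same element type directly. A minimal sketch with stand-in types (String instead of KeyedFilter, and a hypothetical countFilters method instead of the aggregation builder call):

```java
import java.util.Arrays;
import java.util.List;

class VarargsSketch {
    // Stand-in for a method with a (String, KeyedFilter...) style signature.
    static int countFilters(String name, String... filters) {
        return filters.length;
    }

    static int build(List<String> filters) {
        // Convert the list to an array; the array is passed as the varargs argument.
        String[] filterArray = filters.toArray(new String[0]);
        return countFilters("filterName", filterArray);
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("a", "b", "c"))); // prints 3
    }
}
```

Passing the List itself would not compile, since a List is neither a single element nor an array of elements.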

Filtering with DoubleMetaphoneFilter in Lucene

I want to use DoubleMetaphone in Lucene programmatically.
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-analyzers-phonetic</artifactId>
<version>4.4.0</version>
</dependency>
The above package contains the appropriate classes.
This filter can be used in Solr through its XML configuration.
But I want to use it programmatically in Java.
analyzer = new StandardAnalyzer(Version.LUCENE_44);
String field = "title";
Query q = new QueryParser(Version.LUCENE_44, field, analyzer).parse(querystr);
int hitsPerPage = 100;
IndexReader reader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
searcher.search(q, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
But I don't know how to use the filter.
To use this filter, you'll need to create your own custom Analyzer, similar to the example in the Analyzer documentation. For example, to add a metaphone filter to a chain of tokenizer, lowercase and stop filters:
Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        final StandardTokenizer source = new StandardTokenizer(Version.LUCENE_44, reader);
        source.setMaxTokenLength(StandardAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
        // The first filter wraps the tokenizer; each later filter wraps the previous one.
        TokenStream filter = new StandardFilter(Version.LUCENE_44, source);
        filter = new LowerCaseFilter(Version.LUCENE_44, filter);
        filter = new StopFilter(Version.LUCENE_44, filter, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        filter = new DoubleMetaphoneFilter(filter, 4, true);
        return new TokenStreamComponents(source, filter);
    }
};
That is just an example, of course. Set up your Analyzer however makes sense for the data you want to index.
Also, keep in mind that this filter will need to be applied at index time, as well as query time, so you will need to reindex your data with this filter applied to index the metaphone codes.

How to use multifieldquery and filters in Lucene.net

I want to perform a multi field search on a lucene.net index but filter the results based on one of the fields. Here's what I'm currently doing:
To index the fields the definitions are:
doc.Add(new Field("id", id.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.Add(new Field("title", title, Field.Store.NO, Field.Index.TOKENIZED));
doc.Add(new Field("summary", summary, Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.YES));
doc.Add(new Field("description", description, Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.YES));
doc.Add(new Field("distribution", distribution, Field.Store.NO, Field.Index.UN_TOKENIZED));
When I perform the search I do the following:
MultiFieldQueryParser parser = new MultiFieldQueryParser(new string[]{"title", "summary", "description"}, analyzer);
parser.SetDefaultOperator(QueryParser.Operator.AND);
Query query = parser.Parse(text);
BooleanQuery bq = new BooleanQuery();
TermQuery tq = new TermQuery(new Term("distribution", distribution));
bq.Add(tq, BooleanClause.Occur.MUST);
Filter filter = new QueryFilter(bq);
Hits hits = searcher.Search(query, filter);
However, the result is always 0 hits.
What am I doing wrong?
I think I now have a solution. I have discarded the use of the QueryFilter and am using a boolean query to constrain the results before the MultiFieldQuery. So the code will look something like this:
MultiFieldQueryParser parser = new MultiFieldQueryParser(new string[]{"title", "summary", "description"}, analyzer);
parser.SetDefaultOperator(QueryParser.Operator.AND);
Query query = parser.Parse(text);
BooleanQuery bq = new BooleanQuery();
TermQuery tq = new TermQuery(new Term("distribution", distribution));
bq.Add(tq, BooleanClause.Occur.MUST);
bq.Add(query, BooleanClause.Occur.MUST);
Hits hits = searcher.Search(bq);

Resources